💡 Note: This package is meant to be run by Flexor users to transform unstructured data (such as ZenDesk tickets, Steam Reviews and more) into structured data in dbt. If you found us on GitHub, please contact us at [email protected] to get access!
Flexor is an Unstructured Transformation Layer that creates gold-standard tables from unstructured data by performing transformations on the raw text.
This package helps Flexor users who are also dbt users perform flex transformations as part of dbt models. The package contains flexor macros that can be (re)used across dbt projects. The most useful is flexor.flex, which lets you ask any question about ingested data.
Assuming your data is loaded into a data warehouse, for every ticket and ticket comment, the package exposes 3 types of data:
- Data referenced from the source ordered by comment_timestamp (View).
- Various flex transformations on the unstructured data. Each transformation is stored in a separate incremental table. The table columns are:
  - flex json column (with the flex transformation results)
  - flex_id string column (used for joins)
- Statistics on the ticket (Views).
Flexor itself is warehouse-agnostic and can be run in any data warehouse, including Amazon Redshift, Google BigQuery, Snowflake, Vertica and more.
This package currently only supports 🧱Databricks and ❄️Snowflake, but more platforms are coming soon!
To get a sense of how Flexor works with real data, take a look at the Zendesk example project.
Note that the ZenDesk package also contains our approach to which questions you should ask about your ZenDesk data, and the exact flex queries you should use to extract that information.
flexor.flex wraps the FLEX(flex_id, "flex_query") db function to make it more suitable for dbt.
It requires that src_table has a "flex_id" column.
Syntax:
flexor.flex(src_table, flex_query, cache_mode=True, online_mode=True) -> json
Example:
flexor.flex(ref('train_review'), 'Is it slow?')
Notes:
- cache_mode - if set to false, re-run the query even in incremental mode
- online_mode - if set to false, never run the real transformation and use only incremental (cached) results
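The interaction between the two flags can be easier to follow as a small decision function. The sketch below is only a conceptual model in Python, not the package's implementation: the function name plan_flex_row, the cached dictionary, and the "skip" outcome are hypothetical, and it assumes that online_mode=False takes precedence when both flags are set to false, which the notes above do not specify.

```python
def plan_flex_row(flex_id, cached, cache_mode=True, online_mode=True):
    """Decide what happens to one source row (illustrative only).

    cached maps flex_id -> previously stored flex result (hypothetical shape).
    Returns "cached", "run", or "skip".
    """
    if not online_mode:
        # Never run the real transformation: rely on cached results only.
        # Assumption: this wins even when cache_mode is False.
        return "cached" if flex_id in cached else "skip"
    if cache_mode and flex_id in cached:
        # Incremental behaviour: reuse the stored result.
        return "cached"
    # Either nothing is cached yet, or cache_mode=False forces a re-run.
    return "run"


if __name__ == "__main__":
    cached = {"ticket-1": {"answer": True}}
    print(plan_flex_row("ticket-1", cached))
    print(plan_flex_row("ticket-1", cached, cache_mode=False))
    print(plan_flex_row("ticket-2", cached, online_mode=False))
```

Under these assumptions, online_mode=False is useful for rebuilding downstream models without triggering any new transformations, while cache_mode=False is useful after changing a flex query.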
flexor.answer converts classification results to a boolean.
Syntax:
flexor.answer(flex_json) -> bool
Example:
flexor.answer(flexor.flex(ref('train_review'), 'Is it slow?'))
flexor.category converts categorization results to a nullable string.
Syntax:
flexor.category(flex_json) -> string | null
flexor.prediction converts prediction results to a nullable string.
Syntax:
flexor.prediction(flex_json) -> string | null
Exploration macros - great for views to go over the data.
flexor.prediction_statistics counts and aggregates the predictions of a flexor.flex model. The results can be split by reference_table_fields.
Syntax:
flexor.prediction_statistics(flex_table, reference_table=null, reference_table_fields=null, filter_nulls=true)
Example:
{{ flexor.prediction_statistics(ref('train_review_slow')) }}
Another Example:
{{ flexor.prediction_statistics(ref('train_review_slow'), ref('train_review'), ["year", "month"]) }}
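To make the idea of counting predictions and splitting them by reference fields concrete, here is a minimal Python sketch of that aggregation. It is a conceptual illustration under stated assumptions, not the SQL the macro generates: the function name count_predictions and the row shape (dictionaries with "flex_id" and "prediction" keys) are hypothetical, and the join on flex_id follows the "used for joins" description of that column.

```python
from collections import Counter


def count_predictions(flex_rows, reference_rows=None, reference_fields=None,
                      filter_nulls=True):
    """Count predictions, optionally grouped by fields of a reference table.

    flex_rows: list of {"flex_id": ..., "prediction": ...} (hypothetical shape).
    reference_rows: list of dicts that also carry "flex_id".
    Returns a dict keyed by (field values..., prediction).
    """
    ref_by_id = {row["flex_id"]: row for row in (reference_rows or [])}
    fields = reference_fields or []
    counts = Counter()
    for row in flex_rows:
        prediction = row["prediction"]
        if filter_nulls and prediction is None:
            continue  # mirrors filter_nulls=true: drop rows with no prediction
        ref = ref_by_id.get(row["flex_id"], {})
        group = tuple(ref.get(field) for field in fields)
        counts[group + (prediction,)] += 1
    return dict(counts)


if __name__ == "__main__":
    flex_rows = [
        {"flex_id": "1", "prediction": "yes"},
        {"flex_id": "2", "prediction": "no"},
        {"flex_id": "3", "prediction": None},
    ]
    reference_rows = [
        {"flex_id": "1", "year": 2023},
        {"flex_id": "2", "year": 2024},
    ]
    print(count_predictions(flex_rows, reference_rows, ["year"]))
```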
flexor.predictions_statistics counts and aggregates the predictions of multiple flexor.flex models (including intersections). The results can be split by reference_table_fields.
Syntax:
flexor.predictions_statistics(flex_tables, reference_table=null, reference_table_fields=null)
Example:
{{ flexor.predictions_statistics([
ref('train_review_slow'),
ref('train_review_fun')
]) }}
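One way to picture the intersections is to count, for every flex_id, which combination of models gave a positive prediction. The Python sketch below illustrates that reading; it is an assumption about what "intersections" means here, not the macro's actual output. The function name count_prediction_combinations and the input shape (model name mapped to a flex_id-to-boolean dictionary) are hypothetical.

```python
from collections import Counter


def count_prediction_combinations(models):
    """Count rows per combination of models with a positive prediction.

    models: {model_name: {flex_id: bool}} (hypothetical shape).
    Returns a dict keyed by a sorted tuple of model names.
    """
    all_ids = set().union(*(preds.keys() for preds in models.values()))
    counts = Counter()
    for flex_id in all_ids:
        positive = tuple(sorted(
            name for name, preds in models.items() if preds.get(flex_id)
        ))
        if positive:  # ignore rows where no model answered positively
            counts[positive] += 1
    return dict(counts)


if __name__ == "__main__":
    models = {
        "train_review_slow": {"1": True, "2": True, "3": False},
        "train_review_fun": {"2": True, "3": True},
    }
    print(count_prediction_combinations(models))
```

In this example, review "2" falls into the intersection of both models, while reviews "1" and "3" are each counted under a single model.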
flexor.categories_statistics counts and aggregates the categorizations of a flexor.flex model. The results can be split by reference_table_fields.
Syntax:
flexor.categories_statistics(flex_table, reference_table=null, reference_table_fields=null, filter_nulls=true)
No - but you need access to the Flexor platform in order to use it.
The Flexor team actively maintains this package - if you have any questions or want to contribute, simply open a PR or email us at [email protected].
- If you have questions or want to reach out for help, please refer to the GitHub Issue section and create an issue.
- If you want to learn more about Flexor, you can visit our website at Flexor.ai