-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature union schemas #55
base: main
Are you sure you want to change the base?
Changes from all commits
1252e5f
41a8422
66aa846
c690e97
aaec378
e3c29b5
48dead0
41c9766
3c5fab6
8972d87
825a384
9b498ac
3044b00
510cbb1
272585e
37ac2a0
6cfefd3
dac847c
ab7e2a0
4822a1e
ec8aa01
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you also address Issue #54 in this PR since it's a fairly small update and quick fix. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,7 +39,7 @@ The following table provides a detailed list of all tables materialized within t | |
| [mixpanel__sessions](https://fivetran.github.io/dbt_mixpanel/#!/model/model.mixpanel.mixpanel__sessions) | Each record represents a unique user session, including metrics reflecting the frequency and type of actions taken during the session and any relevant fields from the session's first event. | | ||
|
||
### Materialized Models | ||
Each Quickstart transformation job run materializes 6 models if all components of this data model are enabled. This count includes all staging, intermediate, and final models materialized as `view`, `table`, or `incremental`. | ||
Each Quickstart transformation job run materializes 7 models if all components of this data model are enabled (6 if you are running the package on only one Mixpanel connection). This count includes all staging, intermediate, and final models materialized as `view`, `table`, or `incremental`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since this is just for Quickstart I would say we only show 6. Unioning isn't available on Quickstart yet, so I worry this would confuse users more than help. |
||
<!--section-end--> | ||
|
||
## How do I use the dbt package? | ||
|
@@ -67,17 +67,19 @@ For **BigQuery** and **Databricks All Purpose Cluster runtime** destinations, we | |
For **Snowflake**, **Redshift**, and **Postgres** databases, we have chosen `delete+insert` as the default strategy. | ||
|
||
> Regardless of strategy, we recommend that users periodically run a `--full-refresh` to ensure a high level of data quality. | ||
|
||
### Step 2: Install the package | ||
Include the following mixpanel package version in your `packages.yml` file: | ||
> TIP: Check [dbt Hub](https://hub.getdbt.com/) for the latest installation instructions or [read the dbt docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages. | ||
|
||
```yaml | ||
packages: | ||
- package: fivetran/mixpanel | ||
version: [">=0.10.0", "<0.11.0"] # we recommend using ranges to capture non-breaking changes automatically | ||
version: [">=0.11.0", "<0.12.0"] # we recommend using ranges to capture non-breaking changes automatically | ||
``` | ||
|
||
### Step 3: Define database and schema variables | ||
#### Option A: Single connection | ||
By default, this package runs using your destination and the `mixpanel` schema. If this is not where your Mixpanel data is (for example, if your Mixpanel schema is named `mixpanel_fivetran`), add the following configuration to your root `dbt_project.yml` file: | ||
|
||
```yml | ||
|
@@ -86,6 +88,65 @@ vars: | |
mixpanel_schema: your_schema_name | ||
``` | ||
|
||
#### Option B: Union multiple connections | ||
If you have multiple Mixpanel connections in Fivetran and would like to use this package on all of them simultaneously, we have provided functionality to do so. For each source table, the package will union all of the data together and pass the unioned table into the transformations. The `source_relation` column in each model indicates the origin of each record. | ||
|
||
To use this functionality, you will need to set the `mixpanel_sources` variable in your root `dbt_project.yml` file: | ||
|
||
```yml | ||
# dbt_project.yml | ||
|
||
vars: | ||
mixpanel_sources: | ||
- database: connection_1_destination_name # Likely Required. Default value = target.database | ||
schema: connection_1_schema_name # Likely Required. Default value = 'mixpanel' | ||
name: connection_1_source_name # Required only if following the step in the following subsection | ||
|
||
- database: connection_2_destination_name | ||
schema: connection_2_schema_name | ||
name: connection_2_source_name | ||
``` | ||
|
||
> [!NOTE] | ||
> If you choose to make use of this unioning functionality, you will incur an additional model materialized as a `view`, called `stg_mixpanel__event_tmp`. This extra model is necessary for the proper compilation of our connection-unioning macros. | ||
|
||
##### Recommended: Incorporate unioned sources into DAG | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What are your thoughts around collapsing this subsection as it may not be applicable for the majority of users? |
||
> *If you are running the package through [Fivetran Transformations for dbt Core™](https://fivetran.com/docs/transformations/dbt#transformationsfordbtcore), the below step is necessary in order to synchronize model runs with your Mixpanel connections. Alternatively, you may choose to run the package through Fivetran [Quickstart](https://fivetran.com/docs/transformations/quickstart), which would create separate sets of models for each Mixpanel source rather than one set of unioned models.* | ||
|
||
By default, this package defines one single-connection source, called `mixpanel`, which will be disabled if you are unioning multiple connections. This means that your DAG will not include your Mixpanel sources, though the package will run successfully. | ||
|
||
To properly incorporate all of your Mixpanel connections into your project's DAG: | ||
1. Define each of your sources in a `.yml` file in your project. Utilize the following template for the `source`-level configurations, and, **most importantly**, copy and paste the table and column-level definitions from the package's `src_mixpanel.yml` [file](https://github.com/fivetran/dbt_mixpanel/blob/main/models/staging/src_mixpanel.yml). This package currently only uses the `EVENT` source table. | ||
|
||
```yml | ||
# a .yml file in your root project | ||
version: 2 | ||
|
||
sources: | ||
- name: <name> # ex: Should match name in mixpanel_sources | ||
schema: <schema_name> | ||
database: <database_name> | ||
loader: fivetran | ||
loaded_at_field: _fivetran_synced | ||
|
||
freshness: # feel free to adjust to your liking | ||
warn_after: {count: 72, period: hour} | ||
error_after: {count: 168, period: hour} | ||
|
||
tables: | ||
- name: event | ||
description: Table of all events tracked by Mixpanel across web, ios, and android platforms. | ||
columns: # copy and paste from mixpanel/models/staging/src_mixpanel.yml - see https://support.atlassian.com/bitbucket-cloud/docs/yaml-anchors/ for how to use &/* anchors to only do so once | ||
``` | ||
|
||
2. Set the `has_defined_sources` variable (scoped to the `mixpanel` package) to `True`, like such: | ||
```yml | ||
# dbt_project.yml | ||
vars: | ||
mixpanel: | ||
has_defined_sources: true | ||
``` | ||
|
||
### (Optional) Step 4: Additional configurations | ||
<details open><summary>Collapse/expand details</summary> | ||
|
||
|
@@ -224,8 +285,8 @@ models: | |
+schema: my_new_schema_name # leave blank for just the target_schema | ||
``` | ||
|
||
#### Change the source table references | ||
If an individual source table has a different name than the package expects, add the table name as it appears in your destination to the respective variable: | ||
#### Change the source table references (only if using a single connection) | ||
If an individual source table has a different name than the package expects, add the table name as it appears in your destination to the respective variable. This is not available when running the package on multiple unioned connections. | ||
|
||
> IMPORTANT: See this project's [`dbt_project.yml`](https://github.com/fivetran/dbt_mixpanel/blob/main/dbt_project.yml) variable declarations to see the expected names. | ||
|
||
|
Large diffs are not rendered by default.
Large diffs are not rendered by default.
Large diffs are not rendered by default.
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this extra carrot necessary?