Feature union schemas #55

Open
wants to merge 21 commits into
base: main
13 changes: 12 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,17 @@
-# dbt_mixpanel version.version
+# dbt_mixpanel v0.11.0
[PR #53](https://github.com/fivetran/dbt_mixpanel/pull/53) and [PR #55](https://github.com/fivetran/dbt_mixpanel/pull/55) include the following updates:

## Feature Update: Run Package on Unioned Connections
- This release supports running the package on multiple Mixpanel sources at once! See the [README](https://github.com/fivetran/dbt_mixpanel?tab=readme-ov-file#step-3-define-database-and-schema-variables) for details on how to leverage this feature.
- This was achieved through the introduction of new unioning [macros](https://github.com/fivetran/dbt_mixpanel/tree/main/macros/union).

> Please note: This is a **Breaking Change** in that we have added a new field, `source_relation`, that points to the source connection from which the record originated.
> This `source_relation` field is now part of all generated unique keys.
>
Review comment (Contributor): Is this extra caret (`>`) necessary?

> This will **require running a full refresh**.
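The following is a conceptual Python sketch, not the package's SQL, of why adding `source_relation` to the unique keys is a breaking change. The `surrogate_key` helper, the `event_id` field, and the relation names are hypothetical and exist only for illustration. The point is that two connections can share an identifier, and that keys computed with the extra field no longer match keys already stored in incremental tables.

```python
import hashlib


def surrogate_key(*parts):
    """Hash the given parts into one key (illustrative only, not the package's macro)."""
    joined = "|".join(str(part) for part in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Two hypothetical connections that happen to share an event identifier.
events = [
    {"source_relation": "db.mixpanel_us", "event_id": "abc-123"},
    {"source_relation": "db.mixpanel_eu", "event_id": "abc-123"},
]

# Keys built without source_relation collide across connections.
old_keys = {surrogate_key(e["event_id"]) for e in events}

# Keys built with source_relation stay distinct.
new_keys = {surrogate_key(e["source_relation"], e["event_id"]) for e in events}

print(len(old_keys), len(new_keys))
```

Because every new key differs from the key previously stored for the same record, an incremental run would not match existing rows, which is consistent with the full-refresh requirement stated above.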

## Documentation
- Provided missing column yml documentation.
- Added Quickstart model counts to README. ([#56](https://github.com/fivetran/dbt_mixpanel/pull/56))
- Corrected references to connectors and connections in the README. ([#56](https://github.com/fivetran/dbt_mixpanel/pull/56))

69 changes: 65 additions & 4 deletions README.md
Review comment (Contributor): Can you also address Issue #54 in this PR? It's a fairly small update and a quick fix.

@@ -39,7 +39,7 @@ The following table provides a detailed list of all tables materialized within t
| [mixpanel__sessions](https://fivetran.github.io/dbt_mixpanel/#!/model/model.mixpanel.mixpanel__sessions) | Each record represents a unique user session, including metrics reflecting the frequency and type of actions taken during the session and any relevant fields from the session's first event. |

### Materialized Models
-Each Quickstart transformation job run materializes 6 models if all components of this data model are enabled. This count includes all staging, intermediate, and final models materialized as `view`, `table`, or `incremental`.
+Each Quickstart transformation job run materializes 7 models if all components of this data model are enabled (6 if you are running the package on only one Mixpanel connection). This count includes all staging, intermediate, and final models materialized as `view`, `table`, or `incremental`.
Review comment (Contributor): Since this is just for Quickstart, I would say we only show 6. Unioning isn't available on Quickstart yet, so I worry this would confuse users more than help.

<!--section-end-->

## How do I use the dbt package?
@@ -67,17 +67,19 @@ For **BigQuery** and **Databricks All Purpose Cluster runtime** destinations, we
For **Snowflake**, **Redshift**, and **Postgres** databases, we have chosen `delete+insert` as the default strategy.

> Regardless of strategy, we recommend that users periodically run a `--full-refresh` to ensure a high level of data quality.
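As a rough illustration of what a `delete+insert` incremental strategy does, here is a minimal Python sketch over in-memory rows. It is a simplification of the general dbt strategy, not this package's code, and the `unique_event_id` key name and sample rows are assumptions made for the example.

```python
def delete_insert(target, batch, unique_key):
    """Apply one incremental batch: delete rows whose key is in the batch, then insert the batch."""
    incoming = {row[unique_key] for row in batch}
    kept = [row for row in target if row[unique_key] not in incoming]
    return kept + batch


# Existing table contents (hypothetical).
target = [
    {"unique_event_id": "a", "n": 1},
    {"unique_event_id": "b", "n": 1},
]

# New batch: "b" is an updated row, "c" is a brand-new row.
batch = [
    {"unique_event_id": "b", "n": 2},
    {"unique_event_id": "c", "n": 1},
]

result = delete_insert(target, batch, "unique_event_id")
print(result)
```

Rows outside the batch are never revisited, which is one reason a periodic `--full-refresh` is a sensible safeguard.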

### Step 2: Install the package
Include the following mixpanel package version in your `packages.yml` file:
> TIP: Check [dbt Hub](https://hub.getdbt.com/) for the latest installation instructions or [read the dbt docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.

```yaml
packages:
  - package: fivetran/mixpanel
-    version: [">=0.10.0", "<0.11.0"] # we recommend using ranges to capture non-breaking changes automatically
+    version: [">=0.11.0", "<0.12.0"] # we recommend using ranges to capture non-breaking changes automatically
```
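To make the effect of the version range concrete, the sketch below checks which releases fall inside `[">=0.11.0", "<0.12.0"]`. It is a simplified stand-in for dbt's own version resolution: the `parse_version` and `in_range` helpers are hypothetical and handle only plain `major.minor.patch` strings, with no pre-release tags.

```python
def parse_version(text):
    """Turn '0.11.3' into (0, 11, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower=">=0.11.0", upper="<0.12.0"):
    """Return True if version satisfies both bounds of the range."""
    candidate = parse_version(version)
    lower_bound = parse_version(lower[2:])  # strip the ">=" prefix
    upper_bound = parse_version(upper[1:])  # strip the "<" prefix
    return lower_bound <= candidate < upper_bound


for release in ["0.10.0", "0.11.0", "0.11.4", "0.12.0"]:
    print(release, in_range(release))
```

Under these assumptions, patch releases in the `0.11.x` line are picked up automatically, while `0.12.0`, which could carry breaking changes, is not.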

### Step 3: Define database and schema variables
#### Option A: Single connection
By default, this package runs using your destination and the `mixpanel` schema. If this is not where your Mixpanel data is (for example, if your Mixpanel schema is named `mixpanel_fivetran`), add the following configuration to your root `dbt_project.yml` file:

```yml
@@ -86,6 +88,65 @@ vars:
mixpanel_schema: your_schema_name
```

#### Option B: Union multiple connections
If you have multiple Mixpanel connections in Fivetran and would like to use this package on all of them simultaneously, we have provided functionality to do so. For each source table, the package will union all of the data together and pass the unioned table into the transformations. The `source_relation` column in each model indicates the origin of each record.

To use this functionality, you will need to set the `mixpanel_sources` variable in your root `dbt_project.yml` file:

```yml
# dbt_project.yml

vars:
  mixpanel_sources:
    - database: connection_1_destination_name # Likely Required. Default value = target.database
      schema: connection_1_schema_name # Likely Required. Default value = 'mixpanel'
      name: connection_1_source_name # Required only if following the step in the following subsection

    - database: connection_2_destination_name
      schema: connection_2_schema_name
      name: connection_2_source_name
```
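The unioning behavior described above can be sketched conceptually in Python. This is not the package's Jinja/SQL macro: the `union_sources` function, the in-memory `tables` lookup, the `analytics` target database, and the `database.schema` format used for `source_relation` are all assumptions made for illustration. Only the defaults (target database and the `mixpanel` schema) come from the comments in the configuration above.

```python
def union_sources(sources, tables, target_database="analytics"):
    """Stack rows from every configured source, tagging each row with its origin."""
    unioned = []
    for src in sources:
        database = src.get("database", target_database)  # default: target database
        schema = src.get("schema", "mixpanel")  # default: 'mixpanel'
        relation = f"{database}.{schema}"  # assumed format for source_relation
        for row in tables.get(relation, []):
            unioned.append({"source_relation": relation, **row})
    return unioned


# Mirrors the shape of the mixpanel_sources variable; the second entry relies on the default database.
mixpanel_sources = [
    {"database": "connection_1_destination_name", "schema": "connection_1_schema_name"},
    {"schema": "connection_2_schema_name"},
]

# Hypothetical contents of each connection's EVENT table.
tables = {
    "connection_1_destination_name.connection_1_schema_name": [{"event_id": 1}],
    "analytics.connection_2_schema_name": [{"event_id": 1}, {"event_id": 2}],
}

rows = union_sources(mixpanel_sources, tables)
for row in rows:
    print(row)
```

Note that `event_id` 1 appears in both connections; the `source_relation` value is what keeps the two records distinguishable downstream.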

> [!NOTE]
> If you choose to make use of this unioning functionality, you will incur an additional model materialized as a `view`, called `stg_mixpanel__event_tmp`. This extra model is necessary for the proper compilation of our connection-unioning macros.

##### Recommended: Incorporate unioned sources into DAG
Review comment (Contributor): What are your thoughts on collapsing this subsection, since it may not be applicable to the majority of users?

> *If you are running the package through [Fivetran Transformations for dbt Core™](https://fivetran.com/docs/transformations/dbt#transformationsfordbtcore), the below step is necessary in order to synchronize model runs with your Mixpanel connections. Alternatively, you may choose to run the package through Fivetran [Quickstart](https://fivetran.com/docs/transformations/quickstart), which would create separate sets of models for each Mixpanel source rather than one set of unioned models.*

By default, this package defines one single-connection source, called `mixpanel`, which will be disabled if you are unioning multiple connections. This means that your DAG will not include your Mixpanel sources, though the package will run successfully.

To properly incorporate all of your Mixpanel connections into your project's DAG:
1. Define each of your sources in a `.yml` file in your project. Utilize the following template for the `source`-level configurations, and, **most importantly**, copy and paste the table and column-level definitions from the package's `src_mixpanel.yml` [file](https://github.com/fivetran/dbt_mixpanel/blob/main/models/staging/src_mixpanel.yml). This package currently only uses the `EVENT` source table.

```yml
# a .yml file in your root project
version: 2

sources:
  - name: <name> # ex: Should match name in mixpanel_sources
    schema: <schema_name>
    database: <database_name>
    loader: fivetran
    loaded_at_field: _fivetran_synced

    freshness: # feel free to adjust to your liking
      warn_after: {count: 72, period: hour}
      error_after: {count: 168, period: hour}

    tables:
      - name: event
        description: Table of all events tracked by Mixpanel across web, ios, and android platforms.
        columns: # copy and paste from mixpanel/models/staging/src_mixpanel.yml - see https://support.atlassian.com/bitbucket-cloud/docs/yaml-anchors/ for how to use &/* anchors to only do so once
```

2. Set the `has_defined_sources` variable (scoped to the `mixpanel` package) to `true`, as follows:
```yml
# dbt_project.yml
vars:
  mixpanel:
    has_defined_sources: true
```

### (Optional) Step 4: Additional configurations
<details open><summary>Collapse/expand details</summary>

@@ -224,8 +285,8 @@ models:
+schema: my_new_schema_name # leave blank for just the target_schema
```

-#### Change the source table references
-If an individual source table has a different name than the package expects, add the table name as it appears in your destination to the respective variable:
+#### Change the source table references (only if using a single connection)
+If an individual source table has a different name than the package expects, add the table name as it appears in your destination to the respective variable. This is not available when running the package on multiple unioned connections.

> IMPORTANT: See this project's [`dbt_project.yml`](https://github.com/fivetran/dbt_mixpanel/blob/main/dbt_project.yml) variable declarations to see the expected names.

3 changes: 2 additions & 1 deletion dbt_project.yml
@@ -1,6 +1,6 @@
config-version: 2
name: 'mixpanel'
-version: '0.10.0'
+version: '0.11.0'
require-dbt-version: [">=1.3.0", "<2.0.0"]
models:
  mixpanel:
@@ -23,3 +23,4 @@ vars:
# session_event_criteria: # filter to place on events in order to qualify for sessionization
sessionization_trailing_window: 3 # number of hours to look back at for each mixpanel__sessions run. this allows you to sessionize events that arrive late without requiring a full refresh
session_passthrough_columns: [] # choose event columns to pass through to mixpanel__sessions (values taken from first event of session)
mixpanel_sources: []
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

253 changes: 214 additions & 39 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion docs/run_results.json

This file was deleted.

10 changes: 5 additions & 5 deletions integration_tests/ci/sample.profiles.yml
@@ -16,13 +16,13 @@ integration_tests:
pass: "{{ env_var('CI_REDSHIFT_DBT_PASS') }}"
dbname: "{{ env_var('CI_REDSHIFT_DBT_DBNAME') }}"
port: 5439
-schema: mixpanel_integration_tests_2
+schema: mixpanel_integration_tests_3
threads: 8
bigquery:
type: bigquery
method: service-account-json
project: 'dbt-package-testing'
-schema: mixpanel_integration_tests_2
+schema: mixpanel_integration_tests_3
threads: 8
keyfile_json: "{{ env_var('GCLOUD_SERVICE_KEY') | as_native }}"
snowflake:
@@ -33,7 +33,7 @@ integration_tests:
role: "{{ env_var('CI_SNOWFLAKE_DBT_ROLE') }}"
database: "{{ env_var('CI_SNOWFLAKE_DBT_DATABASE') }}"
warehouse: "{{ env_var('CI_SNOWFLAKE_DBT_WAREHOUSE') }}"
-schema: mixpanel_integration_tests_2
+schema: mixpanel_integration_tests_3
threads: 8
postgres:
type: postgres
@@ -42,13 +42,13 @@ integration_tests:
pass: "{{ env_var('CI_POSTGRES_DBT_PASS') }}"
dbname: "{{ env_var('CI_POSTGRES_DBT_DBNAME') }}"
port: 5432
-schema: mixpanel_integration_tests_2
+schema: mixpanel_integration_tests_3
threads: 8
databricks:
catalog: "{{ env_var('CI_DATABRICKS_DBT_CATALOG') }}"
host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
http_path: "{{ env_var('CI_DATABRICKS_DBT_HTTP_PATH') }}"
-schema: mixpanel_integration_tests_2
+schema: mixpanel_integration_tests_3
threads: 8
token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
type: databricks
14 changes: 11 additions & 3 deletions integration_tests/dbt_project.yml
@@ -1,5 +1,5 @@
name: 'mixpanel_integration_tests'
-version: '0.10.0'
+version: '0.11.0'
config-version: 2
profile: 'integration_tests'

@@ -9,10 +9,18 @@ models:
# +schema: "mixpanel_{{ var('directed_schema','dev') }}" ## To be used for validation testing

vars:
-mixpanel_schema: mixpanel_integration_tests_2
+mixpanel_schema: mixpanel_integration_tests_3

# mixpanel_sources:
# - schema: mixpanel_integration_tests_3
# name: source_3
# - schema: mixpanel_integration_tests_4
# name: source_4

mixpanel:
mixpanel_event_identifier: "event"

has_defined_sources: true

seeds:
mixpanel_integration_tests:
+column_types: