Feature/performance enhancement #41

Merged
merged 42 commits into main from feature/performance-enhancement on Feb 21, 2024

Changes from 24 commits

Commits (42 total):
29c24b7
Update README.md
fivetran-dejantucakov Dec 5, 2023
3497935
feature/performance-enhancement
fivetran-catfritz Jan 2, 2024
a9119ce
update to incremental
fivetran-catfritz Jan 3, 2024
0de2a07
update to incremental
fivetran-catfritz Jan 3, 2024
1129752
feature/performance-enhancement
fivetran-catfritz Jan 23, 2024
198081e
feature/performance-enhancement
fivetran-catfritz Jan 25, 2024
46fd5c4
feature/performance-enhancement
fivetran-catfritz Jan 25, 2024
18ac897
feature/performance-enhancement
fivetran-catfritz Jan 25, 2024
c6cd6a8
update clustering
fivetran-catfritz Jan 25, 2024
0fd82bf
update clustering
fivetran-catfritz Jan 25, 2024
334010b
Merge pull request #40 from fivetran/fivetran-dejantucakov-patch-1
fivetran-catfritz Jan 26, 2024
f049d87
update changelog & readme
fivetran-catfritz Jan 26, 2024
bcc310d
update ymls
fivetran-catfritz Jan 26, 2024
19901af
update readme
fivetran-catfritz Jan 26, 2024
d1ae3f1
updates
fivetran-catfritz Jan 26, 2024
ba53a3e
update changelog, ymls, regen docs
fivetran-catfritz Jan 30, 2024
00e2ccc
update changelog
fivetran-catfritz Jan 30, 2024
5f374f7
update changelog
fivetran-catfritz Jan 30, 2024
b17a9bd
update lookbacks
fivetran-catfritz Feb 2, 2024
702694b
update lookbacks
fivetran-catfritz Feb 2, 2024
dcb14e7
update lookbacks
fivetran-catfritz Feb 6, 2024
6345dc1
update readme
fivetran-catfritz Feb 6, 2024
2cb15a3
update
fivetran-catfritz Feb 6, 2024
823e2d0
update
fivetran-catfritz Feb 6, 2024
1b66f95
updates
fivetran-catfritz Feb 21, 2024
4b760c0
delete extra macro
fivetran-catfritz Feb 21, 2024
6f4c906
updates
fivetran-catfritz Feb 21, 2024
59e2434
updates
fivetran-catfritz Feb 21, 2024
2fac3fd
Merge pull request #43 from fivetran/feature/test-materializations
fivetran-catfritz Feb 21, 2024
6574992
Merge branch 'main' into feature/performance-enhancement
fivetran-catfritz Feb 21, 2024
5e91f92
update var names
fivetran-catfritz Feb 21, 2024
7b54bdb
update macro
fivetran-catfritz Feb 21, 2024
9bfcffa
remove extra comma
fivetran-catfritz Feb 21, 2024
ee45a9a
Apply suggestions from code review
fivetran-catfritz Feb 21, 2024
ea07fae
Update models/staging/stg_mixpanel__user_event_date_spine.sql
fivetran-catfritz Feb 21, 2024
befd1be
Apply suggestions from code review
fivetran-catfritz Feb 21, 2024
c8fe97d
update models, readme, changelog
fivetran-catfritz Feb 21, 2024
6e4fbc5
update changelog and regen docs
fivetran-catfritz Feb 21, 2024
ed92bba
update yml
fivetran-catfritz Feb 21, 2024
f9ae48d
update changelog
fivetran-catfritz Feb 21, 2024
d818a3a
add autoreleaser
fivetran-catfritz Feb 21, 2024
4f245f0
update changelog
fivetran-catfritz Feb 21, 2024
43 changes: 11 additions & 32 deletions .github/PULL_REQUEST_TEMPLATE/maintainer_pull_request_template.md
Expand Up @@ -4,48 +4,27 @@
**This PR will result in the following new package version:**
<!--- Please add details around your decision for breaking vs non-breaking version upgrade. If this is a breaking change, were backwards-compatible options explored? -->

**Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:**
**Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:**
<!--- Copy/paste the CHANGELOG for this version below. -->

## PR Checklist
### Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
- [ ] dbt compile
- [ ] dbt run --full-refresh
- [ ] dbt run
- [ ] dbt test
- [ ] dbt run --vars (if applicable)
- [ ] dbt run --full-refresh && dbt test
- [ ] dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:
- [ ] The appropriate issue has been linked and tagged
- [ ] You are assigned to the corresponding issue and this PR
- [ ] BuildKite integration tests are passing
- [ ] The appropriate issue has been linked, tagged, and properly assigned.
- [ ] All necessary documentation and version upgrades have been applied.
<!--- Be sure to update the package version in the dbt_project.yml, integration_tests/dbt_project.yml, and README if necessary. -->
- [ ] docs were regenerated (unless this PR does not include any code or yml updates).
- [ ] BuildKite integration tests are passing.
- [ ] Detailed validation steps have been provided below.

### Detailed Validation
Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":
- [ ] You have validated these changes and assure this PR will address the respective Issue/Feature.
- [ ] You are reasonably confident these changes will not impact any other components of this package or any dependent packages.
- [ ] You have provided details below around the validation steps performed to gain confidence in these changes.
Please share any and all of your validation steps:
<!--- Provide the steps you took to validate your changes below. -->

### Standard Updates
Please acknowledge that your PR contains the following standard updates:
- Package versioning has been appropriately indexed in the following locations:
- [ ] indexed within dbt_project.yml
- [ ] indexed within integration_tests/dbt_project.yml
- [ ] CHANGELOG has individual entries for each respective change in this PR
<!--- If there is a parallel upstream change, remember to reference the corresponding CHANGELOG as an individual entry. -->
- [ ] README updates have been applied (if applicable)
<!--- Remember to check the following README locations for common updates. -->
<!--- Suggested install range (needed for breaking changes) -->
<!--- Dependency matrix is appropriately updated (if applicable) -->
<!--- New variable documentation (if applicable) -->
- [ ] DECISIONLOG updates have been updated (if applicable)
- [ ] Appropriate yml documentation has been added (if applicable)

### dbt Docs
Please acknowledge that after the above were all completed the below were applied to your branch:
- [ ] docs were regenerated (unless this PR does not include any code or yml updates)

### If you had to summarize this PR in an emoji, which would it be?
<!--- For a complete list of markdown compatible emojis check out this git repo (https://gist.github.com/rxaviers/7360908) -->
:dancer:
10 changes: 10 additions & 0 deletions .quickstart/quickstart.yml
@@ -0,0 +1,10 @@
database_key: mixpanel_database
schema_key: mixpanel_schema

dbt_versions: ">=1.3.0 <2.0.0"

destination_configurations:
databricks:
dispatch:
- macro_namespace: dbt_utils
search_order: [ 'spark_utils', 'dbt_utils' ]
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,17 @@
# dbt_mixpanel v0.9.0
[PR #41](https://github.com/fivetran/dbt_mixpanel/pull/41) includes the following updates:

## 🚨 Breaking Changes 🚨
>Note: This update was made breaking since it will alter the materialization of existing models. While these changes do not necessitate a `--full-refresh`, it may be beneficial if you run into issues with this update.
- Updated models with the following performance improvements:
- Updated the incremental strategy for all models to `insert_overwrite` for BigQuery and Databricks and `delete+insert` for all other warehouses.
- Removed `stg_mixpanel__event_tmp` in favor of `stg_mixpanel__event_tmp`, which is now an incremental model. While this will increase storage, this change was made to improve compute.

**Review comment:** looks like the second table name is wrong. did you mean:

> Removed `stg_mixpanel__event_tmp` in favor of `stg_mixpanel__event`, which is now ...

**Author reply:** Oops, yes. 😄


## Feature Updates
- Added `cluster_by` columns to the configs for incremental models. This will benefit Snowflake and BigQuery users.
- Added column `dbt_run_date` to incremental models to improve accuracy and optimize downstream models. This date captures the date a record was added or updated by this package.

**Review comment:** thank you! it's always helpful to have these timestamp columns.

- Added a 7-day lookback to incremental models to accommodate late-arriving events.

fivetran-joemarkiewicz marked this conversation as resolved.
# dbt_mixpanel v0.8.0
>Note: If you run into issues with this update, we suggest trying a **full refresh**.
## 🎉 Feature Updates 🎉
Expand Down
16 changes: 7 additions & 9 deletions README.md
Expand Up @@ -56,11 +56,12 @@ dispatch:
```

### Database Incremental Strategies
Some end models in this package are materialized incrementally. We currently use the `merge` strategy as the default strategy for BigQuery, Snowflake, and Databricks databases. For Redshift and Postgres databases, we use `delete+insert` as the default strategy.
Some of the end models in this package are materialized incrementally. We have chosen `insert_overwrite` as the default strategy for **BigQuery** and **Databricks** databases, as it is only available for these dbt adapters. For **Snowflake**, **Redshift**, and **Postgres** databases, we have chosen `delete+insert` as the default strategy.

We recognize there are some limitations with these strategies, particularly around updated records in the past which cause duplicates, and are assessing using a different strategy in the future.
`insert_overwrite` is our preferred incremental strategy because it will be able to properly handle updates to records that exist outside the immediate incremental window. That is, because it leverages partitions, `insert_overwrite` will appropriately update existing rows that have been changed upstream instead of inserting duplicates of them--all without requiring a full table scan.

**Review comment:** can you define "immediate incremental window"? if the queries only pull in data from the last X days, do you mean changes that occurred to records >X days ago?

**Author reply:** Thanks for the doc comments. I am going to rework this tomorrow!


> For either of these strategies, we highly recommend that users periodically run a `--full-refresh` to ensure a high level of data quality.
`delete+insert` is our second-choice as it resembles `insert_overwrite` but lacks partitions. This strategy works most of the time and appropriately handles incremental loads that do not contain changes to past records. However, if a past record has been updated and is outside of the incremental window, `delete+insert` will insert a duplicate record. 😱

**Review comment:** it might be worth qualifying "preferred" and "second-choice" as it depends on the data platform. As an example, Snowflake doesn't support insert_overwrite. I don't think you mean to imply that Snowflake is using a lesser methodology.

https://docs.getdbt.com/docs/build/incremental-models#about-incremental_strategy

**Review comment (Contributor):** I agree with @jasongroob and this can be a bit misleading. @fivetran-catfritz can you reword this to be more direct as we discussed earlier.

> Because of this, we highly recommend that **Snowflake**, **Redshift**, and **Postgres** users periodically run a `--full-refresh` to ensure a high level of data quality and remove any possible duplicates.
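For illustration, here is a condensed sketch of how the models in this PR select a strategy per adapter (adapted from the model configs further down in this diff; see `models/mixpanel__event.sql` for the full version):

```sql
{{
    config(
        materialized='incremental',
        unique_key='unique_event_id',
        -- insert_overwrite where the adapter supports it, delete+insert elsewhere
        incremental_strategy='insert_overwrite' if target.type in ('bigquery', 'spark', 'databricks') else 'delete+insert',
        partition_by={'field': 'date_day', 'data_type': 'date'} if target.type not in ('spark', 'databricks') else ['date_day']
    )
}}
```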

## Step 2: Install the package
Include the following mixpanel package version in your `packages.yml` file:
Expand All @@ -69,7 +70,7 @@ Include the following mixpanel package version in your `packages.yml` file:
```yaml
packages:
- package: fivetran/mixpanel
version: [">=0.8.0", "<0.9.0"] # we recommend using ranges to capture non-breaking changes automatically
version: [">=0.9.0", "<0.10.0"] # we recommend using ranges to capture non-breaking changes automatically
fivetran-joemarkiewicz marked this conversation as resolved.
```

## Step 3: Define database and schema variables
Expand All @@ -82,7 +83,6 @@ vars:
```

## (Optional) Step 4: Additional configurations
<details><summary>Expand for configurations</summary>

## Macros
### analyze_funnel [(source)](https://github.com/fivetran/dbt_mixpanel/blob/master/macros/analyze_funnel.sql)
Expand All @@ -98,7 +98,7 @@ The macro takes the following as arguments:
- `event_funnel`: List of event types (not case sensitive).
- Example: `['play_song', 'stop_song', 'exit']`
- `group_by_column`: (Optional) A column by which you want to segment the funnel (this macro pulls data from the `mixpanel__event` model). The default value is `None`.
- Examaple: `group_by_column = 'country_code'`.
- Example: `group_by_column = 'country_code'`.
fivetran-joemarkiewicz marked this conversation as resolved.
- `conversion_criteria`: (Optional) A `WHERE` clause that will be applied when selecting from `mixpanel__event`.
- Example: To limit all events in the funnel to the United States, you'd provide `conversion_criteria = 'country_code = "US"'`. To limit the events to only song play events in the US, you'd input `conversion_criteria = 'country_code = "US" or event_type != "play_song"'`.
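As a sketch, a model using this macro might look like the following (the invocation style is an assumption based on the argument list above; `mixpanel` is the package namespace, and the event names are illustrative):

```sql
-- hypothetical funnel query built from the documented arguments
{{ mixpanel.analyze_funnel(
    event_funnel=['play_song', 'stop_song', 'exit'],
    group_by_column='country_code',
    conversion_criteria="country_code = 'US'"
) }}
```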

Expand Down Expand Up @@ -224,7 +224,7 @@ models:
### Change the source table references
If an individual source table has a different name than the package expects, add the table name as it appears in your destination to the respective variable:

> IMPORTANT: See this project's [`dbt_project.yml`](https://github.com/fivetran/dbt_mixpanel_source/blob/main/dbt_project.yml) variable declarations to see the expected names.
> IMPORTANT: See this project's [`dbt_project.yml`](https://github.com/fivetran/dbt_mixpanel/blob/main/dbt_project.yml) variable declarations to see the expected names.

```yml
vars:
Expand All @@ -241,8 +241,6 @@ Events are considered duplicates and consolidated by the package if they contain

This is performed in line with Mixpanel's internal de-duplication process, in which events are de-duped at the end of each day. This means that if an event was triggered during an offline session at 11:59 PM and _resent_ when the user came online at 12:01 AM, these records would _not_ be de-duplicated. This is the case in both Mixpanel and the Mixpanel dbt package.
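For illustration, the de-dup grain corresponds to a hash over the four fields documented for `unique_event_id` in `models/mixpanel.yml` below (the exact hashing call is an assumption; the package may construct the key differently):

```sql
-- sketch of the same-calendar-day de-dup key
{{ dbt_utils.generate_surrogate_key(['insert_id', 'people_id', 'date_day', 'event_type']) }} as unique_event_id
```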

</details>

## (Optional) Step 5: Orchestrate your models with Fivetran Transformations for dbt Core™
<details><summary>Expand for details</summary>
<br>
Expand Down
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,6 +1,6 @@
config-version: 2
name: 'mixpanel'
version: '0.8.0'
version: '0.9.0'
require-dbt-version: [">=1.3.0", "<2.0.0"]
models:
mixpanel:
Expand Down
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

24 changes: 12 additions & 12 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/run_results.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion integration_tests/dbt_project.yml
@@ -1,5 +1,5 @@
name: 'mixpanel_integration_tests'
version: '0.8.0'
version: '0.9.0'
config-version: 2
profile: 'integration_tests'
vars:
Expand Down
6 changes: 6 additions & 0 deletions macros/date_today.sql
@@ -0,0 +1,6 @@
{% macro date_today(col_name) %}

cast( {{ dbt.date_trunc('day', dbt.current_timestamp_backcompat()) }} as date) as {{ col_name }}
{# cast( '2024-02-06' as date) as {{ col_name }} -- for testing #}

{% endmacro %}
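For reference, a call such as `{{ mixpanel.date_today('dbt_run_date') }}` compiles to roughly the following (the exact timestamp function varies by adapter):

```sql
cast(date_trunc('day', current_timestamp) as date) as dbt_run_date
```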
9 changes: 9 additions & 0 deletions macros/lookback.sql
@@ -0,0 +1,9 @@
{% macro lookback(from_date, datepart='day', interval=7, default_start_date='2010-01-01') %}

**Review comment:** having '2010-01-01' be a project variable could help if someone wants to have their own custom start date. 14 years is a LOT of data to include by default.

**Author reply:** I realize calling this value default_start_date is a bit misleading. The 2010-01-01 is only meant as a super-cautious failsafe in case of null values in an incremental run, which I imagine would rarely occur. I'll rename this. The actual start date of events being brought in can be set by the variable date_range_start. Thanks for the callout!


coalesce(
(select {{ dbt.dateadd(datepart=datepart, interval=-interval, from_date_or_timestamp=from_date) }}
from {{ this }}),
{{ "'" ~ default_start_date ~ "'" }}
)

{% endmacro %}
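As used in the model diffs below, a call like `{{ mixpanel.lookback(from_date="max(date_day)", interval=27) }}` compiles to roughly the following (the rendered `dateadd` syntax and the `{{ this }}` relation depend on the adapter and project):

```sql
coalesce(
    (select dateadd(day, -27, max(date_day)) from <this_model>),
    '2010-01-01'
)
```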
1 change: 1 addition & 0 deletions macros/staging_columns.sql
Expand Up @@ -3,6 +3,7 @@
{% set columns = [

{"name": "_fivetran_synced", "datatype": dbt.type_timestamp()},
{"name": "_fivetran_id", "datatype": dbt.type_string()},
{"name": "ae_session_length", "datatype": dbt.type_string(), "alias": "app_session_length"},
{"name": "app_build_number", "datatype": dbt.type_string()},

Expand Down
10 changes: 7 additions & 3 deletions models/mixpanel.yml
Expand Up @@ -8,17 +8,21 @@ models:

Default materialization is incremental.

columns:
columns:
- name: unique_event_id
description: >
description: >
Unique ID of the event. Events are de-duped according to Mixpanel's [requirements](https://developer.mixpanel.com/reference/http#event-deduplication).
This is hashed on `insert_id`, `people_id`, `date_day`, and `event_type`
tests:
- unique
- not_null

- name: _fivetran_id
description: >
Hash of `insert_id`, `distinct_id`, and `name` columns.

- name: insert_id
description: >
description: >
Random 16 character string of alphanumeric characters that is unique to an event.
Used to de-duplicate data.

Expand Down
40 changes: 14 additions & 26 deletions models/mixpanel__daily_events.sql
Expand Up @@ -2,9 +2,15 @@
config(
materialized='incremental',
unique_key='unique_key',
partition_by={'field': 'date_day', 'data_type': 'date'} if target.type not in ('spark','databricks') else ['date_day'],
incremental_strategy = 'merge' if target.type not in ('postgres', 'redshift') else 'delete+insert',
file_format = 'delta'
incremental_strategy='insert_overwrite' if target.type in ('bigquery', 'spark', 'databricks') else 'delete+insert',
partition_by={
"field": "date_day",
"data_type": "date"
} if target.type not in ('spark','databricks')
else ['date_day'],
cluster_by=['date_day', 'event_type'],
file_format='parquet',
on_schema_change='append_new_columns'
)
}}

Expand All @@ -20,13 +26,8 @@ with events as (
from {{ ref('mixpanel__event') }}

{% if is_incremental() %}

-- we look at the most recent 28 days for this model's window functions to compute properly
where date_day >= coalesce( ( select {{ dbt.dateadd(datepart='day', interval=-27, from_date_or_timestamp="max(date_day)") }}
from {{ this }} ), '2010-01-01')

where date_day >= {{ mixpanel.lookback(from_date="max(date_day)", interval=27) }}
{% endif %}

),


Expand All @@ -36,13 +37,8 @@ date_spine as (
from {{ ref('stg_mixpanel__user_event_date_spine') }}

{% if is_incremental() %}

-- look backward for the last 28 days
where date_day >= coalesce((select {{ dbt.dateadd(datepart='day', interval=-27, from_date_or_timestamp="max(date_day)") }}
from {{ this }} ), '2010-01-01')

where date_day >= {{ mixpanel.lookback(from_date="max(date_day)", interval=27) }}
{% endif %}

),

agg_user_events as (
Expand All @@ -55,7 +51,6 @@ agg_user_events as (

from events
group by 1,2,3

),

-- join the spine with event metrics
Expand All @@ -74,7 +69,6 @@ spine_joined as (
on agg_user_events.date_day = date_spine.date_day
and agg_user_events.people_id = date_spine.people_id
and agg_user_events.event_type = date_spine.event_type

),

trailing_events as (
Expand All @@ -89,7 +83,6 @@ trailing_events as (
and number_of_events > 0 as is_repeat_user

from spine_joined

),

agg_event_days as (
Expand All @@ -109,7 +102,6 @@ agg_event_days as (

from trailing_events
group by 1,2

),

final as (
Expand All @@ -127,18 +119,14 @@ final as (
number_of_users - number_of_new_users - number_of_repeat_users as number_of_return_users,
trailing_users_28d,
trailing_users_7d,
event_type || '-' || date_day as unique_key
{{ dbt_utils.generate_surrogate_key(['event_type', 'date_day']) }} as unique_key,
**Review comment (Contributor):** Do we want to generate a surrogate key here? The surrogate key will create a hash whereas the previous record was a concatenation of the two values. We are losing some decipherable information if we leverage the surrogate key, although I am not sure if this change was made to work better with the incremental updates.

If we do end up changing this field we will need to update the docs and also call this out as part of a breaking change, as this will drastically change the previous results.

What are your thoughts?

fivetran-catfritz marked this conversation as resolved.
{{ mixpanel.date_today('dbt_run_date') }}

from agg_event_days

{% if is_incremental() %}

-- only return the most recent day of data
where date_day >= coalesce( (select max(date_day) from {{ this }} ), '2010-01-01')

where date_day >= {{ mixpanel.lookback(from_date="max(dbt_run_date)") }}

**Review comment:** won't this potentially exclude some late-arriving data that occurred prior to the max(dbt_run_date)?

**Author reply (fivetran-catfritz, Feb 21, 2024):** Yes, thank you for catching. I ended up scrapping dbt_run_date in the incremental strategy and updated it to use the 7-day lookback: `where date_day >= {{ mixpanel.mixpanel_lookback(from_date="max(date_day)", interval=var('lookback_window', 7), datepart='day') }}`

**Review comment:** awesome! nice improvement.

{% endif %}

order by date_day desc, event_type
)

select *
Expand Down
24 changes: 13 additions & 11 deletions models/mixpanel__event.sql
Expand Up @@ -2,29 +2,31 @@
config(
materialized='incremental',
unique_key='unique_event_id',
partition_by={'field': 'date_day', 'data_type': 'date'} if target.type not in ('spark','databricks') else ['date_day'],
incremental_strategy = 'merge' if target.type not in ('postgres', 'redshift') else 'delete+insert',
file_format = 'delta'
incremental_strategy='insert_overwrite' if target.type in ('bigquery', 'spark', 'databricks') else 'delete+insert',
partition_by={
"field": "date_day",
"data_type": "date"
} if target.type not in ('spark','databricks')
else ['date_day'],
cluster_by=['date_day', 'event_type', 'people_id'],
file_format='parquet',
on_schema_change='append_new_columns'
)
}}

with stg_event as (

select *

from {{ ref('stg_mixpanel__event') }}

where
{% if is_incremental() %}

-- events are only eligible for de-duping if they occurred on the same calendar day
occurred_at >= coalesce((select cast( max(date_day) as {{ dbt.type_timestamp() }} ) from {{ this }} ), '2010-01-01')
{% if is_incremental() %}
dbt_run_date >= {{ mixpanel.lookback(from_date="max(dbt_run_date)", interval=1) }}

**Review comment:** maybe move the default interval period to a project variable? i assume one day will be enough but maybe there's scenarios where more is needed.

**Author reply:** Thanks @jasongroob, that is a great idea. I have created a variable lookback_window with a default value of 7 days. In your experience with Mixpanel, would that be a more appropriate length?

**Review comment:** yeah, i think that should be fine. i don't have a good sense of how late new data can arrive.
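A sketch of the resulting override (the variable name follows the thread above; scoping it under the `mixpanel` key in `dbt_project.yml` is an assumption):

```yaml
vars:
  mixpanel:
    lookback_window: 14  # days; the package default is assumed to be 7 per the discussion above
```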


{% else %}

-- limit date range on the first run / refresh
occurred_at >= {{ "'" ~ var('date_range_start', '2010-01-01') ~ "'" }}

{% endif %}
),

Expand All @@ -51,8 +53,8 @@ pivot_properties as (

select
*
{% if var('event_properties_to_pivot') %},
{{ fivetran_utils.pivot_json_extract(string = 'event_properties', list_of_properties = var('event_properties_to_pivot')) }}
{% if var('event_properties_to_pivot') %}
, {{ fivetran_utils.pivot_json_extract(string = 'event_properties', list_of_properties = var('event_properties_to_pivot')) }}
{% endif %}

from dedupe
Expand Down