Feature/add dbt unit tests #349

Draft: wants to merge 46 commits into main

Commits
9d8f0c8
Replaced python unit test with dbt 1.8 unit test
adamribaudo-velir Jun 7, 2024
298f373
refactored unit tests for stg_ga4__session_conversions_daily
adamribaudo-velir Jun 7, 2024
906bec2
update test name
adamribaudo-velir Jun 7, 2024
e994408
Replaced Python unit test with dbt unit test
adamribaudo-velir Jun 22, 2024
34bbab9
variable override working properly
adamribaudo-velir Jun 22, 2024
4b66d1f
using overrides properly
adamribaudo-velir Jun 22, 2024
79f7e27
replaced another unit test
adamribaudo-velir Jun 22, 2024
cecf337
replaced python unit test
adamribaudo-velir Jun 22, 2024
63a7d86
add unit test for stg_ga4__client_key_first_last_pageviews
adamribaudo-velir Jun 22, 2024
6e709db
replace unit test
adamribaudo-velir Jun 22, 2024
9d53c9a
unit test for stg_ga4__sessions_traffic_sources_last_non_direct_daily…
adamribaudo-velir Jun 22, 2024
3425fdf
Add package-lock.yml to .gitignore
davidbooke4 Oct 22, 2024
c3ba7f7
Add vars to dbt_project.yml for testing
davidbooke4 Oct 23, 2024
10456ef
Merge branch 'main' into feature/dbt-unit-tests
davidbooke4 Oct 23, 2024
a1f10df
Add unit tests to stg_ga4__events.yml for the url_parsing macros
davidbooke4 Oct 23, 2024
5972788
Add conditions for cases when event_source is null for session parame…
davidbooke4 Oct 23, 2024
20598fb
Add unit test to stg_ga4__sessions_traffic_sources_daily for testing …
davidbooke4 Oct 23, 2024
282eeee
Add unit test to stg_ga4__user_id_mapping to test the latest mapping …
davidbooke4 Oct 23, 2024
c321197
Add descriptions for unit tests that were missing them
davidbooke4 Oct 23, 2024
8a1796e
Remove python unit tests that have been migrated to dbt unit tests
davidbooke4 Oct 23, 2024
c0aba5f
Add unit test to stg_ga4__events for testing transformations in stg_g…
davidbooke4 Oct 24, 2024
922ba07
Remove todo and example stg_ga4__events unit test files
davidbooke4 Oct 24, 2024
3a4f677
Add sessions_traffic_sources_last_non_direct_daily python unit test back
davidbooke4 Oct 24, 2024
c870130
Comment out unit tests for disabled models
davidbooke4 Oct 24, 2024
7386371
Remove edits from dbt_project.yml
davidbooke4 Oct 24, 2024
76f2c7f
Comment out unit test for sessions_traffic_sources_last_non_direct_da…
davidbooke4 Oct 24, 2024
697bafd
Update unit test section in README
davidbooke4 Oct 24, 2024
616da99
Simplify event_params construction in test_base_to_stg_ga4__events in…
davidbooke4 Oct 24, 2024
653e1ae
Update yml files to use consistent new line convention
davidbooke4 Oct 24, 2024
50ff2e8
update PR template
adamribaudo-velir Oct 24, 2024
68f9f87
Update default channel grouping test to use seed instead of fixture a…
davidbooke4 Oct 25, 2024
a3d9c1e
Comment out unit tests for disabled models
davidbooke4 Oct 28, 2024
1dd415e
Un-comment unit tests
davidbooke4 Oct 29, 2024
4ef2503
Add profiles.yml for Github Actions to execute dbt commands and add .…
davidbooke4 Oct 29, 2024
83bd23b
Add profile and variables to dbt_project.yml so Github Action can run…
davidbooke4 Oct 29, 2024
6f4335e
Add dbt unit tests job to github CI workflow
davidbooke4 Oct 29, 2024
947868d
Remove empty step
davidbooke4 Oct 29, 2024
8c879f7
Add repo to checkout step so PR code is checked out to test adding ne…
davidbooke4 Oct 29, 2024
ca215a1
Change workflow on behavior for testing changes
davidbooke4 Oct 29, 2024
984c311
Remove source project and property ID variables from dbt_project.yml
davidbooke4 Oct 30, 2024
6493051
Use environment variables for project, dataset, property ID when not …
davidbooke4 Oct 30, 2024
637e166
Add conditional logic to allow for use of --empty flag
davidbooke4 Oct 30, 2024
845e1df
Add step to materialize tables/views needed to run dbt unit tests
davidbooke4 Oct 30, 2024
48f2a14
Add environment variables related to project variables so models are …
davidbooke4 Nov 4, 2024
561c3e0
Update models to look for environment variables before project variables
davidbooke4 Nov 5, 2024
c32d699
Add environment variables to CI workflow
davidbooke4 Nov 5, 2024
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -7,4 +7,4 @@ Describe your changes, and why you're making them.
- [ ] I have verified that these changes work locally
- [ ] I have updated the README.md (if applicable)
- [ ] I have added tests & descriptions to my models (and macros if applicable)
- [ ] I have run `dbt test` and `python -m pytest .` to validate existing tests
- [ ] I have run `dbt test` to validate existing tests
47 changes: 46 additions & 1 deletion .github/workflows/run_unit_tests_on_pr.yml
@@ -1,8 +1,20 @@
name: Run Unit Tests on Pull Request

on: [pull_request_target,workflow_dispatch]
# on: [pull_request_target,workflow_dispatch]
on:
push:
branches:
- 'feature/add_dbt_unit_tests'
env:
BIGQUERY_PROJECT: ${{ secrets.BIGQUERY_PROJECT }}
BIGQUERY_PROPERTY_ID: ${{ secrets.BIGQUERY_PROPERTY_ID }}
BIGQUERY_DATASET: ${{ secrets.BIGQUERY_DATASET }}
BIGQUERY_KEYFILE: ./unit_tests/dbt-service-account.json
GA4_CONVERSION_EVENTS: ${{ vars.GA4_CONVERSION_EVENTS }}
GA4_DERIVED_SESSION_PROPERTIES: ${{ vars.GA4_DERIVED_SESSION_PROPERTIES }}
GA4_DERIVED_USER_PROPERTIES: ${{ vars.GA4_DERIVED_USER_PROPERTIES }}
GA4_INCREMENTAL_DAYS: ${{ vars.GA4_INCREMENTAL_DAYS }}
GA4_START_DATE: ${{ vars.GA4_START_DATE }}

jobs:
pytest_run_all:
@@ -35,3 +47,36 @@ jobs:

- name: Run tests
run: python -m pytest .

run_dbt_unit_tests:
name: Run dbt Unit Tests
runs-on: ubuntu-latest
steps:
- name: Check out
uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha }}

- uses: actions/setup-python@v1
with:
python-version: "3.11.x"

- name: Authenticate using service account
run: 'echo "$KEYFILE" > ./unit_tests/dbt-service-account.json'
shell: bash
env:
KEYFILE: ${{ secrets.GCP_BIGQUERY_USER_KEYFILE }}

- name: Install dbt
run: |
pip install dbt-core
pip install dbt-bigquery
dbt deps

- name: Materialize necessary dbt resources
run: |
dbt seed -f
dbt run -s +test_type:unit -f --empty

- name: Run dbt unit tests
run: dbt test -s test_type:unit
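Outside CI, the same sequence of dbt commands can be reproduced locally. The following is a minimal Python sketch, assuming `dbt-core` and `dbt-bigquery` are installed, `dbt` is on `PATH`, and the `BIGQUERY_*` and `GA4_*` variables from the workflow's `env:` block are exported. The commands are copied from the job above; `CI_STEPS` and `run_ci_steps` are hypothetical helper names, not part of the package.

```python
import subprocess
from typing import List

# Hypothetical local mirror of the run_dbt_unit_tests job. The argv lists are
# copied from the workflow; the helper names are illustrative only.
CI_STEPS: List[List[str]] = [
    ["dbt", "deps"],  # install package dependencies
    ["dbt", "seed", "-f"],  # full-refresh load of the seed files
    ["dbt", "run", "-s", "+test_type:unit", "-f", "--empty"],  # zero-row builds of unit-tested models and their parents
    ["dbt", "test", "-s", "test_type:unit"],  # run only the unit tests
]


def run_ci_steps(steps: List[List[str]] = CI_STEPS) -> None:
    """Run each command in order, stopping at the first non-zero exit code."""
    for argv in steps:
        print("+", " ".join(argv))
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

Calling `run_ci_steps()` runs the four commands in order. The `--empty` step is there because dbt unit tests need the tested models' parent relations to exist in the warehouse (dbt uses their schemas to type the fixture rows); `--empty` builds them with zero rows.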
2 changes: 2 additions & 0 deletions .gitignore
@@ -2,6 +2,8 @@
target/
dbt_packages/
logs/
package-lock.yml
.user.yml

google-cloud-sdk/
unit_tests/.env
21 changes: 20 additions & 1 deletion README.md
@@ -304,7 +304,26 @@ gcloud auth application-default login --scopes=https://www.googleapis.com/auth/b
```
# Unit Testing

This package uses `pytest` as a method of unit testing individual models. More details can be found in the [unit_tests/README.md](unit_tests) folder.
The dbt-ga4 package treats each model and macro as a 'unit' of code. By fixing the input to a unit, we can test that it produces the expected output.

This package currently unit-tests individual models with a combination of dbt unit tests and `pytest`. The one remaining `pytest` unit test will be migrated to a dbt unit test once the dbt-core bug that blocks it is resolved; progress can be tracked in [dbt-core issue #10353](https://github.com/dbt-labs/dbt-core/issues/10353).

### dbt unit tests

See dbt's [unit test documentation](https://docs.getdbt.com/docs/build/unit-tests) for details. Unit tests are run with `dbt test`, the same command used for other types of dbt tests.

Execute a specific test:
```
dbt test -s <test_name>
```
Execute all tests configured for a model:
```
dbt test -s <model_name>
```

### pytest

More details on using `pytest` for unit testing can be found in the README in the [unit_tests](unit_tests) folder.

# Overriding Default Channel Groupings

2 changes: 2 additions & 0 deletions dbt_project.yml
@@ -8,6 +8,8 @@ seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

profile: 'default'

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- "target"
3 changes: 2 additions & 1 deletion macros/base_select.sql
@@ -36,7 +36,8 @@
, ecommerce.transaction_id
, items
, {%- if var('combined_dataset', false) != false %} cast(left(regexp_replace(_table_suffix, r'^(intraday_)?\d{8}', ''), 100) as int64)
{%- else %} {{ var('property_ids')[0] }}
{%- elif var('property_ids', false) != false %} {{ var('property_ids')[0] }}
{%- else %} {{ env_var('BIGQUERY_PROPERTY_ID') }}
{%- endif %} as property_id
{% endmacro %}
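The `property_id` expression now has three branches: the combined dataset, the `property_ids` project var, and the new `BIGQUERY_PROPERTY_ID` environment variable. A minimal Python sketch of that precedence, assuming `vars_` stands in for dbt project vars and `env` for the process environment; `resolve_property_id` is a hypothetical name. In the real macro the combined-dataset branch is evaluated by BigQuery per row, not in Jinja.

```python
import re
from typing import Any, Mapping


def resolve_property_id(table_suffix: str, vars_: Mapping[str, Any], env: Mapping[str, str]) -> int:
    """Hypothetical Python mirror of the property_id branch in base_select.sql."""
    # Jinja's `var('combined_dataset', false) != false` treats any value other
    # than false (even an empty string) as "set", so compare against False too.
    if vars_.get("combined_dataset", False) != False:  # noqa: E712
        # Drop an optional 'intraday_' prefix and the 8-digit date, keep at most
        # 100 characters, and cast what remains (the property ID) to an integer.
        remainder = re.sub(r"^(intraday_)?\d{8}", "", table_suffix)[:100]
        return int(remainder)
    if vars_.get("property_ids", False) != False:  # noqa: E712
        return int(vars_["property_ids"][0])
    # Fallback added in this PR: read the ID from the BIGQUERY_PROPERTY_ID env var.
    return int(env["BIGQUERY_PROPERTY_ID"])
```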

2 changes: 1 addition & 1 deletion models/marts/core/dim_ga4__sessions_daily.sql
@@ -1,5 +1,5 @@
{% set partitions_to_replace = ['current_date'] %}
{% for i in range(var('static_incremental_days')) %}
{% for i in range(env_var('GA4_INCREMENTAL_DAYS')|int if env_var('GA4_INCREMENTAL_DAYS', false) else var('static_incremental_days')) %}
{% set partitions_to_replace = partitions_to_replace.append('date_sub(current_date, interval ' + (i+1)|string + ' day)') %}
{% endfor %}
{{
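Each incremental model in this diff now takes its look-back window from `GA4_INCREMENTAL_DAYS` when that variable is set and non-empty, and from the `static_incremental_days` var otherwise. A minimal Python sketch of that resolution and of the list the loop builds; `incremental_days` and `partitions_to_replace` are hypothetical names, and the sketch assumes the environment value is numeric.

```python
from typing import List, Mapping


def incremental_days(env: Mapping[str, str], static_incremental_days: int) -> int:
    """Mirror of: env_var('GA4_INCREMENTAL_DAYS')|int if env_var('GA4_INCREMENTAL_DAYS', false)
    else var('static_incremental_days')."""
    raw = env.get("GA4_INCREMENTAL_DAYS", "")
    # An unset or empty variable is falsy, so the project var is used instead.
    # Note: Jinja's |int returns 0 for non-numeric text, whereas int() raises.
    return int(raw) if raw else static_incremental_days


def partitions_to_replace(days: int) -> List[str]:
    """Build the same list of BigQuery date expressions as the Jinja loop."""
    partitions = ["current_date"]
    for i in range(days):
        partitions.append(f"date_sub(current_date, interval {i + 1} day)")
    return partitions
```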
2 changes: 1 addition & 1 deletion models/marts/core/fct_ga4__pages.sql
@@ -1,5 +1,5 @@
{% set partitions_to_replace = ['current_date'] %}
{% for i in range(var('static_incremental_days')) %}
{% for i in range(env_var('GA4_INCREMENTAL_DAYS')|int if env_var('GA4_INCREMENTAL_DAYS', false) else var('static_incremental_days')) %}
{% set partitions_to_replace = partitions_to_replace.append('date_sub(current_date, interval ' + (i+1)|string + ' day)') %}
{% endfor %}
{{
2 changes: 1 addition & 1 deletion models/marts/core/fct_ga4__sessions_daily.sql
@@ -1,5 +1,5 @@
{% set partitions_to_replace = ['current_date'] %}
{% for i in range(var('static_incremental_days')) %}
{% for i in range(env_var('GA4_INCREMENTAL_DAYS')|int if env_var('GA4_INCREMENTAL_DAYS', false) else var('static_incremental_days')) %}
{% set partitions_to_replace = partitions_to_replace.append('date_sub(current_date, interval ' + (i+1)|string + ' day)') %}
{% endfor %}
{{
10 changes: 6 additions & 4 deletions models/staging/base/base_ga4__events.sql
@@ -1,5 +1,5 @@
{% set partitions_to_replace = ['current_date'] %}
{% for i in range(var('static_incremental_days')) %}
{% for i in range(env_var('GA4_INCREMENTAL_DAYS')|int if env_var('GA4_INCREMENTAL_DAYS', false) else var('static_incremental_days')) %}
{% set partitions_to_replace = partitions_to_replace.append('date_sub(current_date, interval ' + (i+1)|string + ' day)') %}
{% endfor %}

@@ -21,9 +21,11 @@ with source as (
select
{{ ga4.base_select_source() }}
from {{ source('ga4', 'events') }}
where cast(left(replace(_table_suffix, 'intraday_', ''), 8) as int64) >= {{var('start_date')}}
{% if is_incremental() %}
and parse_date('%Y%m%d', left(replace(_table_suffix, 'intraday_', ''), 8)) in ({{ partitions_to_replace | join(',') }})
{% if not flags.EMPTY %}
where cast(left(replace(_table_suffix, 'intraday_', ''), 8) as int64) >= {{ env_var('GA4_START_DATE') if env_var('GA4_START_DATE', false) else var('start_date') }}
{% if is_incremental() %}
and parse_date('%Y%m%d', left(replace(_table_suffix, 'intraday_', ''), 8)) in ({{ partitions_to_replace | join(',') }})
{% endif %}
{% endif %}
),
renamed as (
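Wrapping the filter in `{% if not flags.EMPTY %}` means `dbt run --empty` renders the source CTE with no `_table_suffix` filter at all. Presumably (the PR does not say) this is because `--empty` swaps the source for a zero-row subquery in which the wildcard-table pseudo-column `_table_suffix` is no longer available. A minimal Python sketch of how the clause is rendered, with `GA4_START_DATE` taking precedence over the `start_date` var; `suffix_date` and `render_where` are hypothetical names.

```python
from typing import List, Mapping


def suffix_date(table_suffix: str) -> int:
    """Mirror of cast(left(replace(_table_suffix, 'intraday_', ''), 8) as int64)."""
    return int(table_suffix.replace("intraday_", "")[:8])


def render_where(env: Mapping[str, str], start_date: int, empty: bool,
                 incremental: bool, partitions: List[str]) -> str:
    """Hypothetical mirror of the Jinja that renders the source CTE's where clause."""
    if empty:
        # flags.EMPTY (dbt run --empty): no filter on _table_suffix is rendered.
        return ""
    start = env.get("GA4_START_DATE") or start_date  # env var wins when non-empty
    suffix = "left(replace(_table_suffix, 'intraday_', ''), 8)"
    clause = f"where cast({suffix} as int64) >= {start}"
    if incremental:
        clause += f" and parse_date('%Y%m%d', {suffix}) in ({','.join(partitions)})"
    return clause
```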
@@ -1,6 +1,6 @@
{% if not flags.FULL_REFRESH %}
{% set partitions_to_query = ['current_date'] %}
{% for i in range(var('static_incremental_days', 1)) %}
{% for i in range(env_var('GA4_INCREMENTAL_DAYS')|int if env_var('GA4_INCREMENTAL_DAYS', false) else var('static_incremental_days')) %}
{% set partitions_to_query = partitions_to_query.append('date_sub(current_date, interval ' + (i+1)|string + ' day)') %}
{% endfor %}
{% endif %}
6 changes: 4 additions & 2 deletions models/staging/src_ga4.yml
@@ -4,11 +4,13 @@ sources:
- name: ga4
database: | # Source from target.project if multi-property, otherwise source from source_project
{%- if var('combined_dataset', false) != false -%} {{target.project}}
{%- else -%} {{var('source_project')}}
{%- elif var('source_project', false) != false -%} {{var('source_project')}}
{%- else -%} {{env_var('BIGQUERY_PROJECT')}}
{%- endif -%}
schema: | # Source from combined property dataset if set, otherwise source from original GA4 property
{%- if var('combined_dataset', false) != false -%} {{var('combined_dataset')}}
{%- else -%} analytics_{{var('property_ids')[0]}}
{%- elif var('property_ids', false) != false -%} analytics_{{var('property_ids')[0]}}
{%- else -%} analytics_{{env_var('BIGQUERY_PROPERTY_ID')}}
{%- endif -%}
tables:
- name: events
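The `ga4` source now resolves its database and schema through two parallel fallback chains, ending in the `BIGQUERY_PROJECT` and `BIGQUERY_PROPERTY_ID` environment variables that the CI workflow exports. A minimal Python sketch of that resolution, assuming `vars_` stands in for dbt project vars, `env` for the process environment, and `target_project` for `target.project`; `resolve_source` is a hypothetical name.

```python
from typing import Any, Mapping, Tuple


def resolve_source(vars_: Mapping[str, Any], env: Mapping[str, str], target_project: str) -> Tuple[str, str]:
    """Hypothetical mirror of the database/schema Jinja for the ga4 source."""
    combined = vars_.get("combined_dataset", False)
    if combined != False:  # noqa: E712 (mirrors Jinja's `!= false`)
        # Multi-property setup: read the combined dataset in the target project.
        return target_project, combined
    source_project = vars_.get("source_project", False)
    database = source_project if source_project != False else env["BIGQUERY_PROJECT"]  # noqa: E712
    property_ids = vars_.get("property_ids", False)
    property_id = property_ids[0] if property_ids != False else env["BIGQUERY_PROPERTY_ID"]  # noqa: E712
    return database, f"analytics_{property_id}"
```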
18 changes: 17 additions & 1 deletion models/staging/stg_ga4__client_key_first_last_events.yml
@@ -7,4 +7,20 @@ models:
- name: client_key
description: Hashed combination of user_pseudo_id and stream_id
tests:
- unique
- unique
unit_tests:
- name: test_stg_ga4__client_key_first_last_events
description: Test pulling the first and last event per client key
model: stg_ga4__client_key_first_last_events
given:
- input: ref('stg_ga4__events')
format: csv
rows: |
stream_id,client_key,event_key,event_timestamp
1,IX+OyYJBgjwqML19GB/XIQ==,H06dLW6OhNJJ6SoEPFsSyg==,1661339279816517
1,IX+OyYJBgjwqML19GB/XIQ==,gt1SoAtrxDv33uDGwVeMVA==,1661339279816518
expect:
format: csv
rows: |
client_key,first_event,last_event
IX+OyYJBgjwqML19GB/XIQ==,H06dLW6OhNJJ6SoEPFsSyg==,gt1SoAtrxDv33uDGwVeMVA==
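The expected row follows from ordering each client's events by `event_timestamp`: the event ending in `...517` comes first and the one ending in `...518` comes last. A minimal Python sketch of that rule, assuming the model picks the events with the smallest and largest timestamp per `client_key` (the model's SQL is not part of this diff); `first_last_events` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# The two rows from the unit test's `given` fixture (only the columns used here).
GIVEN = [
    {"client_key": "IX+OyYJBgjwqML19GB/XIQ==", "event_key": "H06dLW6OhNJJ6SoEPFsSyg==", "event_timestamp": 1661339279816517},
    {"client_key": "IX+OyYJBgjwqML19GB/XIQ==", "event_key": "gt1SoAtrxDv33uDGwVeMVA==", "event_timestamp": 1661339279816518},
]


def first_last_events(rows: List[dict]) -> Dict[str, Tuple[str, str]]:
    """Return {client_key: (first_event, last_event)} ordered by event_timestamp."""
    by_client: Dict[str, List[dict]] = {}
    for row in rows:
        by_client.setdefault(row["client_key"], []).append(row)
    result: Dict[str, Tuple[str, str]] = {}
    for client_key, events in by_client.items():
        events.sort(key=lambda r: r["event_timestamp"])
        result[client_key] = (events[0]["event_key"], events[-1]["event_key"])
    return result
```

Because the rows are sorted before picking, the result does not depend on the order in which the fixture lists them.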
18 changes: 17 additions & 1 deletion models/staging/stg_ga4__client_key_first_last_pageviews.yml
@@ -7,4 +7,20 @@ models:
- name: client_key
description: Hashed combination of user_pseudo_id and stream_id
tests:
- unique
- unique
unit_tests:
- name: test_stg_ga4__client_key_first_last_pageviews
description: Test pulling the first and last page view per client key
model: stg_ga4__client_key_first_last_pageviews
given:
- input: ref('stg_ga4__event_page_view')
format: csv
rows: |
stream_id,client_key,event_key,event_timestamp,page_location
1,IX+OyYJBgjwqML19GB/XIQ==,H06dLW6OhNJJ6SoEPFsSyg==,1661339279816517,A
1,IX+OyYJBgjwqML19GB/XIQ==,gt1SoAtrxDv33uDGwVeMVA==,1661339279816518,B
expect:
format: csv
rows: |
client_key,first_page_view_event_key,last_page_view_event_key,first_page_location,last_page_location
IX+OyYJBgjwqML19GB/XIQ==,H06dLW6OhNJJ6SoEPFsSyg==,gt1SoAtrxDv33uDGwVeMVA==,A,B
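The `format: csv` fixtures are plain CSV text, so the expected output can be reasoned about directly from them. A minimal Python sketch that parses the `given` rows and emits the `expect` rows, assuming the model takes the first and last page view per `client_key` by `event_timestamp`; `first_last_pageviews` is a hypothetical name.

```python
import csv
import io

# The `given` rows for stg_ga4__event_page_view, copied from the unit test.
GIVEN_CSV = """stream_id,client_key,event_key,event_timestamp,page_location
1,IX+OyYJBgjwqML19GB/XIQ==,H06dLW6OhNJJ6SoEPFsSyg==,1661339279816517,A
1,IX+OyYJBgjwqML19GB/XIQ==,gt1SoAtrxDv33uDGwVeMVA==,1661339279816518,B
"""

HEADER = ["client_key", "first_page_view_event_key", "last_page_view_event_key",
          "first_page_location", "last_page_location"]


def first_last_pageviews(csv_text: str) -> str:
    """Read the fixture CSV and emit the expected CSV, one row per client_key."""
    groups: dict = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["client_key"], []).append(row)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for client_key, views in groups.items():
        views.sort(key=lambda r: int(r["event_timestamp"]))  # CSV values are strings
        first, last = views[0], views[-1]
        writer.writerow([client_key, first["event_key"], last["event_key"],
                         first["page_location"], last["page_location"]])
    return out.getvalue()
```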
2 changes: 1 addition & 1 deletion models/staging/stg_ga4__derived_session_properties.sql
@@ -1,5 +1,5 @@
{{ config(
enabled = true if var('derived_session_properties', false) else false,
enabled = true if var('derived_session_properties', false) or env_var('GA4_DERIVED_SESSION_PROPERTIES', false) else false,
materialized = "table"
) }}
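Environment variables are always strings, so the new `or env_var(...)` condition enables the model for any non-empty value, including the literal text `false`; only an unset or empty variable leaves it disabled. The same pattern is used for `GA4_DERIVED_USER_PROPERTIES`. A minimal Python sketch of the `enabled` expression; `model_enabled` is a hypothetical name.

```python
from typing import Any, Mapping


def model_enabled(vars_: Mapping[str, Any], env: Mapping[str, str]) -> bool:
    """Mirror of: true if var('derived_session_properties', false)
    or env_var('GA4_DERIVED_SESSION_PROPERTIES', false) else false."""
    # Python and Jinja share truthiness rules: "" and [] are falsy, "false" is truthy.
    return bool(vars_.get("derived_session_properties", False)
                or env.get("GA4_DERIVED_SESSION_PROPERTIES", False))
```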

37 changes: 36 additions & 1 deletion models/staging/stg_ga4__derived_session_properties.yml
@@ -8,4 +8,39 @@ models:
columns:
- name: session_key
tests:
- unique
- unique
unit_tests:
- name: test_derived_session_properties
description: Test whether a derived property is successfully retrieved from multiple event payloads
model: stg_ga4__derived_session_properties
given:
- input: ref('stg_ga4__events')
format: sql
rows: |
select
'AAA' as session_key
, 1617691790431476 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(1 as int_value) as value)] as event_params
, ARRAY[STRUCT('my_property' as key, STRUCT('value1' as string_value) as value)] as user_properties
union all
select
'AAA' as session_key
, 1617691790431477 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(2 as int_value) as value)] as event_params
, ARRAY[] as user_properties
union all
select
'BBB' as session_key
, 1617691790431477 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(1 as int_value) as value)] as event_params
, ARRAY[STRUCT('my_property' as key, STRUCT('value2' as string_value) as value)] as user_properties
expect:
format: dict
rows:
- {session_key: AAA, my_derived_property: 2, my_derived_property2: value1}
- {session_key: BBB, my_derived_property: 1, my_derived_property2: value2}
overrides:
vars: {derived_session_properties: [{event_parameter: 'my_param',session_property_name: 'my_derived_property',value_type: 'int_value'},{user_property: 'my_property',session_property_name: 'my_derived_property2',value_type: 'string_value'}]}
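The expected rows are consistent with a "last non-null value per session, ordered by `event_timestamp`" rule: session `AAA` gets `my_param = 2` from its later event, but keeps `value1` for `my_property` because its later event has no user properties. The model's SQL is not part of this diff, so the following minimal Python sketch is a reading of the fixture, not the model's implementation; `lookup` and `derive` are hypothetical names, and the event payloads are flattened into dicts for readability.

```python
from typing import Any, Dict, List

# Same shape as the derived_session_properties override in the unit test.
DERIVED_SESSION_PROPERTIES = [
    {"event_parameter": "my_param", "session_property_name": "my_derived_property", "value_type": "int_value"},
    {"user_property": "my_property", "session_property_name": "my_derived_property2", "value_type": "string_value"},
]

# The SQL fixture's rows, with each ARRAY<STRUCT> flattened to {key: {value_type: value}}.
EVENTS = [
    {"session_key": "AAA", "event_timestamp": 1617691790431476,
     "event_params": {"my_param": {"int_value": 1}},
     "user_properties": {"my_property": {"string_value": "value1"}}},
    {"session_key": "AAA", "event_timestamp": 1617691790431477,
     "event_params": {"my_param": {"int_value": 2}}, "user_properties": {}},
    {"session_key": "BBB", "event_timestamp": 1617691790431477,
     "event_params": {"my_param": {"int_value": 1}},
     "user_properties": {"my_property": {"string_value": "value2"}}},
]


def lookup(event: Dict[str, Any], prop: Dict[str, str]) -> Any:
    """Read one configured event parameter or user property, or None if absent."""
    if "event_parameter" in prop:
        source, key = event["event_params"], prop["event_parameter"]
    else:
        source, key = event["user_properties"], prop["user_property"]
    return source.get(key, {}).get(prop["value_type"])


def derive(events: List[Dict[str, Any]], props: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """One row per session_key; later non-null values overwrite earlier ones."""
    result: Dict[str, Dict[str, Any]] = {}
    for event in sorted(events, key=lambda e: e["event_timestamp"]):
        row = result.setdefault(event["session_key"], {"session_key": event["session_key"]})
        for prop in props:
            value = lookup(event, prop)
            if value is not None:
                row[prop["session_property_name"]] = value
    return list(result.values())
```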
4 changes: 2 additions & 2 deletions models/staging/stg_ga4__derived_session_properties_daily.sql
@@ -1,10 +1,10 @@
{% set partitions_to_replace = ['current_date'] %}
{% for i in range(var('static_incremental_days')) %}
{% for i in range(env_var('GA4_INCREMENTAL_DAYS')|int if env_var('GA4_INCREMENTAL_DAYS', false) else var('static_incremental_days')) %}
{% set partitions_to_replace = partitions_to_replace.append('date_sub(current_date, interval ' + (i+1)|string + ' day)') %}
{% endfor %}
{{
config(
enabled = true if var('derived_session_properties', false) else false,
enabled = true if var('derived_session_properties', false) or env_var('GA4_DERIVED_SESSION_PROPERTIES', false) else false,
materialized = 'incremental',
incremental_strategy = 'insert_overwrite',
tags = ["incremental"],
2 changes: 1 addition & 1 deletion models/staging/stg_ga4__derived_user_properties.sql
@@ -1,5 +1,5 @@
{{ config(
enabled = true if var('derived_user_properties', false) else false,
enabled = true if var('derived_user_properties', false) or env_var('GA4_DERIVED_USER_PROPERTIES', false) else false,
materialized = "table"
) }}

34 changes: 33 additions & 1 deletion models/staging/stg_ga4__derived_user_properties.yml
@@ -7,4 +7,36 @@ models:
- name: client_key
description: Hashed combination of user_pseudo_id and stream_id
tests:
- unique
- unique
unit_tests:
- name: test_derived_user_properties
description: Test whether a derived user property is successfully retrieved from multiple event payloads
model: stg_ga4__derived_user_properties
given:
- input: ref('stg_ga4__events')
format: sql
rows: |
select
'AAA' as client_key
, 1617691790431476 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(1 as int_value) as value)] as event_params
union all
select
'AAA' as client_key
, 1617691790431477 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(2 as int_value) as value)] as event_params
union all
select
'BBB' as client_key
, 1617691790431477 as event_timestamp
, 'first_visit' as event_name
, ARRAY[STRUCT('my_param' as key, STRUCT(1 as int_value) as value)] as event_params
expect:
format: dict
rows:
- {client_key: AAA, my_derived_property: 2}
- {client_key: BBB, my_derived_property: 1}
overrides:
vars: {derived_user_properties: [{event_parameter: 'my_param',user_property_name: 'my_derived_property',value_type: 'int_value'}]}
21 changes: 20 additions & 1 deletion models/staging/stg_ga4__event_to_query_string_params.yml
@@ -3,4 +3,23 @@ version: 2
models:
- name: stg_ga4__event_to_query_string_params
description: This model pivots the query string parameters contained within the event's page_location field to become rows. Each row is a single parameter/value combination contained in a single event's query string.

unit_tests:
- name: test_stg_ga4__event_to_query_string_params
description: Test whether event query strings are flattened for each query string parameter
model: stg_ga4__event_to_query_string_params
given:
- input: ref('stg_ga4__events')
format: csv
rows: |
event_key,page_query_string
aaa,param1=value1&param2=value2
bbb,param1
ccc,param1=
expect:
format: csv
rows: |
event_key,param,value
aaa,param1,value1
aaa,param2,value2
bbb,param1,
ccc,param1,
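The fixture covers three shapes of query string: two `param=value` pairs (`aaa`), a bare parameter with no `=` (`bbb`), and a parameter with an empty value (`ccc`). The last two both appear with an empty `value` field in the expected CSV. A minimal Python sketch of the flattening, assuming a split on `&` and then on the first `=`; it keeps `None` and `''` distinct because the fixture does not say which one the model returns, and it does not URL-decode values. `flatten_query_string` and `to_csv` are hypothetical names.

```python
import csv
import io
from typing import List, Optional, Tuple

GIVEN = [("aaa", "param1=value1&param2=value2"), ("bbb", "param1"), ("ccc", "param1=")]


def flatten_query_string(event_key: str, query_string: str) -> List[Tuple[str, str, Optional[str]]]:
    """Split 'a=1&b=2' into one (event_key, param, value) row per parameter."""
    rows = []
    for pair in query_string.split("&"):
        param, sep, value = pair.partition("=")
        # No '=' at all -> None; 'param=' -> ''. Both print as an empty CSV field.
        rows.append((event_key, param, value if sep else None))
    return rows


def to_csv(rows: List[Tuple[str, str, Optional[str]]]) -> str:
    """Render rows in the same layout as the unit test's `expect` block."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["event_key", "param", "value"])
    writer.writerows(rows)  # the csv module writes None as an empty string
    return out.getvalue()
```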