Releases: Velir/dbt-ga4
3.1.0
What's Changed
- Set "Direct" instead "(none)" as default by @clemens7haar in #157
- Macro to facilitate easy creation of custom events + params by @adamribaudo-velir in #154
- Add recommended event model for "select_content" event. by @elyobo in #156
New Contributors
Full Changelog: 3.0.1...3.1.0
3.0.1
v3.0.0
Welcome to dbt-ga4 3.0!
We're excited to share many improvements, optimizations, and new capabilities in v3.0.0. This is a major version change as there are several breaking changes (see below).
Thanks to all the new contributors listed below! If you're interested in contributing, please feel free to open an issue or PR. You can also find @adamribaudo on dbt's Slack instance as 'Adam Ribaudo'.
Summary of updates:
- Better session-level attribution by ignoring session_start and first_visit events
- Support for multiple GA4 properties (with daily frequency, only)
- Implementation of dbt's 'adapter dispatch' pattern which allows package users to override package macros
- Mapping between user_pseudo_ids and user_ids
- Better documentation and more descriptive column names
- Several small bug fixes and improvements
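The attribution change in the summary above can be illustrated with a small sketch. This is not the package's SQL; it only shows the idea of skipping session_start and first_visit events when choosing which event supplies a session's source. The `source` key and the helper name `session_source` are assumptions made for illustration.

```python
from typing import Optional

# Event names the release notes say are ignored for session-level attribution.
IGNORED_EVENTS = {"session_start", "first_visit"}


def session_source(events: list) -> Optional[str]:
    """Return the source of the earliest event that is not in IGNORED_EVENTS.

    `events` is a list of dicts with `event_name`, `event_timestamp`, and a
    hypothetical `source` key (the real package derives this differently).
    """
    for event in sorted(events, key=lambda e: e["event_timestamp"]):
        if event["event_name"] in IGNORED_EVENTS:
            continue  # skip events that typically carry no useful source
        return event.get("source")
    return None


events = [
    {"event_name": "session_start", "event_timestamp": 1, "source": None},
    {"event_name": "first_visit", "event_timestamp": 2, "source": None},
    {"event_name": "page_view", "event_timestamp": 3, "source": "google"},
]
print(session_source(events))
```

Here the first two events are skipped, so the session takes its source from the page_view event.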
Breaking Changes and Upgrade Notes
- With #149, we updated several column names for clarity's sake. This may break downstream models or BI tools
- With #145, we removed the models/staging/ga4 folder. Any dbt_project.yml configs that referenced this folder should be removed.
A quick note on breaking changes
It's not lost on me that most releases have included breaking changes. This is a function of both the reality of coming up to speed on Google's imperfectly-documented export process and also this being my first dbt package. I can't promise to avoid breaking changes in the future, but I do plan to be much more conservative in releasing breaking changes post-July when I expect most package users will need to run these jobs in production.
Change log:
- Refactor page conversion count and unit test by @adamribaudo-velir in #98
- user_id mapping table that enables distinct user_id and user_pseudo_id marts by @adamribaudo-velir in #99
- Misc doc updates by @adamribaudo-velir in #101
- Use first event instead of first session_start by @clemens7haar in #102
- correct data_type of tax and value by @Sgiostra-BitBang in #116
- correct the data_type of ''shipping'' field from "float_value" to "double_value" by @Sgiostra-BitBang in #121
- Add page_path field by @adamribaudo-velir in #124
- Allow users to override default_channel_grouping by @adamribaudo-velir in #111
- Added event_name clustering to base event model by @adamribaudo-velir in #119
- Add event_date_dt to stg_ga4__event_items by @dgitis in #133
- Dispatch the unnest_key macro by @adamribaudo-velir in #130
- Add session_number field to dim_ga4_sessions by @adamribaudo-velir in #134
- CPC Campaign Attribution Fix by @dgitis in #138
- Fix error with ga_session_number rename: consistent column name in base and intraday models by @adamribaudo-velir in #140
- Fix engaged sessions by @ivan-toriya-precis in #137
- Remove extra ga4 folder by @adamribaudo-velir in #145
- Descriptive field names and doc updates by @adamribaudo-velir in #149
- Add push notifications channel by @gblazq in #152
New Contributors
- @Sgiostra-BitBang made their first contribution in #116
- @ivan-toriya-precis made their first contribution in #137
- @gblazq made their first contribution in #152
Full Changelog: 2.0.1...3.0.0
v2.0.1 utm values available when gclid is present
In this release, the utm values for source, medium, and campaign will be retained even when a gclid value is present in the page_location field. There were also changes to unit test definition and execution which should only impact package developers.
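As a rough illustration of the behavior described above, the sketch below reads utm values from a page_location URL and keeps them whether or not a gclid is present. The function name `traffic_params` and the returned keys are hypothetical, and the package itself does this in SQL rather than Python.

```python
from urllib.parse import parse_qs, urlparse


def traffic_params(page_location: str) -> dict:
    """Extract utm values and note whether a gclid is present."""
    query = parse_qs(urlparse(page_location).query)

    def first(key):
        # parse_qs returns a list per key; take the first value if any.
        return query.get(key, [None])[0]

    return {
        "source": first("utm_source"),
        "medium": first("utm_medium"),
        "campaign": first("utm_campaign"),
        # The gclid is recorded, but it does not overwrite the utm values.
        "has_gclid": "gclid" in query,
    }


url = "https://example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring&gclid=abc123"
print(traffic_params(url))
```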
What's Changed
- Fix Session Conversion unit test and update model name to _daily by @adamribaudo-velir in #91
- GitHub workflow on pr by @adamribaudo-velir in #93
- Fix automated unit tests - updated config to look for service json relative to workspace by @adamribaudo-velir in #95
- use utm parameters also if gclid is found by @clemens7haar in #94
Full Changelog: 2.0.0...2.0.1
v2.0.0 Performance updates and other QoL changes
The primary focus of this release is performance optimizations with an eye towards reducing table scans and ultimately BQ query costs. There are also a number of new features and quality-of-life improvements.
With regards to query optimization, the following major changes were made:
- Removed window functions from stg_ga4__events and other models to avoid unnecessary table scans
- Moved event deduplication to base_ga4__events so that events are deduped on the way in to the base model
- Created an incremental fct_ga4__sessions_daily partitioned table that offers an efficient (cheap) method of calculating session-level metrics and conversion counts
- Created an incremental fct_ga4__pages partitioned table containing page-level metrics
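To make "deduped on the way in" concrete, here is a minimal sketch that drops repeated events before anything downstream sees them. The choice of key fields is an assumption for illustration only; the release notes do not say which columns the package uses to identify duplicates.

```python
def dedupe_events(events, key_fields=("user_pseudo_id", "event_name", "event_timestamp")):
    """Keep the first occurrence of each event, identified by key_fields.

    key_fields is a hypothetical dedup key, not the package's actual one.
    """
    seen = set()
    unique = []
    for event in events:
        key = tuple(event.get(field) for field in key_fields)
        if key in seen:
            continue  # duplicate: already kept an event with this key
        seen.add(key)
        unique.append(event)
    return unique


raw = [
    {"user_pseudo_id": "a", "event_name": "page_view", "event_timestamp": 100},
    {"user_pseudo_id": "a", "event_name": "page_view", "event_timestamp": 100},
    {"user_pseudo_id": "a", "event_name": "scroll", "event_timestamp": 105},
]
print(len(dedupe_events(raw)))
```

Deduplicating once at the base layer means later models can aggregate without repeating the check.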
Note that mart tables with records that contain information spanning multiple days cannot be partitioned on date and will incur run costs commensurate with the size of your implementation. You can disable any specific models you like by updating the models section of your dbt_project.yml file.
Breaking changes
- The inputs to the surrogate keys have changed which means that new event, session, and user keys will not match previous keys
- The mart tables have changed significantly which may introduce errors for any connected BI tools
- Removed user_key in many cases in favor of user_pseudo_id. A mapping between user_id and user_pseudo_id will be provided in the future.
Full List of Changes
- adding recommended events - share & generate_lead by @vibhorj in #47
- login, sign_up, and search recommended events by @3v-dgudaitis in #45
- Fixing 2 broken unit tests by @adamribaudo-velir in #50
- Allow renaming of custom event parameters (#48) by @willbryant in #49
- Flattened records by @dgitis in #53
- fix: query_parameter_exclusions applies to page_referrer too (#55) by @willbryant in #56
- Support derived_session_properties (#54) by @willbryant in #58
- Fix inconsistent indentation in README yaml by @willbryant in #57
- Support defining custom parameters that apply to all events by @willbryant in #51
- Base model partition optimization fixes by @adamribaudo-velir in #59
- Fix Google Ads Attributed to Organic by @dgitis in #70
- Add stream_id and platform to the sessions dimension table by @willbryant in #75
- Fix integration tests for detect_gclid change (#78) by @willbryant in #79
- Add landing_page_referrer to the sessions dimensions table by @willbryant in #77
- Use user_pseudo_id rather than user_key to construct session_key (#61) by @willbryant in #76
- Fix dim_ga4__sessions unique test by @adamribaudo-velir in #73
- Fix to make event_key unique by @adamribaudo-velir in #74
- Simplify derived user properties & session properties queries by @willbryant in #82
- De-dup events in the base events queries (#74) by @willbryant in #83
- Channel grouping fixes by @willbryant in #85
- Utm content and term by @willbryant in #84
- refactor: Simplify sessions conversions queries by @willbryant in #88
- Support deriving session properties from user_properties too by @willbryant in #87
- Incremental session fact table that includes conversion metrics by @adamribaudo-velir in #64
New Contributors
- @3v-dgudaitis made their first contribution in #45
- @willbryant made their first contribution in #49
Full Changelog: 1.0.0...2.0.0
v1.0.0
🎉 Release 1.0.0 🎉
This release offers many new features and multiple breaking changes from v0.1.4. Please read the release notes for more details.
Summary of Enhancements
- Added user_key field which is used to aggregate users on either user_pseudo_id or user_id with preference for user_id
- Support for GA4 BQ exports that only export to the streaming 'intraday' table
- Added pages fact table which aggregates common metrics at the URL & hour grain
- Added many recommended and ecommerce events
- Added support for user properties and 'derived' user properties which pull from event parameters
Breaking Changes
- #25 introduces user_key as the primary method of identifying unique users. Previously client_id was used but this field has been removed.
- #29 introduces frequency as the method of determining whether batch or streaming tables will be ingested. The include_intraday_events variable is no longer used.
- Hashed keys (user_key, session_key, event_key) are now STRING rather than BYTES data types
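The sketch below illustrates two points from this release: user_key prefers user_id over user_pseudo_id, and hashed keys are strings rather than raw bytes. The use of MD5 and base64 here is an assumption for illustration; the release notes do not state which hash or encoding the package applies.

```python
import base64
import hashlib
from typing import Optional


def user_key(user_id: Optional[str], user_pseudo_id: Optional[str]) -> Optional[str]:
    """Build a string key from user_id when available, else user_pseudo_id."""
    identifier = user_id if user_id is not None else user_pseudo_id
    if identifier is None:
        return None
    # .digest() yields raw bytes (analogous to a BYTES column) ...
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    # ... and encoding it produces text (analogous to a STRING column).
    return base64.b64encode(digest).decode("ascii")


# Same user_id gives the same key regardless of user_pseudo_id.
print(user_key("user-1", "pseudo-a") == user_key("user-1", "pseudo-b"))
# Without a user_id, the key falls back to user_pseudo_id.
print(user_key(None, "pseudo-a") == user_key(None, "pseudo-a"))
```

A string key can be compared and joined in BI tools without byte handling, which is presumably part of the motivation for the change.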
Known Issues
- Further updates are necessary to effectively prune the base_ga4__events partitions. Building and querying tables will potentially scan more rows than desirable in the meantime. Take this into account when considering this package for high volume installations.
What's Changed
- Prefixed macros with ga4. to simplify replacing models in local project by @dgitis in #20
- Adding sum of engagement time as a session-level fact by @adamribaudo-velir in #26
- Additional pageview dimensions by @adamribaudo-velir in #24
- add campaign and improve sessions traffic sources model by @clemens7haar in #28
- change materialization of user models to 'table' by @adamribaudo-velir in #23
- session_engaged - using string instead of int value by @clemens7haar in #30
- Ecommerce by @dgitis in #21
- fixed in base_ga4__events.sql - BUG #31 by @vibhorj in #32
- Introduce user_key which pulls from user_id or user_pseudo_id depending on which is available by @adamribaudo-velir in #25
- Bug fix 31 static vs dynamic partition by @vibhorj in #35
- Update static incremental load to use table suffix by @adamribaudo-velir in #37
- seed file - removed blank lines by @vibhorj in #39
- Pages Mart by @dgitis in #38
- Basic streaming support by @dgitis in #29
- User key by @dgitis in #42
- 36 handling of user scope dimensions user properties by @adamribaudo-velir in #43
- adding unit test for page conversions by @adamribaudo-velir in #44
New Contributors
- @clemens7haar made their first contribution in #28
- @vibhorj made their first contribution in #32
Full Changelog: 0.1.4...1.0.0
v0.1.4
- Added ability to switch between a dynamic or static lookback when determining which dates to pull during an incremental load.
- Added ability to exclude query parameters from page_location field
- Added ability to calculate conversion metrics for sessions based on a configurable set of conversions
- Added ability to map between source/medium and default channel grouping. Requires dbt seed to be run.
- Added pytest unit tests
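As an illustration of excluding query parameters from page_location, the following sketch strips a configurable set of parameter names from a URL. The function name `remove_query_parameters` and its signature are hypothetical; the package implements this in SQL and its configuration may look different.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def remove_query_parameters(page_location: str, exclusions: set) -> str:
    """Return page_location with the named query parameters removed."""
    parts = urlparse(page_location)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in exclusions
    ]
    # Rebuild the URL with only the parameters that were not excluded.
    return urlunparse(parts._replace(query=urlencode(kept)))


print(remove_query_parameters("https://example.com/page?gclid=abc&id=7", {"gclid"}))
```

Stripping tracking parameters this way lets otherwise identical pages group under one URL.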
Thanks to @dgitis for his contributions!
v0.1.3
Updates
- Package is now available on the dbt Package Hub! https://hub.getdbt.com/velir/ga4/latest/
- Removed variables set at the package level. They should be set within the project that loads this package
- Misc doc updates
v0.1.2
Release Updates
This release has been tested across 3 different GA4 export datasets, but would benefit from input from additional users who have access to additional datasets. We would appreciate feedback on its performance against a variety of use cases and scales of data.