Union Data V2 (#178)
* get new branch on remote

* send to bk

* docs

* joe feedback

* Apply suggestions from code review

Co-authored-by: fivetran-catfritz <[email protected]>

* update source package ref

---------

Co-authored-by: fivetran-catfritz <[email protected]>
fivetran-jamie and fivetran-catfritz authored Dec 2, 2024
1 parent 7fa81ff commit 4e495c4
Showing 71 changed files with 874 additions and 221 deletions.
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,21 @@
# dbt_zendesk v0.19.0
[PR #178](https://github.com/fivetran/dbt_zendesk/pull/178) includes the following updates:

## Feature Update: Run Models on Multiple Zendesk Sources
- This release supports running the package on multiple Zendesk sources at once! See the [README](https://github.com/fivetran/dbt_zendesk?tab=readme-ov-file#step-3-define-database-and-schema-variables) for details on how to leverage this feature.

> Please note: This is a **Breaking Change** in that we have added a new field, `source_relation`, that points to the source connector from which the record originated. This field addition will require a `dbt run --full-refresh`, even if you are not using this new functionality.

## Documentation
- Cleaned up the column-level documentation descriptions for the `zendesk__ticket_enriched` and `zendesk__ticket_metrics` models.

## Under the Hood
- Relevant to package maintainers only:
  - Added a consistency data validation test for each end model.
  - Added the `consistency_test_exclude_fields` variable, which lists fields to ignore in consistency tests. These are largely timestamp fields that can differ slightly due to different runtimes, but `source_relation` is also currently included due to the nature of this update.
  - Filtered out records created or updated today from consistency tests to avoid false positive failures due to different runtimes.
  - Incorporated `source_relation` into each validation test.
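The consistency tests themselves are written in SQL, using `except distinct` in both directions between the prod and dev outputs. As a rough illustration of that comparison logic only, the Python sketch below (the function `consistency_diff` is hypothetical and not part of the package) drops the excluded fields, skips records dated today, and returns the rows found in only one of the two outputs, tagged with their origin.

```python
from datetime import date


def consistency_diff(prod_rows, dev_rows, exclude_fields, date_field, today: date):
    """Return (row, origin) pairs for rows present in only one of prod/dev.

    Hypothetical sketch: excluded fields are dropped before comparing, and
    rows whose date_field is today or later are skipped to avoid false
    positives caused by the two runs happening at different times.
    """

    def project(rows):
        kept = set()
        for row in rows:
            if row[date_field] >= today:
                continue  # ignore records from today (mirrors `< current_date`)
            # Hashable, order-independent representation of the compared columns
            kept.add(tuple(sorted(
                (key, value) for key, value in row.items() if key not in exclude_fields
            )))
        return kept  # a set, so duplicates collapse as with EXCEPT DISTINCT

    prod, dev = project(prod_rows), project(dev_rows)
    diffs = [(dict(row), "from prod") for row in prod - dev]  # prod_not_in_dev
    diffs += [(dict(row), "from dev") for row in dev - prod]  # dev_not_in_prod
    return diffs
```

Filtering before the set difference gives the same result as the SQL's filtering afterwards, as long as the date column is one of the compared columns. An empty result means the two runs agree.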

# dbt_zendesk v0.18.1
[PR #174](https://github.com/fivetran/dbt_zendesk/pull/174) includes the following changes:

62 changes: 60 additions & 2 deletions README.md
@@ -65,12 +65,13 @@ Include the following zendesk package version in your `packages.yml` file:
```yml
packages:
- package: fivetran/zendesk
version: [">=0.18.0", "<0.19.0"]
version: [">=0.19.0", "<0.20.0"]
```
> **Note**: Do not include the Zendesk Support source package. The Zendesk Support transform package already has a dependency on the source in its own `packages.yml` file.

### Step 3: Define database and schema variables
#### Option A: Single connector
By default, this package runs using your destination and the `zendesk` schema. If this is not where your zendesk data is (for example, if your zendesk schema is named `zendesk_fivetran`), update the following variables in your root `dbt_project.yml` file accordingly:

```yml
@@ -79,7 +80,64 @@ vars:
zendesk_schema: your_schema_name
```

> **Note**: When running the package with a single source connector, the `source_relation` column in each model will be populated with an empty string.

#### Option B: Union multiple connectors
If you have multiple Zendesk connectors in Fivetran and would like to use this package on all of them simultaneously, we have provided functionality to do so. For each source table, the package will union all of the data together and pass the unioned table into the transformations. The `source_relation` column in each model indicates the origin of each record.

To use this functionality, you will need to set the `zendesk_sources` variable in your root `dbt_project.yml` file:

```yml
# dbt_project.yml
vars:
  zendesk_sources:
    - database: connector_1_destination_name # Required
      schema: connector_1_schema_name # Required
      name: connector_1_source_name # Required only if following the step in the following subsection
    - database: connector_2_destination_name
      schema: connector_2_schema_name
      name: connector_2_source_name
```
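Conceptually, the unioning can be pictured with the minimal Python sketch below. It assumes each source is keyed by `(database, schema)` and that `source_relation` takes the form `<database>.<schema>` with the database lowercased; that format is inferred from the filters in this commit's consistency tests, not from the macro itself. The `union_sources` helper is hypothetical; the package does this work in SQL through its `union_data` macro.

```python
def union_sources(tables_by_source, table_name):
    """Hypothetical sketch: union one table across sources, tagging source_relation.

    tables_by_source maps (database, schema) -> {table_name: [row, ...]}.
    Assumption: source_relation is "<database lowercased>.<schema>" when
    several sources are unioned, and an empty string for a single connector.
    """
    unioned = []
    multiple = len(tables_by_source) > 1
    for (database, schema), tables in tables_by_source.items():
        relation = f"{database.lower()}.{schema}" if multiple else ""
        # A source that lacks this table simply contributes no rows
        for row in tables.get(table_name, []):
            unioned.append({**row, "source_relation": relation})
    return unioned
```

If no source has the table, the result is an empty list, loosely analogous to the empty staging models that `union_data` creates for tables missing from every schema.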

##### Recommended: Incorporate unioned sources into DAG
> *If you are running the package through [Fivetran Transformations for dbt Core™](https://fivetran.com/docs/transformations/dbt#transformationsfordbtcore), the below step is necessary in order to synchronize model runs with your Zendesk connectors. Alternatively, you may choose to run the package through Fivetran [Quickstart](https://fivetran.com/docs/transformations/quickstart), which would create separate sets of models for each Zendesk source rather than one set of unioned models.*

By default, this package defines one single-connector source, called `zendesk`, which will be disabled if you are unioning multiple connectors. This means that your DAG will not include your Zendesk sources, though the package will run successfully.

To properly incorporate all of your Zendesk connectors into your project's DAG:
1. Define each of your sources in a `.yml` file in your project. Utilize the following template for the `source`-level configurations, and, **most importantly**, copy and paste the table and column-level definitions from the package's `src_zendesk.yml` [file](https://github.com/fivetran/dbt_zendesk_source/blob/main/models/src_zendesk.yml#L15-L351).

```yml
# a .yml file in your root project
sources:
  - name: <name> # ex: Should match name in zendesk_sources
    schema: <schema_name>
    database: <database_name>
    loader: fivetran
    loaded_at_field: _fivetran_synced
    freshness: # feel free to adjust to your liking
      warn_after: {count: 72, period: hour}
      error_after: {count: 168, period: hour}
    tables: # copy and paste from zendesk_source/models/src_zendesk.yml - see https://support.atlassian.com/bitbucket-cloud/docs/yaml-anchors/ for how to use anchors to only do so once
```

> **Note**: If there are source tables you do not have (see [Step 4](https://github.com/fivetran/dbt_zendesk_source?tab=readme-ov-file#step-4-disable-models-for-non-existent-sources)), you may still include them, as long as you have set the right variables to `False`. Otherwise, you may remove them from your source definition.

2. Set the `has_defined_sources` variable (scoped to the `zendesk_source` package) to `True`, as follows:
```yml
# dbt_project.yml
vars:
  zendesk_source:
    has_defined_sources: true
```
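Because each `name` in `zendesk_sources` only matters once `has_defined_sources` is enabled, and must then match a source defined in your `.yml` files, a quick pre-flight check can catch configuration mistakes before a run. The `check_zendesk_sources` function below is a hypothetical helper, not something the package ships; it assumes you have already parsed `dbt_project.yml` and collected your defined source names into Python objects.

```python
def check_zendesk_sources(zendesk_sources, has_defined_sources=False, defined_source_names=()):
    """Hypothetical helper: return a list of problems in a zendesk_sources var."""
    problems = []
    for i, entry in enumerate(zendesk_sources):
        # database and schema are always required
        for key in ("database", "schema"):
            if not entry.get(key):
                problems.append(f"entry {i}: missing required key '{key}'")
        # name is only required when sources are defined in your own project
        if has_defined_sources:
            name = entry.get("name")
            if not name:
                problems.append(f"entry {i}: 'name' is required when has_defined_sources is true")
            elif name not in defined_source_names:
                problems.append(f"entry {i}: no source named '{name}' is defined in your .yml files")
    return problems
```

An empty list means the entries are consistent with the two steps above.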

### Step 4: Enable/Disable models for non-existent sources

> _This step is optional if you are unioning multiple connectors together in the previous step. The `union_data` macro will create empty staging models for sources that are not found in any of your Zendesk schemas/databases. However, you can still leverage the below variables if you would like to avoid this behavior._

This package takes into consideration that not every Zendesk Support account utilizes the `schedule`, `schedule_holiday`, `ticket_schedule`, `daylight_time`, `time_zone`, `audit_log`, `domain_name`, `user_tag`, `organization_tag`, or `ticket_form_history` features, and allows you to disable the corresponding functionality. By default, all variables' values are assumed to be `true`, except for `using_schedule_histories`. Add variables for only the tables you want to enable/disable:
```yml
vars:
@@ -230,7 +288,7 @@ This dbt package is dependent on the following dbt packages. These dependencies
```yml
packages:
- package: fivetran/zendesk_source
version: [">=0.13.0", "<0.14.0"]
version: [">=0.14.0", "<0.15.0"]
- package: fivetran/fivetran_utils
version: [">=0.4.0", "<0.5.0"]
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: 'zendesk'
version: '0.18.1'
version: '0.19.0'

config-version: 2
require-dbt-version: [">=1.3.0", "<2.0.0"]
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

7 changes: 5 additions & 2 deletions integration_tests/dbt_project.yml
@@ -1,12 +1,13 @@
config-version: 2

name: 'zendesk_integration_tests'
version: '0.18.1'
version: '0.19.0'

profile: 'integration_tests'

vars:
zendesk_schema: zendesk_integration_tests_50

zendesk_source:
zendesk_organization_identifier: "organization_data"
zendesk_schedule_identifier: "schedule_data"
@@ -36,9 +37,11 @@ vars:
# using_domain_names: false
# using_user_tags: false
# using_organization_tags: false
# using_holidays: false
# fivetran_integrity_sla_metric_parity_exclusion_tickets: (56,80)
# fivetran_integrity_sla_first_reply_time_exclusion_tickets: (56,80)
# fivetran_consistency_sla_policies_exclusion_tickets: (55,58) # can remove after this release
consistency_test_exclude_fields: ['source_relation', 'ticket_tags', 'ticket_day_id', 'assignee_ticket_last_update_at', 'assignee_last_login_at', 'requester_created_at', 'requester_updated_at', 'requester_ticket_last_update_at','requester_organization_created_at', 'requester_organization_updated_at', 'requester_last_login_at', 'created_at', 'updated_at']
# get rid of source_relation and ticket_day_id after v0.19.0

models:
+schema: "zendesk_{{ var('directed_schema','dev') }}"
@@ -76,4 +76,5 @@ select *
from final
where
{# Take differences in runtime into account #}
max_sla_elapsed_time - min_sla_elapsed_time > 2
max_sla_elapsed_time - min_sla_elapsed_time > 5
and date(sla_applied_at) < current_date
@@ -9,7 +9,8 @@ with prod as (
ticket_id,
count(*) as total_slas
from {{ target.schema }}_zendesk_prod.zendesk__sla_policies
{{ "where ticket_id not in " ~ var('fivetran_consistency_sla_policy_count_exclusion_tickets',[]) ~ "" if var('fivetran_consistency_sla_policy_count_exclusion_tickets',[]) }}
where date(sla_applied_at) < current_date
{{ "and ticket_id not in " ~ var('fivetran_consistency_sla_policy_count_exclusion_tickets',[]) ~ "" if var('fivetran_consistency_sla_policy_count_exclusion_tickets',[]) }}
group by 1
),

@@ -18,7 +19,8 @@ dev as (
ticket_id,
count(*) as total_slas
from {{ target.schema }}_zendesk_dev.zendesk__sla_policies
{{ "where ticket_id not in " ~ var('fivetran_consistency_sla_policy_count_exclusion_tickets',[]) ~ "" if var('fivetran_consistency_sla_policy_count_exclusion_tickets',[]) }}
where date(sla_applied_at) < current_date
{{ "and ticket_id not in " ~ var('fivetran_consistency_sla_policy_count_exclusion_tickets',[]) ~ "" if var('fivetran_consistency_sla_policy_count_exclusion_tickets',[]) }}
group by 1
),

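In the hunks above, the exclusion clause switched from `where` to `and` because each CTE now begins with `where date(sla_applied_at) < current_date`, and a second `where` would be invalid SQL. The Python stand-in below (hypothetical helpers that approximate the Jinja one-liner, not dbt's actual rendering) shows the clause being emitted only when the variable is set, assuming the variable holds a string such as `(56,80)`, the format used by the commented-out exclusion vars in `integration_tests/dbt_project.yml`.

```python
def render_exclusion_clause(exclusion_tickets):
    # Stand-in for the Jinja one-liner: emit the clause only when the var is truthy.
    # In Jinja, an inline `if` with no `else` renders as an empty string.
    if not exclusion_tickets:
        return ""
    return f"and ticket_id not in {exclusion_tickets}"


def build_sla_count_query(schema, env, exclusion_tickets):
    # Hypothetical helper mirroring the shape of the prod/dev CTEs above
    lines = [
        "select ticket_id, count(*) as total_slas",
        f"from {schema}_zendesk_{env}.zendesk__sla_policies",
        "where date(sla_applied_at) < current_date",  # the new leading filter
    ]
    clause = render_exclusion_clause(exclusion_tickets)
    if clause:
        lines.append(clause)  # must start with `and`, since `where` is already used
    lines.append("group by 1")
    return "\n".join(lines)
```

With the default empty list, the query keeps only the date filter; with a ticket list, the `and` clause is appended after it.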
53 changes: 53 additions & 0 deletions integration_tests/tests/consistency/consistency_ticket_backlog.sql
@@ -0,0 +1,53 @@

{{ config(
tags="fivetran_validations",
enabled=var('fivetran_validation_tests_enabled', false)
) }}

with prod as (
select
{{ dbt_utils.star(from=ref('zendesk__ticket_backlog'), except=var('consistency_test_exclude_fields', '[]')) }}
from {{ target.schema }}_zendesk_prod.zendesk__ticket_backlog
),

dev as (
select
{{ dbt_utils.star(from=ref('zendesk__ticket_backlog'), except=var('consistency_test_exclude_fields', '[]')) }}
from {{ target.schema }}_zendesk_dev.zendesk__ticket_backlog

{# Make sure we're only comparing one schema since this current update (v0.19.0) added multi-schema support. Can remove for future releases #}
{{ "where source_relation = '" ~ (var("zendesk_database", target.database)|lower ~ "." ~ var("zendesk_schema", "zendesk")) ~ "'" if 'source_relation' in var("consistency_test_exclude_fields", '[]') }}

),

prod_not_in_dev as (
-- rows from prod not found in dev
select * from prod
except distinct
select * from dev
),

dev_not_in_prod as (
-- rows from dev not found in prod
select * from dev
except distinct
select * from prod
),

final as (
select
*,
'from prod' as source
from prod_not_in_dev

union all -- union since we only care if rows are produced

select
*,
'from dev' as source
from dev_not_in_prod
)

select *
from final
where date_day < current_date
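The single-schema guard in the `dev` CTE above renders to a literal `where source_relation = '<database>.<schema>'` only when `source_relation` is listed in `consistency_test_exclude_fields`. The `zendesk__ticket_enriched` test uses `and` instead because its CTE already opens with `where true`. A Python approximation of that string building follows; the function name is hypothetical, and `target_database` stands in for dbt's `target.database`.

```python
def render_source_relation_filter(exclude_fields, target_database,
                                  zendesk_database=None, zendesk_schema="zendesk",
                                  keyword="where"):
    # Only scope to one schema while source_relation is excluded from comparison
    if "source_relation" not in exclude_fields:
        return ""
    # Mirrors var("zendesk_database", target.database) | lower
    database = (zendesk_database or target_database).lower()
    return f"{keyword} source_relation = '{database}.{zendesk_schema}'"
```

Once `source_relation` is dropped from `consistency_test_exclude_fields` after this release, the guard renders to nothing and the tests compare every schema again.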
@@ -0,0 +1,55 @@

{{ config(
tags="fivetran_validations",
enabled=var('fivetran_validation_tests_enabled', false)
) }}

with prod as (
select
{{ dbt_utils.star(from=ref('zendesk__ticket_enriched'), except=var('consistency_test_exclude_fields', '[]')) }}
from {{ target.schema }}_zendesk_prod.zendesk__ticket_enriched
where true
and {{ dbt.datediff(dbt.current_timestamp(), "updated_at", "minute") }} >= 60
),

dev as (
select
{{ dbt_utils.star(from=ref('zendesk__ticket_enriched'), except=var('consistency_test_exclude_fields', '[]')) }}
from {{ target.schema }}_zendesk_dev.zendesk__ticket_enriched
where true
and {{ dbt.datediff(dbt.current_timestamp(), "updated_at", "minute") }} >= 60

{# Make sure we're only comparing one schema since this current update (v0.19.0) added multi-schema support. Can remove for future releases #}
{{ "and source_relation = '" ~ (var("zendesk_database", target.database)|lower ~ "." ~ var("zendesk_schema", "zendesk")) ~ "'" if 'source_relation' in var("consistency_test_exclude_fields", '[]') }}
),

prod_not_in_dev as (
-- rows from prod not found in dev
select * from prod
except distinct
select * from dev
),

dev_not_in_prod as (
-- rows from dev not found in prod
select * from dev
except distinct
select * from prod
),

final as (
select
*,
'from prod' as source
from prod_not_in_dev

union all -- union since we only care if rows are produced

select
*,
'from dev' as source
from dev_not_in_prod
)

select *
from final
@@ -0,0 +1,52 @@

{{ config(
tags="fivetran_validations",
enabled=var('fivetran_validation_tests_enabled', false)
) }}

with prod as (
select
{{ dbt_utils.star(from=ref('zendesk__ticket_field_history'), except=var('consistency_test_exclude_fields', '[]')) }}
from {{ target.schema }}_zendesk_prod.zendesk__ticket_field_history
),

dev as (
select
{{ dbt_utils.star(from=ref('zendesk__ticket_field_history'), except=var('consistency_test_exclude_fields', '[]')) }}
from {{ target.schema }}_zendesk_dev.zendesk__ticket_field_history

{# Make sure we're only comparing one schema since this current update (v0.19.0) added multi-schema support. Can remove for future releases #}
{{ "where source_relation = '" ~ (var("zendesk_database", target.database)|lower ~ "." ~ var("zendesk_schema", "zendesk")) ~ "'" if 'source_relation' in var("consistency_test_exclude_fields", '[]') }}
),

prod_not_in_dev as (
-- rows from prod not found in dev
select * from prod
except distinct
select * from dev
),

dev_not_in_prod as (
-- rows from dev not found in prod
select * from dev
except distinct
select * from prod
),

final as (
select
*,
'from prod' as source
from prod_not_in_dev

union all -- union since we only care if rows are produced

select
*,
'from dev' as source
from dev_not_in_prod
)

select *
from final
where date_day < current_date
@@ -18,6 +18,9 @@ dev as (
first_reply_time_business_minutes,
first_reply_time_calendar_minutes
from {{ target.schema }}_zendesk_dev.zendesk__ticket_metrics

{# Make sure we're only comparing one schema since this current update (v0.19.0) added multi-schema support. Can remove for future releases #}
{{ "where source_relation = '" ~ (var("zendesk_database", target.database)|lower ~ "." ~ var("zendesk_schema", "zendesk")) ~ "'" if 'source_relation' in var("consistency_test_exclude_fields", '[]') }}
),

final as (