Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/add union data #132

Closed
wants to merge 18 commits into from
Closed

Feature/add union data #132

wants to merge 18 commits into from

Conversation

fivetran-jamie
Copy link
Contributor

@fivetran-jamie fivetran-jamie commented Dec 27, 2023

PR Overview

This PR will address the following Issue/Feature:
#124

This PR will result in the following new package version:

v0.14.0

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

dbt_zendesk v0.14.0

🎉 Feature Update 🎉

This release supports running the package on multiple Zendesk sources at once! See the README for details on how to leverage this feature (PR #44).

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run –full-refresh && dbt test
  • dbt run (if incremental models are present)

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates) -- waiting on approval
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

See Hex notebook linked in Height

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-jamie great work on this PR. I just have a few questions below that I would like you to provide more clarity around before I approve. Let me know if you have any questions. Thanks!


with spine as (

{% if execute %}
{% set current_ts = dbt.current_timestamp_backcompat() %}
{% set first_date_query %}
select min( created_at ) as min_date from {{ source('zendesk', 'ticket') }}
select min( created_at ) as min_date from {{ var('ticket') }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a need for these to be changed? If I recall, these were intentionally pointing at the source because if you tried to run dbt compile before the staging model was built it will throw an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this became necessary with my change to the package-defined zendesk source (ie dynamically disabling it if the customer is using the unioning variables).

we need to look at the staging model instead, as there's no single source (and the customer isn't required to define sources)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that running dbt compile before running the package may be a dealbreaker for some users. Is there any possibility for us to allow the use of the source if the user is not leveraging the union feature, and then allow for the switch if they are using the feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a great idea, this was kind of the biggest uncertainty for me in these PRs. i'll adapt this check for here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on top of this, would we want to add something like this so union-folks could perhaps compile before running?

{% set first_date = run_query(first_date_query).columns[0][0]|string if var('ticket') is not None else '2016-01-01' %}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh i suppose this wouldn't really be possible in the pivot model since we're pulling field_names and not just a start_date, so compile would still fail

@@ -1,4 +1,4 @@
-- depends_on: {{ source('zendesk', 'ticket_field_history') }}
-- depends_on: {{ ref('stg_zendesk__ticket_field_history') }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about why this is no longer pointing to the source? Is this because the source is deactivated if the union schema is being used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we call out that folks may not be able to compile before running on a new schema?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll add the source check for folks not using union_data

models/intermediate/int_zendesk__ticket_schedules.sql Outdated Show resolved Hide resolved
Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-jamie thanks for working through this! I do have a few comments and questions I would like for you to review below before approving. Let me know if you have any questions or would like to discuss any of these points in more detail.

CHANGELOG.md Outdated
# dbt_zendesk v0.14.0

## 🎉 Feature Update 🎉
This release supports running the package on multiple Zendesk sources at once! See the [README](https://github.com/fivetran/dbt_zendesk?tab=readme-ov-file#step-3-define-database-and-schema-variables) for details on how to leverage this feature ([PR #132](https://github.com/fivetran/dbt_zendesk/pull/132)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also call out that there is a new field in the end models source_relation and this will require a full refresh to account for the schema change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

README.md Outdated

<details><summary><i>Expand for source configuration template</i></summary><p>

> **Note**: If there are source tables you do not have (see [Step 4](https://github.com/fivetran/dbt_zendesk?tab=readme-ov-file#step-4-disable-models-for-non-existent-sources)), you may still include them, as long as you have set the right variables to `False`. Otherwise, you may remove them from your source definitions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about contradicting my previous point here and would like to explore a more maintainable way to inform users of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -36,17 +36,18 @@ with timezone as (
from timezone
left join daylight_time
on timezone.time_zone = daylight_time.time_zone
and timezone.source_relation = daylight_time.source_relation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now running into this issue when running this model on the zendesk_yashwanth schema. Do you have any understanding of why this would be ocurring?

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmm oddly enough i am not getting this error ...

@@ -11,13 +11,15 @@ with organizations as (
), tag_aggregates as (
select
organizations.organization_id,
organizations.source_relation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason I am running into the following error when running on the zendesk_yashwanth schema. Do you have any ideas why this may be occurring?

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also not getting this error... could you rerun @fivetran-joemarkiewicz ?

Comment on lines +24 to +36
select
schedule_id,
source_relation
from (

select
schedule_id,
source_relation,
row_number() over (partition by source_relation order by created_at) = 1 as is_default_schedule
from schedule

{% if execute %}

{% set default_schedule_id_query %}
with set_default_schedule_flag as (
select
row_number() over (order by created_at) = 1 as is_default_schedule,
id
from {{ source('zendesk','schedule') }}
)
select
id
from set_default_schedule_flag
where is_default_schedule

{% endset %}

{% set default_schedule_id = run_query(default_schedule_id_query).columns[0][0]|string %}
) as order_schedules
where is_default_schedule
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the results of this query change between your changes and prod. I understand the need to remove the run query as more than one default id may be returned and it ensure we don't need to look at the source, but the results do seem to differ for the zendesk_yashwanth schema.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh so schedule 360000389531 is deleted...

image

Because the query is now pointing to the staging model, it's filtering out deleted schedules and therefore selecting 11630270684308

So the question is, should we stick to the pre-existing behavior, which will select the very first (but potentially deleted) schedule as the default, or is this kinda a bug?


with spine as (

{% if execute %}
{% set current_ts = dbt.current_timestamp_backcompat() %}
{% set first_date_query %}
select min( created_at ) as min_date from {{ source('zendesk', 'ticket') }}
select min( created_at ) as min_date from {{ var('ticket') }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that running dbt compile before running the package may be a dealbreaker for some users. Is there any possibility for us to allow the use of the source if the user is not leveraging the union feature, and then allow for the switch if they are using the feature?

@fivetran-jamie fivetran-jamie mentioned this pull request Nov 19, 2024
7 tasks
@fivetran-jamie fivetran-jamie added the type:duplicate This issue or pull request already exists label Nov 20, 2024
@fivetran-jamie
Copy link
Contributor Author

closed in favor of #178

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:duplicate This issue or pull request already exists
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants