Update partition logic, add source relation #78

Merged

fivetran-avinash commented Mar 5, 2024

PR Overview

This PR will address the following Issue/Feature: [#77]

This PR will result in the following new package version: dbt_shopify_source v0.10.1

Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:

🐛 Bug Fixes 🪛

  • Added source_relation to the partition_by clauses that determine the is_most_recent_record in the stg_shopify__metafield and stg_shopify__abandoned_checkout_discount_code tables.
  • Updated the partition logic in stg_shopify__metafield and stg_shopify__abandoned_checkout_discount_code to avoid Redshift errors on empty source tables, where the partition fields are null.
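
To illustrate the first fix, here is a minimal sketch (not code from this package) that runs the window logic in SQLite through Python's `sqlite3`. The `metafield` table, its three columns, and the `shop_a` / `shop_b` values are assumptions made up for the example; the real models run on the warehouse and have more fields. It compares partitioning by `id` alone against partitioning by `id, source_relation` when two unioned sources share the same `id`.

```python
import sqlite3

# Hypothetical rows: the same id synced from two unioned Shopify sources.
rows = [
    (1, "shop_a", "2024-01-01"),
    (1, "shop_a", "2024-02-01"),
    (1, "shop_b", "2024-01-15"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table metafield (id integer, source_relation text, updated_at text)"
)
conn.executemany("insert into metafield values (?, ?, ?)", rows)


def most_recent(partition_cols):
    """Return (source_relation, updated_at) for rows flagged as most recent."""
    query = f"""
        select source_relation, updated_at
        from (
            select
                *,
                row_number() over (
                    partition by {partition_cols}
                    order by updated_at desc
                ) = 1 as is_most_recent_record
            from metafield
        )
        where is_most_recent_record
        order by source_relation
    """
    return conn.execute(query).fetchall()


# Partitioning by id only: shop_b never gets a most recent record.
before = most_recent("id")
# Partitioning by id and source_relation: one most recent record per source.
after = most_recent("id, source_relation")

print(before)
print(after)
```

With `id` alone, the `shop_b` row loses to the newer `shop_a` row, which is the cross-source collision the added `source_relation` is meant to prevent.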

🚘 Under The Hood 🚘

  • Included auto-releaser GitHub Actions workflow to automate future releases.
  • Added casting for the seed dependencies of the above models in integration_tests/dbt_project.yml so that local tests pass on null cases.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt compile
  • dbt run --full-refresh
  • [NA] dbt run
  • dbt test
  • [NA] dbt run --vars (if applicable)

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked and tagged
  • You are assigned to the corresponding issue and this PR
  • BuildKite integration tests are passing

Detailed Validation

Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":

  • You have validated these changes and assure this PR will address the respective Issue/Feature.
  • You are reasonably confident these changes will not impact any other components of this package or any dependent packages.
  • You have provided details below around the validation steps performed to gain confidence in these changes.

See Height ticket.

Standard Updates

Please acknowledge that your PR contains the following standard updates:

  • Package versioning has been appropriately indexed in the following locations:
    • indexed within dbt_project.yml
    • indexed within integration_tests/dbt_project.yml
  • CHANGELOG has individual entries for each respective change in this PR
  • [NA] README updates have been applied (if applicable)
  • [NA] DECISIONLOG updates have been updated (if applicable)
  • [NA] Appropriate yml documentation has been added (if applicable)

dbt Docs

Please acknowledge that after the above were all completed the below were applied to your branch:

  • docs were regenerated (unless this PR does not include any code or yml updates)

If you had to summarize this PR in an emoji, which would it be?

🏃🏽‍♂️

fivetran-joemarkiewicz left a comment

@fivetran-avinash thanks for pushing this PR forward! I have a few comments below, let me know if you have any questions!

models/stg_shopify__abandoned_checkout_discount_code.sql
Comment on lines 41 to 44
case when id is null
then row_number() over(partition by source_relation order by updated_at desc) = 1
else row_number() over(partition by id, source_relation order by updated_at desc) = 1
end as is_most_recent_record,
Contributor

Same note here: the null check in the case when logic should also cover the order by field.

Suggested change
- case when id is null
- then row_number() over(partition by source_relation order by updated_at desc) = 1
- else row_number() over(partition by id, source_relation order by updated_at desc) = 1
- end as is_most_recent_record,
+ case when id is null and updated_at is null
+ then row_number() over(partition by source_relation) = 1
+ else row_number() over(partition by id, source_relation order by updated_at desc) = 1
+ end as is_most_recent_record,

Contributor Author

updated

Contributor Author

An order by is actually required in row_number() windows in Databricks and Snowflake, so I added an order by on source_relation to the null-case window.
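
For anyone following the thread, a rough sketch of the shape this describes, run in SQLite through Python's `sqlite3`. This is an assumption about the resulting expression based on the comments above, not a copy of the merged code. The `example_rows` table and its values are hypothetical, and SQLite itself does not require the `order by`, so the sketch only shows the logic rather than reproducing the warehouse error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the staging model's input.
conn.execute(
    "create table example_rows (id integer, source_relation text, updated_at text)"
)
conn.executemany(
    "insert into example_rows values (?, ?, ?)",
    [
        (1, "shop_a", "2024-01-01"),
        (1, "shop_a", "2024-02-01"),
        (None, "shop_empty", None),  # row where both id and updated_at are null
    ],
)

flags = conn.execute(
    """
    select
        id,
        source_relation,
        updated_at,
        case when id is null and updated_at is null
            -- assumed form: order by source_relation keeps the window valid
            -- on warehouses that require an order by for row_number()
            then row_number() over (
                partition by source_relation order by source_relation
            ) = 1
            else row_number() over (
                partition by id, source_relation order by updated_at desc
            ) = 1
        end as is_most_recent_record
    from example_rows
    order by source_relation, updated_at
    """
).fetchall()

for row in flags:
    print(row)
```

The null row is the only row in its `source_relation` here, so its flag is deterministic. If a null row shared a `source_relation` with other rows, ordering by `source_relation` alone would leave ties and the flag could land on any of them.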

fivetran-avinash marked this pull request as ready for review March 5, 2024 20:09

fivetran-avinash left a comment

@fivetran-joemarkiewicz PR ready for re-review.

fivetran-joemarkiewicz left a comment

Approved! Changes look good but I have one comment to update the CHANGELOG wording a bit.

CHANGELOG.md
[PR #78](https://github.com/fivetran/dbt_shopify_source/pull/78) introduces the following changes:

## 🚨 Breaking Changes 🚨
- Added `source_relation` to the `partition_by` clauses that determine the `is_most_recent_record` in the `stg_shopify__metafield` and `stg_shopify__abandoned_checkout_discount_code` tables. If the user is leveraging the union feature, this could change data values, so would recommend a `dbt run --full-refresh` in this case.
Contributor

A full refresh is really only needed if there is an incremental model. Since this is not an incremental model we should remove that blurb.

Suggested change
- - Added `source_relation` to the `partition_by` clauses that determine the `is_most_recent_record` in the `stg_shopify__metafield` and `stg_shopify__abandoned_checkout_discount_code` tables. If the user is leveraging the union feature, this could change data values, so would recommend a `dbt run --full-refresh` in this case.
+ - Added `source_relation` to the `partition_by` clauses that determine the `is_most_recent_record` in the `stg_shopify__metafield` and `stg_shopify__abandoned_checkout_discount_code` tables. If the user is leveraging the union feature, this could change data values.

Contributor Author

Done!

fivetran-catfritz self-requested a review March 6, 2024 15:49

fivetran-catfritz left a comment

Small changelog suggestion!

CHANGELOG.md
[PR #78](https://github.com/fivetran/dbt_shopify_source/pull/78) introduces the following changes:

## 🚨 Breaking Changes 🚨
- Added `source_relation` to the `partition_by` clauses that determine the `is_most_recent_record` in the `stg_shopify__metafield` and `stg_shopify__abandoned_checkout_discount_code` tables. If the user is leveraging the union feature, this could change data values.
Contributor

I'm seeing that the column is called `index` in the abandoned_checkout table.

Suggested change
- - Added `source_relation` to the `partition_by` clauses that determine the `is_most_recent_record` in the `stg_shopify__metafield` and `stg_shopify__abandoned_checkout_discount_code` tables. If the user is leveraging the union feature, this could change data values.
+ - Added `source_relation` to the `partition_by` clauses that determine the `is_most_recent_record` in the `stg_shopify__metafield` table and `index` in the `stg_shopify__abandoned_checkout_discount_code` table. If the user is leveraging the union feature, this could change data values.

fivetran-avinash left a comment

@fivetran-catfritz Changes applied!

fivetran-catfritz left a comment

lgtm!

fivetran-avinash merged commit 738491a into main Mar 6, 2024
7 checks passed

Successfully merging this pull request may close these issues.

[Bug] Update partitioning logic to account for source_relation, empty source tables and union data
3 participants