Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] union_schemas support #52

Open
3 of 4 tasks
bdtoole opened this issue Sep 17, 2024 · 4 comments
Open
3 of 4 tasks

[Feature] union_schemas support #52

bdtoole opened this issue Sep 17, 2024 · 4 comments
Assignees
Labels
type:enhancement New functionality or enhancement

Comments

@bdtoole
Copy link

bdtoole commented Sep 17, 2024

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

We have multiple Fivetran connectors for mixpanel that ultimately result in multiple source schemas for our mixpanel data. This package currently does not support the union_schemas functionality, which means if I want to use the package for our mixpanel data, I can only use it for one schema and I'd have to manually define the second schema and model the data myself.

How would you implement this feature?

The feature has effectively been built out already - it just hasn't been applied to this package, so the necessary changes would need to be made to support it here.

Describe alternatives you've considered

No response

Are you interested in contributing this feature?

  • Yes.
  • Yes, but I will need assistance.
  • No.

Anything else?

For reference, here's a Slack conversation I had with @fivetran-joemarkiewicz about this.

@bdtoole bdtoole added the type:enhancement New functionality or enhancement label Sep 17, 2024
@fivetran-joemarkiewicz
Copy link
Contributor

Thanks for opening this Feature Request @bdtoole!

As discussed in the slack conversation you linked above, this is something that should be relatively straightforward to add to the package. However, this is not something our team would likely pick up in the short term unless we see increased community interest in this PR. That being said, since it seems you're open to creating a PR I would be happy to share the steps necessary to provide union_schema support for this package. You can then contribute the changes in a PR which our team at Fivetran would be able to review and merge into the next release in a quicker turnaround.

If that is of interest to you, or anyone else who comes across this thread and is interested in contributing the union schema feature then you can view the required steps below:

  1. Create a models/tmp folder which will include a single model titled stg_mixpanel__event_tmp.
  2. Inside this model you will add code which reflects the following structure. Essentially the contents would be something along the lines of:
{{
    fivetran_utils.union_data(
        table_identifier='event', 
        database_variable='mixpanel_database', 
        schema_variable='mixpanel_schema', 
        default_database=target.database,
        default_schema='mixpanel',
        default_variable='event',
        union_schema_variable='mixpanel_union_schemas',
        union_database_variable='mixpanel_union_databases'
    )
}}
  1. You will replace this line with a star reference to the new tmp model instead of the source table since the new tmp model will include the unioned data.
  2. In this section of the staging model you will now add this macro to persist the source_relation field. For Mixpanel the code will look something similar to the following.
        {{ fivetran_utils.source_relation(
            union_schema_variable='mixpanel_union_schemas', 
            union_database_variable='mixpanel_union_databases') 
        }}
  1. Now that the source_relation field is persisted from the above macro, you will need to add that field to every downstream model reference. For example, you will need to add source_relation as a new field here and anywhere else we explicitly select fields in the models.
  2. Lastly, wherever there is a join you must add the source_relation as a condition. So here you will need to include the following to ensure joins are only performed on the relevant data:
    join user_first_events
        on spine.date_day >= user_first_events.first_event_day -- each user-event_type will a record for every day since their first day
        and spine.source_relation = user_first_events.source_relation

The above steps should be all that's required. There are additional changes and some improvements we need to apply as well, but our team can handle that during the PR review if you choose to open one. Let me know if you're interested and if you have any other questions. Thanks!

@bdtoole bdtoole mentioned this issue Sep 26, 2024
4 tasks
@fivetran-jamie fivetran-jamie self-assigned this Jan 3, 2025
@fivetran-jamie
Copy link
Collaborator

fivetran-jamie commented Jan 7, 2025

Hi @bdtoole and @silentachiever -- I've merged and adjusted Brian's PR in my own feature_union_schemas branch if you'd like to test it out! We've moved away from the mutually exclusive union_schema vs union_database approach and opted for a dictionary approach.

The variable is configured as such:

# dbt_project.yml

vars:
  mixpanel_sources:
    - database: connection_1_destination_name # Default value = target.database
      schema: connection_1_schema_name # Default value = 'mixpanel'

    - database: connection_2_destination_name
      schema: connection_2_schema_name

There are some additional optional configs explained here.

Here's my branch 🤠 Please let me know if you have any questions or issues or any feedback!

packages:
  - git: https://github.com/fivetran/dbt_mixpanel.git
    revision: feature_union_schemas

@bdtoole
Copy link
Author

bdtoole commented Jan 7, 2025

This is great! Out of curiosity, do you have an estimated timeline on when this will get rolled into the main package?

@fivetran-jamie
Copy link
Collaborator

I'd say likely sometime next week!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:enhancement New functionality or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants