-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] Unable to Find Source Schemas on Database that is Different From the Target Database (Multi-Database Setup) #126
Comments
Hi @Kryt87 thanks for opening this issue, but sorry to hear you're running into this problem. Quick question before diving into what could be happening here. What is the name of your source database you would expect the package to be reading from as opposed to the target? In your While listing relations in database=ingestion, schema=fivetran_reddit_ads Is the |
Hi Joe, thanks for jumping on this so fast. The source database is called
`ingestion`, where Fivetran is writing to and we want to read from. Our
target database, where we want the models to reside is called `prod`.
|
Thanks for clarifying that! It does seem like from the logs that the package is reading from the Would you be able to confirm that in this case you do see the |
I can confirm that the However, when I ran a few tests I was able to confirm that the packages was not reading from the ingestion database. In a dev target database I created the dummy table
This dummy table does not exist in the |
Thanks for sharing. One last question before diving deeper - do you have any other configs (outside of what you already shared in the issue description) in your |
Nothing that stands out to me 😞 |
Thanks for checking @Kryt87, I was able to dig into this deeper and believe I may have found the culprit here. I was able to recreate this error on Redshift. I did notice that this worked as expected on all other destinations which was a bit odd to me. However, for Redshift I saw the exact same behavior as you mentioned. I then went under the hood and was inspecting these lines which are run during the first query step of the Ad Reporting models and found that the proper source So it seems the package is interpreting the variables properly, but the actual relation.value is not working as expected. I then decided to hardcode a Database Error in model stg_reddit_ads__ad_tmp (models/tmp/stg_reddit_ads__ad_tmp.sql)
Datasharing is not enabled on this cluster This would make sense why the variables are working as expected, but then the actual relation is failing since datasharing isn't enabled on my end. Do you by chance have datasharing enabled on your end to facilitate this? I imagine if you're able to enable this then you'll be able see success when reading from one database and write to another. Let me know if this helps address your issue! |
I'm glad you are seeing the same error, I find Redshift can be very particular about this kind of thing. Regarding data sharing, it is enabled for the cluster. However, there are no data shares set up between these two databases and there shouldn't need to be as they are on the same cluster. Data sharing is primarily designed to facilitate sharing data across different Redshift clusters, which can be in different AWS accounts or regions. When testing, I created both a staging model (view) that sourced the |
I also was confused why the data sharing was not enabled for the same cluster, but that's the error I was seeing. It's also strange that you're able to query directly from the source in that staging model yet you don't see the error I saw. Could you share your test staging model view and the log result? I'm wondering if something may be causing issues at the |
I think you are on the right track with dbt-redshift. The view is rather simple: with source as (
select * from {{ source('fivetran_reddit_ads', 'campaign_report') }}
),
renamed as (
select * from source
)
select * from renamed With the source being: sources:
- name: fivetran_reddit_ads
description: ""
database: ingestion
tables:
- name: campaign_report
description: "" debug.log - slimmed down debugging logs |
Thanks so much for sharing @Kryt87. I've been looking into this more, but struggling on my end since I continue to run into the data sharing error. I want to narrow this down even further to the dbt-redshift adapter and see if the {% set relation = namespace(value="") %}
{%- set relation.value=adapter.get_relation(
database=source('fivetran_reddit_ads', 'campaign_report').database,
schema=source('fivetran_reddit_ads', 'campaign_report').schema,
identifier='campaign_report'
) -%}
with source as (
select * from {{ relation.value }}
),
renamed as (
select * from source
)
select * from renamed Let me know if this ends up working for you. If it doesn't then we may need to open an issue on the dbt-redshift adapter to have those maintainers help address this issue. If not, then we can narrow this issue down to some component in the package not working as expected. |
Maybe I have something set up wrong, but I am getting the following error:
|
The compiled code is:
|
Thanks for sharing @Kryt87, this is promising in the sense that we see in your compiled code that However, it is interesting to see the |
FYI I have a draft bug report for the dbt-redshift repo that I can send over if we don't find anything obvious with the |
We have had some issues with table bindings in the past, so in |
Thanks for sharing @Kryt87. I don't imagine that config is the cause of these issues. I've gone ahead and opened the above issue on the dbt-redshift repo. Let's move our conversation there for the meantime to see if the maintainers of that repo are able to help shed light on what may be going on. Thanks! |
@Kryt87, I don't know if you've been following the linked dbt-redshift issue above but dbt Labs has confirmed that Redshift does not support cross-db querying 😞. The issue has been consolidated within this feature request on the dbt-redshift repo. As dbt-redshift does not support cross-db querying, we are blocked from allowing this to work as we would expect within the dbt package. Unfortunately we are limited now to the prioritization of the above mentioned feature request within dbt-redshift. Since we're blocked, I've opted to create PR #128 to clarify this limitation in the README for users wanting to use this package across databases and that this operation is limited to the dbt adapter's capabilities. Additionally, I'll need to mark this ticket as won't fix since we unfortunately won't be able to address this issue at the dbt package level. I sincerely apologize this doesn't address your original issue, but I encourage you to post in the dbt-redshift feature request linked above to let the folks at dbt Labs know this is an important feature to you. |
Is there an existing issue for this?
Describe the issue
Selecting the source database using relevant
<connector>_database
(reddit_ads_database
etc.) variables in thedbt_project.yml
file is not working as expected when the target database is not the same as the database where the data is stored. In this instance many "Please be aware" messages about being unable to find source tables in the database.After reviewing the debug level logs and running a few tests, I observed that when running the staging models, a query against the Redshift
information_schema.tables
table is run but is pointing to theinformation_schema
schema on the target database and not the data source database. Which is why the tables cannot be found.Relevant error log or model output
CLI Output:
Debugging Logs:
Expected behavior
Expected that the required source tables could be identified in the
ingestion
database. There is no issue if the target database is the same as the target database.Possible solution
The
redshift__list_relations_without_caching
appears to be what is used to list the tables in a schema. Not sure what needs to happen to get it pointing at the correct database.dbt Project configurations
Package versions
version: 0.9.0
version: [">=1.10.0", "<1.11.0"]
What database are you using dbt with?
redshift
How are you running this dbt package?
dbt Core™
dbt Version
Core:
Plugins:
Additional Context
No response
Are you willing to open a PR to help address this issue?
The text was updated successfully, but these errors were encountered: