[Experiment] Basic column lineage with sqlglot #2065

Draft
wants to merge 5 commits into base: devel

@sh-rp (Collaborator) commented Nov 14, 2024

Description

Implements a few helper functions that, given a simplified schema and a SQL statement, determine the names of the resulting columns and the original table and column each one came from. Combined with our dlt schema, this lets us compute the dlt schema of a dataset query ahead of time and set the right hints.

Notes:

  • Does not support the concept of schemas and databases yet. Do we need this? We'll have to see.
  • Basic queries with subqueries, renaming, aggregation, GROUP BY, etc. work. I'm pretty sure we will find queries that do not work, and I am not at all certain that my implementation is correct.
  • What happens if there are qualified table names in the statement? Only god knows; this will be the next step :)
  • What happens if a result column has two ancestors, for example when it is a concatenation of two string columns?
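One possible answer to the last question, sketched under assumptions: copy hints only when a result column has exactly one ancestor, and emit a bare column otherwise. This is not what the PR implements. The function name `derive_result_columns`, the lineage mapping shape, and the simplified hint dicts are all hypothetical stand-ins, not the real dlt schema objects.

```python
from typing import Dict, List, Tuple

# Output column name -> list of (table, column) ancestors (hypothetical shape).
Origins = Dict[str, List[Tuple[str, str]]]
# Table name -> column name -> hint dict (a simplified stand-in for a dlt schema).
TableSchemas = Dict[str, Dict[str, dict]]


def derive_result_columns(origins: Origins, table_schemas: TableSchemas) -> Dict[str, dict]:
    """Build result column definitions from lineage plus source column hints."""
    result = {}
    for name, ancestors in origins.items():
        column = {"name": name}
        if len(ancestors) == 1:
            # Exactly one ancestor: carry its hints over under the new name.
            table, source_column = ancestors[0]
            hints = table_schemas[table][source_column]
            column.update({k: v for k, v in hints.items() if k != "name"})
        # Zero or several ancestors: hints are ambiguous, so none are copied.
        result[name] = column
    return result


# Sample inputs, made up for this sketch.
TABLES = {
    "users": {
        "id": {"name": "id", "data_type": "bigint", "nullable": False},
        "first_name": {"name": "first_name", "data_type": "text"},
        "last_name": {"name": "last_name", "data_type": "text"},
    }
}
ORIGINS = {
    "user_id": [("users", "id")],
    "full_name": [("users", "first_name"), ("users", "last_name")],
}

columns = derive_result_columns(ORIGINS, TABLES)
print(columns)
```

Dropping hints for multi-ancestor columns is the conservative choice; merging hints that agree across all ancestors would be an alternative worth discussing.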


netlify bot commented Nov 14, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 5ea49ee
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/6736576d2decf20008ed656f

@sh-rp sh-rp self-assigned this Nov 19, 2024