Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query table_comments without unrolling with UNION ALL #357

Conversation

findepi
Copy link
Contributor

@findepi findepi commented Oct 3, 2023

Overview

table_comments is equally "hard" for the engine as information_schema.columns. Avoid optimizing the query on the dbt side by sending UNION ALL with individual schemas. Let the engine figure out the optimal strategy.

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • README.md updated and added information about my change
  • I have run changie new to create a changelog entry

@findepi findepi changed the title Query table_comments without unrolling Query table_comments without unrolling with UNION ALL Oct 3, 2023
@findepi findepi force-pushed the findepi/query-table-comments-without-unrolling-ace130 branch from 57b3b80 to 3177f04 Compare October 3, 2023 14:14
@findepi
Copy link
Contributor Author

findepi commented Oct 3, 2023

@findepi findepi force-pushed the findepi/query-table-comments-without-unrolling-ace130 branch from 3177f04 to 8029d67 Compare October 3, 2023 16:58
@findepi
Copy link
Contributor Author

findepi commented Oct 3, 2023

@mdesmet thanks, updated

@damian3031
Copy link
Member

It was raised as an issue some time ago that dbt docs generate is very slow #272, and it was fixed by #273.

I remember that we measure that, and after refactor from in to union all performance boost was indisputable (few minutes with in vs. few seconds with union all).

The reason for this is that the in predicate isn't pushed down.

I think that we shouldn't revert changes from this PR #273

@findepi
Copy link
Contributor Author

findepi commented Oct 3, 2023

The reason for this is that the in predicate isn't pushed down.

It is.

It is into the connector.
Not part of metadata listing APIs.

Good catch.

@findepi
Copy link
Contributor Author

findepi commented Oct 4, 2023

@damian3031 @mdesmet this should do the trick: trinodb/trino#19259

@findepi findepi force-pushed the findepi/query-table-comments-without-unrolling-ace130 branch from 8029d67 to 2e5d310 Compare October 4, 2023 18:57
`table_comments` is equally "hard" for the engine as
`information_schema.columns`. Avoid optimizing the query on the `dbt`
side by sending `UNION ALL` with individual schemas. Let the engine
figure out the optimal strategy.
@findepi findepi force-pushed the findepi/query-table-comments-without-unrolling-ace130 branch from 2e5d310 to 56b1459 Compare November 13, 2023 10:42
@findepi
Copy link
Contributor Author

findepi commented Nov 13, 2023

Rebased to resolve conflicts.

Decided not to modify the new code branch added in the meantime in 7bc9ae5, because that could lead to performance degradation due to metadata queries not being smart enough to optimize OR conditions (see also https://github.com/starburstdata/dbt-trino/pull/368/files#r1390942314).

@findepi
Copy link
Contributor Author

findepi commented Nov 16, 2023

Build is green. Looking fwd to review comments!

Copy link
Contributor

@hovaesco hovaesco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Is Galaxy already updated to the version which supports IN predicate pushdown in metadata listing API?

Copy link
Member

@damian3031 damian3031 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are ok, but IMHO we should wait for Galaxy update with IN predicate pushdown and test it once again. We can merge it, but then let's definitely wait with a release until Galaxy update.

@hovaesco hovaesco merged commit 22720e3 into starburstdata:master Nov 29, 2023
8 checks passed
@findepi findepi deleted the findepi/query-table-comments-without-unrolling-ace130 branch November 29, 2023 12:53
@findepi
Copy link
Contributor Author

findepi commented Nov 29, 2023

Thanks for the merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants