-
Notifications
You must be signed in to change notification settings - Fork 329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(flow): use DataFusion's optimizer #4489
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add Documentation and Community
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4489 +/- ##
==========================================
- Coverage 84.89% 84.68% -0.22%
==========================================
Files 1101 1102 +1
Lines 197166 197601 +435
==========================================
- Hits 167386 167329 -57
- Misses 29780 30272 +492 |
5200f97
to
c2534c6
Compare
refactor: mv `sql_to_flow_plan` elsewhere feat(WIP): use df optimization WIP analyzer rule feat(WIP): avg expander fix: transform avg expander fix: avg expand feat: names from substrait fix: avg rewrite test: update `test_avg`&`test_avg_group_by` test: fix `test_sum` test: fix some tests chore: remove unused flow plan transform feat: tumble expander test: update tests
@waynexia PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done 👍 This greatly simplifies the implementation.
* feat: use datafusion optimization refactor: mv `sql_to_flow_plan` elsewhere feat(WIP): use df optimization WIP analyzer rule feat(WIP): avg expander fix: transform avg expander fix: avg expand feat: names from substrait fix: avg rewrite test: update `test_avg`&`test_avg_group_by` test: fix `test_sum` test: fix some tests chore: remove unused flow plan transform feat: tumble expander test: update tests * chore: clippy * fix: tumble lose `group expr` * test: sqlness test update * test: rm unused cast * test: simplify sqlness * refactor: per review * chore: after rebase * fix: remove a outdated test * test: add comment * fix: report error when not literal * chore: update sqlness test after rebase * refactor: per review
I hereby agree to the terms of the GreptimeDB CLA.
What's changed and what's your intention?
use and add a few Datafusion
Analyzer
andOptimizer
to improve writing sql query in flow:add three new analyzer in
df_optimizer.rs
file, which is also the major changes happen, other edits are either removal of obsolete functions or adding test for itAvgExpandRule
: rewriteavg
tosum/count
TumbleExpandRule
: rewritetumble
totumble_start
andtumble_end
CheckGroupByRule
: check if all exprs in group by is also used in select listalso apply
AnalyzerRule::TypeCoercion
to cast type properlyoptimizer from Datafusion is also appiled before encode to substrait, include:
OptimizeProjections
: identifies and eliminates unused columnsCommonSubexprEliminate
: avoid redundant computation of common sub-expressionsSimplifyExpressions
: simplifies expressions in the logical planUnwrapCastInComparison
: rewrites CAST(col) = lit to col = CAST(lit)Checklist