-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transform with payload #8664
Transform with payload #8664
Conversation
e9ffe10
to
725ec25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @peter-toth -- this seems like an improvement to me. Perhaps @viirya and @berkaysynnada can take a look too prior to merging this
where | ||
F: FnMut(Self, P) -> Result<(Transformed<Self>, Vec<P>)>, | ||
{ | ||
let (new_node, new_payload) = f(self, payload)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could potentially reduce redundancy by implementing this function terms of transform_with_payload
and empty f_up
and PU=()
-- it probably doesn't matter but I figured I would bring it up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, both the down and up versions can be implemented using transform_with_payload
. But I would keep them separated in this PR and check later what happens if we we use empty f_up and PU=()
. Does the "up logic" get optimized out by the compiler at "down invocations"? I.e. payload_up
vec is not allocated?
@@ -269,11 +266,18 @@ impl PhysicalOptimizerRule for EnforceDistribution { | |||
/// 4) If the current plan is Projection, transform the requirements to the columns before the Projection and push down requirements | |||
/// 5) For other types of operators, by default, pushdown the parent requirements to children. | |||
/// | |||
type RequiredKeyOrdering = Vec<Arc<dyn PhysicalExpr>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make this code simpler to make this struct RequiredKeyOrder
and give its methods names rather than dealing with Vec<Vec<...>>
-- but that can be a refactor for a different PR perhaps
Thanks @alamb for the review. Just a note that in this discussion: #8672 (comment) some concerns came up regarding the new |
Yes, I have some concerns about this PR (and getting rid of derived trees in general), so let's not merge this for now. I gave a brief explanation before, but @berkaysynnada and I will give more details on our reasoning after the holiday (next week). |
Let's not merge until we have consensus and more discussion
2b6d221
to
269d8ba
Compare
@ozankabak, @berkaysynnada, I've refactored IMO the commit not just removes Please let me know if you still have concerns. |
@peter-toth thank you for working on this. We were a little busy but we will work on this and reply in a couple days |
Two pieces of news from our side:
In any case, 2 (which is the main concern of this PR) will benefit from 1 so I will go step by step finish off 1 first, and then continue working on this PR (and 2 in general). |
Thanks for the update @ozankabak. As regards to 1., #7942 and our discussion with @berkaysynnada at #8663 (comment) might be relevant. Although that PR didn't try to eliminate code duplication, it might be useful to design better tree visit/transform functions in terms of effectiveness (self mutation of nodes where possible) and API usability (common I think 2. is somewhat orthogonal to 1., the new You might have seen that yesterday I pushed another commit to this PR: 88a43c1 to refactor and remove If you have some time after your simplification a review of this PR would be greatly appreciated as I would like to continue working on refactoring the remaining derived trees (either in this PR or follow-up ones). |
Certainly, I will do a deep dive on this after I finish the simplification PR. Thank you for the awesome collaboration |
FYI, the draft PR is #8817. I will focus on this once we are done with it 🚀 |
I will update this PR once #8817 gets merged. |
ecf18e2
to
51ef5d5
Compare
Marking as draft as we work on #8817 -- I am trying to clean up the list of PRs that need review |
51ef5d5
to
e7a4ddc
Compare
@alamb, @ozankabak, @berkaysynnada I rebased this PR and updated the PR description after #8817 got merged. Please share your thoughts. |
@peter-toth what do you think about first focusing on #8891 and then on this PR? I believe it holds higher importance and could contribute more to the simplification process. It would be easier if we concentrate these 2 PR's one by one, rather than proceeding with both at the same time. |
The 2 PRs are not connected but I'm fine with any order. #8891 is a small PR so I think we can probably conclude on something easily. And actually there will be a few more API fixing PRs in #8913. |
…ith_payload()` and `TreeNode.transform_up_with_payload()` - Refactor SortPushDown` and `PlanWithRequitements` using `TreeNode.transform_down_with_payload()` - Refactor `ExprOrdering`, `ExprTreeNode` and `PipelineStatePropagator` using `TreeNode.transform_up_with_payload()` - Refactor `OrderPreservationContext` and `PlanWithCorrespondingCoalescePartitions` using `TreeNode.transform_with_payload()`
e7a4ddc
to
0007fe1
Compare
I wonder what the current state / plan for this PR is? (I am going through the PR review backlog) |
Thanks @alamb fot the question! I plan to rebase this PR in a few days, maybe a week. Let me move this to draft till then. |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Part of #8913, #8663.
Rationale for this change
After #8817 the derived
TreeNode
trees got unified into 2 generic trees:PlanContext
andExprContext
. These generic trees still have the drawback that they contain derived nodes (contains some data and a link to a node of the original tree) for each of the original tree nodes to store some additional information / state of the node, which means transformations needs to keep the 2 trees in sync. While this is done automaticaly, it comes with additional costs and make tranformation algorightms less readable as the generic derived nodes has only adata
field that encodes different information in different trees.This PR proposes an alternative approach to many of the transformations that uses derived trees currently. The new API that PR adds to
TreeNode
looks like the following:as well as the
TreeNode.transform_down_with_payload()
andTreeNode.transform_up_with_payload()
variants that do one direction transformations.With the help of above new API this PR refactors 9 of the 11 derived trees and makes
ExprContext
obsolete.Besides the cleaner state transition functions, the new API is capable to describe some transformations that currently require 2 pass with 1 pass. With the new API, the example given in #8817 with
RequiringExec
that requires a top-down transformation to store minimum sort requirement in derived nodes and then a bottom-up transformation to insertSortExecs
s can be done in 1 pass without creating any additional trees.There are 2 examples for this advanced usecase in this PR:
PlanWithCorrespondingCoalescePartitions
:Currently
parallelize_sorts
is a bottom-up transformation while at some nodes it kicks offremove_corresponding_coalesce_in_sub_plan
to do a top-down transformation of the subtree.After this PR
propagate_unnecessary_coalesce_connections_down
top-down andparallelize_sorts_up
bottom-up transformations are incorporated into one pass usingtransform_down_with_payload
.OrderPreservationContext
:Currently
replace_with_order_preserving_variants
is a bottom-up transformation while at some nodes it starts a top-downplan_with_order_preserving_variants
transformation.After this PR
propagate_order_maintaining_connections_down
top-down andreplace_with_order_preserving_variants_up
bottom-up transformations are incorporated into one pass usingtransform_down_with_payload
.Please note that the new API can't replace or non-trivially can replace derived trees in some cases like:
This PR doesn't necessary want to remove
PlanContext
andExprContext
as there might be downstream usescases of those. The purpose of this PR is to offer better API (more effective and readable transformations) for usescases where it make sense to use.This PR implements idea 4. from #7942.
What changes are included in this PR?
This PR:
TreeNode.transform_with_payload()
,TreeNode.transform_down_with_payload()
andTreeNode.transform_up_with_payload()
,SortPushDown
andPlanWithRequitements
usingTreeNode.transform_down_with_payload()
,ExprOrdering
,ExprTreeNode
andPipelineStatePropagator
usingTreeNode.transform_up_with_payload()
,OrderPreservationContext
andPlanWithCorrespondingCoalescePartitions
usingTreeNode.transform_with_payload()
.The remaining 2 derived trees can be refactored in follow-up PRs if possible and needed.
Are these changes tested?
Using exinsting tests.
Are there any user-facing changes?
No.