-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Epic: Unified TreeNode rewrite API #8913
Comments
The main concern I have is composability. Downstream, we have many use cases where the flow goes something like this:
(this can repeat multiple times) For context, we plan to contribute code implementing algorithms that work in this fashion upstream at some point. #8817 works towards having a composable architecture that lends itself to such flows, and will eliminate most major sources of code duplication concerning Once we have such a reusable/composable "infrastructure", we can discuss/ideate on various flavors of the |
This is actually interresting use case and I don't see that requirement in the current codebase. Are you adding this optimization in #8817 to somewhere?
I don't think that code duplication is the major issue here. I think the main issue with the current 11 derived trees (and also with the proposed generic What I would propose is proabaly to take a look at #8664 first (it is a much simpler PR than #8817 is now) and check where the new API can help. If there remain any usecase where |
@alamb If you don't mind I would suggest a bit different list of tasks as some of the API inconsistencies can be fixed without
|
Use cases where composability/reusability is critical are not yet in the current codebase (upstream) yet but at least we at Synnada have downstream use cases.
Having worked with most of the physical optimization rules as well as some logical rules for a long time now (upstream and downstream), I think I can safely that this is necessary. As I mentioned before, it is also on our roadmap to contribute more physical optimizations upstream in the near future and they need this kind of composability/reusability.
This is not really true though -- once you have the generics, you do not need to maintain anything with regards to All in all, the composability/reusability we will get via #8817 is really critical to certain use cases and upcoming contributions. Let's get that in and then let's discuss refining the In any case, I agree that parts of #8891 would be a good next step to iterate on and improve after #8817 merges. FYI, I hope to finish #8817 in a few days so that I can help with the rest of the work. |
I can't argue with that since I don't know your downstream code, but I'm looking forward to see a usecase in a future contribution where
Thanks for the update! |
Thank you -- I have removed that suggestion from the description here and linked the PRs you mentioned to this ticket |
Update here is that we are making good progress See |
Here is some thoughts from @peter-toth on what a unified API might look like from #10035 (comment)
+ Maybe rename + |
I think the epic description is a bit misleading as the BTW, I'm still investigating |
Ok, given we largely implemented what this epic was designed to do, let's close it and track additional work in a new ticket. I filed #10121 to track |
Well, since you asked @peter-toth 😄 migrating other passes listed on to use the TreeNode API would certainly be helpful. I know CSE is a big one. #9873 might also be an interesting exercise |
Sure. I want to take a look at #10097 next, but then I can check CSE: #9873 |
Is your feature request related to a problem or challenge?
(note this is mostly from @peter-toth 's description on #8891 (comment))
As @peter-toth noted, the current
TreeNode
APIs are somewhat inconsistent which has the following challenges:Specific Challenges of the Current APIs
Some specific challenges with the current APIs. These can cause a newcommer (like me) some confusion:
The
apply
/visit
functions are capable to prune (skip) subtrees or return immediately (stop) but thetransform
functions are not. To make it more confusionrewrite
is also capable to to prune, but instead of using a common way to control recursion,visit
andrewrite
use different enums with conflicting semantics.See this PR (Consolidate
TreeNode
transform and rewrite APIs #8891) for details onVisitRecursion
vsRewriteRecursion
.There is this
Transformed
enum whose purpose is to explicitely define if any change was made to a node. This enum is used intransform
clousres but for some reason it is missing fromrewrite
. Moreover this explicit information is neither propogatged up to API callee nor it is used for detecting changes in nodes.See details in Remove
Transformed
enum #8835. I believe the current state simply doesn't make sense and just causes confusion.rewrite
also seems to be inconsistent withtransform_up
andtransform_down
.rewrite
probably organically ended up in its current state and capable to do its current job, but what I would expect from such a function is that it simply incorporates 2 closures fromtransform_up
andtransform_down
to define:See this PR (Consolidate
TreeNode
transform and rewrite APIs #8891) for the details. IMO the currentTreeNodeRewriter.pre_visit()
seems like a mess and its logic is hard to grasp.Specific missing APIs
transform
functions:Pruning would be very important as many of the transformations wouldn't require traversing on the whole tree and could improve performance a lot.
I think the the reason why there are so many (4 base + 9 derived) tree implementations (with many code duplications) in DataFusion is that there is no API to describe a transformation while also propagating state. In Transform with payload #8664 with the help of
transform_with_payload
I not just eleminated 7 derived trees but also improved some of the algorightms to require only one traversal (also a perf improvement).Describe the solution you'd like
The proposal is a general API like this:
Obviously the closure return types can be extracted to a type alias or
struc
s like in the case off_down
could be:All the other functions can then be turned into a specialization of the above:
apply
(visit_down
): Onlyf_down
closure that doesn't return a modified node and payload (only retrurnsTreeNodeRecursion
)visit
/TreeNodeVisitor
: Bothf_down
andf_up
closures needed. The closures can be incorporated into aTreeNodeVisitor
object but they don't return a modified node and payloadtransform_down
: Onlyf_down
closure that doesn't return payloadtransform_up
: Onlyf_up
closure that doesn't return payloadtransform
: Bothf_down
andf_up
closures needed, but they don't return payloadsrewrite
/TreeNodeRewriter
: Bothf_down
andf_up
are incorporated into aTreeNodeRewriter
object, but they don't return a payloadtransform_down_with_payload
: Onlyf_down
closuretransform_up_with_payload
: Onlyf_up
closureDescribe alternatives you've considered
As noted above, there are a variety of PRs that test out various ideas in one way or another and the major challenge I think is figuring out what changes to make, in what order, and how much code churn to apply)
Additional context
Related tickets / PRs:
TreeNode
transform and rewrite APIs #8891Transformed
enum #8835LogicalPlan
tree node walking / rewriting code in one place #9994transform_xx_mut
methods #10097The text was updated successfully, but these errors were encountered: