-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow custom planning behavior for selecting wildcard expression #11639
Comments
take |
Hi @alamb, do you know the purpose of this comment? datafusion/datafusion/expr/src/logical_plan/builder.rs Lines 1443 to 1458 in 7db4213
I believe this expansion for datafusion/datafusion/sql/src/select.rs Lines 594 to 595 in cf9da76
I found the wildcard expression is still expanded even when I disable this behavior in the planner. I think the reason is that the planner invokes the datafusion/datafusion/sql/src/select.rs Line 693 in 7db4213
Could we remove the expansion behavior in the builder? (I guess that's what the comment purposes.) |
I think that is my understanding as well I believe @jayzhan211 and @tshauck have had some recent discussion about this as well, so maybe they remember more than do I |
We were working on the related |
Thanks, @tshauck. I agree there is no conflict between them. I'm curious, which PR did you do for this? There are duplicate behaviors for the expansion (in the |
It was part of the work to support |
@goldmedal If we move the expression expansion to analyzer as the comment said, I guess you can still tell the difference from |
I think I got it. So, ideally, we don't do any expansions in the planner or builder. We need to have an analyze rule or optimize rule to expand the expression. If we move this to analyze or optimizer, I can get the wildcard expression before optimizing, even if we don't add a customized behavior in ExprPlanner, right? |
Correct
…On Sat, Jul 27, 2024, 2:16 PM Jax Liu ***@***.***> wrote:
@goldmedal <https://github.com/goldmedal> If we move the expression
expansion to analyzer as the comment said, I guess you can still tell the
difference from SELECT * FROM A and SELECT c1, c2, c3 FROM A *before
optimizer*. I think expand expression in builder is quite early, if there
are 1000 columns, we need to carry those columns around in many places. It
is better to delay expansion *lazily* as long as possible.
I think I got it. So, ideally, we don't do any expansions in the planner
or builder. We need to have an analyze rule or optimize rule to expand the
expression. If we move this to analyze or optimizer, I can get the wildcard
expression before optimizing, even if we don't add a customized behavior in
ExprPlanner, right?
—
Reply to this email directly, view it on GitHub
<#11639 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADZCLR3QAJ4XRXA7VKFYJITZOM3KBAVCNFSM6AAAAABLM6DYKKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJTHAYTGNJWHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I have drafted a version #11681 for moving wildcard expansions to the analyzer. I think it's better than #11673. I might change the purpose of this PR. WDYT @alamb, @jayzhan211 ? By the way, I'm thinking about handling |
I think handling replace item in options is a good idea. We store the replace item in Expr::Wildcard in planning stage and convert it to expanded columns with expected replace items in analyzer. |
Is your feature request related to a problem or challenge?
Currently, DataFusion always expands the wildcard expression to the list of columns when planning a
SELECT
statement. This causes problems for those who want to check the wildcard expression in the logical plan layer.For example, given a table "A" with three columns: c1, c2, and c3, it is difficult to distinguish between
SELECT * FROM A
andSELECT c1, c2, c3 FROM A
in the logical plan layer.If we provide a way for users to customize the behavior of wildcard expression planning, it will be easier to distinguish between these two cases.
Describe the solution you'd like
I plan to implement this feature based on
ExprPlanner
. I guess we can add a method calledplan_select_wildcard
and putting the default expression expansion there. Then, invoke this method inSqlToRel::sql_select_to_rex
.https://github.com/apache/datafusion/blob/main/datafusion/sql/src/select.rs#L594
Describe alternatives you've considered
No response
Additional context
My current project, Wren engine, is a semantic engine based on the DataFusion Logical Plan. We have a type of column called a "calculated field" that can be selected specifically but won't be included when querying using the wildcard expression. Since it could trigger some joins for its parent table, we want to ensure that it is only used when the user specifically selects it.
This feature helps us align with the native behavior of DataFusion.
The text was updated successfully, but these errors were encountered: