-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: make sure JOIN ON expression is boolean type #11423
Conversation
|
||
# ON expression must be boolean type | ||
query error DataFusion error: Error during planning: ON expression must be boolean, but got Utf8 | ||
SELECT * FROM t1 INNER JOIN t2 ON 'TRUE' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This query is available in PostgreSQL, but not available in Spark.
If we plan to support it, we can directly cast the ON expression to boolean without checking its current type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the error is clear and is a reasonable behavior. If it is important for someone's usecase, perhaps we can add support for it then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the structure you have now implemented (in the analyzer) makes it pretty easy to support if desired
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jonahgao (and @2010YOUY01 for applying SQL lancer!)
datafusion/sql/src/relation/join.rs
Outdated
} | ||
other => { | ||
return plan_err!( | ||
"ON expression must be boolean, but got {other:?}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same rule applies to other APIs (not just SQL).
What would you think about moving the check to LogicalPlanBuilder
datafusion/datafusion/expr/src/logical_plan/builder.rs
Lines 671 to 686 in 6e63748
pub fn join_on( | |
self, | |
right: LogicalPlan, | |
join_type: JoinType, | |
on_exprs: impl IntoIterator<Item = Expr>, | |
) -> Result<Self> { | |
let filter = on_exprs.into_iter().reduce(Expr::and); | |
self.join_detailed( | |
right, | |
join_type, | |
(Vec::<Column>::new(), Vec::<Column>::new()), | |
filter, | |
false, | |
) | |
} |
That way this check would be applied to the DataFrame API as well as any custom plans that were created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I re-implemented it in type_coercion
to avoid creating a temporary joined schema. Also added a test for DataFrame.
|
||
# ON expression must be boolean type | ||
query error DataFusion error: Error during planning: ON expression must be boolean, but got Utf8 | ||
SELECT * FROM t1 INNER JOIN t2 ON 'TRUE' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the error is clear and is a reasonable behavior. If it is important for someone's usecase, perhaps we can add support for it then
@@ -896,9 +896,8 @@ impl DataFrame { | |||
join_type: JoinType, | |||
on_exprs: impl IntoIterator<Item = Expr>, | |||
) -> Result<DataFrame> { | |||
let expr = on_exprs.into_iter().reduce(Expr::and); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's redundant because the reduce operation will be performed inside the join_on function.
|
||
# ON expression must be boolean type | ||
query error DataFusion error: Error during planning: ON expression must be boolean, but got Utf8 | ||
SELECT * FROM t1 INNER JOIN t2 ON 'TRUE' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the structure you have now implemented (in the analyzer) makes it pretty easy to support if desired
Co-authored-by: Andrew Lamb <[email protected]>
* fix: make sure JOIN ON expression is boolean type * Applied to DataFrame * Update datafusion/optimizer/src/analyzer/type_coercion.rs Co-authored-by: Andrew Lamb <[email protected]> --------- Co-authored-by: Andrew Lamb <[email protected]>
* fix: make sure JOIN ON expression is boolean type * Applied to DataFrame * Update datafusion/optimizer/src/analyzer/type_coercion.rs Co-authored-by: Andrew Lamb <[email protected]> --------- Co-authored-by: Andrew Lamb <[email protected]>
* fix: make sure JOIN ON expression is boolean type * Applied to DataFrame * Update datafusion/optimizer/src/analyzer/type_coercion.rs Co-authored-by: Andrew Lamb <[email protected]> --------- Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #11414.
Rationale for this change
The query in the issue generates the following plan after some optimizations.
When the
simplify_expressions
rule attempts to simplifyBoolean(true) AND NULL
, it fails because the operand types on both sides are different (DataType::Boolean
vsDataType::NULL
).The fix is to cast the NULL type to Boolean type.
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No