Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make sure JOIN ON expression is boolean type #11423

Merged
merged 3 commits into from
Jul 13, 2024
Merged

Conversation

jonahgao
Copy link
Member

Which issue does this PR close?

Closes #11414.

Rationale for this change

The query in the issue generates the following plan after some optimizations.

select * from t1 inner join t2 on null right join t3 on true;
    Projection: t1.v1, t2.v1, t3.v1
      Right Join:
        Inner Join:
          Filter: Boolean(true) AND NULL
            TableScan: t1
          TableScan: t2
        TableScan: t3

When the simplify_expressions rule attempts to simplify Boolean(true) AND NULL, it fails because the operand types on both sides are different (DataType::Boolean vs DataType::NULL).

The fix is to cast the NULL type to Boolean type.

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added sql SQL Planner sqllogictest SQL Logic Tests (.slt) labels Jul 12, 2024

# ON expression must be boolean type
query error DataFusion error: Error during planning: ON expression must be boolean, but got Utf8
SELECT * FROM t1 INNER JOIN t2 ON 'TRUE'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query is available in PostgreSQL, but not available in Spark.
If we plan to support it, we can directly cast the ON expression to boolean without checking its current type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the error is clear and is a reasonable behavior. If it is important for someone's usecase, perhaps we can add support for it then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the structure you have now implemented (in the analyzer) makes it pretty easy to support if desired

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jonahgao (and @2010YOUY01 for applying SQL lancer!)

}
other => {
return plan_err!(
"ON expression must be boolean, but got {other:?}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same rule applies to other APIs (not just SQL).

What would you think about moving the check to LogicalPlanBuilder

pub fn join_on(
self,
right: LogicalPlan,
join_type: JoinType,
on_exprs: impl IntoIterator<Item = Expr>,
) -> Result<Self> {
let filter = on_exprs.into_iter().reduce(Expr::and);
self.join_detailed(
right,
join_type,
(Vec::<Column>::new(), Vec::<Column>::new()),
filter,
false,
)
}

That way this check would be applied to the DataFrame API as well as any custom plans that were created?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I re-implemented it in type_coercion to avoid creating a temporary joined schema. Also added a test for DataFrame.


# ON expression must be boolean type
query error DataFusion error: Error during planning: ON expression must be boolean, but got Utf8
SELECT * FROM t1 INNER JOIN t2 ON 'TRUE'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the error is clear and is a reasonable behavior. If it is important for someone's usecase, perhaps we can add support for it then

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate and removed sql SQL Planner labels Jul 12, 2024
@@ -896,9 +896,8 @@ impl DataFrame {
join_type: JoinType,
on_exprs: impl IntoIterator<Item = Expr>,
) -> Result<DataFrame> {
let expr = on_exprs.into_iter().reduce(Expr::and);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's redundant because the reduce operation will be performed inside the join_on function.

datafusion/optimizer/src/analyzer/type_coercion.rs Outdated Show resolved Hide resolved

# ON expression must be boolean type
query error DataFusion error: Error during planning: ON expression must be boolean, but got Utf8
SELECT * FROM t1 INNER JOIN t2 ON 'TRUE'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the structure you have now implemented (in the analyzer) makes it pretty easy to support if desired

@alamb alamb merged commit 08fa444 into apache:main Jul 13, 2024
23 checks passed
@jonahgao jonahgao deleted the join_on branch July 13, 2024 13:48
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* fix: make sure JOIN ON expression is boolean type

* Applied to DataFrame

* Update datafusion/optimizer/src/analyzer/type_coercion.rs

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 17, 2024
* fix: make sure JOIN ON expression is boolean type

* Applied to DataFrame

* Update datafusion/optimizer/src/analyzer/type_coercion.rs

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 18, 2024
* fix: make sure JOIN ON expression is boolean type

* Applied to DataFrame

* Update datafusion/optimizer/src/analyzer/type_coercion.rs

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash bug from an inner join query (SQLancer)
2 participants