Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Add PullUpCorrelatedExpr::new and improve documentation #10500

Merged
merged 5 commits into from
May 17, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented May 14, 2024

Which issue does this PR close?

Closes #10294

Rationale for this change

While working on #10489 I spent some time studying PullUpCorrelatedExpr as it wasn't clear to me what was different about the two callsites.

While I was in here I figured I would refactor things into a structure to make it easier to understand for my future self (and hopefully other readers)

What changes are included in this PR?

  1. Add PullUpCorrelatedExpr::new and improve documentation
  2. Improve some documentation

Are these changes tested?

Covered by existing CI

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label May 14, 2024
pub can_pull_up: bool,
// indicate whether need to handle the Count bug during the pull up process
/// Do we need to handle the Count bug during the pull up process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what/who is the "count bug"? Is this a hotfix for an actual issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick look and couldn't figure it out

It looks like it was introduced in #6457. @mingmwang or @jackwener do you remember what the "count bug is" and if it has a ticket?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the param, if we dont use it, the correlated subquery fails

Running "subquery.slt"
External error: query result mismatch:
[SQL] SELECT t1_id, (SELECT count(*) FROM t2 WHERE t2.t2_int = t1.t1_int) from t1
[Diff] (-expected|+actual)
    11 1
-   22 0
+   22 NULL
    33 3
-   44 0
+   44 NULL
at test_files/subquery.slt:763

Looks like its related if there is no match in correlated subq the count should be 0 instead of NULL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be introduced by @mingmwang

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #10500 to track this issue (thanks @comphead and everyone for calling it out and the context) and added a reference to the ticket in the code

pull_up_having_expr: None,
};
let mut pull_up = PullUpCorrelatedExpr::new()
.with_in_predicate_opt(in_predicate_opt.clone())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these were the two fields that are set differently which I think is much clearer now

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @alamb its good to go, but we probably want to revisit this count bug sometime

@alamb
Copy link
Contributor Author

alamb commented May 16, 2024

lgtm thanks @alamb its good to go, but we probably want to revisit this count bug sometime

Thank you -- I will file a ticket to track the issue and leave a note in the comments

@alamb alamb merged commit f6646a4 into apache:main May 17, 2024
24 checks passed
@alamb alamb deleted the alamb/API_for_decorrelate branch May 17, 2024 08:14
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
…che#10500)

* Minor `PullUpCorrelatedExpr::new` and add documentation

* clippy

* Update datafusion/optimizer/src/decorrelate.rs

Co-authored-by: Oleks V <[email protected]>

* Add ticket reference to count bug

---------

Co-authored-by: Oleks V <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stop copying LogicalPlan and Exprs in ScalarSubqueryToJoin
4 participants