Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not add redundant subquery ordering into plan #30

Closed
wants to merge 47 commits into from

Conversation

mertak-synnada
Copy link

@mertak-synnada mertak-synnada commented Aug 6, 2024

Which issue does this PR close?

Rationale for this change

This PR solves an issue with the subquery sortings. If a main query does not support the same sorting with the subquery and the subquery has no limit, then the ordering is redundant (unless used in an Aggregation but it's being handled in a different plan type anyway).

While implementing this behavior we discovered that it's only possible to use WITH ORDER syntax with EXTERNAL tables. That's why we needed to add LIMIT xx conditions to subqueries so that the system can understand they're valid orderings. The sql-parser issue will be handled in a different PR to the sql-parser project and those tests will be changed back to their original forms

Other than that, we discovered there are some redundant LimitExec plans even if the child plan has fetch support. Also, fixed this issue and replaced GlobalLimitExec plans with skip=0 with LocalLimitExec plan type.

Also added Limit push down support to the UnionExec plan type.

What changes are included in this PR?

Explained above

Are these changes tested?

Yes

Are there any user-facing changes?

No

Copy link
Collaborator

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's address my comments and iterate.

datafusion/sql/src/relation/mod.rs Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/group_by.slt Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/group_by.slt Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/group_by.slt Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/group_by.slt Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/select.slt Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/select.slt Show resolved Hide resolved
datafusion/sqllogictest/test_files/window.slt Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/subquery_sort.slt Outdated Show resolved Hide resolved
datafusion/sql/src/relation/mod.rs Outdated Show resolved Hide resolved
mustafasrepo and others added 11 commits August 7, 2024 09:17
…ting' into feature/Sort-For-Subqueries

# Conflicts:
#	datafusion/sqllogictest/test_files/window.slt
remove unnecessary limit plans when used with sort + fetch
add test case for Sort and Limit with offset
push down limit even if a child with no fetch appears when the child supports push down
mertak-synnada and others added 12 commits August 8, 2024 09:03
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
…ting' into feature/Sort-For-Subqueries

# Conflicts:
#	datafusion/sqllogictest/test_files/aggregates_topk.slt
…eature/Sort-For-Subqueries

# Conflicts:
#	datafusion/core/src/physical_optimizer/enforce_distribution.rs
#	datafusion/core/src/physical_optimizer/enforce_sorting.rs
#	datafusion/core/src/physical_optimizer/limit_pushdown.rs
#	datafusion/core/src/physical_optimizer/sort_pushdown.rs
#	datafusion/physical-plan/src/coalesce_batches.rs
#	datafusion/physical-plan/src/limit.rs
#	datafusion/physical-plan/src/sorts/sort.rs
#	datafusion/sqllogictest/test_files/order.slt
add pushes_global_limit_into_multiple_fetch_plans test case
change limit_pushdown.rs as manual top down operator and simplify algorithm by supporting most parent node remove and other pushdown cases
Copy link
Collaborator

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close to finalizing

Comment on lines 265 to 267
// UnionExec is supporting limit_pushdown because it's only applicable with
// SortPreservingMergeExec plan. Since SortExec does not support limit_pushdowm
// it's not possible to lose sorting in UnionExec's children.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this is trying to say

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed adding the supports_limit_pushdown method to UnionExec or not. We agreed to do so because LimitPushDown is only applicable to SPM in UnionExec's possible children. That's why I put this comment actually, maybe we can put a better comment I don't know

datafusion/sql/src/relation/mod.rs Outdated Show resolved Hide resolved
datafusion/physical-optimizer/src/limit_pushdown.rs Outdated Show resolved Hide resolved
datafusion/physical-optimizer/src/limit_pushdown.rs Outdated Show resolved Hide resolved
datafusion/physical-optimizer/src/limit_pushdown.rs Outdated Show resolved Hide resolved
datafusion/physical-optimizer/src/limit_pushdown.rs Outdated Show resolved Hide resolved
datafusion/physical-optimizer/src/limit_pushdown.rs Outdated Show resolved Hide resolved
datafusion/physical-optimizer/src/limit_pushdown.rs Outdated Show resolved Hide resolved
datafusion/physical-plan/src/coalesce_batches.rs Outdated Show resolved Hide resolved
datafusion/physical-plan/src/limit.rs Outdated Show resolved Hide resolved
@ozankabak
Copy link
Collaborator

I am going to merge this today unless there are any concerns raised. We will then proceed with WITH ORDER support for memory tables and remove the temporary LIMIT clauses introduced

@ozankabak
Copy link
Collaborator

Merged upstream.

@ozankabak ozankabak closed this Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants