CSVExec projection pushdown #14161

Open
jayzhan211 opened this issue Jan 17, 2025 · 2 comments
@jayzhan211
Contributor

jayzhan211 commented Jan 17, 2025

A solid example of what I am looking for: these plans should project only a and c in the first CsvExec, and only b in the second CsvExec. Embedding the projection into CsvExec is already done; we just need to push the projection down below NestedLoopJoinExec.

Originally posted by @berkaysynnada in #14120 (comment)

The current optimization produces:

        let expected = [
            "NestedLoopJoinExec: join_type=Inner, filter=a@0 < b@1, projection=[c@2]",
            "  CsvExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], has_header=false",
            "  CsvExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], has_header=false",
        ];

whereas we could push the projection further down into CsvExec:

        let expected = [
            "NestedLoopJoinExec: join_type=Inner, filter=a@0 < b@1, projection=[c@1]",
            "  CsvExec: file_groups={1 group: [[x]]}, projection=[a, c], has_header=false",
            "  CsvExec: file_groups={1 group: [[x]]}, projection=[b], has_header=false",
        ];

This is possible because only a and c are required from the left side, and only b from the right side.
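A minimal sketch of how those per-side requirements could be derived. It models the join with plain index vectors and is not DataFusion's API: `Side`, `required_columns`, and the `(side, index)` encoding of filter columns are hypothetical. It assumes, as in the plans above, that the filter's `a@0` and `b@1` mean `a` from the left input and `b` from the right input, and that the join's `projection` indexes the concatenated left ++ right schema.

```rust
use std::collections::BTreeSet;

/// Which join input a filter column comes from.
/// Hypothetical model for illustration, not DataFusion's `JoinFilter` types.
#[derive(Clone, Copy, Debug)]
enum Side {
    Left,
    Right,
}

/// Returns the sorted column indices each join input must still produce.
/// `filter_cols` are (side, index within that side); `projection` holds
/// indices into the concatenated left ++ right output schema.
fn required_columns(
    left_len: usize,
    filter_cols: &[(Side, usize)],
    projection: &[usize],
) -> (Vec<usize>, Vec<usize>) {
    let mut left = BTreeSet::new();
    let mut right = BTreeSet::new();
    // Every column the join filter reads must come from its own side.
    for &(side, idx) in filter_cols {
        match side {
            Side::Left => left.insert(idx),
            Side::Right => right.insert(idx),
        };
    }
    // Output columns: indices below `left_len` belong to the left input.
    for &i in projection {
        if i < left_len {
            left.insert(i);
        } else {
            right.insert(i - left_len);
        }
    }
    (left.into_iter().collect(), right.into_iter().collect())
}

fn main() {
    let schema = ["a", "b", "c", "d", "e"];
    // filter=a@0 < b@1: `a` from the left input, `b` from the right input
    let filter = [(Side::Left, 0), (Side::Right, 1)];
    // projection=[c@2]: index 2 of the joined schema is the left `c`
    let (left, right) = required_columns(schema.len(), &filter, &[2]);
    let left_names: Vec<&str> = left.iter().map(|&i| schema[i]).collect();
    let right_names: Vec<&str> = right.iter().map(|&i| schema[i]).collect();
    println!("left CsvExec:  projection={:?}", left_names); // ["a", "c"]
    println!("right CsvExec: projection={:?}", right_names); // ["b"]
}
```

A `BTreeSet` is used so each side's column list comes out deduplicated and in schema order, which is what a file scan projection would expect.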

@berkaysynnada
Contributor

Its root cause is actually not CsvExec but NestedLoopJoinExec (projections can already be embedded into CsvExec seamlessly). Moreover, the issue can be generalized as "improve projection pushdown logic on operators having a built-in projection".

In more detail: when a projection's input has a built-in projection, the projection is currently either pushed down through that operator to its children, or used to refine the operator's built-in projection. However, both can be needed at the same time, and the example in the issue description is a good reproducer of that.
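Doing both at once means narrowing each child and, in the same rewrite, re-indexing the join's embedded projection against the narrower input, because the join's output shrinks from ten columns to `[a, c, b]`. A rough sketch under a simplified model (plain index vectors, with the kept columns from the example hard-coded); `new_position` and `remap_projection` are hypothetical helpers, not DataFusion functions:

```rust
/// Position of original child column `old` in the narrowed child schema.
/// `keep` lists the retained columns in order. (Hypothetical helper.)
fn new_position(keep: &[usize], old: usize) -> Option<usize> {
    keep.iter().position(|&k| k == old)
}

/// Rewrites the join's embedded projection (indices into left ++ right)
/// after the children were narrowed to `left_keep` and `right_keep`.
/// Returns None if the projection references a column that was pruned.
/// (Hypothetical helper, not part of DataFusion.)
fn remap_projection(
    left_len: usize,
    left_keep: &[usize],
    right_keep: &[usize],
    projection: &[usize],
) -> Option<Vec<usize>> {
    projection
        .iter()
        .map(|&i| {
            if i < left_len {
                new_position(left_keep, i)
            } else {
                // Right columns now start right after the retained left columns.
                new_position(right_keep, i - left_len).map(|p| left_keep.len() + p)
            }
        })
        .collect()
}

fn main() {
    let schema = ["a", "b", "c", "d", "e"];
    let (left_keep, right_keep) = (vec![0, 2], vec![1]);
    // Pushed down: each CsvExec reads only its retained columns.
    for (side, keep) in [("left", &left_keep), ("right", &right_keep)] {
        let cols: Vec<&str> = keep.iter().map(|&i| schema[i]).collect();
        println!("{side} CsvExec projection={:?}", cols);
    }
    // Refined: the join's own projection is rewritten for the narrower input.
    let proj = remap_projection(schema.len(), &left_keep, &right_keep, &[2]);
    println!("NestedLoopJoinExec projection={:?}", proj); // Some([1])
}
```

In this model `c`, at index 2 of the original joined schema, moves to index 1 once the left input only produces `a` and `c`. Returning `None` for a pruned column is one way to reject an inconsistent rewrite rather than silently producing wrong indices.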

@Rachelint
Contributor

take
