[FEA] Test parquet reads with bloom filters #11962

Open
revans2 opened this issue Jan 13, 2025 · 0 comments
Labels
test Only impacts tests

Comments

Collaborator

revans2 commented Jan 13, 2025

Is your feature request related to a problem? Please describe.
By default, Spark does not write bloom filters when it writes parquet files. CUDF is in the process of adding support for using bloom filters when doing predicate push down (rapidsai/cudf#17289).

I don't expect that to impact us because we don't use the CUDF predicate push down yet. But we probably want some tests to at least verify that we are doing the right thing on reads. This is especially true for combining readers: if the input footers contain bloom filter references, we need to make sure that we either copy and update them or delete them from the footers.
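
As a rough illustration of the footer check described above, here is a minimal sketch in plain Python. It is not the plugin's code: `ColumnChunkMeta`, `relocate_chunk`, and `dangling_bloom_refs` are hypothetical names, and the two bloom filter fields are only modeled loosely on the `bloom_filter_offset` and `bloom_filter_length` fields of the Parquet column metadata. The idea is that when a byte range of a file is copied into a combined buffer, a bloom filter reference is either shifted along with the data (if the whole filter was copied) or cleared, and a test can then assert that no reference points outside the buffer.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ColumnChunkMeta:
    """Hypothetical stand-in for the column chunk metadata fields that matter here."""
    path: str
    data_page_offset: int
    bloom_filter_offset: Optional[int] = None
    bloom_filter_length: Optional[int] = None


def relocate_chunk(chunk: ColumnChunkMeta, src_start: int, src_end: int,
                   dst_start: int) -> ColumnChunkMeta:
    """Adjust offsets after bytes [src_start, src_end) were copied to dst_start.

    The bloom filter reference is kept (and shifted) only if the whole filter
    lies inside the copied range; otherwise it is cleared so that no reader
    follows a stale offset.
    """
    delta = dst_start - src_start
    bf_off = chunk.bloom_filter_offset
    bf_len = chunk.bloom_filter_length
    keep = (
        bf_off is not None
        and bf_len is not None
        and src_start <= bf_off
        and bf_off + bf_len <= src_end
    )
    return replace(
        chunk,
        data_page_offset=chunk.data_page_offset + delta,
        bloom_filter_offset=bf_off + delta if keep else None,
        bloom_filter_length=bf_len if keep else None,
    )


def dangling_bloom_refs(chunks: List[ColumnChunkMeta], buffer_len: int) -> List[str]:
    """Return the paths of chunks whose bloom filter reference falls outside the buffer."""
    bad = []
    for c in chunks:
        if c.bloom_filter_offset is None:
            continue
        end = c.bloom_filter_offset + (c.bloom_filter_length or 0)
        if c.bloom_filter_offset < 0 or end > buffer_len:
            bad.append(c.path)
    return bad
```

A test along these lines would build a combined footer, run something like `dangling_bloom_refs` over its column chunks, and expect an empty list regardless of whether the references were updated or deleted.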

I think this becomes more important if we start using CUDF for predicate push down, which we plan to try.

@revans2 added the "? - Needs Triage", "feature request", and "test" labels on Jan 13, 2025
@mattahrens removed the "? - Needs Triage" and "feature request" labels on Jan 14, 2025