[FEA] Test parquet reads with bloom filters #11962

revans2 · 2025-01-13T20:52:46Z

Is your feature request related to a problem? Please describe.
By default, Spark does not write bloom filters when it writes Parquet files. CUDF is in the process of adding support for using bloom filters during predicate push down; see rapidsai/cudf#17289.
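For anyone generating test data, here is a minimal sketch of how bloom filters can be turned on for a Parquet write from PySpark. It assumes the per-column writer options described in the Spark Parquet data source documentation (`parquet.bloom.filter.enabled#<column>` and `parquet.bloom.filter.expected.ndv#<column>`). The helper names `bloom_filter_write_options` and `write_with_bloom_filters` are hypothetical and are not part of any existing test harness.

```python
from typing import Dict, Iterable


def bloom_filter_write_options(columns: Iterable[str], expected_ndv: int) -> Dict[str, str]:
    """Build per-column Parquet writer options that enable bloom filters.

    Hypothetical helper: the option keys follow the pattern documented for
    the Spark Parquet data source; the function itself is illustrative only.
    """
    options: Dict[str, str] = {}
    for name in columns:
        # Enable a bloom filter for this column.
        options[f"parquet.bloom.filter.enabled#{name}"] = "true"
        # Hint for the expected number of distinct values, used to size the filter.
        options[f"parquet.bloom.filter.expected.ndv#{name}"] = str(expected_ndv)
    return options


def write_with_bloom_filters(df, path: str, columns: Iterable[str], expected_ndv: int = 1000) -> None:
    """Write a Spark DataFrame as Parquet with bloom filters on `columns`.

    `df` is assumed to be a pyspark.sql.DataFrame; nothing is imported here,
    so the options helper above can be used without a Spark installation.
    """
    opts = bloom_filter_write_options(columns, expected_ndv)
    df.write.options(**opts).mode("overwrite").parquet(path)
```

Files written this way could then be read back on both the CPU and the GPU, with and without a filter on the bloom-filtered column, and the results compared.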

I don't expect that to impact us because we don't use the CUDF predicate push down yet. But we probably want some tests to at least verify that we are doing the right thing on reads. This is especially true for the combining readers: if the original footers contained bloom filter references, we need to make sure that we either copied the filters and updated the references, or deleted the references from the footers.
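To make the "copy/update or delete" requirement concrete, here is a rough sketch of the invariant a combining reader would need to preserve. It is not the plugin's actual footer code. `ColumnChunkMeta` and `rewrite_bloom_reference` are hypothetical names, and the fields `bloom_filter_offset` and `bloom_filter_length` are modeled on the optional fields of the same names in the Parquet `ColumnMetaData` footer structure.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class ColumnChunkMeta:
    """Hypothetical stand-in for the footer fields that matter here."""
    data_offset: int
    bloom_filter_offset: Optional[int] = None
    bloom_filter_length: Optional[int] = None


def rewrite_bloom_reference(
    meta: ColumnChunkMeta,
    new_data_offset: int,
    copied_filters: Dict[int, int],
) -> ColumnChunkMeta:
    """Return footer metadata that is valid for the combined buffer.

    `copied_filters` maps an old bloom filter offset to its new offset, and
    only contains entries for filters that were actually copied.
    """
    old = meta.bloom_filter_offset
    if old is not None and old in copied_filters:
        # The filter bytes were copied: point the footer at the new location.
        return replace(
            meta,
            data_offset=new_data_offset,
            bloom_filter_offset=copied_filters[old],
        )
    # The filter was not copied (or never existed): drop the reference so the
    # footer never points at bytes that are not a bloom filter.
    return replace(
        meta,
        data_offset=new_data_offset,
        bloom_filter_offset=None,
        bloom_filter_length=None,
    )
```

A test along these lines would assert that no column chunk in a combined footer keeps an offset that was only valid in the original file.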

I think this becomes more important if we start using CUDF for predicate push down, which we plan to try.

revans2 added the "? - Needs Triage", "feature request", and "test" labels Jan 13, 2025

mattahrens assigned mythrocks Jan 14, 2025

mattahrens removed the "? - Needs Triage" and "feature request" labels Jan 14, 2025
