Regression: bloom filters are not being used in Parquet queries #8685
Comments
I think the next step here would be to get some sort of reproducer so we can debug further.
I tried adding more detailed metrics for bloom filters. The code is here: https://github.com/apache/arrow-datafusion/compare/main...my-vegetable-has-exploded:arrow-datafusion:metric-sbbf?expand=1. It works well in unit tests, but when I build datafusion-cli, it fails to execute
I filed #8690 to track
The issue has been fixed now.
Good catch, @domyway.
For anyone following along, the fix is #8732.
Hi @domyway, you can check whether bloom filter works by
Thank you for finding it.
I ran a test locally by writing 200GB of data. When the Bloom filter is used, the query takes only 0.1 seconds; without the Bloom filter, the same query takes 1 second. So if a query takes 1 second, I can infer that it is not using the Bloom filter, because a query that uses the Bloom filter should return results within 0.1 seconds.
Originally posted by @domyway in #8436 (comment)
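For readers following the timing argument above, here is a minimal Python sketch of why a Bloom filter can make a point lookup so much cheaper: row groups whose filter says "definitely absent" are skipped without being read. Everything in it is an assumption for illustration. `TinyBloom`, `build_row_groups`, and `lookup` are hypothetical names, the filter is a toy rather than the split-block Bloom filter format that Parquet uses, and none of this is DataFusion's actual code path.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter (illustrative only, not Parquet's split-block format)."""

    def __init__(self, num_bits=8192, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions by hashing the value with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_row_groups(num_groups=10, rows_per_group=100):
    """Build hypothetical row groups, each with its own Bloom filter."""
    groups = []
    for g in range(num_groups):
        values = [f"key-{g * rows_per_group + i}" for i in range(rows_per_group)]
        bloom = TinyBloom()
        for v in values:
            bloom.add(v)
        groups.append((values, bloom))
    return groups


def lookup(groups, target, use_bloom):
    """Return (found, number of row groups actually scanned)."""
    scanned = 0
    found = False
    for values, bloom in groups:
        if use_bloom and not bloom.might_contain(target):
            continue  # pruned: this row group is never read
        scanned += 1
        if target in values:
            found = True
    return found, scanned


if __name__ == "__main__":
    groups = build_row_groups()
    print(lookup(groups, "key-123", use_bloom=True))
    print(lookup(groups, "key-123", use_bloom=False))
```

Without the filter every row group is scanned. With it, only the groups that might contain the key are scanned, which is usually just one, plus the occasional false positive. That difference in work is what the 0.1 second versus 1 second observation relies on.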