-
Notifications
You must be signed in to change notification settings - Fork 914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support reading bloom filters from Parquet files and filter row groups using them #17289
base: branch-25.02
Are you sure you want to change the base?
Support reading bloom filters from Parquet files and filter row groups using them #17289
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ty operators in the expression
ast_operator::NOT, _bloom_filter_expr.push(ast::operation{ast_operator::NOT, value})}); | ||
} | ||
// For all other expressions, push an always true expression | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karthikeyann @vuule added this logic handle any non col == lit
type expressions in the filter. Essentially just transforming them all to always true.
* @brief Collects lists of equality predicate literals in the AST expression, one list per input | ||
* table column. This is used in row group filtering based on bloom filters. | ||
*/ | ||
class equality_literals_collector : public ast::detail::expression_transformer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for an ast::tree
in this expression converter as we only visit and collect literals for col == lit
expressions.
*/ | ||
std::reference_wrapper<ast::expression const> visit(ast::literal const& expr) override | ||
{ | ||
return expr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to push any of these to the ast::tree
from the child class bloom_filter_expression_converter
either as these columns or literals don't participate in the transformed expression tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few small comments, looks good overall :)
Co-authored-by: Vukasin Milovanovic <[email protected]>
Bump cuco by one commit which contains updates to unblock rapidsai/cudf#17289
#735) Simply bump cuco by one commit which contains updates to unblock rapidsai/cudf#17289 Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Bradley Dice (https://github.com/bdice) URL: #735
…st::tree` (#17587) This PR simplifies the StatsAST expression transformer in Parquet reader's predicate pushdown using `ast::tree` from (#17156). This PR is a follow up to @bdice's comment at #17289 (comment). Similar changes for the `BloomfilterAST` expression converter have been incorporated in the PR #17289. Related to #17164 Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Vukasin Milovanovic (https://github.com/vuule) - Bradley Dice (https://github.com/bdice) URL: #17587
Description
This PR adds support to read bloom filters from Parquet files and use them to filter row groups based on
col == literal
like predicate(s), if provided.Related to #17164
Checklist