-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
count optimization for multisplits #5048
Conversation
.map_err(|err| SearchError::InvalidQuery(err.to_string()))?; | ||
|
||
// CanSplitDoBetter or rewrite_request may have changed the request to be a count only request | ||
// This may be the case for AllQuery with a sort by date, where the current split can't have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great comment.
/// This is done by exclusion, so we will need to keep it up to date if fields are added. | ||
/// | ||
/// The passed query_ast should match the serialized on in request. | ||
pub fn is_metadata_count_request_with_ast(query_ast: &QueryAst, request: &SearchRequest) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn is_metadata_count_request_with_ast(query_ast: &QueryAst, request: &SearchRequest) -> bool { | |
fn is_metadata_count_request_with_ast(query_ast: &QueryAst, request: &SearchRequest) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is also used in leaf.rs
* optimization requests by passing threshold in leaf search * Execute query.count() instead of QuickwitCollector for count searches We have 100 concurrent split searches by default, but num_cpus worker threads. This means most search futures will wait to be scheduled. When they are scheduled they can check the new threshold from the preceding searches and maybe skip the search. Switches to RWLock for the threshold since we read more often now. Future Work: We run num_cpu full searches in some cases before the threshold kicks in. But in some cases we could statically analyze from which split the best results come and generate count only requests for the others. Addresses #5032
We have 100 concurrent split searches by default, but num_cpus worker
threads. This means most search futures will wait to be
scheduled. When they are scheduled they can check the new threshold from
the preceding searches and maybe skip the search.
Switches from Mutex to RWLock for the threshold since we read more often now.
Addresses #5032
Local SSD:
https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1707,1708&search_metric=engine_duration
CI on S3:
https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1694,1710&search_metric=engine_duration
Future Work
We run num_cpu full searches in some cases before the threshold kicks
in. But in some cases we could statically
analyze from which split the best results come and generate count only
requests for the others. For that we need the counts, so either we send them to the leaf or this optimization happens on the root.
We can pass a threshold based on the limits of the fast field (if available) for numeric queries