Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

count optimization for multisplits #5048

Merged
merged 2 commits into from
May 31, 2024
Merged

count optimization for multisplits #5048

merged 2 commits into from
May 31, 2024

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented May 30, 2024

  • optimization requests by passing threshold in leaf search
  • Execute query.count() instead of QuickwitCollector for count searches

We have 100 concurrent split searches by default, but num_cpus worker
threads. This means most search futures will wait to be
scheduled. When they are scheduled they can check the new threshold from
the preceding searches and maybe skip the search.

Switches from Mutex to RWLock for the threshold since we read more often now.

Addresses #5032

Local SSD:
https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1707,1708&search_metric=engine_duration
CI on S3:
https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1694,1710&search_metric=engine_duration

Future Work

We run num_cpu full searches in some cases before the threshold kicks
in. But in some cases we could statically
analyze from which split the best results come and generate count only
requests for the others. For that we need the counts, so either we send them to the leaf or this optimization happens on the root.

We can pass a threshold based on the limits of the fast field (if available) for numeric queries

.map_err(|err| SearchError::InvalidQuery(err.to_string()))?;

// CanSplitDoBetter or rewrite_request may have changed the request to be a count only request
// This may be the case for AllQuery with a sort by date, where the current split can't have
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great comment.

/// This is done by exclusion, so we will need to keep it up to date if fields are added.
///
/// The passed query_ast should match the serialized on in request.
pub fn is_metadata_count_request_with_ast(query_ast: &QueryAst, request: &SearchRequest) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn is_metadata_count_request_with_ast(query_ast: &QueryAst, request: &SearchRequest) -> bool {
fn is_metadata_count_request_with_ast(query_ast: &QueryAst, request: &SearchRequest) -> bool {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also used in leaf.rs

@PSeitz PSeitz requested a review from trinity-1686a May 30, 2024 11:08
quickwit/quickwit-search/src/collector.rs Outdated Show resolved Hide resolved
quickwit/quickwit-search/src/root.rs Show resolved Hide resolved
quickwit/quickwit-search/src/leaf.rs Outdated Show resolved Hide resolved
* optimization requests by passing threshold in leaf search
* Execute query.count() instead of QuickwitCollector for count searches

We have 100 concurrent split searches by default, but num_cpus worker
threads. This means most search futures will wait to be
scheduled. When they are scheduled they can check the new threshold from
the preceding searches and maybe skip the search.

Switches to RWLock for the threshold since we read more often now.

Future Work:
We run num_cpu full searches in some cases before the threshold kicks
in. But in some cases we could statically
analyze from which split the best results come and generate count only
requests for the others.

Addresses #5032
@PSeitz PSeitz merged commit c90edeb into main May 31, 2024
5 checks passed
@PSeitz PSeitz deleted the count_opt branch May 31, 2024 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants