count optimization for multisplits #5048

PSeitz · 2024-05-30T04:34:34Z

* Optimize requests by passing the threshold into the leaf search
* Execute query.count() instead of QuickwitCollector for count searches

We have 100 concurrent split searches by default, but only num_cpus worker
threads. This means most search futures wait to be scheduled. Once they
are scheduled, they can check the updated threshold from the preceding
searches and possibly skip the search.

Switches from Mutex to RwLock for the threshold, since we now read it more often than we write it.
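A minimal sketch of the shared-threshold idea, using only the standard library. The names here (`Split`, `Outcome`, `search_split`) and the plain `i64` sort value are hypothetical and do not mirror the actual types in leaf.rs. It assumes a descending sort in which every full search returns a full page of hits, so the worst hit returned by any split is a valid lower bound for the global top hits.

```rust
use std::sync::RwLock;

/// Hypothetical outcome of one split search in this sketch.
#[derive(Debug, PartialEq)]
enum Outcome {
    FullSearch,
    CountOnly,
}

/// Hypothetical split summary: the largest sort value the split contains,
/// and the sort value of the worst hit a full search on it would return.
struct Split {
    max_sort_value: i64,
    worst_returned_hit: i64,
}

fn search_split(split: &Split, threshold: &RwLock<Option<i64>>) -> Outcome {
    // Every scheduled search reads the threshold first; reads are the
    // common case, which is why an RwLock fits better than a Mutex.
    let current = *threshold.read().unwrap();
    if let Some(t) = current {
        if split.max_sort_value < t {
            // No document in this split can beat the hits collected so far.
            return Outcome::CountOnly;
        }
    }

    // ... the full search would run here ...

    // Publish the (possibly tighter) threshold for searches scheduled later.
    let mut guard = threshold.write().unwrap();
    let new = split.worst_returned_hit;
    let old = *guard;
    *guard = Some(match old {
        Some(o) => o.max(new),
        None => new,
    });
    Outcome::FullSearch
}
```

Searches that are scheduled late benefit the most, because more preceding searches have had a chance to tighten the threshold by then.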

Addresses #5032

Local SSD:
https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1707,1708&search_metric=engine_duration
CI on S3:
https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=1694,1710&search_metric=engine_duration

Future Work

In some cases we run num_cpus full searches before the threshold kicks
in. In some of those cases we could statically determine which splits
the best results come from and generate count-only requests for the
others. For that we need the counts, so either we send them to the leaf, or this optimization happens on the root.

We could also pass a threshold based on the limits of the fast field (if available) for numeric queries.
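The static analysis idea could look roughly like the sketch below. It is not part of this PR; `SplitMeta` and `count_only_splits` are hypothetical names. It assumes a match-all query sorted by timestamp descending, and that `num_docs` in the split metadata equals the number of matching documents, which is why the counts are needed.

```rust
/// Hypothetical split metadata available before any search runs.
#[derive(Debug, Clone, PartialEq)]
struct SplitMeta {
    id: &'static str,
    min_ts: i64,
    max_ts: i64,
    num_docs: u64,
}

/// Returns the ids of splits that cannot contribute to the top `k` hits
/// (sorted by timestamp, descending) and therefore only need a count.
fn count_only_splits(splits: &[SplitMeta], k: u64) -> Vec<&'static str> {
    // Visit splits from the highest minimum timestamp downwards.
    let mut by_min_desc: Vec<&SplitMeta> = splits.iter().collect();
    by_min_desc.sort_by(|a, b| b.min_ts.cmp(&a.min_ts));

    // Once the visited splits hold at least k documents, the k-th best
    // timestamp is at least the smallest min_ts among them.
    let mut seen = 0u64;
    let mut bound = None;
    for split in &by_min_desc {
        seen += split.num_docs;
        if seen >= k {
            bound = Some(split.min_ts);
            break;
        }
    }

    match bound {
        // A split whose newest document is older than the bound cannot win.
        Some(b) => splits
            .iter()
            .filter(|s| s.max_ts < b)
            .map(|s| s.id)
            .collect(),
        // Fewer than k documents overall: every split needs a full search.
        None => Vec::new(),
    }
}
```

The bound is conservative: a split is only downgraded to a count-only request when its whole time range lies strictly below the guaranteed k-th best timestamp.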

fulmicoton · 2024-05-30T07:40:10Z

quickwit/quickwit-search/src/leaf.rs

+        .map_err(|err| SearchError::InvalidQuery(err.to_string()))?;
+
+    // CanSplitDoBetter or rewrite_request may have changed the request to be a count only request
+    // This may be the case for AllQuery with a sort by date, where the current split can't have


great comment.

fulmicoton · 2024-05-30T09:04:15Z

quickwit/quickwit-search/src/root.rs

+/// This is done by exclusion, so we will need to keep it up to date if fields are added.
+///
+/// The passed query_ast should match the serialized on in request.
+pub fn is_metadata_count_request_with_ast(query_ast: &QueryAst, request: &SearchRequest) -> bool {


Suggested change

-pub fn is_metadata_count_request_with_ast(query_ast: &QueryAst, request: &SearchRequest) -> bool {
+fn is_metadata_count_request_with_ast(query_ast: &QueryAst, request: &SearchRequest) -> bool {

PSeitz · 2024-05-30

This is also used in leaf.rs.

PSeitz force-pushed the count_opt branch from 7b60514 to b6fb677 on May 30, 2024 04:36

fulmicoton reviewed May 30, 2024

PSeitz requested a review from trinity-1686a May 30, 2024 11:08

trinity-1686a approved these changes May 30, 2024

PSeitz force-pushed the count_opt branch from b6fb677 to 363e1c2 on May 31, 2024 05:35

add comments

bcf5dfd

PSeitz force-pushed the count_opt branch from 363e1c2 to bcf5dfd on May 31, 2024 05:41

PSeitz merged commit c90edeb into main May 31, 2024
5 checks passed

PSeitz deleted the count_opt branch May 31, 2024 07:54

PSeitz mentioned this pull request Jun 4, 2024

Add optimization for pure count and count aggregation #5032

Open
