-
-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow warming partially an sstable for an automaton #2559
Conversation
src/index/inverted_index_reader.rs
Outdated
// TODO this operation is often cheap for "friendly" automatons, but can be very costly for | ||
// "unfriendly" ones such as ".*a{50}" (very few terms if any match this pattern, but we | ||
// can't know early). In this case, we decompress and iterate over the entire sstable, while | ||
// still being in async context. Ideally we should spawn this on a threadpool. | ||
let range_to_load = term_info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, it is difficult for readers to know what is the type of range_to_load.
src/index/inverted_index_reader.rs
Outdated
// TODO this operation is often cheap for "friendly" automatons, but can be very costly for | ||
// "unfriendly" ones such as ".*a{50}" (very few terms if any match this pattern, but we | ||
// can't know early). In this case, we decompress and iterate over the entire sstable, while | ||
// still being in async context. Ideally we should spawn this on a threadpool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So if I understand correctly, the trouble is that the computation is done within an iterator.
We eventually yield, but that's only when we effectively get an element.
Is decompressing a single block on the runtime is fast enough, we could probably write ugly code and have our iterator return Option<Range<usize>>
to offer the runtime to yield more often.
If we do it this way, we might also want to actively yield.
I believe tokio only force yield on await, after await has been reach N times. (to be checked). That N might be way too high for us.
The second is probably not of a problem
src/index/inverted_index_reader.rs
Outdated
.get_term_range_async(.., automaton.clone(), None, merge_holes_under_bytes) | ||
.await?; | ||
|
||
// we build a 2nd iterator, this one with no holes, so we don't go through blocks we can't | ||
// match, and just download them to reduce our query count. This makes the assumption | ||
// there is a caching layer below, which might not always be true, but is in Quickwit. | ||
let term_info = self.get_term_range_async(.., automaton, None, 0).await?; | ||
let term_infos = self.get_term_range_async(.., automaton, None, 0).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let term_infos = self.get_term_range_async(.., automaton, None, 0).await?; | |
let term_info_stream = self.get_term_range_async(.., automaton, None, 0).await?; |
(approved but we need to fix the runtime footgun you identified of course) |
src/index/inverted_index_reader.rs
Outdated
let merge_holes_under_bytes = (80 * 1024 * 1024 * 50) / 1000; | ||
// we build a first iterator to download everything. Simply calling the function already | ||
// loads everything, but doesn't start iterating over the sstable. | ||
let mut _term_infos = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut _term_infos = self | |
let _term_infos = self |
src/index/inverted_index_reader.rs
Outdated
// we could do without an iterator, but this allows us access to coalesce which simplify | ||
// things | ||
let posting_ranges_iter = | ||
std::iter::from_fn(move || stream.next().map(|(_k, v)| v.postings_range.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we break it into several statement.
For instance:
- Create the iterator of postings ranges.
- doe the coalesce thing after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments
No description provided.