phrase_prefix match on random value causes slow queries and spikes memory usage #5086
Comments
@esatterwhite can you check how quickwit behaves on this if it was a simple phrase query?
@trinity-1686a Here are two optimizations. The removal of the allocation at each match is probably important for this specific query.
Some of the tokens composing the UUID appear frequently (~150M times); others appear only around 1k times. There is probably some improvement to be made in how we do the intersection of terms.
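To illustrate the kind of skew involved, here is a minimal, self-contained sketch (not Quickwit/tantivy code; the doc-id lists and function are hypothetical) of intersecting posting lists by driving from the rarest term and seeking into the much larger ones:

```rust
// Illustrative only: posting lists modeled as sorted Vec<u32> doc ids.
// Drive the intersection from the rarest term (~1k postings) and seek into
// the very frequent ones (~150M postings) instead of stepping every list
// forward in lockstep.
fn intersect_rarest_first(mut lists: Vec<&[u32]>) -> Vec<u32> {
    // Shortest (rarest) list first: it bounds the number of candidate docs.
    lists.sort_by_key(|list| list.len());
    let (driver, rest) = match lists.split_first() {
        Some(split) => split,
        None => return Vec::new(),
    };
    driver
        .iter()
        .copied()
        // Keep a doc only if every larger list also contains it; binary search
        // stands in here for a real skip-list / block-skipping seek.
        .filter(|doc| rest.iter().all(|list| list.binary_search(doc).is_ok()))
        .collect()
}

fn main() {
    let rare: Vec<u32> = vec![4, 1_000, 90_000];
    let frequent: Vec<u32> = (0..200_000).step_by(2).collect();
    println!("{:?}", intersect_rarest_first(vec![&rare[..], &frequent[..]]));
    // -> [4, 1000, 90000]: every candidate from the rare list is checked
    //    against the frequent list with a seek, not a full scan.
}
```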
Next steps:
- Consider emitting a PhraseScorer when we detect, in the Weight, that there is only one term in the suffix.
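A rough illustration of that idea, with hypothetical simplified types (not the actual Quickwit/tantivy Weight or PhraseScorer API): when the prefix expands to exactly one term in the term dictionary, the phrase-prefix query degenerates to a plain phrase query, so the cheaper exact-phrase scoring path can be used.

```rust
// Hypothetical, simplified types -- a sketch of the decision, not real code.
#[derive(Debug)]
enum PlannedQuery {
    /// Score with a plain phrase scorer over exact terms.
    ExactPhrase { terms: Vec<String> },
    /// Score with the multi-term phrase-prefix path.
    PhrasePrefix { terms: Vec<String>, suffix_candidates: Vec<String> },
}

fn plan_phrase_prefix(mut terms: Vec<String>, suffix_candidates: Vec<String>) -> PlannedQuery {
    if suffix_candidates.len() == 1 {
        // Single expansion: fold it into the phrase and use the exact path.
        terms.extend(suffix_candidates);
        PlannedQuery::ExactPhrase { terms }
    } else {
        PlannedQuery::PhrasePrefix { terms, suffix_candidates }
    }
}

fn main() {
    // A UUID prefix split into hex chunks by a code-like tokenizer, where the
    // trailing chunk happens to expand to only one term in the dictionary.
    let planned = plan_phrase_prefix(
        vec!["6c59f652".to_string(), "f1f5".to_string()],
        vec!["11ee".to_string()],
    );
    println!("{planned:?}");
    // -> ExactPhrase { terms: ["6c59f652", "f1f5", "11ee"] }
}
```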
This was diagnosed to happen because the code tokenizer doesn't like hex much, and given …
Fixed by #5200.
Describe the bug
A match_phrase_prefix query on a random (UUID-like) value is extremely slow and causes a large spike in memory usage.
Steps to reproduce (if applicable)
Steps to reproduce the behavior:
Example Request
Caution: the index pattern in the request expands to 4 indexes and contains 3,869,958,639 documents in total.
/api/v1/_elastic/logline.996226df4b.2024-06-03*,logline.996226df4b.2024-06-04*/_search
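The full request body is not reproduced here, but a body of roughly this shape, POSTed to the path above, triggers the behaviour (the field name `message` is an assumption for illustration):

```json
{
  "query": {
    "match_phrase_prefix": {
      "message": {
        "query": "6c59f652-f1f5-11ee-86b2-562c83a610e2"
      }
    }
  }
}
```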
Note: it seems like it's the length of the value we're trying to match. If I shorten the search query it finishes, but the time increases the longer the value gets:

- 6c59f652: 19556 ms
- 6c59f652-f1f5-11ee: 32743 ms
- 6c59f652-f1f5-11ee-86b2-562c83a610e2: did not finish

Memory usage spikes above the defined memory limit (4 GB), and the longer-running queries take long enough that the Kubernetes OOMKiller terminates the pod running the query.
In comparison, the timings reported by Elasticsearch on a near-identical data set:

- 6c59f652: 5277 ms
- 6c59f652-f1f5-11ee: 3378 ms
- 6c59f652-f1f5-11ee-86b2-562c83a610e2: 5365 ms

Example Document
Expected behavior
I would generally expect memory to remain under control and/or be released after the query completes. There don't seem to be any settings to help control memory. We run Quickwit on Kubernetes with strict memory limits, but Quickwit itself appears to have no internal bound on memory usage (or has a leak): every time this query is run, memory makes a significant jump and never seems to come down.
Additionally, query times are very slow for this kind of query, failing to complete in a reasonable amount of time (10-30 seconds).
Configuration:
quickwit --version