
phrase_prefix match on random value causes slow queries and spikes memory usage #5086

Closed
esatterwhite opened this issue Jun 5, 2024 · 6 comments
Labels
bug (Something isn't working), high-priority

Comments

@esatterwhite
Collaborator

Describe the bug
Running a multi_match query of type phrase_prefix against a random value (a UUID) is extremely slow and drives a large spike in memory usage that is never released.

Steps to reproduce (if applicable)

Example Request

Caution

The index pattern in the request expands to 4 indexes and contains 3,869,958,639 documents in total

/api/v1/_elastic/logline.996226df4b.2024-06-03*,logline.996226df4b.2024-06-04*/_search

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "type": "phrase_prefix",
            "query": "6c59f652-f1f5-11ee-86b2-562c83a610e2",
            "fields": [
              "_line"
            ],
            "lenient": true
          }
        }
      ]
    }
  }
}

Note

It seems like it's the length of the query we're trying to match here: if I shorten the search query it finishes, but the time increases the longer the query gets.

  • 6c59f652 - 19556 ms
  • 6c59f652-f1f5-11ee - 32743 ms
  • 6c59f652-f1f5-11ee-86b2-562c83a610e2 - Did not finish

The memory usage spikes above the defined memory limits (4 GB), and the longer-running queries take long enough that the Kubernetes OOM killer terminates the pod running the query.

For comparison, the timings reported by Elasticsearch on a near-identical data set:

  • 6c59f652 - 5277 ms
  • 6c59f652-f1f5-11ee - 3378 ms
  • 6c59f652-f1f5-11ee-86b2-562c83a610e2 - 5365 ms
Example Document
{
  "__key": "logline:996226df4b:webapp:ldw-5c4b45dd7d-w94fv:ld11",
  "_account": "996226df4b",
  "_app": "webapp",
  "_bid": "34c1c7cb-e8c4-4190-afcc-5c2591384b7e:31899:ld11",
  "_cluster": "ld11",
  "_file": "/var/log/containers/ldw-5c4b45dd7d-w94fv_default_webapp-0fe267d84862d915a960d4159a47f52d560b4dfd072a1b6d4691c3979a5c3369.log",
  "_host": "ldw-5c4b45dd7d-w94fv",
  "_id": "1751243621446717490",
  "_ingester": "logdna-agent/3.10.0-beta.1 (Amazon,Linux/2)",
  "_ip": "***REDACTED***",
  "_label": {
    "app": "ldw",
    "pod-template-hash": "5c4b45dd7d"
  },
  "_lid": "1751243621446717490",
  "_line": "[240604 19:23:32] [info] {\"message\":\"Proxy request from 2a6204e8c7 to /v3/pipeline/6c59f652-f1f5-11ee-86b2-562c83a610e2\",\"meta\":{\"stack\":[\"at ProxyServer.<anonymous> (/opt/app/server/config/proxies/proxy.js:15:12)\",\"at ProxyServer.emit (/opt/app/node_modules/eventemitter3/index.js:210:27)\",\"at ClientRequest.<anonymous> (/opt/app/node_modules/http-proxy/lib/http-proxy/passes/web-incoming.js:133:16)\",\"at ClientRequest.emit (node:events:519:28)\",\"at ClientRequest.emit (node:domain:488:12)\",\"at tickOnSocket (node:_http_client:839:7)\",\"at onSocketNT (node:_http_client:902:5)\",\"at process.processTicksAndRejections (node:internal/process/task_queues:83:21)\"],\"version\":\"8.5.108\",\"commit\":\"32c4bf268d\"}}",
  "_logtype": "json",
  "_meta": {
    "Image Name": "us.gcr.io/logdna-k8s/logdna-web",
    "Tag": "8.5.108"
  },
  "_mezmo_line_size": "1305",
  "_originating_user_agent": null,
  "_tag": [
    "agent-v2",
    "prod",
    "us-east-1",
    "aws"
  ],
  "_ts": 1717529012835,
  "container": "webapp",
  "containerid": "0fe267d84862d915a960d4159a47f52d560b4dfd072a1b6d4691c3979a5c3369",
  "level": "info",
  "message": "Proxy request from 2a6204e8c7 to /v3/pipeline/6c59f652-f1f5-11ee-86b2-562c83a610e2",
  "namespace": "default",
  "node": "***REDACTED***",
  "o_meta": {
    "commit": "32c4bf268d",
    "stack": [
      "at ProxyServer.<anonymous> (/opt/app/server/config/proxies/proxy.js:15:12)",
      "at ProxyServer.emit (/opt/app/node_modules/eventemitter3/index.js:210:27)",
      "at ClientRequest.<anonymous> (/opt/app/node_modules/http-proxy/lib/http-proxy/passes/web-incoming.js:133:16)",
      "at ClientRequest.emit (node:events:519:28)",
      "at ClientRequest.emit (node:domain:488:12)",
      "at tickOnSocket (node:_http_client:839:7)",
      "at onSocketNT (node:_http_client:902:5)",
      "at process.processTicksAndRejections (node:internal/process/task_queues:83:21)"
    ],
    "version": "8.5.108"
  },
  "pod": "ldw-5c4b45dd7d-w94fv"
}

Expected behavior
I would generally expect memory to remain under control, and/or be released after the query completes. There don't seem to be any settings to help control memory. We have strict limits on memory for Quickwit running on Kubernetes, but Quickwit itself seems to have no internal bound on memory usage (or it has a leak).

[screenshot: memory usage graph]
Every time this query is run, memory makes a significant jump and never seems to come down.

Additionally, this kind of query is very slow and fails to complete in a reasonable amount of time (10-30 seconds).

Configuration:

  1. Output of quickwit --version

Quickwit 0.8.0 (aarch64-unknown-linux-gnu 2024-06-04T12:57:54Z 67a798d)

  2. The node configuration YAML
version: '0.8'
# metastore_uri: postgres connection string generated by postgres operator
default_index_root_uri: s3://***REDACTED***/indexes
searcher:
  fast_field_cache_capacity: 2G # default 1G
  aggregation_memory_limit: 500M # default 500M
  partial_request_cache_capacity: 64M # default 64M
  max_num_concurrent_split_searches: 100 # default 100
  split_cache:
    max_num_bytes: 2000G
    max_num_splits: 1000000
    num_concurrent_downloads: 4
indexer:
  enable_cooperative_indexing: true
  enable_otlp_endpoint: true
  cpu_capacity: 1000m
ingest_api:
  max_queue_disk_usage: 4GiB
storage:
  s3:
    region: us-east-1
    endpoint: https://s3.amazonaws.com
@esatterwhite added the bug (Something isn't working) label on Jun 5, 2024
@fulmicoton
Contributor

@esatterwhite can you check how Quickwit behaves on this if it were a simple phrase query?
@trinity-1686a can you investigate?
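For reference, one way to run that comparison through the same ES-compatible _search endpoint is sketched below. It assumes the match_phrase query is accepted by that endpoint; since the original fields list is just _line, a single-field match_phrase should be an equivalent "simple phrase" request.

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "_line": {
              "query": "6c59f652-f1f5-11ee-86b2-562c83a610e2"
            }
          }
        }
      ]
    }
  }
}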

@fulmicoton
Contributor

@trinity-1686a
https://github.com/quickwit-oss/tantivy/pull/2425/files

Here are two optimizations. The removal of the allocation at each match is probably the important one for this specific query; the other matters much less.

@trinity-1686a
Contributor

Some of the tokens composing the UUID appear frequently (~150M times), while others appear only around 1k times. There is probably some improvement to be made in how we do the intersection of terms.
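As a rough illustration of why that imbalance matters (a minimal sketch with plain sorted doc-id lists, not Quickwit's or tantivy's actual posting-list code): driving the intersection with the rarest term means the ~150M-entry lists only need to be probed about 1k times instead of being walked end to end.

// Minimal sketch: sorted Vec<u32> doc-id lists stand in for posting lists.
// Iterating the rare list and probing the frequent one keeps the work
// proportional to the small list rather than the large one.
fn intersect(rare: &[u32], frequent: &[u32]) -> Vec<u32> {
    rare.iter()
        .copied()
        .filter(|doc| frequent.binary_search(doc).is_ok())
        .collect()
}

fn main() {
    let rare = vec![3, 500, 999_999];
    let frequent: Vec<u32> = (0..1_000_000).collect();
    assert_eq!(intersect(&rare, &frequent), vec![3, 500, 999_999]);
}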

@fulmicoton
Contributor

fulmicoton commented Jun 13, 2024

Next steps:

Consider emitting a PhraseScorer when the Weight detects that there is only one term in the suffix.
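A hypothetical sketch of that dispatch, using stand-in types rather than tantivy's actual Weight/Scorer API: when the prefix expands to a single term in a segment, the query degenerates into an ordinary phrase, so the cheaper phrase-scoring path can be taken.

// Stand-in types for illustration only; not tantivy's real API.
enum PhrasePrefixScoring {
    // every position resolved to one fixed term: same as a phrase query
    Phrase(Vec<String>),
    // the last position still has several candidate terms for the prefix
    PhrasePrefix { body: Vec<String>, suffix_candidates: Vec<String> },
}

fn choose_scoring(body: Vec<String>, suffix_candidates: Vec<String>) -> PhrasePrefixScoring {
    if suffix_candidates.len() == 1 {
        // only one term matches the prefix: take the PhraseScorer-style path
        let mut terms = body;
        terms.extend(suffix_candidates);
        PhrasePrefixScoring::Phrase(terms)
    } else {
        PhrasePrefixScoring::PhrasePrefix { body, suffix_candidates }
    }
}

fn main() {
    let scoring = choose_scoring(vec!["proxy".into(), "request".into()], vec!["from".into()]);
    assert!(matches!(scoring, PhrasePrefixScoring::Phrase(_)));
}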

@trinity-1686a
Contributor

This was diagnosed as happening because the code tokenizer doesn't handle hex well: 6c59f652-f1f5-11ee-86b2-562c83a610e2 is tokenized into [6, c, 59, f, 652, f, 1, f, 5, 11, ee, 86, b, 2, 562, c, 83, a, 610, e, 2], which is a lot of tokens, many of which (the 1-2 byte ones) are likely present in tons of documents.
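To make that concrete, here is a minimal stand-alone sketch that reproduces the split above. It is not the actual tantivy/quickwit code tokenizer; it only assumes that tokens break on punctuation and on letter/digit boundaries, which is what the listed output suggests.

// Simplified illustration of a code-style tokenizer on hex-heavy input:
// split on punctuation (e.g. '-') and on letter/digit boundaries.
fn code_like_tokens(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut prev_is_digit: Option<bool> = None;
    for c in input.chars() {
        if !c.is_ascii_alphanumeric() {
            // punctuation ends the current token
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            prev_is_digit = None;
            continue;
        }
        let is_digit = c.is_ascii_digit();
        // a letter/digit transition also starts a new token
        if prev_is_digit.map_or(false, |p| p != is_digit) && !current.is_empty() {
            tokens.push(std::mem::take(&mut current));
        }
        current.push(c);
        prev_is_digit = Some(is_digit);
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

fn main() {
    let tokens = code_like_tokens("6c59f652-f1f5-11ee-86b2-562c83a610e2");
    // prints ["6", "c", "59", "f", "652", "f", "1", "f", "5", "11", "ee",
    //         "86", "b", "2", "562", "c", "83", "a", "610", "e", "2"]
    println!("{:?}", tokens);
}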

@trinity-1686a
Contributor

fixed by #5200
