
[Security Solution] Pagination is broken in the alerts table #201913

Open
MadameSheema opened this issue Nov 27, 2024 · 7 comments
Labels
bug Fixes for quality problems that affect the customer experience Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) triage_needed

Comments

@MadameSheema
Member

Describe the bug:

  • Pagination is broken in the alerts table

Kibana/Elasticsearch Stack version:
8.17.0 - BC1

Initial setup:

  • Have a large number of alerts generated. In my case, 12,187 alerts.

Steps to reproduce:

  1. Navigate to the alerts page
  2. Click the last pagination number, in my case, 244.

Current behavior:

An error is displayed:

Result window is too large, from + size must be less than or equal to: [10000] but was [12200]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

  • The alerts are not displayed

Expected behavior:

  • No error should be displayed
  • Alerts should be displayed
@MadameSheema MadameSheema added bug Fixes for quality problems that affect the customer experience Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Threat Hunting Security Solution Threat Hunting Team Team:Threat Hunting:Investigations Security Solution Investigations Team triage_needed labels Nov 27, 2024
@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@elasticmachine
Contributor

Pinging @elastic/security-threat-hunting (Team:Threat Hunting)

@elasticmachine
Contributor

Pinging @elastic/security-threat-hunting-investigations (Team:Threat Hunting:Investigations)

@PhilippeOberti
Contributor

PhilippeOberti commented Nov 27, 2024

This is related to the 10k limit that ES has. It happens not only on the last page, but on the first page past 10k elements in the table. For example, in the video below I had 100 elements per page, and as soon as I reached page 101 the error appeared.

[video attachment: Screen.Recording.2024-11-27.at.4.18.24.PM.mov]
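
For illustration, here's a minimal sketch of the arithmetic behind that failure, assuming the table translates pageIndex/pageSize into an ES from/size request in the usual way (the variable names here are hypothetical):

// Why page 101 at 100 rows per page trips the 10k window.
// pageIndex is zero-based, so the UI's "page 101" is pageIndex 100.
const pageIndex = 100;
const pageSize = 100;
const from = pageIndex * pageSize; // 10000
const size = pageSize; // 100
// ES rejects the request once from + size exceeds index.max_result_window (10000 by default):
console.log(from + size); // 10100 -> "Result window is too large ... but was [10100]"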

@PhilippeOberti
Contributor

Here's the payload of the call being made:

{
  "featureIds": [
    "siem"
  ],
  "fields": [
    {
      "field": "@timestamp",
      "include_unmapped": true
    },
    {
      "field": "kibana.alert.rule.name",
      "include_unmapped": true
    },
    {
      "field": "kibana.alert.workflow_assignee_ids",
      "include_unmapped": true
    },
    {
      "field": "kibana.alert.severity",
      "include_unmapped": true
    },
    {
      "field": "kibana.alert.risk_score",
      "include_unmapped": true
    },
    {
      "field": "kibana.alert.reason",
      "include_unmapped": true
    },
    {
      "field": "host.name",
      "include_unmapped": true
    },
    {
      "field": "user.name",
      "include_unmapped": true
    },
    {
      "field": "host.risk.calculated_level",
      "include_unmapped": true
    },
    {
      "field": "user.risk.calculated_level",
      "include_unmapped": true
    },
    {
      "field": "host.asset.criticality",
      "include_unmapped": true
    },
    {
      "field": "user.asset.criticality",
      "include_unmapped": true
    },
    {
      "field": "process.name",
      "include_unmapped": true
    },
    {
      "field": "file.name",
      "include_unmapped": true
    },
    {
      "field": "source.ip",
      "include_unmapped": true
    },
    {
      "field": "destination.ip",
      "include_unmapped": true
    }
  ],
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [],
          "filter": [
            {
              "match_phrase": {
                "kibana.alert.workflow_status": "open"
              }
            },
            {
              "range": {
                "@timestamp": {
                  "gte": "2024-11-27T06:00:00.000Z",
                  "lte": "2024-11-28T05:59:59.999Z",
                  "format": "strict_date_optional_time"
                }
              }
            }
          ],
          "should": [],
          "must_not": [
            {
              "exists": {
                "field": "kibana.alert.building_block_type"
              }
            }
          ]
        }
      }
    }
  },
  "pagination": {
    "pageIndex": 100,
    "pageSize": 100
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ],
  "runtimeMappings": {},
  "isSearchStored": false,
  "stream": false
}

And here's the error coming back from the backend:

{
    "statusCode": 400,
    "error": "Bad Request",
    "message": "status_exception\n\tCaused by:\n\t\tsearch_phase_execution_exception: all shards failed",
    "attributes": {
        "error": {
            "type": "status_exception",
            "reason": "error while executing search",
            "caused_by": {
                "type": "search_phase_execution_exception",
                "reason": "all shards failed",
                "phase": "query",
                "grouped": true,
                "failed_shards": [
                    {
                        "shard": 0,
                        "index": ".internal.alerts-security.alerts-default-000001",
                        "node": "nDFzgFmYRvmk4IQhJ4zftw",
                        "reason": {
                            "type": "illegal_argument_exception",
                            "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
                        }
                    }
                ],
                "caused_by": {
                    "type": "illegal_argument_exception",
                    "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.",
                    "caused_by": {
                        "type": "illegal_argument_exception",
                        "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
                    }
                }
            }
        },
        "rawResponse": {
            "took": 15,
            "timed_out": false,
            "terminated_early": false,
            "num_reduce_phases": 0,
            "_shards": {
                "total": 1,
                "successful": 0,
                "skipped": 0,
                "failed": 1,
                "failures": [
                    {
                        "shard": 0,
                        "index": ".internal.alerts-security.alerts-default-000001",
                        "node": "nDFzgFmYRvmk4IQhJ4zftw",
                        "reason": {
                            "type": "illegal_argument_exception",
                            "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
                        }
                    }
                ]
            },
            "hits": {
                "total": {
                    "value": 0,
                    "relation": "gte"
                },
                "max_score": null,
                "hits": []
            }
        },
        "requestParams": {
            "method": "POST",
            "path": "/.alerts-security.alerts-default/_async_search",
            "querystring": "batched_reduce_size=64&ccs_minimize_roundtrips=true&wait_for_completion_timeout=200ms&keep_on_completion=false&keep_alive=60000ms&ignore_unavailable=true&allow_no_indices=true"
        }
    }
}
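
As the error message itself points out, the window is controlled by the index.max_result_window index setting. Here is a minimal sketch of raising it with the Elasticsearch JS client, purely for illustration (the index name is copied from the failure above; raising the window increases heap pressure on deep pages, so this is a workaround rather than a fix):

import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function raiseResultWindow() {
  // Allow from + size up to 20000 on the alerts index.
  await client.indices.putSettings({
    index: '.internal.alerts-security.alerts-default-000001',
    settings: { 'index.max_result_window': 20000 },
  });
}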

@logeekal
Contributor

logeekal commented Nov 29, 2024

> This is related to the 10k limit that ES has. It happens not only on the last page, but on the first page past 10k elements in the table. For example, in the video below I had 100 elements per page, and as soon as I reached page 101 the error appeared.

@PhilippeOberti, I agree that it is an ES limit, but ES also provides alternatives to avoid this problem.

I think an alternative could be to cap the results at 10,000 instead of showing an error.

@elastic/response-ops team, I think this will affect all consumers of the alerts table because of how privateRuleRegistryAlertsSearchStrategy works. Should we plan to support the scroll API in privateRuleRegistryAlertsSearchStrategy?
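
For reference, the alternative ES recommends for deep pagination is search_after, which pages with a sort cursor instead of an ever-growing from. A rough sketch (not the actual search strategy code; the index name and tiebreaker field are assumptions):

import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Walk the alerts page by page with search_after: each request passes the
// sort values of the previous page's last hit instead of a growing `from`.
async function walkAlerts(pageSize = 100) {
  let cursor: any[] | undefined;
  while (true) {
    const res = await client.search({
      index: '.alerts-security.alerts-default',
      size: pageSize,
      // A unique tiebreaker field keeps the cursor deterministic.
      sort: [{ '@timestamp': 'desc' }, { 'kibana.alert.uuid': 'asc' }],
      ...(cursor ? { search_after: cursor } : {}),
    });
    const hits = res.hits.hits;
    if (hits.length === 0) break;
    // ...hand the page to the table...
    cursor = hits[hits.length - 1].sort;
  }
}

Note that this only supports stepping forward through results, which is why it does not map cleanly onto the table's numbered pagination (as the next comment points out).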

@PhilippeOberti PhilippeOberti added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) and removed Team:Threat Hunting Security Solution Threat Hunting Team Team:Threat Hunting:Investigations Security Solution Investigations Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. labels Dec 3, 2024
@cnasikas
Copy link
Member

cnasikas commented Dec 18, 2024

Hey all. Sorry for the late reply. I agree that we should cap the results at 10K instead of showing an error. I would suggest not using the Scroll API or the search_after API, as ES does not recommend the first and the second does not work with numbered pagination (you cannot fetch results by page/perPage). We can show users a warning banner that only 10K alerts are being shown, and that if they want to view more they should narrow their search criteria. We follow this pattern in cases and in the rule execution log.
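
A minimal sketch of that capping approach, assuming the search strategy clamps the requested page before building the ES request and reports the truncation to the UI (the names here are hypothetical):

const MAX_RESULT_WINDOW = 10000;

interface Pagination {
  pageIndex: number;
  pageSize: number;
}

// Clamp pageIndex so from + size never exceeds the 10k window, and report
// whether we truncated so the UI can show the "only 10K alerts shown" banner.
function clampPagination({ pageIndex, pageSize }: Pagination) {
  const lastAllowedPageIndex = Math.max(0, Math.floor(MAX_RESULT_WINDOW / pageSize) - 1);
  const clampedPageIndex = Math.min(pageIndex, lastAllowedPageIndex);
  return {
    from: clampedPageIndex * pageSize,
    size: pageSize,
    truncated: clampedPageIndex !== pageIndex,
  };
}

// e.g. clampPagination({ pageIndex: 100, pageSize: 100 })
//   -> { from: 9900, size: 100, truncated: true }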
