[Security Solution] Smart limits for the package with prebuilt rules #187645

Closed
4 tasks done
Tracked by #174168
banderror opened this issue Jul 5, 2024 · 7 comments
Labels
8.18 candidate Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.17.0 v8.18.0

Comments

@banderror
Contributor

banderror commented Jul 5, 2024

Epics: https://github.com/elastic/security-team/issues/1974 (internal), #174168

Summary

We recently had an incident in Serverless where Kibana instances crashed with an out-of-memory (OOM) error while installing the security_detection_engine Fleet package, which Security Solution uses to distribute prebuilt detection rules. Fleet loads a whole package into memory before installing its assets, and this package had grown too big for that. We mitigated the incident by temporarily cutting the number of assets in the package by ~50%. However, this is a short-term measure that we cannot keep for long: with the current limit of 2 versions per rule in the package, we won't be able to release Milestone 3 of the prebuilt rule customization feature.

Before we can release Milestone 3, we need to bring the number of versions per rule shipped in the package back up. In general, the more versions we ship, the better the UX for upgrading prebuilt rules; the fewer versions we ship, the lighter the package, which also improves the UX and increases reliability.

Our goal is to find a balance between reliability and good UX and achieve both. For that, we need to come up with smart and efficient limits for the package with prebuilt rules.

Ideas

Total limits for the package as a whole:

  • Total number of package assets (in our case, rule versions). Currently set to 15000 in Kibana for all Fleet packages. We might want to enforce this limit on the package side and set it to a lower value.
  • Total size of the package in megabytes.
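The two package-wide limits could be enforced on the package side with a check like the minimal sketch below. The names `PackageLimits` and `check_package_limits` are hypothetical, and the 100 MB default is a placeholder rather than a value proposed in this issue; only the 15000 asset limit comes from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PackageLimits:
    # 15000 is the Kibana-side limit mentioned above; a package-side
    # limit would likely be lower.
    max_assets: int = 15000
    # Placeholder value (assumption), not taken from the issue.
    max_size_mb: float = 100.0


def check_package_limits(asset_sizes_bytes: List[int], limits: PackageLimits) -> List[str]:
    """Return a list of human-readable violations (empty if the package fits)."""
    violations = []

    asset_count = len(asset_sizes_bytes)
    if asset_count > limits.max_assets:
        violations.append(
            f"too many assets: {asset_count} > {limits.max_assets}"
        )

    total_mb = sum(asset_sizes_bytes) / (1024 * 1024)
    if total_mb > limits.max_size_mb:
        violations.append(
            f"package too large: {total_mb:.1f} MB > {limits.max_size_mb} MB"
        )

    return violations
```

A build step could fail when the returned list is non-empty, so an oversized package never reaches the registry.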

Per rule limits:

  • Hard cap: <= X number of versions no matter what. Exclude older versions and keep newer ones.
  • Hard time cap: exclude versions older than now - X days.
  • "Exponential" time caps: keep <= X versions created within last 3 months, <= Y within last 6 months, <= Z within last 12 months, etc; X < Y < Z < ..., e.g. 4 < 6 < 7. Time ranges grow exponentially while limits grow slower than that: logarithmically or linearly.
  • Time window cap: exclude versions if Elastic published more than X of them within a Y time window, e.g. more than 2 per 3 months. This could keep "noisy" rules (rules that are updated very frequently) from "eating" too much of the package's space, and keep newer versions of a noisy rule from evicting older versions of the same rule.
  • We could include more versions for more popular rules (rules that users install more often) and fewer versions for less popular rules.
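As a rough illustration of how the first four per-rule ideas differ, here is a minimal sketch that applies each one to the versions of a single rule. Everything in it is an assumption made for illustration: the `RuleVersion` shape, the function names, the choice to drop versions older than the last tier, and the use of fixed (not sliding) windows counted back from `now` for the time window cap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class RuleVersion:
    rule_id: str
    version: int
    published_at: datetime


def newest_first(versions: Sequence[RuleVersion]) -> List[RuleVersion]:
    return sorted(versions, key=lambda v: v.published_at, reverse=True)


def hard_cap(versions: Sequence[RuleVersion], max_versions: int) -> List[RuleVersion]:
    """Keep at most `max_versions` of the newest versions."""
    return newest_first(versions)[:max_versions]


def hard_time_cap(versions: Sequence[RuleVersion], now: datetime,
                  max_age_days: int) -> List[RuleVersion]:
    """Drop versions older than now - max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [v for v in newest_first(versions) if v.published_at >= cutoff]


def tiered_time_caps(versions: Sequence[RuleVersion], now: datetime,
                     tiers: Sequence[Tuple[int, int]]) -> List[RuleVersion]:
    """"Exponential" caps. `tiers` holds (max_age_days, cumulative_limit)
    pairs sorted by age, e.g. ((90, 4), (180, 6), (365, 7)).
    Assumption: versions older than the last tier are dropped."""
    kept: List[RuleVersion] = []
    for v in newest_first(versions):
        age = now - v.published_at
        # Everything kept so far is newer, so it falls inside every window
        # that contains `v`; the smallest such window has the tightest limit.
        limit = next((lim for days, lim in tiers if age <= timedelta(days=days)), None)
        if limit is not None and len(kept) < limit:
            kept.append(v)
    return kept


def time_window_cap(versions: Sequence[RuleVersion], now: datetime,
                    window_days: int, max_per_window: int) -> List[RuleVersion]:
    """Keep at most `max_per_window` newest versions per fixed window."""
    kept: List[RuleVersion] = []
    per_window: Dict[int, int] = {}
    for v in newest_first(versions):
        window = (now - v.published_at).days // window_days
        if per_window.get(window, 0) < max_per_window:
            per_window[window] = per_window.get(window, 0) + 1
            kept.append(v)
    return kept
```

With the example limits from the list (4 within 3 months, 6 within 6 months, 7 within 12 months), `tiered_time_caps` never keeps more than 7 versions of a rule, while `time_window_cap` bounds only the density of versions and lets the total grow with the rule's age. The strategies can also be chained, for example a time window cap followed by a hard cap.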

Todo

@banderror banderror added triage_needed Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area labels Jul 5, 2024
@elasticmachine
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@elasticmachine
Contributor

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

@banderror banderror changed the title from "[Security Solution] Efficient limits for the number of prebuilt rule assets in the package (DRAFT)" to "[Security Solution] Efficient limits for the package with prebuilt rules (DRAFT)" Jul 5, 2024
@banderror banderror changed the title from "[Security Solution] Efficient limits for the package with prebuilt rules (DRAFT)" to "[Security Solution] Smart limits for the package with prebuilt rules" Jul 5, 2024
@approksiu

Research update notes in the doc.

@brokensound77
Contributor

As an alternative to being limited to what is included in a single package, can we revisit additional rules being fetched from older EPR packages as needed within the specific workflows? I imagine this mostly applies when diving into a single rule?

  • click rule -> load rule
  • click load historical -> iterate EPR and fetch versions
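The two-step workflow above could look roughly like the sketch below. It deliberately does not call any real EPR endpoint: `fetch_rule_assets` is a hypothetical callable that the caller injects, taking a package version and a rule ID and returning that rule's assets from that package as dicts with a `"version"` key. The function name and the asset shape are assumptions as well.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: (package_version, rule_id) -> rule assets found
# in that package, each a dict with at least a "version" key.
FetchRuleAssets = Callable[[str, str], List[dict]]


def load_historical_versions(rule_id: str,
                             package_versions: Iterable[str],
                             fetch_rule_assets: FetchRuleAssets) -> List[dict]:
    """Iterate over older packages (newest first) and collect every distinct
    version of one rule. Intended to run only on demand, e.g. when a user
    clicks "load historical" for a single rule."""
    by_version: Dict[int, dict] = {}
    for package_version in package_versions:
        for asset in fetch_rule_assets(package_version, rule_id):
            # The same rule version may ship in several packages;
            # keep the copy from the newest package that contains it.
            by_version.setdefault(asset["version"], asset)
    return [by_version[v] for v in sorted(by_version, reverse=True)]
```

Because only one rule is loaded at a time, memory use is bounded by the number of versions of that rule instead of by the size of a whole package.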

@approksiu

Research results here, to be discussed at the product meeting.

@banderror
Contributor Author

I'm closing this one as the v8.17.1 package with all historical rule versions has been released. Whether or not a global limit has been set is minor compared to that: with the current number of assets in the package (~8000), we have some room for growth before we start having issues with package installation.

@approksiu @xcrzx I think we'll revisit the limit question a few months from now and open a separate ticket for that.

5 participants