Automatic SLO Cleanup Mechanism #198776
Comments
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)
Thanks for writing this up. In your use case, what is the scale we're talking about here? How often do you have an SLO that needs deleting vs updating?
From slack (https://elastic.slack.com/archives/C044PV8EJ4X/p1730729974044599?thread_ts=1730725339.130429&cid=C044PV8EJ4X)
Thanks for following up, @drewpost! In our case, the scale is quite large: we’re managing thousands of SLOs, and over time, quite a few become outdated or irrelevant. We usually find that deletions are more common than updates, especially as services evolve or get deprecated. It’s not uncommon for large batches of SLOs to need periodic cleanup.
Related: #195266
Description
Currently, there is no automated cleanup feature for SLOs. As a result, our existing SLOs may not accurately reflect the true reliability of our services. We propose introducing an automated cleanup mechanism so that only relevant and up-to-date SLOs are maintained in the production environment.
Currently, to clean up SLOs, we run an `update_by_query` against the SLO indices. However, we need a more straightforward method for users and customers to clean up their SLOs without added hassle.
Problem Statement:
- SLOs referencing non-existent `group_by` fields, resulting in inaccurate reliability metrics.
- SLOs remaining in `no_data` for extended periods. An automatic removal of SLOs with a `no_data` status for more than X hours would help maintain only meaningful and actionable SLOs.
Ideas/Solutions:
- `no_data` Status: Allow SLOs with a `no_data` status to be automatically removed if this condition persists for more than a configurable duration (e.g., X hours).
- `group_by` Fields: Implement checks to ensure that SLOs referencing non-existent `group_by` fields are either flagged for review or automatically removed, depending on the configuration.
Benefits
This feature would help maintain a cleaner, more accurate set of SLOs containing only those that actually matter and work. By reducing the need for manual cleanup, it would also let engineers focus on other critical tasks, improving overall productivity.
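To make the two rules under Ideas/Solutions concrete, here is a minimal sketch of how a cleanup decision could be made per SLO. It is not Kibana's implementation: `SloSnapshot`, `CleanupPolicy`, `Action`, and `decide` are hypothetical names, the fields are a simplified stand-in for whatever the SLO summary actually stores, and `max_no_data` plays the role of the configurable "X hours".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import AbstractSet, Optional, Tuple


class Action(Enum):
    KEEP = "keep"
    FLAG = "flag"      # surface for manual review
    DELETE = "delete"  # remove automatically


@dataclass(frozen=True)
class SloSnapshot:
    """Hypothetical, simplified view of one SLO (not the real schema)."""
    slo_id: str
    status: str                         # e.g. "healthy" or "no_data"
    no_data_since: Optional[datetime]   # when the SLO entered no_data, if known
    group_by: Tuple[str, ...]           # fields the SLO is grouped by


@dataclass(frozen=True)
class CleanupPolicy:
    """Assumed configuration knobs for the proposed mechanism."""
    max_no_data: timedelta              # the configurable "X hours"
    missing_group_by_action: Action     # Action.FLAG or Action.DELETE


def decide(
    slo: SloSnapshot,
    policy: CleanupPolicy,
    known_fields: AbstractSet[str],
    now: datetime,
) -> Action:
    """Return what the cleanup job should do with a single SLO."""
    # Rule 1: no_data has persisted longer than the configured duration.
    if (
        slo.status == "no_data"
        and slo.no_data_since is not None
        and now - slo.no_data_since > policy.max_no_data
    ):
        return Action.DELETE

    # Rule 2: the SLO groups by a field that no longer exists in the source data.
    if any(field not in known_fields for field in slo.group_by):
        return policy.missing_group_by_action

    return Action.KEEP
```

Keeping the decision as a pure function of the SLO, the policy, and the current time makes it easy to offer a dry-run mode that only lists what would be flagged or deleted before anything is removed.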