[SLO] SLO ops management page #195266

Open
kdelemme opened this issue Oct 7, 2024 · 1 comment
Labels: Meta, Team:obs-ux-management (Observability Management User Experience Team)

kdelemme commented Oct 7, 2024

🍒 Summary

The SLO ops management page would give users a single place to manage all their SLOs and to spot potential issues with the underlying infrastructure powering them, e.g. ingest pipelines and transforms.
This page would focus on SLO definitions rather than on SLO instances, which is what the current SLO listing page shows.

Available bulk actions:

  • Bulk delete SLO
  • Bulk reset SLO
  • Bulk delete stale SLO instances

The page lists the SLO definitions including:

  • SLO name
  • Ops Health Status
  • Number of instances
  • Other?

Misc:

  • Pagination
  • Searching by slo.name
  • Searching by slo.id
  • Filter by status
  • Sorting by number of instances

Questions

  • Expectation around search?
    • To investigate issues with an SLO, would you search by SLO id, name, or instanceId?
    • Do we expect to surface problematic SLOs first, i.e. sort by Ops Health Status?

Ops Health Status

Note

This field provides the overall health status of the SLO from an operational standpoint. If possible, it should be shown as a red-yellow-green light.

This field is computed using the following checks:

  • SLO Version Model is up to date
  • Both transforms exist and are healthy
  • Both ingest pipelines exist
  • Delay between the last SLI document's @timestamp and its event.ingested is within reason
  • Other?
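
One way the checks above could roll up into a red-yellow-green value is sketched below. Everything here is an assumption for discussion: the `OpsHealthChecks` shape, the `computeOpsHealthStatus` name, the choice to treat missing or unhealthy transforms and pipelines as red and the remaining checks as yellow, and the five-minute delay threshold standing in for "within reason".

```typescript
type HealthStatus = "green" | "yellow" | "red";

// Hypothetical input: one field per check listed in the issue.
interface OpsHealthChecks {
  versionUpToDate: boolean; // SLO version model is up to date
  transformsHealthy: boolean; // both transforms exist and are healthy
  pipelinesExist: boolean; // both ingest pipelines exist
  ingestDelayMs: number; // event.ingested minus @timestamp of the last SLI doc
}

// Assumed threshold; the issue only says "within reason".
const DEFAULT_MAX_INGEST_DELAY_MS = 5 * 60 * 1000;

function computeOpsHealthStatus(
  checks: OpsHealthChecks,
  maxIngestDelayMs: number = DEFAULT_MAX_INGEST_DELAY_MS
): HealthStatus {
  // Assumption: broken infrastructure means the SLO cannot be computed at all.
  if (!checks.transformsHealthy || !checks.pipelinesExist) {
    return "red";
  }
  // Assumption: an outdated model or late data degrades the SLO but does not break it.
  if (!checks.versionUpToDate || checks.ingestDelayMs > maxIngestDelayMs) {
    return "yellow";
  }
  return "green";
}
```

Keeping the individual check results next to the rolled-up status would let the page explain why an SLO is not green.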

APIs

Note

Some existing APIs might be reusable.

List SLO definitions

The current GET /_definitions route uses the SLO Repository directly. On this Ops page, we need to be able to filter by the Ops Status, which requires starting from the summary index and then merging the results with the SLO definitions retrieved from the SLO Repository.
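
The merge step described above could look roughly like the following sketch. It assumes the summary query has already been filtered by Ops Status and returns one hit per SLO; `SummaryHit`, `SloDefinition`, `OpsListItem`, and `mergeSummaryWithDefinitions` are hypothetical names, not existing Kibana types or functions.

```typescript
type OpsStatus = "green" | "yellow" | "red";

// Hypothetical per-SLO result from the summary index, already filtered by status.
interface SummaryHit {
  sloId: string;
  opsStatus: OpsStatus;
  instanceCount: number;
}

// Hypothetical minimal view of a definition loaded from the SLO Repository.
interface SloDefinition {
  id: string;
  name: string;
}

interface OpsListItem {
  id: string;
  name: string;
  opsStatus: OpsStatus;
  instanceCount: number;
}

function mergeSummaryWithDefinitions(
  hits: SummaryHit[],
  definitions: SloDefinition[]
): OpsListItem[] {
  // Index definitions by id so each hit is resolved in constant time.
  const byId = new Map<string, SloDefinition>();
  for (const definition of definitions) {
    byId.set(definition.id, definition);
  }

  const items: OpsListItem[] = [];
  for (const hit of hits) {
    const definition = byId.get(hit.sloId);
    // Assumption: summary hits without a matching definition are skipped.
    if (definition === undefined) continue;
    items.push({
      id: definition.id,
      name: definition.name,
      opsStatus: hit.opsStatus,
      instanceCount: hit.instanceCount,
    });
  }
  return items;
}
```

Because the result follows the order of the summary hits, any sorting applied by the summary query is preserved after the merge.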

@kdelemme added the Team:obs-ux-management (Observability Management User Experience Team) label on Oct 7, 2024
@elasticmachine

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@kdelemme self-assigned this on Oct 7, 2024
@kdelemme changed the title from "[SLO] SLO management page" to "[SLO] SLO ops management page" on Nov 12, 2024

4 participants