[SLO] Add info on "SLO Overview" panel for custom dashboards #3331

Merged
merged 3 commits into from
Nov 9, 2023
6 changes: 4 additions & 2 deletions docs/en/observability/slo-burn-rate-alert.asciidoc
@@ -7,10 +7,12 @@

beta::[]

include::slo-overview.asciidoc[tag=slo-license]

You can create an SLO burn rate rule to get alerts when the burn rate is above a defined threshold for two different lookback periods: a long period and a short period that is 1/12th of the long period.
For example, if your long lookback period is one hour, your short lookback period is five minutes.

For each lookback period, the burn rate is computed as the error rate divided by the error budget.
When the burn rates for both periods surpass the threshold, an alert is triggered.
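
The rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the rule is implemented in the product: the function names are hypothetical, the error budget is taken to be `1 - target`, and the threshold of `14.4` is an arbitrary example value.

```python
def short_lookback_minutes(long_lookback_minutes: float) -> float:
    """The short lookback period is 1/12th of the long one."""
    return long_lookback_minutes / 12


def burn_rate(error_rate: float, slo_target: float) -> float:
    """Burn rate = error rate / error budget, with error budget = 1 - target (assumed)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_rate / (1.0 - slo_target)


def should_alert(long_error_rate: float, short_error_rate: float,
                 slo_target: float, threshold: float) -> bool:
    """Alert only when BOTH lookback periods exceed the threshold."""
    return (burn_rate(long_error_rate, slo_target) > threshold
            and burn_rate(short_error_rate, slo_target) > threshold)


# Hypothetical numbers: 99% target, 1h long window, 5m short window.
print(short_lookback_minutes(60))
print(should_alert(0.2, 0.3, slo_target=0.99, threshold=14.4))
print(should_alert(0.2, 0.05, slo_target=0.99, threshold=14.4))
```

Requiring both windows to exceed the threshold means a spike that has already subsided (high long-window rate, low short-window rate) does not trigger an alert.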

To create an SLO burn rate rule, go to *Observability → SLOs*. Click the more options icon to the right of the SLO you want to add a burn rate rule for, and select *Create new alert rule* from the drop-down menu:
46 changes: 29 additions & 17 deletions docs/en/observability/slo-create.asciidoc
@@ -7,9 +7,9 @@

beta::[]

IMPORTANT: Before creating an SLO, you need to <<slo-privileges, configure your SLO access>>.
include::slo-overview.asciidoc[tag=slo-license]

To create an SLO, go to *Observability → SLOs*:

* If you're creating your first SLO, you'll see an introductory page. Click the *Create SLO* button.
* If you've created SLOs before, click the *Create new SLO* button in the upper-right corner of the page.
@@ -26,18 +26,18 @@ From here, complete the following steps:

The type of SLI to use depends on the location of your data:

* <<custom-kql-sli, Custom KQL>> — create an SLI based on raw logs coming from your services.
* <<custom-metric-sli, Custom metric>> — create an SLI to define custom equations from metric fields in your indices.
* <<histogram-metric-sli, Histogram metric>> — create an SLI based on histogram metrics.
* <<apm-latency-and-availability-sli, APM latency and APM availability>> — create an SLI based on services using application performance monitoring (APM).

[discrete]
[[custom-kql-sli]]
== Custom KQL

Create an indicator based on any of your {es} indices or data views. You define two queries: one that yields the good events from your index, and one that yields the total events from your index.

*Example:* You can define a custom KQL indicator based on the `service-logs` with the *good query* defined as `nested.field.response.latency <= 100 and nested.field.env : "production"` and the *total query* defined as `nested.field.env : "production"`.
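
To make the relationship between the two queries concrete, here is a minimal Python sketch that mimics the good and total queries over a handful of in-memory events. The sample events, the flattened field names, and the helper functions are all hypothetical; in practice the queries are evaluated against your index, not in Python.

```python
from typing import Optional

# Hypothetical sample events standing in for documents in `service-logs`.
events = [
    {"env": "production", "latency": 80},
    {"env": "production", "latency": 100},
    {"env": "production", "latency": 250},
    {"env": "staging", "latency": 50},
    {"env": "production", "latency": 120},
]


def matches_total(event: dict) -> bool:
    """Stand-in for the total query: env is "production"."""
    return event["env"] == "production"


def matches_good(event: dict) -> bool:
    """Stand-in for the good query: latency <= 100 and env is "production"."""
    return matches_total(event) and event["latency"] <= 100


def sli(events: list) -> Optional[float]:
    """SLI = good events / total events; None when there are no total events."""
    total = sum(1 for e in events if matches_total(e))
    good = sum(1 for e in events if matches_good(e))
    return good / total if total else None


print(sli(events))
```

Because the good query repeats the total query's condition, every good event is also a total event, so the ratio stays between 0 and 1.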

When defining a custom KQL SLI, set the following fields:

@@ -54,7 +54,7 @@ When defining a custom KQL SLI, set the following fields:

Create an indicator to define custom equations from metric fields in your indices.

*Example:* You can define *Good events* as the sum of the field `processor.processed` with a filter of `"processor.outcome: \"success\""`, and the *Total events* as the sum of `processor.processed` with a filter of `"processor.outcome: *"`.
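
As a rough illustration of this example, the following Python sketch sums a metric field over filtered documents instead of counting events. The documents and the helper name `sum_field` are hypothetical, and the filters are plain Python predicates standing in for the KQL filters above.

```python
# Hypothetical documents; each carries a pre-counted `processed` metric.
docs = [
    {"outcome": "success", "processed": 90},
    {"outcome": "failure", "processed": 10},
    {"outcome": "success", "processed": 50},
]


def sum_field(docs, field, predicate):
    """Sum `field` over the documents accepted by `predicate`."""
    return sum(d[field] for d in docs if predicate(d))


# Good events: sum of `processed` where the outcome is "success".
good = sum_field(docs, "processed", lambda d: d["outcome"] == "success")
# Total events: sum of `processed` where an outcome exists at all.
total = sum_field(docs, "processed", lambda d: "outcome" in d)

print(good, total, good / total)
```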

When defining a custom metric SLI, set the following fields:

@@ -76,11 +76,11 @@ When defining a custom metric SLI, set the following fields:
[[histogram-metric-sli]]
== Histogram metric

Histograms record data in a compressed format and can record latency and delay metrics. You can create an SLI based on histogram metrics using a `range` aggregation or a `value_count` aggregation for both the good and total events. Filtering with KQL queries is supported on both event types.

When using a `range` aggregation, both the `from` and `to` thresholds are required for the range and the events are the total number of events within that range. The range includes the `from` value and excludes the `to` value.
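
The boundary behavior of the range can be illustrated with a short Python sketch. It is a simplification: it counts raw values rather than pre-aggregated histogram buckets, and the function name and sample latencies are hypothetical.

```python
def count_in_range(values, from_value, to_value):
    """Count values in the half-open interval [from_value, to_value)."""
    return sum(1 for v in values if from_value <= v < to_value)


# Hypothetical latencies; 0 is included, 100 is excluded.
latencies = [0, 50, 99.9, 100, 150]
print(count_in_range(latencies, 0, 100))
```

With *From* set to `0` and *To* set to `100`, an event with a value of exactly `100` is not counted as good.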

*Example:* You can define your *Good events* using the `processor.latency` field with a filter of `"processor.outcome: \"success\""`, and your *Total events* using the `processor.latency` field with a filter of `"processor.outcome: *"`.

When defining a histogram metric SLI, set the following fields:

@@ -89,7 +89,7 @@ When defining a histogram metric SLI, set the following fields:
** *Timestamp field* — The timestamp field used by the index.
** *Query filter* — A KQL filter to specify relevant criteria by which to filter the index documents. For example, `field.environment : "production" and service.name : "my-service"`.
* *Good events*
** *Aggregation* — The type of aggregation to use for good events, either *Value count* or *Range*.
** *Field* — The field used to aggregate events considered good or successful. For example, `processor.latency`.
** *From* — (`range` aggregation only) The starting value of the range for good events. For example, `0`.
** *To* — (`range` aggregation only) The ending value of the range for good events. For example, `100`.
@@ -110,15 +110,15 @@ When defining a histogram metric SLI, set the following fields:
[[apm-latency-sli]]
=== APM latency

Create an indicator based on the APM data that you received from your instrumented services and a latency threshold.

*Example:* You can define an indicator on an APM service named `banking-service` for the `production` environment, and the transaction name `POST /deposit` with a latency threshold value of 300ms.

[discrete]
[[apm-availability-sli]]
=== APM availability

Create an indicator based on the APM data received from your instrumented services.

*Example:* You can define an indicator on an APM service named `search-service` for the `production` environment, and the transaction name `POST /search`.

@@ -143,7 +143,7 @@ After defining your SLI, you need to set your objectives. To set your objectives
[discrete]
[[slo-budgeting-method]]
== Select your budgeting method
You can select either an *occurrences* or a *timeslices* budgeting method:

[cols="1,1"]
|===
@@ -166,7 +166,7 @@ If the SLO target is 98%, we have a `100-98 = 2%` error budget or `8640 * 0.02 =
[discrete]
[[slo-time-window]]
== Set your time window
Select the duration over which you want to compute your SLO. The time window uses the data from the defined rolling period, for example, the last 30 days.

[discrete]
[[slo-target]]
@@ -180,7 +180,19 @@ After setting your objectives, give your SLO a name, a short description, and ad

[discrete]
[[slo-alert-checkbox]]
= SLO burn rate alert rule
= Create an SLO burn rate alert rule

When the *Create an SLO burn rate alert rule* checkbox is selected, the *Create rule* window opens immediately after you click the *Create SLO* button.
Here you can define your SLO burn rate alert rule.
For more information, see <<slo-burn-rate-alert, Create an SLO burn rate rule>>.

[discrete]
[[slo-dashboard]]
= Add an SLO Overview panel to a custom dashboard

After you've created your SLO, you can monitor it from the _SLOs_ page in Observability,
but you can also add an _SLO Overview_ panel to a custom dashboard.
Read more about dashboards in {kibana-ref}/dashboard.html[Dashboard and visualizations].

[role="screenshot"]
image::images/slo-overview-embeddable-widget.png[Using the Add panel button to add an SLO Overview widget to a dashboard]
13 changes: 10 additions & 3 deletions docs/en/observability/slo-overview.asciidoc
@@ -7,16 +7,23 @@

beta::[]

// tag::slo-license[]
[IMPORTANT]
====
To create and manage SLOs, you need an {subscriptions}[appropriate license] and <<slo-privileges,SLO access>> must be configured.
====
Comment on lines +11 to +14
Contributor Author

@grabowskit confirmed that SLO access is limited to Platinum/Enterprise/Trial users so I added a note to the top of all pages that cover SLOs:

To create and manage SLOs, you need an appropriate license and SLO access must be configured.

Member

I'm not opposed to this approach, but this would be a new pattern that, AFAIK, we don't replicate elsewhere in the docs. If we move forward with adding a banner to the top of every SLO page because the feature requires an elevated license, we should consider doing the same for other features in Obs that have the same requirement. For example:

Screenshot 2023-11-08 at 3 35 50 PM

// end::slo-license[]

SLOs allow you to set clear, measurable targets for your service performance, based on factors like availability, response times, error rates, and other key metrics.
You can define SLOs based on different types of data sources, such as custom KQL queries and APM latency or availability data.

Once you've defined your SLOs, you can monitor them in real time, with detailed dashboards and alerts that help you quickly identify and troubleshoot any issues that may arise.
You can also track your progress against your SLO targets over time, with a clear view of your error budgets and burn rates.

[discrete]
[[slo-important-concepts]]
== Important concepts
The following table lists some important concepts related to SLOs:

[horizontal]
Service-level indicator (SLI):: The measurement of your service's performance, such as service latency or availability.
8 changes: 5 additions & 3 deletions docs/en/observability/slo-privileges.asciidoc
@@ -1,12 +1,14 @@
[[slo-privileges]]
= Configure service-level objective (SLO) access

++++
<titleabbrev>Configure SLO access</titleabbrev>
++++

beta::[]

IMPORTANT: To create and manage SLOs, you need an {subscriptions}[appropriate license].

You can create the following roles for your SLOs:

* <<slo-all-access,*SLO All*>> — Create, edit, and manage SLOs and their historical summaries.
@@ -31,7 +33,7 @@ Set the following privileges for the SLO All role:
+
[role="screenshot"]
image::images/slo-es-priv-all.png[Cluster and index privileges for SLO All role]
. In the *Kibana* section, click *Add Kibana privilege*.
. From the *Spaces* dropdown, either select any specific spaces you want the role to apply to, or select *All Spaces*.
. Set *Observability → SLOs* to `All`.
+
@@ -48,7 +50,7 @@ Set the following privileges for the SLO Read role:
+
[role="screenshot"]
image::images/slo-es-priv-read.png[Index privileges for SLO Read role]
. In the *Kibana* section, click *Add Kibana privilege*.
. From the *Spaces* dropdown, either select any specific spaces you want the role to apply to, or select *All Spaces*.
. Set *Observability → SLOs* to `Read`.
+