Add docs about timeslice SLI #3604

Merged 1 commit on Feb 20, 2024
37 changes: 34 additions & 3 deletions docs/en/observability/slo-create.asciidoc
@@ -26,6 +26,7 @@ The type of SLI to use depends on the location of your data:

* <<custom-kql-sli, Custom KQL>> — create an SLI based on raw logs coming from your services.
* <<custom-metric-sli, Custom metric>> — create an SLI to define custom equations from metric fields in your indices.
* <<timeslice-metric-sli, Timeslice metric>> — create an SLI based on a custom equation that uses multiple aggregations.
* <<histogram-metric-sli, Histogram metric>> — create an SLI based on histogram metrics.
* <<apm-latency-and-availability-sli, APM latency and APM availability>> — create an SLI based on services using application performance monitoring (APM).

@@ -44,7 +45,7 @@ When defining a custom KQL SLI, set the following fields:
* *Query filter* — A KQL filter to specify relevant criteria by which to filter the index documents.
* *Good query* — The query yielding events that are considered good or successful. For example, `nested.field.response.latency <= 100 and nested.field.env : "production"`.
* *Total query* — The query yielding all events to take into account for computing the SLI. For example, `nested.field.env : "production"`.
-* *Partition by* — The field used to partition the data based on the values of the specific field. For example, you could partition by the `url.domain` field, which would create individual SLOs for each value of the selected field.
+* *Group by* — The field used to group the data based on the values of the specific field. For example, you could group by the `url.domain` field, which would create individual SLOs for each value of the selected field.
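
The relationship between the good query, the total query, and the group-by field can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how Kibana computes the SLI: the sample events are made up, and the helper names `is_total`, `is_good`, and `sli_by_group` are hypothetical.

```python
from collections import defaultdict

# Made-up events standing in for indexed documents (illustrative only).
events = [
    {"nested.field.response.latency": 80, "nested.field.env": "production", "url.domain": "a.example.com"},
    {"nested.field.response.latency": 120, "nested.field.env": "production", "url.domain": "a.example.com"},
    {"nested.field.response.latency": 95, "nested.field.env": "production", "url.domain": "b.example.com"},
    {"nested.field.response.latency": 50, "nested.field.env": "staging", "url.domain": "b.example.com"},
]

def is_total(event):
    """Stand-in for the total query: nested.field.env : "production"."""
    return event["nested.field.env"] == "production"

def is_good(event):
    """Stand-in for the good query: latency <= 100 and env is production."""
    return is_total(event) and event["nested.field.response.latency"] <= 100

def sli_by_group(events, key):
    """Return good/total per distinct value of `key`, like one SLO per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [good, total]
    for event in events:
        if not is_total(event):
            continue
        counts[event[key]][1] += 1
        if is_good(event):
            counts[event[key]][0] += 1
    return {group: good / total for group, (good, total) in counts.items()}

overall_sli = sum(is_good(e) for e in events) / sum(is_total(e) for e in events)
per_domain_sli = sli_by_group(events, "url.domain")
print(overall_sli, per_domain_sli)
```

The staging event is excluded by the total query, so it affects neither the overall ratio nor any per-domain ratio.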

[discrete]
[[custom-metric-sli]]
@@ -68,7 +69,37 @@ When defining a custom metric SLI, set the following fields:
** *Metric [A-Z]* — The field that is aggregated using the `sum` aggregation for total events. For example, `processor.processed`.
** *Filter [A-Z]* — The filter to apply to the metric for total events. For example, `"processor.outcome: *"`.
** *Equation* — The equation that calculates the total metric. For example, `A`.
-* *Partition by* — The field used to partition the data based on the values of the specific field. For example, you could partition by the `url.domain` field, which would create individual SLOs for each value of the selected field.
+* *Group by* — The field used to group the data based on the values of the specific field. For example, you could group by the `url.domain` field, which would create individual SLOs for each value of the selected field.

[discrete]
[[timeslice-metric-sli]]
== Timeslice metric

Create an indicator based on a custom equation that uses statistical aggregations and a threshold to determine whether a slice is good or bad.
Supported aggregations include `Average`, `Max`, `Min`, `Sum`, `Cardinality`, `Last value`, `Std. deviation`, `Doc count`, and `Percentile`.
The equation supports basic math and logic.

NOTE: This indicator requires you to use the `Timeslices` budgeting method.

*Example:* You can define an indicator to determine whether a Kubernetes StatefulSet is healthy.
First, set the query filter to `orchestrator.cluster.name: "elastic-k8s" AND kubernetes.namespace: "my-ns" AND data_stream.dataset: "kubernetes.state_statefulset"`.
Then, define an equation that compares the number of ready (healthy) replicas to the number of observed replicas:
`A == B ? 1 : 0`, where `A` retrieves the last value of `kubernetes.statefulset.replicas.ready` and `B` retrieves the last value of `kubernetes.statefulset.replicas.observed`.
The equation returns `1` if the condition `A == B` is true (indicating the same number of replicas) or `0` if it's false. A value less than `1` therefore indicates that the Kubernetes StatefulSet is unhealthy for that slice.
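
The per-slice logic of this example can be sketched in Python. This is a minimal illustration, not how Kibana evaluates the equation: the function names `slice_value` and `is_good_slice`, the comparator table, the sample replica counts, and the choice of `>=` with a threshold of `1` are all assumptions made for the sketch.

```python
import operator

# Hypothetical comparator table; the symbols are illustrative only.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

def slice_value(ready, observed):
    """Mirror the equation `A == B ? 1 : 0` for a single time slice."""
    return 1 if ready == observed else 0

def is_good_slice(value, comparator=">=", threshold=1):
    """Apply the comparator and threshold to decide if the slice is good."""
    return COMPARATORS[comparator](value, threshold)

# Made-up (ready, observed) last values for four consecutive slices.
slices = [(3, 3), (2, 3), (3, 3), (3, 3)]

good_flags = [is_good_slice(slice_value(a, b)) for a, b in slices]
sli = sum(good_flags) / len(good_flags)  # share of good slices
print(good_flags, sli)
```

With the timeslices budgeting method, the share of good slices is what the SLO target is compared against, so one slice with a missing replica lowers the SLI by one slice's worth.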

When defining a timeslice metric SLI, set the following fields:

* *Source*
** *Index* — The data view or index pattern you want to base the SLI on. For example, `metrics-*:metrics-*`.
** *Timestamp field* — The timestamp field used by the index.
** *Query filter* — A KQL filter to specify relevant criteria by which to filter the index documents. For example, `orchestrator.cluster.name: "elastic-k8s" AND kubernetes.namespace: "my-ns" AND data_stream.dataset: "kubernetes.state_statefulset"`.
* *Metric definition*
** *Aggregation [A-Z]* — The type of aggregation to use.
** *Field [A-Z]* — The field to use in the aggregation. For example, `kubernetes.statefulset.replicas.ready`.
** *Filter [A-Z]* — The filter to apply to the metric.
** *Equation* — The equation that calculates the metric value for each slice. For example, `A == B ? 1 : 0`.
** *Comparator* - The type of comparison to perform.
** *Threshold* - The value to use along with the comparator to determine if the slice is good or bad.
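
To see how these fields fit together, the StatefulSet example can be written out as a plain Python dictionary with a small consistency check. This is only an illustrative sketch: the key names mirror the UI labels above and are not the Kibana API schema, and the `@timestamp` field, the `>=` comparator, the threshold of `1`, and the helper `referenced_metrics` are assumptions.

```python
import re

# Illustrative structure only; keys follow the UI labels, not an API schema.
timeslice_sli = {
    "source": {
        "index": "metrics-*:metrics-*",
        "timestamp_field": "@timestamp",  # assumed field name
        "query_filter": (
            'orchestrator.cluster.name: "elastic-k8s" '
            'AND kubernetes.namespace: "my-ns" '
            'AND data_stream.dataset: "kubernetes.state_statefulset"'
        ),
    },
    "metric_definition": {
        "metrics": {
            "A": {"aggregation": "Last value", "field": "kubernetes.statefulset.replicas.ready"},
            "B": {"aggregation": "Last value", "field": "kubernetes.statefulset.replicas.observed"},
        },
        "equation": "A == B ? 1 : 0",
        "comparator": ">=",  # assumed
        "threshold": 1,      # assumed
    },
}

def referenced_metrics(equation):
    """Return the single-letter metric names (A-Z) used in an equation."""
    return set(re.findall(r"\b[A-Z]\b", equation))

definition = timeslice_sli["metric_definition"]
missing = referenced_metrics(definition["equation"]) - set(definition["metrics"])
print(missing)
```

An empty `missing` set means every letter used in the equation has a matching metric definition.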

[discrete]
[[histogram-metric-sli]]
@@ -98,7 +129,7 @@ When defining a histogram metric SLI, set the following fields:
** *From* — (`range` aggregation only) The starting value of the range for total events. For example, `0`.
** *To* — (`range` aggregation only) The ending value of the range for total events. For example, `100`.
** *KQL filter* — The filter for total events. For example, `"processor.outcome : *"`.
-* *Partition by* — The field used to partition the data based on the values of the specific field. For example, you could partition by the `url.domain` field, which would create individual SLOs for each value of the selected field.
+* *Group by* — The field used to group the data based on the values of the specific field. For example, you could group by the `url.domain` field, which would create individual SLOs for each value of the selected field.

[discrete]
[[apm-latency-and-availability-sli]]