docs:alerting match the new severities #2108

Merged
merged 5 commits into from
Nov 6, 2023
30 changes: 26 additions & 4 deletions docs/source/procedures/alerts/alerting.rst
@@ -19,9 +19,9 @@ The Alertmanager also acts as a data source for Grafana so the active alerts ar

Prometheus Alerts
^^^^^^^^^^^^^^^^^
The Prometheus alerts are defined in the `prometheus.rules.yml` file that is located in the prometheus directory.
The Prometheus alerts are defined in files placed under `prometheus/prom_rules/`. Prometheus will load any file that ends with `yml` or `yaml`.
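
In Prometheus configuration terms, this kind of loading is typically expressed with glob patterns under `rule_files`. The following is a minimal sketch, assuming the rules directory is mounted at `/etc/prometheus/prom_rules/` inside the Prometheus container; that path is an assumption for illustration and is not taken from this change.

.. code-block:: yaml

   # Load every rules file with a yml or yaml extension (assumed mount path).
   rule_files:
     - /etc/prometheus/prom_rules/*.yml
     - /etc/prometheus/prom_rules/*.yaml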

Each alert consists of:
Each Prometheus alert consists of:

- Name
- What happened
@@ -36,7 +36,7 @@ For example, let us look at the `InstanceDown` alert that comes by default as pa
expr: up == 0
for: 60s
labels:
  severity: "2"
  severity: "error"
annotations:
  description: '{{ $labels.instance }} has been down for more than 30 seconds.'
  summary: Instance {{ $labels.instance }} down
@@ -45,10 +45,32 @@ The **expr** is a legal Prometheus expression, in this case, the **up** metric i

The **for** field sets the minimum duration the condition must hold before the alert becomes active. Until then, the alert is in the pending state and is not sent.

The **labels** part holds additional labels that will be added to the alert. In the example, the **severity** label is set to **2**.
The **labels** part holds additional labels that will be added to the alert. In the example, the **severity** label is set to **error**.

**annotations** are special labels that add a description to the alert and can be used in emails and notifications when the alert is forwarded to external systems.

Alert Severity
--------------

Prometheus uses the following severities from low to high: **info**, **warn**, **error**, **critical**.

The Alertmanager inhibition rule (see the Alertmanager section) makes
sure that an alert with a higher severity mutes the matching lower-severity alert. For example, the out-of-disk-space alerts use a different threshold for each severity,
so when the **error** alert for disk-full is firing, it silences the **warn** alert.
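
As a rough sketch, an inhibition of this kind could be written in the Alertmanager configuration as follows. The labels listed under **equal** are assumptions chosen for illustration, so this is not necessarily the rule shipped with the monitoring stack.

.. code-block:: yaml

   inhibit_rules:
     # While an alert with severity "error" is firing...
     - source_matchers:
         - severity = "error"
       # ...mute alerts with severity "warn"...
       target_matchers:
         - severity = "warn"
       # ...but only when both alerts carry the same values for these labels
       # (assumed labels, adjust to the labels your alerts actually share).
       equal: ['alertname', 'instance']

Restricting the match with **equal** keeps an **error** alert on one node from silencing an unrelated **warn** alert on another.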

When adding your own alerts, you can use the severities to signal how urgent the required action is. For example, a **critical** alert could page someone through PagerDuty, while an **error** alert could send an email.

Adding Alerts
----------------

Add your own alerts in a separate file with a **yml** or **yaml** extension. Keeping them apart from the default alert files makes upgrades easier.
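
For instance, a custom rules file might look like the following minimal sketch. The file name, group name, alert name, and the 10% threshold are all hypothetical, and the expression assumes that the standard node_exporter filesystem metrics are being scraped.

.. code-block:: yaml

   # prometheus/prom_rules/my_alerts.yml (hypothetical file name)
   groups:
     - name: my_custom_alerts        # hypothetical group name
       rules:
         - alert: MyDiskAlmostFull   # hypothetical alert name
           # True when less than 10% of a filesystem is available.
           expr: node_filesystem_avail_bytes / node_filesystem_size_bytes * 100 < 10
           # The condition must hold for 5 minutes before the alert fires.
           for: 5m
           labels:
             severity: "critical"
           annotations:
             description: '{{ $labels.instance }} has less than 10% free space on {{ $labels.mountpoint }}.'
             summary: Disk almost full on {{ $labels.instance }}

A file like this can be validated with `promtool check rules` before Prometheus loads it.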

Modifying Alerts
----------------

It is common practice to update the alerts that come with the monitoring stack. Verify that the existing alerts suit your needs.
When modifying an alert, remember that it will be overwritten on your next upgrade.

Alertmanager
^^^^^^^^^^^^
