We use Prometheus to collect metrics, and alerts can be triggered based on these metrics. Alerts are specified in their own Kubernetes resource called `Alert`, handled by our own operator, Alerterator. This means that you can manage your alerts with `kubectl`, and that alerts are not tied to a specific app or namespace, giving you more freedom to set up the alerts you need for one or several apps. You can even make your own personal alert profile.

Below is an example of a complete `Alert` resource, ready to be applied with `kubectl apply -f alerts.yaml`.
```yaml
---
apiVersion: "nais.io/v1alpha1"
kind: "Alert"
metadata:
  name: AuraAppAlerts
  labels:
    team: aura
spec:
  receivers: # receivers for all alerts below
    slack:
      channel: '#teambeam-alerts-dev'
      prependText: '<!here> | ' # this text will be prepended to the Slack alert title
  alerts:
    - alert: nais-testapp unavailable
      expr: 'kube_deployment_status_replicas_unavailable{deployment="app-name"} > 0'
      for: 2m
      action: Read app logs (kubectl logs appname). Read application events (kubectl describe deployment appname)
      description: Sometimes the app goes down, often due to errors in the code at startup
      documentation: https://github.com/navikt/aura-doc/naisvakt/alerts.md#app_unavailable
      sla: respond within 1h, during office hours
      severity: critical
```
We also support e-mail as a receiver; check out a bigger example in the Alerterator repository. In the same folder you will also find a set of recommended alerts to get you started.
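As a minimal sketch, an e-mail receiver could look like the following, using the `email` fields listed in the table below (the address is just a placeholder):

```yaml
spec:
  receivers:
    email:
      to: 'team@example.com'    # placeholder address, use your team's own
      send_resolved: true       # also notify when an alert resolves (defaults to false)
```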
Parameter | Description | Default | Required |
---|---|---|---|
metadata.name | Name for the group of alerts | | x |
metadata.labels.team | mailnick/tag | | x |
spec.receivers | You need at least one receiver | | x |
spec.receivers.slack.channel | Slack channel to send notifications to | | |
spec.receivers.slack.prepend_text | Text to prepend to every Slack message with severity `danger` | | |
spec.receivers.email.to | The email address to send notifications to | | |
spec.receivers.email.send_resolved | Whether or not to notify about resolved alerts | false | |
spec.alerts[].alert | The title of the alert | | x |
spec.alerts[].description | Simple description of the triggered alert | | |
spec.alerts[].expr | Prometheus expression that triggers an alert | | x |
spec.alerts[].for | Duration before the alert should trigger | | x |
spec.alerts[].action | How to resolve this alert | | x |
spec.alerts[].documentation | URL to documentation for this alert | | |
spec.alerts[].sla | Time before the alert should be resolved | | |
spec.alerts[].severity | Alert level for Slack messages | danger | |
Use `kubectl` to add, update, and remove the Alert resource. Adding and updating is done with `kubectl apply -f alerts.yaml`, while deleting is done either with `kubectl delete alert <alert-name>` or `kubectl delete -f alerts.yaml`.

You can list the alerts in the cluster with `kubectl get alerts` (singular: `alert`, short name: `al`), and describe a specific Alert resource with `kubectl describe alert <alert-name>`.
In order to minimize the feedback loop, we suggest experimenting on the Prometheus server to find the right metric for your alert and the notification threshold. The Prometheus server can be found in each cluster at `https://prometheus.{cluster.domain}` (e.g. https://prometheus.nais.preprod.local).

You can also visit the Alertmanager at `https://alertmanager.{cluster.domain}` (e.g. https://alertmanager.nais.preprod.local) to see which alerts are currently triggered, and to silence alerts that have already been triggered.
You can also use `labels` in your Slack/e-mail notification by referencing them with `{{ $labels.<field> }}`. Run your query on the Prometheus server to see which labels are available for your alert.

For example:

```
{{ $labels.node }} is marked as unschedulable
```

turns into the following when posted to Slack/e-mail:

```
b27apvl00178.preprod.local is marked as unschedulable
```
Another example is how you can avoid specifying all the namespaces your app is running in, by using the `kubernetes_namespace` label in your `action` or `description`. Just add `{{ $labels.kubernetes_namespace }}`, and it will be replaced with the namespace of the app that is having problems, as sketched below.
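For instance, a sketch reusing the expression from the example at the top could look like this (the texts are placeholders; run the expression on the Prometheus server to confirm that the `kubernetes_namespace` label is actually present for your metric):

```yaml
alerts:
  - alert: nais-testapp unavailable
    expr: 'kube_deployment_status_replicas_unavailable{deployment="app-name"} > 0'
    for: 2m
    description: 'app-name in {{ $labels.kubernetes_namespace }} has unavailable replicas'
    action: 'Run kubectl describe deployment app-name -n {{ $labels.kubernetes_namespace }} to see recent events'
```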
You can read more about this over at the Prometheus documentation.
Using regular expressions, you can target multiple apps or namespaces with one query. This saves you from repeating the same alert for each of your apps.

```
absent(up{app=~"myapp|otherapp|thirdapp"})
```

Here we use `=~` to select labels that regex-match the provided string (or substring). Use `!~` to negate the regular expression.
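A sketch of how such an expression fits into an alert (the alert name and action text are placeholders):

```yaml
alerts:
  - alert: apps unavailable
    # =~ selects labels that regex-match the pattern; use !~ to select labels that do not match
    expr: 'absent(up{app=~"myapp|otherapp|thirdapp"})'
    for: 5m
    action: Check why myapp, otherapp and thirdapp have stopped reporting the up metric
```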
Slack has its own syntax for notifying `@channel` or `@here`: `<!channel>` and `<!here>` respectively.

Notifying a user group, on the other hand, is a bit more complicated. The user group `@nais-vakt` is written as `<!subteam^SB8KS4WAV|nais-vakt>` in a Slack alert message, where `SB8KS4WAV` is the ID of the specific user group, and `nais-vakt` is the name of the user group.
You can find the ID by right-clicking on the name in the user group list; the last part of the URL is the ID. The URL will look something like this:

```
https://nav-it.slack.com/usergroups/SB8KS4WAV
```
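To mention the user group in every notification, one option is to put it in the Slack receiver's `prependText`, sketched here with the example ID from above:

```yaml
spec:
  receivers:
    slack:
      channel: '#teambeam-alerts-dev'
      # mentions the @nais-vakt user group at the start of the alert title
      prependText: '<!subteam^SB8KS4WAV|nais-vakt> | '
```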
It's pretty straightforward to move alerts from Naisd to Alerterator, as the only difference is that the annotation fields have been moved to the top level.

```yaml
alerts:
  - alert: appNotAvailable
    expr: kube_deployment_status_replicas_unavailable{deployment="app-name"} > 0
    for: 5m
    annotations:
      action: Read app logs (kubectl logs appname). Read application events (kubectl describe deployment appname)
      severity: Warning
```
should be transformed to
```yaml
alerts:
  - alert: appNotAvailable
    expr: kube_deployment_status_replicas_unavailable{deployment="app-name"} > 0
    for: 5m
    action: Read app logs (kubectl logs appname). Read application events (kubectl describe deployment appname)
    severity: Warning
```
Check out the complete spec for more information about the different keys.