Skip to content

Latest commit

 

History

History
93 lines (69 loc) · 3.25 KB

README.md

File metadata and controls

93 lines (69 loc) · 3.25 KB

Montagu monitor

Monitoring and alerts for Montagu and supporting services.

We should consider separating out the Montagu-specific bits.

This repo of a Docker Compose configuration that spins up a Prometheus instance with an accompanying alert manager. These instances are configured by:

  • prometheus.yml - Main config (see docs)
  • alert-rules.yml - What conditions should trigger alerts (see docs)
  • alertmanager.yml - Alertmanager config. This controls where alerts get posted to (see docs)

To start the monitor and external metric exporters (see below) use:

git submodule init
git submodule update
pip3 install -r requirements.txt --user
./run

To reload Prometheus and the alert manager after a config change, run

./reload

Local development

To run locally and have the alert manager notify a test Slack channel rather than creating noise in the real monitor channel, use

./run --dev

and for reloading

./reload --dev

To force alerts to fire just invert the rules in prometheus/alert-rules.yml temporarily, e.g. change a rule expression like

up{job="bb8"} == 0

to

up{job="bb8"} == 1

Deployment on bots.dide.ic.ac.uk

Connect as the vagrant user on bots.dide.ic.ac.uk, then

# git clone --recursive https://github.com/vimc/montagu-monitor monitor
cd ~/monitor
git pull
pip3 install --user -r requirements.txt

And then either call ./run (if there are code changes) or ./reload (to refresh the config).

Metric exporters

Prometheus relies on the services it is monitoring serving up a text file that exports values to monitor. By convention, these are served at SERVICE_URL/metrics, and each line follows this syntax:

<metric name>{<label name>=<label value>, ...} <metric value>

Internal metric exporters

The intention is that we will add /metric endpoints to our various apps, either:

  • Using existing metrics endpoints built-in to things like Docker Daemon (see list)
  • Using existing "exporters", that sit alongside in a separate docker container, like the one for Postgres (see list)
  • Directly integrated into the app (using one of the client libraries)
  • Write our own exporter to sit alongside as a small Flask app in a separate container

External metric exporters

For monitoring external services (like S3) there's no need to deploy them separately; instead we can deploy them alongside Prometheus. So far we have one: aws_metrics. When you run run it will also build and start the exporter.

Machine metrics

See machine-metrics for turning on Prometheus Node Exporter for publishing machine metrics from a system. This will make the metrics accessible on localhost:9100. You then need to add a new job to prometheus.yml to pull metrics, they can then be used to build alerts or for graphs.