The prometheus-service is a Keptn integration responsible for:
- configuring Prometheus for monitoring services managed by Keptn,
- receiving alerts (on port 8080) from Prometheus Alertmanager and translating the alert payload to a cloud event (remediation.triggered) that is sent to the Keptn API,
- retrieving Service Level Indicators (SLIs) from a Prometheus API endpoint and sending them back to Keptn.
Please always double-check the version of Keptn you are using compared to the version of this service, and follow the compatibility matrix below.
Keptn Version* | Prometheus Service Image |
---|---|
0.13.x | keptncontrib/prometheus-service:0.7.4 |
0.13.x | keptncontrib/prometheus-service:0.7.5 |
0.14.2** | keptncontrib/prometheus-service:0.8.0 |
0.15.1 | keptncontrib/prometheus-service:0.8.1*** |
0.16.0 | keptncontrib/prometheus-service:0.8.2*** |
0.16.x | keptncontrib/prometheus-service:0.8.3 |
0.17.x | keptncontrib/prometheus-service:0.8.6 |
0.18.x | keptncontrib/prometheus-service:0.9.1 |
0.19.x | keptncontrib/prometheus-service:0.9.1 |
0.20.x | keptncontrib/prometheus-service:0.9.1 |
1.x.y | keptncontrib/prometheus-service:0.9.1 |
* This is the Keptn version we aim to be compatible with. Other versions should work too, but there is no guarantee.
** This version is only compatible with Keptn 0.14.2 and potentially newer releases of Keptn 0.14.x due to a breaking change in NATS cluster name.
*** These versions are not compatible with Prometheus Alertmanager <= 0.24
You can find more information and older releases on the Releases page.
Keptn does not install or manage Prometheus and its components. Users need to install Prometheus and Prometheus Alertmanager as a prerequisite.
The easiest way is to set up Prometheus using Helm, e.g.:
kubectl create namespace monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus --namespace monitoring
- To verify that the Prometheus scrape jobs are correctly set up, you can access Prometheus by enabling port-forwarding for the prometheus-server:
kubectl port-forward svc/prometheus-server 8080:80 -n monitoring
Prometheus is then available on localhost:8080/targets where you can see the targets for the service.
Please replace the placeholders in the commands below. Examples are provided.
- <VERSION>: prometheus-service version, e.g., 0.8.3
- <PROMETHEUS_NS>: if Prometheus is installed in the same Kubernetes cluster, the namespace needs to be provided, e.g., monitoring
- <PROMETHEUS_ENDPOINT>: endpoint for Prometheus (primarily used for fetching metrics), e.g., http://prometheus-server.monitoring.svc.cluster.local:80
- <ALERT_MANAGER_NS>: if Prometheus Alertmanager is installed in the same Kubernetes cluster, the namespace needs to be provided, e.g., monitoring
Once this is done, you can go ahead and install prometheus-service:
Note: Make sure to replace <VERSION> with the version you want to install.
- Install Keptn prometheus-service in Kubernetes using the following command. This will install the prometheus-service into the keptn namespace and will autodetect the prometheus and the alertmanager namespaces.
  helm upgrade --install -n keptn prometheus-service \
    https://github.com/keptn-contrib/prometheus-service/releases/download/<VERSION>/prometheus-service-<VERSION>.tgz \
    --reuse-values
- (Optional) If you want to customize the namespaces of Keptn or the Prometheus installation, replace the environment variable values according to your use case and run the commands below (this automatically disables the autodetection):
  PROMETHEUS_NS=<PROMETHEUS_NS>
  PROMETHEUS_ENDPOINT=<PROMETHEUS_ENDPOINT>
  ALERT_MANAGER_NS=<ALERT_MANAGER_NS>
  KEPTN_NAMESPACE="keptn"
  helm upgrade -n ${KEPTN_NAMESPACE} prometheus-service \
    https://github.com/keptn-contrib/prometheus-service/releases/download/<VERSION>/prometheus-service-<VERSION>.tgz \
    --reuse-values \
    --set prometheus.namespace=${PROMETHEUS_NS} \
    --set prometheus.endpoint=${PROMETHEUS_ENDPOINT} \
    --set prometheus.namespace_am=${ALERT_MANAGER_NS}
- Execute the following command to configure Prometheus and set up the rules for the Prometheus Alertmanager:
  keptn configure monitoring prometheus --project=sockshop --service=carts
You can customize prometheus-service with the following environment variables:
# Prometheus installed namespace
- name: PROMETHEUS_NS
  value: 'default'
# Prometheus server configmap name
- name: PROMETHEUS_CM
  value: 'prometheus-server'
# Prometheus server app labels
- name: PROMETHEUS_LABELS
  value: 'component=server'
# Prometheus configmap data's config filename
- name: PROMETHEUS_CONFIG_FILENAME
  value: 'prometheus.yml'
# Alertmanager configmap data's config filename
- name: ALERT_MANAGER_CONFIG_FILENAME
  value: 'alertmanager.yml'
# Alertmanager configmap name
- name: ALERT_MANAGER_CM
  value: 'prometheus-alertmanager'
# Alertmanager app labels
- name: ALERT_MANAGER_LABELS
  value: 'component=alertmanager'
# Alertmanager installed namespace
- name: ALERT_MANAGER_NS
  value: 'default'
# Alertmanager template configmap name
- name: ALERT_MANAGER_TEMPLATE_CM
  value: 'alertmanager-templates'
By default, the service makes the following assumptions about the setup of the Prometheus instance:
- Each service within a stage of a project has a Prometheus scrape job definition with the name: <service>-<project>-<stage>
  For example, if project=sockshop, stage=production and service=carts, the scrape job name would have to be carts-sockshop-production.
- Every service provides the following metrics for its corresponding scrape job:
  - http_response_time_milliseconds (Histogram)
  - http_requests_total (Counter)
    This metric has to contain the status label, indicating the HTTP response code of the requests handled by the service. It is highly recommended that this metric also provides a label to query metric values for specific endpoints, e.g. handler.
    An example of an entry would look like this:
    http_requests_total{method="GET",handler="VersionController.getInformation",status="200",} 4.0
Based on those metrics, the queries for the SLIs are built as follows:
- throughput:
sum(rate(http_requests_total{job="<service>-<project>-<stage>-canary"}[<test_duration_in_seconds>s]))
- error_rate:
sum(rate(http_requests_total{job="<service>-<project>-<stage>-canary",status!~'2..'}[<test_duration_in_seconds>s]))/sum(rate(http_requests_total{job="<service>-<project>-<stage>-canary"}[<test_duration_in_seconds>s]))
- response_time_p50:
histogram_quantile(0.50, sum(rate(http_response_time_milliseconds_bucket{job='<service>-<project>-<stage>-canary'}[<test_duration_in_seconds>s])) by (le))
- response_time_p90:
histogram_quantile(0.90, sum(rate(http_response_time_milliseconds_bucket{job='<service>-<project>-<stage>-canary'}[<test_duration_in_seconds>s])) by (le))
- response_time_p95:
histogram_quantile(0.95, sum(rate(http_response_time_milliseconds_bucket{job='<service>-<project>-<stage>-canary'}[<test_duration_in_seconds>s])) by (le))
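The default queries above differ only in the job name and the time window, so they can be generated from a handful of templates. The following is a minimal sketch of that idea, not the service's actual implementation; the function name `default_sli_queries` and the template constants are hypothetical and exist only for illustration.

```python
# Templates mirroring the default queries listed above.
# Doubled braces are literal PromQL braces; {job}, {window} and {q} are filled in.
THROUGHPUT = 'sum(rate(http_requests_total{{job="{job}"}}[{window}]))'
ERRORS = "sum(rate(http_requests_total{{job=\"{job}\",status!~'2..'}}[{window}]))"
QUANTILE = (
    "histogram_quantile({q:.2f}, "
    "sum(rate(http_response_time_milliseconds_bucket{{job='{job}'}}[{window}])) by (le))"
)


def default_sli_queries(service, project, stage, duration_seconds):
    """Return the default SLI queries for one service (hypothetical helper)."""
    # The listed queries use the scrape job name with a "-canary" suffix.
    job = f"{service}-{project}-{stage}-canary"
    window = f"{duration_seconds}s"

    throughput = THROUGHPUT.format(job=job, window=window)
    errors = ERRORS.format(job=job, window=window)

    queries = {
        "throughput": throughput,
        # error_rate is the failed-request rate divided by the total request rate.
        "error_rate": f"{errors}/{throughput}",
    }
    for name, q in (
        ("response_time_p50", 0.50),
        ("response_time_p90", 0.90),
        ("response_time_p95", 0.95),
    ):
        queries[name] = QUANTILE.format(q=q, job=job, window=window)
    return queries


if __name__ == "__main__":
    for name, query in default_sli_queries("carts", "sockshop", "production", 30).items():
        print(f"{name}: {query}")
```

Printing the result for a known service makes it easy to paste a query into the Prometheus UI and check that the expected series exist before running an evaluation.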
To use an external Prometheus instance for a certain project, a secret containing the URL and the access credentials has to be created using the keptn CLI (don't forget to replace the <project> placeholder with the name of your project):
PROMETHEUS_USER=test
PROMETHEUS_PASSWORD=test-password
PROMETHEUS_URL=http://prometheus-server.monitoring.svc.cluster.local
keptn create secret prometheus-credentials-<project> --scope="keptn-prometheus-service" --from-literal="PROMETHEUS_USER=$PROMETHEUS_USER" --from-literal="PROMETHEUS_PASSWORD=$PROMETHEUS_PASSWORD" --from-literal="PROMETHEUS_URL=$PROMETHEUS_URL"
Note: This creates an actual Kubernetes secret with some Kubernetes labels (app.kubernetes.io/managed-by=keptn-secret-service, app.kubernetes.io/scope=prometheus-service). The secret is bound to the correct role (keptn-prometheus-svc-read), which allows prometheus-service to access it.
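To sanity-check the URL and credentials before storing them in the secret, it can help to see what an authenticated query against the Prometheus HTTP API looks like. The sketch below only builds the request and does not send it. It assumes the user and password are used for HTTP basic authentication, which this document does not confirm; the helper name `build_query_request` is hypothetical. The `/api/v1/query` path is the standard Prometheus instant-query endpoint.

```python
import base64
import urllib.parse
import urllib.request


def build_query_request(prometheus_url, user, password, query):
    """Build (but do not send) an instant-query request with basic auth.

    Assumption: the external Prometheus instance accepts HTTP basic auth.
    """
    # /api/v1/query is the Prometheus instant-query endpoint.
    url = (
        prometheus_url.rstrip("/")
        + "/api/v1/query?"
        + urllib.parse.urlencode({"query": query})
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    request = build_query_request(
        "http://prometheus-server.monitoring.svc.cluster.local",
        "test",
        "test-password",
        "up",
    )
    print(request.full_url)
    # To actually send it: urllib.request.urlopen(request, timeout=10)
```

Sending the request with `urllib.request.urlopen` from a pod inside the cluster should return a JSON body with `"status": "success"` if the URL and credentials are valid.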
Users can override the predefined queries, as well as add custom queries, by creating an SLI configuration.
- An SLI configuration is a YAML file as shown below:
  ---
  spec_version: '1.0'
  indicators:
    cpu_usage: avg(rate(container_cpu_usage_seconds_total{namespace="$PROJECT-$STAGE",pod_name=~"$SERVICE-primary-.*"}[5m]))
    response_time_p95: histogram_quantile(0.95, sum by(le) (rate(http_response_time_milliseconds_bucket{handler="ItemsController.addToCart",job="$SERVICE-$PROJECT-$STAGE-canary"}[$DURATION_SECONDS])))
  This file contains a list of keys (e.g., cpu_usage) and Prometheus metric expressions (e.g., avg(rate(...{filters}[timeframe]))).
- To store this configuration, you need to add this file to Keptn's configuration store, e.g., using the keptn add-resource command:
  keptn add-resource --project <project> --service <service> --stage <stage> --resource=sli.yaml --resourceUri=prometheus/sli.yaml
Within the user-defined queries, the following variables can be used to dynamically build the query, depending on the project/stage/service, and the time frame:
- $PROJECT: will be replaced with the name of the project
- $STAGE: will be replaced with the name of the stage
- $SERVICE: will be replaced with the name of the service
- $DEPLOYMENT: type of the deployment (e.g., direct, canary, primary)
- $DURATION_SECONDS: will be replaced with the test run duration, e.g. 30s
For example, if an evaluation for the service carts in the stage production of the project sockshop is triggered, and the tests ran for 30s, the query is resolved as follows:
rate(my_custom_metric{job='$SERVICE-$PROJECT-$STAGE',handler=~'$handler'}[$DURATION_SECONDS])
=>
rate(my_custom_metric{job='carts-sockshop-production',handler=~'$handler'}[30s])
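The substitution in this example can be reproduced with a few lines of Python. This is a sketch under assumptions, not the service's own code: the function name `render_sli_query` is hypothetical, and Python's `string.Template` is used only because its `$NAME` syntax happens to match the placeholders. `safe_substitute` leaves unknown placeholders such as `$handler` untouched, which matches the example.

```python
from string import Template


def render_sli_query(query, project, stage, service, deployment, duration_seconds):
    """Replace the documented placeholders in a user-defined SLI query."""
    values = {
        "PROJECT": project,
        "STAGE": stage,
        "SERVICE": service,
        "DEPLOYMENT": deployment,
        # The duration is rendered with its unit, e.g. 30 -> "30s".
        "DURATION_SECONDS": f"{duration_seconds}s",
    }
    # safe_substitute keeps placeholders that are not in `values` (e.g. $handler).
    return Template(query).safe_substitute(values)


if __name__ == "__main__":
    template = (
        "rate(my_custom_metric{job='$SERVICE-$PROJECT-$STAGE',"
        "handler=~'$handler'}[$DURATION_SECONDS])"
    )
    print(render_sli_query(template, "sockshop", "production", "carts", "direct", 30))
```

Rendering a query locally like this is a quick way to check a custom SLI definition for typos in placeholder names before adding it to the configuration store.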
By default, the prometheus-service automatically creates all the needed configmaps for targets and alerts without needing to configure anything. In some cases, the user might want to manually create the configmaps and alerts instead, which can be enabled by changing the following flags inside the values.yaml file:
- prometheus.createTargets (default: true) - Enable or disable the automatic creation of Prometheus targets
- prometheus.createAlerts (default: true) - Enable or disable the automatic creation of Prometheus alerts
Take a look at the TROUBLESHOOTING page for common errors and troubleshooting guidelines.
You are welcome to contribute using Pull Requests against the master branch. Before contributing, please read our Contributing Guidelines.