From 26e9703548493d2348dc00b606a205ac68dd8a89 Mon Sep 17 00:00:00 2001 From: Fridrik Asmundsson Date: Tue, 19 Sep 2023 16:00:12 +0000 Subject: [PATCH 1/4] Add instructions for setting up Grafana+Prometheus This PR also includes location where to put our grafana dashboards which we should maintain in repo. --- CHANGELOG.md | 1 + metrics/README.md | 128 ++++++++++++++ metrics/grafana/MessageExecution.json | 241 ++++++++++++++++++++++++++ metrics/prometheus.yml | 8 + 4 files changed, 378 insertions(+) create mode 100644 metrics/README.md create mode 100644 metrics/grafana/MessageExecution.json create mode 100644 metrics/prometheus.yml diff --git a/CHANGELOG.md b/CHANGELOG.md index 6ba9dc94a1c..93e7c81062d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,7 @@ ## Improvements - fix: Add time slicing to splitstore purging step during compaction to reduce lock congestion [filecoin-project/lotus#11269](https://github.com/filecoin-project/lotus/pull/11269) +- feat: Added instructions on how to setup Prometheus/Grafana for monitoring a local Lotus node [filecoin-project/lotus#11276](https://github.com/filecoin-project/lotus/pull/11276) # v1.23.3 / 2023-08-01 diff --git a/metrics/README.md b/metrics/README.md new file mode 100644 index 00000000000..18cc07d1a44 --- /dev/null +++ b/metrics/README.md @@ -0,0 +1,128 @@ +# Setting Up Prometheus and Grafana + +Lotus supports exporting a wide range of metrics, enabling users to gain insights into its behavior and effectively analyze performance issues. These metrics can be conveniently utilized with aggregation and visualization tools for in-depth analysis. In this document, we show how you can set up Prometheus and Grafana for monitoring and visualizing these metrics: + +- **Prometheus**: Prometheus is an open-source monitoring and alerting toolkit designed for collecting and storing time-series data from various systems and applications. 
It provides a robust querying language (PromQL) and a web-based interface for analyzing and visualizing metrics. + +- **Grafana**: Grafana is an open-source platform for creating, sharing, and visualizing interactive dashboards and graphs. It integrates with various data sources, including Prometheus, to help users create meaningful visual representations of their data and set up alerting based on specific conditions. + +## Prerequisites + +- You have a Linux or Mac based system. +- You have root access to install software +- You have lotus node already running + +## Install and start Prometheus + +### On Ubuntu: + +``` +# install prometheus +sudo apt-get install prometheus + +# copy the prometheus.yml config to the correct directory +cp lotus/metrics/prometheus.yml /etc/prometheus/prometheus.yml + +# start prometheus +sudo systemctl start prometheus + +# enable prometheus on boot (optional) +sudo systemctl enable prometheus +``` + +### On Mac: + +``` +# install prometheus +brew install prometheus + +# start prometheus +prometheus --config.file=lotus/metrics/prometheus.yml +``` + +## Install and start Grafana + +### On Ubuntu: + +``` +# install grafana +sudo apt-get install grafana + +# start grafana +sudo systemctl start grafana-server + +# start grafana on boot (optional) +sudo systemctl enable grafana-server +``` + +### On Mac: + +``` +brew install grafana +brew services start grafana +``` + +You should now have Prometheus and Grafana running on your machine where Promotheus is already collecting metrics from your running Lotus node and saving it to a database. + +You can confirm everything is setup correctly by visiting: +- Prometheus (http://localhost:9090): You can open the metric explorer and view any of the aggregated metrics scraped from Lotus +- Grafana (http://localhost:3000): Default username/password is admin/admin, remember to change it after login. + +## Add Prometheus as datasource in Grafana + +1. Log in to Grafana using the web interface. +2. 
Navigate to "Home" > "Connections" > "Data Sources." +3. Click "Add data source." +4. Choose "Prometheus." +5. In the "HTTP" section, set the URL to http://localhost:9090. +6. Click "Save & Test" to verify the connection. + +## Import one of the existing dashboards in lotus/metrics/grafana + +1. Log in to Grafana using the web interface. +2. Navigate to "Home" > "Dashboards" > Click the drop down menu in the "New" button and select "Import" +3. Paste any of the existing dashboards in lotus/metrics/grafana into the "Import via panel json" panel. +4. Click "Load" + +# Collect system metrics using node_exporter + +Although Lotus includes many useful metrics it does not include system metrics such as information about cpu, memory, disk, network, etc. If you are investigating an issue and have Lotus metrics available, its often very useful to correlate certain events or behaviour with general system metrics. + +## Install node_exporter +If you have followed this guide so far and have Prometheus and Grafana already running, you can run the following commands to also aggregate the system metrics: + + +Ubuntu: + +``` +``` + +Mac: + +``` +# install node_exporter +brew install node_exporter + +# run node_exporter +node_exporter +``` + +## Update prometheus config to include node_exporter + +Add the following to the prometheus config and then restart prometheus: + +``` +- job_name: node_exporter + static_configs: + - targets: ['localhost:9100'] +``` + +## Import system dashboard + +1. Download the most recent dashboard from https://grafana.com/grafana/dashboards/1860-node-exporter-full/ +2. Log in to Grafana (http://localhost:3000) using the web interface. +3. Navigate to "Home" > "Dashboards" > Click the drop down menu in the "New" button and select "Import" +4. Paste any of the existing dashboards in lotus/metrics/grafana into the "Import via panel json" panel. +5. Click "Load" +6. Select the Prometheus datasource you created earlier +7. 
Click "Import" diff --git a/metrics/grafana/MessageExecution.json b/metrics/grafana/MessageExecution.json new file mode 100644 index 00000000000..1bdee4e0a34 --- /dev/null +++ b/metrics/grafana/MessageExecution.json @@ -0,0 +1,241 @@ +{ + "__inputs": [ + { + "name": "DS_PROMETHEUS", + "label": "Prometheus", + "description": "", + "type": "datasource", + "pluginId": "prometheus", + "pluginName": "Prometheus" + } + ], + "__elements": {}, + "__requires": [ + { + "type": "grafana", + "id": "grafana", + "name": "Grafana", + "version": "10.1.1" + }, + { + "type": "datasource", + "id": "prometheus", + "name": "Prometheus", + "version": "1.0.0" + }, + { + "type": "panel", + "id": "timeseries", + "name": "Time series", + "version": "" + } + ], + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations & Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "fiscalYearStartMonth": 0, + "graphTooltip": 0, + "id": null, + "links": [], + "liveNow": false, + "panels": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Understand where time is spent in ApplyBlocks which is executed as part of ExecuteTipSet, its metric include:\n\n- applyblocks_total_ms (total): The total time spent in Applyblocks\n- applyblocks_cron (cron): Time spent in cron\n- applyblocks_early (early): Time spent in early apply-blocks (null cron, upgrades)\n- applyblocks_flush (flush): Time spent flushing vm state\n- applyblocks_messages (apply messages): Time spent applying block messages\n", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "Time in MS", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + 
"hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "smooth", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 80 + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 10, + "w": 12, + "x": 0, + "y": 0 + }, + "id": 1, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "disableTextWrap": false, + "editorMode": "builder", + "expr": "histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_total_ms_bucket[$__rate_interval])))", + "fullMetaSearch": false, + "includeNullMetadata": false, + "instant": false, + "legendFormat": "Total", + "range": true, + "refId": "A", + "useBackend": false + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "disableTextWrap": false, + "editorMode": "builder", + "expr": "histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_cron_bucket[$__rate_interval])))", + "fullMetaSearch": false, + "hide": false, + "includeNullMetadata": false, + "instant": false, + "legendFormat": "Cron", + "range": true, + "refId": "B", + "useBackend": false + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "disableTextWrap": false, + "editorMode": "builder", + "expr": "histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_early_bucket[$__rate_interval])))", + "fullMetaSearch": false, + "hide": false, + "includeNullMetadata": false, + "instant": 
false, + "legendFormat": "Early", + "range": true, + "refId": "C", + "useBackend": false + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "disableTextWrap": false, + "editorMode": "builder", + "expr": "histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_flush_bucket[$__rate_interval])))", + "fullMetaSearch": false, + "hide": false, + "includeNullMetadata": false, + "instant": false, + "legendFormat": "Flush", + "range": true, + "refId": "D", + "useBackend": false + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "disableTextWrap": false, + "editorMode": "builder", + "expr": "histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_messages_bucket[$__rate_interval])))", + "fullMetaSearch": false, + "hide": false, + "includeNullMetadata": false, + "instant": false, + "legendFormat": "Apply messages", + "range": true, + "refId": "E", + "useBackend": false + } + ], + "title": "ApplyBlocks (ms)", + "type": "timeseries" + } + ], + "refresh": "", + "schemaVersion": 38, + "style": "dark", + "tags": [], + "templating": { + "list": [] + }, + "time": { + "from": "now-5m", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Lotus Message Execution", + "uid": "a7bacd0e-f7a1-418f-98e5-3469c5e0b6ea", + "version": 5, + "weekStart": "" +} \ No newline at end of file diff --git a/metrics/prometheus.yml b/metrics/prometheus.yml new file mode 100644 index 00000000000..d6a97b3e6f9 --- /dev/null +++ b/metrics/prometheus.yml @@ -0,0 +1,8 @@ +global: + scrape_interval: 10s + +scrape_configs: +- job_name: 'lotus' + metrics_path: '/debug/metrics' + static_configs: + - targets: ['localhost:1234'] From 113b8b50023887ca9818333a5defac574e53f558 Mon Sep 17 00:00:00 2001 From: Fridrik Asmundsson Date: Wed, 20 Sep 2023 11:15:45 +0000 Subject: [PATCH 2/4] Fix after testing with ubuntu --- metrics/README.md | 39 +++++++++++++++++++++++++++------------ metrics/prometheus.yml | 16 
+++++++++++----- 2 files changed, 38 insertions(+), 17 deletions(-) diff --git a/metrics/README.md b/metrics/README.md index 18cc07d1a44..1d395088114 100644 --- a/metrics/README.md +++ b/metrics/README.md @@ -21,7 +21,7 @@ Lotus supports exporting a wide range of metrics, enabling users to gain insight sudo apt-get install prometheus # copy the prometheus.yml config to the correct directory -cp lotus/metrics/prometheus.yml /etc/prometheus/prometheus.yml +sudo cp metrics/prometheus.yml /etc/prometheus/prometheus.yml # start prometheus sudo systemctl start prometheus @@ -45,7 +45,16 @@ prometheus --config.file=lotus/metrics/prometheus.yml ### On Ubuntu: ``` -# install grafana +# download the Grafana GPG key in our keyring +wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor | sudo tee /usr/share/keyrings/grafana.gpg > /dev/null + +# add the Grafana repository to our APT sources +echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list + +# update our APT cache +sudo apt-get update + +# now we can install grafana sudo apt-get install grafana # start grafana @@ -83,6 +92,8 @@ You can confirm everything is setup correctly by visiting: 2. Navigate to "Home" > "Dashboards" > Click the drop down menu in the "New" button and select "Import" 3. Paste any of the existing dashboards in lotus/metrics/grafana into the "Import via panel json" panel. 4. Click "Load" +5. Select the Prometheus datasource you created earlier +6. 
Click "Import" # Collect system metrics using node_exporter @@ -95,6 +106,18 @@ If you have followed this guide so far and have Prometheus and Grafana already r Ubuntu: ``` + +# download the newest release by https://github.com/prometheus/node_exporter/releases (it was 1.6.1 as of writing this doc) +wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz + +# extract the release (in contains a single binary plus some docs) +tar -xf node_exporter-1.6.1.linux-amd64.tar.gz + +# move it to /usr/local/bin +sudo mv node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin + +# run node_exorter +node_exporter ``` Mac: @@ -107,18 +130,10 @@ brew install node_exporter node_exporter ``` -## Update prometheus config to include node_exporter - -Add the following to the prometheus config and then restart prometheus: - -``` -- job_name: node_exporter - static_configs: - - targets: ['localhost:9100'] -``` - ## Import system dashboard +Since our `prometheus.yml` config already has configuration for node_exporter we can go straight away and import a Grafana dashboard for viewing: + 1. Download the most recent dashboard from https://grafana.com/grafana/dashboards/1860-node-exporter-full/ 2. Log in to Grafana (http://localhost:3000) using the web interface. 3. 
Navigate to "Home" > "Dashboards" > Click the drop down menu in the "New" button and select "Import" diff --git a/metrics/prometheus.yml b/metrics/prometheus.yml index d6a97b3e6f9..6d1564ab99a 100644 --- a/metrics/prometheus.yml +++ b/metrics/prometheus.yml @@ -1,8 +1,14 @@ global: - scrape_interval: 10s + scrape_interval: 1m scrape_configs: -- job_name: 'lotus' - metrics_path: '/debug/metrics' - static_configs: - - targets: ['localhost:1234'] + - job_name: lotus + scrape_interval: 10s + metrics_path: '/debug/metrics' + static_configs: + - targets: ['localhost:1234'] + + - job_name: node_exporter + scrape_interval: 15s + static_configs: + - targets: ['localhost:9100'] From 768b2b381283fec112521a923e9fcd5a1da26b92 Mon Sep 17 00:00:00 2001 From: Fridrik Asmundsson Date: Wed, 20 Sep 2023 14:15:03 +0000 Subject: [PATCH 3/4] Update README.md --- README.md | 2 ++ metrics/README.md | 5 +++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index f6ac7593222..57a59f70c89 100644 --- a/README.md +++ b/README.md @@ -133,6 +133,8 @@ Note: The default branch `master` is the dev branch where the latest new feature 6. You should now have Lotus installed. You can now [start the Lotus daemon and sync the chain](https://lotus.filecoin.io/lotus/install/linux/#start-the-lotus-daemon-and-sync-the-chain). +7. (Optional) Follow the [Setting Up Prometheus and Grafana](https://github.com/filecoin-project/lotus/tree/master/metrics/README.md) guide for detailed instructions on setting up a working monitoring system running against a locally running Lotus node. 
+ ## License Dual-licensed under [MIT](https://github.com/filecoin-project/lotus/blob/master/LICENSE-MIT) + [Apache 2.0](https://github.com/filecoin-project/lotus/blob/master/LICENSE-APACHE) diff --git a/metrics/README.md b/metrics/README.md index 1d395088114..a8865d3bd76 100644 --- a/metrics/README.md +++ b/metrics/README.md @@ -12,6 +12,8 @@ Lotus supports exporting a wide range of metrics, enabling users to gain insight - You have root access to install software - You have lotus node already running +**Note:** These instructions have been tested on Ubuntu 23.04 and on Mac M1. + ## Install and start Prometheus ### On Ubuntu: @@ -71,7 +73,7 @@ brew install grafana brew services start grafana ``` -You should now have Prometheus and Grafana running on your machine where Promotheus is already collecting metrics from your running Lotus node and saving it to a database. +You should now have Prometheus and Grafana running on your machine where Promotheus is already collecting metrics from your Lotus node (if its running) and saving it to a database. 
You can confirm everything is setup correctly by visiting: - Prometheus (http://localhost:9090): You can open the metric explorer and view any of the aggregated metrics scraped from Lotus @@ -102,7 +104,6 @@ Although Lotus includes many useful metrics it does not include system metrics s ## Install node_exporter If you have followed this guide so far and have Prometheus and Grafana already running, you can run the following commands to also aggregate the system metrics: - Ubuntu: ``` From e0f90274e471ac5fff190b2a19f6c80f19f3154d Mon Sep 17 00:00:00 2001 From: Fridrik Asmundsson Date: Wed, 27 Sep 2023 14:03:50 +0000 Subject: [PATCH 4/4] fix spelling --- metrics/README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/metrics/README.md b/metrics/README.md index a8865d3bd76..dce18e308b0 100644 --- a/metrics/README.md +++ b/metrics/README.md @@ -73,7 +73,7 @@ brew install grafana brew services start grafana ``` -You should now have Prometheus and Grafana running on your machine where Promotheus is already collecting metrics from your Lotus node (if its running) and saving it to a database. +You should now have Prometheus and Grafana running on your machine, with Prometheus already collecting metrics from your Lotus node (if it's running) and saving them to a database. You can confirm everything is setup correctly by visiting: - Prometheus (http://localhost:9090): You can open the metric explorer and view any of the aggregated metrics scraped from Lotus @@ -99,7 +99,7 @@ You can confirm everything is setup correctly by visiting: # Collect system metrics using node_exporter -Although Lotus includes many useful metrics it does not include system metrics such as information about cpu, memory, disk, network, etc. If you are investigating an issue and have Lotus metrics available, its often very useful to correlate certain events or behaviour with general system metrics. 
+Although Lotus includes many useful metrics, it does not include system metrics, such as information about CPU, memory, disk, network, etc. If you are investigating an issue and have Lotus metrics available, it's often very useful to correlate certain events or behaviour with general system metrics. ## Install node_exporter If you have followed this guide so far and have Prometheus and Grafana already running, you can run the following commands to also aggregate the system metrics: @@ -111,13 +111,13 @@ Ubuntu: # download the newest release by https://github.com/prometheus/node_exporter/releases (it was 1.6.1 as of writing this doc) wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz -# extract the release (in contains a single binary plus some docs) +# extract the release (it contains a single binary plus some docs) tar -xf node_exporter-1.6.1.linux-amd64.tar.gz # move it to /usr/local/bin sudo mv node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin -# run node_exorter +# run node_exporter node_exporter ``` @@ -133,7 +133,7 @@ node_exporter ## Import system dashboard -Since our `prometheus.yml` config already has configuration for node_exporter we can go straight away and import a Grafana dashboard for viewing: +Since our `prometheus.yml` config already has configuration for node_exporter, we can go straight to importing a Grafana dashboard for viewing: 1. Download the most recent dashboard from https://grafana.com/grafana/dashboards/1860-node-exporter-full/ 2. Log in to Grafana (http://localhost:3000) using the web interface.
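
The last hunk relies on `prometheus.yml` already carrying a node_exporter job, which can be checked before importing the system dashboard. The following is a minimal sketch, assuming the config matches the version introduced in patch 2/4. The YAML indentation is reconstructed, and the scratch file, `CONFIG_FILE`, and `list_targets` are hypothetical names used only for this illustration; they are not part of the repository.

```shell
#!/bin/sh
# Sketch only: write a copy of the scrape config to a scratch file and list
# the job/target pairs it defines. CONFIG_FILE and list_targets are
# hypothetical names; the YAML indentation below is an assumption.
CONFIG_FILE="$(mktemp)"

cat > "$CONFIG_FILE" <<'EOF'
global:
  scrape_interval: 1m

scrape_configs:
  - job_name: lotus
    scrape_interval: 10s
    metrics_path: '/debug/metrics'
    static_configs:
      - targets: ['localhost:1234']

  - job_name: node_exporter
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9100']
EOF

# Print one "job target" pair per scrape job.
# Remembers the most recent job_name, then strips the brackets and quotes
# from the last field of each "targets:" line.
list_targets() {
  awk '
    /job_name:/ { job = $NF }
    /targets:/  { gsub(/[^A-Za-z0-9:._-]/, "", $NF); print job, $NF }
  ' "$1"
}

TARGETS="$(list_targets "$CONFIG_FILE")"
echo "$TARGETS"
```

On a real checkout, pointing `list_targets` at `metrics/prometheus.yml` would be expected to list both the `lotus` and `node_exporter` jobs. The helper only handles a single target per job, which is all this config uses.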