Feature Request: Lightweight CNCLI Blocks Prometheus Exporter #1439

Straightpool · 2022-06-25T18:29:25Z

Feature description
A lightweight Prometheus exporter for the CNCLI "Blocks" data.
I propose the following metrics for monitoring purposes, each scoped to the currently active epoch:

cntools_cncli_blocks_metrics_next_leader_time_utc
cntools_cncli_blocks_metrics_next_next_leader_time_utc
cntools_cncli_blocks_metrics_ideal
cntools_cncli_blocks_metrics_luck
cntools_cncli_blocks_metrics_adopted_total
cntools_cncli_blocks_metrics_confirmed_total
cntools_cncli_blocks_metrics_missed_total
cntools_cncli_blocks_metrics_ghosted_total
cntools_cncli_blocks_metrics_stolen_total
cntools_cncli_blocks_metrics_invalid_total
cntools_cncli_blocks_metrics_adopted_max_consec
cntools_cncli_blocks_metrics_confirmed_max_consec
cntools_cncli_blocks_metrics_missed_max_consec
cntools_cncli_blocks_metrics_ghosted_max_consec
cntools_cncli_blocks_metrics_stolen_max_consec
cntools_cncli_blocks_metrics_invalid_max_consec

*_next_leader_time_utc returns the UTC time of the next leader slot.
*_next_next_leader_time_utc returns the UTC time of the leader slot after next.
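
Prometheus sample values are 64-bit floats, so the two leader-time metrics would most likely be exported as Unix timestamps in seconds. A minimal sketch of that conversion, assuming the leader time is available as an ISO 8601 string (the helper name and the input format are assumptions, not something CNCLI is confirmed to provide in this form):

```python
from datetime import datetime, timezone


def utc_to_unix_seconds(iso_utc):
    """Convert an ISO 8601 timestamp to Unix seconds (hypothetical helper)."""
    dt = datetime.fromisoformat(iso_utc)
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


# Example input; the value is illustrative only.
next_leader = utc_to_unix_seconds("2022-06-25T18:29:25+00:00")
```

In Grafana, a value like this can be compared against time() to show the time remaining until the next leader slot.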

*_total is the number of blocks in that state so far in the current epoch. *_max_consec is the longest run of consecutive blocks in that state.

Example
Example block sequence:

confirmed
confirmed
stolen
confirmed
ghosted
confirmed
confirmed
confirmed
missed
confirmed
confirmed
confirmed
confirmed
missed
missed
missed
invalid

Results in:

cntools_cncli_blocks_metrics_adopted_total: 0
cntools_cncli_blocks_metrics_confirmed_total: 10
cntools_cncli_blocks_metrics_missed_total: 4
cntools_cncli_blocks_metrics_ghosted_total: 1
cntools_cncli_blocks_metrics_stolen_total: 1
cntools_cncli_blocks_metrics_invalid_total: 1
cntools_cncli_blocks_metrics_adopted_max_consec: 0
cntools_cncli_blocks_metrics_confirmed_max_consec: 4
cntools_cncli_blocks_metrics_missed_max_consec: 3
cntools_cncli_blocks_metrics_ghosted_max_consec: 1
cntools_cncli_blocks_metrics_stolen_max_consec: 1
cntools_cncli_blocks_metrics_invalid_max_consec: 1
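
The calculation behind these numbers can be sketched in a few lines of Python. This is only an illustration of the intended semantics, not a proposed implementation; the function name and the dictionary keys (metric names without the cntools_cncli_blocks_metrics_ prefix) are chosen for this example.

```python
from itertools import groupby

STATES = ("adopted", "confirmed", "missed", "ghosted", "stolen", "invalid")


def block_metrics(statuses):
    """Return totals and longest consecutive runs per block state."""
    metrics = {}
    for state in STATES:
        metrics[f"{state}_total"] = 0
        metrics[f"{state}_max_consec"] = 0
    # groupby yields one group per run of identical adjacent statuses.
    for state, run in groupby(statuses):
        if state not in STATES:
            continue  # ignore statuses this sketch does not know about
        length = sum(1 for _ in run)
        metrics[f"{state}_total"] += length
        metrics[f"{state}_max_consec"] = max(
            metrics[f"{state}_max_consec"], length
        )
    return metrics


# The example block sequence from above, in order.
sequence = (
    ["confirmed"] * 2 + ["stolen"] + ["confirmed"] + ["ghosted"]
    + ["confirmed"] * 3 + ["missed"] + ["confirmed"] * 4
    + ["missed"] * 3 + ["invalid"]
)
result = block_metrics(sequence)
```

A run is broken by any block in a different state, which is why confirmed_max_consec is 4 rather than 10 for this sequence.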

Rationale
Errors or bugs in the pool infrastructure or in the Cardano node itself can occur out of the blue and lead to a number of consecutive lost blocks. Currently, a single missed block is usually a misclassified ghosted block, and a single ghosted block is usually the result of a race condition.
However, if any of these error states occurs several times in direct succession, something is clearly off and ought to trigger an alert / action based on Prometheus / Grafana rules. In the example, "cntools_cncli_blocks_metrics_missed_max_consec: 3" would warrant such an alert / action, as would cntools_cncli_blocks_metrics_invalid_total != 0. Without a Prometheus exporter it might take a while to notice the issue, especially if it triggers no other alerts, and more blocks would be lost than necessary.

Possible implementation approaches
At first glance, this seems to be a feasible architecture based on SQL scraping:

Build upon an open-source, extensible SQL query Prometheus exporter such as: https://github.com/albertodonato/query-exporter
To determine consecutive runs, a SQL query that counts over partitions should deliver the expected results, see e.g. https://stackoverflow.com/questions/36927685/count-number-of-consecutive-occurrence-of-values-in-table
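
As a rough sketch of such a query, the following uses the "gaps and islands" technique: the difference between a row's overall position and its position within its own status stays constant across a run of consecutive identical statuses. The table name blocklog, the columns slot, epoch and status, and the epoch number are assumptions made for this example and should be checked against the actual CNCLI blocklog schema. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Assumed schema: blocklog(slot, epoch, status). Verify against the real DB.
QUERY = """
WITH runs AS (
    SELECT status,
           ROW_NUMBER() OVER (ORDER BY slot)
         - ROW_NUMBER() OVER (PARTITION BY status ORDER BY slot) AS grp
    FROM blocklog
    WHERE epoch = ?
),
run_lengths AS (
    SELECT status, COUNT(*) AS run_length
    FROM runs
    GROUP BY status, grp
)
SELECT status, SUM(run_length) AS total, MAX(run_length) AS max_consec
FROM run_lengths
GROUP BY status
"""

# Load the example block sequence into an in-memory database for illustration.
sequence = (
    ["confirmed"] * 2 + ["stolen"] + ["confirmed"] + ["ghosted"]
    + ["confirmed"] * 3 + ["missed"] + ["confirmed"] * 4
    + ["missed"] * 3 + ["invalid"]
)
EPOCH = 346  # placeholder epoch number for this example
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blocklog (slot INTEGER PRIMARY KEY, epoch INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO blocklog VALUES (?, ?, ?)",
    [(slot, EPOCH, status) for slot, status in enumerate(sequence)],
)
rows = {
    status: (total, max_consec)
    for status, total, max_consec in conn.execute(QUERY, (EPOCH,))
}
conn.close()
```

States with no blocks in the epoch (adopted in this example) do not appear in the result at all, so an exporter built on this query would need to report 0 for them explicitly.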

Another approach could be to use Bash pushing:

Employ a Prometheus Pushgateway and have Bash scripts push an update whenever a block is updated, see e.g. https://medium.com/avmconsulting-blog/pushing-bash-script-result-to-prometheus-using-pushgateway-a0760cd261e
See https://prometheus.io/docs/practices/pushing/ for the implications of pushing.
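
The push itself is small, whichever language does it. The sketch below is in Python rather than Bash and only illustrates the shape of the request: it renders metrics in the Prometheus text exposition format and sends them to a Pushgateway with an HTTP PUT. The gateway address (the Pushgateway default port 9091 on localhost), the job name and the function names are assumptions for this example.

```python
import urllib.request

PREFIX = "cntools_cncli_blocks_metrics_"


def render_exposition(metrics):
    """Render {suffix: value} as Prometheus text exposition lines."""
    lines = [f"{PREFIX}{name} {value}" for name, value in sorted(metrics.items())]
    # The format requires a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


def push(metrics, gateway="http://localhost:9091", job="cncli_blocks"):
    """PUT the metrics to a Pushgateway; gateway and job are assumed values."""
    body = render_exposition(metrics).encode("utf-8")
    request = urllib.request.Request(
        f"{gateway}/metrics/job/{job}", data=body, method="PUT"
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Rendering only; calling push() needs a running Pushgateway.
payload = render_exposition({"missed_max_consec": 3, "invalid_total": 1})
```

PUT replaces every metric previously pushed for the same job, which suits per-epoch values that should reset when a new epoch starts.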

Considered alternatives
None

Version:

OS: Ubuntu 20.04 LTS
Product version: CNTools 9.1.0
Cardano Node version:
cardano-node 1.34.1 - linux-x86_64 - ghc-8.10
git rev 73f9a746362695dc2cb63ba757fbcabb81733d23
Network you're connecting to: Mainnet

rdlrt assigned Scitz0 Dec 1, 2022
