Add daemon mode which exposes prometheus metrics #27

Open: 3 commits to merge into base: master

IceWreck

The Prometheus format is the de facto standard for emitting metrics these days. We wanted to record latency, run health checks, graph the variations in Grafana, and alert when something is out of place. So we added a daemon mode to tcp-shaker that runs the checker at regular intervals and serves an HTTP endpoint exposing the metrics.

  • CLI/one-off mode is still the default.
  • Daemon mode reads its config from a YAML file, which holds the list of TCP addresses to check plus options specific to daemon mode. Options common to CLI and daemon modes are still passed as CLI arguments.
  • Config has been renamed to CLIConfig to prevent confusion with DaemonConfig.
  • Concurrent checker now accepts address as parameter instead of directly using the one in CLI Config.
  • Some misc changes

An example YAML file is at app/tcp-checker/example.yaml

Test it with go run ./app/tcp-checker/ -d -f ./app/tcp-checker/example.yaml, wait a couple of seconds, then visit http://localhost:8785/metrics

The metrics look something like this:

# HELP error_count Number of errors occurred, partitioned by error type, destination address and number of requests per check.
# TYPE error_count counter
error_count{destination="example.com:443",error_type="connect",requests_per_check="1"} 0
error_count{destination="example.com:443",error_type="other",requests_per_check="1"} 0
error_count{destination="example.com:443",error_type="timeout",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="connect",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="other",requests_per_check="1"} 0
error_count{destination="example.com:5454",error_type="timeout",requests_per_check="1"} 7
error_count{destination="google.com:80",error_type="connect",requests_per_check="1"} 0
error_count{destination="google.com:80",error_type="other",requests_per_check="1"} 0
error_count{destination="google.com:80",error_type="timeout",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="connect",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="other",requests_per_check="1"} 0
error_count{destination="smtp.gmail.com:465",error_type="timeout",requests_per_check="1"} 0
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
# HELP tcpcheck_duration TCP Check duration in ms, partitioned by destination address and number of requests per check.
# TYPE tcpcheck_duration gauge
tcpcheck_duration{destination="example.com:443",requests_per_check="1"} 256
tcpcheck_duration{destination="example.com:5454",requests_per_check="1"} 1000
tcpcheck_duration{destination="google.com:80",requests_per_check="1"} 57
tcpcheck_duration{destination="smtp.gmail.com:465",requests_per_check="1"} 82

tevino (Owner) commented May 17, 2023

I believe this PR could be a great contribution to the TCP prober of the official blackbox exporter.
