Feature Request: Introduce a Persistent HTTP Probe Status Counter Metric #1367
Comments
This is not necessary, as each |
Thank you for your suggestion, @SuperQ.
Gauges vs. Counters: The current HTTP status metric is a gauge that reflects the most recent status code rather than accumulating each probe event. Using range vector functions on a gauge only aggregates data over a fixed time window instead of maintaining a persistent, cumulative count.
Limited Time Window: Functions like sum_over_time only work within the specified range (e.g., the last minute), meaning past data is not retained once the window slides. This approach does not provide the long-term accumulation required for calculating accurate rates.
State Persistence: A true persistent counter needs to increment with every probe event and “remember” previous increments across scrapes. This cannot be achieved by merely applying aggregation functions to a gauge that resets or updates each scrape.
In summary, to obtain a persistent HTTP status counter, the exporter must emit a genuine counter metric that increments with every probe rather than relying on range vector functions applied to a gauge.
Thank you again for your suggestion, and I hope this clarifies the limitations of the proposed approach.
Prometheus is the system that "remembers". The exporter is explicitly stateless. I suggest you read about recording rules.
Description
I would like to propose adding a new persistent counter metric that accumulates the HTTP status codes returned by the HTTP probe. Currently, the Blackbox Exporter exposes a gauge metric (probe_http_status_code) representing the last HTTP status code received during a probe. However, for many use cases (e.g., calculating cumulative success rates or error counts over time), it is desirable to have a monotonically increasing counter for HTTP status codes.
This would address the first part of "SLI/SLO friendly metrics" (#925): "SLIs for success rates are built with counters."
Motivation & Goals
• Improved Observability: With a cumulative counter (e.g., probe_http_status_counter_total with labels for target and status code), it becomes easier to calculate rates (using functions like rate() or increase()) over custom time windows in Prometheus.
• Consistency with Prometheus Counters: Counters are generally used for event counts, so a counter metric for HTTP statuses would be consistent with best practices.
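To make the point about rate() and increase() concrete, here is a minimal Go sketch of how an increase over a window of counter samples can be computed, including the case where the counter drops back toward zero (for example after an exporter restart). The function name `increase` and the sample values are illustrative assumptions, and this simplified version does not extrapolate to the window boundaries the way PromQL's increase() does.

```go
package main

import "fmt"

// increase returns the total growth of a counter across the given samples.
// Any drop in value is treated as a counter reset, so the post-reset value
// is counted in full. This is a simplified illustration: unlike PromQL's
// increase(), it performs no extrapolation to the edges of the window.
func increase(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	total := 0.0
	prev := samples[0]
	for _, v := range samples[1:] {
		if v < prev {
			// Counter reset: everything counted since the reset is new growth.
			total += v
		} else {
			total += v - prev
		}
		prev = v
	}
	return total
}

func main() {
	// Hypothetical scraped values of a per-status counter; the drop from
	// 15 to 2 models a process restart between two scrapes.
	samples := []float64{10, 12, 15, 2, 4}
	fmt.Println(increase(samples)) // prints 9
}
```

The reset handling is why a counter remains usable for rates even though its in-process value is lost on restart.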
Proposed Implementation
Modification in ProbeHTTP: In the ProbeHTTP function (in prober/http.go), remove the local instantiation of a counter and, instead, increment the global counter each time a probe is executed:
Exposure: Ensure that the global counter is registered during package initialization (for example, in a new file such as global_metrics.go), so that it persists across multiple probe executions and is exposed on the /metrics endpoint.
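The two steps above could look roughly like the following sketch. It is written against the Go standard library only, so `statusCounter` is a hypothetical stand-in for a labeled counter from a metrics library (in the exporter this would presumably be a Prometheus counter vector); `probeHTTPStatusCounter` and `recordProbe` are likewise assumed names, not existing code in prober/http.go.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// statusCounter is a hypothetical stand-in for a labeled, monotonically
// increasing counter keyed by target and HTTP status code.
type statusCounter struct {
	mu     sync.Mutex
	counts map[[2]string]uint64 // key: {target, status code}
}

func newStatusCounter() *statusCounter {
	return &statusCounter{counts: make(map[[2]string]uint64)}
}

// Inc records one probe result for the given target and status code.
func (c *statusCounter) Inc(target string, statusCode int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[[2]string{target, strconv.Itoa(statusCode)}]++
}

// Value returns the accumulated count for one target and status code.
func (c *statusCounter) Value(target string, statusCode int) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[[2]string{target, strconv.Itoa(statusCode)}]
}

// probeHTTPStatusCounter is created once at package level (assumed name),
// so it outlives any single probe execution.
var probeHTTPStatusCounter = newStatusCounter()

// recordProbe marks the point where a probe function would increment the
// shared counter after receiving a response.
func recordProbe(target string, statusCode int) {
	probeHTTPStatusCounter.Inc(target, statusCode)
}

func main() {
	recordProbe("https://example.com", 200)
	recordProbe("https://example.com", 200)
	recordProbe("https://example.com", 503)
	fmt.Println(probeHTTPStatusCounter.Value("https://example.com", 200)) // prints 2
	fmt.Println(probeHTTPStatusCounter.Value("https://example.com", 503)) // prints 1
}
```

One design consequence worth noting: a counter like this keeps one entry per target and status code in the exporter's memory for the lifetime of the process, and its values start again from zero after a restart.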
Additional Considerations:
• The new metric should be documented and added to the exporter’s README and any relevant configuration examples.
Conclusion:
This change would enhance the observability of HTTP probe results by providing a cumulative counter for HTTP status codes. I believe this would be a valuable addition for users who rely on long-term metrics and SLO calculations.
I am happy to work on a pull request to implement this feature if there is interest.