Keep failed result history per target #750

Currently the result history is stored globally across all probes. This means that if one target is constantly failing and another fails only occasionally, the constantly failing one will push the rare failure out of the result history.

So when we then come in and try to understand why that rare failure occurred, it is likely already gone from the history.

If we were to track these separately per target, it would be much easier to figure out what happened, without having to increase the history limit.
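The proposal amounts to keying the bounded history by target instead of keeping one global list. A minimal sketch of that idea, with illustrative names (Result, History) rather than the exporter's actual types:

```go
// Per-target result history: each target gets its own bounded list,
// so a flapping target cannot evict another target's results.
package main

import "sync"

type Result struct {
	Target  string
	Success bool
	Debug   string
}

type History struct {
	mu       sync.Mutex
	limit    int
	byTarget map[string][]Result
}

func NewHistory(limit int) *History {
	return &History{limit: limit, byTarget: make(map[string][]Result)}
}

func (h *History) Add(r Result) {
	h.mu.Lock()
	defer h.mu.Unlock()
	list := append(h.byTarget[r.Target], r)
	// Keep only the most recent `limit` results for this target.
	if len(list) > h.limit {
		list = list[len(list)-h.limit:]
	}
	h.byTarget[r.Target] = list
}

func main() {
	h := NewHistory(100)
	h.Add(Result{Target: "https://example.com", Success: false, Debug: "..."})
}
```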
Comments

If I understand correctly, you want to do some relabeling in Prometheus, for example as shown here: https://www.robustperception.io/what-percentage-of-time-is-my-service-down-for. With that configuration each target gets its own "instance" value and each module gets its own "job", so you can query on the job/instance combination. Is that what you are trying to do?
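For reference, the relabeling pattern that comment refers to looks roughly like this; the exporter address, module name, and targets are placeholders:

```yaml
scrape_configs:
  - job_name: blackbox_http_2xx        # one job per module
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com        # the real targets to probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target   # probed target goes into ?target=
      - source_labels: [__param_target]
        target_label: instance         # each target keeps its own instance
      - target_label: __address__
        replacement: 127.0.0.1:9115    # actually scrape the blackbox exporter
```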
I think this is more about the history shown in the UI. That is really difficult, because the exporter can be asked to probe an unbounded number of targets; it only learns about a target when a requester asks for it.
Oh, I understand. I think you want to capture and ship the blackbox_exporter logs, so that when you see a failure (e.g. probe_success) you can go to the corresponding logs to identify the issue. You can use e.g. Loki for that.
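As context for that workflow, a failure is typically surfaced with a rule on probe_success; this rule is illustrative, not from the thread:

```yaml
groups:
  - name: blackbox
    rules:
      - alert: ProbeFailed
        expr: probe_success == 0
        for: 5m
        annotations:
          # From here, jump to the logs for this instance and time window.
          summary: "Probe of {{ $labels.instance }} is failing"
```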
We keep some amount of debug logs in memory in the exporter, so they're visible in the UI. The difficulty is that we have a medium number of blackbox targets, around a couple hundred, broken down into 5 or so modules. We can enable a longer history, but the UI isn't organized by module or target, so it's hard to follow.
The other issue is that there's no option for the blackbox exporter to log failures only, so you can only run at debug level, which is too noisy. Having an option to log only failed probes would help.
I developed a proxy to do this. The proxy takes the /metrics call, adds ?debug=true to the query, passes it to blackbox_exporter, saves the logs and metrics to a CSV file, and returns the metrics to Prometheus.
(This is YOLO quality so it's not on github) |
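A minimal sketch of that proxy idea, assuming blackbox_exporter listens on localhost:9115 and Prometheus scrapes the proxy's /probe endpoint on :9116 instead. Note that the second, debug request re-runs the probe, and the CSV layout here is made up for illustration:

```go
// Proxy that forwards blackbox probes, saves debug output, and
// returns the metrics to Prometheus unchanged.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
	"strconv"
	"time"
)

const exporter = "http://localhost:9115/probe?"

func main() {
	f, err := os.OpenFile("probe_debug.csv", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/probe", func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()

		// Forward the scrape and capture the metrics for Prometheus.
		resp, err := http.Get(exporter + q.Encode())
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		metrics, _ := io.ReadAll(resp.Body)
		resp.Body.Close()

		// Ask for the debug output of the same target and append it
		// (Go-quoted to keep it on one line) to the CSV file.
		q.Set("debug", "true")
		if dresp, err := http.Get(exporter + q.Encode()); err == nil {
			debug, _ := io.ReadAll(dresp.Body)
			dresp.Body.Close()
			f.WriteString(time.Now().Format(time.RFC3339) + "," +
				q.Get("target") + "," + strconv.Quote(string(debug)) + "\n")
		}

		w.Write(metrics)
	})
	log.Fatal(http.ListenAndServe(":9116", nil))
}
```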
I think we can easily implement a flag for logging errors. Maybe @igorwwwwwwwwwwwwwwwwwwww would be interested in implementing this? |
I'd merge that. |
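To make the proposed flag concrete, here is one way the failures-only behavior could work, sketched under the assumption that each probe gets its own buffered logger; runProbe here is a stand-in for the exporter's real probe functions, not its API:

```go
// Buffer each probe's debug output and emit it only when the probe fails.
package main

import (
	"bytes"
	"log"
	"os"
)

func runProbe(target string, logger *log.Logger) bool {
	logger.Printf("resolving %s", target) // ... detailed probe steps ...
	return false                          // pretend the probe failed
}

func probeWithQuietLogging(target string) bool {
	var buf bytes.Buffer
	logger := log.New(&buf, "probe: ", log.LstdFlags)
	success := runProbe(target, logger)
	if !success {
		// Only surface the detailed logs when the probe failed.
		os.Stderr.Write(buf.Bytes())
	}
	return success
}

func main() {
	probeWithQuietLogging("https://example.com")
}
```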