Keep failed result history per target #750

Open
igorwwwwwwwwwwwwwwwwwwww opened this issue Feb 19, 2021 · 9 comments

Comments

@igorwwwwwwwwwwwwwwwwwwww

Currently the result history is stored globally across all probes. This means that if one target is constantly failing and another fails only occasionally, the constantly failing target will push the rare failure out of the result history.

So when we then come in and try to understand why that rare failure occurred, it is likely gone from the history.

If we were to track these separately per target, it'd be much easier to figure out what happened, without having to increase the history limit.
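To illustrate the idea, here is a minimal sketch of a per-target history, written in Python for brevity. It is not how blackbox_exporter stores results; the class name `PerTargetHistory`, its methods, and the `limit` parameter are all hypothetical. Each target gets its own bounded queue, so a noisy target can only evict its own entries.

```python
from collections import defaultdict, deque


class PerTargetHistory:
    """Keeps the last `limit` failed results for each target separately.

    Hypothetical illustration only, not blackbox_exporter's implementation.
    """

    def __init__(self, limit=3):
        self.limit = limit
        # One bounded queue per target; the oldest entry is dropped when full.
        self._failures = defaultdict(lambda: deque(maxlen=self.limit))

    def add(self, target, success, debug_output):
        # Only failed results are retained in this sketch.
        if not success:
            self._failures[target].append(debug_output)

    def failures(self, target):
        # .get() avoids creating an empty queue for unknown targets.
        return list(self._failures.get(target, ()))


history = PerTargetHistory(limit=3)
history.add("rare.example", False, "rare failure")
for i in range(100):
    history.add("flaky.example", False, f"flaky failure {i}")

print(history.failures("rare.example"))   # the rare failure is still there
print(history.failures("flaky.example"))  # only the 3 most recent entries
```

Memory in this sketch grows with the number of distinct targets, so a real implementation would presumably also need a cap on how many targets are tracked.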

@mem
Contributor

mem commented Feb 19, 2021

If I understand correctly what you are saying, you want to do some relabeling in Prometheus. For example, as shown here: https://www.robustperception.io/what-percentage-of-time-is-my-service-down-for

With that particular configuration each target will get its own "instance" value, and each module will get its own "job", so you can query the job/instance combination.

Is that what you are trying to do?

@roidelapluie
Member

I think this is more about the history shown in the UI.

I think this is really difficult because there can be an unbounded number of targets; which targets get probed is up to the requester.

@mem
Contributor

mem commented Feb 22, 2021

Oh, I understand.

I think you want to capture and upload blackbox_exporter logs, so that you can see the failure (e.g. probe_success) and go to the corresponding logs to identify the issue. You can use e.g. Loki for that.

@SuperQ
Member

SuperQ commented Feb 22, 2021

We keep some amount of debug logs in memory in the exporter, so it's visible in the UI.

The difficulty we have is that we have a medium number of blackbox targets, around a couple hundred, broken down into 5 or so modules.

We can enable longer history, but the UI isn't organized by module or target, so it's hard to follow.

@SuperQ
Member

SuperQ commented Feb 22, 2021

The other issue is there's no option for the blackbox exporter to log failures only. So you can only run at debug level, which is too noisy.

Having an option like --probe.log-failures would make the logs to Loki or whatever more useful.
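A rough sketch of what "log failures only" could mean in practice, in Python for brevity: collect the debug lines for a probe in memory and emit them only if the probe fails. The `--probe.log-failures` flag above is a proposal, not an existing option, and `run_probe`, `probe`, and `emit` here are hypothetical names rather than anything from the exporter's code.

```python
def run_probe(target, probe, emit):
    """Run one probe, emitting its buffered debug lines only on failure.

    `probe(target, log)` is a hypothetical callable that returns True on
    success and calls `log(message)` for each debug line it produces.
    `emit(line)` stands in for whatever log sink is used (stderr, Loki, ...).
    """
    lines = []
    success = probe(target, lines.append)
    if not success:
        for line in lines:
            emit(f"target={target} {line}")
    return success


def ok_probe(target, log):
    log("resolving address")
    log("received status 200")
    return True


def failing_probe(target, log):
    log("resolving address")
    log("connection refused")
    return False


emitted = []
run_probe("good.example", ok_probe, emitted.append)
run_probe("bad.example", failing_probe, emitted.append)

for line in emitted:
    print(line)
```

Only the lines for `bad.example` are emitted, so the log volume scales with the number of failures rather than the number of probes.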

@roidelapluie
Member

I developed a proxy to do this. The proxy takes the /metrics call, adds ?debug=true to the query, passes it to blackbox_exporter, saves the logs and metrics in a CSV file, and returns the metrics to Prometheus.

@roidelapluie
Member

(This is YOLO quality so it's not on github)
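Since that proxy was not published, the following is only a guess at its general shape, sketched in Python. The function names, the example URL, and the `fetch` callable are assumptions; in particular, `fetch` is assumed to return the logs and metrics already separated, because how the debug response is split is not described in this thread.

```python
import csv
import io
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url):
    """Return `url` with debug=true set in its query string."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debug"
    ]
    query.append(("debug", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def proxy_scrape(url, fetch, csv_file):
    """Forward one scrape with debug enabled and record the result.

    `fetch(url)` is a hypothetical callable returning (logs, metrics).
    One CSV row is written per scrape; only the metrics are returned,
    which is what would be handed back to Prometheus.
    """
    logs, metrics = fetch(with_debug(url))
    csv.writer(csv_file).writerow([url, logs, metrics])
    return metrics


# Stand-in for the real HTTP request, so the sketch runs without a network.
requested = []


def fake_fetch(url):
    requested.append(url)
    return "connection refused", "probe_success 0\n"


out = io.StringIO()
scrape_url = "http://localhost:9115/probe?module=http_2xx&target=example.com"
metrics = proxy_scrape(scrape_url, fake_fetch, out)

print(requested[0])
print(metrics, end="")
```

Passing `fetch` in as a parameter keeps the URL rewriting and CSV recording testable on their own; a real proxy would replace `fake_fetch` with an HTTP client and sit behind an HTTP server that Prometheus scrapes.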

@SuperQ
Member

SuperQ commented Feb 22, 2021

I think we can easily implement a flag for logging errors. Maybe @igorwwwwwwwwwwwwwwwwwwww would be interested in implementing this?

@roidelapluie
Member

I'd merge that.

4 participants