Keep failed result history per target #750

Currently the result history is stored globally across all probes. This means that if one target is constantly failing and another fails only occasionally, the constantly failing one will push the rare failure out of the result history.

So when we then come in and try to understand why that rare failure occurred, it is likely already gone from the history.

If we were to track these separately per target, it would be much easier to figure out what happened, without having to increase the history limit.
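The proposal amounts to keying the bounded history by target instead of keeping one global list. A minimal sketch of that idea, with illustrative names (Result, History) rather than the exporter's actual types:

```go
// Per-target result history: each target gets its own bounded list,
// so a flapping target cannot evict another target's results.
package main

import "sync"

type Result struct {
	Target  string
	Success bool
	Debug   string
}

type History struct {
	mu       sync.Mutex
	limit    int
	byTarget map[string][]Result
}

func NewHistory(limit int) *History {
	return &History{limit: limit, byTarget: make(map[string][]Result)}
}

func (h *History) Add(r Result) {
	h.mu.Lock()
	defer h.mu.Unlock()
	list := append(h.byTarget[r.Target], r)
	// Keep only the most recent `limit` results for this target.
	if len(list) > h.limit {
		list = list[len(list)-h.limit:]
	}
	h.byTarget[r.Target] = list
}

func main() {
	h := NewHistory(100)
	h.Add(Result{Target: "https://example.com", Success: false, Debug: "..."})
}
```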
Comments

If I understand correctly, you want to do some relabeling in Prometheus, for example as shown here: https://www.robustperception.io/what-percentage-of-time-is-my-service-down-for. With that configuration each target gets its own "instance" value and each module gets its own "job", so you can query on the job/instance combination. Is that what you are trying to do?
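For reference, the relabeling pattern that comment refers to looks roughly like this; the exporter address, module name, and targets are placeholders:

```yaml
scrape_configs:
  - job_name: blackbox_http_2xx        # one job per module
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com        # the real targets to probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target   # probed target goes into ?target=
      - source_labels: [__param_target]
        target_label: instance         # each target keeps its own instance
      - target_label: __address__
        replacement: 127.0.0.1:9115    # actually scrape the blackbox exporter
```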
I think this is more about the history shown in the UI. That is really difficult, because the exporter can be asked to probe an unbounded number of targets; it only learns about a target when a requester asks for it.
Oh, I understand. I think you want to capture and ship the blackbox_exporter logs, so that when you see a failure (e.g. probe_success) you can go to the corresponding logs to identify the issue. You can use e.g. Loki for that.
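As context for that workflow, a failure is typically surfaced with a rule on probe_success; this rule is illustrative, not from the thread:

```yaml
groups:
  - name: blackbox
    rules:
      - alert: ProbeFailed
        expr: probe_success == 0
        for: 5m
        annotations:
          # From here, jump to the logs for this instance and time window.
          summary: "Probe of {{ $labels.instance }} is failing"
```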
We keep some amount of debug logs in memory in the exporter, so they're visible in the UI. The difficulty is that we have a medium number of blackbox targets, around a couple hundred, broken down into 5 or so modules. We can enable a longer history, but the UI isn't organized by module or target, so it's hard to follow.
The other issue is that there's no option for the blackbox exporter to log failures only, so you can only run at debug level, which is too noisy. Having an option to log only failed probes would help.
I developed a proxy to do this. The proxy takes the /metrics call, adds ?debug=true to the query, passes it to blackbox_exporter, saves the logs and metrics to a CSV file, and returns the metrics to Prometheus.
(This is YOLO quality so it's not on github) |
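A minimal sketch of that proxy idea, assuming blackbox_exporter listens on localhost:9115 and Prometheus scrapes the proxy's /probe endpoint on :9116 instead. Note that the second, debug request re-runs the probe, and the CSV layout here is made up for illustration:

```go
// Proxy that forwards blackbox probes, saves debug output, and
// returns the metrics to Prometheus unchanged.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
	"strconv"
	"time"
)

const exporter = "http://localhost:9115/probe?"

func main() {
	f, err := os.OpenFile("probe_debug.csv", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/probe", func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()

		// Forward the scrape and capture the metrics for Prometheus.
		resp, err := http.Get(exporter + q.Encode())
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		metrics, _ := io.ReadAll(resp.Body)
		resp.Body.Close()

		// Ask for the debug output of the same target and append it
		// (Go-quoted to keep it on one line) to the CSV file.
		q.Set("debug", "true")
		if dresp, err := http.Get(exporter + q.Encode()); err == nil {
			debug, _ := io.ReadAll(dresp.Body)
			dresp.Body.Close()
			f.WriteString(time.Now().Format(time.RFC3339) + "," +
				q.Get("target") + "," + strconv.Quote(string(debug)) + "\n")
		}

		w.Write(metrics)
	})
	log.Fatal(http.ListenAndServe(":9116", nil))
}
```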
I think we can easily implement a flag for logging errors. Maybe @igorwwwwwwwwwwwwwwwwwwww would be interested in implementing this? |
I'd merge that. |
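To make the proposed flag concrete, here is one way the failures-only behavior could work, sketched under the assumption that each probe gets its own buffered logger; runProbe here is a stand-in for the exporter's real probe functions, not its API:

```go
// Buffer each probe's debug output and emit it only when the probe fails.
package main

import (
	"bytes"
	"log"
	"os"
)

func runProbe(target string, logger *log.Logger) bool {
	logger.Printf("resolving %s", target) // ... detailed probe steps ...
	return false                          // pretend the probe failed
}

func probeWithQuietLogging(target string) bool {
	var buf bytes.Buffer
	logger := log.New(&buf, "probe: ", log.LstdFlags)
	success := runProbe(target, logger)
	if !success {
		// Only surface the detailed logs when the probe failed.
		os.Stderr.Write(buf.Bytes())
	}
	return success
}

func main() {
	probeWithQuietLogging("https://example.com")
}
```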