Threshold email overload on failed server #9

Open
wr0ngway opened this issue Oct 7, 2011 · 0 comments


wr0ngway commented Oct 7, 2011

I had a problem with my Graphite server today that caused this plugin's connections to it to fail. The result was thousands of emails sent to my ops address (hundreds from each collectd node).

I have thresholds and notifications set up on each node to email me when values get too high, and also when a metric hasn't been updated within N iterations. All of the emails I got during the failure were of the latter type.

I think this happens because a failure in send_to_graphite stops the data from getting through to the rest of collectd, so collectd concludes that metrics aren't being updated. I'm not sure how Perl handles this, but perhaps an exception is thrown that propagates back into collectd and aborts data collection? Could you wrap all of send_to_graphite in an exception handler that logs and ignores the error? The failure may happen not only on connect but also on write: my Graphite server was having I/O issues, so the connection sometimes succeeded while the write failed.
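The request is to make sure that no error from `send_to_graphite` escapes back to the caller, whether it comes from connecting or from writing. The plugin itself is Perl, where the usual guard is an `eval { ... }` block followed by a check of `$@`. The minimal sketch below shows the same idea in Python, assuming a Graphite plaintext-protocol listener (lines of `path value timestamp\n` over TCP, conventionally port 2003). `_send_raw`, the `sender` parameter, and the logger name are hypothetical illustrations, not the plugin's actual code.

```python
import logging
import socket

# Hypothetical logger name, for illustration only.
log = logging.getLogger("graphite_writer")


def _send_raw(host, port, lines, timeout=2.0):
    """Open a TCP connection and write plaintext-protocol lines.

    Both steps can fail independently: create_connection() on connect,
    sendall() on write (e.g. when the server is having I/O trouble).
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


def send_to_graphite(host, port, lines, sender=_send_raw):
    """Hypothetical wrapper: log and swallow any failure instead of raising.

    Returns True on success and False if the lines were dropped, so the
    caller can keep processing the next batch of values.
    """
    try:
        sender(host, port, lines)
        return True
    except Exception as exc:  # catches connect *and* write errors alike
        log.warning("dropping %d metric line(s), Graphite send failed: %s",
                    len(lines), exc)
        return False
```

Returning `False` instead of raising lets the caller carry on with the next set of values. Whether dropped lines should be buffered and retried later is a separate design decision.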
