My graphite server had problems today that caused connections to it from this plugin to fail. As a result, thousands of emails were sent to my ops address (hundreds from each collectd node).
I have thresholds and notifications set up on each node to email me when numbers get too high, and also when a metric hasn't been updated within N iterations. The emails I got during the failure were all of the latter type.
I think this is happening because a failure in send_to_graphite seems to stop the data from getting through to the rest of collectd, so collectd thinks the metrics aren't being updated. I'm not sure how Perl works, but maybe an exception is being thrown that propagates back into collectd and aborts the data collection? Can you wrap all of send_to_graphite in an exception handler that logs and ignores the error? The failure may happen not only on connect but also on write (my graphite server was having IO issues, so the connection sometimes succeeded but the write failed).
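To illustrate the kind of handling I mean, here is a rough sketch in Python (not Perl, and not the plugin's actual code). The function signature, the `connect` parameter, and the logger name are all made up for the example. The point is only that both the connect and the write sit inside one handler, so any network error is logged and swallowed instead of propagating to the caller.

```python
import logging
import socket

# Hypothetical logger name, chosen only for this example.
log = logging.getLogger("graphite_writer")


def send_to_graphite(payload, host="127.0.0.1", port=2003, timeout=5.0,
                     connect=socket.create_connection):
    """Try to send a batch of metric lines to graphite.

    Returns True on success and False on any network error. It never
    raises for connect or write failures, so the caller can keep
    collecting data even while graphite is unreachable.

    The `connect` parameter is an assumption added so the function can
    be exercised without a real server.
    """
    try:
        # Connect and write are both inside the handler: the connection
        # may succeed while the write still fails.
        with connect((host, port), timeout=timeout) as sock:
            sock.sendall(payload.encode("ascii"))
        return True
    except OSError as exc:
        # Covers refused connections, timeouts, broken pipes, resets.
        log.warning("graphite send failed, dropping batch: %s", exc)
        return False
```

With something like this, a failed send costs one dropped batch and a log line, and the return value lets the caller decide whether to retry later.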