Enable disk size alerting for AWS EBS volumes on VEDA staging hub #5062
Comments
@sgibson91, raising here for visibility so you're not caught by surprise in case you haven't seen #4923 (for more context, see https://2i2c.slack.com/archives/C055A1J1DRP/p1727942694562319).
Thank you @GeorgianaElena! I did see it and did remember it, but hadn't done the work of digging it out yet! Thank you! @sunu, #4923 will need to be reverted for Grafana alerting :)
@sunu found it difficult to programmatically enable Grafana alerting via grafonnet. I have proposed that we add the enablement of alerts as a one-time manual step in our hub deployment guide for now to close this issue out. I will open an issue to track a spike investigating grafonnet further in the new year, maybe with more 2i2c folks helping him.
Grafana needs access to 2i2c's Freshdesk SMTP server to send emails to support[at]
@sgibson91 let's send these alerts to PagerDuty rather than use SMTP directly. I think the integration will be more straightforward this way, given we already do that in https://github.com/2i2c-org/infrastructure/blob/main/terraform/uptime-checks/pagerduty.tf
Amazing, thank you for the suggestion @yuvipanda!
Another simpler suggestion is to try to use prometheus alertmanager. We already have it deployed, but disabled (
It should be easier than trying to automatically do this in grafana:
This would also allow us to add more alerting in the future without having to directly tie it to a Grafana graph.
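For illustration only, here is a minimal Python sketch of what a disk-size alerting rule for Prometheus could look like when it is not tied to a Grafana graph. It builds a rule group in the standard Prometheus `groups` / `rules` layout and mirrors the threshold check locally. The group name, alert name, 10% threshold, `for` duration, and the choice of the kubelet volume stats metrics are all assumptions made for this sketch, not something taken from the 2i2c configuration.

```python
import json

# Hypothetical threshold: fire when less than 10% of the volume is free.
FREE_FRACTION_THRESHOLD = 0.10


def build_disk_alert_rule(threshold: float = FREE_FRACTION_THRESHOLD) -> dict:
    """Return a Prometheus rule group as a plain dict (groups -> rules)."""
    # Assumption: kubelet volume stats are scraped for the volume in question.
    expr = (
        "kubelet_volume_stats_available_bytes"
        " / kubelet_volume_stats_capacity_bytes"
        f" < {threshold}"
    )
    return {
        "groups": [
            {
                "name": "disk-size-alerts",  # hypothetical group name
                "rules": [
                    {
                        "alert": "VolumeAlmostFull",  # hypothetical alert name
                        "expr": expr,
                        "for": "5m",  # hypothetical wait before firing
                        "labels": {"severity": "critical"},
                        "annotations": {
                            "summary": "Volume has less than the configured free space",
                        },
                    }
                ],
            }
        ]
    }


def should_alert(
    available_bytes: float,
    capacity_bytes: float,
    threshold: float = FREE_FRACTION_THRESHOLD,
) -> bool:
    """Local mirror of the rule expression: is the free fraction below threshold?"""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return available_bytes / capacity_bytes < threshold


if __name__ == "__main__":
    # Print the rule group so it can be inspected or converted to YAML.
    print(json.dumps(build_disk_alert_rule(), indent=2))
```

Routing the resulting alert to a receiver would be a separate piece of Alertmanager configuration and is not shown here.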
These alerts may get overwritten when we next deploy the Grafana dashboard, so we will have to check whether they persist.
Oh, that's a great suggestion! Let's go with Prometheus Alertmanager if that's an option.