You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
During the resolution process, we had a lot of general health metric alerts firing but it wasn't immediately clear it was an ingestion halt until we looked at error logs.
What would you like to see?
An explicit metric that tracks ingestion failures. When a single instance of this occurs, we should alert (critical) on it. This will make it clearer to the responding engineer immediately what the root cause of the issue is.
The text was updated successfully, but these errors were encountered:
What problem does your feature solve?
There's been 2 incidents in the past year where ingestion in horizon halted (for different reasons):
During the resolution process, we had a lot of general health metric alerts firing but it wasn't immediately clear it was an ingestion halt until we looked at error logs.
What would you like to see?
An explicit metric that tracks ingestion failures. When a single instance of this occurs, we should alert (critical) on it. This will make it clearer to the responding engineer immediately what the root cause of the issue is.
The text was updated successfully, but these errors were encountered: