[sophora-server] Adjust alert "SophoraServerAPISlow" #141

Merged
2 changes: 1 addition & 1 deletion charts/sophora-server/Chart.yaml
@@ -15,7 +15,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 2.5.2
+version: 2.6.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
4 changes: 2 additions & 2 deletions charts/sophora-server/alerting-runbook.md
@@ -21,7 +21,7 @@ This document is a reference to the alerts this Helm chart can fire.

**Severity:** high

-**Summary:** The API of the server exhibits a response time exceeding 300ms for more than 15 minutes at the 95th percentile.
+**Summary:** The API of the server exhibits a response time exceeding ${threshold} for more than 15 minutes at the 95th percentile.

**Remediation steps:**

@@ -105,4 +105,4 @@ This document is a reference to the alerts this Helm chart can fire.
* Check if the primary server is running
* Check the logs of the server
* Check the logs of the primary server
-* Check whether there are any network issues
+* Check whether there are any network issues
4 changes: 2 additions & 2 deletions charts/sophora-server/templates/prometheusrule.yaml
@@ -21,12 +21,12 @@ spec:
runbook_url: 'https://github.com/subshell/helm-charts/blob/main/charts/sophora-server/alerting-runbook.md'
- alert: SophoraServerAPISlow
for: 15m
-expr: 'histogram_quantile(0.95, sum(rate(sophora_server_contentmanager_call_duration_seconds_bucket{job="{{ include "sophora-server.fullname" . }}"}[1m])) by (pod, le)) > 0.3'
+expr: 'histogram_quantile(0.95, sum(rate(sophora_server_contentmanager_call_duration_seconds_bucket{job="{{ include "sophora-server.fullname" . }}"}[1m])) by (pod, le)) > 0.5'
labels:
severity: high
annotations:
summary: Sophora Server API is slow
-description: The API of the server "{{`{{ $labels.pod }}`}}" exhibits a response time exceeding 300ms for more than 15 minutes at the 95th percentile.
+description: The API of the server "{{`{{ $labels.pod }}`}}" exhibits a response time exceeding 500ms for more than 15 minutes at the 95th percentile.
runbook_url: 'https://github.com/subshell/helm-charts/blob/main/charts/sophora-server/alerting-runbook.md'
- alert: SophoraServerAsyncEventQueueBlocked
for: 10m
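
Note on the `SophoraServerAPISlow` expression changed above: the following is a rough Python sketch of how `histogram_quantile(0.95, ...)` estimates a percentile from cumulative `le` buckets and how that estimate compares with the new `0.5` (seconds) threshold. It is a simplified illustration, not the Prometheus implementation. The bucket bounds and per-second rates are invented for the example; the real bounds of `sophora_server_contentmanager_call_duration_seconds_bucket` are not shown in this diff.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    buckets: list of (upper_bound, cumulative_rate) pairs sorted by
    upper_bound, ending with math.inf (the le="+Inf" bucket).
    Simplified sketch: linear interpolation inside the matching bucket,
    with 0 assumed as the lower bound of the first bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan  # not enough data to estimate anything

    rank = q * buckets[-1][1]  # position of the quantile among all observations
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Quantile lies beyond the highest finite bucket:
                # fall back to that bucket's upper bound.
                return buckets[i - 1][0]
            share = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * share
        prev_bound, prev_count = bound, count
    return math.nan


THRESHOLD_SECONDS = 0.5  # the value this change puts in the alert expression

# Hypothetical bucket bounds (seconds) and cumulative request rates for one pod.
slow_pod = [(0.1, 60.0), (0.25, 85.0), (0.5, 93.0), (1.0, 99.0), (math.inf, 100.0)]
fast_pod = [(0.1, 90.0), (0.25, 97.0), (0.5, 99.0), (1.0, 100.0), (math.inf, 100.0)]

p95_slow = histogram_quantile(0.95, slow_pod)
p95_fast = histogram_quantile(0.95, fast_pod)

print(f"slow pod p95 = {p95_slow:.3f}s, above threshold: {p95_slow > THRESHOLD_SECONDS}")
print(f"fast pod p95 = {p95_fast:.3f}s, above threshold: {p95_fast > THRESHOLD_SECONDS}")
```

With these made-up numbers the slow pod's estimate lands at roughly 0.667 s, above the threshold, and the fast pod's stays below it. In the actual rule the condition must also hold continuously for the `for: 15m` window before the alert fires.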