ci: 🎡 add ingress alerts (#78)
* ci: 🎡 add ingress alerts

* terraform-docs: automated action

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
jaskaransarkaria and github-actions[bot] authored Feb 28, 2024
1 parent c18183a commit 0ded266
Showing 3 changed files with 40 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -44,6 +44,7 @@ No modules.
|------|------|
| [helm_release.nginx_ingress](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
| [kubectl_manifest.nginx_ingress_default_certificate](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
| [kubectl_manifest.prometheus_rule_alert](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
| [kubernetes_config_map.fluent-bit-config](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/config_map) | resource |
| [kubernetes_config_map.fluent_bit_lua_script](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/config_map) | resource |
| [kubernetes_config_map.logrotate_config](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/config_map) | resource |
9 changes: 9 additions & 0 deletions main.tf
@@ -101,3 +101,12 @@ resource "kubectl_manifest" "nginx_ingress_default_certificate" {
    kubernetes_namespace.ingress_controllers
  ]
}

#########################
# prometheus rule alert #
#########################
resource "kubectl_manifest" "prometheus_rule_alert" {
  count      = var.controller_name == "default" ? 1 : 0
  depends_on = [helm_release.nginx_ingress]
  yaml_body  = file("${path.module}/resources/alerts.yaml")
}
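The `count` expression on `prometheus_rule_alert` creates the rule only when `var.controller_name` is `"default"`. A minimal Python sketch of that gating follows, assuming the module is instantiated once per ingress controller with a distinct `controller_name`; the helper functions and the second controller name are hypothetical, for illustration only.

```python
from typing import Dict, List


def prometheus_rule_count(controller_name: str) -> int:
    # Mirrors: count = var.controller_name == "default" ? 1 : 0
    # (Terraform string equality is exact, so "Default" would not match.)
    return 1 if controller_name == "default" else 0


def rules_per_instance(controller_names: List[str]) -> Dict[str, int]:
    """Number of prometheus_rule_alert resources each module instance would create."""
    return {name: prometheus_rule_count(name) for name in controller_names}


# "modsec" is a hypothetical second controller name.
print(rules_per_instance(["default", "modsec"]))  # {'default': 1, 'modsec': 0}
```

Because `alerts.yaml` hard-codes `metadata.name: ingress-controller-errors` in the `ingress-controllers` namespace, and both rule expressions select on `namespace="ingress-controllers"` rather than on a particular controller, a single copy of the rule presumably covers every controller in that namespace; creating it from each module instance would target the same object.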
30 changes: 30 additions & 0 deletions resources/alerts.yaml
@@ -0,0 +1,30 @@
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ingress-controller-errors
  namespace: ingress-controllers
  labels:
    prometheus: cloud-platform
spec:
  groups:
  - name: ingress-controllers
    rules:
    - alert: IngressControllerIsCrashLoopBackoffing
      expr: rate(kube_pod_container_status_restarts_total{job="kube-state-metrics",namespace="ingress-controllers"}[15m]) * 60 * 15 > 0
      for: 10m
      labels:
        severity: warning
      annotations:
        message: An Ingress Controller pod is CrashLoopBackOff'ing
    - alert: IngressControllerIsOOMKilled
      expr: |-
        kube_pod_container_status_last_terminated_reason{container="controller",namespace="ingress-controllers",reason="OOMKilled"} == 1
        and on(container, namespace, pod) increase(kube_pod_container_status_restarts_total{container="controller",namespace="ingress-controllers"}[5m]) > 0
      for: 15m
      labels:
        severity: warning
      annotations:
        message: |
          An Ingress Controller pod has restarted because of OOMKilled. This alert works by watching for a pod that has been restarted within 5 minutes and the last termination status is OOMKilled.
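The `IngressControllerIsCrashLoopBackoffing` expression multiplies a per-second `rate()` over `[15m]` by `60 * 15`, which turns it back into an approximate count of restarts in the last 15 minutes. A simplified Python model of that arithmetic follows. It is a sketch, not PromQL: it keeps counter-reset handling but omits the extrapolation Prometheus applies at the window edges, and the helper names and sample values are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)
WINDOW_SECONDS = 15 * 60      # the [15m] range selector


def simple_increase(samples: List[Sample]) -> Optional[float]:
    """Counter growth across time-ordered samples in the window.

    A drop in value is treated as a counter reset, so the new value is counted
    from zero. Timestamps are unused because edge extrapolation is omitted.
    """
    if len(samples) < 2:
        return None  # rate() returns no result with fewer than two samples
    total = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        total += value if value < previous else value - previous
        previous = value
    return total


def simple_rate(samples: List[Sample], window_seconds: float = WINDOW_SECONDS) -> Optional[float]:
    """Average per-second increase over the window (no edge extrapolation)."""
    growth = simple_increase(samples)
    return None if growth is None else growth / window_seconds


def crashloop_condition(samples: List[Sample]) -> bool:
    """rate(...[15m]) * 60 * 15 > 0: at least one restart in the last 15 minutes."""
    per_second = simple_rate(samples)
    return per_second is not None and per_second * 60 * 15 > 0


# Hypothetical restart counter scraped once a minute: one restart mid-window.
restarts = [(t * 60.0, v) for t, v in enumerate([3, 3, 3, 4, 4])]
print(crashloop_condition(restarts))  # True

# A steady counter means no restarts, so the expression is false.
print(crashloop_condition([(t * 60.0, 2) for t in range(15)]))  # False
```

Since `60 * 15` equals the 900-second window, the condition reduces to whether the restart counter grew at all during the window; the rule's `for: 10m` then requires it to stay true across evaluations for ten minutes before the alert fires.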

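The `IngressControllerIsOOMKilled` expression combines two filtered vectors with `and on(container, namespace, pod)`. The Python sketch below models PromQL instant vectors as `(labels, value)` pairs to illustrate that join: filtering comparisons drop non-matching series, and `and on(...)` keeps the left-hand series whose listed label values also appear on the right. The function names and pod names are hypothetical, and real series carry more labels than shown.

```python
from typing import Callable, Dict, List, Tuple

Labels = Dict[str, str]
Vector = List[Tuple[Labels, float]]  # a PromQL instant vector: (labels, value) pairs

ON_LABELS = ("container", "namespace", "pod")


def keep_if(vector: Vector, predicate: Callable[[float], bool]) -> Vector:
    """A filtering comparison such as '== 1' or '> 0' (no 'bool' modifier)."""
    return [(labels, value) for labels, value in vector if predicate(value)]


def match_key(labels: Labels, on: Tuple[str, ...]) -> Tuple[str, ...]:
    # In PromQL vector matching, a missing label compares as an empty string.
    return tuple(labels.get(name, "") for name in on)


def and_on(left: Vector, right: Vector, on: Tuple[str, ...]) -> Vector:
    """'left and on(...) right': keep left series, unchanged, that have a match on the right."""
    right_keys = {match_key(labels, on) for labels, _ in right}
    return [(labels, value) for labels, value in left if match_key(labels, on) in right_keys]


def oom_alert(last_reason: Vector, restart_increase: Vector) -> Vector:
    """last_terminated_reason{reason="OOMKilled"} == 1 and on(...) increase(...[5m]) > 0."""
    return and_on(
        keep_if(last_reason, lambda v: v == 1),
        keep_if(restart_increase, lambda v: v > 0),
        ON_LABELS,
    )


# Hypothetical sample data for two controller pods.
base = {"container": "controller", "namespace": "ingress-controllers"}
last_reason: Vector = [
    ({**base, "pod": "ingress-a", "reason": "OOMKilled"}, 1.0),
    ({**base, "pod": "ingress-b", "reason": "OOMKilled"}, 1.0),
]
restart_increase: Vector = [
    ({**base, "pod": "ingress-a"}, 1.0),  # restarted within the last 5 minutes
    ({**base, "pod": "ingress-b"}, 0.0),  # OOMKilled earlier, no recent restart
]

firing = oom_alert(last_reason, restart_increase)
print([labels["pod"] for labels, _ in firing])  # ['ingress-a']

# Matching on 'reason' too (effectively what a plain 'and' does here) finds nothing.
print(and_on(last_reason, restart_increase, ON_LABELS + ("reason",)))  # []
```

The `on(...)` clause matters because `reason` exists only on `kube_pod_container_status_last_terminated_reason`; a match that also includes `reason` finds no right-hand partner, as the last call shows.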