Inhibition rule not working #8029

Closed
3 tasks done
ricksj5 opened this issue Jan 14, 2025 · 1 comment
Labels
question The question issue

Comments

@ricksj5

ricksj5 commented Jan 14, 2025

Is your question request related to a specific component?

vmalert | Inhibition rule

Describe the question in detail

We have some inhibition rules that are working as expected, but we are trying to add a new inhibition rule without the "equal" field, and it is not working. We also tested the new rule with the "equal" field like the other rules, but it still did not work.

Working Rules:

  inhibit_rules:
    - source_match:
        alertname: "BlackboxProbeFailed"
      target_match_re:
        severity: "very high|high|warning"
      equal: ["hostname"]
    - source_match:
        alertname: "Network-Down"
      target_match_re:
        alertname: "BlackboxProbeFailed|Host-DOWN|prometheus-heartbeat"
      equal: ["category"]

The new rule, added after the rules above, has no "equal" field:

    - source_match:
        alertname: "Test-service-cron"
      target_match_re:
        alertname: "Test-service-sshd"

Below are the test alerting rules we created for it:

          - alert: Test-service-cron
            expr: node_systemd_unit_state{name="cron.service",exported_state="active"} == 0
            for: 5m
            labels:
              severity: very high
              category: Exceptions
            annotations:
              description: "Service has been down for over 5 minutes in - {{$labels.hostname}}"
              summary: "RED - {{$labels.hostname}} - CRON Service down"

          - alert: Test-service-sshd
            expr: node_systemd_unit_state{name="sshd.service",exported_state="active"} == 0
            for: 5m
            labels:
              severity: very high
              spc: disabled
              category: Exceptions
            annotations:
              description: "Service sshd has been down for over 5 minutes in - {{$labels.hostname}}"
              summary: "RED - {{$labels.hostname}} -  SSHD Service down"

To test the new rule, we first stopped the cron service. Once the "Test-service-cron" alert fired, we stopped the sshd service. The "Test-service-sshd" alert fired as well, which suggests the inhibition rule is not suppressing the target alert as it should. We verified the firing status of both alerts through the "ALERTS" metric.
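A note on the verification step: the `ALERTS` series is written by vmalert from rule evaluation state, while inhibition is applied later, by Alertmanager, when it decides which notifications to send. `ALERTS` is therefore expected to show the target alert as firing whether or not it is inhibited. A more direct check is Alertmanager's own view of the alert. Below is a minimal sketch, assuming the JSON shape returned by Alertmanager's `GET /api/v2/alerts` endpoint, where each alert carries a `status.inhibitedBy` list. The helper name `inhibited_alerts`, the sample payload, and the fingerprint value are made up for illustration.

```python
import json


def inhibited_alerts(payload: str) -> dict[str, list[str]]:
    """Map alertname -> fingerprints of the alerts that inhibit it.

    `payload` is assumed to be the JSON body of GET /api/v2/alerts.
    Alerts with an empty or missing `inhibitedBy` list are skipped.
    """
    result: dict[str, list[str]] = {}
    for alert in json.loads(payload):
        inhibited_by = alert.get("status", {}).get("inhibitedBy") or []
        if inhibited_by:
            result[alert["labels"]["alertname"]] = list(inhibited_by)
    return result


# Hypothetical response body; the fingerprint "abc123" is invented.
SAMPLE = json.dumps([
    {
        "labels": {"alertname": "Test-service-cron"},
        "status": {"state": "active", "silencedBy": [], "inhibitedBy": []},
    },
    {
        "labels": {"alertname": "Test-service-sshd"},
        "status": {"state": "suppressed", "silencedBy": [], "inhibitedBy": ["abc123"]},
    },
])

print(inhibited_alerts(SAMPLE))
```

If the target alert appears here with a non-empty `inhibitedBy` list, the rule is taking effect in Alertmanager even though `ALERTS` still reports the alert as firing.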

Questions:

  1. Are there any specific requirements or conditions for inhibition rules to work without the "equal" field?
  2. Could there be any conflicts or precedence issues with the existing inhibition rules that might affect the new rule?
  3. Could there be any version-specific issues or bugs related to inhibition rules that we should be aware of?
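For the first question, it may help to spell out how the three parts of an inhibition rule combine. The following is a simplified sketch of that matching logic, not Alertmanager's actual implementation: a target alert is inhibited when it matches the target matchers, some firing alert matches the source matchers, and every label listed in `equal` has the same value on both. With no `equal` list, the last condition is trivially true. The function names and the `hostname` values are hypothetical, and the sketch ignores Alertmanager's extra handling of alerts that match both sides of a rule.

```python
import re


def _matches(labels, exact=None, regex=None):
    """True if every exact and regex matcher accepts the alert's labels."""
    exact, regex = exact or {}, regex or {}
    # A missing label is treated like an empty value.
    if any(labels.get(name, "") != value for name, value in exact.items()):
        return False
    # Regex matchers are anchored: the whole label value must match.
    return all(
        re.fullmatch(pattern, labels.get(name, "")) is not None
        for name, pattern in regex.items()
    )


def is_inhibited(target, firing, rule):
    """Return True if some alert in `firing` inhibits `target` under `rule`."""
    if not _matches(target, rule.get("target_match"), rule.get("target_match_re")):
        return False
    for source in firing:
        if not _matches(source, rule.get("source_match"), rule.get("source_match_re")):
            continue
        # With an empty `equal` list this check passes for any source.
        if all(source.get(l, "") == target.get(l, "") for l in rule.get("equal", [])):
            return True
    return False


# The new rule from this issue, plus a variant with `equal` added.
new_rule = {
    "source_match": {"alertname": "Test-service-cron"},
    "target_match_re": {"alertname": "Test-service-sshd"},
}
rule_with_equal = {**new_rule, "equal": ["hostname"]}

# Hypothetical label sets; the hostnames are invented for the example.
cron = {"alertname": "Test-service-cron", "hostname": "host-a"}
sshd = {"alertname": "Test-service-sshd", "hostname": "host-b"}

print(is_inhibited(sshd, [cron], new_rule))
print(is_inhibited(sshd, [cron], rule_with_equal))
```

Under these assumptions, the rule without `equal` inhibits the sshd alert regardless of host, while the variant with `equal: ["hostname"]` inhibits it only when both alerts carry the same `hostname` value.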

Troubleshooting docs

@ricksj5 ricksj5 added the question The question issue label Jan 14, 2025
@zekker6
Contributor

zekker6 commented Jan 24, 2025

Closing this one in favor of the issue opened in Alertmanager's repo, which already has a comment: prometheus/alertmanager#4205

@zekker6 zekker6 closed this as completed Jan 24, 2025