[Request]: Documenting no-data settings for the custom threshold rule #4068

Closed
maryam-saeidi opened this issue Jul 15, 2024 · 0 comments · Fixed by #4418

maryam-saeidi commented Jul 15, 2024

Description

In the custom threshold rule, we have 2 settings:

  1. Alert on no data (alertOnNoData)
  2. Alert on missing group (alertOnGroupDisappear)

Only one of these settings applies at any given time, depending on whether the rule has a group by field. In the related PR (elastic/kibana#188300), we tried to improve the UX by showing a single setting in the UI and changing the related underlying setting depending on whether a group by field is set.
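
As a rough illustration of the behavior described above, the single UI setting could map to the two API settings as sketched below. The function name `no_data_settings` and its parameters are hypothetical, and this is not the Kibana implementation; only the setting names `alertOnNoData` and `alertOnGroupDisappear` come from this issue.

```python
from typing import Dict, Sequence


def no_data_settings(ui_toggle_enabled: bool, group_by: Sequence[str]) -> Dict[str, bool]:
    """Map the single UI toggle to the two API settings (illustrative only).

    Assumption: with the toggle off, both settings are false; with it on,
    exactly one is true, chosen by whether the rule has a group by field.
    """
    has_group_by = len(group_by) > 0
    return {
        # Used when the rule has no group by field.
        "alertOnNoData": ui_toggle_enabled and not has_group_by,
        # Used when the rule has at least one group by field.
        "alertOnGroupDisappear": ui_toggle_enabled and has_group_by,
    }
```

For example, `no_data_settings(True, ["host.name"])` enables only `alertOnGroupDisappear`, while `no_data_settings(True, [])` enables only `alertOnNoData`.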

Here is a description related to these settings that we would like to include in this document:
In the API, we have 2 settings related to no data:

  1. alertOnNoData: Used when the rule has no group by field. It triggers a no-data alert if the condition doesn’t report any data over the expected time period or if the rule fails to query Elasticsearch. (Essentially, it means something is wrong, and we don't have data to evaluate the related threshold.)
  2. alertOnGroupDisappear: Used when the rule has group by fields. After a group has been seen, if that group stops reporting data, we trigger a missing-group no-data alert.
    Here is an example scenario when we have host.name as the group by field for CPU usage above 80%:
    • During the first rule execution, we have 2 hosts that are reporting data: host-1 and host-2

    • In the next rule execution, host-1 does not report any data, so we trigger a no-data alert for host-1

    • In the next execution, if host-1 starts reporting data again, we will have 2 scenarios:

      • If host-1 reports CPU usage above the 80% threshold, we don't trigger a new alert; instead, we change the existing alert from a no-data alert to a triggered alert that breaches the threshold. Keep in mind that we don't send any notifications in this case because there is still an ongoing issue.
      • If host-1 reports CPU usage below the 80% threshold, we change the alert status to recovered.
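
The scenario above can be traced with a small simulation. This is a minimal sketch of the described transitions for a single group that has already been seen, not the rule executor's actual logic; the status names (`no_data`, `active`, `recovered`) and the helper names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

THRESHOLD = 80.0  # CPU usage threshold (%) from the example scenario
ONGOING = ("no_data", "active")  # hypothetical names for ongoing alert statuses


def next_status(previous: Optional[str], cpu_usage: Optional[float]) -> Optional[str]:
    """Status of the alert for one previously seen group after an execution.

    cpu_usage is None when the group reported no data in this execution.
    """
    if cpu_usage is None:
        return "no_data"  # the group disappeared: missing-group no-data alert
    if cpu_usage > THRESHOLD:
        return "active"  # the threshold is breached
    # Below the threshold: an ongoing alert recovers; otherwise there is no alert.
    return "recovered" if previous in ONGOING else None


def is_new_alert(previous: Optional[str], current: Optional[str]) -> bool:
    """True only when an alert starts; no_data -> active reuses the existing alert."""
    return current in ONGOING and previous not in ONGOING


def run(readings: List[Optional[float]]) -> List[Tuple[Optional[str], bool]]:
    """Apply consecutive executions and collect (status, is_new_alert) pairs."""
    status: Optional[str] = None
    history = []
    for reading in readings:
        new_status = next_status(status, reading)
        history.append((new_status, is_new_alert(status, new_status)))
        status = new_status
    return history
```

With readings `[50.0, None, 90.0]` for host-1, the sketch yields no alert, then a new `no_data` alert, then the same alert switched to `active` without a new alert being created. With `[50.0, None, 40.0]`, the last step is `recovered` instead.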

It would be great to also include information about how users can untrack a group that is decommissioned (related ticket). For example, in the above scenario, host-1 might be decommissioned, so we don't need the related alert anymore. In that case, the user can select that alert in the alerts table and use the "Mark as untracked" action.

In the UI, we are now using only one setting, and we enable the related API setting according to the group by field. Here, you can see that we also adjust the related tooltip based on whether a group by field is selected or not.

image

Resources

PR: elastic/kibana#188300

Related issues:

Which documentation set does this change impact?

Stateful and Serverless

Feature differences

This feature is identical in both environments.

What release is this request related to?

8.16

Collaboration model

The documentation team

Point of contact.

Main contact: @maryam-saeidi

Stakeholders:

maryam-saeidi added a commit to elastic/kibana that referenced this issue Jul 18, 2024
…d rule (#188300)

Fixes #188229, related to #183921

Documentation request:
elastic/observability-docs#4068

## Summary

**Note**: I've added an item to deprecate/remove one of the no-data
settings in v9.

Fixes the no-data setting not being shown and sets the related settings
to false by default. Based on @maciejforcone's input, we can combine these 2
settings for simplicity, as only one of them works at a time.

I also changed the tooltip according to which setting is relevant: (we
use one action group for both of them in connectors)

|No data (without group)|Missing group (with group)|
|---|---|
|![image](https://github.com/user-attachments/assets/ecf45dd2-d2a7-46ce-abd0-e2a07426f28e)|![image](https://github.com/user-attachments/assets/8dedd0fe-bb4b-4e51-808f-f65f54ee73fd)|

Here is how the setting is applied in API:


https://github.com/user-attachments/assets/52c52724-6011-4f6d-8464-023cd9a9ea10

dedemorton self-assigned this Oct 19, 2024