[docs] Add information about default alert settings #611 fixes #611 #640

Open
wants to merge 3 commits into base: master

@Unnati-Gupta24 Unnati-Gupta24 commented Feb 21, 2025

Checklist

  • I have read the OpenWISP Contributing Guidelines.
  • I have manually tested the changes proposed in this pull request.
  • I have written new test cases for new code and/or updated existing tests for changes to existing code.
  • I have updated the documentation.

Reference to Existing Issue

Closes #611.

Description of Changes

Ping Dependency: The device status changes to "critical" immediately (tolerance of 0) when the ping check cannot reach the management interface, which is assumed to be properly configured.

Config Applied Timing: A 5-minute tolerance prevents alerts for short-lived delays in applying the configuration.

WiFi Clients: The maximum and minimum client thresholds reflect the expected network demand (more than 50 clients with a tolerance of 120, or fewer than 1 client with a tolerance of 0).

Iperf3: Alerts are disabled by default but can be enabled.
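
To make the role of `tolerance` concrete, here is a minimal sketch of how an operator, threshold and tolerance could combine into an alert decision. It is not the OpenWISP Monitoring implementation: the function name `alert_should_fire`, the sample format and the assumption that tolerance is expressed in minutes (taken from the "5-minute tolerance" wording above) are all illustrative.

```python
from datetime import datetime, timedelta

# Hypothetical mapping of the operators used in 'alert_settings'.
OPERATORS = {
    "<": lambda value, threshold: value < threshold,
    ">": lambda value, threshold: value > threshold,
}


def alert_should_fire(samples, operator, threshold, tolerance, now):
    """Return True if the threshold has been crossed for at least
    `tolerance` minutes (assumed unit).

    `samples` is a list of (datetime, value) tuples, oldest first.
    """
    compare = OPERATORS[operator]
    breach_start = None
    # Walk back from the newest sample to find where the current
    # uninterrupted run of threshold-crossing values began.
    for timestamp, value in reversed(samples):
        if not compare(value, threshold):
            break
        breach_start = timestamp
    if breach_start is None:
        return False
    return now - breach_start >= timedelta(minutes=tolerance)


now = datetime(2025, 2, 21, 12, 0)
# Ping: tolerance 0, so a single failed check is enough.
ping_alert = alert_should_fire([(now, 0)], "<", 1, 0, now)
# Config applied: only 2 minutes of failure, below the 5-minute tolerance.
config_alert = alert_should_fire(
    [(now - timedelta(minutes=2), 0), (now, 0)], "<", 1, 5, now
)
```

With these inputs, `ping_alert` is true and `config_alert` is false, which matches the intent described above: ping failures alert immediately, while brief configuration delays do not.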

@pandafy @nemesifier please review it.

The defaults described above are defined in the file
openwisp_monitoring/monitoring/configuration.py:

DEFAULT_METRICS = {
    'ping': {
        'label': _('Ping'),
        'name': 'Ping',
        'key': 'ping',
        'field_name': 'reachable',
        'related_fields': ['loss', 'rtt_min', 'rtt_max', 'rtt_avg'],
        'charts': {
            'uptime': {
                'type': 'bar',
                'title': _('Ping Success Rate'),
                'description': _(
                    'A value of 100% means reachable, 0% means unreachable, values in '
                    'between 0% and 100% indicate the average reachability in the '
                    'period observed. Obtained with the fping linux program.'
                ),
        # ... (lines omitted)
        'alert_settings': {'operator': '<', 'threshold': 1, 'tolerance': 0},
        'notification': {
            'problem': {
                'verbose_name': 'Ping PROBLEM',
                'verb': _('is not reachable'),
                'level': 'warning',
                'email_subject': _(
                    '[{site.name}] PROBLEM: {notification.target} {notification.verb}'
                ),
                'message': _(
                    'The device [{notification.target}]({notification.target_link}) '
                    '{notification.verb}.'
                ),
            },
            'recovery': {
                'verbose_name': 'Ping RECOVERY',
                'verb': _('is reachable again'),
                'level': 'info',
                'email_subject': _(
                    '[{site.name}] RECOVERY: {notification.target} {notification.verb}'
                ),
                'message': _(
                    'The device [{notification.target}]({notification.target_link}) '
                    '{notification.verb}.'
                ),
            },
        },
    },
    'config_applied': {
        'label': _('Configuration Applied'),
        'name': 'Configuration Applied',
        'key': 'config_applied',
        'field_name': 'config_applied',
        'alert_settings': {'operator': '<', 'threshold': 1, 'tolerance': 5},
        'notification': {
            'problem': {
                'verbose_name': 'Configuration Applied PROBLEM',
                'verb': _('has not been applied'),
                'level': 'warning',
                'email_subject': _(
                    '[{site.name}] PROBLEM: {notification.target} configuration '
                    'status issue'
                ),
                'message': _(
                    'The configuration of device [{notification.target}]'
                    '({notification.target_link}) {notification.verb} in a timely manner.'
                ),
            },
            'recovery': {
                'verbose_name': 'Configuration Applied RECOVERY',
                'verb': _('configuration has been applied again'),
                'level': 'info',
                'email_subject': _(
                    '[{site.name}] RECOVERY: {notification.target} {notification.verb} '
                    'successfully'
                ),
                'message': _(
                    'The configuration of device [{notification.target}]({notification.target_link}) '
                    '{notification.verb} successfully.'
                ),
            },
        },
    },
    # ... (lines omitted)
    'wifi_clients_max': {
        'label': _('WiFi Clients (Maximum)'),
        'name': '{name}',
        'key': 'wifi_clients_max',
        'field_name': 'clients',
        'alert_settings': {'operator': '>', 'threshold': 50, 'tolerance': 120},
        'notification': {
            'problem': {
                'verbose_name': 'Max WiFi clients PROBLEM',
                'verb': _('exceeds the expected threshold'),
                'level': 'warning',
                'email_subject': _(
                    '[{site.name}] PROBLEM: {notification.target} has too many WiFi clients'
                ),
                'message': _(
                    'The WiFi client count on [{notification.target}]({notification.target_link})'
                    ' {notification.verb}.'
                ),
            },
            'recovery': {
                'verbose_name': 'Max WiFi clients RECOVERY',
                'verb': _('has decreased'),
                'level': 'info',
                'email_subject': _(
                    '[{site.name}] RECOVERY: {notification.target} WiFi client count has returned to normal'
                ),
                'message': _(
                    'The WiFi client count on [{notification.target}]({notification.target_link})'
                    ' {notification.verb} and is now within the expected range.'
                ),
            },
        },
    },
    'wifi_clients_min': {
        'label': _('WiFi Clients (Minimum)'),
        'name': '{name}',
        'key': 'wifi_clients_min',
        'field_name': 'clients',
        'alert_settings': {'operator': '<', 'threshold': 1, 'tolerance': 0},
        'notification': {
            'problem': {
                'verbose_name': 'Min WiFi clients PROBLEM',
                'verb': _('is below the expected threshold'),
                'level': 'warning',
                'email_subject': _(
                    '[{site.name}] PROBLEM: {notification.target} has too few WiFi clients'
                ),
                'message': _(
                    'The WiFi client count on [{notification.target}]({notification.target_link})'
                    ' {notification.verb}.'
                ),
            },
            'recovery': {
                'verbose_name': 'Min WiFi clients RECOVERY',
                'verb': _('has increased'),
                'level': 'info',
                'email_subject': _(
                    '[{site.name}] RECOVERY: {notification.target} has WiFi clients connecting again'
                ),
                'message': _(
                    'The WiFi client count on [{notification.target}]({notification.target_link})'
                    ' {notification.verb} and is now within the expected range.'
                ),
            },
        },
    },
    # ... (lines omitted)
            'recovery': {
                'verbose_name': 'Disk usage RECOVERY',
                'verb': _('has returned to normal levels'),
                'level': 'info',
                'email_subject': _(
                    '[{site.name}] RECOVERY: {notification.target} disk usage '
                    '{notification.verb}'
                ),
                'message': _(
                    'The device [{notification.target}]({notification.target_link}) '
                    'disk usage {notification.verb}.'
                ),
            },
        },
    },
    'memory': {
        'label': _('Memory usage'),
        'name': 'Memory usage',
        'key': 'memory',
        'field_name': 'percent_used',
        'related_fields': [
            'total_memory',
            'free_memory',
            'buffered_memory',
            'shared_memory',
            'cached_memory',
            'available_memory',
        ],
        'charts': {
            'memory': {
                'type': 'scatter',
                'title': _('Memory Usage'),
                'description': _('Percentage of memory (RAM) being used.'),
                'summary_labels': [_('Memory Usage')],
                'unit': '%',
                'colors': [DEFAULT_COLORS[4]],
                'order': 250,
                'query': chart_query['memory'],
            }
        },
        'alert_settings': {'operator': '>', 'threshold': 95, 'tolerance': 5},
        'notification': {
            'problem': {
                'verbose_name': 'Memory usage PROBLEM',
                'verb': _('is experiencing a peak in'),
                'level': 'warning',
                'email_subject': _(
                    '[{site.name}] PROBLEM: {notification.target} {notification.verb} RAM usage'
                ),
                'message': _(
                    'The device [{notification.target}]({notification.target_link}) '
                    '{notification.verb} RAM usage which has gone '
                    'over {notification.actor.alertsettings.threshold}%.'
                ),
            },
            'recovery': {
                'verbose_name': 'Memory usage RECOVERY',
                'verb': _('has returned to normal levels'),
                'level': 'info',
                'email_subject': _(
                    '[{site.name}] RECOVERY: {notification.target} RAM usage {notification.verb}'
                ),
                'message': _(
                    'The device [{notification.target}]({notification.target_link}) RAM usage '
                    '{notification.verb}.'
                ),
            },
        },
    },
    'cpu': {
        'label': _('CPU usage'),
        'name': 'CPU usage',
        'key': 'cpu',
        'field_name': 'cpu_usage',
        'related_fields': ['load_1', 'load_5', 'load_15'],
        'charts': {
            'cpu': {
                'type': 'scatter',
                'title': _('CPU Load'),
                'description': _(
                    'Average CPU load, measured using the Linux load averages, '
                    'taking into account the number of available CPUs.'
                ),
                'summary_labels': [_('CPU Load')],
                'unit': '%',
                'colors': [DEFAULT_COLORS[-3]],
                'order': 260,
                'query': chart_query['cpu'],
            }
        },
        'alert_settings': {'operator': '>', 'threshold': 90, 'tolerance': 5},
        'notification': {
            'problem': {
                'verbose_name': 'CPU usage PROBLEM',
                'verb': _('is experiencing a peak in'),
                'level': 'warning',
                'email_subject': _(
                    '[{site.name}] PROBLEM: {notification.target} {notification.verb} CPU usage'
                ),
                'message': _(
                    'The device [{notification.target}]({notification.target_link}) '
                    '{notification.verb} CPU usage which has gone '
                    'over {notification.actor.alertsettings.threshold}%.'
                ),
            },
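
For documentation purposes, the `alert_settings` values visible in the excerpt above can be condensed into one readable line per metric. The sketch below is only an illustration: `DEFAULT_ALERT_SETTINGS` and `describe` are hypothetical names, the values are copied by hand from the excerpt rather than imported from the package, the disk metric is left out because its settings are elided, and tolerance is assumed to be in minutes.

```python
# Values copied from the 'alert_settings' entries shown in the excerpt.
DEFAULT_ALERT_SETTINGS = {
    "ping": {"operator": "<", "threshold": 1, "tolerance": 0},
    "config_applied": {"operator": "<", "threshold": 1, "tolerance": 5},
    "wifi_clients_max": {"operator": ">", "threshold": 50, "tolerance": 120},
    "wifi_clients_min": {"operator": "<", "threshold": 1, "tolerance": 0},
    "memory": {"operator": ">", "threshold": 95, "tolerance": 5},
    "cpu": {"operator": ">", "threshold": 90, "tolerance": 5},
}


def describe(key, settings):
    """Render one alert setting as a human-readable sentence."""
    direction = "below" if settings["operator"] == "<" else "above"
    tolerance = settings["tolerance"]
    # A tolerance of 0 means there is no waiting period.
    delay = "immediately" if tolerance == 0 else f"after {tolerance} minutes"
    return (
        f"{key}: alert when the value is {direction} "
        f"{settings['threshold']}, {delay}"
    )


summary = [describe(key, value) for key, value in DEFAULT_ALERT_SETTINGS.items()]
for line in summary:
    print(line)
```

Keeping the summary generated from a single dictionary makes it easier to notice when the documented defaults drift from the values in configuration.py.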
       

@Unnati-Gupta24
Author

I have now run the QA checks locally.
I restructured the files and made some changes; all the tests are now passing.

@devkapilbansal devkapilbansal self-requested a review February 23, 2025 18:56
@devkapilbansal devkapilbansal added the documentation Improvements or additions to documentation label Feb 24, 2025