fix: notification variable is encoded as a string, fix alert thresholds #231

Merged
geekbrother merged 2 commits into main from max/fix/graphana_alerts
Oct 18, 2023

Conversation

geekbrother
Contributor

Description

This PR fixes alert firing for 5xx errors with the following changes:

  • Removing jsonencode from the local notifications variable, which caused the JSON array to be encoded as a string.
  • Changing the evaluation of 5xx errors to run every 5 minutes with a 0 threshold. This way we catch 500 errors even when an error appears and then disappears, so we can investigate it.
  • Changing the error count threshold from 5 to 1 so we can catch and investigate even a single 500 error.
  • Changing the no-data state policy from no_data to keeping the last state (keep_state), so the alarm keeps firing consistently if the threshold was met and then no data arrived.
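
The first fix in the list above can be illustrated with a minimal Terraform sketch. The local names and the example channel uid are hypothetical, since the PR page does not show the actual variable definitions. It also assumes the notifications value is embedded in an object that is itself passed through jsonencode, as the config_json attribute in the plan suggests.

```hcl
# Hypothetical names throughout; this is an illustration, not the repository's code.
locals {
  # Before: jsonencode turns the list into a string value.
  notifications_before = jsonencode([{ uid = "example-channel" }])

  # After: keep the native list so it is encoded only once.
  notifications_after = [{ uid = "example-channel" }]

  # The alert object is encoded as part of the dashboard JSON.
  # Here notifications ends up as a JSON string:
  #   {"notifications":"[{\"uid\":\"example-channel\"}]"}
  alert_before = jsonencode({ notifications = local.notifications_before })

  # Here notifications ends up as a JSON array:
  #   {"notifications":[{"uid":"example-channel"}]}
  alert_after = jsonencode({ notifications = local.notifications_after })
}

output "alert_before" {
  value = local.alert_before
}

output "alert_after" {
  value = local.alert_after
}
```

With the extra jsonencode, the notifications field reaches Grafana as a string rather than an array, which is consistent with the plan line `notifications = jsonencode([]) -> []` further down.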

Resolves #210

How Has This Been Tested?

Deployed from the PR branch to the staging environment. The alert fired successfully and was delivered to Opsgenie and the Slack channel when the 500 error count was > 1 for 5 minutes.

Due Diligence

  • Breaking change
  • Requires a documentation update
  • Requires an e2e/integration test update

@geekbrother geekbrother added the area-telemetry Metrics & Monitoring label Oct 2, 2023
@geekbrother geekbrother self-assigned this Oct 2, 2023
@arein arein added the accepted The issue has been accepted into the project label Oct 2, 2023
@geekbrother geekbrother temporarily deployed to staging October 2, 2023 14:54 — with GitHub Actions Inactive
@geekbrother geekbrother marked this pull request as ready for review October 2, 2023 14:54
@github-actions
Contributor

github-actions bot commented Oct 2, 2023

Show Plan

[command]/home/runner/work/_temp/1dca0faf-4f43-45a3-a61e-d66cbc25759b/terraform-bin -chdir=terraform show -no-color /tmp/plan.tfplan

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.
::debug::Terraform exited with code 0.
::debug::stderr: 
::debug::exitcode: 0

Action: pull_request

@geekbrother geekbrother force-pushed the max/fix/graphana_alerts branch from decd0e4 to 42754db Compare October 18, 2023 21:03
@geekbrother geekbrother temporarily deployed to staging October 18, 2023 21:03 — with GitHub Actions Inactive
@github-actions
Contributor

Show Plan

[command]/home/runner/work/_temp/c554bfac-59c2-492e-be49-a0a1bb741bfb/terraform-bin -chdir=terraform show -no-color /tmp/plan.tfplan

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.ecs.aws_ecs_service.app_service will be updated in-place
  ~ resource "aws_ecs_service" "app_service" {
        id                                 = "arn:aws:ecs:eu-central-1:898587786287:service/staging-push/staging-push-service"
        name                               = "staging-push-service"
        tags                               = {}
      ~ task_definition                    = "arn:aws:ecs:eu-central-1:898587786287:task-definition/staging-push:121" -> (known after apply)
        # (15 unchanged attributes hidden)

        # (4 unchanged blocks hidden)
    }

  # module.ecs.aws_ecs_task_definition.app_task_definition must be replaced
-/+ resource "aws_ecs_task_definition" "app_task_definition" {
      ~ arn                      = "arn:aws:ecs:eu-central-1:898587786287:task-definition/staging-push:121" -> (known after apply)
      ~ arn_without_revision     = "arn:aws:ecs:eu-central-1:898587786287:task-definition/staging-push" -> (known after apply)
      ~ container_definitions    = (sensitive value) # forces replacement
      ~ id                       = "staging-push" -> (known after apply)
      ~ revision                 = 121 -> (known after apply)
      - tags                     = {} -> null
        # (9 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.monitoring.grafana_dashboard.at_a_glance will be updated in-place
  ~ resource "grafana_dashboard" "at_a_glance" {
      ~ config_json  = jsonencode(
          ~ {
              ~ panels               = [
                    # (6 unchanged elements hidden)
                    {
                        datasource  = {
                            type = "cloudwatch"
                            uid  = "XnMFKnQVk"
                        }
                        fieldConfig = {
                            defaults  = {
                                color      = {
                                    mode = "palette-classic"
                                }
                                custom     = {
                                    axisLabel         = ""
                                    axisPlacement     = "auto"
                                    barAlignment      = 0
                                    drawStyle         = "line"
                                    fillOpacity       = 0
                                    gradientMode      = "none"
                                    hideFrom          = {
                                        legend  = false
                                        tooltip = false
                                        viz     = false
                                    }
                                    lineInterpolation = "linear"
                                    lineWidth         = 1
                                    pointSize         = 5
                                    scaleDistribution = {
                                        type = "linear"
                                    }
                                    showPoints        = "auto"
                                    spanNulls         = false
                                    stacking          = {
                                        group = "A"
                                        mode  = "none"
                                    }
                                    thresholdsStyle   = {
                                        mode = "off"
                                    }
                                }
                                mappings   = []
                                thresholds = {
                                    mode  = "absolute"
                                    steps = [
                                        {
                                            color = "green"
                                            value = null
                                        },
                                        {
                                            color = "red"
                                            value = 80
                                        },
                                    ]
                                }
                            }
                            overrides = []
                        }
                        gridPos     = {
                            h = 9
                            w = 7
                            x = 0
                            y = 18
                        }
                        options     = {
                            legend  = {
                                calcs       = []
                                displayMode = "list"
                                placement   = "bottom"
                            }
                            tooltip = {
                                mode = "single"
                                sort = "none"
                            }
                        }
                        targets     = [
                            {
                                alias            = ""
                                datasource       = {
                                    type = "cloudwatch"
                                    uid  = "XnMFKnQVk"
                                }
                                dimensions       = {
                                    LoadBalancer = "app/staging-push-load-balancer/aea5ef9d0a34453a"
                                }
                                expression       = ""
                                id               = ""
                                matchExact       = true
                                metricEditorMode = 0
                                metricName       = "RequestCount"
                                metricQueryType  = 0
                                namespace        = "AWS/ApplicationELB"
                                period           = ""
                                queryMode        = "Metrics"
                                refId            = "A"
                                region           = "default"
                                sqlExpression    = ""
                                statistic        = "Sum"
                            },
                        ]
                        title       = "Requests"
                        type        = "timeseries"
                    },
                  ~ {
                      ~ alert       = {
                          ~ conditions          = [
                              ~ {
                                  ~ evaluator = {
                                      ~ params = [
                                          - 5,
                                          + 1,
                                        ]
                                        # (1 unchanged attribute hidden)
                                    }
                                    # (4 unchanged attributes hidden)
                                },
                              ~ {
                                  ~ evaluator = {
                                      ~ params = [
                                          - 5,
                                          + 1,
                                        ]
                                        # (1 unchanged attribute hidden)
                                    }
                                    # (4 unchanged attributes hidden)
                                },
                            ]
                          - for                 = "5m"
                            name                = "staging Echo Server 5XX alert"
                          ~ noDataState         = "no_data" -> "keep_state"
                          ~ notifications       = jsonencode([]) -> []
                            # (4 unchanged attributes hidden)
                        }
                        # (7 unchanged attributes hidden)
                    },
                    {
                        datasource  = {
                            type = "cloudwatch"
                            uid  = "XnMFKnQVk"
                        }
                        fieldConfig = {
                            defaults  = {
                                color      = {
                                    mode = "palette-classic"
                                }
                                custom     = {
                                    axisLabel         = ""
                                    axisPlacement     = "auto"
                                    barAlignment      = 0
                                    drawStyle         = "line"
                                    fillOpacity       = 0
                                    gradientMode      = "none"
                                    hideFrom          = {
                                        legend  = false
                                        tooltip = false
                                        viz     = false
                                    }
                                    lineInterpolation = "linear"
                                    lineWidth         = 1
                                    pointSize         = 5
                                    scaleDistribution = {
                                        type = "linear"
                                    }
                                    showPoints        = "auto"
                                    spanNulls         = false
                                    stacking          = {
                                        group = "A"
                                        mode  = "none"
                                    }
                                    thresholdsStyle   = {
                                        mode = "off"
                                    }
                                }
                                mappings   = []
                                thresholds = {
                                    mode  = "absolute"
                                    steps = [
                                        {
                                            color = "green"
                                            value = null
                                        },
                                        {
                                            color = "red"
                                            value = 80
                                        },
                                    ]
                                }
                            }
                            overrides = []
                        }
                        gridPos     = {
                            h = 9
                            w = 7
                            x = 14
                            y = 18
                        }
                        options     = {
                            legend  = {
                                calcs       = []
                                displayMode = "list"
                                placement   = "bottom"
                            }
                            tooltip = {
                                mode = "single"
                                sort = "none"
                            }
                        }
                        targets     = [
                            {
                                alias            = ""
                                datasource       = {
                                    type = "cloudwatch"
                                    uid  = "XnMFKnQVk"
                                }
                                dimensions       = {
                                    LoadBalancer = "app/staging-push-load-balancer/aea5ef9d0a34453a"
                                }
                                expression       = ""
                                id               = ""
                                matchExact       = true
                                metricEditorMode = 0
                                metricName       = "HTTPCode_ELB_4XX_Count"
                                metricQueryType  = 0
                                namespace        = "AWS/ApplicationELB"
                                period           = ""
                                queryMode        = "Metrics"
                                refId            = "A"
                                region           = "default"
                                sqlExpression    = ""
                                statistic        = "Sum"
                            },
                            {
                                alias            = ""
                                datasource       = {
                                    type = "cloudwatch"
                                    uid  = "XnMFKnQVk"
                                }
                                dimensions       = {
                                    LoadBalancer = "app/staging-push-load-balancer/aea5ef9d0a34453a"
                                }
                                expression       = ""
                                id               = ""
                                matchExact       = true
                                metricEditorMode = 0
                                metricName       = "HTTPCode_Target_4XX_Count"
                                metricQueryType  = 0
                                namespace        = "AWS/ApplicationELB"
                                period           = ""
                                queryMode        = "Metrics"
                                refId            = "B"
                                region           = "default"
                                sqlExpression    = ""
                                statistic        = "Sum"
                            },
                        ]
                        title       = "4XX"
                        type        = "timeseries"
                    },
                ]
                tags                 = []
                # (15 unchanged attributes hidden)
            }
        )
        id           = "0:staging-push"
        # (7 unchanged attributes hidden)
    }

Plan: 1 to add, 2 to change, 1 to destroy.
::debug::Terraform exited with code 0.
::debug::stderr: 
::debug::exitcode: 0

Action: pull_request

@geekbrother geekbrother merged commit 8028744 into main Oct 18, 2023
4 checks passed
@chris13524 chris13524 deleted the max/fix/graphana_alerts branch October 23, 2023 14:11
Successfully merging this pull request may close these issues.

fix: alarm notifications disabled
4 participants