Add optimized grafana dashboard #1454

jacobbaungard · 2024-05-23T15:17:18Z

This optimized dashboard mainly lowers the cardinality of the CPU
metrics. Specifically, instead of using `avg(rate(node_cpu_seconds_total`,
which has a cardinality equal to the total number of CPUs across all
managed clusters, we use `cluster:node_cpu:ratio`, which has a
cardinality of 1 per cluster.

That is, with 100 clusters of 16 CPUs each, the cardinality before was
100 * 16 = 1600, whereas with this change we fetch only 100 series.

This should scale quite a bit better on larger installations with many
clusters/nodes.
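To make the arithmetic above concrete, here is a minimal Python sketch. It is a simplified model, not the dashboard's actual query (whose full expression is not shown here). It assumes exactly one per-CPU series per CPU, ignoring other labels such as `mode`, and treats `cluster:node_cpu:ratio` as a precomputed series with one value per cluster, as its `level:metric:operation` recording-rule naming suggests. The helpers `raw_cpu_series` and `avg_by_cluster` and the `cluster-N` names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def raw_cpu_series(clusters, cpus_per_cluster):
    """Hypothetical per-CPU usage series: one (labels, value) pair per CPU.

    Models the raw-counter approach, where every CPU on every managed
    cluster contributes its own series.
    """
    return [
        ({"cluster": f"cluster-{c}", "cpu": str(cpu)}, 0.5)
        for c in range(clusters)
        for cpu in range(cpus_per_cluster)
    ]


def avg_by_cluster(series):
    """Average the input series per value of the cluster label."""
    groups = defaultdict(list)
    for labels, value in series:
        groups[labels["cluster"]].append(value)
    return {cluster: mean(values) for cluster, values in groups.items()}


if __name__ == "__main__":
    series = raw_cpu_series(clusters=100, cpus_per_cluster=16)
    per_cluster = avg_by_cluster(series)
    # The raw approach must read every per-CPU series before aggregating...
    print("series read with raw counters:", len(series))
    # ...while a per-cluster series already holds one value per cluster.
    print("series read with a per-cluster metric:", len(per_cluster))
```

In this model the aggregation reads 1600 input series to produce 100 output values; reading a per-cluster series directly skips that fan-in, so its cost does not grow with the number of CPUs per node.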

jacobbaungard · 2024-05-23T15:17:30Z

/test test-unit
/test e2e-kind

openshift-ci · 2024-05-23T15:17:32Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

jacobbaungard · 2024-05-23T15:23:34Z

/test test-unit

jacobbaungard · 2024-05-23T15:35:02Z

/test test-unit

jacobbaungard · 2024-05-23T15:59:55Z

/test test-unit

jacobbaungard · 2024-05-27T05:51:34Z

/test test-unit

jacobbaungard · 2024-05-27T07:51:05Z

/test test-unit

jacobbaungard · 2024-05-27T13:56:50Z

/test images

jacobbaungard · 2024-05-27T15:26:21Z

/retest

jacobbaungard · 2024-05-28T06:19:13Z

/retest

jacobbaungard · 2024-05-28T09:35:59Z

/test e2e-kind

Instead of listing all clusters manually in the query, as in:

```
cluster=~"(local-cluster|simulated-managed-cluster-1|simulated-managed-cluster-1-1|simulated-managed-cluster-1-10|simulated-managed-cluster-1-2|simulated-managed-cluster-1-3..."
```

we set it to `".+"`, simplifying the query significantly.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
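PromQL regex matchers are fully anchored, so `cluster=~".+"` selects every series whose `cluster` label is non-empty. The Python sketch below compares the two styles, with Python's `re.fullmatch` standing in for Prometheus's RE2 engine (the two behave the same for these simple patterns). The cluster list and the helpers `matches` and `cluster_selector` are hypothetical; the names are assumed to contain only letters, digits, and hyphens, so no escaping is applied.

```python
import re


def matches(pattern, value):
    """Emulate a PromQL `=~` matcher: the regex must match the whole label value."""
    return re.fullmatch(pattern, value) is not None


def cluster_selector(pattern):
    """Render a label matcher string such as cluster=~".+" (illustrative only)."""
    return f'cluster=~"{pattern}"'


# Hypothetical subset of cluster names, echoing those in the commit message.
known_clusters = [
    "local-cluster",
    "simulated-managed-cluster-1",
    "simulated-managed-cluster-1-1",
]

# Old style: an alternation that grows with every managed cluster.
explicit = "(" + "|".join(known_clusters) + ")"
# New style: a constant pattern matching any non-empty cluster name.
wildcard = ".+"

if __name__ == "__main__":
    print(cluster_selector(explicit))
    print(cluster_selector(wildcard))
    # A cluster added after the query was built is matched only by the wildcard.
    print(matches(explicit, "new-cluster"), matches(wildcard, "new-cluster"))
```

Besides being shorter, the wildcard keeps matching clusters that are added later, whereas an explicit list would have to be regenerated. Because `.+` requires at least one character, series without a `cluster` label are still excluded.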

A quick test that checks whether the dashboards exist.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
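The commit message does not show the test itself. As a hedged illustration of what such a check can look like, the Python sketch below inspects the JSON returned by Grafana's dashboard search endpoint (`GET /api/search`), where dashboards carry `"type": "dash-db"` and folders `"type": "dash-folder"`. The function name `missing_dashboards` and the dashboard titles are hypothetical, and a real test would first fetch the response from the running Grafana instance.

```python
import json


def missing_dashboards(search_response, expected_titles):
    """Return the expected titles that are not dashboards in a Grafana search response.

    `search_response` is the raw JSON text returned by GET /api/search; only
    entries of type "dash-db" count as dashboards (folders are ignored).
    """
    results = json.loads(search_response)
    found = {item["title"] for item in results if item.get("type") == "dash-db"}
    return [title for title in expected_titles if title not in found]


# Hypothetical response: one dashboard, plus a folder that shares the second title.
SAMPLE_RESPONSE = json.dumps([
    {"uid": "a1", "title": "dashboard-a", "type": "dash-db"},
    {"uid": "f1", "title": "dashboard-b", "type": "dash-folder"},
])

if __name__ == "__main__":
    missing = missing_dashboards(SAMPLE_RESPONSE, ["dashboard-a", "dashboard-b"])
    # "dashboard-b" is only a folder here, so it is reported as missing.
    print("missing dashboards:", missing)
```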

While the auxiliary images (endpoint-monitoring-operator, etc.) correctly used the CI-built images in kind, this was not the case for MCO itself. In this commit we make sure to load the `IMAGE REF` from the kind env file, so that the CI image for MCO is used as well.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
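The commit message describes the mechanism but not the script change itself. Assuming the kind env file holds shell-style `KEY=VALUE` assignments, the Python sketch below shows the idea: read the file and use the CI-built image reference when it is present, falling back to a default otherwise. The variable name `MCO_IMAGE_REF`, the registry URLs, and both helper functions are hypothetical.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, as found in a shell-style env file.

    Blank lines and '#' comments are skipped, an optional leading 'export '
    is dropped, and one layer of matching single or double quotes is removed.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].strip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def resolve_image(env, key, default):
    """Prefer the CI-built image recorded in the env file; otherwise use a default."""
    return env.get(key) or default


# Hypothetical env file contents; the real file and variable names may differ.
SAMPLE_ENV = """
# hypothetical example
export MCO_IMAGE_REF="registry.example.com/mco:ci-build"
OTHER_VAR=foo
"""

if __name__ == "__main__":
    env = parse_env(SAMPLE_ENV)
    print(resolve_image(env, "MCO_IMAGE_REF", "registry.example.com/mco:latest"))
```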

sonarqubecloud · 2024-05-28T12:04:17Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

jacobbaungard · 2024-05-28T12:51:00Z

/retest

subbarao-meduri

/lgtm

openshift-ci · 2024-05-28T13:33:21Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jacobbaungard, subbarao-meduri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [jacobbaungard,subbarao-meduri]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the do-not-merge/work-in-progress label May 23, 2024

openshift-ci bot added dco-signoff: yes approved labels May 23, 2024

jacobbaungard force-pushed the ACM-10962-dashboard-optimization branch from 84ec537 to 776c7a2 May 27, 2024 07:49

jacobbaungard marked this pull request as ready for review May 27, 2024 08:08

openshift-ci bot removed the do-not-merge/work-in-progress label May 27, 2024

jacobbaungard force-pushed the ACM-10962-dashboard-optimization branch 4 times, most recently from 899d86a to 0e2886c May 27, 2024 13:39

jacobbaungard force-pushed the ACM-10962-dashboard-optimization branch from c96d7b6 to 939fbbd May 28, 2024 10:01

jacobbaungard added 3 commits May 28, 2024 13:33

Tests: Add basic test for dashboard existence

1d46a09

jacobbaungard force-pushed the ACM-10962-dashboard-optimization branch from bb0437f to ba93d28 May 28, 2024 11:34

jacobbaungard force-pushed the ACM-10962-dashboard-optimization branch from ba93d28 to ad9b81a May 28, 2024 11:59

subbarao-meduri approved these changes May 28, 2024

openshift-ci bot assigned subbarao-meduri May 28, 2024

openshift-ci bot added the lgtm label May 28, 2024

openshift-merge-bot bot merged commit a1c94fa into stolostron:main May 28, 2024
16 checks passed

jacobbaungard added a commit to jacobbaungard/multicluster-observability-operator that referenced this pull request May 28, 2024

Tests: remove additional test print

3fef7cb

These were missed before merging: stolostron#1454 Signed-off-by: Jacob Baungard Hansen <[email protected]>

jacobbaungard mentioned this pull request May 28, 2024

Tests: remove additional test print #1457

Merged

jacobbaungard added a commit to jacobbaungard/multicluster-observability-operator that referenced this pull request May 31, 2024

Tests: remove additional test print

08831ce

These were missed before merging: stolostron#1454 Signed-off-by: Jacob Baungard Hansen <[email protected]>

jacobbaungard added a commit that referenced this pull request May 31, 2024

Tests: remove additional test print (#1457)

0c92c63

These were missed before merging: #1454 Signed-off-by: Jacob Baungard Hansen <[email protected]>
