Add `cluster:node_cpu:ratio` to allowlist #1409

jacobbaungard · 2024-04-18T07:48:33Z

We add this metric to the allowlist as it will be used to optimize dashboard performance for the fleet wide CPU widgets.

The recording rule is already added to *ks clusters here:

multicluster-observability-operator/operators/endpointmetrics/manifests/prometheus/prometheusrules/kube-prometheus-node-recording.yaml

Line 24 in 7ed3670

record: cluster:node_cpu:sum_rate5m

The rule differs slightly from the statistic we currently use in the dashboard. Previously we counted everything except idle CPU as "CPU utilization", while this rule also excludes iowait and steal. This aligns with how e.g. node-exporter handles things; more info in the commit message here: prometheus/node_exporter@3e6f4ce
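To make the difference between the two definitions concrete, here is a minimal sketch in Python. The per-mode percentages are made-up illustrative numbers, not measurements, and the `utilization` helper is hypothetical; it only mirrors the idea of "sum every mode that is not excluded" and is not the recording rule itself.

```python
# Hypothetical share of CPU time per mode over some window, in percent (sums to 100).
mode_percent = {
    "user": 30,
    "system": 10,
    "iowait": 15,
    "steal": 5,
    "idle": 40,
}


def utilization(percent_by_mode, excluded_modes):
    """Sum the share of time spent in every mode not listed in excluded_modes."""
    return sum(v for mode, v in percent_by_mode.items() if mode not in excluded_modes)


# Previous dashboard definition: everything except idle counts as utilization.
old_style = utilization(mode_percent, {"idle"})

# Definition used by the recording rule: idle, iowait and steal are all excluded.
new_style = utilization(mode_percent, {"idle", "iowait", "steal"})

print(old_style, new_style)
```

With these sample numbers the new definition reports a lower value, so some drop in the widget readings after the switch would be expected on nodes with significant iowait or steal time.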

philipgough

/lgtm

jacobbaungard · 2024-04-29T07:17:27Z

Rebased with E2E-kind fixes.

jacobbaungard · 2024-04-29T09:07:55Z

/retest

jacobbaungard · 2024-04-29T17:22:02Z

/retest

subbarao-meduri

/lgtm

jacobbaungard · 2024-05-02T06:33:06Z

/retest

jacobbaungard · 2024-05-02T07:08:39Z

/retest

jacobbaungard · 2024-05-02T11:24:31Z

/retest

jacobbaungard · 2024-05-02T14:46:38Z

/retest

😩

jacobbaungard · 2024-05-03T07:38:33Z

/retest

jacobbaungard · 2024-05-03T09:02:03Z

/retest

jacobbaungard · 2024-05-03T15:24:39Z

/retest

jacobbaungard · 2024-05-03T15:41:13Z

/retest

jacobbaungard · 2024-05-06T07:22:37Z

/retest

moadz · 2024-05-06T13:13:14Z

/retest

jacobbaungard · 2024-05-15T08:45:22Z

/test test-e2e

jacobbaungard · 2024-05-15T17:31:04Z

/test test-e2e

jacobbaungard · 2024-05-15T20:09:37Z

/test test-e2e

jacobbaungard · 2024-05-16T06:36:48Z

/test test-e2e

jacobbaungard · 2024-05-16T07:11:34Z

/test test-e2e

philipgough

/lgtm

openshift-ci · 2024-05-16T08:30:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jacobbaungard, philipgough, subbarao-meduri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~operators/multiclusterobservability/OWNERS~~ [jacobbaungard,philipgough,subbarao-meduri]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sonarqubecloud · 2024-05-16T08:39:11Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

* Add `cluster:node_cpu:ratio` to allowlist (#1409)

  We add this metric to the allowlist as it will be used to optimize dashboard performance for the fleet wide CPU widgets.

  Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Add optimized cluster overview dashboard

  This optimized dashboard mainly lowers the cardinality of the CPU metrics. Specifically, instead of using `avg(rate(node_cpu_seconds_total`, which has a cardinality equal to the total number of CPUs across all managed clusters, we use `cluster:node_cpu:ratio`, which has a cardinality of 1 per cluster.

  That is, with 100 clusters of 16 CPUs each, the cardinality before was 100*16 = 1600, whereas with this change we now only fetch 100 series.

  This should scale quite a bit better on larger installations with many clusters/nodes.

  Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Tests: Add basic test for dashboard existence

  A quick test that checks if the dashboards exist.

  Signed-off-by: Jacob Baungard Hansen <[email protected]>

---------

Signed-off-by: Jacob Baungard Hansen <[email protected]>
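The cardinality arithmetic from the dashboard commit message above can be sketched as follows. This is a rough illustration only: the function names are hypothetical, and it assumes, as the commit message does, one series per CPU for the raw metric and one series per cluster for the recorded one.

```python
def series_before(clusters: int, cpus_per_cluster: int) -> int:
    """Series fetched when the widget queries the raw per-CPU metric."""
    return clusters * cpus_per_cluster


def series_after(clusters: int) -> int:
    """Series fetched when the widget queries one recorded series per cluster."""
    return clusters


before = series_before(clusters=100, cpus_per_cluster=16)
after = series_after(clusters=100)
print(f"before={before} after={after} reduction={before // after}x")
```

Under these assumptions the reduction factor equals the number of CPUs per cluster, so the saving grows with cluster size rather than with the number of clusters.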

openshift-ci bot added the dco-signoff: yes and approved labels Apr 18, 2024

jacobbaungard force-pushed the ACM-10961-add-cluster-node-cpu-ratio branch from 2d027e0 to 7edb2b6 Compare April 18, 2024 07:49

philipgough approved these changes Apr 18, 2024

openshift-ci bot assigned philipgough Apr 18, 2024

openshift-ci bot added the lgtm label Apr 18, 2024

jacobbaungard force-pushed the ACM-10961-add-cluster-node-cpu-ratio branch from 7edb2b6 to b8c339b Compare April 29, 2024 07:17

openshift-ci bot removed the lgtm label Apr 29, 2024

subbarao-meduri requested a review from philipgough May 1, 2024 12:38

subbarao-meduri approved these changes May 1, 2024

openshift-ci bot assigned subbarao-meduri May 1, 2024

openshift-ci bot added the lgtm label May 1, 2024

Add cluster:node_cpu:ratio to allowlist

782410e

We add this metric to the allowlist as it will be used to optimize dashboard performance for the fleet wide CPU widgets. Signed-off-by: Jacob Baungard Hansen <[email protected]>

jacobbaungard force-pushed the ACM-10961-add-cluster-node-cpu-ratio branch from b8c339b to 782410e Compare May 15, 2024 05:38

openshift-ci bot removed the lgtm label May 15, 2024

philipgough approved these changes May 16, 2024

openshift-ci bot added the lgtm label May 16, 2024

openshift-merge-bot bot merged commit 6c5d85a into stolostron:main May 16, 2024
15 checks passed

jacobbaungard deleted the ACM-10961-add-cluster-node-cpu-ratio branch May 16, 2024 09:06
