Add `cluster:node_cpu:ratio` to allowlist #1409
2d027e0 to 7edb2b6
/lgtm
7edb2b6 to b8c339b
Rebased with E2E-kind fixes.
/retest
/lgtm
/retest
/retest 😩
We add this metric to the allowlist as it will be used to optimize dashboard performance for the fleet-wide CPU widgets.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
b8c339b to 782410e
/test test-e2e
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jacobbaungard, philipgough, subbarao-meduri

The full list of commands accepted by this bot can be found here.

The pull request process is described here
* Add `cluster:node_cpu:ratio` to allowlist (#1409)

  We add this metric to the allowlist as it will be used to optimize dashboard performance for the fleet-wide CPU widgets.

  Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Add optimized cluster overview dashboard

  This optimized dashboard mainly lowers the cardinality of the CPU metrics. Specifically, instead of using `avg(rate(node_cpu_seconds_total`, which has a cardinality equal to the total number of CPUs across all managed clusters, we use `cluster:node_cpu:ratio`, which has a cardinality of 1 per cluster. That is, with 100 clusters of 16 CPUs each, the cardinality before was 100*16 = 1600, whereas with this change we fetch only 100 metrics. This should scale quite a bit better on larger installations with many clusters/nodes.

  Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Tests: Add basic test for dashboard existence

  A quick test that checks whether the dashboards exist.

  Signed-off-by: Jacob Baungard Hansen <[email protected]>

---------

Signed-off-by: Jacob Baungard Hansen <[email protected]>
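To make the cardinality arithmetic in the commit message concrete, here is a minimal Python sketch. It only reproduces the simplified model stated above (one series per CPU per cluster before, one series per cluster after); the function name `series_per_query` is hypothetical and not part of the repository.

```python
def series_per_query(clusters: int, cpus_per_cluster: int) -> tuple[int, int]:
    """Return (before, after) series counts under the commit message's model.

    Assumption: every cluster has the same number of CPUs, and the old query
    touches one series per CPU while the recording rule yields one per cluster.
    """
    before = clusters * cpus_per_cluster  # per-CPU series across the fleet
    after = clusters                      # one pre-aggregated series per cluster
    return before, after


before, after = series_per_query(clusters=100, cpus_per_cluster=16)
print(f"before={before} after={after} reduction={before // after}x")
```

Under this model the reduction factor is simply the number of CPUs per cluster, so the saving grows with node size rather than with cluster count.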
We add this metric to the allowlist as it will be used to optimize dashboard performance for the fleet-wide CPU widgets.
The recording rule is already added to *ks clusters here:
multicluster-observability-operator/operators/endpointmetrics/manifests/prometheus/prometheusrules/kube-prometheus-node-recording.yaml
Line 24 in 7ed3670
The rule is slightly different from the statistic we currently use in the dashboard. Before, we counted everything except `idle` CPU as "CPU utilization", while this rule also excludes `iowait` and `steal`. This aligns with how e.g. node-exporter handles things; more info in the commit message here: prometheus/node_exporter@3e6f4ce
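As a rough illustration of how the two definitions differ, the Python sketch below computes utilization from per-mode CPU time fractions. The sample numbers and the `utilization` helper are made up for illustration; this is not the recording rule itself, only the "which modes count as busy" distinction described above.

```python
# Hypothetical per-mode CPU time fractions for one node (illustrative only).
mode_fractions = {
    "user": 0.30,
    "system": 0.10,
    "iowait": 0.05,
    "steal": 0.02,
    "idle": 0.53,
}


def utilization(fractions: dict[str, float], excluded_modes: set[str]) -> float:
    """Share of total CPU time spent in modes that are not excluded."""
    total = sum(fractions.values())
    busy = sum(v for mode, v in fractions.items() if mode not in excluded_modes)
    return busy / total


# Previous dashboard definition: everything except idle counts as utilization.
old_style = utilization(mode_fractions, {"idle"})
# Definition described for the rule: idle, iowait and steal are all excluded.
new_style = utilization(mode_fractions, {"idle", "iowait", "steal"})

print(f"old={old_style:.2f} new={new_style:.2f}")
```

With these sample values the new definition reports a lower figure, since time spent in `iowait` and `steal` no longer counts as utilization; on nodes with little I/O wait or steal time the two numbers would be nearly identical.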