Add cluster:node_cpu:ratio to allowlist #1409

jacobbaungard
Contributor

We add this metric to the allowlist because it will be used to optimize dashboard performance for the fleet-wide CPU widgets.

The recording rule has already been added to *ks clusters here:

The rule differs slightly from the statistic we currently use in the dashboard. Previously we counted everything except idle CPU as "CPU utilization"; this rule also excludes iowait and steal. This aligns with how, e.g., node-exporter handles CPU utilization. More info in the commit message here: prometheus/node_exporter@3e6f4ce
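To make the difference concrete, here is a minimal Python sketch comparing the two definitions on a single node. It is illustrative only: the per-mode numbers are made up, the mode names are assumed to match the `mode` label of node-exporter's `node_cpu_seconds_total`, and the helper `utilization` is hypothetical (the actual recording rule is not reproduced here).

```python
"""Compare the old and new CPU utilization definitions on made-up data."""

# Hypothetical CPU seconds spent in each mode by one node over a window,
# e.g. what an increase() over node_cpu_seconds_total per mode might yield.
CPU_SECONDS_BY_MODE = {
    "user": 40.0,
    "system": 10.0,
    "iowait": 15.0,
    "steal": 5.0,
    "idle": 30.0,
}

# Modes treated as "not busy" under each definition.
OLD_NOT_BUSY = ("idle",)                    # previous dashboard statistic
NEW_NOT_BUSY = ("idle", "iowait", "steal")  # node-exporter style, as in the new rule


def utilization(seconds_by_mode: dict[str, float], not_busy: tuple[str, ...]) -> float:
    """Return the fraction of CPU time spent outside the `not_busy` modes."""
    total = sum(seconds_by_mode.values())
    if total == 0:
        return 0.0
    idle_like = sum(seconds_by_mode.get(mode, 0.0) for mode in not_busy)
    return 1.0 - idle_like / total


if __name__ == "__main__":
    old = utilization(CPU_SECONDS_BY_MODE, OLD_NOT_BUSY)
    new = utilization(CPU_SECONDS_BY_MODE, NEW_NOT_BUSY)
    print(f"old={old:.2f} new={new:.2f}")  # prints old=0.70 new=0.50
```

Because the new definition treats strictly more modes as not busy, it can never report a higher value than the old one; nodes with significant iowait or steal time will show lower utilization than before.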

Contributor

@philipgough philipgough left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 18, 2024
@jacobbaungard jacobbaungard force-pushed the ACM-10961-add-cluster-node-cpu-ratio branch from 7edb2b6 to b8c339b April 29, 2024 07:17
@openshift-ci openshift-ci bot removed the lgtm label Apr 29, 2024
@jacobbaungard
Contributor Author

Rebased with E2E-kind fixes.

@jacobbaungard
Contributor Author

/retest

1 similar comment

Collaborator

@subbarao-meduri subbarao-meduri left a comment

/lgtm

@jacobbaungard
Contributor Author

/retest

2 similar comments

@jacobbaungard
Contributor Author

/retest

😩

@jacobbaungard
Contributor Author

/retest

5 similar comments

@moadz
Contributor

moadz commented May 6, 2024

/retest

We add this metric to the allowlist as it will be used to optimize
dashboard performance for the fleet wide CPU widgets.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
@jacobbaungard jacobbaungard force-pushed the ACM-10961-add-cluster-node-cpu-ratio branch from b8c339b to 782410e May 15, 2024 05:38
@openshift-ci openshift-ci bot removed the lgtm label May 15, 2024
@jacobbaungard
Contributor Author

/test test-e2e

1 similar comment

@jacobbaungard
Contributor Author

/test test-e2e

2 similar comments

Contributor

@philipgough philipgough left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 16, 2024

openshift-ci bot commented May 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jacobbaungard, philipgough, subbarao-meduri

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@openshift-merge-bot openshift-merge-bot bot merged commit 6c5d85a into stolostron:main May 16, 2024
15 checks passed
@jacobbaungard jacobbaungard deleted the ACM-10961-add-cluster-node-cpu-ratio branch May 16, 2024 09:06
coleenquadros pushed a commit to coleenquadros/multicluster-observability-operator that referenced this pull request May 23, 2024
We add this metric to the allowlist as it will be used to optimize
dashboard performance for the fleet wide CPU widgets.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
jacobbaungard added a commit to jacobbaungard/multicluster-observability-operator that referenced this pull request Jun 5, 2024
We add this metric to the allowlist as it will be used to optimize
dashboard performance for the fleet wide CPU widgets.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
openshift-merge-bot bot pushed a commit that referenced this pull request Jun 5, 2024
* Add `cluster:node_cpu:ratio` to allowlist (#1409)

We add this metric to the allowlist as it will be used to optimize
dashboard performance for the fleet wide CPU widgets.

Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Add optimized cluster overview dashboard

This optimized dashboard mainly lowers the cardinality of the CPU
metrics. Specifically, instead of using `avg(rate(node_cpu_seconds_total`,
which has a cardinality equal to the total number of CPUs across all
managed clusters, we use `cluster:node_cpu:ratio`, which has a
cardinality of 1 per cluster.

That is, with 100 clusters of 16 CPUs each, the cardinality before was
100*16 = 1600, whereas with this change we only fetch 100 series.

This should scale quite a bit better on larger installations with many
clusters/nodes.

Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Tests: Add basic test for dashboard existence

A quick test that checks whether the dashboards exist.

Signed-off-by: Jacob Baungard Hansen <[email protected]>

---------

Signed-off-by: Jacob Baungard Hansen <[email protected]>
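The cardinality arithmetic in the optimized-dashboard commit message above can be checked with a few lines of Python. This is an illustrative sketch, not part of the PR: the helper names are hypothetical, the per-CPU count follows the commit message's 100*16 figure (one series per CPU), and the larger fleet size is made up.

```python
"""Back-of-the-envelope series counts for the fleet-wide CPU panels."""


def series_per_cpu(clusters: int, cpus_per_cluster: int) -> int:
    """Raw per-CPU query: one node_cpu_seconds_total series per CPU in the fleet."""
    return clusters * cpus_per_cluster


def series_per_cluster(clusters: int) -> int:
    """cluster:node_cpu:ratio: one pre-aggregated series per managed cluster."""
    return clusters


if __name__ == "__main__":
    # The example from the commit message: 100 clusters with 16 CPUs each.
    print(series_per_cpu(100, 16), series_per_cluster(100))  # prints 1600 100

    # A hypothetical larger fleet, to show how the gap grows with CPU count.
    print(series_per_cpu(1000, 64), series_per_cluster(1000))  # prints 64000 1000
```

The reduction factor is simply the number of CPUs per cluster, so the saving grows as clusters get bigger, while the recording-rule query stays at one series per cluster.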