Add optimized grafana dashboard #1454

jacobbaungard
Contributor

This optimized dashboard mainly lowers the cardinality of the CPU
metrics. Specifically, instead of using avg(rate(node_cpu_seconds_total,
which has a cardinality equal to the total number of CPUs across all
managed clusters, we use cluster:node_cpu:ratio, which has a cardinality
of 1 per cluster.

That is, with 100 clusters of 16 CPUs each, the cardinality before was
100*16 = 1600, whereas with this change we fetch only 100 series.

This should scale quite a bit better on larger installations with many
clusters/nodes.
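
The arithmetic above can be sketched as follows. This is only an illustration: it assumes every cluster has the same number of CPUs and counts one series per CPU, as the description does. The function names are hypothetical and not part of the dashboard or the operator.

```python
def per_cpu_series(clusters: int, cpus_per_cluster: int) -> int:
    """Series fetched when the query returns one series per CPU (simplified)."""
    return clusters * cpus_per_cluster


def per_cluster_series(clusters: int) -> int:
    """Series fetched when a pre-aggregated metric has one series per cluster."""
    return clusters


before = per_cpu_series(clusters=100, cpus_per_cluster=16)
after = per_cluster_series(clusters=100)
reduction_factor = before // after

print(f"before={before} after={after} reduction={reduction_factor}x")
```

Under these assumptions the reduction factor equals the number of CPUs per cluster, so the saving grows with node size as well as with cluster count.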

@jacobbaungard
Contributor Author

/test test-unit
/test e2e-kind


openshift-ci bot commented May 23, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jacobbaungard
Contributor Author

/test test-unit

@jacobbaungard jacobbaungard force-pushed the ACM-10962-dashboard-optimization branch from 84ec537 to 776c7a2 Compare May 27, 2024 07:49
@jacobbaungard
Contributor Author

/test test-unit

@jacobbaungard jacobbaungard marked this pull request as ready for review May 27, 2024 08:08
@jacobbaungard jacobbaungard force-pushed the ACM-10962-dashboard-optimization branch 4 times, most recently from 899d86a to 0e2886c Compare May 27, 2024 13:39
@jacobbaungard
Contributor Author

/test images

@jacobbaungard
Contributor Author

/retest

@jacobbaungard
Contributor Author

/test e2e-kind

@jacobbaungard jacobbaungard force-pushed the ACM-10962-dashboard-optimization branch from c96d7b6 to 939fbbd Compare May 28, 2024 10:01
This optimized dashboard mainly lowers the cardinality of the CPU
metrics. Specifically, instead of using `avg(rate(node_cpu_seconds_total`,
which has a cardinality equal to the total number of CPUs across all
managed clusters, we use `cluster:node_cpu:ratio`, which has a cardinality
of 1 per cluster.

That is, with 100 clusters of 16 CPUs each, the cardinality before was
100*16 = 1600, whereas with this change we fetch only 100 series.

This should scale quite a bit better on larger installations with many
clusters/nodes.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
Instead of listing all clusters manually in the query, for example:

```
cluster=~"(local-cluster|simulated-managed-cluster-1|simulated-managed-cluster-1-1|simulated-managed-cluster-1-10|simulated-managed-cluster-1-2|simulated-managed-cluster-1-3..."
```

We set the matcher to `".+"`, which simplifies the query significantly.
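
A minimal sketch of how the two matchers differ, using Python's `re.fullmatch` to approximate Prometheus's fully anchored regex label matching. The cluster list reuses a few names from the example above, and `new-cluster` is a hypothetical name added for illustration.

```python
import re

# A few of the cluster names from the example above (the real list is longer).
known_clusters = [
    "local-cluster",
    "simulated-managed-cluster-1",
    "simulated-managed-cluster-1-1",
    "simulated-managed-cluster-1-10",
]

# The explicit form: one alternation branch per cluster.
explicit_pattern = "(" + "|".join(re.escape(name) for name in known_clusters) + ")"
# The simplified form: any non-empty label value.
catch_all_pattern = ".+"


def label_matches(pattern: str, value: str) -> bool:
    """Approximate an anchored regex label matcher with a full match."""
    return re.fullmatch(pattern, value) is not None


# Both forms select every listed cluster.
for name in known_clusters:
    assert label_matches(explicit_pattern, name)
    assert label_matches(catch_all_pattern, name)

# They differ only for clusters that are not in the list.
print(label_matches(explicit_pattern, "new-cluster"))
print(label_matches(catch_all_pattern, "new-cluster"))
# An empty label value is not matched by ".+".
print(label_matches(catch_all_pattern, ""))
```

The catch-all form keeps the query length constant as clusters are added, at the cost of no longer restricting results to an explicit list.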

Signed-off-by: Jacob Baungard Hansen <[email protected]>
A quick test that checks whether the dashboards exist.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
@jacobbaungard jacobbaungard force-pushed the ACM-10962-dashboard-optimization branch from bb0437f to ba93d28 Compare May 28, 2024 11:34
While the auxiliary images (endpoint-monitoring-operator, etc.) correctly
used the CI-built images in kind, this was not the case for MCO itself.
In this commit we make sure to load the `IMAGE REF` from the kind env
file, so that the CI image for MCO is used as well.

Signed-off-by: Jacob Baungard Hansen <[email protected]>
@jacobbaungard jacobbaungard force-pushed the ACM-10962-dashboard-optimization branch from ba93d28 to ad9b81a Compare May 28, 2024 11:59

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@jacobbaungard
Contributor Author

/retest

Collaborator

@subbarao-meduri subbarao-meduri left a comment

/lgtm


openshift-ci bot commented May 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jacobbaungard, subbarao-meduri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jacobbaungard,subbarao-meduri]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit a1c94fa into stolostron:main May 28, 2024
16 checks passed
jacobbaungard added a commit to jacobbaungard/multicluster-observability-operator that referenced this pull request May 28, 2024
These were missed before merging:
stolostron#1454

Signed-off-by: Jacob Baungard Hansen <[email protected]>
jacobbaungard added a commit to jacobbaungard/multicluster-observability-operator that referenced this pull request May 31, 2024
These were missed before merging:
stolostron#1454

Signed-off-by: Jacob Baungard Hansen <[email protected]>
jacobbaungard added a commit that referenced this pull request May 31, 2024
These were missed before merging:
#1454

Signed-off-by: Jacob Baungard Hansen <[email protected]>