Add optimized grafana dashboard #1454
This optimized dashboard mainly lowers the cardinality of the CPU metrics. Specifically, instead of using `avg(rate(node_cpu_seconds_total`, whose cardinality is the total number of CPUs across all managed clusters, we use `cluster:node_cpu:ratio`, which has a cardinality of 1 per cluster. That is, with 100 clusters of 16 CPUs each, the cardinality before was 100*16 = 1600, whereas with this change we only fetch 100 series. This should scale considerably better on larger installations with many clusters/nodes.
Signed-off-by: Jacob Baungard Hansen <[email protected]>
Instead of listing all clusters manually in the query, i.e. like:
```
cluster=~"(local-cluster|simulated-managed-cluster-1|simulated-managed-cluster-1-1|simulated-managed-cluster-1-10|simulated-managed-cluster-1-2|simulated-managed-cluster-1-3..."
```
we set it to `".+"`, simplifying the query significantly.
Signed-off-by: Jacob Baungard Hansen <[email protected]>
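As a rough illustration of why the two matchers select the same clusters, here is a minimal Python sketch. It assumes Python's `re.fullmatch` is a fair stand-in for Prometheus regex matchers, which are fully anchored. The `matches` helper and the short cluster list are hypothetical, taken only from the names visible in the query above.

```python
import re


def matches(pattern: str, value: str) -> bool:
    """Approximate a Prometheus `=~` matcher, which is anchored at both ends."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical subset of cluster names, copied from the example query.
clusters = [
    "local-cluster",
    "simulated-managed-cluster-1",
    "simulated-managed-cluster-1-1",
    "simulated-managed-cluster-1-10",
]

# The explicit form grows with every managed cluster.
explicit = "(" + "|".join(clusters) + ")"

explicit_hits = [c for c in clusters if matches(explicit, c)]
wildcard_hits = [c for c in clusters if matches(".+", c)]

# Both matchers select the same clusters; ".+" still rejects an empty label value.
same_selection = explicit_hits == wildcard_hits
rejects_empty = not matches(".+", "")
```

Note that `.+` requires at least one character, so series without a `cluster` label are still excluded, just as they were by the explicit list.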
A quick test that checks that the dashboards exist.
Signed-off-by: Jacob Baungard Hansen <[email protected]>
While the auxiliary images (endpoint-monitoring-operator, etc.) correctly used the CI-built images in kind, this was not the case for MCO itself. In this commit we make sure to load the `IMAGE REF` from the kind env file, so that the CI image for MCO is used as well.
Signed-off-by: Jacob Baungard Hansen <[email protected]>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jacobbaungard, subbarao-meduri
These were missed before merging: stolostron#1454 Signed-off-by: Jacob Baungard Hansen <[email protected]>
This optimized dashboard mainly lowers the cardinality of the CPU
metrics. Specifically, instead of using `avg(rate(node_cpu_seconds_total`,
whose cardinality is the total number of CPUs across all managed clusters,
we use `cluster:node_cpu:ratio`, which has a cardinality of 1 per cluster.
That is, with 100 clusters of 16 CPUs each, the cardinality before was
100*16 = 1600, whereas with this change we only fetch 100 series.
This should scale considerably better on larger installations with many
clusters/nodes.
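
The arithmetic in the description can be checked with a small Python sketch. It is a simplified model, not the dashboard's actual query: it assumes each per-CPU series is identified only by a (cluster, cpu) pair and ignores any other labels the real metric carries. The function names and the `cluster-N` naming are hypothetical.

```python
def per_cpu_series(num_clusters: int, cpus_per_cluster: int) -> set[tuple[str, str]]:
    """Label sets fetched when querying a per-CPU metric (simplified model)."""
    return {
        (f"cluster-{c}", str(cpu))
        for c in range(num_clusters)
        for cpu in range(cpus_per_cluster)
    }


def per_cluster_series(num_clusters: int) -> set[str]:
    """Label sets fetched when querying a pre-aggregated, one-per-cluster metric."""
    return {f"cluster-{c}" for c in range(num_clusters)}


# The example from the description: 100 clusters with 16 CPUs each.
before = len(per_cpu_series(100, 16))
after = len(per_cluster_series(100))
print(before, after)  # 1600 100
```

In this model the number of series fetched by the per-cluster form grows only with the number of clusters, while the per-CPU form grows with clusters times CPUs.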