Add optimized grafana dashboard (#1454)
* Add optimized cluster overview dashboard

This optimized dashboard mainly lowers the cardinality of the CPU
metrics. Specifically instead of using `avg(rate(node_cpu_seconds_total`
which has a cardinality of total CPUs across all managed clusters, we
instead use `cluster:node_cpu:ratio` which has a cardinality of 1 per
cluster.

That is, with 100 clusters of 16 CPUs each, the cardinality before was
100*16 = 1600, whereas with this change we only fetch 100 series.

This should scale quite a bit better on larger installations with many
clusters/nodes.
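
The arithmetic above can be sketched in shell. The function names are hypothetical, and the count follows the commit message's simplification of one series per CPU for `node_cpu_seconds_total` and one series per cluster for `cluster:node_cpu:ratio`.

```sh
#!/bin/sh
# Series fetched when querying the per-CPU metric:
# one per CPU in every managed cluster.
series_per_cpu() {
  # $1 = number of clusters, $2 = CPUs per cluster
  echo $(( $1 * $2 ))
}

# Series fetched when querying a per-cluster recording rule
# such as cluster:node_cpu:ratio: one per cluster.
series_per_cluster() {
  # $1 = number of clusters
  echo "$1"
}

echo "before: $(series_per_cpu 100 16) series"
echo "after:  $(series_per_cluster 100) series"
```

The second figure does not depend on the CPU count at all, which is why the gap widens as nodes grow.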

Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Grafana: Use wildcard for all on cluster overview

Instead of listing all clusters manually in the query, e.g.:

```
cluster=~"(local-cluster|simulated-managed-cluster-1|simulated-managed-cluster-1-1|simulated-managed-cluster-1-10|simulated-managed-cluster-1-2|simulated-managed-cluster-1-3..."
```

We set it to `".+"`, simplifying the query significantly.
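
To illustrate the difference, here is a rough sketch of how the two matchers are formed. The helper names `explicit_matcher` and `wildcard_matcher` are hypothetical and are not part of the dashboard or of Grafana itself.

```sh
#!/bin/sh
# Build an explicit alternation matcher from a list of cluster names.
# The resulting string grows with every cluster that is added.
explicit_matcher() {
  joined=""
  for name in "$@"; do
    if [ -z "$joined" ]; then
      joined="$name"
    else
      joined="$joined|$name"
    fi
  done
  printf 'cluster=~"(%s)"\n' "$joined"
}

# The wildcard matcher stays constant regardless of the cluster count.
wildcard_matcher() {
  printf 'cluster=~".+"\n'
}

explicit_matcher local-cluster simulated-managed-cluster-1 simulated-managed-cluster-1-1
wildcard_matcher
```

Note that `.+` only matches series whose `cluster` label is non-empty, so series without that label are still excluded.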

Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Tests: Add basic test for dashboard existence

A quick test that checks that the dashboards exist.
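
As a rough illustration only (the test added by this commit is not shown here), an existence check could look like the sketch below. It assumes the dashboard is stored as a ConfigMap; the function names, dashboard name and namespace are all placeholders.

```sh
#!/bin/sh
# Succeeds if a ConfigMap with the given name exists in the namespace.
# Assumption: the dashboard is shipped as a ConfigMap.
dashboard_exists() {
  # $1 = ConfigMap name (placeholder), $2 = namespace (placeholder)
  kubectl get configmap "$1" -n "$2" >/dev/null 2>&1
}

# Print a short status line and return non-zero when the dashboard is missing.
check_dashboard() {
  if dashboard_exists "$1" "$2"; then
    echo "dashboard $1 found"
  else
    echo "dashboard $1 missing"
    return 1
  fi
}
```

It would be called as `check_dashboard <dashboard-configmap> <namespace>` against a running cluster.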

Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Kind test: Actually use CI built MCO image in test

While the auxiliary images (endpoint-monitoring-operator, etc.) correctly
used the CI-built images in kind, this was not the case for MCO itself.
In this commit we make sure to load
`MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF` from the kind env file,
so that the CI image for MCO is used as well.
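
The pattern can be sketched in isolation as follows. `load_image_ref`, the temporary env file and the `example.invalid` image reference are made up for the demonstration, and the sketch uses the POSIX `.` builtin where the script itself uses `source`.

```sh
#!/bin/sh
# Load variables from an env file, but only when running in kind.
# $1 = path to the env file (hypothetical argument).
load_image_ref() {
  env_file=$1
  if [ -n "${IS_KIND_ENV:-}" ] && [ -f "$env_file" ]; then
    # Sourcing inside a function still sets variables in the calling shell.
    . "$env_file"
  fi
}

# Demonstration with a throwaway env file and a made-up image reference.
tmp_env=$(mktemp)
echo 'export MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF=example.invalid/mco:ci' > "$tmp_env"

IS_KIND_ENV=true
load_image_ref "$tmp_env"
echo "image ref: ${MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF:-<unset>}"

rm -f "$tmp_env"
```

When `IS_KIND_ENV` is empty or unset, the file is skipped and any value already present in the environment is left untouched.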

Signed-off-by: Jacob Baungard Hansen <[email protected]>

---------

Signed-off-by: Jacob Baungard Hansen <[email protected]>
jacobbaungard authored May 28, 2024
1 parent 29695d9 commit a1c94fa
Showing 5 changed files with 1,950 additions and 4 deletions.
8 changes: 8 additions & 0 deletions cicd-scripts/setup-e2e-tests.sh
@@ -130,6 +130,11 @@ EOF

# deploy the MCO operator via the kustomize resources
deploy_mco_operator() {
  # makes sure we get the MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF
  if [[ -n ${IS_KIND_ENV} ]]; then
    source ${ROOTDIR}/tests/run-in-kind/env.sh
  fi

  if [[ -n ${MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF} ]]; then
    cd ${ROOTDIR}/operators/multiclusterobservability/config/manager && kustomize edit set image quay.io/stolostron/multicluster-observability-operator=${MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF}
  else
@@ -138,6 +143,9 @@ deploy_mco_operator() {
  cd ${ROOTDIR}
  kustomize build ${ROOTDIR}/operators/multiclusterobservability/config/default | kubectl apply -n ${OCM_DEFAULT_NS} --server-side=true -f -

  cat ${ROOTDIR}/operators/multiclusterobservability/config/manager/manager.yaml
  cat ${ROOTDIR}/operators/multiclusterobservability/config/manager/kustomization.yaml

  # wait until mco is ready
  wait_for_deployment_ready 10 60s ${OCM_DEFAULT_NS} multicluster-observability-operator
  echo "mco operator is deployed successfully."