Add optimized grafana dashboard (#1454)

* Add optimized cluster overview dashboard

This optimized dashboard mainly lowers the cardinality of the CPU metrics. Specifically, instead of using `avg(rate(node_cpu_seconds_total`, which has a cardinality equal to the total number of CPUs across all managed clusters, we now use `cluster:node_cpu:ratio`, which has a cardinality of 1 per cluster. For example, with 100 clusters of 16 CPUs each, the cardinality before was 100 * 16 = 1600, whereas with this change we fetch only 100 series. This should scale considerably better on larger installations with many clusters and nodes.

Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Grafana: Use wildcard for all on cluster overview

Instead of listing all clusters manually in the query, for example:

```
cluster=~"(local-cluster|simulated-managed-cluster-1|simulated-managed-cluster-1-1|simulated-managed-cluster-1-10|simulated-managed-cluster-1-2|simulated-managed-cluster-1-3..."
```

we set it to `".+"`, simplifying the query significantly.

Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Tests: Add basic test for dashboard existence

A quick test that checks whether the dashboards exist.

Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Kind test: Actually use CI built MCO image in test

While the auxiliary images (endpoint-monitoring-operator, etc.) correctly used the CI-built images in kind, this was not the case for MCO itself. In this commit we make sure to load the `MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF` from the kind env file, so that the CI-built image is used for MCO as well.

Signed-off-by: Jacob Baungard Hansen <[email protected]>

---------

Signed-off-by: Jacob Baungard Hansen <[email protected]>
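A minimal bash sketch of the arithmetic and the two cluster-matcher styles described in the commit message. It follows the message's simplification of one series per CPU and assumes every cluster has the same CPU count. The function names are hypothetical. The functions only print numbers and selector strings; they do not query Prometheus.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the dashboard changes; they only print values.

# Old CPU panel: one series per CPU across all managed clusters
# (simplification: uniform CPU count per cluster).
series_per_cpu() {
  local clusters=$1 cpus_per_cluster=$2
  echo $(( clusters * cpus_per_cluster ))
}

# New CPU panel (cluster:node_cpu:ratio): one series per cluster.
series_per_cluster() {
  local clusters=$1
  echo "$clusters"
}

# Old "All" matcher: one regex alternative per cluster name.
# "$*" joins the arguments using the first character of IFS ('|').
explicit_cluster_matcher() {
  local IFS='|'
  printf 'cluster=~"(%s)"\n' "$*"
}

# New "All" matcher: a fixed wildcard, independent of the cluster count.
wildcard_cluster_matcher() {
  printf 'cluster=~".+"\n'
}
```

For example, `series_per_cpu 100 16` prints 1600 and `series_per_cluster 100` prints 100, matching the figures in the message. PromQL regex matchers are fully anchored, so the wildcard matcher selects every series with a non-empty `cluster` label. Its length also stays fixed as clusters are added, whereas the explicit matcher grows with each one.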
stolostron · May 28, 2024 · a1c94fa
1 parent 29695d9
commit a1c94fa

Showing 5 changed files with 1,950 additions and 4 deletions.
diff --git a/cicd-scripts/setup-e2e-tests.sh b/cicd-scripts/setup-e2e-tests.sh
@@ -130,6 +130,11 @@ EOF
 
 # deploy the MCO operator via the kustomize resources
 deploy_mco_operator() {
+  # makes sure we get the MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF
+  if [[ -n ${IS_KIND_ENV} ]]; then
+    source ${ROOTDIR}/tests/run-in-kind/env.sh
+  fi
+
   if [[ -n ${MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF} ]]; then
     cd ${ROOTDIR}/operators/multiclusterobservability/config/manager && kustomize edit set image quay.io/stolostron/multicluster-observability-operator=${MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF}
   else
@@ -138,6 +143,9 @@ deploy_mco_operator() {
   cd ${ROOTDIR}
   kustomize build ${ROOTDIR}/operators/multiclusterobservability/config/default | kubectl apply -n ${OCM_DEFAULT_NS} --server-side=true -f -
 
+  cat ${ROOTDIR}/operators/multiclusterobservability/config/manager/manager.yaml
+  cat ${ROOTDIR}/operators/multiclusterobservability/config/manager/kustomization.yaml
+
   # wait until mco is ready
   wait_for_deployment_ready 10 60s ${OCM_DEFAULT_NS} multicluster-observability-operator
   echo "mco operator is deployed successfully."
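The image-selection change in `deploy_mco_operator` can be exercised in isolation. The bash sketch below uses a hypothetical helper that is not part of the repository. It mirrors the two `if` blocks from the diff, but prints the override image instead of calling `kustomize`. Where the real script falls through to its `else` branch, whose body this hunk does not show, the helper prints `none`.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the image-selection logic of deploy_mco_operator.
# $1: path to the kind env file (tests/run-in-kind/env.sh in the diff).
# Prints the MCO image override, or "none" when no override is set.
mco_image_override() {
  local env_file=$1
  # As in the diff, only kind runs source the env file.
  if [[ -n ${IS_KIND_ENV:-} ]]; then
    # shellcheck source=/dev/null
    source "$env_file"
  fi
  if [[ -n ${MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF:-} ]]; then
    echo "$MULTICLUSTER_OBSERVABILITY_OPERATOR_IMAGE_REF"
  else
    echo "none"
  fi
}
```

`source` inside a function still sets variables in the calling shell. Calling the helper through a command substitution such as `$(IS_KIND_ENV=true; mco_image_override "$env_file")` keeps the sourced values in a subshell. With `IS_KIND_ENV` and the image variable both unset, the helper prints `none`. With `IS_KIND_ENV` set and an env file exporting a hypothetical `quay.io/example/mco:ci`, it prints that reference.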