Node panels #136

afcollins · 2024-09-24T16:28:18Z

Type of change

Description

Adding a new section to the OCP Performance dashboard that helps me get a quick overview of the cluster and spot any nodes or issues to dive into.

Reordered the OVN dashboard for relevance, and also because some panels were popping out of the row.
Also removed old metrics.

Makefile changes to allow generated dashboard cleanup without deleting and redownloading binaries.

Variables changes to be more flexible with a Prometheus that may not be running inside an OpenShift cluster.

Checklist before requesting a review

I have performed a self-review of my code.
If it is a core feature, I have added thorough tests.

Testing

How were the fix/results from this change verified? Please provide relevant screenshots or results.
I ran make and imported the generated dashboards into a Grafana running locally against a locally running Prometheus.

afcollins · 2024-09-24T16:28:51Z

Makefile

@@ -30,10 +30,14 @@ format: deps

 build: deps $(LIBRARY_PATH) $(outputs)

-clean:
+clean-all:


New make task so binaries can still be deleted, but are not deleted every time.

afcollins · 2024-09-24T16:29:56Z

assets/ocp-performance/panels.libsonnet

+      + options.legend.withSortDesc(true)
+      + options.legend.withPlacement('bottom'),
+
+    genericLegendCounter(title, unit, targets, gridPos):


New panel type with different legend fields, more relevant for counters and memory.

afcollins · 2024-09-24T16:30:57Z

assets/ocp-performance/variables.libsonnet

@@ -3,8 +3,7 @@ local var = g.dashboard.variable;

 {
  datasource:
-    var.datasource.new('datasource', 'prometheus')
-    + var.datasource.withRegex('/^Cluster Prometheus$/'),


When using the dashboard against a Prometheus outside of OpenShift, I have to update this variable after importing. Instead, I am just deleting the regex filter for good.

Confirmed on a ROSA cluster that the dashboard variable auto-populates to 'Cluster Prometheus'.
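
For reference, a minimal sketch of how the variable could read once the filter is gone. The import path is an assumption about how grafonnet is vendored; only `var.datasource.new` and `var.datasource.withRegex` appear in the diff above.

```jsonnet
// Assumption: grafonnet is vendored under this import path.
local g = import 'github.com/grafana/grafonnet/gen/grafonnet-latest/main.libsonnet';
local var = g.dashboard.variable;

{
  // Without withRegex the variable offers every Prometheus data source
  // instead of only one named 'Cluster Prometheus'.
  datasource:
    var.datasource.new('datasource', 'prometheus'),
}
```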

afcollins · 2024-09-24T16:32:12Z

templates/General/ocp-performance.jsonnet

-    panels.timeSeries.genericLegend('ovs-worker CPU Usage', 'percent', queries.OVSCPU.query('$_worker_node'), { x: 0, y: 21, w: 12, h: 8 }),
-    panels.timeSeries.genericLegend('ovs-worker Memory Usage', 'bytes', queries.OVSMemory.query('$_worker_node'), { x: 12, y: 21, w: 12, h: 8 }),
-    panels.timeSeries.generic('99% Pod Annotation Latency', 's', queries.ovnAnnotationLatency.query(), { x: 0, y: 1, w: 24, h: 12 }),
-    panels.timeSeries.generic('99% CNI Request ADD Latency', 's', queries.ovnCNIAdd.query(), { x: 0, y: 13, w: 12, h: 8 }),


These y values were causing these three panels to pop out of the row.

Also, the metrics seem far less frequently used than CPU and memory usage, so I also moved them to the bottom so relevant panels stay at the top.
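
The comment does not spell out how the hand-written y values misplaced the panels, so the following is only a sketch of one way to avoid that class of mistake, not what this PR does. `layout` and `rowY` are hypothetical names. The helper derives each panel's `gridPos` from its position in the list, two panels per line on Grafana's 24-column grid, so every panel lands below the row header.

```jsonnet
// Hypothetical helper, not part of the repository.
// specs: array of panel objects; rowY: y of the row header; h: panel height.
local layout(rowY, specs, h=8) = [
  specs[i] + {
    gridPos: {
      x: (i % 2) * 12,                     // left or right half of the 24-column grid
      y: rowY + 1 + std.floor(i / 2) * h,  // always below the row header at rowY
      w: 12,
      h: h,
    },
  }
  for i in std.range(0, std.length(specs) - 1)
];

{
  // Plain objects stand in for real panels so the sketch runs without grafonnet.
  panels: layout(20, [
    { title: 'ovs-worker CPU Usage' },
    { title: 'ovs-worker Memory Usage' },
    { title: '99% Pod Annotation Latency' },
  ]),
}
```

With `rowY` set to 20, the three panels get y values of 21, 21 and 29, in list order.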

smanda99 · 2024-09-25T10:16:08Z

lgtm

afcollins · 2024-09-25T16:31:19Z

assets/ovn-monitoring/queries.libsonnet

    query():
-      generateTimeSeriesQuery('ovnkube_master_leader', '{{pod}}'),


This metric doesn't exist. The replacement is the only _leader metric that is unique, as ovnkube_controller_leader is 0 for all pods.

afcollins · 2024-09-25T16:31:37Z

assets/ovn-monitoring/queries.libsonnet

  },

  ovnNorthd: {
    query():
      generateTimeSeriesQuery('ovn_northd_status', '{{pod}}'),
  },

-  ovnNbdbLeader: {
-    query():
-      generateTimeSeriesQuery('ovn_db_cluster_server_role{server_role="leader",db_name="OVN_Northbound"}', '{{pod}}'),


Removing both as neither metric exists any longer.

afcollins · 2024-09-25T16:31:56Z

assets/ovn-monitoring/queries.libsonnet

  numOnvController: {
    query():
      generateTimeSeriesQuery('count(ovn_controller_monitor_all) by (namespace)', ''),
  },

  ovnKubeControlPlaneCPU: {
    query():
-      generateTimeSeriesQuery('irate(container_cpu_usage_seconds_total{pod=~"(ovnkube-master|ovnkube-control-plane).+",namespace="openshift-ovn-kubernetes",container!~"POD|"}[2m])*100','{{container}}-{{pod}}-{{node}}'),
+      generateTimeSeriesQuery('sum( irate(container_cpu_usage_seconds_total{pod=~"(ovnkube-master|ovnkube-control-plane).+",namespace="openshift-ovn-kubernetes",container!~"POD|"}[2m])*100 ) by (pod, node)', '{{pod}} - {{node}}'),


label formatting to match ocp-performance dashboard
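
Because `sum(...) by (pod, node)` keeps only the `pod` and `node` labels on the result, a `{{container}}` placeholder would no longer resolve, so the legend can only use those two labels. A small sketch of that coupling in plain Jsonnet follows. `sumBy` is a hypothetical helper for illustration and is not the repository's `generateTimeSeriesQuery`.

```jsonnet
// Hypothetical helper: wraps a PromQL expression in sum(...) by (...) and
// builds the legend from the same label list.
local sumBy(expr, labels) = {
  // Aggregating with "by" keeps only the listed labels on the result.
  expr: 'sum( %s ) by (%s)' % [expr, std.join(', ', labels)],
  // One {{label}} placeholder per kept label, joined with ' - '.
  legendFormat: std.join(' - ', ['{{%s}}' % l for l in labels]),
};

sumBy(
  'irate(container_cpu_usage_seconds_total{pod=~"(ovnkube-master|ovnkube-control-plane).+",namespace="openshift-ovn-kubernetes",container!~"POD|"}[2m])*100',
  ['pod', 'node']
)
```

Evaluating this yields the same expression and `{{pod}} - {{node}}` legend as the added line in the diff.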

afcollins requested a review from smanda99 September 24, 2024 16:32

afcollins added 3 commits September 25, 2024 10:52

OCP Dash changes

cf8ab66

Add panels that show cluster view Signed-off-by: Andrew Collins <[email protected]>

Makefile and variables changes

75952e3

Signed-off-by: Andrew Collins <[email protected]> panels and legends updates Signed-off-by: Andrew Collins <[email protected]>

Add similar changes to ovn dashboard, remove old queries

ffb3daf

afcollins force-pushed the node-panels branch from 3996873 to ffb3daf Compare September 25, 2024 16:27

afcollins merged commit 1b03ca2 into cloud-bulldozer:master Sep 27, 2024
2 checks passed

afcollins deleted the node-panels branch September 27, 2024 19:02
