Node panels #136
@@ -30,10 +30,14 @@ format: deps

build: deps $(LIBRARY_PATH) $(outputs)

clean:
clean-all:
New make task so binaries can still be deleted when needed, but are no longer deleted on every clean.
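A minimal sketch of how the split between the two targets might look; the recipes and the `bin/` path are assumptions, only the target names and the `$(outputs)` and `$(LIBRARY_PATH)` variables come from the diff above:

```make
# Hypothetical sketch: `clean` removes only the generated dashboards,
# while `clean-all` additionally deletes downloaded binaries/libraries.
clean:
	rm -rf $(outputs)              # generated dashboard JSON only

clean-all: clean
	rm -rf $(LIBRARY_PATH) bin/    # also drop vendored libs and binaries (paths assumed)
```

This way the expensive re-download of binaries only happens after an explicit `make clean-all`, not on every rebuild.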
+ options.legend.withSortDesc(true)
+ options.legend.withPlacement('bottom'),

genericLegendCounter(title, unit, targets, gridPos):
New panel type with different legend fields, more relevant for counters and memory.
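A hedged sketch of what such a helper might look like in the grafonnet style used elsewhere in this diff; the calc names (`max`, `lastNotNull`), the table display mode, and the `g.libsonnet` import path are assumptions and not taken from this PR:

```jsonnet
local g = import 'g.libsonnet';  // grafonnet entry point; path is an assumption
local ts = g.panel.timeSeries;

{
  // Hypothetical variant of the generic time-series panel whose legend
  // shows calculations suited to counters and memory (max / last value),
  // sorted descending and placed at the bottom as in the diff above.
  genericLegendCounter(title, unit, targets, gridPos):
    ts.new(title)
    + ts.queryOptions.withTargets(targets)
    + ts.standardOptions.withUnit(unit)
    + ts.options.legend.withDisplayMode('table')
    + ts.options.legend.withCalcs(['max', 'lastNotNull'])
    + ts.options.legend.withSortDesc(true)
    + ts.options.legend.withPlacement('bottom')
    + { gridPos: gridPos },
}
```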
@@ -3,8 +3,7 @@ local var = g.dashboard.variable;

{
  datasource:
    var.datasource.new('datasource', 'prometheus')
      + var.datasource.withRegex('/^Cluster Prometheus$/'),
When using a dashboard against a Prometheus outside of OpenShift, I have to update this variable after importing. Instead, just deleting it for good.
Confirmed on a ROSA cluster that the dashboard variable still auto-populates to 'Cluster Prometheus'.
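With the regex filter removed, the variable definition reduces to the grafonnet call left in the diff; a sketch of the resulting file, assuming the `g.libsonnet` import path used by grafonnet examples:

```jsonnet
local g = import 'g.libsonnet';  // import path is an assumption
local var = g.dashboard.variable;

{
  // Matches any Prometheus datasource rather than only one named
  // 'Cluster Prometheus', so the dashboards also work against a
  // Prometheus running outside an OpenShift cluster.
  datasource:
    var.datasource.new('datasource', 'prometheus'),
}
```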
panels.timeSeries.genericLegend('ovs-worker CPU Usage', 'percent', queries.OVSCPU.query('$_worker_node'), { x: 0, y: 21, w: 12, h: 8 }),
panels.timeSeries.genericLegend('ovs-worker Memory Usage', 'bytes', queries.OVSMemory.query('$_worker_node'), { x: 12, y: 21, w: 12, h: 8 }),
panels.timeSeries.generic('99% Pod Annotation Latency', 's', queries.ovnAnnotationLatency.query(), { x: 0, y: 1, w: 24, h: 12 }),
panels.timeSeries.generic('99% CNI Request ADD Latency', 's', queries.ovnCNIAdd.query(), { x: 0, y: 13, w: 12, h: 8 }),
These y values were causing these three panels to pop out of the row.
Also, these metrics seem far less frequently used than CPU and memory usage, so I also moved them to the bottom so the more relevant panels stay at the top.
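For context, Grafana places panels on a 24-column grid where `gridPos.y` and `h` are measured in grid rows; a panel whose `y` falls outside the vertical span of its enclosing row "pops out" of that row. A minimal sketch of non-overlapping placement (the coordinates here are illustrative, not taken from this PR):

```jsonnet
// Grafana's dashboard grid is 24 columns wide; y and h are in grid rows.
// To keep panels inside a row, each panel's y should follow on from the
// panels above it (previous y + previous h), with no gaps past the row.
[
  { x: 0,  y: 0, w: 12, h: 8 },   // left half-width panel
  { x: 12, y: 0, w: 12, h: 8 },   // right half-width panel, same band
  { x: 0,  y: 8, w: 24, h: 12 },  // full-width panel directly below
]
```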
lgtm
Add panels that show cluster view
Signed-off-by: Andrew Collins <[email protected]>

panels and legends updates
Signed-off-by: Andrew Collins <[email protected]>

Force-pushed from 3996873 to ffb3daf
query():
  generateTimeSeriesQuery('ovnkube_master_leader', '{{pod}}'),
This metric doesn't exist. The replacement is the only _leader metric that is unique, since ovnkube_controller_leader is 0 for all pods.
},

ovnNorthd: {
  query():
    generateTimeSeriesQuery('ovn_northd_status', '{{pod}}'),
},

ovnNbdbLeader: {
  query():
    generateTimeSeriesQuery('ovn_db_cluster_server_role{server_role="leader",db_name="OVN_Northbound"}', '{{pod}}'),
Removing both as neither metric exists any longer.
numOnvController: {
  query():
    generateTimeSeriesQuery('count(ovn_controller_monitor_all) by (namespace)', ''),
},

ovnKubeControlPlaneCPU: {
  query():
    generateTimeSeriesQuery('irate(container_cpu_usage_seconds_total{pod=~"(ovnkube-master|ovnkube-control-plane).+",namespace="openshift-ovn-kubernetes",container!~"POD|"}[2m])*100', '{{container}}-{{pod}}-{{node}}'),
    generateTimeSeriesQuery('sum( irate(container_cpu_usage_seconds_total{pod=~"(ovnkube-master|ovnkube-control-plane).+",namespace="openshift-ovn-kubernetes",container!~"POD|"}[2m])*100 ) by (pod, node)', '{{pod}} - {{node}}'),
Label formatting updated to match the ocp-performance dashboard.
Type of change
Description
Adding a new section to the OCP Performance dashboard that helps me get a quick overview of the cluster and spot any nodes or issues to dive into.
Reordered the OVN dashboard for relevance, and also because some panels were popping out of their row.
Also removed old metrics that no longer exist.
Makefile changes to allow cleaning up generated dashboards without deleting and re-downloading binaries.
Variable changes to be more flexible with a Prometheus that may not be running inside an OpenShift cluster.
Checklist before requesting a review
Testing
I run make and import the generated dashboards into a Grafana instance running locally against a locally running Prometheus.