Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Virtual Machines related dashboards #1613

Merged

Conversation

sradco
Copy link
Contributor

@sradco sradco commented Sep 10, 2024

In this PR we introduce 4 dashboards related to Virtual Machines:

  1. Service Level dashboards - Virtual Machines by Time in Status
  2. Inventory dashboards - Virtual Machines Inventory
  3. Executive dashboards - Single Cluster View
  4. Executive dashboards - Single Virtual Machine View

And update the "ACM - OpenShift Virtualization Overview"
name to "Executive dashboards - Clusters Overview"
and enriched it with additional details.

Also moved all dashboards under the "ACM / OpenShift Virtualization"
folder.

Signed-off-by: Shirly Radco [email protected]

@sradco sradco force-pushed the add_virtual_machines_inventory_dashboard branch from 5bae7a4 to 766ee1f Compare September 10, 2024 15:19
@sradco sradco closed this Sep 10, 2024
@sradco sradco reopened this Sep 10, 2024
@sradco
Copy link
Contributor Author

sradco commented Sep 10, 2024

Hi @moadz , please review the following dashboards. Thank you.

@sradco sradco changed the title Add Virtual Machines inventory and health dashboards Add actionable Virtual Machines inventory and health dashboards Sep 11, 2024
name: grafana-dashboard-acm-virtual-machines-inventory
namespace: open-cluster-management-observability
annotations:
observability.open-cluster-management.io/dashboard-folder: "ACM / OpenShift Virtualization"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moadz is there another place I need to create the folder or this is enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is just "OpenShift Virtualisation" sufficient afaik there are only ACM dashboards in that grafana anyway, so the top level folder is redundant.

@sradco sradco force-pushed the add_virtual_machines_inventory_dashboard branch from 766ee1f to f7972f8 Compare September 13, 2024 10:11
@sradco sradco changed the title Add actionable Virtual Machines inventory and health dashboards Add Virtual Machines related dashboards Sep 13, 2024
@sradco sradco force-pushed the add_virtual_machines_inventory_dashboard branch 5 times, most recently from aa6056f to e4db282 Compare September 17, 2024 09:05
@sradco sradco force-pushed the add_virtual_machines_inventory_dashboard branch 2 times, most recently from 46acad7 to 3b1b00e Compare September 17, 2024 12:21
@moadz
Copy link
Contributor

moadz commented Sep 18, 2024

/hold

Copy link
Contributor

@moadz moadz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending we can go through this together @sradco let's organise some time to view them live :)

"id": 29,
"uid": "OmpD1ZoSk",
"iteration": 1709210609633,
"id": 45,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note is all the dashboards need a unique UUID as per this PR
https://github.com/stolostron/multicluster-observability-operator/pull/1566/files if they ever link back to themselves in the dashboard definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All dashboards have a uid

Comment on lines +146 to +159
- kubevirt_vmi_info
- kubevirt_vm_running_status_last_transition_timestamp_seconds
- kubevirt_vm_non_running_status_last_transition_timestamp_seconds
- kubevirt_vm_error_status_last_transition_timestamp_seconds
- kubevirt_vm_starting_status_last_transition_timestamp_seconds
- kubevirt_vm_migrating_status_last_transition_timestamp_seconds
- kubevirt_vmi_memory_available_bytes
- kubevirt_vmi_phase_count
- kubevirt_vmi_cpu_usage_seconds_total
- kubevirt_vmi_network_receive_bytes_total
- kubevirt_vmi_network_transmit_bytes_total
- kubevirt_vmi_storage_iops_read_total
- kubevirt_vmi_storage_iops_write_total

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant to this PR, but we probably want to maintain independent allowlists in the future as we support more than one allowlist.

}
]
},
"description": "This dashboard provides a comprehensive list of all virtual machines across clusters, offering detailed insights into each VM’s configuration. Users can easily search and filter VMs based on various attributes, enabling quick access to specific information for effective management and troubleshooting.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth getting @dockerymick to do a review of the copy on these dashboards. I think there's a style guide from the doc writers they would probably like to maintain here as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"description": "This dashboard provides a comprehensive list of all virtual machines across clusters, offering detailed insights into each VM’s configuration. Users can easily search and filter VMs based on various attributes, enabling quick access to specific information for effective management and troubleshooting.",
"description": "This dashboard provides a comprehensive list of all virtual machines deployed in a given managed cluster, offering detailed insights into each VM’s configuration. Users can easily search and filter VMs based on their attributes, enabling quick access to specific information for effective management and troubleshooting.",

}
]
},
"description": "This dashboard provides a quick overview of the health status of Virtual Machines (VMs) across clusters in the KubeVirt environment. It helps users identify VMs that are currently in unhealthy states and those that have been in such states for an extended period, potentially making them candidates for cleanup. Use the filters to customize the view based on cluster, namespace, VM name, and duration in an unhealthy state for efficient monitoring and management.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@moadz moadz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tags for @dockerymick

}
]
},
"description": "This dashboard provides detailed insights into a specific virtual machine, including its configuration, resource consumption, and any active alerts. It allows users to easily locate the virtual machine across multiple clusters, facilitating efficient monitoring and troubleshooting.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -21,44 +21,41 @@ data:
}
]
},
"description": "This dashboard shows details related to the OpenShift Virtualization operator and Virtual machines",
"description": "This dashboard provides insights into the health of the OpenShift Virtualization operator and displays key metrics for virtual machines at the cluster level across multiple clusters. It helps monitor and assess the status of virtualization resources, ensuring smooth operation and quick identification of issues.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
]
},
"description": "This dashboard provides comprehensive insights into the cluster with OpenShift Virtualization installed. It offers detailed information about the virtual machines (VMs), associated resources, and any active alerts, enabling efficient monitoring and management of virtualization within the cluster.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dockerymick
Copy link
Contributor

Thanks @moadz for tagging in me in areas that need a review. I will get to this ASAP

Copy link
Contributor

@dockerymick dockerymick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested to use sentence-case for titles because we use sentence-case for the RHACM console (which follows PatternFly guidance). I also did not do a full review of each file because the number of lines. I did a complete review of the first .yaml file. Essentially, most of my suggestions would probably be similar to `dash-acm-openshift-virtualization-overview.yaml

}
],
"title": "OpenShift Virtualization Health by Cluster",
"title": "Number of VMs started in the last 7 days",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that this title is sentenced-cased

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dockerymick , What did you mean here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just agree with this change. This is basically what I am asking for when I say sentence case

@openshift-ci openshift-ci bot removed the lgtm label Sep 19, 2024
@sradco sradco force-pushed the add_virtual_machines_inventory_dashboard branch from 3b1b00e to a2ca0c1 Compare September 19, 2024 20:42
@sradco sradco force-pushed the add_virtual_machines_inventory_dashboard branch from 69d4853 to 39dfe5e Compare September 19, 2024 20:53
@moadz
Copy link
Contributor

moadz commented Sep 20, 2024

/unhold

@moadz
Copy link
Contributor

moadz commented Sep 20, 2024

/lgtm

@moadz
Copy link
Contributor

moadz commented Sep 20, 2024

/retest

@openshift-ci openshift-ci bot added the lgtm label Sep 20, 2024
@sradco sradco force-pushed the add_virtual_machines_inventory_dashboard branch from 39dfe5e to a70a4e6 Compare September 21, 2024 06:45
@openshift-ci openshift-ci bot removed the lgtm label Sep 21, 2024
@sradco
Copy link
Contributor Author

sradco commented Sep 21, 2024

Hi @moadz, I had to rebase so there LGTM was removed

@moadz
Copy link
Contributor

moadz commented Sep 23, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Sep 23, 2024
@dockerymick
Copy link
Contributor

Before merging, can we address my suggestions or at least have a discussion about them ?

@sradco
Copy link
Contributor Author

sradco commented Sep 24, 2024

Before merging, can we address my suggestions or at least have a discussion about them ?

Can we address them in a separate PR? I believe this is urgent. @moadz ?

@moadz moadz requested a review from dockerymick September 24, 2024 11:55
@moadz
Copy link
Contributor

moadz commented Sep 24, 2024

Before merging, can we address my suggestions or at least have a discussion about them ?

Can we address them in a separate PR? I believe this is urgent. @moadz ?

Yes discussed it with @dockerymick we can do a follow up with the docs fixes i'm just keen to get the dashboards QE'ed, Mikela are you able to remove the request changes?

@sradco sradco force-pushed the add_virtual_machines_inventory_dashboard branch from a70a4e6 to 12322c5 Compare September 24, 2024 13:28
@openshift-ci openshift-ci bot removed the lgtm label Sep 24, 2024
In this PR we introduce 4 dashboard related to
Virtual Machines:

1. Service Level dashboards - Virtual Machines by Time in Status
2. Inventory dashboards - Virtual Machines Inventory
3. Executive dashboards - Single Cluster View
4. Executive dashboards - Single Virtual Machine View

And update the "ACM - OpenShift Virtualization Overview"
name to "Executive dashboards - Clusters Overview"
and enriched it with additional details.

Also moved all dashboards under the "ACM / OpenShift Virtualization"
folder.

Signed-off-by: Shirly Radco <[email protected]>
@sradco sradco force-pushed the add_virtual_machines_inventory_dashboard branch from 12322c5 to aaaa718 Compare September 24, 2024 13:36
@sradco
Copy link
Contributor Author

sradco commented Sep 24, 2024

@moadz Im sorry, I had to get an important fix that would impact future support of the dashboards

@dockerymick
Copy link
Contributor

I think you can just merge without doing the request or just resolve them. I am on PTO. I can go through the resolve if yall are unable to merge

@dockerymick dockerymick removed their request for review September 24, 2024 14:06
Copy link
Contributor

@dockerymick dockerymick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved my suggestions

Copy link

@moadz
Copy link
Contributor

moadz commented Sep 25, 2024

/retest

@moadz
Copy link
Contributor

moadz commented Sep 25, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Sep 25, 2024
Copy link

openshift-ci bot commented Sep 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: moadz, sradco

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@moadz moadz merged commit 4f2fb5f into stolostron:main Sep 25, 2024
22 checks passed
Copy link

openshift-ci bot commented Sep 25, 2024

@sradco: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/images aaaa718 link unknown /test images
ci/prow/test-e2e aaaa718 link unknown /test test-e2e
ci/prow/e2e-kind aaaa718 link unknown /test e2e-kind

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants