Abnormal workflow termination can orphan NVMe namespaces #148

ajfloeder · 2024-04-05T17:18:54Z

Abnormally terminated workflows can fail to clean up NVMe namespaces.

Documenting the symptom here. Not yet sure of the root cause(s).

Cleanup method

Orphaned NVMe Namespaces?
If all of your workflows have completed, you can check whether a particular Rabbit has orphaned NVMe namespaces by running:

~/tools/nvme.sh list

If there are namespaces listed there, they are orphaned.

The easy way to delete these namespaces is:

1. Delete the nnfnodeecdata resource for the Rabbit in question.
2. Delete the nnf-node-manager pod for the Rabbit in question.

The nnf-node-manager pod will restart automatically. Because its nnfnodeecdata resource has been removed, it will clean up all existing namespaces during initialization.
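
The two deletion steps above could be scripted roughly as follows. This is only a sketch: the issue does not say which Kubernetes namespaces or object names are involved, so `cleanup_rabbit` takes them as arguments, and the helper names `run` and `cleanup_rabbit` are made up for illustration. It assumes `kubectl` is configured for the cluster and that `nnfnodeecdata` is accepted as a resource type there. By default it only prints the commands (`DRY_RUN=1`), so nothing is deleted until you have checked the names with `kubectl get`.

```shell
# Sketch only. All names passed to cleanup_rabbit are supplied by the caller;
# none of them come from the issue text.

# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# cleanup_rabbit ECDATA_NS ECDATA_NAME POD_NS POD_NAME
#   ECDATA_NS / ECDATA_NAME: namespace and name of the Rabbit's nnfnodeecdata resource
#   POD_NS / POD_NAME:       namespace and name of the Rabbit's nnf-node-manager pod
cleanup_rabbit() {
    if [ "$#" -ne 4 ]; then
        echo "usage: cleanup_rabbit ECDATA_NS ECDATA_NAME POD_NS POD_NAME" >&2
        return 2
    fi

    # Step 1: delete the nnfnodeecdata resource first, so that it is already
    # gone when the replacement pod initializes.
    run kubectl delete nnfnodeecdata "$2" -n "$1" || return 1

    # Step 2: delete the nnf-node-manager pod; it restarts automatically.
    run kubectl delete pod "$4" -n "$3"
}
```

With purely hypothetical names, `cleanup_rabbit rabbit-node-1 ec-data nnf-system nnf-node-manager-abcde` prints the two `kubectl delete` commands it would run; setting `DRY_RUN=0` executes them instead. Once the pod has restarted, rerun `~/tools/nvme.sh list` on the Rabbit to confirm that no namespaces remain.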


ajfloeder self-assigned this Apr 5, 2024

github-project-automation bot added this to Issues Dashboard Apr 5, 2024

github-project-automation bot moved this to 📋 Open in Issues Dashboard Apr 5, 2024

ajfloeder changed the title ~~Abnormal workflow termination can leak nvme namespaces~~ Abnormal workflow termination can orphan NVMe namespaces Apr 5, 2024
