Abnormal workflow termination can orphan NVMe namespaces #148

Open
ajfloeder opened this issue Apr 5, 2024 · 0 comments
@ajfloeder
Contributor

Abnormally terminated workflows can fail to clean up NVMe namespaces.

Documenting the symptom here. Not yet sure of the root cause(s).

Cleanup method

Orphaned NVMe Namespaces?
If all of your workflows have completed, you can check whether a particular Rabbit has orphaned NVMe namespaces by running:

~/tools/nvme.sh list

If any namespaces are listed, they are orphaned.

The easy way to delete these namespaces is:

  1. Delete the nnfnodeecdata resource for the Rabbit in question.
  2. Delete the nnf-node-manager pod for the Rabbit in question.

The nnf-node-manager pod will restart automatically. Because its nnfnodeecdata resource has been removed, it will clean up all existing namespaces during initialization.
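The two steps above might be scripted roughly as follows. This is a minimal sketch, not a tested procedure. The function names `run` and `cleanup_rabbit`, the `DRY_RUN` switch, and all four arguments are illustrative assumptions. The issue does not say what the nnfnodeecdata resource is called, which namespace it lives in, or how to find the nnf-node-manager pod for a given Rabbit, so those values must be looked up on your cluster and passed in. By default the sketch only prints the `kubectl` commands it would run.

```shell
#!/bin/sh
# Sketch of the two cleanup steps. Every name passed in is a placeholder:
# resource names, pod names, and namespaces depend on your cluster.

# Commands are only printed by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cleanup_rabbit <ecdata-namespace> <ecdata-name> <pod-namespace> <pod-name>
cleanup_rabbit() {
    if [ "$#" -ne 4 ]; then
        echo "usage: cleanup_rabbit <ecdata-namespace> <ecdata-name> <pod-namespace> <pod-name>" >&2
        return 2
    fi
    # Step 1: delete the nnfnodeecdata resource for the Rabbit.
    run kubectl delete nnfnodeecdata "$2" --namespace "$1" || return 1
    # Step 2: delete the nnf-node-manager pod; it should restart on its own.
    run kubectl delete pod "$4" --namespace "$3"
}
```

A call would take the form `cleanup_rabbit <ecdata-namespace> <ecdata-name> <pod-namespace> <pod-name>`. Reviewing the printed commands before setting `DRY_RUN=0` is a deliberate choice here, since deleting the nnfnodeecdata resource causes every existing namespace on that Rabbit to be cleaned up.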

@ajfloeder ajfloeder self-assigned this Apr 5, 2024
@github-project-automation github-project-automation bot moved this to 📋 Open in Issues Dashboard Apr 5, 2024
@ajfloeder ajfloeder changed the title Abnormal workflow termination can leak nvme namespaces Abnormal workflow termination can orphan NVMe namespaces Apr 5, 2024