'podman start wkdev' shows error: nvidia-ctk exit code 1 #72

Open
pgorszkowski-igalia opened this issue Nov 5, 2024 · 2 comments

@pgorszkowski-igalia
Contributor

podman start wkdev
Error: OCI runtime error: unable to start container "346dbfb347b39d13c11d7731cbe0e37da76fd4fb6d097d39534774a4e2a60144": crun: {"msg":"error executing hook `/usr/bin/nvidia-ctk` (exit code: 1)","level":"error","time":"2024-11-05T09:37:14.582274Z"}

This problem started after I updated the SDK (wkdev-update). I also updated the NVIDIA tools with wkdev-setup-nvidia-gpu-for-container (after first removing /etc/cdi/nvidia.yml).

podman logs -f wkdev gives empty logs.

I also tried creating a new container, with the same result.

The problem was that I had an old version of nvidia-ctk (nvidia-container-toolkit-base package):

 /usr/bin/nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.13.5
commit: 6b8589dcb4dead72ab64f14a5912886e6165c079

wkdev-setup-nvidia-gpu-for-container should have updated it, but it didn't, because the repository links in my /etc/apt/sources.list.d/libnvidia-container.list were commented out and had not been updated to the proper ones after a system upgrade.

After updating /etc/apt/sources.list.d/libnvidia-container.list with the proper URL and upgrading the nvidia-container-toolkit-base package, everything works as expected.

@pgorszkowski-igalia pgorszkowski-igalia self-assigned this Nov 5, 2024
@pgorszkowski-igalia
Contributor Author

@TingPing: maybe in nvidia-container-toolkit-base it would be good to diff the existing libnvidia-container.list against the one from the link and, if they differ, show a warning to the user. WDYT?
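To make the suggestion concrete, here is a rough sketch of what such a check could look like. It is not taken from wkdev-sdk: the function names are hypothetical, and it assumes the reference list has already been downloaded to a local file, since the download link is not given in this thread. Comment lines and blank lines are ignored, so a file whose entries were all commented out (as in my case) counts as different.

```shell
# Hypothetical helpers for illustration; not part of wkdev-sdk.

# Print only the active (non-comment, non-blank) lines of an apt list file.
active_entries() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" || true
}

# Return 0 if both files contain the same active entries, 1 otherwise.
sources_list_matches() {
    local installed="$1" reference="$2"
    [ "$(active_entries "${installed}")" = "$(active_entries "${reference}")" ]
}

# Print a warning on stderr and return 1 when the installed list differs.
warn_if_sources_list_differs() {
    local installed="$1" reference="$2"
    if ! sources_list_matches "${installed}" "${reference}"; then
        echo "Warning: ${installed} differs from the reference list;" \
             "nvidia-container-toolkit-base may not get upgraded." >&2
        return 1
    fi
}
```

It could be called as `warn_if_sources_list_differs /etc/apt/sources.list.d/libnvidia-container.list <downloaded-reference-file>`, where the second argument is wherever the setup script stores the freshly fetched list.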

@pgorszkowski-igalia
Contributor Author

According to @nikolaszimmermann, the error from nvidia-ctk was caused by mounting the whole /dev folder (NVIDIA/nvidia-container-toolkit#143), and the problem is fixed in versions > 1.15.0-rc.1.
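A possible way to detect an affected nvidia-ctk is to parse the `--version` output shown earlier and compare it against that threshold. This is only a sketch: the helper names are hypothetical, and it assumes GNU `sort`, whose `-V` mode orders `~` before the end of the string (so a release candidate sorts before the final release once `-rc` is rewritten as `~rc`).

```shell
# Hypothetical helpers for illustration; not part of wkdev-sdk or the NVIDIA tooling.

# Read `nvidia-ctk --version` output on stdin and print the version string,
# e.g. "1.13.5" for "NVIDIA Container Toolkit CLI version 1.13.5".
parse_ctk_version() {
    sed -n 's/^NVIDIA Container Toolkit CLI version \([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Return 0 if version $1 is strictly newer than version $2.
# Assumes GNU sort: the first "-" is rewritten to "~" so that
# 1.15.0-rc.1 orders before 1.15.0 under `sort -V`.
version_gt() {
    local a b
    a="$(printf '%s' "$1" | sed 's/-/~/')"
    b="$(printf '%s' "$2" | sed 's/-/~/')"
    [ "${a}" != "${b}" ] || return 1
    [ "$(printf '%s\n%s\n' "${a}" "${b}" | sort -V | tail -n 1)" = "${a}" ]
}

# Threshold taken from the comment above; the function itself is illustrative.
ctk_has_dev_mount_fix() {
    version_gt "$1" "1.15.0-rc.1"
}
```

For example, `ctk_has_dev_mount_fix "$(/usr/bin/nvidia-ctk --version | parse_ctk_version)"` would return non-zero for the 1.13.5 install reported in this issue.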
