Problem when setting up docker environment #8

Open
L-Kernegger opened this issue Dec 3, 2024 · 3 comments

Comments

@L-Kernegger

Platform: NVIDIA Jetson Nano with Edge TPU installed
Docker version: Docker version 20.10.21, build 20.10.21-0ubuntu1~18.04.3
Ubuntu version:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
Codename: bionic
(this version comes from the official NVIDIA image)

When executing the command:

nvidia@nvidia-desktop:/mnt/sd/SHMT$ sudo sh scripts/docker_setup_partition.sh 
[sudo] password for nvidia: 
[gpgtpu_partition] - building docker image from dockerfile...
[+] Building 2.5s (23/23) FINISHED                                              
 => [internal] load build definition from Dockerfile                       0.0s
 => => transferring dockerfile: 38B                                        0.0s
 => [internal] load .dockerignore                                          0.0s
 => => transferring context: 2B                                            0.0s
 => [internal] load metadata for nvcr.io/nvidia/l4t-base:r32.4.4           2.1s
 => [auth] nvidia/l4t-base:pull token for nvcr.io                          0.0s
 => [opencv_base 1/4] FROM nvcr.io/nvidia/l4t-base:r32.4.4@sha256:e9d0631  0.0s
 => [internal] load build context                                          0.0s
 => => transferring context: 81B                                           0.0s
 => CACHED [opencv_base 2/4] COPY ./opencv_install_deps.sh opencv_install  0.0s
 => CACHED [opencv_base 3/4] RUN ./opencv_install_deps.sh                  0.0s
 => CACHED [opencv_base 4/4] RUN echo $(ls -lh /usr/include/$(uname -i)-l  0.0s
 => CACHED [build1 1/4] RUN ln -snf /usr/share/zoneinfo/$CONTAINER_TIMEZO  0.0s
 => CACHED [build1 2/4] RUN  sh -c "echo '/usr/local/cuda/lib64' >> /etc/  0.0s
 => CACHED [build1 3/4] RUN  ldconfig                                      0.0s
 => CACHED [build1 4/4] RUN  apt-get install -y build-essential cmake git  0.0s
 => CACHED [build2 1/7] COPY update_sources.sh /                           0.0s
 => CACHED [build2 2/7] RUN /update_sources.sh                             0.0s
 => CACHED [build2 3/7] RUN dpkg --add-architecture armhf                  0.0s
 => CACHED [build2 4/7] RUN dpkg --add-architecture arm64                  0.0s
 => CACHED [build2 5/7] RUN DEBIAN_FRONTEND=noninteractive apt-get instal  0.0s
 => CACHED [build2 6/7] RUN apt-get install -y libeigen3-dev &&     sudo   0.0s
 => CACHED [build2 7/7] RUN apt-get install -y python-scipy                0.0s
 => CACHED [final 1/2] RUN echo "export LD_LIBRARY_PATH=/usr/local/cuda/l  0.0s
 => CACHED [final 2/2] WORKDIR /home                                       0.0s
 => exporting to image                                                     0.2s
 => => exporting layers                                                    0.0s
 => => writing image sha256:9c2344e9f541d38be18d04500859cdae3703ba3380271  0.0s
 => => naming to docker.io/library/gpgtpu_partition_image                  0.0s
gpgtpu_partition_container
gpgtpu_partition_container
[gpgtpu_partition] - build docker container...
42388e49673538748e6f8931cbbb0404455be5e92d7e6f72e319c3c2b5d3a4bd
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: time="2024-12-03T05:12:28-05:00" level=info msg="Symlinking /mnt/sd/docker-data/overlay2/ecdbbf8f1a0d9147b218c4edbdb70b3d82fe18e175fcd02ed6c9ba34933b475b/merged/etc/vulkan/icd.d/nvidia_icd.json to /usr/lib/aarch64-linux-gnu/tegra/nvidia_icd.json"
time="2024-12-03T05:12:28-05:00" level=error msg="failed to create link [/usr/lib/aarch64-linux-gnu/tegra/nvidia_icd.json /etc/vulkan/icd.d/nvidia_icd.json]: failed to create symlink: failed to remove existing file: remove /mnt/sd/docker-data/overlay2/ecdbbf8f1a0d9147b218c4edbdb70b3d82fe18e175fcd02ed6c9ba34933b475b/merged/etc/vulkan/icd.d/nvidia_icd.json: device or resource busy": unknown.
nvidia@nvidia-desktop:/mnt/sd/SHMT$ 

The folder in which this symlink is supposed to be created is recreated every time the script runs, and the original file isn't being held open by anything:

nvidia@nvidia-desktop:/mnt/sd/SHMT$ lsof | grep /usr/lib/aarch64-linux-gnu/tegra/nvidia_icd.json
nvidia@nvidia-desktop:/mnt/sd/SHMT$ lsof | grep /etc/vulkan/icd.d/nvidia_icd.json
nvidia@nvidia-desktop:/mnt/sd/SHMT$ lsof | grep /mnt/sd/docker-data/overlay2/ecdbbf8f1a0d9147b218c4edbdb70b3d82fe18e175fcd02ed6c9ba34933b475b/merged/etc/vulkan/icd.d/nvidia_icd.json
nvidia@nvidia-desktop:/mnt/sd/SHMT$ 
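
An empty `lsof` result rules out open file handles, but `unlink(2)` also fails with EBUSY ("device or resource busy") when the target is a mount point, and `lsof` does not report that. Below is a minimal sketch for checking this case. It assumes the path can be inspected in the mount namespace where the hook runs, which may be hard to arrange since the container never starts. `is_mount_target` is a hypothetical helper and not part of the SHMT scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SHMT scripts): succeed if a path
# is listed as a mount point in a mountinfo-style table.
#   $1 = path to check
#   $2 = mount table to read (assumed default: /proc/self/mountinfo)
is_mount_target() {
    path=$1
    table=${2:-/proc/self/mountinfo}
    # Field 5 of every mountinfo line is the mount point.
    awk -v p="$path" '$5 == p { found = 1 } END { exit !found }' "$table"
}
```

For example, `is_mount_target /etc/vulkan/icd.d/nvidia_icd.json && echo "mount point"` prints the message only when the file is itself a mount target. If that is the case, something bind-mounted the file before the symlink hook tried to replace it.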

Things I already tried that didn't work:
reinstalling Docker
using a different storage driver (vfs instead of overlay2)
removing the file from a second script while the setup script is running
mounting a copy of the file to make sure it isn't in use

I hope that this repository is still maintained and that somebody knows how to fix this.

@jk78346
Collaborator

jk78346 commented Dec 3, 2024

We haven't encountered this on our side. Does the very first build in a fresh environment still produce the unremovable file? At the moment I don't see how this is related to the Dockerfile we provide, since the image build itself succeeds.

@L-Kernegger
Author

The file is in a directory that is recreated each time the script tries to create the container, so the file never exists in any environment before that step runs.
I did not mean to blame your Dockerfile; I was just wondering whether you had an idea for a solution.

@L-Kernegger
Author

Could you please tell me which Docker version it ran on successfully on your end, so that I can check whether the error is caused by the version?
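
To make the comparison easier, the relevant details could be collected on both machines with something like the sketch below. `report_docker_env` and the `DOCKER` override variable are hypothetical names introduced here for illustration. The `--format` templates use fields from `docker version` and `docker info`.

```shell
#!/bin/sh
# Sketch: print the Docker details that are useful when comparing two setups.
# DOCKER is a hypothetical override for the docker binary (defaults to "docker").
report_docker_env() {
    docker_bin=${DOCKER:-docker}
    if ! command -v "$docker_bin" >/dev/null 2>&1; then
        echo "docker: not found"
        return 1
    fi
    echo "client:  $("$docker_bin" version --format '{{.Client.Version}}')"
    echo "server:  $("$docker_bin" version --format '{{.Server.Version}}')"
    echo "storage: $("$docker_bin" info --format '{{.Driver}}')"
    echo "runtime: $("$docker_bin" info --format '{{.DefaultRuntime}}')"
}
```

Calling `report_docker_env` prints four lines that can be pasted into a comment. The default runtime is included because the failing hook runs as part of container creation, so a difference there may matter as much as the Docker version itself.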
