Confusion about ~/.nv/ComputeCache behavior with docker #272
First, I am not familiar with Docker and I do not use it personally.
It is related to CUDA, not LuaJIT.
So I guess that if you build Torch7 (cutorch and cunn) with binaries generated for the Volta architecture,
I guess this will work. waifu2x.udp.jp uses an AMI without Docker. (However, it takes about 30 seconds at the first execution.)
I was also using the AMI without Docker and things were working properly, but when I added Docker the initial execution took 10 minutes (as opposed to 30 seconds on the bare AMI), so it might just be a simple Docker integration issue. The specific hangup is that importing cudnn takes 10 minutes: there's a line in cudnn that tries to configure the GPUs and struggles with Volta. I'm also new to Docker, so the caching issue might be a red herring, but I'm still working through it; it seems plausible. You had mentioned that caching might have been the issue here (I hadn't realized it was you that pointed me here :) ): soumith/cudnn.torch#385
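One way to check whether the slow start is driver-side PTX JIT compilation (rather than Docker itself) is to inspect which GPU architectures a CUDA library actually embeds; `cuobjdump` from the CUDA toolkit can list the embedded cubins. A diagnostic sketch, assuming a CUDA toolkit is installed; the library path is illustrative:

```shell
# List the SASS (cubin) architectures embedded in a Torch CUDA library.
# The path below is illustrative; adjust it to your Torch install.
cuobjdump --list-elf /root/torch/install/lib/libTHC.so

# If no sm_70 entry appears and the GPU is a V100 (sm_70), the driver
# must JIT-compile PTX at first run, which is exactly what
# ~/.nv/ComputeCache caches.
```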
OK, I will try to build a cuda-torch:10.1 image and test it on a p3 instance.
I don't believe cudnn has CUDA 10 bindings; I was seeing this behavior with CUDA 9 and cudnn 7.1. Here's the issue, plus the Dockerfile showing how I was building it:
I have built a Docker image; I changed it to generate binaries for sm_70 (Volta) and sm_75 at docker build time.
Dockerfile for torch7: https://github.com/nagadomi/distro/blob/cuda10/Dockerfile
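Generating real SASS binaries for the target architectures at build time is what removes the first-run JIT delay: the driver only JIT-compiles PTX when no cubin matches the GPU. As a sketch of the underlying compiler flags (the exact mechanism in the linked Dockerfile may differ; for example, cutorch's CMake honors a `TORCH_CUDA_ARCH_LIST` environment variable rather than raw nvcc flags):

```shell
# Emit real SASS for Volta (sm_70) and Turing (sm_75), plus PTX for
# forward compatibility with newer GPUs. With a matching cubin embedded,
# the driver skips JIT compilation entirely on a V100.
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_75,code=sm_75 \
     -gencode arch=compute_75,code=compute_75 \
     kernel.cu -o kernel
```

The trade-off is a larger fat binary and longer build times, paid once at image build rather than at every fresh container start.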
OK, it works.
Thanks for the incredibly quick response and guidance. I notice that the waifu2x Dockerfile skips the soumith cudnn install/make step that is seen in the From
cudnn.torch is installed as part of the torch7 installation.
Can confirm that this strategy worked. I realized that I was originally using the Amazon Linux Deep Learning AMI instead of the Ubuntu Deep Learning AMI. It's very possible that the Amazon Linux distro simply doesn't work properly with NVIDIA, CUDA, or nvidia-docker; there have been reports of similar issues. Thanks for taking the time to help me through this, @nagadomi
As the README.md says:
Does this mean that, when using Docker, waifu2x will always run very slowly the first time it is executed on a host volume, and subsequent executions on the same host volume will be faster? Is LuaJIT compiling the program the first time it is used and then executing the compiled version on subsequent runs?
My specific use-case is that I'm executing the Docker image in the cloud (AWS EC2, p3.2xlarge instances using the Volta architecture). This means that the host volume changes frequently. So, if I spin up a new EC2 instance from an AMI that has never executed waifu2x before, will the first execution of the Docker image always be slow (even if I pass the ComputeCache path to Docker)? If so, I would generate the AMI after executing waifu2x so that the binary is already in the ComputeCache when the server is started, but that step is nontrivial in practice.
Are there additional steps I need to take to "prime" the host container with precompiled binaries/libraries for the Volta architecture that would make subsequent docker executions run more quickly? Is it possible to simply build waifu2x ahead of time, instead of relying on JIT?
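One way to "prime" containers, assuming the slowdown is the driver's PTX JIT cache, is to persist the host's ComputeCache into every container run, so the JIT cost is paid once per host (or baked into the AMI) rather than once per container. A sketch; the image name `waifu2x` and the paths are illustrative, and the environment variables are the standard CUDA cache controls:

```shell
# Bind-mount the host's JIT cache into the container so compiled kernels
# survive across container runs (and can be pre-baked into an AMI).
docker run --runtime=nvidia \
  -v "$HOME/.nv/ComputeCache:/root/.nv/ComputeCache" \
  -e CUDA_CACHE_MAXSIZE=2147483648 \
  waifu2x th waifu2x.lua ...

# CUDA_CACHE_PATH relocates the cache directory if needed;
# CUDA_CACHE_MAXSIZE raises the default size limit so large compiled
# fat binaries are not silently evicted between runs.
```

The more robust alternative, as discussed above, is to avoid JIT entirely by building the image with SASS binaries for the target architecture.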