Running the SCNN docker on AWS #11
I keep getting an error when I try to check the GPU version after launching the Docker container: `root@c107a0693ba7:~/scnn# nvidia-smi`. The container is unable to use the underlying GPU on my AWS instance; it complains about a driver version mismatch. There are container best practices, such as the article linked below, that describe how to avoid exactly these driver compatibility issues when composing Docker containers.
https://hackernoon.com/docker-compose-gpu-tensorflow-%EF%B8%8F-a0e2011d36
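A minimal sketch of the usual workaround for this kind of mismatch, assuming the host has either nvidia-docker2 or the NVIDIA Container Toolkit installed (the image name `scnn-image` is a placeholder): let the NVIDIA runtime inject the host's driver libraries into the container at run time instead of relying on whatever driver the image was built against.

```bash
# Placeholder image name; substitute the actual SCNN image/tag.

# nvidia-docker2: select the nvidia runtime explicitly
docker run --runtime=nvidia -it scnn-image /bin/bash

# Docker 19.03+ with the NVIDIA Container Toolkit
docker run --gpus all -it scnn-image /bin/bash

# Inside the container, nvidia-smi should then report the host's driver
# version rather than failing with a version mismatch.
nvidia-smi
```

The point is that the driver lives on the host and is mounted in at launch, so the container only needs a compatible CUDA runtime, not a specific driver version baked into the image.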
We are going to re-package this to avoid the driver/library conflicts. It won't happen immediately but is on our short list of things to do.
Any update on this matter? We are really interested in using the model, but we've run into some issues related to libraries and drivers.
I am trying to run the Docker container on an AWS p2 instance with one Tesla K80 card.
The default configuration of the instance is:
CUDA: 10.1
Driver: 418.67
I tried following the instructions at the link below to manually install a different NVIDIA driver version:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html
But I am unable to install the specified driver version, 367.57.
The wget command in those instructions fails, saying that this driver version is unavailable for the Tesla series.
When I try to run the container with the default driver version, it fails on a CUDA call and complains about the driver version.
Have you tried running it recently on AWS? Have you faced similar issues?
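For reference, a quick way to confirm which side of the mismatch is at fault is to compare the driver the host kernel has loaded with the CUDA toolkit baked into the image. This is only a sketch; it assumes `nvcc` is present inside the container, which may not be the case for a runtime-only image.

```bash
# On the host: which driver the AWS AMI actually ships.
cat /proc/driver/nvidia/version
nvidia-smi --query-gpu=driver_version --format=csv,noheader   # e.g. 418.67

# Inside the container: which CUDA toolkit the image was built with.
nvcc --version    # only works if the image includes the CUDA toolkit

# Driver 418.67 supports CUDA up to 10.1, so downgrading the host to 367.57
# should not be necessary if the container is started through the NVIDIA
# runtime (e.g. `docker run --gpus all ...`), which mounts the host driver
# libraries into the container.
```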