AWS EC2 ML instance setup
Based loosely on: https://github.com/NVIDIA/nvidia-docker (docker setup)
Start by launching a p2.xlarge instance with Ubuntu 16.04. Then follow the steps in each section below (via ssh into the instance, using the key pair you selected at launch).
Install Docker. Original instructions: https://docs.docker.com/install/linux/docker-ce/ubuntu/
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce
sudo usermod -aG docker $USER
sudo reboot
(wait for reboot...usually takes 30-60 seconds)
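Each reboot in this guide drops the ssh session, so you have to wait and reconnect. The following is a minimal sketch of a retry helper you could run from your workstation. The name wait_until is hypothetical, and the key path and host in the commented example are placeholders, not values from this guide.

```shell
# Hypothetical helper: retry a probe command until it succeeds or the
# attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # The probe's output is discarded; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (placeholders: key path and instance address):
# wait_until 30 5 ssh -o ConnectTimeout=5 -i ~/.ssh/my-key.pem ubuntu@INSTANCE_ADDRESS true
```

With 30 attempts and a 5 second delay, the helper gives up after roughly two and a half minutes, which comfortably covers the 30-60 seconds a reboot usually takes.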
Test: $ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/engine/userguide/
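If the test fails with a permission error, the usual cause is that the docker group membership added by usermod is not active in the current session. A small sketch for checking both conditions follows. The function names has_group and hello_ok are hypothetical.

```shell
# Succeeds if the space-separated group list ($1) contains the group ($2).
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if the text on stdin contains the hello-world success banner.
hello_ok() {
  grep -q 'Hello from Docker!'
}

# On the instance, after the reboot:
#   has_group "$(id -nG)" docker && echo "docker group active"
#   docker run --rm hello-world | hello_ok && echo "docker works"
```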
Install the NVIDIA driver. Original instructions: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html#obtain-nvidia-driver-linux
sudo apt-get upgrade -y linux-aws
sudo reboot
(wait for reboot)
sudo apt-get install -y gcc make linux-headers-$(uname -r)
cd /tmp
curl -O http://us.download.nvidia.com/tesla/384.145/NVIDIA-Linux-x86_64-384.145.run
sudo /bin/sh ./NVIDIA-Linux-x86_64-384.145.run
Note: accept (OK) the popups regarding guessing at the X Windows library location, and 32-bit compatibility.
rm NVIDIA-Linux-x86_64-384.145.run
sudo apt-get remove --purge -y gcc make linux-headers-$(uname -r)
sudo apt-get autoremove -y
sudo apt-get autoclean -y
sudo reboot
(wait for reboot)
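The driver version appears three times in the commands above (download, install, cleanup). One way to keep them in sync is to derive the file name and URL from a single variable, sketched below. The function names are hypothetical, and the URL pattern is inferred only from the one URL used in this guide, so other versions may not be hosted at the same path.

```shell
# File name of the installer for a given driver version.
driver_file() {
  printf 'NVIDIA-Linux-x86_64-%s.run\n' "$1"
}

# Download URL for a given driver version (pattern assumed from the
# single URL used in this guide).
driver_url() {
  printf 'http://us.download.nvidia.com/tesla/%s/%s\n' "$1" "$(driver_file "$1")"
}

DRIVER_VERSION=384.145

# The steps above then become:
#   curl -O "$(driver_url "$DRIVER_VERSION")"
#   sudo /bin/sh "./$(driver_file "$DRIVER_VERSION")"
#   rm "$(driver_file "$DRIVER_VERSION")"
```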
Test driver install: $ nvidia-smi
Sun Jun 24 17:05:33 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   59C    P0    60W / 149W |      0MiB / 11439MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
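To check the result in a script rather than by eye, you can pull the driver version out of the banner. This is a rough sketch that parses the human-readable output shown above. The function name smi_driver_version is hypothetical, and the banner layout may differ in other driver releases.

```shell
# Extracts the driver version from nvidia-smi banner text on stdin.
# Prints nothing if no "Driver Version:" field is found.
smi_driver_version() {
  sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# On the instance:
#   [ "$(nvidia-smi | smi_driver_version)" = "384.145" ] && echo "driver OK"
```

nvidia-smi also offers a machine-readable query (nvidia-smi --query-gpu=driver_version --format=csv,noheader), which avoids parsing the table at all.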
Install nvidia-docker. Original instructions: https://github.com/NVIDIA/nvidia-docker
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
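The distribution variable above is built by sourcing /etc/os-release and concatenating its ID and VERSION_ID fields, which gives ubuntu16.04 on this instance. The sketch below shows the same step in isolation so it can be checked before fetching the repository list. The function names are hypothetical. The subshell keeps the sourced variables out of your session.

```shell
# Prints ID followed by VERSION_ID from an os-release style file.
# Defaults to /etc/os-release when no path is given.
distribution_of() {
  (
    . "${1:-/etc/os-release}"
    printf '%s%s\n' "$ID" "$VERSION_ID"
  )
}

# Repository list URL for a distribution string, following the pattern
# of the curl command above.
nvidia_docker_list_url() {
  printf 'https://nvidia.github.io/nvidia-docker/%s/nvidia-docker.list\n' "$1"
}

# On the instance:
#   nvidia_docker_list_url "$(distribution_of)"
```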
Test: sudo nvidia-container-cli --load-kmods info
NVRM version: 384.145
CUDA version: 9.0
Device Index: 0
Device Minor: 0
Model: Tesla K80
GPU UUID: GPU-a261b3a1-a7c4-5170-2494-2b9e09cf0b82
Bus Location: 00000000:00:1e.0
Architecture: 3.7
And: docker run --rm --runtime=nvidia -ti nvidia/cuda nvidia-smi
Sun Jun 24 17:09:59 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   61C    P0    60W / 149W |      0MiB / 11439MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Note that to run containers with the GPU available, you have to use the nvidia runtime: pass --runtime=nvidia to docker run, as in the test command above.
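Since the runtime flag is easy to forget, a small wrapper can add it for you. This is only a sketch. The name gpu_run and the DOCKER override variable are hypothetical conveniences, not part of Docker or nvidia-docker.

```shell
# Hypothetical wrapper: run a throwaway container with the nvidia runtime.
# Set DOCKER to another command (e.g. DOCKER=echo) to preview the
# arguments instead of running a container.
gpu_run() {
  "${DOCKER:-docker}" run --rm --runtime=nvidia "$@"
}

# Equivalent to the test command above:
#   gpu_run -ti nvidia/cuda nvidia-smi
```

Running DOCKER=echo gpu_run -ti nvidia/cuda nvidia-smi prints the arguments that would be passed to docker, which is a quick way to check the wrapper before using it.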