AWS EC2 ML instance setup

Steps for setting up an AWS EC2 instance for Docker+R+Keras

Based loosely on: https://github.com/NVIDIA/nvidia-docker (docker setup)

Start by launching a p2.xlarge instance running Ubuntu 16.04. Then follow the steps in each section below, connecting to the instance over ssh with the key pair you selected at launch.
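
If you prefer to launch from the command line, a sketch along these lines should work with the AWS CLI; the AMI ID, key pair name, and security group below are placeholders to replace with your own values:

  # Launch a p2.xlarge; substitute an Ubuntu 16.04 AMI for your region,
  # plus your own key pair and security group
  aws ec2 run-instances \
    --image-id ami-xxxxxxxxxxxxxxxxx \
    --instance-type p2.xlarge \
    --count 1 \
    --key-name my-key-pair \
    --security-group-ids sg-xxxxxxxx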

Set up Docker

Original instructions: https://docs.docker.com/install/linux/docker-ce/ubuntu/

  1. sudo apt-get update
  2. sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
  3. curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
  4. sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  5. sudo apt-get update
  6. sudo apt-get install -y docker-ce
  7. sudo usermod -aG docker $USER
  8. sudo reboot

(wait for reboot...usually takes 30-60 seconds)
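
Once you can log back in, confirm that your user was added to the docker group (step 7 above is what lets you run docker without sudo):

  # "docker" should appear in the output; if it doesn't, re-run the usermod step and log in again
  groups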

Test: $ docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/

Install NVIDIA drivers

Original instructions: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html#obtain-nvidia-driver-linux

  1. sudo apt-get upgrade -y linux-aws
  2. sudo reboot

(wait for reboot)

  1. sudo apt-get install -y gcc make linux-headers-$(uname -r)
  2. cd /tmp
  3. curl -O http://us.download.nvidia.com/tesla/384.145/NVIDIA-Linux-x86_64-384.145.run
  4. sudo /bin/sh ./NVIDIA-Linux-x86_64-384.145.run
     Note: accept (OK) the prompts about guessing the X Windows library location and about 32-bit compatibility. (A non-interactive alternative is sketched just after this list.)
  5. rm NVIDIA-Linux-x86_64-384.145.run
  6. sudo apt-get remove --purge -y gcc make linux-headers-$(uname -r)
  7. sudo apt-get autoremove -y
  8. sudo apt-get autoclean -y
  9. sudo reboot
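
If you would rather avoid the interactive prompts in step 4, the NVIDIA installer also has a silent mode; this is an assumption about this installer version, so check the output of sudo /bin/sh ./NVIDIA-Linux-x86_64-384.145.run --help if it does not behave as expected:

  # non-interactive install; --silent accepts the license and suppresses the prompts
  sudo /bin/sh ./NVIDIA-Linux-x86_64-384.145.run --silent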

(wait for reboot)

Test driver install: $ nvidia-smi

Sun Jun 24 17:05:33 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   59C    P0    60W / 149W |      0MiB / 11439MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
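
Optionally, the AWS guide linked above also suggests tuning the GPU at this point (enabling persistence mode and pinning the application clocks). These steps are not required for the rest of this setup, and the clock values below are the ones published for p2 (K80) instances, so verify them against the current AWS documentation:

  # keep the driver loaded between jobs
  sudo nvidia-smi -pm 1
  # disable autoboost and pin the K80 application clocks
  sudo nvidia-smi --auto-boost-default=0
  sudo nvidia-smi -ac 2505,875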

Set up NVIDIA Docker runtime

Original instructions: https://github.com/NVIDIA/nvidia-docker

  1. curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
  2. distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
  3. curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  4. sudo apt-get update
  5. sudo apt-get install -y nvidia-docker2
  6. sudo pkill -SIGHUP dockerd

Test: sudo nvidia-container-cli --load-kmods info

NVRM version:   384.145
CUDA version:   9.0

Device Index:   0
Device Minor:   0
Model:          Tesla K80
GPU UUID:       GPU-a261b3a1-a7c4-5170-2494-2b9e09cf0b82
Bus Location:   00000000:00:1e.0
Architecture:   3.7

And: docker run --rm --runtime=nvidia -ti nvidia/cuda nvidia-smi

Sun Jun 24 17:09:59 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.145                Driver Version: 384.145                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   61C    P0    60W / 149W |      0MiB / 11439MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Note that to run containers with the GPU available, you have to use the nvidia runtime, i.e. pass --runtime=nvidia to docker run as in the test above.
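
If you'd rather not pass --runtime=nvidia on every docker run, nvidia-docker2 can be made the default runtime via /etc/docker/daemon.json. The snippet below is a sketch; merge it with whatever the nvidia-docker2 package already placed in that file rather than overwriting it, and restart the Docker daemon afterwards (sudo systemctl restart docker):

  {
      "default-runtime": "nvidia",
      "runtimes": {
          "nvidia": {
              "path": "nvidia-container-runtime",
              "runtimeArgs": []
          }
      }
  }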