layout | title | tagline |
---|---|---|
page | Utilizing GPUs with Singularity | Utilizing GPGPUs on the Maverick supercomputer through containerized environments. |
You can register your app on any system at TACC, but Maverick may not be the best choice if your app does not always need GPUs.
System | Cores/Node | Pros | Limitations |
---|---|---|---|
Stampede | 16 | Thousands of nodes, Xeon Phi accelerators | Retiring ~Dec 2017 |
Stampede 2 Phase1 | 68 | Thousands of nodes, KNL processors | Slow for serial code |
Stampede 2 Phase2 | 48 | Thousands of nodes, Skylake processors | Coming Soon, High Demand |
Lonestar 5 | 24 | Compute, GPUs, Large-mem | UT only, slow external network |
Wrangler | 24 | SSD Filesystem for fast I/O, Hosted Databases, Hadoop, HDFS | Low node-count |
Jetstream | 24 | Long running instances, root access | Limited storage |
Maverick | 20 | GPUs, high memory nodes | Deprecated software stack |
Chameleon | Variable | GPUs, bare metal VM, software defined networking | Difficult to configure |
Catapult | 16 | FPGAs | Windows-only |
You can learn about all choices at the TACC Systems Overview. Detailed specifications can be found in the User Guide of each system.
If you already have an application configured on a non-TACC system, you can register that system with the DesignSafe Agave tenant.
After registration, you can not only run applications but also access data. Just remember that applications will run as YOUR user when you share them with others.
TACC supports containerized compute environments through Singularity, which provides environment encapsulation without privilege escalation (root). Singularity provides the following functionality:
- Environment encapsulation
- Image based containers (single file)
- Devices and interconnects are passed into the container
  - InfiniBand
  - GPGPUs
- No abnormal privilege escalation allowed
  - No root daemons
  - Containers are read-only when not root
- Pass in filesystems and directories your user has access to
Since version 2.3, Singularity has supported the following two workflows:
Create a Singularity container from scratch.
- Create image of specific size
- (sudo) bootstrap image
- (sudo) add content through definition file
- (sudo) manually install software
- Done
http://singularity.lbl.gov/archive/docs/v2-3/bootstrap-image
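As a rough illustration of the steps above, the following sketch writes a small definition file and then prints the Singularity 2.3 commands that would build an image from it. The file names (`ubuntu.def`, `ubuntu.img`), the base image, the 1024 MiB size, and the installed package are all placeholders chosen for this example, not TACC recommendations. Because bootstrapping needs root, the commands are only printed by default; set `DRY_RUN=0` on a machine where you have sudo to actually run them.

```shell
# Sketch of the from-scratch workflow for Singularity 2.3.
# All names and sizes below are illustrative assumptions.
DEF_FILE=ubuntu.def
IMAGE=ubuntu.img
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Definition file: base OS from Docker Hub plus one extra package.
cat > "$DEF_FILE" <<'EOF'
Bootstrap: docker
From: ubuntu:16.04

%post
    apt-get update
    apt-get install -y python

%runscript
    exec python "$@"
EOF

# 1. Create an empty image of a specific size (in MiB).
run singularity create --size 1024 "$IMAGE"
# 2-3. Bootstrap the image and apply the definition file (needs root).
run sudo singularity bootstrap "$IMAGE" "$DEF_FILE"
# 4. Optionally install more software by hand in a writable shell (needs root).
run sudo singularity shell --writable "$IMAGE"
```

Since root is required, these steps would normally be done on your own workstation or VM, with the finished image copied to TACC afterwards.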
Utilize your knowledge of Docker to create Singularity images.
- Pull docker image
- Run docker image
http://singularity.lbl.gov/archive/docs/v2-3/docs-docker
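In the examples on this page, `singularity pull` names the resulting file after the last component of the repository and its tag (`docker://nvidia/caffe:latest` becomes `caffe-latest.img`). The helper below is a small sketch that reproduces that pattern so a script can predict the file name. `docker_image_file` is a hypothetical name, and the naming rule is inferred from those two examples rather than taken from the Singularity documentation, so other versions may behave differently.

```shell
# Predict the image file name produced by `singularity pull docker://...`.
# Assumption: the name is <repository basename>-<tag>.img.
docker_image_file() {
  uri=${1#docker://}        # strip the scheme
  name=${uri##*/}           # keep the last path component, e.g. caffe:latest
  case $name in
    *:*)
      tag=${name##*:}
      name=${name%%:*}
      ;;
    *)
      # No guess is made about untagged URIs.
      echo "docker_image_file: URI needs an explicit tag" >&2
      return 1
      ;;
  esac
  echo "${name}-${tag}.img"
}

docker_image_file docker://nvidia/caffe:latest
docker_image_file docker://tensorflow/tensorflow:latest-gpu
```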
These containers run without root, so you simply use one of the following subcommands:
- run - Run the default functionality of the container, which takes in arguments
- exec - Execute a specific command inside the container, and then exit
- shell - Enter the container and interactively run commands
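The three subcommands above might be used as in the sketch below. By default it only prints the commands, so it can be read on a machine without Singularity; set `SINGULARITY=singularity` to run them for real. The image name is taken from the Caffe example on this page, while the `--help` argument and the `cat /etc/os-release` command are placeholders, not something the image is known to support.

```shell
# Print the commands unless SINGULARITY is set to the real binary.
SINGULARITY=${SINGULARITY:-echo singularity}
IMAGE=caffe-latest.img

# run: invoke the container's default functionality, forwarding arguments
$SINGULARITY run "$IMAGE" --help
# exec: run one specific command inside the container, then exit
$SINGULARITY exec "$IMAGE" cat /etc/os-release
# shell: open an interactive shell inside the container
$SINGULARITY shell "$IMAGE"
```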
Since Singularity added support for Docker containers, it has been fairly simple to use GPUs for machine learning frameworks like Caffe and TensorFlow. From Maverick, which is TACC's GPU system:
# Work from a compute node
idev -m 60
# Load the singularity module
module load tacc-singularity
# Pull your image
singularity pull docker://nvidia/caffe:latest
# Query the GPU from inside the container
singularity exec --nv caffe-latest.img caffe device_query -gpu 0
Please note that the --nv flag specifically passes the GPU drivers into the container. If you leave it out, the GPU will not be detected.
singularity exec caffe-latest.img caffe device_query -gpu 0
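One way to avoid forgetting the flag is to add it only when the node actually exposes an NVIDIA device. The sketch below assumes that the driver creates `/dev/nvidia0` on GPU nodes, which is typical for NVIDIA's Linux driver but is not confirmed for Maverick by this page. `nv_flag` is a hypothetical helper name.

```shell
# Echo "--nv" when an NVIDIA device node exists, otherwise echo nothing.
# The device path is a parameter; the /dev/nvidia0 default is an assumption.
nv_flag() {
  device=${1:-/dev/nvidia0}
  if [ -e "$device" ]; then
    echo "--nv"
  fi
}

# Show the command that would be run on this node.
# $(nv_flag) is left unquoted on purpose so an empty result adds no argument.
echo singularity exec $(nv_flag) caffe-latest.img caffe device_query -gpu 0
```

Dropping the leading `echo` turns the last line into the real invocation.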
For TensorFlow, you can directly pull their latest GPU image and utilize it as follows.
# Change to your $WORK directory
cd $WORK
# Get the software
git clone https://github.com/tensorflow/models.git ~/models
# Pull the image
singularity pull docker://tensorflow/tensorflow:latest-gpu
# Run the code
singularity exec --nv tensorflow-latest-gpu.img python $HOME/models/tutorials/image/mnist/convolutional.py
You probably noticed that we check out the models repository into your $HOME directory. This is because your $HOME and $WORK directories are only available inside the container if the root folders /home and /work exist inside the container. In the case of tensorflow-latest-gpu.img, the /work directory does not exist, so any files there are inaccessible to the container.
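The rule in the paragraph above can be written down as a small check: a host path is usable inside the container only if its top-level folder is one of the root folders present in the image. The sketch below is purely illustrative. `visible_in_container` is a hypothetical helper, the user paths are made up, and the list of roots (`/home` only) is taken from the description of tensorflow-latest-gpu.img above rather than read from the image.

```shell
# Succeed if PATH's top-level folder is one of the root folders present in the image.
# Usage: visible_in_container PATH ROOT...
visible_in_container() {
  path=$1
  shift
  rest=${path#/}            # /work/01234/user -> work/01234/user
  top=/${rest%%/*}          # work/01234/user  -> /work
  for root in "$@"; do
    [ "$top" = "$root" ] && return 0
  done
  return 1
}

# Hypothetical paths; /home is the only root assumed to exist in the image.
for p in /home/01234/user/models /work/01234/user/models; do
  if visible_in_container "$p" /home; then
    echo "$p: visible"
  else
    echo "$p: not visible"
  fi
done
```

On a real system, the roots could be checked directly with something like `singularity exec tensorflow-latest-gpu.img ls -d /home /work`.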
You may be wondering, "What about overlayfs?" The Linux kernel on Maverick does not support overlayfs, so it had to be disabled in our Singularity install.
You can then use these methods in your next Agave app.