
Distributed Deep Learning

Led by Corey Adams and Huihuo Zheng from ALCF

This section of the workshop introduces the methods we use to run distributed deep learning training on ALCF resources such as Theta and ThetaGPU.

We show distributed training using two frameworks:

  1. Horovod (for TensorFlow and PyTorch), and
  2. DistributedDataParallel (DDP) (for PyTorch only).
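Both frameworks implement synchronous data parallelism. Every process (rank) holds a full copy of the model and computes gradients on its own shard of the data. The gradients are then averaged across all ranks with an allreduce, and every rank applies the same update. The toy simulation below sketches that idea for a one-parameter linear model using only the Python standard library. It does not use or reproduce the Horovod or DDP APIs. The function names (`local_gradient`, `allreduce_mean`, `train_data_parallel`) are hypothetical, and the "allreduce" is a plain Python mean rather than MPI or NCCL communication.

```python
"""Toy, stdlib-only simulation of synchronous data-parallel training.

Assumption: this is NOT the Horovod or DDP API. It only mimics the idea both
frameworks implement: each worker computes gradients on its own data shard,
the gradients are averaged across workers (an allreduce), and every replica
applies the same update, so all model copies stay identical.
"""
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for the model y = w * x


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Mean-squared-error gradient dL/dw computed on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: List[float]) -> List[float]:
    """Hypothetical stand-in for an allreduce: every worker gets the global mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def train_data_parallel(data: List[Sample], num_workers: int = 4,
                        lr: float = 0.01, steps: int = 200) -> List[float]:
    """Return the final weight held by each simulated worker."""
    if len(data) % num_workers != 0:
        # Equal shard sizes make the mean of the shard gradients equal
        # the full-batch gradient.
        raise ValueError("len(data) must be divisible by num_workers")
    # Round-robin split of the dataset: worker i gets samples i, i+n, i+2n, ...
    shards = [data[i::num_workers] for i in range(num_workers)]
    # All workers start from the same weight, mimicking a broadcast from rank 0.
    weights = [0.0] * num_workers
    for _ in range(steps):
        # 1. Each worker computes a gradient on its own shard
        #    (these would run in parallel on real hardware).
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # 2. Average the gradients across workers.
        avg_grads = allreduce_mean(grads)
        # 3. Every worker applies the identical SGD update.
        weights = [w - lr * g for w, g in zip(weights, avg_grads)]
    return weights


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # samples from y = 3x
    print(train_data_parallel(data, num_workers=4))  # four identical weights close to 3.0
```

Because the shards are equal in size, the averaged gradient equals the full-batch gradient. Running with `num_workers=4` therefore gives the same weights (up to floating-point rounding) as `num_workers=1`. The real frameworks gain their speedup by computing those shard gradients on separate GPUs or nodes at the same time.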