Implementation of distributed deep learning with multi GPUs

OkYongChoi/multi-gpus

Distributed Learning with Multi-Nodes and Multi-GPUs

I used a WideResNet model as the CIFAR-10 classifier and NovoGrad as the optimizer.

PyTorch with Ray

Install

$ pip install ray 
$ pip install torch_optimizer # for NovoGrad

Run the script

$ python wideresenet-cifar10.py

Reference

Distributed PyTorch section of the Ray docs

TensorFlow with Horovod

I modified the code from the NVIDIA multi-GPU course.

Install Horovod from Docker Hub

I tried other installation methods but could not get Horovod to install properly, so I strongly recommend pulling the Horovod image from https://hub.docker.com/r/horovod/horovod.

You may also need to pip install scipy and tensorflow-addons inside this Docker image before the script will run.
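To see whether those extra packages are already present in the container, a small check like the one below may help. It is only a sketch: the helper name `missing_modules` is hypothetical, and it assumes the packages are imported under the module names `scipy` and `tensorflow_addons`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the pip packages scipy and tensorflow-addons.
required = ["scipy", "tensorflow_addons"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Running this inside the container before launching training shows which of the two packages, if any, still needs to be installed.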

Run the script

e.g. Single-Node

$ horovodrun -np $num_gpus python wideresnet-cifar10.py --epochs 5 --batch-size 512
or
$ mpirun -np $num_gpus python wideresnet-cifar10.py --epochs 5 --batch-size 512

e.g. Multi-Node

$ horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python wideresnet-cifar10.py
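In the multi-node command, the value passed to -np (16) is the sum of the per-host slot counts given to -H (4 + 4 + 4 + 4). The sketch below builds the same command line from a host-to-slots mapping so the two values cannot drift apart. The helper name `horovodrun_command` and the `cluster` dictionary are hypothetical and used only for illustration.

```python
import shlex


def horovodrun_command(hosts, script, script_args=()):
    """Build a horovodrun argument list from a {hostname: gpu_slots} mapping."""
    if not hosts:
        raise ValueError("at least one host is required")
    if any(slots < 1 for slots in hosts.values()):
        raise ValueError("every host needs at least one slot")
    # -np is the total process count: one process per GPU slot across all hosts.
    total = sum(hosts.values())
    # -H takes a comma-separated list of host:slots pairs.
    host_spec = ",".join(f"{name}:{slots}" for name, slots in hosts.items())
    return ["horovodrun", "-np", str(total), "-H", host_spec,
            "python", script, *script_args]


# Hypothetical cluster matching the multi-node example: 4 servers with 4 GPUs each.
cluster = {"server1": 4, "server2": 4, "server3": 4, "server4": 4}
cmd = horovodrun_command(cluster, "wideresnet-cifar10.py")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` as-is. A single-node launch can be expressed the same way with a one-entry mapping such as `{"localhost": 4}`.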
