Distributed Learning with Multi-Nodes and Multi-GPUs

I used a WideResNet model as the CIFAR-10 classifier and NovoGrad as the optimizer.

PyTorch with Ray

Install

$ pip install ray 
$ pip install torch_optimizer # for NovoGrad
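
To confirm the installation before launching a run, a small check like the one below can help. It is only a sketch: the helper name `missing_packages` is hypothetical, and it assumes the import names are `ray`, `torch`, and `torch_optimizer`.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names are assumed to match the packages installed above.
    missing = missing_packages(["ray", "torch", "torch_optimizer"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```

If anything is reported as missing, rerun the corresponding pip install command in the same Python environment.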

Run the script

$ python wideresenet-cifar10.py

Reference

The Distributed PyTorch section of the Ray docs

TensorFlow with Horovod

I modified the code from the NVIDIA multi-GPU course.

Install Horovod from Docker Hub

I tried other installation methods but could not get Horovod installed properly, so I strongly recommend pulling the Horovod image from https://hub.docker.com/r/horovod/horovod.

You may also need to pip install scipy and tensorflow-addons inside this Docker image to run the script.

Run the script

e.g. Single-Node

$ horovodrun -np $num_gpus python wideresnet-cifar10.py --epochs 5 --batch-size 512
or
$ mpirun -np $num_gpus python wideresnet-cifar10.py --epochs 5 --batch-size 512
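
The commands above leave `$num_gpus` for you to set. One way to derive it, sketched here under the assumption that `nvidia-smi -L` prints one line starting with `GPU ` per device, is to count those lines and build the command from the result. The helper names `count_gpus` and `single_node_command` are hypothetical, and the sample listing is made up for illustration.

```python
def count_gpus(listing):
    """Count GPUs in the text printed by `nvidia-smi -L` (one "GPU n: ..." line each)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def single_node_command(num_gpus, epochs=5, batch_size=512):
    """Build the single-node horovodrun command, one process per GPU."""
    if num_gpus < 1:
        raise ValueError("at least one GPU is required")
    return [
        "horovodrun", "-np", str(num_gpus),
        "python", "wideresnet-cifar10.py",
        "--epochs", str(epochs), "--batch-size", str(batch_size),
    ]


if __name__ == "__main__":
    # Hypothetical sample of `nvidia-smi -L` output for a two-GPU machine.
    sample = (
        "GPU 0: Tesla V100 (UUID: GPU-aaaa)\n"
        "GPU 1: Tesla V100 (UUID: GPU-bbbb)\n"
    )
    print(" ".join(single_node_command(count_gpus(sample))))
```

In practice the listing would come from running `nvidia-smi -L` on the machine rather than from a hard-coded string.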

e.g. Multi-Node

$ horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python wideresnet-cifar10.py
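
In the multi-node command, the value passed to `-np` is the sum of the slot counts in `-H` (4 + 4 + 4 + 4 = 16). A minimal sketch of building both arguments from one mapping, so they cannot drift apart, might look like this. The helper name `horovodrun_args` is hypothetical.

```python
def horovodrun_args(slots_per_host):
    """Return the horovodrun `-np` and `-H` arguments for a host -> GPU slots mapping."""
    total = sum(slots_per_host.values())
    hosts = ",".join(f"{host}:{slots}" for host, slots in slots_per_host.items())
    return ["horovodrun", "-np", str(total), "-H", hosts]


if __name__ == "__main__":
    # Host names taken from the example command above.
    cluster = {"server1": 4, "server2": 4, "server3": 4, "server4": 4}
    print(" ".join(horovodrun_args(cluster) + ["python", "wideresnet-cifar10.py"]))
```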

