I'm training a BigGAN with differentiable augmentation (DiffAug) and LeCam regularization on a custom dataset. My setup has 4 NVIDIA RTX 3070 GPUs and runs Ubuntu 20.04. I observe that training on the 4 GPUs with DistributedDataParallel takes the same time as training on a single GPU. Am I doing something wrong?

For training on a single GPU, I'm using the following command:

```
CUDA_VISIBLE_DEVICES=0 python3 src/main.py -t -hdf5 -l -std_stat -std_max 64 -std_step 64 -metrics fid is prdc -ref "train" -cfg src/configs/VWW/BigGAN-DiffAug-LeCam.yaml -data ../Datasets/vw_coco2014_96_GAN -save SAVE_PATH_VWW -mpc --post_resizer "friendly" --eval_backbone "InceptionV3_tf"
```

For training on the 4 GPUs, I'm using the following commands:

```
export MASTER_ADDR=localhost
export MASTER_PORT=1234
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 src/main.py -t -DDP -tn 1 -cn 0 -std_stat -std_max 64 -std_step 64 -metrics fid is prdc -ref "train" -cfg src/configs/VWW/BigGAN-DiffAug-LeCam.yaml -data ../Datasets/vw_coco2014_96_GAN -save SAVE_PATH_VWW -mpc --post_resizer "friendly" --eval_backbone "InceptionV3_tf"
```
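For context, here is a minimal, generic PyTorch DDP launch sketch showing what MASTER_ADDR, MASTER_PORT, and CUDA_VISIBLE_DEVICES are consumed for. This is not this repository's actual training code; the `worker` function and its arguments are illustrative. One process is spawned per visible GPU, and each process joins the group through the rendezvous address:

```python
# Generic DDP launch pattern (sketch): one process per visible GPU.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # MASTER_ADDR / MASTER_PORT are read by init_process_group for rendezvous.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "1234")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    # ... build the model, wrap it with torch.nn.parallel.DistributedDataParallel,
    #     and run the training loop here ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # 4 when CUDA_VISIBLE_DEVICES=0,1,2,3
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```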
MiguelCosta94 changed the title from "Using 4 GPUs is slower than using just 1" to "Using 4 GPUs for training takes the same time as using just 1" on Dec 5, 2023.
Could you please check the batch size used in training?
If you are using a batch size of 256 on 1 GPU, you should switch to 4 GPUs with a batch size of 64 each to accelerate training. Keeping a batch size of 256 on every GPU will not speed things up, because each GPU then processes just as many samples per step as the single-GPU run did.
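To make the per-GPU batch size concrete, here is a minimal generic PyTorch sketch (the `build_loader` helper and `global_batch_size` parameter are illustrative, not part of this repository): each DDP rank reads its own DistributedSampler shard and loads `global_batch_size // world_size` samples per step, so 4 GPUs with 64 each reproduce the single-GPU batch of 256 while cutting the per-step work on every card.

```python
# Sketch: divide the global batch across DDP ranks so each GPU does less work per step.
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler

def build_loader(dataset, global_batch_size):
    world_size = dist.get_world_size()               # 4 in the setup above
    per_gpu_batch = global_batch_size // world_size  # e.g. 256 // 4 = 64
    sampler = DistributedSampler(dataset, shuffle=True)
    return DataLoader(dataset, batch_size=per_gpu_batch,
                      sampler=sampler, num_workers=4, pin_memory=True)
```

With that split, gradients are all-reduced across the 4 ranks, so the effective batch per optimizer step is still 256, but each step finishes faster than on a single GPU.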