
Using 4 GPUs for training takes the same time as using just 1 #202

Open
MiguelCosta94 opened this issue Dec 5, 2023 · 1 comment

MiguelCosta94 commented Dec 5, 2023

I'm training a BigGAN with differentiable augmentation (DiffAug) and LeCam regularization on a custom dataset. My setup has 4 NVIDIA RTX 3070 GPUs and runs Ubuntu 20.04. I observe that training on the 4 GPUs with Distributed Data Parallel (DDP) takes the same amount of time as training on a single GPU. Am I doing something wrong?

For training using a single GPU, I'm using the following command:
CUDA_VISIBLE_DEVICES=0 python3 src/main.py -t -hdf5 -l -std_stat -std_max 64 -std_step 64 -metrics fid is prdc -ref "train" -cfg src/configs/VWW/BigGAN-DiffAug-LeCam.yaml -data ../Datasets/vw_coco2014_96_GAN -save SAVE_PATH_VWW -mpc --post_resizer "friendly" --eval_backbone "InceptionV3_tf"

For training using the 4 GPUs, I'm using the following commands:
export MASTER_ADDR=localhost
export MASTER_PORT=1234
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 src/main.py -t -DDP -tn 1 -cn 0 -std_stat -std_max 64 -std_step 64 -metrics fid is prdc -ref "train" -cfg src/configs/VWW/BigGAN-DiffAug-LeCam.yaml -data ../Datasets/vw_coco2014_96_GAN -save SAVE_PATH_VWW -mpc --post_resizer "friendly" --eval_backbone "InceptionV3_tf"

MiguelCosta94 changed the title from "Using 4 GPUs is slower than using just 1" to "Using 4 GPUs for training takes the same time as using just 1" on Dec 5, 2023
mingukkang (Collaborator) commented

Could you please check the batch size used in the training process?

If you are using 1 GPU with a batch size of 256, it is advisable to switch to 4 GPUs with a batch size of 64 each in order to accelerate training. Keeping the batch size at 256 per GPU will not make training faster: each GPU then still processes 256 samples per step, so every step takes roughly as long as the single-GPU run (you are simply training with a 4x larger global batch).
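
In case it helps, here is a minimal, generic PyTorch DDP sketch (not the StudioGAN code path; the dataset, model, and hyperparameters below are placeholders for illustration) showing the intended setup: the per-GPU batch size is the global batch divided by the number of processes, so one optimizer step still covers 256 samples while the work is split across the 4 GPUs. It assumes MASTER_ADDR and MASTER_PORT are exported as in the commands above.

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def train(rank: int, world_size: int, global_batch: int = 256):
    # Uses MASTER_ADDR / MASTER_PORT from the environment (exported above).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Keep the *global* batch fixed: each GPU sees global_batch // world_size
    # samples per step (256 // 4 = 64), so one step covers the same 256 samples
    # as the single-GPU run, but the forward/backward work is split across GPUs.
    per_gpu_batch = global_batch // world_size

    dataset = TensorDataset(torch.randn(4096, 3, 96, 96))  # placeholder data
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=per_gpu_batch, sampler=sampler)

    # Placeholder model; DDP wraps it so gradients are averaged across processes.
    model = DDP(torch.nn.Conv2d(3, 8, 3, padding=1).cuda(rank), device_ids=[rank])
    opt = torch.optim.Adam(model.parameters(), lr=2e-4)

    for (x,) in loader:
        x = x.cuda(rank, non_blocking=True)
        loss = model(x).mean()          # dummy loss, just to drive the backward pass
        opt.zero_grad()
        loss.backward()                 # gradients are all-reduced across the 4 ranks
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 4  # one process per RTX 3070
    mp.spawn(train, args=(world_size,), nprocs=world_size)

With this split, each step should take roughly a quarter of the single-GPU step time (minus communication overhead), which is where the multi-GPU speedup comes from.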
