Training does not work with 1 GPU #2
There seems to be a problem with the contrastive loss when using 1 GPU to train; training only works when setting no_insgen=true.
The output is:

Comments
I attempted this fix:
However, this halves training throughput with InsGen enabled, and according to the paper, “the extra computing load is extremely small and the training efficiency is barely affected”, so I assume this is not doing the right thing.
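(A common fix of this shape, though not necessarily the exact patch attempted here, is a single-process fallback in concat_all_gather(). A sketch, assuming the MoCo-style gather helper used in training/contrastive_head.py:)

```python
import torch
import torch.distributed as dist

@torch.no_grad()
def concat_all_gather(tensor):
    # Single-process fallback: with one GPU there is nothing to gather,
    # so the local batch already is the "global" batch.
    if not dist.is_available() or not dist.is_initialized() or dist.get_world_size() == 1:
        return tensor
    # Multi-GPU path: gather the tensor from every process and concatenate.
    tensors_gather = [torch.ones_like(tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(tensors_gather, tensor)
    return torch.cat(tensors_gather, dim=0)
```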
@kata44 I also want to train with only 1 GPU, and I think modifying only the concat_all_gather() function is not correct, as the code comments say in _batch_shuffle_ddp() (https://github.com/genforce/insgen/blob/52bda7cfe59094fbb2f533a0355fff1392b0d380/training/contrastive_head.py#L73-L75) and _batch_unshuffle_ddp().
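(For illustration, single-GPU stand-ins for the shuffle pair might look like the sketch below, loosely modelled on the single-GPU variants in the MoCo reference code; the function names here are hypothetical:)

```python
import torch

@torch.no_grad()
def _batch_shuffle_single_gpu(x):
    # Shuffle the local batch (stands in for _batch_shuffle_ddp, which
    # shuffles samples across all processes) and keep the inverse permutation.
    idx_shuffle = torch.randperm(x.shape[0], device=x.device)
    idx_unshuffle = torch.argsort(idx_shuffle)
    return x[idx_shuffle], idx_unshuffle

@torch.no_grad()
def _batch_unshuffle_single_gpu(x, idx_unshuffle):
    # Undo the shuffle so outputs line up with the original batch order.
    return x[idx_unshuffle]
```

The argsort of the shuffle permutation recovers the inverse permutation directly, so no extra bookkeeping is needed to restore the original order.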
Hi, I am not familiar with multi-GPU training, but I think the bug is triggered by [...]. Now look at line 51.
Have you solved this problem? Can you train with one GPU?
Hi! Can I delete this line for normal training?
I think the issue is simply that the process groups need to be initialised even if there is only one GPU; see the patch in #5.
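(If that is the route taken, the idea is to create a "world" of size one before the contrastive head runs, so the distributed collectives degenerate to identity operations instead of crashing. A minimal sketch, assuming the default env:// initialisation; the address, port, and backend are placeholders rather than the actual contents of #5:)

```python
import os
import torch.distributed as dist

def init_single_process_group(port: int = 29500) -> None:
    # Illustrative: set up a 1-process "distributed" run so that the
    # torch.distributed calls in training/contrastive_head.py succeed.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", str(port))
    # Use backend="gloo" on CPU or Windows; "nccl" requires CUDA.
    dist.init_process_group(backend="nccl", rank=0, world_size=1)

if __name__ == "__main__":
    init_single_process_group()
    print(dist.get_world_size())  # 1
```

With a world size of one, all_gather returns only the local tensor and broadcast is a no-op, so the contrastive head's collective calls can run unchanged.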