Hi @MXueguang, sorry that I forgot about this for a long while. As far as I understand, a mean of means is still a mean (given equal per-GPU batch sizes), so the `loss * world_size` on line 72 should be deleted. What do you think?
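The "mean of means is still a mean" claim holds whenever every GPU sees the same number of examples. A minimal arithmetic check (plain Python; the loss values and the 8-per-GPU split are hypothetical, not from Tevatron):

```python
# With equal-sized shards, the mean of per-shard means equals the global mean,
# so DDP's gradient averaging already reproduces single-GPU "mean" reduction.
def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical per-example losses: 16 examples split across 2 "GPUs" of 8 each.
losses = [0.5, 1.0, 0.25, 2.0, 0.75, 1.5, 0.1, 0.9,
          1.2, 0.3, 0.8, 1.1, 0.6, 0.4, 1.7, 0.2]
shard_a, shard_b = losses[:8], losses[8:]

global_mean = mean(losses)                       # 1 GPU, batch of 16
ddp_mean = mean([mean(shard_a), mean(shard_b)])  # average of the 2 shard means

assert abs(global_mean - ddp_mean) < 1e-12
```

Note the equality breaks if the last rank gets a smaller batch (uneven shards), in which case the mean of means weights its examples more heavily.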
Hi devs, first, thanks for this clean & useful code base!
I'm not following this line:

`tevatron/src/tevatron/retriever/modeling/encoder.py`, line 72 at commit `f88e0c7`.
Since in an earlier line,

`tevatron/src/tevatron/retriever/modeling/encoder.py`, line 41 at commit `f88e0c7`,
the reduction is "mean", I believe it is correct to let DDP average the gradients across multiple GPUs, and there is no need to counteract that. Only when DDP averages (rather than sums) the gradients are 2 GPUs with 8 examples each equivalent to 1 GPU with 16 examples, both reduced with "mean".
Thanks in advance.
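The argument above can be sketched numerically (plain Python stand-in for DDP; the scalar parameter, quadratic per-example loss, and target values are all hypothetical, chosen only to make gradients easy to compute by hand):

```python
# Simulate DDP's gradient all-reduce (average) for a scalar parameter w
# with per-example loss (w - t_i)**2 and per-rank "mean" reduction.
def grad_mean_loss(w, targets):
    # d/dw of mean_i (w - t_i)^2  =  mean_i 2 * (w - t_i)
    return sum(2 * (w - t) for t in targets) / len(targets)

w = 0.3
targets = [0.1, 0.7, 0.4, 0.9, 0.2, 0.5, 0.8, 0.6,
           0.35, 0.15, 0.95, 0.55, 0.25, 0.65, 0.45, 0.05]
shards = [targets[:8], targets[8:]]   # 2 hypothetical GPUs, 8 examples each
world_size = len(shards)

# DDP: each rank takes the gradient of its local mean loss; the backward
# all-reduce then averages those gradients across ranks.
ddp_grad = sum(grad_mean_loss(w, s) for s in shards) / world_size

# Single-GPU baseline: mean over all 16 examples.
single_grad = grad_mean_loss(w, targets)
assert abs(ddp_grad - single_grad) < 1e-12

# If each rank instead scales its loss by world_size before backward
# (as line 72 does), the averaged gradient is world_size times too large.
scaled_grad = sum(world_size * grad_mean_loss(w, s) for s in shards) / world_size
assert abs(scaled_grad - world_size * single_grad) < 1e-12
```

So with "mean" reduction on each rank, DDP's averaging alone already matches the single-GPU gradient, and the extra `loss * world_size` scales it up by the number of GPUs.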