Correctness of "counter average weight reduction" #143

Open
rgdyy opened this issue Aug 2, 2024 · 2 comments

@rgdyy
Contributor

rgdyy commented Aug 2, 2024

Hi devs, first, thanks for this clean & useful code base!

I'm not following this line

loss = loss * self.world_size # counter average weight reduction

Since in an earlier line

self.cross_entropy = nn.CrossEntropyLoss(reduction='mean')

the reduction is "mean", I believe it is correct to simply let DDP average the gradients across multiple GPUs, and there is no need to counteract that averaging. Only when DDP averages (rather than sums) is training on 2 GPUs with 8 examples each equivalent to training on 1 GPU with 16 examples, both reduced by "mean".
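
Here is a minimal sketch of my reasoning (a toy linear "model" and random data, not this repository's code): with reduction='mean' on each rank, averaging the per-rank gradients the way DDP's all-reduce does already matches a single GPU training on the combined batch, so an extra * world_size would over-scale the gradient.

import torch
import torch.nn as nn

# Toy comparison: 2 "ranks" x 8 examples with DDP-style gradient averaging
# versus 1 "rank" x 16 examples, both using reduction='mean'.
torch.manual_seed(0)
w = torch.randn(4, 3)
x = torch.randn(16, 4)
y = torch.randint(0, 3, (16,))
ce = nn.CrossEntropyLoss(reduction='mean')

def grad_of_mean_loss(xb, yb):
    wb = w.clone().requires_grad_(True)
    ce(xb @ wb, yb).backward()
    return wb.grad

g_single = grad_of_mean_loss(x, y)                 # 1 GPU, batch of 16

g_rank0 = grad_of_mean_loss(x[:8], y[:8])          # "rank 0", batch of 8
g_rank1 = grad_of_mean_loss(x[8:], y[8:])          # "rank 1", batch of 8
g_ddp = (g_rank0 + g_rank1) / 2                    # DDP all-reduce = average

print(torch.allclose(g_single, g_ddp, atol=1e-6))      # True: already equivalent
print(torch.allclose(g_single, 2 * g_ddp, atol=1e-6))  # False: * world_size over-scales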

Thanks in advance.

@MXueguang
Contributor

Hi @rgdyy, #30 is relevant.

@rgdyy
Contributor Author

rgdyy commented Oct 26, 2024

Hi @MXueguang, sorry that I forgot about this for a long while. As far as I understand, a mean of means is still, correctly, a mean, so the Line 72 loss * world_size should be deleted. What do you think?
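
A toy check of the "mean of means" point (plain numbers, not the repository's code):

# With equal shard sizes, the mean of per-rank means equals the mean over the
# full batch, so DDP's gradient averaging already matches single-GPU training.
losses = [1.0, 2.0, 3.0, 4.0]            # per-example losses, 1 GPU, batch of 4
rank0, rank1 = losses[:2], losses[2:]    # the same examples split over 2 GPUs
mean_of_means = (sum(rank0) / 2 + sum(rank1) / 2) / 2
full_mean = sum(losses) / 4
assert mean_of_means == full_mean        # both are 2.5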
