Correctness of "counter average weight reduction" #143

Open
rgdyy opened this issue Aug 2, 2024 · 2 comments

@rgdyy
Contributor

rgdyy commented Aug 2, 2024

Hi devs, first, thanks for this clean & useful code base!

I'm not following this line

loss = loss * self.world_size # counter average weight reduction

Since in an earlier line

self.cross_entropy = nn.CrossEntropyLoss(reduction='mean')

the reduction is "mean", I believe it is correct to simply let DDP average the gradients across multiple GPUs, and there is no need to counteract that averaging. Only when DDP averages (rather than sums) is training on 2 GPUs with 8 examples each equivalent to training on 1 GPU with 16 examples, both reduced by "mean".
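
Here is a minimal sketch of my reasoning (a toy linear "model" and random data, not this repository's code): with reduction='mean' on each rank, averaging the per-rank gradients the way DDP's all-reduce does already matches a single GPU training on the combined batch, so an extra * world_size would over-scale the gradient.

import torch
import torch.nn as nn

# Toy comparison: 2 "ranks" x 8 examples with DDP-style gradient averaging
# versus 1 "rank" x 16 examples, both using reduction='mean'.
torch.manual_seed(0)
w = torch.randn(4, 3)
x = torch.randn(16, 4)
y = torch.randint(0, 3, (16,))
ce = nn.CrossEntropyLoss(reduction='mean')

def grad_of_mean_loss(xb, yb):
    wb = w.clone().requires_grad_(True)
    ce(xb @ wb, yb).backward()
    return wb.grad

g_single = grad_of_mean_loss(x, y)                 # 1 GPU, batch of 16

g_rank0 = grad_of_mean_loss(x[:8], y[:8])          # "rank 0", batch of 8
g_rank1 = grad_of_mean_loss(x[8:], y[8:])          # "rank 1", batch of 8
g_ddp = (g_rank0 + g_rank1) / 2                    # DDP all-reduce = average

print(torch.allclose(g_single, g_ddp, atol=1e-6))      # True: already equivalent
print(torch.allclose(g_single, 2 * g_ddp, atol=1e-6))  # False: * world_size over-scales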

Thanks in advance.

@MXueguang
Contributor

Hi @rgdyy, #30 is relevant.

@rgdyy
Contributor Author

rgdyy commented Oct 26, 2024

Hi @MXueguang, sorry that I forgot about this for a long while. As far as I understand, a mean of means is still, correctly, a mean, so the Line 72 loss * world_size should be deleted. What do you think?
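
A toy check of the "mean of means" point (plain numbers, not the repository's code):

# With equal shard sizes, the mean of per-rank means equals the mean over the
# full batch, so DDP's gradient averaging already matches single-GPU training.
losses = [1.0, 2.0, 3.0, 4.0]            # per-example losses, 1 GPU, batch of 4
rank0, rank1 = losses[:2], losses[2:]    # the same examples split over 2 GPUs
mean_of_means = (sum(rank0) / 2 + sum(rank1) / 2) / 2
full_mean = sum(losses) / 4
assert mean_of_means == full_mean        # both are 2.5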
