
clip_grad_norm applied to scaled gradients #64

Open
fpgaminer opened this issue Jan 3, 2023 · 0 comments
@fpgaminer

On this line, grad clipping occurs:

torch.nn.utils.clip_grad_norm_(unet.parameters(), 1.0)

However, if fp16 is enabled, the clipping is applied to the scaled gradients because of GradScaler.

According to PyTorch documentation (https://pytorch.org/docs/master/notes/amp_examples.html#gradient-clipping), the gradients should be unscaled before clipping.

So this appears to be a bug that could cause fp16 training to perform worse than it otherwise would.
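For reference, the pattern from the linked PyTorch AMP documentation is to call `scaler.unscale_(optimizer)` before clipping, so `clip_grad_norm_` operates on the true gradient values. A minimal sketch of one training step (the `unet`, `optimizer`, `loss`, and `scaler` names are placeholders for whatever the training loop already defines):

```python
import torch

# Assumes a fp16 training setup, e.g. scaler = torch.cuda.amp.GradScaler(),
# with `unet`, `optimizer`, and a computed `loss` already in scope.
scaler.scale(loss).backward()

# Unscale the optimizer's gradients in place so clipping sees unscaled values.
scaler.unscale_(optimizer)

# Clip the unscaled gradients (same max_norm as the current code).
torch.nn.utils.clip_grad_norm_(unet.parameters(), 1.0)

# scaler.step() skips optimizer.step() if any gradients are inf/NaN, and it
# will not unscale a second time since unscale_ was already called this step.
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad(set_to_none=True)
```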
