Add gradient noise scale logging #2019


Draft: wants to merge 5 commits into base: sd3

Conversation

rockerBOO (Contributor)

https://arxiv.org/abs/1812.06162

In this paper, we demonstrate that a simple and easy-to-measure statistic called the gradient noise scale predicts the largest useful batch size across many domains and applications, including a number of supervised learning datasets (MNIST, SVHN, CIFAR-10, ImageNet, Billion Word), reinforcement learning domains (Atari and Dota), and even generative model training (autoencoders on SVHN). We find that the noise scale increases as the loss decreases over a training run and depends on the model size primarily through improved model performance.

[Screenshots from arXiv:1812.06162: "Larger batch sizes" and "Simple noise scale" figures]

Because we accumulate the gradients across the whole LoRA network, we want to limit the number of batches (steps) we measure, but we can correlate the loss with the gradient noise scale to find the optimal batch size as described in the paper.
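The paper's "simple" noise scale is B_simple = tr(Σ) / |G|², estimated from gradient norms measured at two different batch sizes (Appendix A of the paper). A minimal sketch of what that estimator could look like, assuming PyTorch; the class and names here are illustrative, not necessarily this PR's actual implementation:

```python
import torch

def grad_norm_sq(params):
    """Squared L2 norm of the current gradients across all parameters."""
    total = 0.0
    for p in params:
        if p.grad is not None:
            total += p.grad.detach().float().pow(2).sum().item()
    return total

class GradientNoiseScale:
    """Running estimate of the simple noise scale B_simple = tr(Sigma) / |G|^2
    (McCandlish et al. 2018, Appendix A) from gradient norms measured at a
    small and a large batch size. Illustrative sketch, not the PR's code."""

    def __init__(self, beta=0.95):
        self.beta = beta     # EMA decay for smoothing the two estimates
        self.ema_g2 = None   # EMA of the |G|^2 estimate
        self.ema_s = None    # EMA of the tr(Sigma) estimate (noise variance)

    def update(self, g2_small, b_small, g2_big, b_big):
        # Unbiased estimates from two batch sizes (paper, Appendix A.1):
        #   |G|^2  ~ (B_big |G_big|^2 - B_small |G_small|^2) / (B_big - B_small)
        #   tr(S)  ~ (|G_small|^2 - |G_big|^2) / (1/B_small - 1/B_big)
        g2 = (b_big * g2_big - b_small * g2_small) / (b_big - b_small)
        s = (g2_small - g2_big) / (1.0 / b_small - 1.0 / b_big)
        self.ema_g2 = g2 if self.ema_g2 is None else self.beta * self.ema_g2 + (1 - self.beta) * g2
        self.ema_s = s if self.ema_s is None else self.beta * self.ema_s + (1 - self.beta) * s

    @property
    def noise_scale(self):
        # The paper recommends the ratio of EMAs rather than an EMA of
        # per-step ratios, which would be more biased.
        if self.ema_g2 is None or self.ema_g2 <= 0:
            return None
        return self.ema_s / self.ema_g2
```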

rockerBOO (Contributor, Author)

Added noise_variance and critical_batch_size, which may be a little misleading as names; refer to the paper for details. I'm accumulating the dynamic batch size as part of the critical batch size calculation (since bucketing can create batches of uneven size), so it should be more accurate than a flat batch size configuration value; see the sketch after the screenshot below.

[Screenshot: Weights & Biases run "faithful-night-290" (women-flux-kohya-lora) showing the logged metrics]
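A sketch of what the accumulation and logging could look like, using the GradientNoiseScale tracker sketched above; the names (`gns`, `logs`) are hypothetical and not necessarily what this PR uses:

```python
class CriticalBatchSizeLogger:
    """Illustrative logging hook: accumulates the actual per-step batch
    sizes (bucketing makes them uneven) instead of trusting a flat
    config value, and logs the noise-scale metrics alongside them."""

    def __init__(self):
        self.total_samples = 0
        self.num_steps = 0

    def log_step(self, batch_size, gns, logs):
        # Accumulate the dynamic batch size for this step.
        self.total_samples += batch_size
        self.num_steps += 1
        avg_batch = self.total_samples / self.num_steps

        b_noise = gns.noise_scale
        if b_noise is not None:
            logs["noise_variance"] = gns.ema_s
            logs["gradient_noise_scale"] = b_noise
            # The paper finds the largest useful ("critical") batch size
            # is approximately the noise scale; compare it against the
            # average dynamic batch size actually used.
            logs["critical_batch_size"] = b_noise
            logs["avg_batch_size"] = avg_batch
```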

I have longer tests, but since I added these metrics afterward I will need to run some more. The idea is that the noise variance should be flat when the batch size is close to ideal, and that the batch size should be changed when it isn't flat. The paper relates the gradient noise scale and the critical batch size, but their relevance is architecture- and dataset-specific, so how the two move relative to each other across runs may be more informative than any single value. The paper doesn't cover diffusion models since it's from 2018, but newer papers may have looked further into this dynamic.
