I found that step_size is too high in the initial 5 steps. The problem is in the code: if betas are set to (0.9, 0.999), the internal variables change as follows over the first few steps.
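A rough sketch of those steps, assuming the formulas from the widely used RAdam reference implementation (while N_sma stays below the threshold, the un-rectified branch uses step_size = 1 / (1 - beta1 ** step)):

```python
import math

beta1, beta2 = 0.9, 0.999
N_sma_max = 2.0 / (1.0 - beta2) - 1.0  # 1999 for beta2 = 0.999

for step in range(1, 6):
    beta2_t = beta2 ** step
    N_sma = N_sma_max - 2.0 * step * beta2_t / (1.0 - beta2_t)
    # While N_sma < 5 the update is plain momentum scaled only by the
    # bias correction -- it never looks at the gradient magnitude.
    step_size = 1.0 / (1.0 - beta1 ** step)
    print(f"step={step}  N_sma={N_sma:5.3f}  step_size={step_size:6.3f}")
```

Under these assumptions this prints step_size ≈ 10.00, 5.26, 3.69, 2.91 and 2.44 for steps 1–5; these factors multiply learning_rate directly, with no normalization by the second-moment estimate.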
Note that step_size does not depend on the gradient value and directly scales learning_rate. Thus RAdam aggressively moves the weights away from their initial values, even when they have a good initialization. Would it be better to set step_size to 0 if N_sma < self.N_sma_threshhold?
Hi @e-sha - thanks for pointing this out!
Offhand, yes, it looks like 0 would be the better choice, but I'll need to test and confirm.
Can you test it if you have time today? I'll try to test it later this evening as well, and will update the code if 0 turns out to be the best option.
I have some other work based on a couple of other optimizers that might be better than 0 for the first five steps, but I won't have time to test that until later (see RangerQH, for example).
Thanks!
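For reference, a minimal standalone sketch of what the change under discussion would do to the step-size schedule. The helper radam_step_size and its arguments are illustrative, not part of the repository; the formulas again follow the RAdam reference implementation:

```python
import math

def radam_step_size(step, beta1=0.9, beta2=0.999,
                    n_sma_threshold=5, zero_warmup=True):
    """Multiplier applied to lr at a given step (illustrative sketch)."""
    beta2_t = beta2 ** step
    n_sma_max = 2.0 / (1.0 - beta2) - 1.0
    n_sma = n_sma_max - 2.0 * step * beta2_t / (1.0 - beta2_t)

    if n_sma >= n_sma_threshold:
        # Rectified branch: the variance estimate is considered reliable.
        rect = math.sqrt(
            (1 - beta2_t) * (n_sma - 4) / (n_sma_max - 4)
            * (n_sma - 2) / n_sma * n_sma_max / (n_sma_max - 2)
        )
        return rect / (1 - beta1 ** step)
    # Warm-up branch: 0 under the proposed fix, otherwise the plain
    # bias-corrected momentum factor discussed above.
    return 0.0 if zero_warmup else 1.0 / (1 - beta1 ** step)

# Proposed fix vs. current behaviour over the first five steps.
print([round(radam_step_size(s), 3) for s in range(1, 6)])                     # [0.0, 0.0, 0.0, 0.0, 0.0]
print([round(radam_step_size(s, zero_warmup=False), 3) for s in range(1, 6)])  # [10.0, 5.263, 3.69, 2.908, 2.442]
```

With zero_warmup=True the first five updates are skipped entirely, which is effectively what setting step_size to 0 below N_sma_threshhold would do.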