
Too huge step_size at initialization stage #15

Open
e-sha opened this issue Oct 3, 2019 · 1 comment

e-sha commented Oct 3, 2019

I found that step_size is too high during the first 5 steps.
The problem is in this code:

if N_sma >= self.N_sma_threshhold:
    # variance of the adaptive term is tractable: apply the RAdam rectification
    step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4) * (N_sma - 2) / N_sma * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** state['step'])
else:
    # warm-up phase: only bias-correct the momentum term
    step_size = 1.0 / (1 - beta1 ** state['step'])

If betas are set to (0.9, 0.999), the internal variables change as follows:

state['step'] | step_size
--------------|-----------
            1 | 10
            2 | 5.26315789
            3 | 3.6900369
            4 | 2.90782204
            5 | 2.44194281
            6 | 0.00426327
            7 | 0.00524248
            8 | 0.00607304
            9 | 0.00681674
           10 | 0.00750596
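
For reference, here is a minimal standalone sketch (not the project's code) that reproduces these numbers, assuming the default N_sma_threshhold of 5 and that N_sma_max and N_sma are computed as in the reference RAdam implementation:

import math

beta1, beta2 = 0.9, 0.999
N_sma_threshhold = 5                # assumed default threshold
N_sma_max = 2 / (1 - beta2) - 1     # 1999 for beta2 = 0.999

for step in range(1, 11):
    beta2_t = beta2 ** step
    N_sma = N_sma_max - 2 * step * beta2_t / (1 - beta2_t)
    if N_sma >= N_sma_threshhold:
        # rectified step size
        step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4)
                              * (N_sma - 2) / N_sma * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** step)
    else:
        # plain bias correction only
        step_size = 1.0 / (1 - beta1 ** step)
    print(step, step_size)

The jump from 2.44 down to 0.004 between steps 5 and 6 is where N_sma first crosses the threshold.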

Note that step_size does not depend on the gradient value, and it directly scales learning_rate.
Thus RAdam aggressively moves the weights away from their initial values, even if they have a good initialization.

Would it be better to set step_size to 0 when N_sma < self.N_sma_threshhold?
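
For concreteness, a minimal sketch (not a tested patch) of what that proposal would look like; only the else branch changes:

if N_sma >= self.N_sma_threshhold:
    step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4) * (N_sma - 2) / N_sma * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** state['step'])
else:
    # proposed: effectively skip the update while the variance is not yet tractable
    step_size = 0.0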

@lessw2020 (Owner) commented:

Hi @e-sha - thanks for pointing this out!
Offhand, yes, it looks like 0 would be a better choice, but I will need to test it and see.
Can you test it if you have time today? I will try to test it later this evening and can then update the code if that turns out to be the best option (which it appears to be).
I have some other work from a couple of other optimizers that might be better than 0 for the first five steps, but I won't have time to test that until later (see RangerQH, for example).
Thanks!
