I found that step_size is too high in the initial 5 steps. The problem is in the code: if betas are set to (0.9, 0.999), the internal variables change as follows over the first few steps.
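A rough sketch of those steps, assuming the formulas from the widely used RAdam reference implementation (while N_sma stays below the threshold, the un-rectified branch uses step_size = 1 / (1 - beta1 ** step)):

```python
import math

beta1, beta2 = 0.9, 0.999
N_sma_max = 2.0 / (1.0 - beta2) - 1.0  # 1999 for beta2 = 0.999

for step in range(1, 6):
    beta2_t = beta2 ** step
    N_sma = N_sma_max - 2.0 * step * beta2_t / (1.0 - beta2_t)
    # While N_sma < 5 the update is plain momentum scaled only by the
    # bias correction -- it never looks at the gradient magnitude.
    step_size = 1.0 / (1.0 - beta1 ** step)
    print(f"step={step}  N_sma={N_sma:5.3f}  step_size={step_size:6.3f}")
```

Under these assumptions this prints step_size ≈ 10.00, 5.26, 3.69, 2.91 and 2.44 for steps 1–5; these factors multiply learning_rate directly, with no normalization by the second-moment estimate.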
Note that step_size does not depend on the gradient value and directly scales learning_rate. Thus RAdam aggressively moves the weights away from their initial values, even when they have a good initialization. Would it be better to set step_size to 0 if N_sma < self.N_sma_threshhold?
Hi @e-sha - thanks for pointing this out!
Offhand, yes, it looks like 0 would be the better choice, but I'll need to test and confirm.
Can you test it if you have time today? I'll try to test it later this evening as well, and will update the code if 0 turns out to be the best option.
I have some other work based on a couple of other optimizers that might be better than 0 for the first five steps, but I won't have time to test that until later (see RangerQH, for example).
Thanks!
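For reference, a minimal standalone sketch of what the change under discussion would do to the step-size schedule. The helper radam_step_size and its arguments are illustrative, not part of the repository; the formulas again follow the RAdam reference implementation:

```python
import math

def radam_step_size(step, beta1=0.9, beta2=0.999,
                    n_sma_threshold=5, zero_warmup=True):
    """Multiplier applied to lr at a given step (illustrative sketch)."""
    beta2_t = beta2 ** step
    n_sma_max = 2.0 / (1.0 - beta2) - 1.0
    n_sma = n_sma_max - 2.0 * step * beta2_t / (1.0 - beta2_t)

    if n_sma >= n_sma_threshold:
        # Rectified branch: the variance estimate is considered reliable.
        rect = math.sqrt(
            (1 - beta2_t) * (n_sma - 4) / (n_sma_max - 4)
            * (n_sma - 2) / n_sma * n_sma_max / (n_sma_max - 2)
        )
        return rect / (1 - beta1 ** step)
    # Warm-up branch: 0 under the proposed fix, otherwise the plain
    # bias-corrected momentum factor discussed above.
    return 0.0 if zero_warmup else 1.0 / (1 - beta1 ** step)

# Proposed fix vs. current behaviour over the first five steps.
print([round(radam_step_size(s), 3) for s in range(1, 6)])                     # [0.0, 0.0, 0.0, 0.0, 0.0]
print([round(radam_step_size(s, zero_warmup=False), 3) for s in range(1, 6)])  # [10.0, 5.263, 3.69, 2.908, 2.442]
```

With zero_warmup=True the first five updates are skipped entirely, which is effectively what setting step_size to 0 below N_sma_threshhold would do.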