Is adabelief the best optimizer? #44
Comments
"This work considers the update step in first-order methods. Other directions include Lookahead [42] which updates “fast” and “slow” weights separately, and is a wrapper that can combine with other optimizers; variance reduction methods [43, 44, 45] which reduce the variance in gradient; and LARS [46] which uses a layer-wise learning rate scaling. AdaBelief can be combined with these methods. Other variants of Adam have been proposed (e.g. NosAdam [47], Sadam [48] and Adax [49])." |
I tested AdaBelief on my task; it performs worse than Ranger.
@hiyyg Could you post your task, network, and the hyper-parameters of the two optimizers for your task?
It was an internal task, so unfortunately I cannot share it. The hyper-parameters were the defaults for both optimizers.
@hiyyg Which version of AdaBelief did you use? I'm not sure whether this is caused by eps. Quickly skimming over the Ranger code, its default uses eps=1e-5, which is equivalent to eps=1e-10 for AdaBelief. The most recent (0.2) default for AdaBelief is eps=1e-16, equivalent to eps=1e-8 for Adam. The choice of eps is crucial for adaptive optimizers, so this could be the reason for the performance difference.
Thanks. I believe I used the version from around 28 Dec 2020. I think this information could be very useful for users who want to compare AdaBelief with Ranger.
Thanks for the info. 28 Dec 2020 corresponds roughly to v0.1, and the default for AdaBelief is eps=1e-16.
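To make the eps mapping above concrete, here is a minimal sketch of instantiating the two optimizers with roughly matched eps values. It assumes the adabelief-pytorch package and a toy model; constructor defaults may differ across package versions, so treat this as an illustration rather than the exact configuration used in the reported experiment.

```python
import torch
from torch import nn
from adabelief_pytorch import AdaBelief  # pip install adabelief-pytorch

model = nn.Linear(10, 1)  # toy model, just for illustration

# In Adam/Ranger, eps sits outside the square root of the second moment,
# while in AdaBelief it is added to s_t inside the square root. So an
# AdaBelief eps of e behaves roughly like an Adam/Ranger eps of sqrt(e).
adam = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-8)

# eps=1e-16 here is roughly comparable to eps=1e-8 in Adam,
# since sqrt(1e-16) = 1e-8 (per the discussion above).
adabelief = AdaBelief(model.parameters(), lr=1e-3, eps=1e-16)
```

By the same reasoning, comparing against Ranger's default eps=1e-5 would suggest setting AdaBelief's eps to about 1e-10 for a like-for-like comparison.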
https://paperswithcode.com/paper/adabelief-optimizer-adapting-stepsizes-by-the