Implement Lion, up to 5x faster than Adam, and more accurate #156
Comments
Note that subsequent research has shown at best marginal improvements over Adam(W) under more rigorous experimental design. Nevertheless, this should be a straightforward addition if anyone is interested in getting their feet wet with a PR.
Isn't it already done in #129?
You're right, I completely forgot about that. Thanks, Peter!
It seems Lion is not documented (nor implemented) in Flux.jl, nor here: https://fluxml.ai/Flux.jl/stable/training/optimisers/. I recall looking for it in the code and not finding it, then searching for Adam and finding "AdamW, RAdam", so I thought I was in the right place ("if not all are listed there, then more optimizers, such as Lion, are implemented in .."). Did optimizers originally belong in Flux.jl and then move out to a new package? Or are they re-exported in Flux.jl for compatibility (I can understand that)? In general, do you think you have the best optimizers implemented (somewhere)? [I know where activation functions are; it seems squareplus is not implemented (which seems like a good softplus alternative). I could add it, or add it to my NNlib.jl issue. I also think FlashAttention is missing, and its improved version 2.]
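For reference, squareplus is a simple algebraic alternative to softplus. A minimal NumPy sketch of the function as defined in Barron's paper (the name and the default b = 4 follow the paper; this is not an existing NNlib API):

```python
import numpy as np

def squareplus(x, b=4.0):
    """squareplus(x) = (x + sqrt(x^2 + b)) / 2.

    Smooth and monotone; approaches relu(x) as b -> 0, and unlike
    softplus needs no exp/log, only a square root."""
    return 0.5 * (x + np.sqrt(x * x + b))
```

With the default b = 4, squareplus(0) = 1, matching softplus's ln(2)-like offset in spirit while staying cheap to evaluate.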
Lion is implemented here (Optimisers.jl). I believe the optimiser/Optimise.jl module in Flux.jl is somewhat outdated and should be ignored.
At present this is a little complicated. Flux still exports its own optimisers (optimiser/Optimise.jl), but it has methods to auto-translate them to their Optimisers.jl equivalents. The hope is to delete all of that soon -- perhaps FluxML/Flux.jl#1986 is the relevant issue. Having Flux re-export any newly added rules (for which it has no old equivalents, like Lion) would be fine; they could be temporarily included in the docs. Or, perhaps simpler, a note to look at Optimisers.jl for more could be added somewhere.
There is indeed such a note in https://fluxml.ai/Flux.jl/stable/training/optimisers/. We'd want to make the preceding paragraph more strongly worded, however, as I think the replacement is basically done and no longer "gradual". One thing I did notice is that Lion is not currently included in the Optimisers.jl docs build. That should be a simple enough fix.
Motivation and description
https://arxiv.org/abs/2302.06675
It's 11 lines of pseudo-code (shorter than AdamW).
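The paper's pseudo-code indeed maps to only a few lines. Here is a hedged NumPy sketch of the update rule (the function name, signature, and defaults are illustrative, not the eventual Optimisers.jl interface):

```python
import numpy as np

def lion_step(theta, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update, following the paper's pseudo-code.

    The step direction is the *sign* of a beta1-interpolation between the
    momentum and the current gradient; the momentum itself is an EMA of
    gradients tracked with beta2. Weight decay is decoupled, as in AdamW."""
    # Interpolation used only to pick the sign of this step
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    # Parameter update with decoupled weight decay
    theta = theta - lr * (update + wd * theta)
    # Momentum update (note: uses beta2, not beta1)
    m = beta2 * m + (1 - beta2) * grad
    return theta, m
```

Because the step is a sign vector, every coordinate moves by exactly lr (plus weight decay), which is where the memory and compute savings relative to Adam come from: only one EMA buffer per parameter, no second-moment state.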
Possible Implementation
No response