Implement Lion, up to 5x faster than Adam, and more accurate #156
Comments
Note that subsequent research has shown at best marginal improvements over Adam(W) under more rigorous experimental design. Nevertheless, this should be a straightforward addition if anyone is interested in getting their feet wet with a PR.
Isn't it already done in #129?
You're right, I completely forgot about that. Thanks, Peter!
It seems Lion is not documented (nor implemented) in Flux.jl, nor here: https://fluxml.ai/Flux.jl/stable/training/optimisers/. I recall looking for it in the code and not finding it, then searching for Adam and finding "AdamW, RAdam", so I thought I was in the right place ("if not all are listed there, then more optimizers, such as Lion, are implemented in .."). Did optimizers originally belong in Flux.jl and then move out to a new package? Or are they re-exported in Flux.jl for compatibility (I can understand that)? In general, do you think you have the best optimizers implemented (somewhere)? [I know where activation functions are; it seems squareplus is not implemented (which seems like a good softplus alternative). I could add it, or add it to my NNlib.jl issue. I also think FlashAttention is missing, and its improved version 2.]
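For reference, squareplus is a simple algebraic alternative to softplus. A minimal NumPy sketch of the function as defined in Barron's paper (the name and the default b = 4 follow the paper; this is not an existing NNlib API):

```python
import numpy as np

def squareplus(x, b=4.0):
    """squareplus(x) = (x + sqrt(x^2 + b)) / 2.

    Smooth and monotone; approaches relu(x) as b -> 0, and unlike
    softplus needs no exp/log, only a square root."""
    return 0.5 * (x + np.sqrt(x * x + b))
```

With the default b = 4, squareplus(0) = 1, matching softplus's ln(2)-like offset in spirit while staying cheap to evaluate.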
Lion is implemented here (Optimisers.jl). I believe the optimiser/Optimise.jl module in Flux.jl is somewhat outdated and should be ignored.
At present this is a little complicated. Flux still exports its own optimisers (optimiser/Optimise.jl), but it has methods to auto-translate them to their Optimisers.jl equivalents. The hope is to delete all of that soon -- perhaps FluxML/Flux.jl#1986 is the relevant issue. Having Flux re-export any newly added rules (for which it has no old equivalents, like Lion) would be fine; they could be temporarily included in the docs. Or, perhaps simpler, a note to look at Optimisers.jl for more could be added somewhere.
There is indeed such a note in https://fluxml.ai/Flux.jl/stable/training/optimisers/. We'd want to make the preceding paragraph more strongly worded, however, as I think the replacement is basically done and no longer "gradual". One thing I did notice is that Lion is not currently included in the Optimisers.jl docs build. That should be a simple enough fix.
Motivation and description
https://arxiv.org/abs/2302.06675
It's 11 lines of pseudo-code (shorter than AdamW).
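The paper's pseudo-code indeed maps to only a few lines. Here is a hedged NumPy sketch of the update rule (the function name, signature, and defaults are illustrative, not the eventual Optimisers.jl interface):

```python
import numpy as np

def lion_step(theta, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update, following the paper's pseudo-code.

    The step direction is the *sign* of a beta1-interpolation between the
    momentum and the current gradient; the momentum itself is an EMA of
    gradients tracked with beta2. Weight decay is decoupled, as in AdamW."""
    # Interpolation used only to pick the sign of this step
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    # Parameter update with decoupled weight decay
    theta = theta - lr * (update + wd * theta)
    # Momentum update (note: uses beta2, not beta1)
    m = beta2 * m + (1 - beta2) * grad
    return theta, m
```

Because the step is a sign vector, every coordinate moves by exactly lr (plus weight decay), which is where the memory and compute savings relative to Adam come from: only one EMA buffer per parameter, no second-moment state.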
Possible Implementation
No response