
Memory savings #6

Closed
MoeinSorkhei opened this issue Dec 2, 2024 · 3 comments

MoeinSorkhei commented Dec 2, 2024

Hi, great work.

I noticed from another issue that Flora is applied to Adafactor and is compared against other methods applied to Adam (in the figure below).
[figure: memory usage comparison of Flora against other methods]

1. Wouldn't it be more reasonable to compare all methods applied to the same optimizer style, such as Adam? Currently it is hard to separate Flora's contribution to the memory savings from that of Adafactor alone.

2. Do you have any comments regarding the training speed of models using Flora? How does the training time with Flora compare against that of the standard Adafactor optimizer?

Best,

yongchanghao (Collaborator) commented

Thank you for asking!

1. In the paper, all of our main baselines use Adafactor as the default optimizer, so they are already memory-efficient. However, since many people use Adam more often in practice, the figure above uses Adam to demonstrate the overall effect.

2. The speed is a little slower because we need to generate and apply random projections in our implementation, but the overhead is usually negligible, and I would expect that writing dedicated CUDA kernels could largely accelerate Flora.
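
For a concrete picture of that generate-and-apply step, here is a minimal sketch (assuming PyTorch and a seeded Gaussian projection; the helper `make_projection` and all shapes are illustrative, not the repository's actual implementation):

```python
import torch

def make_projection(seed: int, n: int, r: int) -> torch.Tensor:
    # Hypothetical helper: regenerate the same Gaussian matrix from a seed
    # each step, so the projection itself never persists in memory.
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(n, r, generator=gen) / r ** 0.5

grad = torch.randn(1024, 512)                 # a full gradient (illustrative shape)
proj = make_projection(seed=0, n=1024, r=64)  # rebuilt on the fly every step

compressed = proj.T @ grad    # down-project: keep a (64, 512) state, not (1024, 512)
restored = proj @ compressed  # up-project when the accumulated update is applied
```

The `randn` call and the two extra matmuls are the per-step overhead mentioned above; because the matrix is re-derived from its seed, the compression adds no persistent memory beyond the down-projected state.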

github-actions (bot) commented

Stale due to inactivity. Closing in 3 days if there is no further activity.

github-actions (bot) commented

Closed due to inactivity.

github-actions bot closed this as not planned on Dec 14, 2024.