How does the cosine LR scheduler work when the Prodigy optimizer is used? #1787

Open
sipie800 opened this issue Nov 17, 2024 · 1 comment

Comments

@sipie800

Prodigy (or D-Adaptation) calculates the LR itself. I understand that the learning rate argument is actually a ratio applied to that calculated LR. Does it still work that way if I use cosine_with_restarts? Will the ratio change the same way the LR does when AdamW is used?

@rockerBOO
Contributor

The scheduler changes the LR value directly, and that value is one of the inputs to the optimizer's adaptive calculations.

So with an LR of 1.0, the cosine schedule takes it from 1.0 down to 0.0. You can use TensorBoard or wandb to chart the LR. In Prodigy the LR works like a multiplier on the step size the optimizer estimates.

Cosine with restarts works as you expect.
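
To make the "multiplier" point concrete, here is a minimal sketch in plain Python. It assumes the hard-restart cosine shape that common scheduler implementations use, and it stands in for Prodigy's internal step-size estimate with a fixed, hypothetical value `d_estimate` (in a real run that estimate changes during training and is itself influenced by the LR). The function name `cosine_with_restarts_factor` and all constants are illustrative and not taken from this repository.

```python
import math


def cosine_with_restarts_factor(step, total_steps, num_cycles=1, warmup_steps=0):
    """Return the schedule multiplier in [0, 1] for a given step.

    Assumed shape: linear warmup, then a cosine decay from 1 to 0 that
    jumps back to 1 at the start of each cycle (hard restarts).
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    cycle_position = (num_cycles * progress) % 1.0
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_position)))


base_lr = 1.0       # the LR argument; with Prodigy this acts as a ratio
d_estimate = 1e-4   # hypothetical placeholder for Prodigy's own estimate

total_steps, num_cycles = 1000, 2
for step in (0, 250, 499, 500, 750, 999):
    factor = cosine_with_restarts_factor(step, total_steps, num_cycles)
    scheduled_lr = base_lr * factor           # what the LR chart would show
    effective_step = d_estimate * scheduled_lr  # rough size of the actual step
    print(f"step={step:4d} scheduled_lr={scheduled_lr:.4f} effective={effective_step:.2e}")
```

Under these assumptions the scheduled value follows the same curve it would with AdamW. The difference is only in what it scales: with AdamW it is the step size itself, while with Prodigy it scales the step size the optimizer has estimated.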
