
Add cfg option --scheduler.warmup_min_lr #542

Merged: epwalsh merged 4 commits into main from epwalsh/warmup-min-lr on Apr 12, 2024
Conversation

epwalsh (Member) commented on Apr 11, 2024

For setting the starting LR during the warmup period. When not set, this defaults to the current behavior of starting at 10% of the target LR.
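(For illustration, not from the thread: the option is set like any other dotted config override, as the flag name in the PR title suggests. The script path and base config below are hypothetical; only the `--scheduler.warmup_min_lr` flag comes from this PR.)

```
python scripts/train.py my_config.yaml --scheduler.warmup_min_lr=0.0
```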

epwalsh requested review from 2015aroras and dwadden on April 11, 2024 21:08
dwadden (Contributor) commented on Apr 11, 2024

Thanks for doing this!

dwadden (Contributor) left a comment:


lgtm

2015aroras (Collaborator) left a comment:


LGTM! Just minor comments

olmo/optim.py (Outdated)

```diff
@@ -480,7 +481,11 @@ def get_max_grad_norm_ratio(
         return self._get_max_grad_norm_coeff(initial_max_grad_norm_ratio, step, max_steps)

     def _linear_warmup(self, initial_lr: float, step: int, warmup_steps: int = 2000) -> float:
-        return initial_lr * (0.1 + 0.9 * min(step, warmup_steps) / warmup_steps)
+        if self.warmup_min_lr is not None:
+            assert initial_lr > self.warmup_min_lr
```
2015aroras (Collaborator):

nit: You could also add an `assert self.warmup_min_lr >= 0` for good measure.

epwalsh (Member, Author):

Done: 75eba56

olmo/optim.py (Outdated)

```diff
+        if self.warmup_min_lr is not None:
+            assert initial_lr > self.warmup_min_lr
+            return self.warmup_min_lr + (initial_lr - self.warmup_min_lr) * min(step, warmup_steps) / warmup_steps
+        else:
```
2015aroras (Collaborator):

From a cleaner-code perspective, maybe it would be better to have just one formula and use `self.warmup_min_lr` to set the starting LR. Something like:

```python
warmup_min_lr = self.warmup_min_lr if self.warmup_min_lr is not None else 0.1 * initial_lr
...
return warmup_min_lr + (initial_lr - warmup_min_lr) * min(step, warmup_steps) / warmup_steps
```
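(Aside, not part of the thread: the unified formula reproduces the old default when `warmup_min_lr` is unset. A minimal standalone sketch, with the method pulled out of its class for illustration:)

```python
import math
from typing import Optional

def linear_warmup(initial_lr: float, step: int, warmup_steps: int = 2000,
                  warmup_min_lr: Optional[float] = None) -> float:
    # Fall back to the old behavior: start warmup at 10% of the target LR.
    min_lr = warmup_min_lr if warmup_min_lr is not None else 0.1 * initial_lr
    return min_lr + (initial_lr - min_lr) * min(step, warmup_steps) / warmup_steps

assert math.isclose(linear_warmup(1e-3, 0), 1e-4)        # old default at step 0
assert math.isclose(linear_warmup(1e-3, 2000), 1e-3)     # hits target LR at end of warmup
assert linear_warmup(1e-3, 0, warmup_min_lr=0.0) == 0.0  # new option: start from 0
```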

epwalsh (Member, Author):

Done: 75eba56

epwalsh merged commit d2afcaa into main on Apr 12, 2024 (9 of 11 checks passed).
epwalsh deleted the epwalsh/warmup-min-lr branch on April 12, 2024 16:51.