Add cfg option --scheduler.warmup_min_lr
#542
Conversation
For setting the starting LR during the warmup period.
Thanks for doing this!
lgtm
LGTM! Just minor comments
olmo/optim.py (Outdated)

@@ -480,7 +481,11 @@ def get_max_grad_norm_ratio(
         return self._get_max_grad_norm_coeff(initial_max_grad_norm_ratio, step, max_steps)

     def _linear_warmup(self, initial_lr: float, step: int, warmup_steps: int = 2000) -> float:
-        return initial_lr * (0.1 + 0.9 * min(step, warmup_steps) / warmup_steps)
+        if self.warmup_min_lr is not None:
+            assert initial_lr > self.warmup_min_lr
nit: You could also add an assert self.warmup_min_lr >= 0 for good measure.
Done: 75eba56
olmo/optim.py (Outdated)

+        if self.warmup_min_lr is not None:
+            assert initial_lr > self.warmup_min_lr
+            return self.warmup_min_lr + (initial_lr - self.warmup_min_lr) * min(step, warmup_steps) / warmup_steps
+        else:
From a cleaner code perspective, maybe it would be better to have just one formula and use self.warmup_min_lr to set the starting LR. Something like:

    warmup_min_lr = self.warmup_min_lr if self.warmup_min_lr is not None else 0.1 * initial_lr
    ...
    return warmup_min_lr + (initial_lr - warmup_min_lr) * min(step, warmup_steps) / warmup_steps
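The unified formula suggested above can be sketched as a standalone function. This is illustrative only, not the exact code merged in the PR: the real version is a method on the scheduler class in olmo/optim.py, while here warmup_min_lr is passed as a parameter.

```python
from typing import Optional


def linear_warmup(
    initial_lr: float,
    step: int,
    warmup_steps: int = 2000,
    warmup_min_lr: Optional[float] = None,
) -> float:
    # Default to the pre-PR behavior: start warmup at 10% of the target LR.
    start_lr = warmup_min_lr if warmup_min_lr is not None else 0.1 * initial_lr
    assert 0 <= start_lr < initial_lr
    # Interpolate linearly from start_lr up to initial_lr over warmup_steps,
    # then hold at initial_lr once warmup is complete.
    return start_lr + (initial_lr - start_lr) * min(step, warmup_steps) / warmup_steps
```

With warmup_min_lr left unset, step 0 gives 0.1 * initial_lr and the LR reaches initial_lr at warmup_steps, matching the old 0.1 + 0.9 * fraction form; setting warmup_min_lr=0.0 starts warmup from zero.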
Done: 75eba56
For setting the starting LR during the warmup period. When not set, this defaults to the current behavior of starting at 10% of the target LR.