Custom Scheduler
This page covers the custom scheduler feature found on the training tab. This option lets you use a scheduler that is not currently built into OneTrainer but exists elsewhere, for example in PyTorch. The feature was primarily added to enable use of the OneCycle scheduler.
To use the custom scheduler, select it on the training tab.
To define it, click the three dots (...), which opens the custom scheduler window.
This window presents a blank template that you need to fill in with the details of your custom scheduler of choice. Most fields have tooltips if you hold your mouse over them.
- Class Name (Default: Blank) - This is where you define the module and class name of the scheduler you want to use. In this example it is torch.optim.lr_scheduler.MultiStepLR
- add parameter - This button adds two blank fields: the parameter name and the value that your custom scheduler expects. There are some OneTrainer variables that you can pass forward, which the tooltip shows.
In the above example, two parameters are passed to the custom scheduler MultiStepLR: milestones with values of 50, 100, 150, and gamma with a value of 0.5. With these settings, the scheduler reduces the learning rate by a factor of 0.5 at steps 50, 100, and 150 (after warmup is complete, if you use it). For a pivotal tuning run, this would produce the following learning rates:
Step | LoRA UNet LR | Embedding LR |
---|---|---|
0 | 1e-4 | 1e-3 |
50 | 5e-5 | 5e-4 |
100 | 2.5e-5 | 2.5e-4 |
150+ | 1.25e-5 | 1.25e-4 |
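As a rough illustration of what this configuration maps to in plain PyTorch (the placeholder parameter and the 1e-4 base LR are assumptions for this sketch, not OneTrainer's actual wiring):

```python
import torch

# Placeholder parameter standing in for the LoRA UNet weights (base LR 1e-4).
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=1e-4)

# Same class name and parameters as entered in the custom scheduler window.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50, 100, 150], gamma=0.5
)

for step in range(200):
    optimizer.step()
    scheduler.step()
    if step + 1 in (50, 100, 150):
        # LR after crossing each milestone: 5e-05, 2.5e-05, 1.25e-05
        print(step + 1, scheduler.get_last_lr())
```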
PyTorch includes schedulers that are not included in OneTrainer by default. They can be found here: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
The documentation for each scheduler lists its class name and the parameters it expects.
Some Examples:
- MultiStepLR: https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.MultiStepLR.html#torch.optim.lr_scheduler.MultiStepLR
- OneCycle: https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.OneCycleLR.html#torch.optim.lr_scheduler.OneCycleLR
CosineAnnealingLR: a cycle (meaning it will repeat) where you can control the length and minimum LR of the cosine. A normal cosine scheduler ends at a learning rate near 0, whereas this custom cosine can end at a learning rate value that you control. This is a cold cycle, in that the learning rate gradually increases at the start of the next cycle (in contrast to a warm restart).
- Class Name: torch.optim.lr_scheduler.CosineAnnealingLR
- Parameter: T_max - This sets the number of steps in the cycle. If this value is not equal to the OneTrainer step count, the cosine will start to increase again.
- Parameter: eta_min - This sets the minimum learning rate of the cosine. For example, using a value of 5e-5 would have the cosine bottom out at this value instead of 0.
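As a rough sketch of what these two fields map to in plain PyTorch (the 1000-step length, 1e-4 base LR, and placeholder parameter are assumptions for illustration):

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder parameter
optimizer = torch.optim.AdamW(params, lr=1e-4)  # base LR set in OneTrainer

# T_max should match the training step count; eta_min is the floor the cosine decays to.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=1000, eta_min=5e-5
)

for _ in range(1000):
    optimizer.step()
    scheduler.step()

print(scheduler.get_last_lr())  # bottoms out at eta_min (5e-05) instead of 0
```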
OneCycleLR: a cycle (meaning it will repeat) with a slow warmup to a peak, followed by a ramp down towards a final value. OneCycle has many parameters that can be adjusted, but this section focuses on the basic ones: setting the length of the cycle and the learning rate values.
- Class Name: torch.optim.lr_scheduler.OneCycleLR
- Parameter: max_lr - This sets the peak of OneCycle. For example, a value of 3e-4 sets the peak LR to this value, which is reached at 30% of your steps by default. (You need to feed this to OneCycle; it will not use the LR from OneTrainer.)
- Parameter: div_factor - This sets the initial learning rate of the cycle using the formula initial_lr = max_lr / div_factor. With the above example, using a div_factor of 25 would set the initial LR to 1.2e-5.
- Parameter: final_div_factor - This sets the final learning rate of the cycle using the formula min_lr = initial_lr / final_div_factor. With the above examples, using a final_div_factor of 0.2 (2e-1) would set min_lr to 6e-5.
- Parameter: total_steps - This sets the number of steps in the cycle. If this value does not equal the OneTrainer step count, the cycle will repeat or end early.
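A minimal sketch of the same parameters in plain PyTorch (the 1000-step length and the placeholder optimizer are assumptions for illustration):

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder parameter
optimizer = torch.optim.AdamW(params, lr=1e-4)  # this LR is ignored by OneCycle

scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=3e-4,           # peak LR, reached at 30% of total_steps by default
    total_steps=1000,      # should match the OneTrainer step count
    div_factor=25,         # initial_lr = 3e-4 / 25  = 1.2e-5
    final_div_factor=0.2,  # min_lr = 1.2e-5 / 0.2 = 6e-5
)

print(scheduler.get_last_lr())  # starts at initial_lr (~1.2e-05)
for _ in range(1000):
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr())  # ends at roughly min_lr (~6e-05)
```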
Medium has various articles on these schedulers, including OneCycle, but some are behind a paywall.
This page goes into OneCycle in depth: https://www.deepspeed.ai/tutorials/one-cycle/
This image shows what each scheduler in PyTorch looks like graphically: