Fix conflict between fp16 and deterministic sampling
Due to the removal of the grad hook, MultipleOptimizer no longer has a `step` method; it has been replaced with `externally_managed_step`, which takes information about which optimizers need to be stepped. This makes it incompatible with `torch.cuda.amp.GradScaler`. While fixing this issue, the MultipleOptimizer system was also refactored:

- MultipleOptimizer and the OpenNMT Optimizer wrapper switched places: MultipleOptimizer now wraps the other one, instead of the reverse.
- The OpenNMT Optimizer was renamed to SubOptimizer for clarity.
- SubOptimizer handles learning rate scheduling and grad clipping.
- MultipleOptimizer handles creation of multiple optimizers, grad scaling, restoring from checkpoints, backward, zero_grad, deciding which sub-optimizers to step, and reporting.
- Each optimizer now individually controls its learning rate schedule. When the curriculum introduces new components with freshly initialized parameters, warmup is now applied to the learning rate of those parameters. This should improve stability.
- As each optimizer has its own learning rate, it is not obvious what to log in the `report_training` one-liner, so the learning rate was removed from it. Instead, all optimizers log their own learning rates. This is currently log spam, but will be lowered to debug level in #70.

Giving each sub-optimizer its own GradScaler leads to multiple backward passes and a RuntimeError. There can only be one GradScaler, which must therefore be the responsibility of MultipleOptimizer (see the sketch below).

Closes: #71
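A minimal sketch of how this division of labor could look. The class names, `externally_managed_step`, `backward`, and `zero_grad` come from the commit message; the `active` parameter, the warmup/inverse-sqrt schedule, and all constructor arguments are illustrative assumptions, not the actual implementation. The single-scaler flow (scale, unscale_, step per optimizer, then one update) follows PyTorch's documented multi-optimizer AMP pattern.

```python
import torch
from torch.cuda.amp import GradScaler


class SubOptimizer:
    """Wraps one torch optimizer; owns its own LR schedule and grad clipping."""

    def __init__(self, optimizer, base_lr, warmup_steps=4000, max_grad_norm=1.0):
        self.optimizer = optimizer
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps  # assumed schedule parameter
        self.max_grad_norm = max_grad_norm
        self._step = 0

    def learning_rate(self):
        # Inverse-sqrt schedule with linear warmup (an assumption): a component
        # introduced later by the curriculum starts at step 0, so its
        # parameters warm up independently of the older components.
        step = max(self._step, 1)
        scale = min(step ** -0.5, step * self.warmup_steps ** -1.5)
        return self.base_lr * self.warmup_steps ** 0.5 * scale

    def step(self, scaler=None):
        self._step += 1
        lr = self.learning_rate()
        for group in self.optimizer.param_groups:
            group["lr"] = lr
        if scaler is not None:
            # Unscale before clipping so the norm is measured in true units.
            scaler.unscale_(self.optimizer)
        if self.max_grad_norm > 0:
            params = [p for g in self.optimizer.param_groups for p in g["params"]]
            torch.nn.utils.clip_grad_norm_(params, self.max_grad_norm)
        if scaler is not None:
            scaler.step(self.optimizer)
        else:
            self.optimizer.step()


class MultipleOptimizer:
    """Owns the single GradScaler; decides which sub-optimizers to step."""

    def __init__(self, suboptimizers, fp16=False):
        self.suboptimizers = suboptimizers  # dict: name -> SubOptimizer
        self.scaler = GradScaler() if fp16 else None

    def backward(self, loss):
        # One scaled backward pass per training step, shared by all
        # sub-optimizers; this avoids the multiple-backward RuntimeError.
        if self.scaler is not None:
            self.scaler.scale(loss).backward()
        else:
            loss.backward()

    def externally_managed_step(self, active):
        # `active` (hypothetical) names the sub-optimizers whose components
        # received gradients in this step, e.g. as chosen by the curriculum.
        for name in active:
            self.suboptimizers[name].step(self.scaler)
        if self.scaler is not None:
            # Exactly one scaler update per step, however many optimizers ran.
            self.scaler.update()

    def zero_grad(self):
        for sub in self.suboptimizers.values():
            sub.optimizer.zero_grad()
```

The key design point this sketch illustrates: because `GradScaler` allows only one `scale(loss).backward()` and one `update()` per iteration, the scaler must live in the single MultipleOptimizer rather than in each SubOptimizer, while per-optimizer concerns (LR schedule, clipping) stay in SubOptimizer.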
Showing 7 changed files with 231 additions and 230 deletions.