Set proper logging levels #70
Comments
The underlying issue is that we are not setting the logging level correctly when creating the logger: it is always set to show warning and above. Because of this, all our logging happens at the warning or error level, even for messages that should be debug. Before the levels of individual logged messages can be adjusted, this underlying issue needs to be fixed.
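A minimal sketch of the kind of fix this implies (the function name, logger name, and format string are illustrative, not the project's actual API): the logger should be created with a configurable level rather than inheriting the implicit WARNING default.

```python
import logging
import sys

def init_logger(name="train", log_level=logging.INFO, log_file=None):
    """Create a logger with a configurable level instead of the implicit
    WARNING default, so info/debug messages are not silently dropped."""
    logger = logging.getLogger(name)
    logger.setLevel(log_level)

    formatter = logging.Formatter("[%(asctime)s %(levelname)s] %(message)s")
    console = logging.StreamHandler(sys.stdout)
    console.setFormatter(formatter)
    logger.addHandler(console)

    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger

logger = init_logger(log_level=logging.DEBUG)
logger.debug("now visible, because the logger level is no longer stuck at WARNING")
```

With the level set correctly, each message can then use its natural severity instead of everything being forced through warning.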
Due to the removal of the grad hook, MultipleOptimizer no longer has a step method; it has been replaced with externally_managed_step, which takes information about which optimizers need to be stepped. This means that it is no longer compatible with torch.cuda.amp.GradScaler. While fixing this issue, the MultipleOptimizer system was also refactored (a rough sketch follows below):

- MultipleOptimizer and the OpenNMT Optimizer wrapper switched places: MultipleOptimizer now wraps the other one, instead of the reverse.
- The OpenNMT Optimizer was renamed to SubOptimizer for clarity.
- SubOptimizer handles learning rate scheduling and grad clipping.
- MultipleOptimizer handles creation of the multiple optimizers, grad scaling, restoring from a checkpoint, backward, zero_grad, deciding which sub-optimizers to step, and reporting.
- Each optimizer now individually controls its learning rate schedule. When the curriculum introduces new components with freshly initialized parameters, warmup is now applied to the LR of those parameters. This should improve stability.
- As each optimizer has its own learning rate, it is not obvious what to log in the report_training one-liner, so the learning rate was removed from it. Instead, each optimizer logs its own learning rate. This is currently log spam, but will be lowered to debug in #70.

Each sub-optimizer having its own GradScaler leads to multiple backward passes and a RuntimeError. There can only be one GradScaler, which must therefore be the responsibility of MultipleOptimizer.

Closes: #71
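Loosely sketched below (class and method bodies are paraphrased from this description, not taken from the actual code) is the ownership split outlined above: a single GradScaler held by MultipleOptimizer, with each SubOptimizer keeping its own LR schedule and grad clipping, and externally_managed_step stepping only the named sub-optimizers.

```python
from typing import Dict, Iterable, Optional
import torch

class SubOptimizer:
    # Wraps a single torch optimizer; owns its own LR schedule and grad clipping.
    def __init__(self, optimizer, lr_scheduler=None, max_grad_norm: Optional[float] = None):
        self.optimizer = optimizer
        self.lr_scheduler = lr_scheduler
        self.max_grad_norm = max_grad_norm

    def clip_grads(self):
        if self.max_grad_norm:
            params = [p for group in self.optimizer.param_groups for p in group["params"]]
            torch.nn.utils.clip_grad_norm_(params, self.max_grad_norm)

class MultipleOptimizer:
    # Owns the single GradScaler and decides which sub-optimizers to step.
    def __init__(self, suboptimizers: Dict[str, SubOptimizer], use_amp: bool = False):
        self.suboptimizers = suboptimizers
        self.scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

    def backward(self, loss):
        # One backward pass through the shared scaler, instead of one per sub-optimizer.
        self.scaler.scale(loss).backward()

    def zero_grad(self):
        for sub in self.suboptimizers.values():
            sub.optimizer.zero_grad()

    def externally_managed_step(self, names: Iterable[str]):
        # Only the sub-optimizers named by the caller are stepped this iteration.
        for name in names:
            sub = self.suboptimizers[name]
            self.scaler.unscale_(sub.optimizer)  # clipping must see unscaled gradients
            sub.clip_grads()
            self.scaler.step(sub.optimizer)      # skipped internally on inf/nan gradients
            if sub.lr_scheduler is not None:
                sub.lr_scheduler.step()
        self.scaler.update()
```

Keeping the scaler in MultipleOptimizer follows the standard GradScaler pattern for multiple optimizers: one scale/backward, per-optimizer unscale_ and step, and a single update() at the end.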
Currently the logs are overwhelming and not human-readable.
It would be great to sift through the current messages and set appropriate logging levels (and also remove the sneaky prints that are surely still around).
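As a hedged illustration of the triage being asked for (the function, message formats, and logger name are made up for this sketch): routine progress stays at info, verbose per-optimizer detail drops to debug, and stray print calls are replaced with logger calls.

```python
import logging

logger = logging.getLogger("train")  # illustrative name

def report_step(step: int, train_steps: int, acc: float, xent: float, lrs: dict):
    # Routine progress stays at info; the per-suboptimizer learning rates,
    # which are currently log spam, drop to debug.
    logger.info("Step %d/%d; acc: %6.2f; xent: %4.2f", step, train_steps, acc, xent)
    for name, lr in lrs.items():
        logger.debug("suboptimizer %s lr: %7.5f", name, lr)
```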