Add the --jit_compile option to compile the model with XLA (currently applied only during training)
Fixes and improvements
Improve correctness of gradient accumulation and multi-GPU training by normalizing the gradients with the true global batch size instead of using an approximation
Report the total number of tokens per second in the training logs, in addition to the source and target numbers
Relax the sacreBLEU version requirement to accept any 2.x version
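
To illustrate the gradient normalization fix listed above, here is a minimal sketch in plain Python. It does not reproduce the library's code: the function names are hypothetical, and the "approximation" shown (averaging per-batch mean gradients as if all batches had the same size) is an assumption about what such an approximation might look like. Gradients are reduced to single floats for readability.

```python
def approximate_average(per_batch_grad_sums, per_batch_sizes):
    """Average the per-batch mean gradients.

    Assumed approximation: every batch is weighted equally,
    regardless of how many samples (or tokens) it contains.
    """
    means = [g / n for g, n in zip(per_batch_grad_sums, per_batch_sizes)]
    return sum(means) / len(means)


def exact_average(per_batch_grad_sums, per_batch_sizes):
    """Sum gradients over all batches, then divide by the true global batch size."""
    return sum(per_batch_grad_sums) / sum(per_batch_sizes)


# Two accumulation steps (or two replicas) with unequal batch sizes.
grad_sums = [8.0, 2.0]  # summed per-sample gradients of each batch
sizes = [4, 2]          # number of samples in each batch

approx = approximate_average(grad_sums, sizes)
exact = exact_average(grad_sums, sizes)
print(approx, exact)
```

The two results agree only when every batch has the same size. With unequal batches, as is common with token-based batching, the equal-weight average over-weights the smaller batch.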