Optimization

  • (ICLR 2020) RAdam
    On the Variance of the Adaptive Learning Rate and Beyond [PDF] [Code]

  • (EMNLP 2020) Admin
    Understanding the Difficulty of Training Transformers [PDF] [Code]

  • (Other models) TorchScope