- Published in July 2016.
- Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
- Similar to batch normalization, but the statistics are computed over the hidden units of a single sample instead of over the mini-batch (see the sketch after this list).
- Works well for recurrent networks, where applying batch normalization over mini-batches is difficult.
- Independent of batch size (works even with a batch size of 1).
- Invariant to re-scaling of an individual input (training case).
- Invariant to re-scaling and re-centering of the weight matrix.
- The effective update scale naturally decreases as learning progresses: as the weight norm grows, the normalization implicitly reduces the learning rate, stabilizing training.
- However, batch normalization still performs better for CNNs.
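
Concretely, for the hidden units a of one layer of one sample, the paper computes mu = mean(a) and sigma = std(a) over the hidden units, then outputs g * (a - mu) / sigma + b with learned gain g and bias b. Below is a minimal NumPy sketch of that computation; the function name `layer_norm` and the `eps` stabilizer are illustrative choices, not from the paper.

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    """Layer normalization over the feature axis of each sample.

    x:    (batch, features) activations
    gain: (features,) learned scale g
    bias: (features,) learned shift b
    """
    mu = x.mean(axis=-1, keepdims=True)    # per-sample mean over hidden units
    sigma = x.std(axis=-1, keepdims=True)  # per-sample std over hidden units
    return gain * (x - mu) / (sigma + eps) + bias

x = np.random.randn(4, 8)
g, b = np.ones(8), np.zeros(8)
out = layer_norm(x, g, b)

# Statistics are per sample, so the output is independent of batch size,
# and re-scaling a single input leaves its normalized output (nearly) unchanged:
print(np.allclose(layer_norm(10.0 * x, g, b), out, atol=1e-3))  # True
```

The same per-sample computation is what makes the method applicable at every time step of an RNN, where mini-batch statistics vary with sequence length.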