https://arxiv.org/abs/2007.00811

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks (Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans)

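1. Train a wide model.
2. Attach linear layers to the thin model so its widths match the wide model's, and train it with feature matching.
3. Fine-tune, then merge the linear layers.

ResNet-50 / BERT-Base reach ResNet-101 / BERT-Large performance. It looks like glorified feature KD, but the results are good anyway.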
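A toy sketch of steps 2 and 3 in plain Python. It assumes a squared-L2 feature-matching loss and an adapter that feeds directly into another affine layer with no nonlinearity in between, which is what allows the merge. The function names, the loss form, and the adapter placement are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply a (n x k) by b (k x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def affine(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute w @ x + b."""
    return [y + c for y, c in zip(matvec(w, x), b)]


def feature_matching_loss(adapter_w: Matrix, adapter_b: Vector,
                          thin_h: Vector, wide_h: Vector) -> float:
    """Step 2 (assumed loss form): lift the thin hidden state to the wide width
    with a linear adapter, then take the squared L2 distance to the wide
    model's hidden state at the corresponding layer."""
    lifted = affine(adapter_w, adapter_b, thin_h)
    return sum((p - t) ** 2 for p, t in zip(lifted, wide_h))


def merge_linear(adapter_w: Matrix, adapter_b: Vector,
                 next_w: Matrix, next_b: Vector) -> Tuple[Matrix, Vector]:
    """Step 3 (assumed placement): when the adapter feeds straight into another
    affine layer with no nonlinearity in between,
        next_w @ (adapter_w @ x + adapter_b) + next_b
      = (next_w @ adapter_w) @ x + (next_w @ adapter_b + next_b),
    so the pair collapses into a single affine layer."""
    merged_w = matmul(next_w, adapter_w)
    merged_b = affine(next_w, next_b, adapter_b)  # next_w @ adapter_b + next_b
    return merged_w, merged_b


# Toy sizes: thin width 2, wide width 3.
adapter_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2, thin -> wide
adapter_b = [0.0, 0.5, -0.5]
next_w = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]       # 2 x 3, wide -> thin
next_b = [0.1, -0.1]

# Feature matching against two hypothetical wide-model hidden states.
thin_h = [1.0, 2.0]
loss_exact = feature_matching_loss(adapter_w, adapter_b, thin_h, [1.0, 2.5, 2.5])
loss_off = feature_matching_loss(adapter_w, adapter_b, thin_h, [1.0, 2.5, 3.5])

# The merged layer gives the same output as adapter followed by next layer.
x = [2.0, 3.0]
two_step = affine(next_w, next_b, affine(adapter_w, adapter_b, x))
merged_w, merged_b = merge_linear(adapter_w, adapter_b, next_w, next_b)
one_step = affine(merged_w, merged_b, x)
assert all(math.isclose(a, b) for a, b in zip(two_step, one_step))
```

Under these assumptions the adapters exist only during training. After the merge, the deployed model keeps the thin network's shapes and adds no parameters.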

#distillation #lightweight