https://arxiv.org/abs/2109.10686

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers (Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler)

How should transformers be scaled effectively? Overall, the takeaway seems to be that increasing depth works better than increasing the other dimensions. The Google folks have been pushing LLMs hard these days.
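
A rough way to see what "scaling depth vs. other dimensions" means in parameter terms. This is a minimal sketch under simplifying assumptions (decoder-style blocks only, ignoring embeddings, biases, and layer norms); the configs below are illustrative, not the paper's actual settings.

```python
def block_params(d_model: int, d_ff: int) -> int:
    """Approximate parameters in one transformer block:
    4 * d_model^2 for the attention projections (Q, K, V, output)
    plus 2 * d_model * d_ff for the feed-forward layers."""
    return 4 * d_model * d_model + 2 * d_model * d_ff


def model_params(n_layers: int, d_model: int, d_ff: int) -> int:
    """Total parameters across all blocks (embeddings etc. omitted)."""
    return n_layers * block_params(d_model, d_ff)


# Hypothetical base config (assumed, not from the paper).
base = model_params(n_layers=12, d_model=768, d_ff=3072)

# Two ways to spend roughly the same extra parameter budget:
deeper = model_params(n_layers=24, d_model=768, d_ff=3072)   # scale depth
wider = model_params(n_layers=12, d_model=1088, d_ff=4352)   # scale width

print(f"base:   {base / 1e6:.0f}M params")
print(f"deeper: {deeper / 1e6:.0f}M params")
print(f"wider:  {wider / 1e6:.0f}M params")
```

The deeper and wider variants land at roughly the same parameter count (~170M vs. the ~85M base here), which is the kind of comparison where the paper's results favor the depth-scaled model.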

#transformer