https://arxiv.org/abs/2110.13900
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing (Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei)
Speech pretraining. More data and more diverse data + mixing augmentation + model enhancements. Performance jumps way up.
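A rough sketch of what the mixing augmentation could look like: a random chunk of a second utterance is scaled and overlaid on a random region of the primary one. The function name `mix_utterance`, the 50% overlap cap, and the -5 to 5 dB energy-ratio range are illustrative assumptions, not values taken from this note, and this is not the authors' implementation.

```python
import numpy as np


def mix_utterance(primary, secondary, rng, max_overlap=0.5, ratio_db_range=(-5.0, 5.0)):
    """Overlay a random chunk of `secondary` onto a random region of `primary`.

    Hypothetical helper: parameter defaults are illustrative assumptions.
    """
    primary = np.asarray(primary, dtype=np.float64)
    secondary = np.asarray(secondary, dtype=np.float64)

    # The overlap covers at most `max_overlap` of the primary utterance
    # and can never be longer than the secondary utterance.
    max_len = min(int(len(primary) * max_overlap), len(secondary))
    if max_len < 1:
        return primary.copy()

    mix_len = int(rng.integers(1, max_len + 1))
    src_start = int(rng.integers(0, len(secondary) - mix_len + 1))
    dst_start = int(rng.integers(0, len(primary) - mix_len + 1))
    chunk = secondary[src_start:src_start + mix_len]

    # Scale the chunk so the primary-to-chunk energy ratio matches a
    # randomly sampled value in dB.
    ratio_db = rng.uniform(*ratio_db_range)
    p_energy = np.mean(primary ** 2)
    c_energy = np.mean(chunk ** 2)

    mixed = primary.copy()
    if p_energy > 0 and c_energy > 0:
        scale = np.sqrt(p_energy / (c_energy * 10 ** (ratio_db / 10)))
        mixed[dst_start:dst_start + mix_len] += scale * chunk
    return mixed
```

The output has the same length as the primary utterance, and only the overlapped region is modified, so the primary speaker stays dominant in the mixture.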
#speech #pretraining