1. Introduction to RNNs, Seq2Seq, and attention:
A complete illustrated guide to RNNs, RNN variants, Seq2Seq, and the attention mechanism
Supplementary background: LSTM
2. Problems with attention:
(I) Sensitive to noise
(II) Recognition performance degrades on long utterances
GMIS 2017 | Dong Yu, deputy director of Tencent AI Lab: four frontier directions in speech recognition research
3. Understanding the Transformer:
Attention Is All You Need | weekly paper-reading group
4. Understanding CNN-based machine translation:
Facebook proposes a fully convolutional machine translation model: more accurate than Google's and nine times faster
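The Seq2Seq attention mechanism covered in the reading above can be sketched in a few lines. A minimal NumPy illustration of additive (Bahdanau-style) attention; the weight matrices `W1`, `W2` and vector `v` stand in for learned parameters and are purely illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s, H, W1, W2, v):
    """Bahdanau-style attention: e_t = v^T tanh(W1 s + W2 h_t)."""
    scores = np.tanh(s @ W1.T + H @ W2.T) @ v  # (T,) alignment scores
    a = softmax(scores)                        # attention weights over encoder steps
    c = a @ H                                  # context vector: weighted sum of encoder states
    return a, c
```

At each decoder step, the current decoder state `s` is scored against every encoder state `h_t`, and the softmax-normalized weights mix the encoder states into a single context vector.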
1. The baseline before attention was applied to speech recognition (originally proposed for translation); recognition of long sentences degrades sharply:
D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv:1409.0473, September 2014.
2. The first paper to apply attention to speech recognition (proposes a hybrid attention mechanism to address attention's missing location information, and an attention windowing mechanism; dataset: TIMIT):
J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, "Attention-based models for speech recognition."
3. Improves the attention windowing and replaces the RNN with a GRU, applied to LVCSR (dataset: Wall Street Journal (WSJ) corpus, available as LDC93S6B and LDC94S13B):
D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, "End-to-end attention-based large vocabulary speech recognition."
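The attention windowing trick from items 2 and 3 restricts the attention weights at each decoding step to a window around the previous alignment position, which keeps the alignment roughly monotonic on long utterances. A rough NumPy sketch; the mask-then-softmax formulation and the window radius `w` are illustrative assumptions:

```python
import numpy as np

def windowed_attention(scores, prev_peak, w=2):
    """Keep only scores in [prev_peak - w, prev_peak + w], then softmax."""
    lo = max(0, prev_peak - w)
    hi = min(len(scores), prev_peak + w + 1)
    masked = np.full_like(scores, -np.inf, dtype=float)
    masked[lo:hi] = scores[lo:hi]
    e = np.exp(masked - masked[lo:hi].max())  # exp(-inf) = 0 outside the window
    return e / e.sum()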
4. The classic attention-based speech recognition model:
W. Chan, N. Jaitly, Q. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition."
5. CTC + attention (CTC enforces the monotonicity of the alignment and speeds up network training; datasets: WSJ1 (81 hours), WSJ0 (15 hours), CHiME-4 (18 hours)):
S. Kim, T. Hori, and S. Watanabe, "Joint CTC-attention based end-to-end speech recognition using multi-task learning."
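To make the CTC half of the multi-task objective concrete, here is a small NumPy implementation of the CTC forward (alpha) recursion in log space, plus the weighted joint objective L = λ·L_CTC + (1−λ)·L_attention used in this line of work; the default λ value here is only illustrative, not the paper's reported setting:

```python
import numpy as np

def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """CTC forward (alpha) recursion in log space.

    log_probs: (T, V) per-frame log posteriors; labels: targets without blanks.
    """
    T = log_probs.shape[0]
    ext = [blank]                      # interleave blanks: l1 l2 -> _ l1 _ l2 _
    for l in labels:
        ext += [l, blank]
    S = len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]   # start with blank ...
    alpha[0, 1] = log_probs[0, ext[1]]   # ... or the first label
    for t in range(1, T):
        for s in range(S):
            cands = [alpha[t - 1, s]]                  # stay on the same state
            if s > 0:
                cands.append(alpha[t - 1, s - 1])      # advance one state
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2])      # skip over a blank
            alpha[t, s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]
    # A valid path ends on the final label or the trailing blank.
    return -np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])

def joint_loss(l_ctc, l_att, lam=0.2):
    """Multi-task objective: L = lam * L_CTC + (1 - lam) * L_attention."""
    return lam * l_ctc + (1 - lam) * l_att
```

The monotonicity mentioned above is visible in the recursion: a path may only stay, advance one state, or skip a single blank per frame, so the alignment can never move backwards.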
6. Transformer:
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention Is All You Need."
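The core building block of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V. A minimal single-head NumPy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (T_q, T_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # each query's weights sum to 1
    return w @ V                                  # (T_q, d_v)
```

In self-attention, Q, K, and V are all linear projections of the same input sequence, which is exactly the setup reused in entry 9 below (self-attention + CTC).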
7. CNN + RNN end-to-end ASR:
"Very Deep Convolutional Networks for End-to-End Speech Recognition."
8. CNN + RNN attention + CTC (a VGG-style network is added before the RNN in the encoder for feature extraction; datasets: WSJ0, CHiME-4):
S. Kim, T. Hori, and S. Watanabe, "Joint CTC-attention based end-to-end speech recognition using multi-task learning."
9. Self-attention + CTC:
"Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition."
2. PyTorch end-to-end ASR code
Problem to solve: how to run on multiple GPUs
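For the multi-GPU question, the simplest option in PyTorch is `nn.DataParallel`, which splits each input batch across the visible GPUs and gathers the outputs; with no GPUs present it simply runs the wrapped module on CPU. A sketch with a hypothetical toy model standing in for the ASR network:

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for an E2E ASR encoder-decoder.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# nn.DataParallel replicates the module on each visible GPU, scatters the
# batch, and gathers the outputs on the first device; with zero GPUs it
# just runs the wrapped module unchanged.
model = nn.DataParallel(model)

x = torch.randn(8, 10)  # a batch of 8 ten-dimensional feature vectors
y = model(x)            # shape: (8, 2)
```

For serious training, PyTorch's documentation recommends `torch.nn.parallel.DistributedDataParallel` (one process per GPU) instead, since it scales better than the single-process `DataParallel`.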