Evaluation of the E-Branchformer model #8

Open
fujimotos opened this issue Feb 6, 2023 · 5 comments

@fujimotos (Member)

Ticket goals

  • The current ReazonSpeech model is a Conformer-based speech recognition model.
  • Create an E-Branchformer-based recipe and train a model with it.
  • Evaluate the trained model and verify accuracy improvements and other effects.

Reference links

@euyniy commented Apr 7, 2023

A while ago I ran 31 epochs of training on reazonspeech medium (~500h) using the ESPnet2 LibriSpeech recipe. The log is below (it does not seem to have saturated yet):

2023-02-16 07:43:47,882 (trainer:338) INFO: 31epoch results: 
[train] iter_time=2.915e-04, forward_time=0.100, loss_ctc=40.290, loss_att=30.836, acc=0.690, loss=33.672, 
backward_time=0.117, optim_step_time=0.083, optim0_lr0=2.697e-04, train_time=28.888, time=1 hour, 25 minutes and 
30.11 seconds, total_count=550157, gpu_max_cached_mem_GB=4.861, 
[valid] loss_ctc=21.431, cer_ctc=0.259, loss_att=16.760, acc=0.834, cer=0.222, wer=0.849, loss=18.161, time=44.84 
seconds, total_count=3255, gpu_max_cached_mem_GB=4.861, 
(Figures: training loss curve and validation CER curve)
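For reference, a minimal sketch of how curves like these can be extracted from an ESPnet2 train.log, assuming the log contains "[valid] ... cer=..." lines like the ones pasted above; the log path and regex are illustrative, not taken from this run:

```python
# Hedged sketch: pull per-epoch validation CER out of an ESPnet2 train.log.
# Assumes "[valid] ... cer=..." lines like those quoted above; the path is a
# placeholder for wherever the experiment writes its log.
import re

pattern = re.compile(r"(\d+)epoch results:.*?\[valid\].*?\bcer=([\d.]+)", re.S)

with open("exp/asr_train_e_branchformer/train.log") as f:
    log_text = f.read()

for epoch, cer in pattern.findall(log_text):
    print(f"epoch {epoch}: valid cer={cer}")
```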

For reference, the current conformer-transformer model (though its parameters differ) looks like this:

2023-02-11 03:22:58,191 (trainer:338) INFO: 31epoch results: 
[train] iter_time=2.541e-04, forward_time=0.077, loss_ctc=31.263, loss_att=17.444, acc=0.787, loss=21.590, 
backward_time=0.063, optim_step_time=0.057, optim0_lr0=7.346e-04, train_time=6.864, time=34 minutes and 31.61 
seconds, total_count=280519, gpu_max_cached_mem_GB=4.801, 
[valid] loss_ctc=22.093, cer_ctc=0.266, loss_att=12.771, acc=0.859, cer=0.194, wer=0.799, loss=15.567, time=12.59 
seconds, total_count=1674, gpu_max_cached_mem_GB=4.801, [att_plot] time=1 minute and 6.61 seconds, total_count=0, 
gpu_max_cached_mem_GB=4.801

We have no plans to run this at large scale for now, but if there is any progress on the Branchformer experiments, I will post it here.

@sw005320 commented Apr 7, 2023

@pyf98, maybe you can help them.
You can translate this into English (or Chinese).

I think their learning rate is too low in this scenario, or there is something wrong with the actual batch size (with multiple GPUs or gradient accumulation).
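For anyone cross-checking this, a rough sketch of the arithmetic behind that suspicion, assuming the usual espnet2 options (batch_bins, accum_grad, ngpu); the concrete values below are placeholders, not numbers from either run:

```python
# Hedged sketch: compare the effective batch size of two runs. The option
# names mirror common espnet2 training flags (--batch_bins, --accum_grad,
# --ngpu); the values are illustrative placeholders.

def effective_batch(batch_bins: int, accum_grad: int, ngpu: int) -> int:
    """Approximate amount of data contributing to one optimizer step."""
    return batch_bins * accum_grad * ngpu

# If run A uses 2 GPUs with accum_grad=4 and run B a single GPU with
# accum_grad=1 at the same batch_bins, run A's effective batch is 8x larger,
# so its peak learning rate and warmup usually need to be retuned.
run_a = effective_batch(batch_bins=4_000_000, accum_grad=4, ngpu=2)
run_b = effective_batch(batch_bins=4_000_000, accum_grad=1, ngpu=1)
print(run_a / run_b)  # -> 8.0
```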

@pyf98 commented Apr 7, 2023

I'm not sure what Conformer and E-Branchformer configs are being used exactly. I feel some configs might have issues.

The Conformer config provided above has 12 layers without Macaron FFN. The input layer downsamples by a factor of 6. These are different from the configs in other recipes (e.g., LibriSpeech). If you simply use the same E-Branchformer config from LibriSpeech, there can be some issues. For example, the model can be much larger.

In our experiments, we scale Conformer and E-Branchformer to have similar parameter counts. In such cases, we usually do not need to tune the training hyper-parameters again. We have added E-Branchformer configs and results in many other ESPnet2 recipes covering various types of speech.
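As a side note, a quick way to sanity-check "similar parameter counts" before launching training is a generic PyTorch count; this is a hedged sketch, not ESPnet-specific, and the commented model variables are placeholders:

```python
# Hedged sketch: count trainable parameters of two candidate models before
# training. Works for any torch.nn.Module; the commented variables below are
# placeholders, not objects from this issue.
import torch

def count_params_m(model: torch.nn.Module) -> float:
    """Trainable parameter count in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# conformer_model = ...        # built from the Conformer config
# e_branchformer_model = ...   # built from the E-Branchformer config
# print(f"Conformer:      {count_params_m(conformer_model):.1f}M")
# print(f"E-Branchformer: {count_params_m(e_branchformer_model):.1f}M")
```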

@euyniy commented Apr 11, 2023

@pyf98 @sw005320
Thanks for your input!
The experiment above was conducted with this config on ~500h of data. The E-Branchformer model has 145M params and the Conformer used for comparison has 91M. (btw, in our latest released Conformer model we enabled Macaron FFN)

Will check the lr/accum_grads/multi-gpu/downsampling configurations and other recipes as well when we run more experiments on a larger dataset!

@pyf98 commented Apr 12, 2023

Thanks for the information. When comparing these models (E-Branchformer vs Conformer), we typically just replaced the encoder config (at a similar model size) but kept the other training configs the same. This worked well in general.
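A hedged illustration of that "swap only the encoder block" approach, assuming a YAML training config in the usual espnet2 layout; the file names and encoder_conf sizes below are placeholders, not the configs used in this thread:

```python
# Hedged sketch: take an existing Conformer training YAML, replace only the
# encoder section with an E-Branchformer one, and keep optimizer/scheduler/
# batch settings untouched. Paths and sizes are illustrative placeholders.
import yaml

with open("conf/train_asr_conformer.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["encoder"] = "e_branchformer"
cfg["encoder_conf"] = {
    "output_size": 256,          # sizes chosen only to illustrate keeping the
    "attention_heads": 4,        # parameter count close to the Conformer;
    "num_blocks": 12,            # not values from this issue
    "linear_units": 1024,
    "cgmlp_linear_units": 1024,
}

with open("conf/train_asr_e_branchformer.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```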
