# Monotonic Multihead Attention with Fixed Pre-decision Module

This is a tutorial on training and evaluating a transformer-based monotonic multihead attention (MMA) simultaneous model on the MuST-C English-German dataset, following *SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation*.

MuST-C is a multilingual speech-to-text translation corpus with translations of English TED talks into 8 languages.

## Data Preparation

See the data preparation instructions.

## ASR Pretraining

The ASR training script is `exp/1a-pretrain_asr.sh`.
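For illustration, a launch could look like the sketch below; the `-t` target-language flag is only an assumption borrowed from the `2-mma.sh` convention further down, so check the script itself for its actual arguments.

```bash
# Hypothetical invocation of the ASR pretraining script.
# The -t flag (target language of the MuST-C pair) is an assumption;
# the real arguments are defined inside exp/1a-pretrain_asr.sh.
bash 1a-pretrain_asr.sh -t de
```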

### Pre-trained model

An ASR model with an Emformer encoder and a Transformer decoder, pre-trained with a joint CTC and cross-entropy loss.

| MuST-C (WER) | en-de (V2) | en-es |
|---|---|---|
| dev | 9.65 | 14.44 |
| tst-COMMON | 12.85 | 14.02 |
| model | download | download |
| vocab | download | download |

## Monotonic multihead attention with fixed pre-decision module

The MMA training script is `exp/2-mma.sh`.

To train an MMA-H model with latency weight 0.1, use

```bash
bash 2-mma.sh -t de -m hard_aligned -l 0.1
```

To train an MMA-IL model with latency weight 0.1, use

```bash
bash 2-mma.sh -t de -m infinite_lookback -l 0.1
```

If you want to fine-tune from an offline model (latency weight = 0), use

```bash
bash 2b-mma_finetune.sh -t de -m infinite_lookback -l 0.1
```
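In these commands, `-t` sets the target language, `-m` selects the monotonic attention variant (`hard_aligned` for MMA-H, `infinite_lookback` for MMA-IL), and `-l` sets the latency loss weight.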

## Inference & Evaluation

The evaluation instructions are in `simuleval_instruction.md`. The MMA models use `default_agent.py` as the SimulEval agent.
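As a rough sketch (not the documented command), a SimulEval run with this agent could look like the block below; the data and output paths are placeholders and the exact flags depend on the installed SimulEval version, so defer to `simuleval_instruction.md`.

```bash
# Sketch only: file names here are placeholders, not the documented command.
# --source: list of source audio files (placeholder name)
# --target: reference translations (placeholder name)
# --output: directory where SimulEval writes scores and instance logs
simuleval \
    --agent default_agent.py \
    --source tst-COMMON.wav_list \
    --target tst-COMMON.de \
    --output mma_eval_results
```

SimulEval then reports quality and latency scores, for example: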

```json
{
    "Quality": {
        "BLEU": 22.882280993425326
    },
    "Latency": {
        "AL": 1582.635476344213,
        "AL_CA": 1824.0610745999502,
        "AP": 0.7660114625870339,
        "AP_CA": 0.8291859397671248,
        "DAL": 2127.1755059232137,
        "DAL_CA": 2391.403942353481
    }
}
```
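Here BLEU measures translation quality; AL (Average Lagging) and DAL (Differentiable Average Lagging) are in milliseconds, AP (Average Proportion) is a ratio, and the `_CA` variants are the computation-aware versions that include the model's actual inference time.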