Work in progress. Inspired by the paper "Zero-Shot Foreign Accent Conversion without a Native Reference".
This is the translator module described in that paper.
- Install ffmpeg.
- Install Kaldi
- Install PyKaldi
- Install the packages listed in the environment.yml file.
- Download the pretrained TDNN-F model, extract it, and set PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh to the pretrained model directory.
- Acoustic Model: LibriSpeech. Download pretrained TDNN-F acoustic model here.
- You also need to set KALDI_ROOT and PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh accordingly.
- Vector Quantization: ARCTIC and L2-ARCTIC; see here for the detailed training process.
- Translator seq2seq (i.e., Seq2seq model): ARCTIC and L2-ARCTIC. Please see here for a merged version. All the pretrained models are available here (to be updated).
dataset_root
├── speaker 1
├── speaker 2
│ ├── wav # contains all the wav files from speaker 2
│ └── kaldi # Kaldi files (auto-generated after running the scripts in kaldi_scripts)
.
.
└── speaker N
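Before running the Kaldi scripts, it can help to confirm that every speaker folder follows this layout. The following is a minimal sketch and is not part of this repository: the function name find_speakers_missing_wav is hypothetical, and it assumes only what the tree above shows, namely that each immediate subdirectory of the dataset root is one speaker and that it holds its audio as .wav files under wav/.

```python
from pathlib import Path


def find_speakers_missing_wav(dataset_root):
    """Return the names of speaker directories with no .wav files under wav/.

    Hypothetical helper: assumes every immediate subdirectory of
    dataset_root is one speaker, as in the tree above.
    """
    root = Path(dataset_root)
    missing = []
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        wav_dir = speaker_dir / "wav"
        # A speaker is usable only if wav/ exists and holds at least one .wav file.
        has_wavs = wav_dir.is_dir() and any(wav_dir.glob("*.wav"))
        if not has_wavs:
            missing.append(speaker_dir.name)
    return missing
```

Calling find_speakers_missing_wav("/path/to/dataset_root") returns an empty list when every speaker has at least one .wav file, and otherwise lists the speakers to fix before extraction.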
- Use Kaldi to extract bottleneck features (BNFs) for each speaker (repeat for all speakers)
./kaldi_scripts/extract_features_kaldi.sh /path/to/speaker
- Preprocessing
python preprocess_bnfs.py path/to/dataset
python make_data.py  # Edit the file to specify the dataset path
- Vector-quantize the BNFs: see here
- Set the training parameters: see conf/
- Train the model
./train.sh
- Synthesizer code and training: see here
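The Kaldi feature extraction step above has to be repeated once per speaker. One way to batch it is sketched below. This is an illustrative driver, not a script shipped with this repository: the names build_extraction_commands and extract_all are hypothetical, and the sketch assumes it is run from the repository root so that the relative path to extract_features_kaldi.sh resolves, and that KALDI_ROOT and PRETRAIN_ROOT have already been set inside that script.

```python
import subprocess
from pathlib import Path

# Script path as used in the extraction step above; assumes the current
# working directory is the repository root.
EXTRACT_SCRIPT = "./kaldi_scripts/extract_features_kaldi.sh"


def build_extraction_commands(dataset_root, script=EXTRACT_SCRIPT):
    """Build one extraction command per speaker directory under dataset_root."""
    root = Path(dataset_root)
    speaker_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    return [[script, str(speaker_dir)] for speaker_dir in speaker_dirs]


def extract_all(dataset_root, dry_run=True):
    """Run (or, with dry_run=True, only print) the command for every speaker."""
    commands = build_extraction_commands(dataset_root)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first speaker whose extraction fails.
            subprocess.run(cmd, check=True)
    return commands
```

With the default dry_run=True the function only prints the commands, which makes it easy to review them first; extract_all("/path/to/dataset_root", dry_run=False) then runs the extraction for each speaker in turn.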