This implementation follows the FastSpeech2 article and the FastSpeech article.
The inference result is audio, but GitHub supports embedding only video+audio formats, so an example is attached as a video file:
0005-audio-l1-p1-e1.mov
You can also download the tts-results folder from Google Drive; it includes 27 audio files with different length, pitch, and energy for the first three inputs from test_model/input.txt.
- Use Python 3.9
conda create -n fastspeech2 python=3.9 && conda activate fastspeech2
- Install libraries
pip3 install -r requirements.txt
- Download data
bash scripts/download_data.sh
- Preprocess data: save pitch and energy
python3 scripts/preprocess_data.py
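The preprocessing step computes per-frame pitch (F0) and energy targets for the variance adaptor. Below is a minimal sketch of one common way to extract them (using pyworld for F0 and the STFT-frame magnitude norm for energy); the actual scripts/preprocess_data.py may use different tools and parameters.

```python
# Hypothetical sketch of per-frame pitch and energy extraction;
# the real preprocessing script may differ in tools and settings.
import numpy as np
import librosa
import pyworld as pw

def extract_pitch_energy(wav_path, sr=22050, n_fft=1024, hop_length=256):
    wav, _ = librosa.load(wav_path, sr=sr)

    # Pitch (F0): WORLD coarse estimate refined with stonemask (pyworld needs float64)
    wav64 = wav.astype(np.float64)
    f0, t = pw.dio(wav64, sr, frame_period=hop_length / sr * 1000)
    f0 = pw.stonemask(wav64, f0, t, sr)

    # Energy: L2 norm of each STFT magnitude frame
    stft = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop_length))
    energy = np.linalg.norm(stft, axis=0)

    return f0, energy
```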
- Download my final FastSpeech2 checkpoint
python3 scripts/download_checkpoint.py
- Run training
python3 train.py -c configs/train.json
The final model was trained with the configs/train.json config.
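FastSpeech2 is trained with a mel-spectrogram reconstruction loss plus losses on the duration, pitch, and energy predictors. A schematic sketch of such a combined objective (function and key names are illustrative, not this repo's actual classes):

```python
import torch
import torch.nn.functional as F

def fastspeech2_loss(pred, target):
    """Illustrative FastSpeech2 objective: mel reconstruction + variance predictor losses."""
    mel_loss = F.l1_loss(pred["mel"], target["mel"])
    # Duration is typically predicted in the log domain
    duration_loss = F.mse_loss(pred["log_duration"], torch.log1p(target["duration"].float()))
    pitch_loss = F.mse_loss(pred["pitch"], target["pitch"])
    energy_loss = F.mse_loss(pred["energy"], target["energy"])
    return mel_loss + duration_loss + pitch_loss + energy_loss
```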
- Run testing
python3 test.py
test.py accepts the following arguments:
- Config path:
-c, --config, default="configs/test.json"
- Create multiple audio variants with different length, pitch and energy
-t, --test, default=False
- Increase or decrease audio speed:
-l, --length-control, default=1
- Increase or decrease audio pitch:
-p, --pitch-control, default=1
- Increase or decrease audio energy:
-e, --energy-control, default=1
- Checkpoint path:
-cp, --checkpoint, default="test_model/tts-checkpoint.pth"
- Input texts path:
-i, --input, default="test_model/input.txt"
- Waveglow weights path:
-w, --waveglow, default="waveglow/pretrained_model/waveglow_256channels.pt"
Results will be saved in test_model/results; you can see an example in this folder.
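At inference time the -l/-p/-e factors scale the variance adaptor predictions before the length regulator and decoder, as in the FastSpeech2 paper. A simplified sketch (names and the exact duration formula are assumptions, not this repo's exact code):

```python
import torch

def apply_controls(log_duration, pitch, energy,
                   length_control=1.0, pitch_control=1.0, energy_control=1.0):
    """Scale predicted duration, pitch and energy contours by the control factors."""
    # Durations: back from log domain, scale, round to non-negative frame counts
    duration = torch.clamp(torch.round(torch.exp(log_duration) * length_control), min=0).long()
    pitch = pitch * pitch_control
    energy = energy * energy_control
    return duration, pitch, energy
```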
W&B report: https://api.wandb.ai/links/tgritsaev/rkir8sp9 (English only)
This repository is based on a heavily modified fork of the pytorch-template repository. The FastSpeech2 implementation is based on code from the HSE "Deep Learning in Audio" course seminar and the official FastSpeech2 repository.