tgritsaev/fastspeech2

Text to Speech with FastSpeech2

FastSpeech2 article and FastSpeech article.

Example

The inference result is an audio file, but GitHub only supports video+audio formats, so the example is attached as a video.

0005-audio-l1-p1-e1.mov

You can also download a folder with TTS results from Google Drive. It contains 27 audio files with different length, pitch and energy settings for the first three inputs from test_model/input.txt.
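
The count of 27 would be consistent with three settings for each of the three controls (3 × 3 × 3), although this README does not confirm how the variants were chosen. A minimal sketch of enumerating such a grid, assuming hypothetical control values of 0.8, 1.0 and 1.2 and a file-name pattern modelled on the example file name in this section:

```python
from itertools import product

# Hypothetical control values; the values actually used are not stated in this README.
CONTROL_VALUES = (0.8, 1.0, 1.2)


def variant_names(index, values=CONTROL_VALUES):
    """Return one file stem per (length, pitch, energy) combination."""
    return [
        # The pattern imitates "0005-audio-l1-p1-e1"; it is an assumption, not the repo's code.
        f"{index:04d}-audio-l{length:g}-p{pitch:g}-e{energy:g}"
        for length, pitch, energy in product(values, repeat=3)
    ]


names = variant_names(5)
print(len(names))  # 27
print(names[13])   # 0005-audio-l1-p1-e1
```

With three values per control, the neutral setting (all controls equal to 1) sits in the middle of the grid.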

Installation guide

  1. Use Python 3.9
conda create -n fastspeech2 python=3.9 && conda activate fastspeech2
  2. Install libraries
pip3 install -r requirements.txt
  3. Download data
bash scripts/download_data.sh
  4. Preprocess data: save pitch and energy
python3 scripts/preprocess_data.py
  5. Download my final FastSpeech2 checkpoint
python3 scripts/download_checkpoint.py
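
The steps after environment creation can also be scripted. The following is a minimal sketch, not part of the repository: it assumes the conda environment is already active and that it is run from the repository root. The names `SETUP_COMMANDS` and `run_setup` are hypothetical.

```python
import subprocess

# Commands copied from the installation steps. The conda step is left out
# because `conda activate` does not persist across subprocess calls.
SETUP_COMMANDS = [
    ["pip3", "install", "-r", "requirements.txt"],
    ["bash", "scripts/download_data.sh"],
    ["python3", "scripts/preprocess_data.py"],
    ["python3", "scripts/download_checkpoint.py"],
]


def run_setup(dry_run=True):
    """Print each setup command in order and, unless dry_run is set, execute it.

    With check=True, the first failing command raises CalledProcessError,
    so later steps do not run on a broken setup.
    """
    for command in SETUP_COMMANDS:
        print("$", " ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return len(SETUP_COMMANDS)


if __name__ == "__main__":
    run_setup(dry_run=True)  # pass dry_run=False to actually execute the steps
```

The dry run is the default so that the download and preprocessing steps never start by accident.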

Train

  1. Run training
python3 train.py -c configs/train.json

The final model was trained with the configs/train.json config.

Test

  1. Run testing
python3 test.py

test.py accepts the following arguments:

  • Config path: -c, --config, default="configs/test.json"
  • Create multiple audio variants with different length, pitch and energy: -t, --test, default=False
  • Increase or decrease audio speed: -l, --length-control, default=1
  • Increase or decrease audio pitch: -p, --pitch-control, default=1
  • Increase or decrease audio energy: -e, --energy-control, default=1
  • Checkpoint path: -cp, --checkpoint, default="test_model/tts-checkpoint.pth"
  • Input texts path: -i, --input, default="test_model/input.txt"
  • WaveGlow weights path: -w, --waveglow, default="waveglow/pretrained_model/waveglow_256channels.pt"
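
For orientation, the argument list above can be expressed as an `argparse` parser. This is a sketch reconstructed from the list, not the actual code in test.py: the function name `build_parser`, the `float` types for the three controls, and treating `-t` as a boolean switch are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="FastSpeech2 testing (illustrative sketch)")
    parser.add_argument("-c", "--config", default="configs/test.json")
    # Assumed to be an on/off switch; the README only states default=False.
    parser.add_argument("-t", "--test", action="store_true")
    # The float type is an assumption; the README only gives default=1.
    parser.add_argument("-l", "--length-control", type=float, default=1.0)
    parser.add_argument("-p", "--pitch-control", type=float, default=1.0)
    parser.add_argument("-e", "--energy-control", type=float, default=1.0)
    parser.add_argument("-cp", "--checkpoint", default="test_model/tts-checkpoint.pth")
    parser.add_argument("-i", "--input", default="test_model/input.txt")
    parser.add_argument(
        "-w", "--waveglow",
        default="waveglow/pretrained_model/waveglow_256channels.pt",
    )
    return parser


# Example: override the length and pitch controls, keep every other default.
args = build_parser().parse_args(["-l", "1.2", "-p", "0.8"])
print(args.length_control, args.pitch_control, args.energy_control)  # 1.2 0.8 1.0
```

On the command line, the equivalent call would be `python3 test.py -l 1.2 -p 0.8`.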

Results are saved in test_model/results; that folder already contains an example.

Wandb Report

https://api.wandb.ai/links/tgritsaev/rkir8sp9 (English only)

Credits

This repository is based on a heavily modified fork of the pytorch-template repository. The FastSpeech2 implementation is based on the code from the HSE "Deep Learning in Audio" course seminar and the official FastSpeech2 repository.
