A Python implementation of incremental text-to-speech (TTS) using FastSpeech2.
- It uses a non-autoregressive text-to-speech model: FastSpeech2 replaces autoregressive models such as Tacotron and Transformer-TTS.
- It uses a simple context discard algorithm for speed-up.
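The context discard step is not spelled out here, so the following is only a minimal sketch of one plausible reading: each new text chunk is synthesized together with a bounded number of previous chunks as context, and the frames belonging to that context are then thrown away. `toy_synth`, `synthesize_incrementally`, and `max_context` are hypothetical names used for illustration; they are not taken from incremental_tts.py, and the toy synthesizer merely stands in for a FastSpeech2-style model that returns explicit per-token durations.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical synthesizer signature: tokens -> (frames, per-token durations).
Synth = Callable[[Sequence[str]], Tuple[List[float], List[int]]]


def toy_synth(tokens: Sequence[str]) -> Tuple[List[float], List[int]]:
    """Stand-in model: emits 2 frames per character, each frame tagged with its token index."""
    durations = [2 * len(tok) for tok in tokens]
    frames: List[float] = []
    for index, duration in enumerate(durations):
        frames.extend([float(index)] * duration)
    return frames, durations


def synthesize_incrementally(
    chunks: Iterable[Sequence[str]], synth: Synth, max_context: int = 1
) -> Iterator[List[float]]:
    """Yield only the frames of each new chunk, discarding the context frames."""
    # Only the last `max_context` chunks are kept; older context is dropped,
    # which bounds the input length and therefore the cost of each call.
    history: deque = deque(maxlen=max_context)
    for chunk in chunks:
        context = [tok for prev in history for tok in prev]
        frames, durations = synth(context + list(chunk))
        # Explicit durations tell us how many leading frames belong to the context.
        skip = sum(durations[: len(context)])
        yield frames[skip:]
        history.append(list(chunk))


if __name__ == "__main__":
    text_chunks = [["hi", "there"], ["how", "are"], ["you"]]
    for out in synthesize_incrementally(text_chunks, toy_synth):
        print(len(out), "frames")
```

Bounding the context keeps per-chunk latency roughly constant regardless of how much text has already been spoken, at the cost of losing longer-range prosodic context.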
- Download the pretrained TTS model and vocoder from https://zenodo.org/record/5498896
- Unzip the file.
- Place the unzipped files like this:
incremental_tts
├── exp
│   ├── stats
│   │   └── train
│   │       ├── energy_stats.npz
│   │       ├── energy_stats.npz
│   │       └── energy_stats.npz
│   └── tts
│       ├── config.yaml
│       └── train.total_count.ave_10best.pth
├── gan_tts.py
└── incremental_tts.py
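As a quick sanity check of the layout above, a small script along these lines can report which files are not yet in place. This is only a sketch: the `EXPECTED` list is copied from the tree as shown, `missing_files` is a hypothetical helper, and the script assumes it is run from inside the incremental_tts directory.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree in this README.
EXPECTED = [
    "exp/stats/train/energy_stats.npz",
    "exp/tts/config.yaml",
    "exp/tts/train.total_count.ave_10best.pth",
    "gan_tts.py",
    "incremental_tts.py",
]


def missing_files(root: str) -> List[str]:
    """Return the expected paths that do not exist as files under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are in place.")
```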
- Install Anaconda.
- Create an Anaconda environment (recommended Python version: 3.7.4).
- Install the Python requirements in the Anaconda environment:
  - torch (CUDA build, not the CPU-only build)
  - numpy
  - espnet2
  - pyaudio
- Run it with: python incremental_tts.py
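If the run fails at import time, it can help to confirm that the requirements listed above are visible in the active environment. The sketch below is not part of this repository; `missing_modules` is a hypothetical helper, and the module names are assumed to match the requirement names given above. It also checks for a usable CUDA device, since the CUDA build of torch is required.

```python
import importlib.util
from typing import Iterable, List

# Import names assumed to match the requirements listed in this README.
REQUIRED_MODULES = ["torch", "numpy", "espnet2", "pyaudio"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules were found.")

    if "torch" not in missing:
        import torch  # imported lazily so the check still runs without torch

        if torch.cuda.is_available():
            print("CUDA device detected.")
        else:
            print("No CUDA device detected; a CUDA build of torch and a GPU are required.")
```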