Incremental text-to-speech

A Python implementation of incremental text-to-speech using FastSpeech2.

How is it different from previous work?

It uses a non-autoregressive text-to-speech model (FastSpeech2 instead of Tacotron or Transformer-TTS).
It uses a simple context-discard algorithm for speed-up.
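
This README does not describe the context-discard algorithm in detail. The sketch below is only one plausible reading, assuming that "context discard" means feeding the model a bounded window of recent words instead of the full sentence prefix, so the cost of each incremental step stays roughly constant. The names `context_windows` and `max_context` are hypothetical and are not taken from incremental_tts.py.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple


def context_windows(
    words: Iterable[str], max_context: int = 2
) -> Iterator[Tuple[List[str], str]]:
    """Yield (context, new_word) pairs for incremental synthesis.

    Assumption: only the last `max_context` words are kept as context;
    older words are discarded before the next step.
    """
    # A bounded deque drops the oldest word automatically once it is full.
    history = deque(maxlen=max_context)
    for word in words:
        yield list(history), word
        history.append(word)


if __name__ == "__main__":
    for context, word in context_windows("this is a short test".split()):
        print(context, "->", word)
```

In a real pipeline, each (context, new_word) pair would be passed to the TTS model, and only the audio for the new word would be played.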

How to use?

Download the pretrained TTS model and vocoder from https://zenodo.org/record/5498896
Unzip the downloaded file.
Place the unzipped files like this:

incremental_tts
├── exp
│    ├── stats
│    │    ├── train
│    │    ├── energy_stats.npz
│    │    ├── energy_stats.npz
│    │    └── energy_stats.npz
│    └── tts
│         ├── config.yaml
│         └── train.total_count.ave_10best.pth
├── gan_tts.py
└── incremental_tts.py
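
To catch a misplaced file before running anything, the layout above can be checked with a few lines of Python. This is a minimal sketch: the helper `missing_paths` and the `EXPECTED` list are illustrative, not part of this repository, and the check only covers paths that are unambiguous in the tree above.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to incremental_tts/.
EXPECTED = [
    "exp/stats",
    "exp/tts/config.yaml",
    "exp/tts/train.total_count.ave_10best.pth",
    "gan_tts.py",
    "incremental_tts.py",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("."):
        print("missing:", rel)
```

Run it from inside the incremental_tts directory; no output means every listed path was found.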

Install Anaconda.
Create an Anaconda environment (recommended Python version: 3.7.4).
Install all Python requirements in the Anaconda environment:

torch (CUDA build; the CPU-only build will not work)
numpy
espnet2
pyaudio
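
A quick way to confirm the environment is ready is to check that each requirement can be imported and that PyTorch can see a GPU. The sketch below assumes the import names match the list above (`torch`, `numpy`, `espnet2`, `pyaudio`). The helpers `missing_modules` and `cuda_available` are illustrative and not part of this repository.

```python
import importlib.util

# Import names for the requirements listed above.
REQUIRED = ["torch", "numpy", "espnet2", "pyaudio"]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available():
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check works without torch installed

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("missing modules:", missing_modules())
    print("CUDA available:", cuda_available())
```

If the first line prints an empty list and the second prints True, the requirements above are satisfied.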

Then run the script: python incremental_tts.py
