TatarTTS Dataset

TatarTTS is an open-source text-to-speech dataset for the Tatar language. The dataset comprises ~70 hours of transcribed audio recordings, featuring two professional speakers (one male and one female).

Paper on TechRxiv: TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language

Paper on IEEE: TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language

Setup and Requirements

We employed the Piper text-to-speech system to train TTS models on our dataset.

sudo apt-get install python3-dev 
git clone https://github.com/rhasspy/piper.git
cd piper/src/python
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools
pip3 install -e .

Please check the installation guide for more information.
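As a quick sanity check (a minimal sketch, not part of the official Piper instructions), you can verify from inside the activated virtual environment that the modules invoked in the steps below are importable:

# sanity_check.py -- run inside the activated .venv.
# Reports whether the editable install exposes the piper_train modules
# used in the preprocessing, training, and export steps below.
import importlib.util

for module in ("piper_train", "piper_train.preprocess", "piper_train.export_onnx"):
    try:
        found = importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        found = False
    print(f"{module}: {'found' if found else 'MISSING'}")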

Downloading the Dataset

THE DOWNLOAD LINK WILL BE AVAILABLE HERE SOON. After downloading the dataset, unzip it inside the piper/src/python/ directory. The dataset follows the LJSpeech format.

TatarTTS
|-male
  |-wav
    |-0.wav
    |-1.wav
    |-2.wav
    ...
  |-metadata.csv
|-female
  |-wav
    |-0.wav
    |-1.wav
    |-2.wav
    ...
  |-metadata.csv
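
Before preprocessing, it can help to confirm that the unzipped data matches this layout. The sketch below is only an illustration and assumes the standard LJSpeech convention: metadata.csv holds pipe-separated rows whose first field is the utterance id, and each id corresponds to wav/<id>.wav.

# check_dataset.py -- minimal sketch; run from piper/src/python after unzipping.
# Assumes the standard LJSpeech convention: pipe-separated metadata.csv rows
# whose first field is the utterance id, stored as wav/<id>.wav.
import csv
from pathlib import Path

def check_speaker(speaker_dir: str) -> None:
    root = Path(speaker_dir)
    total = missing = 0
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if not row:
                continue
            total += 1
            if not (root / "wav" / f"{row[0]}.wav").exists():
                missing += 1
    print(f"{speaker_dir}: {total} utterances, {missing} missing wav files")

for speaker in ("TatarTTS/male", "TatarTTS/female"):
    check_speaker(speaker)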

Pre-processing

cd piper/src/python
mkdir -p TatarTTS_piper/male TatarTTS_piper/female

Pre-processing the male speaker dataset

python3 -m piper_train.preprocess \
  --language tt \
  --input-dir TatarTTS/male \
  --output-dir TatarTTS_piper/male \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050

Pre-processing the female speaker dataset

python3 -m piper_train.preprocess \
  --language tt \
  --input-dir TatarTTS/female \
  --output-dir TatarTTS_piper/female \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050
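
The two invocations differ only in the speaker directory, so they can also be driven from a small wrapper. The sketch below simply replays the two commands above, once per speaker; run it from piper/src/python with the virtual environment activated.

# preprocess_all.py -- minimal sketch that runs the two preprocessing
# commands above, once per speaker; run from piper/src/python.
import subprocess

for speaker in ("male", "female"):
    subprocess.run(
        [
            "python3", "-m", "piper_train.preprocess",
            "--language", "tt",
            "--input-dir", f"TatarTTS/{speaker}",
            "--output-dir", f"TatarTTS_piper/{speaker}",
            "--dataset-format", "ljspeech",
            "--single-speaker",
            "--sample-rate", "22050",
        ],
        check=True,
    )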

Training

cd piper/src/python

Training on the male speaker dataset

python3 -m piper_train \
    --dataset-dir TatarTTS_piper/male \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 32 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 1000 \
    --checkpoint-epochs 1 \
    --precision 32

Training on the female speaker dataset

python3 -m piper_train \
    --dataset-dir TatarTTS_piper/female \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 32 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 1000 \
    --checkpoint-epochs 1 \
    --precision 32
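
The export step below expects a path to a .ckpt file. The sketch that follows is an assumption about where those files end up: with PyTorch Lightning's default layout they are written under <dataset-dir>/lightning_logs/version_*/checkpoints/, and the snippet picks the most recent one per speaker (adjust the glob pattern if your run uses a different layout).

# latest_checkpoint.py -- minimal sketch, assuming the default PyTorch
# Lightning layout: <dataset-dir>/lightning_logs/version_*/checkpoints/*.ckpt
from pathlib import Path

def latest_checkpoint(dataset_dir: str) -> Path:
    ckpts = sorted(
        Path(dataset_dir).glob("lightning_logs/version_*/checkpoints/*.ckpt"),
        key=lambda p: p.stat().st_mtime,
    )
    if not ckpts:
        raise FileNotFoundError(f"no checkpoints found under {dataset_dir}")
    return ckpts[-1]

for speaker in ("male", "female"):
    print(speaker, latest_checkpoint(f"TatarTTS_piper/{speaker}"))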

Exporting a Model

python3 -m piper_train.export_onnx \
    /path/to/model.ckpt \
    /path/to/model.onnx
    
cp /path/to/training_dir/config.json \
   /path/to/model.onnx.json
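
To verify the export, a small check with onnxruntime (an extra dependency, not part of the Piper tooling: pip install onnxruntime) can load the .onnx file and print its input and output tensors:

# inspect_onnx.py -- minimal sketch; requires onnxruntime (pip install onnxruntime).
# Loads the exported model and prints its input/output signature.
import onnxruntime as ort

session = ort.InferenceSession("/path/to/model.onnx",
                               providers=["CPUExecutionProvider"])
for tensor in session.get_inputs():
    print("input :", tensor.name, tensor.shape, tensor.type)
for tensor in session.get_outputs():
    print("output:", tensor.name, tensor.shape, tensor.type)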

Speech Synthesis with Pre-trained Models

Download and unzip pre-trained models (.onnx, .ckpt) for both speakers from Google Drive.

CLI

cd models
echo 'Аның чыраенда тәвәккәллек чагыла иде.' |   ./piper --model male/male.onnx --config male/config.json --output_file welcome.wav
echo 'Аның чыраенда тәвәккәллек чагыла иде.' |   ./piper --model female/female.onnx --config female/config.json --output_file welcome.wav
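
To synthesize many sentences, the echo | ./piper call just above can be wrapped in a short script. The sketch below is only an illustration (the sentence list and output file names are made up); run it from the models directory.

# synthesize_batch.py -- minimal sketch wrapping the CLI invocation above;
# run from the models/ directory. The sentence list here is illustrative.
import subprocess

sentences = [
    "Аның чыраенда тәвәккәллек чагыла иде.",
]

for i, text in enumerate(sentences):
    subprocess.run(
        ["./piper",
         "--model", "male/male.onnx",
         "--config", "male/config.json",
         "--output_file", f"out_{i}.wav"],
        input=text.encode("utf-8"),
        check=True,
    )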

Python

cd piper/src/python_run
python3 piper --model /path/to/model.onnx --config /path/to/config.json --output-file welcome.wav

Authors and Citation

The project was developed as an academic collaboration between ISSAI and the Institute of Applied Semiotics of the Tatarstan Academy of Sciences.

@INPROCEEDINGS{10463261,
  author={Orel, Daniil and Kuzdeuov, Askat and Gilmullin, Rinat and Khakimov, Bulat and Varol, Huseyin Atakan},
  booktitle={2024 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)}, 
  title={TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language}, 
  year={2024},
  volume={},
  number={},
  pages={717-721},
  doi={10.1109/ICAIIC60209.2024.10463261}}

References

  1. Piper: https://github.com/rhasspy/piper
  2. Pre-processing, training, and exporting: https://github.com/rhasspy/piper/blob/master/TRAINING.md