This repository provides a dataset and a text-to-speech (TTS) model for the paper *KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis*.
| Emotion | # recordings | F1 Total (h) | F1 Mean (s) | F1 Min (s) | F1 Max (s) | M1 Total (h) | M1 Mean (s) | M1 Min (s) | M1 Max (s) | M2 Total (h) | M2 Mean (s) | M2 Min (s) | M2 Max (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| neutral | 9,385 | 5.85 | 5.03 | 1.03 | 15.51 | 4.54 | 4.77 | 0.84 | 16.18 | 2.30 | 4.69 | 1.02 | 15.81 |
| angry | 9,059 | 5.44 | 4.78 | 1.11 | 14.09 | 4.27 | 4.75 | 0.93 | 17.03 | 2.31 | 4.81 | 1.02 | 15.67 |
| happy | 9,059 | 5.77 | 5.09 | 1.07 | 15.33 | 4.43 | 4.85 | 0.98 | 15.56 | 2.23 | 4.74 | 1.09 | 15.25 |
| sad | 8,980 | 5.60 | 5.04 | 1.11 | 15.21 | 4.62 | 5.13 | 0.72 | 18.00 | 2.65 | 5.52 | 1.16 | 18.16 |
| scared | 9,098 | 5.66 | 4.96 | 1.00 | 15.67 | 4.13 | 4.51 | 0.65 | 16.11 | 2.34 | 4.96 | 1.07 | 14.49 |
| surprised | 9,179 | 5.91 | 5.09 | 1.09 | 14.56 | 4.52 | 4.92 | 0.81 | 17.67 | 2.28 | 4.87 | 1.04 | 15.81 |

Columns F1, M1, and M2 refer to the three narrators (one female, two male); Total is per-emotion recorded hours, and Mean/Min/Max are per-recording durations in seconds.
| Narrator | # recordings | Duration (h) |
|---|---|---|
| F1 | 24,656 | 34.23 |
| M1 | 19,802 | 26.51 |
| M2 | 10,302 | 14.11 |
| Total | 54,760 | 74.85 |
First, build the `monotonic_align` code:

```bash
cd model/monotonic_align; python setup.py build_ext --inplace; cd ../..
```

Note: the Python version used is 3.9.13.
Next, download the KazEmoTTS dataset and convert it to the format used in `filelists/all_spk` by running `data_preparation.py`:

```bash
python data_preparation.py -d <path_to_KazEmoTTS_dataset>
```
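For instance, assuming the dataset has been extracted to `/data/KazEmoTTS` (the path is illustrative), the call would look like this:

```bash
# Prepare the filelists from a local copy of the dataset (path is illustrative)
python data_preparation.py -d /data/KazEmoTTS
```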
To start training, specify the path to the model configuration (`configs/train_grad.json`), a directory for checkpoints (typically under `logs/train_logs`), and the GPU you will be using:

```bash
CUDA_VISIBLE_DEVICES=YOUR_GPU_ID
python train_EMA.py -c <configs/train_grad.json> -m <checkpoint>
```
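As an example, a run on GPU 0 that reads the provided configuration and writes checkpoints under `logs/train_logs` might look as follows (the checkpoint directory name is illustrative):

```bash
# Train with EMA on GPU 0; checkpoints are written to the directory passed via -m
CUDA_VISIBLE_DEVICES=0 python train_EMA.py -c configs/train_grad.json -m logs/train_logs
```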
If you intend to use a pre-trained model, download the necessary checkpoints for both the TTS model (based on Grad-TTS) and the HiFi-GAN vocoder.
To conduct inference, follow these steps:
- Create a text file containing the sentences you wish to synthesize, such as `filelists/inference_generated.txt`.
- Format each line of the `txt` file as `text|emotion id|speaker id` (see the sketch after this list).
- Adjust the path to the HiFi-GAN checkpoint in `inference_EMA.py`.
- Set the classifier guidance level to 100 using the `-g` parameter.
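A minimal sketch of what an input file might contain, assuming integer emotion and speaker IDs (the sentences and the exact ID mapping shown here are illustrative; the real mapping is defined during data preparation):

```
Бүгін ауа райы өте жақсы.|3|1
Мен сені көргеніме қуаныштымын.|2|0
```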
```bash
python inference_EMA.py -c <config> -m <checkpoint> -t <number-of-timesteps> -g <guidance-level> -f <path-for-text> -r <path-to-save-audios>
```
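For illustration, the following command synthesizes the sentences in `filelists/inference_generated.txt` with a guidance level of 100 (the timestep count, checkpoint path, and output directory are illustrative):

```bash
# Synthesize audio for each line of the input file; outputs are saved to the -r directory
CUDA_VISIBLE_DEVICES=0 python inference_EMA.py -c configs/train_grad.json -m logs/train_logs/<checkpoint> -t 100 -g 100 -f filelists/inference_generated.txt -r synthesized_samples
```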
You can listen to some synthesized samples here.
If you use our dataset and/or model in your work, please cite our paper. Proper citation acknowledges the authors' efforts and upholds academic integrity.
```bibtex
@misc{abilbekov2024kazemotts,
  title={KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis},
  author={Adal Abilbekov and Saida Mussakhojayeva and Rustem Yeshpanov and Huseyin Atakan Varol},
  year={2024},
  eprint={2404.01033},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```