Kazakh-Speech-Commands-Dataset

Preprint

Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Presentation at the 3rd International Conference on Robotics, Automation, and Artificial Intelligence (RAAI 2023)

Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Synthetic speech commands generation

In this project, we used Piper to generate synthetic speech commands. Piper is a fast, local neural text-to-speech system that provides five voices for the Kazakh language. A list of available models for other languages and the corresponding demos are available from the Piper project. To generate synthetic speech commands for Kazakh, download and unzip the model from Google Drive. Then open the synthetic_data_generation.ipynb notebook, update the path to the model, and run all cells.
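
As a rough illustration of the generation step, the sketch below drives the Piper command-line tool from Python. It is not the code from synthetic_data_generation.ipynb. The helper names (`build_piper_command`, `synthesize`), the model path, the output folder, and the two example command words are all assumptions made for this example. It also assumes the `piper` executable is on your PATH and that the chosen voice accepts a `--speaker` index.

```python
import subprocess
from pathlib import Path


def build_piper_command(model_path, output_path, speaker=None, length_scale=None):
    """Build the argument list for one Piper CLI call (hypothetical helper)."""
    cmd = ["piper", "--model", str(model_path), "--output_file", str(output_path)]
    if speaker is not None:
        # Speaker index, only meaningful for multi-speaker voices.
        cmd += ["--speaker", str(speaker)]
    if length_scale is not None:
        # Values above 1.0 slow the speech down, values below 1.0 speed it up.
        cmd += ["--length_scale", str(length_scale)]
    return cmd


def synthesize(text, model_path, output_path, **options):
    """Send `text` to Piper on stdin and write a WAV file to `output_path`."""
    cmd = build_piper_command(model_path, output_path, **options)
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)


# Placeholder values: replace with your own model path and command list.
MODEL_PATH = Path("path/to/kazakh_voice.onnx")  # assumed location
COMMANDS = ["алға", "артқа"]                    # illustrative command words
SPEAKERS = [0, 1]                               # assumed speaker indices

# Plan one utterance per (command, speaker) pair without running Piper yet.
jobs = [
    (word, build_piper_command(MODEL_PATH, Path("synthetic") / f"{word}_{spk}.wav", speaker=spk))
    for word in COMMANDS
    for spk in SPEAKERS
]
```

Calling `synthesize(word, MODEL_PATH, output_path, speaker=spk)` for each planned job would then produce the WAV files, provided the output folder exists. Varying the speaker and `length_scale` is one simple way to add diversity to the synthetic set.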

Speech corpus scraping

To automatically extract speech commands from a large-scale speech corpus, we used the Vosk Speech Recognition Toolkit. The example code is given in the speech_corpus_scraping.ipynb notebook.
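
The idea behind the scraping step can be sketched as follows: run Vosk with word-level timestamps enabled, then keep the time spans of recognized words that match a command. This is a minimal sketch, not the code from speech_corpus_scraping.ipynb. The function names, the confidence threshold, and the padding value are assumptions. The audio is assumed to be a mono 16-bit PCM WAV file, which is what the Vosk examples expect.

```python
import json
import wave


def recognize_words(wav_path, model_dir):
    """Return Vosk word entries (word, start, end, conf) for one WAV file."""
    # Imported lazily so the rest of the sketch runs without Vosk installed.
    from vosk import Model, KaldiRecognizer

    words = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        rec.SetWords(True)  # ask for per-word timestamps and confidences
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):
                words += json.loads(rec.Result()).get("result", [])
        words += json.loads(rec.FinalResult()).get("result", [])
    return words


def find_command_segments(words, commands, min_conf=0.9, padding=0.1):
    """Select (word, start, end) spans for confidently recognized commands."""
    segments = []
    for entry in words:
        if entry["word"] in commands and entry.get("conf", 0.0) >= min_conf:
            start = max(0.0, entry["start"] - padding)  # do not go below zero
            segments.append((entry["word"], start, entry["end"] + padding))
    return segments


# Hand-written example in the shape of Vosk's "result" list.
example_words = [
    {"word": "алға", "start": 1.0, "end": 1.5, "conf": 0.95},
    {"word": "сөз", "start": 2.0, "end": 2.3, "conf": 0.99},
    {"word": "алға", "start": 3.0, "end": 3.4, "conf": 0.50},
]
segments = find_command_segments(example_words, {"алға"})
```

Each returned span can then be cut from the source audio and saved as a separate clip. The padding keeps a short margin around the word, and the confidence threshold trades recall for cleaner labels.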

Data augmentation

To increase the dataset size further, you can apply audio augmentation methods to both the synthetic dataset and the dataset scraped from the speech corpus. The details can be found in the data_augmentation.ipynb notebook.
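
To give a feel for what audio augmentation involves, here is a small NumPy sketch of three common transforms: additive noise at a target signal-to-noise ratio, a random time shift, and a random gain. These transforms and their parameter values are assumptions chosen for illustration. The methods actually used are the ones in data_augmentation.ipynb.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so the result has roughly `snr_db` dB SNR."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_shift(signal, max_shift, rng):
    """Shift by a random number of samples, padding the gap with zeros."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(signal)
    if shift > 0:
        shifted[shift:] = signal[:-shift]
    elif shift < 0:
        shifted[:shift] = signal[-shift:]
    else:
        shifted[:] = signal
    return shifted


def random_gain(signal, low_db, high_db, rng):
    """Scale the signal by a random gain drawn uniformly in decibels."""
    gain_db = rng.uniform(low_db, high_db)
    return signal * (10 ** (gain_db / 20))


# Stand-in for a one-second, 16 kHz command clip: a 440 Hz tone.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = 0.5 * np.sin(2 * np.pi * 440 * t)

augmented = random_gain(time_shift(add_noise(clean, 20, rng), 1600, rng), -6, 6, rng)
```

Applying a few such transforms with different random seeds turns each clip into several variants while keeping its label unchanged.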

Model training, validation, and testing

The details of training, validation, and testing of the model can be found in the Keyword-MLP directory.
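
Before training, the clips need to be divided into training, validation, and testing sets. The sketch below shows one common way to do this deterministically, by hashing each file name into a bucket so that a clip always lands in the same set. It is an assumption for illustration only: the function name and the 10% validation and 10% testing shares are hypothetical, and the split actually used is defined in the Keyword-MLP directory.

```python
import hashlib


def assign_split(filename, val_pct=10, test_pct=10):
    """Map a file name to 'training', 'validation', or 'testing'."""
    # A stable hash keeps the assignment fixed across runs and machines.
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # integer in the range 0..99
    if bucket < val_pct:
        return "validation"
    if bucket < val_pct + test_pct:
        return "testing"
    return "training"


# Illustrative file names only.
splits = {name: assign_split(name) for name in ["clip_001.wav", "clip_002.wav", "clip_003.wav"]}
```

Because the assignment depends only on the file name, adding new clips later does not move existing clips between sets.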

Tutorials

Video tutorials for each step of the project are available on our YouTube channel.

Citation

@article{Kuzdeuov2023,
author = "Askat Kuzdeuov and Shakhizat Nurgaliyev and Diana Turmakhan and Nurkhan Laiyk and Huseyin Atakan Varol",
title = "{Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need}",
year = "2023",
month = "5",
url = "https://www.techrxiv.org/articles/preprint/Speech_Command_Recognition_Text-to-Speech_and_Speech_Corpus_Scraping_Are_All_You_Need/22717657",
doi = "10.36227/techrxiv.22717657.v1"
}
