IS2AI/Kazakh-Speech-Commands-Dataset

Kazakh-Speech-Commands-Dataset

Preprint

Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Presentation at the 3rd International Conference on Robotics, Automation, and Artificial Intelligence (RAAI 2023)

Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need

Synthetic speech commands generation

In this project, we used Piper to generate synthetic speech commands. Piper is a fast, local neural text-to-speech system that provides five voices for the Kazakh language. Models for other languages and the corresponding demos are available from the Piper project. To generate synthetic speech commands for Kazakh, download and unzip the model from Google Drive. Then open the synthetic_data_generation.ipynb notebook, update the path to the model, and run all cells.
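For readers who want to see the idea outside the notebook, the following is a minimal sketch of driving the Piper command-line tool from Python. It is not the code from synthetic_data_generation.ipynb. It assumes that a `piper` executable is on the PATH and that it accepts text on standard input together with the `--model` and `--output_file` options. The helper names, the model path, and the output file naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path


def build_piper_command(model_path, output_path):
    """Build the argument list for one Piper call (assumes `piper` is on PATH)."""
    return ["piper", "--model", str(model_path), "--output_file", str(output_path)]


def plan_outputs(commands, out_dir):
    """Pair each command text with a numbered WAV path (hypothetical naming scheme)."""
    out_dir = Path(out_dir)
    return [(text, str(out_dir / f"{i:03d}.wav")) for i, text in enumerate(commands)]


def synthesize_all(commands, model_path, out_dir):
    """Synthesize every command with one voice; Piper reads the text from stdin."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for text, wav_path in plan_outputs(commands, out_dir):
        subprocess.run(
            build_piper_command(model_path, wav_path),
            input=text.encode("utf-8"),
            check=True,  # raise if Piper exits with an error
        )
```

A call such as `synthesize_all(["алға", "артқа"], "path/to/kazakh_voice.onnx", "synthetic/voice_1")` would write one WAV file per command. The two words are placeholders, not the dataset's command list, and the model path must point to the unzipped model.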

Speech corpus scraping

To automatically extract speech commands from a large-scale speech corpus, we used the Vosk Speech Recognition Toolkit. Example code is given in the speech_corpus_scraping.ipynb notebook.
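The scraping idea can be sketched as follows: transcribe a recording with word-level timestamps, keep the words that match a command, and cut those time spans out of the audio. This is an illustrative sketch rather than the notebook's code. It assumes mono 16-bit PCM WAV input and a downloaded Vosk model directory. The function names and the padding value are hypothetical.

```python
import json
import wave


def recognize_words(wav_path, model_dir):
    """Return Vosk word entries (dicts with "word", "start", "end") for one WAV file."""
    from vosk import Model, KaldiRecognizer  # imported here so the helpers below work without Vosk

    words = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        rec.SetWords(True)  # request per-word timestamps
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):
                words += json.loads(rec.Result()).get("result", [])
        words += json.loads(rec.FinalResult()).get("result", [])
    return words


def find_command_segments(words, commands, padding=0.1):
    """Select recognized words that are commands and pad their time spans (seconds)."""
    segments = []
    for w in words:
        if w["word"] in commands:
            start = max(0.0, w["start"] - padding)
            segments.append((w["word"], start, w["end"] + padding))
    return segments


def cut_segment(wav_path, start, end, out_path):
    """Copy the audio between start and end (seconds) into a new WAV file."""
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        src.setpos(int(start * rate))
        frames = src.readframes(int((end - start) * rate))
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is corrected on close
        dst.writeframes(frames)
```

The padding keeps a little context around each word because recognizer timestamps can clip the start or end of a word. A suitable value depends on the corpus.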

Data augmentation

To increase the dataset size further, you can apply audio augmentation methods to both the synthetic dataset and the dataset scraped from the speech corpus. The details can be found in the data_augmentation.ipynb notebook.
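As a rough illustration of what audio augmentation can look like, the sketch below implements three common waveform transforms with NumPy: additive Gaussian noise at a target signal-to-noise ratio, a random time shift, and a gain change. These are generic examples chosen for illustration. The methods and parameters actually used are those in data_augmentation.ipynb and may differ. All function names are hypothetical, and signals are assumed to be floating-point arrays.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that the result has roughly the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_shift(signal, max_shift, rng):
    """Shift the signal by a random number of samples in [-max_shift, max_shift], zero-padded."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(signal)
    if shift > 0:
        shifted[shift:] = signal[:-shift]
    elif shift < 0:
        shifted[:shift] = signal[-shift:]
    else:
        shifted[:] = signal
    return shifted


def change_gain(signal, gain_db):
    """Scale the amplitude by a gain given in decibels."""
    return signal * (10 ** (gain_db / 20))
```

Passing a seeded generator such as `np.random.default_rng(0)` makes the augmented set reproducible.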

Model training, validation, and testing

The details of training, validation, and testing of the model can be found in the Keyword-MLP directory.

Tutorials

Video tutorials for each step of the project are available on our YouTube channel.

Citation

@article{Kuzdeuov2023,
author = "Askat Kuzdeuov and Shakhizat Nurgaliyev and Diana Turmakhan and Nurkhan Laiyk and Huseyin Atakan Varol",
title = "{Speech Command Recognition: Text-to-Speech and Speech Corpus Scraping Are All You Need}",
year = "2023",
month = "5",
url = "https://www.techrxiv.org/articles/preprint/Speech_Command_Recognition_Text-to-Speech_and_Speech_Corpus_Scraping_Are_All_You_Need/22717657",
doi = "10.36227/techrxiv.22717657.v1"
}
