Skip to content

A Speech To Text in Kinyarwanda train using coqui STT

Notifications You must be signed in to change notification settings

IMdtman/STT-Kinyarwanda

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

Kinyarwanda STT

Introduction

This README is a quick-start guide to training or finetuning an STT model using the Coqui toolkit on the Kinyarwanda speech data.

Dockerfile Setup (Recommended)

To avoid problems setting up Coqui STT on your environment and compatibility issues, we recommend pulling or building the Coqui STT dockerfile from the stt-train:latest image:

$ git clone --recurse-submodules https://github.com/coqui-ai/STT
$ cd STT
$ docker build -f Dockerfile.train . -t stt-train:latest
$ docker run -it stt-train:latest

Data Preprocessing

Data Formatting

After downloading and extracting the dataset, we found the following contents:

  • .tsv files, containing metadata such as text transcripts
  • .mp3 audio files, located in the clips directory

Coqui STT cannot directly work with Common Voice data, so we need the Coqui importer script bin/import_cv2.py to format the data correctly:

$ bin/import_cv2.py --validate_label_locale /path/to/validate_locale_rw.py /path/to/extracted/common-voice/archive

The importer script above would create .csv files from the .tsv files, and .wav files from the .mp3 files.

The --validate_label_locale flag is optional but needed for data cleaning. The details on the input to the flag can be found in the data cleaning section below.

Data Cleaning

  1. As a way to clean the data, we need to validate the text. It checks a sentence to see if it can be converted and if possible normalizes the encoding, removes special characters, etc. For this we use the commonvoice-utils tool to clean the text for Kinyarwanda (rw).

The script below is passed as an argument to the --validate_label_locale flag in the importer command above

#validate_locale_rw.py
from cvutils import Validator

def validate_label(label):
	v = Validator("rw") #rw - locale for Kinyarwanda. You should change accordingly.
	return v.validate(label)
  1. We also need to ensure that each audio/input is longer than the transcript/output. This step is important so as not to run into training errors. As a result, we remove data from (train, dev, and test CSVs) that don’t meet this criterion.
$ python3 /path/to/remove_outliers.py /path/to/train.csv --clips_dir /path/to/clips

Training

Since we would be training (and validating) our model within the docker environment we created initially, we need to first create and run a container:

$ docker run -it --name sttcontainer -v ~/data:/code/data/host_data --gpus all stt-train:latest

The above command does the following:

  • creates a container named sttcontainer
  • bind mounts the /data directory on the host environment to the /code/data/host_data on the docker environment.
  • gives the docker environment access to all the host GPUs

(Assuming we are within the docker environment:)

$ mkdir data/host_data/tensorboard # directory to save the training (loss) results
$ python -m coqui_stt_training.train \
	--checkpoint_dir data/host_data/jan-8-2021-best-kinya-deepspeech \
	--alphabet_config_path data/host_data/kinyarwanda_alphabet.txt \
	--n_hidden 2048 \
	--train_cudnn true \
	--train_files data/host_data/misc/lg-rw-oct2021/rw/clips/train.csv \
	--dev_files data/host_data/misc/lg-rw-oct2021/rw/clips/dev.csv \
	--test_files data/host_data/misc/lg-rw-oct2021/rw/clips/test.csv \
	--epochs 20 \
	--train_batch_size 128 \
	--dev_batch_size 128 \
	--test_batch_size 128 \
	--summary_dir data/host_data/tensorboard \
    	--reduce_lr_on_plateau true

Testing

By default, the trained model (after training) is tested on the test data at the end of the specified epoch, however, you can use a previously saved model on some test data:

$ python -m coqui_stt_training.train \
	--checkpoint_dir data/host_data/jan-8-2021-best-kinya-deepspeech \
	--alphabet_config_path data/host_data/kinyarwanda_alphabet.txt \
	--test_files data/host_data/misc/lg-rw-oct2021/rw/clips/test.csv \
	--test_batch_size 128

About

A Speech To Text in Kinyarwanda train using coqui STT

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%