
# MoChA Speech Recognition

This repository contains an implementation of a speech recognition system using the Monotonic Chunkwise Attention (MoChA) mechanism. The goal of this project is to provide an end-to-end automatic speech recognition (ASR) system that can transcribe speech into text.

## Table of Contents

- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Usage](#usage)
- [License](#license)

## Installation

Before getting started, make sure you have the following prerequisites installed:

- Python 3.9
- TensorFlow 2
- NumPy
- Librosa
- tqdm

To install the required packages, you can run:

```bash
pip install -r requirements.txt
```
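A `requirements.txt` matching the prerequisites above might look like the following (the entries and version bounds here are illustrative, not the repository's exact pins):

```text
tensorflow>=2.0
numpy
librosa
tqdm
```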

## Data Preparation

1. Download your desired speech recognition dataset (e.g., LibriSpeech, CommonVoice) and organize it into the data/raw_data directory, with separate subdirectories for the train, dev, and test sets.
2. Run the preprocessing script to convert the raw audio files into suitable features (e.g., MFCCs, log-mel filterbank energies) and store them in the data/preprocessed_data directory; a rough sketch of what such a script does follows below:

```bash
python src/data_utils.py
```
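As a rough illustration, a preprocessing script along these lines computes log-mel filterbank features with Librosa and saves them as NumPy arrays. The file layout, sample rate, and mel parameters below are assumptions for the sketch, not the exact behavior of `src/data_utils.py`:

```python
from pathlib import Path

import librosa
import numpy as np
from tqdm import tqdm

RAW_DIR = Path("data/raw_data")            # assumed input layout: raw_data/<split>/*.wav
OUT_DIR = Path("data/preprocessed_data")

def extract_logmel(wav_path, sr=16000, n_mels=80):
    """Load audio and compute log-mel filterbank energies, shape (frames, n_mels)."""
    audio, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=audio, sr=sr, n_fft=400, hop_length=160, n_mels=n_mels
    )
    return librosa.power_to_db(mel).T.astype(np.float32)

for split in ("train", "dev", "test"):
    out_split = OUT_DIR / split
    out_split.mkdir(parents=True, exist_ok=True)
    for wav in tqdm(sorted((RAW_DIR / split).rglob("*.wav")), desc=split):
        np.save(out_split / (wav.stem + ".npy"), extract_logmel(wav))
```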

## Training

To train the MoChA ASR model, run the following command:

```bash
python src/train.py
```

This will train the model using the preprocessed data and save the best-performing model in the models/saved_models directory.
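At its core, MoChA makes attention move monotonically over the encoder states: at each output step the decoder scans forward from the previous attention boundary, makes a hard stop/select decision, and then attends softly within a small fixed-size chunk ending at the selected frame. The NumPy sketch below shows this test-time attention rule schematically; the energy vectors stand in for the model's learned scoring functions, and this is not the repository's training-time soft recursion:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mocha_attend(monotonic_energy, chunk_energy, prev_boundary, chunk_size=4):
    """One decoder step of MoChA test-time attention.

    monotonic_energy: (T,) array of energies deciding where to stop scanning.
    chunk_energy:     (T,) array of energies scored within the selected chunk.
    prev_boundary:    encoder index selected at the previous output step.
    Returns (attention_weights, new_boundary); weights are zero outside the chunk.
    """
    T = len(monotonic_energy)
    weights = np.zeros(T)
    for j in range(prev_boundary, T):            # scan strictly left to right
        if sigmoid(monotonic_energy[j]) >= 0.5:  # hard "select this frame" decision
            start = max(0, j - chunk_size + 1)   # fixed-width chunk ending at j
            e = chunk_energy[start : j + 1]
            e = np.exp(e - e.max())              # numerically stable softmax
            weights[start : j + 1] = e / e.sum()
            return weights, j
    return weights, prev_boundary                # no selection: empty context
```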

## Evaluation

To evaluate the trained model on the test dataset, run:

```bash
python src/evaluate.py
```

This will load the best-performing model from models/saved_models and calculate the performance metrics (e.g., Word Error Rate) on the test dataset.
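Word Error Rate is the Levenshtein (edit) distance between the reference and hypothesis word sequences, normalized by the reference length. A self-contained sketch of the metric (illustrative; `src/evaluate.py` may compute it differently):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# e.g., word_error_rate("the cat sat", "the cat sat down") == 1/3 (one insertion)
```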

## Usage

After training and evaluating the model, you can use it for your own speech recognition tasks by importing the MoChAASR class from src/model.py and loading the trained weights:

```python
from src.model import MoChAASR

# Instantiate the model and restore the trained weights.
model = MoChAASR()
model.load_weights('path/to/saved/model/weights')

# Transcribe a single audio file to text.
transcription = model.transcribe('path/to/audio/file')
```
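Here 'path/to/saved/model/weights' is a placeholder for the checkpoint that src/train.py writes under models/saved_models, and 'path/to/audio/file' should point to an audio file in a format the preprocessing pipeline can handle.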

## License

This project is licensed under the MIT License. See the LICENSE file for more details.