Speech Processing with Kaldi

Tutorial on using Kaldi for Dysarthric Speech Recognition and Speaker Recognition.

The data used is the TORGO database, provided free of charge by the University of Toronto:

http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html

The dysarthric speakers in TORGO have speech impairments caused by cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS).

Goals of this exercise

Build a Kaldi-based GMM-HMM acoustic model for speech recognition.
Improve recognition accuracy on impaired speech (e.g. through data augmentation and hyperparameter tuning).
Train a DNN-HMM acoustic model using the alignments from the GMM-HMM model.
Perform speaker identification/recognition via i-vectors and improve baseline results.

Sections

Part 1: Installation & Data Preparation
Part 2: Speech Recognition (acoustic and language model training)
- GMM-HMM acoustic model
Part 3: DNN-HMM acoustic model
Part 4: Speaker Recognition (using i-vectors)

Section Details

Part 1.1 Installation
- Kaldi
- The SRI Language Modeling Toolkit (SRILM)
- Sequitur Grapheme-to-Phoneme converter
- Intel MKL (Math Kernel Library)
Part 1.2 Data Preparation
- Audio data download
- Kaldi data files that we need to create ourselves
- Kaldi directory structure
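As a minimal sketch of the files we create ourselves, assuming utterance records of the form (speaker ID, utterance ID, wav path, transcript), the Python below writes the four core files of a Kaldi data directory: `wav.scp` (utterance ID to audio path), `text` (utterance ID to transcript), `utt2spk` and `spk2utt`. The function name `write_kaldi_data_dir` and the example IDs and paths are hypothetical; adapt them to wherever the TORGO audio was downloaded. Kaldi expects these files to be sorted, and starting each utterance ID with its speaker ID keeps `utt2spk` and `spk2utt` in a consistent order.

```python
import os
import tempfile
from collections import defaultdict


def write_kaldi_data_dir(out_dir, utterances):
    """Write wav.scp, text, utt2spk and spk2utt into out_dir.

    utterances: list of (speaker_id, utt_id, wav_path, transcript) tuples.
    Each utt_id is assumed to start with its speaker_id.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Kaldi wants every file sorted by its first field (C-locale order);
    # for ASCII IDs, Python's default string sort gives the same order.
    utts = sorted(utterances, key=lambda u: u[1])

    spk2utt = defaultdict(list)
    with open(os.path.join(out_dir, "wav.scp"), "w") as wav_f, \
         open(os.path.join(out_dir, "text"), "w") as text_f, \
         open(os.path.join(out_dir, "utt2spk"), "w") as u2s_f:
        for spk, utt, wav, transcript in utts:
            wav_f.write(f"{utt} {wav}\n")
            text_f.write(f"{utt} {transcript}\n")
            u2s_f.write(f"{utt} {spk}\n")
            spk2utt[spk].append(utt)

    # spk2utt is the inverse mapping: one line per speaker listing its utterances.
    with open(os.path.join(out_dir, "spk2utt"), "w") as s2u_f:
        for spk in sorted(spk2utt):
            s2u_f.write(f"{spk} {' '.join(spk2utt[spk])}\n")


if __name__ == "__main__":
    # Hypothetical records; real TORGO speaker IDs and paths will differ.
    demo = [
        ("F01", "F01-0002", "/path/to/torgo/F01/0002.wav", "YES"),
        ("F01", "F01-0001", "/path/to/torgo/F01/0001.wav", "NO"),
        ("M01", "M01-0001", "/path/to/torgo/M01/0001.wav", "STICK"),
    ]
    out = tempfile.mkdtemp()
    write_kaldi_data_dir(out, demo)
    print(open(os.path.join(out, "spk2utt")).read())
```

Once the files exist, Kaldi's `utils/validate_data_dir.sh` and `utils/fix_data_dir.sh` can be used to check and repair the directory.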
Part 2 Speech Recognition
- N-gram language model building
- MFCC extraction + CMVN (cepstral mean and variance normalization)
- GMM-HMM training
  - Monophone training
  - Triphone training
  - Delta + delta-delta training (appends first- and second-order time derivatives, i.e. dynamic coefficients, to the MFCC features)
  - Linear Discriminant Analysis + Maximum Likelihood Linear Transform (LDA-MLLT projects the features into a lower-dimensional, decorrelated space)
  - Speaker Adaptive Training (SAT performs speaker and noise normalization)
  - Alignment with feature-space Maximum Likelihood Linear Regression (fMLLR features are speaker-normalized features)
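The n-gram language model in Part 2 is built with SRILM, which applies smoothing and backoff. To show only what such a model estimates, here is a toy, unsmoothed maximum-likelihood bigram in Python; it is not SRILM's algorithm, and `train_bigram` is a hypothetical helper. Each sentence is padded with `<s>` and `</s>`, and P(w | prev) = count(prev, w) / count(prev).

```python
from collections import Counter


def train_bigram(sentences):
    """Unsmoothed maximum-likelihood bigram model.

    Returns a dict mapping (prev, word) -> P(word | prev).
    """
    history_counts = Counter()  # how often each token appears as a history
    pair_counts = Counter()     # how often each (prev, word) pair occurs
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            pair_counts[(prev, word)] += 1
    return {pair: count / history_counts[pair[0]]
            for pair, count in pair_counts.items()}


if __name__ == "__main__":
    lm = train_bigram(["YES", "NO", "YES PLEASE"])
    for pair, prob in sorted(lm.items()):
        print(pair, round(prob, 3))
```

An unsmoothed model assigns zero probability to any word pair it has not seen, which is why a real recipe relies on SRILM's smoothing and then converts the resulting ARPA file into Kaldi's grammar FST (G.fst).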
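The MFCC post-processing steps in Part 2 (CMVN and delta + delta-delta features) can be illustrated with NumPy. This sketch assumes the MFCCs are already computed as (frames × coefficients) arrays. `apply_cmvn` normalizes with statistics pooled over all utterances of one speaker, with variance normalization optional (mirroring the `--norm-vars` switch of Kaldi's `apply-cmvn`). `add_deltas` appends first- and second-order regression deltas over a ±2-frame window, repeating the edge frames. The function names are hypothetical, and the textbook delta formula is used rather than Kaldi's `add-deltas` code, so values near utterance boundaries may not match Kaldi exactly.

```python
import numpy as np


def apply_cmvn(utterances, norm_vars=False):
    """Per-speaker CMVN over a list of (frames x dims) arrays from ONE speaker."""
    stacked = np.concatenate(utterances, axis=0)
    mean = stacked.mean(axis=0)
    std = np.maximum(stacked.std(axis=0), 1e-10)  # avoid division by zero
    out = []
    for feats in utterances:
        normed = feats - mean
        if norm_vars:
            normed = normed / std
        out.append(normed)
    return out


def deltas(feats, window=2):
    """First-order regression deltas: sum_n n * x[t+n] / sum_n n^2.

    Frames outside the utterance are replaced by the nearest edge frame.
    """
    num_frames = feats.shape[0]
    offsets = np.arange(-window, window + 1)
    denom = np.sum(offsets ** 2)
    out = np.zeros(feats.shape, dtype=float)
    for n in offsets:
        idx = np.clip(np.arange(num_frames) + n, 0, num_frames - 1)
        out += n * feats[idx]
    return out / denom


def add_deltas(feats, window=2):
    """Stack static, delta and delta-delta features side by side."""
    d1 = deltas(feats, window)
    d2 = deltas(d1, window)  # delta-deltas: the same regression applied to the deltas
    return np.hstack([feats, d1, d2])
```

For 13-dimensional MFCCs, `add_deltas` returns 39 columns per frame.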
Part 3 Speech Recognition (DNN-HMM)
- DNN-based acoustic model
  - Use GMM-HMM generated alignments to train a deep neural network acoustic model
  - Restricted Boltzmann Machine (RBM) pre-training
  - Frame cross-entropy training
  - Sequence training optimizing the state-level minimum Bayes risk (sMBR) criterion
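Frame cross-entropy training treats each frame as a classification problem: the GMM-HMM alignment supplies one target HMM state (a pdf ID) per frame, e.g. after conversion with Kaldi's `ali-to-pdf`, and the DNN is trained to minimize the negative log posterior of that target. A minimal NumPy sketch of the loss only, with no network or training loop; `frame_cross_entropy` is a hypothetical name:

```python
import numpy as np


def frame_cross_entropy(logits, targets):
    """Mean frame-level cross-entropy.

    logits:  (num_frames, num_pdfs) DNN outputs before the softmax.
    targets: (num_frames,) integer pdf IDs taken from the GMM-HMM alignment.
    """
    # Subtract the per-frame max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log posterior of each frame's aligned target and average.
    return -log_probs[np.arange(len(targets)), targets].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(5, 4))      # 5 frames, 4 hypothetical pdfs
    targets = np.array([0, 3, 3, 1, 2])  # hypothetical alignment
    print(frame_cross_entropy(logits, targets))
```

sMBR sequence training then fine-tunes the same network with a lattice-based objective, which this per-frame loss does not capture.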
Part 4 Speaker Recognition (or identification)
- MFCC feature extraction
- Voice activity detection (compute energy-based VAD output)
- Train a Gaussian Mixture Model - Universal Background Model (GMM-UBM)
- Train the i-vector extractor
- Extract i-vectors from the audio files
- Train a Probabilistic Linear Discriminant Analysis (PLDA) model
- Compute PLDA scores and evaluate them by Equal Error Rate (EER)
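The energy-based VAD in Part 4 can be sketched as a threshold on per-frame log-energy (for example, the first MFCC coefficient). The version below loosely follows the idea behind Kaldi's `compute-vad`: the threshold is a constant plus a scaled mean of the utterance's log-energy, and a frame is marked voiced when enough frames in a small context window exceed it. The function `energy_vad`, its parameter names and its default values are illustrative assumptions, not Kaldi's exact options.

```python
import numpy as np


def energy_vad(log_energy, energy_threshold=5.0, mean_scale=0.5,
               context=0, proportion=0.6):
    """Return a 0/1 voiced decision per frame from per-frame log-energies."""
    log_energy = np.asarray(log_energy, dtype=float)
    num_frames = len(log_energy)
    # Adaptive threshold: fixed offset plus a fraction of the utterance mean.
    threshold = energy_threshold + mean_scale * np.mean(log_energy)
    above = log_energy > threshold
    decisions = np.zeros(num_frames, dtype=int)
    for t in range(num_frames):
        lo, hi = max(0, t - context), min(num_frames, t + context + 1)
        window = above[lo:hi]
        # Voiced if more than `proportion` of the window is above threshold.
        decisions[t] = int(window.sum() > proportion * len(window))
    return decisions


if __name__ == "__main__":
    print(energy_vad([1.0, 1.0, 20.0, 20.0, 1.0]))
```

Frames marked 0 would then be dropped before training the GMM-UBM and extracting i-vectors.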
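PLDA scoring compares two hypotheses for a pair of i-vectors: they share one speaker variable, or they come from two independent speakers. A minimal sketch of the common two-covariance formulation, assuming centered i-vectors (Kaldi recipes typically subtract the global mean and length-normalize them first) and a given between-speaker covariance `between` and within-speaker covariance `within`. The function names are hypothetical; Kaldi's own PLDA implementation (used by `ivector-plda-scoring`) instead works in a transformed space with a diagonal between-class covariance and can account for several enrollment utterances, while this sketch keeps the general full-covariance form for a single pair.

```python
import numpy as np


def log_gaussian(x, cov):
    """Log density of a zero-mean multivariate Gaussian at x."""
    dim = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    mahal = x @ np.linalg.solve(cov, x)
    return -0.5 * (dim * np.log(2 * np.pi) + logdet + mahal)


def plda_llr(x1, x2, between, within):
    """Two-covariance PLDA log-likelihood ratio for two centered i-vectors.

    Same speaker:      x1 = y + e1, x2 = y + e2, shared y ~ N(0, between).
    Different speakers: x1 and x2 independent, each ~ N(0, between + within).
    """
    total = between + within
    # Joint covariance of [x1; x2] when the speaker variable y is shared.
    joint_cov = np.block([[total, between], [between, total]])
    same = log_gaussian(np.concatenate([x1, x2]), joint_cov)
    diff = log_gaussian(x1, total) + log_gaussian(x2, total)
    return same - diff


if __name__ == "__main__":
    B, W = np.eye(2), 0.1 * np.eye(2)  # hypothetical model parameters
    v = np.array([1.0, 1.0])
    print(plda_llr(v, v, B, W), plda_llr(v, -v, B, W))
```

Positive scores favor the same-speaker hypothesis; the scores over all trials are what the EER is computed from.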
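The Equal Error Rate is the operating point where the false-acceptance rate (nontarget trials accepted) equals the false-rejection rate (target trials rejected). A simple sketch, assuming higher scores mean "same speaker" and 0/1 trial labels: it sweeps every score as a threshold and reports the mean of the two rates where they are closest. `equal_error_rate` is a hypothetical helper; Kaldi ships its own `compute-eer` tool for this.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate EER from trial scores and labels (1 = target, 0 = nontarget)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    num_target = labels.sum()
    num_nontarget = len(labels) - num_target
    best_gap, eer = np.inf, None
    for thr in np.sort(scores):
        accept = scores >= thr
        frr = np.sum(~accept & (labels == 1)) / num_target     # missed targets
        far = np.sum(accept & (labels == 0)) / num_nontarget   # false alarms
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


if __name__ == "__main__":
    print(equal_error_rate([0.1, 0.6, 0.4, 0.9], [0, 0, 1, 1]))
```

This brute-force sweep is quadratic in the number of trials, which is fine for a small evaluation list; larger lists would sort once and update the error counts incrementally.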

abnerLing/Kaldi-Speech_Processing