Speech Processing with Kaldi

Tutorial on using Kaldi for Dysarthric Speech Recognition and Speaker Recognition.

The data used is provided free of charge by the University of Toronto.

Goals of this exercise

  1. Build a Kaldi-based GMM-HMM acoustic model for speech recognition.
  2. Improve recognition accuracy on impaired speech (data augmentation, hyperparameter tuning, etc.).
  3. Train a DNN-HMM acoustic model using the alignments from the GMM-HMM model.
  4. Perform speaker identification/recognition via i-vectors and improve baseline results.

Sections


  • Part 1.1 Installation

    • Kaldi
    • The SRI Language Modeling Toolkit
    • Sequitur Grapheme-to-Phoneme converter
    • Intel MKL (Math Kernel Library)
  • Part 1.2 Data Preparation

    • Audio data download
    • Files we need to create ourselves (see the sketch after this list)
    • Kaldi directory structure
  • Part 2 Speech Recognition

    • N-gram language model building (SRILM example after this list)
    • MFCC extraction + CMVN (cepstral mean and variance normalization); example after this list
    • GMM-HMM training (pipeline sketch after this list)
      • Monophone training
      • Triphone training
      • Delta + delta-delta training (computes dynamic coefficients to supplement the MFCC features)
      • Linear Discriminant Analysis – Maximum Likelihood Linear Transform (LDA-MLLT, to reduce the feature space)
      • Speaker Adaptive Training (SAT, which performs speaker and noise normalization)
      • Alignment with Feature-space Maximum Likelihood Linear Regression (fMLLR produces speaker-normalized features)
  • Part 3 Speech Recognition (DNN-HMM)

    • DNN-based acoustic model (nnet1-style sketch after this list)
      • Use the GMM-HMM-generated alignments to train a deep neural network acoustic model
      • Restricted Boltzmann Machine (RBM) pre-training
      • Frame cross-entropy training
      • Sequence training optimizing state-level minimum Bayes risk (sMBR)
  • Part 4 Speaker Recognition (or identification; i-vector sketch after this list)

    • MFCC feature extraction
    • Voice Activity Detection (compute energy-based VAD output)
    • Train a Gaussian Mixture Model - Universal Background Model (GMM-UBM)
    • Train the i-vector extractor
    • Extract i-vectors from the audio files
    • Train a Probabilistic Linear Discriminant Analysis (PLDA) model
    • Compute PLDA scores and report the Equal Error Rate (EER)
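
To make Part 1.2 concrete, here is a minimal sketch of the per-utterance files Kaldi expects in a data directory. It assumes an egs-style working directory with path.sh sourced; the speaker/utterance IDs and wav paths are placeholders, not the actual corpus file names.

```bash
# Minimal data/train directory; all files are sorted by utterance ID.
# IDs and paths are placeholders, not the real corpus file names.
mkdir -p data/train

# wav.scp: <utterance-id> <path to wav (or command producing wav on stdout)>
cat > data/train/wav.scp <<'EOF'
SPK01_utt001 /path/to/corpus/SPK01/utt001.wav
SPK01_utt002 /path/to/corpus/SPK01/utt002.wav
EOF

# text: <utterance-id> <word-level transcription>
cat > data/train/text <<'EOF'
SPK01_utt001 CALL THE NURSE
SPK01_utt002 I AM THIRSTY
EOF

# utt2spk: <utterance-id> <speaker-id>
cat > data/train/utt2spk <<'EOF'
SPK01_utt001 SPK01
SPK01_utt002 SPK01
EOF

# spk2utt is derived from utt2spk; then sanity-check the directory.
utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
utils/fix_data_dir.sh data/train
utils/validate_data_dir.sh --no-feats data/train
```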
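
For the language model in Part 2, a trigram can be trained with SRILM's ngram-count and compiled into a decoding lang directory. The discounting method (Witten-Bell here), paths, and output directory names are assumptions rather than the repo's exact settings.

```bash
# Train transcripts -> plain-text corpus (drop the utterance IDs).
mkdir -p data/local/lm
cut -d' ' -f2- data/train/text > data/local/lm/corpus.txt

# Trigram LM with interpolated Witten-Bell discounting.
ngram-count -order 3 -interpolate -wbdiscount \
  -text data/local/lm/corpus.txt -lm data/local/lm/lm_tg.arpa
gzip -f data/local/lm/lm_tg.arpa

# Compile the ARPA LM into G.fst inside a decoding lang directory.
utils/format_lm.sh data/lang data/local/lm/lm_tg.arpa.gz \
  data/local/dict/lexicon.txt data/lang_test
```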
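
MFCC and CMVN extraction uses the standard Kaldi scripts; the number of jobs and the feature directory name are arbitrary choices, and $train_cmd is assumed to come from cmd.sh.

```bash
# 13-dim MFCCs plus per-speaker CMVN statistics for each data set.
for part in train test; do
  steps/make_mfcc.sh --cmd "$train_cmd" --nj 4 data/$part exp/make_mfcc/$part mfcc
  steps/compute_cmvn_stats.sh data/$part exp/make_mfcc/$part mfcc
  utils/fix_data_dir.sh data/$part
done
```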
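
A sketch of the Part 2 GMM-HMM pipeline in the usual Kaldi s5 ordering. The leaf and Gaussian counts (2000/11000, 2500/15000) and the exp/ directory names are illustrative defaults, not values tuned for dysarthric speech.

```bash
# Monophone system, then align the training data with it.
steps/train_mono.sh --nj 4 --cmd "$train_cmd" data/train data/lang exp/mono
steps/align_si.sh --nj 4 --cmd "$train_cmd" data/train data/lang exp/mono exp/mono_ali

# Triphones on delta + delta-delta features.
steps/train_deltas.sh --cmd "$train_cmd" 2000 11000 data/train data/lang exp/mono_ali exp/tri1
steps/align_si.sh --nj 4 --cmd "$train_cmd" data/train data/lang exp/tri1 exp/tri1_ali

# LDA-MLLT: splice frames, project down, and estimate a global MLLT transform.
steps/train_lda_mllt.sh --cmd "$train_cmd" 2500 15000 data/train data/lang exp/tri1_ali exp/tri2
steps/align_si.sh --nj 4 --cmd "$train_cmd" data/train data/lang exp/tri2 exp/tri2_ali

# SAT on top of the LDA-MLLT system, then fMLLR (speaker-normalized) alignments.
steps/train_sat.sh --cmd "$train_cmd" 2500 15000 data/train data/lang exp/tri2_ali exp/tri3
steps/align_fmllr.sh --nj 4 --cmd "$train_cmd" data/train data/lang exp/tri3 exp/tri3_ali
```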
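
The Part 3 stages map onto Kaldi's "nnet1" DNN recipe. The sketch below assumes fMLLR features are dumped from the SAT model (exp/tri3), that $decode_cmd/$cuda_cmd are defined in cmd.sh, and that the hidden-layer sizes, learning rate, and iteration counts are placeholders rather than the repo's tuned values.

```bash
# Dump fMLLR-transformed features from the SAT system, then hold out 10% for cross-validation.
steps/nnet/make_fmllr_feats.sh --nj 4 --cmd "$train_cmd" --transform-dir exp/tri3_ali \
  data-fmllr-tri3/train data/train exp/tri3 data-fmllr-tri3/train/log data-fmllr-tri3/train/data
utils/subset_data_dir_tr_cv.sh data-fmllr-tri3/train \
  data-fmllr-tri3/train_tr90 data-fmllr-tri3/train_cv10

# RBM pre-training: a stack of RBMs initializes the DBN.
steps/nnet/pretrain_dbn.sh --hid-dim 1024 --nn-depth 4 data-fmllr-tri3/train exp/dnn_pretrain

# Frame cross-entropy training, with the GMM-HMM (tri3) alignments as targets.
steps/nnet/train.sh --feature-transform exp/dnn_pretrain/final.feature_transform \
  --dbn exp/dnn_pretrain/4.dbn --hid-layers 0 --learn-rate 0.008 \
  data-fmllr-tri3/train_tr90 data-fmllr-tri3/train_cv10 \
  data/lang exp/tri3_ali exp/tri3_ali exp/dnn

# sMBR sequence training: realign with the DNN, make denominator lattices, then train.
steps/nnet/align.sh --nj 4 --cmd "$train_cmd" data-fmllr-tri3/train data/lang exp/dnn exp/dnn_ali
steps/nnet/make_denlats.sh --nj 4 --cmd "$decode_cmd" --acwt 0.1 \
  data-fmllr-tri3/train data/lang exp/dnn exp/dnn_denlats
steps/nnet/train_mpe.sh --cmd "$cuda_cmd" --num-iters 4 --acwt 0.1 --do-smbr true \
  data-fmllr-tri3/train data/lang exp/dnn exp/dnn_ali exp/dnn_denlats exp/dnn_smbr
```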
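
Part 4 roughly follows Kaldi's GMM-UBM/i-vector speaker recognition recipe (as in egs/sre08/v1). Speaker-ID MFCCs are assumed to be already extracted with an appropriate config; the UBM size, i-vector dimension, trials file path, and directory names below are assumptions.

```bash
# Energy-based voice activity detection on the speaker-ID MFCCs.
sid/compute_vad_decision.sh --nj 4 --cmd "$train_cmd" data/train exp/make_vad vad

# GMM-UBM: diagonal-covariance UBM first, then a full-covariance UBM.
sid/train_diag_ubm.sh --nj 4 --cmd "$train_cmd" data/train 512 exp/diag_ubm
sid/train_full_ubm.sh --nj 4 --cmd "$train_cmd" data/train exp/diag_ubm exp/full_ubm

# i-vector extractor (total-variability matrix), then extract i-vectors.
sid/train_ivector_extractor.sh --cmd "$train_cmd" --ivector-dim 200 \
  exp/full_ubm/final.ubm data/train exp/extractor
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 4 exp/extractor data/train exp/ivectors_train
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 4 exp/extractor data/test exp/ivectors_test

# PLDA model on length-normalized training i-vectors (one class per speaker).
ivector-compute-plda ark:data/train/spk2utt \
  "ark:ivector-normalize-length scp:exp/ivectors_train/ivector.scp ark:- |" \
  exp/ivectors_train/plda

# Score the trials (<enroll-spk> <test-utt> target|nontarget) and compute the EER.
mkdir -p exp/scores
ivector-plda-scoring exp/ivectors_train/plda \
  "ark:ivector-normalize-length scp:exp/ivectors_train/spk_ivector.scp ark:- |" \
  "ark:ivector-normalize-length scp:exp/ivectors_test/ivector.scp ark:- |" \
  "cut -d' ' -f1,2 data/test/trials |" exp/scores/plda_scores
paste data/test/trials exp/scores/plda_scores | awk '{print $6, $3}' | compute-eer -
```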
