SpeechRecognition

This is a new repository for Math 572 Harmonic Analysis - Speech Recognition

This project consists of two main sections: Signal Processing and LBG/K-means Clustering.

How to run code

$ python MFCC.py - takes input audio files and converts them to a frequency spectrum
$ python LBG.py - runs the LBG clustering algorithm to train/classify speakers
$ python ProcessSpeaker.py - processes speech and identifies unknown speakers

Signal Processing

  • Short-time FFT (i.e., windowing) to preserve frequency changes over time
  • Conversion to the cepstral domain for a better understanding of speech frequencies
  • MFCC (Mel Frequency Cepstral Coefficients): a representation of the signal on the non-linear mel frequency scale
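The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the contents of MFCC.py: the function names (`mel_filterbank`, `mfcc`) and all parameter defaults (400-sample frames, 160-sample hop, 26 filters, 13 coefficients) are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions in use)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    # 1. Short-time analysis: overlapping frames, each tapered by a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Power spectrum of every frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Log energy in each mel band (small constant avoids log(0))
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # 4. DCT-II of the log energies gives the cepstral coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_energy @ basis.T  # shape: (n_frames, n_coeffs)
```

Each row of the result is one feature vector per frame, which is the kind of input the clustering stage works on.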

LBG & K-means Clustering

  • Main goal is to cluster the feature vectors into codebooks made up of codewords (i.e., cluster centroids)
  • Nearest-neighbor search assigns each speech vector of a given speaker to its closest codeword
  • LBG is used to reset the centroids in K-means clustering
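As a rough sketch of how the pieces above fit together, the following shows one common form of LBG (Linde-Buzo-Gray): start from a single centroid, split every codeword in two, then refine with K-means-style nearest-neighbor updates. It is not the code in LBG.py; the names `nearest_codeword` and `lbg` and the parameters `eps`, `tol`, and `max_iter` are assumptions made for illustration.

```python
import numpy as np

def nearest_codeword(vectors, codebook):
    # Squared Euclidean distance from every vector to every codeword
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), d.min(axis=1)

def lbg(vectors, n_codewords, eps=0.01, tol=1e-4, max_iter=100):
    """Build a codebook by repeated splitting.

    Assumes n_codewords is a power of two and that the centroids are
    non-zero (the multiplicative split below cannot separate a zero vector).
    """
    # Start with one codeword: the mean of all training vectors
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < n_codewords:
        # Split each codeword into two slightly perturbed copies
        codebook = np.concatenate([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        for _ in range(max_iter):
            # Nearest-neighbor search, then move each codeword to its cluster mean
            labels, dist = nearest_codeword(vectors, codebook)
            for k in range(len(codebook)):
                members = vectors[labels == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
            distortion = dist.mean()
            # Stop when the average distortion no longer improves noticeably
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook
```

Splitting gives K-means a deterministic starting point at each codebook size, which avoids the sensitivity to random initialization that plain K-means has.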

Testing

  • Training will be done on multiple speech signals
  • Testing will be done on unknown speech signals from audience
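One plausible way to carry out the testing step, assuming each trained speaker has a codebook of the kind described earlier, is to pick the speaker whose codebook gives the lowest average distortion on the unknown signal's feature vectors. The sketch below is an assumption about the approach, not a description of ProcessSpeaker.py; `average_distortion`, `identify_speaker`, and the speaker names are hypothetical.

```python
import numpy as np

def average_distortion(features, codebook):
    # Mean squared distance from each feature vector to its nearest codeword
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def identify_speaker(features, codebooks):
    # codebooks: dict mapping a speaker name to that speaker's codebook array
    scores = {name: average_distortion(features, cb) for name, cb in codebooks.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical two-speaker example with 2-D features
codebooks = {
    "speaker_a": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "speaker_b": np.array([[10.0, 10.0], [11.0, 11.0]]),
}
unknown = np.array([[0.1, 0.1], [0.9, 1.1], [0.2, -0.1]])
best, scores = identify_speaker(unknown, codebooks)
print(best)
```

The per-speaker scores are returned alongside the winner so that a close second place can be inspected when a decision looks doubtful.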