SpeechRecognition

This is a new repository for Math 572 (Harmonic Analysis): Speech Recognition.

This project consists of two main sections: Signal Processing and LBG/K-means Clustering.

How to run code

$ python MFCC.py - takes input audio files and converts them to their frequency spectrum
$ python LBG.py - runs the LBG clustering algorithm to train on and classify speakers
$ python ProcessSpeaker.py - processes speech from unknown speakers and identifies them

Signal Processing

Short-time FFT (i.e., windowing) to preserve how the frequency content changes over time
Conversion to the cepstral domain to better characterize the frequencies present in speech
MFCC (Mel-Frequency Cepstral Coefficients): a representation of the signal on the non-linear mel frequency scale
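The signal-processing steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the code in MFCC.py: the function names are hypothetical, and the frame length (256 samples), hop (100 samples), Hamming window, 20 mel filters, 13 coefficients, and the common mel formula m = 2595·log10(1 + f/700) are illustrative choices. The cepstral step is taken as a DCT of the log mel spectrum, which is the usual way MFCCs are computed.

```python
import numpy as np


def hz_to_mel(f):
    # Assumed mel-scale formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def stft_power(signal, frame_len=256, hop=100, n_fft=256):
    """Short-time FFT: Hamming-windowed, overlapping frames -> power spectrum per frame.

    Returns an array of shape (n_frames, n_fft // 2 + 1). The signal must be at
    least frame_len samples long. Frame/hop sizes are assumed defaults.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.abs(spectrum) ** 2 / n_fft


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale (non-uniform in Hz)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[j - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[j - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    out = x @ basis.T
    out[..., 0] *= np.sqrt(1.0 / n)
    out[..., 1:] *= np.sqrt(2.0 / n)
    return out


def mfcc(signal, sample_rate, frame_len=256, hop=100, n_fft=256,
         n_filters=20, n_coeffs=13):
    """MFCC feature vectors, one row per frame (all parameters are assumptions)."""
    power = stft_power(signal, frame_len, hop, n_fft)
    mel_energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energies + 1e-10)  # small offset avoids log(0)
    # Cepstral step: DCT of the log mel spectrum
    return dct_ii(log_mel, n_coeffs)
```

For one second of audio at 8 kHz, `mfcc(x, 8000)` yields one 13-dimensional vector per frame; these vectors are what the clustering stage operates on.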

LBG & K-means Clustering

The main goal is to create codebooks of codewords (i.e., cluster centroids), one per speaker
A nearest-neighbor search assigns each of a speaker's speech (feature) vectors to its closest codeword
LBG is used to reset (split) the centroids between rounds of K-means clustering
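The clustering described above can be sketched as follows: a minimal version of the usual LBG scheme, in which every centroid is split into two perturbed copies and the enlarged codebook is then refined with K-means (nearest-neighbor assignment followed by a centroid update). It is not taken from LBG.py; the function names, Euclidean distance, the splitting factor `eps=0.01`, the relative distortion threshold `tol`, and the power-of-two codebook size are all assumptions.

```python
import numpy as np


def nearest_codeword(vectors, codebook):
    """Nearest-neighbor search: (index, distance) of the closest codeword for each vector."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(d, axis=1), np.min(d, axis=1)


def lbg(vectors, n_codewords=16, eps=0.01, tol=1e-3, max_iter=100):
    """Train a VQ codebook by LBG splitting plus K-means refinement.

    n_codewords is assumed to be a power of two, since each split doubles the codebook.
    """
    vectors = np.asarray(vectors, dtype=float)
    codebook = vectors.mean(axis=0, keepdims=True)  # start from a single centroid
    while codebook.shape[0] < n_codewords:
        # Split each centroid y into y * (1 + eps) and y * (1 - eps)
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        for _ in range(max_iter):
            labels, dists = nearest_codeword(vectors, codebook)
            distortion = dists.mean()
            # K-means update: move each centroid to the mean of its assigned vectors
            for j in range(codebook.shape[0]):
                members = vectors[labels == j]
                if len(members) > 0:
                    codebook[j] = members.mean(axis=0)
            # Stop when the average distortion no longer improves noticeably
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook
```

Splitting from the global mean avoids having to choose random initial centroids for K-means; because the perturbation is multiplicative, it is least effective when the mean is very close to the zero vector.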

Testing

Training will be done on multiple speech signals
Testing will be done on unknown speech signals from the audience
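One way to identify an unknown speaker, sketched here under stated assumptions rather than as the logic of ProcessSpeaker.py: convert the test utterance into feature vectors (for example, MFCCs), measure the average distance from each vector to its nearest codeword in every trained speaker's codebook, and pick the speaker with the lowest average distortion. The function names and the dictionary of codebooks are hypothetical.

```python
import numpy as np


def average_distortion(vectors, codebook):
    """Mean Euclidean distance from each feature vector to its nearest codeword."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()


def identify_speaker(test_vectors, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion.

    codebooks: dict mapping a speaker name to an (n_codewords x n_features) array.
    """
    test_vectors = np.asarray(test_vectors, dtype=float)
    return min(codebooks,
               key=lambda name: average_distortion(test_vectors, codebooks[name]))
```

Usage: with `books = {"alice": codebook_a, "bob": codebook_b}`, `identify_speaker(features, books)` returns the name of the closest-matching speaker. A rejection threshold on the minimum distortion could be added to flag speakers who were not in the training set.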
