ZSL-Speech-Recognition

Zero-shot learning (ZSL) is a machine learning setting in which a model is tested on classes for which it saw no examples during training. One dataset is used to train the model, and another, whose classes are previously unknown to the model, is used to test it.
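To make the train/test separation concrete, here is a minimal sketch of a class-disjoint split. The function name `zero_shot_split`, the `unseen_fraction` parameter, and the toy clip names are assumptions for illustration and are not taken from this repository.

```python
import random

def zero_shot_split(samples, unseen_fraction=0.3, seed=0):
    """Split (item, label) pairs so that train and test share no labels."""
    labels = sorted({label for _, label in samples})
    rng = random.Random(seed)
    rng.shuffle(labels)
    # Hold out at least one class as "unseen".
    n_unseen = max(1, int(len(labels) * unseen_fraction))
    unseen = set(labels[:n_unseen])
    train = [(x, y) for x, y in samples if y not in unseen]
    test = [(x, y) for x, y in samples if y in unseen]
    return train, test, unseen

# Toy data: hypothetical file names paired with spoken-word labels.
words = ["yes", "no", "left", "right", "stop", "go"] * 3
samples = [(f"clip_{i}.wav", word) for i, word in enumerate(words)]
train, test, unseen = zero_shot_split(samples)

train_labels = {y for _, y in train}
test_labels = {y for _, y in test}
print("unseen classes:", unseen)
print("label overlap:", train_labels & test_labels)  # empty set by construction
```

The point of the split is that every test label is absent from training, so any correct prediction on the test set has to come from information other than training examples of that class.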

My generative models (a VAE and a GAN) generate signal features conditioned on the semantic attributes of a given class.

As an example of the results, confusion_matrix_GAN.png shows the confusion matrix for the GAN model tested on the Google Speech Commands dataset. Most of the errors came from words such as "left", "down", and "stop". On the test set, the GAN reached a WER of 26% and an SER of 46%.
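For readers who want to reproduce this kind of analysis, the following is a minimal sketch of building a confusion matrix and listing the most frequently confused word pairs. The helper names and the toy labels are invented for illustration; they are not results or code from this project.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Return the sorted label list and a matrix with one row per true label."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix

def most_confused(labels, matrix, top=3):
    """Return the off-diagonal (true, predicted, count) cells with the highest counts."""
    errors = [(labels[i], labels[j], matrix[i][j])
              for i in range(len(labels))
              for j in range(len(labels))
              if i != j and matrix[i][j] > 0]
    return sorted(errors, key=lambda e: -e[2])[:top]

# Toy labels, invented for illustration only.
y_true = ["left", "left", "down", "stop", "stop", "yes"]
y_pred = ["left", "yes", "down", "go", "go", "yes"]

labels, matrix = confusion_matrix(y_true, y_pred)
for label, row in zip(labels, matrix):
    print(f"{label:>5}: {row}")
print(most_confused(labels, matrix))
```

Rows correspond to the true word and columns to the predicted word, so large off-diagonal cells point at the word pairs the model mixes up.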

On the LibriSpeech dataset, the VAE reached 21.89% WER and 53.5% SER on the test set. An example of the VAE output is shown in result_LS_VAE.png.
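The two metrics can be sketched as follows. This assumes that WER is the usual word error rate (word-level edit distance divided by the number of reference words) and that SER stands for sentence error rate (the share of sentences with at least one error); the README does not define either, and the sentences below are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]

def wer_and_ser(references, hypotheses):
    """Corpus-level word error rate and sentence error rate, both as fractions."""
    total_errors = total_words = wrong_sentences = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        errors = edit_distance(ref_words, hyp_words)
        total_errors += errors
        total_words += len(ref_words)
        wrong_sentences += errors > 0
    return total_errors / total_words, wrong_sentences / len(references)

# Made-up transcripts for illustration.
refs = ["turn left now", "stop the car"]
hyps = ["turn left now", "stop a car"]
wer, ser = wer_and_ser(refs, hyps)
print(f"WER = {wer:.2%}, SER = {ser:.2%}")
```

Because a single wrong word makes the whole sentence count as an error, SER is normally higher than WER on the same output, which matches the pattern in the figures reported here.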

These results are encouraging, because testing was performed on classes that took no part in training. For those classes, the models had access only to their semantic attributes.

Libraries and technologies: PyTorch, NumPy, cuDNN, and CUDA. Hidden-layer outputs of NVIDIA's pre-trained Jasper acoustic model are used to extract features and generate vector representations of sounds and texts. The Universal Sentence Encoder (USE) model for English is used to vectorize text annotations. Spotify's Annoy library is used to store the lists of words and sentences in matrix form.
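Annoy is an approximate nearest-neighbour library, so a stored set of word or sentence vectors can be queried for the entries closest to a given vector. The sketch below shows that lookup idea with an exact brute-force cosine search in plain Python. It is a stand-in and does not use Annoy's API, and the 3-dimensional vectors and the `nearest_labels` helper are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_labels(query, index, k=1):
    """Return the k labels whose stored vectors are most similar to the query."""
    ranked = sorted(index,
                    key=lambda label: cosine_similarity(query, index[label]),
                    reverse=True)
    return ranked[:k]

# Hypothetical text embeddings; real sentence embeddings have far more dimensions.
text_index = {
    "left": [1.0, 0.0, 0.0],
    "down": [0.0, 1.0, 0.0],
    "stop": [0.0, 0.0, 1.0],
}

query = [0.9, 0.2, 0.1]  # e.g. a vector produced for an audio clip
best = nearest_labels(query, text_index, k=2)
print(best)
```

A brute-force search compares the query against every stored vector, which is fine for a handful of command words; an approximate index becomes worthwhile when the vocabulary or sentence list is large.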

Datasets:

Google Speech Commands Dataset: https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html

LibriSpeech: https://www.openslr.org/12


amhtj/ZSL-Speech-Recognition
