This repository contains the code for a speech emotion recognition model built with Keras, TensorFlow, and scikit-learn. The model is trained on the RAVDESS audio dataset.
Tested on Python 3.8.10
To install the dependencies, run:
pip install -r requirements.txt
The RAVDESS dataset contains recordings of 24 professional actors (12 female, 12 male) vocalizing two lexically matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprised, and disgusted expressions, while song includes calm, happy, sad, angry, and fearful expressions. Each expression is produced at two levels of emotional intensity (normal and strong), and there is an additional neutral expression. This project uses only the audio files.
The audio files selected from RAVDESS are placed inside the data folder.
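RAVDESS encodes each recording's labels in its file name as seven dash-separated two-digit fields: modality, vocal channel, emotion, intensity, statement, repetition, and actor. The sketch below shows one way to decode those fields into labels. The helper name parse_ravdess_filename is hypothetical and is not taken from utils.py, which may extract labels differently.

```python
from pathlib import Path

# Codes from the RAVDESS file naming convention.
VOCAL_CHANNELS = {"01": "speech", "02": "song"}
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITIES = {"01": "normal", "02": "strong"}


def parse_ravdess_filename(path):
    """Decode the labels stored in a RAVDESS file name (hypothetical helper)."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 dash-separated fields, got {path!r}")
    _modality, channel, emotion, intensity, statement, repetition, actor = fields
    actor_id = int(actor)
    return {
        "vocal_channel": VOCAL_CHANNELS[channel],
        "emotion": EMOTIONS[emotion],
        "intensity": INTENSITIES[intensity],
        "statement": int(statement),
        "repetition": int(repetition),
        "actor": actor_id,
        # Odd-numbered actors are male, even-numbered actors are female.
        "actor_sex": "male" if actor_id % 2 else "female",
    }


# Illustrative file name: audio-only, speech, fearful, normal intensity, actor 12.
info = parse_ravdess_filename("03-01-06-01-02-01-12.wav")
print(info["emotion"], info["intensity"], info["actor_sex"])
```

The emotion field is the natural training target; the actor field can be useful for splitting train and test sets so that no speaker appears in both.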
To extract the data, run:
python utils.py
To train the model, run:
python model.py
To get predictions, run the following, setting --framework to either keras or sklearn and --infer-file-path to the path of the sample to make predictions on:
cd src
python engine.py --framework=keras --infer --infer-file-path="path/to/sample.wav"
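To run predictions from a script rather than by hand, the command line can be assembled programmatically. This is a minimal sketch that uses only the flags shown in the usage line; the helper name build_infer_command is hypothetical, and it assumes engine.py accepts exactly the two framework values keras and sklearn.

```python
import shlex

# The two framework values named in the usage line.
FRAMEWORKS = ("keras", "sklearn")


def build_infer_command(framework, sample_path):
    """Return the argument list for one engine.py inference run (hypothetical helper)."""
    if framework not in FRAMEWORKS:
        raise ValueError(f"framework must be one of {FRAMEWORKS}, got {framework!r}")
    return [
        "python", "engine.py",
        f"--framework={framework}",
        "--infer",
        f"--infer-file-path={sample_path}",
    ]


# Build the command for an illustrative sample path and print it shell-quoted.
cmd = build_infer_command("sklearn", "my sample.wav")
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, cwd="src", check=True) from the repository root would execute it without manual quoting of paths that contain spaces, assuming engine.py lives in src as the cd step suggests.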