ag027592/NLP_Final

NLP Final Project / 周惶振 (104061701) 何元通 (105062575)

Amplifying a Sense of Emotion toward Drama: Long Short-Term Memory Recurrent Neural Network for Dynamic Emotion Recognition

Introduction

We use the NNIME database to study emotional behavior (such as arousal and valence states) over short durations, approximating real-time recognition, and to augment the sense of emotion with a visual demonstration. Beyond that, we explore how such emotion recognition could be applied. One idea is amplifying the emotion in video: many videos feel flat or awkward to watch because they lack emotional impact, so we want to amplify the emotional context of a video to make it more engaging.

Dataset Description: NNIME-Emotion Corpus

The increasing availability of large-scale emotion corpora, together with advances in emotion recognition algorithms, has enabled the emergence of next-generation human-machine interfaces. The NNIME database is the result of collaborative work between engineers and drama experts. It includes recordings of 44 subjects engaged in spontaneous dyadic spoken interactions. The multimodal data comprises approximately 11 hours of audio, video, and electrocardiogram data recorded continuously and synchronously. The database also comes with a rich set of emotion annotations, both discrete and continuous in time, from a total of 50 annotators per subject. The annotations further cover diverse perspectives: peer-report, director-report, self-report, and observer-report. This carefully engineered data collection and annotation process provides a valuable resource for quantifying and investigating various aspects of affective phenomena and human communication. To the best of our knowledge, NNIME is one of the few large-scale Chinese affective dyadic interaction databases that has been systematically collected and organized and is to be publicly released to the research community.

SVR Model

Support vector regression (SVR) is used to predict the final results. Without fine-tuning, we obtain Spearman correlations of roughly 0.30 to 0.40, and sometimes up to about 0.70 in the better cases. To improve the results further, we also experimented with feature selection. Nevertheless, perhaps because the fastText features are already well tuned, we found that using all features performs better than using only a subset. In future work, we plan to compare against other data mining algorithms, such as decision trees, to show the advantage of our main LSTM-RNN approach.
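As a rough sketch (not the repository's performSVR.py), the SVR baseline can be reproduced with scikit-learn's `SVR` and scored with Spearman correlation. The features and labels below are synthetic stand-ins for the NNIME data, and the hyperparameters are illustrative defaults:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-sentence features and valence annotations:
# only the first feature dimension carries signal.
X = rng.normal(size=(200, 4))
y = 0.8 * X[:, 0] + rng.normal(scale=0.3, size=200)

# Simple train/test split.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# RBF-kernel SVR without fine-tuning, as in the baseline described above.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Spearman correlation between predictions and annotations.
rho, _ = spearmanr(y_test, pred)
```

On this synthetic data the correlation is strongly positive; on real features the score depends heavily on the feature set, which is why all-feature input beat feature selection in our runs.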

LSTM-RNN Model

Using RNNLM, we train two LSTM models, one for audio and one for text; the structure is shown in the figure and the parameters are listed below. After training the LSTM models, we integrate the frame-level predictions of activation and valence into sentence-level predictions by taking the mean.
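The frame-to-sentence integration step described above is simply a mean over the per-frame predictions. A minimal sketch, with made-up prediction values rather than real model outputs:

```python
import numpy as np

def pool_frames_to_sentence(frame_preds):
    """Average per-frame (activation, valence) predictions into a single
    sentence-level (activation, valence) pair."""
    return np.asarray(frame_preds).mean(axis=0)

# Three illustrative frames of (activation, valence) predictions.
frames = [(0.2, 0.1), (0.4, 0.3), (0.6, 0.2)]
sentence_pred = pool_frames_to_sentence(frames)  # -> array([0.4, 0.2])
```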

Installation

1. Download the folder "code".
2. Change the paths in .../code/ to match your machine.
3. LSTM_RNN: go to .../code/LSTM_RNN_Code and run LSTM_RNN_Regression.py
4. SVR: go to .../code/SVR_code and run performSVR.py (you first have to download the required data listed in "連結.txt")

Result

Spearman correlations between predicted and annotated scores:

| Model            | Activation | Valence |
|------------------|------------|---------|
| SVR (Audio)      | 0.32       | 0.09    |
| LSTM-RNN (Audio) | 0.43       | 0.13    |
| SVR (Text)       | 0.43       | 0.32    |
| LSTM-RNN (Text)  | 0.10       | 0.04    |
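The scores above are Spearman rank correlations between predicted and annotated values. For reference, the metric can be computed with `scipy.stats.spearmanr`; the numbers below are illustrative and not taken from our experiments:

```python
from scipy.stats import spearmanr

annotations = [0.1, 0.5, 0.3, 0.9, 0.7]   # illustrative ground-truth scores
predictions = [0.2, 0.4, 0.35, 0.8, 0.6]  # illustrative model outputs

# Both sequences have the same rank ordering, so rho comes out as 1.0
# even though the raw values differ.
rho, p_value = spearmanr(annotations, predictions)
```

Because Spearman correlation depends only on rank order, it rewards a model that tracks the relative ordering of emotion intensity even when its absolute scale is off.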
