forked from lukasgarbas/nlp-text-emotion
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Lukas Garbas
committed
Dec 19, 2019
0 parents
commit ca574f9
Showing
1 changed file
with
53 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# Emotion Classification in Short Messages | ||
|
||
Multi-class sentiment analysis problem to classify text into five emotions categories: joy, sadness, anger, fear, neutral. A fun weekend project to go through different text classification techniques. This includes dataset preparation, traditional machine learning with scikit-learn, LSTM neural networks and transfer learning using BERT (tensorflow's keras). | ||
|
||
# Datasets | ||
|
||
## Datasets overview | ||
|
||
**Summary Table** | ||
|
||
| Dataset | Year | Content | Size | Emotion categories | Balanced | | ||
| :--------------: | :--: | :-------: | ------------ | ------------------ | :-------: | | ||
|dailydialog| 2017 | dialogues |102k sentences|neutral, joy, surprise, sadness, anger, disgust, fear| No | | ||
|emotion-stimulus|2015|dialogues|2.5k sentences|sadness, joy, anger, fear, surprise, disgust| No | | ||
|isear|1990|emotional situations|7.5k sentences|joy, fear, anger, sadness, disgust, shame, guilt| Yes | | ||
|
||
links: [dailydialog](http://yanran.li/dailydialog.html), [emotion-stimulus](http://www.site.uottawa.ca/~diana/resources/emotion_stimulus_data), [isear](http://www.affective-sciences.org/index.php/download_file/view/395/296/) | ||
|
||
|
||
## Combined dataset | ||
|
||
Dataset was combined from dailydialog, isear, and emotion-stimulus to create a balanced dataset with 6 labels: joy, sad, anger, fear, disgust, surprise and neutral. The texts mainly consist of short messages and dialog utterances. | ||
|
||
# Experiments | ||
|
||
### Traditional Machine Learning: | ||
* Data preprocessing: noise and punctuation removal, tokenization, stemming | ||
* Text Representation: TF-IDF | ||
* Classifiers: Naive Bayes, Random Forrest, Logistic Regrassion, SVM | ||
|
||
| Approach | F1-Score | | ||
| :------------------ | :------: | | ||
| Naive Bayes | 0.6702 | | ||
| Random Forrest | 0.6372 | | ||
| Logistic Regression | 0.6935 | | ||
| SVM | 0.7271 | | ||
|
||
### Neural Networks | ||
* Data preprocessing: noise and punctuation removal, tokenization | ||
* Word Embeddings: pretrained 300 dimensional word2vec ([link](https://fasttext.cc/docs/en/english-vectors.html)) | ||
* Deep Network: LSTM and biLSTM | ||
|
||
| Approach | F1-Score | | ||
| :------------------ | :------: | | ||
| LSTM + w2v_wiki | 0.7395 | | ||
| biLSTM + w2v_wiki | 0.7414 | | ||
|
||
### Transfer learning with BERT | ||
Fine-tuning BERT for text classification | ||
|
||
| Approach | F1-Score | | ||
| :------------------ | :------: | | ||
| fine-tuned BERT | 0.8320 | |