Experiments with Perceptron

This document reports the evaluation results of our perceptron model, run with different features and parameters.

Summary

The tables in this section give an overview of all experiments; per-experiment results and convergence charts follow under Details & Charts.

Experiments with Unigrams

| Model | LR | Feature(s) | Tokenization | Epochs | Conv | Macro F |
|---|---|---|---|---|---|---|
| Baseline | 0.1 | binary | Basic (split by space / punct) | 150 | 0.8 | 0.382 |
| Baseline | 0.1 | count | Basic | 150 | 0.78 | 0.412 |
| Baseline | 0.1 | frequency | Basic | 150 | 0.57 | 0.436 |
| +Shuffle | 0.1 | frequency | Basic | 150 | 0.56 | 0.412 |
| +Avg | 0.1 | frequency | Basic | 150 | 0.57 | 0.509 |
| +Shuffle, +Avg | 0.1 | frequency | Basic | 150 | 0.56 | 0.52 |
| +Shuffle, +Avg | 0.3 | frequency | Basic | 150 | 0.56 | 0.521 |
| +Shuffle, +Avg | 0.5 | frequency | Basic | 150 | 0.56 | 0.521 |
| +Shuffle, +Avg | 1.0 | frequency | Basic | 150 | 0.56 | 0.521 |
| +Shuffle, +Avg | 0.3 | frequency | replace_emojis | 150 | 0.55 | 0.511 |
| +Shuffle, +Avg | 0.3 | frequency | replace_num | 150 | 0.56 | 0.52 |
| +Shuffle, +Avg | 0.3 | frequency | replace_emojis, replace_num | 150 | 0.55 | 0.51 |
| +Shuffle, +Avg | 0.3 | frequency | remove_stopw | 150 | 0.52 | 0.47 |
| +Shuffle, +Avg | 0.3 | frequency | remove_punc | 150 | 0.56 | 0.518 |
| +Shuffle, +Avg | 0.3 | frequency | lowercase | 150 | 0.48 | 0.498 |
| +Shuffle, +Avg | 0.3 | frequency | stem | 150 | 0.41 | 0.455 |
| +Shuffle, +Avg | 0.3 | frequency | all params | 150 | 0.34 | 0.387 |
| Baseline | 0.1 | tf-idf | Basic | 150 | 0.79 | 0.401 |
| +Avg | 0.3 | tf-idf | Basic | 150 | 0.79 | 0.467 |
| +Shuffle, +Avg | 0.3 | tf-idf | Basic | 150 | 0.77 | 0.472 |
| +Shuffle, +Avg | 0.3 | tf-idf | replace_emojis, replace_num | 150 | 0.76 | 0.461 |
| +Shuffle, +Avg | 0.3 | tf-idf | all params | 150 | 0.43 | 0.376 |
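The four feature types compared in the table (binary, count, frequency, tf-idf) can be illustrated with a minimal sketch. It rests on assumptions that are not confirmed by this document: "frequency" is taken to mean the count divided by the document length, tf-idf uses a plain `log(N / df)` weight, and the regular expression only approximates "split by space / punct". The names `basic_tokenize` and `featurize` are hypothetical.

```python
import math
import re
from collections import Counter


def basic_tokenize(text):
    """Split on whitespace and keep each punctuation mark as its own token."""
    return re.findall(r"\w+|[^\w\s]", text)


def featurize(docs, kind):
    """Return one {token: weight} dict per document for the given feature kind."""
    tokenized = [basic_tokenize(d) for d in docs]
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    features = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero on empty documents
        if kind == "binary":
            vec = {t: 1.0 for t in counts}
        elif kind == "count":
            vec = {t: float(c) for t, c in counts.items()}
        elif kind == "frequency":  # assumed: relative frequency within the document
            vec = {t: c / total for t, c in counts.items()}
        elif kind == "tf-idf":     # assumed: unsmoothed idf
            vec = {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        else:
            raise ValueError(f"unknown feature kind: {kind}")
        features.append(vec)
    return features


docs = ["so happy today !", "so sad , so tired"]
for kind in ("binary", "count", "frequency", "tf-idf"):
    print(kind, featurize(docs, kind)[1])
```

In this toy corpus the token "so" occurs in both documents, so its tf-idf weight is zero while its count is 2 and its relative frequency is 0.4.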

Experiments with Bigrams

| Model | LR | Feature(s) | Tokenization | Epochs | Conv | Macro F |
|---|---|---|---|---|---|---|
| Baseline | 0.3 | Bigram: frequency | Basic | 150 | 0.97 | 0.545 |
| Baseline | 0.3 | Bigram: tf-idf | Basic | 25 | 0.98 | 0.54 |

Experiments with Unigrams+Bigrams

| Model | LR | Feature(s) | Tokenization | Epochs | Conv | Macro F |
|---|---|---|---|---|---|---|
| Baseline | 0.3 | Unigram+Bigram: binary | Basic | 150 | 0.99 | 0.55 |
| Baseline | 0.3 | Unigram+Bigram: count | Basic | 25 | 0.96 | 0.564 |
| Baseline | 0.3 | Unigram+Bigram: tf-idf | Basic | 150 | 0.99 | 0.554 |
| Baseline | 0.3 | Unigram+Bigram: frequency | Basic | 150 | 0.97 | 0.568 |
| Baseline | 0.3 | Unigram+Bigram: frequency | Basic | 50 | 0.88 | 0.58 |
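As a rough illustration of what the bigram and unigram+bigram feature sets contain, the sketch below extracts n-gram counts from a token list. The function name `ngram_counts` and the `(min_n, max_n)` range argument are assumptions made for this example, not the project's actual interface.

```python
from collections import Counter


def ngram_counts(tokens, ngram_range=(1, 2)):
    """Count all n-grams with min_n <= n <= max_n; tokens are joined by a space."""
    min_n, max_n = ngram_range
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


tokens = ["not", "happy", "at", "all"]
print(ngram_counts(tokens, (2, 2)))  # bigrams only
print(ngram_counts(tokens, (1, 2)))  # unigrams + bigrams
```

Bigrams such as "not happy" keep local context that unigrams lose, which may be part of why the bigram-based rows score higher than the unigram rows.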

Details & Charts

Binary. Tokenization: basic

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.8	0.382	0.388	0.28	0.68	0.54	0.37	0.42	0.46	0.43	0.29	0.54	0.3	0.41	0.22
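The Fmac value can be reproduced from the per-class columns, assuming it is the unweighted mean of the six per-class F1 scores and that each `xxxP` / `xxxR` pair holds the precision and recall for one class. A minimal sketch using the row above:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the binary-feature row above.
per_class = {
    "sup": (0.28, 0.68),
    "dis": (0.54, 0.37),
    "fea": (0.42, 0.46),
    "sad": (0.43, 0.29),
    "joy": (0.54, 0.30),
    "ang": (0.41, 0.22),
}

macro_f = sum(f1(p, r) for p, r in per_class.values()) / len(per_class)
print(round(macro_f, 3))  # 0.382, matching the Fmac column
```

Fmic cannot be recomputed this way, because micro-averaging needs the per-class counts rather than the rounded precision and recall values.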

Count. Tokenization: basic

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.78	0.412	0.413	0.33	0.53	0.49	0.45	0.45	0.48	0.42	0.33	0.51	0.33	0.37	0.35

Frequency. Tokenization: basic

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.57	0.436	0.463	0.63	0.18	0.42	0.69	0.45	0.64	0.6	0.2	0.51	0.61	0.41	0.46

Frequency. Tokenization: basic, MCP+Shuffling

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.56	0.412	0.437	0.4	0.57	0.86	0.2	0.52	0.54	0.36	0.68	0.86	0.14	0.39	0.49

Frequency. Tokenization: basic, MCP+Averaging

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.57	0.509	0.51	0.46	0.5	0.56	0.56	0.53	0.56	0.48	0.45	0.57	0.55	0.45	0.43

Frequency. Tokenization: basic, MCP+A+S

Learning rate 0.1

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.56	0.52	0.521	0.47	0.51	0.57	0.58	0.55	0.58	0.5	0.46	0.57	0.56	0.47	0.44

Learning rate 0.3

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.56	0.521	0.522	0.47	0.51	0.57	0.58	0.55	0.58	0.5	0.46	0.57	0.56	0.47	0.44

Learning rate 0.5

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.56	0.521	0.522	0.47	0.51	0.57	0.58	0.55	0.58	0.5	0.46	0.57	0.56	0.47	0.44

Learning rate 1.0

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.56	0.521	0.522	0.48	0.51	0.57	0.58	0.55	0.58	0.5	0.46	0.57	0.56	0.47	0.44
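The four learning rates give almost identical results. One possible explanation, which is a general property of the plain perceptron update and has not been checked against this code base, is that with zero-initialised weights and no bias or margin term the learning rate only rescales the weight vector, so the argmax predictions do not change. The sketch below shows this on hypothetical toy data. It uses power-of-two rates so that the scaling is exact in floating point; with rates such as 0.1 or 0.3, rounding can flip exact ties, which might account for small differences like 0.52 versus 0.521.

```python
def predict(w, x):
    """Index of the class with the highest dot product (first one wins ties)."""
    scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return scores.index(max(scores))


def train(data, n_classes, n_feats, lr, epochs=5):
    """Plain multi-class perceptron with zero-initialised weights."""
    w = [[0.0] * n_feats for _ in range(n_classes)]
    for _ in range(epochs):
        for x, gold in data:
            pred = predict(w, x)
            if pred != gold:
                for i, v in enumerate(x):
                    w[gold][i] += lr * v
                    w[pred][i] -= lr * v
    return w


# Hypothetical toy data: (feature vector, class index).
data = [([1.0, 0.0, 1.0], 0), ([0.0, 1.0, 1.0], 1),
        ([1.0, 1.0, 0.0], 2), ([0.0, 2.0, 1.0], 1)]
probes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]

models = {lr: train(data, 3, 3, lr) for lr in (0.5, 1.0, 2.0)}
preds = {lr: [predict(w, x) for x in probes] for lr, w in models.items()}
print(preds)  # the three prediction lists are identical
```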

Frequency. MCP+A+S. Different tokenization params

Learning rate 0.3

replace_emojis

Emoticons and emojis are replaced by a lexical entity (e.g. ":)" -> "smile")
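A minimal sketch of such a replacement step is shown below. Only the `":)" -> "smile"` pair comes from this document; the other entries in the mapping, and the function body itself, are hypothetical stand-ins for whatever lexicon the experiments actually used.

```python
import re

# Hypothetical mapping; only ":)" -> "smile" is taken from the text above.
EMOTICON_WORDS = {
    ":)": "smile",
    ":-)": "smile",
    ":(": "frown",
    ":-(": "frown",
    "\U0001F600": "smile",  # grinning-face emoji
}

# Longest keys first so that ":-)" is tried before ":)".
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(EMOTICON_WORDS, key=len, reverse=True))
)


def replace_emojis(text):
    """Replace each known emoticon or emoji with a space-padded word."""
    return _PATTERN.sub(lambda m: f" {EMOTICON_WORDS[m.group(0)]} ", text)


print(replace_emojis("great day :)").split())  # ['great', 'day', 'smile']
```

Padding the replacement with spaces keeps the new word separate from neighbouring tokens when the emoticon was attached to a word.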

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.55	0.511	0.512	0.47	0.51	0.57	0.56	0.54	0.58	0.46	0.44	0.56	0.54	0.46	0.43

replace_num

Numerical tokens are replaced by <NUM> tag
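One way this step could look, assuming it runs on already tokenized text and that a "numerical token" is an optionally signed integer or decimal (the exact rule used in the experiments is not stated here):

```python
import re

# Assumption: digits with optional sign and "." or "," separated groups.
_NUMBER = re.compile(r"^[+-]?\d+(?:[.,]\d+)*$")


def replace_num(tokens):
    """Map every numeric token to the single placeholder "<NUM>"."""
    return ["<NUM>" if _NUMBER.match(tok) else tok for tok in tokens]


print(replace_num(["won", "3", "games", "in", "2019", "by", "1.5", "points"]))
# ['won', '<NUM>', 'games', 'in', '<NUM>', 'by', '<NUM>', 'points']
```

Collapsing all numbers into one tag shrinks the vocabulary without removing the information that a number was present.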

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.56	0.52	0.521	0.47	0.51	0.57	0.58	0.55	0.58	0.5	0.46	0.57	0.56	0.47	0.44

replace_emojis & replace_num

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.55	0.51	0.511	0.47	0.51	0.57	0.56	0.54	0.58	0.47	0.44	0.56	0.54	0.45	0.43

remove_punc

Remove punctuation during tokenization

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.56	0.518	0.519	0.47	0.51	0.57	0.58	0.54	0.58	0.5	0.45	0.57	0.55	0.46	0.44

remove_stopw

Remove stopwords
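A minimal sketch of stopword filtering, using a small hypothetical list (the list used in the experiments is not shown in this document):

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "is", "am", "are", "i", "not", "no", "so", "to", "of"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token whose lowercase form is in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


print(remove_stopwords(["I", "am", "not", "happy"]))  # ['happy']
```

Many stopword lists include negations and intensifiers such as "not" or "so", which carry emotional signal. If the list used here does too, that could be one reason Macro F drops from 0.521 to 0.47 with this option.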

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.52	0.47	0.47	0.41	0.47	0.55	0.53	0.48	0.5	0.42	0.39	0.54	0.51	0.44	0.42

stem

Stem tokens

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.41	0.455	0.457	0.4	0.48	0.49	0.49	0.48	0.52	0.44	0.39	0.52	0.53	0.41	0.32

lowercase

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.48	0.498	0.499	0.44	0.51	0.56	0.54	0.52	0.56	0.46	0.43	0.56	0.54	0.45	0.41

Frequency. Tokenization: all params, MCP+A+S

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.34	0.387	0.39	0.32	0.46	0.48	0.4	0.43	0.44	0.36	0.26	0.42	0.46	0.36	0.31

TF-IDF. Different parameters

Model: Baseline, Tokenization: Basic:

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.79	0.401	0.411	0.31	0.63	0.41	0.59	0.54	0.36	0.43	0.38	0.61	0.29	0.45	0.2

Model: MCP+Averaging, Tokenization: Basic:

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.79	0.467	0.467	0.42	0.44	0.52	0.52	0.51	0.51	0.43	0.43	0.53	0.5	0.4	0.39

Model: MCP+A+S, Tokenization: Basic:

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.77	0.472	0.473	0.42	0.44	0.53	0.53	0.51	0.52	0.44	0.43	0.53	0.51	0.41	0.4

Model: MCP+A+S, Tokenization: all params:

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.43	0.376	0.378	0.3	0.43	0.45	0.42	0.43	0.41	0.34	0.26	0.42	0.43	0.34	0.3

Model: MCP+A+S, Tokenization: replace_emoji+replace_num:

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.76	0.461	0.462	0.42	0.42	0.52	0.53	0.5	0.53	0.41	0.42	0.51	0.5	0.41	0.38

Model: MCP+A+S, Features: Bigrams, Tokenization: basic

convergence chart

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.97	0.545	0.547	0.54	0.53	0.55	0.61	0.58	0.6	0.5	0.53	0.58	0.58	0.53	0.43

Model: MCP+A+S, Features: Unigram+Bigram, binary, Tokenization: basic

![convergence chart](results/experiment_grams%281,%202%29_binary.png)

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.99	0.55	0.55	0.53	0.54	0.58	0.59	0.6	0.6	0.52	0.51	0.6	0.58	0.48	0.48

Model: MCP+A+S, Features: Unigram+Bigram, TF-IDF, Tokenization: basic

![convergence chart](results/experiment_grams%281,%202%29_tf-idf.png)

150th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.99	0.554	0.557	0.56	0.52	0.57	0.62	0.55	0.66	0.52	0.54	0.61	0.58	0.54	0.42

Model: MCP+A+S, Features: Bigram, TF-IDF, Tokenization: basic

convergence chart

25th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.98	0.54	0.542	0.56	0.51	0.56	0.6	0.57	0.6	0.49	0.53	0.57	0.58	0.51	0.43

Model: MCP+A+S, Features: Unigram+Bigram, count, Tokenization: basic

![convergence chart](results/experiment_grams%281,%202%29_count.png)

25th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.96	0.564	0.565	0.53	0.56	0.61	0.6	0.6	0.62	0.54	0.52	0.61	0.6	0.5	0.5

Model: MCP+A+S, Features: Unigram+Bigram, frequency, Tokenization: replace num/emoji

![convergence chart](results/experiment_rr_grams%281,%202%29_frequency.png)

50th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.88	0.579	0.581	0.56	0.57	0.6	0.63	0.59	0.64	0.56	0.54	0.62	0.62	0.54	0.48

Model: MCP+A+S, Features: Unigram+Bigram, frequency, Tokenization: basic

![convergence chart](results/experiment_grams%281,%202%29_frequency_50e.png)

50th epoch results:

Conv	Fmac	Fmic	supP	supR	disP	disR	feaP	feaR	sadP	sadR	joyP	joyR	angP	angR
0.88	0.58	0.582	0.56	0.57	0.6	0.63	0.59	0.64	0.56	0.55	0.62	0.62	0.55	0.48