This project builds on Yoon Kim's *Convolutional Neural Networks for Sentence Classification*, which shows that word2vec embeddings combined with an inception-style CNN achieve strong performance on text classification while being fast and efficient. This repo investigates how other CNN structures perform on text classification, such as LeNet and the simplest CNN structure with just one convolution layer, why certain structures work, and how different embedding layers affect the task.
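For context, "inception" here means parallel convolution branches with different kernel widths applied to the same embedded sequence, then concatenated. A minimal sketch in Keras (vocabulary size, sequence length, and filter counts are illustrative assumptions, not the notebooks' exact settings):

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 10000, 200, 100  # assumed sizes

inputs = layers.Input(shape=(SEQ_LEN,))
embedded = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)

# Parallel branches with different kernel sizes, concatenated ("inception" style).
branches = []
for kernel_size in (3, 4, 5):
    conv = layers.Conv1D(64, kernel_size, activation="relu")(embedded)
    branches.append(layers.GlobalMaxPooling1D()(conv))

merged = layers.concatenate(branches)
outputs = layers.Dense(1, activation="sigmoid")(merged)  # binary classification head
model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Each branch sees n-grams of a different width, which is why this tends to beat a single fixed kernel size.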
lenet_normal_cnn.ipynb
contains the results of using LeNet and the simplest CNN structure on text classification, both with a normal embedding layer trained from scratch and with pretrained word2vec embeddings.
The results show that sequential depth does not matter much for this text classification task, as the simplest CNN model already achieves slightly higher performance than LeNet:
Special thanks to these two posts and tutorials:
CNN in keras with pretrained word2vec weights
How to Develop a Multichannel CNN Model for Text Classification
LeNet:
simplest CNN:
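The single-convolution-layer model can be sketched as follows (all hyperparameters shown are assumptions for illustration):

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 10000, 200, 100  # assumed sizes

# Simplest CNN: embedding -> one Conv1D -> global max pool -> classifier.
model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Conv1D(128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),  # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Despite having a single convolution layer, the global max pooling still lets the model pick out the strongest n-gram feature anywhere in the sequence.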
clean_normal_embed.ipynb
contains the results of using a normal embedding layer trained from scratch with the inception CNN.
clean_w2v
: using the pre-trained word2vec embedding with the inception CNN.
These two files show that the pre-trained word2vec embedding achieves slightly better performance than an embedding layer trained from scratch.
w2v:
normal embedding:
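A sketch of how pretrained word2vec vectors are typically loaded into a frozen Keras embedding layer (the `word_index` and `w2v` objects here are tiny placeholders standing in for a real tokenizer vocabulary and a loaded gensim model):

```python
import numpy as np
from tensorflow.keras import layers, initializers

EMBED_DIM = 300  # e.g. the GoogleNews word2vec dimension

# Placeholder vocabulary: token -> integer id (id 0 reserved for padding).
word_index = {"good": 1, "bad": 2}
# Placeholder for gensim KeyedVectors: token -> pretrained vector.
w2v = {"good": np.random.rand(EMBED_DIM), "bad": np.random.rand(EMBED_DIM)}

# Copy pretrained vectors into a weight matrix; unknown words stay zero.
embedding_matrix = np.zeros((len(word_index) + 1, EMBED_DIM))
for word, i in word_index.items():
    if word in w2v:
        embedding_matrix[i] = w2v[word]

embedding_layer = layers.Embedding(
    input_dim=embedding_matrix.shape[0],
    output_dim=EMBED_DIM,
    embeddings_initializer=initializers.Constant(embedding_matrix),
    trainable=False,  # freeze the pretrained vectors, as in the w2v notebooks
)
```

Setting `trainable=True` instead would fine-tune the pretrained vectors during training.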
dirty_tokenization
is a folder of experiments that use a rather dirty tokenization method.
The folder contains the following files:
normal_cnn_trainable_false.ipynb
: using the pretrained w2v embedding and the inception CNN
dirty_embed_trainable_true.ipynb
: using an embedding layer initialized from scratch and the inception CNN
The results show that the dirty tokenization method yields lower accuracy (0.78 for w2v) and takes much longer to converge (30 epochs).
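The exact cleaning steps used in the notebooks are not reproduced here, but as an illustration of what "clean" versus "dirty" tokenization typically means (lowercasing, stripping markup and punctuation, keeping only alphabetic tokens versus a bare whitespace split):

```python
import re
import string

def clean_tokenize(text: str) -> list[str]:
    # Illustrative cleaning pipeline; the notebooks' exact steps may differ.
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)  # strip HTML remnants like <br />
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok.isalpha()]

def dirty_tokenize(text: str) -> list[str]:
    # "Dirty" baseline: split on whitespace, keeping case and punctuation.
    return text.split()

sample = "This <br /> movie is GREAT!!!"
print(clean_tokenize(sample))  # -> ['this', 'movie', 'is', 'great']
print(dirty_tokenize(sample))  # -> ['This', '<br', '/>', 'movie', 'is', 'GREAT!!!']
```

Dirty tokenization inflates the vocabulary with variants like `GREAT!!!`, which fragments the training signal and slows convergence.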
tune_normal_embed.ipynb
contains a failed attempt to run Hyperband on the inception CNN for text classification. The failure is likely due to the large size of the embedding layer: even setting the embedding dimension to just 100 still triggers a resource-exhaustion error during Hyperband.
Hyperband does run when the embedding dimension is reduced to just 32, but the tradeoff is very poor accuracy, to the point of being meaningless.
other_tests
is a folder that contains some other failed tests.
Inside it is a folder named overfitting
. It contains two notebooks, one with the simplest CNN structure and one with the inception CNN structure, that achieve high performance (accuracy of 0.88) but overfit severely.
The likely reason is the missing L2 regularization term, emphasizing the importance of regularization in this task.
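As an illustration of the fix, Keras layers accept an L2 penalty through `kernel_regularizer`; the weight 1e-4 here is an assumption, not a tuned value:

```python
from tensorflow.keras import layers, models, regularizers

VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 10000, 200, 100  # assumed sizes

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Conv1D(128, 5, activation="relu",
                  kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty on conv weights
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid",
                 kernel_regularizer=regularizers.l2(1e-4)),   # L2 penalty on dense weights
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The penalty terms are added to the training loss automatically, shrinking weights toward zero and discouraging the model from memorizing the training set.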