Analyzed tweets from Kaggle competition data to determine whether their sentiment is positive, negative, or neutral.
Python Version: 3.6
Packages: pandas, numpy, plotly, spaCy, nltk
Distribution of sentiments in training data
To evaluate my models I used the Jaccard index, which measures the similarity of two strings as the size of the intersection of their word sets divided by the size of their union.
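As a minimal sketch of the metric, the word-level Jaccard index can be computed as below. The function name `jaccard`, the lowercasing, and the whitespace tokenisation are assumptions made for illustration, as is returning 1.0 when both strings are empty; the scoring code actually used may differ.

```python
def jaccard(str1, str2):
    """Word-level Jaccard index between two strings (illustrative sketch)."""
    # Assumption: tokens are lowercased, whitespace-separated words.
    a = set(str1.lower().split())
    b = set(str2.lower().split())
    # Assumption: two empty strings are treated as identical.
    if not a and not b:
        return 1.0
    shared = a & b
    # |A ∩ B| / |A ∪ B|, with the union size computed by inclusion-exclusion.
    return len(shared) / (len(a) + len(b) - len(shared))


score = jaccard("I love this movie", "love this")
```

Here the two strings share 2 words out of 4 distinct words overall, so `score` is 0.5. Identical strings score 1.0 and strings with no words in common score 0.0.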
Here are the distributions of Jaccard scores comparing training tweets with their selected parts.
List of the most common words (after removal of stopwords)
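A list like this can be produced by counting words once stopwords are dropped. The sketch below is only an illustration: the function name `most_common_words` is hypothetical, and the tiny hard-coded `STOPWORDS` set is a stand-in so the example runs on its own. In practice a full list such as nltk's English stopwords would be the likelier choice.

```python
from collections import Counter

# Assumption: a tiny stand-in stopword set so the example is self-contained.
STOPWORDS = {"i", "the", "is", "a", "an", "to", "and", "of", "in", "it"}


def most_common_words(texts, n=10, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword words across the texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and split on whitespace, then skip stopwords.
        words = [w for w in text.lower().split() if w not in stopwords]
        counts.update(words)
    return counts.most_common(n)


top = most_common_words(["I love the sun", "the sun is bright"], n=1)
```

With these two toy sentences, `top` is `[("sun", 2)]`.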
I used spaCy to train my named entity recogniser. My steps:
- Load the model
- Shuffle and loop over selected training examples
- Save the model
- Test the model
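
The steps above can be sketched roughly as follows. This is a sketch under several assumptions, not the training script used here: it uses the spaCy v3 API (`Example.from_dict`, `nlp.initialize`), starts from a blank English pipeline rather than a pretrained one, and the toy `TRAIN_DATA`, the `SELECTED_TEXT` label, the helper names `train_ner` and `predict_entities`, and the `n_iter` and `drop` values are all made up for illustration. On spaCy v2 the corresponding calls differ.

```python
import random
from pathlib import Path

import spacy
from spacy.training import Example

# Hypothetical toy data: (text, {"entities": [(start_char, end_char, label)]}).
TRAIN_DATA = [
    ("I love this sunny day", {"entities": [(2, 6, "SELECTED_TEXT")]}),
    ("I hate waiting in line", {"entities": [(2, 6, "SELECTED_TEXT")]}),
]


def train_ner(train_data, output_dir, n_iter=20, seed=0):
    """Train a small NER model and save it to output_dir."""
    random.seed(seed)

    # Load the model (assumption: a blank English pipeline with a new NER pipe).
    nlp = spacy.blank("en")
    nlp.add_pipe("ner")

    # Convert the raw annotations into spaCy Example objects.
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]
    # Initialising with the examples lets the NER pipe infer its labels.
    optimizer = nlp.initialize(lambda: examples)

    # Shuffle and loop over the training examples.
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        for example in examples:
            nlp.update([example], sgd=optimizer, drop=0.3, losses=losses)

    # Save the model.
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    nlp.to_disk(output_dir)
    return output_dir


def predict_entities(model_dir, text):
    """Test the model: reload it from disk and return (text, label) pairs."""
    nlp = spacy.load(model_dir)
    return [(ent.text, ent.label_) for ent in nlp(text).ents]
```

A typical use would be `model_dir = train_ner(TRAIN_DATA, "ner_model")` followed by `predict_entities(model_dir, "I love this sunny day")`. With so little toy data the predicted spans are not meaningful; the point is only the load, shuffle-and-update, save, test sequence.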