Skip to content

Latest commit

 

History

History
22 lines (16 loc) · 880 Bytes

README.md

File metadata and controls

22 lines (16 loc) · 880 Bytes

IMDBReviewsSentimentAnalysis

This project is a sentiment analysis of 50k movie reviews on Internet Movie Database (IMDB) and the dataset is available on: IMDB dataset of 50k movie reviews

SentimentAnalysis_IMDBReviews.ipynb consists of the following steps:

Preprocessing:

  • Lowercasting
  • Removing URLs
  • Removing Punctuations
  • Removing Stopwords
  • Handling Emogies
  • stemming

vectorizing using TF-IDF

building, training and testing the model using 3 methods:

  • Logistic Regression
  • Random Forest Classifier
  • Decision Tree Classifier

At the end, Logistic Regression turned out to be the best model, with accuracy score of 0.8941