Built a scraper to automatically collect all IMDB reviews of MCU movies. Implemented a bag-of-words model to extract dominant words, common word associations, and review-level text features (such as readability and length) that relate to the review score.
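A minimal sketch of the bag-of-words and text-feature step, assuming reviews are already scraped into plain strings. The toy `REVIEWS`, the tiny `STOPWORDS` set, and the vowel-group syllable heuristic are all hypothetical stand-ins; readability is shown with the standard Flesch reading-ease formula, which may differ from the measure used in the project.

```python
import re
from collections import Counter

# Hypothetical toy reviews; the real corpus was scraped from IMDB.
REVIEWS = [
    "Great movie. The action was great and the cast was fun.",
    "Boring plot. Too long and the villain was weak.",
]

# A tiny stopword list, assumed for illustration only.
STOPWORDS = {"the", "and", "was", "too", "a", "an", "of", "to"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(docs):
    """Count every non-stopword token across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts

def count_syllables(word):
    """Rough heuristic: count runs of consecutive vowels, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def text_features(text):
    """Word count, sentence count and Flesch reading ease for one review."""
    words = tokenize(text)
    if not words:
        return {"n_words": 0, "n_sentences": 0, "flesch": None}
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words, n_sent = len(words), max(1, len(sentences))
    syllables = sum(count_syllables(w) for w in words)
    # Flesch reading ease: higher scores mean easier text.
    flesch = 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (syllables / n_words)
    return {"n_words": n_words, "n_sentences": n_sent, "flesch": round(flesch, 1)}

if __name__ == "__main__":
    print(bag_of_words(REVIEWS).most_common(3))
    for review in REVIEWS:
        print(text_features(review))
```

In a full analysis these per-review features would be joined to each review's score and used as predictors; a library vectorizer would typically replace the hand-rolled counter.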
Conducted sentiment analysis and analyzed the effect of sentiment on review score and on the profit generated by each movie. Identified the best dictionary for sentiment analysis by comparing regression models of rating on sentiment and assessing model fit. Further augmented the analysis by examining the readability, formality, and length of reviews and measuring their impact on ratings. Additionally performed unsupervised and supervised topic modelling to examine the topics across the corpus.
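The dictionary comparison can be sketched as follows: score each review with a candidate sentiment dictionary, regress rating on that score with simple OLS, and keep the dictionary whose model fits best by R². The toy `DATA` and the two made-up lexicons (`dict_a`, `dict_b`) are hypothetical stand-ins for the real reviews and published sentiment dictionaries, and R² is assumed here as the fit criterion.

```python
import re

# Hypothetical toy data: (review text, star rating out of 10).
DATA = [
    ("great fun, loved it", 9),
    ("good but too long", 6),
    ("boring and weak", 3),
    ("awful, hated it", 1),
    ("loved the great cast", 10),
]

# Two tiny made-up dictionaries standing in for real sentiment lexicons.
DICTIONARIES = {
    "dict_a": {"great": 1, "fun": 1, "loved": 1, "good": 1,
               "boring": -1, "weak": -1, "awful": -1, "hated": -1},
    "dict_b": {"great": 1, "good": 1, "boring": -1},
}

def sentiment(text, lexicon):
    """Net sentiment: sum of lexicon scores over the review's tokens."""
    return sum(lexicon.get(t, 0) for t in re.findall(r"[a-z]+", text.lower()))

def r_squared(x, y):
    """R^2 of the simple OLS fit y = a + b*x (0.0 if either side is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    # For one predictor, R^2 equals the squared Pearson correlation.
    return sxy ** 2 / (sxx * syy)

def compare_dictionaries(data, dictionaries):
    """Regress rating on each dictionary's sentiment score; return R^2 per dictionary."""
    ratings = [rating for _, rating in data]
    return {
        name: r_squared([sentiment(text, lex) for text, _ in data], ratings)
        for name, lex in dictionaries.items()
    }

scores = compare_dictionaries(DATA, DICTIONARIES)
best = max(scores, key=scores.get)

if __name__ == "__main__":
    print(scores, "best:", best)
```

The same regression setup extends to profit by swapping the rating column for a per-movie profit figure and aggregating sentiment per movie.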