
GregoryPye/IMDB-review-year-predictor

 
 


IMDB review year predictor

Collaborative project to build ML skills and try out approaches by predicting the review year of IMDB reviews

The discussion ...

I've been thinking about an example based on text analysis, as it's new to me [Greg] and I think it helps suss out what can and cannot be done in ML vs. classic stats. Ideally it would also have a good-sized data set, plus some experience of others playing with it, so there are good sources for the otherwise tedious data wrangling.

One of the classic ones is the IMDB review corpus, though there are a million examples of 'predict if the review is positive or negative', so I was musing on something different. One that came to mind is predicting the year of the review, as I have a hunch that the language form and words used will change over time. The data set below might be a good starting point, as it has the full reviews along with the date of each review.

https://www.kaggle.com/ebiswas/imdb-review-dataset
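To make the prediction target concrete, here is a minimal sketch of turning review records into (text, year) pairs. The field names `review_date` and `review_detail`, the sample records, and the helper names are all assumptions for illustration; they have not been checked against the actual data set. The year is pulled out with a regular expression so the sketch does not depend on one particular date format.

```python
import re
from collections import Counter

# Hypothetical records: the field names and values below are assumptions,
# not taken from the real data set.
SAMPLE_RECORDS = [
    {"review_date": "24 January 2021", "review_detail": "Great film."},
    {"review_date": "2009-03-15", "review_detail": "Not for me."},
    {"review_date": "unknown", "review_detail": "No date here."},
]

# A four-digit year starting with 19 or 20, not embedded in a longer number.
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def review_year(record, date_field="review_date"):
    """Return the four-digit year found in the date field, or None."""
    match = YEAR_RE.search(str(record.get(date_field, "")))
    return int(match.group(0)) if match else None


def labelled_examples(records, text_field="review_detail"):
    """Pair each review text with its year, skipping records without one."""
    pairs = []
    for record in records:
        year = review_year(record)
        if year is not None:
            pairs.append((record[text_field], year))
    return pairs


examples = labelled_examples(SAMPLE_RECORDS)
# Reviews per year, useful for spotting badly unbalanced years early on.
year_counts = Counter(year for _, year in examples)
```

Counting reviews per year up front is worth doing because a heavily skewed distribution would let a model score well just by guessing the most common year.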

Years written in the review text itself could be a solid hint, so I was thinking of a pre-processing step that bulk-replaces any years, or strings in the format of a date, with QQQQQ or similar (in practice, this might just mean replacing them with a single dictionary token meaning 'this was a year').
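The masking step could be sketched with regular expressions as below. The patterns, the `mask_years` name, and the range of years covered (1800 to 2099) are assumptions chosen for illustration, not a tested rule set, and they will miss some date formats.

```python
import re

YEAR_TOKEN = "QQQQQ"  # placeholder token from the discussion

# Month names and common abbreviations (matched case-insensitively).
MONTHS = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|"
    r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
)

DATE_PATTERNS = [
    # Numeric dates with one consistent separator: 12/05/2004, 2004-05-12
    r"\b\d{1,4}(?P<sep>[/.-])\d{1,2}(?P=sep)\d{1,4}\b",
    # Day first: 12 May 2004, 12th May 2004
    rf"\b\d{{1,2}}(?:st|nd|rd|th)?\s+{MONTHS}\.?,?\s+\d{{4}}\b",
    # Month first: May 12, 2004
    rf"\b{MONTHS}\.?\s+\d{{1,2}}(?:st|nd|rd|th)?,?\s+\d{{4}}\b",
    # Bare years and decades: 2004, 1990s
    r"\b(?:1[89]|20)\d{2}s?\b",
]

# Full-date patterns come first so a whole date becomes one token
# rather than leaving "12 May" behind next to a masked year.
MASK_RE = re.compile("|".join(DATE_PATTERNS), flags=re.IGNORECASE)


def mask_years(text, token=YEAR_TOKEN):
    """Replace years and date-like strings in text with a single token."""
    return MASK_RE.sub(token, text)


masked = mask_years("Seen on 12/05/2004 and again in the 1990s style of 2010.")
```

One known weakness of this approach: a year inside a film title (for example "2001: A Space Odyssey") is masked as well, which may or may not be wanted. Requiring the same separator in numeric dates keeps ratings such as "7.5/10" from being masked by mistake.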

What do you think? It may not be the best example ever, but it is grounded in real data and would allow, e.g., use of pre-canned word relationship models as well as n-gram bag-of-words type stuff. There might also be a derivative version that estimates the time between film release and review.
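For the n-gram bag-of-words side, a minimal standard-library sketch might look like the following. The tokenizer, the function names, and the `min_doc_freq` cut-off are assumptions for illustration; a real run would more likely use a library vectorizer, but this shows what the features are.

```python
import re
from collections import Counter

# Very simple tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count every 1-gram up to max_n-gram in a single text."""
    tokens = tokenize(text)
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(ngrams(tokens, n))
    return bag


def vectorize(texts, max_n=2, min_doc_freq=1):
    """Build a sorted vocabulary and one count row per text.

    Terms appearing in fewer than min_doc_freq texts are dropped.
    """
    bags = [bag_of_ngrams(text, max_n) for text in texts]
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())
    vocab = sorted(term for term, df in doc_freq.items() if df >= min_doc_freq)
    rows = [[bag[term] for term in vocab] for bag in bags]
    return vocab, rows


vocab, rows = vectorize(["good film", "bad film"], max_n=1)
```

Each row is a count vector over the shared vocabulary, so the rows can be fed to any classifier or regressor that predicts the review year.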

About

Collaborative project to determine the review date of IMDB reviews


Languages

  • Python 100.0%