Author: Grace Campbell
Reddit is a content aggregation website where members can submit links, text posts, images, and videos, which other members can then comment on and discuss. The posts "are organized by subject into user-created boards called 'subreddits', which cover a variety of topics including news, science, movies, video games, music, books, fitness, food, and image-sharing." (Wikipedia)
There are two subreddits I am interested in: /r/News and /r/TheOnion. The first contains titles of news articles, while the second contains titles of satirical news articles. Can I use natural language processing to build a classification model that accurately predicts which subreddit a given post came from?
- Data Preparation
- Modeling
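To make the question concrete, here is a minimal sketch of what "predicting the subreddit from a title" can look like. It is not the model built in this project: it assumes a simple bag-of-words representation and a hand-written multinomial Naive Bayes classifier with add-one smoothing, chosen only because it runs on the standard library. The class name `NaiveBayesTitleClassifier`, the helper `tokenize`, the label strings, and the training titles are all hypothetical placeholders invented for illustration, not real posts from either subreddit.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into word tokens."""
    return re.findall(r"[a-z']+", title.lower())


class NaiveBayesTitleClassifier:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, titles, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of titles
        for title, label in zip(titles, labels):
            self.word_counts[label].update(tokenize(title))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, title):
        total_titles = sum(self.label_counts.values())
        scores = {}
        for label, n_titles in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Start from the log prior, then add one log likelihood per word.
            score = math.log(n_titles / total_titles)
            for word in tokenize(title):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented placeholder titles, NOT real posts from either subreddit.
train_titles = [
    "Senate passes budget bill after long debate",
    "City council approves new transit funding",
    "Court rules on state election law",
    "Area man unsure why he entered the kitchen",
    "Local dog announces plans to bark at nothing",
    "Area woman heroically finishes entire sandwich",
]
train_labels = ["News", "News", "News", "TheOnion", "TheOnion", "TheOnion"]

model = NaiveBayesTitleClassifier().fit(train_titles, train_labels)
print(model.predict("Area man announces plans to finish sandwich"))
print(model.predict("Senate approves election funding bill"))
```

With only six made-up titles this says nothing about real accuracy; it only shows the shape of the task: turn each title into word counts, learn which words are typical of each subreddit, and pick the more likely label for a new title.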