A program to scrape news stories from The Guardian for sentiment analysis.
I wrote this a couple of years ago to play with a sentiment analysis library (VADER) and to practice web scraping and working with pandas DataFrames.
To scrape the stories, run: python guardian_scraper.py
You can set the number of stories scraped per news theme via the NO_URLS variable and add or remove news themes in the THEMES list. Scraping 50 stories will take around 1-2 minutes per theme.
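As a rough illustration of those two settings, here is a minimal sketch. The theme names and the estimate_minutes helper are hypothetical and are not taken from guardian_scraper.py; the estimate simply assumes that run time scales linearly from the "1-2 minutes per 50 stories per theme" figure above.

```python
# Hypothetical values for illustration; the real defaults live in guardian_scraper.py.
NO_URLS = 50                                  # stories scraped per theme
THEMES = ["politics", "business", "sport"]    # assumed theme names


def estimate_minutes(themes, no_urls, minutes_per_50=(1, 2)):
    """Return a (low, high) run-time estimate in minutes.

    Assumes scraping time grows linearly with the number of stories,
    anchored on roughly 1-2 minutes per 50 stories per theme.
    """
    scale = no_urls / 50
    low, high = minutes_per_50
    return (len(themes) * low * scale, len(themes) * high * scale)


low, high = estimate_minutes(THEMES, NO_URLS)
print(f"Expect roughly {low:.0f}-{high:.0f} minutes for {len(THEMES)} themes")
```

With three themes and 50 stories each, this suggests allowing about 3 to 6 minutes for a full scrape.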
You can run each analysis separately, or use python run_analysis.py to run them all. Set the DATAFILE variable to the output generated by guardian_scraper, or to the example data set in the /data directory.
The analysis takes a few minutes to run and generates:
Violin plots from the sentiment analysis of each theme
Wordclouds for each theme
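A word cloud is essentially a picture of per-theme word frequencies. The sketch below shows that counting step using only the standard library. It is not the code used in this repository: the word_frequencies function, the tiny stop-word list, and the example stories are all made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "as"}


def word_frequencies(stories):
    """Count non-stop-words per theme from (theme, text) pairs."""
    counts = defaultdict(Counter)
    for theme, text in stories:
        # Lowercase the text and keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts[theme].update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up example data, not real Guardian stories.
stories = [
    ("sport", "The team won the match and the fans cheered."),
    ("sport", "A late goal decided the match."),
    ("business", "Shares rose as the market rallied."),
]

freqs = word_frequencies(stories)
print(freqs["sport"].most_common(3))
```

The most frequent words in each Counter are the ones a word cloud would draw largest.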