LLLWWWJJJP99/Tweet-Sentiment-Analysis

Tweet Sentiment Analysis Project

This project processes real-time streaming data arriving from Twitter streams.

**1. Task.**

I implement the following framework using Apache Spark Streaming, Stanford CoreNLP, the Twitter Developer RESTful API, Elasticsearch and Kibana. The framework performs real-time sentiment analysis of tweets that carry particular hashtags. For example, it can analyze all tweets tagged #guncontrolnow and show their sentiment statistics (positive, neutral, negative) in Kibana.

For this, I collect the tweets with a scraper. Next, a sentiment analysis program predicts the sentiment of each tweet message. Finally, I visualize my findings using Elasticsearch/Kibana.

Scraper -> Sentiment Analyzer/Common topic finder -> Visualizer (Elasticsearch/Kibana)

Module Details

1) Scraper

A sample scraper (stream.py) is provided. However, I need to extend the code to support the following functionality.

The scraper collects tweets and pre-processes them for analytics. It is a standalone program written in Python using the Twitter Developer RESTful API, and it should perform the following:

  1. Collect tweets in real time that carry a particular hashtag. For example, I collect all tweets with #guncontrolnow.
  2. After getting the tweets, I filter them by removing emoji symbols and special characters, and I discard any noisy tweets that do not belong to #guncontrolnow. Note that a returned tweet contains both metadata (e.g., location) and text content. I have to keep at least the text content and the location metadata.
  3. After filtering, I convert the location metadata of each tweet to its geolocation info by calling the Google geo API, and I send the text and geolocation info to Spark Streaming.
  4. My scraper program runs indefinitely and takes hashtags as input parameters at startup.
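The filtering in step 2 and the hashtag parameters in step 4 can be sketched as follows. This is a minimal illustration that uses only the standard library and is not the code in MyScraper.py: the function names `clean_text`, `has_hashtag` and `parse_hashtags` are hypothetical, and the set of characters kept by the whitelist is an assumption about what counts as a "special character".

```python
import argparse
import re

# Assumption: keep ASCII letters, digits, whitespace and a little punctuation.
# Everything else (emoji, symbols, other special characters) is replaced.
_DISALLOWED = re.compile(r"[^A-Za-z0-9\s#@.,!?'\-]")


def clean_text(text):
    """Replace emoji and special characters with spaces, then collapse whitespace."""
    return " ".join(_DISALLOWED.sub(" ", text).split())


def has_hashtag(text, hashtag):
    """Return True if text contains the hashtag as a whole token (case-insensitive)."""
    tag = hashtag.lstrip("#").lower()
    return re.search("#" + re.escape(tag) + r"\b", text.lower()) is not None


def parse_hashtags(argv=None):
    """Read one or more hashtags from the command line, with or without '#'."""
    parser = argparse.ArgumentParser(description="Collect tweets for the given hashtags.")
    parser.add_argument("hashtags", nargs="+", help="hashtags to track")
    args = parser.parse_args(argv)
    return ["#" + h.lstrip("#") for h in args.hashtags]


if __name__ == "__main__":
    for tag in parse_hashtags():
        print("tracking", tag)
```

A tweet would be kept only if `has_hashtag` returns True for one of the tracked tags, and its text would be passed through `clean_text` before being sent on. When passing hashtags on a shell command line, quote them or omit the `#`, since an unquoted `#` at the start of a word begins a comment in most shells.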

2) Sentiment Analyzer

The Sentiment Analyzer determines whether a tweet is positive, neutral or negative.

I use a third-party sentiment analyzer such as Stanford CoreNLP for the sentiment analysis.

In summary, for each hashtag, I perform sentiment analysis using the tools discussed above and output the sentiment and geolocation of each tweet to an external store (either saved in a JSON file or sent to Kibana for visualization).
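As a rough sketch of the output step, the snippet below reduces per-sentence sentiment labels to one of the three classes and serializes the result with the geolocation as a JSON line. It assumes the analyzer reports five-level labels per sentence, as Stanford CoreNLP's sentiment annotator does (very negative through very positive). The summing rule for combining sentences and the names `collapse_label`, `tweet_sentiment` and `to_json_line` are illustrative assumptions, not the project's actual code.

```python
import json

# Assumption: five-level labels, compared after removing spaces and lowercasing
# so that both "Very negative" and "Verynegative" are accepted.
_SCORES = {
    "verynegative": -1,
    "negative": -1,
    "neutral": 0,
    "positive": 1,
    "verypositive": 1,
}


def collapse_label(label):
    """Map a five-level label to -1 (negative), 0 (neutral) or 1 (positive)."""
    return _SCORES[label.replace(" ", "").lower()]


def tweet_sentiment(sentence_labels):
    """Combine per-sentence labels into one tweet-level class by summing scores."""
    total = sum(collapse_label(label) for label in sentence_labels)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def to_json_line(text, sentence_labels, lat, lon):
    """Serialize the tweet text, its sentiment and its geolocation as one JSON line."""
    record = {
        "text": text,
        "sentiment": tweet_sentiment(sentence_labels),
        "lat": lat,
        "lon": lon,
    }
    return json.dumps(record)
```

Summing treats every sentence equally and returns neutral on a tie or when no sentences are given; a different rule, such as taking the label of the longest sentence, would be an equally reasonable choice.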

3) Visualizer

I install Elasticsearch and Kibana, then create an index for visualization. A data table shows the sentiment of each tweet, i.e., "sentiment | tweet". A number of geo coordinate maps show the geolocation distribution of tweets. More specifically, the first geo coordinate map shows the geolocation distribution of all tweets related to #guncontrolnow, regardless of sentiment. The second and third geo coordinate maps show the geolocation distributions of positive tweets and negative tweets, respectively. When I send data from Spark to Elasticsearch, I need to add a timestamp. In the dashboard, set the refresh interval to 2 minutes as an example.
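A possible shape for the index and its documents is sketched below. The field names (`text`, `sentiment`, `location`, `timestamp`) and the helper `build_document` are assumptions made for illustration. The mapping is written in the typeless form used by Elasticsearch 7 and later; older versions expect a mapping type name around `properties`. Kibana's coordinate maps need the location field to be a `geo_point`, and the `date` field gives the dashboard a time field to filter and refresh on.

```python
from datetime import datetime, timezone

# Hypothetical index body: geo_point for the maps, date for the time filter,
# keyword so sentiment can be used for exact-match filters and aggregations.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "sentiment": {"type": "keyword"},
            "location": {"type": "geo_point"},
            "timestamp": {"type": "date"},
        }
    }
}


def build_document(text, sentiment, lat, lon, now=None):
    """Build one document, stamping it with the current UTC time by default."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "text": text,
        "sentiment": sentiment,
        "location": {"lat": lat, "lon": lon},
        "timestamp": now.isoformat(),
    }
```

With this layout, the map of all tweets uses the index unfiltered, while the positive and negative maps can apply a filter such as `sentiment: positive` or `sentiment: negative` to the same `location` field.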

**2. How to run it.**

  1. Install the tweepy, stanfordcorenlp, googlemap and kibana packages in your IDE.
  2. Install Elasticsearch and Kibana locally.
  3. Run MyScraper.py first, then run main.py.
  4. If you have already installed Kibana or use Kibana cloud, you can access my dashboard at My Tweets Dashboard.

Dashboard Screenshot

Feel free to email me about how to run the project. My email: [email protected]

About

CS 6350: Big Data Analytics and Management Spring 2018
