This repository contains an application for analyzing Twitter data using Hadoop, Flume, Pig and MS-Excel.
Install Hadoop, Pig and Flume
Before you get started with the actual application, you'll need Hadoop, Flume and Pig. The easiest way to get them is the Cloudera Quickstart VM, which sets up your initial environment for you. You can download the Cloudera Quickstart VM here, or you can set everything up manually.
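Once the tools are in place (the Quickstart VM ships with them preinstalled), you can quickly confirm they are available on your PATH; the exact versions printed will vary with your setup:
$ hadoop version
$ pig -version
$ flume-ng version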
Download the custom Flume Source and add the JAR file to the Flume classpath
Copy flume-sources-1.0-SNAPSHOT.jar to /usr/lib/flume-ng/plugins.d/twitter-streaming/lib/flume-sources-1.0-SNAPSHOT.jar
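For example, on a typical Cloudera install the plugins.d directory lives under /usr/lib/flume-ng; the commands below are only a sketch and assume you are in the directory containing the downloaded jar:
$ sudo mkdir -p /usr/lib/flume-ng/plugins.d/twitter-streaming/lib
$ sudo cp flume-sources-1.0-SNAPSHOT.jar /usr/lib/flume-ng/plugins.d/twitter-streaming/lib/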
Configure flume.conf file
Copy the provided twitter.conf file to the $FLUME_HOME/conf directory.
Edit the twitter.conf file and set the parameters for the Twitter agent:
Fill in your Twitter developer account authentication details (consumer key, consumer secret, access token and access token secret).
Optionally, edit the keywords to track and any other Twitter API related settings.
Save the changes to the file.
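For reference, the block you need to edit usually looks something like the sketch below. The source name (Twitter here) and the exact property names depend on the twitter.conf provided with this repository and on the bundled Flume source, so treat this only as an illustration of which fields to fill in; the keyword list is just an example.
# names below are assumptions -- check the provided twitter.conf for the exact source name
TwitterAgent.sources.Twitter.consumerKey       = <your consumer key>
TwitterAgent.sources.Twitter.consumerSecret    = <your consumer secret>
TwitterAgent.sources.Twitter.accessToken       = <your access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <your access token secret>
TwitterAgent.sources.Twitter.keywords          = hadoop, big data, analytics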
Starting the Flume agent
Create the HDFS directory for storing the Flume data and make sure it is accessible by the user running the agent:
$ hadoop fs -mkdir <destination>
$ hadoop fs -chown -R flume:flume <destination>
$ hadoop fs -chmod -R 770 <destination>
$ $FLUME_HOME/bin/flume-ng agent --conf $FLUME_HOME/conf/ -f $FLUME_HOME/conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
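Once the agent is running, you can check that events are arriving in HDFS (replace <destination> with the directory you created above):
$ hadoop fs -ls <destination>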
Avro to CSV using Pig
You need the PiggyBank library jar, available from: http://search.maven.org/#search%7Cga%7C1%7Cpiggybank
#register PiggyBank library
grunt> register /<local_path>/piggybank-0.12.0.jar
#load avro to pig
grunt> result = LOAD '/<path_to_avro>/filename.avro' USING org.apache.pig.builtin.AvroStorage();
#store avro as csv
grunt> STORE result INTO '<destination_filename>' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE', 'UNIX') PARALLEL 1;
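The STORE statement writes the CSV as part-* files inside the <destination_filename> directory on HDFS. Since the next step works on the local file system, you can merge and copy the result down with something like the following; the local name tweets.csv is just an example:
$ hadoop fs -getmerge <destination_filename> tweets.csv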
CSV Data Analysis using Pig
Now place mainScript.sh, extraction.pig, AFINN.txt and the .csv file produced above together in a directory on your local file system (LFS).
In mainScript.sh, replace the file name demonetization-tweets.csv with the name of the .csv file produced in the previous step.
Run mainScript.sh as :
$ sh mainScript.sh
Each word is rated from -5 to +5 according to its meaning using the AFINN dictionary. AFINN is a dictionary of about 2,500 words, each scored between -5 (most negative) and +5 (most positive) depending on its meaning.
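The provided extraction.pig implements this rating step. Purely as an illustration of the idea (not the actual script), a minimal Pig sketch of such a dictionary join might look like the following, assuming the tweet text has been projected into a single column and AFINN.txt is tab-separated:
-- illustrative sketch only: join tweet words against the AFINN dictionary
REGISTER /<local_path>/piggybank-0.12.0.jar;
tweets = LOAD 'your_tweets.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',') AS (text:chararray);
afinn  = LOAD 'AFINN.txt' AS (word:chararray, rating:int);  -- default tab delimiter
-- split each tweet into lower-cased words
words  = FOREACH tweets GENERATE text, FLATTEN(TOKENIZE(LOWER(text))) AS word;
-- attach a rating to every word found in the dictionary (unmatched words get null)
rated  = JOIN words BY word LEFT OUTER, afinn BY word;
-- average the word ratings per tweet (grouping by the raw text for simplicity)
scores = FOREACH (GROUP rated BY words::text) GENERATE group AS text, AVG(rated.afinn::rating) AS sentiment;
STORE scores INTO 'sentiment_out' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');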
After running mainScript.sh, an output.csv file will be generated in your directory.
You can then visualize output.csv using MS Excel or a similar tool.
We have provided a screenshot of how we visualized the output in output.csv.
---X--X--X--X---