doanduyhai/Cassandra-Spark-Demo

This is a Spark/Cassandra demo using the open-source Spark Cassandra Connector

There are 3 packages, each with its own demo:

  • us.unemployment.demo
    • Ingestion
      1. FromCSVToCassandra: read US employment data from a CSV file into Cassandra
      2. FromCSVCaseClassToCassandra: read US employment data from a CSV file, map each line to a case class and insert it into Cassandra
    • Read
      1. FromCassandraToRow: read US employment data from Cassandra into the low-level CassandraRow object
      2. FromCassandraToCaseClass: read US employment data from Cassandra into a custom Scala case class, leveraging the built-in object mapper
      3. FromCassandraToSQL: read US employment data from Cassandra using SparkSQL and the connector integration
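The case-class ingestion step above boils down to turning each CSV line into a typed record before it is written. Here is a minimal sketch of that parsing step in plain Scala. The `UsUnemployment` fields and the two-column `year,percent` layout are assumptions made for illustration; the real CSV columns are not listed in this README. In a Spark job, such a function would typically be applied to the lines of the file (for example with `flatMap`) before saving the result with the connector.

```scala
import scala.util.Try

// Hypothetical record type: the actual columns of the demo's CSV file
// are not documented here, so these two fields are placeholders.
final case class UsUnemployment(year: Int, unemployedPercent: Double)

object CsvParsing {
  // Parse one "year,percent" line.
  // Returns None for header lines, malformed lines or wrong column counts,
  // so bad input is skipped instead of failing the whole job.
  def parseLine(line: String): Option[UsUnemployment] =
    line.split(",").map(_.trim) match {
      case Array(year, percent) =>
        Try(UsUnemployment(year.toInt, percent.toDouble)).toOption
      case _ => None
    }
}
```

Returning an `Option` keeps the parsing total: `lines.flatMap(CsvParsing.parseLine)` silently drops the header and any broken line, which is usually what you want for a demo data set.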
  • twitter.stream
    • TwitterStreaming: demo of a Twitter stream saved to Cassandra (stream IN). To make this demo work, start the job with the following JVM system properties:
      1. -Dtwitter4j.oauth.consumerKey="value"
      2. -Dtwitter4j.oauth.consumerSecret="value"
      3. -Dtwitter4j.oauth.accessToken="value"
      4. -Dtwitter4j.oauth.accessTokenSecret="value"

      If you don't have Twitter app credentials, create a new app at https://apps.twitter.com/
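Since the streaming job depends on the four `-D` options listed above, it can help to fail fast when one of them is missing. The sketch below is not part of the demo: `TwitterCredentials`, `missing` and `requireAll` are hypothetical names, and only the four property keys come from this README.

```scala
object TwitterCredentials {
  // Property names as listed in this README.
  val RequiredKeys: Seq[String] = Seq(
    "twitter4j.oauth.consumerKey",
    "twitter4j.oauth.consumerSecret",
    "twitter4j.oauth.accessToken",
    "twitter4j.oauth.accessTokenSecret"
  )

  // Returns the required keys that are absent or blank in the given properties.
  def missing(props: collection.Map[String, String]): Seq[String] =
    RequiredKeys.filter(key => props.get(key).forall(_.trim.isEmpty))

  // Hypothetical guard: call it first in main() so the job stops with a
  // readable message instead of failing later during authentication.
  def requireAll(): Unit = {
    val absent = missing(sys.props)
    require(absent.isEmpty, s"Missing JVM system properties: ${absent.mkString(", ")}")
  }
}
```

`missing` takes the properties as a parameter so it can be checked against a plain `Map`, while `requireAll` reads the real JVM system properties through `sys.props`.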
    
  • weather.data.demo
    • Data preparation
      1. Go to the folder main/data
      2. Execute $CASSANDRA_HOME/bin/cqlsh -f weather_data_schema.cql from this folder. It should create the keyspace spark_demo and some tables
      3. Download Weather_Raw_Data_2014.csv.gz from here (>200 MB)
      4. Unzip it somewhere on your disk
    • Ingestion
      1. WeatherDataIntoCassandra: read the whole Weather_Raw_Data_2014.csv file (30 × 10⁶ lines) and insert the data into Cassandra. Do not forget to set the path to this file by changing the WeatherDataIntoCassandra.WEATHER_2014_CSV value
      This step takes a while since there are 30 × 10⁶ lines to insert into Cassandra, so go grab a long coffee (under 1 hour on my 15" MacBook Pro)
    • Read
      1. WeatherDataFromCassandra: read all raw weather data plus all weather station details, keep only the French stations and the data between March and June 2014, then compute the average temperature and pressure
      This step takes a while since there are 30 × 10⁶ lines to read from Cassandra
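The read step above combines a filter on station country, a filter on the month range and an average. The following sketch shows that logic on in-memory collections only. The case classes, field names and the `"FR"` country code are assumptions for illustration; the real tables and columns are defined in weather_data_schema.cql and are not reproduced here. The demo itself runs on Spark, where the same steps would be expressed on distributed data rather than on a `Seq`.

```scala
// Hypothetical shapes, not the demo's actual schema.
final case class Station(id: String, countryCode: String)
final case class Reading(stationId: String, month: Int, temperature: Double, pressure: Double)
final case class Averages(temperature: Double, pressure: Double)

object WeatherStats {
  // Keep readings from stations of the given country whose month lies in
  // [fromMonth, toMonth], then average temperature and pressure.
  // Returns None when no reading matches, to avoid dividing by zero.
  def average(
      readings: Seq[Reading],
      stations: Seq[Station],
      country: String,
      fromMonth: Int,
      toMonth: Int
  ): Option[Averages] = {
    val stationIds = stations.filter(_.countryCode == country).map(_.id).toSet
    val selected = readings.filter { r =>
      stationIds(r.stationId) && r.month >= fromMonth && r.month <= toMonth
    }
    if (selected.isEmpty) None
    else {
      val n = selected.size.toDouble
      Some(Averages(selected.map(_.temperature).sum / n, selected.map(_.pressure).sum / n))
    }
  }
}
```

For the March to June window described above, the call would look like `WeatherStats.average(readings, stations, "FR", 3, 6)`. Building the set of matching station ids first keeps the per-reading check cheap, which matters when the reading side is much larger than the station side.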
