Guide to RADAR HDFS Connector
An adapted connector that writes stream data to HDFS. It is based on the Confluent HDFS Connector and uses an adapted AvroFormat to write both key and value data.
To use it, the adapted HDFS Connector must be on the `$CLASSPATH`: add `radarbackend.jar` to `$CLASSPATH` and follow the configuration steps described in the User Guide of the Confluent HDFS Connector.
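For example, a minimal sketch, assuming the jar was built to `/path-to/radarbackend.jar` (a hypothetical location; substitute the actual path):

```bash
# Hypothetical jar location; adjust to where radarbackend.jar actually lives
export CLASSPATH="$CLASSPATH:/path-to/radarbackend.jar"
```

Then create a properties file for the sink connector, for example: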
```properties
name=radar-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=mock_empatica_e4_battery_level,mock_empatica_e4_blood_volume_pulse
flush.size=1200
hdfs.url=hdfs://localhost:9000
format.class=org.radarcns.sink.HDFS.AvroFormatRadar
```
Run the connector in standalone mode:

```bash
connect-standalone /etc/schema-registry/connect-avro-standalone.properties path-to-your-hdfs-connector-configuration.properties
```
By default the data is written to the `/topics/` directory in HDFS; this can be changed with the `topics.dir` property.
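For instance, a hypothetical override (the path here is only illustrative):

```properties
# Write topic data under a different parent directory in HDFS
topics.dir=/radar/output
```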
To copy the written data out of HDFS:

- Create a directory to collect the written data: `sudo mkdir <dir-name>`
- Change the current directory of the command line to that directory: `cd <dir-name>`
- Extract the data from HDFS: `hadoop fs -get /topics`
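After the copy, the local `topics` directory contains one subdirectory per topic, each with `partition=<n>` subdirectories holding the Avro files. The layout below is a sketch; the exact file names depend on the committed offsets:

```
topics/
└── mock_empatica_e4_battery_level/
    └── partition=0/
        ├── mock_empatica_e4_battery_level+0+0000000000+0000001199.avro
        └── ...
```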
Download avro-tools-1.7.7.jar.
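One place it is published is the Apache archive; assuming that layout is still in place:

```bash
# Fetch Avro tools 1.7.7 from the Apache archive
wget https://archive.apache.org/dist/avro/avro-1.7.7/java/avro-tools-1.7.7.jar
```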
Create a `radar-data-extract.sh` script as below:
```bash
#!/bin/bash
# Save the current directory so we can return to it after each topic
cur=$PWD

# Concatenate all Avro files of one topic (assuming a single partition=0)
# into one Avro file, then convert the result to JSON
function dir_command {
    topic=$(basename "$1")
    cd "$1/partition=0" || return
    echo "$PWD"
    java -jar /path-to/avro-tools-1.7.7.jar concat $(ls) "../${topic}_full.avro"
    java -jar /path-to/avro-tools-1.7.7.jar tojson "../${topic}_full.avro" >> "../${topic}_full.json"
}

# Visit each immediate child directory (one per topic) and run dir_command on it
find . -maxdepth 1 -type d \( ! -name . \) | while read dir; do
    dir_command "$dir"
    cd "$cur"
done
```
Navigate to the extracted `/topics` directory and execute `radar-data-extract.sh` from there, as in the sketch below.
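For example, assuming the script was saved to `/path-to/radar-data-extract.sh` (a hypothetical location):

```bash
cd <dir-name>/topics
bash /path-to/radar-data-extract.sh
```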