Guide to RADAR HDFS Connector
An adapted connector that writes stream data to HDFS. It is based on the Confluent HDFS Connector and uses an adapted AvroFormat to write both key and value data.
To use it, the adapted HDFS Connector must be on the `$CLASSPATH`: add `radarbackend.jar` to `$CLASSPATH` and follow the configuration steps described in the User Guide of the Confluent HDFS Connector.
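For example, a minimal sketch, assuming the jar was built to `/path-to/radarbackend.jar` (a hypothetical location; substitute the actual path):

```bash
# Hypothetical jar location; adjust to where radarbackend.jar actually lives
export CLASSPATH="$CLASSPATH:/path-to/radarbackend.jar"
```

Then create a properties file for the sink connector, for example: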
```properties
name=radar-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=mock_empatica_e4_battery_level,mock_empatica_e4_blood_volume_pulse
flush.size=1200
hdfs.url=hdfs://localhost:9000
format.class=org.radarcns.sink.HDFS.AvroFormatRadar
```
Run the connector in standalone mode:

```bash
connect-standalone /etc/schema-registry/connect-avro-standalone.properties path-to-your-hdfs-connector-configuration.properties
```
By default the data is written to the `/topics/` directory in HDFS; this can be changed with the `topics.dir` property.
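For instance, a hypothetical override (the path here is only illustrative):

```properties
# Write topic data under a different parent directory in HDFS
topics.dir=/radar/output
```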
To copy the written data out of HDFS:

- Create a directory to collect the written data: `sudo mkdir <dir-name>`
- Change the current directory of the command line to that directory: `cd <dir-name>`
- Extract the data from HDFS: `hadoop fs -get /topics`
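After the copy, the local `topics` directory contains one subdirectory per topic, each with `partition=<n>` subdirectories holding the Avro files. The layout below is a sketch; the exact file names depend on the committed offsets:

```
topics/
└── mock_empatica_e4_battery_level/
    └── partition=0/
        ├── mock_empatica_e4_battery_level+0+0000000000+0000001199.avro
        └── ...
```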
Download avro-tools-1.7.7.jar.
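One place it is published is the Apache archive; assuming that layout is still in place:

```bash
# Fetch Avro tools 1.7.7 from the Apache archive
wget https://archive.apache.org/dist/avro/avro-1.7.7/java/avro-tools-1.7.7.jar
```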
Create a `radar-data-extract.sh` script as below:
```bash
#!/bin/bash
# Save the current directory so we can return to it after each topic
cur=$PWD

# Concatenate all Avro files of one topic (assuming a single partition=0)
# into one Avro file, then convert the result to JSON
function dir_command {
    topic=$(basename "$1")
    cd "$1/partition=0" || return
    echo "$PWD"
    java -jar /path-to/avro-tools-1.7.7.jar concat $(ls) "../${topic}_full.avro"
    java -jar /path-to/avro-tools-1.7.7.jar tojson "../${topic}_full.avro" >> "../${topic}_full.json"
}

# Visit each immediate child directory (one per topic) and run dir_command on it
find . -maxdepth 1 -type d \( ! -name . \) | while read dir; do
    dir_command "$dir"
    cd "$cur"
done
```
Navigate to the extracted `/topics` directory and execute `radar-data-extract.sh` from there, as in the sketch below.
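For example, assuming the script was saved to `/path-to/radar-data-extract.sh` (a hypothetical location):

```bash
cd <dir-name>/topics
bash /path-to/radar-data-extract.sh
```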