Example Streaming Application: Kafka to OpenTSDB

Overview

This example application reads events from Kafka and writes them to OpenTSDB.

The application is packaged as a tarball containing the binaries and configuration files required to perform the stream processing.
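To make the data flow concrete, below is a minimal Scala sketch of a Kafka-to-OpenTSDB pipeline using the Spark Streaming direct Kafka API. It assumes each event arrives as a JSON data point that OpenTSDB's /api/put endpoint accepts verbatim; the broker address, OpenTSDB host and batch interval are placeholder assumptions, not the application's actual implementation, which is driven by the configuration files described below.

 import java.net.{HttpURLConnection, URL}

 import kafka.serializer.StringDecoder
 import org.apache.spark.SparkConf
 import org.apache.spark.streaming.kafka.KafkaUtils
 import org.apache.spark.streaming.{Seconds, StreamingContext}

 object KafkaToOpenTsdbSketch {
   def main(args: Array[String]): Unit = {
     val conf = new SparkConf().setAppName("kafka-to-opentsdb-sketch")
     val ssc = new StreamingContext(conf, Seconds(5))

     // Placeholder broker list and topic; the real application reads these
     // from application.properties / properties.json.
     val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
     val topics = Set("avro.events.samples")

     val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
       ssc, kafkaParams, topics)

     stream.foreachRDD { rdd =>
       rdd.foreachPartition { events =>
         events.foreach { case (_, body) =>
           // Assume the event body is a JSON data point understood by
           // OpenTSDB's /api/put endpoint; error handling omitted.
           val conn = new URL("http://opentsdb-host:4242/api/put")
             .openConnection().asInstanceOf[HttpURLConnection]
           conn.setRequestMethod("POST")
           conn.setRequestProperty("Content-Type", "application/json")
           conn.setDoOutput(true)
           val out = conn.getOutputStream
           out.write(body.getBytes("UTF-8"))
           out.close()
           conn.getResponseCode // force the request to be sent
           conn.disconnect()
         }
       }
     }

     ssc.start()
     ssc.awaitTermination()
   }
 }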

PNDA Guide

The "Packages & Applications" chapter in the PNDA guide contains much more detail on the Deployment Manager, Console and Repository Manager. You can also work through the step-by-step guide to using PNDA in "Getting Started".

Requirements

Note that the current implementation cannot be compiled against Cloudera libraries.

Building and creating the package

This project is built with sbt; see the sbt install instructions.

The previous step generates the jar file. To create a package, you will also need a set of supporting files, which are available in the src/universal folder:

  • application.properties: config file used by the Spark Streaming scala application.
  • log4j.properties: defines the log level and behaviour for the application.
  • opentsdb.json: contains metrics to be created in OpenTSDB.
  • properties.json: contains default properties that may be overridden at application creation time (see the sketch after this list).
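For illustration, a minimal properties.json exposing only the Kafka input topic might look like the following; input_topic is shown later in this guide, but treating it as the sole default property is an assumption about this particular package:

 {
   "input_topic": "avro.events.samples"
 }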

We use the sbt-native-packager plugin to create the package; see the SBT Native Packager documentation for more information. To build the jar and generate the tarball, run:

 sbt packageApp

Your package will be available in the target/universal folder.
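For example, you might expect a listing along these lines; the exact package name and version are assumptions determined by the build configuration:

 ls target/universal/
 kafka-spark-opentsdb-example-app-<version>.tar.gz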

Deploying the package and creating an application

The PNDA console can be used to deploy the application package to a cluster and then to create an application instance. The console is available on port 80 on the edge node.

When creating an application in the console, ensure that the input_topic property is set to a real Kafka topic.

"input_topic": "avro.events.samples",

To make the package available for deployment, it must be uploaded to a package repository. The default implementation is an OpenStack Swift container. The package may be uploaded via the PNDA repository manager which abstracts the container used, or by manually uploading the package to the container.
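As a sketch, a manual upload via the repository manager's HTTP interface might look like the following; the host, port and endpoint path here are assumptions, so consult the PNDA guide for the actual API:

 curl -X PUT \
   --data-binary @target/universal/<package>.tar.gz \
   "http://<repository-manager-host>:8888/packages/<package>.tar.gz"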

Run sample data source

If you want to produce test data and see how the ingest pipeline works, there is a script, data-source/producer.py, which generates random events and sends them to Kafka.

To run the test script, refer to the instructions in the data-source folder of this repository.
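A minimal invocation would look something like the following; the script may take additional arguments (for example the Kafka broker address), so defer to those instructions:

 cd data-source
 python producer.py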