In this lab, you will learn how to compile and deploy a Spark Streaming application, then use OpenTSDB to query the data that has been written to HBase.
Applications that run on PNDA are packaged as tar.gz archives and pushed to an application repository. The archive contains all the binaries and configuration required to run the application.
You'll learn how to use the Deployment Manager API to deploy a package from the package repository and run it on the cluster.
Make sure you are familiar with the following before proceeding with the lab:
- The getting started guide will introduce you to many concepts used in this lab.
- The console is your starting point for interacting with a PNDA cluster. All of the main features of PNDA can be accessed from the console.
- The Deployment Manager API provides a mechanism for discovering available packages and for deploying and undeploying them. It works with a well-defined package structure to make it fast and straightforward to get real, scalable analytics running on the platform.
- Make sure that your package repository is correctly configured as described in the PREPARE phase for your infrastructure: AWS, OpenStack or server clusters.
- The platform-package-repository tool lets you upload packages to OpenStack Swift.
- Read the technical notes in the example repository.
Make sure you have the following installed on your development computer: git, a JDK and sbt.
The PNDA distribution is available on GitHub.
Clone the example-applications repository.
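For example, assuming the example applications live under the pndaproject organization on GitHub (the repository URL below is an assumption; check the PNDA distribution on GitHub for the exact location):

# Repository URL is an assumption -- verify it against the PNDA distribution on GitHub.
git clone https://github.com/pndaproject/example-applications.git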
In the root folder, run the command:
sbt assembly
To package release 1.0.0 of the application, run:
cd app/src/main/resources/
mkdir darkvader-1.0.0
cp -r sparkStreaming darkvader-1.0.0
tar zcf darkvader-1.0.0.tar.gz darkvader-1.0.0
Use the platform-package-repository tool to upload the application tar.gz file to your application repository.
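The exact invocation depends on the tool version. As an alternative sketch, if your application repository is backed by OpenStack Swift and your Swift credentials are already sourced, the standard Swift CLI can push the archive directly (the container name below is an assumption; use whichever container your package repository is configured to serve):

# Container name is an assumption -- substitute the one your package repository uses.
swift upload pnda.releases darkvader-1.0.0.tar.gz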
Use the graphical interface in the console to deploy packages and start applications.
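If you prefer to script these steps instead of using the console, the same operations can be driven through the Deployment Manager's REST API. The calls below are a sketch only: the host, port and endpoint paths are assumptions, so check the Deployment Manager API documentation for your PNDA release before relying on them.

# Sketch only -- host, port and endpoint paths are assumptions.
curl "http://<deployment-manager>:5000/repository/packages"                    # list packages available in the repository
curl -X PUT "http://<deployment-manager>:5000/packages/darkvader-1.0.0"        # deploy the package to the cluster
curl -X PUT -H "Content-Type: application/json" \
     -d '{"package": "darkvader-1.0.0"}' \
     "http://<deployment-manager>:5000/applications/darkvader"                 # create an application from the package
curl -X POST "http://<deployment-manager>:5000/applications/darkvader/start"   # start the application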
The technical notes in the example repository describe how to set up a test producer that will create suitable test data for consumption by this example application.
The messages on Kafka are Avro encoded and look like this:
src: "collectd"
timestamp: 1453480869657
host_ip: "bb80:0:1:2:a00:bbff:bbee:b123"
rawdata: "{'host':'pnda0','collectd_type':'cpu','value':'90','timestamp':'2016-02-15T13:00:12.000Z'}"
- src indicates that the data originates from a collectd agent
- rawdata contains the CPU usage values for a set of PNDA hosts
The Spark Streaming app parses the rawdata field, builds a new structure and pushes it to the OpenTSDB time series database with the following fields (a sketch of this transformation follows the list):
- timestamp
- kso.collectd.cpu as the OpenTSDB metric
- host as a tag
- value
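A minimal sketch of that transformation is shown below. It is not the code from the example application: the use of json4s for parsing and the OpenTSDB host name are assumptions, although the HTTP /api/put endpoint is the standard OpenTSDB write API.

// Sketch only -- not the code from PndaConsumer.scala. It illustrates the shape of the
// transformation: parse the rawdata JSON and post a datapoint to OpenTSDB's HTTP
// /api/put endpoint. The use of json4s and the OpenTSDB host name are assumptions.
import org.json4s._
import org.json4s.jackson.JsonMethods.parse
import org.json4s.jackson.Serialization.write

case class DataPoint(metric: String, timestamp: Long, value: Double, tags: Map[String, String])

object OpenTsdbSketch {
  implicit val formats: Formats = DefaultFormats

  // Build the new structure from the rawdata payload shown above.
  def toDataPoint(rawdata: String): DataPoint = {
    val json = parse(rawdata.replace('\'', '"'))   // the example payload uses single quotes
    DataPoint(
      metric    = "kso.collectd.cpu",
      timestamp = System.currentTimeMillis(),      // or parse the timestamp embedded in rawdata
      value     = (json \ "value").extract[String].toDouble,
      tags      = Map("host" -> (json \ "host").extract[String]))
  }

  // Push one datapoint to OpenTSDB over HTTP; a 204 response means it was accepted.
  def postToOpenTsdb(dp: DataPoint): Int = {
    val url  = new java.net.URL("http://opentsdb-host:4242/api/put")   // host is an assumption
    val conn = url.openConnection().asInstanceOf[java.net.HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/json")
    val out = conn.getOutputStream
    out.write(write(dp).getBytes("UTF-8"))
    out.close()
    val code = conn.getResponseCode
    conn.disconnect()
    code
  }
}

In the real application this logic runs inside the Spark Streaming job, once per record; OpenTSDB's /api/put also accepts an array of datapoints, so batching writes is a common optimisation.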
The Spark Streaming code is located in:
streaming-app/src/main/scala/com/cisco/pnda/PndaConsumer.scala
You can learn more about these Spark Streaming API calls and the rest of the API by reading the Spark Streaming documentation.
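For orientation, here is a minimal sketch of the streaming wiring using the Spark 1.x Kafka direct-stream API. It is not the actual example code: the topic name, broker list and batch interval are assumptions, and the placeholder processing stands in for the Avro decoding and the OpenTSDB write sketched earlier.

// Sketch of the streaming wiring only -- not the code in PndaConsumer.scala.
// The topic name, broker list and batch interval are assumptions; the Kafka
// integration shown is the Spark 1.x direct-stream API.
import kafka.serializer.{DefaultDecoder, StringDecoder}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("darkvader")
    val ssc  = new StreamingContext(conf, Seconds(2))                        // batch interval is an assumption

    val kafkaParams = Map("metadata.broker.list" -> "kafka-broker:9092")     // broker list is an assumption
    val topics      = Set("avro.events")                                     // topic name is an assumption

    // Each message value is an Avro-encoded record; decoding it and extracting
    // the rawdata field is where the real application's logic goes.
    val stream = KafkaUtils.createDirectStream[String, Array[Byte], StringDecoder, DefaultDecoder](
      ssc, kafkaParams, topics)

    stream.foreachRDD { rdd =>
      println(s"Received ${rdd.count()} events in this batch")               // placeholder for the real processing
    }

    ssc.start()
    ssc.awaitTermination()
  }
}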
Once it is running and consuming data, the example application will display some KPIs in the console.
OpenTSDB offers several ways to extract data, such as CLI tools, an HTTP API and GnuPlot graphs. Querying with OpenTSDB's tag-based system can be a bit tricky, so read through this document and check out the OpenTSDB HTTP API documentation for further information.
Use the Main console to navigate to the OpenTSDB UI. Try entering a current time window and experiment with the autocompleting "Metric" field to find data to display.
You can also use the OpenTSDB HTTP API to retrieve data efficiently, tailored to your needs. Here are some examples:
Get all points of a time series for a specific period:
GET /api/query?start=2016/02/05-00:00:00&end=2016/02/05-00:30:00&m=sum:kso.collectd.cpu{host=pnda0}
This will return a response such as:
[{"metric":"kso.collectd.cpu","tags":{"host":"pnda0"},"aggregateTags":[],"dps":{"1454630401":32.0,"1454630462":49.0,"1454630521":0.0,"1454630582":39.0,"1454630642":13.0,"1454630701":9.0,"1454630762":11.0,"1454630821":6.0,"1454630882":69.0,"1454630942":29.0,"1454631001":27.0,"1454631062":77.0,"1454631121":73.0,"1454631182":12.0,"1454631241":2.0,"1454631302":5.0,"1454631362":48.0,"1454631421":14.0,"1454631482":46.0,"1454631541":93.0,"1454631602":75.0,"1454631661":3.0,"1454631722":26.0,"1454631781":33.0,"1454631842":9.0,"1454631901":67.0,"1454631962":55.0,"1454632021":90.0,"1454632082":10.0,"1454632141":64.0}}]
If you no longer need it, stop the application and undeploy the package via the console.