Merge branch 'Gradiant-develop' into develop
91pavan committed Nov 12, 2018
2 parents d52fa1a + c1862a8 commit 792c705
Showing 101 changed files with 5,516 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
scripts/components/files/.ipynb_checkpoints/
.vscode/
docker/dm_keys/
1 change: 1 addition & 0 deletions docker/AUTHORS
@@ -0,0 +1 @@
Carlos Giraldo <[email protected]>
Empty file added docker/CONTRIBUTORS
Empty file.
15 changes: 15 additions & 0 deletions docker/LICENSE
@@ -0,0 +1,15 @@
Unless otherwise specified in the file, this software is:

Copyright (c) 2018 Gradiant. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
173 changes: 173 additions & 0 deletions docker/README.md
@@ -0,0 +1,173 @@
# red-pnda

<img src="logo/pnda1r-trans.png" alt="Red PNDA logo" width="400" height="300"/>

This framework provisions a minimal set of the PNDA ([pnda.io](http://pnda.io)) components so that developers writing apps for the full PNDA stack can experiment with those components in a smaller, lightweight environment. Data exploration and app prototyping are supported using Jupyter and Apache Spark.

**Note**:

* Package and application support is not available on red-pnda. The respective tabs will **not** work on the PNDA console and will show an error message.

* This framework is designed with neither scalability nor HA in mind and is therefore unsuited to running production workloads. If you need either, use one of the core PNDA flavors instead - see the PNDA [Guide](http://pnda.io/guide).


The Red PNDA framework is intended as a platform for experimentation and is NOT formally supported at present. Issues encountered with the system can be reported to the standard PNDA support forums for informational purposes only.

## Acknowledgement

This work was inspired by an initial concept created by Maros Marsalek ([https://github.com/marosmars](https://github.com/marosmars)) and Nick Hall ([https://github.com/cloudwiser](https://github.com/cloudwiser)).

## Prerequisites

Tested with Ubuntu 18.04 distro.

Docker Engine (tested with docker version **18.05.0-ce**)

docker-compose (tested with docker-compose version **1.21.2**)


* Minimum RAM / vCPU / storage: 4GB / 2 / 16GB
* Recommended RAM / vCPU / storage: 16GB / 4 / 60GB

These values are illustrative; actual requirements depend on the data analytics applications you run on PNDA.

## Deploying red-PNDA as docker containers

The `deploy.sh` script starts the containers and performs several post-deployment tasks (e.g., creating users and initializing database tables). Inspect the script for more information.

To make the PNDA services reachable from the host, the script maps each service name to its container IP address in the `/etc/hosts` file.
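
A hosts-file registration of this kind can be sketched as follows. This is only an illustration under stated assumptions: the helper names `container_ip` and `add_host_entry` are hypothetical, and the real `register_hostnames.sh` may work differently. The sketch looks up a container's IP address with `docker inspect` and appends an entry only when the name is not already present, so repeated runs do not create duplicates.

```shell
#!/bin/bash
# Hypothetical helpers; the real register_hostnames.sh may work differently.

# Print the IP address Docker assigned to a container (needs a running Docker daemon).
container_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

# Append "IP name" to a hosts file unless a non-comment line already lists that name.
add_host_entry() {
    local hosts_file="$1" ip="$2" name="$3"
    if ! awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Example (writing to /etc/hosts requires root):
#   add_host_entry /etc/hosts "$(container_ip spark-master)" spark-master
```

Checking for an existing entry first keeps `/etc/hosts` tidy when the deployment is run more than once.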

After deployment, open the [PNDA console-frontend web UI](http://console-frontend).

The default user is `pnda` and the default password is `pnda`.

Other service web UIs:

* [Spark](http://spark-master:8080)
* [Kafka-manager](http://kafka-manager:10900)
* [HDFS](http://hdfs-namenode:50070)
* [HBASE](http://hbase-master:60010)
* [Jupyter](http://jupyter:8000)
* [Grafana](http://grafana:3000)
* [OpenTSDB](http://opentsdb:4242)

### Terminal access to running containers

You should be able to access a bash terminal in any of the running
containers through the `docker exec -ti CONTAINER_NAME /bin/bash` command.

### Access to service logs
You should be able to get the logs of any of the running
containers through the `docker logs CONTAINER_NAME` command.

## Red-PNDA components

Red-PNDA makes use of the following open source components:

* Console Frontend - [https://github.com/pndaproject/platform-console-frontend](https://github.com/pndaproject/platform-console-frontend)
* Console Backend - [https://github.com/pndaproject/platform-console-backend](https://github.com/pndaproject/platform-console-backend)
* Platform Testing - [https://github.com/pndaproject/platform-testing](https://github.com/pndaproject/platform-testing)
* Platform Libraries - [https://github.com/pndaproject/platform-libraries](https://github.com/pndaproject/platform-libraries)
* Kafka 1.0.0 - [http://kafka.apache.org](http://kafka.apache.org)
* Jupyter Notebook - [http://jupyter.org](http://jupyter.org)
* Apache Spark 2.3.1 - [http://spark.apache.org](http://spark.apache.org)
* Apache HBase 2.0.1 - [http://hbase.apache.org](http://hbase.apache.org)
* OpenTSDB 2.3.1 - [http://opentsdb.net](http://opentsdb.net)
* Grafana 5.0.3 - [https://grafana.com](https://grafana.com)
* Kafka Manager 1.3.3.17 - [https://github.com/yahoo/kafka-manager](https://github.com/yahoo/kafka-manager)
* Example Kafka Clients - [https://github.com/pndaproject/example-kafka-clients](https://github.com/pndaproject/example-kafka-clients)
* Jmxproxy 3.2.0 - [https://github.com/mk23/jmxproxy](https://github.com/mk23/jmxproxy)

## Data Ingestion

For instructions on how to use Logstash to ingest data, refer to this [guide](../logstash_guide.md).

For detailed instructions on the different data ingress methods, refer to this [guide](http://pnda.io/pnda-guide/producer/).

### Kafka

#### How do I connect to the red-pnda Kafka instance?

Point your Kafka client at the broker on `kafka:9092`.
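
Before pointing a client at the broker, it can help to confirm that the port is reachable. The following is a minimal sketch, not part of red-pnda: the helper names `port_open` and `wait_for_port` are hypothetical, and the code assumes bash with `/dev/tcp` support, the coreutils `timeout` command, and that the `kafka` hostname resolves from where you run it.

```shell
#!/bin/bash
# Hypothetical helpers, not part of red-pnda.

# Succeed if a TCP connection to host:port can be opened within 2 seconds.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Poll until the port opens; give up after a fixed number of attempts.
# Arguments: host port [attempts=30] [delay_seconds=2]
wait_for_port() {
    local host="$1" port="$2" attempts="${3:-30}" delay="${4:-2}"
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        port_open "$host" "$port" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_port kafka 9092 && echo "broker reachable"
```

Bounding the number of attempts means a broker that never starts produces a clear failure instead of an endless wait.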

#### Are there any default topics which I can use?

By default, two Kafka topics are created for convenience:

1. raw.log.localtest
2. avro.log.localtest

The `raw.log.localtest` topic is a generic topic; you could use this topic to ingest any type of data.

The `avro.log.localtest` topic can be used to ingest PNDA Avro-encoded data.

Note that if you use the `raw.log.localtest` topic, data is written to the disk of the VM.

By default, data is stored in the `/data` directory of the VM's file system using a directory hierarchy based on the system timestamp.

For example, if you streamed data on 20th June 2017 at 5PM, your data will be stored in:

    /data/year=2017/month=6/day=20/hour=17/dump.json

#### Sample Kafka Producer

We have also provided a sample Kafka producer in Python. Each execution sends one JSON event to the `raw.log.localtest` topic, so feel free to play around with it.

    cd /opt/pnda
    python producer.py

Depending on when you send the data, it will be stored in

    /data/year=yyyy/month=mm/day=dd/hour=hh/dump.json

where yyyy, mm, dd and hh can be retrieved using the system date command:

    date
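
The path for a given hour can also be built directly with `date`. This is a sketch under assumptions: the function name `partition_path` is hypothetical, it relies on GNU `date` (the `-d` option and the `%-m` style unpadded fields), it uses the local time zone, and it assumes month, day and hour are written without leading zeros, as in the `month=6` example above.

```shell
#!/bin/bash
# Hypothetical helper: print the dump file path for a timestamp (default: now).
# Assumes GNU date and unpadded month/day/hour values.
partition_path() {
    local when="${1:-now}"
    date -d "$when" '+/data/year=%Y/month=%-m/day=%-d/hour=%-H/dump.json'
}

# Example:
#   partition_path "2017-06-20 17:00:00"
#   prints /data/year=2017/month=6/day=20/hour=17/dump.json
```

Calling `partition_path` with no argument gives the file that events sent right now would be expected to land in.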


## Jupyter Notebooks

The [Jupyter Notebook](http://jupyter.org) is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. In Red PNDA, it supports exploration and presentation of data from the local file system.

The default password for the Jupyter Notebook is `pnda`.

Please refer to our [Jupyter Guide](../jupyter_guide.md) for steps on how to use Jupyter.

For those who are new to PNDA, there’s a network-related dataset (BGP updates from the Internet) and an accompanying tutorial Jupyter notebook named `Introduction to Big Data Analytics.ipynb` to help you get started.

There is also a sample notebook, `tutorial.ipynb`, that uses Spark DataFrames to do some basic analysis of data dumped to disk via Kafka.

If you are interested in data mining or anomaly detection, take a look at `red-pnda-anom-detect.ipynb`, where we work with telemetry data and try to detect unintentional traffic loss in the network.

## Grafana Server

The default login credentials for Grafana are `pnda/pnda`.


## Shutdown

To stop the PNDA services and remove their containers, run `docker-compose down`.

To remove service containers that were only stopped (for example with `docker-compose stop`), run `docker-compose rm`.

To delete the PNDA docker persistent volumes run `./delete_volumes.sh`.
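
The shutdown steps can be combined into one small wrapper. This is a hypothetical sketch rather than a script shipped with red-pnda: the names `run` and `teardown`, the `--purge` flag and the `DRY_RUN` variable are all assumptions made for illustration. It is meant to be run from the directory that contains `docker-compose.yml` and `delete_volumes.sh`.

```shell
#!/bin/bash
# Hypothetical wrapper around the shutdown commands.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the services and remove their containers.
# With --purge, also delete the persistent volumes.
teardown() {
    local purge="${1:-}"
    run docker-compose down
    if [ "$purge" = "--purge" ]; then
        run ./delete_volumes.sh
    fi
}

# Example:
#   DRY_RUN=1 teardown --purge
```

Keeping volume deletion behind an explicit flag avoids losing stored data on a routine restart.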


## General Troubleshooting

Please refer to our [Troubleshooting guide](../General_Troubleshooting.md) for tips if you encounter any problems.


## Further Reading

For a deeper dive into the various components, use these links as entry points.

* Jupyter Notebooks: this handbook also contains a nice introduction to Jupyter: [https://github.com/jakevdp/PythonDataScienceHandbook](https://github.com/jakevdp/PythonDataScienceHandbook)

* OpenTSDB: [http://opentsdb.net/docs/build/html/user_guide/quickstart.html](http://opentsdb.net/docs/build/html/user_guide/quickstart.html)

* Grafana: [http://docs.grafana.org/guides/getting_started/](http://docs.grafana.org/guides/getting_started/)

* Kafka Manager: [https://github.com/yahoo/kafka-manager](https://github.com/yahoo/kafka-manager)

* Apache Spark: [https://spark.apache.org/docs/1.6.1/quick-start.html](https://spark.apache.org/docs/1.6.1/quick-start.html)
4 changes: 4 additions & 0 deletions docker/delete_volumes.sh
@@ -0,0 +1,4 @@
#!/bin/bash


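# Remove every volume whose name matches red-pnda; xargs -r (GNU) skips the call when none exist.
docker volume ls -f name=red-pnda -q | xargs -r docker volume rm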
106 changes: 106 additions & 0 deletions docker/deploy.sh
@@ -0,0 +1,106 @@
#!/bin/bash

echo "---------------- STARTING HDFS and HBASE ----------------"
docker-compose up -d zookeeper
docker-compose up -d hdfs-namenode
docker-compose up -d hdfs-datanode
while ! docker exec -ti hdfs-namenode nc -vz hdfs-namenode:8020 ; do
  echo "waiting for hdfs-namenode to start"
  sleep 2
done
docker-compose up -d hbase-master
docker-compose up -d hbase-region

echo "---------------- ADDING users to HDFS ----------------"
echo "adding hdfs as admin superuser"
docker exec -ti hdfs-namenode adduser --system --gecos "" --ingroup=root --shell /bin/bash --disabled-password hdfs
echo "adding pnda user"
PNDA_USER=pnda
PNDA_GROUP=pnda
docker exec -ti hdfs-namenode addgroup $PNDA_GROUP
docker exec -ti hdfs-namenode adduser --gecos "" --ingroup=$PNDA_GROUP --shell /bin/bash --disabled-password $PNDA_USER
docker exec -ti hdfs-namenode hdfs dfs -mkdir -p /user/$PNDA_USER
docker exec -ti hdfs-namenode hdfs dfs -chown $PNDA_USER:$PNDA_GROUP /user/$PNDA_USER
docker exec -ti hdfs-namenode hdfs dfs -chmod 770 /user/$PNDA_USER


echo "---------------- ADDING KITE_TOOLS to HDFS NAMENODE AND INITIALIZE PNDA REPOs ----------------"
docker cp hdfs/kite-files/pnda.avsc hdfs-namenode:/tmp/pnda.avsc
docker cp hdfs/kite-files/pnda_kite_partition.json hdfs-namenode:/tmp/pnda_kite_partition.json
docker exec -i hdfs-namenode apk add --no-cache curl
docker exec -i hdfs-namenode /bin/bash < hdfs/add_kite_tools_and_create_db.sh

echo "---------------- CREATING HBASE TABLES for OPENTSDB ----------------"
docker exec -i hbase-master /bin/bash < opentsdb/create_opentsdb_hbase_tables.sh

echo "---------------- ENABLING THRIFT API in HBASE MASTER ----------------"
docker exec -d hbase-master hbase thrift start -p 9090
while ! docker exec -ti hbase-master nc -vz hbase-master:9090 ; do
  echo "waiting for hbase thrift api to start"
  sleep 2
done
echo "---------------- STARTING THE REST OF THE SERVICES ----------------"
docker-compose up -d
echo "---------------- CREATING pnda user in services ----------------"
docker exec deployment-manager sh -c 'adduser -D pnda && echo "pnda:pnda" | chpasswd'
docker exec jupyter-ssh sh -c 'adduser -D pnda && echo "pnda:pnda" | chpasswd'

echo "---------------- ADDING ssh keys to dm_keys volume ----------------"
mkdir -p dm_keys
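# Reuse an existing key pair on re-runs; ssh-keygen would otherwise stop and ask before overwriting.
if [ ! -f dm_keys/dm ]; then
echo "Generating SSH Keys for Deployment Manager connections"
ssh-keygen -b 2048 -t rsa -f dm_keys/dm -q -N ""
fi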
cp dm_keys/dm dm_keys/dm.pem

docker cp dm_keys/ deployment-manager:/opt/pnda/
docker exec -ti deployment-manager chown -R root:root /opt/pnda/dm_keys/
docker exec -ti deployment-manager chmod 644 /opt/pnda/dm_keys/dm.pub
docker exec -ti deployment-manager chmod 600 /opt/pnda/dm_keys/dm.pem
docker exec -ti deployment-manager chmod 600 /opt/pnda/dm_keys/dm


echo "---------------- ADDING Public key to jupyter-ssh ----------------"
docker exec jupyter-ssh mkdir -p /home/pnda/.ssh
docker cp dm_keys/dm.pub jupyter-ssh:/home/pnda/.ssh/authorized_keys
docker exec jupyter-ssh chmod 644 /home/pnda/.ssh/authorized_keys
docker exec jupyter-ssh chown -R pnda:pnda /home/pnda/.ssh
docker exec jupyter-ssh mkdir -p /root/.ssh
docker cp dm_keys/dm.pub jupyter-ssh:/root/.ssh/authorized_keys
docker exec jupyter-ssh chmod 644 /root/.ssh/authorized_keys
docker exec jupyter-ssh chown -R root:root /root/.ssh
echo "---------------- ADDING Public key to deployment-manager-ssh ----------------"
#docker exec deployment-manager-ssh mkdir -p /root/.ssh
#docker cp dm_keys/dm.pub deployment-manager-ssh:/root/.ssh/authorized_keys
#docker exec deployment-manager-ssh chmod 644 /root/.ssh/authorized_keys
#docker exec deployment-manager-ssh chown -R root:root /root/.ssh

./register_hostnames.sh

#echo "---------------- OOZIE create sharelib in HDFS ----------------"
#docker exec oozie oozie-setup.sh sharelib create -fs hdfs://hdfs-namenode:8020
echo "---------------- KAFKA-MANAGER CONFIGURATION ----------------"
curl -X POST \
http://kafka-manager:10900/clusters \
-H 'content-type: application/x-www-form-urlencoded' \
-d 'name=PNDA&zkHosts=zookeeper%3A2181&kafkaVersion=1.0.0&jmxEnabled=true&jmxUser=&jmxPass=&activeOffsetCacheEnabled=true&securityProtocol=PLAINTEXT' &>/dev/null

echo "---------------- GRAFANA: importing data sources and dashboards ----------------"
timeout 10s bash -c 'while [[ $(curl -s -o /dev/null -w %{http_code} http://grafana:3000/login) != 200 ]]; do sleep 1; done; echo OK' || echo TIMEOUT

curl -H "Content-Type: application/json" -X POST \
-d '{"name":"PNDA OpenTSDB","type":"opentsdb","url":"http://localhost:4242","access":"proxy","basicAuth": false,"isDefault": true }' \
http://pnda:pnda@grafana:3000/api/datasources
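# GRAPHITE_HOST and GRAPHITE_PORT must be set in the environment. The single quotes are
# closed around them so that the shell expands the variables inside the JSON payload.
curl -H "Content-Type: application/json" -X POST \
-d '{"name":"PNDA Graphite","type":"graphite","url":"http://'"$GRAPHITE_HOST:$GRAPHITE_PORT"'","access":"proxy","basicAuth":false,"isDefault":false}' \
http://pnda:pnda@grafana:3000/api/datasources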
./grafana/grafana-import-dashboards.sh grafana/PNDA.json
./grafana/grafana-import-dashboards.sh grafana/PNDA-DM.json
./grafana/grafana-import-dashboards.sh grafana/PNDA-Hadoop.json
./grafana/grafana-import-dashboards.sh grafana/PNDA-Kafka.json
echo "red-PNDA Deployment Finished - Opening console-frontend web ui"
xdg-open http://console-frontend





