This is a set of scripts used for deploying a GTStream benchmark on an OpenStack cloud.
Currently these are highly personalized to my settings, but should be easy to extend/reconfigure.
Author: Brian Laub - [email protected]
Before using, be aware of some known issues (mostly due to my laziness):
- the user name and the path "/home/ubuntu" are hardcoded all over the place :(
- several other paths are hardcoded (e.g., looking in "~/files" for the required JVM package)
- OpenStack dependencies: this has never been tested on AWS (nor do I intend to test it there in the near future)
GTStream (https://github.com/chengweiwang/GTStream) is designed as a large-scale log-streaming benchmark. This project adds some scripts that attempt to make it easier to deploy a benchmark test on an OpenStack cloud.
Currently, these scripts use newer versions of software used by GTStream, including:
- Hadoop 1.2.1
- HBase 0.94.10
- Flume 1.4.0
Deploying requires Hadoop, HBase, and Flume. You can set these up in any configuration you like, but the scripts included here attempt to make it a bit easier to scale up a cloud deployment of these packages.
To get started, first edit the following two files to adjust settings:

- conf/cloud-properties.sh: VM properties to use, such as the ssh keyname, path to the ssh private key, login user, VM image, security groups, etc.
- conf/gtstream-env.sh: environment variables used by the install scripts. Set things like software versions, URLs to download packages from, install paths, configuration files, etc.
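The actual variable names are defined in those files; the snippet below is just a sketch of the kind of values you'll be filling in (all names here are illustrative, not necessarily the ones the scripts use):

    # conf/cloud-properties.sh (illustrative names only)
    KEY_NAME="my-openstack-key"                    # ssh keypair registered with OpenStack
    PRIVATE_KEY="$HOME/.ssh/my-openstack-key.pem"  # path to the matching private key
    LOGIN_USER="ubuntu"                            # the scripts currently assume /home/ubuntu
    VM_IMAGE="ubuntu-12.04-server"                 # image to boot
    SECURITY_GROUPS="default"                      # security group(s) for the VMs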
You can build up a minimal set of "required" servers by running:

    bin/bootup_required_nodes.sh

which will start:

- a Hadoop Master
- an HBase Master
- a ZooKeeper quorum (at least one server)
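Once the script returns, it's worth sanity-checking that the instances actually came up. Assuming you have the standard nova CLI configured (these scripts target OpenStack, after all), something like:

    nova list    # the Hadoop master, HBase master, and ZooKeeper node(s) should all be ACTIVE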
Once an instance has been booted, you need to copy a JVM to "~/files" on that instance, and you need a copy of these scripts. Two included scripts can be used to do this. Here's a short example:

- Download an Oracle JVM to a directory called "files" somewhere. Note that you may not be able to use wget against the Oracle website due to licensing stuff:

      mkdir /tmp/files && cd /tmp/files
      wget [some_jvm_url]

- Clone the scripts:

      git clone [this_repos_url] /tmp/gtstream-scripts && cd /tmp/gtstream-scripts

- Boot up some nodes (see "Getting Started" above):

      bin/bootup_required_nodes.sh
      # - or -
      bin/bootup_node_and_wait.sh [flavor] [hostname]

- Copy the JVM and scripts to the instances:

      bin/sync_files_to_all.sh
      bin/sync_scripts_to_all.sh

Once you've got it all synced up, ssh into your VM and run:

    cd gtstream-scripts
    sudo bin/install.sh
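After install.sh finishes, a quick sanity check (assuming the install put the tools on your PATH; otherwise use the full install paths from conf/gtstream-env.sh):

    java -version     # should report the Oracle JVM you copied into ~/files
    hadoop version    # should report Hadoop 1.2.1
    hbase version     # should report HBase 0.94.10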
To start Hadoop on the master, run:

    start-all.sh

Note that you may want to remove localhost from HADOOP_HOME/conf/slaves if you don't want the master to run a tasktracker/datanode.
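For example, to keep the master dedicated you could trim the slaves file and then start everything (a rough sketch, assuming HADOOP_HOME points at your install path):

    sed -i '/^localhost$/d' $HADOOP_HOME/conf/slaves   # optional: no tasktracker/datanode on the master
    $HADOOP_HOME/bin/start-all.sh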
To start HBase on the master, run:

    start-hbase.sh

Note that you probably don't want to remove localhost from HBASE_HOME/conf/regionservers, because localhost will likely be the regionserver for -ROOT-.
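To confirm HBase came up, you can poke at it with the standard HBase shell (nothing specific to these scripts):

    echo "status" | $HBASE_HOME/bin/hbase shell   # should report at least one live regionserver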
To add Hadoop slaves (a datanode and a tasktracker), boot up a VM using bin/bootup_node_and_wait.sh and follow the install procedure above. Then log in and run:

    bin/add_self_to_hadoop.sh

To add HBase regionservers, do the same, then log in and run:

    bin/add_self_to_hbase.sh
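Putting it together, adding one worker node looks roughly like this (the flavor and hostname are just examples, and the sync scripts are assumed to pick up the new instance):

    # from your workstation
    bin/bootup_node_and_wait.sh m1.medium gtstream-slave-1
    bin/sync_files_to_all.sh
    bin/sync_scripts_to_all.sh

    # then on the new VM
    cd gtstream-scripts
    sudo bin/install.sh
    bin/add_self_to_hadoop.sh    # datanode + tasktracker
    bin/add_self_to_hbase.sh     # regionserver, if you want one here too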
By default, Flume is configured to tail a single logfile using the exec source and to push data directly into HBase. You can (and probably will want to) customize this into a more intricate setup (e.g., with log aggregation, and using something more stable than the exec source).
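For reference, a minimal Flume agent along these lines can be written by hand; the sketch below shows the general shape (the agent name, logfile path, table, and column family are made-up examples, not the values shipped with these scripts):

    # write a minimal exec-source -> memory-channel -> HBase-sink agent config
    cat > $FLUME_HOME/conf/flume.conf <<'EOF'
    agent.sources  = tail
    agent.channels = mem
    agent.sinks    = hbase

    agent.sources.tail.type     = exec
    agent.sources.tail.command  = tail -F /var/log/app.log
    agent.sources.tail.channels = mem

    agent.channels.mem.type     = memory
    agent.channels.mem.capacity = 10000

    agent.sinks.hbase.type         = org.apache.flume.sink.hbase.HBaseSink
    agent.sinks.hbase.table        = gtstream_logs
    agent.sinks.hbase.columnFamily = cf
    agent.sinks.hbase.channel      = mem
    EOF

    # run the agent in the foreground
    $FLUME_HOME/bin/flume-ng agent -n agent -c $FLUME_HOME/conf -f $FLUME_HOME/conf/flume.conf

Note that the HBase table and column family have to exist before the sink will write to them.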