Running Faunus on Amazon EC2

Amazon EC2 and Apache Whirr make it easy to set up a Hadoop compute cluster that Faunus can then use. This section of the documentation explains how to set up a Hadoop cluster on Amazon EC2 and execute Faunus scripts against it.

Setting Up Whirr

> Apache Whirr is a set of libraries for running cloud services. Whirr provides a cloud-neutral way to run services (you don’t have to worry about the idiosyncrasies of each provider), a common service API (the details of provisioning are particular to the service), and smart defaults for services (you can get a properly configured system running quickly, while still being able to override settings as needed). You can also use Whirr as a command line tool for deploying clusters. — The Apache Whirr Homepage


Faunus provides a Whirr recipe (see bin/whirr.properties) for launching a Hadoop cluster whose Hadoop version matches the version Faunus is currently built against. The recipe is reproduced below. Please see the Whirr Quick Start for more information about the parameters and about setting up an Amazon EC2 account.

```properties
whirr.cluster-name=faunuscluster
whirr.cluster-user=ec2-user
whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,3 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
whirr.hadoop.version=1.0.3
```
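
The whirr.instance-templates line determines the composition of the cluster: one master node running the jobtracker and namenode, and three worker nodes each running a datanode and tasktracker. To provision a larger cluster, only the worker count needs to change; for example, a hypothetical 10-node variant (not part of the shipped recipe) would read:

```properties
# 1 master plus 9 workers -- hypothetical sizing for illustration
whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,9 hadoop-datanode+hadoop-tasktracker
```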

Once your Amazon EC2 credentials and SSH key files have been properly set up, a Hadoop cluster can be launched. The recipe above creates a 4-node cluster.
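
If the credentials and keypair are not yet in place, the following sketch (after the Whirr Quick Start) satisfies the recipe's ${env:...} and key-file references; the key values themselves are, of course, your own:

```bash
# Credentials read by whirr.identity and whirr.credential in the recipe
export AWS_ACCESS_KEY_ID=...      # your AWS access key
export AWS_SECRET_ACCESS_KEY=...  # your AWS secret key

# Passphraseless RSA keypair read by whirr.private-key-file / whirr.public-key-file
ssh-keygen -q -t rsa -P '' -f ~/.ssh/id_rsa
```

With those in place, launch the cluster: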

```bash
faunus$ whirr launch-cluster --config bin/whirr.properties
Bootstrapping cluster
Configuring template
Configuring template
Starting 3 node(s) with roles [hadoop-datanode, hadoop-tasktracker]
Starting 1 node(s) with roles [hadoop-namenode, hadoop-jobtracker]
...
```

When logging into the Amazon EC2 Console, the cluster machines are visible. Before submitting jobs, start the Hadoop proxy that Whirr generated for the cluster.
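
The proxy script opens an SSH tunnel to the cluster and must stay up while jobs are submitted, so run it in a separate terminal (or background it):

```bash
faunus$ . ~/.whirr/faunuscluster/hadoop-proxy.sh
```

With the proxy running, a simple check to ensure that the Hadoop cluster is working is to see whether HDFS is available: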

```bash
faunus$ export HADOOP_CONF_DIR=~/.whirr/faunuscluster
faunus$ hadoop fs -ls /
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2012-07-20 19:13 /hadoop
drwxrwxrwx   - hadoop supergroup          0 2012-07-20 19:13 /tmp
drwxrwxrwx   - hadoop supergroup          0 2012-07-20 19:13 /user
```
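
MapReduce can be sanity-checked the same way; as an optional extra step, the jobtracker can be asked for its (currently empty) job list:

```bash
# Uses the same HADOOP_CONF_DIR and proxy as above
faunus$ hadoop job -list
```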

Running a Faunus Script

With the cluster up and the proxy running, the sample Graph of the Gods dataset distributed with Faunus can be loaded into HDFS, where Faunus jobs can reach it:
```bash
faunus$ hadoop fs -put data/graph-of-the-gods.json graph-of-the-gods.json
faunus$
```
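
The file now sits in the user's HDFS home directory and can serve as input to Faunus jobs, which run against the cluster exactly as they do locally. A minimal sketch, assuming the Gremlin REPL described in the Getting Started section of this wiki:

```bash
# Confirm the upload landed in the HDFS home directory
faunus$ hadoop fs -ls graph-of-the-gods.json

# Start the Faunus Gremlin REPL and submit jobs as usual
faunus$ bin/gremlin.sh
```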

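Shutting Down the Cluster

When finished, destroy the cluster so that Amazon EC2 charges stop accruing:

```bash
faunus$ whirr destroy-cluster --config bin/whirr.properties
```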