Getting Started
Faunus requires access to a Hadoop cluster. If one is readily available, Faunus is easy to get up and running. If not, the provided Whirr recipe can be used to spawn a Hadoop cluster on Amazon EC2 (see the following tutorial), or a local instance of Hadoop can be run in pseudo-cluster mode (see the following tutorial). This section discusses the pseudo-cluster approach for users just getting started with Hadoop (and Faunus). More experienced Hadoop users can easily adapt the examples below to a non-local Hadoop cluster (e.g. by changing the Hadoop configuration, $HADOOP_CONF, to point to the accessible cluster).
Faunus has been written and tested with Hadoop 1.0. To use Faunus locally on a single machine, a Hadoop pseudo-cluster can be used. Instructions for setting up a pseudo-cluster are provided in the Hadoop documentation. Once the pseudo-cluster has been set up, it can be started at any time using $HADOOP_HOME/bin/start-all.sh. Once running, Hadoop can be issued jobs (e.g. Faunus jobs).
~$ start-all.sh
starting namenode, logging to /Applications/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-marko-namenode-markolaptop.local.out
localhost: starting datanode, logging to /Applications/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-marko-datanode-markolaptop.local.out
localhost: starting secondarynamenode, logging to /Applications/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-marko-secondarynamenode-markolaptop.local.out
starting jobtracker, logging to /Applications/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-marko-jobtracker-markolaptop.local.out
localhost: starting tasktracker, logging to /Applications/hadoop/hadoop-1.0.3/libexec/../logs/hadoop-marko-tasktracker-markolaptop.local.out
To ensure that the installation is running properly, do an ls on Hadoop’s distributed file system, HDFS.
faunus$ hadoop fs -ls /
Found 2 items
drwxr-xr-x - marko supergroup 0 2012-07-26 11:55 /tmp
drwxr-xr-x - marko supergroup 0 2012-07-26 11:55 /user
Faunus can be downloaded from the downloads section of this project. Faunus provides a Gremlin implementation that is used to construct a Faunus job (a chain of one or more MapReduce jobs). The shell script bin/faunus.sh takes the following arguments:
- Faunus configuration file: a reference to a java.util.Properties formatted file with various job control parameters (optional; defaults to bin/faunus.properties).
- Gremlin script: a Gremlin graph traversal that either yields a derived graph or some statistic of the graph.
- Command line configuration: a -D prefixed property list that overrides and/or extends the properties identified in the Faunus configuration file (e.g. -Dfaunus.graph.output.format.class=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat).
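The "overrides and/or extends" behaviour of the -D arguments can be pictured with a small Python sketch. This is only an illustration of the precedence rule, not Faunus' actual argument handling: the parser covers just plain key=value lines, the file value `some.placeholder.OutputFormat` is a placeholder, and `example.extra.key` is a hypothetical key used to show the "extends" case.

```python
def parse_properties(text):
    """Parse a tiny subset of the java.util.Properties format: key=value lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def apply_overrides(props, args):
    """Return a copy of props with every -Dkey=value argument applied on top."""
    merged = dict(props)
    for arg in args:
        if arg.startswith("-D"):
            key, _, value = arg[2:].partition("=")
            merged[key] = value
    return merged


# Hypothetical file contents: the key appears in the text above, the value is a placeholder.
file_text = """
# job control parameters
faunus.graph.output.format.class=some.placeholder.OutputFormat
"""

args = [
    "-Dfaunus.graph.output.format.class=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat",
    "-Dexample.extra.key=1",  # hypothetical key, shows the "extends" case
]

merged = apply_overrides(parse_properties(file_text), args)
for key in sorted(merged):
    print(key, "=", merged[key])
```

A key present in both places takes the command line value, and a key present only on the command line is added.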
Faunus deploys with a toy graph called The Graph of the Gods (represented in Faunus’ GraphSON format). This graph has 12 vertices and 17 edges and denotes people, places, monsters and their various types of relationships to one another. A diagrammatic representation is provided above and the raw GraphSON representation is provided below.
{"name":"saturn","type":"titan","_id":0,"_inE":[{"_label":"father","_id":12,"_outV":1}]}
{"name":"jupiter","type":"god","_id":1,"_outE":[{"_label":"lives","_id":13,"_inV":4},{"_label":"brother","_id":16,"_inV":3},{"_label":"brother","_id":14,"_inV":2},{"_label":"father","_id":12,"_inV":0}],"_inE":[{"_label":"brother","_id":17,"_outV":3},{"_label":"brother","_id":15,"_outV":2},{"_label":"father","_id":24,"_outV":7}]}
{"name":"neptune","type":"god","_id":2,"_outE":[{"_label":"lives","_id":20,"_inV":5},{"_label":"brother","_id":19,"_inV":3},{"_label":"brother","_id":15,"_inV":1}],"_inE":[{"_label":"brother","_id":18,"_outV":3},{"_label":"brother","_id":14,"_outV":1}]}
{"name":"pluto","type":"god","_id":3,"_outE":[{"_label":"pet","_id":23,"_inV":11},{"_label":"lives","_id":21,"_inV":6},{"_label":"brother","_id":17,"_inV":1},{"_label":"brother","_id":18,"_inV":2}],"_inE":[{"_label":"brother","_id":19,"_outV":2},{"_label":"brother","_id":16,"_outV":1}]}
{"name":"sky","type":"location","_id":4,"_inE":[{"_label":"lives","_id":13,"_outV":1}]}
{"name":"sea","type":"location","_id":5,"_inE":[{"_label":"lives","_id":20,"_outV":2}]}
{"name":"tartarus","type":"location","_id":6,"_inE":[{"_label":"lives","_id":21,"_outV":3},{"_label":"lives","_id":22,"_outV":11}]}
{"name":"hercules","type":"demigod","_id":7,"_outE":[{"_label":"mother","_id":25,"_inV":8},{"time":1,"_label":"battled","_id":26,"_inV":9},{"time":2,"_label":"battled","_id":27,"_inV":10},{"time":12,"_label":"battled","_id":28,"_inV":11},{"_label":"father","_id":24,"_inV":1}]}
{"name":"alcmene","type":"human","_id":8,"_inE":[{"_label":"mother","_id":25,"_outV":7}]}
{"name":"nemean","type":"monster","_id":9,"_inE":[{"time":1,"_label":"battled","_id":26,"_outV":7}]}
{"name":"hydra","type":"monster","_id":10,"_inE":[{"time":2,"_label":"battled","_id":27,"_outV":7}]}
{"name":"cerberus","type":"monster","_id":11,"_outE":[{"_label":"lives","_id":22,"_inV":6}],"_inE":[{"_label":"pet","_id":23,"_outV":3},{"time":12,"_label":"battled","_id":28,"_outV":7}]}
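Each line of the file is one vertex together with its incident edges, so every edge appears twice: once in the `_outE` list of its source vertex and once in the `_inE` list of its target vertex. The following Python sketch, which is not part of Faunus, reads lines in this layout and counts distinct vertices and edges. The three input lines are an excerpt of the listing above, trimmed to the father edges; `summarize` is a name made up for this example.

```python
import json

# Excerpt of the listing above, trimmed to three vertices and their "father" edges.
lines = [
    '{"name":"saturn","type":"titan","_id":0,"_inE":[{"_label":"father","_id":12,"_outV":1}]}',
    '{"name":"jupiter","type":"god","_id":1,"_outE":[{"_label":"father","_id":12,"_inV":0}],"_inE":[{"_label":"father","_id":24,"_outV":7}]}',
    '{"name":"hercules","type":"demigod","_id":7,"_outE":[{"_label":"father","_id":24,"_inV":1}]}',
]


def summarize(lines):
    """Return (vertex_count, edge_count) for vertex-per-line GraphSON input."""
    vertex_ids = set()
    edge_ids = set()
    for line in lines:
        vertex = json.loads(line)
        vertex_ids.add(vertex["_id"])
        # An edge is listed twice (source's _outE, target's _inE); the set removes duplicates.
        for edge in vertex.get("_outE", []) + vertex.get("_inE", []):
            edge_ids.add(edge["_id"])
    return len(vertex_ids), len(edge_ids)


print(summarize(lines))  # (3, 2) for this excerpt
```

Run over all twelve lines of the full file, the same function should report the 12 vertices and 17 edges stated above.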
To use this sample graph for the following example, it must first be placed into HDFS.
faunus$ hadoop fs -put data/graph-of-the-gods.json graph-of-the-gods.json
faunus$ hadoop fs -ls
Found 1 item
-rw-r--r-- 1 marko supergroup 2028 2012-07-26 11:55 /user/marko/graph-of-the-gods.json
With the graph in HDFS, a reference is made to it in bin/faunus.properties. Faunus already has all the properties configured to point to graph-of-the-gods.json, so it is possible to simply run the desired Gremlin traversal. To determine the grandfathers of the elements in the graph, a two-step walk over father edges is enacted.
faunus$ bin/faunus.sh 'g.V().as("x").out("father").out("father").linkIn("x","grandfather")'
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: ,
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: , |\ ,__
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: |\ \/ `\
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: \ `-.:. `\
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: `-.__ `\/\/\|
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: / `'/ () \
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: .' /\ )
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: .-' .'| \ \__
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: .' __( \ '`(()
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: /_.'` `. | )(
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: \ |
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: |/
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: Generating job chain: g.V().as("x").out("father").out("father").linkIn("x","grandfather")
12/08/28 09:28:43 INFO mapreduce.FaunusCompiler: Compiled to 3 MapReduce job(s)
...
Let's examine the command more closely.
g.V().as("x").out("father").out("father").linkIn("x","grandfather")
- g: the graph pointed to by bin/faunus.properties
- V: for all the vertices in the graph
- as("x"): name the current elements “x”
- out("father"): traverse out over father edges
- out("father"): traverse out over father edges
- linkIn("x","grandfather"): create incoming grandfather edges from vertices at step “x”
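The steps above can be mimicked in memory with a few lines of Python. This is only a sketch of what the traversal means, not of how Faunus compiles it into MapReduce jobs. The two father edges are taken from the sample graph; the function name `grandfathers` and the (start, current) pair representation are assumptions made for this example.

```python
# "father" edges from the sample graph as (out vertex id, in vertex id).
father_edges = [(7, 1), (1, 0)]  # hercules -> jupiter, jupiter -> saturn
names = {0: "saturn", 1: "jupiter", 7: "hercules"}


def grandfathers(vertices, father_edges):
    """Return (x, grandfather) pairs where x --father--> ? --father--> grandfather."""
    fathers = {}
    for child, father in father_edges:
        fathers.setdefault(child, []).append(father)

    # as("x"): remember where each path started.
    paths = [(v, v) for v in vertices]
    # out("father") twice: advance the current end of every path, dropping dead ends.
    for _ in range(2):
        paths = [(x, nxt) for x, cur in paths for nxt in fathers.get(cur, [])]
    # linkIn("x", "grandfather"): each surviving pair is an edge from x into the vertex reached.
    return paths


links = grandfathers(names, father_edges)
for x, g in links:
    print(names[x], "-grandfather->", names[g])  # hercules -grandfather-> saturn
```

Paths that cannot take both father steps are discarded, which is why only one vertex ends up with a grandfather edge.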
When the Faunus job completes (it may comprise many MapReduce jobs), the resulting derivation is available in output.txt (the file name is set in bin/faunus.properties). This file can be pulled onto local disk (from HDFS) and viewed. Hercules’ grandfather is Saturn; that is the only grandfather derivation available in The Graph of the Gods.
faunus$ hadoop fs -getmerge output.txt target/output.txt
faunus$ more target/output.txt
{"name":"saturn","type":"titan","_id":0,"_inE":[{"_label":"grandfather","_id":-1,"_outV":7},{"_label":"father","_id":12,"_outV":1}]}
{"name":"jupiter","type":"god","_id":1,"_outE":[{"_label":"lives","_id":13,"_inV":4},{"_label":"brother","_id":16,"_inV":3},{"_label":"brother","_id":14,"_inV":2},{"_label":"father","_id":12,"_inV":0}],"_inE":[{"_label":"brother","_id":17,"_outV":3},{"_label":"brother","_id":15,"_outV":2},{"_label":"father","_id":24,"_outV":7}]}
{"name":"neptune","type":"god","_id":2,"_outE":[{"_label":"lives","_id":20,"_inV":5},{"_label":"brother","_id":19,"_inV":3},{"_label":"brother","_id":15,"_inV":1}],"_inE":[{"_label":"brother","_id":18,"_outV":3},{"_label":"brother","_id":14,"_outV":1}]}
{"name":"pluto","type":"god","_id":3,"_outE":[{"_label":"pet","_id":23,"_inV":11},{"_label":"lives","_id":21,"_inV":6},{"_label":"brother","_id":17,"_inV":1},{"_label":"brother","_id":18,"_inV":2}],"_inE":[{"_label":"brother","_id":19,"_outV":2},{"_label":"brother","_id":16,"_outV":1}]}
{"name":"sky","type":"location","_id":4,"_inE":[{"_label":"lives","_id":13,"_outV":1}]}
{"name":"sea","type":"location","_id":5,"_inE":[{"_label":"lives","_id":20,"_outV":2}]}
{"name":"tartarus","type":"location","_id":6,"_inE":[{"_label":"lives","_id":21,"_outV":3},{"_label":"lives","_id":22,"_outV":11}]}
{"name":"hercules","type":"demigod","_id":7,"_outE":[{"_label":"mother","_id":25,"_inV":8},{"_label":"grandfather","_id":-1,"_inV":0},{"time":1,"_label":"battled","_id":26,"_inV":9},{"time":2,"_label":"battled","_id":27,"_inV":10},{"time":12,"_label":"battled","_id":28,"_inV":11},{"_label":"father","_id":24,"_inV":1}]}
{"name":"alcmene","type":"human","_id":8,"_inE":[{"_label":"mother","_id":25,"_outV":7}]}
{"name":"nemean","type":"monster","_id":9,"_inE":[{"time":1,"_label":"battled","_id":26,"_outV":7}]}
{"name":"hydra","type":"monster","_id":10,"_inE":[{"time":2,"_label":"battled","_id":27,"_outV":7}]}
{"name":"cerberus","type":"monster","_id":11,"_outE":[{"_label":"lives","_id":22,"_inV":6}],"_inE":[{"_label":"pet","_id":23,"_outV":3},{"time":12,"_label":"battled","_id":28,"_outV":7}]}