Distributed Graph Computing with Gremlin
The script-step in Faunus’ Gremlin allows for the arbitrary execution of a Gremlin script against all vertices in the graph (or those which currently exist in Faunus’ computational pipeline). This simple idea has interesting ramifications for Gremlin-based distributed computing:
- Global graph mutations: update a Titan cluster in parallel given some arbitrary computation.
- Global graph algorithms: propagate information to arbitrary depths in the graph in order to compute some algorithm in a parallel fashion.
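The core idea, running one script against every vertex in parallel and letting it mutate the backing store, can be sketched with a toy in-memory model. This is not Faunus' API: the `graph` dictionary, the `script` function, and `run_script_step` are hypothetical names, the data is made up for illustration, and a thread pool stands in for Hadoop map tasks.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a graph database: vertex id -> properties.
# Hypothetical data for illustration only.
graph = {
    1: {"name": "hercules", "father": 2},
    2: {"name": "jupiter", "father": 3},
    3: {"name": "saturn", "father": None},
}


def script(vertex_id):
    """Hypothetical per-vertex script: copy the father's name onto the vertex."""
    vertex = graph[vertex_id]
    father_id = vertex["father"]
    if father_id is not None:
        # In-place mutation of the "database" record for this vertex only.
        vertex["fathers_name"] = graph[father_id]["name"]
    return vertex["name"]


def run_script_step(vertex_ids, workers=2):
    """Apply the script to every vertex in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(script, vertex_ids))


results = run_script_step(sorted(graph))
```

Each invocation writes only to its own vertex and reads a property that no invocation changes, so the parallel updates in this sketch do not conflict. A real deployment would need the database's own transaction handling.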
One way to do global graph mutations is to use an InputFormat that reads a graph from a database (e.g. Titan and/or Rexster) and then to mutate the Faunus representation of that graph in HDFS over various Gremlin/Faunus steps. Finally, the original graph in the database is deleted and the new, mutated graph is bulk loaded in its place. The problem with this method is that it requires the graph database to be deleted and re-loaded which, for production 24×7 systems, is not a reasonable requirement.
Another way to do this is to use the script-step to allow for real-time, parallel bulk updates of the original graph in the graph database itself. A simple example explains the idea. Assume the Graph of the Gods dataset is stored in Titan/Cassandra, with the Cassandra nodes sharing their machines with the Hadoop data nodes and task trackers.