JRuby on Hadoop is a thin JRuby wrapper around the Hadoop Mapper and Reducer. This is a fork of the original project. It works with Hadoop 0.20.1, whereas the original uses the Hadoop API that is deprecated in the latest version of Hadoop.
Required gems are all on GemCutter.
- Upgrade RubyGems to 1.3.5.
- Clone this project, build the gem, and install it.
- Start a Hadoop cluster on your machines and set the HADOOP_HOME environment variable.
- Put your input files into HDFS, e.g. test/inputs/file1.
- Run ‘joh’ as shown below:
$ joh examples/wordcount.rb test/inputs/file1 test/outputs
The job results are written to test/outputs/part-* in HDFS.
See also examples/wordcount.rb:
  def setup(job)
    # configure the MapReduce Job
  end

  def map(key, value, context)
    value.split.each do |word|
      context.collect(word, 1)
    end
  end

  def reduce(key, values, context)
    sum = 0
    values.each { |v| sum += v }
    context.collect(key, sum)
  end
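To see how map and reduce fit together without a cluster, here is a minimal local sketch. FakeContext and run_locally are hypothetical names introduced only for illustration; they are not part of JRuby on Hadoop. The sketch assumes the context only needs to respond to collect(key, value), and its grouping step merely approximates what Hadoop does between the map and reduce phases.

```ruby
# Local stand-in for the Hadoop context (hypothetical; not part of JRuby on Hadoop).
class FakeContext
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  # Record each emitted key/value pair instead of handing it to Hadoop.
  def collect(key, value)
    @pairs << [key, value]
  end
end

# Same map and reduce as in examples/wordcount.rb.
def map(key, value, context)
  value.split.each do |word|
    context.collect(word, 1)
  end
end

def reduce(key, values, context)
  sum = 0
  values.each { |v| sum += v }
  context.collect(key, sum)
end

# Run map over every line, group the emitted values by key,
# then run reduce once per key (keys sorted for stable output).
def run_locally(lines)
  map_context = FakeContext.new
  lines.each_with_index { |line, offset| map(offset, line, map_context) }

  grouped = Hash.new { |hash, key| hash[key] = [] }
  map_context.pairs.each { |key, value| grouped[key] << value }

  reduce_context = FakeContext.new
  grouped.keys.sort.each { |key| reduce(key, grouped[key], reduce_context) }
  reduce_context.pairs
end

if __FILE__ == $PROGRAM_NAME
  run_locally(["hello world", "hello hadoop"]).each do |word, count|
    puts "#{word}\t#{count}"
  end
end
```

Running the file with plain ruby prints one word and its count per line. This only exercises the Ruby logic; it does not involve HDFS or the ‘joh’ command.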
You can build hadoop-ruby.jar with “ant”:
$ ant
The HADOOP_HOME environment variable must be set for your system. The build assumes Hadoop 0.20.1.
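If you want to fail early when HADOOP_HOME is missing, a small check along these lines may help. This is only a sketch: hadoop_home is a hypothetical helper name, not something provided by JRuby on Hadoop, and it checks only that the variable is set and points to an existing directory.

```ruby
# Return the Hadoop installation directory, or raise if HADOOP_HOME is unusable.
# (Hypothetical helper for illustration; not part of JRuby on Hadoop.)
def hadoop_home(env = ENV)
  home = env["HADOOP_HOME"]
  raise "HADOOP_HOME is not set" if home.nil? || home.empty?
  raise "HADOOP_HOME does not point to a directory: #{home}" unless File.directory?(home)
  home
end

# Example usage:
#   puts hadoop_home
```

The env parameter defaults to the real environment and exists only so the check can be tried with a plain Hash.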
Koichi Fujikawa <[email protected]>
Abhinay Mehta <[email protected]>
License: Apache License