JRuby on Hadoop is a thin JRuby wrapper around the Hadoop Mapper and Reducer. This is a fork of the original project. The fork works with Hadoop 0.20.1, whereas the original targets the Hadoop API that is deprecated in the latest version of Hadoop.
Required gems are all on GemCutter.
Upgrade RubyGems to 1.3.5.
Clone this project, build the gem, and install it.
Start a Hadoop cluster on your machines and set the HADOOP_HOME environment variable.
Put your input files into HDFS, e.g. test/inputs/file1.
Now you can run ‘joh’ as shown below:
$ joh examples/wordcount.rb test/inputs/file1 test/outputs
The Hadoop job results are written to HDFS under test/outputs/part-*.
See also examples/wordcount.rb:
    def setup(job)
      # configure the MapReduce job
    end

    def map(key, value, context)
      value.split.each do |word|
        context.collect(word, 1)
      end
    end

    def reduce(key, values, context)
      sum = 0
      values.each { |v| sum += v }
      context.collect(key, sum)
    end
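To see what the word count map and reduce steps compute without a cluster, the following is a minimal local sketch in plain Ruby. FakeContext and run_local are hypothetical names introduced only for this illustration; they are not part of JRuby on Hadoop. The sketch also assumes that values arrive as plain Ruby strings and integers, which may differ from the types Hadoop actually passes in.

```ruby
# Hypothetical stand-in for the context object passed to map and reduce.
# It only records the pairs handed to collect.
class FakeContext
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def collect(key, value)
    @pairs << [key, value]
  end
end

# Same logic as map and reduce in the word count example.
def map(key, value, context)
  value.split.each { |word| context.collect(word, 1) }
end

def reduce(key, values, context)
  sum = 0
  values.each { |v| sum += v }
  context.collect(key, sum)
end

# Simulate one job locally: map every line, group by key, reduce each group.
# The line index is used as the map key here (an assumption for the sketch).
def run_local(lines)
  map_ctx = FakeContext.new
  lines.each_with_index { |line, i| map(i, line, map_ctx) }

  grouped = Hash.new { |h, k| h[k] = [] }
  map_ctx.pairs.each { |word, count| grouped[word] << count }

  reduce_ctx = FakeContext.new
  grouped.keys.sort.each { |word| reduce(word, grouped[word], reduce_ctx) }
  reduce_ctx.pairs
end

run_local(["hello world", "hello hadoop"]).each do |word, count|
  puts "#{word}\t#{count}"
end
```

Sorting the keys before reducing mirrors the fact that a reducer sees its keys in sorted order; the grouping hash stands in for the shuffle phase.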
You can build hadoop-ruby.jar by running “ant”.
The HADOOP_HOME environment variable must be set on your system. The build assumes Hadoop 0.20.1.
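Since both the build and ‘joh’ depend on HADOOP_HOME, a quick check before running either can save a confusing failure. The sketch below is a hypothetical helper (hadoop_home_problem is not part of this project) that only verifies the variable is set and points at a directory; it does not check the Hadoop version.

```ruby
# Hypothetical pre-flight check; not part of joh or the ant build.
# Returns a problem description, or nil when HADOOP_HOME looks usable.
def hadoop_home_problem(env = ENV)
  home = env["HADOOP_HOME"]
  return "HADOOP_HOME is not set" if home.nil? || home.empty?
  return "HADOOP_HOME is not a directory: #{home}" unless File.directory?(home)
  nil
end

problem = hadoop_home_problem
puts(problem || "HADOOP_HOME looks usable: #{ENV['HADOOP_HOME']}")
```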
Koichi Fujikawa <[email protected]>
Abhinay Mehta <[email protected]>
License: Apache License