JRuby on Hadoop

Description

JRuby on Hadoop is a thin JRuby wrapper around the Hadoop Mapper and Reducer. This fork of the original project works with Hadoop 0.20.1, whereas the original uses the Hadoop API that is deprecated in the latest version of Hadoop.

Install

Required gems are all on GemCutter.

  1. Upgrade RubyGems to 1.3.5.

  2. Clone this project, build the gem, and install it.

Usage

  1. Start a Hadoop cluster on your machines and set the HADOOP_HOME environment variable.

  2. Put your input files into HDFS, e.g. test/inputs/file1.

  3. Run ‘joh’ as follows:

$ joh examples/wordcount.rb test/inputs/file1 test/outputs

The Hadoop job results are written to test/outputs/part-* in HDFS.
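The three usage steps can be sketched in Ruby as a small pre-flight helper. This is only an illustration under stated assumptions: hadoop_home!, put_command, and joh_command are hypothetical names that are not part of JRuby on Hadoop, and the sketch assumes the standard "bin/hadoop fs -put" command is used to copy files into HDFS. The helpers only build the command lines; nothing is executed.

```ruby
# Hypothetical helpers for illustration; not part of JRuby on Hadoop.

# Step 1: fail early if HADOOP_HOME is not set.
# `env` defaults to ENV but any Hash-like object works.
def hadoop_home!(env = ENV)
  home = env["HADOOP_HOME"]
  raise "HADOOP_HOME is not set" if home.nil? || home.empty?
  home
end

# Step 2: argv for copying a local file into HDFS
# (assumes the stock `bin/hadoop fs -put <local> <hdfs>` command).
def put_command(hadoop_home, local_path, hdfs_path)
  [File.join(hadoop_home, "bin", "hadoop"), "fs", "-put", local_path, hdfs_path]
end

# Step 3: argv for running a JRuby script as a Hadoop job via 'joh'.
def joh_command(script, input, output)
  ["joh", script, input, output]
end

# Print the job command from the usage section without running it.
puts joh_command("examples/wordcount.rb", "test/inputs/file1", "test/outputs").join(" ")
```

Each array could be passed to Kernel#system, for example system(*put_command(hadoop_home!, "test/inputs/file1", "test/inputs/file1")), once the cluster is running.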

Example

See also examples/wordcount.rb.

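def setup(job)
  # Configure the MapReduce job here.
end

# Emit (word, 1) for every whitespace-separated word in the input value.
def map(key, value, context)
  value.split.each do |word|
    context.collect(word, 1)
  end
end

# Sum the counts collected for each word and emit (word, total).
def reduce(key, values, context)
  sum = 0
  values.each {|v| sum += v }
  context.collect(key, sum)
end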
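
To see what map and reduce compute without a cluster, the same two functions can be driven by a tiny in-memory harness. This is a rough sketch, not how JRuby on Hadoop itself runs jobs: FakeContext and run_locally are hypothetical names introduced here, the context is assumed to need only a collect(key, value) method, and plain Ruby strings and integers stand in for whatever types Hadoop actually passes.

```ruby
# Same map and reduce as in the example above.
def map(key, value, context)
  value.split.each do |word|
    context.collect(word, 1)
  end
end

def reduce(key, values, context)
  sum = 0
  values.each {|v| sum += v }
  context.collect(key, sum)
end

# Hypothetical stand-in for the context object: records collected pairs.
class FakeContext
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def collect(key, value)
    @pairs << [key, value]
  end
end

# Run map over each line, group values by key (a simplified shuffle),
# then run reduce once per key in sorted key order.
def run_locally(lines)
  map_ctx = FakeContext.new
  lines.each_with_index { |line, i| map(i, line, map_ctx) }

  grouped = Hash.new { |h, k| h[k] = [] }
  map_ctx.pairs.each { |word, count| grouped[word] << count }

  reduce_ctx = FakeContext.new
  grouped.keys.sort.each { |word| reduce(word, grouped[word], reduce_ctx) }
  reduce_ctx.pairs
end

p run_locally(["hello world", "hello hadoop"])
# prints [["hadoop", 1], ["hello", 2], ["world", 1]]
```

The harness keeps everything in one process, so it is only useful for checking the logic of a script before submitting it with ‘joh’.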

Build

Build hadoop-ruby.jar with “ant”:

ant

The HADOOP_HOME environment variable must be set for your system. The build assumes Hadoop 0.20.1.

Authors

Koichi Fujikawa <[email protected]>, Abhinay Mehta <[email protected]>

License: Apache License