Skip to content

Using Hadoop by Ruby script, supported by JRuby. Not Hadoop streaming.

Notifications You must be signed in to change notification settings

abhinay/jruby-on-hadoop

 
 

Repository files navigation

JRuby on Hadoop

Description

JRuby on Hadoop is a thin wrapper for Hadoop Mapper / Reducer by JRuby. This is a fork of the original project and works with Hadoop 0.20.1, whilst the original works with the deprecated Hadoop API for the latest version of Hadoop.

Install

Required gems are all on GemCutter.

  1. Upgrade your rubygem to 1.3.5

  2. Clone this project, build the gem and install it

Usage

  1. Run Hadoop cluster on your machines and set HADOOP_HOME env variable.

  2. put files into your hdfs. ex) test/inputs/file1

  3. Now you can run ‘joh’ like below:

$ joh examples/wordcount.rb test/inputs/file1 test/outputs

You can get Hadoop job results in your hdfs test/outputs/part-*

Example

see also examples/wordcount.rb

def setup(job)
  #configure the MapReduce Job
end

def map(key, value, context)
  value.split.each do |word|
    context.collect(word, 1)
  end
end

def reduce(key, values, context)
  sum = 0
  values.each {|v| sum += v }
  context.collect(key, sum)
end

Build

You can build hadoop-ruby.jar by “ant”.

ant

Required to set env HADOOP_HOME for your system. Assumed Hadoop version is 0.20.1.

Authors

Koichi Fujikawa <[email protected]> Abhinay Mehta <[email protected]>

License: Apache License

About

Using Hadoop by Ruby script, supported by JRuby. Not Hadoop streaming.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Java 62.0%
  • Ruby 38.0%