gdata-storagehandler
This project implements a HiveStorageHandler that lets Hive read data from, and write data to, a Google spreadsheet.
Although Hive/Hadoop are geared towards processing big data, this storage handler is aimed at "Small Data". The original use case was writing the final output of a report, around a dozen rows, into a Google spreadsheet.
Because of this small-data orientation, it is recommended that tables backed by this storage handler be read from or written to by only a single mapper or reducer; using multiple mappers or reducers can result in duplicate rows being read from, or written to, the spreadsheet. One way to arrange this is shown in the sketch below.
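For example, a write can be kept on a single reducer by setting the reduce-task count to one before an insert that has a reduce phase. This is only a minimal sketch: `report_output` and `daily_stats` are placeholder table names, and `mapred.reduce.tasks` is the classic (pre-YARN) Hadoop setting.

set mapred.reduce.tasks=1 ;

-- report_output and daily_stats are placeholder table names.
-- The group by forces a reduce phase, so exactly one reducer
-- appends rows to the spreadsheet-backed table.
insert overwrite table report_output
select day, count(*)
from daily_stats
group by day ;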
Some other notes:
- The spreadsheet must be writable by the specified user.
- We use 2-legged OAuth. See http://code.google.com/apis/gdata/docs/auth/oauth.html#2LeggedOAuth and http://www.google.com/support/a/bin/answer.py?hl=en&answer=162105.
- The spreadsheet must exist and have the correct headers. Any Hive columns that do not map to a column header will not be written to the spreadsheet.
- All writes are appends.
Sample usage:
add jar gdata-storagehandler.jar ;
create external table output(day string, cnt int, source_class string, source_method string, thrown_class string)
stored by 'com.bizo.hive.gdata.GDataStorageHandler'
with serdeproperties (
"gdata.user" = "[email protected]",
"gdata.consumer.key" = "bizo.com",
"gdata.consumer.secret" = "...",
"gdata.spreadsheet.name" = "Daily Exception Summary",
"gdata.worksheet.name" = "First Worksheet",
"gdata.columns.mapping" = "day,count,class,method,thrown"
)
;
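Once the table is defined, an insert appends the report rows to the worksheet, and the table can be read back like any other Hive table. A sketch, assuming a hypothetical source table `exception_logs`; the final order by is another way to funnel the write through a single reducer, per the note above.

-- exception_logs is a hypothetical source table.
-- The trailing order by runs in a single reducer, so only one task
-- appends rows to the spreadsheet.
insert overwrite table output
select day, cast(count(*) as int), source_class, source_method, thrown_class
from exception_logs
group by day, source_class, source_method, thrown_class
order by day ;

select * from output ;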
If you are using Amazon's Elastic MapReduce, you can add the jar file as follows:
add jar s3://com-bizo-public/hive/storagehandler/gdata-storagehandler-0.1.jar ;