-
First exploration
-
motivation
-
walkthrough
-
reflection
-
-
Stream
-
Why Hadoop I: Simple Parallelism
-
Chimps at typewriters
-
Pig Latin translation
-
Testing it at the command line
-
Running it on the cluster
-
Input Splits
-
-
Reshape
-
Locality
-
Elves pt1
-
Simple Join
-
Elves pt2
-
Partition key + sort key
-
-
Using Hadoop and herding `cat`s
-
overview of Wukong
-
overview of Pig
-
toolset overview
-
-
cat herding
-
Simple (!) munging
-
total sort
-
sampling
-
-
Data munging (Semi-structured data)
-
Statistics
-
First pig
-
-
Log Processing
-
-
Sessionizing a log
-
-
Statistics
-
Average, StdDev, etc. of a huge spreadsheet
-
Exact Percentiles (Median) of a huge spreadsheet
-
Approximate Percentiles (Median) of a huge spreadsheet
-
Histogram
-
Geographic
-
-
mechanics of handling geo data
-
Statistics on grid cells
-
Clustering
-
Pointwise mutual information
-
Text Processing
-
-
Inverted Index (word count)
-
MinHash
-
Time Series
-
-
weather & flight delays for prediction
-
Anomaly detection
-
Wikipedia Pageview
-
Flight delays
-
World Cup
-
Graph
-
-
Adjacency List / Edge List conversion
-
Minimum Spanning Tree
-
PageRank
-
Undirecting a graph
-
Assemble a min-index Adj. list
-
Breadth-First Search
-
Min-degree undirected graph
-
Hadoop Internals
-
Tuning, for the wise and lazy
-
Tuning, for the brave and foolish
-
-