First exploration
- motivation
- walkthrough
- reflection
Stream
- Why Hadoop I: Simple Parallelism
- Chimps at typewriters
- Pig Latin translation
- Testing it at the command line
- Running it on a cluster
- Input Splits
Reshape
- Locality
- Elves pt1
- Simple Join
- Elves pt2
- Partition key + sort key
Using Hadoop and herding `cat`s
- overview of Wukong
- overview of Pig
- toolset overview
cat herding
- Simple (!) munging
- total sort
- sampling
Data munging (Semi-structured data)
Statistics
- First pig
  - Log Processing
- Sessionizing a log
Statistics
- Average, StdDev, etc of a huge spreadsheet
- Exact Percentiles (Median) of a huge spreadsheet
- Approximate Percentiles (Median) of a huge spreadsheet
- Histogram
  - Geographic
- mechanics of handling geo data
- Statistics on grid cells
- Clustering
- Pointwise mutual information
  - Text Processing
- Inverted Index (word count)
- MinHash
  - Time Series
- weather & flight delays for prediction
- Anomaly detection
- Wikipedia Pageview
- Flight delays
- World Cup
  - Graph
- Adjacency List / Edge List conversion
- Minimum Spanning Tree
- PageRank
- Undirecting a graph
- Assemble a min-index adjacency list
- Breadth-First Search
- Min-degree undirected graph
  - Hadoop Internals
  - Tuning, for the wise and lazy
  - Tuning, for the brave and foolish
