This is an attempt to list out all the interesting projects.
It is intended for anyone who is designing modern large-scale architectures and needs to choose tools/technologies/frameworks. The purpose is to help in making those choices with resources like comparisons/use-cases/features/maturity, or really anything that helps in making an informed decision.
These are implementations/libraries that help in writing distributed applications which require some form of coordination.
- Tez vs Dryad
- Hadoop vs Spark - Too many differences, no good link.
- White Elephant
- Ambrose
- Lipstick
- Hue - Hadoop Web UI
- Inviso
- Timberlake
- Zoie
- Norbert - Cluster manager and networking layer built on top of ZooKeeper.
- Okapi - Large-scale ML & graph analytics on Giraph
- Scalding - A Scala API for Cascading
- SummingBird - Streaming MapReduce with Scalding and Storm
- Curator - set of Java libraries that make using Apache ZooKeeper much easier
- Turbine - Low latency high throughput aggregator for real time streams
- DataFu - Collection of MapReduce libraries
- Twill (previously known as Weave) - Library for writing YARN applications