A Hadoop toolkit for working with big data

Cloud9 is a collection of Hadoop tools that tries to make working with big data a bit easier.

Warning: As of December 2015, this library is no longer being actively developed or maintained. Please see Bespin for its replacement. Why? The Cloud9 codebase dates back to 2007 and has accumulated a lot of cruft; it's time to start over with a blank slate.

This software was designed with two goals in mind: first, to serve as a teaching tool for MapReduce and MapReduce algorithm design; second, to provide a collection of useful tools on which to build other "big data" systems. Here are just a few features:

Reference Implementations

Cloud9 provides reference implementations of many design patterns and algorithms introduced in the book Data-Intensive Text Processing with MapReduce by Lin and Dyer. Some of these examples also serve as solutions to exercises included with the library, which have previously been used in MapReduce courses at the University of Maryland.
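
To give a flavor of the kind of design pattern covered in the book, here is a minimal sketch of in-mapper combining applied to word count. It is not taken from the Cloud9 codebase and does not use the Hadoop API: the names naive_map, InMapperCombiningMapper, reduce_counts, and run_in_mapper are hypothetical, and the map and reduce phases are simulated in plain Python so the example runs on its own.

```python
from collections import defaultdict


def naive_map(line):
    """Baseline mapper: emit one (word, 1) pair per token."""
    for word in line.split():
        yield word, 1


class InMapperCombiningMapper:
    """Hypothetical mapper that aggregates counts in memory and emits once."""

    def __init__(self):
        self.counts = defaultdict(int)

    def map(self, line):
        # Accumulate locally instead of emitting a pair per token.
        for word in line.split():
            self.counts[word] += 1

    def close(self):
        # Emit one (word, partial_count) pair per distinct word seen.
        yield from sorted(self.counts.items())


def reduce_counts(pairs):
    """Reducer: sum the partial counts for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run_in_mapper(lines):
    """Feed every line of one input split through a single mapper instance."""
    mapper = InMapperCombiningMapper()
    for line in lines:
        mapper.map(line)
    return list(mapper.close())


if __name__ == "__main__":
    split = ["a b a", "b a c"]
    naive = [pair for line in split for pair in naive_map(line)]
    combined = run_in_mapper(split)
    # Both strategies yield the same totals, but the combining mapper
    # emits fewer intermediate pairs.
    print(len(naive), len(combined))
    print(reduce_counts(naive) == reduce_counts(combined))
```

The trade-off in this sketch is memory for network traffic: the mapper holds one counter per distinct word until close is called, so a real implementation would typically need to flush the buffer when it grows too large.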