Cloud9

A Hadoop toolkit for working with big data

The order inversion design pattern is covered in Chapter 3, where the example is computing bigram relative frequencies. A reference implementation can be found in the solutions to the bigram counting exercise.

Here's how it works, very briefly: in addition to each bigram count, the mapper emits an special key (word, *) which contributes to the marginal count of the word. We define the sort order so that this special key always comes before the bigram counts, so that by preserving state across multiple reduce method calls,we can compute the relative frequency directly by simple division. We also need to define a custom partitioner to send all keys with the same left element to the same reducer.