A Hadoop toolkit for working with big data
The order inversion design pattern is covered in Chapter 3, where the example is computing bigram relative frequencies. A reference implementation can be found in the solutions to the bigram counting exercise.
Here's how it works, very briefly: in addition to each bigram
count, the mapper emits an special key (word, *)
which contributes to
the marginal count of the word. We define the sort order so that this
special key always comes before the bigram counts, so that by
preserving state across multiple reduce method calls,we can compute
the relative frequency directly by simple division. We also need to
define a custom partitioner to send all keys with the same left
element to the same reducer.