Problem Set 1: Bigram counts
Due: Friday 9/19 (by 9:30am)
Complete the bigram counts exercise in Cloud9. You must complete both parts of the exercise ("Count the bigrams" and "From counts to conditional probabilities").
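As a point of reference for what the two parts compute, here is a minimal single-machine sketch in plain Python. It is not a MapReduce program and not the Cloud9 solution. It assumes whitespace tokenization and the relative-frequency estimate P(right | left) = count(left, right) / count(left, *); the function names are made up for illustration.

```python
from collections import Counter

def bigram_counts(lines):
    """Count adjacent token pairs within each line (assumes whitespace tokenization)."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        # zip pairs each token with its successor on the same line
        counts.update(zip(tokens, tokens[1:]))
    return counts

def conditional_probabilities(counts):
    """Turn bigram counts into relative frequencies P(right | left)."""
    left_totals = Counter()
    for (left, _right), n in counts.items():
        left_totals[left] += n
    return {(left, right): n / left_totals[left]
            for (left, right), n in counts.items()}

# Toy input, not the exercise data
counts = bigram_counts(["a b a b", "a c"])
probs = conditional_probabilities(counts)
```

On the toy input, "a" is followed by "b" twice and by "c" once, so the probabilities conditioned on "a" sum to one. A small check like this is handy for sanity-testing the output of the real job.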
Note that you do not need to write a MapReduce program for every question. Once your program output is written to HDFS, you can retrieve the relevant files (see this guide to HDFS shell commands for details) and answer the questions with standard Unix utilities (grep, sort, wc, etc.).
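If you prefer a script to a pipeline of Unix utilities, the same kind of post-processing can be sketched in Python once the output files are on local disk. This sketch assumes each output line has the form bigram, tab, count, which may not match your program's output; the helper name and the sample lines are hypothetical.

```python
from collections import Counter

def load_counts(lines):
    """Parse lines of the assumed form '<bigram>\\t<count>' into a Counter."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so the bigram text itself may contain spaces
        bigram, _, value = line.rpartition("\t")
        counts[bigram] += int(value)
    return counts

# Stand-in for the lines of a retrieved output file, e.g. open(path) on a local copy
sample = ["a b\t2\n", "a c\t1\n", "b a\t1\n"]
counts = load_counts(sample)

distinct = len(counts)                                   # number of unique bigrams
singletons = sum(1 for n in counts.values() if n == 1)   # bigrams seen exactly once
top = counts.most_common(1)                              # most frequent bigram(s)
```

To run it on real output, pass an open file object for each retrieved part file to load_counts in place of the sample list.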
Deliverables
This problem set is due by 9:30am, Wednesday 9/17. Send me an email with "LBSC 878A: Problem Set 1" as the subject. In the email body, either include your answers to the exercise questions or attach a file containing those answers. Also attach a tarball containing any code that you have written. Feel free to describe anything you've done beyond the assignment.