Problem Set 4: PageRank
Due: Friday 10/17 (by 9:30am)
Complete the PageRank exercise in Cloud9.
Deliverables
This problem set is due by 9:30am, Friday 10/17. Send me an email with "LBSC 878A: Problem Set 4" as the subject. In the email body or in an attached file, provide the following information:
- A written description of your implementation (a few paragraphs at the most). Specifically, I want a discussion of your approach to the following:
- handling the dangling nodes,
- incorporating the random jump factor, and
- checking for convergence.
- For each of the sample graphs, list the top ten nodes in order of descending PageRank value.
Also include a tarball containing any code that you have written. Feel free to describe anything you've done beyond the assignment.
I expect that you will abide by the university's code of academic integrity in completing this assignment. Specifically, you may not consult any existing implementation of PageRank in MapReduce/Hadoop. Consulting any non-MapReduce implementation (e.g., the one in JUNG) is fine.
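To make the three discussion points concrete (dangling nodes, the random jump factor, and the convergence check), here is a minimal single-machine sketch in Python. It is not a MapReduce implementation and not the Cloud9 reference code. The function names `pagerank` and `top_k`, the adjacency-list input format, the jump probability `alpha=0.15`, and the tolerance are all assumptions chosen for illustration. It redistributes dangling mass uniformly and stops when the L1 change between iterations falls below `tol`, which is one common set of choices among several.

```python
def pagerank(graph, alpha=0.15, tol=1e-8, max_iter=100):
    """Power-iteration PageRank on an in-memory graph.

    graph: dict mapping each node to a list of its successors (assumed format).
    alpha: random jump probability (assumed value, not set by the assignment).
    """
    # Collect every node, including ones that appear only as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)

    # Start from the uniform distribution.
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Dangling nodes have no outgoing links; their mass would be lost,
        # so it is summed here and spread evenly over all nodes below.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))

        # Each node passes an equal share of its mass along each out-link.
        received = {v: 0.0 for v in nodes}
        for v, targets in graph.items():
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    received[t] += share

        # Random jump: with probability alpha land on a uniformly chosen node,
        # otherwise follow links (plus the redistributed dangling mass).
        new_rank = {
            v: alpha / n + (1 - alpha) * (received[v] + dangling / n)
            for v in nodes
        }

        # Convergence check: total absolute change (L1 norm) between iterations.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break

    return rank


def top_k(rank, k=10):
    """Return the k highest-ranked (node, value) pairs, ties broken by node name."""
    return sorted(rank.items(), key=lambda kv: (-kv[1], str(kv[0])))[:k]
```

For example, `top_k(pagerank({"a": ["b"], "b": []}))` lists `b` ahead of `a`, since `b` receives all of `a`'s link mass while its own mass is spread evenly as dangling mass. In a MapReduce setting the same three concerns arise, but the dangling mass and the convergence measure have to be aggregated across mappers and reducers, which this sketch does not address.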