University of Maryland, College Park


Data-Intensive Information Processing Applications (Spring 2010)

Assignment 4: PageRank

Due: Friday 3/26 (2pm)

Complete the PageRank exercise in Cloud9. The sample graphs are loaded on the cluster at /tmp/sample-data. Although you may debug in standalone mode on your local machine, your final solution must run on the cluster.

Question 1. Describe your implementation. In particular, I want a discussion of your approach to the following:

  1. handling the dangling nodes,
  2. incorporating the random jump factor, and
  3. checking for convergence.
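For orientation only, the three points above can be illustrated with a minimal single-machine sketch. It is not a MapReduce implementation and not a prescribed approach. It assumes one common formulation: the mass held by dangling nodes is spread uniformly over all nodes, the random jump probability is 0.15, and iteration stops when the L1 change between successive rank vectors falls below a tolerance. The function name `pagerank`, its parameters, and the toy graph are hypothetical.

```python
def pagerank(graph, jump=0.15, tol=1e-8, max_iters=200):
    """Power iteration on an in-memory graph.

    graph maps each node to a list of its out-neighbors; every node
    must appear as a key (dangling nodes map to an empty list).
    All parameter values here are illustrative assumptions.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iters):
        # 1. Dangling nodes: collect the mass that has no outgoing edge.
        dangling_mass = sum(rank[v] for v in nodes if not graph[v])

        # Distribute each non-dangling node's mass evenly over its out-links.
        incoming = {v: 0.0 for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = rank[v] / len(out)
                for w in out:
                    incoming[w] += share

        # 2. Random jump: mix a uniform jump with the link-following mass,
        #    where the dangling mass is spread uniformly over all nodes.
        new_rank = {
            v: jump / n + (1.0 - jump) * (incoming[v] + dangling_mass / n)
            for v in nodes
        }

        # 3. Convergence: stop once the L1 change is below the tolerance.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break

    return rank


# Toy graph (hypothetical): node 4 is dangling.
toy = {1: [2, 3], 2: [3], 3: [1], 4: []}
ranks = pagerank(toy)
```

In a MapReduce setting the same three concerns arise, but the dangling mass and the convergence measure have to be aggregated across mappers and reducers rather than summed in a local loop. How you do that is the substance of your answer.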

Question 2. For each of the sample graphs (small, medium, large), list the top ten nodes in order of descending PageRank value.
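Once the final PageRank values are available locally, extracting the top ten is a small post-processing step. The sketch below assumes the values have already been read into a dictionary from node id to score. The helper name `top_k` and the sample scores are hypothetical and are not results from the sample graphs.

```python
import heapq


def top_k(rank, k=10):
    """Return the k (node, score) pairs with the highest scores, in descending order."""
    return heapq.nlargest(k, rank.items(), key=lambda item: item[1])


# Made-up scores, for illustration only.
scores = {"a": 0.1, "b": 0.5, "c": 0.4}
best = top_k(scores, k=2)
```

`heapq.nlargest` avoids sorting the full node list, which matters little for the small graph but helps for the large one.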

Question 3. For each of the sample graphs (small, medium, large), save the job details page for one iteration and attach it as part of your answer. That is, from the jobtracker webapp, find your job, save that page in HTML, and include it in your assignment submission.

Question 4. How long did it take you to complete this assignment?

Submission Instructions

This assignment is due by 2pm, Friday 3/26. Please send us (both Jimmy and Nitin) an email with "Cloud Computing Course: Assignment 4" as the subject. In the body of the email put answers to the questions above. If you have collaborated with anyone else or have received any assistance in completing this assignment, you must tell us.

Pack up your code into a zip file named USERNAME-code.zip, and attach it to your assignment submission. So for example, I would pack up my code in a file named jimmylin-code.zip. Once again, please follow these instructions exactly.

Note: The Google/IBM cluster is a shared resource accessible by many. Any impropriety on the cluster will be taken very seriously. This includes tampering or attempting to tamper with another student's results, attempting to pass another student's result as one's own, etc. See the Code of Academic Integrity or the Student Honor Council for more information.

For this assignment, you may not consult any existing implementation of PageRank in MapReduce/Hadoop. Consulting a non-MapReduce implementation (e.g., in JUNG) is fine.



This page, first created: 15 Mar 2010. Creative Commons: Attribution-Noncommercial-Share Alike 3.0 United States