Funded by the National Science Foundation (IIS-0836560)
PI: Jimmy Lin, Co-PI: Philip Resnik

Note: This project concluded in June 2011. This website is no longer actively maintained, and is available primarily for archival purposes.

Project Overview

In October 2007, Google and IBM jointly announced the Academic Cloud Computing Initiative (ACCI), with the goal of helping both researchers and students address the challenges of "web-scale" computing. The initiative revolves around Google's MapReduce programming framework, which represents a proven approach to tackling data-intensive problems in a distributed manner. Six universities were involved in the collaboration at the outset: Carnegie Mellon University, Massachusetts Institute of Technology, Stanford University, the University of California at Berkeley, the University of Maryland, and the University of Washington. See Google press release, IBM press release, and UMD press release.

As part of this initiative, IBM and Google have dedicated a large cluster of several hundred machines for use by faculty and students at the participating institutions. The cluster takes advantage of Hadoop, an open-source implementation of MapReduce in Java. By making these resources available, Google and IBM hope to encourage faculty to adopt cloud computing in their research and to integrate the technology into the curriculum. A few months later, the ACCI teamed up with the National Science Foundation to create the Cluster Exploratory (CLuE) initiative, whereby NSF would provide funding to support research on the ACCI infrastructure. This project was funded under that program.

In the context of this project, we have been exploring the intersection of large-scale text retrieval and statistical machine translation. One thread has been scaling up iterative machine learning algorithms to larger and larger datasets. Another thread has been the application of IR techniques to automatically extract bilingual training data.

Project Team

Jimmy Lin
Associate Professor
The iSchool (College of Information Studies), University of Maryland
Philip Resnik
Professor
Department of Linguistics, University of Maryland
Chris Dyer
Ph.D. student
Department of Computer Science, University of Maryland
(graduated Spring 2010)
Tamer Elsayed
Ph.D. student
Department of Computer Science, University of Maryland
(graduated Summer 2009)
Ferhan Ture
Ph.D. student
Department of Computer Science, University of Maryland

Publications (selected)

Broader Impacts

  • This grant supported the development of Cloud9, a Hadoop library for research and teaching that is used at Maryland and elsewhere.
  • This grant supported the development of a textbook on MapReduce algorithm design, available here.
  • This grant supported multiple iterations of a course on MapReduce and large-data processing at the University of Maryland.

Disclaimer

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Please contact the PI for additional information.