University of Maryland, College Park


Data-Intensive Information Processing Applications (Spring 2010): Syllabus

In the syllabus below, readings are to be completed before the class indicated. "White" refers to Hadoop: The Definitive Guide, O'Reilly, 2009. "Lin & Dyer" refers to Data-Intensive Text Processing with MapReduce (textbook under development). See general course information for more details.

# Date Topic Assignment Due Details
1 1/26 Introduction to MapReduce

Introduction to MapReduce

Readings (complete before class)

Topics

  • Administrivia
  • Overview of cloud computing
  • Overview of MapReduce and the distributed file system

Material

2 2/2 Hadoop: Nuts and Bolts Assignment 1-1

Hadoop: Nuts and Bolts

Readings (complete before class)

  • White, Chapter 1, "Meet Hadoop"
  • White, Chapter 2, "MapReduce", up until page 32
  • White, Chapter 3, "The Hadoop Distributed File System", up until page 63
  • White, Chapter 4, "Hadoop I/O", starting from "Serialization" until the end of chapter
  • White, Chapter 5, "Developing a MapReduce Application", up until page 144

Note that many chapters are assigned from the White book. The purpose is to get you acquainted with Hadoop; we don't expect you to digest all the material the first time through, since you will no doubt refer back to the book frequently throughout the semester.

Topics

  • Writing, running, debugging Hadoop programs
  • Hadoop behind the scenes

Material

2/9 Class canceled due to snowstorm Assignment 1-2
3 2/16 MapReduce: the programming environment

MapReduce: the programming environment

Readings (complete before class)

  • Lin & Dyer, Chapter 3, MapReduce Algorithm Design
  • White, Chapter 3, "The Hadoop Distributed File System", page 63 until end of chapter
  • White, Chapter 6, "How MapReduce Works"
  • White, Chapter 7, "MapReduce Types and Formats"
  • White, Chapter 8, "MapReduce Features"

Topics

  • "Warehouse-size" computers and the datacenter environment
  • MapReduce algorithm design and design patterns

Material

4 2/23 Text retrieval algorithms Assignment 2

Text retrieval algorithms

Readings (complete before class)

Topics

  • Introduction to information retrieval
  • Basics of indexing and retrieval
  • Inverted indexing in MapReduce
  • Retrieval at scale

Material

5 3/2 Graph algorithms Assignment 3

Graph algorithms

Readings (complete before class)

Topics

  • Graph problems and representations
  • Parallel breadth-first search
  • PageRank

Material

6 3/9 Midterm
3/16 Spring break: no class!
7 3/23 MapReduce and databases Assignment 4

MapReduce and databases

Readings (complete before class)

Topics

  • Relational databases vs. MapReduce
  • MapReduce algorithms for processing relational data
  • OLTP vs. OLAP (data warehousing and business intelligence)

Material

8 3/30 Hidden Markov models Assignment 5

Hidden Markov models

Readings (complete before class)

Topics

  • Hidden Markov models
  • Expectation maximization

Material

9 4/6 Language models

Language models

Readings (complete before class)

  • Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. (2007) Large Language Models in Machine Translation. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 858–867.

Topics

  • N-gram language models
  • Parameter estimation for web-scale language models

Material

10 4/13 Large-scale graphs

Large-scale graphs

Readings (complete before class)

Topics

  • Scalable identity resolution in email collections: Slides in PDF (622 KB)
  • DNA sequence assembly: Slides in PDF (5.65 MB)
11 4/20 Dryad and DryadLINQ

Dryad and DryadLINQ

Readings (complete before class)

Topics

12 4/27 Bigtable, Hive, and Pig Assignment 6

Bigtable, Hive, and Pig

Readings (complete before class)

Topics

  • Bigtable
  • Hive
  • Pig

Material

13 5/4 Project
5/6 Project (optional session as makeup for snowstorm)  
14 5/11 Project Presentations
15 TBA Final Exam



This page, first created: 07 Jan 2010. License: Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States.