Schedule of Classes

Session  Date         Topic                                         Assignment due
1        January 26   Introduction                                  Assignment #0: Prelude
2        February 2   From Business Intelligence to Data Science    Assignment #1: Warmup
3        February 9   MapReduce — Basic Algorithm Design
         February 16  Canceled due to snowstorm
4        February 23  MapReduce — Structured and Unstructured Data  Assignment #2: Counting
5        March 2      MapReduce — Graphs                            Assignment #3: Inverted Indexing
6        March 9      MapReduce — Data Mining
                      Spring Break
7        March 23     Extending MapReduce                           Assignment #4: Graphs
8        March 30     NoSQL
9        April 6      Beyond MapReduce — Dataflow Languages         Assignment #5: HBase
10       April 13     Beyond MapReduce — Graph Processing           Assignment #6: Project Proposal
11       April 20     Beyond MapReduce — Stream Processing          Assignment #7: Data Analytics
12       April 27     Production Considerations
13       May 4        Project Work
14       May 11       Project Presentations                         Final Project

Session 1: Introduction January 26

Readings (to be completed before class)

Slides

PPTX (Mac)   PDF

Session 2: From Business Intelligence to Data Science February 2

Readings (to be completed before class)

Slides

PPTX (Mac)   PDF

Session 3: MapReduce — Basic Algorithm Design February 9

Readings (to be completed before class)

  • Data-Intensive Text Processing with MapReduce
    • Chapter 3: MapReduce Algorithm Design
  • Ullman. (2012) Designing Good MapReduce Algorithms. Crossroads.
  • Hadoop: The Definitive Guide (3rd Edition):
    • Chapter 4: Hadoop I/O (Read the sections on "Serialization", pages 93–108, and "File-Based Data Structures", pages 130–142)
    • Chapter 5: Developing a MapReduce Application (Skip sections on "Writing a Unit Test with MRUnit" and "Apache Oozie")
    • Chapter 6: How MapReduce Works (Skip section on "Configuration Tuning")
    • Chapter 7: MapReduce Types and Formats
    • Chapter 8: MapReduce Features (Read sections on "Counters", "Side Data Distribution", and "Sorting")

Slides

PPTX (Mac)   PDF

Session 4: MapReduce — Structured and Unstructured Data February 23

Readings (to be completed before class)

Slides

PPTX (Mac)   PDF

Session 5: MapReduce — Graphs March 2

Readings (to be completed before class)

Slides

PPTX (Mac)   PDF

Session 6: MapReduce — Data Mining March 9

Readings (to be completed before class)

Slides

PPTX (Mac)   PDF

Session 7: Extending MapReduce March 23

Readings (to be completed before class)

Slides

PPTX (Mac)   PDF

Session 8: NoSQL March 30

Readings (to be completed before class)

Slides

PPTX (Mac)   PDF

Session 9: Beyond MapReduce — Dataflow Languages April 6

Readings (to be completed before class)

Slides

PPTX (Mac)   PDF

Session 10: Beyond MapReduce — Graph Processing April 13

Readings (to be completed before class)

Slides

PPTX (Mac)   PDF

Session 11: Beyond MapReduce — Stream Processing April 20

Readings (to be completed before class)

Slides

PPTX (Mac)   PDF
