Schedule

Week  Description                                  Dates           Assignments
1     The Data Flywheel                            Sep 4           A1 Released: 9/4
2     Data Warehouses, Data Lakes, and Lakehouses  Sep 9, 11       A2 Released: 9/11; A1 Due: 9/11
3     Batch Processing I                           Sep 16, 18
4     Batch Processing II                          Sep 23, 25      A3 Released: 9/25; A2 Due: 9/25
5     Rubber, Meet Road                            Sept 30, Oct 1
6     Data Infrastructure for Machine Learning     Oct 7, 9        A4 Released: 10/9; A3 Due: 10/9
7     Reading Week: No Classes!
8     Midterm Exam                                 Oct 21, 23
9     Text Processing I                            Oct 28, 30      A5 Released: 10/30; A4 Due: 10/30
10    Text Processing II                           Nov 4, 6
11    Clustering                                   Nov 11, 13      A6 Released: 11/13; A5 Due: 11/13
12    Graph Processing                             Nov 18, 20
13    Stream Processing                            Nov 25, 27      A6 Due: 11/27
14    LLMs                                         Dec 2
      Final Exam                                   TBD

Week 1: The Data Flywheel Sep 4

Key Questions

  • What does it mean to be an AI-first or data-driven company?
  • What's the data flywheel?
  • What's data engineering?
  • What are data platforms?
  • What are the 4 V's of data?
  • What is this course about? And what is it not about?

Readings

The above readings are available for free online through the university's library. The links above point directly to Waterloo proxied content, but if you're having trouble accessing the content (e.g., due to VPN settings), you might have to go through the library's portal (i.e., search for the book title and follow the appropriate link).

Slides

PDF slides for Sept 4 (v1.00)

Week 2: Data Warehouses, Data Lakes, and Lakehouses Sep 9, 11

Key Questions

  • What are the main differences between operational and analytical infrastructure?
  • What are data warehouses? What problems did they evolve to solve?
  • What are data lakes and lakehouses? What problems did they evolve to solve?
  • What are the components of modern data platforms?
  • How do operational and analytical data models differ?
  • What goes on in ETL/ELT?
  • How do different physical representations of data affect storage, compute, and other tradeoffs within data platforms?

Readings

The above readings are available for free online through the university's library. The links above point directly to Waterloo proxied content, but if you're having trouble accessing the content (e.g., due to VPN settings), you might have to go through the library's portal (i.e., search for the book title and follow the appropriate link).

Slides

PDF slides for Sept 9 (v1.01) PDF slides for Sept 11 (v1.00)

Week 3: Batch Processing I Sep 16, 18

Key Questions

Readings

Slides

Week 4: Batch Processing II Sep 23, 25

Key Questions

Readings

Slides

Week 5: Rubber, Meet Road Sept 30, Oct 1

Key Questions

Readings

Slides

Week 6: Data Infrastructure for Machine Learning Oct 7, 9

Key Questions

Readings

Slides

Week 8: Midterm Exam Oct 21, 23

Week 9: Text Processing I Oct 28, 30

Key Questions

Readings

Slides

Week 10: Text Processing II Nov 4, 6

Key Questions

Readings

Slides

Week 11: Clustering Nov 11, 13

Key Questions

Readings

Slides

Week 12: Graph Processing Nov 18, 20

Key Questions

Readings

Slides

Week 13: Stream Processing Nov 25, 27

Key Questions

Readings

Slides

Week 14: LLMs Dec 2

Key Questions

Readings

Slides

Final Exam TBD
