Time: Tuesdays and Thursdays, 10:00-11:20am
Location: QNC 1502
Instructor: Jimmy Lin
TAs: Nafis Ahmed and Eimaan Saqib
Piazza: Link

Important: There are two sections of CS 451. This is the course homepage for Section 002. Section 001, taught by Dan Holtby, meets at 1pm. If you're in that section, this is not the right place.

Overview

The data flywheel characterizes the virtuous cycle by which an organization builds a useful product, analyzes user behavior from it to derive insights, and transforms those insights into actions to improve the product. Google, Meta, Amazon, and nearly all consumer technology companies today can be characterized in this manner.

Today, machine learning techniques are ubiquitous in “closing the loop”: deriving insights from massive amounts of data and putting them into action. This course provides an overview of the infrastructure that enables this, which can generically be termed a data platform. In industry, this is now often called a lakehouse (a portmanteau of data lake and data warehouse). In modern parlance, this course focuses on data engineering.

Data engineering is a practical endeavor for grappling with the volume, velocity, variety, and veracity of data. While there are best practices, there are no solutions, only tradeoffs (e.g., between time and space, or between latency and throughput) that need to be managed based on different usage scenarios. Outside of brand new startups, data platforms likely already exist within an organization, and typically represent complex, messy, and evolving entities comprising a myriad of technologies, processes, and constraints (both technical and non-technical). Thus, a key competency in data engineering is the ability to understand and evaluate existing designs and the tradeoffs they represent; only then can data platforms be improved to accelerate the data flywheel.

Having described what this course is, it might be similarly helpful to describe what this course is not:

Objectives

By the end of this course, students will be able to:
