This course looks at trends in cluster computing, specifically trends driven by changes in hardware, applications, and privacy requirements, and at how these changes impact the systems that drive modern datacenters. The aim of the course is to introduce students to recent work and allow them to:
- Explain the design and architecture of these systems.
- Analyze tradeoffs among the designs of these systems and decide which is most appropriate for a given use case.
- Gain experience with using and building big data systems.
Tentative Schedule and Syllabus
|Date||Topic & Readings||Other|
|02/02||Introduction: Course Mechanics and Overview||Lab 0: Administrative|
|02/09||Introduction and Overview||Lab 1: Set up HDFS and Spark|
|02/16||Introduction and Trends||Whiteboard; Lab 1 Due; Project Proposal Due|
|03/16||Storage: Privacy and Policies||
|03/23||Communication: Introduction and Performance||Midterm. Due 03/26 5pm ET.|
|03/30||Communication: Applications and Privacy||Final Project Checkin - I|
|04/13||Programming Models: Serverless||Whiteboard|
|04/20||Applications: Machine Learning||Final Project Checkin - II|
|04/27||Applications: Reinforcement Learning||Whiteboard|
|05/04||What we missed.||Whiteboard|
|05/11||Final exam out. Due 05/16 5pm ET.||
Grading will be based on quality of work and presentation. The grade breakdown is as follows (this might change until the beginning of the semester):
- 15% for the one project: This is designed to introduce you to CloudLab infrastructure and help you set up a basic cluster.
- 25% for the final project: This should be done in groups of 2 or 3 people. You can either (a) explore a new research idea, or (b) work on a significant implementation project. For (a), you should work on a project that could eventually lead to a paper at SoCC, OSDI, SOSP, or a similar conference. For (b), we recommend either finding an existing open source project and extending or contributing to it (e.g., developing a new scheduling policy for Kubernetes or Apache YARN), or developing a sufficiently large project of your own.
We will have two intermediate project check-ins to give you early feedback on project progress. You are encouraged to use Campuswire and other class communication channels to ask questions and get help from others in the class.
- 20% synthesized notes: Each student needs to sign up to produce notes for four lectures (each set of notes is worth 5%). These will be posted for the rest of the class, and should discuss the motivation, trends and tradeoffs in the papers for that lecture, and potentially look beyond these papers. These are due a week after the lecture.
- 10% class participation: We will judge participation by responses on Campuswire and comments on synthesized notes.
- 10% Midterm, 20% Final exam: Both are going to be take-home.