Through the 1990s, processors got faster and faster, but around the turn of the millennium, CPUs started to reach physical limits. To keep producing more powerful computers, manufacturers began to pack multiple cores into each processor. However, most software is conceived by the programmer as a linear progression of instructions. Compilers can find some parallelization opportunities, but not enough to take full advantage of multiple cores. Programmers must take much of the responsibility for making their programs exploit multiple cores, and this course discusses many modern techniques for doing so.

Students are expected to have taken a programming-intensive course and to have some experience with C/C++. The course will include lectures, homework, labs, a final project, and a final exam.

Labs & Assignments

A series of labs will challenge students to make their understanding of the concepts taught in the course more concrete. The labs will be implemented in C++ (or C, if requested). An introductory lab (Lab 0) ensures that all students are roughly on the same page with the necessary tools (gcc/g++, gdb, and git). The remaining labs will involve building pieces of a simple multithreaded data server.

Please take a look at the department's academic integrity policy. You can show a colleague how to use a given tool. You can also discuss strategies for solving the problems with a colleague if (and only if) you mention it in writing in your assignment hand-in; this kind of collaboration is encouraged. Each student must type, compile, debug, and benchmark their own code, and must be able to discuss any aspect of it. You are not allowed to look at a colleague's code. Copying code from existing resources will not teach you anything and is subject to severe disciplinary action.

Several homework assignments will also be given to evaluate students' grasp of the theoretical concepts taught in class, in preparation for the final exam.

Mailing List

Questions about the lecture material and readings should go to the class mailing list before you contact the instructor directly.


The readings column is updated each week. Note that the readings listed are the ones you should or must (as noted) read by that class; they're not the readings assigned on that day.

| Date | Lecture | Material | Assignments & Reading |
|---|---|---|---|
| September 8, 2016 | Lecture 1 (Slides) | Syllabus, course outline; The Advent of the Multicore Processor | |
| September 15, 2016 | Lecture 2 (Slides) | Parallelism, Concurrency, and Performance | |
| September 22, 2016 | Lecture 3 (Slides) | Understanding Hardware | |
| September 29, 2016 | Lecture 4 (Slides) | Parallel Programming and PThreads: An Introduction | |
| October 6, 2016 | Lecture 5 (Slides) | Threads and P-Threads II | |
| October 13, 2016 | Lecture 6 (Slides, Appendix) | Coordinating Resources | |
| October 20, 2016 | Lecture 7 (Slides) | Synchronized Structures I | |
| October 27, 2016 | Lecture 8 (Slides) | Synchronized Structures II | |
| November 3, 2016 | Lecture 9 (Slides) | Multicore Correctness | |
| November 10, 2016 | Lecture 10 (Slides) | Multicore Performance Evaluation | |
| November 17, 2016 | Lecture 11 (Slides) | Heterogeneous Multicore | |
| November 24, 2016 | Thanksgiving Recess | (No class) | |
| December 1, 2016 | Lecture 12 (Slides) | Transactional Memory | |
| December 8, 2016 | Lecture 13 (Slides) | Looking Ahead, Final Exam Review | |
| December 15, 2016 | Lecture 14 | Project Presentations | |
| December 22, 2016 | Final Exam | In-class exam | |

Preliminary Readings

  1. Parallel Programming for Multicore and Cluster Systems (You must be logged into the NYU network)
  2. Herb Sutter, The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, Dr. Dobb's Journal, 30(3), March 2005.
  3. How to survive the multicore software revolution?
  4. Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency (You must be logged into the NYU network)
  5. The Memory System: You Can't Avoid It, You Can't Ignore It, You Can't Fake It (You must be logged into the NYU network)
  6. A Primer on Memory Consistency and Cache Coherence (You must be logged into the NYU network)
  7. The Problem With Threads
  8. A Runtime Implementation of OpenMP Tasks (You must be logged into the NYU network)
  9. IPC considered harmful for multiprocessor workloads
  10. Computer Architecture Performance Evaluation Methods (You must be logged into the NYU network)
  11. Effective Performance Measurement and Analysis of Multithreaded Applications
  12. Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
  13. The Impact of Performance Asymmetry in Emerging Multicore Architectures
  14. Transactional Memory (You must be logged into the NYU network)
  15. Unlocking Concurrency
  16. Performance-Aware Multicore Programming (You must be logged into the NYU network)
  17. The Common Case Transactional Memory Behavior of Multithreaded Programs