Overview
This course considers the challenge of converting high-level algorithmic ideas into efficient parallel code. We will see how this challenge is greatly simplified by modern programming languages and techniques, especially functional programming, which makes it possible to quickly develop efficient, scalable, and correct implementations. Students will learn how to design, analyze, implement, and evaluate the performance of parallel algorithms across a variety of problem domains (e.g., graph analysis, computational geometry, numerical algorithms, image processing). In the second half of the course, students will complete a parallel programming project of their own design.
Prerequisites: CSCI-GA.2110 Programming Languages. Familiarity with functional programming and recursive programming techniques is helpful.
Topics include:
- work and span, parallelism
- sequential baselines, work-efficiency, speedup
- parallel functional programming
- parallel algorithm design techniques: divide-and-conquer, contraction, prefix doubling, etc.
- parallel data structures: sequences, sets, tables/dictionaries, graphs, etc.
- higher-order parallel primitives: `map`, `reduce`, `filter`, `scan`, etc. (see the sketch after this list)
- determinism and non-determinism
- parallel algorithms from a variety of domains: sorting, searching, order statistics, text/image/audio processing, graph analysis, computational geometry, numerical algorithms, etc.
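To give a flavor of these primitives, here is a minimal sketch (not code from the course materials) of a divide-and-conquer parallel `reduce` in MaPLe, using MPL's `ForkJoin.par`. The function itself, and the grain size of 5000, are illustrative choices, not a fixed API.

    (* Sketch: divide-and-conquer parallel reduce over an array.
       ForkJoin.par runs its two thunks (possibly) in parallel and
       returns both results. The combining function f must be
       associative for the result to be well-defined. *)
    fun reduce (f: int * int -> int) (id: int) (a: int array) : int =
      let
        fun loop (lo, hi) =
          if hi - lo <= 5000 then
            (* small range: plain sequential loop *)
            let
              fun go (i, acc) =
                if i >= hi then acc
                else go (i + 1, f (acc, Array.sub (a, i)))
            in
              go (lo, id)
            end
          else
            let
              val mid = lo + (hi - lo) div 2
              val (left, right) =
                ForkJoin.par (fn () => loop (lo, mid),
                              fn () => loop (mid, hi))
            in
              f (left, right)
            end
      in
        loop (0, Array.length a)
      end

    (* Example: sum the integers 1 through 1,000,000. *)
    val a = Array.tabulate (1000000, fn i => i + 1)
    val _ = print (Int.toString (reduce op+ 0 a) ^ "\n")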
Schedule
(Note: tentative—subject to change)
Week | Date | Lecture | Notes | Homework
---|---|---|---|---
0 | Mon Jan 20 | no lecture (MLK Day, university holiday) | |
1 | Mon Jan 27 | introduction, parallel hardware, parallelism vs. concurrency, parallel functional programming, the MaPLe programming language, `par`, scheduling, (self-)speedup | notes (pdf, md), code. Recommended reading: APS Ch 2 Sec 1–2; Ch 7; Ch 8 | hw1 released
2 | Mon Feb 3 | work and span, language-based cost model, recurrences, divide-and-conquer, `reduce` | notes (pdf). Recommended reading: APS Ch 2 Sec 3; Ch 26 Sec 1; Ch 28 Sec 4 | hw1 due; hw2 released
3 | Mon Feb 10 | work efficiency, recurrences (cont.), contraction, parallel prefix sums, `scan`, sequences | | hw2 due; hw3 released
4 | Tue Feb 18 | parallel sorting and searching, order statistics (note: lecture Tue instead of Mon) | | hw3 due; hw4 released
5 | Mon Feb 24 | trees, parallel ordered sets and tables/dictionaries, parallel augmented maps | | hw4 due; hw5 released
6 | Mon Mar 3 | graphs: undirected and directed, sparse representations, parallel traversals | | hw5 due; hw6 released
7 | Mon Mar 10 | graphs (cont.): parallel traversals, contraction | | hw6 due
8 | Mon Mar 17 | the parallel zoo: parallelism in Rust, Java, Go, ISPC, CUDA, Futhark, etc. | | project proposals due
 | Fri Mar 21 | | | proposal revisions due (if applicable)
9 | Mon Mar 24 | no lecture (spring break) | |
10 | Mon Mar 31 | advanced topics: fusion, eliminating intermediate allocation | |
11 | Mon Apr 7 | advanced topics: dynamic programming, bottom-up scheduling | |
12 | Mon Apr 14 | advanced topics: randomized parallel algorithms | | project checkpoint due
13 | Mon Apr 21 | advanced topics: on-the-fly concurrency and non-determinism, parallel hashing and hash tables | |
14 | Mon Apr 28 | advanced topics: scheduling by work-stealing | |
15 | Mon May 5 | project presentations | | projects due
 | Thu May 8 | project presentations (note: 10:00am–11:50am) | |
Policies
Grading: homework assignments (50%), final project (50%)
Deadlines: All deadlines are at 5:00pm (eastern time) on the date listed in the schedule.
Late Submissions: 10% score penalty for each day late. Submissions will not be accepted if they are submitted more than one week late.
Academic Integrity: Please review the department academic integrity policy. In this course, you are permitted to discuss assignments with other students as long as all discussion adheres to the following "whiteboard policy". Discussion may take place at a whiteboard (or on a scrap of paper, etc.), but no record of the discussion may be kept (all notes must be erased or discarded, no audio or video recording, etc.) and you must allow at least two hours to pass after the discussion before working on the assignment. Being able to recreate any solution from memory is considered proof that you actually understand the solution. If you collaborate with someone in this way on an assignment, you must list their name(s) in your submission. Copying solutions or any other work is a serious offense.
Accommodations: If you are in need of accommodations due to a disability or otherwise, please contact the instructor: s (dogoodt) we!stricluckk (a!t) nyrobotsu (do!t) ed!u
Project
In the second half of the course, students will complete a self-directed programming project, worth half of the overall grade. The project can be completed individually or in groups of two. The goal of the project is to develop a parallel application that achieves real parallel speedups.
Project proposals will be due halfway through the semester (please see the schedule). The specifics of the project are up to the students; the only requirements are (1) the project must be the students' original work, and (2) the project must have a significant parallel programming component, ideally demonstrating speedups on a real-world problem.
For the programming component of the project, we recommend using MPL (see below). However, if desired, another programming language can be used, with approval of the instructor. In the project proposal, students should clearly state what tools and programming language(s) they intend to use.
MaPLe (MPL)
Homework assignments will use the MaPLe programming language, a high-level parallel programming language which offers a number of features making it simpler and safer to write efficient parallel code. The MaPLe language is based on Standard ML. Students do not need to already be familiar with MPL or Standard ML; we will introduce these as part of the course.
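As a taste of the language, here is a minimal sketch of fork-join parallelism in MPL: the classic parallel Fibonacci. `ForkJoin.par` takes two thunks, evaluates them (possibly) in parallel, and returns both results as a pair.

    (* Parallel Fibonacci: the two recursive calls may run in parallel. *)
    fun fib (n: int) : int =
      if n < 2 then n
      else
        let
          val (a, b) = ForkJoin.par (fn () => fib (n - 1),
                                     fn () => fib (n - 2))
        in
          a + b
        end

    val _ = print (Int.toString (fib 35) ^ "\n")

Assuming the compiler is installed as `mpl`, a file like this compiles with `mpl fib.sml` and runs on, say, four processors with `./fib @mpl procs 4 --`.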
Resources
Slack. We'll use Slack for questions and discussion outside of lecture. An invite link will be sent at the beginning of the semester.
Textbook. There is no required textbook. The content of this course is roughly based on the free textbook Algorithms: Parallel and Sequential, by Umut A. Acar and Guy Blelloch.
Learning MaPLe. We will introduce this language as part of the course. If you would like to get a head start, we recommend taking a look at mpl-tutorial, especially the first few sections (Hello World, Parallelism and Granularity Control, and Trees). A number of programming examples are available here and here. MaPLe is based on Standard ML; to familiarize yourself with the syntax, we recommend this guide.
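If Standard ML is new to you, a few lines convey the flavor of what the guide covers: algebraic datatypes, pattern matching, and recursion. This sketch uses only the SML basis library; in MPL, the two recursive calls could also be made in parallel with `ForkJoin.par`.

    (* A binary tree datatype and a recursive size function,
       defined by pattern matching on the two constructors. *)
    datatype 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

    fun size Leaf = 0
      | size (Node (l, _, r)) = 1 + size l + size r

    val t = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
    val _ = print (Int.toString (size t) ^ "\n")  (* prints 3 *)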
Compute Servers. Courant has a number of compute servers available for students to use for assignments, research, etc. For this course we recommend students use the following machines, each of which has 32 cores (64 threads) and 256GB of memory.
- crunchy1.cims.nyu.edu
- crunchy2.cims.nyu.edu
- crunchy5.cims.nyu.edu
- crunchy6.cims.nyu.edu
Information about accessing these servers is available here.
You will need a CIMS account. If you do not already have a CIMS account,
please follow the instructions here.
We recommend adding the following to your local SSH configuration, replacing `YOUR_CIMS_USERNAME` with your CIMS account name. This is usually the same as your NYU NetID.
Host cims-access
    HostName access.cims.nyu.edu
    User YOUR_CIMS_USERNAME

Host cims-crunchy1
    ProxyCommand ssh cims-access nc crunchy1.cims.nyu.edu 22
    User YOUR_CIMS_USERNAME
You can similarly add configurations for the other machines, crunchy2.cims.nyu.edu, etc. You should then be able to log into a crunchy server like so:
$ ssh cims-crunchy1
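For example, a matching entry for crunchy2 follows the same pattern (only the hostname changes):

Host cims-crunchy2
    ProxyCommand ssh cims-access nc crunchy2.cims.nyu.edu 22
    User YOUR_CIMS_USERNAME

$ ssh cims-crunchy2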