Parallel Computing                       
Prof. Mohamed Zahran   (aka Prof. Z)
mzahran AT cs DOT nyu DOT edu
M/W 2:00-3:15 pm  Location: WWH 109
Office Hours: Tuesdays 2-4pm, WWH 320

It is all about parallelism!! 


Welcome students! ... to the Parallel Computing course, Spring 2017 edition. I will keep updating this page regularly. If you have questions related to this course, feel free to email me.

Most of us have learned to program a single microprocessor (single core) using a high-level programming language like C/C++, Java, ... This is called sequential programming. We feel very comfortable with this because we think in a sequential way and give the machine statements to be executed in sequence. However, this has to change. A microprocessor with a single core no longer exists in almost any computer we use today (including your tablets and smartphones). Most of our devices are now multicore processors. A multicore processor contains several cores (called CPUs) on-chip. To make the best use of these multicore chips, we need to program them in parallel. Sequential programming, for all platforms from smartphones to supercomputers, is falling out of fashion and taking a back seat to parallel programming.

How to think in parallel? How to write code in parallel to make the best use of the underlying hardware? How is that new hardware different from the traditional one? What will the future be for the software and hardware? This is the topic of this course.

Here is some basic information about this course:

Midterm exam:  March 20th (same place/time as the lecture)
Final exam:  May 15th - WWH 109 - 2-3:50pm

Getting in Touch:

By order of preference

Lecture Notes and Schedule 

Lecture date Topic Reading Comments
1 1/23 Why Parallel Computing?
  • 1.1-1.4
2 1/25 Parallel Hardware: Basics
  • 2.1-2.2
3 1/30 Parallel Hardware: Advanced
  • 2.3
4 2/1 Parallel Software: Basics
  • 2.4, 2.7
  • hw1 assigned
5 2/6 Parallel Software: Advanced
6 2/8 Performance Analysis
  • 2.6
  • hw2 assigned
7 2/13 MPI - I
  • 3.1
8 2/15 MPI - II
  • 3.2-3.3
  • 3.4.1-3.4.5
2/20 Presidents' Day - No class
9 2/22 MPI - III
  • 3.4.6-3.6
10 2/27 MPI - IV
  • 3.7
11 3/1 MPI - Last Touch
12 3/6 OpenMP - I
  • 5.1-5.2
3/8 Revision

3/13 No class: Spring Recess
3/15 No class: Spring Recess
3/20 Midterm Exam
13 3/22 OpenMP - I (cont'd)
  • 5.1-5.2
14 3/27 OpenMP - II
  • 5.3, 5.4, 5.5
15 3/29 OpenMP - III
  • 5.6-5.7
16 4/3 OpenMP - IV
  • 5.8
  • lab2 assigned
17 4/5 OpenMP - Last Touch
  • 5.9
18 4/10 GPU - Intro
19 4/12 CUDA  I
20 4/17 CUDA  I (cont'd)
21 4/19 CUDA  II
  • lab 3 assigned
  • hw 3 assigned
22 4/24 CUDA  III
23 4/26 Invited Speaker
24 5/1 CUDA  IV
25 5/3 CUDA  Last Touch
5/8 Revision



Before doing any MPI programming, do the following once you log in to your CIMS account:
ssh to one of the computational nodes (e.g., crunchy1, crunchy3, crunchy4, crunchy5, or crunchy6 ... no crunchy2!)
Then type the following:
     module load openmpi-x86_64

Lab 1 description - Due March 6th on NYU classes - You will need this file.  

Lab 2 description - Due April 12th on NYU classes - This file will help you check your answer.


After you log in to your CIMS account, ssh to cudax (where x is 1 to 5),
then do the following steps:
  1. module load mpi/mpich-x86_64
  2. cp -r /usr/local/cuda/samples ~/samples
  3. cd samples
  4. make
  5. cd bin/x86_64/linux/release
  6. ./deviceQuery
  7. ./bandwidthTest
After that, each time you need to do CUDA programming, you only need to do step 1.
Lab 3 description - Due May 1st - You also need this file.

Homework Assignments 

Interesting Links (Geeky stuff)

If you have an interesting link, please email it to the instructor and it will find its way to this page.

Top500 Supercomputers in the world

Future Chips (targeting both software and hardware folks interested in parallel programming)

Designing and Building Parallel Programs

The trouble with multicore

Introduction to parallel computing

The Landscape of Parallel Computing

MPI with Python
MPI tutorial
More advanced MPI tutorial

EPCC benchmarks for OpenMP