Graphics Processing Units (GPUs): Architecture and Programming

Prof. Mohamed Zahran   (aka Prof. Z)
mzahran AT cs DOT nyu DOT edu
Tuesdays: 5:10PM - 7:00PM   WWH 317
Office Hours (WWH 320): Wed/Th (2-3pm)


Welcome, students, to the Fall 2016 edition of the Graphics Processing Units course.
I will keep updating this page regularly. If you have questions related to this course, feel free to email me.

This course examines the architecture and capabilities of modern graphics processing units (GPUs).

Why are GPUs more important now than ever?
Many computations can be performed faster on a GPU than on a traditional CPU.
This is why GPUs now exist in almost all computers (from tablets to supercomputers),
and many of the Top 500 supercomputers in the world are built around GPUs.
GPUs are now used for a diverse set of applications, not just traditional graphics applications.
This course introduces the concept of general-purpose GPUs, or GPGPUs.
In this course, we will cover architectural aspects of modern GPUs.
We will also learn how to program GPUs to solve different types of problems and how to make
the best use of their hardware.

Mailing List

Sign up for the Mailman mailing list for the course, if you have not done it already. You can manage your subscription by clicking here.
Please follow the mailing list etiquette. 


Date Topic Readings Comments
1.   9/6 Gentle Introduction to GPUs
  • chp 1
2.   9/13 GPGPUs Evolution and Hardware Perspective
  • hw1 assigned 
3.   9/20 CUDA: Introduction
  • chp 3
4.   9/27 CUDA:  Threads and Memories
  • chp 4
  • chp 5
5.   10/4 CUDA Advanced Techniques 1
  • chp 6
  • chp 13
6.   10/11 CUDA Advanced Techniques 1 (cont'd)
7.   10/18  CUDA Advanced Techniques 2
8.   10/25 CUDA Advanced Techniques 3
9.   11/1 CUDA Advanced Techniques 3 (cont'd)
10. 11/8 CUDA Advanced Techniques 4
  • Lab 2 assigned
11. 11/15 OpenCL
  • chp 14
  • A very good and recent book about OpenCL is this one. You get free access if you are on the NYU network.
  • source code: this and this (code that we studied in class).
12. 11/22 OpenCL (cont'd)
  • Lab 3 assigned
13. 11/29 OpenCL in Action; What Is Next
  • OpenACC chp 15
14. 12/6 Revision
  • Previous exams:
Final exam:  Tuesday Dec 20th, 2016 at 5:10pm Room WWH 317

Assignments (non-programming) 
(15% of total grade)
Hw1 - Due Sep 20 - sol  
Hw2 - Due Oct 25 - sol    

Programming Assignments
(30% of total grade)

To set up your machine to work with our CUDA cluster:
First, log in to your CIMS account.
Once logged in, ssh to cuda2 (or cuda5).
Now you can get set up to run CUDA code by following these instructions:
module load mpi/mpich-x86_64
cp -r /usr/local/cuda/samples ~/samples
cd samples
cd bin/x86_64/linux/release

For Lab 3, you need to ssh to opencl1.

Project
(25% of the total grade)

The following are suggested projects. You can do the project alone or as part of a group of 2.
After reading the list of suggested projects you can:
  1. Pick one of them.
  2. Suggest your own.
  3. Suggest a modified version of one of them.
If you decide to do #2 or #3, you need to discuss your version with me before the project is officially assigned to you.
This is because some students underestimate the difficulty of their choice (given the amount of time available) or overestimate it (by picking something overly simple)!

Here is the list of suggested projects:
The final project report and your source code are due on Dec 13th, by email to the grader with a CC to me.
The report must contain the following parts:

Interesting Links (Geeky stuff about GPUs)

GPUs in general:
First digital 3D-rendered film (Thanks William Ward)
Interview with Ed Catmull (Thanks William Ward)
NVIDIA GPU computing seminars
GPU accelerated machine learning (Thanks Darshan Hegde for the link)
Intel Graphics
Floating point numbers  (Thanks Chris W. Quackenbush)

Nice summary of optimizations (Thanks to Darshan Hegde)
CUDA C programming guide
CUDA C Best Practices
CUDA occupancy calculator
For CUDA developers
Series of CUDA articles at Dr. Dobb's

OpenCL 2.0 reference card


GPU Simulators and Tools:
Multi2sim (simulates both GPUs and multicore)
GPUOcelot (dynamic compilation for PTX)