Graphics Processing Units (GPUs): Architecture and Programming

Prof. Mohamed Zahran   (aka Prof. Z)
mzahran AT cs DOT nyu DOT edu
Tuesdays: 5:10PM - 7:00PM, WWH 317
Office Hours (WWH 320): Wed 2-4pm


Welcome, students, to the Fall 2017 edition of the Graphics Processing Units course.
I will keep updating this page regularly. If you have questions related to this course, feel free to email me.

This course examines the architecture and capabilities of modern graphics processing units (GPUs)
and how to use them to get the best performance across many applications.

Why are GPUs more important now than ever?
Many computations can be performed faster on a GPU than on a traditional CPU (e.g. many scientific applications, the training phase of deep learning, ...).
This is why GPUs are now found in almost all computers, from tablets to supercomputers,
and many of the Top 500 supercomputers in the world are built around GPUs.
GPUs are now used for a diverse set of applications, not just traditional graphics.
This course introduces the concept of general-purpose computing on GPUs (GPGPU).
In this course, we will cover architectural aspects of modern GPUs.
We will also learn how to program GPUs to solve different types of problems and how to make
the best use of the hardware.

NYU Classes


Date Topic Readings Comments
9/5 Gentle Introduction to GPUs
  • chp 1
9/12 Hardware Perspective of GPUs
9/19 Introduction to CUDA
  • chp 2
  • hw1 assigned
9/26 Introduction to CUDA

  • Lab 1 assigned
  • Tools you will need.
10/3 CUDA: Threads & Memory
  • chp 3
  • chp 4
  • It is good to read the extra example of chp 3 (Image Blur)
10/10 CUDA: Advanced Techniques 1
  • chp 5
  • 6.1
  • hw2 assigned
10/17 CUDA: Advanced Techniques 1 (cont'd)

10/24 CUDA: Advanced Techniques 2
10/31 CUDA: Advanced Techniques 3
  • chp 9 (histogram)
  • chp 17 (computational thinking)
  • This paper
11/7 Parallel Patterns
  • skim chp 7
  • skim chp 8
11/14 CUDA: Advanced Techniques 3 (cont'd)
  • OpenACC: chp 19
11/21 OpenCL
11/28 Revision
  • Previous Final Exams:


Assignments (non-programming; submission through NYU Classes)
(15% of total grade)

Programming Assignments
(30% of total grade)
To set up your machine to work with our CUDA cluster:
First, log in to your CIMS account.
Once logged in, ssh to cuda2 (or cuda5).
Now you can get set up to run CUDA code by following these instructions:
module load mpi/mpich-x86_64
cp -r /usr/local/cuda/samples ~/samples
cd ~/samples
cd bin/x86_64/linux/release

Project
(25% of the total grade)

Note: You may not find a paper on your specific project topic. The idea is to survey the literature and introduce your own new idea.
The papers given below are just starting points. If you pick a project, you may need to read more and dig deeper.

Suggested projects:

Interesting Links (Geeky stuff about GPUs)

GPUs in general:
High Performance Computing on GPUs
First digital 3D rendered film (Thanks William Ward)
Interview with Ed Catmull (Thanks William Ward)
NVIDIA GPU computing seminars
GPU accelerated machine learning (Thanks Darshan Hegde for the link)
Intel Graphics
Floating point numbers (Thanks Chris W. Quackenbush)

Nice summary of optimizations (Thanks to Darshan Hegde)
CUDA C programming guide
CUDA C Best Practices
CUDA occupancy calculator
For CUDA developers
Series of CUDA articles at Dr. Dobb's

OpenCL 2.0 reference card


GPU Simulators and Tools:
Multi2sim (simulates both GPUs and multicore)
GPUOcelot (dynamic compilation for PTX)