Graphics Processing Units (GPUs): Architecture and Programming
Welcome, students, to the Graphics Processing Units course, Fall 2016 edition.
This course examines the architecture and capabilities of
modern GPUs (graphics processing units).
I will keep updating this page regularly. If you have
questions related to this course feel free to email me.
Why are GPUs more important now than ever?
Many computations can be performed
faster on a GPU than on a traditional CPU.
This is why GPUs now exist in almost all
computers, from tablets to supercomputers;
many of the Top 500 supercomputers in the world are built with GPUs.
GPUs are now used for a diverse set of applications, not only
traditional graphics applications.
This course introduces the concept of
general-purpose GPUs or GPGPUs.
In this course, we will cover architectural aspects of modern GPUs.
We will also learn how to program GPUs to solve different types of
problems and how to make the best use of their hardware.
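To give a flavor of what GPU programming looks like, here is a minimal sketch of a CUDA program that adds two vectors, with one GPU thread per element. This example is illustrative only (not part of the course materials); the kernel name and the use of unified memory via cudaMallocManaged are my choices to keep the sketch short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one element of c = a + b.
__global__ void addKernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the sketch short; explicit cudaMalloc/cudaMemcpy
    // is the more traditional pattern.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;                       // threads per block
    int blocks = (n + threads - 1) / threads; // enough blocks to cover n
    addKernel<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();                 // wait for the kernel to finish

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Compile with nvcc (e.g., `nvcc add.cu -o add`) on a machine with a CUDA-capable GPU, such as the cuda nodes described in the cluster setup below.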
- Some other suggested, but not required, books:
- Our grader for this semester: Jiakai Zhang jiakai.ta (at) gmail dot com
Sign up for the Mailman mailing
list for the course, if you have not done so already. You can manage your subscription by clicking here.
Please follow the mailing list etiquette.
- Use the Reply command to
contribute to the current thread, but NOT to start a new one.
- Use your NYU email, not any other ones.
- If quoting a previous message, try to trim off the parts that are not relevant to your reply.
- Use a descriptive Subject: field when starting a new topic.
- Do not use one message to ask two unrelated questions.
- Do NOT make the mistake of sending your
completed project assignment to the mailing list!
Final exam: Tuesday Dec 20th, 2016 at 5:10pm Room WWH 317
Hw1 - Due Sep 20 - sol
- (15% of total grade)
Hw2 - Due Oct 25 - sol
(30% of total grade)
To set up your machine to work with our CUDA cluster:
First, login to your CIMS account
Once logged in, ssh to cuda2 (or cuda5)
Now you can get set up to run CUDA code by following these instructions:
module load mpi/mpich-x86_64
cp -r /usr/local/cuda/samples ~/samples
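After copying the samples, a natural next step is to build and run one of them to confirm the toolchain works. The commands below are a sketch: the `1_Utilities/deviceQuery` path follows the usual CUDA samples layout, but the exact subdirectory names may differ between CUDA versions on the CIMS machines.

```shell
# Build and run the deviceQuery sample from the copied samples tree.
# Assumes the `module load` and `cp -r` steps above have been done.
cd ~/samples/1_Utilities/deviceQuery
make           # compiles the sample with nvcc
./deviceQuery  # prints the GPU model, compute capability, memory size, etc.
```

If deviceQuery reports your GPU correctly, the node and your environment are ready for the labs.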
For Lab 3, you need to ssh to opencl1
(25% of the total grade)
The following are suggested projects. You can do the project alone or as part of a group of 2.
After reading the list of suggested projects you can:
- Pick one of them.
- Suggest your own.
- Suggest a modified version of one of them.
If you decide to do #2 or #3, you need to discuss your version with me first before the project is officially assigned to you. This
is because some students underestimate their choice (given the amount
of time available) or overestimate their choice (by picking something overly ambitious).
The final project report as well as your source code are due on Dec 13th, by email to the grader with me CC'ed.
Here is the list of suggested projects:
- Sorting algorithms for GPUs: This involves making an exhaustive (to
some extent) list of sorting algorithms for CPUs, picking the ones
that are good candidates for GPUs, implementing them, and comparing them in
terms of performance (relative to the sequential version) and scalability
(relative to problem size).
- L2 cache replacement policy: This
project involves analyzing the access patterns of the L2 cache with several
benchmark programs, surveying currently available replacement policies for
GPU L2 caches, proposing a possibly better one, and implementing it by modifying a
given simulator (GPGPU-Sim).
- Pick an application to parallelize for the
GPU. You will need to compare it to the sequential version and any
existing parallel versions for multicore and GPU. Moreover, you need to
generalize your findings so that people can make use of your
experience to parallelize their own applications.
- Do we need an L3 cache in GPUs? You need to design experiments to find the answer. This requires using the GPGPU-Sim simulator.
- Not-so-GPU-friendly
applications on GPUs: Can we benefit from executing not-so-GPU-friendly applications
on GPUs? How? What are the conditions? You need to design experiments
to answer these questions.
The report must contain the following parts:
- Title of the project
- Name of the authors
- Abstract: summarizing the problem, solution, and results in one paragraph only.
- Introduction: What is the problem you are trying to solve? Why is it important? What are your contributions?
- Background Information: You can put here any needed background information about the problem at hand and any domain-specific knowledge.
- Literature Survey: Previous work in this problem and what are the pros and cons of each solution.
- Proposed Solution
- Experimental Setup: Specs of the machine, problem size, etc.
- Experimental Results & Analysis: What are your findings? I expect a much deeper analysis than "As we can see, X is better than Y".
- Conclusions: What are your main findings? Can you generalize them?
Links (Geeky stuff about GPUs)
GPUs in general:
digital 3D rendered film (Thanks William Ward)
with Ed Catmull (Thanks William Ward)
GPU computing seminars
GPU accelerated machine learning (Thanks Darshan Hegde for the link)
Floating point numbers (Thanks Chris W. Quackenbush)
Nice summary of optimizations (Thanks to Darshan Hegde )
CUDA C programming guide
CUDA C Best Practices
Series of CUDA articles at Dr. Dobb's
2.0 reference card
Simulators and Tools:
(simulates both GPUs and multicore)
(dynamic compilation for PTX)