Graphics Processing Units (GPUs): Architecture and Programming
Welcome, students, to
the Graphics Processing Units course, Fall 2014 edition.
I will keep updating this page regularly. If you have
questions related to this course, feel free to email me.
Here is some basic information:
This course examines the architecture and capabilities of
modern GPUs (graphics processing units).
Many computations can be performed
faster on the GPU than on a traditional CPU.
This is why GPUs exist now in almost all
computers (from tablets to supercomputers);
majority of Top 500 supercomputers in the world are built
GPUs are now used for a diverse set of applications, not only
traditional graphics applications.
This introduces the concept of
general-purpose GPUs or GPGPUs.
In this course, we will cover architectural aspects of modern GPUs.
We will also learn how to program GPUs to solve different types of
problems and how to make
the best use of their hardware.
- Some other suggested, but not required, books:
Final exam: Dec 17, 7:10-9pm Room 317
- Our graders:
- Jiakai Zhang zhjk (AT) nyu dot edu
- Tao Feng tf771 (AT) nyu dot edu
Sign up for the Mailman mailing
list for the course, if you have not done so already. You can manage your subscription by clicking here.
Please follow the mailing list etiquette.
- Use the Reply command to
contribute to the current thread, but NOT to start a new topic.
- Use your NYU email address, not any other one.
- If quoting a previous message, try to trim off any irrelevant parts.
- Use a descriptive Subject: field when starting a new topic.
- Do not use one message to ask two unrelated questions.
- Do NOT make the mistake of sending your
completed project assignment to the mailing list!
Assignment 1: Due Sep 17th, 2014 - solutions - comments
Assignment 2: Due Oct 15th, 2014 - solutions - comments
Assignment 3: Due Nov 12th, 2014 - solutions - comments
To set up your machine to work with our CUDA cluster:
First, log in to your CIMS account
Once logged in, ssh to cuda1
Now you can get set up to run CUDA code by following these instructions:
cp -r /usr/local/pkg/cuda
Suggested projects (but feel free to suggest your own):
- Problem definition report (5%) -> Due Oct 15th
- What is the specific topic (detailed problem definition)?
- What are the input and output?
- Why is it suitable for GPU?
- Survey and design report (5%) -> Due Oct 29th
- Survey of previous implementations in literature.
- What are the shortcomings of those implementations?
- How will your design avoid these shortcomings?
- Give a detailed design of your program:
- Data structure
- Main functions that run on the CPU and kernels that run on the GPU
- Optimizations that you will consider and why (be specific to your project)?
- Project results and final report (10%) -> Due Nov 26th
- Cover page: Project name, your name(s)
- Section 1: High-level project description
- Section 2: Design:
- Your data structure
- How you partition your data
- How you decide on threads, blocks, and grid
- Section 3: Optimizations you have done in your code to speed things up
- Section 4: Experimental setup
- Machines on which you tested your code
- Input you used to test your implementation
- How you checked for correctness
- Section 5: Experimental Results
- Section 6: Conclusions (Was CUDA suitable for this project? What could have
been done better? What are the shortcomings of your implementation?)
You need to email the source code as well as a readme file telling us how to
compile and run your code, in a single zip file named after your
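Two items in the final report outline, "How you decide on threads, blocks, and grid" and how you checked for correctness, can be made concrete with a little host-side code. The following is only a minimal sketch in plain C++ (it should also compile as host code inside a CUDA project); it is not course-provided code. The names blocksFor, addReference, and allClose are hypothetical helpers, element-wise vector addition is just a stand-in problem, and the tolerances are illustrative assumptions you would tune for your own project.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the CUDA API): number of blocks needed so
// that blocks * threadsPerBlock >= n, using ceiling division.
inline std::size_t blocksFor(std::size_t n, std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU reference for the stand-in problem c[i] = a[i] + b[i].
// In a real project this would be the sequential version of your algorithm.
std::vector<float> addReference(const std::vector<float>& a,
                                const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Returns true when both arrays have the same length and every element of
// `actual` is within absTol + relTol * |expected[i]| of the reference value.
// Exact equality is usually too strict for floating-point GPU results.
bool allClose(const std::vector<float>& expected,
              const std::vector<float>& actual,
              float relTol = 1e-5f,
              float absTol = 1e-6f) {
    if (expected.size() != actual.size()) {
        return false;
    }
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const float diff = std::fabs(expected[i] - actual[i]);
        // Written as !(diff <= bound) so that a NaN result also fails.
        if (!(diff <= absTol + relTol * std::fabs(expected[i]))) {
            return false;
        }
    }
    return true;
}
```

For example, blocksFor(1000, 256) gives 4 blocks, that is 1024 threads, so the kernel needs a bounds check to keep the 24 surplus threads from touching memory past the end of the array. After copying the kernel output back to the host, passing it to allClose together with the CPU reference gives a simple pass/fail result you can describe in Section 4.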
Links (Geeky stuff about GPUs)
GPUs in general:
digital 3D rendered film (Thanks William Ward)
with Ed Catmull (Thanks William Ward)
GPU computing seminars
GPU accelerated machine learning (Thanks Darshan Hegde for the link)
Nice summary of optimizations (Thanks to Darshan Hegde)
C programming guide
CUDA C Best Practices
of CUDA articles at Dr. Dobb's
2.0 reference card
Simulators and Tools:
(simulates both GPUs and multicore)
(dynamic compilation for PTX)