G22.3033-003
Architecture and Programming of Parallel Computers

Fall 1998 -- Vijay Karamcheti


Announcements


General Information

Lectures: Thursdays, 5:00pm-7:00pm,
102 CIWW
Office Hours: Mondays, 4:00pm-5:00pm,
715 Broadway, Room 704
phone: (212) 998-3496
email: vijayk@cs.nyu.edu

Prerequisites

Senior undergraduate-level or introductory graduate-level courses in computer architecture and operating systems.

Description

Parallel computing is a critical component of 1990s computing technology, and is likely to grow in importance with the proliferation of multiprocessor PC desktops and servers (consisting of 2-8 Pentium IIs on a shared bus) and scalable clusters of commodity workstations.

This course examines the organizing principles behind parallel computing from both an architectural and a programming perspective. It consists of two parts, organized around a common set of issues relevant to all parallel systems: naming, synchronization, latency, and bandwidth. The first part discusses how modern parallel computer architectures deal with these issues at both the small scale (shared memory multiprocessors) and the large scale (scalable multiprocessors). The second part discusses how the same issues are addressed in several common programming paradigms, including message-passing, shared-memory, and data-parallel models, as well as higher-level approaches. The focus of this part of the course is on both programming expression and programming for performance.

The intended audience for this course is doctoral students with research interests in computer architecture, software systems (programming languages, compilers, operating systems), and applications.

Course Structure

The course will consist of lectures based on material from two recommended textbooks, supplemented with current research papers. There will be both written and programming assignments, as well as a significant term project (which may be done in groups of 2-3 students). There will be no exams.

Textbooks

1. David Culler and Jaswinder Pal Singh, with Anoop Gupta,
   Parallel Computer Architecture: A Hardware/Software Approach,
   Morgan Kaufmann, 1998.
2. George Almasi and Allan Gottlieb,
   Highly Parallel Computing, 2nd Edition,
   Benjamin-Cummings, 1994.

Syllabus

9/10  Lecture 1: Introduction
      Handouts 1, 2, 3; HW 1 (due 09/24)
      why parallel computing, motivating applications,
      history and convergence, course organization
9/17  Lecture 2: Parallel Programs
      decomposition, assignment, orchestration, mapping
      case studies: Ocean, Barnes, Ray Tracing, Data Mining
      parallel system components: architecture, OS and compilers,
      programming models, applications
9/24  Lecture 3: Models of Parallel Computation
      Handouts 4, 5; HW 2 (due 10/08)
      analytical: PRAM, LogP
      operational: data parallel, message passing, shared memory
      common issues: naming, synchronization, latency, bandwidth
      tutorial: data-parallel programming
10/01 Lecture 4: Small-scale Shared Memory Machines
      bus-based architectures, snoopy caches
      case studies: SGI Challenge, Sun Enterprise
      tutorial: programming with MPI
10/08 Lecture 5: Large-scale Distributed Memory Machines
      Handouts 6, 7; HW 3 (due 10/22)
      scalable networks, processor-network interfaces
      support for put/get, remote memory access
      case study: Cray T3E
10/15 Lecture 6: Large-scale Shared Memory Machines
      directory-based coherence
      case study: SGI Origin 2000
      tutorial: programming with threads
10/22 Lecture 7: Large-scale Shared Memory Machines (contd.)
      Handout 8; HW 4 (due 11/05)
      programmable protocol processors
      case study: Stanford FLASH
10/29 Lecture 8: Programming for Performance
      common issues: load-balance, synchronization, locality
      data-parallel models: compilation technology, data layout
11/05 Lecture 9: Programming for Performance (contd.)
      (this lecture has been rescheduled to Nov. 3rd, 7:30pm-9:00pm, 101 WWH)
      message-passing models: messaging layers, multithreading,
      message pipelining and aggregation
11/12 --- No Class ---
11/19 Lecture 10: Programming for Performance (contd.)
      shared memory models: lock aggregation, dynamic task assignment,
      layout to minimize cache conflicts
      Hardware/Software Tradeoffs
      shared virtual address space
11/26 --- Thanksgiving Vacation ---
12/03 Lecture 11: Future Directions
      hardware and software trends
      high-level programming models and compilation issues
      parallel programming tools
      concluding remarks
12/10 Lecture 12: Project Presentations