G22.3033.10  Scalable Clusters: Architecture and Software

Spring 1998 -- Vijay Karamcheti

General Information

Lectures: Tuesdays, 7:00pm-9:00pm, 101 WWH
Office Hours: Wednesdays, 11:00am-12:00noon, 715 Broadway, Room 704
phone: (212) 998-3496
email: vijayk@cs.nyu.edu
Mailing List: g22_3033_010_sp98@cs.nyu.edu


Clusters of commodity PCs and workstations are fast becoming a viable alternative to custom parallel machines: high-performance processing nodes and fast networks combine to provide performance levels comparable to massively parallel architectures, while offering significant cost and scaling advantages.

In this course, we will examine the architecture and software of high-performance scalable clusters, emphasizing how various issues are addressed in a cluster environment as compared with custom parallel machines.

Topics to be covered include the architectures of clusters (processing nodes, networks, processor-network interfaces), middleware (messaging layers, distributed shared memory systems, models for group communication and fault-tolerant computing, and various models of resource coordination), and specific case studies of clusters and large-scale applications that have demonstrated good performance.
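A common way to reason about the messaging-layer and processor-network-interface topics above is to split communication cost into per-message software overhead and wire latency, in the style of the LogP model. The sketch below is a toy illustration in Python; all parameter values are made up for illustration and are not measurements from any machine covered in the course.

```python
# Toy LogP-style cost model for point-to-point messaging.
# o = per-message software overhead at each endpoint,
# L = network transit latency. All numbers are illustrative only.

def one_way(L, o):
    """One message: send overhead + wire latency + receive overhead."""
    return o + L + o

def round_trip(L, o):
    """A request/reply pair between two nodes."""
    return 2 * one_way(L, o)

# Hypothetical settings (microseconds), chosen only to show the trend:
# custom MPPs push overhead into hardware, while OS-mediated cluster
# messaging pays heavily in software overhead rather than raw latency.
mpp_rtt = round_trip(L=1.0, o=1.0)        # tightly coupled machine
cluster_rtt = round_trip(L=5.0, o=50.0)   # commodity cluster, kernel in the path

print(f"MPP round trip:     {mpp_rtt:.1f} us")
print(f"Cluster round trip: {cluster_rtt:.1f} us")
```

Under these made-up numbers the cluster round trip is dominated by the overhead term, which is why the user-level messaging layers discussed in Lecture 6 attack software overhead rather than network latency.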


Prerequisites

Introductory graduate-level courses in computer architecture and operating systems are recommended.

Course Structure

The course will be conducted as a combination of lectures (providing general background) and discussions of papers representing the state of the art. Students will be expected to present papers, participate in discussion, and do a project which explores the research issues in greater detail. There will be no exams.

Lecture Schedule and Reading List

The following is an approximate schedule of topics to be covered. All the referenced papers are available on the web. Everyone is expected to have read the papers before the corresponding lecture.

Lecture 1 (1/20) Course Introduction and Administrivia
Lecture 2 (1/27) Technology Trends
Lecture 3 (2/3) Full-custom Parallel Machines
PRESENTER(S): Arash Baratloo, Holger Karl, Fangzhe Chang
Lecture 4 (2/10) Parallel Machines with Commodity Processor Cores
PRESENTER(S): Perrin Meyer (Recommended)
Lecture 5 (2/17) Processor-Network Interfaces
PRESENTER(S): Yuanyuan Zhao, Jian Zheng
Lecture 6 (2/24) Low-level Messaging Protocols
PRESENTER(S): Linda Steinberg, Leonid Zheleznyak (Recommended)
Lecture 7 (3/3) Higher-level Messaging Protocols
PRESENTER(S): Amit Nene, Niranjan Nilakantan (Recommended)
Lecture 8 (3/10) Shared Virtual Memory
PRESENTER(S): Yuanyuan Zhao, Fangzhe Chang (Recommended)
Lecture 9 (3/24) Shared Object Memory and Custom Protocols
PRESENTER(S): Amit Nene, Niranjan Nilakantan (Recommended)
Lecture 10 (3/31) Fault-tolerance and Availability
PRESENTER(S): Arash Baratloo, Jian Zheng (Recommended)
Lecture 11 (4/7) Resource Coordination
PRESENTER(S): Harris Morgenstern, Holger Karl (Recommended)
Lecture 12 (4/14) Case Studies
PRESENTER(S): Perrin Meyer, Linda Steinberg
Lecture 13 (4/21) No Class
Lecture 14 (4/28) Project Presentations


Course Projects

You can do projects in groups of 2-3. You are free to choose any topic that explores the research issues associated with using a cluster of commodity workstations as a viable high-performance parallel platform. I am trying to arrange for access to both a scalable parallel machine and a cluster of workstations that can be used for running experiments.

A one-page writeup describing the project idea and completion plan is due by 9:00pm, March 3, 1998 (Lecture 7). Project presentations are scheduled for Lecture 14. The final report is due by 6:00pm, May 5, 1998, and should be 15-20 pages in conference style (e.g., consisting of an abstract, intro/motivation, background, approach/solution, analysis/discussion, related work, and conclusions).

Some project ideas are described below:

  1. Characterizing application sensitivity to performance differences between custom parallel machines and clusters. This study can be performed using a suite of scientific or commercial benchmarks, or one or two large-scale applications, and would quantify how the performance of these applications is affected by differences in communication overhead and latency, as well as coordinated scheduling.
  2. Porting standard messaging layers (e.g., MPI) or distributed shared memory systems (e.g., CRL, Shasta) to a cluster environment, and evaluating their performance.
  3. Exploring new techniques for overcoming the disadvantages of high communication overheads in a cluster environment. Examples include developing techniques which would schedule communication operations in advance of where they are actually required, and developing application-specific distributed shared memory protocols. The expectation here is that you would build upon an existing system, and demonstrate the incremental benefit of your ideas.
  4. Exploring the use of alternate processor-network interfaces for clusters (e.g., one which supports responsive communication operations such as put/get, instead of just send/receive), and simulating/measuring their performance advantages.
  5. Designing a "killer app" which can benefit from the cost, performance, scalability, and high availability of a cluster environment (Inktomi started off as a class project at Berkeley).
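To make the send/receive versus put/get distinction in idea 4 concrete, here is a single-process toy model in Python. The `Node` class and all function names are hypothetical, invented purely for illustration: one-sided put/get lets the initiator name a remote address directly, while two-sided send/receive requires the target's CPU to participate in every transfer.

```python
import queue

class Node:
    """Toy cluster node: directly addressable memory plus a message inbox."""
    def __init__(self, mem_size=8):
        self.mem = [0] * mem_size
        self.inbox = queue.Queue()

# Two-sided model: data lands in memory only when the receiver's CPU
# explicitly posts a receive and performs the copy itself.
def send(dst, payload):
    dst.inbox.put(payload)

def recv(node, addr):
    node.mem[addr] = node.inbox.get()

# One-sided model: the initiator names the remote address; the target
# CPU is never involved (modeling a network interface with remote
# put/get support).
def put(dst, addr, value):
    dst.mem[addr] = value

def get(src, addr):
    return src.mem[addr]

a, b = Node(), Node()
put(b, 3, 42)              # one-sided: no action needed on b's side
send(b, 7); recv(b, 0)     # two-sided: b must call recv to see the data
print(b.mem[:4])           # -> [7, 0, 0, 42]
```

A project along the lines of idea 4 would measure what this model only hints at: put/get removes the receiver's software overhead from the critical path, at the cost of requiring network-interface support for remote memory access.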