Distributed Computing, G22.2631

Course Goals

Distributed computing has moved from the laboratory to development to commercial diffusion. A typical Wall Street application involves widely dispersed traders within a city talking to a duplexed server at the primary site, with an identical configuration at a disaster recovery site. (What are the implications of database replication for such an architecture?) Many such applications are currently being extended to transoceanic operation. Embedded system enterprises from aerospace to telecommunications have developed or announced major distributed computing systems. (How do real-time guarantees interact with fault tolerance?)

The ideal distributed system is cheaper, faster, more highly available, and safer than a centralized system. Unfortunately, few ideal distributed systems exist. The reasons often have to do with programming complexity, but just as often with inherent tradeoffs between safety and availability. For example, safety suggests that every server update go to both the primary and the backup, but should the primary stop if the backup has failed? Such tradeoffs disappear if we are less demanding about what safety means. Understanding the reasons underlying such tradeoffs helps one make better decisions. (Why are the most successful distributed systems in communications applications?) Distributing work, especially across the web, brings with it a new set of problems as well: How does one manage networks? How does one trust communication, and authentication in particular? How does one interface with systems one doesn't know? Here are some of the topics we will discuss.


The course is based on research papers, lectures, and bound lecture notes. Most of the lectures will be based on the notes, though some will not be. You should purchase the notes from Unique Copy Center, 252A Greene St., at a cost of about $30. (Please note that they may be labeled G22.3033, but they should have my name on them and say Distributed Computing.) Consult the table of contents of the lecture notes for a detailed syllabus.

Many difficult-to-find papers will be in the library, as will some papers that we don't explicitly discuss. Please note that these "Green Box" papers should be kept in alphabetical order.

There is no textbook, though you are welcome to look for any book you think may be helpful. You may also need to buy a book on Java to do the projects and homework; many are available.

Mailing List

The majordomo subscription identifier is g22_2631_001_fall97. To find out how to subscribe to and use majordomo, send mail to majordomo@cs.nyu.edu with the body
help
To subscribe to the class mailing list, send mail to majordomo@cs.nyu.edu with the body
subscribe g22_2631_001_fall97@cs.nyu.edu

About the Instructor

Dennis Shasha is a professor of computer science at New York University's Courant Institute of Mathematical Sciences. He holds a B.S. from Yale, an M.S. from Syracuse, and a Ph.D. from Harvard. He has written four books: Database Tuning: A Principled Approach; two mathematical detective stories, The Puzzling Adventures of Dr. Ecco and Codes, Puzzles, and Conspiracy; and Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists. His three main research projects combine databases with parallel processing (the Persistent Linda project), databases with pattern recognition (the Combinatorial Pattern Discovery project), and databases with expert systems and information retrieval (the Thinksheet project).

Questions are welcome. Please send them to shasha@cs.nyu.edu. Office hours are Wednesday afternoon before class.


Homework 1 Homework 2

Project Information



Code (Look at the README file in the code tar file for installation instructions)