G22.2631-001

Distributed Computing

Shasha, Dennis

Graduate Division

Computer Science


 

GOALS

Distributed computing has moved from the laboratory to development to commercial diffusion, most notoriously on the web. A typical Wall Street application involves widely dispersed traders within a city talking to a duplexed server at the primary site, with an identical configuration at a disaster recovery site. (What is the implication of database replication for such an architecture?) Many such applications are currently being extended to transoceanic operation. Embedded-systems enterprises from aerospace to telecommunications have developed or announced major distributed computing systems. (How do real-time guarantees interact with fault tolerance?)

The ideal distributed system is cheaper, faster, more highly available, and safer than a centralized system. Unfortunately, few ideal distributed systems exist. The reasons often have to do with programming complexity, but just as often with inherent tradeoffs between safety and availability. For example, safety suggests that all server updates go to both the primary and the backup, but should the primary stop if the backup has failed? Such tradeoffs disappear if we are less demanding about what safety means. Understanding the reasons underlying such tradeoffs helps one make better decisions. (Why are the most successful distributed systems in communications applications?)

Distributing work, especially across the web, brings with it a new set of problems as well: How does one manage networks? How does one trust communication, especially authentication? How does one interface with systems one doesn't know? How do software stacks like Active Software, Jini, and Java Beans work? These are some of the topics we will discuss, sometimes with the help of outside experts.

Please purchase the lecture notes at Unique Copy, 252A Greene St., at a cost of about $30 before the first class.

Syllabus

Here is the syllabus in PostScript and in plain text (LaTeX source).

Recent Papers

Here is a survey paper in PDF format from ACM Computing Surveys, vol. 31, no. 1: Fundamentals of Fault-Tolerant Distributed Computing in Asynchronous Environments, by Felix Gaertner. This copy is made available for classroom use without fee.

Here are Ernie Cohen's PowerPoint slides about Java technologies: RMI, Beans, Enterprise Beans, and Jini. Ernie can be reached at ernie@research.telcordia.com.

This is a recent survey paper by Eli Gafni that supersedes, philosophically at least, his Milcom paper.

Here are Joe Conron's notes on MQSeries, in Word format.