Distributed computing has moved from the laboratory
to development to commercial diffusion, most notoriously
on the web.
A typical Wall Street application involves widely dispersed
traders within a city talking to a duplexed server at the primary site
with an identical configuration at a disaster recovery site.
(What are the implications of database replication for such an architecture?)
Many such applications are currently being extended to
Embedded system enterprises
from aerospace to telecommunications have developed
or announced major distributed
(How do real-time guarantees interact with fault tolerance?)
The ideal distributed system is cheaper, faster,
more highly available, and safer than a centralized one.
Unfortunately, few ideal distributed systems exist.
The reasons often involve programming complexity,
but just as often stem from inherent tradeoffs between
safety and availability.
For example, safety suggests that all server updates go to the
primary and backup, but should the primary stop if the backup has failed?
Such tradeoffs disappear if we are less demanding about what safety means.
Understanding the reasons underlying such tradeoffs helps one make
(Why are the most successful distributed systems in communications
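The primary/backup question raised above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class `PrimaryBackupSketch`, its `Policy` enum, and the toy in-memory `Replica` are hypothetical names invented here, not part of any course material or real library. Under the safety-first policy the server refuses an update when the backup is down; under the availability-first policy it accepts the update and lets the replicas diverge.

```java
/** Illustrative sketch of the primary/backup update tradeoff; all names are hypothetical. */
final class PrimaryBackupSketch {
    enum Policy { SAFETY_FIRST, AVAILABILITY_FIRST }

    /** A toy replica: counts applied updates and can be marked as failed. */
    static final class Replica {
        boolean up = true;
        int applied = 0;

        /** Applies one update if the replica is up; returns whether it was applied. */
        boolean apply() {
            if (!up) {
                return false;
            }
            applied++;
            return true;
        }
    }

    final Replica primary = new Replica();
    final Replica backup = new Replica();
    private final Policy policy;

    PrimaryBackupSketch(Policy policy) {
        this.policy = policy;
    }

    /** Returns true if the update is accepted by the service. */
    boolean update() {
        if (policy == Policy.SAFETY_FIRST && !backup.up) {
            // Safety first: stop rather than let the primary run ahead of the backup.
            return false;
        }
        primary.apply();
        // Availability first: the update is accepted even if this call is a no-op.
        backup.apply();
        return true;
    }

    public static void main(String[] args) {
        for (Policy policy : Policy.values()) {
            PrimaryBackupSketch service = new PrimaryBackupSketch(policy);
            service.backup.up = false; // simulate a failed backup
            boolean accepted = service.update();
            System.out.println(policy + ": accepted=" + accepted
                + " primary=" + service.primary.applied
                + " backup=" + service.backup.applied);
        }
    }
}
```

With the backup marked as failed, the safety-first run rejects the update and both replicas stay at zero, while the availability-first run accepts it and leaves the primary one update ahead of the backup. Weakening what "safe" means (for example, tolerating a temporarily stale backup) is what makes the second behavior acceptable.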
Distributing work, especially across the web,
brings with it a new set of problems as well:
How does one manage networks?
How does one trust communication, especially authentication?
How does one interface with systems one doesn't know?
How do software stacks like Active Software, Jini, and JavaBeans work?
These are some of the topics we will discuss, sometimes with the help
of outside experts.
Please purchase the lecture notes at Unique Copy,
252A Greene St., at a cost of about $30,
before the first class.
Linguistic constructs as a way to think
about distributed programming (especially the relationship between
message passing and shared memory).
Your project will be in Java (or perhaps K, www.kx.com).
Algorithms to handle asynchrony:
Time as a partial order, Lamport clocks, clock synchronization,
the notion of round in an asynchronous system, consistent snapshots,
case study: Ethernets with guarantees.
Algorithms and impossibility results for fault tolerance:
goals of fault tolerance, masking, commit protocols,
transmission algorithms, consensus,
case studies: database backups, replication servers, replicated
state machines, idempotent memories.
Applications: fault tolerance in financial systems,
real-time Internet protocols,
authentication and security including applications
of zero-knowledge proofs.
Software stacks: from message primitives to application-level
management, Enterprise JavaBeans and Jini, intelligent agents.
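As a taste of the asynchrony topics listed above, here is a minimal sketch of a Lamport logical clock in Java, the course's project language. The class name `LamportClock` and its method names are assumptions made for illustration. The rule it implements is the standard one: a process increments its counter on every local or send event, and on receipt sets its counter to one more than the maximum of its own value and the timestamp carried by the message.

```java
/** A minimal Lamport logical clock (illustrative sketch; names are hypothetical). */
final class LamportClock {
    private long time = 0;

    /** Local event: advance the clock by one. */
    synchronized long tick() {
        return ++time;
    }

    /** Send event: advance the clock and return the timestamp to attach to the message. */
    synchronized long send() {
        return ++time;
    }

    /** Receive event: move past both the local time and the message timestamp. */
    synchronized long receive(long messageTimestamp) {
        time = Math.max(time, messageTimestamp) + 1;
        return time;
    }

    /** Current clock value, without advancing it. */
    synchronized long now() {
        return time;
    }

    public static void main(String[] args) {
        LamportClock p = new LamportClock();
        LamportClock q = new LamportClock();
        p.tick();                      // p is now at 1
        long sent = p.send();          // p is now at 2; the message carries 2
        q.tick();                      // q is now at 1
        long received = q.receive(sent); // q moves to max(1, 2) + 1 = 3
        System.out.println("send=" + sent + " receive=" + received);
    }
}
```

The receive timestamp is always larger than the send timestamp, so the clock values respect the happens-before partial order. The converse does not hold: two events with ordered timestamps may still be concurrent.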
Here is the syllabus in
and in plain text (LaTeX source).
Here is a survey paper in PDF format
from ACM Computing Surveys vol. 31, number 1:
Fundamentals of Fault-Tolerant
Distributed Computing in Asynchronous Environments
by Felix Gaertner
This is a copy made available for classroom use without fee.
Here are Ernie Cohen's PowerPoint slides about Java Technologies:
RMI, Beans, Enterprise Beans, and Jini.
Ernie can be reached at email@example.com.
This is a recent survey paper by Eli Gafni that supersedes,
philosophically at least,
his Milcom paper.
Here are Joe Conron's notes on MQSeries.
They are in Word format.