Distributed Systems Fall 2021
Lecture 2: Logical Clocks, Safety and Liveness
The main readings for this week (Lamport'78 and Fidge'88) concern the task of
ordering events (operations) in a distributed system. You might wonder why we
do not just use real clocks. One problem is relativity, which tells us that the
notion of time depends on location. This might sound unhelpful: after all, we
manage to schedule our lives around clocks despite relativity. Hold that
thought until September 23, when we will look at some reasons why real clocks
pose practical challenges for distributed systems. For this class, focus on
logical clocks instead.
Why worry about ordering at all? The reason is that we think of an algorithm as
a sequence of steps that must be performed in order. Ordering is thus at the
heart of this discussion: when we analyze distributed systems we need to
understand the order in which they performed operations, and when we design
systems we need to ensure that operations are performed in the intended order.
# Lamport '78: Time, Clocks, and the Ordering of Events in a Distributed System
In this paper, the first of the logical clock papers, Lamport describes an
algorithm (a procedure) for recovering a **total order** on events in the
distributed system. A total order here means that for any two distinct events
(remember that events are operations) e and e', either e < e' (e is ordered
before e') or e' < e.
Total orders are appealing, and Lamport's construction makes sure that the total
order is sane: i.e., every message is sent before it is received, and causality
is maintained. However, the total order Lamport derives is not **unique**: more
than one total order can be consistent with the same set of events.
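
As a rough illustration of these two points, here is a minimal Python sketch of a logical clock for a system whose only events are sends and receives. It is one possible reading of the idea, not the paper's own presentation: the class name `LamportProcess`, the starting value of 0, and the use of a process id to break ties are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LamportProcess:
    pid: int        # hypothetical process id, used here only to break ties
    clock: int = 0  # logical clock; starting at 0 is an assumption

    def send(self) -> int:
        """Tick the clock and return the timestamp to attach to the message."""
        self.clock += 1
        return self.clock

    def receive(self, msg_ts: int) -> int:
        """Move past both the local clock and the message's timestamp."""
        self.clock = max(self.clock, msg_ts) + 1
        return self.clock

    def stamp(self) -> Tuple[int, int]:
        """(clock, pid) pair identifying the most recent local event."""
        return (self.clock, self.pid)


p, q, r = LamportProcess(0), LamportProcess(1), LamportProcess(2)

events = {}
ts = p.send()
events["p.send"] = p.stamp()
q.receive(ts)
events["q.recv"] = q.stamp()
r.send()                      # unrelated to the other two events
events["r.send"] = r.stamp()

# Two different tie-breaking rules give two different total orders.
order_a = sorted(events, key=lambda e: events[e])
order_b = sorted(events, key=lambda e: (events[e][0], -events[e][1]))
```

Both `order_a` and `order_b` place `p.send` before `q.recv`, so both respect the message, but they disagree on where `r.send` goes. The tie-breaking rule is an arbitrary choice, which is one way to see why the total order is not unique.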
## Questions to consider
* Consider a distributed system where the only events are sending and receiving
messages. Give pseudocode that each process should run when sending or receiving
messages in order to maintain a Lamport clock.
* Construct an example, for an application whose only events are sends and
receives, where two or more total orders can be assigned to a single event sequence.
# Fidge '88: Timestamps in Message-Passing Systems That Preserve the Partial Ordering
This is one of two papers, published simultaneously, that describe vector
clocks (the optional reading Mattern'88 is the other). The idea here is that while
a total ordering, as provided by Lamport clocks, is appealing, the additional
structure it imposes loses information. In particular, we know that some events
happen concurrently, that is, there is no causal relation between them, yet a
total order still places one before the other. Vector clocks are an attempt
to capture this richer structure and produce a partial order. A partial order
here means that for two distinct events e and e' exactly one of three things
holds: e happens before e', e' happens before e, or neither happens before the
other.
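
To make the three-way comparison concrete, here is a minimal Python sketch of vector timestamps for a system whose only events are sends and receives. It is an illustration under stated assumptions rather than the paper's own formulation: the names `VectorProcess`, `happens_before`, and `concurrent` are invented for this example, and the number of processes `n` is assumed to be fixed and known in advance.

```python
from typing import List


class VectorProcess:
    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid          # index of this process in every vector
        self.clock = [0] * n    # one counter per process (n assumed fixed)

    def send(self) -> List[int]:
        """Tick our own entry and return a copy to attach to the message."""
        self.clock[self.pid] += 1
        return list(self.clock)

    def receive(self, msg_clock: List[int]) -> List[int]:
        """Take the entrywise max with the message, then tick our own entry."""
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.clock[self.pid] += 1
        return list(self.clock)


def happens_before(a: List[int], b: List[int]) -> bool:
    """a precedes b if a <= b in every entry and the vectors differ."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: List[int], b: List[int]) -> bool:
    """Neither timestamp precedes the other."""
    return a != b and not happens_before(a, b) and not happens_before(b, a)


p0, p1, p2 = VectorProcess(0, 3), VectorProcess(1, 3), VectorProcess(2, 3)

s0 = p0.send()         # p0 sends a message to p1
s2 = p2.send()         # p2 sends an unrelated message
r1 = p1.receive(s0)    # p1 receives p0's message
```

Here `happens_before(s0, r1)` is true because the receive follows the matching send, while `concurrent(s0, s2)` and `concurrent(s2, r1)` are both true: nothing links p2's send to the other two events, and the timestamps preserve that fact instead of forcing an order.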
## Questions to consider
* Consider a distributed system where the only events are sending and receiving
messages. Give pseudocode that each process should run when sending or receiving
messages in order to maintain a Vector Clock.
* Consider a distributed system with three processes, where each process must send at
least **one** message to one other process. Construct a scenario (a schedule/sequence
of events) in which at least one pair of events is **concurrent**.
# Alpern and Schneider '85: Defining Liveness
We briefly talked about this paper at the end of the last lecture. You should read
the paper carefully, but don't need to answer any questions about it.