Distributed Systems Fall 2021 Lecture 2: Logical Clocks, Safety and Liveness

The main readings for this week (Lamport'78 and Fidge'88) concern the task of ordering events (operations) in a distributed system. You might wonder why we do not just use real clocks. One problem is relativity, which tells us that the notion of time depends on the observer. This of course sounds unhelpful: after all, we manage to schedule our lives around clocks despite relativity. Hold that thought until September 23, when we will look at some reasons why real clocks pose practical challenges for distributed systems. For this class, focus on logical clocks instead.

Why worry about ordering anyway? We usually think of an algorithm as a sequence of steps that must be performed in order. Ordering is thus at the heart of this discussion: when we analyze distributed systems we need to understand the order in which they performed operations, and when we design systems we need to ensure that operations are performed in the intended order.

# Lamport '78: Time, Clocks, and the Ordering of Events in a Distributed System

In this paper, the first of the logical clock papers, Lamport describes an algorithm (a procedure) for recovering a **total order** on events in a distributed system. A total order here means that for any two distinct events (which, remember, are operations) e and e', either e < e' (e happens and then e' happens) or e' < e. Total orders are appealing, and Lamport's construction makes sure that the total order is sane: i.e., messages are sent before being received, and causality is maintained. Despite this, the total order Lamport derives is not **unique**, which just means that more than one total order can be ascribed to the same set of events.

## Questions to consider

* Consider a distributed system where the only events are sending and receiving messages.
  Give pseudocode that each process should run when sending or receiving messages in order to maintain a Lamport clock.
* Construct an example, for an application whose only events are sends and receives, in which two or more total orders can be assigned to a single event sequence.

# Fidge '88: Timestamps in Message-Passing Systems That Preserve the Partial Ordering

This is one of two papers, published at about the same time, that describe vector clocks (the optional reading Mattern'88 is the other). The idea here is that while a total ordering, as provided by Lamport clocks, is appealing, the additional structure loses information. In particular, we know that some events happen concurrently, that is, there is no causal relation between them. Vector clocks are an attempt to capture this richer structure and produce a partial order. A partial order here means that for two events e and e' we can say that e happens before e', that e' happens before e, or that neither happens-before relation holds.

## Questions

* Consider a distributed system where the only events are sending and receiving messages. Give pseudocode that each process should run when sending or receiving messages in order to maintain a Vector Clock.
* Consider a distributed system with three processes, where each process must send at least **one** message to one other process. Construct a scenario (a schedule/sequence of events) where at least one pair of events happens **concurrently**.

# Alpern and Schneider '85: Defining Liveness

We briefly talked about this paper at the end of the last lecture. You should read the paper carefully, but you do not need to answer any questions about it.
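
Returning to the partial order from the Fidge '88 section: the three possible outcomes for a pair of events (e before e', e' before e, or neither) can be made concrete with a small sketch. It assumes the commonly used component-wise comparison of vector timestamps, which may differ in detail from the formulation in the paper. The function names `happens_before` and `relation` and the example timestamps are hypothetical, chosen only for illustration. The sketch shows only how two existing timestamps are compared, not how a process maintains its clock.

```python
from typing import Sequence


def happens_before(a: Sequence[int], b: Sequence[int]) -> bool:
    """Return True if timestamp a causally precedes timestamp b.

    Assumed rule: a precedes b when every entry of a is <= the matching
    entry of b and at least one entry is strictly smaller.
    """
    if len(a) != len(b):
        raise ValueError("timestamps must have one entry per process")
    pairs = list(zip(a, b))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def relation(a: Sequence[int], b: Sequence[int]) -> str:
    """Classify a pair of vector timestamps under the partial order."""
    if happens_before(a, b):
        return "before"
    if happens_before(b, a):
        return "after"
    if tuple(a) == tuple(b):
        return "equal"
    # Neither timestamp dominates the other: no causal relation.
    return "concurrent"


if __name__ == "__main__":
    # Hypothetical timestamps for a system with three processes.
    print(relation((1, 0, 0), (2, 1, 0)))
    print(relation((1, 0, 0), (0, 1, 0)))
```

Note that a total order, such as the one Lamport clocks provide, would have to place the two timestamps in the second call in some order, whereas the partial order is allowed to report that neither precedes the other.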