Distributed Systems Spring 2024 Lecture 5: RSMs

This is the first of several lectures on replicated state machines (RSMs), which are also the subject of Lab 3 and Lab 4. Schneider's 1990 tutorial presents an overview of replicated state machines and how they can be used to achieve fault tolerance. The Raft paper (Ongaro and Ousterhout) is more recent: it proposes a consensus protocol, Raft, and shows how to employ it to build RSMs.

# Schneider: Implementing Fault-Tolerant Services using the State Machine Approach: A Tutorial

Consider a case where you have a program that can run on a single machine (or server), e.g., a web server for ordering pizzas. A machine can fail, and a failed web server can result in an inability to order pizza, which is both tragic for the pizza consumer and loses money for the pizza vendor. As we discussed in the first lecture, our focus in this class is going to be on how to use distributed computing to mask such failures. The question then becomes how we convert this pizza web server into a distributed pizza web server that can survive machine failures.

One approach would be to scrap the existing web server and come up with a brand new protocol. Another, simpler (perhaps lazier) approach would be to run multiple copies (replicas) of the pizza web server and switch between these replicas when failures occur. The difficult part is making sure that users (pizza consumers) do not see any observable difference during failover.

RSMs represent an architecture for building such replicated fault-tolerant services. This paper presents the requirements on programs being replicated and how to replicate them.

## Questions

(a) What requirements do RSMs impose on the program being replicated?

(b) What is the role of consensus protocols in RSMs?
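
To make the replica-and-failover idea from the pizza example concrete, here is a minimal single-process sketch in Python. It is a toy and is not taken from either paper: `PizzaServer`, `ReplicatedPizzaService`, `submit`, and `fail` are hypothetical names invented for illustration. The sketch assumes that each replica is deterministic and that every live replica is handed the same commands in the same order. Here an ordinary Python loop supplies that order, which stands in for the hard part of getting separate machines to agree on it.

```python
from dataclasses import dataclass, field


@dataclass
class PizzaServer:
    """Hypothetical deterministic state machine: pizzas ordered per customer."""
    orders: dict = field(default_factory=dict)

    def apply(self, command):
        # A command is (customer, pizzas). The result depends only on the
        # current state and the command, never on clocks or randomness.
        customer, pizzas = command
        self.orders[customer] = self.orders.get(customer, 0) + pizzas
        return self.orders[customer]


class ReplicatedPizzaService:
    """Toy front end: every live replica applies every command, in one order."""

    def __init__(self, n_replicas):
        self.replicas = [PizzaServer() for _ in range(n_replicas)]
        self.alive = [True] * n_replicas

    def fail(self, i):
        # Simulate a machine failure: replica i stops receiving commands.
        self.alive[i] = False

    def submit(self, command):
        results = [
            replica.apply(command)
            for replica, up in zip(self.replicas, self.alive)
            if up
        ]
        if not results:
            raise RuntimeError("all replicas have failed")
        # Live replicas saw the same commands in the same order, so they agree.
        assert len(set(results)) == 1
        return results[0]


service = ReplicatedPizzaService(n_replicas=3)
first = service.submit(("alice", 2))   # all three replicas apply the order
service.fail(0)                        # one machine fails
second = service.submit(("alice", 1))  # the remaining replicas keep serving
print(first, second)                   # prints: 2 3
```

The client sees the same running total it would have seen from a single server that never failed, because the surviving replicas hold identical state. The sketch deliberately ignores networking, recovery of the failed replica, and how an order is agreed on across machines.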