Distributed Systems Fall 2022
Lecture 5: RSMs

This is the first of several lectures
on replicated state machines, which are also the subject of Lab 3 and Lab 4.
Schneider's 1990 tutorial presents an overview of replicated state machines,
and how they can be used to achieve fault tolerance. The more recent Raft paper
(Ongaro and Ousterhout) both proposes a consensus protocol, Raft, and shows how
to employ it to build RSMs.

# Schneider: Implementing Fault-Tolerant Services using the State Machine Approach: A Tutorial

Consider a case where you have a program that can run on a single machine (or
server), e.g., a web server for ordering pizzas. A machine can fail, and a failed
web server can result in an inability to order pizza, which is both tragic for
the pizza consumer and costly for the pizza vendor. As we discussed in the
first lecture, our focus in this class is going to be on how to use distributed
computing to mask such failures.

The question then becomes: how do we convert this pizza web server into a distributed
pizza web server that can survive machine failures? One approach would be to scrap
the existing web server and come up with a brand new protocol. Another, simpler
(perhaps lazier) approach would be to run multiple copies (replicas) of the
pizza web server and switch between these replicas when failures occur. The
difficult part is making sure that users (pizza consumers) do not see any observable
difference during failover. RSMs are an architecture for building such
replicated fault-tolerant services. Schneider's paper presents the requirements on
programs being replicated and how to replicate them.
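
To make the replica idea concrete, here is a minimal sketch in Python. It is not
taken from the paper: the `PizzaShop` class, its `order`/`cancel` commands, and the
`replay` helper are all hypothetical names chosen for illustration. It assumes the
replicated program is deterministic and that every replica receives the same
commands in the same order. How the replicas come to agree on that order is left
out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class PizzaShop:
    """Hypothetical deterministic state machine: orders keyed by customer."""
    orders: dict = field(default_factory=dict)

    def apply(self, command):
        # A command is a tuple (op, customer, item). The result depends only
        # on the current state and the command, never on clocks, randomness,
        # or anything else local to one machine.
        op, customer, item = command
        if op == "order":
            self.orders.setdefault(customer, []).append(item)
            return len(self.orders[customer])
        if op == "cancel":
            items = self.orders.get(customer, [])
            if item in items:
                items.remove(item)
            return len(items)
        raise ValueError(f"unknown op: {op}")


def replay(log):
    """Build a fresh replica and apply every command in log order."""
    replica = PizzaShop()
    outputs = [replica.apply(cmd) for cmd in log]
    return replica, outputs


# The same ordered log of commands is fed to every replica.
log = [
    ("order", "alice", "margherita"),
    ("order", "bob", "pepperoni"),
    ("order", "alice", "hawaiian"),
    ("cancel", "alice", "margherita"),
]

primary, primary_out = replay(log)
backup, backup_out = replay(log)

# Identical state and identical outputs: a client switched from the primary
# to the backup would see no observable difference.
assert primary.orders == backup.orders
assert primary_out == backup_out
```

Because both replicas end in the same state and produce the same outputs, either
one can serve the next request after a failure. If `apply` instead consulted the
local clock or a random number generator, the replicas could diverge even on an
identical log.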

## Questions
(a) What requirements do RSMs impose on the program being replicated?
(b) What is the role of consensus protocols in RSMs?