Building scalable geo-replicated storage backends for web applications
Candidate: Yair Sovran
Advisor: Jinyang Li

Abstract

Web applications increasingly require a storage system that is both scalable and can replicate data across many distant data centers or sites. Most existing storage solutions fall into one of two categories: Traditional databases offer strict consistency guarantees and programming ease, but are difficult to scale in a geo-replicated setting. NoSQL stores are scalable and efficient, but have weak consistency guarantees, placing the burden of ensuring consistency on programmers. In this dissertation, we describe two systems that help bridge the two extremes, providing scalable, geo-replicated storage for web applications, while also easy to program for. Walter is a key-value store that supports transactions and replicating data across distant sites. A key feature underlying Walter is a new isolation property: Parallel Snapshot Isolation (PSI). PSI allows Walter to replicate data asynchronously, while providing strong guarantees within each site. PSI does not allow write-write conflicts, alleviating the burden of writing conflict resolution logic. To prevent write-write conflicts and implement PSI, Walter uses two new and simple techniques: preferred sites and counting sets. Lynx is a distributed database backend for scaling latency-sensitive web applications. Lynx supports optimizing queries via data denormalization, distributed secondary indexes, and materialized join views. To preserve data constraints across denormalized tables and secondary indexes, Lynx relies on the a novel primitive: Distributed Transaction Chain (DTC). A DTC groups a sequence of transactions to be executed on different nodes while providing two guarantees. First, all transactions in a DTC execute exactly once despite failures. Second, transactions from concurrent DTCs are interleaved consistently on common nodes. We built several web applications on top of Walter and Lynx: an auction service, a microblogging service, and a social networking website. We have found that building web applications using Walter and Lynx is quick and easy. Our experiments show that the resulting applications are capable of providing scalable, low latency operation across multiple geo-replicated sites.