Data sharing has become increasingly collaborative with the rise of Web 2.0, in which multiple writers jointly create and organize the content of a shared repository. Current solutions, such as Wikipedia and Google Groups, rely on a centralized entity to serve the data. However, centralized solutions may be undesirable because of privacy concerns and censorship, problems that can be alleviated by switching to decentralized solutions.
The central challenge in building a decentralized collaborative repository is achieving high data availability, durability, and consistency. These goals are difficult to attain because peer nodes have limited bandwidth, limited storage space, and low availability, and because the repository experiences high membership churn.
This thesis presents Friendshare, a decentralized multiple-writer data repository. Separating the metadata from the data allows the metadata to be replicated efficiently across privileged admin nodes, thus increasing availability and durability. To ensure eventual consistency, Friendshare employs a primary commit scheme, in which a primary node determines the total order of writes in the repository. If the primary leaves the system unexpectedly, the remaining admin nodes run Paxos, a consensus protocol, to elect a new primary.
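The abstract does not describe Friendshare's internal data structures, so the following is only a minimal sketch of how a primary commit scheme can yield eventual consistency. It assumes a Bayou-style design in which the primary stamps each write with a monotonically increasing commit sequence number (CSN) and every replica applies committed writes strictly in CSN order. The names `Write`, `Primary`, `Replica`, and the CSN itself are hypothetical and are not taken from Friendshare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    # Hypothetical write record: which writer set which key to which value.
    writer: str
    key: str
    value: str


class Primary:
    """Hypothetical primary node: the single authority on the total order of writes."""

    def __init__(self) -> None:
        self.next_csn = 0
        self.log: list[tuple[int, Write]] = []  # committed writes, in CSN order

    def commit(self, write: Write) -> int:
        # Assign the next commit sequence number; this fixes the write's global position.
        csn = self.next_csn
        self.next_csn += 1
        self.log.append((csn, write))
        return csn


class Replica:
    """Hypothetical admin replica: applies committed writes strictly in CSN order."""

    def __init__(self) -> None:
        self.state: dict[str, str] = {}
        self.applied = 0  # next CSN this replica expects to apply
        self.pending: dict[int, Write] = {}  # writes that arrived ahead of their turn

    def receive(self, csn: int, write: Write) -> None:
        if csn < self.applied:
            return  # duplicate delivery of an already-applied write; ignore it
        self.pending[csn] = write
        # Apply every buffered write whose turn has come, in CSN order.
        while self.applied in self.pending:
            w = self.pending.pop(self.applied)
            self.state[w.key] = w.value
            self.applied += 1


# Two replicas receive the same committed log in different network orders.
primary = Primary()
primary.commit(Write("alice", "page", "v1"))
primary.commit(Write("bob", "page", "v2"))

r1, r2 = Replica(), Replica()
for csn, w in primary.log:
    r1.receive(csn, w)
for csn, w in reversed(primary.log):
    r2.receive(csn, w)

assert r1.state == r2.state == {"page": "v2"}
```

Because the primary alone decides the order, replicas that see the same committed writes converge to the same state regardless of delivery order. This is also why losing the primary is critical and a new one must be agreed on, which in Friendshare is done with Paxos.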
The Paxos protocol requires high node availability to run efficiently, a criterion that is rarely met in typical peer-to-peer networks. To address this problem, we propose two optimizations that improve Paxos performance in low-availability environments.
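To see why low availability hurts Paxos, note that a Paxos round can only complete when a majority of the participating nodes is online. The sketch below is an idealized illustration, not part of Friendshare. It assumes each of `n` admin nodes is online independently with probability `p` and compares the chance that a majority is online with the chance that at least one replica is online. The function names are hypothetical.

```python
from math import comb


def majority_online_probability(n: int, p: float) -> float:
    """P(at least floor(n/2)+1 of n nodes are online), each online independently w.p. p."""
    quorum = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(quorum, n + 1))


def any_online_probability(n: int, p: float) -> float:
    """P(at least one of n replicas is online) under the same independence assumption."""
    return 1 - (1 - p) ** n


for p in (0.3, 0.5, 0.9):
    print(
        f"p={p}: majority of 5 online {majority_online_probability(5, p):.3f}, "
        f"any of 5 online {any_online_probability(5, p):.3f}"
    )
```

Under these assumptions, with five admin nodes each online 30% of the time, at least one replica is online about 83% of the time, but a majority is online only about 16% of the time. A majority quorum is therefore much harder to assemble than a single reachable replica when nodes are rarely online.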
Friendshare has been implemented and deployed to gather real-world statistics. To complement these measurements with predictions, we built a simulator that evaluates the performance and service availability of Friendshare at various node online percentages. In addition, we show the performance improvements that our Paxos optimizations achieve over the basic Paxos protocol.