Computer Science Colloquium

Improving the End-to-end Availability of Internet-based Systems

David Andersen
MIT

Friday, March 12, 2004 11:30 A.M.
Room 1302 Warren Weaver Hall
251 Mercer Street
New York, NY 10012-1185

Directions: http://cs.nyu.edu/csweb/Location/directions.html
Colloquium Information: http://cs.nyu.edu/csweb/Calendar/colloquium/index.html

Hosts:

Richard Cole cole@cs.nyu.edu, (212) 998-3119

Abstract

The end-to-end availability of Internet services is between two and three orders of magnitude worse than that of other important engineered systems, including the US airline system, the 911 emergency response system, and the US public telephone system. This talk makes two contributions toward improving end-to-end availability. First, a study of three years of data collected on a 31-site testbed explores why failures happen, and finds that access network failures, inter-provider and wide-area routing anomalies, domain name system faults, and server-side failures all contribute to reduced availability.

Second, an overlay network with new algorithms for end-to-end path selection improves availability by one or two orders of magnitude compared to the current state. A purely overlay-based system, RON (resilient overlay networks), deploys nodes in different organizations and networks, carefully measures and monitors the status of the paths available between them, and relies on the nodes to cooperatively route packets by way of each other to bypass faults. A second system, RAN (resilient access networks), uses a combination of physical path redundancy and overlays within a network of cooperative Web proxies to improve availability for Web users. Experimental evidence suggests that RON can reduce failures by a factor of six, and that with physical path redundancy, a six-site RAN eliminates almost all network-based failures.
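
The overlay idea described above, measuring the paths between nodes and routing by way of another node when the direct path fails, can be illustrated with a minimal sketch. This is not the RON implementation: the loss table, the node names, the restriction to at most one intermediate node, and the use of probe loss rate as the only metric are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical measurement table: loss[(a, b)] is the fraction of probes
# lost on the direct Internet path from overlay node a to overlay node b.
LossTable = Dict[Tuple[str, str], float]


def path_loss(loss: LossTable, hops: List[str]) -> float:
    """Estimated loss of an overlay path, assuming independent per-link loss."""
    delivered = 1.0
    for a, b in zip(hops, hops[1:]):
        # A link with no measurement is treated as down (loss 1.0).
        delivered *= 1.0 - loss.get((a, b), 1.0)
    return 1.0 - delivered


def best_path(loss: LossTable, nodes: List[str], src: str, dst: str) -> List[str]:
    """Pick the direct path or a one-intermediate detour with the lowest loss."""
    candidates = [[src, dst]]  # direct path first, so it wins ties
    candidates += [[src, via, dst] for via in nodes if via not in (src, dst)]
    return min(candidates, key=lambda hops: path_loss(loss, hops))


if __name__ == "__main__":
    # Illustrative numbers only: the direct A -> B path is down.
    loss = {
        ("A", "B"): 1.0,
        ("A", "C"): 0.01,
        ("C", "B"): 0.02,
        ("A", "D"): 0.5,
        ("D", "B"): 0.0,
    }
    print(best_path(loss, ["A", "B", "C", "D"], "A", "B"))
```

In this sketch the detour through C is chosen because the direct path from A to B loses every probe; when the direct path is healthy, it is preferred over any detour with equal or higher estimated loss.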

