Location: Warren Weaver Hall 109
Date: December 13, 2002, 9:30 a.m.
Host: Yevgeniy Dodis
Modern Applications of Bloom Filters
Dr. Andrei Broder
A Bloom filter is an ingenious randomized data-structure for concisely representing a set in order to support approximate membership queries. The space efficiency is achieved at the cost of a small probability of false positives. It was invented by Burton Bloom in 1970 for the purpose of spell checking and for many years it was seldom mentioned in unrelated contexts, except for database optimization. Nevertheless, Bloom's beautiful approach has seen a sudden resurgence in a variety of large-scale applications such as shared web caches, query routing, replica location, and stream data analysis. This talk presents a plethora of recent uses of this old data structure, its modern variants, and the mathematical basis behind them, with the aim of making these ideas available to a wider community and the hope of inspiring new applications.
This is an "algorithm engineering" talk, targeted at both theoreticians and practioners and only a general technical background is needed. It is based on a survey being jointly prepared with Michael Mitzenmacher (Harvard).
What Can We Do in Sublinear Time?
Dr. Ronitt Rubinfeld
NEC Laboratories America
We have long considered showing the existence of a linear time algorithm for a problem to be the gold standard of achievement. Indeed, it is hard to imagine doing any better than that, since for any nontrivial problem, any algorithm must consider all of the input in order to make a decision. However, as extremely large data sets grow more prevalent in a wide variety of settings, it is natural to wonder what one can do in *sublinear* time. In fact, there has been a lot of recent interest in this direction.
This talk will contain a brief, nonexhaustive survey of the types of problems that can be solved in sublinear time. Since any sublinear time algorithm can only view a small portion of the input, for most natural problems the answer must necessarily be approximate in some sense. We concentrate on such algorithms. We will see that there are classical optimization problems whose values can be approximated in sublinear time. In addition, property testing, an alternative notion of approximation for decision problems, has been applied to give sublinear algorithms for a wide variety of problems.
I will spend the rest of the talk describing joint work with Bernard Chazelle and Luca Trevisan, in which we consider sublinear time algorithms for approximating the weight of a minimum spanning tree. We give a probabilistic algorithm for estimating the weight of a minimum spanning tree in time O(dw log(w)), where d is the average degree of the graph and w is a bound on the ratio of the maximum weight edge to the minimum weight edge. In particular, for constant degree graphs in which the ratio w is constant, our running time will also be constant. We also show that our running time is nearly tight.
Compact Representations for Data
Prof. Moses Charikar
Several algorithmic techniques have been devised recently to deal with large volumes of data. At the heart of many of these techniques are ingenious schemes to represent data compactly. This talk will present some constructions of such compact representation schemes. We will see an application to identifying similar objects (e.g. text documents) in a large collection. We will also see how such schemes lead to efficient, one-pass algorithms for processing large volumes of data (streaming algorithms).
Folding and Unfolding in Computational Geometry
Prof. Erik Demaine
When can a linkage of rigid bars be untangled or folded into a desired configuration? What polyhedra can be cut along their surface and unfolded into a flat piece of paper without overlap? What shapes can result by folding a piece of paper flat and making one complete straight cut? Folding and unfolding is a branch of discrete and computational geometry that addresses these and many other intriguing questions. I will give a taste of the many results that have been proved in the past few years, as well as the several exciting open problems that remain open. Many folding problems have applications in areas including manufacturing, robotics, graphics, and protein folding.
Refreshments will be offered starting 15 minutes prior to the scheduled start of the talk.