Correlation Clustering with Noisy Input
Speaker: Claire Mathieu, Brown University
Location: Warren Weaver Hall 1302
Date: October 2, 2009, 11:30 a.m.
Host: Joel Spencer
Correlation clustering is a type of clustering that uses a basic form of input data: For every pair of data items, the input specifies whether they are similar (belonging to the same cluster) or dissimilar (belonging to different clusters). This information may be inconsistent, and the goal is to find a clustering (partition of the vertices) that disagrees with as few pieces of information as possible. Correlation clustering is APX-hard for worst-case inputs. We study the following semi-random noisy model to generate the input: start from an arbitrary partition of the vertices into clusters. Then, for each pair of vertices, the similarity information is corrupted (noisy) independently with probability p. Finally, an adversary generates the input by choosing similarity/dissimilarity information arbitrarily for each corrupted pair of vertices. In this model, our algorithm produces a clustering with cost at most 1 + o(1) times the cost of the optimal clustering, as long as p is not too close to 1/2. Moreover, if the noise p is small, then we can exactly reconstruct all clusters of the planted clustering that have size at least a constant times (1/epsilon), and provide a certificate (witness) proving that those clusters are in any optimal clustering. Among other techniques, we use the natural semi-definite programming relaxation followed by an interesting rounding phase. The analysis uses semi-definite programming duality and spectral properties of random matrices. (Work is joint with Warren Schudy, Brown University.)
Claire Mathieu is Professor of Computer Science at Brown University. Prof. Mathieu is primarily interested in the design and analysis of algorithms. She has also done work relating to average-case analysis of algorithms, computing with errors, models for DNA computing, analysis of statistical physics models, Monte Carlo Markov chains, computational geometry and online algorithms. Currently, her focus is on approximation algorithms for combinatorial optimization problems such as scheduling, packing, and clustering, with a particular focus on approximation schemes.
Refreshments will be offered starting 15 minutes prior to the scheduled start of the talk.