Title: Streaming k-means Approximation
Speaker: Ragesh Jaiswal, Columbia

Abstract: We extend results of Arthur and Vassilvitskii (k-means++: The Advantages of Careful Seeding, SODA 2007) to provide a pseudo-approximation algorithm for the k-means clustering problem in the batch (non-streaming) setting. We then use this algorithm in an extended version of the divide-and-conquer strategy of Guha et al. to obtain a streaming algorithm for the k-means problem with good approximation/memory tradeoffs.

More specifically, we first extend the k-means++ algorithm of Arthur and Vassilvitskii, a randomized algorithm that gives an O(log k) approximation to the k-means objective in the batch setting. Our new algorithm is shown to give a constant approximation factor to the k-means objective while outputting O(k log k) centers. We then provide a variant of Guha et al.'s algorithm for k-means in the one-pass (streaming) setting, resulting in an efficient algorithm that outputs exactly k centers and gives a tradeoff of O(n^a) memory for an O(c^(1/a) log k) approximation, for any constant a < 1 and some fixed constant c > 1. Empirical evaluations on real and simulated data demonstrate the practical utility of our method.

I will also discuss a quantitatively improved version of our first result, obtained independently by Aggarwal, Deshpande, and Kannan (Adaptive Sampling for k-means Clustering, APPROX 2009).

This is joint work with Claire Monteleoni (CCLS, Columbia) and Nir Ailon (Google). This work appeared in NIPS 2009.
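For readers unfamiliar with the seeding rule the abstract builds on, here is a minimal 1-D sketch of the k-means objective and of k-means++ seeding (D^2-sampling), where each new center is drawn with probability proportional to its squared distance to the nearest center chosen so far. Function names are mine; the talk's algorithm is a variant that draws O(k log k) centers this way to get a constant-factor pseudo-approximation, which this basic sketch does not implement.

```python
import random

def kmeans_cost(points, centers):
    # k-means objective: total squared distance from each point
    # to its nearest center (1-D points for simplicity)
    return sum(min((p - c) ** 2 for c in centers) for p in points)

def kmeanspp_seed(points, k, rng=random.Random(0)):
    # k-means++ seeding (Arthur & Vassilvitskii): the first center is
    # uniform; each subsequent center is drawn with probability
    # proportional to its squared distance to the nearest center
    # chosen so far (the D^2-sampling rule)
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:  # fewer than k distinct points remain
            break
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```

Arthur and Vassilvitskii show that running Lloyd's algorithm from such a seeding is an O(log k) approximation in expectation; the seeding alone already enjoys that guarantee.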
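The one-pass divide-and-conquer strategy mentioned above can be illustrated with a self-contained 1-D sketch: cluster each chunk of the stream down to a small weighted summary, keep only the summaries, and finally reduce their union to exactly k centers. All names and the chunk-summary size (here simply 3k rather than the paper's O(k log k)) are illustrative choices of mine, not the authors' implementation.

```python
import random

def _nearest(p, centers):
    # index of the nearest center to p (squared distance, 1-D)
    return min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)

def _weighted_d2_sample(points, weights, m, rng):
    # pick m representatives by a weighted D^2 rule: each new pick is
    # drawn with probability proportional to weight times squared
    # distance to the representatives chosen so far
    reps = [rng.choices(points, weights=weights, k=1)[0]]
    while len(reps) < m:
        d2 = [w * min((p - c) ** 2 for c in reps)
              for p, w in zip(points, weights)]
        if sum(d2) == 0:  # all mass already covered
            break
        reps.append(rng.choices(points, weights=d2, k=1)[0])
    return reps

def one_pass_kmeans(stream, k, chunk_size, rng=random.Random(0)):
    # divide-and-conquer over the stream: summarize each chunk by a few
    # weighted representatives, then reduce the union of summaries to
    # exactly k centers; memory is O(chunk_size + #summaries), never O(n)
    m = 3 * k                 # per-chunk summary size (illustrative)
    summaries = []            # (representative, weight) pairs

    def flush(chunk):
        reps = _weighted_d2_sample(chunk, [1] * len(chunk),
                                   min(m, len(chunk)), rng)
        counts = [0] * len(reps)
        for p in chunk:       # weight = number of chunk points served
            counts[_nearest(p, reps)] += 1
        summaries.extend(zip(reps, counts))

    chunk = []
    for p in stream:
        chunk.append(p)
        if len(chunk) == chunk_size:
            flush(chunk)
            chunk = []
    if chunk:
        flush(chunk)

    reps = [c for c, _ in summaries]
    wts = [w for _, w in summaries]
    return _weighted_d2_sample(reps, wts, min(k, len(reps)), rng)
```

The memory/approximation tradeoff in the abstract comes from tuning how many points each level holds before reducing; this sketch uses a single level of reduction for clarity.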