Speaker: David Blei, Princeton University
Location: Kaufman Management Center KMC5-90
Date: November 30, 2012, 11:30 a.m.
Host: NYU Stern, IOMS Department
Probabilistic topic modeling provides a suite of tools for analyzing large collections of documents. Topic modeling algorithms can uncover the underlying themes of a collection and decompose its documents according to those themes. We can use topic models to explore the thematic structure of a corpus and to solve a variety of prediction problems about documents.
Most topic models are based on hierarchical mixed-membership models, in which each document expresses a shared set of components (called topics) in its own per-document proportions. The computational problem is to condition on a collection of observed documents and estimate the posterior distribution of the topics and per-document proportions. In modern data sets, this amounts to posterior inference with billions of latent variables.
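As a rough illustration of the mixed-membership structure just described, the following Python sketch simulates documents from an LDA-style model. The Dirichlet priors, the hyperparameters `alpha` and `eta`, the fixed document length, and the function name `generate_corpus` are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def generate_corpus(n_docs, doc_len, n_topics, vocab_size,
                    alpha=0.5, eta=0.5, seed=0):
    """Simulate a corpus from an LDA-style mixed-membership model.

    Hypothetical helper: the priors and hyperparameters are assumed,
    not taken from the talk abstract.
    """
    rng = np.random.default_rng(seed)

    # Topics are shared by the whole collection: each row is a
    # distribution over the vocabulary.
    topics = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)

    # Each document gets its own proportions over the shared topics.
    proportions = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)

    docs = []
    for d in range(n_docs):
        # Pick a topic for every word position, then a word from that topic.
        z = rng.choice(n_topics, size=doc_len, p=proportions[d])
        words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
        docs.append(words)
    return topics, proportions, docs
```

Posterior inference runs this process in reverse: only `docs` is observed, and the goal is to estimate `topics` and `proportions`. The per-word topic assignments `z` are also latent, which is why the number of latent variables grows with the total number of words in the collection.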
How can we cope with such data? In this talk, I will describe stochastic variational inference, an algorithm for computing with topic models that can handle very large document collections (and even endless streams of documents). I will demonstrate our algorithm with models fitted to millions of articles. I will show how stochastic variational inference can be generalized to many kinds of hierarchical models, including models of images and social networks, and Bayesian nonparametric models. I will highlight several open questions and outstanding issues.
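The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of a stochastic update on subsampled data, in the form commonly given in the published literature on stochastic variational inference. It is not the speaker's implementation. It assumes a toy Dirichlet-categorical model with no per-document latent variables, so the local step is trivial and only the global update is shown. The function name `svi_dirichlet_categorical` and the parameters `eta`, `tau`, and `kappa` are assumptions made for illustration.

```python
import numpy as np

def svi_dirichlet_categorical(data, vocab_size, eta=1.0, batch_size=100,
                              n_steps=200, tau=1.0, kappa=0.7, seed=0):
    """Toy stochastic update for a Dirichlet-categorical model.

    Hypothetical sketch: a real topic model would also optimize
    per-document variational parameters inside each step.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    lam = np.full(vocab_size, eta)  # global variational parameter

    for t in range(n_steps):
        # Subsample a minibatch instead of touching the whole data set.
        batch = data[rng.integers(0, n, size=batch_size)]
        counts = np.bincount(batch, minlength=vocab_size)

        # Intermediate estimate: pretend the minibatch, scaled up,
        # were the entire data set.
        lam_hat = eta + (n / batch_size) * counts

        # Decreasing step size; kappa in (0.5, 1] is the usual
        # Robbins-Monro condition.
        rho = (t + tau) ** (-kappa)

        # Move the global parameter toward the noisy estimate.
        lam = (1.0 - rho) * lam + rho * lam_hat
    return lam
```

Because each step touches only a minibatch, the cost per update does not depend on the size of the collection, which is what makes the approach suitable for very large corpora and for streams. In this toy model the exact posterior parameter is `eta` plus the full-data counts, so the result can be checked against it directly.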
Refreshments will be offered starting 15 minutes prior to the scheduled start of the talk.