Computer Science Colloquium
Probabilistic Models of Text and Images
Friday, April 4, 2005 11:15 A.M.
Room 1302 Warren Weaver Hall
251 Mercer Street
New York, NY 10012-1185
Colloquium Information: http://cs.nyu.edu/csweb/Calendar/colloquium/index.html
Richard Cole, firstname.lastname@example.org, (212) 998-3119
Managing large and growing collections of information is a central
goal of modern computer science. Data repositories of texts, images,
music, and genetic information have become widely accessible,
necessitating good methods of retrieval, organization, and
exploration. In this talk, I will describe probabilistic models of
information collections, for which the above problems can be cast as
First, I will describe the use of graphical models as a flexible
framework for the representation of modeling assumptions. Fast
posterior inference algorithms based on variational methods allow us
to specify complex Bayesian models and apply them to large datasets.
With this framework in hand, I will develop latent Dirichlet
allocation (LDA), a graphical model particularly suited to analyzing
text collections. LDA posits an index of hidden topics which describe
the underlying documents. The topics are learned from a collection,
and new documents can be situated into that collection via posterior
inference of their associated topics. Extensions of LDA can index a
set of images, or multimedia collections of related text and images.
I will illustrate the use of such models with several datasets.
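
As a rough illustration of the generative assumptions behind LDA (not the speaker's code, and not the posterior inference procedure itself), the sketch below draws per-document topic proportions from a Dirichlet distribution and then, for each word, draws a hidden topic followed by a word from that topic. The toy topics, the vocabulary, and the function names are all hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document under the LDA generative assumptions.

    topics: list of dicts mapping word -> probability (one dict per topic)
    alpha:  Dirichlet parameters for the per-document topic proportions
    """
    theta = sample_dirichlet(alpha, rng)  # topic proportions for this document
    assignments, words = [], []
    for _ in range(length):
        # Pick the hidden topic for this word position.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Pick the word from the chosen topic's word distribution.
        vocab = list(topics[z])
        w = rng.choices(vocab, weights=[topics[z][v] for v in vocab])[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Hypothetical toy topics, invented for illustration only.
TOPICS = [
    {"gene": 0.5, "dna": 0.3, "cell": 0.2},
    {"pixel": 0.4, "image": 0.4, "edge": 0.2},
]

rng = random.Random(0)
theta, assignments, words = generate_document(TOPICS, [0.5, 0.5], 8, rng)
print(theta)
print(list(zip(assignments, words)))
```

Posterior inference runs this process in reverse: given only the words, it estimates the topic proportions and topic assignments that plausibly produced them.
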
Finally, I will describe nonparametric Bayesian methods for relaxing
the restriction to a fixed number of topics. These methods allow for
models based on the natural assumption that the number of topics grows
with the collection. I will extend this idea to trees, and to models
for discovering both the structure and content of a topic hierarchy.
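
The abstract does not name a specific nonparametric construction. As one standard example of a prior under which the number of groups grows with the data, the sketch below simulates a Chinese restaurant process. The choice of this process, the concentration parameter gamma, and the function name are assumptions made for illustration.

```python
import random


def chinese_restaurant_process(n_items, gamma, rng):
    """Sequentially assign n_items to clusters; return the cluster sizes.

    The item arriving after n others joins existing cluster k with
    probability counts[k] / (n + gamma) and starts a new cluster with
    probability gamma / (n + gamma).
    """
    counts = []
    for _ in range(n_items):
        weights = counts + [gamma]  # last slot stands for "new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
    return counts


rng = random.Random(0)
for n in (10, 100, 1000):
    sizes = chinese_restaurant_process(n, 1.0, rng)
    print(n, len(sizes))
```

Under this prior the expected number of clusters grows roughly logarithmically with the number of items, so no fixed number of topics has to be chosen in advance.
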
Joint work with Michael Jordan, Andrew Ng, Thomas Griffiths, and Josh
David M. Blei completed his Ph.D. in Computer Science at U.C. Berkeley
in August 2004. He is currently a postdoctoral researcher at Carnegie