Computer Science Colloquium

Statistics, Geometry, Computation: Searching for Low Dimensional Manifolds in High Dimensional Data

Lawrence Saul

Monday, February 28, 2005 11:15 A.M.
Room 1302 Warren Weaver Hall
251 Mercer Street
New York, NY 10012-1185

Colloquium Information:


Yann LeCun, (212) 998-3283


How can we detect low dimensional structure in high dimensional data? If the data is mainly confined to a low dimensional subspace, then simple linear methods can be used to discover the subspace and estimate its dimensionality. More generally, though, if the data lies on (or near) a low dimensional manifold, then its structure may be highly nonlinear, and linear methods are bound to fail.

The last few years have witnessed several advances in the problem of nonlinear dimensionality reduction. Given high dimensional data sampled from a low dimensional manifold, we now have several frameworks for estimating the data's intrinsic dimensionality and computing a faithful low dimensional embedding. Surprisingly, the main computations for "manifold learning" are based on highly tractable optimizations, such as nearest-neighbor searches, least squares fits, eigenvalue problems, and semidefinite programming.

Building on elementary ideas from convex optimization, spectral graph theory, and differential geometry, I will describe two recent approaches that we have developed for the problem of nonlinear dimensionality reduction: one for computing distance-preserving (isometric) embeddings, the other for computing angle-preserving (conformal) embeddings. The resulting algorithms can be understood in terms of simple physical intuitions. In practice, the embeddings computed by these algorithms are quite useful for the visualization and analysis of high dimensional data sets. I will also discuss several applications to problems in image, speech, and language processing.

top | contact