Nonlinear Dimensionality Reduction and Manifold Learning

Non-linear CCA and PCA by Alignment of Local Models

  • In the same way that you can align local representations of linear subspace models to get globally nonlinear models, you can also align representations of linear input-output models such as Canonical Correlation Analysis to get non-linear CCA. A special case of our method, when applied to only a single manifold, reduces to the Laplacian Eigenmaps algorithm (a minimal sketch of that special case appears after this list).
  • Jakob Verbeek, Nikos Vlassis and I had a paper entitled Non-linear CCA and PCA by Alignment of Local Models describing this work in NIPS 2003.
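  • For reference, here is a minimal NumPy sketch of the Laplacian Eigenmaps special case referred to above; the neighborhood-graph construction, heat-kernel width sigma and the function names are illustrative assumptions, not code from the paper.

      import numpy as np
      from scipy.linalg import eigh

      def laplacian_eigenmaps(X, n_neighbors=10, sigma=1.0, d=2):
          # pairwise squared distances between all points
          D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
          n = X.shape[0]
          # k-nearest-neighbor graph with heat-kernel edge weights
          W = np.zeros((n, n))
          for i in range(n):
              nbrs = np.argsort(D2[i])[1:n_neighbors + 1]   # skip the point itself
              W[i, nbrs] = np.exp(-D2[i, nbrs] / (2 * sigma ** 2))
          W = np.maximum(W, W.T)            # symmetrize the graph
          Dg = np.diag(W.sum(axis=1))       # degree matrix
          L = Dg - W                        # graph Laplacian
          # generalized eigenproblem L y = lambda Dg y; drop the constant eigenvector
          vals, vecs = eigh(L, Dg)
          return vecs[:, 1:d + 1]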

Automatic Alignment of Local Representations

  • A big challenge when getting many experts to learn in concert is to get their hidden representations to agree in some way. In this paper, we present an automatic alignment procedure which maps the disparate internal representations learned by several local dimensionality reduction experts into a single, coherent global coordinate system for the original data space. Our algorithm can be applied to any set of experts, each of which produces a low-dimensional local representation of a high-dimensional input. Unlike the Global Coordination paper below (which coordinates models by modifying their objective functions), our algorithm is invoked after training and applies an efficient eigenmethod to post-process the trained models. It has no local optima, and the size of the system it must solve scales with the number of local models rather than the number of original data points, making it more efficient than model-free algorithms such as Isomap or LLE. (A sketch of this kind of eigenmethod appears after this list.)
  • A paper on this work with Yee Whye Teh, Automatic Alignment of Local Representations, appeared in NIPS 14.
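  • A minimal sketch of the kind of eigenmethod involved, assuming each local model k reports responsibilities R[:, k] and local coordinates Z[k] for all N points, and that locally linear reconstruction weights W have already been computed on the raw data; the shapes, names and the small ridge term are illustrative assumptions.

      import numpy as np
      from scipy.linalg import eigh

      def align_local_models(R, Z, W, d=2):
          # R: (N, K) responsibilities; Z: list of K arrays of shape (N, d_k)
          # holding each model's local coordinates; W: (N, N) reconstruction weights.
          N, K = R.shape
          # each model contributes weighted local coordinates plus an offset column
          U = np.hstack([np.column_stack([R[:, k:k + 1] * Z[k], R[:, k]])
                         for k in range(K)])
          M = np.eye(N) - W
          A = U.T @ M.T @ M @ U                        # reconstruction cost in span(U)
          B = U.T @ U / N + 1e-8 * np.eye(U.shape[1])  # keeps coordinates from collapsing
          vals, vecs = eigh(A, B)                      # system size grows with K, not N
          return U @ vecs[:, 1:d + 1]                  # drop the constant direction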

Stochastic Neighbour Embedding

  • Nonlinear dimensionality reduction algorithms can often be formulated in terms of what they preserve about the original data. Some, like PCA, preserve variance; others, like MDS or Isomap, preserve large distances (metric, nonmetric or geodesic); still others, like SOM or LLE, preserve nearby neighbors. In this paper we describe a probabilistic approach to the task of embedding high-dimensional objects into a low-dimensional space in a way that preserves the *distribution of neighbor identities*. A Gaussian is centered on each object in the high-dimensional space and the densities under this Gaussian are used to define a probability distribution over all the potential neighbors of the object. The aim of the embedding is to approximate this distribution as well as possible when the same operation is performed in the low-dimensional space. The natural cost function is a sum of Kullback-Leibler divergences, one per object, which leads to a simple gradient for adjusting the positions of objects in the low-dimensional space (sketched after this list). Unlike other dimensionality reduction methods, this probabilistic framework makes it easy to represent each high-dimensional object by a mixture of widely separated low-dimensional points. This allows the embeddings of ambiguous objects, like the document count vector for the word "bank", to be close to the embeddings of both "river" and "finance" without forcing outdoor concepts to be located close to corporate ones.
  • As an example, here's a map of the NIPS vols. 0-12 authors as produced by the SNE algorithm.
  • A paper on SNE written with Geoff Hinton, Stochastic Neighbor Embedding, appeared in NIPS 14.
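  • The cost and gradient mentioned above, in a minimal NumPy sketch; a fixed Gaussian width, a single low-dimensional point per object and plain gradient descent are simplifying assumptions here, and the names and hyperparameters are illustrative.

      import numpy as np

      def sne(X, d=2, sigma2=1.0, lr=0.1, iters=1000, seed=0):
          n = X.shape[0]
          # p_{j|i}: probability that object i picks j as its neighbor in high dimensions
          D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
          P = np.exp(-D2 / (2 * sigma2))
          np.fill_diagonal(P, 0.0)
          P /= P.sum(axis=1, keepdims=True)

          rng = np.random.default_rng(seed)
          Y = 1e-3 * rng.standard_normal((n, d))
          for _ in range(iters):
              # q_{j|i}: the same construction in the low-dimensional space
              d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
              Q = np.exp(-d2)
              np.fill_diagonal(Q, 0.0)
              Q /= Q.sum(axis=1, keepdims=True)
              # gradient of sum_i KL(P_i || Q_i) with respect to each y_i
              PQ = P - Q + P.T - Q.T
              grad = 2 * (PQ[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
              Y -= lr * grad
          return Y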

Global Coordination

  • High dimensional data that lies on or near a low dimensional manifold can be described by a collection of local linear models. Such a description, however, does not provide a global parameterization of the manifold, arguably an important goal of unsupervised learning. In this paper, we show how to learn a collection of local linear models that solves this more difficult problem. Our local linear models are represented by a mixture of factor analyzers, and the "global coordination" of these models is achieved by adding a regularizing term to their objective function. The regularizer breaks a degeneracy in the mixture model's parameter space, favoring models whose internal coordinate systems are aligned in a consistent way. As a result, the internal coordinates change smoothly and continuously as one traverses a connected path on the data manifold, even when the path crosses the domains of many different local models. The regularizer takes the form of a Kullback-Leibler divergence (the form of the resulting objective is sketched after this list) and illustrates an unexpected application of variational methods: not to perform approximate inference in intractable probabilistic models, but to learn more useful internal representations in tractable ones.
  • Our paper, Global Coordination of Local Linear Models, appeared in NIPS 13 (joint work with Lawrence Saul and Geoff Hinton).
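  • Schematically, the coordinated objective has roughly the following form (a sketch of the idea rather than the paper's exact notation), where s indexes the local models, g is the global coordinate, and Q is constrained to be unimodal over g:

      \Phi \;=\; \sum_n \Big[ \log P(x_n) \;-\;
          \mathrm{KL}\big( Q(s, g \mid x_n) \,\|\, P(s, g \mid x_n) \big) \Big],
      \qquad Q(s, g \mid x_n) = Q(s \mid x_n)\, Q(g \mid x_n)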

Locally Linear Embedding (LLE)

  • Exploratory data analysis and visualization play an increasingly important role in all areas of science. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction: how to discover compact representations of high dimensional data. Here we introduce locally linear embedding (LLE), an unsupervised learning algorithm that is able to discover the global structure of nonlinear manifolds. LLE uses local symmetries and linear reconstructions to compute low dimensional, neighborhood preserving embeddings of multivariate data (a sketch of the two steps appears after this list). Applied to images of faces, LLE discovers a coordinate representation of facial attributes; applied to documents of text, it colocates words with similar contexts in a continuous semantic space.
    (with Lawrence Saul, AT&T Labs)
  • A paper describing this work appeared in the Dec. 22, 2000 issue of Science.
    Here is a link to the online version of the paper.
  • A longer version of the LLE paper, which appeared in volume 4 of JMLR, is now available.
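  • A minimal NumPy sketch of the two steps of LLE; dense linear algebra, a fixed neighborhood size and the regularization constant are simplifying assumptions for illustration, not the reference implementation.

      import numpy as np
      from scipy.linalg import eigh, solve

      def lle(X, n_neighbors=10, d=2, reg=1e-3):
          n = X.shape[0]
          D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
          W = np.zeros((n, n))
          # Step 1: reconstruct each point from its neighbors with weights summing to one
          for i in range(n):
              nbrs = np.argsort(D2[i])[1:n_neighbors + 1]
              G = X[nbrs] - X[i]
              C = G @ G.T                                    # local Gram matrix
              C += reg * np.trace(C) * np.eye(n_neighbors)   # regularize if singular
              w = solve(C, np.ones(n_neighbors))
              W[i, nbrs] = w / w.sum()
          # Step 2: embedding from the bottom eigenvectors of (I - W)^T (I - W)
          M = (np.eye(n) - W).T @ (np.eye(n) - W)
          vals, vecs = eigh(M)
          return vecs[:, 1:d + 1]                            # skip the constant eigenvector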

Learning nonlinear dynamics



Sam Roweis, Vision, Learning and Graphics Group, NYU, www.cs.nyu.edu/~roweis