
Nonlinear Dimensionality Reduction and Manifold Learning
Nonlinear CCA and PCA by Alignment of Local Models
 In the same way that you can align local representations
of linear subspace models to get globally nonlinear models, you can
also align representations of linear input-output models such as
Canonical Correlation Analysis to get nonlinear CCA.
A special case of our method, when applied to only a single manifold,
reduces to the Laplacian Eigenmaps algorithm.
 Jakob Verbeek, Nikos Vlassis and I had a paper entitled
Nonlinear CCA and PCA by Alignment of Local Models
describing this work in NIPS 2003.
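For reference, the Laplacian Eigenmaps special case mentioned above can be sketched in a few lines of NumPy. This is an illustrative version only (not the alignment algorithm from the paper): it uses a binary symmetrized k-nearest-neighbor graph, and the neighbor count and embedding dimension are my own choices.

```python
import numpy as np

def laplacian_eigenmaps(X, n_neighbors=5, dim=2):
    """Illustrative Laplacian Eigenmaps sketch: build a symmetric k-NN
    adjacency graph, then embed with the bottom generalized eigenvectors
    of the graph Laplacian (skipping the constant eigenvector)."""
    n = X.shape[0]
    # Pairwise squared distances; exclude self-matches
    D2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    np.fill_diagonal(D2, np.inf)
    nbrs = np.argsort(D2, axis=1)[:, :n_neighbors]

    # Binary adjacency, symmetrized so the graph is undirected
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nbrs[i]] = 1.0
    W = np.maximum(W, W.T)

    # Generalized problem L v = lambda D v, solved via the
    # symmetrically normalized Laplacian D^{-1/2} L D^{-1/2}
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    vals, vecs = np.linalg.eigh(d_inv_sqrt @ L @ d_inv_sqrt)
    return d_inv_sqrt @ vecs[:, 1:dim + 1]
```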
Automatic Alignment of Local Representations
 A big challenge when getting many experts to learn in concert is
to get their hidden representations to agree in some way.
In this paper, we
present an automatic alignment procedure which maps the disparate
internal representations learned by several local dimensionality reduction
experts into a single, coherent global coordinate system for the
original data space. Our algorithm can be applied to any set of
experts, each of which produces
a low-dimensional local representation
of a high-dimensional input. Unlike the Global Coordination
paper below (which coordinates models by modifying their objective
functions) our algorithm is invoked after training and applies
an efficient eigenmethod to postprocess the
trained models. It has no local optima and the size of the system it
must solve scales with the number of local models rather than the
number of original data points, making it more efficient than
model-free algorithms such as Isomap or LLE.
 A paper on this work, with Yee Whye Teh:
Automatic Alignment of Local Representations
appeared in NIPS 14.
Stochastic Neighbour Embedding
 Nonlinear dimensionality reduction algorithms can often be
formulated in terms of what they preserve about the original data.
Some, like PCA, preserve variance; others, like MDS or Isomap, preserve
large distances (metric, non-metric or geodesic); still others, like SOM
or LLE, preserve nearby neighbours. In this paper
we describe a probabilistic approach to the task of embedding
high-dimensional objects into a low-dimensional space in a way that
preserves the *distribution of neighbor identities*.
A Gaussian is centered on each object in the
high-dimensional space and the densities under this Gaussian are used
to define a probability distribution over all the potential neighbors
of the
object. The aim of the embedding is to approximate this distribution
as well as possible when the same operation is performed in the
low-dimensional space. The natural cost function is a sum of
Kullback-Leibler divergences, one per object. This leads to a simple
gradient for adjusting the positions of objects in the low-dimensional
space. Unlike other dimensionality reduction methods,
this probabilistic framework makes it easy to represent
each high-dimensional object by a mixture of widely separated
low-dimensional points. This allows the embeddings of ambiguous
objects, like the document count vector for the word "bank", to be
close to the embeddings of both "river" and "finance" without
forcing outdoor concepts to be located close to corporate ones.
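The neighbor-matching scheme described above can be sketched as follows. This is an illustrative NumPy version under simplifying assumptions: it uses a single fixed Gaussian width rather than the per-point variances of the full method, plain gradient descent without momentum, and learning-rate and iteration settings chosen arbitrarily.

```python
import numpy as np

def sne_embed(X, dim=2, sigma=1.0, lr=0.1, n_iter=500, seed=0):
    """Illustrative SNE sketch: match Gaussian neighbor distributions
    between the high- and low-dimensional spaces by gradient descent
    on a sum of KL divergences, one per object."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)

    # High-dimensional neighbor distributions p_{j|i}
    # (fixed sigma here; the full method picks per-point variances)
    D = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    P = np.exp(-D / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum(axis=1, keepdims=True)

    Y = 1e-2 * rng.standard_normal((n, dim))
    for _ in range(n_iter):
        d = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
        Q = np.exp(-d)
        np.fill_diagonal(Q, 0.0)
        Q /= Q.sum(axis=1, keepdims=True)
        # Gradient of sum_i KL(P_i || Q_i) with respect to y_i:
        #   2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) (y_i - y_j)
        M = P - Q + P.T - Q.T
        grad = 2.0 * (M[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
        Y -= lr * grad
    return Y
```

Note that this sketch represents each object by a single low-dimensional point; the mixture representation that handles ambiguous objects like "bank" is not shown.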
 As an example, here's a map of the NIPS
volumes 0-12 authors as produced by the SNE algorithm.
 A paper on SNE, written with Geoff Hinton:
Stochastic Neighbor Embedding
appeared in NIPS 14.
Global Coordination
 High-dimensional data that lies on or near a low-dimensional
manifold can be described by a collection of local linear models.
Such a description, however, does not provide a global
parameterization of the manifold, arguably an important goal
of unsupervised learning.
In this paper, we show how to learn a collection of local linear
models that solves this more difficult problem. Our local linear
models are represented by a mixture of factor analyzers, and the
``global coordination'' of these models is achieved by adding a
regularizing term to their objective function. The regularizer breaks
a degeneracy in the mixture model's parameter space, favoring models
whose internal coordinate systems are aligned
in a consistent way. As a result, the internal coordinates change
smoothly and continuously as one traverses a connected path on the
manifold, even when the path crosses the domains of many different
local models. The regularizer takes the form of a Kullback-Leibler
divergence and illustrates an unexpected application of variational
methods: not to perform approximate inference in intractable
probabilistic models, but to learn more useful internal
representations in tractable ones.
 Our paper,
Global Coordination of Local Linear Models appeared in
NIPS 13 (joint work with Lawrence Saul and Geoff Hinton).
Locally Linear Embedding (LLE)

Exploratory data analysis and visualization play an
increasingly important role in all areas of science.
The need to analyze large amounts of multivariate data
raises the fundamental problem of dimensionality reduction: how to
discover compact representations of high-dimensional data. Here we
introduce locally linear embedding (LLE), an unsupervised learning
algorithm that is able to discover the global structure of nonlinear
manifolds. LLE uses local symmetries and linear reconstructions to
compute low-dimensional, neighborhood-preserving embeddings of
multivariate data. Applied to images of faces, LLE discovers a
coordinate representation of facial attributes; applied to documents
of text, it co-locates, in a continuous semantic space, words with
similar contexts.
(with Lawrence Saul, AT&T Labs)
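The three steps of LLE (find nearest neighbors, solve for local reconstruction weights, embed via an eigenproblem) can be sketched as follows. This is an illustrative NumPy version; the neighbor count, the regularization of the local Gram matrix, and its constant are my own choices, not values from the paper.

```python
import numpy as np

def lle(X, n_neighbors=5, dim=2, reg=1e-3):
    """Illustrative LLE sketch: reconstruct each point linearly from its
    neighbors, then find low-dimensional coordinates that preserve the
    same reconstruction weights."""
    n = X.shape[0]
    # Step 1: k nearest neighbors of each point (excluding itself)
    D = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    np.fill_diagonal(D, np.inf)
    nbrs = np.argsort(D, axis=1)[:, :n_neighbors]

    # Step 2: reconstruction weights, one regularized linear solve per point
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                     # neighbors centered on x_i
        G = Z @ Z.T                               # local Gram matrix
        G += reg * np.trace(G) * np.eye(n_neighbors)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()               # weights sum to one

    # Step 3: bottom eigenvectors of M = (I - W)^T (I - W),
    # discarding the constant eigenvector
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:dim + 1]
```

The quadratic forms in steps 2 and 3 are what make the algorithm free of local optima: each stage reduces to a linear solve or an eigenproblem.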
 A paper describing this
work appeared in the Dec 22, 2000 issue of Science.
Here is a link to the
online version of the paper.
 A longer version of the LLE paper is also available;
it appeared in volume 4 of JMLR.
Learning nonlinear dynamics
Sam Roweis, Vision, Learning and Graphics Group,
NYU, www.cs.nyu.edu/~roweis
