Tutorial presented at the 1999 NIPS Conference by Zoubin Ghahramani and
Sam Roweis
Abstract: Many of the methods used for clustering, dimensionality
reduction, source separation, time series modeling, and other
classical problems in unsupervised data modeling are closely
related to each other. The focus of this tutorial is to present a
consistent unified picture of how these methods, which have been
developed and rediscovered in several different fields, are
variants of each other, and how a single framework can be used to
develop learning algorithms for all of them. We will start from the
humble Gaussian model and describe how continuous state models such
as factor analysis, principal components analysis (PCA), and
independent components analysis (ICA) are related to each other. We
will then motivate discrete state mixture models and vector
quantization. Mixture models and factor analysis are then extended
to model time series data, resulting in hidden Markov models
(HMMs) and linear-Gaussian dynamical systems (a.k.a. state-space
models), respectively.
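As a minimal illustration of the continuous-state Gaussian picture, PCA can be computed directly as the eigendecomposition of the sample covariance. The sketch below is a toy, pure-Python version for 2-D data only; the data points and the closed-form 2x2 eigensolution are illustrative assumptions, not part of the tutorial.

```python
# Toy PCA sketch: leading eigenvector of the 2x2 sample covariance.
# Assumes 2-D data; the points below are made up for illustration.
import math

def pca_2d(data):
    """Return the leading principal axis (unit vector) and its variance."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in data) / n
    syy = sum((y - my) ** 2 for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data) / n
    # Larger eigenvalue of the symmetric 2x2 matrix, in closed form.
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    lam = tr / 2 + math.sqrt(max(tr ** 2 / 4 - det, 0.0))
    # Corresponding eigenvector: (lam - syy, sxy) solves (C - lam I) v = 0.
    if abs(sxy) > 1e-12:
        v = (lam - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm), lam

# Points scattered roughly along the line y = 2x.
points = [(0, 0.0), (1, 2.1), (2, 3.9), (3, 6.0), (4, 8.1)]
axis, variance = pca_2d(points)
```

The recovered axis points close to the direction (1, 2), the direction of greatest variance in the data; factor analysis would fit the same Gaussian picture but with per-coordinate noise variances.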

All of these models can be described within the framework of probabilistic graphical models, which we will briefly introduce. In this framework it becomes easy to explore variants and hybrids (such as mixtures of factor analyzers and switching state-space models) which are potentially powerful tools. This framework also makes it clear that the same general probability propagation algorithm can be used to infer the hidden (i.e. latent) variables in all these models, and that the EM algorithm can be used to learn the maximum likelihood (ML) parameters. In the latter part of the tutorial we will focus on approximate inference techniques for models in which probability propagation is intractable, and on variational methods for Bayesian model averaging which can overcome the overfitting and model selection problems in ML learning. Matlab demos will be used to demonstrate some of the models and algorithms.
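The EM recipe mentioned above can be sketched for the simplest mixture model: a two-component 1-D Gaussian mixture. The E-step computes posterior responsibilities for the latent component indicator; the M-step re-estimates means, variances, and the mixing proportion. This is a hedged, pure-Python toy; the data and the initialization are assumptions for illustration, not the tutorial's own demos.

```python
# Toy EM sketch for a two-component 1-D Gaussian mixture.
# Data and initial parameters are illustrative assumptions.
import math

def gauss(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_mixture(data, mu, var, pi, iters=50):
    """Fit a two-component Gaussian mixture by EM; returns (mu, var, pi)."""
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point.
        r = []
        for x in data:
            p1 = pi * gauss(x, mu[0], var[0])
            p2 = (1 - pi) * gauss(x, mu[1], var[1])
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate parameters from the weighted data.
        n1 = sum(r)
        n2 = len(data) - n1
        mu = (sum(ri * x for ri, x in zip(r, data)) / n1,
              sum((1 - ri) * x for ri, x in zip(r, data)) / n2)
        var = (sum(ri * (x - mu[0]) ** 2 for ri, x in zip(r, data)) / n1,
               sum((1 - ri) * (x - mu[1]) ** 2 for ri, x in zip(r, data)) / n2)
        pi = n1 / len(data)
    return mu, var, pi

# Two well-separated clusters, around 0 and around 5.
data = [-0.2, 0.1, 0.3, -0.1, 0.0, 4.8, 5.1, 5.2, 4.9, 5.0]
mu, var, pi = em_mixture(data, mu=(1.0, 4.0), var=(1.0, 1.0), pi=0.5)
```

Each iteration is guaranteed not to decrease the data log-likelihood, which is the property that makes the same EM scheme applicable to factor analysis, HMMs, and state-space models, with only the E-step changing.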