This page contains the schedule, slides from the lectures, lecture notes, reading lists,
assignments, and web links.
I urge you to download the DjVu viewer
and view the DjVu versions of the documents below. They display faster,
are of higher quality, and generally have smaller file sizes than the PS and PDF versions.
|01/26: multi-layer learning|
Early history of multilayer learning
|02/01: Target Prop Algorithms|
|02/08: Unsupervised Feature Learning|
- Marc'Aurelio Ranzato: Symmetric Product of Experts.
|02/15: Unsupervised Learning|
- A short review of statistical physics concepts: energy, entropy,
free energy, Gibbs distribution (Yann).
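As a concrete illustration of these concepts (not part of the assigned reading), here is a small Python sketch over a handful of discrete states with made-up energies: it computes the Gibbs distribution, entropy, and free energy, and checks the identity F = &lt;E&gt; - T*S.

```python
import math

# Illustrative toy numbers: three discrete states with energies E_i,
# at temperature T (not from the lecture).
energies = [0.0, 1.0, 2.0]
T = 1.0

# Partition function Z = sum_i exp(-E_i / T)
Z = sum(math.exp(-e / T) for e in energies)

# Gibbs distribution: p_i = exp(-E_i / T) / Z
probs = [math.exp(-e / T) / Z for e in energies]

# Average energy <E>, entropy S, free energy F = -T log Z
avg_E = sum(p * e for p, e in zip(probs, energies))
S = -sum(p * math.log(p) for p in probs)
F = -T * math.log(Z)

# The identity F = <E> - T*S ties the three quantities together.
print(F, avg_E - T * S)
```

Lower-energy states get higher probability, and the free energy computed from Z agrees exactly with the energy/entropy decomposition.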
- Helmholtz Machines: this
page. Either (Hinton and Zemel, NIPS 1994),
(Zemel and Hinton, Neural Computation 1995), (Hinton, Dayan, Frey, and
Neal, Science 1995), or (Dayan, Hinton, Neal, Zemel, Neural Computation
1995), or some combination thereof (Alyssa, Piotr, Marina).
- Training Products of Experts by Minimizing Contrastive Divergence.
Neural Computation, 2002 (Philip, Marco, Marc'Aurelio).
Different types of graphical models (Yann)
- Bayesian belief nets
- directed graphical models
- graphical models with loops are generally intractable
- conditional probability tables are invertible with Bayes' rule:
the directions of the arrows don't matter in principle
(they do not express causality, just dependency).
- undirected graphical models: the likelihood is a product of potential functions
- Markov random fields: graphical models with local interactions
- undirected graphical models with potential functions
must be normalized explicitly. The partition function problem.
- factor graphs: each potential function is explicitly represented
(a slightly more general representation of graphical models)
- logarithmic representation: the factors are additive energy
functions. The likelihood is proportional to exp(-energy).
- energy-based models: factor graphs without normalization (no
partition function). Can be used when no explicit probabilities
are required: only the relative values of the energies matter.
- representing common models as factor graphs:
example: an HMM is a "comb".
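To make the "HMM as a comb" picture concrete, here is a Python sketch with made-up toy energies (not from the lecture): transition factors form the spine of the comb, emission factors are the teeth, the score of a state sequence is proportional to exp(-energy), and turning scores into probabilities requires the partition function, computed here by brute-force enumeration.

```python
import itertools
import math

# Toy setup: a 2-state HMM over a length-3 observation sequence,
# written as an energy-based factor graph. All numbers are arbitrary.
n_states = 2
obs = [0, 1, 0]  # observed symbols

# In the logarithmic representation, each factor contributes an
# additive energy (here: negative-log-probability-like values).
trans_E = [[0.5, 1.5], [1.2, 0.4]]   # spine: E(s_t, s_{t+1})
emit_E = [[0.3, 1.8], [1.6, 0.2]]    # teeth: E(s_t, obs_t)

def energy(states):
    """Total energy = sum of all factor energies (a product of
    potentials becomes a sum in the log domain)."""
    e = sum(emit_E[s][o] for s, o in zip(states, obs))
    e += sum(trans_E[a][b] for a, b in zip(states, states[1:]))
    return e

# Unnormalized score of one configuration: exp(-energy).
score = math.exp(-energy([0, 1, 0]))

# Normalizing requires the partition function Z, a sum over every
# state sequence -- exactly the quantity energy-based models avoid
# when only relative energies matter.
Z = sum(math.exp(-energy(s))
        for s in itertools.product(range(n_states), repeat=len(obs)))
prob = score / Z
```

For a chain this short, enumeration is fine; for real HMMs the same sum is computed efficiently by the forward algorithm.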
|03/08: Independent Component Analysis, Source Separation|
- Bell AJ, Sejnowski TJ (1995) "An
information-maximization approach to blind separation and blind
deconvolution," Neural Computation, 7: 1129-1159 (Crispy, Jie, George).
- Zibulevsky & Pearlmutter: Blind Source Separation by Sparse
Decomposition in a Signal Dictionary. Neural Computation, 13(4):863-882. 2001.
(Jeremy, Sumit, Koray).
- Hinton, G. E., Welling, M., Teh, Y. W., and Osindero, S.
A New View of ICA,
Proceedings of ICA-2001, San Diego (Raia, Yury, Jihun).
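A minimal sketch of the infomax idea behind the Bell and Sejnowski paper above, using the natural-gradient form of the update with a tanh nonlinearity (a common variant for super-Gaussian sources); the mixing matrix, learning rate, and iteration count are arbitrary illustration values, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent super-Gaussian (Laplacian) sources, linearly mixed
# by a made-up mixing matrix A.
n = 5000
S = rng.laplace(size=(2, n))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

# Natural-gradient infomax update: dW = lr * (I - tanh(U) U^T / n) W,
# where U = W X are the current unmixed outputs. The tanh nonlinearity
# matches super-Gaussian sources in this variant.
W = np.eye(2)
lr = 0.01
for _ in range(500):
    U = W @ X
    W += lr * (np.eye(2) - np.tanh(U) @ U.T / n) @ W

# If separation succeeds, W @ A is close to a scaled permutation
# of the identity (sources recovered up to order and scale).
P = W @ A
```

Note this is a sketch of the learning rule, not the paper's full information-maximization derivation, which motivates the update as gradient ascent on the output entropy.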
Links, additional info
Tutorial / Review
Graph Transformer Networks. Sequence labeling with energy-based factor
graphs. (See "Gradient-Based Learning Applied to Document Recognition,"
parts 4-7, page 16 onward.)
- John Lafferty, Andrew McCallum, and Fernando Pereira.
Conditional random fields: Probabilistic models for segmenting and labeling
sequence data. Proceedings of ICML-01, 2001 (Alyssa, Piotr, Marina).
- Michael Collins.
Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms.
EMNLP 2002. (Matt, Ayse).
- B. Taskar, C. Guestrin and D. Koller.
Max-Margin Markov Networks. Neural Information Processing Systems Conference
(NIPS03), 2003 (Philip, Marco, Marc'Aurelio).
|03/29: Dynamic Graphical Models|
NO CLASS (Snowbird workshop)
|04/12: Reinforcement Learning|
Each group will study and explain one class of RL algorithms,
with an application, as listed below. Much of the required
information can be found in Sutton and Barto's book Reinforcement
Learning: An Introduction. However, a number of other sources of
introductory information are listed below.
- [Crispy, Jie, George]:
- [Jeremy, Sumit, Koray]:
one of the
original TD-Gammon papers by Gerry Tesauro.
- [Raia, Yury, Jihun]:
algorithms, with the
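As background for the TD-Gammon discussion, here is a minimal sketch of the tabular TD(0) update at the heart of temporal-difference learning; the chain MDP, step size, and discount factor are made-up illustration values.

```python
# Tabular TD(0) on a tiny deterministic chain: states 0 -> 1 -> 2 -> terminal,
# reward 1 on the final transition, 0 elsewhere. With gamma = 0.9 the true
# values are V = [0.81, 0.9, 1.0].
alpha, gamma = 0.1, 0.9
V = [0.0, 0.0, 0.0]

for _ in range(500):            # episodes
    for s in range(3):          # walk the chain left to right
        r = 1.0 if s == 2 else 0.0
        v_next = V[s + 1] if s < 2 else 0.0   # terminal state has value 0
        # TD(0): move V[s] toward the bootstrapped target r + gamma * V(s')
        V[s] += alpha * (r + gamma * v_next - V[s])
```

TD-Gammon replaces the value table with a neural network and the one-step error with TD(lambda), but the bootstrapped target is the same idea.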
Background Reading Material