CSC2515 Fall 2006 - Weekly and Other Readings

Textbook

There is no required textbook for the class.
Two recommended books that cover similar material are Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, and MacKay, Information Theory, Inference, and Learning Algorithms, which is freely available online.

I will be handing out class notes as we go along.

Some classic papers will be assigned as weekly readings.
To indicate you have completed a reading, use the Online Reading Submission page.

We will also cover material similar to chapters from a few other books, which I will point out in class.

Weekly Readings

September 12
L.G. Valiant, A Theory of the Learnable [pdf, 9 pages]
September 19
Fix, Hodges, Nonparametric Discrimination: Consistency Properties, 1951. [pdf, 21 pages]
September 25
Pedro Domingos, Michael Pazzani, On the Optimality of the Simple Bayesian Classifier [pdf, 28 pages] (you can skip some of the technical material in Sec. 6)
October 3
Robert Tibshirani, Regression shrinkage and selection via the lasso [pdf, ps.gz, 28 pages]
October 10
Rumelhart, Hinton and Williams, Learning representations by back-propagating errors, Nature (1986). [pdf, 4 pages]
October 17
Michael I. Jordan and Robert A. Jacobs (1994), Hierarchical Mixtures of Experts and the EM Algorithm [pdf, ps.gz, 36 pages]
October 24
C.K. Chow and C.N. Liu, Approximating discrete probability distributions with dependence trees [pdf, 6 pages]
October 31
Geoff Hinton and Radford Neal, A View of the EM Algorithm, Learning in Graphical Models (1998). [pdf, ps.gz, 14 pages]
November 7
Zoubin Ghahramani and Geoff Hinton, The EM algorithm for Mixtures of Factor Analyzers [ps.gz, pdf, 8 pages]
November 14
Alan Poritz, Hidden Markov Models: A guided tour, ICASSP 1988. [pdf, 7 pages]
November 21
Lawrence Saul, Fernando Pereira, Aggregate and mixed-order Markov models for statistical language processing, EMNLP 1997. [pdf, 9 pages]
November 28
Rob Schapire, The Strength of Weak Learnability, Machine Learning 1990. [pdf, 31 pages]
December 5
Corinna Cortes and Vladimir Vapnik, Support Vector Networks, Machine Learning 20(3): 273-297 (1995) [ps.gz, pdf, 31 pages]

Additional Material

Probability and Statistics Review [ps.gz, pdf].
Some useful matrix identities and Gaussian identities.
Andrew Moore at CMU has a tutorials page with many excellent mini-tutorials on various statistical machine learning topics.
In particular, you might want to check out his tutorials on probability and density, and on Gaussian and Bayesian classifiers.
A short MATLAB tutorial.
Numerical Recipes in C has some useful material on optimization and computing with real numbers, although I wouldn't recommend actually using some of its algorithms. It is available in pdf on the web here. Chapters 10 and 15 might be of interest.

Extra Papers of Interest

R.A. Fisher, The Use of Multiple Measurements in Taxonomic Problems [pdf, 10 pages]
Geman, Bienenstock, Doursat, Neural Networks and the Bias/Variance Dilemma, Neural Computation (1992). [pdf, 58 pages]
Sam Roweis and Zoubin Ghahramani, A Unifying Review of Linear Gaussian Models, Neural Computation (1999). [pdf, 41 pages]
David MacKay, Maximum Likelihood and Covariant Algorithms for ICA [ps.gz, 15 pages]
Joel Max, Quantizing for Minimum Distortion, IRE Transactions on Information Theory, March 1960. [pdf, 6 pages]
Stuart Lloyd, Least Squares Quantization in PCM, IEEE Transactions on Information Theory 28(2), March 1982. [pdf, 9 pages]
Fix, Hodges, Nonparametric Discrimination: Consistency Properties, 1951. [pdf, 21 pages]
Marina Meila, An accelerated Chow and Liu algorithm [ps.gz, 12 pages]
An article from Scientific American on Stein's Paradox. Another paper on this topic.
Golub, Heath, Wahba, Generalized Cross Validation, Technometrics 1979. [pdf]
Leo Breiman, Bagging Predictors [pdf, 20 pages]
David Wolpert, Stacked Generalization [ps.gz, 57 pages]
Bradley Efron and Gail Gong, A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation, The American Statistician, Vol. 37, No. 1 (Feb. 1983), pp. 36-48. [pdf, 13 pages]
(Note that there is a tiny typo in this paper: 2 lines below expression (3) on the 1st page, the bar is omitted from the x(i) on the right side of the equation.)
The voted perceptron algorithm is introduced in this paper by Freund and Schapire.
The very cool Winnow algorithm is introduced by Littlestone in this paper.
S. Gallant, Perceptron-based learning algorithms [pdf, 13 pages]
Some papers on decision trees: J.R. Quinlan's original paper, Induction of Decision Trees; a review paper by Murthy; and a paper by Chou on approximately optimal partitioning.
A couple of suggestions for how to improve Fisher's discriminant.

CSC2515 - Machine Learning || www.cs.toronto.edu/~roweis/csc2515/