Introduction To Machine Learning, Fall 2012

Introduction To Machine Learning

Fall 2012


Machine learning is an exciting and fast-moving field of Computer Science with many recent consumer applications (e.g., Kinect, Google Translate, Siri, digital camera face detection, Netflix recommendations) and applications within the sciences and medicine (e.g., predicting protein-protein interactions, species modeling, detecting tumors, personalized medicine). In this undergraduate-level class, students will learn about the theoretical foundations of machine learning and how to apply machine learning to solve new problems.

General information

Lectures: Tuesday and Thursday, 11am-12:15pm
Room: Warren Weaver Hall 317

Prof. David Sontag   
dsontag {@ | at}
Jinglun Dong
jinglundong {@ | at}

Office hours: Tuesday 5-6pm and by appointment. Location: 715 Broadway, 12th floor, Room 1204

Grading: problem sets (50%) + midterm exam (25%) + project (20%) + participation (5%). Problem Set policy

Books: No textbook is required for this class, but students may find it helpful to purchase one of the following books. Bishop's book is much easier to read, whereas Murphy's book has substantially more depth and coverage (and is up to date).

Mailing list: To subscribe to the class list, follow instructions here.

Project information


Note: the Bishop and Murphy readings suggested below are optional.

Lecture Date Topic Required reading Assignments
1 Sept 4 (Tues)
Overview [Slides]
Chapter 1 of Murphy's book

Bishop, Chapter 1 (optional)


2 Sept 6 (Th)
Introduction to learning [Slides]

Loss functions, Perceptron algorithm
Review notes on probability, linear algebra, and calculus

Sept 11 (Tues)
Linear classifiers [Slides]

Proof of perceptron mistake bound, introduction to Support vector machines
Notes on perceptron mistake bound (just section 1)

Notes on support vector machines (sections 1-4)

Bishop, Section 4.1.1 (pg. 181-182) and Chapter 7 (pg. 325-328)

Murphy, Section 14.5.2 (pg. 498-501)
ps1 (data) due Sept 25 at 11am [Solutions]
Sept 13 (Th)
Support vector machines [Slides]
See above. Also:

Bishop, Sections 7.1.1 and 7.1.3

Sept 18 (Tues)
Support vector machines (continued) [Slides]

Derivation of SVM dual, introduction to kernels
Optional: Hastie (book online for free - see below), Section 4.5 (pg. 129)

Sept 20 (Th)
Kernel methods [Slides] Notes on kernel methods (sec. 3-8)

Bishop, Section 6.2, Section 7.1 (except for 7.1.4), and Appendix E

Murphy, Chapter 14 (except 14.4 and 14.7)

Optional: For more on SVMs, see Hastie, Sections 12.1-12.3 (pg. 435). For more on cross-validation see Hastie, Section 7.10 (pg. 250).

Optional: For more advanced kernel methods, see chapter 3 of this book (free online from MIT libraries)

Sept 25 (Tues)
Kernel methods & optimization

Mercer's theorem, convexity
Lecture notes

Sept 27 (Th)
Learning theory [Slides]

Generalization of finite hypothesis spaces
These have only high-level overviews:
 - Murphy, Section 6.5.4 (pg. 209)
 - Bishop, Section 7.1.5 (pg. 344)

Oct 2 (Tues)
Learning theory (continued) [Slides]

Notes on learning theory
ps2, due Oct 8 by 4pm [Solutions]
Oct 4 (Th)
Nearest neighbor methods [Slides]

Also margin-based generalization
Notes on gap-tolerant classifiers (section 7.1, pg. 29-31)

Hastie et al., Sections 13.3-13.5 (on nearest neighbor methods)

Oct 9 (Tues)
Decision trees [Slides]

Bishop, Section 14.4 (pg. 663)

Murphy, Section 16.2

Oct 11 (Th)

No class on Oct 16 (Fall recess)
Ensemble methods, Boosting [Slides]
Hastie et al., Section 8.7 (bagging) and Sections 10.1-10.6 (boosting)

Boosting overview (except sections 6 and 8)

Bishop, Sections 14.2 & 14.3
Murphy, Section 16.4

Optional: Hastie et al. Chapter 15 (on random forests)

Oct 18 (Th)
Boosting (continued) [Slides]

A Few Useful Things to Know About Machine Learning

ps3, due Oct 25 at 11am

Oct 23 (Tues)
Midterm exam

Oct 25 (Th)
Clustering [Slides]

K-means, Agglomerative clustering
Hastie et al., Sections 14.3.6, 14.3.8, 14.3.9, 14.3.12

Murphy, Sections 25.1, 25.5-25.5.3

Bishop, Section 9.1 (pg. 424)
ps4 (2 page project proposal), due Nov. 6 at 5pm by e-mail

Oct 30 (Tues)
Class cancelled due to
Hurricane Sandy

Nov 1 (Th)
Class cancelled due to
Hurricane Sandy

Nov 6 (Tues)
Clustering (continued) [Slides]

Spectral clustering
Hastie et al., Section 14.5.3

Optional: Tutorial on spectral clustering

Murphy, Section 25.4

Nov 8 (Th)
Introduction to Bayesian methods [Slides]

Probability, decision theory
Murphy, Sections 3-3.3

Bishop, Sections 2-2.3.4

Nov 13 (Tues)
Naive Bayes [Slides]

Murphy, Sections 3.4, 3.5 (naive Bayes), 5.7 (decision theory)

Bishop, Section 1.5 (decision theory)

Nov 15 (Th)
Logistic regression [Slides]

Notes on naive Bayes and logistic regression

Murphy, 8-8.3 (logistic reg.), 8.6 (generative vs. discriminative)

Bishop,  4.2-4.3.4 (logistic reg.)
ps5, due Nov 27 at 11am [Solutions]
Nov 20 (Tues)

No class on Nov 22 (Thanksgiving)
Logistic Regression (continued) [Slides]

Nov 27 (Tues)
Linear Regression [Slides]
Hastie et al., Chapter 3 (pages 43-47)
Notes on linear regression (see Part 1)

Murphy, Section 7-7.5
Bishop, Sections 3-3.2

Nov 29 (Th)
Mixture models, EM algorithm [Slides]

Notes on mixture models
Notes on Expectation Maximization

Murphy, 11-, Section 11.4.7
Bishop, Sections 9.2, 9.3, 9.4

Dec 4 (Tues)
EM algorithm (continued) [Slides]

ps6, due Dec 11 at 11am
Dec 6 (Th)
Hidden Markov models [Slides]

Notes on HMMs
Tutorial on HMMs
Introduction to Bayesian networks

Murphy, Chapter 17
Bishop, Sections 8.4.1, 13.1-2

Dec 11 (Tues)
Factor analysis & dimensionality reduction (PCA) [Slides] Notes on PCA
More notes on PCA

Bishop, Sections 12.1 (PCA), 12.4.1 (ICA)

Optional: Barber, Chapter 15

Dec 13 (Th)
Dimensionality reduction (continued) [Slides]

Latent Dirichlet allocation
Review article on topic modeling

Dec 18 (Tues)
Project presentations

10 - 11:50am, WWH 317

Final projects due Dec 16 at 5pm (electronically)

Acknowledgements: Many thanks to the University of Washington, Carnegie Mellon University, UT Dallas, Stanford, UC Irvine, Princeton, and MIT for sharing material used in slides and homeworks.

Reference materials


This is an undergraduate-level course. Students should be very comfortable with basic mathematical skills in addition to good programming skills. Some knowledge of probability theory and statistics, linear algebra, and multivariable calculus will be helpful. Basic Algorithms (CSCI-UA.0310) is a prerequisite, although well qualified students taking it during Fall 2012 will be accepted with permission of the instructor. Please contact the professor with any additional questions.

Problem Set policy

I expect you to try solving each problem set on your own. However, when being stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules:

  1. You may discuss a problem with any student in this class, and work together on solving it. This can involve brainstorming and verbally discussing the problem, going together through possible solutions, but should not involve one student telling another a complete solution.

  2. Once you solve the homework, you must write up your solutions on your own, without looking at other people's write-ups or giving your write-up to others.

  3. In your solution for each problem, you must write down the names of any person with whom you discussed it. This will not affect your grade.

  4. Do not consult solution manuals or other people's solutions from similar courses.