Introduction To Machine Learning, Fall 2013


Machine learning is an exciting and fast-moving field of computer science with many recent consumer applications (e.g., Microsoft Kinect, Google Translate, iPhone's Siri, digital camera face detection, Netflix recommendations, Google News) and applications within the sciences and medicine (e.g., predicting protein-protein interactions, species modeling, detecting tumors, personalized medicine). In this undergraduate-level class, students will learn about the theoretical foundations of machine learning and how to apply machine learning to solve new problems.

General information

Lectures: Tuesday and Thursday, 11am-12:15pm
Room: Warren Weaver Hall 312

Prof. David Sontag: dsontag {@ | at}
Chen-Chien Wang: ccw352 {@ | at}

Office hours: Tuesday 5-6pm. Location: 715 Broadway, 12th floor, Room 1204

Grading: problem sets (50%) + midterm exam (25%) + project (20%) + participation (5%). See the Problem Set policy below.

Pre-requisites: Basic Algorithms (CS 310) is required, but can be taken concurrently. Students should be very comfortable with basic mathematics and have good programming skills. Some knowledge of linear algebra and multivariable calculus will be helpful.

Textbook: No textbook is required (readings will come from freely available online material). If an additional reference is desired, the following books are good options. Bishop's book is easier to read, whereas Murphy's book has more depth and coverage (and is more up to date).

Mailing list: To subscribe to the class list, follow instructions here.

Project information


Note: the Bishop and Murphy readings are optional

Lecture Date Topic Required reading Assignments
1 Sept 3 (Tues)
Overview [Slides]
Chapter 1 of Murphy's book

Bishop, Chapter 1 (optional)


2 Sept 5 (Th)
Introduction to learning [Slides]

Loss functions, Perceptron algorithm, proof of perceptron mistake bound
Barber 17.1-2 (stop before 17.2.1) on least-squares regression, 29.1.1-4 (review of vector algebra)

Notes on perceptron mistake bound (just section 1)
ps1 (data), due Sept 17 at 11am
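
As an illustration of the perceptron algorithm covered in this lecture, here is a minimal Python sketch (the function name and data layout are illustrative, not taken from the assigned notes):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Classic perceptron. X is (n, d), labels y are in {-1, +1}.
    Returns the learned weight vector and the number of mistakes made."""
    n, d = X.shape
    w = np.zeros(d)
    mistakes = 0
    for _ in range(max_epochs):
        converged = True
        for i in range(n):
            if y[i] * np.dot(w, X[i]) <= 0:   # misclassified (or on the boundary)
                w += y[i] * X[i]              # update toward the correct side
                mistakes += 1
                converged = False
        if converged:
            break
    return w, mistakes
```

The mistake bound from the notes guarantees that on linearly separable data with margin γ and radius R, the total number of updates is at most (R/γ)², which is why the loop terminates on separable data.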
Sept 10 (Tues)
Linear classifiers [Slides]

Introduction to Support vector machines

Notes on support vector machines (sections 1-4)

Bishop, Section 4.1.1 (pg. 181-182) and Chapter 7 (pg. 325-328)

Murphy, Section 14.5.2 (pg. 498-501)
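
These lectures view SVM training as regularized loss minimization. As a hedged sketch, here is a Pegasos-style stochastic subgradient method for the linear SVM objective (one of several ways to train a linear SVM, not necessarily the approach taken in the course notes):

```python
import numpy as np

def pegasos(X, y, lam=0.01, epochs=100, seed=0):
    """Stochastic subgradient descent for a linear SVM: minimizes
    (lam/2)||w||^2 + mean hinge loss. Labels y are in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                 # standard Pegasos step size
            if y[i] * np.dot(w, X[i]) < 1:        # margin violated: hinge subgradient
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                                 # only the regularizer contributes
                w = (1 - eta * lam) * w
    return w
```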

Sept 12 (Th)
Support vector machines [Slides]
See above. Also:

Bishop, Sections 7.1.1 and 7.1.3

Sept 17 (Tues)
Support vector machines (continued) [Slides]

Derivation of SVM dual, introduction to kernels
Notes on SVM dual and kernel methods (sec. 3-8)

If you would like a second reference, see these notes (sections 5-8)

Bishop, Section 6.2, Section 7.1 (except for 7.1.4), and Appendix E

Murphy, Chapter 14 (except 14.4 and 14.7)
ps2, due Sept 24 at 11am
Sept 19 (Th)
Kernel methods [Slides]
See above.

Optional: For more on SVMs, see Hastie, Sections 12.1-12.3 (pg. 435). For more on cross-validation see Hastie, Section 7.10 (pg. 250).

Optional: For more advanced kernel methods, see chapter 3 of this book (free online from NYU libraries)

Sept 24 (Tues)
Kernel methods & optimization

Mercer's theorem, convexity
Lecture notes

Sept 26 (Th)
Learning theory [Slides]

Generalization of finite hypothesis spaces
Lecture notes

These have only high-level overviews:
 - Murphy, Section 6.5.4 (pg. 209)
 - Bishop, Section 7.1.5 (pg. 344)
ps3 (data), due Oct. 8 at 11am
Oct 1 (Tues)
Learning theory (continued) [Slides]

Notes on learning theory

Oct 3 (Th)
Learning theory (continued) [Slides]

Also margin-based generalization
Notes on gap-tolerant classifiers (section 7.1, pg. 29-31)

Oct 8 (Tues)
Nearest neighbor methods [Slides]

Hastie et al., Sections 13.3-13.5 (on nearest neighbor methods)

Bishop, Section 14.4 (pg. 663)

Murphy, Section 16.2
ps4, due Oct. 17 at 11am
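
Nearest-neighbor classification is simple enough to sketch directly. The following illustrative majority-vote k-NN classifier is an assumption-laden sketch (names and details are not from the readings):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority-vote k-nearest-neighbor classification with Euclidean
    distance. Labels are assumed to be integers 0..C-1."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
        nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest
        preds.append(np.bincount(nearest).argmax())   # majority vote
    return np.array(preds)
```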
Oct 10 (Th)

No class on Oct 15 (Fall recess)
Decision trees [Slides]
Mitchell Ch. 3

Rudin's lecture notes
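
A core computation in decision-tree learning (Mitchell, Ch. 3) is choosing splits by information gain. An illustrative sketch:

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(y, mask):
    """Reduction in entropy from splitting labels y by a boolean mask."""
    n = len(y)
    left, right = y[mask], y[~mask]
    return (entropy(y)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
```

A split that perfectly separates two equally frequent classes has a gain of exactly 1 bit.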

Oct 17 (Th)
Ensemble methods [Slides]

Random forests
Hastie et al., Section 8.7 (bagging)

Optional: Hastie et al. Chapter 15 (on random forests)

Oct 22 (Tues)
Midterm exam

A Few Useful Things to Know About Machine Learning

Oct 24 (Th)
Clustering [Slides]

Hastie et al., Sections 14.3.6, 14.3.8, 14.3.9, 14.3.12

Murphy, Sections 25.1, 25.5-25.5.3

Bishop, Section 9.1 (pg. 424)
Project proposal, due Oct. 31 at 5pm by e-mail
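
K-means (Lloyd's algorithm) is a standard entry point for this clustering material. A sketch, with a farthest-point seeding heuristic added for determinism (an implementation choice, not prescribed by the readings):

```python
import numpy as np

def kmeans(X, K, iters=100):
    """Lloyd's algorithm with farthest-point seeding (k-means++ style)."""
    centers = [X[0]]
    for _ in range(K - 1):
        # next seed: the point farthest from all centers chosen so far
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # assignment step: nearest center for every point
        labels = ((X[:, None] - centers) ** 2).sum(axis=2).argmin(axis=1)
        # update step: each center moves to the mean of its points
        new = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```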
Oct 29 (Tues)
Clustering (continued) [Slides]

Hierarchical clustering
See above.

Oct 31 (Th)
Clustering (continued) [Slides]

Spectral clustering
Hastie et al., Section 14.5.3

Optional: Tutorial on spectral clustering

Murphy, Section 25.4

Nov 5 (Tues)
Introduction to Bayesian methods [Slides]

Probability, decision theory
Murphy, Sections 3-3.3

Bishop, Sections 2-2.3.4

Nov 7 (Th)
Naive Bayes [Slides]

Murphy, Sections 3.4, 3.5 (naive Bayes), 5.7 (decision theory)

Bishop, Section 1.5 (decision theory)
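
Naive Bayes with Laplace smoothing can be sketched compactly for binary features (illustrative only; the lecture may present a different variant):

```python
import numpy as np

def train_nb(X, y):
    """Bernoulli naive Bayes with Laplace smoothing.
    X is binary (n, d), labels y are in {0, 1}."""
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        # smoothed estimate of P(x_j = 1 | y = c)
        theta = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)
        params[c] = (np.log(len(Xc) / len(X)),   # log prior
                     np.log(theta), np.log1p(-theta))
    return params

def predict_nb(params, X):
    """Pick the class with the larger log posterior (up to a constant)."""
    scores = np.column_stack([prior + X @ lt + (1 - X) @ lnt
                              for prior, lt, lnt in (params[0], params[1])])
    return scores.argmax(axis=1)
```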

Nov 12 (Tues)
Logistic regression [Slides]

Notes on naive Bayes and logistic regression

Murphy, 8-8.3 (logistic reg.), 8.6 (generative vs. discriminative)

Bishop,  4.2-4.3.4 (logistic reg.)
ps5, due Nov 21 at 11am [Solutions]
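
Logistic regression is typically fit by gradient-based optimization of the log-loss. A minimal batch gradient-descent sketch (the learning rate and iteration count are arbitrary illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, iters=2000):
    """Batch gradient descent on the mean logistic log-loss; y in {0, 1}.
    A bias column of ones is assumed to be appended by the caller."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the mean log-loss
    return w
```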
Nov 14 (Th)
Mixture models, EM algorithm [Slides]

Notes on mixture models
Notes on Expectation Maximization

Murphy, Sections 11-11.4.7
Bishop, Sections 9.2, 9.3, 9.4
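
The EM algorithm for a Gaussian mixture alternates an E-step (soft assignments) and an M-step (parameter re-estimation). An illustrative 1-D sketch (the quantile-based initialization is an assumption, not from the readings):

```python
import numpy as np

def em_gmm_1d(x, K=2, iters=50):
    """EM for a 1-D Gaussian mixture. Returns weights pi, means mu, stds sigma."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)   # spread initial means over the data
    sigma = np.full(K, np.std(x))
    for _ in range(iters):
        # E-step: responsibilities r[i, k] proportional to pi_k * N(x_i | mu_k, sigma_k^2)
        dens = (np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2 * np.pi)))
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        Nk = r.sum(axis=0)
        pi = Nk / n
        mu = (r * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    return pi, mu, sigma
```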

Nov 19 (Tues)
EM algorithm (continued) [Slides]

Nov 21 (Th)
Hidden Markov models [Slides]

Notes on HMMs
Tutorial on HMMs

Murphy, Chapter 17
Bishop, Sections 8.4.1, 13.1-2
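
The forward algorithm, which computes the likelihood of an observation sequence under an HMM, is only a few lines. An illustrative sketch:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: P(observation sequence) for a discrete HMM.
    pi: initial state distribution (S,), A: transition matrix (S, S),
    B: emission matrix (S, V), obs: list of observation indices."""
    alpha = pi * B[:, obs[0]]          # joint of first observation and state
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
    return alpha.sum()                 # marginalize out the final state
```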

Nov 26 (Tues)
Dimensionality reduction (PCA) [Slides] Notes on PCA
More notes on PCA

Bishop, Sections 12.1 (PCA), 12.4.1 (ICA)

Optional: Barber, Chapter 15
ps6 (data), due Dec. 5 at 11am
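
PCA can be computed by taking the SVD of the centered data matrix. An illustrative sketch:

```python
import numpy as np

def pca(X, k):
    """Project X (n samples x d features) onto its top-k principal components.
    Returns the projected data and the components (rows of Vt)."""
    Xc = X - X.mean(axis=0)          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]              # top-k directions of maximal variance
    return Xc @ components.T, components
```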
Dec 3 (Tues)
No class on Nov 28 (Thanksgiving)
Bayesian networks [Slides]

Latent Dirichlet allocation
Review article on topic modeling
Introduction to Bayesian networks

Dec 5 (Th)
Collaborative filtering

Overview of matrix factorization

Dec 10 (Tues)
Applications in computational biology [Slides]

An introduction to graphical models

Dec 12 (Th)
Project presentations (group 1)

Dec 17 (Tues)
Project presentations (everyone else)

During final exam slot. Note the special time! Same location.

Acknowledgements: Many thanks to the University of Washington, Carnegie Mellon University, UT Dallas, Stanford, UC Irvine, Princeton, and MIT for sharing material used in slides and homeworks.

Reference materials

 - Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
 - Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
 - Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, 2nd ed. Springer, 2009.
 - Barber, Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
 - Mitchell, Machine Learning. McGraw-Hill, 1997.

Problem Set policy

I expect you to try solving each problem set on your own. However, if you get stuck on a problem, I encourage you to collaborate with other students in the class, subject to the following rules:

  1. You may discuss a problem with any student in this class and work together on solving it. This can involve brainstorming, verbally discussing the problem, and working through possible solutions together, but it should not involve one student telling another a complete solution.

  2. Once you solve the homework, you must write up your solutions on your own, without looking at other people's write-ups or giving your write-up to others.

  3. In your solution for each problem, you must write down the names of everyone with whom you discussed it. This will not affect your grade.

  4. Do not consult solution manuals or other people's solutions from similar courses.
Late submission policy

During the semester you are allowed at most two extensions on homework assignments. Each extension lasts at most 48 hours and carries a penalty of 25% off your assignment grade.