Special Topics in Statistical Natural Language Processing

Course#: G22.3033-010
Instructor: Slav Petrov
Lecture: Tuesdays 5:00-6:50PM, Room 412 Warren Weaver Hall
Office hours: By appointment


10/31/10: Assignment 4 is out, due Nov. 16, before class.
10/12/10: Assignment 3 is out, due Oct. 26, before class.
9/27/10: Assignment 2 is out, due Oct. 12, before class.
9/13/10: Assignment 1 is out, due Sept. 28, before class.
9/8/10: Posted lecture notes and readings for the first classes.
7/25/10: Webpage is up.

Class Summary:

In this course we will explore statistical, model-based approaches to natural language processing. The focus will be on corpus-driven methods that use supervised and unsupervised machine learning algorithms. We will examine some of the core tasks in natural language processing, starting with simple word-based models for text classification and building up to rich, structured models for syntactic parsing and machine translation. In each case we will discuss recent research progress in the area and how to design efficient systems for practical user applications.

This course assumes a good background in basic probability, strong programming ability, and an interest in programming (in Java). There will be four course assignments and a final project. In the assignments, you will construct basic systems for core NLP tasks and then improve them through a cycle of error analysis and model redesign. Each assignment is worth 15% of your final grade. For the final project you will choose a single topic or application and investigate it in greater depth. The final project is worth 30% of your final grade. The remaining 10% will come from class participation.

The class is open to graduate as well as undergraduate students.


The primary textbook for this course will be:
Jurafsky and Martin, Speech and Language Processing, Second Edition
Make sure to get the purple second edition of the book and not the white first edition.

The following book is not required, but can be useful as secondary literature and is available online:
Manning and Schuetze, Foundations of Statistical Natural Language Processing

Additional readings will come from recent research papers.


Date | Topic | Textbook Reading | Recent Papers Reading | Assignments
9/7 | Introduction | Jurafsky & Martin Chapter 1 (or Manning & Schuetze Chapters 1-3) | |
9/14 | Language Models | Jurafsky & Martin Chapter 4 (or Manning & Schuetze Chapter 6) | Chen & Goodman, Interpreting Kneser-Ney, Large Language Models | Assignment 1 out
9/21 | Text Classification, Word Sense Disambiguation | | Classification Tutorial, MaxEnt Tutorial, Generative and Discriminative Classifiers |
9/28 | More Classification | Jurafsky & Martin Chapters 6.6, 19.1, 20 (or Manning & Schuetze Chapter 7) | Graphical Models, Latent Dirichlet Allocation | Assignment 1 due, Assignment 2 out
10/5 | Part-of-Speech Tagging | Jurafsky & Martin Chapter 5 (or Manning & Schuetze Chapter 10) | TnT Tagger, Toutanova & Manning '00 |
10/12 | Advanced Part-of-Speech | Jurafsky & Martin Chapter 6 (or Manning & Schuetze Chapter 9) | Merialdo '94, CRFs, Prototype-Driven Induction, Johnson '07 | Assignment 2 due, Assignment 3 out
10/19 | Syntactic Parsing | Jurafsky & Martin Chapters 12, 13 (or Manning & Schuetze Chapters 3, 11) | Best-First, A*, K-Best |
10/26 | Advanced Syntactic | Jurafsky & Martin Chapter 14 (or Manning & Schuetze Chapter 12) | Unlexicalized, Lexicalized, Latent Variable | Assignment 3 due
11/2 | Word Alignments | Jurafsky & Martin Chapter 25 (or Manning & Schuetze Chapter 13) | MT Tutorial, Overview, IBM Models, HMM-Alignments, Agreement | Proposal due, Assignment 4 out
11/9 | Phrase-Based | | Decoding, Phrases, Moses |
11/16 | Hierarchical (Syntax-Based) | | Hiero, GHKM, Syntax vs. Phrases, Synchronous Grammars | Assignment 4 due
11/23 | Sentiment Analysis | Pang & Lee Sentiment Analysis Book, Jurafsky & Martin Chapters 23.3-23.7 | Sentiment: Aspects, Lexicons, Summarization; Summarization: Query, N-Gram, Topical |
11/30 | Dependency Parsing | Dependency Parsing Chapters 1-4 (available from NYU network) | MST Parsing, Shift-Reduce Parsing, 3rd Order Parsing, Dual Decomposition |
12/7 | Final Presentations | | How to (not) write a paper, How to give a talk | Project due 12/17


There will be four assignments in which you will build systems for various NLP tasks. You will be provided with a (Java) code base that handles most of the basic infrastructure, so you will only need to write the core components.