Statistical Natural Language Processing Fall 2017

Course#: CSCI-GA.3033-008
Instructor: Ankur Parikh
TA: Abhinav Gupta
Lecture: Tuesdays 5:10-7:00PM, Warren Weaver Hall Room 512
Mailing List: Piazza
Office hours: Tuesdays 7-8 pm in WW 328 and by appointment

Class Summary:

In this course we will examine some of the core tasks in natural language processing, starting with simple word-based models for text classification and building up to rich, structured models for syntactic parsing and machine translation. In each case we will first cover classic approaches, then review recent research progress, and finally discuss, with the help of guest lecturers from Google, how to design efficient systems for practical user applications. The focus will be on corpus-driven methods that use supervised and unsupervised machine learning. We will explore statistical approaches based on graphical models and deep learning.

Your grade will be composed of 4 components:

Scribe sheet signup. Scribe notes are due 2 weeks after the lecture, except for the 12/5 scribe notes which are due in one week. Please pick a lecture to scribe for. Here is a template to use for the notes.

Late policy: You get 7 late days to use at your discretion (no more than 5 per assignment). After that you lose 10% per day. Please note that late days cannot be used for the final project.

Reference Textbooks:

Readings will be assigned from:


| Date | Topic | Textbook Reading | Recent Papers Reading | Scribe Notes | Assignments |
|------|-------|------------------|-----------------------|--------------|-------------|
| 9/5 | Introduction and Language Modeling | Jurafsky & Martin Chapter 4 (also Ch. 1) (or Manning & Schuetze Chapters 1-3, 6) | MT Tutorial, LM Overview, Large LMs, Neural LM | | Assignment 1 out |
| 9/12 | Text Classification and Machine Learning Overview | Mitchell Chapter 3, Jurafsky & Martin Chapter 7 | Classification Tutorial, MaxEnt Tutorial, NN Primer, Neural Network Tutorial | | Assignment 2 out |
| 9/19 | Part-of-Speech Tagging | Jurafsky & Martin Chapter 9-10 (or Manning & Schuetze Chapter 10) | NN Guide, TnT Tagger | | |
| 9/26 | Lexical Semantics (Guest Lecturer: Prof. Ellie Pavlick) | Jurafsky & Martin Chapters 15-17 | Frege, word2vec, word2vec Explained, Syntactic Embeddings, NNs for NLP | | |
| 10/3 | Advanced Part-of-Speech | Jurafsky & Martin Chapter 9-10 (or Manning & Schuetze Chapter 9) | Merialdo '94, Bilingual POS Induction, CRFs, BiLSTM POS Tagging | | |

You will be submitting the predictions that your system makes, and the results will be automatically posted to the leaderboard below.


Rank Handle #2

The leaderboard is based on code from Wang Ling and Chris Dyer, which in turn is based on this course.