Advanced Natural Language Processing

CSCI-GA.2591
Fall 2017
Monday 5:10-7:00
60 Fifth Avenue [Forbes Building], Room C12

This version posted 17 August 2017.

Course Information

Instructor

Prof. Ralph Grishman
60 Fifth Avenue [Forbes Building], Room 300
phone:  998-3497
email:  grishman@cs.nyu.edu
Fall office hours: Mondays, 2:30-3:30 PM
(generally also available Mondays 11:00-1:00 and Thursdays 11:00-12:00; please send an email in advance if possible)

Prerequisites

An introductory class in natural language processing.
Good programming skills.
Knowledge of Java.

Course objectives

We assume that you are familiar with the main language analysis components of NLP. This course is intended to give you hands-on experience in assembling these components into an integrated NLP system. The system we will build will perform knowledge-base construction (KBC): converting an (unstructured) text corpus into a (structured) database.

Approach

We will begin by reading a few recent papers on each component in order to understand the design alternatives. Each student will select one component to implement, delivering a minimal functional version after 3 weeks and a higher-performing one after 6 weeks. These components will initially be connected in a pipeline, most likely with modest performance. The minimal version needs to be in Java; later versions may use deep learning frameworks such as Theano or TensorFlow. We will then consider how to improve performance through domain modeling and joint inference, in particular using Markov Logic Networks. In the final weeks each student will conduct an experiment to improve some aspect of KBC performance through domain modeling, joint inference, or deep learning.

Student obligations

Because much of the work will be presented in class, assignment deadlines will be strictly enforced. There is no final exam.

Textbook

There is no textbook for the course; all readings will be made available online.

As a general reference for most of the topics covered in the course, we recommend Jurafsky and Martin's Speech and Language Processing.

Slides and Notes

Lecture         Slides                                    Notes
1 (Sept. 11)    Preliminaries; JetLite; Tokenizer         JetLite; Tokenizer bibliography
2 (Sept. 18)    Maximum Entropy; Named Entity tagging     NamedEntity references