Natural Language Processing

Spring 2013
Prof. Grishman
Tuesdays, 5:00-6:50, 202 Warren Weaver Hall

Course Description

The Web, along with intranets and electronic publication, is making vast amounts of text available on-line. But getting the information we need out of these texts still involves a lot of searching and reading. Web search can at best find relevant documents (along with a lot of irrelevant ones); it doesn't find the facts we need.

This course will consider how methods of natural language processing can be used to bridge this gap: to extract information automatically from text, creating databases from news, scientific publications, and medical records. When coupled with social media, these methods can offer near-real-time event tracking. We will consider several levels of text analysis, including syntactic analysis (grammars and parsing), semantic analysis (word and sentence meaning), and discourse analysis (pronoun resolution and text structure). We will use both systems based on hand-coded rules and those trained automatically from corpora using statistical methods.

During the course you will use and extend JET, a suite of text processing tools coded in Java, building up all the basic components of an information extraction system. There will be 10 small weekly assignments (some 'paper-and-pencil', some running and modifying JET), a term project, and a final exam.

Students should have

A familiarity with the basics of propositional and predicate logic (expressing English sentences in predicate logic) is also helpful but not essential. The course is appropriate for strong senior undergraduate computer science majors as well as graduate students.

CSCI-GA.1180, Mathematical Techniques for Computer Science Applications, while not a prerequisite, does provide a solid background for the statistical methods employed in this course.

Textbook:  Jurafsky and Martin, Speech and Language Processing (Second edition, Prentice Hall)

For further information, you can consult the course pages from 2010, when I last taught the course.  You may also want to look at the pages of our natural language research group, the Proteus Project, and in particular its work on information extraction.

For further information, contact Prof. Grishman.