G22.2591 - Advanced Natural Language Processing

Prof. Grishman

Spring 2004

Monday 5-7 PM

course schedule with links to lectures

Over the past decade, the paradigm in natural language processing has shifted from systems based on hand-coded rules to systems trained from text corpora, in most cases corpora that have been hand-annotated with specific linguistic information.  The resulting systems often significantly outperform the earlier systems with hand-coded rules.  These corpus-trained methods will be the focus of this course.

In addition, because of the considerable cost of hand annotation, recent research has studied semi-supervised methods (where only some of the text is annotated) and active learning methods (where the learning method identifies specific portions of the text to be hand annotated).

In many cases, relatively simple models and learning methods will do quite well.  For better system performance, however, it is necessary to understand the limitations of these models and the linguistic features which can lead to better performance.  This course will look at several natural language processing tasks from this point of view, examining the linguistic characteristics which support the creation of effective models, and the learning methods required to train these models.  Among the tasks which may be considered are:

The classes will be a mix of lectures, discussion, and student presentations.  In addition to preparing one or two presentations for the course, students will be expected to run a number of smaller experiments, and one larger experiment as a term project.

We hope to get a variety of students who can contribute their knowledge in such areas as machine learning and linguistics.  All students should have

For further information, contact the instructor at grishman@cs.nyu.edu.