G22.2591 - Advanced Natural Language Processing
Prof. Grishman
Spring 2004
Monday 5-7 PM
Over the past decade, the paradigm in natural language processing has shifted
from systems based on hand-coded rules to systems which are trained from
text corpora -- in most cases, from corpora which have been hand-annotated
with specific linguistic information. In many cases, the result has
been systems which significantly outperform the earlier systems with hand-coded
rules. These corpus-trained methods will be the focus of this course.
In addition, because of the considerable cost of hand annotation, recent
research has studied semi-supervised methods (where only some of the text
is annotated) and active learning methods (where the learning method identifies
specific portions of the text to be hand annotated).
In many cases, relatively simple models and learning methods will do quite
well. For better system performance, however, it is necessary to understand
the limitations of these models and the linguistic features which can lead
to better performance. This course will look at several natural language
processing tasks from this point of view, examining the linguistic characteristics
which support the creation of effective models, and the learning methods
required to train these models. Among the tasks which may be considered
are:
- part-of-speech tagging
- chunking
- coreference
- sense disambiguation
- information extraction
The classes will be a mix of lectures, discussion, and student presentations.
In addition to preparing one or two presentations for the course, students
will be expected to run a number of smaller experiments and one larger experiment
as a term project.
We hope to get a variety of students who can contribute their knowledge in
such areas as machine learning and linguistics. All students should have:
- some background in natural language processing (if you have not taken
the introductory course, G22.2590,
you will be expected to read material from the Jurafsky and Martin text,
Speech and Language Processing, in preparation for the course)
- adequate programming skills to assemble substantial programs for
class experiments
- sufficient mathematical background to be able to read papers in machine
learning which include arguments regarding statistics and probability
For further information, contact the instructor at grishman@cs.nyu.edu