G22.2590 - Natural Language Processing - Spring 2005 Prof. Grishman

Term Project

March 21, 2005

You must submit a “term project” on material connected to the course;  this is worth 30% of your grade.  You have wide latitude in what you do for the project.  It may be a project based on Jet, a separate programming project, or a research paper.  A Jet or programming project must be accompanied by a separate, well-written description of the project;  an analysis of the data and your system’s performance will be an important part of the grade.

Joint projects are permitted.

The general idea is to do something interesting that requires you to confront some of the 'real problems' of doing NLP.

Possible projects

  1. Adaptation of Jet to perform extraction on a new topic.  We will discuss the use of Jet for a specific extraction task ('executive succession') using the pattern matching tools in Jet.  Your project would be modeled on the executive succession patterns, but should be more extensive, and should include some analysis of the performance of the extraction patterns.  You may either try to pull one specific topic from general news, or do a richer analysis of some narrow sublanguage, such as weather forecasts, apartment ads, death notices, cooking recipes, sports results, etc.
  2. Doing similar extraction with your own program (e.g., with Perl).
  3. Preparing a context-free semantic grammar for a sublanguage.
  4. Extension of Jet syntactic patterns to more constructs (e.g., a rich variety of modifiers for noun and verb groups, or additional structures such as clause structures).  This should include some performance analysis.
  5. Experimentation with HMMs to improve performance on POS tagging (e.g., using morphological clues) or name tagging, or to apply them to a different task (e.g., chunking), using Jet tools or your own HMM implementation.
  6. Foreign language analysis:  building a POS tagger or even a chunker for another language.
  7. Implementation of one of the parsing algorithms for feature grammars or grammars with semantic features, either as an extension to the Jet parser or a separate program.
  8. Research report on some topic not covered in the course (e.g., morphological analysis of morphologically rich languages, question answering, summarization, or machine translation methods).  The paper should show some understanding of what problems have and have not been addressed by current technology.
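
Several of the options above ask for an analysis of the performance of extraction patterns.  As a rough illustration only, the sketch below pairs a toy extractor for apartment ads with a precision / recall / F-measure scorer.  The pattern, the slot names (bedrooms, rent), the function names, and the sample ads are all hypothetical and are not part of Jet or of the course materials;  a real project would use far richer patterns and a hand-annotated test corpus.

```python
import re

# Hypothetical pattern for a narrow sublanguage (apartment ads):
# a bedroom count followed, somewhere later in the ad, by a dollar amount.
AD_PATTERN = re.compile(
    r"(?P<bedrooms>\d+)\s*(?:BR|bedroom)s?\b.*?\$(?P<rent>[\d,]+)",
    re.IGNORECASE,
)


def extract(ad):
    """Return the set of (slot, value) pairs found in one ad."""
    match = AD_PATTERN.search(ad)
    if match is None:
        return set()
    return {
        ("bedrooms", match.group("bedrooms")),
        ("rent", match.group("rent").replace(",", "")),
    }


def evaluate(examples):
    """Micro-averaged precision, recall and F1 over (ad, gold_slots) pairs."""
    correct = predicted = expected = 0
    for ad, gold in examples:
        found = extract(ad)
        correct += len(found & gold)
        predicted += len(found)
        expected += len(gold)
    precision = correct / predicted if predicted else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up test ads with hand-written answer keys.
    corpus = [
        ("Sunny 2BR near park, $1,850/mo", {("bedrooms", "2"), ("rent", "1850")}),
        ("Studio, $1,200", {("rent", "1200")}),
    ]
    p, r, f = evaluate(corpus)
    print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

The second ad is missed entirely because the pattern requires a bedroom count;  misses of this kind, and what they say about the sublanguage, are the sort of thing the written analysis should discuss.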

What were the best recent projects?

  question-answering (NL data base interface) system for {stock quotes, train schedules} (from scratch)
  extraction systems for {criminal verdicts, lay-offs, weather reports, apartment ads, car ads, sports game summaries, death notices}, including evaluation (using Jet and using Python)
  noun / verb group patterns, with evaluation on larger corpus (using Jet)
  feature parsers (both on top of Jet and stand-alone)
  name recognizer / linker for Web pages (using HMM)
  literature surveys of {coreference analysis, IE pattern learning}

Due dates

Brief project description – April 4th (1-2 paragraphs, emailed to grishman@cs.nyu.edu)

Project – May 2nd (last class)

   2% penalty for each weekday late

Final exam – May 9th  (open book)