G22.2590 - Natural Language Processing - Spring 2008 Prof. Grishman

Term Project

March 13, 2008

You must submit a “term project” on material connected to the course;  this is worth 30% of your grade.  You have wide latitude in what you do for the project.  It may be a project based on Jet, a separate programming project, or a research paper.  A Jet or programming project must be accompanied by a separate, well-written description of the project;  an analysis of the data and your system’s performance will be an important part of the grade.

Joint projects are permitted.

The general idea is to do something interesting which will require you to confront some of the 'real problems' of doing NLP.  Real NLP is hard ... don't be too ambitious, or at least have a fall-back plan if your ambitions are not realized.

Possible projects

  1. We discussed the use of Jet for a specific extraction task -- 'executive succession' -- using the pattern matching tools in Jet.  This project would adapt Jet to perform event extraction (modeled on the executive succession patterns, but more extensive) from news stories.  It should include some analysis of the performance of the extraction patterns.  Among the possible event types we would be interested in: If you are considering one of these, you should begin by marking up a few documents by hand to see what is feasible.  Then you should As an alternative to general news, you could do a richer analysis of some narrow sublanguage within the news, such as weather forecasts, death notices, cooking recipes, sports results, etc.
  2. Doing similar extraction with your own program (e.g., with Perl).
  3. Extract time expressions (including actual dates, "Friday", "last month", "two weeks ago") and normalize them (figure out a date or date range, given the date of an article).
  4. Extension of Jet syntactic patterns to more constructs (e.g., a rich variety of modifiers for noun and verb groups).  This should include some performance analysis.
  5. Building your own HMM and training it to identify names, noun groups, or time expressions.
  6. Feature engineering:  training a maximum entropy tagger to identify names, noun groups, or time expressions, extending the feature set to get good performance.
  7. Implementing Brin's method for finding a relationship from the Web (requires ability to do Web queries automatically).
  8. Preparing a context-free semantic grammar and dictionary for a sublanguage, and testing it on a small sample of the sublanguage.
  9. Foreign language analysis:  building a POS tagger or even a chunker for another language.
  10. Implementation of one of the parsing algorithms for feature grammars or grammars with semantic features, either as an extension to the Jet parser or a separate program.
  11. Research report on some topic not covered in the course (e.g., morphological analysis of morphologically rich languages, question answering, summarization, or machine translation methods).  The paper should show some understanding of what problems have and have not been addressed by current technology.
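
To give a feel for the scale of project 3, here is a minimal Python sketch of time-expression normalization.  It is only an illustration, not a recommended design:  the names (normalize, PATTERN, WEEKDAYS, NUMBER_WORDS) are invented for this example, it covers only the three kinds of expression quoted in item 3, and it assumes that a bare weekday name refers to the most recent such day on or before the article date.  A real project would need far broader coverage and a more careful treatment of ambiguity.

```python
import re
from datetime import date, timedelta

# Hypothetical helper tables for this sketch (not part of Jet or any library).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

PATTERN = re.compile(
    r"\b(?:(?P<weekday>" + "|".join(WEEKDAYS) + r")"
    r"|(?P<last_month>last\s+month)"
    r"|(?P<count>\d+|" + "|".join(NUMBER_WORDS) + r")\s+"
    r"(?P<unit>day|week)s?\s+ago)\b",
    re.IGNORECASE,
)


def normalize(text, article_date):
    """Return (expression, start_date, end_date) for each match in text."""
    results = []
    for m in PATTERN.finditer(text):
        if m.group("weekday"):
            target = WEEKDAYS.index(m.group("weekday").lower())
            # Assumption: a bare weekday means the most recent such day
            # on or before the article date.
            back = (article_date.weekday() - target) % 7
            start = end = article_date - timedelta(days=back)
        elif m.group("last_month"):
            # The calendar month before the month of the article.
            end = article_date.replace(day=1) - timedelta(days=1)
            start = end.replace(day=1)
        else:
            raw = m.group("count").lower()
            n = int(raw) if raw.isdigit() else NUMBER_WORDS[raw]
            days = n * (7 if m.group("unit").lower() == "week" else 1)
            start = end = article_date - timedelta(days=days)
        results.append((m.group(0), start, end))
    return results


if __name__ == "__main__":
    sentence = ("The board met Friday; sales fell last month, "
                "and the CEO resigned two weeks ago.")
    for expr, start, end in normalize(sentence, date(2008, 3, 13)):
        print(expr, start.isoformat(), end.isoformat())
```

For an article dated March 13, 2008 (a Thursday), the sketch maps "Friday" to 2008-03-07, "last month" to the range 2008-02-01 through 2008-02-29, and "two weeks ago" to 2008-02-28.  Whether "Friday" should point backward or forward is exactly the kind of 'real problem' a project would have to confront;  the sketch simply picks one reading.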

What were the best recent projects?

extraction systems for {criminal verdicts, lay-offs, weather reports, sports game summaries, death notices}, including evaluation (using Jet and using Python)
extracting family relationships from the Bible
noun / verb group patterns, with evaluation on a larger corpus (using Jet)
question-answering (NL data base interface) system for {stock quotes, train schedules} (from scratch)
feature parser (on top of Jet -- what we're using this year)
name recognizer / linker for Web pages (using HMM)
literature surveys of {coreference analysis, IE pattern learning}

Due dates

Brief project description – March 30th  -  1-2 paragraphs, email to grishman@cs.nyu.edu

Project – May 1st (last class)

   2% penalty for each weekday late

Final exam – May 8th  (open book)