G22.2590 - Natural Language Processing -- Spring 2005 -- Prof. Grishman

Assignment #6

March 7th, 2005

5 points:
  1. [2 points] Try the noun group / verb group patterns, available as chunkPatterns.txt, on three sentences from a newspaper. How many groups did the patterns get right?  How many did they miss?  How many wrong ones did they identify?
  2. [2 points] Extend the verb group patterns to allow perfect tenses (of the form "have" + past participle). Note that this should include the progressive perfect ("have been eating") and the passive perfect ("has been eaten").  In addition, allow quantifiers ("five assignments") and pre-nominal nouns ("my afternoon tea") in the noun group.  Construct some test sentences to check these patterns.
  3. [1 point] Retry your three sentences and report any improvement in performance.
Due March 28th.

This homework is a first introduction to the Jet pattern language. Patterns are organized into pattern sets;  initially, we will have just one pattern set, chunks, designed to recognize noun groups and verb groups.  These pattern sets are invoked from the .properties (.jet) file by specifying a processing step of the form pat(pattern-set);  in this case, pat(chunks).

Basically, a noun group in English is the portion of a noun phrase up to and including the head noun/pronoun.  The verb group consists of the main verb together with preceding modals and auxiliaries (forms of "have" and "be").
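
To make these two definitions concrete, here is a rough sketch in Python (not in the Jet pattern language). It assumes a simplified, hypothetical tag set (DET, ADJ, NOUN, PRON, MODAL, AUX, VERB) rather than the tags Jet actually assigns, and it is not the contents of chunkPatterns.txt. It handles only the simplest cases: a noun group is an optional determiner, any adjectives, and a head noun (or a lone pronoun); a verb group is an optional modal, any auxiliaries, and the main verb.

```python
import re

# Hypothetical one-letter codes for a simplified tag set. This is an
# assumption for illustration; these are not Jet's categories.
TAG_CODE = {"DET": "d", "ADJ": "a", "NOUN": "n", "PRON": "p",
            "MODAL": "m", "AUX": "x", "VERB": "v"}

# Noun group: optional determiner, adjectives, head noun; or a lone pronoun.
# Verb group: optional modal, any auxiliaries, then the main verb.
CHUNK = re.compile(r"(?P<ng>d?a*n|p)|(?P<vg>m?x*v)")

def chunk(tagged):
    """Return (group type, text) pairs for a list of (word, tag) pairs."""
    # Any tag outside the simplified set becomes "o" and never joins a group.
    codes = "".join(TAG_CODE.get(tag, "o") for _, tag in tagged)
    groups = []
    for m in CHUNK.finditer(codes):
        words = [word for word, _ in tagged[m.start():m.end()]]
        groups.append((m.lastgroup, " ".join(words)))
    return groups

sentence = [("The", "DET"), ("old", "ADJ"), ("dog", "NOUN"),
            ("will", "MODAL"), ("be", "AUX"), ("chasing", "VERB"),
            ("it", "PRON"), ("into", "PREP"), ("the", "DET"),
            ("garden", "NOUN")]
for kind, text in chunk(sentence):
    print(kind, text)
```

Because each tag is mapped to a single character, positions in the code string line up with token positions, so a regular expression match can be turned back into a span of words.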

A properties file (chunk.jet) that runs the noun/verb group chunk patterns is:

# JET properties file
Jet.dataPath           = data
Tags.fileName          = pos_hmm.txt
Pattern.fileName1      = chunkPatterns.txt
processSentence        = tagJet, pat(chunks)

Two simple traces are provided for pattern matching (on the pattern menu).  The Pattern Match Trace prints a message every time a (complete) pattern matches.  If several patterns match, only one will be applied (the one spanning the most tokens;  among those spanning the same tokens, the pattern appearing first in the file);  this is shown by the Pattern Apply Trace.
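
The selection rule described above can be sketched in a few lines of Python. This only illustrates the rule as stated here and is not Jet's implementation: the Match record, its field names, and the pattern names are all hypothetical, and the competing matches are assumed to start at the same token.

```python
from typing import NamedTuple, Optional, Sequence

class Match(NamedTuple):
    """One complete pattern match (hypothetical record, not a Jet class)."""
    name: str        # pattern name
    file_order: int  # position of the pattern in the pattern file, 0 = first
    start: int       # index of the first token matched
    end: int         # index one past the last token matched

def select_match(matches: Sequence[Match]) -> Optional[Match]:
    """Choose the match to apply: most tokens spanned, ties by file order."""
    if not matches:
        return None
    # A longer span sorts first (negated length); earlier patterns win ties.
    return min(matches, key=lambda m: (-(m.end - m.start), m.file_order))

candidates = [
    Match("short-ng", 0, 0, 2),
    Match("long-ng", 1, 0, 3),
    Match("other-ng", 2, 0, 3),
]
chosen = select_match(candidates)
print(chosen.name)
```

In terms of the two traces, every entry in candidates would be reported by the Pattern Match Trace, while only the chosen match would be reported by the Pattern Apply Trace.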