CSCI-GA.2590 - Natural Language Processing -- Spring 2013 -- Prof. Grishman

Assignment #4

February 26, 2013

5 points:
  1. [2 points] Try the noun group / verb group patterns, available as chunkPatterns.txt, on four sentences from a newspaper. How many groups did it get right? How many did it miss? How many wrong ones did it identify? (These patterns aim to tag tensed verb groups and base form verbs (infinitives). They are not intended to mark present or past participles appearing by themselves.)
  2. [2 points] Extend the patterns so that verb groups can include perfect tenses (of the form "have" + past participle). This should include the progressive perfect ("have been eating") and the passive perfect ("has been eaten"); the overall pattern of English tenses is shown below. In addition, allow quantifiers ("five assignments") and pre-nominal nouns ("my afternoon tea") in the noun group. In extending the noun group, think about which modifiers can occur together in a single noun group and the order in which they can occur, and allow for these combinations in your rules. Construct some test sentences to check these patterns.
  3. [1 point] Retry your four sentences and report any improvement in performance.
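Items 1 and 3 only require a hand count, but the bookkeeping can be sketched as follows. The sketch assumes each group is written down as a (label, start, end) token span with an exclusive end; this representation and the function score_chunks are hypothetical illustrations, not part of Jet.

```python
# Each chunk is a (label, start, end) token span, end exclusive.
# This representation is hypothetical, chosen for illustration; Jet has its own.
Chunk = tuple[str, int, int]


def score_chunks(gold: list[Chunk], predicted: list[Chunk]) -> tuple[int, int, int]:
    """Return (right, missed, wrong) using exact label-and-span matching."""
    gold_set, pred_set = set(gold), set(predicted)
    right = len(gold_set & pred_set)   # groups found with exactly the right extent
    missed = len(gold_set - pred_set)  # hand-marked groups the patterns did not produce
    wrong = len(pred_set - gold_set)   # groups the patterns produced that are not correct
    return right, missed, wrong


if __name__ == "__main__":
    # "The old man has been eating five apples"
    #   0   1   2   3    4     5     6     7
    gold = [("ng", 0, 3), ("vg", 3, 6), ("ng", 6, 8)]
    # Hypothetical chunker output that finds only part of the verb group
    # and leaves the quantifier out of the last noun group.
    predicted = [("ng", 0, 3), ("vg", 4, 6), ("ng", 7, 8)]
    print(score_chunks(gold, predicted))  # (1, 2, 2)
```

With exact matching, a partially correct group counts as one miss plus one wrong group; a more lenient criterion (for example, counting overlaps) is equally defensible as long as it is stated in the report.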
A noun group consists of a head noun and the modifiers to its left, including:
A verb group consists of a main verb and any modals and auxiliaries that precede it. Here are the basic tense forms of English, including the more obscure forms (in parentheses):

tense                         active                  passive
simple present tense          eats                    is eaten
simple past tense             ate                     was eaten
simple future tense           will eat                will be eaten
present perfect               has eaten               has been eaten
past perfect                  had eaten               had been eaten
future perfect                will have eaten         will have been eaten
present progressive           is eating               is being eaten
past progressive              was eating              was being eaten
future progressive            will be eating          (will be being eaten)
present progressive perfect   has been eating         (has been being eaten)
past progressive perfect      had been eating         (had been being eaten)
future progressive perfect    will have been eating   (will have been being eaten)
"will" acts as a modal verb, and other modals ("can", "may", ...) can also appear in that position ("may eat", "may be eating", "may have been eating", etc.). Verb groups are sometimes extended to handle embedded negation and adverbials ("has not eaten", "has rarely eaten"), but that is not required here.
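The tense table follows a regular rule: the auxiliaries always appear in the order modal, perfect "have", progressive "be", passive "be", and each one fixes the form of the verb after it (a modal takes the base form, "have" a past participle, progressive "be" a present participle, passive "be" a past participle). The sketch below regenerates the third-person-singular rows from that rule; the FORMS table and the function verb_group are hypothetical illustrations, not part of Jet or chunkPatterns.txt.

```python
# Hypothetical table of inflected forms (third person singular) for the verbs needed here.
FORMS = {
    "eat": {"base": "eat", "pres": "eats", "past": "ate", "ing": "eating", "en": "eaten"},
    "have": {"base": "have", "pres": "has", "past": "had", "ing": "having", "en": "had"},
    "be": {"base": "be", "pres": "is", "past": "was", "ing": "being", "en": "been"},
}


def verb_group(main, tense="pres", modal=None, perfect=False,
               progressive=False, passive=False):
    """Build a verb group; `tense` ("pres" or "past") is ignored when a modal is given."""
    words = []
    if modal:
        # A modal occupies the tensed position; the next verb is in base form.
        words.append(modal)
        next_form = "base"
    else:
        next_form = tense
    # Auxiliaries in their fixed order, each paired with the form it imposes
    # on the verb that follows it.
    auxiliaries = []
    if perfect:
        auxiliaries.append(("have", "en"))   # have + past participle
    if progressive:
        auxiliaries.append(("be", "ing"))    # be + present participle
    if passive:
        auxiliaries.append(("be", "en"))     # be + past participle
    for lemma, imposed in auxiliaries:
        words.append(FORMS[lemma][next_form])
        next_form = imposed
    words.append(FORMS[main][next_form])
    return " ".join(words)


if __name__ == "__main__":
    # Active and passive forms for every combination, with "will" and "may" as modals.
    for modal, tense in ((None, "pres"), (None, "past"), ("will", "pres"), ("may", "pres")):
        for perfect in (False, True):
            for progressive in (False, True):
                active = verb_group("eat", tense, modal, perfect, progressive)
                passive = verb_group("eat", tense, modal, perfect, progressive, passive=True)
                print(f"{active:24}{passive}")
```

Because each optional auxiliary is independent, the four yes/no choices (modal, perfect, progressive, passive) give the sixteen active/passive combinations per tense position, which is a useful checklist when extending the verb group patterns.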

Due March 5th. Please use the subject line "NLP - Assignment #4".

This homework is a first introduction to the Jet pattern language. Patterns are organized into pattern sets; initially, we will have just one pattern set, chunks, designed to recognize noun groups and verb groups. These pattern sets are invoked from the .properties (.jet) file by specifying a processing step of the form pat(pattern-set); in this case, pat(chunks).

A properties file (chunk.jet) to run the noun/verb group chunk patterns is:

# JET properties file
Jet.dataPath           = data
Tags.fileName          = pos_hmm.txt
Pattern.fileName1      = chunkPatterns.txt
processSentence        = tagJet, pat(chunks)

Note that tagJet assigns Jet part-of-speech tags, as listed in the Jet documentation. These are the tags that must be used in the patterns.
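Once tokens carry tags, the verb-group half of the task amounts to recognizing the sequences in the tense table. The sketch below is a plain-Python stand-in for what the patterns must express, not Jet pattern syntax. It assumes Penn Treebank-style tags (MD, VB, VBZ, VBD, VBP, VBN, VBG), which should be checked against the tag list in the Jet documentation; the names match_verb_group and find_verb_groups are hypothetical. Like the supplied patterns, it marks tensed verb groups and base-form verbs but not bare participles, and it ignores embedded negation and adverbs.

```python
# Tag names assume the Penn Treebank tag set; adjust to the tags Jet actually assigns.
BE = {"be", "am", "is", "are", "was", "were", "been", "being"}
HAVE = {"have", "has", "had", "having"}
FIRST_VERB_TAGS = {"VB", "VBZ", "VBD", "VBP"}  # tensed or base form, not bare participles

# Auxiliary slots in their fixed order: (words that fill the slot, tags allowed on the next verb).
SLOTS = [(HAVE, {"VBN"}), (BE, {"VBG"}), (BE, {"VBN"})]


def match_verb_group(tokens, start):
    """Return the end (exclusive) of the longest verb group starting at tokens[start], or None."""
    best = None

    def extend(i, allowed, slot):
        nonlocal best
        if i >= len(tokens) or tokens[i][1] not in allowed:
            return
        # The token at i can end the group as its main verb ...
        best = max(best or 0, i + 1)
        # ... or serve as an auxiliary in a later slot, constraining the next token.
        for k in range(slot, len(SLOTS)):
            words, next_allowed = SLOTS[k]
            if tokens[i][0].lower() in words:
                extend(i + 1, next_allowed, k + 1)

    if tokens[start][1] == "MD":
        extend(start + 1, {"VB"}, 0)  # a modal is followed by a base-form verb
    else:
        extend(start, FIRST_VERB_TAGS, 0)
    return best


def find_verb_groups(tokens):
    """Scan left to right, taking the longest verb group at each position."""
    groups, i = [], 0
    while i < len(tokens):
        end = match_verb_group(tokens, i)
        if end is None:
            i += 1
        else:
            groups.append((i, end))
            i = end
    return groups


if __name__ == "__main__":
    sentence = [("The", "DT"), ("students", "NNS"), ("may", "MD"), ("have", "VB"),
                ("been", "VBN"), ("eating", "VBG"), ("lunch", "NN")]
    print(find_verb_groups(sentence))  # [(2, 6)]
```

The recursion tries both readings of an auxiliary-looking word, so "has" in "He has a car" is still accepted as a one-word verb group when no participle follows it.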

Two simple traces are provided for pattern matching (on the pattern menu). The Pattern Match Trace prints a message every time a (complete) pattern matches. If several patterns match, only one is applied: the one spanning the most tokens or, among those spanning the same tokens, the one appearing first in the file. The Pattern Apply Trace shows which pattern was applied.
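The selection rule above reduces to a single comparison. This sketch only illustrates the stated rule and is not Jet's implementation; the Match record and select_match function are hypothetical, and it assumes all candidate matches begin at the same token.

```python
from typing import NamedTuple, Optional, Sequence


class Match(NamedTuple):
    pattern_index: int  # position of the pattern in the pattern file (0 = first)
    start: int          # index of the first token matched
    end: int            # index one past the last token matched


def select_match(matches: Sequence[Match]) -> Optional[Match]:
    """Choose the match spanning the most tokens; break ties by earliest pattern in the file."""
    if not matches:
        return None
    # Key: longer spans compare smaller (negated length), then the lower pattern index wins.
    return min(matches, key=lambda m: (-(m.end - m.start), m.pattern_index))


if __name__ == "__main__":
    candidates = [Match(0, 3, 4), Match(2, 3, 6), Match(1, 3, 6)]
    print(select_match(candidates))  # Match(pattern_index=1, start=3, end=6)
```

Here patterns 1 and 2 both cover tokens 3 to 6, so pattern 1, which appears earlier in the file, is chosen over pattern 2 and over the shorter match from pattern 0.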