G22.2590 - Natural Language Processing -- Spring 2005 -- Prof. Grishman

Assignment #7

1.  (3 points) Try your chunkPatterns from Assignment #6 on an article from the Washington Square News.  Save this article in the data directory as "article.txt".  Take the pattern set(s) you used for Assignment #6, and change
add [constit type=ngroup]
to
add [ngroup]
so that the noun groups will be a distinguished type of annotation (this is necessary for scoring).  Use the following properties file ("chunk-art.jet"):

# JET properties file
#   apply chunkPatterns to article.txt
Jet.dataPath         = data
Tags.fileName        = pos_hmm.txt
Pattern.fileName1    = chunkPatterns.txt
JetTest.fileName1    = article.txt
processSentence      = tokenize, tagJet, pat(chunks)
WriteSGML.type       = ngroup
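
The change to the pattern set described in question 1 can be made by hand in a text editor. For anyone who keeps several pattern sets, the following is a minimal Python sketch of the same substitution. It is not part of Jet; the function name and the tolerance for extra spaces inside the brackets are assumptions made for illustration, and the output should be checked against your own chunkPatterns file.

```python
import re

# Matches the action "add [constit type=ngroup]", allowing extra spaces
# inside the brackets (an assumption; your file may be written uniformly).
OLD_ACTION = re.compile(r"add\s*\[\s*constit\s+type\s*=\s*ngroup\s*\]")


def mark_ngroups(pattern_text):
    """Return pattern_text with every generic noun-group action
    rewritten as the distinguished action 'add [ngroup]'."""
    return OLD_ACTION.sub("add [ngroup]", pattern_text)


def rewrite_pattern_file(in_path, out_path):
    """Read a pattern file, apply mark_ngroups, and write the result."""
    with open(in_path, encoding="utf-8") as f:
        text = f.read()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(mark_ngroups(text))
```

Writing to a separate output path keeps the Assignment #6 pattern set intact, which is useful later when comparing scores with and without your changes.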

Then run your pattern set on the article, following the instructions for Processing Documents in Jet; the properties file will write out the document (as file "response-article.txt") with the ngroup annotations marked in an XML notation.  Score the resulting annotations against the key file following the instructions for Using the Jet SGML Scorer.
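
To make concrete what a noun group score measures, here is a small Python sketch that compares the ngroup annotations in a response document against those in a key. It is not the Jet SGML Scorer and does not claim to reproduce its matching rules. It assumes that the key and response contain identical text apart from the tags, that noun groups are marked as <ngroup> ... </ngroup>, and that a response noun group counts as correct only if its span matches a key span exactly. All function names are hypothetical.

```python
import re

# Any SGML/XML-style tag; group 1 is the tag name, with a leading "/" for
# closing tags.
TAG = re.compile(r"<(/?\w+)[^>]*>")


def ngroup_spans(marked_up):
    """Return the set of (start, end) offsets of ngroup elements,
    measured in the text with all tags removed."""
    spans = set()
    open_starts = []   # start offsets of ngroups not yet closed
    plain_pos = 0      # current offset in the tag-free text
    last = 0           # end of the previous tag in the marked-up text
    for m in TAG.finditer(marked_up):
        plain_pos += m.start() - last
        last = m.end()
        name = m.group(1).lower()
        if name == "ngroup":
            open_starts.append(plain_pos)
        elif name == "/ngroup" and open_starts:
            spans.add((open_starts.pop(), plain_pos))
    return spans


def score(key, response):
    """Return (precision, recall, f_measure) under exact span matching."""
    key_spans = ngroup_spans(key)
    resp_spans = ngroup_spans(response)
    correct = len(key_spans & resp_spans)
    precision = correct / len(resp_spans) if resp_spans else 0.0
    recall = correct / len(key_spans) if key_spans else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


key = "<ngroup>The student</ngroup> read <ngroup>the article</ngroup>."
response = "<ngroup>The student</ngroup> read the <ngroup>article</ngroup>."
precision, recall, f_measure = score(key, response)
```

In this toy pair, the response finds "The student" correctly but brackets "article" instead of "the article", so one of its two noun groups matches the key. Use the Jet scorer for the numbers you report.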

Compare the scores with and without the change to the pattern set which you made for Assignment #6.  If your change was not very successful in improving the score, try another.

We will give a small extra credit for additional, linguistically motivated enhancements which further improve the noun group score.  Such improvements should have some generality ... not coded just to handle a specific example in the WSN article.

2. (1.5 points)  Rerun your experiment using the MUC named entity tagger provided with Jet (command "tagNames").  This will require small changes to the properties file and chunk patterns.  Your proper-name pattern should now look for ENAMEX annotations produced by the NE tagger.  Report your scores with the (otherwise) original chunkPatterns and the chunkPatterns with your other modifications.

Due April 4th.