CSCI_GA.2590 - Natural Language Processing -- Spring 2013 -- Prof. Grishman

Assignment #5

1.  (3 points) Try your chunkPatterns from Assignment #4 on an article from the Washington Square News (WSN). Save the article in the data directory as "article.txt", and use the following properties file ("chunk-art.jet"):

# JET properties file
#   apply chunkPatterns to article.txt
Jet.dataPath         = data
Tags.fileName        = pos_hmm.txt
Pattern.fileName1    = chunkPatterns.txt
JetTest.fileName1    = article.txt
processSentence      = tokenize, tagJet, pat(chunks)
WriteSGML.type       = ngroup

Then run your pattern set on the article, following the instructions for Processing Documents in Jet. With these properties, Jet writes out the document (as file "response-article.txt") with the ngroup annotations marked in an XML notation. Score the resulting annotations against the key file, following the instructions for Using the Jet SGML Scorer.
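To make the idea of scoring a response against a key concrete, here is a minimal Python sketch of exact-match precision, recall, and F-measure over ngroup spans. It is not the Jet SGML Scorer and may count matches differently; the function names (ngroup_spans, score) and the exact-match criterion are assumptions made for illustration. Spans are measured in non-whitespace characters of the tag-free text, so differences in spacing or in other tags between the two files do not affect alignment.

```python
import re

TAG = re.compile(r"<(/?)(\w+)[^>]*>")

def ngroup_spans(sgml: str) -> set:
    """Return the (start, end) span of every <ngroup> element.

    Offsets count non-whitespace characters of the text with all tags
    removed (an assumption of this sketch, not Jet's own convention).
    """
    spans = set()
    open_starts = []   # start offsets of ngroups not yet closed
    offset = 0         # non-whitespace characters seen so far
    pos = 0            # position in sgml just after the previous tag
    for m in TAG.finditer(sgml):
        offset += sum(not c.isspace() for c in sgml[pos:m.start()])
        pos = m.end()
        closing, name = m.group(1), m.group(2)
        if name.lower() != "ngroup":
            continue   # other tags are skipped but contribute no text
        if closing:
            if open_starts:
                spans.add((open_starts.pop(), offset))
        else:
            open_starts.append(offset)
    return spans

def score(key_sgml: str, response_sgml: str):
    """Return (precision, recall, F1) for exact-match ngroup spans."""
    key = ngroup_spans(key_sgml)
    response = ngroup_spans(response_sgml)
    correct = len(key & response)
    precision = correct / len(response) if response else 0.0
    recall = correct / len(key) if key else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    key = "<ngroup>The student</ngroup> read <ngroup>the article</ngroup>."
    response = "<ngroup>The student</ngroup> read the <ngroup>article</ngroup>."
    print(score(key, response))
```

In this toy pair the response gets the first noun group right but drops the determiner from the second, so only one of its two groups matches the key exactly.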

(a) First run using the version of the chunk patterns provided to you.

(b) Then run with the changes to the pattern set which you made for Assignment #4. Guided by the results, try making further changes to improve the score. Such improvements should have some generality: they should not be coded just to handle a specific example in the WSN article. (See the suggestions for Assignment #4 and the Noun Phrase section of the text.)

(c) Finally, to see if your changes are indeed general, process a second WSN article (with its key file) using both the original patterns and your current patterns. This file should be treated as test data -- you should not modify the patterns further after running with this data.

Submit your final pattern set and the scores you obtained for parts (a), (b), and (c).

2. (2 points) Rerun your experiment using the MUC named entity tagger provided with Jet (command "tagNames"). This will require small changes to the properties file and the chunk patterns: your proper-name pattern should now look for the ENAMEX annotations produced by the NE tagger. Report your scores for both the original chunkPatterns (changed only to use the named entity tagger) and the chunkPatterns with your other modifications. Include listings of the pattern files as modified to use the named entity tagger.

Due March 12th.