G22.2590 - Natural Language Processing - Spring 2003 Prof. Grishman

Lecture 13 Outline

April 24, 2003

Assignment 9 and pattern learning:  people were able to get most examples to work (except those with a name recognition error), but developing all these patterns by hand is time-consuming.  Many extraction systems now learn patterns (or probabilistic extraction models) from annotated text, or in some cases even from unannotated text.  Such learning methods will be a prime focus of the Advanced NLP class next Spring (2004).

Name recognition:  An improved Jet HMM model for names is available as name_hmm_2.  This may be helpful for people using a name recognizer as part of their term project.  It improves tagging accuracy for news text from about 75% to about 82%, primarily by using a larger training corpus (300 articles instead of 100), and also through small changes to the HMM model.  Further improvement (to the low 90% range) would require a more elaborate model than Jet currently provides (one which uses more features and more context in computing transition and emission probabilities).
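
To make the role of transition and emission probabilities concrete, here is a minimal sketch of Viterbi decoding for a two-state name tagger.  It is not Jet's model:  the states, the hand-picked probabilities, and the single capitalization feature are all assumptions chosen for illustration.  A more elaborate model would condition emit() on more features and more context.

```python
import math

# Hypothetical toy model: two states with hand-picked probabilities (not Jet's).
STATES = ["NAME", "OTHER"]
START = {"NAME": 0.2, "OTHER": 0.8}
TRANS = {
    "NAME": {"NAME": 0.5, "OTHER": 0.5},
    "OTHER": {"NAME": 0.2, "OTHER": 0.8},
}


def emit(state, token):
    """Toy emission probability based on one feature: capitalization."""
    capitalized = token[:1].isupper()
    if state == "NAME":
        return 0.9 if capitalized else 0.1
    return 0.2 if capitalized else 0.8


def viterbi(tokens):
    """Return the most probable state sequence for tokens (log-space)."""
    if not tokens:
        return []
    # Scores for the first token: start probability times emission.
    scores = [{s: math.log(START[s]) + math.log(emit(s, tokens[0]))
               for s in STATES}]
    back = []  # back[i][s] = best predecessor of state s at token i + 1
    for tok in tokens[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = (prev[best] + math.log(TRANS[best][s])
                      + math.log(emit(s, tok)))
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


print(viterbi("yesterday Fred Smith resigned".split()))
# prints ['OTHER', 'NAME', 'NAME', 'OTHER']
```

With only a capitalization feature, this toy tagger would also mark a capitalized sentence-initial word as a name, which is one reason richer features and context matter.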

Discourse.  Until now we have considered the structure and meaning of sentences in isolation.  We now turn to issues primarily connected with multi-sentence text -- discourse.

Reference Resolution (J&M 18.1)


Types of referring expressions


Resolving pronoun reference

Resolving other referring expressions

Anaphora resolution in Jet

Using anaphora resolution for extraction:  an example

In many cases, we want to be able to retrieve an argument from context when it is not part of the immediate syntactic structure.  A simple way of doing this is to generate a zero anaphor (an ngroup constituent not spanning any text) and then let reference resolution map it to an entity.  We have created a version of the AppointPatterns which uses this method to collect organization names and, in some cases, person names.
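
The zero-anaphor idea can be sketched in a few lines.  This is an illustration only, not Jet's implementation:  the Entity and Mention classes, the helper names, and the "nearest preceding mention of the same type" resolution rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    name: str
    kind: str  # e.g. "organization" or "person"


@dataclass
class Mention:
    start: int  # character offset where the mention begins
    end: int    # offset where it ends; start == end for a zero anaphor
    kind: str
    entity: Optional[Entity] = None


def mention_of(text: str, phrase: str, kind: str) -> Mention:
    """Build an already-resolved mention for the first occurrence of phrase."""
    start = text.index(phrase)
    return Mention(start, start + len(phrase), kind, Entity(phrase, kind))


def zero_anaphor(position: int, kind: str) -> Mention:
    """Create a mention that spans no text at the given position."""
    return Mention(position, position, kind)


def resolve(mention: Mention, earlier: List[Mention]) -> Optional[Entity]:
    """Link mention to the nearest preceding resolved mention of the same kind."""
    candidates = [m for m in earlier
                  if m.entity is not None
                  and m.kind == mention.kind
                  and m.end <= mention.start]
    if candidates:
        mention.entity = max(candidates, key=lambda m: m.end).entity
    return mention.entity


text = "IBM announced changes. Fred Smith was named president."
ibm = mention_of(text, "IBM", "organization")
smith = mention_of(text, "Fred Smith", "person")

# The second sentence names no organization, so the pattern supplies a
# zero anaphor after "president" and lets resolution fill it in.
gap = zero_anaphor(text.index("president") + len("president"), "organization")
print(resolve(gap, [ibm, smith]).name)
# prints IBM
```

The point of the design is that the extraction pattern does not need its own search through context:  it only marks where an argument is missing and what type it should have, and the general reference resolution step does the rest.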

Discourse Analysis:  Analyzing Text Coherence  (J&M 18.2)

Why are we interested in analyzing the structure of a discourse beyond the sentence level?  How can we analyze text coherence?