G22.2591 - Advanced Natural Language Processing - Spring 2011

Lecture 5

Name Recognition ... final words

Stopping criteria for active learners

In active learning, we (presumably) do not face the problem of semantic drift, as long as the informant remains consistent in his/her responses.  But we still need to know when to stop -- when to stop asking the informant questions.  This question was addressed in

Florian Laws and Hinrich Schuetze
Stopping criteria for active learning of named entity recognition.
Coling 2008.

In principle, a probabilistic tagger (HMM, MaxEnt) can estimate the F score of the current model from the probability of the most likely tag for each token.  In practice, such estimates can be quite poor -- not good enough to identify a stopping point.  Instead, the gradient of any of several measures (uncertainty of the last selected instance, probability margin of the top hypothesis) can be used effectively:  we stop when the gradient approaches 0.  Laws and Schuetze call this uncertainty convergence or performance convergence, and it did provide a good stopping point for named entity tagging.
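The gradient-based stopping rule can be sketched as follows.  This is a minimal illustration, not the authors' implementation:  the function name, window size, and threshold are assumptions, and the input is just the per-round uncertainty of the last selected instance.

```python
def stopping_round(uncertainties, window=3, eps=1e-3):
    """Return the first active-learning round at which the gradient of
    the uncertainty curve, estimated over the last `window` rounds,
    falls below `eps` (i.e., the curve has flattened out).
    Returns None if the curve never converges.
    (Window size and threshold are illustrative, not from the paper.)"""
    for t in range(window, len(uncertainties)):
        grad = abs(uncertainties[t] - uncertainties[t - window]) / window
        if grad < eps:
            return t
    return None

# Example: uncertainty drops quickly, then flattens; we stop once
# successive rounds change by (almost) nothing.
rounds = [0.9, 0.7, 0.5, 0.4, 0.35, 0.349, 0.3485, 0.3483]
stop = stopping_round(rounds)
```

The same scheme works for performance convergence: feed it a sequence of estimated F scores instead of uncertainties and stop when the gradient of that curve approaches 0.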

Supplementary material on combining multi-view approaches and active learning.

Hyponymy and Lexico-Syntactic Patterns

Name tagging provides useful but limited information because it is generally based on a limited number of broad categories (person, organization, location, ...).  If we want to know "Who shot JR?" and the system knows that "who" is answered by a person name, then we can look for an answer of the form "<ENAMEX TYPE=PERSON>...</ENAMEX> shot JR".   On the other hand, if the question was posed as "Which rancher shot JR?" we would have to know that a rancher was a kind of person, or have a list of ranchers, in order to answer the question.

The relation between a general class X and a more specific class Y is the hypernym/hyponym relation:  X is a hypernym of Y, Y is a hyponym of X.  Thus rancher is a hyponym of person. 
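Since hyponymy is transitive (rancher is a kind of person, person is a kind of organism, so rancher is a kind of organism), a hyponym resource is naturally stored as a graph and queried by following hypernym links upward.  A toy sketch (the graph and the sense labels like "rancher.1" are invented for illustration):

```python
# Toy hypernym graph over word senses.  Keys are hyponyms, values are
# their (single) hypernym; real resources allow multiple hypernyms.
HYPERNYM = {
    "rancher.1": "person.1",
    "person.1": "organism.1",
    "dallas.1": "city.1",
    "city.1": "location.1",
}

def is_a(hyponym, hypernym):
    """True if `hypernym` is reachable from `hyponym` by following
    hypernym links transitively."""
    while hyponym in HYPERNYM:
        hyponym = HYPERNYM[hyponym]
        if hyponym == hypernym:
            return True
    return False
```

Note that the nodes are word senses, not words; as discussed below, conflating the senses of an ambiguous word would produce false connections in the graph.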

Hyponym relations are valuable for many NLP tasks and applications.  They are useful for answering questions, information extraction, and generally for applying selectional (semantic) constraints.  So they have been a long-term object of study in NLP.

One early approach to acquiring hyponymy was through the use of machine-readable dictionaries.  Dictionary definitions are written in a standard style ("Y (n) 1. an X which ...") that allows the extraction of hyponym relations without full parsing.  Several dictionaries (Longman's, OALD [Oxford Advanced Learner's Dictionary], Merriam-Webster Pocket) were intensively analyzed in the 1970's and 80's for this purpose.  One problem which made the construction of a hyponym graph from the individual relations difficult is the presence of words with multiple senses.  Hyponymy is really a relation between word senses, not words;  lumping the senses together produces a lot of false connections.

This problem was addressed by George Miller and his colleagues in the creation of WordNet starting in the early 1990's.  The basic nodes in the WordNet graph are synsets [synonym sets], which are sets of synonymous word senses.  However, WordNet shares the problems of most large-scale hand-built resources;  it is somewhat inconsistent and incomplete.  It is intended to cover general word usage and so is not adequate for more specialized texts.

To complement (or, for specialized domains, replace) WordNet, interest grew in the 1990's in learning hyponym relations from corpora.  Marti Hearst pointed out that many such relations could be acquired from a few lexico-syntactic patterns (patterns combining specific lexical items and syntactic structures):

Marti Hearst. Automatic acquisition of hyponyms from large text corpora. COLING 1992.

(student presentation)
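One of Hearst's patterns is "X such as Y1, Y2 ... and Yn", which yields hyponym(Yi, X).  A rough word-level approximation of this single pattern as a regular expression over plain text (Hearst's actual patterns operate over NP chunks, not raw words):

```python
import re

# "X such as Y1, Y2 and Y3" -- a crude surface approximation that
# treats single words as the NPs.
SUCH_AS = re.compile(r"(\w+)\s*,?\s+such as\s+([\w\s,]+?)(?:\.|$)")

def hearst_such_as(sentence):
    """Extract (hyponym, hypernym) pairs from one 'such as' instance."""
    pairs = []
    m = SUCH_AS.search(sentence)
    if m:
        hypernym = m.group(1)
        for y in re.split(r",|\band\b|\bor\b", m.group(2)):
            y = y.strip()
            if y:
                pairs.append((y, hypernym))
    return pairs
```

For the sentence "occupations such as rancher, farmer and cowboy." this yields the pairs (rancher, occupations), (farmer, occupations), and (cowboy, occupations).  The simplification to single words is the main weakness of a regex version; multi-word NPs ("oil baron") require at least a chunker.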

Marti Hearst's system was based on manually constructed patterns;  Rion Snow showed how to train a hyponym classifier which learned such patterns from WordNet and a large text corpus.

Rion Snow, Daniel Jurafsky, and Andrew Ng. Learning syntactic patterns for automatic hypernym discovery. NIPS 2004.
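The core idea -- represent each noun pair by the syntactic paths connecting its members across a corpus, and learn from WordNet which paths signal hyponymy -- can be sketched very roughly.  This is not Snow et al.'s model (they use dependency paths and logistic regression); here the "paths" are opaque strings and the weights are simple smoothed log-odds, purely for illustration.

```python
from collections import Counter
import math

def train_path_weights(labeled):
    """labeled: list of (paths_for_pair, is_hypernym_pair) examples,
    where paths_for_pair is the set of connecting paths observed for
    that noun pair.  Returns a weight per path: smoothed log-odds of
    the path occurring with true hypernym pairs."""
    pos, neg = Counter(), Counter()
    for paths, label in labeled:
        for p in paths:
            (pos if label else neg)[p] += 1
    return {p: math.log((pos[p] + 1) / (neg[p] + 1))
            for p in set(pos) | set(neg)}

def score(paths, weights):
    """Higher score = more evidence the pair is hyponym/hypernym."""
    return sum(weights.get(p, 0.0) for p in paths)
```

The point of learning the weights (rather than hand-writing patterns) is that the corpus itself reveals which constructions are reliable hyponymy cues and which are noise.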

powerpoint slides prepared by Ang Sun

Looking ahead:   using lexico-syntactic patterns ...

Marius Pasca and Benjamin Van Durme
Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs