G22.2591 - Advanced Natural Language Processing - Spring 2011
Name Recognition ... final words
Stopping criteria for active learners
In active learning, we (presumably) do not face the problem of semantic
drift, as long as the informant remains consistent in his/her
responses. But we still need to know when to stop asking the
informant questions. This question was addressed in
Florian Laws and Hinrich Schuetze.
Stopping criteria for active learning of named entity recognition.
In principle, a probabilistic tagger (HMM, MaxEnt) can estimate the F
score of the current model based on the probability of the most likely
tag for each token. However, such estimates can be quite poor --
not good enough to identify a stopping point. Instead, using the
gradient of any of several measures (uncertainty of the last selected
instance, probability margin of the top hypothesis) can be effective:
we stop when the gradient approaches 0. These criteria, termed
uncertainty convergence and performance convergence, provided a good
stopping point for named entity tagging.
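The gradient-based criterion can be sketched in a few lines; the helper
name and smoothing-window scheme below are illustrative choices, not
the exact procedure from the paper:

```python
# Sketch of a gradient-based stopping criterion for active learning.
# uncertainty_history holds, for each round, the uncertainty of the
# instance selected in that round (e.g. 1 - P(most likely tag sequence)).
# We stop when the curve has flattened out, i.e. its gradient is near 0.

def should_stop(uncertainty_history, window=3, epsilon=0.01):
    """Stop when the uncertainty of newly selected instances stops changing."""
    if len(uncertainty_history) < 2 * window:
        return False  # not enough rounds to estimate a gradient
    # Approximate the gradient as the difference between the means of
    # the last two windows of rounds (smoothing out per-round noise).
    prev = sum(uncertainty_history[-2 * window:-window]) / window
    curr = sum(uncertainty_history[-window:]) / window
    return abs(curr - prev) < epsilon

print(should_stop([0.9, 0.6, 0.4]))                                 # too early -> False
print(should_stop([0.9, 0.6, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]))  # flat -> True
```

In practice the window size and threshold would be tuned on held-out
data; too small a window makes the criterion fire on noise.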
Supplementary material on combining multi-view approaches ...
Hyponymy and Lexico-Syntactic Patterns
Name tagging provides useful but limited information because it is
generally based on a limited number of broad categories (person,
organization, location, ...). If we want to know "Who shot JR?"
and the system knows that "who" is answered by a person name, then we
can look for an answer of the form "<ENAMEX
TYPE=PERSON>...</ENAMEX> shot JR". On the other
hand, if the question was posed as "Which rancher shot JR?" we would
have to know that a rancher was a kind of person, or have a list of
ranchers, in order to answer the question.
The relation between a general class X and a more specific class Y is
the hypernym/hyponym relation: X is a hypernym of Y, Y is a
hyponym of X. Thus rancher is a hyponym of person.
Hyponym relations are valuable for many NLP tasks and
applications. They are useful for answering questions,
information extraction, and generally for applying selectional
(semantic) constraints. So they have been a long-term object of
study in NLP.
One early approach to acquiring hyponymy was through the use of
machine-readable dictionaries. Dictionary definitions are written
in a standard style ("Y (n) 1. an X which ...") that allows the
extraction of hyponym relations without full parsing. Several
dictionaries (Longman's, OALD [Oxford Advanced Learner's Dictionary],
Merriam-Webster Pocket) were intensively analyzed in the 1970's and
80's for this purpose. One problem which complicated the
construction of a hyponym graph from the individual relations is the
presence of words with multiple senses. Hyponymy is really a
relation between word senses, not words; lumping the senses together
produces many spurious hypernym chains.
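The standard definition style makes this extraction possible with a
simple pattern; a minimal sketch (the entry format and the placement
of the genus term are simplifying assumptions, not the format of any
particular dictionary):

```python
import re

# Sketch: extract a hypernym from a dictionary-style definition of the
# form "rancher (n) 1. a person who owns or runs a ranch".  The genus
# term after the article ("person") is taken as the hypernym.
DEF_PATTERN = re.compile(
    r"^(?P<word>\w+)\s+\(n\)\s+\d+\.\s+(?:a|an|the)\s+(?P<genus>\w+)")

def extract_hypernym(definition):
    """Return (hyponym, hypernym) from one definition line, or None."""
    m = DEF_PATTERN.match(definition)
    if m:
        return (m.group("word"), m.group("genus"))
    return None

print(extract_hypernym("rancher (n) 1. a person who owns or runs a ranch"))
# -> ('rancher', 'person')
```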
This problem was addressed by George Miller and his colleagues in the
creation of WordNet starting in the early 1990's. The basic nodes
in the WordNet graph are synsets
[synonym sets], which are sets of synonymous word senses.
However, WordNet shares the problems of most large-scale hand-built
resources; it is somewhat inconsistent and incomplete. It
is intended to cover general word usage and so is not adequate for more
specialized domains.
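The synset idea can be sketched with a toy graph; all the data below is
illustrative, not actual WordNet content (in practice WordNet is
accessed through an interface such as NLTK's):

```python
# Toy sketch of WordNet-style structure: nodes are synsets (sets of
# synonymous word senses), and hypernym links connect synsets, not
# words -- so the two senses of "bank" get separate hypernym chains.

synsets = {
    "bank.n.01": {"words": {"bank"}, "hypernym": "institution.n.01"},
    "bank.n.02": {"words": {"bank", "riverbank"}, "hypernym": "slope.n.01"},
    "institution.n.01": {"words": {"institution"}, "hypernym": None},
    "slope.n.01": {"words": {"slope", "incline"}, "hypernym": None},
}

def hypernym_chain(synset_id):
    """Follow hypernym links from a synset up to a root."""
    chain = []
    while synset_id is not None:
        chain.append(synset_id)
        synset_id = synsets[synset_id]["hypernym"]
    return chain

print(hypernym_chain("bank.n.01"))  # ['bank.n.01', 'institution.n.01']
print(hypernym_chain("bank.n.02"))  # ['bank.n.02', 'slope.n.01']
```

Because the nodes are senses, the graph never conflates the financial
and riverside senses of "bank" -- the problem the dictionary-based
work ran into.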
To complement (or, for specialized domains, replace) WordNet, interest
grew in the 1990's in learning hyponym relations from corpora.
Marti Hearst pointed out that many such relations could be acquired
from a few lexico-syntactic patterns
(patterns combining specific lexical items and syntactic structures),
such as
  X such as Y
  such X as Y
  Y and other X
  Y or other X
  X, including Y
  X, especially Y
where X is the hypernym and Y the hyponym.
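One of Hearst's original patterns, "X such as Y" (where X is the
hypernym), can be matched with a simple regular expression; a minimal
sketch (real systems match syntactic structure, not raw strings):

```python
import re

# Minimal sketch of hyponym extraction with one Hearst-style
# lexico-syntactic pattern: "X such as Y"  =>  hypernym(X, Y).
# Matching raw text with a regex is a simplification; a real system
# would require X and Y to be noun phrases.
SUCH_AS = re.compile(r"(\w+)\s+such\s+as\s+(\w+)")

def extract_pairs(text):
    """Return (hypernym, hyponym) pairs found by the pattern."""
    return [(m.group(1), m.group(2)) for m in SUCH_AS.finditer(text)]

text = "He interviewed ranchers such as Ewing and authors such as Hearst."
print(extract_pairs(text))
# -> [('ranchers', 'Ewing'), ('authors', 'Hearst')]
```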
Marti Hearst's system was based on manually constructed patterns;
Rion Snow showed how to train a hyponym classifier which learned such
patterns from WordNet and a large text corpus.
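The idea can be sketched as follows: represent each noun pair by counts
of the patterns connecting its occurrences in the corpus, label
training pairs as hypernym / non-hypernym using a known resource
(WordNet in Snow's work), and train a classifier on those features.
Snow et al. used dependency-path features and a probabilistic
classifier; the toy data and simple perceptron below are stand-ins:

```python
# Sketch of Snow-style hypernym classification: each noun pair is a
# vector of pattern counts; labels come from a known resource.
# Toy data and a perceptron stand in for the real dependency-path
# features and classifier.

patterns = ["X such as Y", "Y and other X", "X is a Y"]

# pattern-count dict for a noun pair -> label (1 = hypernym pair)
training = [
    ({"X such as Y": 3, "Y and other X": 1}, 1),
    ({"X such as Y": 2, "Y and other X": 2}, 1),
    ({}, 0),
    ({"X is a Y": 1}, 0),
]

def featurize(counts):
    return [counts.get(p, 0) for p in patterns]

def train_perceptron(data, epochs=10):
    w, b = [0.0] * len(patterns), 0.0
    for _ in range(epochs):
        for counts, label in data:
            x = featurize(counts)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != label:  # standard perceptron update
                for i in range(len(w)):
                    w[i] += (label - pred) * x[i]
                b += (label - pred)
    return w, b

w, b = train_perceptron(training)

def is_hypernym_pair(counts):
    x = featurize(counts)
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0

print(is_hypernym_pair({"X such as Y": 4}))  # strong pattern evidence -> True
```

The payoff over hand-written patterns is that the classifier can weight
thousands of automatically discovered patterns, including weak ones no
one would write by hand.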
Rion Snow, Daniel Jurafsky, and Andrew Ng.
Learning syntactic patterns for automatic hypernym discovery.
NIPS 2004.
Marius Pasca and Benjamin Van Durme
Weakly-Supervised Acquisition of Open-Domain Classes and Class
Attributes from Web Documents and Query Logs.