G22.2591 - Advanced Natural Language Processing - Spring 2004
(Discuss term project topics)
Coreference is the task of determining whether two phrases refer to the
same 'entity'. We will consider it synonymous with anaphora resolution -- when a
linguistic unit (the anaphor)
refers back to a prior linguistic unit (the antecedent) in the discourse.
[However, it is possible to distinguish these phenomena ... if there
are two mentions of "the FBI" in a discourse, they are definitely
coreferential, but one may not consider the second mention an anaphoric
reference to the first.]
Defining coreference as a formal task (so that it can be evaluated) is
not trivial ... there are many possible variations on the basic task.
One such task definition was provided by the MUC-6 coreference task specification; its main choices are listed below.
There is generally good agreement between people about coreference
relations, although decisions can be tricky in some cases of vague
referents. ("The New York
police apprehended twenty criminals last week. A police raid made the newspapers.").
- only consider references by noun phrases to other noun
phrases; a full consideration of coreference would also have to
include references to events described by entire sentences ("Prof.
Grishman talked for seven hours non-stop. This performance was not
appreciated by the students, who wanted to go home.")
- only consider identity relations, not (for example) part-whole
relations, sometimes called bridging
references ("He entered the
classroom. The lights were
off, and the floor was wet.")
- exclude split antecedents,
such as "Fred met Mary after class. They went out for a cream soda."
- include predicate nominals ("Fred
is vice president.") and
apposition ("Fred, vice president of
Ford, resigned last week.").
Anaphora can be divided into three types: pronouns, nominals
(phrases headed by common nouns), and names. Linguistic studies
have mostly looked at pronouns; some computational treatments
have also been limited to pronouns. Generally, system performance is
very good for names,
moderate for pronouns, and poorest for nominals.
In principle, we would like to use much richer criteria of semantic
compatibility and coherence in selecting antecedents, but (except for
very restricted domains) this is currently beyond the state of the art.
The main factors considered in resolving each type of potential
anaphor are different:
- The first task in pronoun
resolution is determining whether a pronoun is pleonastic (non-referential), as in
"It is raining." or "It is remarkable that your dog ate
you homework again." Fortunately, this can generally be
determined based on syntactic context. If it is referential, the
factors involved in identifying an antecedent include its position
relative to the anaphor, discourse focus, gender and number agreement
and selectional compatibility (whether the antecedent is selectionally
compatible with the context of the anaphor).
- For nominals, systems look for a 'compatible' antecedent.
In the simplest case, this is an antecedent with the same head and all
the (non-restrictive) modifiers present in the antecedent.
Frequently, however, the anaphor has a different head or adds modifiers
not present in the antecedent. No systems do very well at
handling such cases.
- For names, anaphora resolution can usually be done by substring
match ("Fred Smith ... Mr. Smith").
Unlike all the tasks we considered until now (part-of-speech tagging,
chunking, name tagging, word sense disambiguation), coreference cannot
be readily reduced to a tagging task, because it involves a relation
between mentions (in effect, a gathering of mentions into
co-referential clusters or equivalence classes). This has led to
complications in devising an appropriate scoring metric.
It's not sufficient to ask whether a pronoun refers to the correct
antecedent, because there may be several possible correct
antecedents. In addition, we need to be concerned with how
severely we count particular errors: if there are 20 mentions
(referring noun phrases) of an entity, and a system reports the answer
as two sets of 10, have we gotten 10 mentions wrong, or just 1 of 19
links? This can have a large effect on scores.
MUC-6/7 adopted a scoring metric which essentially counts links, and
asks how many coreference links would have to be added to get the
correct cluster. Consider one coreferential cluster consisting of
mentions m1, ..., mN. Suppose in the system
response these mentions are divided among K clusters. We then
define recall as (N-K)/(N-1). Total recall is obtained by summing
these quantities (numerator and denominator) over all clusters in the
key. Precision is computed by reversing key and response.
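The link-based recall just described can be sketched directly; the cluster representation (lists of sets of mention ids) and the treatment of mentions missing from the response as singletons are assumptions of this sketch:

```python
# Sketch of the MUC-6 link-based recall: each key cluster of N mentions,
# split across K response clusters, contributes (N-K)/(N-1); precision
# is obtained by swapping key and response.
def muc_recall(key: list[set], response: list[set]) -> float:
    num = den = 0
    for cluster in key:
        # K = number of response clusters this key cluster is split
        # across; mentions absent from the response count as singletons.
        parts = {i for m in cluster
                 for i, r in enumerate(response) if m in r}
        missing = sum(1 for m in cluster
                      if not any(m in r for r in response))
        k = len(parts) + missing
        # Note: singleton key clusters add 0 to both numerator and
        # denominator, so they earn no credit under this metric.
        num += len(cluster) - k
        den += len(cluster) - 1
    return num / den if den else 0.0

key = [{"m1", "m2", "m3", "m4"}]          # one key cluster of 4 mentions
response = [{"m1", "m2"}, {"m3", "m4"}]   # system splits it in two
print(muc_recall(key, response))          # (4-2)/(4-1) = 0.666...
```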
This metric is now generally used for reporting coreference scores, but
it has some shortcomings. In particular, if there are lots of
singletons, you don't get any credit for getting them right. Baldwin
et al. discuss Vilain's metric and their alternative B-cubed
coreference metric in their MUC-7 paper.
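To see why singletons matter, here is a hedged sketch of B-cubed recall: each mention is scored individually, so a correctly isolated singleton does earn credit. The exact formulation in the MUC-7 paper may differ in detail; this is only the core idea.

```python
# Per-mention B-cubed recall sketch: for each mention, take the overlap
# of its key cluster and its response cluster, divided by the key
# cluster's size, then average. Precision divides by the response
# cluster's size instead.
def b_cubed_recall(key: list[set], response: list[set]) -> float:
    scores = []
    for k_cluster in key:
        for m in k_cluster:
            # mentions absent from the response are treated as
            # singletons (an assumption of this sketch)
            r_cluster = next((r for r in response if m in r), {m})
            scores.append(len(k_cluster & r_cluster) / len(k_cluster))
    return sum(scores) / len(scores)

key = [{"a"}, {"b", "c"}]
response = [{"a"}, {"b"}, {"c"}]          # singleton "a" is correct
print(b_cubed_recall(key, response))      # (1 + 1/2 + 1/2) / 3 = 0.666...
```

Under the link-based MUC metric the same response would get no credit for the singleton "a"; here it contributes a full point.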
Non-trainable Coreference Methods: Major References
Jerry Hobbs. Resolving
Pronoun References. Lingua 44:311-338 (1978).
Proposes a search order (through the
parse trees of current and prior sentences) for pronouns; widely
used for pronominal anaphora resolution.
Shalom Lappin; Herbert Leass. An Algorithm for
Pronominal Anaphora Resolution. Computational Linguistics 20(4):535-561 (1994).
Uses various weights (hand set) to
select antecedents for pronouns. Tested on corpus of several
hundred pronouns ... got 86% accuracy.
Renata Vieira; Massimo Poesio. An Empirically
Based System for Processing Definite Descriptions. Computational Linguistics 26(4):539-593 (2000).
Developed detailed rules for resolving
definite nominal anaphors, with a detailed corpus-based evaluation.
Trainable Coreference Methods: An Annotated Bibliography
Niyu Ge; John Hale; Eugene Charniak. A Statistical
Approach to Anaphora Resolution. WVLC 1998. **
Does only pronoun resolution.
Reports 84% accuracy on their corpus (non-referential pronouns
excluded). Probabilistic model with features including Hobbs
distance, number agreement, gender match, mention frequency, and
selectional preference. Discusses methods of learning
gender/animateness characteristics of potential antecedents.
Wee Meng Soon; Daniel Chung Yong Lim; Hwee Tou Ng. A Machine
Learning Approach to Coreference Resolution of Noun Phrases.
Computational Linguistics 27 #4, 521-545 (2001). **
Corpus-trained procedure for full (MUC)
coreference task. Used a set of syntactic features plus a
semantic class agreement feature (using high-level Wordnet
classes); 12 features in all. Trained on MUC data with
decision tree procedure (classifier produces binary outcome). On
test, links to most recent antecedent for which classifier returns
true. Got MUC-6, 7 F=63%, 60%.
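The linking step used by Soon et al. (take the most recent antecedent the binary classifier accepts) can be sketched in a few lines; the same-head classifier below is a toy stand-in for their trained decision tree, not their actual feature set:

```python
# Closest-first linking: scan candidate antecedents from nearest to
# farthest and link to the first one the binary classifier accepts.
from typing import Callable, Optional

def closest_first_link(anaphor: str,
                       candidates: list[str],   # ordered nearest-first
                       is_coreferent: Callable[[str, str], bool]
                       ) -> Optional[str]:
    for antecedent in candidates:
        if is_coreferent(anaphor, antecedent):
            return antecedent
    return None   # anaphor starts a new entity

# toy classifier (assumption): link mentions sharing a head word
def same_head(a: str, b: str) -> bool:
    return a.split()[-1].lower() == b.split()[-1].lower()

print(closest_first_link("the company",
                         ["the venture", "a software company"],
                         same_head))   # a software company
```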
Vincent Ng; Claire Cardie. Improving Machine
Learning Approaches to Coreference Resolution. ACL 2002. **
Tries to improve on work of Soon et
al. Non-linguistic improvements include richer string-match
features, different selection of positive training examples, and
selecting best-scoring antecedent (rather than closest acceptable
antecedent). Together yield 3%
improvement. Tries very rich set of features -- net loss using
decision tree. Prunes the features, gets some gain on MUC-6 (not
MUC-7). Final MUC F= 69%, 63%.
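Ng and Cardie's switch from the closest acceptable antecedent to the best-scoring one can be contrasted with the previous sketch in a few lines; the overlap scorer and the 0.5 threshold are toy assumptions, not their classifier:

```python
# Best-first linking: score every candidate antecedent and take the
# argmax, linking only if it clears a threshold.
from typing import Callable, Optional

def best_first_link(anaphor: str,
                    candidates: list[str],
                    score: Callable[[str, str], float],
                    threshold: float = 0.5) -> Optional[str]:
    scored = [(score(anaphor, c), c) for c in candidates]
    best = max(scored, default=(0.0, None))
    return best[1] if best[0] > threshold else None

# toy scorer (assumption): fraction of the anaphor's tokens shared
# with the candidate
def overlap(anaphor: str, candidate: str) -> float:
    a_toks = set(anaphor.lower().split())
    return len(a_toks & set(candidate.lower().split())) / len(a_toks)

print(best_first_link("the software company",
                      ["the venture", "a software company"],
                      overlap))   # a software company
```

A closest-first linker would stop at the first acceptable candidate even when a later one matches better; best-first considers all of them.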
Vincent Ng and Claire Cardie. Identifying
Anaphoric and Non-Anaphoric Noun Phrases to Improve Coreference
Resolution. COLING 2002.
Introduces a separate (corpus-trained)
test to distinguish anaphoric and non-anaphoric NPs, using
domain-independent criteria (in contrast to Bean and Riloff).
Finds that if this test is applied to all possible anaphors,
performance gets worse, but if some 'sure' anaphor-antecedent pairs
passing a string match or alias test are excluded, performance
improves. Reports MUC-6, 7 scores 66%, 64%.
Xiaofeng Yang; Guodong Zhou; Jian Su; Chew Lim Tan. Coreference
Resolution Using Competition Learning Approach. ACL 2003.
Proposes using discriminative learner
to choose between antecedents, rather than just scoring each antecedent
against anaphor. Helps for pronouns (2-3% over Ng&Cardie
ACL), little help for nominals. Perhaps features comparing
antecedents are too weak. Overall MUC-6, 7 F is 71%, 60%.
Sanda M. Harabagiu; Razvan C. Bunescu; Steven J. Maiorano. Text and
Knowledge Mining for Coreference Resolution. NAACL 2001. **
Integrates several corpus-driven
methods, although in all cases the learning appears to be
performed manually from corpus examples. Develops rules based
on the MUC corpus.
Uses WordNet to determine similarity of heads for nominal
anaphors; uses corpora to weight different possible paths in
WordNet. Learns rule weighting from corpus. Finally, uses
bootstrap to find good candidate examples / rules from unannotated
text. Scores are very high, but Ng and Cardie [ACL 2002] report
that this is based on a different metric from the standard MUC
metric. (Also, Sanda M. Harabagiu; Steven J. Maiorano. Multilingual
Coreference Resolution. ANLP/NAACL 2000, applies a
similar approach, extended to process parallel English/Romanian texts,
with a small improvement in reference resolution.)
J. McCarthy; W. Lehnert. Using
Decision Trees for Coreference
Resolution. IJCAI 1995.
Coreference for entities as identified
by the MUC-5 joint venture task; uses information from the extracted templates.
Andrew Kehler. Probabilistic
Coreference in Information Extraction. WVLC 97.
Coreference for entities as identified
by the MUC-5 joint venture task; uses information from the extracted templates.
Claire Cardie; Kiri Wagstaff. Noun Phrase
Coreference as Clustering. EMNLP 1999.
Builds entities by clustering mentions,
proceeding backwards through the document, treating entities as
clusters. Is described as unsupervised learning, but really
appears to be hand-tuned ... no real learning. MUC-6 F=54%.
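The entities-as-clusters view can be sketched with a simple greedy pass; the head-match distance function and threshold below are toy assumptions, not Cardie and Wagstaff's feature-based distance:

```python
# Greedy mention clustering sketch: attach each mention to the most
# recent cluster containing a close-enough mention, else start a new
# cluster. The distance function is an illustrative assumption.
from typing import Callable

def greedy_cluster(mentions: list[str],
                   dist: Callable[[str, str], float],
                   radius: float) -> list[list[str]]:
    clusters: list[list[str]] = []
    for m in mentions:
        for cluster in reversed(clusters):   # prefer recent clusters
            if any(dist(m, other) <= radius for other in cluster):
                cluster.append(m)
                break
        else:
            clusters.append([m])
    return clusters

# toy distance (assumption): 0 if head words match, else 1
def head_dist(a: str, b: str) -> float:
    return 0.0 if a.split()[-1].lower() == b.split()[-1].lower() else 1.0

print(greedy_cluster(["Fred Smith", "the venture", "Mr. Smith"],
                     head_dist, 0.5))
# [['Fred Smith', 'Mr. Smith'], ['the venture']]
```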
David L. Bean; Ellen Riloff. Corpus-Based
Identification of Non-Anaphoric Noun Phrases. ACL 1999.
Describes a variety of tests (some
syntactic, some lexical and corpus-based) to identify 'existential' NPs
-- those which can be understood independently of other mentions
(note that this is not the same as being non-coreferential).
Suggests that this would be helpful as a filter for anaphora
resolution, but this is not directly tested.
Andrew McCallum and Ben Wellner. Toward
Conditional Models of Identity Uncertainty with Application to Proper
Noun Coreference. IJCAI Workshop on Information Integration
on the Web, 2003.
Argues that coreference models based on
binary (anaphor-antecedent) probabilities or scores are not
sufficient; we need more general models that can capture
relations among all mentions in a coreferential cluster.
Introduces such a model, and demonstrates 2-3% absolute improvement in
name coreference performance.