G22.2591 - Advanced Natural Language Processing - Spring 2011
Coreference is the task of determining whether two phrases refer to the
same 'entity'. We will consider it synonymous with anaphora resolution -- when a
linguistic unit (the anaphor)
refers back to a prior linguistic unit (the antecedent) in the discourse.
[However, it is possible to distinguish these phenomena ... if there
are two mentions of "the FBI" in a discourse, they are definitely
coreferential, but one may not consider the second mention an anaphoric
reference to the first.]
Coreference is critical to most NLP tasks; certainly it will be
critical to doing well on KBP. If we find the sentence "He was born in
Greenwich Village on January 9, 1914.", we won't be able to do anything
with it unless we can determine who "He" refers to.
Defining coreference as a formal task (so that it can be evaluated) is
not trivial ... there are many possible variations on the basic task.
One such task definition was provided by the MUC-6 coreference task specification.
ACE implicitly defines a similar coreference task because entity mentions
are grouped into entities. One basic difference, however, is that
only entities of the ACE types are marked. This may have subtle
effects on the evaluation.
- only consider references by noun phrases to other noun
phrases; a full consideration of coreference would also have to
include references to events described by entire sentences ("Prof.
Grishman talked for seven hours non-stop. This performance was not
appreciated by the students, who wanted to go home.")
- only consider identity relations, not (for example) part-whole
relations, sometimes called bridging references ("He entered the
classroom. The lights were on and the floor was wet.")
- exclude split antecedents,
as in "Fred met Mary after class. They went out for a cream soda."
- include predicate nominals ("Fred
is vice president.") and
apposition ("Fred, vice president of
Ford, resigned last week.").
There is generally good agreement between people about coreference
relations, although decisions can be tricky in some cases of vague
referents. ("The New York
police apprehended twenty criminals last week. A police raid made the newspapers.").
Anaphora can be divided into three types: pronouns, nominals
(phrases headed by common nouns), and names. Linguistic studies
have mostly looked at pronouns; some computational treatments
have also been limited to pronouns. Generally, system performance is
very good for names, moderate for pronouns, and poorest for
nominals. The main factors
considered in resolving each type of potential anaphor are different:
In principle, we would like to use much richer criteria of semantic
compatibility and coherence in selecting antecedents, but (except for
very restricted domains) this is currently beyond the state of the art.
- The first task in pronoun
resolution is determining whether a pronoun is pleonastic (non-referential), as in
"It is raining." or "It is remarkable that your dog ate
you homework again." Fortunately, this can generally be
determined based on syntactic context. If it is referential, the
factors involved in identifying an antecedent include
- its position relative to the anaphor
- its salience and the discourse focus (NPs in subject position
and those which have been referred to repeatedly are more likely
to be antecedents)
- gender and number agreement, and
- selectional compatibility (whether the antecedent is
compatible with the context of the anaphor).
- For nominals, systems look for a 'compatible' antecedent.
In the simplest case, this is an antecedent with the same head and all
the (non-restrictive) modifiers present in the antecedent.
Frequently, however, the anaphor has a different head or adds modifiers
not present in the antecedent. No systems do very well at
handling such cases.
- For names, anaphora resolution can usually be done by substring
match. ("Fred Smith ... Mr. Smith.").
Within- and cross-document coreference
So far we have considered coreference within a single document or
discourse, but the problem also arises across documents. When
applied across documents, it is generally limited to entities with
names. Even for entities with names, the task is not trivial.
On the one hand, names transliterated from non-Roman alphabets
may be spelled differently in different documents (e.g., the many
spellings of "al-Gaddafi"). On the other hand, a common name may
refer to many different people. So cross-document resolution
(part of KBP) requires contextual information.
Unlike all the tasks we considered until now (part-of-speech tagging,
chunking, name tagging, word sense disambiguation), coreference cannot
be readily reduced to a tagging task, because it involves a relation
between mentions (in effect, a gathering of mentions into
co-referential clusters or equivalence classes). This has led to
complications in devising an appropriate scoring metric.
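The clustering view can be sketched concretely: pairwise coreference decisions are merged with union-find so the output is a set of equivalence classes and transitivity holds by construction. Illustrative code, not any particular system:

```python
# Sketch: coreference output as equivalence classes of mentions.
# Pairwise "corefer" links are merged with union-find, so transitivity
# holds by construction (if m1~m2 and m2~m3, then m1~m3).

def cluster(mentions, links):
    parent = {m: m for m in mentions}

    def find(x):
        # follow parent pointers to the class representative,
        # with path compression along the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)   # union the two classes

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), set()).add(m)
    return list(clusters.values())
```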
It's not sufficient to ask whether a pronoun refers to the correct
antecedent, because there may be several possible correct
antecedents. In addition, we need to be concerned with how
severely we count particular errors: if there are 20 mentions
(referring noun phrases) of a single entity, and a system reports the
answer as two sets of 10, have we gotten 10 mentions wrong, or just 1 of 19 links?
This can have a large effect on scores.
MUC-6/7 adopted a scoring metric which essentially counts links, and
asks how many coreference links would have to be added to get the
correct cluster. Consider one coreferential cluster consisting of
mentions m1, ..., mN. Suppose in the system
response these mentions are divided among K clusters. We then
define recall as (N-K)/(N-1). Total recall is obtained by summing
these quantities (numerator and denominator) over all clusters in the
key. Precision is computed by reversing key and response.
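The MUC link-based score can be sketched directly from the (N-K)/(N-1) formula; this is an illustrative implementation, not the official scorer:

```python
# Sketch of the MUC link-based metric: for each key cluster of N
# mentions, if the response splits it into K parts, recall contributes
# (N-K)/(N-1); precision reverses the roles of key and response.

def muc_recall(key_clusters, response_clusters):
    num = den = 0
    for kc in key_clusters:
        n = len(kc)
        # K = number of response clusters the key cluster intersects,
        # plus one singleton part per mention missing from the response
        parts = {i for m in kc for i, rc in enumerate(response_clusters) if m in rc}
        missing = sum(1 for m in kc if not any(m in rc for rc in response_clusters))
        k = len(parts) + missing
        num += n - k
        den += n - 1
    return num / den if den else 0.0

def muc_precision(key_clusters, response_clusters):
    return muc_recall(response_clusters, key_clusters)
```

On the 20-mention example above (key: one cluster of 20; response: two clusters of 10), recall is (20-2)/(20-1) = 18/19, i.e., the "1 of 19 links" view of the error.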
This metric is now generally used for reporting coreference scores, but
it has some shortcomings. In particular, if there are lots of
singletons, you don't get any credit for getting them right. Baldwin
et al. discuss Vilain's metric and their alternative B-cubed
coreference metric in their MUC-7 paper.
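For comparison, B-cubed can be sketched as per-mention precision and recall averaged over mentions, which is why correct singletons do earn credit. An illustrative implementation:

```python
# Sketch of the B-cubed metric: precision and recall are computed for
# each mention from the overlap of its key and response clusters, then
# averaged over mentions. Correct singletons contribute full credit.

def b_cubed(key_clusters, response_clusters):
    def cluster_of(m, clusters):
        for c in clusters:
            if m in c:
                return c
        return {m}  # an unclustered mention counts as a singleton

    mentions = set().union(*key_clusters)
    p = r = 0.0
    for m in mentions:
        kc = cluster_of(m, key_clusters)
        rc = cluster_of(m, response_clusters)
        overlap = len(kc & rc)
        p += overlap / len(rc)
        r += overlap / len(kc)
    n = len(mentions)
    return p / n, r / n
```

On the same split-into-two-halves example, B-cubed recall falls to 0.5, a much harsher penalty than the MUC metric assigns for the single missing link.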
Non-trainable Coreference Methods: Major References
Jerry Hobbs. Resolving Pronoun References. Lingua 44 (1978).
Proposes a search order (through the
parse trees of current and prior sentences) for pronouns; widely
used for pronominal anaphora resolution.
Shalom Lappin and Herbert Leass. An Algorithm for
Pronominal Anaphora Resolution. Computational Linguistics 20(4):535-561 (1994).
Uses various weights (hand set) to
select antecedents for pronouns. Tested on corpus of several
hundred pronouns ... got 86% accuracy.
Renata Vieira; Massimo Poesio. An Empirically
Based System for Processing Definite Descriptions. Computational Linguistics 26(4):539-593 (2000).
Developed detailed rules for resolving
definite nominal anaphors and provides a detailed evaluation against an
annotated corpus.
Trainable Coreference Methods:
Two 'classic' papers and a recent survey
Niyu Ge; John Hale; Eugene Charniak. A Statistical
Approach to Anaphora Resolution. WVLC 1998.
Does only pronoun resolution.
Reports 84% accuracy on their corpus (non-referential pronouns
excluded). Probabilistic model with features including Hobbs
distance, number agreement, gender match, mention frequency, and
selectional preference. Discusses methods of learning
gender/animateness characteristics of potential antecedents.
Wee Meng Soon; Daniel Chung Yong Lim; Hwee Tou Ng. A Machine
Learning Approach to Coreference Resolution of Noun Phrases.
Computational Linguistics 27(4):521-545 (2001).
Corpus-trained procedure for full (MUC)
coreference task. Used a set of syntactic features plus a
semantic class agreement feature (using high-level Wordnet
classes); 12 features in all. Trained on MUC data with
decision tree procedure (the classifier produces a binary outcome). At
test time, each mention is linked to the most recent antecedent for
which the classifier returns true. Got F=63% on MUC-6 and 60% on MUC-7.
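The decoding step, linking to the most recent antecedent the pairwise classifier accepts, can be sketched in a few lines; the classifier argument is a stand-in for Soon et al.'s trained decision tree:

```python
# Sketch of closest-first decoding as in Soon et al.: scan candidate
# antecedents from most recent to oldest and link to the first one the
# pairwise classifier accepts. `classifier` is a stand-in for a trained
# model (here it could be any boolean function on a mention pair).

def resolve_mention(mention, prior_mentions, classifier):
    for antecedent in reversed(prior_mentions):  # most recent first
        if classifier(antecedent, mention):
            return antecedent
    return None  # no acceptable antecedent: mention starts a new entity
```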
Survey of Coreference Models
Vincent Ng, who has published extensively on coreference resolution
over the past decade, has written a recent survey of supervised
methods: Vincent Ng. Supervised
Noun Phrase Coreference Research: the first 15 years.
ACL 2010. He discusses the three major types of trainable models
- mention-pair model, which is a statistical model for deciding
whether a pair of mentions are coreferential
- does not directly enforce transitivity
- generally takes a selective approach to the creation of negative
training examples
- must be coupled with a clustering procedure
- entity-mention model, which decides whether a mention is
coreferential with an entity (a cluster of [preceding] mentions)
- allows constraints to apply between mention and any, most, or
all elements of cluster
- ranking models, which explicitly compare alternative candidate
antecedents for a mention
- cluster-ranking models, which explicitly compare alternative
candidate entities (clusters)
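The entity-mention idea, testing a mention against a whole cluster so that constraints can apply to any, most, or all of its elements, can be sketched as follows; the compatibility predicate is a hypothetical stand-in for learned features:

```python
# Sketch of the entity-mention model's cluster-level test: a candidate
# mention is checked against every element of an existing cluster, and
# the results are combined under an "any", "most", or "all" policy.
# `compatible` is a hypothetical stand-in for learned pairwise features.

def cluster_compatible(mention, cluster, compatible, mode="all"):
    tests = [compatible(mention, m) for m in cluster]
    if mode == "all":
        return all(tests)
    if mode == "any":
        return any(tests)
    return sum(tests) > len(tests) / 2  # "most": strict majority
```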
A variety of approaches to building coreference models
Aria Haghighi and Dan Klein, Simple
coreference resolution with rich syntactic and semantic features. EMNLP 2009.
Is it possible to do coreference
without a statistical coref model? Haghighi and Klein learn rich
lexical, syntactic, and semantic constraints and combine them to create
a high performance coref system without the need to train a coref
model. (Further improved on by Haghighi and Klein, Coreference
resolution in a modular, entity-centered model, NAACL 2010.)
Vincent Ng, Unsupervised
models for coreference resolution. EMNLP 2008.
David Bean and Ellen Riloff. Unsupervised
learning of contextual role knowledge for coreference resolution.
Proc. HLT/NAACL 2004.
Can we make use of discourse
information -- event sequences -- to resolve coref? Bean and
Riloff learn pairs of predicates which are likely to govern references
to the same entity, starting with reliable coreference pairs. Applied
in two narrow domains, terrorism
and disasters. (Generalized by Liao and Grishman, Large
corpus-based semantic feature extraction for pronoun coreference.
Proceedings of the Second International Workshop on NLP Challenges in
the Information Explosion Era (at COLING 2010)).