G22.2591 - Advanced Natural Language Processing - Spring 2009
Discuss KBP progress.
Event extraction: final comments
Summarize Yangarber 2003, Stevenson and Greenberg, Surdeanu et al. (Lecture 11 notes).
Scenario / event extraction really represents a range of tasks
- scenarios (MUC) generally broader than events (ACE)
- different scopes among scenarios ... MUC-3 (topic) vs MUC-6 (close to event)
- different scopes for events ... attacks (multiple instantiations) vs marriage (single instantiation)
- represents range of user goals / needs
Different sources of trigger similarity
- document-level similarity (Riloff, Yangarber)
- local similarity
- WordNet (Stevenson and Greenberg) -- see the sketch below
No real study yet of whether these sources are complementary, or how different
sources may be appropriate for different tasks.
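As an illustration of the WordNet source, a minimal sketch (assuming NLTK's
WordNet interface; the seed and candidate triggers are made-up examples, not
the Stevenson and Greenberg setup) of scoring candidate triggers against seeds:

    # Sketch: WordNet-based similarity between candidate and seed trigger
    # words, as one possible source of trigger similarity.  Assumes NLTK
    # with the WordNet data installed; the word lists are illustrative.
    from nltk.corpus import wordnet as wn

    def max_path_similarity(w1, w2, pos=wn.VERB):
        """Best path similarity between any pair of senses of the two words."""
        best = 0.0
        for s1 in wn.synsets(w1, pos=pos):
            for s2 in wn.synsets(w2, pos=pos):
                sim = s1.path_similarity(s2)
                if sim is not None and sim > best:
                    best = sim
        return best

    seeds = ["attack", "bomb"]                    # hypothetical seed triggers
    for candidate in ["shoot", "explode", "marry"]:
        score = max(max_path_similarity(s, candidate) for s in seeds)
        print(candidate, round(score, 2))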
Coreference is the task of determining whether two phrases refer to the
same 'entity'. We will consider it synonymous with anaphora resolution -- when a
linguistic unit (the anaphor)
refers back to a prior linguistic unit (the antecedent) in the discourse.
[However, it is possible to distinguish these phenomena ... if there
are two mentions of "the FBI" in a discourse, they are definitely
coreferential, but one may not consider the second mention an anaphoric
reference to the first.]
Coreference is critical to most NLP tasks. Certainly it will be essential
to doing well on KBP. If we find the sentence "He was born in Greenwich
Village on January 9, 1914." we won't be able to do anything with it
unless we can determine who "he" refers to.
Defining coreference as a formal task (so that it can be evaluated) is
not trivial ... there are many possible variations on the basic task.
One such task definition was provided by the MUC-6 and MUC-7
coreference task. Among the decisions made in that task definition:
- only consider references by noun phrases to other noun
phrases; a full consideration of coreference would also have to
include references to events described by entire sentences ("Prof.
Grishman talked for seven hours non-stop. This performance was not
appreciated by the students, who wanted to go home.")
- only consider identity relations, not (for example) part-whole
relations, sometimes called bridging references ("He entered the
classroom. The lights were off, and the floor was wet.")
- exclude split antecedents,
such as "Fred met Mary after class. They went out for a cream soda."
- include predicate nominals ("Fred is vice president.") and
apposition ("Fred, vice president of Ford, resigned last week.").
ACE implicitly defines a similar coreference task because entity mentions are
grouped into entities. One basic difference, however, is that
only entities of the ACE types are marked. This may have subtle
effects on the evaluation.
There is generally good agreement between people about coreference
relations, although decisions can be tricky in some cases of vague
referents. ("The New York
police apprehended twenty criminals last week. A police raid made the newspapers.").
Anaphora can be divided into three types: pronouns, nominals
(phrases headed by common nouns), and names. Linguistic studies
have mostly looked at pronouns; some computational treatments
have also been limited to pronouns. Generally, system performance is
very good for names, moderate for pronouns, and poorest for
nominals. The main factors
considered in resolving each type of potential anaphor are different:
- The first task in pronoun
resolution is determining whether a pronoun is pleonastic (non-referential), as in
"It is raining." or "It is remarkable that your dog ate
your homework again." Fortunately, this can generally be
determined based on syntactic context. If it is referential, the
factors involved in identifying an antecedent include
- its position relative to the anaphor
- its salience and the discourse focus (NPs in subject position
and those which have been referred to repeatedly are more salient)
- gender and number agreement, and
- selectional compatibility (whether the antecedent is selectionally
compatible with the context of the anaphor).
- For nominals, systems look for a 'compatible' antecedent.
In the simplest case, this is an antecedent with the same head as the
anaphor and with all the (non-restrictive) modifiers of the anaphor also
present in the antecedent.
Frequently, however, the anaphor has a different head or adds modifiers
not present in the antecedent. No systems do very well at
handling such cases.
- For names, anaphora resolution can usually be done by substring
match ("Fred Smith ... Mr. Smith").
In principle, we would like to use much richer criteria of semantic
compatibility and coherence in selecting antecedents, but (except for
very restricted domains) this is currently beyond the state of the art.
Within- and cross-document coreference
So far we have considered coreference within a single document or
discourse, but the problem also arises across documents. When
applied across documents, it is generally limited to entities with
names. Even for entities with names, the task is not trivial.
On the one hand, names transliterated from non-Roman alphabets
may be spelled differently in different documents (e.g.,
"Muammar al-Gaddafi" vs. "Moammar Qadhafi"). On the other hand, a common name may
refer to many different people. So cross-document resolution
(part of KBP) requires contextual information.
Unlike all the tasks we considered until now (part-of-speech tagging,
chunking, name tagging, word sense disambiguation), coreference cannot
be readily reduced to a tagging task, because it involves a relation
between mentions (in effect, a gathering of mentions into
co-referential clusters or equivalence classes). This has led to
complications in devising an appropriate scoring metric.
It's not sufficient to ask whether a pronoun refers to the correct
antecedent, because there may be several possible correct
antecedents. In addition, we need to be concerned with how
severely we count particular errors: if there are 20 mentions
(referring noun phrases) that all refer to the same entity, and a system
reports the answer as two sets of 10, have we gotten 10 mentions wrong,
or just 1 of 19 links?
This can have a large effect on scores.
MUC-6/7 adopted a scoring metric which essentially counts links, and
asks how many coreference links would have to be added to get the
correct cluster. Consider one coreferential cluster consisting of
mentions m1, ..., mN. Suppose in the system
response these mentions are divided among K clusters. We then
define recall as (N-K)/(N-1). Total recall is obtained by summing
these quantities (numerator and denominator) over all clusters in the
key. Precision is computed by reversing key and response.
This metric is now generally used for reporting coreference scores, but
it has some shortcomings. In particular, if there are lots of
singletons, you don't get any credit for getting them right. Baldwin
et al. discuss Vilain's metric and their alternative B-cubed
coreference metric in their MUC-7 paper.
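For comparison, a minimal sketch of B-cubed recall under the same
list-of-sets representation (B-cubed precision is again obtained by
swapping key and response):

    # Sketch of the B-cubed alternative: each mention's recall is the
    # fraction of its key cluster that ends up in its response cluster,
    # and the per-mention scores are averaged, so correctly isolated
    # singletons do earn credit.
    def b_cubed_recall(key_clusters, response_clusters):
        def cluster_of(m, clusters):
            return next((c for c in clusters if m in c), {m})
        mentions = [m for c in key_clusters for m in c]
        total = 0.0
        for m in mentions:
            key_c = cluster_of(m, key_clusters)
            resp_c = cluster_of(m, response_clusters)
            total += len(key_c & resp_c) / len(key_c)
        return total / len(mentions) if mentions else 0.0

    key = [{"m1", "m2", "m3"}, {"m4", "m5"}]
    response = [{"m1", "m2"}, {"m3"}, {"m4", "m5"}]
    print(b_cubed_recall(key, response))  # about 0.73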
Non-trainable Coreference Methods: Major References
Jerry Hobbs. Resolving
Pronoun References. Lingua 44:311-338 (1978).
Proposes a search order (through the
parse trees of current and prior sentences) for pronouns; widely
used for pronominal anaphora resolution.
Shalom Lappin and Herbert Leass. An Algorithm for
Pronominal Anaphora Resolution. Computational Linguistics 20(4):535-561 (1994).
Uses various weights (hand set) to
select antecedents for pronouns. Tested on corpus of several
hundred pronouns ... got 86% accuracy.
Renata Vieira; Massimo Poesio. An Empirically
Based System for Processing Definite Descriptions. Computational Linguistics 26(4):539-593 (2000).
Develops detailed rules for resolving
definite nominal anaphors and provides a detailed evaluation against an
annotated corpus.
Trainable Coreference Methods:
Niyu Ge; John Hale; Eugene Charniak. A Statistical
Approach to Anaphora Resolution. WVLC 1998.
Does only pronoun resolution.
Reports 84% accuracy on their corpus (non-referential pronouns
excluded). Probabilistic model with features including Hobbs
distance, number agreement, gender match, mention frequency, and
selectional preference. Discusses methods of learning
gender/animateness characteristics of potential antecedents.
Wee Meng Soon; Daniel Chung Yong Lim; Hwee Tou Ng. A Machine
Learning Approach to Coreference Resolution of Noun Phrases.
Computational Linguistics 27 #4, 521-545 (2001).
Corpus-trained procedure for full (MUC)
coreference task. Used a set of syntactic features plus a
semantic class agreement feature (using high-level WordNet
classes); 12 features in all. Trained on MUC data with
decision tree procedure (classifier produces binary outcome). On
test, links to the most recent antecedent for which the classifier returns
true. Got F = 63% on MUC-6 and 60% on MUC-7.
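A minimal sketch of the linking (decoding) step described above; classify_pair
stands in for their trained decision tree over the 12 features, which is not
reproduced here:

    # Sketch of closest-first linking as described for Soon et al.: scan
    # candidate antecedents from nearest to farthest and link the anaphor
    # to the first one the binary mention-pair classifier accepts.
    def resolve(mentions, classify_pair):
        """mentions in document order; returns an antecedent index (or None) per mention."""
        links = []
        for j in range(len(mentions)):
            antecedent = None
            for i in range(j - 1, -1, -1):        # most recent candidate first
                if classify_pair(mentions[i], mentions[j]):
                    antecedent = i
                    break
            links.append(antecedent)
        return links

    # toy run with a stand-in "classifier" that links exact string matches
    mentions = ["Fred Smith", "the company", "Fred Smith"]
    print(resolve(mentions, lambda a, b: a == b))  # [None, None, 0]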