G22.2591 - Advanced Natural Language Processing - Spring 2011
The past decade has seen a great deal of work on building supervised
coreference components. These vary in:
- the types of constraints/features used (distance, gender, number,
  semantic compatibility, ...)
- the type of model used (how coreference is reduced to a classification
  problem: mention-pair model, entity-pair model, mention-ranking
  model, entity-ranking model)
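As a point of reference, the mention-pair reduction named above can be sketched as follows (the mentions, the feature set, and the rule-based stand-in for a trained classifier are all illustrative choices of mine, not from any particular system):

```python
# Minimal sketch of the mention-pair reduction: coreference becomes a
# binary decision over (candidate antecedent, mention) pairs.
# Mentions and features here are illustrative only.

mentions = [{"id": 0, "text": "Barack Obama", "gender": "m", "number": "sg"},
            {"id": 1, "text": "the president", "gender": "m", "number": "sg"},
            {"id": 2, "text": "he", "gender": "m", "number": "sg"}]

def features(antecedent, mention):
    """Typical pair features: distance, gender/number agreement."""
    return {"distance": mention["id"] - antecedent["id"],
            "gender_match": antecedent["gender"] == mention["gender"],
            "number_match": antecedent["number"] == mention["number"]}

def classify(feats):
    """Stand-in for a trained classifier: agree and be close."""
    return (feats["gender_match"] and feats["number_match"]
            and feats["distance"] <= 2)

# Link each mention to its closest classifier-approved antecedent.
links = {}
for m in mentions:
    for a in reversed(mentions[:m["id"]]):
        if classify(features(a, m)):
            links[m["id"]] = a["id"]
            break
```

A ranking model would instead score all candidate antecedents jointly and pick the best, rather than making independent binary decisions.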
We will consider three variants on this research today:
Learning accurate constraints from corpora
Aria Haghighi and Dan Klein, Simple coreference resolution with rich
syntactic and semantic features, EMNLP 2009.
Is it possible to do coreference without a statistical coref model?
Haghighi and Klein learn rich lexical, syntactic, and semantic
constraints and combine them to create a high-performance coref system
without the need to train a coref model. (Further improved on by
Haghighi and Klein, Coreference resolution in a modular,
entity-centered model, NAACL 2010.)
Colin Cherry and Shane Bergsma, An expectation maximization approach
to pronoun resolution, CoNLL 2005.
One of the earliest successful efforts at unsupervised coreference.
First limits the search space for antecedents of pronouns
- only the current and preceding sentence are searched
- excludes pleonastic pronouns
- excludes cataphora
- excludes antecedents violating gender or number agreement
- imposes syntactic constraints on reflexives
Then defines a generative probabilistic model for the document:
- a document is defined by a fixed set of 'pronoun positions' and a
set of candidate antecedents for each position
- generating a document then involves
- picking a candidate antecedent c for each position
- picking a pronoun (he / she / it / they) for each position
- picking a context (governor in dependency tree and dependency
relation) for each position
P(resolved document) = product(pronoun positions) sum(c) P(p, k, c)
P(p, k, c) = P(p | c) P(k | c) P(c) = P(p | l) P(k | l) P(l) P(j)
where 'l' is the lexical content of the antecedent c and 'j' (the jump
value) is its position in the candidate list; the pronoun and the
context are assumed conditionally independent given the antecedent
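A minimal sketch of this factored model for a single pronoun position (all parameter tables and values below are toy numbers of my own, not from the paper):

```python
# Toy sketch of the factored pronoun model: one pronoun position
# contributes sum over candidates c of P(p, k, c), where
# P(p, k, c) = P(p | l) P(k | l) P(l) P(j).
# All parameter values here are illustrative, not from the paper.

# P(p | l): pronoun given lexical content of the antecedent
p_pron_given_lex = {("he", "president"): 0.7, ("it", "president"): 0.1,
                    ("he", "company"): 0.05, ("it", "company"): 0.8}
# P(k | l): context (governor, dependency relation) given lexical content
p_ctx_given_lex = {(("said", "subj"), "president"): 0.3,
                   (("said", "subj"), "company"): 0.2}
# P(l): lexical-content prior;  P(j): jump (position in candidate list)
p_lex = {"president": 0.6, "company": 0.4}
p_jump = {0: 0.5, 1: 0.3, 2: 0.2}

def joint(pronoun, context, lex, jump):
    """P(p, k, c) = P(p | l) P(k | l) P(l) P(j)."""
    return (p_pron_given_lex.get((pronoun, lex), 1e-6)
            * p_ctx_given_lex.get((context, lex), 1e-6)
            * p_lex.get(lex, 1e-6)
            * p_jump.get(jump, 1e-6))

# Probability mass one position contributes: sum over its candidates.
candidates = [("president", 0), ("company", 1)]
position_prob = sum(joint("he", ("said", "subj"), l, j)
                    for l, j in candidates)
```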
We can then use EM to learn the separate probabilities in order to
maximize the probability of the resolved document. This involves
alternating E steps and M steps.
In the E-step, we compute (fractional) counts of antecedents for each
pronoun position:
P(c | p, k) = P(p | l) P(k | l) P(l) P(j)
              / sum(c') P(p | l') P(k | l') P(l') P(j')
Here a particular c defines values for l and j. Given these P(c | p, k),
and hence the fractional counts of <p, l>, <k, l>, ..., we can compute
maximum likelihood probabilities such as
P(p | l) = #<p, l> / #l
in the M step.
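The E/M alternation can be sketched on toy data as follows (the data, the uniform initialization, and the parameter bookkeeping are my own illustrative choices, not the paper's):

```python
from collections import defaultdict

# Toy E/M alternation for the factored pronoun model.  Each position is
# (pronoun, context, [(lexical_antecedent, jump), ...]).
positions = [("he", ("said", "subj"), [("president", 0), ("company", 1)]),
             ("it", ("fell", "subj"), [("president", 0), ("company", 1)])]

# Parameters, initialized uniformly (illustrative choice).
p_pron = defaultdict(lambda: 0.5)   # P(p | l)
p_ctx = defaultdict(lambda: 0.5)    # P(k | l)
p_lex = defaultdict(lambda: 0.5)    # P(l)
p_jump = defaultdict(lambda: 0.5)   # P(j)

for _ in range(10):                  # EM iterations
    # E-step: posterior P(c | p, k) over each position's candidates,
    # accumulated as fractional counts of <p, l>, <k, l>, l, and j.
    counts = defaultdict(float)
    for pron, ctx, cands in positions:
        scores = [p_pron[(pron, l)] * p_ctx[(ctx, l)] * p_lex[l] * p_jump[j]
                  for l, j in cands]
        z = sum(scores)
        for (l, j), s in zip(cands, scores):
            post = s / z
            counts[("pl", pron, l)] += post
            counts[("kl", ctx, l)] += post
            counts[("l", l)] += post
            counts[("j", j)] += post
    # M-step: e.g. P(p | l) = #<p, l> / #l, and similarly for the rest.
    for key in list(counts):
        if key[0] == "pl":
            _, pron, l = key
            p_pron[(pron, l)] = counts[key] / counts[("l", l)]
        elif key[0] == "kl":
            _, ctx, l = key
            p_ctx[(ctx, l)] = counts[key] / counts[("l", l)]
    total_l = sum(v for k, v in counts.items() if k[0] == "l")
    total_j = sum(v for k, v in counts.items() if k[0] == "j")
    for key in list(counts):
        if key[0] == "l":
            p_lex[key[1]] = counts[key] / total_l
        elif key[0] == "j":
            p_jump[key[1]] = counts[key] / total_j
```

On real data the posteriors break symmetry across iterations; here the toy data is symmetric, so the sketch only demonstrates the bookkeeping.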
They got a pronoun resolution accuracy of 66%, compared to 71% for a
supervised SVM resolver trained on 1400 examples.
More recent work on unsupervised coreference ...
Vincent Ng, Unsupervised
models for coreference resolution. EMNLP 2008.
Hoifung Poon and Pedro Domingos, Joint
unsupervised coreference resolution with Markov logic, EMNLP 2008.
David Bean and Ellen Riloff, Unsupervised learning
of contextual role knowledge for coreference resolution.
Proc. HLT/NAACL 2004.
Can we make use of discourse
information -- event sequences -- to resolve coref? Bean and
Riloff learn pairs of predicates which are likely to govern references
to the same entity, starting with reliable coreference pairs. Applied
in two narrow domains, terrorism
and disasters. (Generalized by Liao and Grishman, Large
corpus-based semantic feature extraction for pronoun coreference,
Proceedings of the Second International Workshop on NLP Challenges in
the Information Explosion Era (at COLING 2010)).