G22.2591 - Advanced Natural Language Processing - Spring 2011

Lecture 14

Presentations of term projects

Coreference, cont'd

Discourse constraints, cont'd

Liao and Grishman, Large corpus-based semantic feature extraction for pronoun coreference, Proceedings of the Second International Workshop on NLP Challenges in the Information Explosion Era (at COLING 2010).

Cross-document coreference

There have recently been several evaluations of cross-doc coreference: WePS (Web People Search), ACE 2008, and KBP. Cross-document coreference is limited to entities that are named in each document. There are several variations on the task.
Evaluations have been done both on web collections and on news stories.

Vector space model

The basic approach to cross-document coref is the vector space model (Bagga and Baldwin, Entity-Based Cross-Document Coreferencing Using the Vector Space Model, ACL 1998).
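A minimal sketch of the vector space approach, roughly in the spirit of Bagga and Baldwin. For each document it collects the sentences that mention the name, weights their terms by tf-idf, and links two documents when the cosine similarity of their vectors reaches a threshold. Several choices here are assumptions made for illustration, not the paper's setup. Using plain sentence matching in place of within-document coreference chains is one such substitute. The stopword list, the threshold of 0.1, single-link merging via union-find, and all function names are likewise hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "was",
             "for", "his", "her", "at", "by", "with", "from"}

def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def context_summary(doc, name):
    """Join the sentences of doc that contain name (a crude stand-in for
    the sentences picked out by a within-document coreference chain)."""
    sentences = re.split(r"(?<=[.!?])\s+", doc)
    return " ".join(s for s in sentences if name in s)

def tfidf_vectors(summaries):
    """One sparse tf-idf vector (dict term -> weight) per summary."""
    tokenized = [tokenize(s) for s in summaries]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
            for toks in tokenized]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty/zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster(docs, name, threshold=0.1):
    """Single-link clustering: merge documents i and j whenever
    cosine(summary_i, summary_j) >= threshold. Returns lists of doc indices."""
    vecs = tfidf_vectors([context_summary(d, name) for d in docs])
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

docs = [
    "Michael Jackson released the album Thriller. Fans loved it.",
    "The singer Michael Jackson sang songs from the album Thriller.",
    "Michael Jackson, the economist, published a paper on inflation.",
]
print(cluster(docs, "Michael Jackson"))  # → [[0, 1], [2]]
```

Because the name's own tokens occur in every summary, their idf is log(1) = 0, so only the surrounding context decides the links. The two singer documents share "album" and "thriller" (cosine about 0.13). The economist document shares no weighted terms with either of them.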
Challenges

for a particular name string, the distribution over entities is typically very skewed: most mentions refer to the same person (e.g., Michael Jackson)
relevant context features are sparse
very wide range of name frequencies and degrees of ambiguity
Addressing the challenges (without annotating data)