G22.2591 - Advanced Natural Language Processing - Spring 2009

Lecture 11

Discussion of KBP data preparation problems.

KBP term project people:
Some suggestions for training a relation classifier.

Scenario Template / Event Extraction (cont'd)

Ellen Riloff. Automatically Generating Extraction Patterns from Untagged Text. Proc. Thirteenth National Conference on Artificial Intelligence (AAAI-96), 1996, pp. 1044-1049.

Roman Yangarber, Ralph Grishman, Pasi Tapanainen, and Silja Huttunen. Automatic Acquisition of Domain Knowledge for Information Extraction. Proc. COLING 2000.

Presentations by Wei Xu and Shasha Liao.

Roman Yangarber. Counter-Training in Discovery of Semantic Patterns.
ACL 2003.

Addresses a problem that arose with Yangarber et al. (2000): deciding when to stop bootstrapping. Without further control, that procedure would continue adding patterns until every pattern in the corpus is included. This is a common problem for semi-supervised (bootstrapping) methods. Yangarber addresses it here by creating seeds for a number of different scenarios and training classifiers for these scenarios concurrently. Relevance of an article to one scenario is treated as negative evidence of its relevance to the other scenarios. Eventually the patterns are partitioned among the scenarios and the bootstrapping halts.
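
The competition between scenarios can be sketched with a toy loop. This is a deliberately simplified illustration, not Yangarber's actual scoring: documents are reduced to sets of pattern strings, relevance is binary, and a pattern's score is its count of relevant documents minus its count of documents claimed by rival scenarios. The function name, the data, and the scoring rule are all assumptions made for the example.

```python
def counter_train(docs, seeds, max_iters=50):
    """Toy counter-training loop (simplified; not the paper's formulas).

    docs:  list of sets of pattern strings (one set per document)
    seeds: dict mapping scenario name -> set of seed patterns
    """
    accepted = {name: set(pats) for name, pats in seeds.items()}
    all_patterns = set().union(*docs)
    for _ in range(max_iters):
        # A document counts as relevant to a scenario if it contains
        # at least one pattern that scenario has accepted so far.
        relevant = {name: [bool(d & pats) for d in docs]
                    for name, pats in accepted.items()}
        added = False
        for name in accepted:
            taken = set().union(*accepted.values())
            best, best_score = None, 0
            for p in sorted(all_patterns - taken):
                hits = [i for i, d in enumerate(docs) if p in d]
                pos = sum(relevant[name][i] for i in hits)
                # Negative evidence: documents claimed by a rival scenario.
                neg = sum(any(relevant[r][i] for r in accepted if r != name)
                          for i in hits)
                if pos - neg > best_score:
                    best, best_score = p, pos - neg
            if best is not None:
                accepted[name].add(best)
                added = True
        if not added:  # no scenario can claim another pattern: halt
            break
    return accepted


# Hypothetical mini-corpus: two scenarios plus one shared, uninformative pattern.
docs = [
    {"company appoint person", "person resign", "officials say"},
    {"person resign", "company name person"},
    {"quake strike city", "quake kill people", "officials say"},
    {"quake kill people", "rescuers search rubble"},
    {"company report profit"},
]
seeds = {
    "management": {"company appoint person"},
    "disaster": {"quake strike city"},
}
result = counter_train(docs, seeds)
```

In this toy run "officials say" occurs in documents claimed by both scenarios, so its positive and negative evidence cancel and neither scenario accepts it. The loop stops on its own once no remaining pattern has a positive score, which is the halting behavior that plain bootstrapping lacks. Unlike the full partition described above, the sketch simply leaves such patterns unassigned.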

M. Stevenson and M. Greenwood. A Semantic Approach to IE Pattern Induction.
ACL 2005.

Compares Yangarber's discovery procedure with one that expands the same set of seeds using WordNet. Defines a "semantic similarity" metric over WordNet, first for individual words and then for subject-verb-object patterns. At each iteration the WordNet-based procedure adds to the seed set the patterns most similar to (the centroid of) the seed patterns. Evaluates both methods on the MUC-6 (executive succession) task, using the MUC-6 corpus and (for Yangarber's method) 6000 additional documents. The WordNet-based method shows a small advantage over Yangarber's on the document filtering task, and a considerably larger advantage on the sentence filtering task.
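
A minimal sketch of the similarity-based ranking step, under strong assumptions: the word similarities below are a hand-made table standing in for a WordNet-derived measure, pattern similarity is taken as the slot-wise average over (subject, verb, object), and the mean similarity to the seed patterns stands in for similarity to their centroid. None of these choices is claimed to match the paper's exact definitions.

```python
# Hypothetical word-to-word similarities in [0, 1]. In the paper these
# would be derived from WordNet; here they are invented for illustration.
WORD_SIM = {
    frozenset(["appoint", "name"]): 0.9,
    frozenset(["appoint", "hire"]): 0.8,
    frozenset(["name", "hire"]): 0.7,
    frozenset(["company", "firm"]): 0.9,
    frozenset(["person", "executive"]): 0.8,
}


def word_sim(a, b):
    """Similarity of two words; identical words score 1, unknown pairs 0."""
    return 1.0 if a == b else WORD_SIM.get(frozenset([a, b]), 0.0)


def pattern_sim(p, q):
    """Slot-wise average similarity of two (subject, verb, object) patterns."""
    return sum(word_sim(a, b) for a, b in zip(p, q)) / len(p)


def rank_candidates(seed_patterns, candidates):
    """Rank candidates by mean similarity to the seeds, best first."""
    scored = [
        (sum(pattern_sim(c, s) for s in seed_patterns) / len(seed_patterns), c)
        for c in candidates
    ]
    return sorted(scored, reverse=True)


seed_patterns = [("company", "appoint", "person")]
candidates = [
    ("firm", "name", "executive"),
    ("company", "hire", "person"),
    ("quake", "strike", "city"),
]
ranked = rank_candidates(seed_patterns, candidates)
```

Here ("company", "hire", "person") ranks first, ("firm", "name", "executive") second, and the unrelated earthquake pattern last with a score of zero. Note that no document statistics are involved, which is the main contrast with Yangarber's corpus-driven procedure.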

Mihai Surdeanu, Jordi Turmo, and Alicia Ageno.
A Hybrid Approach for the Acquisition of Information Extraction Patterns.
Proceedings of the EACL 2006 Workshop on Adaptive Text Extraction and Mining (ATEM 2006), April 2006.

Uses a co-training strategy in which two classifiers seek to classify documents as relevant to a particular scenario. One is an extraction-pattern-based classifier similar to Yangarber's; the other is a bag-of-words classifier. In their experiments, the bag-of-words classifier converges quickly; the pattern-based classifier takes much longer. A number of pattern-ranking functions are tried, including the Riloff criterion used by Yangarber. A simpler function adapted from Collins and Singer proves to work better. In general, considerable gains are reported over Yangarber-style bootstrapping.
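
To make the difference between ranking functions concrete, here is a small sketch of two scores. The first is one common formulation of the Riloff criterion (relevance rate times the log of the relevant count; the log base varies between papers). The second is a smoothed-precision score modeled on Collins and Singer's decision-list work; it is an assumption that this resembles the adaptation used by Surdeanu et al., and the values of `alpha` and `k` are illustrative only.

```python
import math


def riloff_score(rel_count, total_count):
    """Relevance rate times log2 of the relevant count.

    rel_count:   documents matched by the pattern that are judged relevant
    total_count: all documents matched by the pattern
    """
    if rel_count == 0 or total_count == 0:
        return 0.0
    return (rel_count / total_count) * math.log2(rel_count)


def smoothed_precision(rel_count, total_count, alpha=0.1, k=2):
    """Smoothed precision; alpha and k are illustrative assumptions."""
    return (rel_count + alpha) / (total_count + k * alpha)


# Hypothetical counts: A is frequent but noisy, B is rare but clean.
pattern_a = (8, 10)  # 8 relevant out of 10 matches
pattern_b = (2, 2)   # 2 relevant out of 2 matches

riloff_a, riloff_b = riloff_score(*pattern_a), riloff_score(*pattern_b)
prec_a, prec_b = smoothed_precision(*pattern_a), smoothed_precision(*pattern_b)
```

With these counts the Riloff-style score prefers the frequent pattern A, while smoothed precision prefers the cleaner pattern B. The two criteria can therefore add patterns in a different order during bootstrapping.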