Automatically Generating Extraction Patterns from Untagged Text
Proc. Thirteenth National Conference on Artificial Intelligence
(AAAI-96), 1996, pp. 1044-1049.
Roman Yangarber, Ralph Grishman, Pasi Tapanainen, and Silja Huttunen. Automatic Acquisition of Domain Knowledge for Information Extraction. Proc. COLING 2000.
M. Stevenson and M. Greenwood. A Semantic
Approach to IE Pattern Induction.
Compares Yangarber's discovery procedure
with a procedure that expands the same set of seeds using
WordNet. Defines a "semantic similarity" metric over WordNet,
first for individual words and then for subject-verb-object
patterns. At each iteration, the WordNet-based procedure adds to the
seed set the patterns most similar to (the centroid of) the seed
patterns. Evaluates
both methods on the MUC-6 (executive succession) task, using the MUC-6
corpus and (for Yangarber's method) 6000 additional documents.
The WordNet-based method shows a small advantage over Yangarber's on
the document filtering task,
and a considerably larger advantage on the sentence filtering task.
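
The iterative expansion step described above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the WordNet-based similarity metric is replaced by plain cosine similarity over hand-made feature vectors, and the names expand_seeds, centroid, cosine, the toy patterns, and the per_iter parameter are all assumptions introduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vecs):
    """Component-wise mean of a list of sparse vectors."""
    total = {}
    for vec in vecs:
        for k, x in vec.items():
            total[k] = total.get(k, 0.0) + x
    return {k: x / len(vecs) for k, x in total.items()}

def expand_seeds(seeds, candidates, vectors, iterations=1, per_iter=1):
    """Each iteration, accept the candidates closest to the centroid of accepted patterns."""
    accepted = list(seeds)
    remaining = [p for p in candidates if p not in accepted]
    for _ in range(iterations):
        if not remaining:
            break
        c = centroid([vectors[p] for p in accepted])
        remaining.sort(key=lambda p: cosine(vectors[p], c), reverse=True)
        accepted.extend(remaining[:per_iter])
        remaining = remaining[per_iter:]
    return accepted

# Toy subject-verb-object patterns with made-up feature weights (assumption).
vectors = {
    "company appoint person": {"hire": 1.0, "org": 1.0},
    "company name person": {"hire": 0.9, "org": 1.0},
    "person resign post": {"leave": 1.0, "org": 0.5},
    "team win game": {"sport": 1.0},
}
seeds = ["company appoint person"]
candidates = ["company name person", "person resign post", "team win game"]
result = expand_seeds(seeds, candidates, vectors, iterations=2)
```

With this toy data, the first iteration accepts "company name person" and the second accepts "person resign post", while the unrelated "team win game" is never chosen. A real system would substitute the paper's WordNet-derived pattern similarity for cosine.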
Mihai Surdeanu, Jordi Turmo, and Alicia Ageno.
A Hybrid Approach for the Acquisition of Information Extraction Patterns.
Proceedings of the EACL 2006 Workshop on Adaptive Text Extraction and Mining (ATEM 2006), April 2006.
Uses a co-training strategy in which two classifiers each seek to classify
documents as relevant to a particular scenario. One is an
extraction-pattern-based classifier similar to Yangarber's; the
other is a bag-of-words classifier. In their experiments, the
bag-of-words classifier converges quickly; the pattern-based classifier
takes much longer to converge. Several pattern-ranking functions are tried,
including the Riloff criterion used by Yangarber. A simpler function
adapted from Collins and Singer proves to work better. Overall,
considerable gains are reported over Yangarber-style bootstrapping.
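
For orientation, the Riloff criterion mentioned above is commonly written as RlogF: a pattern's score is its precision on relevant documents multiplied by the log of its relevant-match count. The sketch below assumes that commonly cited form; the exact variant, counts, and smoothing used in each paper may differ, and the names rlogf and rank_patterns and the example counts are hypothetical. The Collins and Singer-style function is not shown.

```python
import math

def rlogf(rel_count, total_count):
    """RlogF score: (relevant matches / all matches) * log2(relevant matches)."""
    if rel_count <= 0 or total_count <= 0:
        return 0.0
    return (rel_count / total_count) * math.log2(rel_count)

def rank_patterns(counts):
    """Sort pattern names by descending RlogF; counts maps name -> (relevant, total)."""
    return sorted(counts, key=lambda p: rlogf(*counts[p]), reverse=True)

# Made-up match counts for illustration only (assumption).
counts = {
    "company appoint person": (8, 10),   # frequent and precise
    "person resign post": (1, 1),        # precise but seen once
    "company say": (2, 100),             # frequent but imprecise
}
ranking = rank_patterns(counts)
```

The log factor is what keeps a pattern seen only once (score 0 here) from outranking a pattern that is both frequent and mostly relevant.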