Planning for KBP -- data annotation.
Presentation by Omer Gunes.
For the early MUCs, these systems were built by writing rules by hand: either patterns that matched subtrees of a parse tree, or regular expressions that matched sequences of tokens or chunks.
Early attempts were made to semi-automate the pattern creation process: starting with a large annotated corpus, taking the immediate syntactic context of each slot filler and converting it to an extraction pattern. The resulting set of patterns was then reviewed by hand. This, however, still depended on a large amount of hand annotation.
Ellen Riloff.
Automatically Generating Extraction Patterns from Untagged Text.
Proc. Thirteenth National Conference on Artificial Intelligence (AAAI-96), 1996, pp. 1044-1049.
How to select relevant patterns without tagging the whole corpus? Riloff observed that if the corpus was classified into relevant and irrelevant documents, patterns which occurred substantially more often in relevant documents were in general relevant patterns. This greatly reduced but did not eliminate the corpus annotation required.
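The ranking idea can be illustrated with a small sketch. It assumes each document has already been reduced to a set of candidate pattern strings and carries a relevant/irrelevant label. The function name `rank_patterns`, the `min_relevant` cutoff, and the exact score (relevance rate times the log of the relevant-document count) are illustrative assumptions in the spirit of the observation above, not necessarily the formula used in the paper.

```python
import math
from collections import Counter


def rank_patterns(docs, min_relevant=2):
    """Rank patterns by how strongly they favour relevant documents.

    docs: iterable of (patterns, is_relevant) pairs, where `patterns`
    is the set of candidate pattern strings found in one document.
    min_relevant: hypothetical cutoff; patterns seen in fewer relevant
    documents are dropped as too rare to trust.
    """
    relevant = Counter()  # pattern -> number of relevant docs containing it
    total = Counter()     # pattern -> number of all docs containing it
    for patterns, is_relevant in docs:
        for p in set(patterns):
            total[p] += 1
            if is_relevant:
                relevant[p] += 1

    scores = {}
    for p, rel in relevant.items():
        if rel < min_relevant:
            continue
        rate = rel / total[p]               # share of occurrences in relevant docs
        scores[p] = rate * math.log2(rel)   # assumed score: favour frequent patterns
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A pattern that appears only in relevant documents gets a rate of 1.0, while one spread evenly across both classes gets about 0.5; the logarithmic factor keeps very rare patterns from dominating the list. The top of the ranking would still be reviewed by hand.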
Roman Yangarber; Ralph Grishman; Pasi Tapanainen; Silja Huttunen.
Acquisition of Domain Knowledge for Information Extraction.
Proc. COLING 2000.
Yangarber eliminated the corpus annotation entirely through a bootstrapping scheme. Starting from a set of seed patterns, he retrieved the documents containing these seeds (treating them as relevant), then used Riloff's metric to select some additional patterns, which were in turn used to retrieve more documents, and so on for further iterations.
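The bootstrapping loop can be sketched as follows. This is a minimal illustration, assuming the corpus is a dictionary from document ids to sets of pattern strings; the names `bootstrap`, `per_round`, and `min_score`, and the particular smoothed score, are hypothetical choices made for the example rather than details taken from the paper.

```python
import math


def bootstrap(corpus, seeds, rounds=3, per_round=1, min_score=0.4):
    """Grow a pattern set from seeds without any labelled documents.

    corpus: dict mapping doc id -> set of pattern strings in that doc.
    seeds: initial hand-picked patterns.
    per_round, min_score: hypothetical knobs limiting how many patterns
    are accepted per iteration and how good they must be.
    """
    accepted = set(seeds)
    for _ in range(rounds):
        # Documents matching any accepted pattern are treated as relevant.
        relevant = {d for d, pats in corpus.items() if pats & accepted}
        candidates = set().union(*corpus.values()) - accepted

        scored = []
        for p in candidates:
            total = sum(1 for pats in corpus.values() if p in pats)
            rel = sum(1 for d in relevant if p in corpus[d])
            if rel == 0:
                continue
            # Assumed score: relevance rate times a smoothed log frequency.
            score = (rel / total) * math.log2(rel + 1)
            if score >= min_score:
                scored.append((score, p))

        if not scored:
            break  # nothing good enough left; stop early
        scored.sort(key=lambda sp: (-sp[0], sp[1]))
        accepted.update(p for _, p in scored[:per_round])
    return accepted
```

Accepting only a few patterns per round and requiring a minimum score are one way to slow the drift toward off-topic patterns, since every newly accepted pattern enlarges the set of documents counted as relevant in the next round.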
M. Stevenson and M. Greenwood. A Semantic Approach to IE Pattern Induction.
Mihai Surdeanu, Jordi Turmo, and Alicia Ageno.
A Hybrid Approach for the Acquisition of Information Extraction Patterns.
Proceedings of the EACL 2006 Workshop on Adaptive Text Extraction and Mining (ATEM 2006), April 2006.