G22.2591 - Advanced Natural Language Processing - Spring 2009

Lecture 10

Discussion of Assignment #3 results.

Planning for KBP -- data annotation.

Relation Extraction (cont'd)

Takaaki Hasegawa, Satoshi Sekine, Ralph Grishman. Discovering Relations among Named Entities from Large Corpora. ACL 2004.

Presentation by Omer Gunes.

Scenario Template / Event Extraction

The scenario template task was originally the information extraction task for the MUC evaluations. It involved identifying the participants, locations, dates, etc. of a class of events -- a naval engagement, a terrorist incident, a joint venture. In later MUCs, the task narrowed to single events or closely related events -- executive succession, rocket launchings. For the ACE evaluations, this became the event extraction task.

For the early MUCs, these systems were built by writing rules manually: either patterns that matched subtrees of a parse tree, or regular expressions that matched sequences of tokens or chunks.
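As a rough illustration of the second kind of rule, here is a minimal sketch of a hand-written regular expression over a chunked sentence, for an executive succession event. The bracketed chunk notation ([NP ...], [VG ...], [PP ...]), the pattern name APPOINT, the function name, and the example sentence are all made up for this sketch; they are not taken from any MUC system.

```python
import re

# Hypothetical chunk notation: each chunk is written as "[TAG words ...]".
# The rule matches: organization NP, an "appointed/named" verb group,
# a person NP, and optionally a PP giving the post.
APPOINT = re.compile(
    r"\[NP (?P<org>[^\]]+)\] "
    r"\[VG (?:has |have )?(?:appointed|named)\] "
    r"\[NP (?P<person>[^\]]+)\]"
    r"(?: \[PP as (?P<post>[^\]]+)\])?"
)


def extract_appointment(chunked_sentence):
    """Return a dict of slot fillers, or None if the rule does not match."""
    match = APPOINT.search(chunked_sentence)
    return match.groupdict() if match else None


example = "[NP Acme Corp] [VG has appointed] [NP Jane Doe] [PP as president]"
print(extract_appointment(example))
```

Each such rule covers only one way of expressing the event, which is why a hand-built system needed many of them.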

Early attempts were made to semi-automate the pattern creation process: starting with a large annotated corpus, taking the immediate syntactic context of each slot filler and converting it to an extraction pattern. The resulting set of patterns was then reviewed by hand. This, however, still depended on a large amount of hand annotation.

Ellen Riloff. Automatically Generating Extraction Patterns from Untagged Text. Proc. Thirteenth National Conference on Artificial Intelligence (AAAI-96), 1996, pp. 1044-1049.

How can relevant patterns be selected without tagging the whole corpus? Riloff observed that if the documents in the corpus are classified as relevant or irrelevant, patterns which occur substantially more often in relevant documents are in general relevant patterns. This greatly reduced but did not eliminate the corpus annotation required: documents still had to be labeled, but individual slot fillers did not.
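The observation above can be turned into a ranking of candidate patterns. The following is a minimal sketch, assuming each document has already been reduced to a set of pattern strings plus a relevant/irrelevant label. The score used here (the fraction of a pattern's documents that are relevant, times the log of its relevant-document count) is a simplified stand-in; the exact metric in the paper may differ in detail. The function name, the min_count parameter, and the toy patterns are hypothetical.

```python
import math
from collections import Counter


def score_patterns(docs, min_count=2):
    """Rank patterns by how strongly they are associated with relevant documents.

    docs: iterable of (patterns_in_doc, is_relevant) pairs.
    Returns {pattern: score}; patterns seen in fewer than min_count
    documents, or never in a relevant document, are left out.
    """
    total = Counter()     # documents containing the pattern
    relevant = Counter()  # relevant documents containing the pattern
    for patterns, is_relevant in docs:
        for p in set(patterns):
            total[p] += 1
            if is_relevant:
                relevant[p] += 1

    scores = {}
    for p, n in total.items():
        r = relevant[p]
        if n < min_count or r == 0:
            continue
        # Assumed score: relevance rate weighted by log frequency.
        scores[p] = (r / n) * math.log2(r)
    return scores


toy_docs = [
    ({"bombed <obj>", "<subj> said"}, True),
    ({"bombed <obj>", "<subj> said"}, True),
    ({"<subj> said"}, False),
    ({"<subj> said"}, False),
]
for pattern, score in sorted(score_patterns(toy_docs).items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {pattern}")
```

The log factor favors patterns that are frequent as well as skewed toward relevant documents, so a pattern seen once, in a single relevant document, does not outrank a well-attested one.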

Roman Yangarber, Ralph Grishman, Pasi Tapanainen, Silja Huttunen. Automatic Acquisition of Domain Knowledge for Information Extraction. Proc. COLING 2000.

Yangarber entirely eliminated the corpus annotation through a bootstrapping scheme. Starting from a set of seed patterns, he retrieved some (relevant) documents containing these seeds, then used Riloff's metric to select some additional patterns, which were used to retrieve more documents, etc.
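The bootstrapping loop can be sketched as follows. This is a simplified illustration, not the published algorithm: it treats a document as relevant if it contains any accepted pattern, reuses a relevance-rate-times-log-count score as a stand-in for the selection metric, and stops when no candidate clears a threshold. The function name, the parameters rounds, per_round, and min_score, and the toy patterns are assumptions made for this sketch.

```python
import math
from collections import Counter


def bootstrap_patterns(docs, seeds, rounds=5, per_round=1, min_score=0.4):
    """Grow a pattern set from seeds, with no document labels.

    docs: list of sets of pattern strings, one set per document.
    seeds: initial set of trusted patterns.
    """
    accepted = set(seeds)
    for _ in range(rounds):
        # 1. Documents matching any accepted pattern are treated as relevant.
        is_relevant = [any(p in accepted for p in doc) for doc in docs]

        # 2. Count each candidate pattern overall and in relevant documents.
        total, relevant = Counter(), Counter()
        for doc, rel in zip(docs, is_relevant):
            for p in doc:
                total[p] += 1
                if rel:
                    relevant[p] += 1

        # 3. Score candidates (assumed metric) and keep those above threshold.
        scores = {
            p: (relevant[p] / total[p]) * math.log2(relevant[p])
            for p in total
            if p not in accepted and relevant[p] > 0
        }
        ranked = sorted(
            (p for p in scores if scores[p] >= min_score),
            key=lambda p: (-scores[p], p),
        )
        if not ranked:
            break  # nothing new is confidently relevant
        # 4. Accept the best candidates; they widen retrieval next round.
        accepted.update(ranked[:per_round])
    return accepted


A, R, U = "company appointed <person>", "<person> resigned", "<person> succeeded <person>"
S, M = "<person> said", "shares rose"
toy_docs = [{A, R}, {A, R, S}, {R, U}, {R, U}, {S, M}, {S, M}]
print(sorted(bootstrap_patterns(toy_docs, {A})))
```

In the toy run, the seed retrieves two documents, which surface the "resigned" pattern; that pattern in turn retrieves two more documents, which surface "succeeded". The generic patterns never score high enough to be accepted.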

Looking Ahead

Roman Yangarber. Counter-Training in Discovery of Semantic Patterns. ACL 2003.

M. Stevenson and M. Greenwood. A Semantic Approach to IE Pattern Induction. ACL 2005.

Mihai Surdeanu, Jordi Turmo, and Alicia Ageno. A Hybrid Approach for the Acquisition of Information Extraction Patterns. Proceedings of the EACL 2006 Workshop on Adaptive Text Extraction and Mining (ATEM 2006), April 2006.