G22.2591 - Advanced Natural Language Processing - Spring 2004
Information Extraction: Unsupervised Learning
(course evaluation today)
We considered last week some of the methods for learning extraction
patterns from annotated corpora. Developing annotated corpora for
information extraction is particularly problematic because there are so
many scenarios (event types), and we need a separate annotation for
each scenario. We therefore consider this week how such systems
could be developed with very little training data.
Several of the systems are based on bootstrapping methods.
Bootstrapping relies on some redundancy: multiple features, each of
which is correlated with an instance of the class or relation of
interest, so that evidence from one feature can be used to identify
further examples of another.
We will consider examples of two types of bootstrapping:
pattern/instance bootstrapping and pattern/relevant document
bootstrapping. In pattern/instance bootstrapping, a pair of names
in a given context is likely to be an instance of a relation if (1)
other pairs appearing in this context are instances of the relation or
(2) this pair appearing in other contexts is an instance of the
relation. This is similar to the bootstrapping used for
unsupervised name discovery.
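
The two conditions above can be sketched as a small loop. This is only
an illustration under simplifying assumptions, not the procedure of any
particular system: contexts are treated as exact strings, a context is
accepted as a pattern once it has been seen with at least min_support
known pairs, and every pair seen with an accepted pattern is accepted
as an instance. The function name, the min_support threshold, and the
toy data are all hypothetical.

```python
from collections import defaultdict

def bootstrap(occurrences, seed_pairs, rounds=5, min_support=2):
    """Toy pattern/instance bootstrapping.

    occurrences: iterable of (pair, context) tuples observed in a corpus.
    seed_pairs:  pairs assumed to be instances of the relation.
    """
    by_context = defaultdict(set)
    for pair, context in occurrences:
        by_context[context].add(pair)

    pairs, patterns = set(seed_pairs), set()
    for _ in range(rounds):
        # A context seen with enough known instances becomes a pattern.
        new_patterns = {c for c, ps in by_context.items()
                        if len(ps & pairs) >= min_support}
        # Every pair seen with an accepted pattern becomes an instance.
        new_pairs = set().union(*(by_context[c] for c in new_patterns))
        if new_patterns <= patterns and new_pairs <= pairs:
            break  # nothing new was learned in this round
        patterns |= new_patterns
        pairs |= new_pairs
    return pairs, patterns

# Hypothetical (pair, context) observations for a "headquarters" relation.
occurrences = [
    (("Microsoft", "Redmond"), "X is based in Y"),
    (("IBM", "Armonk"), "X is based in Y"),
    (("Intel", "Santa Clara"), "X is based in Y"),
    (("Microsoft", "Redmond"), "X , headquartered in Y"),
    (("Intel", "Santa Clara"), "X , headquartered in Y"),
    (("Boeing", "Chicago"), "X , headquartered in Y"),
    (("IBM", "Microsoft"), "X sued Y"),
]
seeds = {("Microsoft", "Redmond"), ("IBM", "Armonk")}
pairs, patterns = bootstrap(occurrences, seeds)
```

On this toy data the first round accepts "X is based in Y" and, through
it, the Intel pair; only then does "X , headquartered in Y" have enough
known pairs to be accepted, which in turn adds the Boeing pair. The
context "X sued Y" is never accepted. A real system would score
patterns and pairs rather than accept them outright, since one bad
pattern can otherwise admit many bad pairs.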
In pattern/relevant document bootstrapping, a pair of names in a given
context is likely to be an instance of an event if (1) other pairs
appearing in this context are instances of the event or (2) the pair
appears more frequently in documents containing other instances of the
event.
Discovery from relevant documents: