G22.2591 - Advanced Natural Language Processing - Spring 2009

Lecture 8

Discuss Assignment #2 results.

Relation Extraction

What are relations?

Relations are facts involving a pair of individual entities.
ex:  Moby Dick was written by Herman Melville.
ex:  Columbia University is located in New York City.
ex:  Ralph Grishman works for New York University.
They generally represent states rather than events (but the distinction between relations and events is not clear-cut).
Many can be described as attributes of one of the entities.
Relations were introduced as an NLP task for MUC-7 (1997) and extended for ACE.
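
Concretely, relations of this kind are naturally represented as typed binary facts over entities.  A minimal sketch in Python (the relation-type names here are illustrative, not drawn from the ACE or MUC inventories):

    from collections import namedtuple

    # A relation is a typed predicate over a pair of entities.
    Relation = namedtuple("Relation", ["rel_type", "arg1", "arg2"])

    facts = [
        Relation("author_of",  "Herman Melville",     "Moby Dick"),
        Relation("located_in", "Columbia University", "New York City"),
        Relation("works_for",  "Ralph Grishman",      "New York University"),
    ]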

Why extract relations?

What is the challenge?

The challenge for relation extraction is the usual challenge for NLP, a coverage or paraphrase problem:  figuring out all the ways in which a relation may be expressed, or all the more specific predicates which may imply a given relation (is a professor at ==> works for;  is taking a tour of ==> is located in).
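
To make the coverage problem concrete, here is a toy matcher that maps a handful of surface predicates onto two underlying relations.  The pattern lists are illustrative and obviously far from complete, which is exactly the point:

    # Many specific surface predicates imply the same underlying relation.
    PARAPHRASES = {
        "works for":     [" works for ", " is employed by ", " is a professor at "],
        "is located in": [" is located in ", " is based in ", " is taking a tour of "],
    }

    def find_relations(sentence):
        """Report a relation whenever a known paraphrase appears."""
        found = []
        for relation, patterns in PARAPHRASES.items():
            for pattern in patterns:
                if pattern in sentence:
                    arg1, _, arg2 = sentence.partition(pattern)
                    found.append((arg1.strip(), relation, arg2.strip(" .")))
        return found

    print(find_relations("Sue is a professor at Columbia University."))
    # [('Sue', 'works for', 'Columbia University')]

Any such hand-built list will miss most of the ways a relation is actually expressed; the methods below try to learn this mapping from annotated data.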

Supervised training

ACE has defined a set of 18 relation subtypes, grouped into 6 major types (2005 guidelines).
These include Employment, Part-Whole:Geographical, Located, Near, and Family relations.
Note that relations may occur between any types of mentions -- names, nominals, or pronouns.
Thus the phrase "his uncle" gives rise to a Family relation between "he" and "uncle".

What features should we use to detect relations?
Let's look at some examples (/home/grishman/jetx/data/ACE/2005/training/nw).
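
One plausible starting point is the kind of surface features sketched below.  These are common choices, not a fixed inventory; the mention spans and entity types would come from the ACE annotation:

    def relation_features(tokens, m1, m2, type1, type2):
        """Features for a candidate relation between two mentions, given as
        (start, end) token spans with m1 preceding m2."""
        between = tokens[m1[1] : m2[0]]
        return {
            "entity_types":      type1 + "-" + type2,    # e.g. PER-ORG
            "words_between":     " ".join(between),
            "num_words_between": len(between),
            "mention1":          " ".join(tokens[m1[0] : m1[1]]),
            "mention2":          " ".join(tokens[m2[0] : m2[1]]),
        }

    tokens = "Ralph Grishman works for New York University".split()
    print(relation_features(tokens, (0, 2), (4, 7), "PER", "ORG"))
    # {'entity_types': 'PER-ORG', 'words_between': 'works for', ...}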

Augmented parse tree method

S. Miller, M. Crystal, H. Fox, L. Ramshaw, R. Schwartz, R. Stone, R. Weischedel, and the Annotation Group (BBN Technologies).  BBN: Description of the SIFT System as Used for MUC-7.  MUC-7 Proceedings.
 
For MUC-7, BBN introduced a statistical model for recognizing binary relations between entities -- the 'template relation' task introduced in that evaluation. (This task involved a small number of relations, such as person -- organization, and organization -- location.)  They used a generative model based on a parse tree augmented with semantic labels.  The augmentation is somewhat complicated (see Figure 3 of the paper).  In simplified terms, if a relation connects nodes A and B in the parse tree, and the lowest node dominating both A and B is C, then they add a semantic label to A, B, and C, and to all nodes on the paths from C to A and B.  In addition, in some cases a node is added to the tree to indicate the type of relation and the argument.
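
A much-simplified sketch of the labeling step (the tree representation with parent pointers is my own simplification, not BBN's actual augmentation, which also composes labels and adds nodes):

    class Node:
        def __init__(self, label, parent=None):
            self.label = label
            self.parent = parent
            self.semantic_label = None

    def path_to_root(node):
        path = []
        while node is not None:
            path.append(node)
            node = node.parent
        return path

    def augment(arg_a, arg_b, relation):
        """Label arg_a, arg_b, their lowest common ancestor (node C),
        and every node on the paths from C down to the arguments."""
        path_a, path_b = path_to_root(arg_a), path_to_root(arg_b)
        on_b_path = {id(n) for n in path_b}
        lca = next(n for n in path_a if id(n) in on_b_path)
        for path in (path_a, path_b):
            for node in path[: path.index(lca) + 1]:
                node.semantic_label = relation

    # Example: an employment relation between an NP and a VP under S
    s = Node("S")
    np, vp = Node("NP", s), Node("VP", s)
    augment(np, vp, "EMPLOYEE_OF")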

A large training corpus of this form is generated in a semi-automatic fashion.  The relations are first annotated by hand.  The sentences are then parsed using a Treebank-based parser, and the resulting (syntactic) trees are augmented with information about the relations.  In this way a training corpus of about half a million words was produced.  From this training corpus they then produce a lexicalized probabilistic context-free grammar.
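
The final step, estimating rule probabilities from the augmented treebank, is maximum-likelihood counting.  A sketch, glossing over the lexicalization and the form of the augmented labels (the nested-list tree format and label strings are my own):

    from collections import Counter

    def rules(tree):
        """Yield (lhs, rhs) for each internal node of a nested-list tree,
        e.g. ["S", ["NP/per", "Nance"], ["VP", "reported"]]."""
        label, children = tree[0], tree[1:]
        kids = [c for c in children if isinstance(c, list)]
        if kids:
            yield (label, tuple(c[0] for c in kids))
            for c in kids:
                yield from rules(c)

    def estimate_pcfg(treebank):
        """P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
        rule_counts, lhs_counts = Counter(), Counter()
        for tree in treebank:
            for lhs, rhs in rules(tree):
                rule_counts[(lhs, rhs)] += 1
                lhs_counts[lhs] += 1
        return {r: n / lhs_counts[r[0]] for r, n in rule_counts.items()}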

This grammar is then used to parse new (test) text, and the relations present are gleaned from the semantic labels (if any) on the resulting trees.

SVM Methods

Highlights of SVMs [Support Vector Machines]
For brief presentations of SVMs see the chapter from Introduction to Information Retrieval or the slides from Andrew Moore.

Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella.
Kernel Methods for Relation Extraction.  J. Machine Learning Research 3 (2003) 1083-1106.

SRA addressed the same relation-extraction problem differently.  They used a partial parser (roughly, a chunker) and a discriminative method (SVMs) instead of a generative one.  The parse tree nodes contain a type and a head or text field (Figure 1).  To represent a relation, the nodes get a 'role' field;  for example, to capture a person-affiliation relation, one node (the person) gets role=member and one node (the organization) gets role=affiliation.

One advantage of SVMs is that we do not have to explicitly enumerate the features which are used to classify examples;  it is sufficient to provide a kernel function which, roughly speaking, computes a similarity between examples.  As their kernel, they used a measure of similarity between two trees.  Basically, two trees are considered similar if their roots have the same type and role, and each has a subsequence of children (not necessarily consecutive) with the same types and roles.  The value of the similarity depends on how many such subsequences exist, and how spread out they are.  All the training examples are converted into such shallow parse trees with role labels, and used to train the system;  the SVM can then classify new examples of possible relations.
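
A much-simplified sketch of the matching idea.  The actual kernel of Zelenko et al. sums over all common, possibly non-contiguous, subsequences of children, with a decay factor penalizing spread-out subsequences;  here only aligned children are credited:

    def nodes_match(n1, n2):
        """Two nodes are comparable if type and role agree."""
        return n1["type"] == n2["type"] and n1.get("role") == n2.get("role")

    def tree_kernel(t1, t2, decay=0.5):
        """A crude similarity between two role-labeled shallow parse trees."""
        if not nodes_match(t1, t2):
            return 0.0
        score = 1.0
        for c1, c2 in zip(t1.get("children", []), t2.get("children", [])):
            score += decay * tree_kernel(c1, c2, decay)
        return score

With a kernel like this in hand, training reduces to computing the kernel between pairs of examples (e.g., filling in a Gram matrix for an SVM package that accepts precomputed kernels);  no explicit feature vectors are ever built.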

They obtain an F measure of 0.87 for person-affiliation and 0.83 for organization-location, although this is with hand-checked parses.

Shubin Zhao and Ralph Grishman.
Extracting Relations with Integrated Information Using Kernel Methods.  ACL 2005.

The work at NYU used different types of evidence to identify ACE relations:  words, bigrams, the syntactic path between the two arguments, and the local syntactic context of each individual argument.  A separate kernel function was written for each, and these were then combined into a composite kernel.
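
Since any positive-weighted sum of kernels is itself a valid kernel, combining the individual sources of evidence can be as simple as the sketch below (the component kernels, weights, and example representation are placeholders, not the actual ones from the paper):

    def composite_kernel(x, y, kernels, weights):
        """Weighted sum of individual kernels."""
        return sum(w * k(x, y) for w, k in zip(weights, kernels))

    # toy component kernels over a simple example representation
    def word_kernel(x, y):
        return len(set(x["words"]) & set(y["words"]))

    def path_kernel(x, y):
        return 1.0 if x["path"] == y["path"] else 0.0

    ex1 = {"words": ["works", "for"], "path": "subj<-works->for->obj"}
    ex2 = {"words": ["works", "at"],  "path": "subj<-works->at->obj"}
    print(composite_kernel(ex1, ex2, [word_kernel, path_kernel], [1.0, 2.0]))
    # 1.0  (one shared word, different paths)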

Looking ahead: semi- and un-supervised methods

Sergey Brin.  Extracting Patterns and Relations from the World Wide Web.  In Proc. World Wide Web and Databases International Workshop, pages 172-183.  Number 1590 in LNCS, Springer, March 1998.

Eugene Agichtein and Luis Gravano.  Snowball: Extracting Relations from Large Plain-Text Collections.  In Proc. 5th ACM International Conference on Digital Libraries (ACM DL), 2000.

Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman.  Discovering Relations among Named Entities from Large Corpora.  ACL 2004.