G22.2591 - Advanced Natural Language Processing - Spring 2011

Lecture 7

Relation Extraction

What are relations?

Relations are facts involving a pair of individual entities.
ex:  Moby Dick was written by Herman Melville.
ex:  Columbia University is located in New York City.
ex:  Ralph Grishman works for New York University.
They generally represent states rather than events (but the distinction between relations and events is not clear-cut).
Many can be described as attributes of one of the entities.
Relations were introduced as an NLP task for MUC-7 (1997) and extended for ACE. Most of the attributes in the Knowledge Base Population task are also relations.

In addition to these general relations, there has been considerable work in the last few years on relations in molecular biology, such as protein-protein interactions.

Why extract relations?

What is the challenge?

The challenge for relation extraction is the usual challenge for NLP, a coverage or paraphrase problem:  figuring out all the ways in which a relation may be expressed, or all the more specific predicates which may imply a given relation (is a professor at ==> works for;  is taking a tour of ==> is located in).

Supervised training

MUC-7 was limited to 3 types of relations involving organizations: employee_of, product_of, location_of.

ACE defined a set of 18 relation subtypes, grouped into 6 major types (2005 guidelines).
These include Employment, Part-Whole:Geographical, Located, Near, and Family relations.
Note that relations may occur between any types of mentions -- names, nominals, or pronouns.
Thus the phrase "his uncle" gives rise to a Family relation between "his" and "uncle".

The two arguments of an ACE relation had to occur within the same sentence (MUC relations did not have this constraint, nor does KBP).
What features should we use to detect relations?
Let's look at some examples (/home/grishman/jetx/data/ACE/2005/training/nw).

Augmented parse tree method

S. Miller, M. Crystal, H. Fox, L. Ramshaw, R. Schwartz, R. Stone, R. Weischedel, and the Annotation Group (BBN Technologies).  BBN: Description of the SIFT System as Used for MUC-7.  MUC-7 Proceedings.
 
For MUC-7, BBN introduced a statistical model for recognizing binary relations between entities -- the 'template relation' task introduced in that evaluation. (This task involved a small number of relations, such as person -- organization, and organization -- location.)  They used a generative model based on a parse tree augmented with semantic labels.  The augmentation is somewhat complicated (see Figure 3 of the paper).  In simplified terms, if a relation connects nodes A and B in the parse tree, and the lowest node dominating both A and B is C, then they add a semantic label to A, B, and C, and to all nodes on the paths from C to A and B. In addition, in some cases a node is added to the tree to indicate the type of relation and the argument.
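
To make the augmentation concrete, here is a minimal sketch (in Python) of the path-labeling step.  The node class and the single semantic label per node are simplifying assumptions, not BBN's actual representation, which is richer (again, see Figure 3 of the paper).

    # Minimal sketch of the path-augmentation idea; not BBN's actual scheme.
    # Each parse-tree node carries a syntactic label plus an optional
    # semantic label added by the augmentation.

    class Node:
        def __init__(self, label, children=None):
            self.label = label              # syntactic category, e.g. 'NP'
            self.sem = None                 # semantic label (added below)
            self.children = children or []
            self.parent = None
            for c in self.children:
                c.parent = self

    def ancestors(node):
        path = []
        while node is not None:
            path.append(node)
            node = node.parent
        return path

    def augment(arg_a, arg_b, relation):
        """Label arg_a, arg_b, their lowest common ancestor, and all
        nodes on the paths between them with the relation type."""
        path_a, path_b = ancestors(arg_a), ancestors(arg_b)
        lca = next(n for n in path_a if n in path_b)
        for path in (path_a, path_b):
            for node in path:
                node.sem = relation         # e.g. 'EMPLOYEE_OF'
                if node is lca:
                    break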

A large training corpus of this form is generated in a semi-automatic fashion.  The relations are first annotated by hand.  The sentences are then parsed using a TreeBank-based parser, and the resulting (syntactic) tree is augmented with information about the relations. (The parsing is constrained to be compatible with the semantic relations.) In this way a training corpus of about 1/2 million words was produced.  From this training corpus they then produce a lexicalized probabilistic context-free grammar.
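
As a rough illustration of the final step, rule probabilities for a PCFG can be read off the augmented treebank by relative-frequency estimation.  This sketch (reusing the Node class above) shows only that basic idea;  SIFT's grammar is lexicalized and considerably more elaborate.

    # Sketch: relative-frequency PCFG estimation from augmented trees.
    # Because node labels include the semantic augmentation, the learned
    # grammar also predicts relation labels when parsing new text.

    from collections import Counter

    def productions(node):
        if node.children:
            yield (node.label, tuple(c.label for c in node.children))
            for c in node.children:
                yield from productions(c)

    def estimate_pcfg(trees):
        rule_counts, lhs_counts = Counter(), Counter()
        for tree in trees:
            for lhs, rhs in productions(tree):
                rule_counts[(lhs, rhs)] += 1
                lhs_counts[lhs] += 1
        # P(lhs -> rhs) = count(lhs, rhs) / count(lhs)
        return {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}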

This grammar is then used to parse new (test) text;  and the relations present are gleaned from the semantic labels (if any) on the trees.

Because MUC relations can cross sentence boundaries, BBN's parse-tree methods were supplemented with a feature-based method to recognize cross-sentence relations. They included both structural features and content features. The structural features included information about the immediate context of the entities involved, and whether the entities were in successive sentences. The content features tested for particular names or descriptors (e.g., generals are employed by armies or countries) and conflicting information about names (a person who works for X probably doesn't work for Y).

Feature-Based Methods

The basic idea of feature-based methods is to train a classifier which, given two entities in the same sentence (for ACE relations), labels the pair either with a relation type or NO RELATION.

The features used can combine information at many levels, primarily involving the words between the two potential arguments but also using a limited context before the first argument or after the second. Potential features include the words of the two entities, the words between the two entities, the chunks between the two entities, dependency tree features, parse tree features, and membership of words in semantic classes (countries, relatives).
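
A minimal sketch of this setup, using a maxent-style classifier from scikit-learn and only a small subset of the features listed above (the feature extractor here is illustrative, not taken from either paper below):

    # Sketch of a feature-based relation classifier (assumes scikit-learn).
    # Each training example is a pair of entity mentions in one sentence,
    # labeled with a relation type or 'NO_RELATION'.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression   # maxent-style model
    from sklearn.pipeline import make_pipeline

    def extract_features(tokens, arg1, arg2):
        """tokens: sentence word list; arg1, arg2: (start, end) offsets,
        with arg1 preceding arg2.  A small subset of the features above."""
        (s1, e1), (s2, e2) = arg1, arg2
        between = tokens[e1:s2]
        feats = {'arg1_head': tokens[e1 - 1].lower(),
                 'arg2_head': tokens[e2 - 1].lower(),
                 'num_words_between': len(between)}
        for w in between:                   # bag of words between arguments
            feats['between=' + w.lower()] = 1
        return feats

    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    # model.fit([extract_features(*x) for x in pairs], labels)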

Nanda Kambhatla.
Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Information Extraction. ACL 2004.

GuoDong Zhou, Jian Su, Jie Zhang, and Min Zhang.
Exploring Various Knowledge in Relation Extraction. ACL 2005.

Kernel Methods

Highlights of SVMs [Support Vector Machines]
For brief presentations of SVMs see the chapter from Introduction to Information Retrieval or the slides from Andrew Moore.

Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella.
Kernel Methods for Relation Extraction.  J. Machine Learning Research 3 (2003) 1083-1106.

SRA addressed the same relation-extraction problem differently.  They used a partial parser (roughly, a chunker) and they used a discriminative method (SVMs) instead of a generative one. The parse tree nodes contain a type and a head or text field (Figure 1).  To represent a relation, the nodes get a 'role' field;  for example, to capture a person-affiliation relation, one node (the person) gets role=member and one node (the organization) gets role=affiliation.

One advantage of SVMs is that we do not have to explicitly enumerate the features which are used to classify examples;  it is sufficient to provide a kernel function which, roughly speaking, computes a similarity between examples.  As their kernel, they used a measure of similarity between two trees.  Basically, two trees are considered similar if their roots have the same type and role, and each has a subsequence of children (not necessarily consecutive) with the same types and roles.  The value of the similarity depends on how many such subsequences exist, and how spread out they are.  All the training examples are converted into such shallow parse trees with role labels, and used to train the system;  the SVM can then classify new examples of possible relations.
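
An illustrative, much simplified version of such a tree similarity appears below.  The full kernel also counts sparse (non-contiguous) child subsequences and handles the decay somewhat differently, so this sketch conveys the shape of the computation rather than the exact function.

    # Illustrative tree similarity in the spirit of Zelenko et al.; the
    # real kernel also scores non-contiguous child subsequences.  Nodes
    # are assumed to have .type, .role, and .children fields.

    LAMBDA = 0.5   # decay: longer subsequences contribute less

    def match(n1, n2):
        return n1.type == n2.type and n1.role == n2.role

    def tree_sim(n1, n2):
        if not match(n1, n2):
            return 0.0
        total = 1.0
        for i in range(len(n1.children)):
            for j in range(len(n2.children)):
                # grow a matching contiguous child subsequence at (i, j)
                sim, k = 0.0, 0
                while (i + k < len(n1.children) and j + k < len(n2.children)
                       and match(n1.children[i + k], n2.children[j + k])):
                    sim += tree_sim(n1.children[i + k], n2.children[j + k])
                    total += LAMBDA ** (k + 1) * sim
                    k += 1
        return total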

They obtain an F measure of 0.87 for person-affiliation and 0.83 for organization-location, although this is with hand-checked parses.

Shubin Zhao and Ralph Grishman.
Extracting Relations with Integrated Information Using Kernel Methods. ACL 2005.

The work at NYU used different types of evidence to identify ACE relations: words, bigrams, the syntactic path between the two arguments, and the local syntactic context of each individual argument. A separate kernel function was written for each, and a composite kernel was then built combining them all.
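
Since a nonnegative weighted sum of kernels is itself a kernel, the individual kernels can be combined very simply.  A sketch (the weights and component kernel names are placeholders, not the paper's exact combination):

    # Sketch: composite kernel as a nonnegative weighted sum of kernels,
    # which is again a valid kernel.  Component kernels are placeholders.

    def composite_kernel(x1, x2, kernels, weights):
        """kernels: list of functions k(x1, x2) -> float;
        weights: one nonnegative weight per kernel."""
        return sum(w * k(x1, x2) for k, w in zip(kernels, weights))

    # e.g. kernels = [word_kernel, bigram_kernel, path_kernel, context_kernel]
    # K = lambda a, b: composite_kernel(a, b, kernels, [1.0, 0.5, 2.0, 1.0])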

Min Zhang, Jie Zhang, Jian Su, and GuoDong Zhou.
A Composite Kernel to Extract Relations between Entities with Both Flat and Structured Features. ACL 2006.

Looking ahead: semi- and un-supervised methods

Sergey Brin. Extracting Patterns and Relations from the World Wide Web. In Proc. World Wide Web and Databases International Workshop (WebDB), pages 172-183. Number 1590 in LNCS, Springer, March 1998.

Eugene Agichtein and Luis Gravano. Snowball: Extracting Relations from Large Plain-Text Collections. In Proc. 5th ACM International Conference on Digital Libraries (ACM DL), 2000.

Razvan Bunescu and Raymond J. Mooney. Learning to Extract Relations from the Web using Minimal Supervision. ACL 2007.

Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman. Discovering Relations among Named Entities from Large Corpora. ACL 2004.