G22.2591 - Advanced Natural Language Processing - Spring 2011
What are relations?
Relations are facts involving a pair of entities.
ex: Moby Dick was written by Hermann Melville.
ex: Columbia University is located in New York City.
ex: Ralph Grishman works for New York University.
They generally represent states rather than events (but the distinction
between relations and events is not clear-cut).
Many can be described as attributes of one of the entities.
Relations were introduced as an NLP task for MUC-7
(1997) and extended for ACE.
Most of the attributes in the Knowledge Base Population (KBP)
task are also relations.
In addition to these general relations, there has been considerable
work in the last few years on relations in molecular biology,
such as protein-protein interactions.
Why extract relations?
- Relations capture much of the connection between entities, and
can be used to build various entity networks (e.g., social networks).
- Many user queries ask about relations; if these can be
identified, the search engine can return the answer or at least the
sentence which probably has the answer. See for example the analysis
of Encarta questions in: Eugene Agichtein, Silviu Cucerzan, and Eric Brill.
Analysis of Factoid Questions for Effective Relation Extraction (poster).
ACM SIGIR International Conference on Research and Development in
Information Retrieval (SIGIR), 2005.
- In bioinformatics and genomics, much of the information is in relations
such as protein/gene interactions, which can be extracted from articles
(see for example Claudio Giuliano, Alberto Lavelli, and Lorenza Romano.
Exploiting Shallow Linguistic Information for Relation Extraction from
Biomedical Literature. EACL 2006).
What is the challenge?
The challenge for relation extraction
is the usual challenge for NLP: a coverage or paraphrase problem
... figuring out all the ways in which a relation may be expressed, or
all the more specific predicates which may imply a given relation (is a
professor at ==> works for; is taking a tour of ==> is located in).
MUC-7 was limited to 3 types of relations involving organizations:
employee_of, product_of, location_of.
ACE defined a set of 18 relations (2005 guidelines).
These include Employment, Part-Whole:Geographical, Located, Near, and Family.
Note that relations may occur between any types of mentions -- names,
nominals, or pronouns.
Thus the phrase "his uncle" gives rise to a Family relation between
"he" and "uncle".
The two arguments of an ACE relation had to occur within the
same sentence (MUC relations did not have this constraint,
nor does KBP).
What features should we use to detect relations?
Let's look at some examples.
Augmented parse tree method
S. Miller, M. Crystal, H. Fox, L. Ramshaw, R. Schwartz, R. Stone, R.
Weischedel, and the Annotation Group (BBN Technologies). BBN:
Description of the SIFT System as Used for MUC-7. MUC-7 Proceedings.
For MUC-7, BBN introduced a statistical
model for recognizing binary relations between entities -- the
'template relation' task introduced in that evaluation. (This task
involved a small number of relations, such as person -- organization,
and organization -- location.) They used a generative model based
on a parse tree augmented with semantic labels. The augmentation
is somewhat complicated (see Figure 3 of the paper). In
simplified terms, if a relation connects nodes A and B in the parse
tree, and the lowest node dominating both A and B is C, then they add a
semantic label to A, B, and C, and to all nodes on the paths from C to
A and B. In addition, in some cases a node is added to the tree
to indicate the type of relation and the argument.
A large training corpus of this form is generated in a semi-automatic
fashion. The relations are first annotated by hand. The
sentences are then parsed using a TreeBank-based parser, and the
resulting (syntactic) tree is augmented with information about the
relations. (The parsing is constrained to be compatible with the
semantic relations.) In this way a training corpus of about 1/2 million
words was produced. From this training corpus they then produce a
lexicalized probabilistic context-free grammar.
This grammar is then used to parse new (test) text, and the
relations present are gleaned from the semantic labels (if any) on the
resulting parse tree.
Because MUC relations can cross sentence boundaries, BBN's parse-tree
methods were supplemented with a feature-based method to recognize
cross-sentence relations. They included both structural features
and content features. The structural features included information
about the immediate context of the entities involved, and whether the
entities were in successive sentences. The content features tested
for particular names or descriptors (e.g., generals are employed by
armies or countries) and conflicting information about names
(a person who works for X probably doesn't work for Y).
Feature-based methods
The basic idea of feature-based methods is to train a classifier
which, given two entities in the same sentence (for ACE relations),
labels the pair either with a relation type or NO RELATION.
The features used can combine information at many levels, primarily
involving the words between the two potential arguments but
also using a limited context before the first argument or
after the second. Potential features include the words of the
two entities, the words between the two entities, the chunks
between the two entities, dependency tree features, parse tree
features, and membership of words in semantic classes (countries, etc.).
Nanda Kambhatla. Combining Lexical, Syntactic, and Semantic Features with
Maximum Entropy Models for Information Extraction. ACL 2004.
GuoDong Zhou, Jian Su, Jie Zhang, and Min Zhang.
Exploring Various Knowledge in Relation Extraction. ACL 2005.
Highlights of SVMs [Support Vector Machines]
For brief presentations of SVMs, see the
chapter from Introduction to Information Retrieval or the slides from
- linear classifier
- select separating plane to maximize margin
- support vectors = data points closest to separating plane
- dealing with noisy (non-separable) data: soft margin
- kernel methods: replace dot product with kernel function to
capture similarity of data points without reducing each data point to a
vector of features
Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella.
Kernel Methods for Relation Extraction. J. Machine Learning Research
3 (2003) 1083-1106.
SRA addressed the same
relation-extraction problem differently. They used a partial
parser (roughly, a chunker) and they used a discriminative method
(SVMs) instead of a generative one. The parse tree nodes contain a type
and a head or text field (Figure 1). To represent a relation, the
nodes get a 'role' field; for example, to capture a
person-affiliation relation, one node (the person) gets role=member and
one node (the organization) gets role=affiliation.
One advantage of SVMs is that we do not
have to explicitly enumerate the features which are used to classify
examples; it is sufficient to provide a kernel function which,
roughly speaking, computes a similarity between examples. As
their kernel, they used a measure of similarity between two
trees. Basically, two trees are considered similar if their roots
have the same type and role, and each has a subsequence of children
(not necessarily consecutive) with the same types and roles. The
value of the similarity depends on how many such subsequences exist,
and how spread out they are. All the training examples are
converted into such shallow parse trees with role labels, and used to
train the system; the SVM can then classify new examples of relations.
They obtain an F measure of 0.87 for person-affiliation and 0.83 for
organization-location, although this is with hand-checked parses.
Shubin Zhao and Ralph Grishman.
Extracting Relations with Integrated Information Using Kernel Methods. ACL 2005.
The work at NYU used different types
of evidence to identify ACE relations: words, bigrams, the syntactic
path between the two arguments, and the local syntactic context of
each individual argument. A separate kernel function was written for
each, and these were then combined into a composite kernel.
Min Zhang, Jie Zhang, Jian Su, and GuoDong Zhou.
A Composite Kernel to Extract Relations between Entities with Both Flat and Structured Features. ACL 2006.
Looking ahead: semi- and un-supervised methods
Sergei Brin. Extracting Patterns
and Relations from the World Wide Web.
Proc. World Wide Web and
Databases International Workshop, pages 172-183. Number 1590 in
LNCS, Springer, March 1998.
Eugene Agichtein and Luis Gravano. Snowball:
Extracting Relations from Large Plain-Text Collections.
In Proc. 5th ACM
International Conference on Digital Libraries (ACM DL), 2000.
Razvan Bunescu and Raymond J. Mooney.
Learning to Extract Relations from the Web using Minimal Supervision. ACL 2007.
Takaaki Hasegawa, Satoshi Sekine, and Ralph Grishman.
Discovering Relations among Named Entities from Large Corpora. ACL 2004.