G22.2590 - Natural Language Processing - Spring 2006 Prof. Grishman
Lecture 10 Outline
April 4, 2006
Discuss Asgn. 8: identifying features for chunking
value of conjoined features: for example, treating the parts of speech of the
prior AND current words as a single feature (sketched below)
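A minimal sketch of such a conjoined feature (the function and feature names
here are illustrative, not the assignment's actual code):

    def conjoined_pos_feature(tagged_tokens, i):
        """Combine the POS tags of the prior and current words into a
        single feature string (tagged_tokens is a list of (word, POS))."""
        prev_pos = tagged_tokens[i - 1][1] if i > 0 else "BOS"
        curr_pos = tagged_tokens[i][1]
        return "prevPOS+currPOS=" + prev_pos + "+" + curr_pos

    # e.g. for [("the", "DT"), ("dog", "NN")] and i = 1,
    # the feature is "prevPOS+currPOS=DT+NN"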
Information extraction: learning methods
Discuss patterns for appointment events: variables, concept hierarchy,
write statement (Assignment 10)
Alternatives to hand-coded patterns ... supervised learning
- annotate a large corpus with the event(s) of interest
- learn features which characterize a particular slot of an event (word
context, syntactic context)
Large-scale corpus annotation is burdensome, because it must be repeated
for each type of event we would like to extract. Can we instead learn from raw
(unannotated) text?
Several approaches have been developed for such unsupervised learning. These often
are 'bootstrapping' methods, which start with a few 'seed' patterns and gradually
discover additional patterns.
One such procedure, described by Sergei Brin, discovers patterns for binary
relations. It works as follows:
- start with a few pairs of names involved in the relation of interest,
and a set of patterns (initially empty)
- collect (from a large corpus) examples of patterns which connect several
of these pairs
- add the best pattern(s) to the set
- collect additional pairs matching these patterns, and repeat
There is a risk that incorrect patterns can lead to incorrect pairs, so
errors can grow rapidly. Brin reduced this effect by using a functional
relation (book / author) and rejecting patterns which did not produce
functional output (i.e., which yielded more than one author for the same book).
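A minimal, self-contained sketch of this bootstrapping loop (illustrative
only; Brin's actual patterns are more structured, including URL and context
information, and only the best patterns are retained):

    def find_middles(tokens, pair):
        """Token sequences appearing between the two names of a known pair."""
        x, y = pair
        middles = set()
        for i, tok in enumerate(tokens):
            if tok == x:
                for j in range(i + 1, len(tokens)):
                    if tokens[j] == y:
                        middles.add(tuple(tokens[i + 1:j]))
        return middles

    def find_pairs(tokens, middle):
        """(x, y) pairs of single tokens separated by the sequence 'middle'."""
        m = list(middle)
        out = set()
        if not m:                      # ignore degenerate empty patterns
            return out
        for i in range(1, len(tokens) - len(m)):
            if tokens[i:i + len(m)] == m:
                out.add((tokens[i - 1], tokens[i + len(m)]))
        return out

    def bootstrap(seed_pairs, corpus, rounds=3):
        """DIPRE-style bootstrapping: corpus is a list of token lists."""
        pairs, patterns = set(seed_pairs), set()
        for _ in range(rounds):
            for sent in corpus:            # collect candidate patterns
                for pair in pairs:
                    patterns |= find_middles(sent, pair)
            for sent in corpus:            # use patterns to find new pairs
                for pattern in patterns:
                    pairs |= find_pairs(sent, pattern)
        return patterns, pairs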
Sergei Brin. Extracting Patterns and Relations from the World Wide Web.
In Proc. World Wide Web and Databases International Workshop,
pages 172-183. Number 1590 in LNCS, Springer, March 1998.
Some applications of NLP, such as grammar checking, depend only on the syntactic
form of an input. Most applications, however, are dependent on what
a sentence 'means'. For example, for information extraction,
we want to find mentions of a particular type of event, however it is expressed.
For question answering, we need to connect the question to a body of knowledge
which can provide the answer. In fact, as we shall see when we consider
the processing of extended (multi-sentence) texts, we generally must integrate
the sentence with a great deal of 'background knowledge' in order to understand
what it means. How should we express the meaning of an utterance?
Clearly we can express the meaning in natural language. Why is this
not satisfactory for analyzing the meaning? The components (the words)
are ambiguous; even sentences in isolation are ambiguous. Resolving
these ambiguities can be difficult. Even though syntactic analysis has
made some relationships explicit, others are still implicit; for example,
it does not indicate the quantificational structure of a sentence.
Furthermore, the rules for inferring new facts from given facts in natural
language may be very complicated.
Formal Languages for Meaning Representation
In order to analyze and manipulate the meaning of sentences, we will transform
them into a meaning representation language -- a language which
- is unambiguous
- has simple rules of interpretation and inference, and in particular
has a logical structure determined by its form
- is sufficiently expressive to capture the range of natural
language meanings we shall require for our applications
[J&M also mention the characteristic of having a canonical form for each
meaning. This is an ultimate goal rather than an easily achieved criterion.
However, we can see both syntactic analysis and semantic analysis as moving
in this direction -- reducing paraphrase, i.e., reducing the variation
in form for a given meaning.]
These are the properties of the languages of logic. Actual
systems may use different representations, but they are generally equivalent
to the formal language (extensions of predicate calculus) we will use here.
The simplest form is propositional logic, but it is not powerful enough for
our purposes. Predicate logic combines predicates and their arguments;
its formulas are built from
- terms (constants, variables, and functions)
- atomic formulas (predicate + arguments)
- logical connectives (not, and, or, implies)
- quantifiers (forall, exists)
Predicate logic has simple rules of inference, such as modus ponens (from
A and A ==> B, infer B).
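For example, given bird(Tweety) and (forall x) (bird(x) => flies(x)), we can
instantiate x to Tweety and then apply modus ponens to infer flies(Tweety).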
Predicate calculus is intended for representing “eternal truths” (like the
facts of mathematics). We face several problems when we try to use it
for representing events. First, how many arguments does an event have
(consider J&M's eating example, p. 524)? In natural language, the
same type of event may be described with many different sets of arguments
and modifiers (time, location, speed, ...). We can use meaning postulates
to relate these, but that requires many such postulates and may make commitments
we do not intend. Second, we need to individuate events (say that two
events are the same or different; count events).
We can address these problems by reification -- treating events as objects
(J&M p. 527).
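For example, in the style of J&M's eating example, "I ate a turkey sandwich"
can be represented with an explicit event variable:
    (exists e) Eating(e) & Eater(e, Speaker) & Eaten(e, TurkeySandwich)
Modifiers such as time and location then become additional predicates on e,
so no fixed argument count is required, and two descriptions can be recognized
as describing the same event by equating their event variables.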
Other issues of expressiveness
There are many other issues which we may need to address in our meaning representation:
- generalized quantifiers (for ‘some’, ‘most’, …)
- tense and aspect
- modality and belief (need to allow formulas as arguments: “John
believes Fred likes Mary” = believe(John,like(Fred,Mary)) )
- presupposition (“All the men on Mars drink Coca-Cola.”)
- fuzziness (“The milk is warm.”)
Representations for information extraction
Information extraction applications are generally concerned with identifying
specific, individual events and relations between entities. They need
to capture event modifiers such as time and location, but not quantification.
Such applications typically use a frame or slot-filler representation.
For each type of event (or set of event types taking the same arguments)
we define a frame (template), with one slot containing a unique identifier
of the event, and one slot for each possible argument/modifier. Similarly,
a frame is defined for each type of entity. Slots may be filled with
constants or the identifiers of other events or entities.
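For example, an appointment ("start job") event of the sort discussed above
might be represented as follows (the slot names here are illustrative, not a
fixed standard):
    <event e1>
        type:     start-job
        person:   <entity e2>
        position: "vice president"
        company:  <entity e3>
    <entity e2>
        type: person
        name: "Fred Smith"
    <entity e3>
        type: company
        name: "IBM"
Note that the person and company slots are filled with the identifiers of
entity frames rather than with strings.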
Semantic Analysis: Adjusting the Representation
Mapping Syntax to Semantics (J&M ch. 15)
We want to compute the semantic representation of a sentence from the
parse tree. Because the parse tree provides a structural framework,
we will use a compositional, syntax-driven translation process.
This means that we will associate a (partial) semantic interpretation with
some or all of the nodes of the parse tree, to be computed (using a rule)
from the interpretations of its children.
We could embed this translation in a procedure associated with each type
of node. Alternatively, one can formalize this by a set of rules associated
with the productions of the grammar (J&M sec. 15.1). The grammar
will be extended to add a SEM feature, representing the semantic interpretation
of a node. Each production will then incorporate the rule for computing
its SEM value, and the SEM of the root will be the interpretation of the
sentence. (J&M p. 549).
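For example, a (simplified) rule in this style might be
    S -> NP VP     { S.sem = VP.sem(NP.sem) }
where the SEM of the VP is a function which is applied to the SEM of the
subject NP to yield the interpretation of the sentence.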
The semantics of a verb phrase is essentially the semantics of a clause,
with one argument (the subject) missing … a predicate with one unbound argument.
We can represent this by a lambda expression (p. 551). Lambda expressions
are commonly used to capture the rules for composing the semantics.
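For example, the VP "serves meat" can be interpreted as the lambda expression
    (lambda x) Serves(x, Meat)
Applying this expression to the interpretation of the subject (say, AyCaramba)
yields Serves(AyCaramba, Meat).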
For the process of translating syntactic to semantic forms, it is convenient
to introduce restricted quantifiers, of the form
    (forall x: C(x))
    (exists x: C(x))
These do not add any power to predicate calculus; they can be rewritten:
    (forall x: C(x)) P(x) = (forall x) (C(x) => P(x))
    (exists x: C(x)) P(x) = (exists x) (C(x) & P(x))
Roughly speaking, a noun phrase can be translated to a constant or a restricted
quantifier expression.
One source of ambiguity is quantifier scope:
A woman gives birth in the United States every five minutes.
We can represent the two readings in conventional predicate calculus using
different quantifier scopes. If we explicitly represent all the semantic
ambiguities in a sentence in this way, we may have very many readings.
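For example, the two readings of the sentence above differ only in quantifier
scope (the predicate names are illustrative):
    (forall t: FiveMinuteInterval(t)) (exists w: Woman(w)) GiveBirth(w, t)
    (exists w: Woman(w)) (forall t: FiveMinuteInterval(t)) GiveBirth(w, t)
The first (intended) reading allows a different woman in each interval; the
second asserts that one particular woman gives birth every five minutes.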
It is therefore practical to initially produce (from the parse) a representation
which captures multiple readings … which encodes (some of) the ambiguity.
(And hope that this ambiguity can be resolved at a later stage of semantic
analysis.)
In particular, we can use complex terms (J&M p. 555)
<Quantifier x: C(x)>
with the understanding that
P(<Quantifier x: C(x)>) = (Quantifier x: C(x)) P(x)
If an expression contains several complex terms, the scope of the quantifiers
is indeterminate. Semantic analysis will generate such quasi-logical
forms, with a separate step then determining the quantifier scope and
generating a predicate calculus expression.
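For example, "every man likes a woman" (an illustrative sentence) could
initially be represented by the quasi-logical form
    Likes(<forall x: Man(x)>, <exists y: Woman(y)>)
which the scoping step can then expand in either of two ways:
    (forall x: Man(x)) (exists y: Woman(y)) Likes(x, y)
    (exists y: Woman(y)) (forall x: Man(x)) Likes(x, y)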