CSCI-GA.2590 - Natural Language Processing - Spring 2013 Prof. Grishman
Lecture 7 Outline
March 12, 2013
Begin discussing term projects.
Conditional Random Fields
Maximum entropy Markov models (MEMMs) have proven effective in building models
for a number of NLP sequential tagging tasks. You will be building one
such application for today's assignment. However, they suffer
from a problem called 'label bias'.
The impact of label bias depends
on the structure of the network; it is particularly evident if there
are states with only one outgoing arc.
This problem can be avoided by using conditional random fields
(CRFs) (Lafferty, McCallum, and Pereira, Conditional Random Fields:
Probabilistic Models for Segmenting and Labeling Sequence Data).
While MEMMs treat the transitions out of each single state
as a separate problem (with a normalized log-linear model), the CRF
creates a single normalized log-linear model for predicting
tag sequences. Such a model can take longer to train but can
produce better results.
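The key difference can be illustrated with a toy computation (this is an illustrative sketch, not Jet code or an actual tagger; the tags and scores are made-up example values). An MEMM normalizes the transition scores out of each state separately, while a CRF applies a single normalization over entire tag sequences:

```python
import math
from itertools import product

# Toy illustration: the same log-linear transition scores, normalized
# locally per state (MEMM-style) vs. globally over whole tag
# sequences (CRF-style).  Tags and score values are arbitrary.
TAGS = ["N", "V"]
score = {  # score[(prev_tag, tag)] = unnormalized log-linear score
    ("<s>", "N"): 2.0, ("<s>", "V"): 1.0,
    ("N", "N"): 0.5, ("N", "V"): 1.5,
    ("V", "N"): 1.0, ("V", "V"): 1.0,
}

def memm_prob(seq):
    """Product of locally normalized transition probabilities."""
    p, prev = 1.0, "<s>"
    for tag in seq:
        z = sum(math.exp(score[(prev, t)]) for t in TAGS)  # per-state Z
        p *= math.exp(score[(prev, tag)]) / z
        prev = tag
    return p

def crf_prob(seq, length):
    """Globally normalized: one Z over all sequences of this length."""
    def total(s):
        prev, tot = "<s>", 0.0
        for tag in s:
            tot += score[(prev, tag)]
            prev = tag
        return tot
    z = sum(math.exp(total(s)) for s in product(TAGS, repeat=length))
    return math.exp(total(seq)) / z

seqs = list(product(TAGS, repeat=2))
assert abs(sum(memm_prob(s) for s in seqs) - 1.0) < 1e-9
assert abs(sum(crf_prob(s, 2) for s in seqs) - 1.0) < 1e-9
```

Both define valid distributions over tag sequences, but because the per-state normalizers differ from state to state, the two models can assign different probabilities to the same sequence; the global normalization is what lets the CRF avoid label bias.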
Sentence Level Patterns and Semantic Grammars
(Section 15.5 in J&M First Edition;
more briefly in section 24.2.2 of J&M Second Edition)
We can couple together the annotators we have considered so far ... a
POS tagger, a name annotator, and a chunk annotator ... to give us
information on the low-level constituents of a sentence, quickly and
with reasonable accuracy. Can we keep going now and write
patterns for larger constituents, such as noun phrases and sentences?
Unfortunately, for larger constituents the problem of ambiguity becomes
much greater, as we have discussed before. Syntactic patterns are
not sufficient to produce accurate, unambiguous analyses. We need
to include semantic constraints, and maybe more.
Capturing semantic constraints in general is a difficult problem.
However, if we focus on a narrow domain, such as weather reports, car
repair reports, (specific types of ) medical reports, or some types of
financial reports, the problem gets easier. For such a
sublanguage, we can identify semantic
classes: form classes of nouns (the types of
'entities' in the domain) and classify verbs in terms of the types of
nouns they take as arguments. The semantic co-occurrence constraints
(selectional constraints) can then be captured either
- by a separate grammar component which checks these constraints
whenever the parser completes a clause constituent, or
- by a context-free grammar stated in terms of these semantic word
classes -- a semantic grammar
We will initially (for our Jet implementation) use semantic grammar
patterns. Semantic grammars
- provide a simple approach to limited sublanguages
(because they capture both syntactic and semantic constraints in a
single set of patterns)
- in particular, they are convenient for constructs which fall outside
general language syntax (constructs which appear only within specific
sublanguages)
- but they lose the power of syntactic generalization … each semantic
pattern must appear in each of its syntactic forms (active, passive, question,
…), and so semantic grammars are cumbersome for broad-coverage systems.
Capturing Semantic Constraints
How do we capture the
constraints in a domain? Let's consider the executive succession
domain ... keeping track of people who were hired for or who left
executive jobs. In general, articles which contain information
about executive succession also talk about other stuff, but we will
only be concerned for the moment with references to executive
succession. Other information in the article will be ignored.
We are going to look for patterns like
"appointed" person "as" position
company "named" person
company "selected" person "as" position
The first problem we face in trying to make these patterns a bit more
general is that we may have different inflected forms of each
verb. A headline might have a present tense, for example
WorldCom appoints Fred Smith as vice
president for lunar phone service
The directors of WorldCom appoint Fred ...
so maybe we need a pattern like
("appoint" | "appoints") person "as" position
That's not very convenient; we'd like to express the pattern in
terms of the base form of the verb. Fortunately, the Jet
lexicon assigns a feature structure to every inflected form of the
verb, including a pa feature of the form [head = base-form], so we
can write the pattern more simply in terms of that base form.
[Note: this requires that one use both the Jet lexicon and the
statistical part-of-speech tagger; in this case, the tagger is
used to filter the entries
provided by the lexicon, using the Jet command pruneTags.]
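To see what the pa=[head = base-form] feature buys us, here is a toy sketch (hypothetical Python, not Jet; the miniature LEXICON and the helper names are invented for illustration) of matching a pattern stated once in terms of the base form against any inflected form:

```python
# Hypothetical miniature of the idea behind pa=[head = base-form]:
# the lexicon maps every inflected form to a feature structure holding
# the base form, so a pattern can be stated once per verb.
LEXICON = {
    "appoint":   {"head": "appoint"},
    "appoints":  {"head": "appoint"},
    "appointed": {"head": "appoint"},
    "names":     {"head": "name"},
}

def head_of(token):
    """Return the base form (pa.head) of a token, if known."""
    entry = LEXICON.get(token.lower())
    return entry["head"] if entry else None

def matches_appoint_pattern(tokens):
    """Check the pattern:  <head=appoint> person "as" position
    (person/position stand-ins: any single token here)."""
    return (len(tokens) == 4
            and head_of(tokens[0]) == "appoint"
            and tokens[2] == "as")

assert matches_appoint_pattern(["appoints", "Smith", "as", "president"])
assert matches_appoint_pattern(["appointed", "Smith", "as", "treasurer"])
assert not matches_appoint_pattern(["names", "Smith", "as", "president"])
```

One pattern now covers "appoint", "appoints", and "appointed" without an explicit disjunction of surface forms.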
This still doesn't address the problem of verb groups which include "appoint":
Enron has appointed Fred Smith as treasurer for the day.
Enron will appoint Fred Smith as comptroller.
We could write a verb group pattern for each verb, in which we
constrain the head of the verb group:
vg-appoint := [constit cat=tv pa=[head=appoint]] |
    [constit cat=w] vg-inf-appoint |
    tv-vbe vg-ving-appoint;
vg-inf-appoint := [constit cat=v pa=[head=appoint]] | "be" vg-ving-appoint;
vg-ving-appoint := [constit cat=ving pa=[head=appoint]];
when vg-appoint add [constit cat=vgroup-appoint];
and then create a unique verb group category for each verb, but that's clearly
inefficient. Instead we create a general verb group constituent
which has a pa property equal to the pa of the head of the
phrase, by writing a general verb group pattern which propagates
the information from the head to the phrase. This can be done in
the Jet pattern language using a variable (a symbol beginning with a
capital letter) for a feature:
vg := [constit cat=tv pa=PA-verb] |
    [constit cat=w] vg-inf |
    tv-vbe vg-ving;
vg-inf := [constit cat=v pa=PA-verb] | "be" vg-ving;
vg-ving := [constit cat=ving pa=PA-verb];
when vg add [constit cat=vgroup pa=PA-verb];
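The effect of the PA-verb variable can be mimicked in a few lines of Python (an illustrative sketch with invented names, not how Jet is implemented): when a verb group is recognized, the pa feature of its head verb is bound and copied up to the new constituent.

```python
# Hypothetical sketch of feature propagation: when a verb-group
# pattern matches, the pa feature of the head verb is copied up to
# the new vgroup constituent, so one pattern serves every verb.
def make_vgroup(constituents):
    """constituents: dicts like {"cat": "w"} (auxiliary) or
    {"cat": "v", "pa": {"head": "appoint"}} (head verb).
    Returns a vgroup whose pa is bound to the head's pa."""
    head = next(c for c in reversed(constituents) if "pa" in c)
    return {"cat": "vgroup", "pa": head["pa"]}

vg = make_vgroup([{"cat": "w", "word": "will"},
                  {"cat": "v", "pa": {"head": "appoint"}}])
assert vg == {"cat": "vgroup", "pa": {"head": "appoint"}}
```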
We can take an exactly parallel approach for noun groups. The Jet
lexicon assigns to each form of the noun (singular and plural) a feature
pa = [head = base-form-of-noun, number = singular-or-plural], and
we can propagate this information in the same way from the head of the
noun group to be a feature on the noun group itself. With rare
exceptions, selectional constraints act between the heads of the noun
and verb groups.
For verb groups, we would still have to write an explicit disjunction of
heads (appoint | name | select | ...)
in order to capture the alternative (synonymous) verbs for hiring
someone. To make this neater, Jet provides a separate component
-- a semantic concept
hierarchy or ontology -- for grouping together related words.
The concept hierarchy allows us to create a tree of concepts, and to
associate one or more words with a concept. We associate the
verbs similar to 'appoint' with a concept node cAppoint in the hierarchy and
then write the pattern in terms of cAppoint; such a pattern element
matches any word associated with the cAppoint node, or a node below
cAppoint in the hierarchy.
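A minimal sketch of this lookup (hypothetical Python, not Jet's implementation; the particular tree, the node cHire, and the word lists are invented for illustration, while cAppoint and its verbs come from the discussion above):

```python
# Hypothetical concept hierarchy: a tree of concepts, each with
# associated words.  A pattern element naming a concept matches any
# word attached to that concept or to a concept below it.
PARENT = {"cAppoint": "cEvent", "cHire": "cAppoint"}   # child -> parent
WORDS = {"cAppoint": {"appoint", "name", "select"},
         "cHire": {"hire"}}

def matches_concept(word, concept):
    """True if word is attached to concept or any concept below it."""
    for node, vocab in WORDS.items():
        if word in vocab:
            # walk up from the word's node; succeed if we pass concept
            while node is not None:
                if node == concept:
                    return True
                node = PARENT.get(node)
    return False

assert matches_concept("hire", "cAppoint")    # via descendant cHire
assert matches_concept("select", "cAppoint")
assert not matches_concept("resign", "cAppoint")
```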
With this, we are ready to put together patterns for appointment
events. We will have
very modest goals for this example, and will only look for person -
position pairs (we will consider how to capture the organization name
in a later version). There are three patterns we look for:
- active clauses (where we just look for the VP): ...
- passive clauses: Fred was appointed as dogcatcher
- nominalizations: the appointment of Fred as dogcatcher
In this version, very little allowance is made for modifiers which may
interrupt the pattern (other than modifiers in noun groups); the only
modifier allowed is an age after a name: "Fred Smith, 42, ". Also,
this version does not impose constraints on the semantic classes of the noun
groups, though we certainly could. This will give us some
additional recall but at some loss of precision.
Discovering Patterns: Semi-supervised methods (J&M 22.2.2)
Developing a set of patterns for a given type of relation or event can
be a laborious process which requires reading a large number of
articles about corporate appointments and keeping track of how this
information is expressed. Fortunately, this process can be at
least partly automated through a bootstrapping procedure:
- select one expression of the relation, "was named"
- search a corpus for pairs involved in this relation:
person1 was named post1, person2 was named post2, ...
- look for other expressions connecting these pairs:
person1 X post1
person1 Y post1
person2 X post2
the best candidates (such as X) will appear with more than
one person/post pair
Some of the resulting patterns are much too general and may have
to be pruned, but it is much easier to review a list of candidates than to
think up new candidates.
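The bootstrapping steps above can be sketched in a few lines of Python (a toy model: the mini-corpus, the names, and the triple representation of "person, context, post" are all invented for illustration):

```python
from collections import defaultdict

# Toy sketch of the bootstrapping procedure: start from one seed
# expression, collect the pairs it connects, then find other
# contexts that connect more than one of those pairs.
seed_pattern = "was named"
corpus = [                       # (person, connecting context, post)
    ("Smith", "was named", "president"),
    ("Jones", "was named", "treasurer"),
    ("Smith", "was appointed", "president"),
    ("Jones", "was appointed", "treasurer"),
    ("Smith", "admired", "president"),   # seen with only one pair
]

# 1. collect person/post pairs produced by the seed pattern
pairs = {(p, post) for p, ctx, post in corpus if ctx == seed_pattern}

# 2. find other contexts connecting those same pairs
context_pairs = defaultdict(set)
for person, ctx, post in corpus:
    if (person, post) in pairs and ctx != seed_pattern:
        context_pairs[ctx].add((person, post))

# 3. keep candidates that appear with more than one person/post pair
candidates = [ctx for ctx, ps in context_pairs.items() if len(ps) > 1]
assert candidates == ["was appointed"]
```

Here "was appointed" survives because it connects two distinct person/post pairs, while "admired" is (correctly) rejected as appearing with only one; in practice the surviving candidates would still be reviewed by hand before being added to the pattern set.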