CSCI-GA.2590 - Natural Language Processing - Spring 2013 Prof. Grishman

Lecture 7 Outline

March 12, 2013

Begin discussing term projects.

Conditional Random Fields

Maximum entropy Markov models (MEMMs) have proven effective in building models for a number of NLP sequential tagging tasks.  You will be building one such application for today's assignment.  However, they suffer from a problem called 'label bias'.  The impact of label bias depends on the structure of the network; it is particularly evident if there are states with only one outgoing arc.
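To make label bias concrete, here is a small illustrative sketch (the states and weights are invented, not taken from any real tagger).  Because an MEMM normalizes the scores of each state's outgoing arcs separately, a state with a single outgoing arc assigns that arc probability 1 no matter what the observation says:

```python
import math

# Toy illustration of label bias (hypothetical states and weights).
# An MEMM normalizes each state's outgoing transition scores separately.

def memm_transition_probs(scores):
    """Locally normalize a dict of {next_state: score} with a softmax."""
    z = sum(math.exp(s) for s in scores.values())
    return {state: math.exp(s) / z for state, s in scores.items()}

# State "A" has two outgoing arcs, so the observation can influence the choice.
print(memm_transition_probs({"X": 2.0, "Y": 0.5}))

# State "B" has a single outgoing arc: however poorly the observation scores
# it, local normalization forces its probability to 1.0 -- label bias.
print(memm_transition_probs({"X": -5.0}))   # -> {'X': 1.0}
```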

This problem can be avoided by using conditional random fields (CRFs) (Lafferty, McCallum, and Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data). While MEMMs treat the transitions out of each single state as a separate problem (with a normalized log-linear model), the CRF creates a single normalized log-linear model for predicting tag sequences. Such a model can take longer to train but can produce better results.
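The difference can be sketched as follows (with invented tags and feature weights, and brute-force enumeration over sequences rather than the dynamic-programming computation a real CRF implementation would use).  The probability of a tag sequence under a linear-chain CRF involves a single normalization over all possible tag sequences:

```python
import itertools
import math

# Minimal brute-force linear-chain CRF sketch (hypothetical tags and weights).
# Unlike an MEMM, there is one global normalization over ALL tag sequences.

TAGS = ["O", "NAME"]

def seq_score(words, tags, emit_w, trans_w):
    """Unnormalized log-linear score of one complete tag sequence."""
    s = sum(emit_w.get((w, t), 0.0) for w, t in zip(words, tags))
    s += sum(trans_w.get((a, b), 0.0) for a, b in zip(tags, tags[1:]))
    return s

def crf_prob(words, tags, emit_w, trans_w):
    """P(tags | words): one softmax over every possible tag sequence."""
    num = math.exp(seq_score(words, tags, emit_w, trans_w))
    z = sum(math.exp(seq_score(words, list(y), emit_w, trans_w))
            for y in itertools.product(TAGS, repeat=len(words)))
    return num / z

emit_w = {("Smith", "NAME"): 2.0, ("the", "O"): 1.0}
trans_w = {("O", "NAME"): 0.5}
p = crf_prob(["the", "Smith"], ["O", "NAME"], emit_w, trans_w)
print(round(p, 3))
```

The brute-force partition function here is exponential in sentence length; real CRF training and decoding replace it with forward-backward and Viterbi computations.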

Sentence Level Patterns and Semantic Grammars

(Section 15.5 in J&M First Edition;  more briefly in section 24.2.2 of J&M Second Edition)

We can couple together the annotators we have considered so far ... a POS tagger, a name annotator, and a chunk annotator ... to give us information on the low-level constituents of a sentence, quickly and with reasonable accuracy.  Can we keep going now and write patterns for larger constituents, such as noun phrases and sentences?

Unfortunately, for larger constituents the problem of ambiguity becomes much greater, as we have discussed before.  Syntactic patterns are not sufficient to produce accurate, unambiguous analyses.  We need to include semantic constraints, and maybe more.

Capturing semantic constraints in general is a difficult problem.  However, if we focus on a narrow domain, such as weather reports, car repair reports, (specific types of) medical reports, or some types of financial reports, the problem gets easier.  For such a sublanguage, we can identify semantic word classes:  form classes of nouns (the types of 'entities' in the domain) and classify verbs in terms of the types of nouns they take as arguments.  The semantic co-occurrence constraints (selectional constraints) can then be captured either as a separate set of constraints applied on top of a syntactic analysis, or by building the semantic word classes directly into the grammar.

We will initially (for our Jet implementation) use semantic grammar patterns.  Semantic grammars replace general syntactic categories such as noun and verb with domain-specific categories such as company and position.

Capturing Semantic Constraints in Jet Patterns

How do we capture the constraints in a domain?  Let's consider the executive succession domain ... keeping track of people who were hired for or who left executive jobs.  In general, articles which contain information about executive succession also talk about other stuff, but we will only be concerned for the moment with references to executive succession.  Other information in the article will be ignored.

We are going to look for patterns like

company "appointed" person "as" position
company "named" person "as" position
company "selected" person "as" position

The first problem we face in trying to make these patterns a bit more general is that we may have different inflected forms of each verb.  A headline might have a present tense, for example

WorldCom appoints Fred Smith as vice president for lunar phone service
The directors of WorldCom appoint Fred Smith ...

so maybe we need a pattern like

company ("appointed" | "appoint" | "appoints") person "as" position

That's not very convenient;  we'd like to express the pattern in terms of the base form of the verb.  Fortunately, the Jet English lexicon assigns a feature structure to every inflected form of the verb, including a pa feature of the form [head = base-form], so we can write this more succinctly as

company [constit cat=tv pa=[head=appoint]] person "as" position

[Note:  this requires that one use both the Jet lexicon and the statistical part-of-speech tagger;  in this case, the tagger is used to filter the entries provided by the lexicon, using the Jet command pruneTags.]
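The effect of such a constraint can be sketched in Python (the lexicon entries and matching function below are hypothetical illustrations, not Jet's actual code).  A single pa=[head=appoint] constraint subsumes every inflected form:

```python
# Hypothetical sketch of how a pa=[head=...] constraint abstracts over
# inflection.  The lexicon entries and matcher are invented for illustration.

LEXICON = {
    "appoint":   {"cat": "tv", "pa": {"head": "appoint"}},
    "appoints":  {"cat": "tv", "pa": {"head": "appoint"}},
    "appointed": {"cat": "tv", "pa": {"head": "appoint"}},
}

def _subsumes(constraint, fs):
    """True if the feature structure fs satisfies every feature in the
    constraint, recursing into nested feature structures such as pa."""
    for key, want in constraint.items():
        got = fs.get(key)
        if isinstance(want, dict):
            if not isinstance(got, dict) or not _subsumes(want, got):
                return False
        elif got != want:
            return False
    return True

def matches(token, constraint):
    """True if the token's lexical entry satisfies the constraint."""
    return _subsumes(constraint, LEXICON.get(token, {}))

# One constraint now covers all three inflected forms:
pattern = {"cat": "tv", "pa": {"head": "appoint"}}
print(all(matches(w, pattern) for w in ["appoint", "appoints", "appointed"]))
```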

This still doesn't address the problem of verb groups which include "appoint" ...

Enron has appointed Fred Smith as treasurer for the day.
Enron will appoint Fred Smith as comptroller.

We could write a verb group pattern for each verb, in which we constrain the head of the verb group

vg-appoint := [constit cat=tv pa=[head=appoint]] | [constit cat=w] vg-inf-appoint | tv-vbe vg-ving-appoint;
vg-inf-appoint := [constit cat=v pa=[head=appoint]] | "be" vg-ving-appoint;
vg-ving-appoint := [constit cat=ving pa=[head=appoint]];
when vg-appoint add [constit cat=vgroup-appoint];

and then create a unique verb group category, but that's clearly inefficient.  Instead we create a general verb group constituent which has a pa property  equal to the pa of the head of the phrase,  by writing a general verb group pattern which propagates the information from the head to the phrase.  This can be done in the Jet pattern language using a variable (a symbol beginning with a capital letter) for a feature:

vg := [constit cat=tv pa=PA-verb] | [constit cat=w] vg-inf | tv-vbe vg-ving;
vg-inf := [constit cat=v pa=PA-verb] | "be" vg-ving;
vg-ving := [constit cat=ving pa=PA-verb];
when vg add [constit cat=vgroup pa=PA-verb];
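The propagation performed by the PA-verb variable can be sketched as follows (a hypothetical illustration, not Jet itself; the auxiliary list and lexicon entries are invented, and the real patterns handle more verb group shapes):

```python
# Illustrative sketch of propagating the head verb's pa feature up to the
# verb group constituent, as the PA-verb variable does in the pattern above.

AUXILIARIES = {"has", "have", "had", "will", "would", "is", "was", "be"}

LEXICON = {
    "appointed": {"pa": {"head": "appoint"}},
    "appoint":   {"pa": {"head": "appoint"}},
    "named":     {"pa": {"head": "name"}},
}

def verb_group(tokens):
    """Consume leading auxiliaries, then build a vgroup constituent whose
    pa feature is copied from the head verb's lexical entry."""
    i = 0
    while i < len(tokens) and tokens[i] in AUXILIARIES:
        i += 1
    head = tokens[i]
    return {"cat": "vgroup",
            "pa": LEXICON[head]["pa"],      # propagated from the head
            "span": tokens[:i + 1]}

print(verb_group(["has", "appointed"]))   # pa comes from "appointed"
print(verb_group(["will", "appoint"])["pa"])
```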

We can take an exactly parallel approach for noun groups.  The Jet lexicon assigns to each form of the noun (singular and plural) a feature pa = [head = base-form-of-noun  number = singular-or-plural], and we can propagate this information in the same way from the head of the noun group to be a feature on the noun group itself.  With rare exceptions, selectional constraints act between the heads of the noun and verb groups.

For verb groups, we still have to write

[constit cat=vgroup pa=[head=appoint]] | [constit cat=vgroup pa=[head=name]] | etc.

in order to capture the alternative (synonymous) verbs for hiring someone.  To make this neater, Jet provides a separate component -- a semantic concept hierarchy or ontology -- for grouping together related words.  The concept hierarchy allows us to create a tree of concepts, and to associate one or more words with a concept.  We associate the verbs similar to 'appoint' with a concept node cAppoint in the hierarchy, and then write

[constit cat=vgroup pa=[head?isa(cAppoint)]]

This matches any word associated with the cAppoint node, or with a node below cAppoint in the hierarchy.
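An isa test against such a hierarchy can be sketched like this (the node names, parent links, and word-to-concept assignments are invented for illustration; Jet's ontology format differs):

```python
# Hypothetical concept hierarchy sketch.  A word satisfies isa(word, c) if
# its concept node is c or lies below c in the tree.

PARENT = {"cAppoint": "cPersonnelAction", "cDismiss": "cPersonnelAction"}
WORD_CONCEPT = {"appoint": "cAppoint", "name": "cAppoint",
                "select": "cAppoint", "fire": "cDismiss"}

def isa(word, concept):
    """Walk from the word's concept node up through its ancestors."""
    node = WORD_CONCEPT.get(word)
    while node is not None:
        if node == concept:
            return True
        node = PARENT.get(node)
    return False

print(isa("name", "cAppoint"))          # grouped under cAppoint
print(isa("fire", "cAppoint"))          # under a different node
print(isa("fire", "cPersonnelAction"))  # matches an ancestor concept
```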

With this, we are ready to put together patterns for finding instances of appointment events.  We will have very modest goals for this example, looking only for person - position pairs (we will consider how to capture the organization name in a later version).  There are three patterns we look for.  In this version, very little allowance is made for modifiers which may intervene in the pattern (other than modifiers in noun groups);  the only modifier allowed is an age after a name:  "Fred Smith, 42, ".  Also, this version does not impose constraints on the classes of the noun groups, though we certainly could;  this gives us some additional recall but at some loss of precision.

Discovering Patterns:  Semi-supervised methods (J&M 22.2.2)

Developing a set of patterns for a given type of relation or event can be a laborious process which requires reading a large number of articles about corporate appointments and keeping track of how this information is expressed.  Fortunately, this process can be at least partly automated through a bootstrapping procedure:
  1. select one expression of the relation, such as "was named"
  2. search a corpus for pairs involved in this relation:  person1 was named post1, person2 was named post2, ...
  3. look for other expressions connecting these same pairs
  4. the best candidate expressions will be those which appear with more than one person/post pair
Some of the resulting patterns are much too general and may have to be pruned, but it is much easier to review a list of candidates than to think up new candidates.
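The four steps above can be sketched on a toy corpus (the corpus triples and seed expression are invented for illustration; a real system would extract the pairs and connecting contexts from running text):

```python
from collections import defaultdict

# Toy bootstrapping sketch: each corpus entry is (person, connecting
# expression, post/other argument).  All data here is invented.
CORPUS = [
    ("Smith", "was named", "president"),
    ("Jones", "was named", "treasurer"),
    ("Smith", "was appointed", "president"),
    ("Jones", "was appointed", "treasurer"),
    ("Smith", "sued", "Acme"),
]

# Steps 1-2: use the seed expression "was named" to collect person/post pairs.
seed_pairs = {(p, q) for p, ctx, q in CORPUS if ctx == "was named"}

# Step 3: find other expressions connecting those same pairs.
candidates = defaultdict(set)
for p, ctx, q in CORPUS:
    if ctx != "was named" and (p, q) in seed_pairs:
        candidates[ctx].add((p, q))

# Step 4: rank candidates by how many distinct pairs they connect;
# "sued" drops out because its pair never occurred with the seed.
for ctx, pairs in sorted(candidates.items(), key=lambda kv: -len(kv[1])):
    print(ctx, len(pairs))
```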