G22.2590 - Natural Language Processing - Spring 2005 Prof. Grishman

Lecture 8 Outline

March 28, 2005

Finish introduction to semantic constraints and semantic grammar (Lecture 7).

Capturing Semantic Constraints in Jet Patterns

How do we capture the constraints in a domain?  Let's consider the executive succession domain ... keeping track of people who were hired for or who left executive jobs.  In general, articles which contain information about executive succession also talk about other stuff, but we will only be concerned for the moment with references to executive succession.  Other information in the article will be ignored.

We are going to look for patterns like

company "appointed" person "as" position
company "named" person "as" position
company "selected" person "as" position

The first problem we face in trying to make these patterns a bit more general is that we may have different inflected forms of each verb.  A headline might have a present tense, for example

WorldCom appoints Fred Smith as vice president for lunar phone service
The directors of WorldCom appoint Fred Smith ...

so maybe we need a pattern like

company ("appointed" | "appoint" | "appoints") person "as" position

That's not very convenient;  we'd like to express the pattern in terms of the base form of the verb.  Fortunately, the Jet English lexicon assigns a feature structure to every inflected form of the verb, including a pa feature of the form [head = base-form], so we can write this more succinctly as

company [constit cat=tv pa=[head=appoint]] person "as" position

[Note:  this requires that one use both the Jet lexicon and the statistical part-of-speech tagger;  in this case, the tagger is used to filter the entries provided by the lexicon, using the Jet command pruneTags.]
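To illustrate why matching on the base form is convenient, here is a minimal Python sketch. It is not Jet code and does not use the Jet lexicon: the LEXICON table, its entries, and the functions matches and find_verb are all hypothetical stand-ins for a lexicon that gives every inflected form a pa feature with the same head.

```python
# Toy lexicon (an assumption for illustration, not Jet's actual lexicon).
# Each word form maps to one or more feature structures; every inflected
# form of a verb carries the same pa head, mirroring pa=[head=base-form].
LEXICON = {
    "appoint":   [{"cat": "v",  "pa": {"head": "appoint"}},
                  {"cat": "tv", "pa": {"head": "appoint"}}],
    "appoints":  [{"cat": "tv", "pa": {"head": "appoint"}}],
    "appointed": [{"cat": "tv", "pa": {"head": "appoint"}}],
    "named":     [{"cat": "tv", "pa": {"head": "name"}}],
}

def matches(token, cat, head):
    """True if some lexical entry for the token has this category and base form."""
    for entry in LEXICON.get(token.lower(), []):
        if entry["cat"] == cat and entry["pa"]["head"] == head:
            return True
    return False

def find_verb(tokens, cat, head):
    """Return the index of the first token matching cat and head, or -1."""
    for i, token in enumerate(tokens):
        if matches(token, cat, head):
            return i
    return -1

headline = "WorldCom appoints Fred Smith as vice president".split()
position = find_verb(headline, "tv", "appoint")
```

A single test on the head replaces the explicit list of inflected forms. In this sketch a word form may carry several entries; in Jet, as the note above says, the tagger is used to filter them.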

This still doesn't address the problem of verb groups which include "appoint" ...

Enron has appointed Fred Smith as treasurer for the day.
Enron will appoint Fred Smith as comptroller.

We could write a verb group pattern for each verb, in which we constrain the head of the verb group

vg-appoint := [constit cat=tv pa=[head=appoint]] | [constit cat=w] vg-inf-appoint | tv-vbe vg-ving-appoint;
vg-inf-appoint := [constit cat=v pa=[head=appoint]] | "be" vg-ving-appoint;
vg-ving-appoint := [constit cat=ving pa=[head=appoint]];
when vg-appoint add [constit cat=vgroup-appoint];

and then create a unique verb group category for each verb, but that's clearly inefficient.  Instead we create a general verb group constituent which has a pa property equal to the pa of the head of the phrase, by writing a general verb group pattern which propagates the information from the head to the phrase.  This can be done in the Jet pattern language by using a variable (a symbol beginning with a capital letter) as the value of a feature:

vg := [constit cat=tv pa=PA-verb] | [constit cat=w] vg-inf | tv-vbe vg-ving;
vg-inf := [constit cat=v pa=PA-verb] | "be" vg-ving;
vg-ving := [constit cat=ving pa=PA-verb];
when vg add [constit cat=vgroup pa=PA-verb];
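The propagation of the head's pa feature to the verb group can be sketched in Python. This is only an illustration of the idea, not how Jet implements patterns. It assumes tokens are already tagged, treats tv-vbe as if it were a category on the token, and takes cat=w to mark a modal such as "will"; the helper names (tok, match_vg, add_vgroup, and so on) are hypothetical.

```python
def tok(word, cat, head=None):
    """Build a toy constituent; pa is None for words with no head feature."""
    return {"word": word, "cat": cat, "pa": {"head": head} if head else None}

def match_vg_ving(toks, i):
    # vg-ving := [constit cat=ving pa=PA-verb]
    if i < len(toks) and toks[i]["cat"] == "ving":
        return i + 1, toks[i]["pa"]
    return None

def match_vg_inf(toks, i):
    # vg-inf := [constit cat=v pa=PA-verb] | "be" vg-ving
    if i >= len(toks):
        return None
    if toks[i]["word"] == "be":
        # Try the longer alternative first so "be appointing" yields "appoint".
        found = match_vg_ving(toks, i + 1)
        if found:
            return found
    if toks[i]["cat"] == "v":
        return i + 1, toks[i]["pa"]
    return None

def match_vg(toks, i):
    # vg := [constit cat=tv pa=PA-verb] | [constit cat=w] vg-inf | tv-vbe vg-ving
    if i >= len(toks):
        return None
    cat = toks[i]["cat"]
    if cat == "tv":
        return i + 1, toks[i]["pa"]
    if cat == "w":
        return match_vg_inf(toks, i + 1)
    if cat == "tv-vbe":
        return match_vg_ving(toks, i + 1)
    return None

def add_vgroup(toks, i):
    # when vg add [constit cat=vgroup pa=PA-verb]
    found = match_vg(toks, i)
    if found is None:
        return None
    end, pa = found
    # The pa bound at the head verb becomes the pa of the whole group.
    return {"cat": "vgroup", "pa": pa, "start": i, "end": end}

sentence = [tok("Enron", "name"), tok("will", "w"),
            tok("appoint", "v", "appoint"), tok("Fred", "name")]
group = add_vgroup(sentence, 1)
```

Each matcher returns the pa of the head verb it found, so a later pattern can test the group's pa without knowing which auxiliaries preceded the head.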

We can take an exactly parallel approach for noun groups.  The Jet lexicon assigns to each form of the noun (singular and plural) a feature pa = [head = base-form-of-noun  number = singular or plural], and we can propagate this information in the same way from the head of the noun group to be a feature on the noun group itself.  With rare exceptions, selectional constraints act between the heads of the noun and verb groups.

For verb groups, we still have to write

[constit cat=vgroup pa=[head=appoint]] | [constit cat=vgroup pa=[head=name]] | etc.

in order to capture the alternative (synonymous) verbs for hiring someone.  To make this neater, Jet provides a separate component -- a semantic concept hierarchy or ontology -- for grouping together related words.  The concept hierarchy allows us to create a tree of concepts, and to associate one or more words with a concept.  We associate the verbs similar to 'appoint' with a concept node cAppoint in the hierarchy, and then write

[constit cat=vgroup pa=[head?isa(cAppoint)]]

This matches any word associated with the cAppoint node, or with a node below cAppoint in the hierarchy.
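The isa test amounts to walking up the tree from a word's concept until the target concept is found or the root is passed. The Python sketch below illustrates that lookup; it is not Jet's implementation. The ConceptHierarchy class, the cEvent, cPromote and cResign nodes, and the word assignments are invented for the example; only cAppoint comes from the text above. For simplicity each word is attached to a single concept.

```python
class ConceptHierarchy:
    """A toy tree of concepts with words attached to concept nodes."""

    def __init__(self):
        self.parent = {}      # concept -> parent concept (None at the root)
        self.concept_of = {}  # word -> the concept it is associated with

    def add_concept(self, concept, parent=None):
        self.parent[concept] = parent

    def add_word(self, word, concept):
        self.concept_of[word] = concept

    def isa(self, word, concept):
        """True if word is attached to concept or to any node below it."""
        node = self.concept_of.get(word)
        while node is not None:
            if node == concept:
                return True
            node = self.parent.get(node)  # step up toward the root
        return False

# Hypothetical hierarchy: cPromote sits below cAppoint; cResign is a sibling.
hierarchy = ConceptHierarchy()
hierarchy.add_concept("cEvent")
hierarchy.add_concept("cAppoint", parent="cEvent")
hierarchy.add_concept("cPromote", parent="cAppoint")
hierarchy.add_concept("cResign", parent="cEvent")
for verb in ("appoint", "name", "select"):
    hierarchy.add_word(verb, "cAppoint")
hierarchy.add_word("promote", "cPromote")
hierarchy.add_word("resign", "cResign")
```

With this structure, adding a new hiring verb means attaching one word to cAppoint rather than editing every pattern that mentions the verb class.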

With this, we are ready to put together patterns for finding instances of appointment events.  We will have very modest goals for this example, and will only look for person - position pairs (we will consider how to capture the organization name in a later version).  There are three patterns we look for.

In this version, very little allowance is made for modifiers which may intervene in the pattern (other than modifiers in noun groups);  the only modifier allowed is an age after a name:  "Fred Smith, 42, ".  Also, this version does not impose constraints on the classes of the noun groups, though we certainly could.  This will give us some additional recall but at some loss of precision.

Lexical Semantic Resources (J&M 16.1-16.2)

Lexical semantics studies the meaning relations between words ... words which are synonyms, antonyms, or hyponyms (where one word denotes a subclass of another).  We have seen the value of such relations for building extraction systems.

Note that many words are polysemous (have multiple senses), and these lexical relations are in general between senses, not words.  This means that one must identify which sense of a word is being used (word sense disambiguation - J&M 17.1-17.2) before using lexical semantic relations.
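A small Python sketch shows why the relations have to be stored per sense. The data is a toy: the sense labels such as bank#1 and the links between them are made up for illustration and are not WordNet's actual sense inventory or numbering.

```python
# Toy data (assumed for illustration): a word maps to several senses,
# and hypernym links connect senses rather than words.
SENSES = {"bank": ["bank#1", "bank#2"]}
HYPERNYM = {
    "bank#1": "financial_institution#1",  # the bank that holds money
    "bank#2": "slope#1",                  # the bank of a river
    "financial_institution#1": "institution#1",
}

def hypernyms(sense):
    """Follow hypernym links upward from one sense and return the chain."""
    chain = []
    while sense in HYPERNYM:
        sense = HYPERNYM[sense]
        chain.append(sense)
    return chain

# The word alone is ambiguous: each of its senses leads to a different chain.
chains = {sense: hypernyms(sense) for sense in SENSES["bank"]}
```

Looking up "bank" as a word gives two unrelated chains, so the sense has to be chosen before the hierarchy can be used.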

WordNet is the most widely used taxonomy of English (J&M sec. 16.2);  similar taxonomies have been produced for many other languages (see the Global WordNet Association).