G22.2590 - Natural Language Processing - Spring 2005 Prof. Grishman

Lecture 5 Outline

February 28, 2005

Review homework #2
    drawing parse trees ... having the structure reflect the modification relations
    capturing constraints for pronouns
        case constraint;  number constraint
        capturing these constraints in rules
        tensed vs. untensed (infinitive form) verbs;  combining them with modals

The JET tagger: 
    handling unknown words
    possible improvements and extensions (trigram model;  morphology)

Feature Grammars

Problems of context-free grammars

Capturing constraints:
Number agreement
Case constraints (only on pronouns in English)
Count noun constraint
… these can be captured by a context-free grammar, but only inefficiently, by multiplying the number of symbols (as we have seen from the recent homework)

Regularization - accounting for ‘displaced constituents’ / movement

Feature structures (J&M 11.1)

Instead of having atomic symbols ('noun', 'NP'), the nodes of the parse tree will have feature structures:  sets of feature-value pairs (or attribute-value pairs). We will represent these in the form [attribute1 = value1, attribute2 = value2, ...].  For example, third-person-singular could be represented as [number = singular, person = 3].

We can include the category as a feature cat in the feature structure:  [cat = NP, number = singular, person = 3].  We can also nest the feature structures, with the value of a feature being another feature structure:  [cat = NP, agreement = [number = singular, person = 3]].

If X is a feature structure, then we will write the value of feature f of X as X.f.  If the feature structure does not specify a value for f, we say X.f = null;  furthermore, null.f = null.

A feature path is a sequence of one or more feature names which is used to select a value from a feature structure:  <f1 f2> applied to X gets (X.f1).f2.  For example, <agreement number> applied to [cat = NP, agreement = [number = singular, person = 3]] yields 'singular'.
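As a concrete sketch (illustrative only, not the Jet representation), feature structures can be modeled as nested Python dictionaries, with a small helper that applies a feature path and returns None for null:

```python
# A minimal sketch of feature structures as nested Python dicts.
# Names and representation are illustrative, not the Jet system's.

def get_path(fs, path):
    """Apply a feature path (a sequence of feature names) to a
    feature structure; return None (null) if any step is missing."""
    for f in path:
        if not isinstance(fs, dict) or f not in fs:
            return None          # null.f = null
        fs = fs[f]
    return fs

np = {'cat': 'NP',
      'agreement': {'number': 'singular', 'person': 3}}

print(get_path(np, ['agreement', 'number']))   # -> singular
print(get_path(np, ['agreement', 'gender']))   # -> None (null)
```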

The nested feature structures can be represented as trees with labeled arcs, and the feature paths as paths in these trees.

We may generalize these structures to graphs, corresponding to reentrant feature structures.  These will prove convenient for capturing constraints between constituents in our grammar (J&M p. 399).


Unification is a binary operation on feature structures.  It can fail (structures are not unifiable -- no result is returned), or it can succeed and return a new feature structure.

The definition of unify(X,Y) is recursive:
    if X = null, return Y
    if Y = null, return X
    if X and Y are the same atomic value, return X
    if X and Y are both feature structures, then
        create a new feature structure Z
        for each feature f in either X or Y
            add f with the value unify(X.f, Y.f) to Z
                (note that if X.f and Y.f cannot be unified, the entire process fails)
        return Z
    else fail

Note that this version does not account for reentrant structures.

J&M describe a version which destructively modifies the arguments and accounts for re-entrancy, p. 423.
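The recursive definition above can be transcribed directly into Python (non-reentrant version); this is a sketch only, using dicts for feature structures, None for null, and an exception to signal failure:

```python
# Sketch of the recursive unification definition (non-reentrant).
# Feature structures are dicts, null is None, atomic values are
# strings or numbers.  Illustrative only.

class UnifyFailure(Exception):
    pass

def unify(x, y):
    if x is None:                      # if X = null, return Y
        return y
    if y is None:                      # if Y = null, return X
        return x
    if not isinstance(x, dict) and not isinstance(y, dict):
        if x == y:                     # same atomic value
            return x
        raise UnifyFailure()
    if isinstance(x, dict) and isinstance(y, dict):
        z = {}
        for f in set(x) | set(y):      # each feature in either X or Y
            z[f] = unify(x.get(f), y.get(f))   # failure propagates up
        return z
    raise UnifyFailure()               # atomic vs. structure: fail

a = {'agreement': {'number': 'singular'}}
b = {'agreement': {'person': 3}}
print(unify(a, b))   # agreement now carries both number and person
```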

Expressing Constraints in the Grammar

To a rule
    S := A B C
we can add a constraint of the form
    <node path> = atomic-value
    <node path> = <node' path'>
where node and node' are each one of S, A, B, or C, and path and path' are feature paths.  Here the operation '=' is to be interpreted as unification.

To capture number agreement, we would write
    S := NP VP
        <NP number> = <VP number>
If we wanted to check agreement in both number and person, we could write
    S := NP VP
        <NP number> = <VP number>
        <NP person> = <VP person>
or we could group both number and person under an 'agreement' feature, as shown above, and write
    S := NP VP
        <NP agreement> = <VP agreement>

Similarly, we could check determiner-head agreement by
    NP := DET N
        <DET agreement> = <N agreement>
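The effect of such a constraint can be sketched with a small unification check on flat agreement features (dict-based, illustrative only; the lexical entries here are hypothetical):

```python
# Sketch: enforcing <DET agreement> = <N agreement> for the rule
# NP := DET N, with flat (non-nested) agreement features.

def agree(x, y):
    """Unify two flat feature dicts; return None on failure."""
    z = dict(x)
    for f, v in y.items():
        if f in z and z[f] != v:
            return None        # value clash: unification fails
        z[f] = v
    return z

det_this  = {'number': 'singular'}   # agreement features of "this"
det_these = {'number': 'plural'}     # agreement features of "these"
n_mouse   = {'number': 'singular'}   # agreement features of "mouse"

print(agree(det_this, n_mouse))    # -> {'number': 'singular'}  ("this mouse")
print(agree(det_these, n_mouse))   # -> None  (*"these mouse" is rejected)
```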

The set of constraints associated with a rule can be 'compiled' into a feature graph which will be associated with that rule.

Lexical features

Dictionary definitions will no longer be simple word categories, but rather feature structures.  These can be coded as additional productions

    N := mouse
        <N number> = singular
    N := mice
        <N number> = plural

or (for the Jet system) as a feature-value list,

    "mouse"    cat = n, number = singular;
    "mice"       cat = n, number = plural;

(A feature can be omitted to indicate that it can take on either value:    "fish"    cat = n; )
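A sketch of this lexicon as a Python dict (illustrative, not the Jet file format) shows why an omitted feature is compatible with either value:

```python
# Sketch of the feature-value lexicon above as a Python dict.
# A feature omitted from an entry (as for "fish") is unspecified,
# and so is compatible with either value.

lexicon = {
    'mouse': {'cat': 'n', 'number': 'singular'},
    'mice':  {'cat': 'n', 'number': 'plural'},
    'fish':  {'cat': 'n'},                      # number unspecified
}

def compatible(entry, constraint):
    """True if the lexical entry is consistent with the constraint
    (no feature takes conflicting values)."""
    return all(entry.get(f) in (None, v) for f, v in constraint.items())

print(compatible(lexicon['mouse'], {'number': 'plural'}))    # -> False
print(compatible(lexicon['fish'],  {'number': 'plural'}))    # -> True
print(compatible(lexicon['fish'],  {'number': 'singular'}))  # -> True
```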

Feature propagation and head features

We require rules of the form
    NP := DET N
        <NP agreement> = <N agreement>
    VP := V NP
        <VP agreement> = <V agreement>
to propagate features up the tree.  In most cases, we will find that a group of features is systematically passed from one child node to the parent.  These features are called head features, and the child is called the head of the phrase.  Thus the N is the head of the NP, and the V is the head of the VP.

Parsing and feature constraints

To modify a parser to incorporate feature tests
    extend parser to keep a feature structure with each node of the parse tree
        (with each state of a chart parser)
    apply the feature tests when completing a node;  reject the node if the tests fail
        (for the chart parser, whenever an active state A -> B . C D is extended by adding a child C, the feature structure under C of the active state is unified with the feature structure of the completed constituent C;  if the unification fails, the new state is rejected (J&M p. 430-431))

In searching the chart for a matching prior state, we must now check not only that the grammar symbol(s) and the start and end positions are the same, but also that the feature structures are the same (J&M point out that a subsumption test for feature structures is sufficient -- p. 432).
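A subsumption test can be sketched as follows: X subsumes Y if Y is at least as specific as X, i.e., every feature specified in X also holds in Y (dict-based structures, illustrative only):

```python
# Sketch of a subsumption test on dict-based feature structures:
# X subsumes Y if every feature specified in X is also present in Y
# with a value that X's value in turn subsumes.

def subsumes(x, y):
    if x is None:                  # null is the least specific structure
        return True
    if isinstance(x, dict) and isinstance(y, dict):
        return all(subsumes(v, y.get(f)) for f, v in x.items())
    return x == y                  # atomic values must match exactly

general  = {'cat': 'NP'}
specific = {'cat': 'NP', 'agreement': {'number': 'singular'}}

print(subsumes(general, specific))   # -> True: new state adds nothing
print(subsumes(specific, general))   # -> False
```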