G22.2590 - Natural Language Processing -- Spring 2008 -- Prof. Grishman
February 7, 2008
You may want to use the Jet parser for the first three exercises,
particularly for #2. Keep in mind in doing these exercises that many
words have several parts of speech.
- Using the tiny grammar below, draw the
two parse trees for the sentence “The fair features live music.”
(If you do this with Jet, note that this grammar is slightly different
from that provided as grammar1.)
Suggest a linguistically reasonable constraint that would
resolve this ambiguity.
sentence := np v | np v np;
np := n | art n | adj n | art adj n;
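If you want to check your trees without Jet, the following is a minimal sketch of a top-down enumerator for this grammar. It assumes that the last np alternative is "art adj n", and it uses a small hypothetical lexicon written for this one sentence (it is not Jet's dictionary). The function names are illustrative only.

```python
# Tiny grammar from the exercise; the last np alternative ("art adj n") is an assumption.
GRAMMAR = {
    "sentence": [["np", "v"], ["np", "v", "np"]],
    "np": [["n"], ["art", "n"], ["adj", "n"], ["art", "adj", "n"]],
}

# Hypothetical lexicon: each word lists the parts of speech it may take.
LEXICON = {
    "the": {"art"},
    "fair": {"n", "adj"},
    "features": {"n", "v"},
    "live": {"v", "adj"},
    "music": {"n"},
}

def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:  # part-of-speech symbol: try to match one word
        if start < len(words) and symbol in LEXICON.get(words[start], set()):
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])

def expand(symbol, rhs, words, pos, children):
    """Match the remaining right-hand-side symbols one at a time, left to right."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, nxt, children + [child])

def all_parses(sentence):
    """Return every sentence tree that covers the whole input."""
    words = sentence.lower().rstrip(".").split()
    return [tree for tree, end in parse("sentence", words, 0) if end == len(words)]

for tree in all_parses("The fair features live music."):
    print(tree)
```

Each tree is printed as a nested tuple whose first element is the grammar symbol, so the two analyses can be compared side by side.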
- Using the same tiny grammar, compare the efficiency of the top-down
backtracking parser, a bottom-up (immediate constituent) parser, and a
top-down chart parser on the sentence “The fair answers questions.”
For the two grammar symbols, sentence and np,
  - compare the backtracking and chart parsers with respect to the
  number of times each symbol is expanded (indicated by the message
  "Seeking ..." on both parsers). For each parser, report the count
  separately for sentence and np.
  - compare all three parsers with respect to the number of times a
  complete constituent using that symbol is generated. (This is
  indicated by the "Found" message on the top-down parser, the "Adding"
  message on the bottom-up parser, and the "Adding" message for a
  complete (inactive) edge on the chart parser.) Again, report the
  count separately for sentence and np.
- Modify this grammar to capture subject-verb number agreement. Does it
now produce one parse?
- J&M exercise 8.1 (practice tagging some text).
(4 points: 1 point each)
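For the parser comparison on “The fair answers questions.”, the idea of counting how often each symbol is expanded can be sketched as follows. This is a rough illustration, not Jet's code. It assumes the last np alternative is "art adj n", uses a hypothetical four-word lexicon, and stands in for the chart parser with simple reuse of earlier results per (symbol, position). The counts it produces therefore need not match the "Seeking" and "Found" messages that Jet reports.

```python
from collections import Counter

# Tiny grammar from the exercise; the last np alternative ("art adj n") is an assumption.
GRAMMAR = {
    "sentence": [["np", "v"], ["np", "v", "np"]],
    "np": [["n"], ["art", "n"], ["adj", "n"], ["art", "adj", "n"]],
}
# Hypothetical lexicon for this one sentence.
LEXICON = {"the": {"art"}, "fair": {"n", "adj"},
           "answers": {"n", "v"}, "questions": {"n", "v"}}

def ends(symbol, words, start, seeking, found, memo):
    """Return every position at which `symbol`, started at `start`, can end."""
    if symbol not in GRAMMAR:  # part-of-speech symbol: match one word
        ok = start < len(words) and symbol in LEXICON.get(words[start], set())
        return [start + 1] if ok else []
    if memo is not None and (symbol, start) in memo:
        return memo[(symbol, start)]  # reuse earlier work instead of expanding again
    seeking[symbol] += 1  # one expansion of this symbol
    result = []
    for rhs in GRAMMAR[symbol]:
        positions = [start]
        for part in rhs:
            positions = [e for p in positions
                         for e in ends(part, words, p, seeking, found, memo)]
        result.extend(positions)
    found[symbol] += len(result)  # one count per complete constituent
    if memo is not None:
        memo[(symbol, start)] = result
    return result

def count(sentence, reuse):
    """Return (expansions, completed constituents, number of full parses)."""
    words = sentence.lower().rstrip(".").split()
    seeking, found = Counter(), Counter()
    result = ends("sentence", words, 0, seeking, found, {} if reuse else None)
    return seeking, found, result.count(len(words))

for reuse in (False, True):
    seeking, found, parses = count("The fair answers questions.", reuse)
    print("reuse" if reuse else "backtracking", dict(seeking), dict(found), parses)
```

With reuse switched on, np is expanded fewer times because the result for a given starting position is computed only once. That is the effect the exercise asks you to measure on the real parsers.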
- the names of some of the parts of speech in the Penn Treebank
were later modified to avoid conflicts with phrase categories.
For example, personal pronoun was changed from PP to PRP to avoid
conflict with the symbol for "Prepositional Phrase".
Unfortunately, Table 8.6 in the book uses the old set, while the
exercise uses the new set. You can find the new set in
the Jet documentation.
- in the Penn set, the word "to" is always tagged "TO", whether
it is an infinitival marker or a preposition.
- modal verbs (MD) are those which do not take an "-s" suffix
in the third person singular present tense; "do" is not a modal
- the VB (base or infinitive form of the verb) and VBP (present
tense other than third person singular) forms are the same for all
verbs except the verb "be" (the VB form is "be"; the VBP forms are
"are" and "am"), so you must distinguish by context:
  - The main clause of a sentence must be tensed, so if a verb
  appears by itself in a main clause it must be the tensed form.
  For example, in "They bake cookies.", "bake" is a VBP. You can
  verify this by changing the subject to a singular and seeing that the
  verb changes: "He bakes cookies." (here "bakes" is a VBZ).
  - On the other hand, a verb following a modal, a form
  of "do", or "to" is an infinitive. In "They want to bake cookies."
  or "They can bake cookies.", "bake" is a VB. You can verify this
  by changing the subject to a singular and seeing that the form of
  "bake" does not change.
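The context test for VB versus VBP can be sketched as a small heuristic. The word lists and the function name below are hypothetical and written only for illustration; they are not part of the Penn Treebank guidelines. The sketch looks only at the immediately preceding word, so it will be wrong when an adverb or "not" intervenes, or when "to" is a preposition followed by a noun that looks like a verb.

```python
# Hypothetical word lists for illustration only.
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}
DO_FORMS = {"do", "does", "did"}

def base_or_present(tokens, i):
    """Guess VB or VBP for the verb at tokens[i] from the word just before it."""
    prev = tokens[i - 1].lower() if i > 0 else ""
    if prev in MODALS or prev in DO_FORMS or prev == "to":
        return "VB"   # follows a modal, a form of "do", or "to": infinitive
    return "VBP"      # otherwise assume the tensed (non-third-singular) form

print(base_or_present("They bake cookies".split(), 1))
print(base_or_present("They want to bake cookies".split(), 3))
print(base_or_present("They can bake cookies".split(), 2))
```

When tagging by hand, the substitution test in the notes (change the subject to a singular and see whether the verb changes) is more reliable than this one-word lookup.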
Due February 14th.