G22.2590 – Natural Language Processing – Spring 2008

The final exam

• Worth 30 points towards final grade
• Given Thursday, May 8th, 2008, 5:00 – 6:50 (usual class day, time, place)
• Open book and notes (you will need your book!). A simple calculator may also be helpful if there is a question on HMM or PCFG probabilities.
• Five to seven questions

The questions will be taken from the following list of question types. Most of these correspond directly to questions asked for homework. I may also ask one or two short (1-blue-book-page) essay questions about the issues we have discussed in the lectures.

1. English sentence structure: Label the constituents (NP, VP, PP, etc.) of an English sentence based on the grammar given in Chapter #9 (and summarized in the handout for homework #2). If the sentence is ambiguous, show its multiple parses. If the sentence violates some grammatical constraint, describe the constraint. (homework #2).
2. Context-free grammar: Extend the context-free grammar to cover an additional construct, or to capture a grammatical constraint. (homework #2).
3. Parsing: Given a very small context-free grammar, step through the operation of, or count the number of operations performed by, a top-down backtracking parser, a bottom-up parser, or a chart parser (homework #3).
4. POS tagging: Tag a sentence using the Penn POS tags (homework #3).
5. HMMs and the Viterbi decoder: Describe how POS tagging can be performed using a probabilistic model (J&M sec. 8.5; lecture 4 notes). Create an HMM from some POS-tagged training data. Trace the operation of a Viterbi decoder. Compute the likelihood of a given tag sequence and the likelihood of generating a given sentence from an HMM (homework #4).
6. Feature grammar: Augment a context-free grammar using the feature formalism of J&M 11.3 to capture a grammatical constraint (homework #5).
7. Chunkers and name taggers: Explain how BIO tags can be used to reduce chunking or name identification to a token-tagging task. Explain how chunking can be evaluated (lecture #7). Explain how a maximum-entropy model can be used for tagging or chunking (lecture and homework #8).
8. Probabilistic CFG: Train a probabilistic CFG from some parses; apply this PCFG to disambiguate a sentence. Explain how this PCFG can be extended to capture lexical information. Compute lexically-conditioned probabilities (homework #9).
9. Logical form: Write the logical form of an English sentence, with or without event reification (J&M chap. 14 and 15.1; lecture #11).
10. Jet: Be able to extend, or trace the operation of, one of the Jet pattern sets we have distributed and discussed (for noun and verb groups, and for appointment events). Analyze and correct a shortcoming in the appointment patterns (homework #10).
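
As a study aid for question type 5, the following is a minimal sketch of building an HMM from POS-tagged data and tracing a Viterbi decoder. The three training sentences, the function names, and the `<s>` / `</s>` boundary symbols are assumptions made for illustration; they are not taken from the homework or the lecture notes. Probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter, defaultdict

# Toy POS-tagged training data (invented for this sketch).
TRAINING = [
    [("the", "DT"), ("can", "NN"), ("rusts", "VBZ")],
    [("they", "PRP"), ("can", "MD"), ("fish", "VB")],
    [("the", "DT"), ("fish", "NN"), ("can", "MD"), ("swim", "VB")],
]

def train_hmm(tagged_sentences):
    """Estimate transition and emission probabilities by relative frequency."""
    trans = defaultdict(Counter)   # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)    # emit[tag][word] = count
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
        trans[prev]["</s>"] += 1

    def normalize(table):
        return {key: {x: c / sum(counts.values()) for x, c in counts.items()}
                for key, counts in table.items()}

    return normalize(trans), normalize(emit)

def viterbi(words, trans, emit):
    """Return the most probable tag sequence for words and its joint probability."""
    tags = list(emit)
    # best[i][t] = (probability of the best path ending in tag t at word i, backpointer)
    best = [{t: (trans.get("<s>", {}).get(t, 0.0) * emit[t].get(words[0], 0.0), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = emit[t].get(words[i], 0.0)
            column[t] = max(
                ((best[i - 1][s][0] * trans.get(s, {}).get(t, 0.0) * e, s) for s in tags),
                key=lambda pair: pair[0],
            )
        best.append(column)
    # Include the transition into the end-of-sentence symbol.
    prob, last = max(
        ((best[-1][t][0] * trans.get(t, {}).get("</s>", 0.0), t) for t in tags),
        key=lambda pair: pair[0],
    )
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])   # follow backpointers
    path.reverse()
    return path, prob

def joint_probability(words, tags, trans, emit):
    """Probability of generating this exact word sequence with this exact tag sequence."""
    p, prev = 1.0, "<s>"
    for word, tag in zip(words, tags):
        p *= trans.get(prev, {}).get(tag, 0.0) * emit.get(tag, {}).get(word, 0.0)
        prev = tag
    return p * trans.get(prev, {}).get("</s>", 0.0)
```

With this toy data, decoding "the can can swim" selects the tags DT NN MD VB with probability 1/12, which matches the hand computation (2/3 · 1) · (1 · 1/2) · (1/2 · 1) · (1 · 1/2) · 1. Working the same trellis by hand is good practice for the exam.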
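
For question type 7, the sketch below illustrates how BIO tags turn chunking into a per-token tagging task, and one common way to score a chunker: precision, recall, and F-measure over exactly matching chunks. The function names, the example tags, and the choice of exact-match scoring are assumptions for illustration and may differ in detail from the scheme presented in lecture.

```python
def bio_to_chunks(tags):
    """Convert a BIO tag sequence into a set of (start, end, type) spans; end is exclusive."""
    chunks = set()
    start, chunk_type = None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # the sentinel "O" closes a final chunk
        inside_same = tag.startswith("I-") and tag[2:] == chunk_type
        if start is not None and not inside_same:
            chunks.add((start, i, chunk_type))
            start, chunk_type = None, None
        if tag != "O" and start is None:
            # "B-X" opens a chunk; a stray "I-X" is treated as opening one too.
            start, chunk_type = i, tag[2:]
    return chunks

def chunk_scores(gold_tags, predicted_tags):
    """Precision, recall, and F1 over chunks that match exactly in span and type."""
    gold = bio_to_chunks(gold_tags)
    predicted = bio_to_chunks(predicted_tags)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Invented example: "The cat sat on the mat"
GOLD = ["B-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"]
PREDICTED = ["B-NP", "I-NP", "B-VP", "O", "B-NP", "I-NP"]
```

Here the predicted tagging finds three of the four gold chunks and proposes no spurious ones, so precision is 1.0 and recall is 0.75. Note that the score is computed over whole chunks, not over individual tokens.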
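
For question type 8, this is a minimal sketch of training a PCFG from parse trees by relative frequency and scoring a parse as the product of its rule probabilities. The tuple encoding of trees, the function names, and the three toy parses are assumptions made for illustration. The sketch does not cover lexicalization.

```python
from collections import Counter

def rules(tree):
    """Yield (lhs, rhs) for every node. A tree is (label, child, ...); a leaf is a word string."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def train_pcfg(trees):
    """P(lhs -> rhs) = count(lhs -> rhs) / count(lhs), with no smoothing."""
    counts = Counter(rule for tree in trees for rule in rules(tree))
    lhs_totals = Counter()
    for (lhs, _), count in counts.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]] for rule, count in counts.items()}

def tree_probability(tree, pcfg):
    """Product of the probabilities of all rules used in the tree."""
    p = 1.0
    for rule in rules(tree):
        p *= pcfg.get(rule, 0.0)
    return p

# Toy treebank (invented for this sketch).
TREES = [
    ("S", ("NP", "dogs"), ("VP", ("V", "bark"))),
    ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats"))),
    ("S", ("NP", "cats"), ("VP", ("V", "bark"))),
]
```

To disambiguate a sentence under this scheme, compute `tree_probability` for each candidate parse and keep the one with the highest value. In the toy treebank, VP → V receives probability 2/3 and VP → V NP receives 1/3, so the first tree scores 1 · 1/2 · 2/3 · 2/3 = 2/9.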