G22.2590 – Natural Language Processing – Spring 2010
The final exam
- Worth 30 points towards final grade
- Given Thursday, May 6th, 2010, 5:00 – 6:50 (usual time and place)
- Open book and notes (you will need your book!). If you have a
simple calculator, that may also be helpful for questions on HMM
or PCFG probabilities or word/document similarities.
- Five to seven questions
The questions will be taken from the following list of question types.
Most of these correspond directly to questions asked for homework.
I may also ask one or two short (1-blue-book-page) essay questions on
issues we have discussed in the lectures.
- Java patterns:
Write a Java pattern for a simple construct (e.g., an NYU course
number) (lecture/homework #2). A sketch appears below.
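For illustration, a minimal java.util.regex sketch, assuming a course
number of the form G22.2590 (a capital letter, two digits, a period,
four digits); the format actually asked about may differ:

    import java.util.regex.*;

    public class CourseNumberPattern {
        public static void main(String[] args) {
            // Hypothetical format: one capital letter, two digits, '.', four digits
            Pattern p = Pattern.compile("[A-Z]\\d{2}\\.\\d{4}");
            Matcher m = p.matcher("The NLP course is G22.2590, Spring 2010.");
            while (m.find())
                System.out.println(m.group());   // prints: G22.2590
        }
    }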
- English sentence structure: Label the constituents (NP,
VP, PP, etc.) of an English sentence based on the grammar given in class
(and summarized in the handout for homework #3). If the sentence is
ambiguous, show its multiple parses. If the sentence violates some
grammatical constraint, describe the constraint. (homework #3) An
example bracketing appears below.
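For example, under a typical grammar of this kind (category labels as
in the homework #3 handout; details may vary):

    [S [NP the cat] [VP [V sat] [PP [P on] [NP the mat]]]]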
- Context-free grammar: Extend the context-free grammar to handle
an additional construct, or to capture a grammatical constraint. An
example appears below.
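For instance, one standard way (not necessarily the one used in class)
to capture subject-verb number agreement is to split NP and VP by
number:

    S     -> NP-sg VP-sg | NP-pl VP-pl
    NP-sg -> Det N-sg
    NP-pl -> Det N-pl
    VP-sg -> V-sg NP
    VP-pl -> V-pl NP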
- Parsing: Given a very small context-free grammar, step through
the operation of, or count the number of operations performed by, a
backtracking parser, a bottom-up parser, or a chart parser (homework
#4). A chart-parser sketch appears below.
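As a concrete reference, a minimal CKY-style chart recognizer in Java
over a hypothetical toy grammar in Chomsky normal form (the chart
parser presented in class may organize its chart differently):

    import java.util.*;

    public class CKYRecognizer {
        // Hypothetical toy grammar in CNF: binary rules lhs -> rhs1 rhs2
        static String[][] binary = {
            {"S", "NP", "VP"}, {"VP", "V", "NP"}, {"NP", "Det", "N"}
        };
        // Lexical rules: lhs -> word
        static String[][] lexical = {
            {"Det", "the"}, {"N", "cat"}, {"N", "fish"}, {"V", "ate"}
        };

        static boolean recognize(String[] words) {
            int n = words.length;
            // chart[i][j] holds the nonterminals that span words i..j-1
            Set<String>[][] chart = new HashSet[n + 1][n + 1];
            for (int i = 0; i <= n; i++)
                for (int j = 0; j <= n; j++) chart[i][j] = new HashSet<>();
            for (int i = 0; i < n; i++)                  // enter the words
                for (String[] r : lexical)
                    if (r[1].equals(words[i])) chart[i][i + 1].add(r[0]);
            for (int span = 2; span <= n; span++)        // combine adjacent spans
                for (int i = 0; i + span <= n; i++)
                    for (int k = i + 1; k < i + span; k++)
                        for (String[] r : binary)
                            if (chart[i][k].contains(r[1])
                                    && chart[k][i + span].contains(r[2]))
                                chart[i][i + span].add(r[0]);
            return chart[0][n].contains("S");
        }

        public static void main(String[] args) {
            System.out.println(recognize("the cat ate the fish".split(" "))); // true
        }
    }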
- POS tagging: Tag a sentence using the Penn POS tag set; an
example appears below.
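For example, with Penn Treebank tags:

    The/DT dog/NN barks/VBZ ./.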
- HMMs and the Viterbi decoder: Describe how POS tagging can be
performed using a probabilistic model (J&M sec. 5.5 and chap. 6).
Create an HMM from some POS-tagged training data. Trace the operation of
a Viterbi decoder. Compute the likelihood of a given tag sequence and the
likelihood of generating a given sentence from an HMM (homework #5). A
decoder sketch appears below.
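By way of illustration, a minimal Viterbi decoder in Java over a
hypothetical three-tag model; every probability below is invented for
the example:

    import java.util.*;

    public class ViterbiSketch {
        // Hypothetical model: tags plus start, transition, and emission probabilities
        static String[] tags = {"DT", "NN", "VBZ"};
        static double[] start = {0.8, 0.1, 0.1};      // P(tag begins the sentence)
        static double[][] trans = {                   // P(next tag | current tag)
            {0.1, 0.8, 0.1},   // from DT
            {0.1, 0.2, 0.7},   // from NN
            {0.6, 0.3, 0.1}    // from VBZ
        };
        static Map<String, double[]> emit = Map.of(   // P(word | tag)
            "the",   new double[]{0.9, 0.0, 0.0},
            "dog",   new double[]{0.0, 0.5, 0.0},
            "barks", new double[]{0.0, 0.1, 0.8});

        static String[] decode(String[] words) {
            int n = words.length, t = tags.length;
            double[][] v = new double[n][t];  // best probability of a path ending in tag s
            int[][] back = new int[n][t];     // backpointer to the previous tag on that path
            for (int s = 0; s < t; s++)
                v[0][s] = start[s] * emit.get(words[0])[s];
            for (int i = 1; i < n; i++)
                for (int s = 0; s < t; s++)
                    for (int p = 0; p < t; p++) {
                        double score = v[i - 1][p] * trans[p][s] * emit.get(words[i])[s];
                        if (score > v[i][s]) { v[i][s] = score; back[i][s] = p; }
                    }
            int best = 0;                     // pick the best final tag, then trace back
            for (int s = 1; s < t; s++) if (v[n - 1][s] > v[n - 1][best]) best = s;
            String[] out = new String[n];
            for (int i = n - 1; i >= 0; i--) { out[i] = tags[best]; best = back[i][best]; }
            return out;
        }

        public static void main(String[] args) {
            // prints [DT, NN, VBZ]
            System.out.println(Arrays.toString(decode("the dog barks".split(" "))));
        }
    }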
- Chunkers and name taggers: Explain how BIO tags can be
used to reduce chunking or name identification to a token-tagging task.
Explain how chunking can be evaluated (lecture #6). Explain how a
maximum-entropy model can be used for tagging or chunking (lecture #7).
An example appears below.
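For example, noun group chunks can be reduced to token tags with B
(begins a chunk), I (inside a chunk), and O (outside any chunk):

    The/B-NP old/I-NP man/I-NP saw/O a/B-NP boat/I-NP ./O

Chunk output is then scored with precision (correct chunks / chunks
proposed), recall (correct chunks / chunks in the answer key), and
their harmonic mean, the F measure.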
- Probabilistic CFG: Train a probabilistic CFG from some parsed
training data; apply this PCFG to disambiguate a sentence. Explain how
this PCFG can be extended to capture lexical information. Compute parse
probabilities. (homework #9) A worked example appears below.
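For instance, with invented counts: if the training trees contain 100
expansions of VP, of which 60 are VP -> V NP, the maximum-likelihood
estimate is P(VP -> V NP) = 60/100 = 0.6. The probability of a parse
is the product of the probabilities of all the rules it uses, and the
parse with the higher probability is preferred.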
- Logical form: write the logical form of an English sentence,
with or without event reification (lecture #11); an example appears
below.
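As an illustration (the notation in lecture may differ in detail),
'John ate an apple' might come out as

    without reification:  exists x . apple(x) & ate(John, x)
    with reification:     exists e, x . apple(x) & eat(e) &
                          agent(e, John) & theme(e, x) & past(e)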
- Jet: be able to extend, or trace the operation of, one of the
Jet pattern sets we have distributed and discussed (for noun and verb
groups, and for appointment events). Analyze and correct a shortcoming in
the appointment patterns (homework #10).
- Reference resolution:
analyze a reference resolution problem -- identify the constraints
and preferences which would lead a system to select the correct
antecedent (lecture #10); an example appears below.
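For example (an invented sentence, not one from lecture): in 'Mary
told Sue that she had won', gender and number constraints allow either
antecedent for 'she', so a preference (e.g., for the subject) must
decide; in 'Mary told John that she had won', the gender constraint
alone selects Mary.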
- Word sense disambiguation:
given a word with two senses and a small training set of contexts for
each of the two senses, apply the naive Bayes procedure to resolve the
sense of the word in a test case (J&M 20.2.2, lecture #12); a worked
example appears below.
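Worked example with invented numbers: suppose 'bass' has a fish sense
and a music sense, each with prior P(s) = 0.5, and the training
contexts give P(river|fish) = 0.3, P(river|music) = 0.02,
P(play|fish) = 0.05, P(play|music) = 0.4. For a test context
containing 'river' and 'play', the naive Bayes scores are

    score(fish)  = 0.5 x 0.3  x 0.05 = 0.0075
    score(music) = 0.5 x 0.02 x 0.4  = 0.0040

so the fish sense is chosen.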
- Bag-of-words methods:
compute the similarity of two bags of words using the normalized dot
product (eqn. 23.7 in the text). Explain the tf and idf factors
(sec. 23.1.2). Describe the role of the similarity metric in
question answering and summarization (lecture #13). A sketch appears
below.
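As a concrete reference, a minimal Java sketch of the normalized dot
product (cosine) of two bags of words, using raw term frequencies as
weights (tf-idf weighting would replace the raw counts):

    import java.util.*;

    public class CosineSketch {
        // Normalized dot product of two term-frequency vectors
        static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
            double dot = 0, na = 0, nb = 0;
            for (Map.Entry<String, Integer> e : a.entrySet()) {
                dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
                na  += e.getValue() * e.getValue();
            }
            for (int v : b.values()) nb += v * v;
            return dot / (Math.sqrt(na) * Math.sqrt(nb));
        }

        // Count the words in a text to form a bag of words
        static Map<String, Integer> bag(String text) {
            Map<String, Integer> m = new HashMap<>();
            for (String w : text.toLowerCase().split("\\s+"))
                m.merge(w, 1, Integer::sum);
            return m;
        }

        public static void main(String[] args) {
            System.out.printf("%.3f%n",
                cosine(bag("the cat sat on the mat"),
                       bag("the cat ate the fish")));   // ~0.668
        }
    }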