G22.2591 - Advanced Natural Language Processing - Spring 2011
Due Sunday night April 3
For the third experiment, we will try to
find paraphrases of relation expressions automatically
from a large corpus, following, in simplified form, the
approach of Brin, Agichtein, and Ravichandran.
We will use only the middle context (the tokens between the two arguments),
we will treat that context as a fixed token sequence,
and we will perform only one iteration:
pattern --> argument pairs --> patterns.
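As an illustration only, a middle context could be pulled out of a tokenized sentence roughly as follows. This is a sketch under our own assumptions: sentences are already split into tokens, an argument is a maximal run of capitalized tokens, and the names `middle_contexts`, `is_cap`, and the `max_len` cutoff are made up for this example. The middle context is kept as an exact tuple of tokens, with no generalization.

```python
from typing import List, Tuple

Triple = Tuple[str, Tuple[str, ...], str]


def is_cap(token: str) -> bool:
    # Heuristic (assumption): a token belongs to a name if it starts with an uppercase letter.
    return token[:1].isupper()


def middle_contexts(tokens: List[str], max_len: int = 5) -> List[Triple]:
    """Return (arg1, middle, arg2) for each two consecutive runs of capitalized
    tokens, keeping the tokens between them verbatim as the middle context."""
    runs, i = [], 0
    while i < len(tokens):  # collect maximal runs of capitalized tokens
        if is_cap(tokens[i]):
            j = i
            while j < len(tokens) and is_cap(tokens[j]):
                j += 1
            runs.append((i, j))
            i = j
        else:
            i += 1
    result = []
    for (s1, e1), (s2, e2) in zip(runs, runs[1:]):
        middle = tuple(tokens[e1:s2])  # exact token sequence, no generalization
        if len(middle) <= max_len:
            result.append((" ".join(tokens[s1:e1]), middle, " ".join(tokens[s2:e2])))
    return result


print(middle_contexts("Buenos Aires , the capital of Argentina , is large .".split()))
# [('Buenos Aires', (',', 'the', 'capital', 'of'), 'Argentina')]
```

The capitalization heuristic also picks up sentence-initial words such as "The" as arguments, so some noisy triples are to be expected.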
Choose a (nearly) functional relation to explore.
- start with one pattern, such as ", the capital of"
- collect pairs of named arguments (sequences of one
or more capitalized tokens immediately preceding and
following the pattern). You can use the ngram search engine.
- look for other word sequences connecting 2 or more of these pairs
- for each sequence, measure its precision at retrieving pairs of the relation
- compare the precision scores with your assessment of
whether these sequences are paraphrases of the original pattern
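The first two steps (collecting pairs from the seed pattern, then finding other sequences that connect at least two of them) might be sketched as below. This is a hypothetical stand-in, not the ngram search engine: the corpus is a small in-memory list of tokenized sentences, the toy sentences are invented, and all function names are our own. Arguments are maximal runs of capitalized tokens, and a candidate sequence is any middle context between two consecutive runs.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

Pair = Tuple[str, str]
Seq = Tuple[str, ...]


def is_cap(token: str) -> bool:
    # Heuristic (assumption): a token is part of a name if it starts with an uppercase letter.
    return token[:1].isupper()


def capitalized_runs(tokens: List[str]) -> List[Tuple[int, int]]:
    """Spans (start, end), end exclusive, of maximal runs of capitalized tokens."""
    runs, i = [], 0
    while i < len(tokens):
        if is_cap(tokens[i]):
            j = i
            while j < len(tokens) and is_cap(tokens[j]):
                j += 1
            runs.append((i, j))
            i = j
        else:
            i += 1
    return runs


def triples(tokens: List[str], max_len: int = 5) -> Iterator[Tuple[str, Seq, str]]:
    """Yield (arg1, middle, arg2) for each pair of consecutive capitalized runs."""
    runs = capitalized_runs(tokens)
    for (s1, e1), (s2, e2) in zip(runs, runs[1:]):
        middle = tuple(tokens[e1:s2])
        if len(middle) <= max_len:
            yield " ".join(tokens[s1:e1]), middle, " ".join(tokens[s2:e2])


def pairs_for_pattern(corpus: List[List[str]], pattern: Seq) -> Set[Pair]:
    """Step 1: argument pairs whose middle context is exactly `pattern`."""
    return {(a1, a2)
            for tokens in corpus
            for a1, mid, a2 in triples(tokens, max_len=len(pattern))
            if mid == pattern}


def connecting_sequences(corpus: List[List[str]], seeds: Set[Pair],
                         min_pairs: int = 2) -> Dict[Seq, Set[Pair]]:
    """Step 2: middle sequences that connect at least `min_pairs` distinct seed pairs."""
    support: Dict[Seq, Set[Pair]] = defaultdict(set)
    for tokens in corpus:
        for a1, mid, a2 in triples(tokens):
            if (a1, a2) in seeds:
                support[mid].add((a1, a2))
    return {mid: ps for mid, ps in support.items() if len(ps) >= min_pairs}


# Toy corpus standing in for the real data (invented sentences).
corpus = [s.split() for s in [
    "Paris , the capital of France , is large .",
    "Rome , the capital of Italy , is old .",
    "Madrid , the capital of Spain .",
    "Paris is the capital of France .",
    "Rome is the capital of Italy .",
    "Madrid is in Spain .",
    "Lyon is in France .",
]]

seeds = pairs_for_pattern(corpus, (",", "the", "capital", "of"))
for mid, ps in sorted(connecting_sequences(corpus, seeds).items()):
    print(" ".join(mid), len(ps))
```

On this toy corpus the seed pattern yields three pairs, and the only other sequence joining at least two of them is "is the capital of"; "is in" links just one seed pair (Madrid, Spain) and is dropped. The seed pattern itself also appears in the output, since it trivially connects all of its own pairs. Because middles are taken between consecutive capitalized runs, a sequence that itself contains a capitalized word (for example "of the Republic of") would be missed by this sketch.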
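The assignment leaves the exact precision measure open. One reasonable choice for a (nearly) functional relation, loosely following the pattern confidence used in Agichtein's Snowball system, is sketched below as an assumption rather than the required definition: the seed pairs act as a partial answer key mapping each first argument to its second; a retrieved pair is correct if its first argument is in the key and the second argument matches, wrong if the first argument is in the key but the second differs, and unjudged otherwise. The function name `precision` and the example pairs are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Pair = Tuple[str, str]


def precision(retrieved: Iterable[Pair], key: Dict[str, str]) -> Optional[float]:
    """Share of judgeable retrieved pairs whose second argument matches the key.

    Assumes a (nearly) functional relation: each first argument has one correct
    second argument. Pairs whose first argument is not in the key cannot be
    judged and are skipped. Returns None if no retrieved pair can be judged.
    """
    correct = wrong = 0
    for arg1, arg2 in set(retrieved):  # count each distinct pair once
        if arg1 not in key:
            continue
        if key[arg1] == arg2:
            correct += 1
        else:
            wrong += 1
    judged = correct + wrong
    return correct / judged if judged else None


# Answer key built from the seed pairs (hypothetical example data).
key = {"Paris": "France", "Rome": "Italy", "Madrid": "Spain"}

# Hypothetical pairs retrieved by two candidate sequences.
retrieved = {
    "is the capital of": [("Paris", "France"), ("Rome", "Italy")],
    "is in": [("Madrid", "Spain"), ("Paris", "Europe"),
              ("Rome", "Lazio"), ("Lyon", "France")],
}
for seq, pairs in retrieved.items():
    print(seq, precision(pairs, key))
```

With this key, "is the capital of" scores 1.0 while "is in" scores 1/3 (the Lyon pair is skipped as unjudgeable), which is the kind of contrast to set against your own judgment of which sequences are true paraphrases.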