G22.2591 - Advanced Natural Language Processing - Spring 2011

Assignment #3

Due Sunday night, April 3

For the third experiment, we will try to find paraphrases of relation expressions automatically from a large corpus, following, in simplified form, the approaches of Brin, Agichtein, and Ravichandran. We will make three simplifications: we will use only the middle context (the text between the two arguments), we will treat that context as a fixed token sequence, and we will perform only one iteration, pattern --> argument pairs --> patterns.

Choose a (nearly) functional relation to explore.
  1. Start with one pattern, such as ", the capital of".
  2. Collect pairs of named arguments (sequences of one or more capitalized tokens immediately preceding and following the pattern). You can use the n-gram search engine.
  3. Look for other word sequences that connect two or more of these pairs.
  4. For each such sequence, measure its precision at retrieving this relation.
  5. Compare the precision scores with your own assessment of whether these sequences are paraphrases of the original pattern.
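
As a rough illustration, steps 1 through 4 can be sketched in Python on a toy in-memory corpus. Everything here is an assumption made for the example and not part of the assignment: the CORPUS sentences are invented, the function names (triples, collect_pairs, collect_patterns, precision) are hypothetical, the regular expressions are a crude stand-in for the n-gram search engine, and precision is estimated in one plausible way that relies on the relation being nearly functional (among matches whose first argument is already known, the fraction whose second argument agrees with the known one).

```python
import re
from collections import defaultdict

# Assumption: a "name" is one or more capitalized tokens, e.g. "Paris".
NAME = r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*"
# Assumption: a middle context is 1-5 lowercase tokens, each optionally
# preceded by a comma, e.g. ", the capital of".
MIDDLE = r"(?:,? [a-z]+){1,5}"
TRIPLE = re.compile(rf"({NAME})({MIDDLE}) ({NAME})")

def triples(sentences):
    """Yield (left name, middle context, right name) for each match.
    Matches are non-overlapping, which is a simplification."""
    for s in sentences:
        for m in TRIPLE.finditer(s):
            yield m.group(1), m.group(2).strip(), m.group(3)

def collect_pairs(sentences, pattern):
    """Step 2: argument pairs found immediately around the seed pattern."""
    return {(x, y) for x, mid, y in triples(sentences) if mid == pattern}

def collect_patterns(sentences, pairs, min_pairs=2):
    """Step 3: middle contexts that connect at least min_pairs known pairs."""
    by_pattern = defaultdict(set)
    for x, mid, y in triples(sentences):
        if (x, y) in pairs:
            by_pattern[mid].add((x, y))
    return {p: ps for p, ps in by_pattern.items() if len(ps) >= min_pairs}

def precision(sentences, pattern, pairs):
    """Step 4 (one possible estimate): among matches of `pattern` whose left
    argument is known, the fraction whose right argument is the known one."""
    known = dict(pairs)  # assumes each left argument has one right argument
    hits = [(x, y) for x, mid, y in triples(sentences)
            if mid == pattern and x in known]
    if not hits:
        return None
    return sum(known[x] == y for x, y in hits) / len(hits)

# Invented toy corpus, for illustration only.
CORPUS = [
    "Paris, the capital of France, is on the Seine.",
    "Tokyo, the capital of Japan, hosted the games.",
    "Paris is the capital of France.",
    "Tokyo is the capital of Japan.",
    "Paris is located in France.",
    "Tokyo is located in Japan.",
    "Paris is located in Texas.",
]

seed = ", the capital of"                       # step 1
pairs = collect_pairs(CORPUS, seed)             # step 2
candidates = collect_patterns(CORPUS, pairs)    # step 3
for pattern in sorted(candidates):              # step 4
    print(repr(pattern), precision(CORPUS, pattern, pairs))
```

On this toy data, a candidate such as "is located in" connects the same seed pairs but also retrieves a pair that disagrees with them, so its estimated precision is lower than that of "is the capital of". Step 5 then asks whether that ordering matches your own judgment.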