G22.2591 - Advanced Natural Language Processing - Spring 2009
Due Wednesday April 1
For the third experiment, we will try to
find paraphrases of relation expressions automatically
from a large corpus. The basic idea, which we will
explore in the papers of Brin and Agichtein,
works as follows:
- start with one pattern, such as ", the capital of"
(if you're interested in working on the TAC KBP,
pick one of the KBP relations instead)
- collect pairs of named arguments (sequences of one
or more capitalized tokens immediately preceding and
following the pattern). You can use the ngram search engine.
- look for other word sequences that connect two or more of these pairs
- judge how many of these sequences are paraphrases of the original pattern
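The collection and connection steps can be sketched in a few lines of Python. This is a minimal sketch, not the method of Brin or Agichtein: it assumes the corpus is already available as a list of sentence strings (the handout suggests the ngram search engine instead, whose interface is not shown here), uses the handout's "one or more capitalized tokens" heuristic in place of a name tagger, and counts only connectors that keep the arguments in the same order. The function names, the `max_words` cap, and the 80-character window are hypothetical choices for illustration.

```python
import re
from collections import defaultdict

# Crude stand-in for a name tagger: one or more capitalized tokens.
NAME = r"[A-Z][\w'-]*(?: [A-Z][\w'-]*)*"


def collect_pairs(sentences, pattern):
    """Return (arg1, arg2) name pairs immediately before and after `pattern`."""
    regex = re.compile(
        r"\b(" + NAME + r")\s*" + re.escape(pattern.strip()) + r"\s+(" + NAME + r")"
    )
    pairs = set()
    for s in sentences:
        for m in regex.finditer(s):
            pairs.add((m.group(1), m.group(2)))
    return pairs


def find_connectors(sentences, pairs, max_words=5):
    """Map each short word sequence found between a known pair to the pairs it
    connects, keeping only sequences that connect two or more pairs."""
    connectors = defaultdict(set)
    for a, b in pairs:
        # Non-greedy middle, capped at 80 characters (an arbitrary window).
        regex = re.compile(r"\b" + re.escape(a) + r"(.{1,80}?)" + re.escape(b) + r"\b")
        for s in sentences:
            for m in regex.finditer(s):
                middle = m.group(1).strip()
                if middle and len(middle.split()) <= max_words:
                    connectors[middle].add((a, b))
    return {c: ps for c, ps in connectors.items() if len(ps) >= 2}


if __name__ == "__main__":
    corpus = [
        "Paris, the capital of France, is large.",
        "Berlin, the capital of Germany, is old.",
        "Paris is the largest city in France.",
        "Berlin is the largest city in Germany.",
    ]
    seed_pairs = collect_pairs(corpus, ", the capital of")
    for conn, ps in sorted(find_connectors(corpus, seed_pairs).items()):
        print(repr(conn), sorted(ps))
```

The seed pattern itself reappears among the connectors, and the capitalization heuristic will sometimes glue a sentence-initial word onto a name ("In Paris"), so the output is a candidate list for the manual paraphrase judgment, not a finished result.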
Also, it's time to think about a term project.
Semi-supervised experiments in name tagging, hyponymy, or
relation extraction are just some of the possibilities.
If you have a clear idea of what you want to do, send me
an abstract by April 1.
One timely possibility is to help us in preparing for a
possible TAC KBP evaluation. If several students opt for
this, we would do it collaboratively, comparing notes each
KBP involves both slot-filling (relation finding) and linking
tasks; we will focus on slot filling, since it matches
what we are currently covering in the course. The first
step is to study an individual relation (attribute) and get a better
appreciation of the problem. You may want to study the
relation you are working on for Assignment #3.
Things we need to understand include:
- will we need lots of patterns for good coverage?
- what type is the slot value? (is it a type currently
recognized by a name tagger?)
- how tightly do we need to constrain the slot value?
- do we require anaphora resolution to get good coverage?
- how can we estimate system recall?
To answer some of these questions, you will need to collect a fair
number of examples of the relation.
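For the recall question in the list above, one common approach is to hand-annotate every instance of the relation in a small sample of documents and measure what fraction of those annotated instances the system finds. The sketch below assumes each slot fill is represented as a hypothetical (entity, value) tuple, and adds a normal-approximation confidence interval because the estimate comes from a sample; that interval is crude for small samples (a Wilson interval behaves better).

```python
import math


def estimate_recall(gold, system, z=1.96):
    """Estimate recall on a hand-annotated sample.

    gold:   every relation instance annotated in the sample (hypothetical tuples)
    system: the instances the system extracted from the same sample
    Returns (recall, lower, upper), using a normal-approximation interval.
    """
    gold = set(gold)
    if not gold:
        raise ValueError("need at least one annotated instance")
    hits = len(gold & set(system))  # annotated instances the system also found
    n = len(gold)
    r = hits / n
    half = z * math.sqrt(r * (1 - r) / n)
    return r, max(0.0, r - half), min(1.0, r + half)


if __name__ == "__main__":
    gold = [("Paris", "France"), ("Berlin", "Germany"), ("Rome", "Italy"), ("Madrid", "Spain")]
    system = [("Paris", "France"), ("Berlin", "Germany"), ("Rome", "Italy"), ("Lyon", "France")]
    print(estimate_recall(gold, system))
```

Extra system outputs that are not in the annotated sample (here `("Lyon", "France")`) do not affect recall; they matter only for precision. With so few annotated instances the interval is very wide, which is itself a useful signal about how many examples need to be collected.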
If you are taking this path, please send a progress report with
answers to (some of) these questions by April 1.