Candidate: Bonan Min
Advisor: Ralph Grishman

Committee:

Prof. Ralph Grishman, NYU (advisor, reader)
Prof. Satoshi Sekine, NYU (reader)
Prof. Heng Ji, CUNY (reader)
Prof. Ernest Davis, NYU (auditor)
Prof. Dennis Shasha, NYU (auditor)
Date: Thursday, May 2nd, 2013
Time: 10:30 am
Room: 719 Broadway, Room 709

Title: Relation Extraction with Weak Supervision and Distributional Semantics

Abstract:

Relation extraction aims to detect and categorize semantic relations between pairs of entities in unstructured text. It benefits many applications, such as Web search and question answering. Traditional approaches to relation extraction rely either on learning from a large number of accurate human-labeled examples or on pattern matching with hand-crafted rules. These resources are laborious to obtain and can be applied only to a narrow set of target relation types.

This talk focuses on learning relations with little or no human supervision. First, we examine the approach that treats relation extraction as a supervised learning problem. We develop an algorithm that trains a model at approximately one third of the human-annotation cost while matching the performance of models trained with high-quality annotation. Second, we investigate distant supervision, a weakly supervised approach that automatically generates its own labeled training data. We develop a latent Bayesian framework for this purpose. Because its model better approximates the weak source of supervision, the framework outperforms state-of-the-art methods. Finally, we investigate the possibility of building all relational tables beforehand with an unsupervised relation extraction algorithm. We develop an effective and efficient algorithm that combines various semantic resources automatically mined from a corpus based on distributional semantics. The algorithm extracts a very large set of relations from the Web at high precision.