G22.2591 - Advanced Natural Language Processing - Spring 2011

Lecture 11

Unsupervised learning of semantics

Today we will be looking at three very different papers, each of which addresses the problem of learning semantics from scratch with minimal or no supervision.

Yusuke Shinyama and Satoshi Sekine.  Preemptive information extraction using unrestricted relation discovery.  HLT-NAACL 2006.

Considers how we might identify all the frequently recurring relation and event types in a large news corpus.  First groups articles into clusters representing the same event.  Then groups clusters into metaclusters representing the same type of event.  Identifies shared basic patterns across the clusters in a metacluster;  these patterns are transformed into the columns of a table of extracted information.  A rough sketch of this pipeline appears below.
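As a toy illustration only (hypothetical names and data, not the authors' code):  greedy Jaccard clustering over bag-of-words article representations stands in here for the paper's actual clustering machinery, and shared tokens stand in for its syntactically derived basic patterns.

    def cluster(items, similarity, threshold):
        # Greedy single-link clustering: join the first cluster whose
        # first member is similar enough, else start a new cluster.
        clusters = []
        for item in items:
            for c in clusters:
                if similarity(item, c[0]) >= threshold:
                    c.append(item)
                    break
            else:
                clusters.append([item])
        return clusters

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    # Stage 1: group articles (as token sets) into same-event clusters.
    articles = [
        {"acme", "buys", "widgetco", "merger"},
        {"acme", "acquires", "widgetco", "merger"},
        {"quake", "hits", "coast", "magnitude"},
        {"beta", "buys", "gammaco", "merger"},
    ]
    events = cluster(articles, jaccard, 0.4)

    # Stage 2: group event clusters into metaclusters (same event type),
    # comparing the pooled vocabulary of each cluster.
    def cluster_sim(c1, c2):
        return jaccard(set().union(*c1), set().union(*c2))

    metaclusters = cluster(events, cluster_sim, 0.2)

    # Stage 3: tokens shared across all clusters of a metacluster stand
    # in for the basic patterns that become the table's columns.
    for mc in metaclusters:
        shared = set.intersection(*(set().union(*c) for c in mc))
        print("table columns:", sorted(shared))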

Percy Liang, Michael Jordan, and Dan Klein.  Learning semantic correspondences with less supervision.  ACL-IJCNLP 2009.

Considers how we might learn the correspondence between a textual description of an event and a structured data base with records describing the same event.  Allows for the possibility that some information may appear only in the text and some only in the data base.  Builds a 3-level generative model (select records to report;  select fields in those records to report;  generate words for each field), then uses EM to align the model with the text, setting parameters for each level.  Evaluated in terms of the quality of the alignment.  A toy sketch of the generative story appears below.
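This sketch shows only the three-level generative structure (records, fields, words);  the record fields, word generators, and choice probabilities are invented, and the EM fitting step that the paper actually contributes is omitted.

    import random

    random.seed(0)

    # A miniature "data base": two records with typed fields.
    records = [
        {"min": "10", "max": "21"},
        {"min": "5",  "max": "12"},
    ]

    # Invented per-field word generators; in the model these are learned
    # word-distribution parameters, not hand-written templates.
    field_words = {
        "min": lambda v: ["low", "of", v],
        "max": lambda v: ["high", "of", v],
    }

    def generate(records):
        words = []
        # Level 1: choose which records to report (a random subset here).
        for rec in random.sample(records, k=random.randint(1, len(records))):
            # Level 2: choose which fields of that record to mention.
            for f in (f for f in ("min", "max") if random.random() < 0.8):
                # Level 3: generate the words for each chosen field.
                words.extend(field_words[f](rec[f]))
        return " ".join(words)

    print(generate(records))   # e.g. "low of 5 high of 12"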

Hoifung Poon and Pedro Domingos.  Unsupervised semantic parsing.  EMNLP 2009.

Seeks to do unsupervised semantic analysis starting from dependency trees.  Builds clusters of synonymous syntactic and semantic relations in order to account for paraphrases at both the syntactic and the semantic level.  Evaluates success by generating logical forms for a collection of GENIA abstracts and then answering questions about these texts by matching logical forms, as sketched below.
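A toy illustration of the question-answering step, assuming the relation clusters have already been induced (the cluster table, triples, and helper names below are invented;  the system learns such clusters from the dependency trees rather than reading them from a table):

    # Synonymous relations collapse to one cluster, so "induces" and
    # "enhances" both map to the same core form.
    relation_cluster = {"induces": "INDUCE", "enhances": "INDUCE",
                        "inhibits": "INHIBIT", "suppresses": "INHIBIT"}

    # Logical forms extracted from text: (relation, arg1, arg2) triples.
    facts = [("induces", "IL-2", "NF-kappaB"),
             ("suppresses", "dexamethasone", "IL-2")]

    def normalize(triple):
        rel, a1, a2 = triple
        return (relation_cluster.get(rel, rel), a1, a2)

    kb = {normalize(f) for f in facts}

    def answer(question_lf):
        # Return arg1 fillers whose normalized triple matches the question.
        rel, _, a2 = normalize(question_lf)
        return [x1 for (r, x1, x2) in kb if r == rel and x2 == a2]

    # "What enhances NF-kappaB?" matches the IL-2 fact via the cluster.
    print(answer(("enhances", "?x", "NF-kappaB")))   # ['IL-2']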

Coming next: anaphora resolution.