G22.2591 - Advanced Natural Language Processing - Spring 2004

Lecture 6

Name Recognition, cont'd

Unsupervised learning of names

The methods described so far require preparing a substantial amount of training data.  Can this requirement be reduced?  Today we will look at several papers that investigate this issue:

Tomek Strzalkowski; Jin Wang.  A Self-Learning Universal Concept Spotter. COLING 96.

presentation by Iman Sen.

Michael Collins; Yoram Singer.  Unsupervised Models for Named Entity Classification.  EMNLP 99.

presentation by Ben Wellington.

Silviu Cucerzan; David Yarowsky.  Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence.  EMNLP 99.

presentation by Kiyoshi Sudo.

If time permits, our local effort:  Roman Yangarber; Winston Lin; Ralph Grishman.  Unsupervised Learning of Generalized Names.  COLING 2002.

All of these efforts aim to produce taggers with moderate performance starting with minimal resources.  In principle, it should be possible to use similar techniques to improve the performance of good taggers (taggers with substantial training data) or to adapt good taggers to shifts in domain, but (as far as I know) these crucial issues have not yet been explored.