MUC-6, the sixth in a series of Message Understanding Conferences, was
held in November 1995. This conference, like the previous five MUCs,
was organized by Beth Sundheim of the Naval Research and Development
group (NRaD) of NCCOSC (previously NOSC). These conferences, which
have involved the evaluation of information extraction systems applied
to a common task, have been funded by ARPA to measure and foster
progress in information extraction.
Prior MUCs had focused on a single task of "information
extraction": analyzing free text, identifying events of a specified
type, and filling a database template with information about each
such event. Over the course of the five MUCs, the tasks and templates
had become increasingly complicated. A meeting in December 1993,
following MUC-5, and chaired by Ralph Grishman, defined a broader set
of objectives for the forthcoming MUCs: to push information extraction
systems towards greater portability to new domains, and to encourage
more basic work on natural language analysis by providing evaluations
of some basic language analysis technologies.
NYU and NRaD worked together to develop specifications for a
set of four evaluation tasks:
- named entity recognition
- coreference
- template elements
- scenario templates (traditional information extraction)
These tasks were refined in 1994 and early 1995 through a process of
corpus annotation and extensive e-mail discussion by the MUC-6 Planning/Annotation
Committee. This was followed by an anonymous "dry run"
evaluation in April 1995; scores from this evaluation are
available.
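The MUC evaluations score system output against hand-prepared answer keys, reporting recall and precision, which are commonly combined into a single F measure. A minimal sketch of the arithmetic (the counts below are invented for illustration, not actual MUC-6 scores):

```python
def muc_scores(correct, actual, possible):
    """MUC-style recall, precision, and balanced F measure.

    correct  -- number of fills the system got right
    actual   -- number of fills the system produced
    possible -- number of fills in the answer key
    """
    recall = correct / possible
    precision = correct / actual
    f = 2 * precision * recall / (precision + recall)
    return recall, precision, f

# Invented counts, not actual MUC-6 results:
r, p, f = muc_scores(correct=80, actual=90, possible=100)
print(f"recall={r:.2f} precision={p:.2f} F={f:.2f}")
# prints: recall=0.80 precision=0.89 F=0.84
```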
To give a flavor of these tasks, we have prepared
brief examples
of each; more detailed information is given below. [The specifications
given below are those distributed prior to the conference, in June
1995; small modifications were made just before and during the
evaluation.]
The formal MUC-6 evaluation was held in September 1995, and
the MUC-6 Conference was held in Columbia, Maryland in November 1995.
The proceedings of this conference, including descriptions of the
systems from all the participants, are being assembled and will
be distributed by Morgan Kaufmann.
Named Entity Recognition
The Named Entity task for MUC-6 involved the
recognition of entity names (for people and organizations), place
names, temporal expressions, and certain types of numerical
expressions. This task is intended to be of direct practical value
(in annotating text so that it can be searched for names, places,
dates, etc.) and to serve as an essential component of many language
processing tasks, such as information extraction.
Named Entity Task Definition, version 2.0 (31 May 95)
Tokenization Rules, version 1.2 (10 Feb 95)
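In the task definition, names and expressions are marked in the text with SGML tags (ENAMEX for names, TIMEX for temporal expressions, NUMEX for numerical expressions). The following invented sentence illustrates the style of markup:

```
<ENAMEX TYPE="PERSON">Bill Gates</ENAMEX>, chairman of
<ENAMEX TYPE="ORGANIZATION">Microsoft Corp.</ENAMEX>, will visit
<ENAMEX TYPE="LOCATION">New York</ENAMEX> on
<TIMEX TYPE="DATE">Oct. 3</TIMEX> to announce a
<NUMEX TYPE="MONEY">$10 million</NUMEX> grant.
```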
Coreference
The Coreference task for MUC-6 involved the identification of
coreference relations among noun phrases.
Coreference Task Definition, version 2.1 (21 Mar 95)
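In the task definition, coreference links are likewise marked with SGML tags, each anaphor pointing back to its antecedent by ID. An invented example, illustrating the style of annotation:

```
<COREF ID="1">Fulton Industries</COREF> said
<COREF ID="2" TYPE="IDENT" REF="1">the company</COREF> expects
<COREF ID="3" TYPE="IDENT" REF="2">its</COREF> earnings to rise.
```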
Information Extraction
The template-filling task for MUC-6 involved the extraction of
information about a specified class of events and the filling
of a template for each instance of such an event. In contrast to
MUC-5, the effort has been to design relatively simple templates
and to predefine the "template elements" (for people, organizations,
and artifacts) which would apply to a wide variety of different
event types.
Information Extraction Task Definition, version 1.5 (18 Apr 95)
A Revised Template Description for Time (v3)
Supplement to Time Treatment Used for MUC-5
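As an illustration of the template element format (the exact slot inventory is given in the task definition above; the object identifier and fills below are invented), an organization element pairs an object identifier with a small set of slots:

```
<ORGANIZATION-9404130062-1> :=
        ORG_NAME:   "Fulton Industries Inc."
        ORG_ALIAS:  "Fulton"
        ORG_TYPE:   COMPANY
```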
The task specifications are available as compressed tar files
for downloading by anonymous ftp from cs.nyu.edu, directory
pub/nlp/muc6/,
in both PostScript form (file ps.tar.Z)
and text form (file text.tar.Z).
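A typical anonymous ftp session to retrieve and unpack the PostScript version might look like the following (the transcript is illustrative; log in as "anonymous", giving your e-mail address as the password):

```
$ ftp cs.nyu.edu
Name: anonymous
Password: (your e-mail address)
ftp> cd pub/nlp/muc6
ftp> binary
ftp> get ps.tar.Z
ftp> quit
$ uncompress ps.tar.Z
$ tar xf ps.tar
```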
Evaluation for the information extraction task will be done
with respect to a particular "scenario" (type of event). To
reduce the time participants spend becoming expert in a
particular domain, and to encourage the development of tools
for porting systems to new domains, this scenario will be released
only one month before the evaluation.
Two example scenarios are available, one involving
orders for aircraft, the other involving labor negotiations:
Sample Scenario on Aircraft Orders, version 1.1 (22 Feb 95)
Example Templates for Aircraft Orders, version 1.2 (24 Mar 95)
Sample Scenario on Labor Negotiations, version 1.4 (20 Apr 95)
The labor negotiation scenario was used for the dry run in April 1995.
The articles used for the (Aircraft Order) example templates are taken from
the Wall Street Journal and are available in machine-readable form
on the ACL/DCI disk, which is distributed by the
Linguistic Data Consortium.
In addition, systems can be evaluated on their ability to fill the
template elements for people and organizations, independent of a
particular scenario.
(Information about other natural language processing research
at the NYU Proteus Project is available from the project's home page.)
(Last updated April 25, 1996.)