MUC-6, the sixth in a series of Message Understanding Conferences, was
held in November 1995. This conference, like the previous five MUCs,
was organized by Beth Sundheim of the Naval Research and Development
group (NRaD) of NCCOSC (previously NOSC). These conferences, which
have involved the evaluation of information extraction systems applied
to a common task, have been funded by ARPA to measure and foster
progress in information extraction.
Prior MUCs had focused on a single task of "information
extraction": analyzing free text, identifying events of a specified
type, and filling a database template with information about each
such event. Over the course of the five MUCs, the tasks and templates
had become increasingly complicated. A meeting in December 1993,
following MUC-5, and chaired by Ralph Grishman, defined a broader set
of objectives for the forthcoming MUCs: to push information extraction
systems towards greater portability to new domains, and to encourage
more basic work on natural language analysis by providing evaluations
of some basic language analysis technologies.
NYU and NRaD worked together to develop specifications for a
set of four evaluation tasks:
- named entity recognition
- coreference
- template elements
- scenario templates (traditional information extraction)
These tasks were refined in 1994 and early 1995 through a process of
corpus annotation and extensive e-mail discussion by the MUC-6 Planning/Annotation
Committee. This was followed by an anonymous "dry run"
evaluation in April 1995; scores from this evaluation are
available.
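The MUC evaluations score system output against hand-prepared answer keys, reporting recall and precision, which are commonly combined into a single F measure. A minimal sketch of the arithmetic (the counts below are invented for illustration, not actual MUC-6 scores):

```python
def muc_scores(correct, actual, possible):
    """MUC-style recall, precision, and balanced F measure.

    correct  -- number of fills the system got right
    actual   -- number of fills the system produced
    possible -- number of fills in the answer key
    """
    recall = correct / possible
    precision = correct / actual
    f = 2 * precision * recall / (precision + recall)
    return recall, precision, f

# Invented counts, not actual MUC-6 results:
r, p, f = muc_scores(correct=80, actual=90, possible=100)
print(f"recall={r:.2f} precision={p:.2f} F={f:.2f}")
# prints: recall=0.80 precision=0.89 F=0.84
```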
To give a flavor of these tasks, we have prepared
brief examples
of each; more detailed information is given below. [The specifications
given below are those distributed prior to the conference, in June
1995; small modifications were made just before and during the
evaluation.]
The formal MUC-6 evaluation was held in September 1995, and
the MUC-6 Conference was held in Columbia, Maryland in November 1995.
The proceedings of this conference, including descriptions of the
systems from all the participants, are being assembled and will
be distributed by Morgan Kaufmann.
Named Entity Recognition
The Named Entity task for MUC-6 involved the
recognition of entity names (for people and organizations), place
names, temporal expressions, and certain types of numerical
expressions. This task is intended to be of direct practical value
(in annotating text so that it can be searched for names, places,
dates, etc.) and to serve as an essential component of many language
processing tasks, such as information extraction.
Named Entity Task Definition, version 2.0 (31 May 95)
Tokenization Rules, version 1.2 (10 Feb 95)
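In the task definition, names and expressions are marked in the text with SGML tags (ENAMEX for names, TIMEX for temporal expressions, NUMEX for numerical expressions). The following invented sentence illustrates the style of markup:

```
<ENAMEX TYPE="PERSON">Bill Gates</ENAMEX>, chairman of
<ENAMEX TYPE="ORGANIZATION">Microsoft Corp.</ENAMEX>, will visit
<ENAMEX TYPE="LOCATION">New York</ENAMEX> on
<TIMEX TYPE="DATE">Oct. 3</TIMEX> to announce a
<NUMEX TYPE="MONEY">$10 million</NUMEX> grant.
```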
Coreference
The Coreference task for MUC-6 involved the identification of
coreference relations among noun phrases.
Coreference Task Definition, version 2.1 (21 Mar 95)
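In the task definition, coreference links are likewise marked with SGML tags, each anaphor pointing back to its antecedent by ID. An invented example, illustrating the style of annotation:

```
<COREF ID="1">Fulton Industries</COREF> said
<COREF ID="2" TYPE="IDENT" REF="1">the company</COREF> expects
<COREF ID="3" TYPE="IDENT" REF="2">its</COREF> earnings to rise.
```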
Information Extraction
The template-filling task for MUC-6 involved the extraction of
information about a specified class of events and the filling
of a template for each instance of such an event. In contrast to
MUC-5, the effort has been to design relatively simple templates
and to predefine the "template elements" (for people, organizations,
and artifacts) which would apply to a wide variety of different
event types.
Information Extraction Task Definition, version 1.5 (18 Apr 95)
A Revised Template Description for Time (v3)
Supplement to Time Treatment Used for MUC-5
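As an illustration of the template element format (the exact slot inventory is given in the task definition above; the object identifier and fills below are invented), an organization element pairs an object identifier with a small set of slots:

```
<ORGANIZATION-9404130062-1> :=
        ORG_NAME:   "Fulton Industries Inc."
        ORG_ALIAS:  "Fulton"
        ORG_TYPE:   COMPANY
```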
The task specifications are available as compressed tar files
for downloading by anonymous ftp from cs.nyu.edu, directory
pub/nlp/muc6/,
in both PostScript form (file ps.tar.Z)
and text form (file text.tar.Z).
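A typical anonymous ftp session to retrieve and unpack the PostScript version might look like the following (the transcript is illustrative; log in as "anonymous", giving your e-mail address as the password):

```
$ ftp cs.nyu.edu
Name: anonymous
Password: (your e-mail address)
ftp> cd pub/nlp/muc6
ftp> binary
ftp> get ps.tar.Z
ftp> quit
$ uncompress ps.tar.Z
$ tar xf ps.tar
```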
Evaluation for the information extraction task will be done
with respect to a particular "scenario" (type of event). To
reduce the time participants spend becoming expert in a
particular domain, and to encourage the development of tools
for porting systems to new domains, this scenario will be released
only one month before the evaluation.
Two example scenarios are available, one involving
orders for aircraft, the other involving labor negotiations:
Sample Scenario on Aircraft Orders, version 1.1 (22 Feb 95)
Example Templates for Aircraft Orders, version 1.2 (24 Mar 95)
Sample Scenario on Labor Negotiations, version 1.4 (20 Apr 95)
The labor negotiation scenario was used for the dry run in April 1995.
The articles used for the (Aircraft Order) example templates are taken from
the Wall Street Journal and are available in machine-readable form
on the ACL/DCI disk, which is distributed by the
Linguistic Data Consortium.
In addition, systems can be evaluated on their ability to fill the
template elements for people and organizations, independent of a
particular scenario.
(Information about other natural language processing research
at the NYU Proteus Project is available from the project's home page.)
(Last updated April 25, 1996.)