
Coreference Task Definition

4 HOW MUCH OF THE MARKABLE TO ANNOTATE

4.1 - Head of a Phrase
4.2 - Maximal Noun Phrase
4.3 - Exceptions: Articles

The task is defined in order to allow maximal latitude for systems in identifying markables, and to decouple the evaluation from that of accurately parsing noun phrases. Accordingly, the string generated by a system to identify a markable must include the head of the markable (as defined below) and may include any additional text up to a maximal noun phrase (as defined below).

In preparing the key, the text element to be enclosed in SGML tags is the maximal noun phrase; the head will be designated by the MIN attribute.
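The acceptance rule above can be sketched as a simple containment check. This is an illustration only, not the MUC-6 scoring program: the function name `markable_matches` is hypothetical, and comparing whitespace-separated tokens (rather than character offsets in the text) is an assumption made to keep the example short.

```python
def markable_matches(key_max: str, key_min: str, response: str) -> bool:
    """Return True if `response` contains the head (MIN) and lies
    within the maximal noun phrase from the key.

    Assumption: strings are compared as whitespace-separated tokens.
    """
    max_tokens = key_max.split()
    min_tokens = key_min.split()
    resp_tokens = response.split()

    def occurs_in(sub, seq):
        # True if `sub` appears as a contiguous run of tokens in `seq`.
        return any(
            seq[i:i + len(sub)] == sub
            for i in range(len(seq) - len(sub) + 1)
        )

    # The response must include the head and must not extend
    # beyond the maximal noun phrase.
    return occurs_in(min_tokens, resp_tokens) and occurs_in(resp_tokens, max_tokens)
```

Under these assumptions, with a key markable "the last contract" whose head is "contract", the response "last contract" would be accepted, while "the last" (no head) and "the last contract you" (extends past the maximal noun phrase) would not.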

[We expect that in the future it may be possible, when separate noun phrase bracketings are available, to automatically generate the maximal NP markup from a markup using only heads.]

4.1 Head of a Phrase

For most noun phrases, the head will be the main noun, without its left and right modifiers.

<COREF MIN="task" ...>the coreference task</COREF>

<COREF MIN="contract" ...>the last contract</COREF> you will ever get

<COREF MIN="quantity" ...>a large quantity of sugar</COREF>

<COREF MIN="tons" ...>about 200,000 tons of sugar</COREF>

If the head is a name, the entire name is marked. This includes suffixes such as "Sr.", "III", etc. on personal names and "Corp." on organization names; it does not include personal titles or any modifiers. We follow in this regard the rules for marking personal and organization names for the Named Entity task.

<COREF MIN="Frederick F. Fernwhistle Jr." ...>the Honorable Frederick F. Fernwhistle Jr.</COREF>

<COREF MIN="Ford Motor Co." ...>Ford Motor Co. of Dearborn, Michigan</COREF>

<COREF MIN="Georg Rath" ...>Herr Dr. Georg Rath</COREF>

In the case of location designators consisting of multiple names, each name is considered a separate unit (as in the Named Entity task) and the head is generally the first of these names, with the others treated as modifiers of the first name:

<COREF MIN="Newark" ...>Newark, New Jersey</COREF>

Dates, currency amounts, and percentages are also treated as atomic units, as in the Named Entity task:

<COREF MIN="December 7, 1941" ...>December 7, 1941, a day which will live in infamy,</COREF>

<COREF MIN="$1.2 million" ...>$1.2 million in crisp bills</COREF>

<COREF MIN="20%" ...>20% of the shares</COREF>

In the case of "headless" constructions, the "head" -- for coreference purposes -- shall be the last token of the noun phrase preceding any prepositional phrases, relative clauses, and other "right modifiers":

<COREF MIN="seven" ...>seven of the best</COREF>

<COREF MIN="five" ...>the five who were left standing</COREF>

<COREF MIN="youngest" ...>the six youngest</COREF>

If the maximal noun phrase is the same as the head, the MIN need not be marked.
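As a rough illustration of how a reader of the key might recover the head when MIN is omitted, the sketch below extracts (maximal noun phrase, head) pairs from COREF markup and falls back to the enclosed text when no MIN attribute is present. The function name `key_markables` and the regular expressions are assumptions for this example; it handles only non-nested COREF elements and is not a full SGML parser.

```python
import re

# Assumption: COREF elements are not nested and attribute values
# are double-quoted, as in the examples in this section.
COREF_RE = re.compile(r'<COREF\b([^>]*)>(.*?)</COREF>', re.DOTALL)
MIN_RE = re.compile(r'\bMIN="([^"]*)"')


def key_markables(sgml: str):
    """Yield (maximal_np, head) pairs for each COREF element.

    When MIN is absent, the head is taken to be the whole
    enclosed text, per the rule stated above.
    """
    for attrs, body in COREF_RE.findall(sgml):
        max_np = body.strip()
        min_match = MIN_RE.search(attrs)
        yield max_np, (min_match.group(1) if min_match else max_np)
```

For instance, an element with MIN="task" around "the coreference task" would yield the head "task", while an element with no MIN around "sugar" would yield "sugar" for both the maximal noun phrase and the head.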

4.2 Maximal Noun Phrase

The maximal noun phrase includes all text which may be considered a modifier of the noun phrase. This includes (among other modifiers) appositional phrases, non-restrictive relative clauses, and prepositional phrases which may be viewed as modifiers of the noun phrase or of a containing clause:

*Mr. Holland*

*the senior of the executives who will assume Holland's duties*

*the rumor that the war had ended*

*Fred Frosty, the ice cream king of Tyson's Corner,*

*the Penn Central Co., which used to run a railroad,*

XYZ Inc. formed *a joint venture with Sony*

Note that in the fourth and fifth cases the final comma may be viewed as part of the NP, and so is included in the maximal NP; in the last case, "with Sony" could equally well be taken to modify "venture" or "formed", and so is included as part of the maximal NP around "venture". Note also that in the "Fred Frosty" example, there is a coreference between the entire noun phrase and the appositional phrase, "the ice cream king of Tyson's Corner"; see section 5.3 for a discussion of this construct.

In the case of a pair of conjoined noun phrases with shared complements or modifiers, the maximal noun phrase for each conjunct will NOT include the conjunction or the other conjunct. The maximal NP for the first conjunct will include all of the NP up to the conjunction; the maximal NP for the second conjunct will include all of the NP following the conjunction:

<COREF ID="1" MIN="Fribble">Ms. Fribble</COREF> was <COREF ID="2" REF="1" TYPE="IDENT" STAT="OPT">president</COREF> and <COREF ID="3" REF="1" TYPE="IDENT" STAT="OPT" MIN="CEO">CEO of Amalgamated Text Processing Inc.</COREF>

4.3 Exceptions: Articles

If the only difference between the head and the maximal noun phrase is the presence of an article -- the word "the", "a", or "an" at the beginning of the noun phrase -- the MIN need not be explicitly marked. (The scoring program will automatically strip leading articles before comparing strings.)
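The article-stripping step might look something like the following. This is a sketch, not the actual scoring program: the function name `strip_leading_article` is hypothetical, and both the case-insensitive match and the choice to strip only when other text follows the article are assumptions.

```python
ARTICLES = ("the", "a", "an")


def strip_leading_article(text: str) -> str:
    """Drop one leading article ("the", "a", "an") before comparison.

    Assumptions: matching is case-insensitive, and the article is
    removed only when at least one more token follows it.
    """
    tokens = text.split()
    if len(tokens) > 1 and tokens[0].lower() in ARTICLES:
        tokens = tokens[1:]
    return " ".join(tokens)
```

With this normalization applied to both strings, a key markable "the task" with no explicit MIN and a system response "task" would compare as equal.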


Coreference Task Definition - 31 MAY 95