
Named Entity Task Definition

4 GUIDELINES FOR MARKUP OF EXCEPTIONAL CONSTRUCTIONS

4.1 - Expressions Involving Elision
4.2 - Effects of Tokenization Conventions
4.3 - Nested Expressions

4.1 Expressions Involving Elision

Multi-name expressions containing conjoined modifiers (with elision of the head of one conjunct) should be marked up as separate expressions.

"North and South America"

<ENAMEX TYPE="LOCATION">North</ENAMEX> and <ENAMEX TYPE="LOCATION">South America</ENAMEX>

A similar case involving elision with number expressions:

"10- and 20-dollar bills"

<NUMEX TYPE="MONEY">10</NUMEX>- and <NUMEX TYPE="MONEY">20-dollar</NUMEX> bills

In contrast, there is no elision in the case of single-name expressions containing conjoined modifiers; such expressions should be marked up as a single expression.

"U.S. Fish and Wildlife Service"

<ENAMEX TYPE="ORGANIZATION">U.S. Fish and Wildlife Service</ENAMEX>
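The contrast between elided and non-elided constructions can be illustrated with a minimal sketch. The span format and the helper names `span_of` and `apply_markup` are assumptions made for illustration only; they are not part of the task definition. The sketch only shows how many tagged spans each example receives, and does not decide automatically whether elision is present.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, element, type), with end exclusive.
Span = Tuple[int, int, str, str]


def span_of(text: str, phrase: str, element: str, etype: str) -> Span:
    """Describe the first occurrence of phrase in text as a span."""
    start = text.index(phrase)
    return (start, start + len(phrase), element, etype)


def apply_markup(text: str, spans: List[Span]) -> str:
    """Wrap each span in an SGML-style tag; spans must not overlap."""
    out = []
    pos = 0
    for start, end, element, etype in sorted(spans):
        if start < pos:
            raise ValueError("overlapping spans")
        out.append(text[pos:start])
        out.append(f'<{element} TYPE="{etype}">{text[start:end]}</{element}>')
        pos = end
    out.append(text[pos:])
    return "".join(out)


# Elided head ("America"): two names, so two separate expressions.
elided = "North and South America"
elided_tagged = apply_markup(elided, [
    span_of(elided, "North", "ENAMEX", "LOCATION"),
    span_of(elided, "South America", "ENAMEX", "LOCATION"),
])

# Same pattern with a number expression.
bills = "10- and 20-dollar bills"
bills_tagged = apply_markup(bills, [
    span_of(bills, "10", "NUMEX", "MONEY"),
    span_of(bills, "20-dollar", "NUMEX", "MONEY"),
])

# No elision: one name, so one expression covering the whole string.
single = "U.S. Fish and Wildlife Service"
single_tagged = apply_markup(single, [
    span_of(single, single, "ENAMEX", "ORGANIZATION"),
])

print(elided_tagged)
print(bills_tagged)
print(single_tagged)
```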

The subparts of range expressions should be marked up as separate expressions.

"175 to 180 million Canadian dollars"

<NUMEX TYPE="MONEY">175</NUMEX> to <NUMEX TYPE="MONEY">180 million Canadian dollars</NUMEX>

"the 1986-87 academic year"

the <TIMEX TYPE="DATE">1986</TIMEX>-<TIMEX TYPE="DATE" ALT="87">87 academic year</TIMEX>
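As a rough sketch of the range rule for money expressions, the following uses a regular expression to tag the two subparts of a "low to high" range separately. The pattern `MONEY_RANGE` and the function `tag_money_range` are hypothetical, cover only a narrow set of phrasings, and are not the method prescribed by the task definition.

```python
import re

# Hypothetical pattern: "<number> to <number> [million|billion] [Canadian|U.S.] dollars".
# Only the second subpart carries the scale and currency words.
MONEY_RANGE = re.compile(
    r"(?P<low>\d+(?:\.\d+)?) to "
    r"(?P<high>\d+(?:\.\d+)? (?:million |billion )?(?:Canadian |U\.S\. )?dollars)"
)


def tag_money_range(text: str) -> str:
    """Tag each end of a money range as its own NUMEX expression."""
    def repl(match: "re.Match[str]") -> str:
        low = match.group("low")
        high = match.group("high")
        return (
            f'<NUMEX TYPE="MONEY">{low}</NUMEX> to '
            f'<NUMEX TYPE="MONEY">{high}</NUMEX>'
        )
    return MONEY_RANGE.sub(repl, text)


range_tagged = tag_money_range("175 to 180 million Canadian dollars")
print(range_tagged)
```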

4.2 Effects of Tokenization Conventions

Systems must follow the MUC-6 tokenization conventions, which are specified in a separate document titled "Tokenization Rules."

These conventions affect the boundaries of the strings to be tagged. For example, they call for treating possessive forms, e.g., "California's," as multiple tokens, unless the name itself is inherently possessive, such as "McDonald's" (the burger company). See "Tokenization Rules" for further information and examples.
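The possessive convention can be sketched as follows. This is a simplified illustration and not the contents of "Tokenization Rules," which are not reproduced here: it splits on whitespace only, and the exception set `INHERENTLY_POSSESSIVE` and the function `tokenize` are hypothetical.

```python
from typing import List

# Hypothetical exception list: names that are inherently possessive.
INHERENTLY_POSSESSIVE = {"McDonald's"}


def tokenize(text: str) -> List[str]:
    """Split on whitespace, separating a trailing 's unless the word is an exception."""
    tokens: List[str] = []
    for word in text.split():
        is_possessive = word.endswith("'s") and len(word) > 2
        if is_possessive and word not in INHERENTLY_POSSESSIVE:
            # The possessive marker becomes its own token.
            tokens.extend([word[:-2], "'s"])
        else:
            tokens.append(word)
    return tokens


print(tokenize("California's budget"))
print(tokenize("McDonald's menu"))
```

With the possessive marker split into its own token, a tag boundary can fall between the name and the marker, whereas an inherently possessive name remains a single taggable token.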

4.3 Nested Expressions

In some cases, LOCATION (ENAMEX) expressions are to be tagged when they occur within TIMEX or NUMEX expressions. See Appendices B and C for a description of the cases where nested markup may occur. Entity names that appear within ENAMEX tags are *not* to be tagged.
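A minimal sketch of a check for the nesting restriction is given below. It only detects an ENAMEX element opened inside another ENAMEX element; it does not verify the specific cases described in Appendices B and C. The function name `has_enamex_inside_enamex` and both sample strings are hypothetical illustrations, not examples taken from the task definition.

```python
import re

# Matches opening and closing ENAMEX, TIMEX, and NUMEX tags.
TAG = re.compile(r"<(/?)(ENAMEX|TIMEX|NUMEX)\b[^>]*>")


def has_enamex_inside_enamex(marked_up: str) -> bool:
    """Return True if any ENAMEX element is opened inside another ENAMEX element."""
    stack = []
    for match in TAG.finditer(marked_up):
        closing, name = match.group(1), match.group(2)
        if closing:
            if not stack or stack[-1] != name:
                raise ValueError(f"unbalanced closing tag: {match.group(0)}")
            stack.pop()
        else:
            if name == "ENAMEX" and "ENAMEX" in stack:
                return True
            stack.append(name)
    if stack:
        raise ValueError("unclosed tag")
    return False


# Hypothetical sample: a LOCATION nested within a TIMEX expression.
nested_in_timex = (
    '<TIMEX TYPE="TIME">1:30 p.m. '
    '<ENAMEX TYPE="LOCATION">Chicago</ENAMEX> time</TIMEX>'
)

# Hypothetical sample: an entity name tagged inside an ENAMEX expression.
nested_in_enamex = (
    '<ENAMEX TYPE="ORGANIZATION">'
    '<ENAMEX TYPE="LOCATION">Boston</ENAMEX> Chicken Corp.</ENAMEX>'
)

print(has_enamex_inside_enamex(nested_in_timex))
print(has_enamex_inside_enamex(nested_in_enamex))
```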


Named Entity Task Definition - 02 JUN 95
