Named Entity Task Definition
"North and South America"
<ENAMEX TYPE="LOCATION">North</ENAMEX> and <ENAMEX TYPE="LOCATION">South America</ENAMEX>
A similar case involving elision with number expressions:
"10- and 20-dollar bills"
<NUMEX TYPE="MONEY">10</NUMEX>- and <NUMEX TYPE="MONEY">20-dollar</NUMEX> bills
In contrast, there is no elision in the case of single-name expressions containing conjoined modifiers; such expressions should be marked up as a single expression.
"U.S. Fish and Wildlife Service"
<ENAMEX TYPE="ORGANIZATION">U.S. Fish and Wildlife Service</ENAMEX>
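The contrast between elided conjunctions and single names can be sketched in Python as a difference in the number of character spans that get wrapped. The helper mark_up and its (start, end, element, type) span format are hypothetical illustrations, not part of the task definition or of any MUC-6 tool.

```python
def mark_up(text, spans):
    """Wrap each (start, end, element, type) character span of text in an SGML tag.

    Hypothetical helper for illustration; spans must not overlap.
    """
    out = []
    pos = 0
    for start, end, element, type_ in sorted(spans):
        out.append(text[pos:start])  # untagged text before the span
        out.append(f'<{element} TYPE="{type_}">{text[start:end]}</{element}>')
        pos = end
    out.append(text[pos:])  # untagged text after the last span
    return "".join(out)


# Elided conjunction: two expressions, so two spans.
elided = mark_up(
    "North and South America",
    [(0, 5, "ENAMEX", "LOCATION"), (10, 23, "ENAMEX", "LOCATION")],
)

# Single name with conjoined modifiers: one span covering the whole name.
name = "U.S. Fish and Wildlife Service"
single = mark_up(name, [(0, len(name), "ENAMEX", "ORGANIZATION")])

print(elided)
print(single)
```

The same helper reproduces the number-expression case when given the spans for "10" and "20-dollar" separately.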
The subparts of range expressions should be marked up as separate expressions.
"175 to 180 million Canadian dollars"
<NUMEX TYPE="MONEY">175</NUMEX> to <NUMEX TYPE="MONEY">180 million Canadian dollars</NUMEX>
"the 1986-87 academic year"
the <TIMEX TYPE="DATE">1986</TIMEX>-<TIMEX TYPE="DATE" ALT="87">87 academic year</TIMEX>
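As a minimal sketch of the range rule for the monetary example above, the following Python function tags the two ends of a "low to high" range separately. The function name tag_money_range and the regular expression are assumptions made for illustration; the sketch handles only ranges of the form "number to number-plus-unit" and assumes the caller already knows the expression is monetary.

```python
import re

# Hypothetical pattern: "<number> to <number followed by a unit phrase>".
RANGE = re.compile(r"^(?P<low>\d[\d.,]*)\s+to\s+(?P<high>\d.*)$")


def tag_money_range(text):
    """Tag each end of a monetary range as its own NUMEX expression.

    Returns the text unchanged when it does not look like a range.
    """
    m = RANGE.match(text)
    if m is None:
        return text
    low = f'<NUMEX TYPE="MONEY">{m.group("low")}</NUMEX>'
    high = f'<NUMEX TYPE="MONEY">{m.group("high")}</NUMEX>'
    return f"{low} to {high}"


print(tag_money_range("175 to 180 million Canadian dollars"))
```

Note that the unit phrase stays with the second subpart only, as in the example above; the first subpart is tagged as a bare number.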
4.2 Effects of Tokenization Conventions
Systems must incorporate the tokenization conventions set out in a separate document titled "Tokenization Rules."
The tokenization conventions for MUC-6 affect the boundaries of the strings to be tagged. For example, the conventions call for treating possessive forms, e.g., "California's," as multiple tokens, unless the name is inherently possessive, such as "McDonald's" [the burger company]. See "Tokenization Rules" for further information and examples.
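The possessive rule described above might be sketched as follows. This is an illustration under stated assumptions, not the content of "Tokenization Rules": it assumes that a possessive is split into the stem and a separate "'s" token, and that inherently possessive names are kept whole by looking them up in a list. The name INHERENT_POSSESSIVES and the function tokenize are hypothetical.

```python
# Hypothetical exception list: names that are inherently possessive.
INHERENT_POSSESSIVES = {"McDonald's"}


def tokenize(text):
    """Split on whitespace, then split a trailing 's off ordinary possessives.

    Assumption: the possessive marker becomes its own token, so a name
    tag can end before it.
    """
    tokens = []
    for word in text.split():
        is_possessive = word.endswith("'s") and len(word) > 2
        if is_possessive and word not in INHERENT_POSSESSIVES:
            tokens.extend([word[:-2], "'s"])
        else:
            tokens.append(word)
    return tokens


print(tokenize("California's governor"))
print(tokenize("McDonald's burgers"))
```

Under this assumption, the taggable string in the first case would be "California" alone, while in the second case the whole token "McDonald's" would fall inside the tag.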
4.3 Nested Expressions
There are cases where LOCATION (ENAMEX) expressions are to be tagged when they occur within TIMEX and NUMEX expressions. See Appendices B and C for a description of cases where markup of nested expressions may occur. Entity names that appear within ENAMEX tags are *not* to be tagged.
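One way to check marked-up text against the nesting rule is a small stack-based scan, sketched below in Python. The function nested_enamex_violations is hypothetical, and both test strings are constructed for illustration rather than taken from the appendices; "Placeland" is a made-up name. The sketch flags only ENAMEX inside ENAMEX and does not check which nestings Appendices B and C actually permit.

```python
import re

# Matches start and end tags of the three markup elements.
TAG = re.compile(r"<(/?)(ENAMEX|TIMEX|NUMEX)\b[^>]*>")


def nested_enamex_violations(marked_up):
    """Return offsets of ENAMEX start tags that open inside another ENAMEX."""
    stack = []       # currently open elements, innermost last
    violations = []
    for m in TAG.finditer(marked_up):
        closing = m.group(1) == "/"
        element = m.group(2)
        if closing:
            if stack and stack[-1] == element:
                stack.pop()
        else:
            if element == "ENAMEX" and "ENAMEX" in stack:
                violations.append(m.start())
            stack.append(element)
    return violations


# Constructed counterexample: a location tagged inside an organization name.
bad = ('<ENAMEX TYPE="ORGANIZATION"><ENAMEX TYPE="LOCATION">U.S.</ENAMEX>'
       ' Fish and Wildlife Service</ENAMEX>')

# Constructed example: a location inside a NUMEX, which this check does not flag.
ok = ('<NUMEX TYPE="MONEY">5 <ENAMEX TYPE="LOCATION">Placeland</ENAMEX>'
      ' units</NUMEX>')

print(nested_enamex_violations(bad))
print(nested_enamex_violations(ok))
```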