Tokenization Rules
...<ENAMEX TYPE="ORGANIZATION">Jaguar</ENAMEX> company in <ENAMEX TYPE="LOCATION">Britain</ENAMEX>.
1.2 Examples with hyphen or dash
"Chicago-based"    <ENAMEX TYPE="LOCATION">Chicago</ENAMEX>-based
"U.S.-based"    <ENAMEX TYPE="LOCATION">U.S.</ENAMEX>-based
"U.S.-Japan trade negotiations"    <ENAMEX TYPE="LOCATION">U.S.</ENAMEX>-<ENAMEX TYPE="LOCATION">Japan</ENAMEX> trade negotiations
"an Eaton-Sumitomo joint venture"    an <ENAMEX TYPE="ORGANIZATION">Eaton</ENAMEX>-<ENAMEX TYPE="ORGANIZATION">Sumitomo</ENAMEX> joint venture
"PHILADELPHIA--A new recycling center has been built."    <ENAMEX TYPE="LOCATION">PHILADELPHIA</ENAMEX>--A new recycling center has been built.
1.3 Examples with apostrophe
"California's"    <ENAMEX TYPE="LOCATION">California</ENAMEX>'s
"Guiness' Schenley Industries"    <ENAMEX TYPE="ORGANIZATION">Guiness</ENAMEX>' <ENAMEX TYPE="ORGANIZATION">Schenley Industries</ENAMEX>
1.4 Examples with other punctuation
"(IBM)"    (<ENAMEX TYPE="ORGANIZATION">IBM</ENAMEX>)
""IBM stock fell today," he said" [note the double quote preceding IBM]    "<ENAMEX TYPE="ORGANIZATION">IBM</ENAMEX> stock fell today," he said
1.5 Examples with special characters
"US$10"    <ENAMEX TYPE="LOCATION">US</ENAMEX><NUMEX TYPE="MONEY">$10</NUMEX>
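
The examples in sections 1.2 through 1.5 all place tags at boundaries between a word and an adjacent hyphen, apostrophe, bracket, quotation mark, or currency sign. A minimal Python sketch of a tokenizer that exposes those boundaries is given here. It is only an illustration consistent with the examples, not the official MUC-6 tokenizer: the pattern TOKEN_RE and the helpers tokenize, token_offsets, and tag are hypothetical names, and treating "U.S." as a single token is an assumption drawn from the tagged examples.

```python
import re

# Hypothetical token pattern (an assumption, not the MUC-6 definition):
#   1. an abbreviation of single letters each followed by a period ("U.S."),
#   2. a run of ASCII letters and digits,
#   3. any other single non-space character (hyphen, apostrophe, "$", ...).
TOKEN_RE = re.compile(r"(?:[A-Za-z]\.){2,}|[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def tokenize(text):
    """Return the list of token strings found in text."""
    return TOKEN_RE.findall(text)


def token_offsets(text):
    """Return (start, end) character offsets for every token in text."""
    return [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def tag(text, spans):
    """Insert SGML-style tags around non-overlapping character spans.

    spans is a list of (start, end, element, type) tuples. A span is
    rejected unless it starts and ends on a token boundary.
    """
    offsets = token_offsets(text)
    starts = {s for s, _ in offsets}
    ends = {e for _, e in offsets}
    out = []
    pos = 0
    for start, end, element, type_ in sorted(spans):
        if start not in starts or end not in ends:
            raise ValueError(f"span ({start}, {end}) is not on token boundaries")
        out.append(text[pos:start])
        out.append(f'<{element} TYPE="{type_}">{text[start:end]}</{element}>')
        pos = end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(tokenize("U.S.-based"))   # ['U.S.', '-', 'based']
    print(tag("Chicago-based", [(0, 7, "ENAMEX", "LOCATION")]))
```

Under these assumptions, "US$10" splits into "US", "$", and "10", so a LOCATION tag can close after "US" and a MONEY tag can open at "$". A span that cuts through a token, such as the first three characters of "Chicago-based", is rejected with a ValueError.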
Tokenization Rules - 14 JUN 95