Tokenization Rules

3 Hyphen at end of line

3.1 When a hyphen is used at the end of a line to separate a single word into two parts, the word is treated as a single token.

"Phila-
delphia"

<ENAMEX TYPE="LOCATION">Phila-
delphia</ENAMEX>

3.2 If, however, the word is naturally hyphenated and the hyphenated word just happens to be broken at the hyphen at the end of a line, the parts of the word are treated as separate tokens.

"Chicago-
based"

<ENAMEX TYPE="LOCATION">Chicago</ENAMEX>-
based
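
Rules 3.1 and 3.2 can be illustrated with a minimal sketch. It is not part of the MUC-6 definition: the function name, the SINGLE_WORDS lexicon used to tell a broken single word from a naturally hyphenated one, and the choice to return the hyphen as its own token under rule 3.2 are all assumptions made for illustration.

```python
def tokens_across_line_break(left, right, single_words):
    """Tokenize a word broken by a hyphen at the end of a line.

    left  -- text before the end-of-line hyphen (e.g. "Phila")
    right -- text at the start of the next line (e.g. "delphia")
    single_words -- hypothetical lexicon of lowercase words that are
                    written without a hyphen
    """
    if (left + right).lower() in single_words:
        # Rule 3.1: the hyphen only marks the line break, so the whole
        # span, hyphen and newline included, is one token.
        return [left + "-\n" + right]
    # Rule 3.2: the word is naturally hyphenated, so its parts are
    # separate tokens. Emitting "-" as a token is an assumption here.
    return [left, "-", right]


# Hypothetical lexicon, for illustration only.
SINGLE_WORDS = {"philadelphia"}

print(tokens_across_line_break("Phila", "delphia", SINGLE_WORDS))
print(tokens_across_line_break("Chicago", "based", SINGLE_WORDS))
```

Under these assumptions the first call yields one token that spans the line break, matching the single ENAMEX element in 3.1, and the second yields separate tokens, so that only "Chicago" is tagged as in 3.2.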


Tokenization Rules - 14 JUN 95