Onomasticon (Name Dictionary)


action name:           tagNamesFromOnoma
resources required:    onomasticon (name dictionary)
properties:            Onoma.fileName
annotations required:  token
annotations added:     ENAMEX, isName

Jet provides two means of tagging names: a statistical name model, implemented as an HMM or MEMM, and a name dictionary, formally called an onomasticon. Each line in the onomasticon defines a single name and should consist of one or more tokens separated by spaces, a tab character, and a name type; a second tab and an entity subtype are optional. For example, the line

New York (tab) GPE

defines "New York" as a geo-political entity name; the line

New York (tab) GPE (tab) Population-Center

further specifies it as being of subtype Population-Center. Matches must be exact, including case. In case of ambiguity, the longest match is preferred. Nested matches are not recognized; after a name is matched, the matcher advances to the first token following the matched name.
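
The line format and matching rules above can be illustrated with a minimal Python sketch. It is not Jet's implementation (Jet is written in Java); the function names parse_onomasticon and tag_names, the ORGANIZATION type label, and the choice to skip blank lines are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Maps a tuple of tokens to (name type, optional subtype).
Onoma = Dict[Tuple[str, ...], Tuple[str, Optional[str]]]


def parse_onomasticon(lines: List[str]) -> Onoma:
    """Parse lines of the form 'tok tok<TAB>TYPE[<TAB>SUBTYPE]'."""
    entries: Onoma = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"missing name type: {line!r}")
        tokens = tuple(fields[0].split())
        name_type = fields[1].strip()
        subtype = fields[2].strip() if len(fields) > 2 else None
        entries[tokens] = (name_type, subtype)
    return entries


def tag_names(tokens: List[str], onoma: Onoma):
    """Return (start, end, type, subtype) spans; end is exclusive.

    Matching is exact and case-sensitive. At each position the longest
    entry wins, and scanning resumes after the matched name, so nested
    names are never reported.
    """
    max_len = max((len(key) for key in onoma), default=0)
    spans = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in onoma:
                name_type, subtype = onoma[key]
                spans.append((i, i + n, name_type, subtype))
                i += n  # advance past the matched name
                matched = True
                break
        if not matched:
            i += 1
    return spans


onoma = parse_onomasticon([
    "New York\tGPE\tPopulation-Center",
    "New York Times\tORGANIZATION",  # hypothetical entry and type label
])
sentence = "She reads the New York Times in New York".split()
print(tag_names(sentence, onoma))
```

In this example "New York Times" is preferred over the shorter "New York" at token 3, and the nested "New York" inside it is not reported separately; the second "New York" at the end of the sentence is tagged on its own.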

It is possible to use both a name dictionary and a statistical name tagger. In this case, the onoma tagger is applied first and takes precedence; the statistical tagger is applied second and tags only tokens that have not been tagged by the onoma tagger:

processSentence = ..., tagNamesFromOnoma, tagNames, ...

This combination currently works only with the MEMM tagger, not with the HMM tagger.
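
The precedence rule can be sketched in Python as a merge of two span lists. This is a simplification under stated assumptions: the source does not describe how Jet's MEMM tagger takes the existing dictionary tags into account, and the function name merge_tags and the PERSON and ORGANIZATION labels are hypothetical. The sketch only shows the outcome: dictionary spans are always kept, and a statistical span survives only if none of its tokens is already tagged.

```python
from typing import List, Tuple

# (start token, end token exclusive, name type)
Span = Tuple[int, int, str]


def merge_tags(onoma_spans: List[Span], stat_spans: List[Span]) -> List[Span]:
    """Keep every dictionary span; keep a statistical span only if it
    overlaps no dictionary span."""
    taken = set()
    for start, end, _ in onoma_spans:
        taken.update(range(start, end))
    merged = list(onoma_spans)
    for span in stat_spans:
        start, end, _ = span
        if not any(i in taken for i in range(start, end)):
            merged.append(span)
    return sorted(merged)  # order spans by position in the sentence


onoma_spans = [(3, 5, "GPE")]
stat_spans = [(0, 2, "PERSON"), (4, 6, "ORGANIZATION")]  # hypothetical output
print(merge_tags(onoma_spans, stat_spans))
```

Here the statistical span covering tokens 4 to 5 is dropped because token 4 already belongs to the dictionary match, while the non-overlapping span at tokens 0 to 1 is kept.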