action name | tagNamesFromOnoma
resources required | onomasticon (name dictionary)
properties | Onoma.fileName
annotations required | token
annotations added | ENAMEX isName
Jet provides two means of tagging names: a statistical name model, implemented as an
HMM or MEMM, and a name dictionary, formally called an onomasticon. Each line in the
onomasticon defines a single name and should consist of one or more tokens separated by spaces,
a tab character, and a name type; a second tab and an entity subtype are optional.
For example, the line
New York (tab) GPE
defines "New York" as a geo-political entity name; the line
New York (tab) GPE (tab) Population-Center
further specifies it as being of subtype Population-Center.
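The line format described above can be illustrated with a small parser. This is a minimal sketch in Python, not Jet's own loader: the names OnomaEntry and parse_onoma_line are hypothetical, and the handling of malformed lines (raising an error) is an assumption.

```python
from typing import NamedTuple, Optional, Tuple


class OnomaEntry(NamedTuple):
    """One onomasticon line (hypothetical structure, for illustration only)."""
    tokens: Tuple[str, ...]   # the name, split into tokens on spaces
    name_type: str            # e.g. "GPE"
    subtype: Optional[str]    # e.g. "Population-Center", or None if absent


def parse_onoma_line(line: str) -> OnomaEntry:
    """Parse 'tokens<TAB>type' or 'tokens<TAB>type<TAB>subtype'."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (2, 3):
        # Assumption: lines with the wrong number of fields are rejected.
        raise ValueError(f"expected 2 or 3 tab-separated fields: {line!r}")
    tokens = tuple(fields[0].split(" "))
    if any(tok == "" for tok in tokens) or fields[1] == "":
        raise ValueError(f"empty token or name type: {line!r}")
    subtype = fields[2] if len(fields) == 3 else None
    return OnomaEntry(tokens, fields[1], subtype)


entry = parse_onoma_line("New York\tGPE\tPopulation-Center")
print(entry.tokens, entry.name_type, entry.subtype)
```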
Matches must be exact, including case.
In case of ambiguity, the longest match is preferred. Nested matches are not recognized;
after a name is matched, the matcher advances to the first token following the matched name.
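The matching rules above (exact, case-sensitive, longest match first, no nested matches) can be sketched as follows. This is an illustrative Python version, not Jet's implementation: the function match_onoma, the dictionary layout, and the ORGANIZATION entry are assumptions made for the example.

```python
def match_onoma(tokens, onoma):
    """Return (start, end, name_type) spans found in a token list.

    `onoma` maps a tuple of tokens to a name type. At each position the
    longest dictionary entry is tried first; after a match, scanning
    resumes at the first token following the matched name, so nested or
    overlapping names are never reported.
    """
    max_len = max((len(name) for name in onoma), default=0)
    spans = []
    i = 0
    while i < len(tokens):
        # Try candidate lengths from longest to shortest.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in onoma:          # exact, case-sensitive lookup
                spans.append((i, i + n, onoma[candidate]))
                i += n                      # skip past the matched name
                break
        else:
            i += 1                          # no name starts at this token
    return spans


onoma = {
    ("New", "York"): "GPE",
    ("New", "York", "Times"): "ORGANIZATION",  # hypothetical entry
    ("York",): "GPE",
}
tokens = "She reads the New York Times in York".split()
print(match_onoma(tokens, onoma))
```

In this example "New York Times" is tagged as a whole, and the shorter entries "New York" and "York" inside it are not reported; the final "York" is tagged on its own.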
It is possible to use both a name dictionary and a statistical name tagger. In this
case the onoma tagger is applied first and takes precedence; the statistical tagger
is applied second and tags tokens which have not been tagged by the onoma tagger:
processSentence = ..., tagNamesFromOnoma, tagNames, ...
This combination currently works only with the MEMM tagger, not with the HMM tagger.
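The precedence rule can be modeled as a merge of two span lists. The sketch below is only one simple way to express "onoma first, statistical tagger fills the rest"; it does not reflect how Jet's MEMM tagger actually consumes the existing tags. The function merge_tags is hypothetical, and dropping a statistical span that overlaps an onoma span in any token is an assumption.

```python
def merge_tags(n_tokens, onoma_spans, statistical_spans):
    """Combine (start, end, name_type) spans, giving onoma spans precedence."""
    taken = [False] * n_tokens
    merged = []
    for start, end, name_type in onoma_spans:
        merged.append((start, end, name_type))
        for k in range(start, end):
            taken[k] = True
    for start, end, name_type in statistical_spans:
        # Assumption: keep a statistical span only if none of its tokens
        # has already been tagged.
        if not any(taken[start:end]):
            merged.append((start, end, name_type))
            for k in range(start, end):
                taken[k] = True
    return sorted(merged)


print(merge_tags(8, [(3, 5, "GPE")], [(0, 1, "PERSON"), (4, 6, "ORGANIZATION")]))
```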