tagPOS | Assigns annotations of type constit to each token, with feature cat corresponding to the Penn part-of-speech tag. |
tagJet | First assigns annotations of type tagger to each token, with feature cat corresponding to the Penn part-of-speech tag. Then (using the tagger annotations) assigns annotations of type constit to each token, with features cat and number corresponding to the Jet part-of-speech encoding. |
pruneTags | Assumes that the tokens have already been assigned constit annotations by dictionary look-up, using the Jet part-of-speech encoding; a word with several parts of speech will have been assigned several such annotations. pruneTags uses the HMM tagger to determine the most likely part-of-speech P of each word in context, and removes all constit annotations except those corresponding to P. This makes it possible to use the additional information provided by the lexicon (base word forms, syntactic and semantic features, predicate-argument structures) while still retaining the benefit of the tagger (one part-of-speech per word). Like tagJet, pruneTags first assigns annotations of type tagger to each token, with feature cat corresponding to the Penn part-of-speech tag; it then uses these annotations to guide the pruning of the constit annotations. |
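The pruning step described for pruneTags can be illustrated with a minimal sketch in plain Java. It does not use the Jet API: the class PruneTagsSketch, the Constit stand-in, the prune method, and the Penn-to-Jet category map (including the category names "n" and "v") are hypothetical names chosen for illustration. The fallback of keeping all candidates when none matches is likewise an assumption of this sketch, not documented Jet behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PruneTagsSketch {

    /** Hypothetical stand-in for a constit annotation produced by dictionary look-up. */
    static final class Constit {
        final String cat;       // part-of-speech category (illustrative encoding)
        final String baseForm;  // extra lexicon information that survives pruning

        Constit(String cat, String baseForm) {
            this.cat = cat;
            this.baseForm = baseForm;
        }
    }

    /**
     * Keeps only the candidates whose category matches the category that the
     * tagger's Penn tag maps to. If no candidate matches (or the tag is not in
     * the map), the candidates are returned unchanged; that fallback is an
     * assumption made here for the sake of the example.
     */
    static List<Constit> prune(List<Constit> candidates,
                               String pennTag,
                               Map<String, String> pennToCat) {
        String wanted = pennToCat.get(pennTag);
        List<Constit> kept = new ArrayList<>();
        for (Constit c : candidates) {
            if (c.cat.equals(wanted)) {
                kept.add(c);
            }
        }
        return kept.isEmpty() ? candidates : kept;
    }

    public static void main(String[] args) {
        // Illustrative mapping only; the real tag sets are much larger.
        Map<String, String> pennToCat = Map.of("NN", "n", "VB", "v");
        // "book" is ambiguous in the dictionary: noun and verb readings.
        List<Constit> candidates =
                List.of(new Constit("n", "book"), new Constit("v", "book"));
        // The tagger chose VB in context, so only the verb reading is kept.
        for (Constit c : prune(candidates, "VB", pennToCat)) {
            System.out.println(c.cat + " " + c.baseForm);
        }
    }
}
```

The point of the design is that the tagger only selects among the dictionary's candidates, so whatever the lexicon attached to the surviving annotation (here just the base form) is preserved.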
annotation type | TYPE feature | significance |
ENAMEX | ORGANIZATION | organization name |
ENAMEX | PERSON | person's name |
ENAMEX | LOCATION | location name |
TIMEX | DATE | date |
TIMEX | TIME | time |
NUMEX | MONEY | monetary expression |
NUMEX | PERCENT | percentage |
The action to assign these tags is "tagNames".
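As a rough illustration of the table above, the following sketch maps each TYPE feature to its annotation type and renders a tagged span as inline markup. The class NameTagTable and its markup method are hypothetical, and the inline SGML-style rendering is an assumption made for illustration; it is not taken from the Jet API.

```java
import java.util.Map;

final class NameTagTable {

    /** TYPE feature -> annotation type, copied from the table above. */
    static final Map<String, String> ANNOTATION_TYPE = Map.of(
            "ORGANIZATION", "ENAMEX",
            "PERSON", "ENAMEX",
            "LOCATION", "ENAMEX",
            "DATE", "TIMEX",
            "TIME", "TIMEX",
            "MONEY", "NUMEX",
            "PERCENT", "NUMEX");

    /** Renders text as an inline element for the given TYPE feature (illustrative format). */
    static String markup(String typeFeature, String text) {
        String annotationType = ANNOTATION_TYPE.get(typeFeature);
        if (annotationType == null) {
            throw new IllegalArgumentException("unknown TYPE feature: " + typeFeature);
        }
        return "<" + annotationType + " TYPE=\"" + typeFeature + "\">"
                + text + "</" + annotationType + ">";
    }

    public static void main(String[] args) {
        System.out.println(markup("PERSON", "Fred Smith"));
        System.out.println(markup("MONEY", "$5 million"));
    }
}
```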
Note: A detailed description of the internal and external representation of the HMMs is provided as part of the API documentation.