action names | tagPOS, tagJet, pruneTags |
resources required | HMM part-of-speech model |
properties | Tags.fileName |
annotations required | token |
annotations added | constit, tagger |
tagPOS | Assigns annotations of type constit to each token, with feature cat corresponding to the Penn part-of-speech tag. |
tagJet | First assigns annotations of type tagger to each token, with feature cat corresponding to the Penn part-of-speech tag. Then (using the tagger annotations) assigns annotations of type constit to each token, with features cat and number corresponding to the Jet part-of-speech encoding. |
pruneTags | This action assumes that the tokens have already been assigned constit annotations by dictionary look-up, using the Jet part-of-speech encoding; words with several parts of speech will have been assigned several such annotations. pruneTags uses the HMM tagger to determine the most likely part of speech P of the word in context, and removes all constit annotations except those corresponding to P. This makes it possible to use the additional information provided by the lexicon (base word forms, syntactic and semantic features, predicate-argument structures) while still retaining the benefit of the tagger (one part of speech per word). It also provides more accurate tagging of words that are in the lexicon but not known to the statistical tagger. Like tagJet, pruneTags first assigns annotations of type tagger to each token, with feature cat corresponding to the Penn part-of-speech tag; it then uses these annotations to guide the pruning of the constit annotations. |