Jet.HMM
Class HMMTagger

java.lang.Object
  extended by Jet.HMM.HMMTagger

public class HMMTagger
extends java.lang.Object

A part-of-speech (POS) tagger using a bigram model. The tagger is built on the generic HMM (Hidden Markov Model) mechanism.
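
As a rough illustration of what a bigram HMM tagger computes, the sketch below decodes a three-word sentence with the Viterbi algorithm. It does not use any Jet classes: the class name BigramViterbiSketch, the three-tag inventory, and all probabilities are assumptions made up for this example, and this page does not state that Jet's HMM mechanism decodes in exactly this way.

```java
import java.util.Arrays;

// Illustrative only: a toy bigram HMM decoded with the Viterbi algorithm.
// Names and numbers are assumptions for this sketch, not part of Jet.
class BigramViterbiSketch {
    static final String[] TAGS = {"DT", "NN", "VBZ"};

    // START[t]: probability that a sentence begins with tag t (made-up values).
    static final double[] START = {0.8, 0.1, 0.1};

    // TRANS[p][t]: probability of tag t given previous tag p (the bigram model).
    static final double[][] TRANS = {
        {0.05, 0.90, 0.05},   // after DT
        {0.10, 0.30, 0.60},   // after NN
        {0.60, 0.30, 0.10}    // after VBZ
    };

    // Made-up emission probabilities P(word | tag), with a small floor
    // so that the logarithm is always defined.
    static double emit(int tag, String word) {
        switch (word) {
            case "the":   return tag == 0 ? 0.9 : 0.0001;
            case "dog":   return tag == 1 ? 0.5 : 0.0001;
            case "barks": return tag == 2 ? 0.4 : (tag == 1 ? 0.05 : 0.0001);
            default:      return 0.0001;
        }
    }

    // Return the most probable tag sequence for the given words.
    static String[] tag(String[] words) {
        int n = words.length;
        int k = TAGS.length;
        if (n == 0) return new String[0];

        double[][] score = new double[n][k]; // best log-probability ending in tag t at word i
        int[][] back = new int[n][k];        // predecessor tag on that best path

        for (int t = 0; t < k; t++) {
            score[0][t] = Math.log(START[t]) + Math.log(emit(t, words[0]));
        }
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < k; t++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + Math.log(TRANS[p][t]);
                    if (s > best) { best = s; arg = p; }
                }
                score[i][t] = best + Math.log(emit(t, words[i]));
                back[i][t] = arg;
            }
        }

        // Pick the best final tag, then follow the back-pointers.
        int last = 0;
        for (int t = 1; t < k; t++) {
            if (score[n - 1][t] > score[n - 1][last]) last = t;
        }
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            out[i] = TAGS[last];
            last = back[i][last];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tag(new String[] {"the", "dog", "barks"})));
    }
}
```

With these toy numbers the decoder assigns DT, NN, VBZ to "the dog barks". Log-probabilities are used so that long sentences do not underflow.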


Field Summary
static boolean trace
          if true, use HMMannotator trace to write a one-line message about each part-of-speech assignment to Console.
 
Constructor Summary
HMMTagger()
          create a new HMM-based part-of-speech tagger.
 
Method Summary
 void annotate(Document doc, Span span, java.lang.String type)
          tag 'span' of 'doc' according to the Penn Treebank tag set, using annotations of type 'type'.
 void load(java.lang.String fileName)
          load the HMM associated with this tagger from file 'fileName'.
 void prune(Document doc, Span span)
          prune existing 'constit' annotations on 'span' of 'doc' using information from a part-of-speech tagger.
 void score(Document doc, Document key)
          compare the 'constit' tags of Documents 'doc' and 'key', and report (to System.out) the agreement rate.
 void store(java.lang.String fileName)
          store the HMM associated with this tagger to file 'fileName'.
 void tagJet(Document doc, Span span)
          tag 'span' of 'doc' according to the Jet part-of-speech tag set.
 void tagPenn(Document doc, Span span)
          tag 'span' of 'doc' according to the Penn Treebank tag set.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

trace

public static boolean trace
if true, use HMMannotator trace to write a one-line message about each part-of-speech assignment to Console.

Constructor Detail

HMMTagger

public HMMTagger()
create a new HMM-based part-of-speech tagger.

Method Detail

store

public void store(java.lang.String fileName)
           throws java.io.IOException
store the HMM associated with this tagger to file 'fileName'.

Throws:
java.io.IOException

load

public void load(java.lang.String fileName)
          throws java.io.IOException
load the HMM associated with this tagger from file 'fileName'.

Throws:
java.io.IOException

tagPenn

public void tagPenn(Document doc,
                    Span span)
tag 'span' of 'doc' according to the Penn Treebank tag set. Words are assigned 'constit' annotations with feature cat = a Penn tag.


annotate

public void annotate(Document doc,
                     Span span,
                     java.lang.String type)
tag 'span' of 'doc' according to the Penn Treebank tag set. Words are assigned annotations of type 'type' with feature cat = a Penn tag.


score

public void score(Document doc,
                  Document key)
compare the 'constit' tags of Documents 'doc' and 'key', and report (to System.out) the agreement rate.
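
The agreement rate can be pictured as the fraction of tokens whose tags match. The sketch below assumes the two documents have already been reduced to aligned tag lists of equal length. How score itself aligns tokens or treats missing annotations is not stated here, and AgreementSketch and its method are hypothetical names, not part of Jet.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: agreement rate between two aligned tag sequences.
// This is a stand-in for the idea behind score(), not Jet's implementation.
class AgreementSketch {

    // Fraction of positions at which the response tag equals the key tag.
    static double agreement(List<String> docTags, List<String> keyTags) {
        if (docTags.size() != keyTags.size()) {
            throw new IllegalArgumentException("tag sequences must be aligned");
        }
        if (keyTags.isEmpty()) return 0.0; // assumption: empty input counts as 0
        int same = 0;
        for (int i = 0; i < keyTags.size(); i++) {
            if (docTags.get(i).equals(keyTags.get(i))) same++;
        }
        return (double) same / keyTags.size();
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("DT", "NN", "VBZ", "NN");
        List<String> key = Arrays.asList("DT", "NN", "VBZ", "JJ");
        System.out.println("agreement = " + agreement(doc, key)); // 3 of 4 tags match
    }
}
```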


tagJet

public void tagJet(Document doc,
                   Span span)
tag 'span' of 'doc' according to the Jet part-of-speech tag set. Words are first assigned 'tagger' annotations with feature cat = a Penn tag. These are then mapped to Jet tags, and 'constit' annotations are added with cat = a Jet part-of-speech tag.
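
The second step, mapping a Penn tag to a Jet tag, amounts to a table look-up. The sketch below shows the idea with a plain map. The Jet category names used here ("n", "adj", "adv", "p", "det"), the fallback value "other", and the class PennToJetSketch are assumptions for illustration; the actual mapping table is not given on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a Penn-to-Jet tag mapping as a look-up table.
// The entries are hypothetical; Jet's real table may differ.
class PennToJetSketch {
    static final Map<String, String> PENN_TO_JET = new HashMap<>();
    static {
        PENN_TO_JET.put("NN", "n");    // singular noun
        PENN_TO_JET.put("NNS", "n");   // plural noun
        PENN_TO_JET.put("JJ", "adj");  // adjective
        PENN_TO_JET.put("RB", "adv");  // adverb
        PENN_TO_JET.put("IN", "p");    // preposition
        PENN_TO_JET.put("DT", "det");  // determiner
    }

    // Map a Penn tag to a (hypothetical) Jet category; unmapped tags
    // fall back to "other", which is an assumption of this sketch.
    static String toJet(String pennTag) {
        return PENN_TO_JET.getOrDefault(pennTag, "other");
    }

    public static void main(String[] args) {
        for (String penn : new String[] {"DT", "JJ", "NNS", "UH"}) {
            System.out.println(penn + " -> " + toJet(penn));
        }
    }
}
```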


prune

public void prune(Document doc,
                  Span span)
prune existing 'constit' annotations on 'span' of 'doc' using information from a part-of-speech tagger. Words are assumed, on entry, to have multiple 'constit' annotations from dictionary look-up, reflecting the POS ambiguity of the words; this ambiguity will be reduced using a tagger. Words are first assigned 'tagger' annotations with feature cat = a Penn tag. This information is then used to remove 'constit' annotations not consistent with the Penn tag.
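
The pruning step described above can be sketched as a filter over a word's candidate categories. The code below uses plain strings in place of Jet annotations. The consistency table, the category names, the class PruneSketch, and the choice to keep all candidates when none survive are assumptions of this sketch, not documented behavior of prune.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: reduce a word's ambiguous dictionary categories
// to those consistent with the Penn tag chosen by a tagger.
class PruneSketch {
    // Hypothetical table: Penn tag -> categories considered consistent with it.
    static final Map<String, Set<String>> CONSISTENT = new HashMap<>();
    static {
        CONSISTENT.put("NN", new HashSet<>(Arrays.asList("n")));
        CONSISTENT.put("VB", new HashSet<>(Arrays.asList("v")));
        CONSISTENT.put("JJ", new HashSet<>(Arrays.asList("adj")));
    }

    // Keep only the candidates consistent with pennTag.
    static List<String> prune(List<String> candidates, String pennTag) {
        Set<String> ok = CONSISTENT.get(pennTag);
        if (ok == null) {
            return new ArrayList<>(candidates); // unknown tag: leave the word untouched
        }
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (ok.contains(c)) kept.add(c);
        }
        // Assumption: never strip a word of every analysis.
        return kept.isEmpty() ? new ArrayList<>(candidates) : kept;
    }

    public static void main(String[] args) {
        List<String> walk = Arrays.asList("n", "v"); // "walk" is noun or verb in the dictionary
        System.out.println(prune(walk, "VB"));       // the tagger chose a verb reading
        System.out.println(prune(walk, "NN"));       // the tagger chose a noun reading
    }
}
```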