Jet.HMM
Class HMMannotator

java.lang.Object
  extended by Jet.HMM.HMMannotator

public class HMMannotator
extends java.lang.Object

HMMannotator provides methods for training and using HMMs with annotated Documents.


Constructor Summary
HMMannotator(Jet.Chunk.TokenClassifier h)
          create a new annotator based on HMM h.
 
Method Summary
 void annotate(Document doc)
          use the HMM to add annotations to Document 'doc'.
 void annotateSpan(Document doc, Span textSpan)
          use the HMM to add annotations to Span 'textSpan' of Document 'doc'.
 java.lang.String[][] getTagTable()
          returns the tag table (the correspondence between HMM tags and annotation types and features).
 void readTagTable(java.io.BufferedReader in)
          read the tag table (the list of annotation types and features) from BufferedReader 'in'.
 void readTagTable(java.lang.String tagFileName)
          read the tag table (the list of annotation types and features) from file 'tagFileName'.
 void setAnnotateEachToken(boolean flag)
          sets / clears the annotateEachToken flag, which applies only if BItag == false.
 void setBItag(boolean flag)
          sets / clears the BItag flag.
 void setRecordMargin(boolean recordMargin)
          turn on/off the feature that records the margin associated with an annotation as a feature 'margin' on the annotation.
 void setTagTable(java.lang.String[][] table)
          define the tag table for the annotator -- the correspondence between the tags associated with the states and the annotations on the documents.
 void setTrace(boolean trace)
          turn on / off the trace.
 void setZoneToTag(java.lang.String zone)
          sets the zones to be annotated.
 void train(Document doc)
          use the annotations on Document 'doc' to train the HMM.
 void train(DocumentCollection col)
          use the annotations on all documents in DocumentCollection 'col' to train HMM 'h'.
 void trainOnSpan(Document doc, Span textSpan)
          use the annotations on Span 'textSpan' of Document 'doc' to train the HMM.
 void writeTagTable(java.io.PrintWriter pw)
          writes the tag table (the correspondence between HMM tags and annotation types and features) to PrintWriter 'pw'.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HMMannotator

public HMMannotator(Jet.Chunk.TokenClassifier h)
create a new annotator based on HMM h.

Method Detail

setTagTable

public void setTagTable(java.lang.String[][] table)
define the tag table for the annotator -- the correspondence between the tags associated with the states and the annotations on the documents. The tag table is a two-dimensional array, where each row contains 4 elements:
annotation-type | feature [or null if no feature constraint] | feature-value | tag
This specifies that the annotation with the given annotation-type (and feature / feature-value, if not null) matches the HMM state with tag 'tag'.
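
As an illustration of the row layout described above, the following sketch builds a small tag table and looks up the tag for an annotation. It does not use Jet: the class TagTableSketch, the method tagFor, and the ENAMEX / NP rows are hypothetical, and the matching logic is only an assumption about how such a table could be interpreted. An array of this shape has the String[][] type that setTagTable accepts.

```java
import java.util.Objects;

class TagTableSketch {
    // Each row: annotation-type, feature (or null), feature-value, tag.
    // The rows below are invented examples, not values shipped with Jet.
    static final String[][] TABLE = {
        {"ENAMEX", "TYPE", "PERSON", "person"},
        {"ENAMEX", "TYPE", "ORGANIZATION", "org"},
        {"NP", null, null, "np"},
    };

    // Returns the tag of the first row matching the annotation, or null if none matches.
    // Simplification (assumption): the annotation carries at most one feature.
    static String tagFor(String[][] table, String type, String feature, String value) {
        for (String[] row : table) {
            if (!row[0].equals(type)) {
                continue;
            }
            if (row[1] == null) {
                return row[3]; // no feature constraint: the type alone decides
            }
            if (row[1].equals(feature) && Objects.equals(row[2], value)) {
                return row[3];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(tagFor(TABLE, "ENAMEX", "TYPE", "PERSON")); // person
        System.out.println(tagFor(TABLE, "NP", null, null));           // np
    }
}
```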


readTagTable

public void readTagTable(java.lang.String tagFileName)
read the tag table (the list of annotation types and features) from file 'tagFileName'.


readTagTable

public void readTagTable(java.io.BufferedReader in)
read the tag table (the list of annotation types and features) from BufferedReader 'in'. Each line must be of the form
annotationType HMMtag
or annotationType feature featureValue HMMtag
where 'HMMtag' ties this line to a state of the HMM. Stops at the end-of-file or on encountering a line 'endtags'.
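
The file format above can be sketched as a small stand-alone parser. This is not Jet's implementation: the class TagTableReaderSketch and its parse method are hypothetical, and the choices to split fields on whitespace, skip blank lines, and reject lines with any other field count are assumptions. Each parsed line becomes a 4-element row in the layout described under setTagTable.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TagTableReaderSketch {
    // Reads tag table lines until end-of-file or a line 'endtags'.
    static String[][] parse(BufferedReader in) {
        List<String[]> rows = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.equals("endtags")) {
                    break;
                }
                if (line.isEmpty()) {
                    continue; // assumption: blank lines are ignored
                }
                String[] f = line.split("\\s+");
                if (f.length == 2) {
                    // annotationType HMMtag
                    rows.add(new String[] {f[0], null, null, f[1]});
                } else if (f.length == 4) {
                    // annotationType feature featureValue HMMtag
                    rows.add(new String[] {f[0], f[1], f[2], f[3]});
                } else {
                    throw new IllegalArgumentException("malformed tag table line: " + line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        String text = "NP np\nENAMEX TYPE PERSON person\nendtags\nX y\n";
        String[][] table = parse(new BufferedReader(new StringReader(text)));
        System.out.println(table.length + " rows"); // 2 rows: the line after 'endtags' is not read
    }
}
```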


writeTagTable

public void writeTagTable(java.io.PrintWriter pw)
writes the tag table (the correspondence between HMM tags and annotation types and features) to PrintWriter 'pw'.


getTagTable

public java.lang.String[][] getTagTable()
returns the tag table (the correspondence between HMM tags and annotation types and features).


setBItag

public void setBItag(boolean flag)
sets / clears the BItag flag. If the flag is false, the tag given by the tag table is matched directly against the tag on the state. If the flag is true, the tag in the tag table matches tags B-tag and I-tag on HMM states. More precisely, if an annotation is associated with a tag X through the tag table, and the annotation spans three tokens, the first token must match an HMM state with tag B-X, and the remaining tokens must match HMM states with tag I-X.
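
The B-/I- convention can be illustrated with a few lines of stand-alone Java. The class BiTagSketch and the method stateTags are hypothetical and are not part of Jet. They only show which state tags an annotation with tag X is expected to line up with when BItag is true.

```java
class BiTagSketch {
    // For an annotation with the given tag spanning tokenCount tokens,
    // returns the state tag expected for each token: B-tag first, then I-tag.
    static String[] stateTags(String tag, int tokenCount) {
        String[] tags = new String[tokenCount];
        for (int i = 0; i < tokenCount; i++) {
            tags[i] = (i == 0 ? "B-" : "I-") + tag;
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", stateTags("X", 3))); // B-X I-X I-X
    }
}
```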


setAnnotateEachToken

public void setAnnotateEachToken(boolean flag)
sets / clears the annotateEachToken flag, which applies only if BItag == false. If annotateEachToken is true, then the span associated with each token receives a separate annotation (if the token is assigned a tag which corresponds to an annotation). If this flag is false, then we look for the maximal sequence of tokens which are assigned the same tag, and assign it a single annotation (again, if the tag corresponds to an annotation).
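
The two behaviours can be compared in a short stand-alone sketch. The class TokenGroupingSketch and the method spans are hypothetical and do not use Jet. As simplifying assumptions, spans are expressed as half-open token index ranges rather than character offsets, and a null entry stands for a tag that does not correspond to any annotation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class TokenGroupingSketch {
    // tags[i] is the tag assigned to token i (null: no corresponding annotation).
    // Returns one string "tag[start,end)" per annotation that would be created.
    static List<String> spans(String[] tags, boolean annotateEachToken) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tags.length) {
            int end = i + 1;
            if (!annotateEachToken) {
                // extend to the maximal run of tokens with the same tag
                while (end < tags.length && Objects.equals(tags[end], tags[i])) {
                    end++;
                }
            }
            if (tags[i] != null) {
                out.add(tags[i] + "[" + i + "," + end + ")");
            }
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tags = {"person", "person", null, "org"};
        System.out.println(spans(tags, true));  // [person[0,1), person[1,2), org[3,4)]
        System.out.println(spans(tags, false)); // [person[0,2), org[3,4)]
    }
}
```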


setZoneToTag

public void setZoneToTag(java.lang.String zone)
sets the zones to be annotated. For example, if zone="P", then each sequence of tokens spanned by an annotation of type P will be processed, either to train the HMM or to annotate the text using the HMM.


setTrace

public void setTrace(boolean trace)
turn on / off the trace.


setRecordMargin

public void setRecordMargin(boolean recordMargin)
turn on/off the feature that records the margin associated with an annotation as a feature 'margin' on the annotation. The margin is the difference between the log probability of the analysis including this annotation and the log probability of the most likely analysis excluding this annotation.
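
The definition of the margin amounts to a single subtraction of log probabilities, sketched below. The class MarginSketch, the method margin, and the probabilities 0.6 and 0.3 are invented for illustration. How Jet obtains the two log probabilities is not shown here.

```java
class MarginSketch {
    // logProbWith: log probability of the analysis including the annotation.
    // logProbBestWithout: log probability of the most likely analysis excluding it.
    static double margin(double logProbWith, double logProbBestWithout) {
        return logProbWith - logProbBestWithout;
    }

    public static void main(String[] args) {
        double with = Math.log(0.6);    // hypothetical probability of the analysis with the annotation
        double without = Math.log(0.3); // hypothetical probability of the best analysis without it
        System.out.println(margin(with, without)); // about 0.693, i.e. ln 2
    }
}
```

Under this definition, a larger margin means the best competing analysis without the annotation is proportionally less likely.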


train

public void train(Document doc)
use the annotations on Document 'doc' to train the HMM. Assumes the document has 'zoneToTag' annotations, and trains on each zone separately.


trainOnSpan

public void trainOnSpan(Document doc,
                        Span textSpan)
use the annotations on Span 'textSpan' of Document 'doc' to train the HMM.


train

public void train(DocumentCollection col)
use the annotations on all documents in DocumentCollection 'col' to train HMM 'h'.


annotate

public void annotate(Document doc)
use the HMM to add annotations to Document 'doc'. The Viterbi algorithm is used to find the most likely state sequence for each token sequence; the tags on the resulting states are used to generate annotations (based on the tagTable).


annotateSpan

public void annotateSpan(Document doc,
                         Span textSpan)
use the HMM to add annotations to Span 'textSpan' of Document 'doc'. The Viterbi algorithm is used to find the most likely state sequence for each token sequence; the tags on the resulting states are used to generate annotations (based on the tagTable).
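
For readers unfamiliar with the Viterbi algorithm mentioned above, the following is a generic textbook decoder in log space. It is not Jet's implementation and does not use the Jet API. The class ViterbiSketch, the two-state model (state 0 = "other", state 1 = "name"), and all probabilities are invented for illustration.

```java
import java.util.Arrays;

class ViterbiSketch {
    // start[j]: probability of starting in state j
    // trans[i][j]: probability of moving from state i to state j
    // emit[j][o]: probability that state j emits observation o
    // Returns the most likely state sequence for the observations.
    static int[] viterbi(double[] start, double[][] trans, double[][] emit, int[] obs) {
        int n = obs.length;
        int s = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][s]; // best log probability ending in state j at time t
        int[][] back = new int[n][s];        // predecessor state on that best path
        for (int j = 0; j < s; j++) {
            score[0][j] = Math.log(start[j]) + Math.log(emit[j][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int j = 0; j < s; j++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < s; i++) {
                    double v = score[t - 1][i] + Math.log(trans[i][j]);
                    if (v > bestScore) {
                        bestScore = v;
                        best = i;
                    }
                }
                score[t][j] = bestScore + Math.log(emit[j][obs[t]]);
                back[t][j] = best;
            }
        }
        // pick the best final state, then follow the back pointers
        int last = 0;
        for (int j = 1; j < s; j++) {
            if (score[n - 1][j] > score[n - 1][last]) {
                last = j;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        double[] start = {0.8, 0.2};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.9, 0.1}, {0.2, 0.8}}; // observation 0 = lowercase token, 1 = capitalized token
        int[] obs = {0, 1, 1, 0};
        System.out.println(Arrays.toString(viterbi(start, trans, emit, obs))); // [0, 1, 1, 0]
    }
}
```

In this toy run the two capitalized tokens are assigned to state 1. With a tag table mapping that state's tag to an annotation type, they would become one or two annotations depending on the flags described under setBItag and setAnnotateEachToken.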