Jet.HMM
Class HMM

java.lang.Object
  extended by Jet.Chunk.TokenClassifier
      extended by Jet.HMM.HMM

public class HMM
extends Jet.Chunk.TokenClassifier

A Hidden Markov Model. The model is composed of states (HMMstate) and arcs (HMMarc). The model can be trained (train method), applied to a token sequence to find the most likely state sequence (viterbi method), loaded, saved, and printed.

Note that this HMM assumes that tokens are emitted by states, not by arcs. However, the start and end states do not emit tokens, so a sequence of N tokens is matched by a sequence of N+2 states, including the start and end state.

The HMM also incorporates an auxiliary memory in the form of a document dictionary ('cache'), which is intended for use in name tagging. If a word has once been tagged as a specific type of name ("Mr. John Park") within a document, this can be recorded so that subsequent uses of the name will be consistently tagged even if the context is ambiguous ("Park").
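
The idea behind the document cache can be illustrated with a small stand-alone sketch. This is not Jet's implementation: the class NameCacheSketch and its record and lookup methods are hypothetical names chosen for the illustration, and the sketch assumes that a name is cached token by token and that the cache is emptied at the start of each document.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-document name cache; an illustration only, not Jet's code. */
final class NameCacheSketch {
    // Maps a token (e.g. "Park") to the name tag it received earlier in the document.
    private final Map<String, String> cache = new HashMap<>();

    /** Forget all cached names when a new document starts. */
    void newDocument() {
        cache.clear();
    }

    /** Record every token of a tagged name, e.g. record("person", "John", "Park"). */
    void record(String tag, String... nameTokens) {
        for (String token : nameTokens) {
            cache.put(token, tag);
        }
    }

    /** Return the tag previously recorded for this token, or null if there is none. */
    String lookup(String token) {
        return cache.get(token);
    }
}
```

With such a cache, a later ambiguous mention of "Park" can be looked up and given the tag recorded for "Mr. John Park" earlier in the same document.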


Field Summary
protected static double UNLIKELY
           
 double viterbiProbability
          the probability along the best path found by the Viterbi decoder; set only after the decoder has been invoked.
 
Constructor Summary
HMM()
          create a new HMM using instances of BasicHMMemitter to control emission of tokens from states.
HMM(java.lang.Class emitterClass)
          create a new HMM using instances of emitterClass to control emission of tokens from states.
 
Method Summary
 void addState(HMMstate state)
          add the given state to the HMM.
 void computeProbabilities()
          compute the probabilities for token emission and state transition from the counts acquired in training.
 void createModel()
           
 double getLocalMargin(Document doc, Annotation[] tokens, java.lang.String excludedTag, int excludedTagStart, int excludedTagEnd)
          returns the margin for assigning a particular tag to a sequence of tokens.
 double getMargin()
          if invoked after a call on 'viterbi', returns the margin (the difference in score between the best and second best analyses).
 HMMstate getState(java.lang.String stateName)
          returns the state with the given name, or null if there is no such state.
 void load(java.io.Reader HMMReader)
          read a description of an HMM from HMMReader.
 void load(java.lang.String fileName)
           
 void newDocument()
           
 void print()
          print a complete description of the HMM (all states and arcs) to System.out.
 void recordMargin()
          enable the recording of the margin (the difference in score between the best and second best analyses) by the Viterbi decoder.
 void resetForTraining()
           
 void setTagsToCache(java.lang.String[] tags)
           
 void store(java.io.PrintWriter stream)
          save the HMM to stream in a form which can be reloaded using load(java.io.Reader).
 void store(java.lang.String fileName)
           
 void train(Document doc, Annotation[] tokens, java.lang.String[] tags)
          a slower algorithm than train0 for training the HMM.
 void train0(Document doc, Annotation[] tokens, java.lang.String[] tags)
          a fast, simple algorithm for training the HMM.
 java.lang.String[] viterbi(Document doc, Annotation[] tokens)
          a Viterbi decoder for HMMs.
 int[] viterbiPath(Document doc, Annotation[] tokens)
          a Viterbi decoder for HMMs.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UNLIKELY

protected static final double UNLIKELY

viterbiProbability

public double viterbiProbability
the probability along the best path found by the Viterbi decoder; set only after the decoder has been invoked.

Constructor Detail

HMM

public HMM()
create a new HMM using instances of BasicHMMemitter to control emission of tokens from states.


HMM

public HMM(java.lang.Class emitterClass)
create a new HMM using instances of emitterClass to control emission of tokens from states.

Method Detail

setTagsToCache

public void setTagsToCache(java.lang.String[] tags)

load

public void load(java.io.Reader HMMReader)
          throws java.io.IOException
read a description of an HMM from HMMReader. The description consists of lines of the following forms:
STATE state-name
ARC TO state-name [ count ]
EMIT token [ count ]
TAG token
ALLOW type

Throws:
java.io.IOException
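
To make the line forms above concrete, here is a minimal stand-alone sketch of how one such line might be split into its parts. It is not Jet's parser. It assumes that fields are separated by whitespace, that the brackets mark an optional count which is not written literally, and that counts are integers; none of this is confirmed by the description above. The class HmmLineSketch and the sample lines used with it are hypothetical.

```java
/** Hypothetical parser for one line of an HMM description; an illustration only. */
final class HmmLineSketch {
    final String keyword;   // STATE, ARC, EMIT, TAG, or ALLOW
    final String argument;  // state name, token, or type
    final int count;        // optional count; -1 when the line has none

    HmmLineSketch(String keyword, String argument, int count) {
        this.keyword = keyword;
        this.argument = argument;
        this.count = count;
    }

    static HmmLineSketch parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2) {
            throw new IllegalArgumentException("malformed line: " + line);
        }
        String keyword = fields[0];
        int argIndex = 1;
        switch (keyword) {
            case "ARC":
                // ARC lines carry the extra word TO before the state name.
                if (fields.length < 3 || !fields[1].equals("TO")) {
                    throw new IllegalArgumentException("malformed ARC line: " + line);
                }
                argIndex = 2;
                break;
            case "STATE":
            case "EMIT":
            case "TAG":
            case "ALLOW":
                break;
            default:
                throw new IllegalArgumentException("unknown keyword: " + keyword);
        }
        // Only ARC and EMIT lines are described as carrying an optional count.
        int count = -1;
        boolean mayHaveCount = keyword.equals("ARC") || keyword.equals("EMIT");
        if (mayHaveCount && fields.length > argIndex + 1) {
            count = Integer.parseInt(fields[argIndex + 1]);
        }
        return new HmmLineSketch(keyword, fields[argIndex], count);
    }
}
```

Under these assumptions, the made-up line "ARC TO end 3" yields keyword ARC, argument end, and count 3, while "STATE start" yields keyword STATE with no count.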

load

public void load(java.lang.String fileName)

addState

public void addState(HMMstate state)
add the given state to the HMM.


getState

public HMMstate getState(java.lang.String stateName)
returns the state with the given name, or null if there is no such state.


resetForTraining

public void resetForTraining()

newDocument

public void newDocument()

train0

public void train0(Document doc,
                   Annotation[] tokens,
                   java.lang.String[] tags)
a fast, simple algorithm for training the HMM. This algorithm trains the HMM from a fully annotated corpus and requires that at each token there be exactly one arc which can be followed to a successor state with the correct tag.


train

public void train(Document doc,
                  Annotation[] tokens,
                  java.lang.String[] tags)
a slower algorithm for training the HMM. This algorithm trains the HMM from a fully annotated corpus and requires that there be a unique path through the network whose tags match those of the training data.


computeProbabilities

public void computeProbabilities()
compute the probabilities for token emission and state transition from the counts acquired in training. This method should be invoked after training is complete (after all calls on the train method) and before the HMM is applied (calls on the viterbi method).
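
As an illustration of turning training counts into probabilities, the sketch below uses plain relative frequencies: each count is divided by the total. This is an assumption made for the example; the description above does not say whether Jet applies smoothing or any other adjustment. The class CountsToProbabilitiesSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical relative-frequency estimator; an illustration only, not Jet's code. */
final class CountsToProbabilitiesSketch {
    /**
     * Converts the counts for one state (for example, how often each arc was
     * followed, or how often each token was emitted) into probabilities that
     * sum to 1. Returns an empty map if nothing was counted.
     */
    static Map<String, Double> normalize(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> probabilities = new HashMap<>();
        if (total == 0) {
            return probabilities; // nothing observed in training
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            probabilities.put(e.getKey(), e.getValue() / (double) total);
        }
        return probabilities;
    }
}
```

Because the probabilities depend on the totals, a step of this kind can only run once all counts are in, which is consistent with the requirement that computeProbabilities be called after training and before decoding.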


createModel

public void createModel()

print

public void print()
print a complete description of the HMM (all states and arcs) to System.out.


store

public void store(java.io.PrintWriter stream)
save the HMM to stream in a form which can be reloaded using load(java.io.Reader).


store

public void store(java.lang.String fileName)

viterbiPath

public int[] viterbiPath(Document doc,
                         Annotation[] tokens)
a Viterbi decoder for HMMs. Given an array of token annotations, tokens, on document doc, returns the most likely path which can generate those tokens. The value returned is an array of the states (indexes into states) along the most likely path.


viterbi

public java.lang.String[] viterbi(Document doc,
                                  Annotation[] tokens)
a Viterbi decoder for HMMs. Given an array of token annotations, tokens, on document doc, returns the most likely path which can generate those tokens. The value returned is an array of the tags associated with the states along the most likely path.
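
For readers unfamiliar with the technique, the following is a minimal stand-alone sketch of Viterbi decoding for an HMM whose start and end states do not emit tokens, so that N tokens are matched by N+2 states. It is not Jet's implementation and does not use HMMstate, HMMarc, Document or Annotation. It assumes that state 0 is the start state, that the last state is the end state, that tokens are already mapped to integer symbol indexes, and that probabilities are supplied as logarithms; the class ViterbiSketch and all of its names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical log-space Viterbi decoder; an illustration only, not Jet's code. */
final class ViterbiSketch {
    static final double NEG_INF = Double.NEGATIVE_INFINITY;

    /** Element-wise natural log; log(0) becomes negative infinity. */
    static double[][] log(double[][] p) {
        double[][] out = new double[p.length][];
        for (int i = 0; i < p.length; i++) {
            out[i] = new double[p[i].length];
            for (int j = 0; j < p[i].length; j++) {
                out[i][j] = Math.log(p[i][j]);
            }
        }
        return out;
    }

    /**
     * @param logTrans logTrans[i][j] = log P(next state j | state i)
     * @param logEmit  logEmit[j][k] = log P(symbol k | state j); rows for start and end are unused
     * @param obs      observed symbol indexes, length N
     * @return state indexes of the best path, length N + 2, or null if no path exists
     */
    static int[] decode(double[][] logTrans, double[][] logEmit, int[] obs) {
        int s = logTrans.length;
        int n = obs.length;
        int start = 0;
        int end = s - 1;
        // score[t][j]: best log probability of being in state j after emitting t tokens.
        double[][] score = new double[n + 1][s];
        int[][] back = new int[n + 1][s];
        for (double[] row : score) {
            Arrays.fill(row, NEG_INF);
        }
        score[0][start] = 0.0;
        for (int t = 1; t <= n; t++) {
            for (int j = 1; j < end; j++) {      // emitting states only
                for (int i = 0; i < end; i++) {  // any predecessor except the end state
                    double cand = score[t - 1][i] + logTrans[i][j] + logEmit[j][obs[t - 1]];
                    if (cand > score[t][j]) {
                        score[t][j] = cand;
                        back[t][j] = i;
                    }
                }
            }
        }
        // Final transition into the non-emitting end state.
        double best = NEG_INF;
        int last = -1;
        for (int i = 0; i < end; i++) {
            double cand = score[n][i] + logTrans[i][end];
            if (cand > best) {
                best = cand;
                last = i;
            }
        }
        if (last < 0) {
            return null; // no path with non-zero probability
        }
        // Follow the back pointers from the last emitting state to the start state.
        int[] path = new int[n + 2];
        path[n + 1] = end;
        int cur = last;
        for (int t = n; t >= 1; t--) {
            path[t] = cur;
            cur = back[t][cur];
        }
        path[0] = start;
        return path;
    }
}
```

The array returned by this sketch corresponds in spirit to the result of viterbiPath (state indexes); mapping each state index to its tag would give the kind of result that viterbi returns.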


recordMargin

public void recordMargin()
enable the recording of the margin (the difference in score between the best and second best analyses) by the Viterbi decoder.


getMargin

public double getMargin()
if invoked after a call on 'viterbi', returns the margin (the difference in score between the best and second best analyses). Requires that 'recordMargin' be called at some point before the call on 'viterbi'.
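
The notion of margin can be shown with a small stand-alone sketch that takes the scores of several candidate analyses and returns the difference between the two highest. This is only an illustration of the definition: it assumes the scores are available as an explicit array, whereas a decoder would more plausibly track the runner-up during the search. The class MarginSketch is hypothetical and is not part of Jet.

```java
/** Hypothetical margin computation; an illustration only, not Jet's code. */
final class MarginSketch {
    /** Returns the best score minus the second best score. */
    static double margin(double[] scores) {
        if (scores.length < 2) {
            throw new IllegalArgumentException("need at least two analyses");
        }
        double best = Double.NEGATIVE_INFINITY;
        double second = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            if (s > best) {
                second = best; // the previous best becomes the runner-up
                best = s;
            } else if (s > second) {
                second = s;
            }
        }
        return best - second;
    }
}
```

A large margin means the best analysis is well separated from its nearest competitor; a margin near zero means the decoder's choice was a close call.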


getLocalMargin

public double getLocalMargin(Document doc,
                             Annotation[] tokens,
                             java.lang.String excludedTag,
                             int excludedTagStart,
                             int excludedTagEnd)
returns the margin for assigning a particular tag to a sequence of tokens. This procedure assumes that this assignment is part of the Viterbi decoding of the sentence. It computes the difference between the log probability of the Viterbi decoding and the log probability of the decoding using a constrained HMM where the specified tokens cannot be assigned the tag excludedTag.

Parameters:
doc - the Document containing the sentence being tagged
tokens - the token annotations for the sentence
excludedTag - the tag assigned to the sequence
excludedTagStart - the index of the first token being assigned this tag
excludedTagEnd - the index of the last token being assigned this tag