java.lang.Object
  Jet.Chunk.TokenClassifier
    Jet.HMM.HMM
A Hidden Markov Model. The model is composed of states (HMMstate) and arcs (HMMarc). The model can be trained (train method), applied to a token sequence to find the most likely state sequence (viterbi method), loaded, saved, and printed.
Note that this HMM assumes that tokens are emitted by states, not by arcs. However, the start and end states do not emit tokens, so a sequence of N tokens is matched by a sequence of N+2 states, including the start and end state.
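The state-emission convention above can be illustrated with a minimal, self-contained sketch. It does not use the Jet classes: the class name `PathScoreSketch`, the toy probabilities, and the state numbering are all assumptions made for illustration. It scores one complete path of N+2 states against N tokens, charging one transition per arc and one emission per token-emitting state.

```java
import java.util.List;
import java.util.Map;

class PathScoreSketch {
    // Toy model (hypothetical values): 0 = start, 1 = name, 2 = other, 3 = end.
    static final double[][] TRANSITION = {
        {0.0, 0.5, 0.5, 0.0},
        {0.0, 0.2, 0.6, 0.2},
        {0.0, 0.3, 0.5, 0.2},
        {0.0, 0.0, 0.0, 0.0},
    };
    // Emission tables per state; the start and end states emit nothing.
    static final List<Map<String, Double>> EMISSION = List.of(
        Map.<String, Double>of(),
        Map.of("John", 0.5, "Park", 0.5),
        Map.of("visited", 0.8, "John", 0.1, "Park", 0.1),
        Map.<String, Double>of());

    /**
     * Log probability of one complete state path. The path must have
     * tokens.length + 2 entries: the first is the start state and the
     * last is the end state, and neither of them emits a token.
     */
    static double logProbability(int[] path, String[] tokens,
                                 double[][] transition,
                                 List<Map<String, Double>> emission) {
        if (path.length != tokens.length + 2) {
            throw new IllegalArgumentException("need N+2 states for N tokens");
        }
        double logP = 0.0;
        for (int i = 1; i < path.length; i++) {
            // One transition for every arc on the path (N+1 arcs in total).
            logP += Math.log(transition[path[i - 1]][path[i]]);
            // One emission for every state except start and end.
            if (i <= tokens.length) {
                logP += Math.log(emission.get(path[i]).getOrDefault(tokens[i - 1], 0.0));
            }
        }
        return logP;
    }

    public static void main(String[] args) {
        int[] path = {0, 1, 2, 3};              // start, name, other, end
        String[] tokens = {"John", "visited"};  // two tokens, four states
        System.out.println(Math.exp(logProbability(path, tokens, TRANSITION, EMISSION)));
    }
}
```

Working in log space keeps long products from underflowing; whether the Jet implementation stores probabilities or log scores is not stated on this page.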
The HMM also incorporates an auxiliary memory in the form of a document dictionary ('cache'), which is intended for use in name tagging. If a word has once been tagged as a specific type of name ("Mr. John Park") within a document, this can be recorded so that subsequent uses of the name will be consistently tagged even if the context is ambiguous ("Park").
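The idea of a per-document cache can be sketched as follows. This is not the Jet implementation: the class `NameCacheSketch`, its method names, and the choice to cache every word of a tagged name are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

class NameCacheSketch {
    // word -> tag it received earlier in the current document
    private final Map<String, String> cache = new HashMap<>();

    /** Forget all cached words; intended to be called at the start of each document. */
    void newDocument() {
        cache.clear();
    }

    /** Record every word of a tagged name under its tag (an assumed policy). */
    void record(String[] nameTokens, String tag) {
        for (String token : nameTokens) {
            cache.put(token, tag);
        }
    }

    /** Tag previously recorded for this word in the document, or null if none. */
    String lookup(String word) {
        return cache.get(word);
    }

    public static void main(String[] args) {
        NameCacheSketch cache = new NameCacheSketch();
        cache.record(new String[] {"John", "Park"}, "person");
        System.out.println(cache.lookup("Park"));  // a later, ambiguous "Park"
        cache.newDocument();
        System.out.println(cache.lookup("Park"));  // nothing carried over
    }
}
```

A decoder could consult such a lookup as an extra feature when the local context does not disambiguate a word; how the real HMM combines the cache with emission probabilities is not described on this page.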
Field Summary | |
protected static double | UNLIKELY |
double | viterbiProbability | after the viterbi decoder method has been invoked, the probability along the best path found by the decoder.
Constructor Summary | |
HMM() | create a new HMM using instances of BasicHMMemitter to control emission of tokens from states.
HMM(java.lang.Class emitterClass) | create a new HMM using instances of emitterClass to control emission of tokens from states.
Method Summary | |
void | addState(HMMstate state) | add state state to the HMM.
void | computeProbabilities() | compute the probabilities for token emission and state transition from the counts acquired in training.
void | createModel() |
double | getLocalMargin(Document doc, Annotation[] tokens, java.lang.String excludedTag, int excludedTagStart, int excludedTagEnd) | returns the margin for assigning a particular tag to a sequence of tokens.
double | getMargin() | if invoked after a call on 'viterbi', returns the margin (the difference in score between the best and second best analyses).
HMMstate | getState(java.lang.String stateName) | returns state with given name, or null if no such state
void | load(java.io.Reader HMMReader) | read a description of an HMM from HMMReader.
void | load(java.lang.String fileName) |
void | newDocument() |
void | print() | print a complete description of the HMM (all states and arcs) to System.out.
void | recordMargin() | enable the recording of the margin (the difference in score between the best and second best analysis) by the Viterbi decoder.
void | resetForTraining() |
void | setTagsToCache(java.lang.String[] tags) |
void | store(java.io.PrintWriter stream) | save the HMM to stream in a form which can be reloaded using load(java.io.Reader).
void | store(java.lang.String fileName) |
void | train(Document doc, Annotation[] tokens, java.lang.String[] tags) | a slower algorithm for training the HMM.
void | train0(Document doc, Annotation[] tokens, java.lang.String[] tags) | a fast, simple algorithm for training the HMM.
java.lang.String[] | viterbi(Document doc, Annotation[] tokens) | a Viterbi decoder for HMMs.
int[] | viterbiPath(Document doc, Annotation[] tokens) | a Viterbi decoder for HMMs.
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
protected static final double UNLIKELY
public double viterbiProbability
Constructor Detail |
public HMM()
create a new HMM using instances of BasicHMMemitter to control emission of tokens from states.
public HMM(java.lang.Class emitterClass)
create a new HMM using instances of emitterClass to control emission of tokens from states.
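Passing a `Class` object to a constructor usually means that instances are created reflectively, one per state. The sketch below shows that pattern in isolation. The `Emitter` interface, `UniformEmitter`, and `EmitterFactorySketch` are hypothetical stand-ins and are not part of the Jet API; the only library calls relied on are `Class.getDeclaredConstructor()` and `Constructor.newInstance()`.

```java
import java.util.ArrayList;
import java.util.List;

class EmitterFactorySketch {
    /** Hypothetical stand-in for an emitter type; not the Jet interface. */
    interface Emitter {
        double probability(String token);
    }

    /** Hypothetical emitter that gives every token the same probability. */
    static class UniformEmitter implements Emitter {
        public UniformEmitter() {}

        public double probability(String token) {
            return 0.5;
        }
    }

    private final Class<? extends Emitter> emitterClass;
    private final List<Emitter> emitters = new ArrayList<>();

    EmitterFactorySketch(Class<? extends Emitter> emitterClass) {
        this.emitterClass = emitterClass;
    }

    /** Create a fresh emitter for a newly added state via the no-arg constructor. */
    Emitter addState() {
        try {
            Emitter emitter = emitterClass.getDeclaredConstructor().newInstance();
            emitters.add(emitter);
            return emitter;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + emitterClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        EmitterFactorySketch factory = new EmitterFactorySketch(UniformEmitter.class);
        Emitter first = factory.addState();
        Emitter second = factory.addState();
        System.out.println(first != second);  // each state gets its own emitter
    }
}
```

The pattern requires the emitter class to have an accessible no-argument constructor; whether Jet imposes the same requirement is an assumption here.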
Method Detail |
public void setTagsToCache(java.lang.String[] tags)
public void load(java.io.Reader HMMReader) throws java.io.IOException
read a description of an HMM from HMMReader. The description consists of a sequence of lines.
Throws: java.io.IOException
public void load(java.lang.String fileName)
public void addState(HMMstate state)
add state state to the HMM.
public HMMstate getState(java.lang.String stateName)
public void resetForTraining()
public void newDocument()
public void train0(Document doc, Annotation[] tokens, java.lang.String[] tags)
public void train(Document doc, Annotation[] tokens, java.lang.String[] tags)
public void computeProbabilities()
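Turning training counts into probabilities is, in its simplest form, a relative-frequency estimate. The sketch below shows only that unsmoothed form; the class `CountNormalizerSketch` and its method are hypothetical, and the real method may smooth or back off in ways this page does not describe.

```java
import java.util.HashMap;
import java.util.Map;

class CountNormalizerSketch {
    /**
     * Convert raw counts (of emitted tokens, or of outgoing arcs, for one state)
     * into relative frequencies. No smoothing is applied.
     */
    static Map<String, Double> normalize(Map<String, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        Map<String, Double> probabilities = new HashMap<>();
        if (total == 0) {
            return probabilities;  // nothing was observed for this state
        }
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            probabilities.put(entry.getKey(), entry.getValue() / (double) total);
        }
        return probabilities;
    }

    public static void main(String[] args) {
        Map<String, Integer> arcCounts = new HashMap<>();
        arcCounts.put("name", 3);
        arcCounts.put("other", 1);
        System.out.println(normalize(arcCounts));
    }
}
```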
public void createModel()
public void print()
public void store(java.io.PrintWriter stream)
save the HMM to stream in a form which can be reloaded using load(java.io.Reader).
public void store(java.lang.String fileName)
public int[] viterbiPath(Document doc, Annotation[] tokens)
a Viterbi decoder for HMMs. Given the token sequence tokens, on document doc, returns the most likely path which can generate those tokens. The value returned is an array of the states (indexes into states) along the most likely path.
public java.lang.String[] viterbi(Document doc, Annotation[] tokens)
a Viterbi decoder for HMMs. Given the token sequence tokens, on document doc, returns the most likely path which can generate those tokens. The value returned is an array of the tags associated with the states along the most likely path.
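The two decoder entry points differ only in what they return: state indexes or the tags attached to those states. A minimal, self-contained Viterbi sketch makes the relationship concrete. It is not the Jet code: the class `ViterbiSketch`, the toy model, the integer symbol encoding, and the decision to include the start and end states in the returned path are all assumptions. It also assumes at least one path has non-zero probability.

```java
import java.util.Arrays;

class ViterbiSketch {
    // Toy model (hypothetical values): state 0 = start, 1 = name, 2 = other, 3 = end.
    static final String[] STATE_TAGS = {"start", "name", "other", "end"};
    static final double[][] TRANS = {
        {0.0, 0.5, 0.5, 0.0},
        {0.0, 0.2, 0.6, 0.2},
        {0.0, 0.3, 0.5, 0.2},
        {0.0, 0.0, 0.0, 0.0},
    };
    // Rows are states, columns are symbols: 0 = "John", 1 = "visited", 2 = "Park".
    static final double[][] EMIT = {
        {0.0, 0.0, 0.0},
        {0.5, 0.0, 0.5},
        {0.1, 0.8, 0.1},
        {0.0, 0.0, 0.0},
    };

    /** Most likely path of obs.length + 2 state indexes, start and end included. */
    static int[] viterbiPath(int[] obs, double[][] trans, double[][] emit) {
        int n = obs.length;
        int end = trans.length - 1;
        int[] path = new int[n + 2];
        path[n + 1] = end;                 // path[0] is already 0, the start state
        if (n == 0) {
            return path;
        }
        double[][] score = new double[n][trans.length];
        int[][] back = new int[n][trans.length];
        for (double[] row : score) {
            Arrays.fill(row, Double.NEGATIVE_INFINITY);
        }
        for (int t = 0; t < n; t++) {
            for (int j = 1; j < end; j++) {            // emitting states only
                double e = Math.log(emit[j][obs[t]]);
                if (t == 0) {
                    score[0][j] = Math.log(trans[0][j]) + e;
                    continue;
                }
                for (int i = 1; i < end; i++) {
                    double candidate = score[t - 1][i] + Math.log(trans[i][j]) + e;
                    if (candidate > score[t][j]) {
                        score[t][j] = candidate;
                        back[t][j] = i;
                    }
                }
            }
        }
        // Pick the last emitting state, charging the arc into the end state.
        double best = Double.NEGATIVE_INFINITY;
        for (int i = 1; i < end; i++) {
            double candidate = score[n - 1][i] + Math.log(trans[i][end]);
            if (candidate > best) {
                best = candidate;
                path[n] = i;
            }
        }
        // Follow the back pointers; token t sits at path index t + 1.
        for (int t = n - 1; t >= 1; t--) {
            path[t] = back[t][path[t + 1]];
        }
        return path;
    }

    /** Tags of the emitting states on the path, one per token. */
    static String[] tags(int[] path, String[] stateTags) {
        String[] result = new String[path.length - 2];
        for (int t = 0; t < result.length; t++) {
            result[t] = stateTags[path[t + 1]];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] obs = {0, 1, 2};  // "John visited Park"
        int[] path = viterbiPath(obs, TRANS, EMIT);
        System.out.println(Arrays.toString(path));
        System.out.println(Arrays.toString(tags(path, STATE_TAGS)));
    }
}
```

For the toy sentence the best path is start, name, other, name, end, so the tag view is name, other, name.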
public void recordMargin()
public double getMargin()
public double getLocalMargin(Document doc, Annotation[] tokens, java.lang.String excludedTag, int excludedTagStart, int excludedTagEnd)
returns the margin for assigning a particular tag to a sequence of tokens; the tag is given by excludedTag.
Parameters:
doc - the Document containing the sentence being tagged
tokens - the token annotations for the sentence
excludedTag - the tag assigned to the sequence
excludedTagStart - the index of the first token being assigned this tag
excludedTagEnd - the index of the last token being assigned this tag
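The margin reported by getMargin is described as the difference in score between the best and second best analyses. The brute-force sketch below illustrates that definition on a toy model by enumerating every state assignment. Everything in it is an assumption made for illustration: the class `MarginSketch`, the toy probabilities, the use of log scores, and in particular the reading of the local margin as "best score overall minus best score among analyses that do not give the excluded tag to every token in the span".

```java
class MarginSketch {
    // Toy model (hypothetical values): state 0 = start, 1 = name, 2 = other, 3 = end.
    static final double[][] TRANS = {
        {0.0, 0.5, 0.5, 0.0},
        {0.0, 0.2, 0.6, 0.2},
        {0.0, 0.3, 0.5, 0.2},
        {0.0, 0.0, 0.0, 0.0},
    };
    // Rows are states, columns are symbols: 0 = "John", 1 = "visited", 2 = "Park".
    static final double[][] EMIT = {
        {0.0, 0.0, 0.0},
        {0.5, 0.0, 0.5},
        {0.1, 0.8, 0.1},
        {0.0, 0.0, 0.0},
    };

    /** Log score of one assignment of emitting states to the observations. */
    static double score(int[] states, int[] obs) {
        double logP = 0.0;
        int previous = 0;  // start state
        for (int t = 0; t < obs.length; t++) {
            logP += Math.log(TRANS[previous][states[t]]) + Math.log(EMIT[states[t]][obs[t]]);
            previous = states[t];
        }
        return logP + Math.log(TRANS[previous][TRANS.length - 1]);
    }

    /** True if every token in [from, to] is assigned the given state. */
    static boolean spanHasState(int[] states, int state, int from, int to) {
        for (int t = from; t <= to; t++) {
            if (states[t] != state) {
                return false;
            }
        }
        return true;
    }

    /**
     * Best and second best log scores over all assignments. If excluded >= 0,
     * assignments giving that state to the whole span [from, to] are skipped.
     */
    static double[] bestTwo(int[] obs, int excluded, int from, int to) {
        int emitting = TRANS.length - 2;
        long total = 1;
        for (int t = 0; t < obs.length; t++) {
            total *= emitting;
        }
        double best = Double.NEGATIVE_INFINITY;
        double second = Double.NEGATIVE_INFINITY;
        int[] states = new int[obs.length];
        for (long code = 0; code < total; code++) {
            long rest = code;
            for (int t = 0; t < obs.length; t++) {   // decode the assignment
                states[t] = 1 + (int) (rest % emitting);
                rest /= emitting;
            }
            if (excluded >= 0 && spanHasState(states, excluded, from, to)) {
                continue;
            }
            double s = score(states, obs);
            if (s > best) {
                second = best;
                best = s;
            } else if (s > second) {
                second = s;
            }
        }
        return new double[] {best, second};
    }

    /** Difference between the best and second best log scores. */
    static double margin(int[] obs) {
        double[] top = bestTwo(obs, -1, 0, -1);
        return top[0] - top[1];
    }

    /** Assumed reading of the local margin; see the lead-in. */
    static double localMargin(int[] obs, int excluded, int from, int to) {
        return bestTwo(obs, -1, 0, -1)[0] - bestTwo(obs, excluded, from, to)[0];
    }

    public static void main(String[] args) {
        int[] obs = {0, 1, 2};  // "John visited Park"
        System.out.println(margin(obs));
        System.out.println(localMargin(obs, 1, 0, 0));  // cost of not tagging "John" as name
    }
}
```

Enumeration is exponential in the sentence length and is used here only to keep the definition visible; a decoder that records the margin would track the runner-up inside the Viterbi recursion instead.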