Jet.HMM
Class HMMNameTagger

java.lang.Object
  extended by Jet.HMM.HMMNameTagger

public class HMMNameTagger
extends java.lang.Object

a Named Entity tagger based on the generic HMM (Hidden Markov Model) mechanism. Methods are provided for creating an HMM for a set of name tags, for training the HMM from annotated corpora, for applying the tagger to new text, and for scoring the results. The tagger is stored in and loaded from an external file consisting of three parts in order: a tag table, a line containing 'endtags', and the HMM.
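
The file layout described above can be sketched as follows. This is a minimal illustration, not Jet's own loader. It assumes the separator is a line whose trimmed content is exactly 'endtags', and it treats the tag table and the HMM section as opaque lines, since their internal formats are not specified here. The class TaggerFileSketch and its method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper, not part of Jet: splits a tagger file's lines at 'endtags'.
final class TaggerFileSketch {

    // Index of the first line that is exactly 'endtags' (assumed separator form).
    static int separatorIndex(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).trim().equals("endtags")) {
                return i;
            }
        }
        throw new IllegalArgumentException("no 'endtags' line found");
    }

    // Lines before the separator: the tag table (contents left uninterpreted).
    static List<String> tagTableLines(List<String> lines) {
        return lines.subList(0, separatorIndex(lines));
    }

    // Lines after the separator: the stored HMM (contents left uninterpreted).
    static List<String> hmmLines(List<String> lines) {
        return lines.subList(separatorIndex(lines) + 1, lines.size());
    }
}
```

Keeping both sections uninterpreted avoids guessing at formats that only the store and load methods define.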


Field Summary
 HMMannotator annotator
           
 HMM nameHMM
           
 
Constructor Summary
HMMNameTagger(java.lang.Class emitterClass)
          creates a new HMMNameTagger (with an empty HMM).
 
Method Summary
 void load(java.lang.String fileName)
          load the tag table and the HMM associated with this tagger from file 'fileName'.
static void main(java.lang.String[] args)
          procedures for training and testing the named entity tagger on specific corpora.
 void scoreCollection(java.lang.String testCollection, java.lang.String keyCollection)
          computes the recall and precision of 'testCollection' against 'keyCollection' (which should contain the same documents), considering the name annotations in 'tagsToScore'.
 void store(java.lang.String fileName)
          store the tag table and the HMM associated with this tagger to file 'fileName'.
 void tag(Document doc, Span span)
          tag span 'span' of Document 'doc' with Named Entity annotations.
 void train(java.lang.String trainingCollection)
          train the HMMNameTagger using the collection of Documents 'trainingCollection'.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

nameHMM

public HMM nameHMM

annotator

public HMMannotator annotator
Constructor Detail

HMMNameTagger

public HMMNameTagger(java.lang.Class emitterClass)
creates a new HMMNameTagger (with an empty HMM).

Parameters:
emitterClass - the class of the emitter associated with each state of the HMM; must be a subclass of HMMemitter.
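
The subclass requirement on emitterClass can be illustrated with a reflection check. This is a sketch only: whether the constructor performs any such check is not stated here. EmitterStandIn stands in for HMMemitter so the example compiles without Jet on the classpath, and ExampleEmitter and EmitterCheckSketch are hypothetical names.

```java
// Stand-in for Jet's HMMemitter so the sketch compiles without Jet on the classpath.
class EmitterStandIn {}

// Hypothetical emitter type extending the stand-in.
class ExampleEmitter extends EmitterStandIn {}

final class EmitterCheckSketch {

    // True when 'candidate' is the stand-in base type or one of its subclasses.
    // Note that isAssignableFrom also accepts the base type itself.
    static boolean isAcceptableEmitter(Class<?> candidate) {
        return EmitterStandIn.class.isAssignableFrom(candidate);
    }
}
```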
Method Detail

train

public void train(java.lang.String trainingCollection)
           throws java.io.IOException
train the HMMNameTagger using the collection of Documents 'trainingCollection'. The documents should have a TEXT zone marked; training is done on all sentences within this zone.

Throws:
java.io.IOException

store

public void store(java.lang.String fileName)
           throws java.io.IOException
store the tag table and the HMM associated with this tagger to file 'fileName'.

Throws:
java.io.IOException

load

public void load(java.lang.String fileName)
          throws java.io.IOException
load the tag table and the HMM associated with this tagger from file 'fileName'.

Throws:
java.io.IOException
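
One plausible way to combine train, store and load is sketched below. To stay runnable without Jet, the sketch is written against a hypothetical interface, TaggerPersistence, that copies the three signatures documented above. HMMNameTagger is not stated to implement any such interface, so real code would call the same methods on the tagger directly. All collection and file names are supplied by the caller.

```java
import java.io.IOException;

// Hypothetical interface mirroring three method signatures documented on this page.
// HMMNameTagger itself is not stated to implement any such interface.
interface TaggerPersistence {
    void train(String trainingCollection) throws IOException;
    void store(String fileName) throws IOException;
    void load(String fileName) throws IOException;
}

final class TrainStoreLoadSketch {

    // Trains on a collection, then writes the tag table and HMM to a file.
    static void trainAndStore(TaggerPersistence tagger, String trainingCollection,
                              String modelFile) throws IOException {
        tagger.train(trainingCollection);
        tagger.store(modelFile);
    }

    // Reads a previously stored tag table and HMM into a tagger.
    static void restore(TaggerPersistence tagger, String modelFile) throws IOException {
        tagger.load(modelFile);
    }
}
```

Separating the two steps reflects the point of store and load: training can happen once, and later runs can start from the stored file.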

tag

public void tag(Document doc,
                Span span)
tag span 'span' of Document 'doc' with Named Entity annotations.


scoreCollection

public void scoreCollection(java.lang.String testCollection,
                            java.lang.String keyCollection)
computes the recall and precision of 'testCollection' against 'keyCollection' (which should contain the same documents), considering the name annotations in 'tagsToScore'. Reports both per-document and total scores to System.out.
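
The recall and precision figures can be illustrated with a small self-contained sketch. It assumes exact-match scoring, where a name in the test document counts as correct only if its type and span both match a name in the key. The matching rule actually used by scoreCollection is not specified here. NameScoreSketch and its Name record are hypothetical and not part of Jet.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Jet: scores test names against key names.
final class NameScoreSketch {

    // A name annotation reduced to its type and character offsets.
    record Name(String type, int start, int end) {}

    // Returns {recall, precision} under the assumed exact-match rule.
    // recall    = correct / number of names in the key
    // precision = correct / number of names in the test document
    static double[] score(Set<Name> key, Set<Name> test) {
        Set<Name> correct = new HashSet<>(test);
        correct.retainAll(key); // names present in both sets
        double recall = key.isEmpty() ? 0.0 : (double) correct.size() / key.size();
        double precision = test.isEmpty() ? 0.0 : (double) correct.size() / test.size();
        return new double[] {recall, precision};
    }
}
```

For a key with two names and a test document with four names, one of which matches exactly, this gives a recall of 1/2 and a precision of 1/4.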


main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
procedures for training and testing the named entity tagger on specific corpora.

Throws:
java.io.IOException