JET, the Java Extraction Tool, provides a variety of components for language analysis, such as sentence segmentation, name tagging, time expression tagging and normalization, part-of-speech tagging, partial parsing, and coreference analysis. These components can be arranged in pipelines for different applications, and can be used either for interactive analysis of individual sentences or for 'batch' analysis of complete documents. Simple tools are provided for annotating documents and displaying annotated documents. A full set of procedures is also provided for performing information extraction of entities, relations, and events following the ACE (Automatic Content Extraction) specifications.
JET is a work in progress and is regularly expanded and updated.
(select to download)
| ||8 Nov 2012|| || |
| ||23 Jan 2014|| ||expand guide; add dependency parser; add class files; add Windows script|
| ||29 Jun 2014|| ||enhance onomasticon (name dictionary); add actions and guide entries for onomasticon and dependency parser|
|1.7.6||28 Sep 2014||jet-140928.tar.gz||add 'generic' feature for entities; improved onomasticon matcher|
|1.8.0||31 Dec 2014||jet-141231.tar.gz||add Brown word clusters and name tagger trained on 3x larger corpus, together producing more robust name annotation (see note below)|
In addition, the directory will contain the following files and
directories for those who wish to recompile or modify Jet
If you plan on using the Tratz dependency parser, you will also need to download parseModel.gz and put it in the jet/data directory.
For the best name tagger coverage, download AceOntoMeneModel.gz, uncompress it, put it in the jet/acedata directory, and then run with the following properties:
NameTags.ME.fileName = ../acedata/AceOntoMeneModel
WordClusters.fileName = brownClusters10-2014.txt
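The installation steps above can also be scripted. The following is a minimal sketch in plain Java that uses only the standard library and none of Jet's own classes. The class name InstallNameModel, its two helper methods, and the default paths in main are hypothetical names chosen for illustration; only the file name, the target directory, and the two property settings come from the note above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Properties;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of the Jet distribution.
public class InstallNameModel {

    /** Uncompresses gzFile into targetDir, dropping a trailing ".gz" from the file name. */
    public static Path gunzipInto(Path gzFile, Path targetDir) throws IOException {
        String name = gzFile.getFileName().toString();
        if (name.endsWith(".gz")) {
            name = name.substring(0, name.length() - 3);
        }
        Files.createDirectories(targetDir);
        Path out = targetDir.resolve(name);
        // GZIPInputStream decompresses on the fly while Files.copy writes the result.
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
        }
        return out;
    }

    /** Returns the two property settings quoted in the note above. */
    public static Properties nameTaggerProperties() {
        Properties p = new Properties();
        p.setProperty("NameTags.ME.fileName", "../acedata/AceOntoMeneModel");
        p.setProperty("WordClusters.fileName", "brownClusters10-2014.txt");
        return p;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default locations; pass the real paths as arguments if they differ.
        Path download = Paths.get(args.length > 0 ? args[0] : "AceOntoMeneModel.gz");
        Path acedata = Paths.get(args.length > 1 ? args[1] : "jet/acedata");
        Path model = gunzipInto(download, acedata);
        System.out.println("Model written to " + model);
        // Print the settings so they can be copied into the Jet properties file in use.
        nameTaggerProperties().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The sketch only prints the property lines and does not edit any properties file, because which file to change depends on the configuration being run.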
To use Jet,