Package Jet.Tipster

The Tipster package provides the basic methods for recording information about documents.  It is loosely based on the 'Tipster Architecture' developed by R. Grishman as part of the government-sponsored Tipster program.  The basic objects are Documents and Annotations:  a Document is a container for the text of the document and for a set of Annotations on the Document.

Class Summary
Annotation                Assigns a type and a set of features to a portion of a Document.
AnnotationColor           Associates particular highlighting colors with particular annotation types in Document displays.
AnnotationTool            A tool for manually adding annotations to a Document.
CollectionAnnotationTool  A tool for displaying a collection and allowing the AnnotationTool to be invoked on documents in the collection.
CollectionView            Displays a DocumentCollection, with buttons to select views of individual Documents.
Document                  A container for the text of a document and the annotations on that document.
DocumentCollection        A set of ExternalDocuments.
ExternalDocument          A Document associated with a file.
Span                      A portion of a document, represented by its starting and ending character positions and a pointer to the document.
View                      Displays a Document with its annotations.
 

Package Jet.Tipster Description

In the course of processing, the Jet system builds up a large amount of information about the words and phrases in a Document:  simple things like parts-of-speech for individual words and type information (person/company/location) for names, as well as more complex things like phrases and clauses (with internal structure).  We want a single class of object for capturing all of this information and associating it with a Document.  The class we use for this purpose is the Annotation.  An Annotation is associated with a Span (substring) of the text of a Document.  The Annotation has a type and a set of features with values.  For example, an Annotation can indicate that a portion of a document is a sentence, or is a token with a given part-of-speech.  More complex structures can be built by having Annotations which point to other Annotations.
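
The data model described above can be sketched roughly as follows.  This is an illustration only:  SketchSpan, SketchAnnotation and AnnotationSketch are hypothetical stand-ins, not the actual Jet.Tipster classes, and the feature names ("pos", "kind", "first", "last") are invented for the example.  The sketch also assumes that a span's end position is exclusive, and it omits the pointer from a Span back to its Document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a span: a pair of character offsets into the text.
final class SketchSpan {
    final int start;  // offset of the first character
    final int end;    // offset just past the last character (assumed exclusive)

    SketchSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Returns the substring of the document text covered by this span.
    String textIn(String documentText) {
        return documentText.substring(start, end);
    }
}

// Hypothetical stand-in for an annotation: a type, a span, and named features.
final class SketchAnnotation {
    final String type;
    final SketchSpan span;
    final Map<String, Object> features = new LinkedHashMap<>();

    SketchAnnotation(String type, SketchSpan span) {
        this.type = type;
        this.span = span;
    }

    // Sets one feature and returns this annotation so that calls can be chained.
    SketchAnnotation with(String feature, Object value) {
        features.put(feature, value);
        return this;
    }
}

class AnnotationSketch {
    static final String TEXT = "Fred Smith retired.";

    // Builds a "name" annotation whose features point to two "token" annotations.
    static SketchAnnotation buildNameAnnotation() {
        SketchAnnotation first = new SketchAnnotation("token", new SketchSpan(0, 4)).with("pos", "NNP");
        SketchAnnotation last = new SketchAnnotation("token", new SketchSpan(5, 10)).with("pos", "NNP");
        return new SketchAnnotation("name", new SketchSpan(0, 10))
                .with("kind", "person")
                .with("first", first)   // a feature value may itself be an annotation
                .with("last", last);
    }

    public static void main(String[] args) {
        SketchAnnotation name = buildNameAnnotation();
        System.out.println(name.type + " \"" + name.span.textIn(TEXT) + "\" kind=" + name.features.get("kind"));
    }
}
```

Storing feature values as plain objects is what lets one annotation refer to others, which is how the more complex structures mentioned above can be represented.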

A Document is processed in a series of stages, such as tokenization, sentence splitting, dictionary look-up, pattern matching, etc.  Each stage uses the Annotations placed on the Document by previous stages, and adds its own Annotations to the Document.
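
As a rough illustration of this staged processing, the sketch below runs two toy stages over a document:  a tokenizer, and a sentence splitter that relies on the token annotations left by the tokenizer.  The names Note, Doc, Stage and PipelineSketch are hypothetical and do not correspond to the Jet API, and the splitting rules (whitespace-delimited tokens, a sentence ends at any token ending in a period) are deliberately naive assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified annotation: a type plus start/end character offsets.
final class Note {
    final String type;
    final int start;
    final int end;

    Note(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

// Hypothetical, simplified document: the text plus the annotations added so far.
final class Doc {
    final String text;
    final List<Note> notes = new ArrayList<>();

    Doc(String text) {
        this.text = text;
    }

    // Returns a new list holding the annotations of the given type, in insertion order.
    List<Note> ofType(String type) {
        List<Note> result = new ArrayList<>();
        for (Note n : notes) {
            if (n.type.equals(type)) result.add(n);
        }
        return result;
    }
}

// A processing stage reads existing annotations and adds its own.
interface Stage {
    void process(Doc doc);
}

class PipelineSketch {
    // Stage 1: marks each maximal run of non-whitespace characters as a "token".
    static final Stage TOKENIZER = doc -> {
        int i = 0;
        int n = doc.text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(doc.text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(doc.text.charAt(i))) i++;
            if (i > start) doc.notes.add(new Note("token", start, i));
        }
    };

    // Stage 2: uses the "token" annotations; closes a "sentence" at each token ending in '.'.
    static final Stage SENTENCE_SPLITTER = doc -> {
        int sentenceStart = -1;
        for (Note token : doc.ofType("token")) {
            if (sentenceStart < 0) sentenceStart = token.start;
            if (doc.text.charAt(token.end - 1) == '.') {
                doc.notes.add(new Note("sentence", sentenceStart, token.end));
                sentenceStart = -1;
            }
        }
    };

    // Applies the stages in order to a fresh document.
    static Doc run(String text) {
        Doc doc = new Doc(text);
        for (Stage stage : List.of(TOKENIZER, SENTENCE_SPLITTER)) {
            stage.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = run("Jet reads text. It adds annotations.");
        for (Note s : doc.ofType("sentence")) {
            System.out.println(doc.text.substring(s.start, s.end));
        }
    }
}
```

Because the stages communicate only through annotations on the shared document, their order matters:  running the sentence splitter before the tokenizer in this sketch would produce no sentences.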

Annotations provide a mark-up capability very similar to that of SGML or XML (although Annotations do not have to be nested the way SGML/XML mark-up is).  The Document class provides a method for converting selected Annotations on a Document to XML mark-up, and in the future will have a method for converting XML mark-up to Annotations.  In addition, the Document class provides a method for viewing a Document and highlighting selected annotations (this is very primitive at present).
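
To make the correspondence between Annotations and XML mark-up concrete, here is a minimal sketch of exporting the annotations of one selected type as inline tags.  It is not the Document class's own conversion method:  Mark, XmlExportSketch and toXml are hypothetical names, annotation features are ignored, and the sketch assumes that annotations of the selected type do not overlap one another.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified annotation: a type plus start/end character offsets.
final class Mark {
    final String type;
    final int start;
    final int end;

    Mark(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

class XmlExportSketch {
    // Escapes the characters that are significant in XML character data.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps each annotation of the selected type in <type>...</type> tags.
    // Assumes annotations of that type do not overlap; other types are ignored.
    static String toXml(String text, List<Mark> marks, String type) {
        List<Mark> selected = new ArrayList<>();
        for (Mark m : marks) {
            if (m.type.equals(type)) selected.add(m);
        }
        selected.sort((a, b) -> Integer.compare(a.start, b.start));

        StringBuilder out = new StringBuilder();
        int pos = 0;  // offset of the first character not yet written
        for (Mark m : selected) {
            out.append(escape(text.substring(pos, m.start)));
            out.append('<').append(type).append('>');
            out.append(escape(text.substring(m.start, m.end)));
            out.append("</").append(type).append('>');
            pos = m.end;
        }
        out.append(escape(text.substring(pos)));
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "AT&T hired Fred Smith.";
        List<Mark> marks = new ArrayList<>();
        marks.add(new Mark("sentence", 0, 22));
        marks.add(new Mark("name", 11, 21));
        marks.add(new Mark("name", 0, 4));
        System.out.println(toXml(text, marks, "name"));
    }
}
```

Exporting one type at a time sidesteps the nesting issue noted above:  annotations of different types may cross each other's boundaries, which inline XML tags cannot express directly.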