Class Summary
Annotation | Assigns a type and a set of features to a portion of a Document.
AnnotationColor | Provides a mechanism for associating particular highlighting colors with particular annotation types in Document displays.
AnnotationTool | A tool for manually adding annotations to a Document.
CollectionAnnotationTool | A tool for displaying a collection and allowing the AnnotationTool to be invoked on documents in the collection.
CollectionView | Display of a DocumentCollection, with buttons to select views of individual Documents.
Document | A container for the text of a document and the annotations on that document.
DocumentCollection | A set of ExternalDocuments.
ExternalDocument | A Document associated with a file.
Span | A portion of a document, represented by its starting and ending character positions and a pointer to the document.
View | Displays a Document with its annotations.
The Tipster package provides the basic methods for recording information about documents. It is loosely based on the 'Tipster Architecture' developed by R. Grishman as part of the government-sponsored Tipster program. The basic objects are Documents and Annotations: a Document is a container for the text of a document and for the set of Annotations on that text.
In the course of processing, the Jet system builds up a large amount of information about the words and phrases in a Document: simple things like parts of speech for individual words and type information (person/company/location) for names, as well as more complex things like phrases and clauses (with internal structure). We want a single class of object that captures all of this information and associates it with a Document. The class used for this purpose is the Annotation. An Annotation is associated with a Span (substring) of the text of a Document, and has a type and a set of features with values. For example, an Annotation can indicate that a portion of a document is a sentence, or is a token with a given part of speech. More complex structures can be built by having Annotations whose features point to other Annotations.
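The relationship between a document, its spans, and its annotations can be illustrated with a minimal, self-contained sketch. The class and method names below (`SketchDocument`, `SketchSpan`, `SketchAnnotation`, `annotate`, `with`) are hypothetical and are not the Jet.Tipster classes or their signatures; the sketch also omits the pointer from a span back to its document. It only shows the idea of typed annotations over character ranges, with features whose values may be other annotations.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model for illustration; NOT the Jet.Tipster API.
final class AnnotationModelSketch {

    /** A half-open character range [start, end) within a document's text. */
    static final class SketchSpan {
        final int start;
        final int end;

        SketchSpan(int start, int end) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("invalid span");
            }
            this.start = start;
            this.end = end;
        }
    }

    /** A type plus named features, attached to a span of text. */
    static final class SketchAnnotation {
        final String type;
        final SketchSpan span;
        final Map<String, Object> features = new LinkedHashMap<>();

        SketchAnnotation(String type, SketchSpan span) {
            this.type = type;
            this.span = span;
        }

        /** Adds a feature and returns this annotation so calls can be chained. */
        SketchAnnotation with(String name, Object value) {
            features.put(name, value);
            return this;
        }
    }

    /** Holds the text and every annotation placed on it. */
    static final class SketchDocument {
        final String text;
        final List<SketchAnnotation> annotations = new ArrayList<>();

        SketchDocument(String text) {
            this.text = text;
        }

        SketchAnnotation annotate(String type, int start, int end) {
            if (end > text.length()) {
                throw new IllegalArgumentException("span exceeds text length");
            }
            SketchAnnotation a = new SketchAnnotation(type, new SketchSpan(start, end));
            annotations.add(a);
            return a;
        }

        String textOf(SketchAnnotation a) {
            return text.substring(a.span.start, a.span.end);
        }
    }

    static SketchDocument example() {
        SketchDocument doc = new SketchDocument("Fred Smith joined IBM.");
        SketchAnnotation person = doc.annotate("name", 0, 10).with("kind", "person");
        SketchAnnotation verb = doc.annotate("token", 11, 17).with("pos", "verb");
        SketchAnnotation company = doc.annotate("name", 18, 21).with("kind", "company");
        // A more complex structure: a clause whose features point to other annotations.
        doc.annotate("clause", 0, 21)
           .with("subject", person)
           .with("head", verb)
           .with("object", company);
        return doc;
    }

    public static void main(String[] args) {
        SketchDocument doc = example();
        for (SketchAnnotation a : doc.annotations) {
            System.out.println(a.type + " [" + a.span.start + "," + a.span.end + ") \""
                    + doc.textOf(a) + "\" features=" + a.features.keySet());
        }
    }
}
```

Storing feature values as plain objects is one simple way to let an annotation refer to other annotations; a real implementation may well use a dedicated feature-set type instead.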
A Document is processed in a series of stages, such as tokenization, sentence splitting, dictionary look-up, and pattern matching. Each stage uses the Annotations placed on the Document by previous stages, and adds its own Annotations to the Document.
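The staged processing described above can be sketched as a list of stages applied in order to one shared document. Everything here is an assumption made for illustration: the `Stage` interface, the `Doc` and `Ann` classes, and the deliberately naive tokenizer and sentence splitter are hypothetical and do not reflect how Jet implements its stages. The point is only that the second stage reads the "token" annotations written by the first and adds "sentence" annotations of its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pipeline for illustration; NOT the Jet implementation.
final class StagePipelineSketch {

    static final class Ann {
        final String type;
        final int start;
        final int end;

        Ann(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    static final class Doc {
        final String text;
        final List<Ann> anns = new ArrayList<>();

        Doc(String text) {
            this.text = text;
        }

        /** Returns a copy, so a stage can add annotations while iterating. */
        List<Ann> ofType(String type) {
            List<Ann> result = new ArrayList<>();
            for (Ann a : anns) {
                if (a.type.equals(type)) {
                    result.add(a);
                }
            }
            return result;
        }
    }

    interface Stage {
        void process(Doc doc);
    }

    /** Stage 1: one "token" annotation per run of non-whitespace characters. */
    static final class Tokenizer implements Stage {
        public void process(Doc doc) {
            int i = 0;
            int n = doc.text.length();
            while (i < n) {
                while (i < n && Character.isWhitespace(doc.text.charAt(i))) {
                    i++;
                }
                int start = i;
                while (i < n && !Character.isWhitespace(doc.text.charAt(i))) {
                    i++;
                }
                if (i > start) {
                    doc.anns.add(new Ann("token", start, i));
                }
            }
        }
    }

    /** Stage 2: reads "token" annotations; a token ending in . ? or ! closes a sentence. */
    static final class SentenceSplitter implements Stage {
        public void process(Doc doc) {
            int sentenceStart = -1;
            int lastEnd = -1;
            for (Ann token : doc.ofType("token")) {
                if (sentenceStart < 0) {
                    sentenceStart = token.start;
                }
                lastEnd = token.end;
                char last = doc.text.charAt(token.end - 1);
                if (last == '.' || last == '?' || last == '!') {
                    doc.anns.add(new Ann("sentence", sentenceStart, token.end));
                    sentenceStart = -1;
                }
            }
            if (sentenceStart >= 0) {
                // Trailing tokens without closing punctuation still form a sentence.
                doc.anns.add(new Ann("sentence", sentenceStart, lastEnd));
            }
        }
    }

    static Doc run(String text, List<Stage> stages) {
        Doc doc = new Doc(text);
        for (Stage stage : stages) {
            stage.process(doc);
        }
        return doc;
    }

    static Doc demo() {
        return run("Jet reads text. It adds annotations.",
                Arrays.<Stage>asList(new Tokenizer(), new SentenceSplitter()));
    }

    public static void main(String[] args) {
        Doc doc = demo();
        for (Ann a : doc.anns) {
            System.out.println(a.type + ": " + doc.text.substring(a.start, a.end));
        }
    }
}
```

Swapping the order of the two stages in this sketch would produce no sentences at all, because the splitter would find no tokens to read; that dependency on earlier annotations is what the paragraph above describes.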
Annotations provide a mark-up capability very similar to that of SGML or XML (although Annotations do not have to be nested the way SGML/XML mark-up is). The Document class provides a method for converting selected Annotations on a Document to XML mark-up, and in the future will have a method for converting XML mark-up to Annotations. In addition, the Document class provides a method for viewing a Document and highlighting selected Annotations (this is very primitive at present).
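As a rough illustration of converting selected annotations to inline XML, the sketch below writes start and end tags around the spans of a single chosen annotation type. It is not the Document class's conversion method: the names `InlineMarkupSketch`, `Ann`, and `toXml` are hypothetical, the annotation type is assumed to be a valid XML element name, and the annotations of the chosen type are assumed not to overlap one another (the sketch rejects input where they do). Restricting output to one type is one simple way to sidestep the fact that annotations in general need not nest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical converter for illustration; NOT the Jet.Tipster Document method.
final class InlineMarkupSketch {

    static final class Ann {
        final String type;
        final int start;
        final int end;

        Ann(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Escapes the characters that would otherwise be read as mark-up. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps each annotation of the selected type in <type>...</type> tags. */
    static String toXml(String text, List<Ann> anns, String selectedType) {
        List<Ann> selected = new ArrayList<>();
        for (Ann a : anns) {
            if (a.type.equals(selectedType)) {
                selected.add(a);
            }
        }
        selected.sort(Comparator.comparingInt((Ann a) -> a.start));

        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Ann a : selected) {
            if (a.start < pos) {
                throw new IllegalArgumentException(
                        "overlapping '" + selectedType + "' annotations are not supported");
            }
            out.append(escape(text.substring(pos, a.start)));
            out.append('<').append(a.type).append('>');
            out.append(escape(text.substring(a.start, a.end)));
            out.append("</").append(a.type).append('>');
            pos = a.end;
        }
        out.append(escape(text.substring(pos)));
        return out.toString();
    }

    static String demo() {
        String text = "AT&T hired Fred Smith.";
        List<Ann> anns = Arrays.asList(
                new Ann("name", 11, 21),   // "Fred Smith"
                new Ann("token", 5, 10),   // "hired" (not selected below)
                new Ann("name", 0, 4));    // "AT&T"
        return toXml(text, anns, "name");
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // prints: <name>AT&amp;T</name> hired <name>Fred Smith</name>.
    }
}
```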