Jet.Tipster
Class Document

java.lang.Object
  extended by Jet.Tipster.Document
Direct Known Subclasses:
ExternalDocument

public class Document
extends java.lang.Object

Document provides a container for the text of a document and the annotations on a document.


Constructor Summary
Document()
          Creates a new document with no text or annotations.
Document(java.lang.String stg)
          Creates a new document with text stg and no annotations.
 
Method Summary
 Annotation addAnnotation(Annotation ann)
          Adds an annotation to the document.
 Annotation annotate(java.lang.String tp, Span sp, FeatureSet att)
          Creates an annotation and adds it to the document.
 void annotateWithTag(java.lang.String tag)
          Annotates the document with the Span of text between <tag> and </tag>.
 void annotateWithTag(java.lang.String tag, int start, int end)
          Annotates the document with the Span of text between <tag> and </tag>, searching between positions start and end.
 java.util.Vector annotationsAt(int start)
          Returns the annotations beginning at character position start.
 java.util.Vector annotationsAt(int start, java.lang.String type)
          Returns the annotations of type type beginning at character position start.
 java.util.Vector annotationsOfType(java.lang.String type)
          Returns a vector of all annotations of type type.
 java.util.Vector annotationsOfType(java.lang.String type, Span span)
          Returns a vector of all annotations of type type whose span is contained within span.
 java.lang.StringBuffer append(char c)
          Adds the char c to the end of the document.
 java.lang.StringBuffer append(java.lang.String stg)
          Adds the text stg to the end of the document.
 char charAt(int posn)
          Returns the character at position posn in the document.
 void clear()
          Deletes the text and all annotations on a document, creating an empty document.
 void clearAnnotations()
          Removes all annotations on the document.
 java.lang.String[] getAnnotationTypes()
          Returns an array of all annotation types.
 int getNextAnnotationId()
          Returns a unique integer for this Document, to be used in assigning an 'id' feature to an Annotation on this Document.
 int length()
          Returns the length of the document (in characters).
static void main(java.lang.String[] args)
           
 java.lang.String normalizedText(Annotation ann)
          Returns the text subsumed by annotation ann, with leading and trailing whitespace removed, and other whitespace sequences replaced by a single blank.
 java.lang.String normalizedText(Span s)
          Returns the text subsumed by span s, with leading and trailing whitespace removed, and other whitespace sequences replaced by a single blank.
 void removeAnnotation(Annotation ann)
          Removes annotation ann from the document.
 void removeAnnotationsOfType(java.lang.String type)
          Removes all annotations of type 'type' from the document.
 void setCharAt(int posn, char c)
          Sets the character at position posn to c.
 void setSGMLindent(int n)
          Sets the amount to indent SGML tags, per level of tag nesting, for writeSGML.
 void setSGMLwrapMargin(int n)
          Sets the right margin for wrapping (inserting newlines into) SGML tags.
 void setText(java.lang.String stg)
          Sets the text of a document.
 java.lang.String text()
          Returns the entire text of the document.
 java.lang.String text(Annotation ann)
          Returns the text subsumed by annotation ann.
 java.lang.String text(Span s)
          Returns the text subsumed by span s.
 Annotation tokenAt(int start)
          Returns the token annotation starting at position start, or null if no token starts at this position.
 java.lang.StringBuffer writeSGML(java.lang.String type)
          Returns the text of the document with each instance of an annotation of type type enclosed in SGML tags.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Document

public Document()
Creates a new document with no text or annotations.


Document

public Document(java.lang.String stg)
Creates a new document with text stg and no annotations.

Method Detail

clear

public void clear()
Deletes the text and all annotations on a document, creating an empty document.


setText

public void setText(java.lang.String stg)
Sets the text of a document. Warning: this should not be done if the document has annotations.
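
The source does not state the reason for this warning. One plausible reading, assuming annotations record character offsets into the text (as the position-based methods such as annotationsAt suggest), is that replacing the text leaves existing offsets pointing at unrelated characters. The sketch below illustrates that effect with plain JDK strings only; StaleOffsetDemo and covered are hypothetical names and are not part of Jet.

```java
// Illustration only: plain JDK strings stand in for a Jet Document.
final class StaleOffsetDemo {

    // Returns the characters in [start, end), as a span over the text would.
    static String covered(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        int start = 3, end = 6;               // offsets recorded for one word
        String before = "My cat is sleeping.";
        String after = "A dog is sleeping.";  // text replaced, offsets kept
        System.out.println("[" + covered(before, start, end) + "]");
        System.out.println("[" + covered(after, start, end) + "]");
    }
}
```

Under this assumption, a safer order is to call clearAnnotations() (or clear()) before setting new text.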


text

public java.lang.String text()
Returns the entire text of the document.


text

public java.lang.String text(Span s)
Returns the text subsumed by span s.


text

public java.lang.String text(Annotation ann)
Returns the text subsumed by annotation ann.


normalizedText

public java.lang.String normalizedText(Span s)
Returns the text subsumed by span s, with leading and trailing whitespace removed, and other whitespace sequences replaced by a single blank.


normalizedText

public java.lang.String normalizedText(Annotation ann)
Returns the text subsumed by annotation ann, with leading and trailing whitespace removed, and other whitespace sequences replaced by a single blank.
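
The normalization described for both normalizedText methods can be approximated with the JDK alone. This is a minimal sketch, not Jet's implementation: WhitespaceNormalizer is a hypothetical name, and it assumes that "whitespace" means the characters matched by the regular expression class \s.

```java
// Sketch of the documented normalization, using only the JDK.
final class WhitespaceNormalizer {

    // Collapses every run of whitespace to one blank, then drops the
    // blank left at either end of the string.
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("  My \n cat\tis   sleeping. "));
    }
}
```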


append

public java.lang.StringBuffer append(java.lang.String stg)
Adds the text stg to the end of the document.


append

public java.lang.StringBuffer append(char c)
Adds the char c to the end of the document.


length

public int length()
Returns the length of the document (in characters).


charAt

public char charAt(int posn)
Returns the character at position posn in the document.


setCharAt

public void setCharAt(int posn,
                      char c)
Sets the character at position posn to c.


clearAnnotations

public void clearAnnotations()
Removes all annotations on the document.


addAnnotation

public Annotation addAnnotation(Annotation ann)
Adds an annotation to the document.


annotate

public Annotation annotate(java.lang.String tp,
                           Span sp,
                           FeatureSet att)
Creates an annotation and adds it to the document.


removeAnnotation

public void removeAnnotation(Annotation ann)
Removes annotation ann from the document. Does nothing if ann is not an annotation on the document.


removeAnnotationsOfType

public void removeAnnotationsOfType(java.lang.String type)
Removes all annotations of type 'type' from the document.


annotationsAt

public java.util.Vector annotationsAt(int start)
Returns the annotations beginning at character position start. Returns null if there are no annotations starting at this position.


annotationsAt

public java.util.Vector annotationsAt(int start,
                                      java.lang.String type)
Returns the annotations of type type beginning at character position start. If there are no annotations of this type, returns null.
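
Because both annotationsAt methods return null rather than an empty Vector when nothing matches, callers need a null check before iterating. The helper below is one way to centralize that check. It is a sketch using only the JDK; NullSafe and orEmpty are hypothetical names that do not exist in Jet.

```java
import java.util.Vector;

// Hypothetical helper for APIs that return null to mean "no results".
final class NullSafe {

    // Returns v unchanged, or a new empty Vector when v is null, so the
    // caller can always iterate over the result.
    static <T> Vector<T> orEmpty(Vector<T> v) {
        return (v == null) ? new Vector<T>() : v;
    }

    public static void main(String[] args) {
        Vector<String> none = null;
        for (String s : orEmpty(none)) {
            System.out.println(s);  // never reached: the vector is empty
        }
        System.out.println(orEmpty(none).size());
    }
}
```

A call site could then read `for (Object a : NullSafe.orEmpty(doc.annotationsAt(0)))`, where doc is a Document. Since the documented return type is a raw Vector, the compiler may report an unchecked warning there.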


tokenAt

public Annotation tokenAt(int start)
Returns the token annotation starting at position start, or null if no token starts at this position.


annotationsOfType

public java.util.Vector annotationsOfType(java.lang.String type)
Returns a vector of all annotations of type type. Returns null if there are no annotations of this type.


annotationsOfType

public java.util.Vector annotationsOfType(java.lang.String type,
                                          Span span)
Returns a vector of all annotations of type type whose span is contained within span. Returns null if there are no such annotations.


getAnnotationTypes

public java.lang.String[] getAnnotationTypes()
Returns an array of all annotation types. Returns null if there are no annotation types. Warning: do not modify the returned array. Doing so can affect the annotations stored on a document.
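
Given the warning above, a caller that wants to reorder or otherwise change the list of types can work on a copy. The sketch below uses only the JDK; TypeListCopy and sortedCopy are hypothetical names, and treating a null result as an empty array is a convenience chosen here, not Jet behavior.

```java
import java.util.Arrays;

// Hypothetical helper: work on a copy so the caller never alters the
// array handed back by the document.
final class TypeListCopy {

    // Returns a sorted copy of types; the input array is left untouched.
    // A null input (no annotation types) yields an empty array.
    static String[] sortedCopy(String[] types) {
        if (types == null) {
            return new String[0];
        }
        String[] copy = Arrays.copyOf(types, types.length);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] types = { "token", "sentence" };
        System.out.println(Arrays.toString(sortedCopy(types)));
        System.out.println(Arrays.toString(types));
    }
}
```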


annotateWithTag

public void annotateWithTag(java.lang.String tag,
                            int start,
                            int end)
Annotates the document with the Span of text between <tag> and </tag>. Sets the type of the annotation to the tag name.

Parameters:
tag - name of the tag whose enclosed text is to be annotated
start - position at which to start searching for the tag
end - position at which to stop searching for the tag
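
To make the roles of tag, start, and end concrete, the sketch below locates the text between an opening and a closing tag inside a character range. It uses only the JDK and is not Jet's code: TagFinder and find are hypothetical names, and the sketch assumes that tags carry no attributes, that only the first matching pair is wanted, and that the range is half-open. The source does not say how Jet handles any of these cases, or whether the tags themselves remain in the document text.

```java
// Sketch: find the text enclosed by <tag> and </tag> within [start, end).
final class TagFinder {

    // Returns {contentStart, contentEnd} for the first <tag>...</tag> pair
    // that lies entirely within [start, end), or null if there is none.
    static int[] find(String text, String tag, int start, int end) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int openAt = text.indexOf(open, start);
        if (openAt < 0) {
            return null;
        }
        int contentStart = openAt + open.length();
        int closeAt = text.indexOf(close, contentStart);
        if (closeAt < 0 || closeAt + close.length() > end) {
            return null;
        }
        return new int[] { contentStart, closeAt };
    }

    public static void main(String[] args) {
        String text = "<doc><title>Hi</title></doc>";
        int[] span = find(text, "title", 0, text.length());
        System.out.println(text.substring(span[0], span[1]));
    }
}
```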

annotateWithTag

public void annotateWithTag(java.lang.String tag)
Annotates the document with the Span of text between <tag> and </tag>. Sets the type of the annotation to the tag name.

Parameters:
tag - name of the tag whose enclosed text is to be annotated

getNextAnnotationId

public int getNextAnnotationId()
Returns a unique integer for this Document, to be used in assigning an 'id' feature to an Annotation on this Document.


setSGMLwrapMargin

public void setSGMLwrapMargin(int n)
Sets the right margin used by writeSGML when wrapping (inserting newlines into) SGML tags. A value of 0 disables wrapping. The initial value is 80.


setSGMLindent

public void setSGMLindent(int n)
Sets the amount to indent SGML tags, per level of tag nesting, in writeSGML. A value of 0 disables indentation.


writeSGML

public java.lang.StringBuffer writeSGML(java.lang.String type)
Returns the text of the document with each instance of an annotation of type type enclosed in SGML tags. If type == null, all annotations are included.

In determining the span of each annotation, endNoWS is used, so the SGML tags are placed around the text with any trailing white space removed. Thus if, in the sentence "My cat is sleeping.", there is a token annotation whose span is "cat " (including the trailing blank), writeSGML will generate "My <token>cat</token> is sleeping."
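
The trailing-whitespace rule can be sketched with plain strings. This is an illustration only, not writeSGML itself: SgmlTagSketch, trimmedEnd, and wrap are hypothetical names, spans are assumed to be half-open character ranges, and whitespace is assumed to mean Character.isWhitespace.

```java
// Sketch: place SGML tags around a span, leaving trailing whitespace
// outside the closing tag.
final class SgmlTagSketch {

    // Moves end left past any trailing whitespace, never before start.
    static int trimmedEnd(String text, int start, int end) {
        while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return end;
    }

    // Wraps the characters in [start, end), minus trailing whitespace,
    // in <type> and </type>.
    static String wrap(String text, int start, int end, String type) {
        int e = trimmedEnd(text, start, end);
        return text.substring(0, start)
                + "<" + type + ">" + text.substring(start, e) + "</" + type + ">"
                + text.substring(e);
    }

    public static void main(String[] args) {
        // The span 3..7 covers "cat " including the trailing blank.
        System.out.println(wrap("My cat is sleeping.", 3, 7, "token"));
        // prints: My <token>cat</token> is sleeping.
    }
}
```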

A Jet Document may contain annotations that are not nested, but these cannot be represented in SGML or XML. If the endpoint of an annotation is greater than the endpoint of a preceding annotation that is still open, the annotation is not written out.
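
The nesting rule can be sketched with a stack of open endpoints. The code below is a minimal illustration under stated assumptions, not Jet's implementation: NestingFilter and keepNested are hypothetical names, spans are represented as {start, end} int pairs, and the input is assumed to be sorted by start position with the longer span first on ties. The source does not describe the order in which writeSGML actually visits annotations.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Sketch: keep only the spans that nest properly inside spans already open.
final class NestingFilter {

    // spans: {start, end} pairs sorted by start (longer span first on ties).
    static List<int[]> keepNested(List<int[]> spans) {
        Deque<Integer> openEnds = new ArrayDeque<>();  // innermost end on top
        List<int[]> kept = new ArrayList<>();
        for (int[] s : spans) {
            // Close every open span that ends at or before this start.
            while (!openEnds.isEmpty() && openEnds.peek() <= s[0]) {
                openEnds.pop();
            }
            // Skip a span that would end after the innermost open span.
            if (!openEnds.isEmpty() && s[1] > openEnds.peek()) {
                continue;
            }
            openEnds.push(s[1]);
            kept.add(s);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<int[]> spans = Arrays.asList(
                new int[] { 0, 10 }, new int[] { 2, 5 },
                new int[] { 4, 12 }, new int[] { 6, 8 });
        for (int[] s : keepNested(spans)) {
            System.out.println(s[0] + "-" + s[1]);
        }
    }
}
```

In this input the span 4-12 starts inside 2-5 but ends after it, so it is the one left out.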


main

public static void main(java.lang.String[] args)