Jet.Scorer
Class SGMLProcessor

java.lang.Object
  extended by Jet.Scorer.SGMLProcessor

public class SGMLProcessor
extends java.lang.Object

Methods for converting SGML markup into Annotations. Blanks are not allowed within tags, except for a single whitespace character before a feature name. Feature values may be enclosed in single quotes (') or double quotes ("); values that contain neither whitespace nor a close tag bracket (>) need not be enclosed in quotes. Only limited error checking is done.
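The tag syntax described above can be illustrated with a minimal, self-contained sketch. It is not the SGMLProcessor implementation: the class TagSyntaxSketch, the method parseFeatures, and the sample tag are hypothetical names introduced only for this example, and the error handling is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the tag syntax rules; not part of Jet.
public class TagSyntaxSketch {

    /** Parses the inside of an open tag, e.g. ENAMEX TYPE=PERSON ID='7' (without the angle brackets). */
    public static Map<String, String> parseFeatures(String tagBody) {
        Map<String, String> features = new LinkedHashMap<>();
        int n = tagBody.length();
        int i = 0;
        // The tag type runs up to the first whitespace character.
        while (i < n && !Character.isWhitespace(tagBody.charAt(i))) i++;
        while (i < n) {
            i++; // skip the single whitespace character before a feature name
            int eq = tagBody.indexOf('=', i);
            if (eq < 0) break; // assumption: silently ignore trailing text without '='
            String name = tagBody.substring(i, eq);
            i = eq + 1;
            String value;
            if (i < n && (tagBody.charAt(i) == '\'' || tagBody.charAt(i) == '"')) {
                // Quoted value: runs to the matching quote and may contain whitespace.
                char quote = tagBody.charAt(i);
                int close = tagBody.indexOf(quote, i + 1);
                if (close < 0) close = n; // assumption: unterminated quote takes the rest
                value = tagBody.substring(i + 1, close);
                i = close + 1;
            } else {
                // Unquoted value: runs to the next whitespace character.
                int start = i;
                while (i < n && !Character.isWhitespace(tagBody.charAt(i))) i++;
                value = tagBody.substring(start, i);
            }
            features.put(name, value);
        }
        return features;
    }

    public static void main(String[] args) {
        // prints {TYPE=PERSON, NAME=John Smith, ID=7}
        System.out.println(parseFeatures("ENAMEX TYPE=PERSON NAME=\"John Smith\" ID='7'"));
    }
}
```

The value John Smith has to be quoted because it contains whitespace; PERSON and 7 would parse the same with or without quotes.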


Field Summary
static boolean allTags
          if true, all tags will be converted to Annotations.
static java.lang.String[] emptyTags
          a list of tags which do not have corresponding close tags and so are to be converted to empty Annotations.
static boolean includeWhitespace
          if true, whitespace following an end tag is included in the span assigned to the annotation.
 
Constructor Summary
SGMLProcessor()
           
 
Method Summary
static void dereference(Document doc)
          convert all references to Annotations appearing as features of other annotations from their string form ("#nnnn", where nnnn is the id of the Annotation being referenced) to actual pointers to Annotations.
static Document sgmlToDoc(Document doc, java.lang.String tag)
          Takes a Document doc whose text contains SGML markup; deletes all existing annotations and returns the doc with tags of type tag removed from the text and annotations of type tag added to the document.
static Document sgmlToDoc(Document doc, java.lang.String[] tags)
           
static Document sgmlToDoc(Document doc, java.lang.String sgmlText, java.lang.String tag)
           
static Document sgmlToDoc(Document doc, java.lang.String sgmlText, java.lang.String[] tags)
           
static Document sgmlToDoc(java.lang.String sgmlText, java.lang.String tag)
          Converts an SGML-marked String sgmlText to a Document instance with tags of type tag removed from the text and annotations of type tag added to the document.
static Document sgmlToDoc(java.lang.String sgmlText, java.lang.String[] tags)
          Converts an SGML-marked String sgmlText to a Document instance with tags of the types listed in tags removed from the text and annotations of those types added to the document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

includeWhitespace

public static boolean includeWhitespace
if true, whitespace following an end tag is included in the span assigned to the annotation.
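
As a rough illustration of what this flag changes, the sketch below extends a span end over the whitespace that follows it. It is an assumption-laden stand-in, not Jet code: the class TrailingWhitespaceSketch and the method spanEnd are hypothetical, and the boolean parameter only mirrors the field of the same name.

```java
// Hypothetical illustration of the includeWhitespace behaviour; not part of Jet.
public class TrailingWhitespaceSketch {

    /**
     * Returns the end offset of a span over text. When includeWhitespace is true,
     * the end is advanced past any whitespace that directly follows the span.
     */
    public static int spanEnd(String text, int end, boolean includeWhitespace) {
        if (!includeWhitespace) return end;
        while (end < text.length() && Character.isWhitespace(text.charAt(end))) end++;
        return end;
    }

    public static void main(String[] args) {
        String text = "Ann  Smith";                    // "Ann" occupies offsets 0..3
        System.out.println(spanEnd(text, 3, false));   // prints 3
        System.out.println(spanEnd(text, 3, true));    // prints 5
    }
}
```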


allTags

public static boolean allTags
if true, all tags will be converted to Annotations.


emptyTags

public static java.lang.String[] emptyTags
a list of tags which do not have corresponding close tags and so are to be converted to empty Annotations.

Constructor Detail

SGMLProcessor

public SGMLProcessor()

Method Detail

sgmlToDoc

public static Document sgmlToDoc(java.lang.String sgmlText,
                                 java.lang.String tag)
Converts an SGML-marked String sgmlText to a Document instance with tags of type tag removed from the text and annotations of type tag added to the document.

Tags should have the exact form of <type [feature=value]*> or </type>.

Parameters:
sgmlText - the SGML-marked text to convert
tag - type of tag
Returns:
the document, with tags of type tag removed from the text and annotations of type tag added.
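
The conversion described for this method can be sketched without Jet as follows. This is an illustration under stated assumptions, not the library's code: SgmlStripSketch, Span, Result, and strip are hypothetical names, tags of other types are assumed to stay in the text, and a '>' inside a quoted feature value is not handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of stripping one tag type and recording spans; not part of Jet.
public class SgmlStripSketch {

    /** Stand-in for an annotation: a type plus a [start, end) span over the cleaned text. */
    public static final class Span {
        public final String type;
        public final int start;
        public final int end;
        Span(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** The cleaned text together with the spans found in it. */
    public static final class Result {
        public final String text;
        public final List<Span> spans;
        Result(String text, List<Span> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    public static Result strip(String sgmlText, String tag) {
        StringBuilder clean = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        Deque<Integer> openStarts = new ArrayDeque<>(); // start offsets of unclosed open tags
        int i = 0;
        while (i < sgmlText.length()) {
            // Simplification: the tag ends at the first '>' even if it sits inside quotes.
            int close = sgmlText.charAt(i) == '<' ? sgmlText.indexOf('>', i) : -1;
            if (close > 0) {
                String body = sgmlText.substring(i + 1, close);
                if (body.equals("/" + tag)) {
                    // Close tag: pair it with the most recent open tag, if any.
                    if (!openStarts.isEmpty()) {
                        spans.add(new Span(tag, openStarts.pop(), clean.length()));
                    }
                    i = close + 1;
                    continue;
                }
                if (body.equals(tag) || body.startsWith(tag + " ")) {
                    // Open tag, with or without features: remember where its content starts.
                    openStarts.push(clean.length());
                    i = close + 1;
                    continue;
                }
            }
            clean.append(sgmlText.charAt(i)); // ordinary text, or a tag of another type
            i++;
        }
        return new Result(clean.toString(), spans);
    }

    public static void main(String[] args) {
        Result r = strip("Hello <NAME ID=1>Ann</NAME>.", "NAME");
        System.out.println(r.text);                                    // prints Hello Ann.
        Span s = r.spans.get(0);
        System.out.println(s.type + " " + s.start + " " + s.end);      // prints NAME 6 9
    }
}
```

The offsets are measured against the cleaned text, so the span 6..9 covers exactly "Ann" once the markup is gone.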

sgmlToDoc

public static Document sgmlToDoc(Document doc,
                                 java.lang.String tag)
Takes a Document doc whose text contains SGML markup; deletes all existing annotations and returns the doc with tags of type tag removed from the text and annotations of type tag added to the document.

Tags should have the exact form of <type [feature=value]*> or </type>.

Parameters:
doc - the Document whose text contains SGML markup
tag - type of tag
Returns:
the document, with tags of type tag removed from the text and annotations of type tag added.

sgmlToDoc

public static Document sgmlToDoc(Document doc,
                                 java.lang.String sgmlText,
                                 java.lang.String tag)

sgmlToDoc

public static Document sgmlToDoc(java.lang.String sgmlText,
                                 java.lang.String[] tags)
Converts an SGML-marked String sgmlText to a Document instance with tags of the types listed in tags removed from the text and annotations of those types added to the document.

Tags should have the exact form of <type [feature=value]*> or </type>.

Parameters:
sgmlText - the SGML-marked text to convert
tags - array of tag types
Returns:
the document, with tags of the listed types removed from the text and annotations of those types added.

sgmlToDoc

public static Document sgmlToDoc(Document doc,
                                 java.lang.String[] tags)

sgmlToDoc

public static Document sgmlToDoc(Document doc,
                                 java.lang.String sgmlText,
                                 java.lang.String[] tags)

dereference

public static void dereference(Document doc)
convert all references to Annotations appearing as features of other annotations from their string form ("#nnnn", where nnnn is the id of the Annotation being referenced) to actual pointers to Annotations.
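
The idea of replacing "#nnnn" strings with object references can be sketched independently of Jet. Everything named here is hypothetical (DereferenceSketch, Node, its id and features fields); how Jet itself stores annotation ids is not stated in this page, and leaving unresolved references as strings is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of resolving "#nnnn" references; not part of Jet.
public class DereferenceSketch {

    /** Stand-in for an annotation: an id plus a feature map. */
    public static final class Node {
        public final int id;
        public final Map<String, Object> features = new HashMap<>();
        public Node(int id) {
            this.id = id;
        }
    }

    /** Replaces every feature value of the form "#nnnn" with the Node whose id is nnnn. */
    public static void dereference(List<Node> nodes) {
        // First pass: index all nodes by id.
        Map<Integer, Node> byId = new HashMap<>();
        for (Node n : nodes) {
            byId.put(n.id, n);
        }
        // Second pass: swap string references for the nodes they name.
        for (Node n : nodes) {
            for (Map.Entry<String, Object> e : n.features.entrySet()) {
                Object v = e.getValue();
                if (v instanceof String && ((String) v).matches("#\\d{1,9}")) {
                    Node target = byId.get(Integer.parseInt(((String) v).substring(1)));
                    if (target != null) {
                        e.setValue(target); // assumption: unknown ids are left as strings
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        Node a = new Node(1);
        Node b = new Node(2);
        b.features.put("antecedent", "#1");
        dereference(Arrays.asList(a, b));
        System.out.println(b.features.get("antecedent") == a); // prints true
    }
}
```

Two passes are used so that a reference can point to a node that appears later in the list.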