Package Jet.Pat

The Pat package encapsulates the basic pattern application mechanism of Jet: sets of pattern/action rules which can be applied to a document to add or modify annotations on the document.  The external form of the pattern language is described below;  the classes used to encode these patterns are summarized separately.

Class Summary

Class                       Description
Action                      The representation of an action as part of a when statement in a pattern collection.
AddFeaturesAction           The action (in a when statement) for adding features to an existing Annotation.
AnnotationPatternElement    A pattern element which matches an annotation.
AssignmentPatternElement    A pattern element which assigns a value (a String or integer) to a pattern variable.
AtomicPatternElement        Abstract class for all PatternElements which do not contain embedded references to other PatternElements.
FeatureTest                 Representation of a condition on a feature value, represented in the pattern language by ? predicate (argument).
FinalPatternNode            A node in the graph representation of a pattern set, representing the end of a path, associated with a set of actions to be performed if that node is reached in pattern matching.
GetEndPatternElement
GetStartPatternElement      A pattern element, in the graph representation of a pattern, which binds a variable to the current position in the document being matched.
Id
IntegerPatternElement       A pattern element which matches an integer token.
InternalPatternNode         A non-final node in the graph representation of a pattern set (a node with outgoing arcs).
NewAnnotationAction         The action (in a when statement) for creating a new annotation on a Document.
NullPatternElement          A pattern element which always succeeds.
Pat                         Contains static procedures used in pattern matching.
PatternAlternation          A pattern element for recording an alternation of patterns (A | B | C).
PatternApplication          Records information about the matching of a pattern graph against a segment of a Document.
PatternArc                  An arc in the graph representation of a pattern.
PatternCollection
PatternElement
PatternGraph
PatternGraphView            A view of the pattern as a tree.
PatternNode                 A node in the graph representation of a pattern set.
PatternReference            An element in a pattern which stands for a reference to another pattern.
PatternRepetition           A pattern element for representing an optional or repeated pattern: A? (zero or one instance of A), A* (zero or more instances of A), or A+ (one or more instances of A).
PatternRule                 Internal representation of a when statement in a pattern file, indicating that when pattern patternName is matched, actions should be performed.
PatternSequence             A sequence of pattern elements which are to be matched in succession, to successive portions of a document.
PatternSet                  A set of pattern-action rules which are applied together when processing a document.
PatternView                 A view of the pattern as a tree.
PrintAction                 The encoding of the "print message" action, where message is a StringExpression (one or more strings or variables).
SpanBindingPatternElement   A pattern construct which binds a variable to the span matched by a pattern element.
StringExpression            A sequence of strings and variables, used as the argument to the "print" and "write" actions.
TokenStringPatternElement   A pattern element which matches a specific word.
UndefinedCapPatternElement
WriteAction                 The encoding of the "write message" action, where message is a StringExpression (one or more strings or variables).

Exception Summary
PatternSyntaxError  

Package Jet.Pat Description

Pattern Language

The external form of a pattern collection, as it appears on a file, is a sequence of pattern statements, where each pattern statement is terminated by a semicolon.  The basic language has two statement types, a pattern definition statement and a when statement, which indicates the action to be performed when a pattern is matched.

Pattern Definition

A pattern definition has the form
pattern-name := option1 | option2 | ... ;
where pattern-name is a sequence of letters beginning with a lower-case letter, and each option is a sequence of repeated pattern elements separated by spaces.  A repeated pattern element has one of the forms
pattern-element
pattern-element ?
pattern-element *
pattern-element +
to indicate, respectively, exactly one, zero or one, zero or more, or one or more instances of pattern-element.  A pattern-element, in turn, may be
a string:  "quack"
an annotation:  [type feature=value  feature=value ...]
the name of another pattern
an alternation:  ( option1 | option2 | ... )
an assignment pattern element:  variable = value
A string pattern element matches an annotation of type token spanning the specified string.  An annotation pattern element matches an annotation which has the specified type and features (and may have additional features).
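The repetition operators and alternation can be illustrated with a small matcher.  This is a minimal sketch under simplifying assumptions, not Jet's implementation (per the class summary, Jet compiles patterns into a graph of PatternNode and PatternArc objects).  The document is modeled as a plain list of token strings.  Only string elements, sequences, alternations, and repetitions are covered.  The names Elem, Word, Seq, Alt, and Rep are hypothetical.  Each element returns the set of token positions at which a match beginning at start can end, so every way of matching is kept.  Records are used, so Java 16 or later is assumed.

```java
import java.util.*;

// Hypothetical model: a pattern element maps a start position to the set of
// token positions where a match of that element can end.
interface Elem {
    Set<Integer> match(List<String> tokens, int start);
}

// A string element such as "quack": matches one token spelling exactly that word.
record Word(String text) implements Elem {
    public Set<Integer> match(List<String> tokens, int start) {
        if (start < tokens.size() && tokens.get(start).equals(text)) return Set.of(start + 1);
        return Set.of();
    }
}

// A sequence: each element must match starting where the previous one ended.
record Seq(List<Elem> elems) implements Elem {
    public Set<Integer> match(List<String> tokens, int start) {
        Set<Integer> ends = Set.of(start);
        for (Elem e : elems) {
            Set<Integer> next = new TreeSet<>();
            for (int p : ends) next.addAll(e.match(tokens, p));
            ends = next;
        }
        return ends;
    }
}

// An alternation ( A | B | ... ): the union of the options' end positions.
record Alt(List<Elem> options) implements Elem {
    public Set<Integer> match(List<String> tokens, int start) {
        Set<Integer> ends = new TreeSet<>();
        for (Elem o : options) ends.addAll(o.match(tokens, start));
        return ends;
    }
}

// A repetition: op is '?' (zero or one), '*' (zero or more) or '+' (one or more).
record Rep(Elem body, char op) implements Elem {
    public Set<Integer> match(List<String> tokens, int start) {
        Set<Integer> result = new TreeSet<>();
        if (op != '+') result.add(start);                 // '?' and '*' allow zero instances
        Set<Integer> frontier = body.match(tokens, start); // ends after one instance
        while (!frontier.isEmpty()) {
            Set<Integer> fresh = new TreeSet<>();
            for (int p : frontier) if (result.add(p)) fresh.add(p);
            if (op == '?') break;                         // at most one instance
            Set<Integer> next = new TreeSet<>();
            for (int p : fresh) next.addAll(body.match(tokens, p)); // one more instance
            frontier = next;                              // only new positions are expanded
        }
        return result;
    }
}
```

For the tokens the big big dog, the sequence "the" "big"* "dog" matched from position 0 can only end at position 4.  Replacing * with + makes the same pattern fail on the tokens the dog, because + requires at least one instance of "big".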

Variables

A variable name is a sequence of letters beginning with a capital letter.  A variable may be bound in a pattern in two ways.  An assignment pattern element
variable = value
binds variable to value.  At present, the only values allowed are integers.  A parenthesized pattern may be followed by a colon (:) and variable name
(pattern) : variable
This binds the variable to the span of the document matched by the pattern.
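The naming rules and the two kinds of binding can be summarized in a few lines of Java.  This is a hypothetical sketch.  The Bindings class, its methods, and the representation of a span as a pair of [start, end) token offsets are assumptions made for illustration.  "Letters" is taken to mean ASCII letters.

```java
import java.util.*;

// Hypothetical store for the variables bound while a pattern is matched.
final class Bindings {
    // Assumed span representation: [start, end) token offsets.
    record Span(int start, int end) {}

    private final Map<String, Object> values = new HashMap<>();

    // Variable names: letters only, beginning with a capital letter (ASCII assumed).
    static boolean isVariableName(String name) {
        return name.matches("[A-Z][A-Za-z]*");
    }

    // Pattern names: letters only, beginning with a lower-case letter.
    static boolean isPatternName(String name) {
        return name.matches("[a-z][A-Za-z]*");
    }

    // "Variable = value": at present only integer values are allowed.
    void assign(String variable, int value) {
        requireVariableName(variable);
        values.put(variable, value);
    }

    // "(pattern) : Variable": bind the variable to the span the pattern matched.
    void bindSpan(String variable, int start, int end) {
        requireVariableName(variable);
        values.put(variable, new Span(start, end));
    }

    Object get(String variable) {
        return values.get(variable);
    }

    private static void requireVariableName(String name) {
        if (!isVariableName(name)) throw new IllegalArgumentException("not a variable name: " + name);
    }
}
```

The capitalization rule lets a name such as np be read as a pattern reference and Np as a variable without further context.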

When Statements and Actions

When statements associate patterns with sequences of actions.  When the pattern is matched in a document, the associated actions are performed.  The when statement has the form
when pattern-name, action1, action2, ... ;
At present, three actions are implemented:  the add action, which adds an annotation; the print action; and the write action.

The add action

The add action adds an annotation to the text.  It has the form
add [annotation-type feature=value feature=value ...]
or
add [annotation-type feature=value feature=value ...] over variable
In the first form, the span of the new annotation is the text matched by the pattern.  In the second form, the variable must have been bound to a span during pattern matching, and that span is used as the span of the new annotation.
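The choice between the two forms of add can be sketched as follows.  Span, Annotation, and AddAction here are simplified stand-ins invented for this example, not Jet's API (the class summary lists NewAnnotationAction as Jet's own class for this action).  The bindings map is assumed to hold whatever the pattern bound, with spans stored as Span values.

```java
import java.util.*;

// Hypothetical stand-in types, not Jet's classes.
record Span(int start, int end) {}

record Annotation(String type, Span span, Map<String, String> features) {}

// Sketch of  add [type f=v ...]  and  add [type f=v ...] over Variable.
// overVariable is null for the first form.
record AddAction(String type, Map<String, String> features, String overVariable) {

    // matched: the span of text matched by the whole pattern.
    // bindings: variables bound during matching (spans, integers, ...).
    Annotation perform(Span matched, Map<String, Object> bindings) {
        Span span = matched;                              // first form: the matched text
        if (overVariable != null) {                       // second form: the variable's span
            Object bound = bindings.get(overVariable);
            if (!(bound instanceof Span)) {
                throw new IllegalStateException(overVariable + " is not bound to a span");
            }
            span = (Span) bound;
        }
        return new Annotation(type, span, Map.copyOf(features));
    }
}
```

Failing loudly when the variable is unbound or bound to a non-span value is a choice of this sketch; the text above only says the variable must have been bound to a span.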

The print action

The print action has the form
print stringExpression
where stringExpression can be a string (enclosed in double quotes), a variable, or a sequence of two or more strings and variables separated by plus signs (+).  A variable in a stringExpression should have been bound to a span or an annotation as part of the pattern matching process;  the print action prints the text subsumed by that span or annotation.  If the stringExpression contains two or more items, they are concatenated and the result printed together on a single line.  The output is sent to the Jet console.
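Evaluating a stringExpression amounts to concatenating its items, replacing each variable by the document text under its span.  The sketch below covers only variables bound to spans, not annotations.  It assumes spans are [start, end) character offsets into the document text, and all names in it are hypothetical.  It returns the string rather than printing it, so the destination (the Jet console for print, standard output for write) is left to the caller.

```java
import java.util.*;

// Hypothetical evaluator for a stringExpression such as  "found: " + Np
final class StringExpr {
    // Assumed span representation: [start, end) character offsets.
    record Span(int start, int end) {}

    // An item is either a literal string or a variable reference.
    interface Item {}
    record Literal(String text) implements Item {}
    record Var(String name) implements Item {}

    private final List<Item> items;

    StringExpr(List<Item> items) {
        this.items = List.copyOf(items);
    }

    // Concatenate all items into one line; a variable contributes the text its span covers.
    String evaluate(String documentText, Map<String, Span> bindings) {
        StringBuilder out = new StringBuilder();
        for (Item item : items) {
            if (item instanceof Literal lit) {
                out.append(lit.text());
            } else if (item instanceof Var v) {
                Span s = bindings.get(v.name());
                if (s == null) throw new IllegalStateException("unbound variable " + v.name());
                out.append(documentText, s.start(), s.end());
            }
        }
        return out.toString();
    }
}
```

A write action could then pass the result to System.out.println, while a print action would hand it to the console instead.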

The write action

The write action has the form
write stringExpression
It has the same semantics as the print action, except that the output is written to standard output.

Pattern Sets and Pattern Matching Process

The when statements are organized into pattern sets.  The statement
pattern set name;
indicates that all following when statements (until the next pattern set statement) belong to pattern set name.  The basic 'top level' operation in Jet is the application of a pattern set to a sentence.
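The grouping rule can be shown with a short sketch that assigns each when statement to the most recent pattern set statement.  It assumes the file has already been split at semicolons into trimmed statements.  PatternSets and group are hypothetical names.  When statements that appear before any pattern set statement are simply ignored here, because the text above does not say how they are handled.

```java
import java.util.*;

// Hypothetical sketch: group when statements by the pattern set statement preceding them.
final class PatternSets {
    static Map<String, List<String>> group(List<String> statements) {
        Map<String, List<String>> sets = new LinkedHashMap<>(); // keeps file order
        String current = null;                                  // no pattern set seen yet
        for (String stmt : statements) {
            if (stmt.startsWith("pattern set ")) {
                current = stmt.substring("pattern set ".length()).trim();
                sets.computeIfAbsent(current, k -> new ArrayList<>());
            } else if (stmt.startsWith("when ") && current != null) {
                sets.get(current).add(stmt);                    // belongs to the current set
            }
            // Pattern definitions (and, in this sketch, stray when statements) are skipped.
        }
        return sets;
    }
}
```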

The process begins by matching all patterns in the pattern set (i.e., all patterns referenced by when statements in the pattern set) starting at the first token of the sentence.  If several patterns match, we select the pattern which matches the longest portion of the text.  If several patterns match the same (longest) portion, we select the pattern whose when statement appeared first in the pattern file.  The actions associated with the selected pattern are then executed in sequence (if no pattern matches, no actions are performed).
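The selection rule is a two-level preference: longest match first, then earliest when statement.  In the minimal sketch below, each candidate records the position of its when statement in the pattern file and the position where its match ends.  RuleSelection and Candidate are hypothetical names.

```java
import java.util.*;

// Hypothetical sketch of choosing which when statement fires at the current start.
final class RuleSelection {
    // ruleIndex: order of the when statement in the file; end: where its match ends.
    record Candidate(int ruleIndex, int end) {}

    // Empty result means no pattern matched, so no actions are performed.
    static Optional<Candidate> select(List<Candidate> candidates) {
        Comparator<Candidate> preference =
                Comparator.comparingInt(Candidate::end).reversed() // longest match first
                          .thenComparingInt(Candidate::ruleIndex); // then earliest in the file
        return candidates.stream().min(preference);
    }
}
```

With candidates (rule 0, end 3), (rule 1, end 5), and (rule 2, end 5), rule 1 is selected: rules 1 and 2 tie on length, and rule 1 appears first.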

The starting point for pattern matching is then advanced and the process is repeated.  If any of the actions created new annotations, the starting point is set to the maximum of the end positions of those annotations.  If no new annotation was created, the starting point is advanced by one token.  The matching continues until the starting point reaches the end of the sentence.
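The scan over a sentence can be sketched as a loop over token positions.  In this sketch, applyAt is a hypothetical callback that stands for matching all patterns at a start position, selecting one, and running its actions.  It returns the token end positions of any annotations those actions created.  The guard that always advances at least one token, even if a new annotation ends at or before the current start, is an assumption added so the loop cannot stall; the text above does not describe that case.

```java
import java.util.*;
import java.util.function.IntFunction;

// Hypothetical sketch of applying one pattern set across a sentence of tokens.
final class SentenceScan {
    // Returns the starting points visited, to make the advancing rule visible.
    static List<Integer> scan(int sentenceLength, IntFunction<List<Integer>> applyAt) {
        List<Integer> visited = new ArrayList<>();
        int start = 0;
        while (start < sentenceLength) {
            visited.add(start);
            List<Integer> newEnds = applyAt.apply(start); // ends of annotations just created
            if (newEnds.isEmpty()) {
                start += 1;                               // nothing created: advance one token
            } else {
                int furthest = Collections.max(newEnds);  // jump past the new annotations
                start = Math.max(start + 1, furthest);    // assumed guard against stalling
            }
        }
        return visited;
    }
}
```

For a six-token sentence where only the match at position 1 creates an annotation ending at position 3, the starting points visited are 0, 1, 3, 4, and 5.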