|
The Medical Language Parser
- The MLP System
- involves three successive stages, as shown in
Figure 3. This arrangement allows the user to easily alter the
specifications of his/her "user-oriented" language as his/her
needs change. The system may easily be adapted for use with an
English grammar of very different external appearance by changing
the input to the first stage of the process.
| Figure 1 |
| OBJECT GRAMMAR OF BNF |
|
| RESTRICTION LANGUAGE SYNTAX | &rarr |
Stage I |
| | &darr |
| | OBJECT RLS |
| | &darr |
| ENGLISH GRAMMAR & WORD DICTIONARY | &rarr |
Stage II |
| | &darr |
| | OBJECT EG & WD |
| | &darr |
| ENGLISH | &rarr |
Stage III |
| | &darr |
| | SENTENCE PARSES |
|
| Figure 3 |
| The Three Stages of the System
generation |
|
- The Parser
- The core of the program is a very simple top-down parser for
context-free grammars, which generates multiple parses of ambiguous
sentences sequentially using a back-up mechanism. This parser,
together with a table-driven lexical processor and a directive
processor which invokes all the other system components, is
present in the program for each of the three stages of the system.
| GENERATORS |
| &darr &uarr &darr &uarr |
| PARSER |
| | |
| LEXICAL PROCESSOR |
| | |
DIRECTIVE PROCESSOR
(loading, updating, etc.) |
|
| Figure 2 |
| Organization of Stages I and II |
|
As indicated in Figures 2 and 3, the parser has "hooks" on it to
permit various routines to be invoked during the top-down parse.
In the first two stages, the parser is used as a syntax-directed
compiler which translates the grammar of the grammar of English (i.e.
the grammar of the user-oriented language) or the grammar of
English from its input text form to list structure. Hence, the
routines invoked by the parser in stage I and stage II programs
are code generators, which construct the requisite list structure
during the top-down analysis (Figure 2).
| RESTRICTIONS |
| &darr &uarr &darr &uarr |
| PARSER |
| | |
| LEXICAL PROCESSOR |
| | |
DIRECTIVE PROCESSOR
(loading, updating, etc.) |
|
| Figure 3 |
| Organization of Stage III |
|
For stage III, the generators are replaced by a restriction
interpreter (Figure 3). The grammar of English consists of
a context-free component plus a set of restrictions, each of
which is associated with one or more productions in the
context-free component. These restrictions state conditions
which the parse tree must meet if the analysis is to be
accepted. Each time a node is added to the parse tree, and
each time a level in the tree is completed, the parser
invokes the restriction interpreter to execute those
restrictions appearing in the corresponding production;
the restriction interpreter returns a success or failure
indication to the parser. If the restriction has succeeded,
the parser continues normally (i.e., as if there had been
no restriction). If the restriction has failed, the parser
must either try an alternate option, or if all options in
a production have been exhausted, dismantle part of the
parse tree.
The system has been implemented on the Control Data 6600
at New York University in 1973. Today, it is totally in
C++. It consisted of approximately 8,000
source lines in about 100 subroutines, only some of which
are included in the program for any one stage of the system.
It is written almost entirely in FORTRAN, with only a few
assembly routines for manipulating part-word fields. Each
stage occupies approximately 55,000 60-bit words (corresponding
roughly to 400,000 8-bit bytes).
- The Compiler
- In Stages I and II, the compiler has a throughput of about
1,500 to 3,000 cards (grammar lines) per minute, depending on
the complexity of the grammar. With the English grammar of
about 20,000 source lines, including the word dictionary, a
fast compiler (directive *COMPILE()) is clearly an asset.
Still, re-compiling the grammar for seven or eight minutes
each time a change is made is rather expensive, so an updating
system (directive *MODIFY()) was included in the
compiler. In the output of stages I and II, the source text
and generated list structure are combined in a single file
called an object grammar or
an object dictionary (named in analogy with the object
program produced by a compiler). Once an object grammar
(file name with the extension obg) or object dictionary
(file name with the extension wdo) as been initially created,
the user may specify modifications to it on a
statement-by-statement basis. The system will compile
the new statements and will insert, delete, or replace the
source text and corresponding list structure in paralle.
The function of the various stages, and format of the source
input to these stages, will now be described.
- Stage 1
- parses the grammar of the grammar of English, which is
a set of BNF statements describing the syntax of five components
of the English grammar:
- The context-free component
- The lists
- The restrictions
- The dictionary canonical forms (lexical entry conventions)
- the dictionary
- The Dictionary Look-Up
-
|