What is LSP?
The Linguistic String Project (LSP) began in 1965 at New York University with funding from the U.S. National Science Foundation (NSF) to implement an English-language parsing program as a first step in the computer processing of the scientific literature. The goal was to help investigators retrieve specific information from texts in response to their queries. The parsing program was based on Linguistic String Analysis (Zellig S. Harris, String Analysis of Sentence Structure, The Hague: Mouton & Co., 1962) and on an algorithm developed by Sager (N. Sager, Procedure for left-to-right recognition of sentence structure, T.D.A.P. No. 27, University of Pennsylvania, 1960). The program underwent successive implementations at NYU, as did the computer grammar used by the parser (N. Sager, Natural Language Information Processing, Addison-Wesley Publishing Co., 1981).
The system came to embody its own programming language, the Restriction Language (N. Sager & R. Grishman, "The restriction language for computer grammars of natural language," Communications of the ACM 18, 1975, pp. 390-400), and an extensive "dictionary" giving the parts of speech and lexical subclass memberships of the words in the sentences to be parsed (E. Fitzpatrick & N. Sager, Appendix 3: The lexical subclasses of the LSP English grammar, op. cit. 1981, pp. 322-374, and "The lexical subclasses of the Linguistic String Parser," American Journal of Computational Linguistics, No. 2, 1974). The main features of the program and grammar have remained intact over time and have served as the basis for further developments. Following the parsing program's initial success, the National Library of Medicine (NLM) of the National Institutes of Health (NIH) provided further funding to develop an application for processing the clinical narrative in patient documents. This work resulted in the Medical Language Processor system.
Theory & methods

The linguistic basis for LSP text processing and the organization of the programs that carry it out have been summarized twice by N. Sager: first in "Syntactic analysis of natural language" (Advances in Computers, vol. 8, Academic Press, 1967), and, after further developments, in "Natural language information formatting" (Advances in Computers, vol. 17, Academic Press, 1978, pp. 89-162). For a quick overview, consult the relevant sections of the 1978 publication.
Sublanguage grammar (Zellig S. Harris, Mathematical Structures of Language, Section 5.9, Wiley Interscience Publishers, 1968, pp. 152-155) provided the framework for specializing the language processing to texts in a particular scientific subfield. In the LSP, this specialization was applied to medical documents, in particular to the narrative found in reports of clinic visits, hospital discharge summaries, and similar records. The resulting system comprises five stages of processing that arrive at a standard representation of the content of the documents: