The Linguistic String Project

 

The Linguistic String Project (LSP) was a sustained research effort (1960-2005) in the computer processing of language based on the linguistic theory of Zellig Harris: linguistic string theory, transformation analysis, and sublanguage grammar. The programs developed by the Project were adapted for clinical narrative in the LSP Medical Language Processor (LSP-MLP), which supported online access by clinicians to portions of narrative patient documents relevant to stated concerns.

The Linguistic String Project at New York University was one of the earliest research and development projects in computer processing of natural (i.e., human) language. It was initiated by Naomi Sager at NYU in 1965 with a grant from the Office of Science Information Services (OSIS) of the National Science Foundation. The OSIS at that time was seeking ways to give scientists rapid access to information in the expanding technical literature. One avenue it was pursuing was computer analysis of language that would facilitate pinpointed search and retrieval of requested information.

The LSP approach was to begin with a parsing program to obtain the syntactic relations among the words of a sentence, the basic structure of language-borne information. This entailed the implementation of a parsing algorithm (top-down, left-to-right, with calls on linguistic test procedures) as first described by Sager in 1960 ["A Procedure for Left to Right Analysis of Sentence Structure", Report 27 of the series Transformations and Discourse Analysis Papers published by the Dept. of Linguistics, U. of Pa.].

A 1967 article [LSP 1] laid out the basis for language computation and described the first two implementations of the LSP parser and string grammar. The grammar was specified in two components: a set of formal rewriting rules written in Backus Normal Form (BNF) that provided the structure of the output parse tree, and a set of procedures, called restrictions, that operated on the parse tree to enforce detailed grammatical constraints [LSP 5]. An English-like programming language for expressing restrictions, the Restriction Language RL, was developed [LSP 12]. The computer grammar of English that formed an integral part of the system was published in 1981 [LSP 341].
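The two-component design described above (rewriting rules that build the parse tree, plus restrictions that test it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-rule grammar, the five-word lexicon, the tuple-based tree, and the `agreement` test are hypothetical and are not taken from the LSP grammar or the Restriction Language. The parser is top-down and left-to-right with backtracking, and it calls the restriction tests each time it completes a node.

```python
# Toy illustration only: grammar, lexicon, and restriction are hypothetical,
# not drawn from the LSP computer grammar of English.

# BNF-style rewriting rules: each symbol maps to a list of alternative options.
GRAMMAR = {
    "SENTENCE": [["SUBJECT", "VERB", "OBJECT"], ["SUBJECT", "VERB"]],
    "SUBJECT": [["N"]],
    "OBJECT": [["N"]],
    "VERB": [["V"]],
}

# Each word has one category and one grammatical number in this toy lexicon.
LEXICON = {
    "patient": {"cat": "N", "number": "sing"},
    "patients": {"cat": "N", "number": "plur"},
    "pain": {"cat": "N", "number": "sing"},
    "reports": {"cat": "V", "number": "sing"},
    "report": {"cat": "V", "number": "plur"},
}


def head_number(node):
    """Return the grammatical number of the leftmost word under a node."""
    _, children = node
    if isinstance(children, str):  # terminal node: children is the word
        return LEXICON[children]["number"]
    return head_number(children[0])


def agreement(node):
    """Restriction on SENTENCE: subject and verb must agree in number."""
    subject, verb = node[1][0], node[1][1]
    return head_number(subject) == head_number(verb)


# Restrictions are test procedures attached to grammar symbols.
RESTRICTIONS = {"SENTENCE": [agreement]}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for each way symbol can match starting at pos."""
    if symbol not in GRAMMAR:  # terminal category such as N or V
        if pos < len(words) and LEXICON.get(words[pos], {}).get("cat") == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for option in GRAMMAR[symbol]:  # try the options in order
        for children, end in expand(option, words, pos):
            node = (symbol, children)
            # A failed restriction rejects this option; the parser backtracks.
            if all(test(node) for test in RESTRICTIONS.get(symbol, [])):
                yield node, end


def expand(option, words, pos):
    """Match the elements of one option from left to right."""
    if not option:
        yield [], pos
        return
    for first, mid in parse(option[0], words, pos):
        for rest, end in expand(option[1:], words, mid):
            yield [first] + rest, end


def parse_sentence(text):
    """Return the first parse tree covering the whole input, or None."""
    words = text.lower().split()
    for tree, end in parse("SENTENCE", words, 0):
        if end == len(words):
            return tree
    return None
```

In this sketch, `parse_sentence("patient reports pain")` yields a tree, while `parse_sentence("patients reports pain")` returns `None`: the rewriting rules alone would accept the second string, and it is the restriction that rejects it.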

Applications of the LSP system drew upon Sublanguage Grammar, an extension of linguistic methods whereby the constraints on word combinations special to a subject matter are formalized into quasi-grammatical rules [LSP 112]. It was further shown that parsed documents could be mapped into sublanguage-labeled structures, called information formats, on which information retrieval procedures could operate [LSP 28]. The LSP concentrated on the sublanguage of clinical reporting (X-ray reports, hospital discharge summaries, and the like) and demonstrated the automated application of health care criteria to information-formatted narrative medical reports [LSP 303]. The collective work of the LSP team on medical records was summarized in a 1987 volume [LSP 654]. The LSP Medical Language Processor, including the medically specialized English grammar and dictionary, is available on the Linguistic String Project website.
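The idea of an information format, a labeled structure on which retrieval can operate, can be sketched as follows. This is a deliberately simplified illustration: the slot names, the word classes, and the functions `to_format` and `asserted` are hypothetical, and a flat word-class lookup stands in here for the parse-based mapping that the LSP system actually used.

```python
# Hypothetical sublanguage word classes; the actual LSP classes and
# information formats are far richer than this.
WORD_CLASSES = {
    "fever": "SIGN_SYMPTOM",
    "cough": "SIGN_SYMPTOM",
    "chest": "BODY_PART",
    "lung": "BODY_PART",
    "no": "NEGATION",
    "denies": "NEGATION",
}


def to_format(sentence):
    """Place each classified word of a sentence into its labeled slot."""
    row = {"SIGN_SYMPTOM": [], "BODY_PART": [], "NEGATION": [], "TEXT": sentence}
    for word in sentence.lower().replace(".", " ").replace(",", " ").split():
        slot = WORD_CLASSES.get(word)
        if slot:
            row[slot].append(word)
    return row


def asserted(rows, symptom):
    """Retrieve sentences that mention a symptom without negating it."""
    return [
        row["TEXT"]
        for row in rows
        if symptom in row["SIGN_SYMPTOM"] and not row["NEGATION"]
    ]


report = ["Patient has fever and cough.", "No fever.", "Chest pain on exertion."]
rows = [to_format(sentence) for sentence in report]
```

With these rows, `asserted(rows, "fever")` selects only the first sentence, because the second fills the `NEGATION` slot. A plain keyword search for "fever" would return both, which is the kind of distinction the formatted structure is meant to support.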

The LSP MLP was converted to French in collaborative research with the Informatics group of the Cantonal Hospital of Geneva, Switzerland [LSP 77][LSP 765]. Subsequently, an XML hierarchy of medical knowledge tags was added to the system, along with an online viewer by which clinicians could see highlighted portions of documents pertaining to particular patient problems or therapies. The viewer was demonstrated by Ngô Thanh Nhàn, its principal designer, as part of Sager's keynote address at the Second International Conference on the Clinical Document Architecture, held October 20-22, 2004 in Acapulco, Mexico.

A general overview of the methods and results of the LSP was presented by Sager at the New York Academy of Sciences in 1990 [LSP 78]. The ways in which Zellig Harris's contributions to linguistics were used in the development of the LSP system were described by Sager and Ngô Thanh Nhàn in the symposium dedicated to Harris's work [LSP 91].

Notes:

  1. Sager, N. 1981. Natural language information processing: a computer grammar of English and its applications. Addison-Wesley, Reading, Mass.
  2. Wiley Online Library: JASIS onlinelibrary.wiley.com/doi/10.1002/asi.4630260104/pdf.
  3. Sager, N., Friedman, C., Lyman, M.S., MD, and members of the Linguistic String Project (1987). Medical Language Processing: Computer Management of Narrative Data. Addison-Wesley, Reading, MA.
  4. Borst, F., Sager, N., Nhàn, N.T., Su, Y., Lyman, M., Tick, L.J., Revillard, C., Chi, E., et Scherrer, J.R. (1989) Analyse Automatique de Comptes Rendus D'Hospitalisation. In Informatique et Santé, Informatique et Gestion des Unités de Soins, Comptes Rendus du Colloque AIM-IF, Paris, 1989, Degoulet, P., Stephan, J-C., Venot, A., et Yvon, P-J., Redacteurs. Paris, Springer-Verlag, pp. 246-256.
  5. Sager, N., Friedman, C., Lyman, M.S., MD, and members of the Linguistic String Project (1987). Medical Language Processing: Computer Management of Narrative Data. Addison-Wesley, Reading, MA.
  6. Informatique et Santé, Informatique et Gestion des Unités de Soins, Comptes Rendus du Colloque AIM-IF, Paris, 1989, Degoulet, P., Stephan, J-C., Venot, A., et Yvon, P-J, Redacteurs. Paris, Springer-Verlag, pp. 246-256.