Research Professor, Computer Science Department
Department of Computer Science
Courant Institute of Mathematical Sciences
New York University
251 Mercer Street • New York, NY 10012
+1 212.998-3097 (voice) • +1 212.995-4123 (fax) • email@example.com
Sager began work on the computer processing of language as a member of the team that developed the first English-language parsing program that ran on Univac 1 at the University of Pennsylvania in 1959.[2] Sager's part of the program handled syntactic ambiguity (more than one possible analysis at points in the sentence).[3] The structures imposed by the 1959 program proved unwieldy for this task, and in 1960 Sager developed an algorithm and a form of string grammar in which the treatment of ambiguity was an integral part (cf. A Procedure for Left-to-Right Analysis of Sentence Structure [TDAP 27]). This work became the basis of the thesis for which she was awarded a Ph.D. in Linguistics from the University of Pennsylvania in 1968, and of the parsing program first developed at New York University in collaboration with James Morris and Morris Salkoff (SPR 1 & SPR 5).[4]
In the early 1960s, in addition to Sager's work in natural language processing (NLP) at NYU, Susumu Kuno at Harvard University applied his Predictive Analyzer to English syntax.[5] During this period many machine translation projects were generously funded by the U.S. government, until the Automatic Language Processing Advisory Committee (ALPAC) report (1966) found that too little progress had been made to justify further support. Research in computer parsing of English nevertheless continued at NYU and was later joined by IBM and other groups.
The Linguistic String Project (LSP) at New York University began in 1965 with funding from the National Science Foundation to develop computer methods for structuring and accessing information in the scientific and technical literature. Document processing was to be based on linguistic principles, first to demonstrate the possibility of computerized grammatical analysis (parsing), then to include the specialized vocabulary and rules for particular scientific domains. Domain specialization led to an elaboration of the methods of sublanguage analysis,[6] in particular as applied to the language of clinical reporting in patient documents. The 30+ year history of the Linguistic String Project and its results are tracked in the LSP annotated bibliography.
From 1966 to 1984, the NYU Linguistic String Project (LSP) issued a series of volumes, String Program Reports (SPR) 1-16, documenting in detail the development of the first LSP parser and string grammar, their further development, and the research into the structure of information that they facilitated.
The computer string grammar of English at the core of the LSP parsing program was published in 1981.[7] The grammar was modified to handle the specialized language of clinical documents, and the English lexicon used by the parser was augmented with medical semantic attributes. Post-parsing procedures carried the input documents to a database representation of their textual content suitable for highly specific information querying.[8] The resulting Medical Language Processor (MLP) is documented at the LSP website.
Sager taught courses in Natural Language Processing and maintained the Linguistic String Project (LSP) at New York University from 1965 until her retirement in 1995. She resides in New York and for part of each year in Paris. She was one of the translators from French to English of the autobiography of Ngô Văn, a Vietnamese revolutionary who, while working in a factory in Paris, became an engineer, a published scholar and author of numerous works.[9]
The Linguistic String Project (LSP) was a sustained research effort (1960-2005) in the computer processing of language based on the linguistic theory of Zellig Harris: linguistic string theory, transformational analysis and sublanguage grammar. The programs developed by the Project were adapted for clinical narrative in the LSP Medical Language Processor (LSP-MLP), which supported online access by clinicians to portions of narrative patient documents relevant to stated concerns.
The Linguistic String Project (LSP) at New York University was one of the earliest research and development projects in computer processing of natural (i.e., human) language. It was initiated by Naomi Sager at NYU in 1965 with a grant from the Office of Science Information Services [OSIS] of the National Science Foundation. The OSIS at that time was seeking means to give scientists rapid access to information in the expanding technical literature. One avenue it was pursuing was computer analysis of language that would support pinpointed search and retrieval of requested information.
The LSP approach was to begin with a parsing program to obtain the syntactic relations among sentence words, the basic structure of language-borne information. This entailed implementing a parsing algorithm (top-down, left-to-right, with calls on linguistic test procedures) as first described by Sager in 1960 ["A Procedure for Left to Right Analysis of Sentence Structure", Report 27 of the series Transformations and Discourse Analysis Papers published by the Dept. of Linguistics, U. of Pa.].
A 1967 article [LSP 1] laid out the basis for language computation and described the first two implementations of the LSP parser and string grammar. The grammar was specified in two components: a set of formal rewriting rules written in Backus Normal Form (BNF) that provided the structure of the output parse tree, and a set of procedures, called restrictions, that operated on the parse tree to enforce detailed grammatical constraints [LSP 5]. An English-like programming language for expressing restrictions, the Restriction Language RL, was developed [LSP 12]. The computer grammar of English that formed an integral part of the system was published in 1981 [LSP 341].
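The two-component idea described above (rewriting rules that build the parse tree, plus restrictions that test it) can be illustrated with a small sketch. This is not the LSP parser, its string grammar, or the Restriction Language: the lexicon, the rule set, the `number_agreement` test, and all function names are hypothetical, chosen only to show a top-down, left-to-right parse in which each candidate subtree is accepted only if its restrictions succeed.

```python
# Toy lexicon (hypothetical): word -> (category, grammatical number).
LEXICON = {
    "the": ("DET", None),
    "patient": ("N", "sg"),
    "patients": ("N", "pl"),
    "sleeps": ("V", "sg"),
    "sleep": ("V", "pl"),
}

# BNF-style rewriting rules (hypothetical): nonterminal -> alternative expansions.
RULES = {
    "SENTENCE": [["SUBJECT", "VERB"]],
    "SUBJECT": [["DET", "N"], ["N"]],
    "VERB": [["V"]],
}


def leaves(tree):
    """Return the (category, word) leaves of a parse tree, left to right."""
    if len(tree) == 2 and isinstance(tree[1], str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(leaves(child))
    return out


def number_agreement(tree):
    """Restriction on SENTENCE: the subject noun and the verb agree in number."""
    subject, verb = tree[1], tree[2]
    noun = [w for c, w in leaves(subject) if c == "N"][0]
    v = [w for c, w in leaves(verb) if c == "V"][0]
    return LEXICON[noun][1] == LEXICON[v][1]


# Restrictions are attached to the nonterminal whose subtree they test.
RESTRICTIONS = {"SENTENCE": [number_agreement]}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for each way `symbol` can start at words[pos]."""
    if symbol not in RULES:  # terminal category: match a single word
        if pos < len(words) and LEXICON.get(words[pos], (None,))[0] == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for expansion in RULES[symbol]:  # try each alternative, top-down
        for children, end in expand(expansion, words, pos):
            tree = (symbol, *children)
            # Keep the subtree only if every restriction on it succeeds.
            if all(test(tree) for test in RESTRICTIONS.get(symbol, [])):
                yield tree, end


def expand(symbols, words, pos):
    """Yield (children, next_pos) matching `symbols` left to right from pos."""
    if not symbols:
        yield [], pos
        return
    for first, mid in parse(symbols[0], words, pos):
        for rest, end in expand(symbols[1:], words, mid):
            yield [first] + rest, end


def parse_sentence(text):
    """Return every parse tree that covers the whole input."""
    words = text.lower().split()
    return [t for t, end in parse("SENTENCE", words, 0) if end == len(words)]
```

In this sketch `parse_sentence("the patient sleeps")` returns one tree, while `parse_sentence("the patients sleeps")` returns an empty list: the rewriting rules alone would accept it, but the agreement restriction rejects it. Because the functions are generators, an ambiguous input would yield every analysis that passes its restrictions.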
Applications of the LSP system drew upon Sublanguage Grammar, an extension of linguistic methods whereby the constraints on word combinations special to a subject matter are formalized into quasi-grammatical rules [LSP 112]. It was further shown that parsed documents could be mapped into sublanguage-labeled structures, called information formats, on which information retrieval procedures could operate [LSP 28]. The LSP concentrated on the sublanguage of clinical reporting (X-ray reports, hospital discharge summaries, and the like), demonstrating an automated application of health care criteria to information-formatted narrative medical reports [LSP 303]. The collective work of the LSP team on medical records was summarized in a 1987 volume [LSP 654]. The LSP Medical Language Processor, including the medically specialized English grammar and dictionary, is available on the Linguistic String Project website.
The LSP MLP was converted to French in collaborative research with the Informatics group of the Cantonal Hospital of Geneva, Switzerland [LSP 77][LSP 765]. Subsequently, an XML hierarchy of medical knowledge tags was added to the system, along with an online viewer by which clinicians could see highlighted portions of documents pertaining to particular patient problems or therapies, demonstrated by Ngô Thanh Nhàn, its principal designer, as part of Sager's keynote address at the Second International Conference on the Clinical Document Architecture, October 20-22, 2004, in Acapulco, Mexico.
A general overview of methods and results of the LSP was presented by Sager at the New York Academy of Sciences in 1990 [LSP 78]. The ways in which contributions to Linguistics by Zellig Harris were utilized in the development of the LSP system were described by Sager and Ngô Thanh Nhàn in the symposium dedicated to Harris's work [LSP 91].