The Linguistic String Project (LSP) at New York University began in 1965 with funding from the National Science Foundation to develop computer methods for structuring and accessing information in the scientific and technical literature. Document processing was to be based on linguistic principles, first to demonstrate the possibility of computerized grammatical analysis (parsing), then to extend to specialized vocabulary and rules for particular scientific domains. Domain specialization led to an elaboration of the methods of sublanguage analysis, in particular as applied to the language of clinical reporting in patient documents. The 30+ year history of the Linguistic String Project and its results are tracked in the following bibliography.
Key: Major Summary; Other Significant Publication; M = Application to Medical Texts.
1. Linguistic basis for language computation (string analysis). 2. Procedure.
3. Implementations in IPLV & FAP (IBM 7094), organization of grammars as BNF definitions and "restrictions" (= constraints on parse trees), treatment of conjunctions.
First notion of tabular reduction of text using results of computerized language processing.
Theoretical paper: relation of string analysis to context-free grammar.
Introduces particular set of tree-navigation routines to implement restrictions.
Introduces "Restriction Language", a syntax for calling on the tree-navigation routines and tests at parse tree nodes in restrictions.
Introduces "Sublanguage Grammar" as basis for tabular reduction of text.
Summarizes approach, form of grammar; refers to 3rd implementation of parser (in Fortran) for CDC 6600.
Describes Fortran implementation as compiler-compiler; describes "saving" to speed parsing.
Overall description of the system with some examples.
Describes the lexical component, which later became Appendix 3 (cf. #34).
Theoretical: How sublanguage grammar provides tool for analyzing scientific literature.
Describes the Restriction Language (RL) and illustrates how restrictions written in RL operate on parse trees.
A computational linguistics experiment: Develop semantic word classes by clustering, using parse-trees.
First study of clinical narrative in relation to known grammar rules: what is left unsaid.
More on clustering, using published scientific articles for data [see #13].
Beginning of transformational component of grammar; description of mechanisms.
Describes treatment of conjunctions in the implemented string grammar.
M First application of implemented "information formatting" - x-ray diagnosis statements.
Overall view of approach - invited paper.
M "Question-answering" applied to information-formatted x-ray reports. Results.
Potential applications of Natural Language Processing to science-literature retrieval.
One-page update: We have done x-ray reports, on to discharge summaries.
Some examples of unresolved ambiguity.
Example of information formatting: radiology reports. Another version of #20.
Overall view of natural language processing (NLP) field.
M Illustrates by manual example the application of information-formatting in medical audit.
A natural language "front end" to query information-formatted x-ray reports.
1. Theoretical basis. 2. Methods. 3. Programs. 4. Applications - the full system for converting free-text information to formatted information. Precursor to #34.
M Presents information-formatting procedures (overview) with example of retrieval from small set of information-formatted discharge summaries.
M Example of retrieving quality-assurance data from information-formatted discharge summaries.
Book review.
M Computational treatment of time information in information-formatted documents.
M Information formatting applied to clinical documents: methods, processing, results.
BOOK. Documentation (partial listing) of English grammar; summary of information-formatting, examples, use in teaching.
M Chapter 8: Information Formatting, and Chapter 9: Application of Medical Information Formatting.
M First use of DBMS (CODASYL type schema) to store and retrieve from information-formatted clinical texts.
M Full report of study in #30: Quality assurance data extracted from information-formatted discharge summaries; results compared with manual extraction.
M First attempt at automatic coding using information-formatted radiology reports.
M First use of relational DBMS to store and retrieve from information-formatted clinical text.
M Treatment of temporal relations in clinical narrative by processing time expressions in the text.
M Summary report of information formatting and database representation of clinical text.
Newsletter update on project activities.
Syntactic features special to technical sublanguage reporting.
Research on natural language "front end" for information-formatted text.
M More on first relational database design for information-formatted clinical text as in #38.
Compares syntactic features of clinical text with those in Navy messages. Extends #42.
More on natural language "front end" to information-formatted clinical text. Extends #43.
M More on the first relational database design for information-formatted clinical text, as in #38, #44.
Another demonstration of information-formatting followed by temporal analysis and retrieval program.
An experiment in linguistic reduction of scientific literature to tabular form.
M A progress report on current work on processing clinical text.
M Information formatting of pathology reports: design of an information format.
Another one-page update on project activities.
Describes lexical component of system and an experiment in supplementing the dictionary with entries from a computer readable form of Webster's Collegiate Dictionary.
M An experiment in grading the complexity of texts by a measure based on transformed parse trees of sentences.
M Reports on a program to determine semantic components of the neo-Latin portion of medical vocabulary.
Overview paper focusing on sublanguage methodology.
M Overview paper on information-formatting procedures.
M The first relational database design reported to an Information Retrieval audience. Follows #38, #44, #45.
Information-formatting of a non-medical sublanguage. Extends #42, #45.
M Another report on the relational database design for information-formatted text, follows #38, #44, #47, #58.
M Brief report of system, suggesting application to narrative laboratory reports.
M Another project update for the medical informatics community.
M Medical applications of system; companion report to #62.
M Summary paper on lexical component of system for linguistics audience.
M BOOK. Summarizes how system described in #34 (Sager book) is specialized for medical document processing.
Another summary of system for a different audience.
M System described briefly to international medical informatics community.
Experiment in applying information-formatting to questions and answers in survey.
M Presents system to international audience at IMIA sponsored symposium.
M Describes sublanguage methodology and applicability of information-formatting to French clinical documents at IMIA symposium.
M The English medical information formatting system converted to French.
M Utility of medical language processing illustrated.
Book review.
M Initial results of French medical language processing, based on English medical information formatting, with Geneva collaborators.
Abstract.
M French medical information-formatting, querying, via relational DBMS (in French).
M Relation of English and French information-formatting systems.
M Overview of system for general scientific audience.
M Application of system in medical audit using asthma discharge summaries.
M Further work with the French system, with Geneva collaborators.
M Further work on French clinical documents, with Geneva collaborators.
M Results of the medical audit experiment using asthma discharge summaries.
M Overall description of system and results of using it for medical audit.
M First report on tying output of medical language processing system to an established medical nomenclature (SNOMED Int).
M Journal report on automatic encoding of free text into SNOMED Int via the medical language processing system, with a comparison to manual results by the Computer-based Patient Records Institute (CPRI).
M Using SGML to display results of medical language processing system.
M Report on using SGML for multiple displays of results of medical language processing system.
M Dutch parser combined with other components of LSP medical language processing system.
M system using SGML display.
M Dutch medical language processing using in part the LSP MLP, and information extraction with the use of SGML display.
M How the LSP system utilizes the linguistics of Zellig Harris.