What is the MLP?
The MLP (Medical Language Processor) is a system that transforms free-text clinical documents into a structured XML representation of the information they contain. Document sentences are parsed, further processed to eliminate ambiguities, and mapped into medically labeled structures called Information Format Units (IFUs). The IFUs are then enriched with medical knowledge tags drawn from the Structured Health Markup Language (SHML). In this form they become Health Information Units (HIUs), the basic unit of description in the final representation. Processed documents are installed in a clinician-oriented viewer that gives users selective access to the textual information needed for patient care, or they can be used in other applications.
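The enrichment step described above (an IFU gaining knowledge tags and becoming an HIU in XML) can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the field names, the tag labels in TOY_TAGS, and the HIU element layout are hypothetical and are not taken from the actual IFU format or from SHML.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag table: these labels are illustrative, not actual SHML tags.
TOY_TAGS = {
    "chest pain": "SIGN-SYMPTOM",
    "aspirin": "MEDICATION",
}


def enrich(ifu):
    """Return a copy of a toy IFU with a knowledge tag attached to each field."""
    return {
        field: {"text": text, "tag": TOY_TAGS.get(text.lower(), "UNTAGGED")}
        for field, text in ifu.items()
    }


def to_xml(hiu):
    """Serialize an enriched unit as one <HIU> element with a child per field."""
    root = ET.Element("HIU")
    for field, info in hiu.items():
        child = ET.SubElement(root, field, attrib={"tag": info["tag"]})
        child.text = info["text"]
    return ET.tostring(root, encoding="unicode")


# A toy IFU such as might stand for the sentence "Patient reports chest pain."
ifu = {"subject": "patient", "finding": "chest pain"}
xml_text = to_xml(enrich(ifu))
print(xml_text)
```

The sketch covers only the last two stages. Parsing and ambiguity resolution, which produce the IFU in the real system, are replaced here by a hand-written dictionary.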


In the historical development of the MLP, a general NLP parser was developed first, along with a computer grammar of English and an associated English lexicon (cf. Sager, Natural Language Information Processing: A Computer Grammar of English and Its Applications, Addison-Wesley 1981). The next step was to facilitate specialization of the processing for texts in a given field (medicine) by adding a further level of classification to words in the lexicon and developing a "sublanguage" component of the grammar (see Sager et al., Medical Language Processing: Computer Management of Narrative Data, Addison-Wesley 1987). Processing was also extended to neighboring languages: French, German and Dutch.

More recently, XML has been incorporated into the system in two ways: lexical entries are tagged with their specific medical content (using a Medical Tag Hierarchy), and the results of tagging are incorporated into an XML representation of the parsed documents. The tags support retrieval and display in terms of categories familiar to physicians.
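To illustrate why a hierarchy of tags helps retrieval by category, here is a minimal Python sketch. The hierarchy, the lexicon entries, and the function names are all hypothetical and do not reproduce the actual Medical Tag Hierarchy or the MLP lexicon. The sketch only shows the general idea that a word tagged with a specific category can also be found under any broader category above it.

```python
# Hypothetical hierarchy: each tag maps to its parent (None marks a top-level tag).
PARENT = {
    "FINDING": None,
    "SIGN-SYMPTOM": "FINDING",
    "PAIN": "SIGN-SYMPTOM",
    "TREATMENT": None,
    "MEDICATION": "TREATMENT",
}

# Hypothetical lexicon: each word carries its most specific tag.
LEXICON = {
    "headache": "PAIN",
    "fever": "SIGN-SYMPTOM",
    "aspirin": "MEDICATION",
}


def is_under(tag, category):
    """Return True if tag equals category or descends from it in the hierarchy."""
    while tag is not None:
        if tag == category:
            return True
        tag = PARENT.get(tag)  # unknown tags have no parent, which ends the walk
    return False


def words_in_category(category):
    """List, in alphabetical order, the lexicon words filed under category."""
    return sorted(word for word, tag in LEXICON.items() if is_under(tag, category))


print(words_in_category("FINDING"))
print(words_in_category("TREATMENT"))
```

Storing only the most specific tag on each entry and walking upward at query time keeps the lexicon compact. A real system might instead precompute the ancestor sets.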

Acknowledgements.