An Analyzer for the Information Content of Sentences (Semantics)

Candidate: Johnson, Stephen Bennett


An algorithm is presented that produces a representation of the information content of sentences as a tree of operator words predicating on argument words. The Sentence Analyzer employs a new type of formal grammar that describes the surface syntax of sentences, grammatical constraints, and the operator-argument relations underlying the surface forms. The algorithm works left to right, first obtaining the operator-argument representations of words from a lexicon and then applying grammar rules to construct operator-argument subtrees over longer and longer segments of the sentence. All alternative analyses are developed simultaneously.

The grammar rules are based on the detailed mathematical grammar of Zellig Harris, termed here Composition-Reduction Grammar, in which sentences are generated by a process of operator words entering on argument words. As words enter, the resulting tree structure is linearized. Various reductions may apply to words that are redundant in the operator-argument structure, producing variations such as morphological changes and the dropping of words from the sentence. Reduction yields sentences with a more compact form, the form we see, while preserving the objective information content.

The fundamental unit of the formal grammar developed here is the descriptor, a tuple of six attributes that represents an operator-argument word class. A descriptor is similar to a traditional word class such as noun or verb, but it can also carry information specific to an individual word and so form an entry in the lexicon. More importantly, descriptors can replace the phrase symbols of traditional grammar, because a descriptor can stand for the entire word sequence spanned by the operator-argument subtree of which it is the root. This feature enables each grammar rule to be specified as a relation between two descriptors whose subtrees span adjacent word sequences. The two words related by a rule have either a simple operator-argument relation or a more complex operator-argument relation made compact by reduction. The result is a formal grammar in which all relations are between words, with sufficient power for the Sentence Analyzer to perform a direct analysis of sentences into their informational relations, without recourse to intricate transformational procedures.
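
The left-to-right procedure described above can be illustrated with a small sketch. It is not the Sentence Analyzer itself: the abstract does not list the six descriptor attributes, so the fields of `Descriptor`, the toy `LEXICON`, the class labels `"N"` and `"O"`, and the helper names `fits`, `attach`, `combine`, `show`, and `analyze` are all assumptions made for illustration. The sketch also assumes an English-like linearization (first argument to the left of the operator, later arguments to the right) and models only simple operator-argument relations, not reductions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical, simplified descriptor: the root of an operator-argument subtree."""
    word: str                        # root word of the subtree
    cls: str                         # "N" (argument word) or "O" (operator word)
    needs: Tuple[str, ...]           # classes of the arguments still required
    args: Tuple["Descriptor", ...]   # argument subtrees already attached
    start: int                       # index of the first word spanned
    end: int                         # index one past the last word spanned


# Toy lexicon (an assumption): each word maps to one or more (class, requirements) readings.
LEXICON: Dict[str, List[Tuple[str, Tuple[str, ...]]]] = {
    "John": [("N", ())],
    "Mary": [("N", ())],
    "sleeps": [("O", ("N",))],
    "walks": [("N", ()), ("O", ("N",))],   # ambiguous: noun or operator
    "knows": [("O", ("N", "O"))],          # takes a noun and a sentence-like operator
}


def fits(d: Descriptor, required: str) -> bool:
    """A subtree can serve as an argument only if it is complete and of the right class."""
    return not d.needs and d.cls == required


def attach(op: Descriptor, arg: Descriptor, start: int, end: int) -> Descriptor:
    """Return a new descriptor in which op has taken arg as its next argument."""
    return Descriptor(op.word, op.cls, op.needs[1:], op.args + (arg,), start, end)


def combine(left: Descriptor, right: Descriptor) -> List[Descriptor]:
    """Relate two descriptors whose subtrees span adjacent word sequences."""
    out = []
    # The operator on the right takes its first argument from the left.
    if right.needs and not right.args and fits(left, right.needs[0]):
        out.append(attach(right, left, left.start, right.end))
    # The operator on the left takes a later argument from the right.
    if left.needs and left.args and fits(right, left.needs[0]):
        out.append(attach(left, right, left.start, right.end))
    return out


def show(d: Descriptor) -> str:
    """Render a subtree as operator(argument, ...)."""
    if not d.args:
        return d.word
    return f"{d.word}({', '.join(show(a) for a in d.args)})"


def analyze(sentence: str, lexicon=LEXICON) -> List[str]:
    """Scan left to right, keeping every alternative subtree in a shared chart."""
    words = sentence.split()
    chart: List[Descriptor] = []
    for i, w in enumerate(words):
        # Start from every lexical reading of the current word.
        agenda = [Descriptor(w, c, n, (), i, i + 1) for c, n in lexicon.get(w, [])]
        while agenda:
            new = agenda.pop()
            if new in chart:
                continue
            chart.append(new)
            # Try to join the new subtree with every subtree ending where it starts.
            for old in list(chart):
                if old.end == new.start:
                    agenda.extend(combine(old, new))
    return [show(d) for d in chart
            if d.cls == "O" and not d.needs and d.start == 0 and d.end == len(words)]


print(analyze("John knows Mary sleeps"))   # ['knows(John, sleeps(Mary))']
```

In this sketch no phrase symbols are needed: the descriptor for `sleeps`, once it has taken `Mary`, stands for the whole segment "Mary sleeps" and is what `knows` takes as its second argument. The noun reading of `walks` is kept in the chart alongside the operator reading, and only readings that yield a complete tree over the whole sentence are returned.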