Japanese/English Machine Translation
Using Sublanguage Patterns and Reversible Grammars

Ping Peng

Abstract

This thesis describes a Japanese/English machine translation system with reversible components, designed and implemented in PROLOG. Sublanguage co-occurrence patterns are used to address the problems of lexical and structural selection in the transfer between the internal representations of a pair of natural languages. The system has been tested on Japanese-to-English translation in the domain of programming language manuals. Evaluation of the test outputs provides some assessment of the utility of the sublanguage approach as a method for developing and refining a machine translation system. The thesis also explores the role that a reversible grammar can play in sharing linguistic knowledge between parsing and generation.

The system has been developed with the goal of using sublanguage word co-occurrence patterns to simplify the description of syntactic/semantic knowledge needed in both the transfer rules and the analysis of the source language. In particular, sublanguage co-occurrence patterns are introduced to provide semantic constraints and ellipsis recovery in parsing Japanese.
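As a rough illustration of how a co-occurrence pattern can drive lexical selection, the following minimal Python sketch chooses an English verb from the semantic class of the object noun. It is not the thesis's PROLOG implementation: the word classes, the pattern table, the romanized example words, and the function name select_verb are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical word classes for romanized Japanese nouns (illustrative only).
NOUN_CLASS = {
    "suuchi": "NUMBER",  # numeric value
    "jikan": "TIME",     # time
}

# Hypothetical co-occurrence patterns:
# (Japanese verb, class of its object) -> English verb.
VERB_PATTERNS = {
    ("kakeru", "NUMBER"): "multiply",
    ("kakeru", "TIME"): "spend",
}


def select_verb(verb: str, obj: str) -> Optional[str]:
    """Return the English verb licensed by a pattern, or None if none matches."""
    obj_class = NOUN_CLASS.get(obj)
    return VERB_PATTERNS.get((verb, obj_class))


print(select_verb("kakeru", "suuchi"))  # multiply
print(select_verb("kakeru", "jikan"))   # spend
print(select_verb("kakeru", "neko"))    # None (no pattern covers this pair)
```

The same table doubles as a semantic constraint: a verb/object pair that matches no pattern can be rejected or given lower preference during analysis.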

This thesis introduces a right-to-left parsing scheme for Japanese. The idea for the right-to-left parsing algorithm evolved from the desire to produce partial syntactic analyses of Japanese in a more deterministic manner than is achieved by conventional left-to-right parsing schemes. The algorithm makes efficient use of sublanguage co-occurrence patterns as semantic knowledge to help disambiguate Japanese parses. Syntactic and semantic constraints are enforced in a tightly interwoven fashion during parsing, and parsing performance for Japanese is significantly improved as a result.

A procedure has been implemented for translating a Definite Clause Grammar into both a PROLOG parser and a PROLOG generator, so that a single grammar can be used for parsing and generation. In current natural language processing systems, separate grammars are used for parsing and generation. However, there has long been interest in designing a single grammar for both parsing and synthesis, for reasons of efficiency and integrity as well as linguistic elegance and perspicuity. As part of the current implementation, a strategy has been developed for creating efficient grammars for both parsing and generation, using a goal reordering technique within the logic programming framework.
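The idea of one grammar serving both directions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy English grammar, the lexicon, and the functions parse and generate are hypothetical, and the sketch interprets a shared rule table directly rather than compiling a Definite Clause Grammar into PROLOG or reordering goals as the thesis does.

```python
# A single hypothetical rule table shared by the parser and the generator.
GRAMMAR = {
    "s": [["np", "vp"]],
    "np": [["det", "n"]],
    "vp": [["v", "np"]],
}
LEXICON = {
    "det": ["the"],
    "n": ["parser", "sentence"],
    "v": ["reads"],
}


def parse(cat, words, i=0):
    """Yield (tree, next_index) for each way `cat` can start at words[i]."""
    if cat in LEXICON:
        if i < len(words) and words[i] in LEXICON[cat]:
            yield (cat, words[i]), i + 1
        return
    for rhs in GRAMMAR[cat]:
        yield from _parse_seq(cat, rhs, words, i, [])


def _parse_seq(cat, rhs, words, i, children):
    """Match the remaining right-hand-side symbols one after another."""
    if not rhs:
        yield (cat, *children), i
        return
    for child, j in parse(rhs[0], words, i):
        yield from _parse_seq(cat, rhs[1:], words, j, children + [child])


def generate(tree):
    """Turn a tree back into words, checking it against the same rules."""
    cat, *rest = tree
    if cat in LEXICON:
        (word,) = rest
        if word not in LEXICON[cat]:
            raise ValueError(f"{word!r} is not a {cat}")
        return [word]
    if [child[0] for child in rest] not in GRAMMAR[cat]:
        raise ValueError(f"no rule expands {cat} this way")
    return [w for child in rest for w in generate(child)]


words = "the parser reads the sentence".split()
trees = [t for t, end in parse("s", words) if end == len(words)]
print(trees[0])
print(generate(trees[0]) == words)  # True
```

Because both functions consult the same GRAMMAR and LEXICON, a change to a rule affects analysis and synthesis together, which is the knowledge-sharing benefit the paragraph above describes.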