The Compiler Validation Project:
Developing tools for proving the correctness of compilation.


Motivation: There is a growing awareness in industry, academia, and government of the crucial role of formally proving the correctness of safety-critical portions of systems. Most verification methods deal with the high-level specification of the system. However, if one is to prove that the high-level specification is correctly implemented at the lower level, one needs to verify the compiler which performs the translations.

Goal: Our goal is to develop a methodology, supported by tools, that provides practical means for ensuring correct compilation by state-of-the-art optimizing compilers.

The Methodology — Translation Validation: Formally verifying a full-fledged optimizing compiler is not feasible due to its size, rate of change, and proprietary considerations. Translation validation is a novel approach that offers an alternative to the verification of translators in general and compilers in particular. According to the translation validation approach, one constructs a validation tool that, after every run of the compiler, formally confirms, via a proof script, that the target code produced on that run is a correct translation of the source program. For even greater assurance, the proof script can be independently checked by existing proof checkers. Translation validation has already been shown to be effective in the optimization-restricted case, in which most compiler optimizations are disabled. In this case, the proof that the target code correctly implements the source program reduces to proving the validity of a set of automatically generated verification conditions (proof obligations), which are implications in first-order logic. All that is required, then, is to establish the validity of this set of verification conditions. Under the (provably realistic) assumption that only limited optimization is applied to arithmetic expressions, the proof obligations fall within a restricted fragment of first-order logic, called equational formulae, in which uninterpreted functions represent all arithmetic operations. Recent work has shown the feasibility of building a tool for checking the validity of such formulae. This tool is based upon finding small-domain instantiations of equational formulae and then using a BDD-based representation to check for validity.
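To make the small-domain idea concrete, the sketch below checks the validity of a tiny equational verification condition by brute force over a finite domain. Everything here is illustrative: the function `valid_over_domain` and the example formulae are hypothetical stand-ins for the actual BDD-based decision procedure, and the exhaustive enumeration simply demonstrates why bounding the domain size makes the check decidable.

```python
from itertools import product

# Illustrative sketch (not the actual tool): decide validity of an
# equational formula with one unary uninterpreted function symbol f by
# enumerating every interpretation of f and every variable assignment
# over a small finite domain. The small-domain property of equational
# formulae guarantees that a suitably bounded domain suffices.

def valid_over_domain(formula, num_vars, domain_size):
    """Return True iff formula(f, values) holds for every function
    f: domain -> domain and every assignment to the variables."""
    domain = range(domain_size)
    # Each function f: domain -> domain is encoded as a lookup table.
    for table in product(domain, repeat=domain_size):
        f = lambda x: table[x]
        for values in product(domain, repeat=num_vars):
            if not formula(f, values):
                return False
    return True

# A valid verification condition (congruence): x = y  implies  f(x) = f(y)
vc = lambda f, v: (v[0] != v[1]) or (f(v[0]) == f(v[1]))
print(valid_over_domain(vc, num_vars=2, domain_size=2))   # True

# A non-theorem: f(x) = x. A counterexample needs a domain of size 2.
bad = lambda f, v: f(v[0]) == v[0]
print(valid_over_domain(bad, num_vars=1, domain_size=2))  # False
```

The enumeration is exponential in the domain size, which is exactly why the real tool relies on small-domain instantiations together with a compact BDD representation rather than explicit enumeration.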

New Challenges: When the optimizations are enabled, it is no longer sufficient to use the above verification conditions. The validation tool will need additional information describing which optimizing transformations have been performed. This information can only be supplied by the compiler itself. We therefore extend the translation validation approach to include compiler-generated instrumentation in the form of program annotation. This annotation will be processed by the validation tool to form invariant assertions at selected control points. The verification conditions will then be augmented with these assertions. However, the validator will also need to establish that these compiler-generated assertions are indeed invariant.
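The invariance check on compiler-generated assertions splits into two standard obligations: the assertion holds on entry to the control point, and one execution of the loop body preserves it. The sketch below, using a hypothetical strength-reduction example (a target temporary `t` annotated as tracking `4*i` at the loop head), discharges both obligations by exhaustive checking over a bounded state space, standing in for the validator's decision procedure.

```python
# Hypothetical example: the compiler annotates the target code with the
# assertion "t == 4*i holds at the loop head" after strength reduction.
# The validator must show the assertion is inductive.

def invariant(i, t):
    return t == 4 * i

def entry_state():
    return 0, 0            # i = 0, t = 0 on loop entry

def step(i, t):
    return i + 1, t + 4    # target loop body: i++, t += 4

# Obligation 1: the assertion holds on entry to the loop.
assert invariant(*entry_state())

# Obligation 2: the loop body preserves the assertion. Here a bounded
# range of states stands in for "all states satisfying the invariant".
for i in range(100):
    t = 4 * i              # an arbitrary state satisfying the invariant
    assert invariant(*step(i, t))
```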

Strategy: A major component of our approach is to identify a minimal set of program annotations that is both sufficient to handle all known optimizations and rich enough that the translation validation proofs can be generated efficiently, fully automatically, and in a form that is simple to check. Our early results present such a set for classic optimizations (constant folding, copy propagation, common-subexpression elimination, strength reduction, loop-invariant code motion, etc.). One of our immediate goals is to extend the translation validation framework to handle instruction-scheduling transformations.
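As a toy illustration of one classic optimization and the proof obligation it induces, the sketch below performs constant folding on a small expression AST and then checks the obligation that the folded expression evaluates identically to the original in every state. The AST encoding, `fold`, and `evaluate` are hypothetical simplifications, and the bounded spot-check stands in for an actual validity proof of the verification condition.

```python
# Hypothetical sketch of constant folding on a tiny expression AST.
# Expressions are nested tuples ('+', l, r) / ('*', l, r), variable
# names (strings), or integer constants.

def fold(e):
    """Constant-fold an expression bottom-up."""
    if isinstance(e, tuple):
        op, l, r = e
        l, r = fold(l), fold(r)
        if isinstance(l, int) and isinstance(r, int):
            return l + r if op == '+' else l * r
        return (op, l, r)
    return e

def evaluate(e, env):
    """Evaluate an expression in a state mapping variables to values."""
    if isinstance(e, tuple):
        op, l, r = e
        lv, rv = evaluate(l, env), evaluate(r, env)
        return lv + rv if op == '+' else lv * rv
    return env[e] if isinstance(e, str) else e

src = ('*', ('+', 2, 3), 'x')      # source expression: (2 + 3) * x
tgt = fold(src)                     # folded target:     5 * x

# Verification condition: src and tgt agree in every state. A bounded
# spot-check over states stands in for a genuine validity proof.
for x in range(-10, 11):
    assert evaluate(src, {'x': x}) == evaluate(tgt, {'x': x})
print(tgt)   # ('*', 5, 'x')
```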

Ultimate Goal: We envision integrating the validation tool into the compiler itself. The result of compilation would then be not only the optimized target program but also a simple, machine-checkable proof script, which could be checked by a simple proof checker supplied with the compiler.

As part of this work, we intend to apply our results to optimizing compilers for emerging architectures, including EPIC processors, out-of-order superscalar machines, and, ultimately, highly scalable parallel computing systems.

Who will benefit from the work: We expect our work to be of great value to groups who develop applications using a process in which analysis, verification, and simulation are performed at the source level. It is crucial that whatever integrity has been achieved at the source level is not jeopardized by a faulty translation from the source program to the target program. The problem of the compiler being the weak link in an otherwise high-integrity system development process is particularly acute in safety-critical systems, where certification of any compiler used is required by standards, as well as in compilation-into-silicon methods, where a translation error may be very expensive to repair. Compiler developers will also benefit from this work, since it will provide them with an extremely effective debugging tool.