FOM: Computer-verified theorems [was Re: The meaning of truth]

JoeShipman@aol.com JoeShipman at aol.com
Sun Nov 5 07:42:09 EST 2000


I agree with practically everything Bauer says in his post about the 
difficulty of software verification, and have said similar things in the past 
on FOM.  My point to Kanovei was that the tiny chance of incredibly bad luck 
in the random choice of witnesses to compositeness was dwarfed even by the tiny 
chance of "machine error" from simple causes like hardware faults and cosmic 
rays, so that randomization was no obstacle to attaining a confidence level of 
"scientifically proven" in the truth of a mathematical theorem.  
(The confidence level of "mathematically proven" is still not extremely close 
to 100% because of the possibility of human error in verifying the proof, but 
it is SUBJECTIVELY a different order of confidence when you have verified a 
proof by actually checking it in the traditional way.)
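(For readers who want to see concretely what the "random choice of witnesses to 
compositeness" involves, here is a minimal sketch, in Python, of a Rabin-style 
test; the language and the particular value k = 50 are my choices for 
illustration, not anything from the exchange with Kanovei.  The point is that 
each independent witness that fails to expose a composite cuts the error 
probability by at least a factor of 4, so after k rounds the chance of a false 
"prime" verdict is at most 4^(-k), which for k = 50 is far smaller than any 
reasonable estimate of the chance of a hardware fault during the run.)

import random

def is_probable_prime(n, k=50):
    # Rabin-style probabilistic test with k independent random witnesses.
    # If n is composite, each round reports "probably prime" with
    # probability at most 1/4, so the overall error probability is at
    # most 4**(-k) -- roughly 10**(-30) for k = 50.
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # write n - 1 as d * 2**s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(k):
        a = random.randrange(2, n - 1)   # candidate witness
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a witnesses compositeness of n
    return True                          # no witness found in k rounds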

Of course the other types of computational error Bauer refers to are much 
more serious.  I would like to repeat the recommendation I made on FOM on 
August 30, 1998 (excerpted below):

**********************************************************************

4CT was "verified" quickly because the algorithm was straightforward enough, 
and the input data manageable enough, that independent researchers could 
write their own implementations and check the outputs.

But it is widely underappreciated how difficult and dangerous verification 
is.  For a hilarious horror story, read how Thomas Nicely of Lynchburg 
College discovered the flaw in the Pentium chip while trying to calculate 
some constants related to the distribution of twin primes ("Enumeration to 
1e14 of the twin primes and Brun's constant", Virginia Journal of Science, 
Volume 46, Number 3 (Fall 1995), pp. 195-204, available online at 
http://www.lynchburg.edu/public/academic/math/nicely/twins/twins.htm ).

The following quote from this paper summarizes the lessons Nicely learned:

<<<<<
After this tempest began to settle (mathematicians in general, and this one 
in particular, are not accustomed to being besieged by inquiries and visits 
from network news, the BBC, the Washington Post, the Wall Street Journal, and 
the New York Times, as well as journalists from France, Australia, and 
Malaysia), attention returned to the work at hand. With commendable haste, 
Intel provided a replacement chip for the errant Pentium (as well as one for a 
colleague's system) and also supplied a 90-MHz Pentium system to help make up 
for lost time. Naturally, all remaining calculations were carried out in 
duplicate, 
and the wisdom of this caution was confirmed when other discrepancies 
appeared between duplicate runs. Twice the errors were tracked to 
intermittently defective memory (SIMM) chips; parity checking had failed to 
report either error. On another occasion, a disk subsystem failure generated 
a wholesale lot of incorrect, yet plausible data. In the most recent 
instance, a soft memory error appears to be the culprit. 

Although many may dismiss machine errors such as these as of no practical 
consequence (except to mathematicians) or as evidence of the unreliability of 
personal computers, I believe there is a more important conclusion to be 
drawn. Any significant computation, for which there is no simple check
available, should be performed in duplicate, preferably on two different 
systems, using different CPUs, hardware, operating systems, application 
software, and algorithms. In practice, it is difficult to obtain this degree 
of independence; but certainly, one must be suspicious of the result of any
single, lengthy machine computation, whether it is performed on a personal 
computer or the world's finest supercomputer. As we tell our students---check 
your work! 
>>>>>

The five different levels Nicely recommends duplicating (CPU, hardware, 
operating system, application software, and algorithms) all experienced 
failures at different stages of the project, and of course only the last of 
these five was his fault!

In the absence of rigorous proofs of the correctness of computer systems 
(including chip design, programming language, compiler, and operating 
system), we cannot have the same kind of certainty about the correctness of 
a computer-aided proof as we can have about a humanly surveyable proof.  But 
we can attain "moral" certainty if we follow Nicely's recipe with sufficient 
thoroughness (for an important result like 4CT or the Kepler conjecture, I 
would demand more than two verifications in which all the components of the 
system were independent).  

*******************************************************************
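To make the recipe concrete, here is a toy illustration (my own sketch, in 
Python, with an arbitrary bound of 10,000 -- it has nothing to do with 
Nicely's actual programs) of duplication at the algorithm level: the same 
quantity, the number of twin prime pairs below a bound, is computed by two 
independent methods and the results are compared before being accepted.

def twin_primes_by_sieve(limit):
    # Count pairs (p, p+2) with p+2 <= limit, using a sieve of Eratosthenes.
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(sieve[i * i::i]))
    return sum(1 for p in range(2, limit - 1) if sieve[p] and sieve[p + 2])

def twin_primes_by_trial_division(limit):
    # The same count, computed independently by trial division.
    def is_prime(n):
        if n < 2:
            return False
        f = 2
        while f * f <= n:
            if n % f == 0:
                return False
            f += 1
        return True
    return sum(1 for p in range(2, limit - 1)
               if is_prime(p) and is_prime(p + 2))

limit = 10_000
a = twin_primes_by_sieve(limit)
b = twin_primes_by_trial_division(limit)
assert a == b, f"duplicate runs disagree: {a} != {b}"
print(f"twin prime pairs below {limit}: {a} (confirmed by independent method)")

Of course this duplicates only the algorithm and application software; for the 
degree of independence Nicely describes one would also run the two programs on 
different CPUs, hardware, and operating systems.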

-- Joe Shipman



