FOM: Computer-verified theorems [was Re: The meaning of truth]

Andrej Bauer Andrej.Bauer at cs.cmu.edu
Sat Nov 4 05:23:57 EST 2000


Recently V. Kanovei and J. Shipman have mentioned mathematical proofs
done by machines (the best-known example being the Four Color Theorem
for maps).

kanovei at wmwap1.math.uni-wuppertal.de (Kanovei) writes:
> My vision is that as soon as 
> 
> (1) there is a program whose certain output leads to a certain 
> conclusion regarding 4-color, 
> (2) the program works in digital manner, deals only with entire 
> entities (no fractions that must be cut, etc.) and does not 
> appeal to random numbers,
> (3) it does not exceed physical capacities of the computer, 
> for instance, the amount of data to be stored never exceeds 
> the storage memory, the estimated time of work is not more 
> than something, etc.
> (4) the (1,2,3) above are established as mathematical facts, 
> 
> then few runs of the program on different computers with one and 
> the same result mean that the fact has been mathematically established, 
> in principle ONE run is enough, few are invented to inghibit possible 
> misfunction of the hardware. 

Joe Shipman <shipman at savera.com> writes:
>Therefore, if you are given a number which passes the test for a few
>thousand RANDOMLY chosen w, you may be sure that either it is prime or
>you have been extraordinarily unlucky, so unlucky that "machine" error
>is a MUCH more likely explanation even if you ran the calculation on
>100 different independent machines.
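
For concreteness, the kind of test Shipman has in mind is a
probabilistic primality test. He does not name a particular one, so the
sketch below (in Python, using the Miller-Rabin test) is only meant as
an illustration; the function name and the number of rounds are my own
choices.

    import random

    def is_probably_prime(n, rounds=40):
        # Miller-Rabin: if n is composite, a random witness w exposes it
        # with probability at least 3/4, so after k independent rounds
        # the chance of wrongly answering "prime" is at most (1/4)**k.
        if n < 2:
            return False
        if n in (2, 3):
            return True
        if n % 2 == 0:
            return False
        d, s = n - 1, 0
        while d % 2 == 0:          # write n - 1 as d * 2**s with d odd
            d //= 2
            s += 1
        for _ in range(rounds):
            w = random.randrange(2, n - 1)    # the random witness
            x = pow(w, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False       # w proves that n is composite
        return True                # "probably prime"

With a few thousand random witnesses the bound (1/4)**k is so small
that, as Shipman says, machine error becomes the far more likely
explanation of a wrong answer.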

I would like to make a few remarks about machine-verified theorems,
since I have heard uninformed opinions like the above many times
before.

If the issue is whether we can trust a machine-verified theorem, then
we must look at what the sources of error are _in practice_. Some of
these are:

1) The source code written by the programmer contains errors.

2) The compiler that was used to translate the source code into
   machine language generated incorrect machine language code.

3) A particular run of a program went wrong because some aspect of its
   environment was misconfigured. An example of this would be a user
   forgetting to set file permissions on some file correctly, and the
   program failing to report this fact, but instead continuing execution
   as if everything was fine.

4) The hardware was designed badly (recall the infamous Pentium
   division bug).

5) There is an undetected but permanent error caused by hardware. For
   example, the program might have been transmitted over the network, an
   error occurred during the transmission, and now we have a corrupted
   copy of the program. Or, we wrote the program onto a floppy disk
   with a bad sector, so when we read it later it will differ from the
   original (correct) version.
  
6) A particular run of a program went wrong because a cosmic ray,
   or some other such unpredictable event, caused the hardware to
   malfunction. If we run the program again, everything will be
   all right--unless another cosmic ray strikes! (And since one
   already struck, it's more improbable that another will come.)

Note that 1) to 4) are all DESIGN errors caused by HUMANS, and that 5)
can be detected fairly easily.
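
To illustrate why 5) is easy to detect, the standard remedy is to
compare a checksum of the copy against a checksum of the original. The
sketch below is only illustrative; the file name and the reference
digest are hypothetical.

    import hashlib

    def file_digest(path):
        # Hash the file in chunks; a transmission error or a bad disk
        # sector will almost certainly change the digest.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    # Hypothetical usage: compare against a digest published alongside
    # the program.
    # assert file_digest("four_color_prover") == KNOWN_GOOD_DIGEST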

1) is _by far_ the most common source of errors, 2)--4) are not
uncommon, 5) is extremely uncommon, while 6) is not detectable in
practice but presumably is exceedingly rare.

By the way, modern operating systems detect the fact that a program
has run out of resources (memory, disk space, ...) and get very
unhappy about that. So Kanovei's criterion #3 about not exceeding
resources, while of course a necessary one, is not a real concern,
because exceeding the available resources is easily detected. We must
worry about those defects that cannot be detected.
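
A small illustration of that last remark, assuming a Unix-like system
and Python: running out of memory produces a loud, detectable failure
(an exception, or the process being killed outright), not a silently
wrong answer.

    def exhaust_memory():
        blocks = []
        try:
            while True:
                blocks.append(bytearray(10 ** 8))   # grab ~100 MB per step
        except MemoryError:
            # The run fails visibly; it does not return a wrong result.
            return "aborted: out of memory"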

A typical mathematician who has not had much experience with
programming might think that 1)--4) are not an issue. He will say that 
we must just make sure that the program has been written correctly
and that the hardware has been designed correctly. BUT WE HAVE NO IDEA
HOW TO DO THAT IN PRACTICE!

In reality, software and hardware verification are big issues. The
computer industry is pouring millions of dollars into research on how
to detect and eliminate errors 1)--4) from the above list, whereas 5)
and 6) are completely irrelevant, except perhaps on airplanes and the
Space Shuttle.

Let me repeat this, since it's the main point I am trying to make: IN
PRACTICE DESIGN ERRORS CAUSED BY HUMANS ARE EXTREMELY HARD TO DETECT
AND OCCUR ALL THE TIME.

Nevertheless, most programs give the expected correct results most of
the time, despite the fact that they contain design errors. Isn't that
amazing? There ought to be a study of "robustness of proofs" that
explains this.

Running programs on 100 different machines independently does not help
against design errors, unless those machines (hardware + operating
system) have been designed and manufactured independently. And even
then it is likely that all 100 of them contain some errors, albeit
different ones.

It is incredibly hard to _prove_ that software is doing what it is
supposed to be doing. The branch of computer science that deals with
this is Software Verification (and here I have in mind _formal
methods_ for proving correctness of software, as opposed to
``industrial strength'' empirical software testing). For example, we
are light-years away from being able to verify a system the size of
the Microsoft Windows operating system, or even a moderately complicated
program (e.g. a simple first-order theorem prover).
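
To give a feel for what formal verification means in the small, here is
a toy example of my own (assuming a recent version of the Lean proof
assistant with its built-in omega tactic): a one-line program together
with a machine-checked proof that it meets its specification. The gap
between this and verifying an operating system is exactly the problem.

    -- A trivial program ...
    def double (n : Nat) : Nat := n + n

    -- ... and a machine-checked proof of its specification.
    theorem double_correct (n : Nat) : double n = 2 * n := by
      unfold double
      omega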

Hardware verification is doing much better. We can verify correctness
of (design of) separate _components_ of a large CPU. For example, we
can verify a circuit for integer division, or a circuit that schedules
the CPU pipeline. We still cannot verify the correctness of design of
an entire CPU.

Thus, I would like to conclude that any FOM discussion concerning
machine-verified mathematical statements should be concerned with our
embarrassing inability to verify the correctness of the programs that
we write. This is a real practical issue. (Why aren't logicians
earning millions of dollars by coming up with new and better ways of
_proving_ correctness of programs?)

Having said that, let me also state that I trust a carefully performed
machine verification of a proof to the same extent that I trust a
refereed journal article.

I entirely dismiss any arguments about humans having some sort of
supremacy over machines when it comes to verifying proofs. If
anything, the opposite is true.

I hope this convinces you that random hardware errors are a non-issue,
just like random human brain malfunctions are a non-issue.

As far as randomized algorithms are concerned: you can lower the
probability of a randomized algorithm having gone wrong all you want,
but you will still not know whether a design error caused your program
to malfunction. And that will be your true source of mistrust.
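
To see the contrast concretely, suppose the programmer's "random"
witness is, by a design error, always the fixed base 2. The sketch
below is hypothetical, but it shows why repetition is powerless here:
the bug is deterministic, so all 100 machines agree on the same wrong
answer, run after run.

    def buggy_is_probably_prime(n, rounds=1000):
        # Design error: the witness is never actually randomized.
        return all(pow(2, n - 1, n) == 1 for _ in range(rounds))

    # 341 = 11 * 31, yet the Fermat test to the fixed base 2 declares it
    # prime on every machine, in every run.
    print(buggy_is_probably_prime(341))   # True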


Andrej Bauer

Pure and Applied Logic
School of Computer Science
Carnegie Mellon University


P.S. The comment about the second cosmic ray being more improbable
     than the first one was a JOKE!
