[FOM] Talk on Formal Verification

Jeremy Avigad avigad at cmu.edu
Sun Jan 24 11:11:58 EST 2016


This is a response to Harvey's post, regarding the (very nice) talk he gave
at the two-day meeting, Certified Proofs and Programs 2016. As I explained
to Harvey at the meeting, the issues that he raised in his message are all
central to automated and interactive theorem proving. In this message I
will summarize some of the things that he and I discussed with other
participants. My expertise here is limited, but I will do my best to offer
some helpful pointers to the literature for those interested in learning
more.

The problem of finding relevant facts from large libraries and sets of
facts manifests itself in various guises. In automated reasoning,
developers use "indexing" techniques to quickly find hypotheses matching
various syntactic criteria. They can be used, for example, to quickly find
identities that match against terms and expressions in a problem at hand,
or implications that match a current goal. The definitive introduction is
the chapter on term indexing in the /Handbook of Automated Reasoning/,
edited by Alan Robinson and Andrei Voronkov:

  http://www.cs.man.ac.uk/~voronkov/papers/handbookar_termindexing.ps
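
To convey the idea in its simplest possible form, the toy sketch below
(hypothetical code, not any actual prover's data structure) buckets rewrite
rules by the head symbol of their left-hand sides, so that candidate
identities for a term can be retrieved without scanning the whole library;
real provers use much finer-grained structures such as discrimination trees
and substitution trees:

  from collections import defaultdict

  class HeadIndex:
      """Toy term index: terms are strings (constants/variables) or
      tuples ('f', arg1, arg2, ...) for function applications."""

      def __init__(self):
          self.buckets = defaultdict(list)   # head symbol -> rules

      def insert(self, lhs, rhs):
          # File the identity lhs = rhs under the head symbol of lhs.
          self.buckets[lhs[0]].append((lhs, rhs))

      def candidates(self, term):
          # Any rule whose left-hand side could match `term` must share
          # its head symbol, so only one bucket needs to be examined.
          return self.buckets[term[0]] if isinstance(term, tuple) else []

  index = HeadIndex()
  index.insert(('+', 'x', '0'), 'x')   # x + 0 = x
  index.insert(('*', 'x', '1'), 'x')   # x * 1 = x
  # Only the '+' rule is returned as a candidate for f(a) + 0:
  print(index.candidates(('+', ('f', 'a'), '0')))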

More broadly, one often wants to select potentially "relevant" theorems
based on criteria more robust than simple syntactic pattern matching. For
example, Larry Paulson has developed a remarkable tool called
"Sledgehammer" for the Isabelle proof assistant. Given a goal (that is, an
inference that we would like to fill in automatically), it culls
potentially relevant facts from Isabelle's library (which now consists of
tens of thousands of theorems), translates them and the goal from
higher-order logic to first-order encodings, sends them to external
first-order theorem provers, and, if a proof is found, uses the result to
reconstruct a formal proof within Isabelle. He and Jia Meng have developed
a relevance filter that chooses a useful list of theorems to export. The
issues and ideas are nicely described here:

  http://www.cl.cam.ac.uk/~lp15/papers/Automation/filtering-jal.pdf
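
In rough outline (what follows is a simplified sketch, not Paulson and
Meng's actual code), the filter scores each library fact by the overlap
between its symbols and a growing set of "relevant" symbols seeded from the
goal, admits facts that score above a pass mark, and relaxes the mark over
several rounds; the real filter also weights rare symbols more heavily:

  def relevance_filter(goal_symbols, facts, pass_mark=0.6, rounds=3):
      """facts: list of (name, set_of_symbols) pairs.
      Returns names of facts judged relevant to the goal."""
      relevant = set(goal_symbols)
      selected = {}                  # insertion-ordered: name -> symbols
      for _ in range(rounds):
          for name, syms in facts:
              if name in selected or not syms:
                  continue
              score = len(syms & relevant) / len(syms)
              if score >= pass_mark:
                  selected[name] = syms
                  relevant |= syms   # its symbols become relevant too
          pass_mark *= 0.9           # relax the threshold each round
      return list(selected)

  facts = [('add_zero', {'add', 'zero'}),
           ('mul_comm', {'mul'}),
           ('add_comm', {'add'})]
  print(relevance_filter({'add', 'zero'}, facts))
  # ['add_zero', 'add_comm'] -- mul_comm shares no symbols with the goal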

A large number of papers have been written about Sledgehammer and similar
methods. The following survey, by Blanchette, Kaliszyk, Paulson, and Urban,
provides an excellent overview of the issues, including a number of
approaches to theorem selection:

  http://people.mpi-inf.mpg.de/~jblanche/h4qed.pdf

Rather than develop heuristics to choose theorems, one might hope to use
machine learning techniques to learn which features are relevant to the
choice. There is a literature on this (Josef Urban is a pioneer here). The
Blanchette et al. survey also discusses such machine learning approaches
and offers a number of references.
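
As a toy illustration of the learning approach (far simpler than the
systems discussed in the survey), one can train a naive Bayes model on past
proofs and rank each stored theorem by how strongly the features of the
current goal, here just its symbols, predict that the theorem will be used:

  import math
  from collections import defaultdict

  class NaiveBayesSelector:
      def __init__(self):
          self.n = 0                                 # proofs seen so far
          self.uses = defaultdict(int)               # premise -> use count
          self.cooc = defaultdict(lambda: defaultdict(int))

      def learn(self, goal_features, premises_used):
          # Record one completed proof: which premises it used, and
          # which features its goal exhibited.
          self.n += 1
          for p in premises_used:
              self.uses[p] += 1
              for f in goal_features:
                  self.cooc[p][f] += 1

      def score(self, premise, goal_features):
          # log P(premise used) + sum of log P(feature | premise used),
          # with additive smoothing to avoid log(0).
          s = math.log(self.uses[premise] / self.n)
          for f in goal_features:
              s += math.log((self.cooc[premise][f] + 0.1) /
                            (self.uses[premise] + 0.2))
          return s

      def rank(self, goal_features, k=10):
          return sorted(self.uses,
                        key=lambda p: -self.score(p, goal_features))[:k]

  sel = NaiveBayesSelector()
  sel.learn({'add', 'zero'}, ['add_zero'])
  sel.learn({'mul', 'one'}, ['mul_one'])
  print(sel.rank({'add'}, k=1))   # ['add_zero']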

All of this is situated in a long history of research in automated theorem
proving, which I cannot even begin to address here. A starting point for
this is the /Handbook of Automated Reasoning/, mentioned above. One should
perhaps also take a look at the article, "Satisfiability Modulo Theories,"
by Barrett et al., in the /Handbook of Satisfiability/, since such
techniques (based on "conflict-driven clause learning," aka CDCL) are
playing an important role. John Harrison's book, /Handbook of Practical
Logic and Automated Reasoning/, provides an excellent introduction to many
of the core ideas in automated reasoning.
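
To convey the flavor of the search these techniques build on, here is a
minimal sketch (illustrative only) of DPLL, the procedure that CDCL
refines: assign a literal, propagate forced "unit" assignments, and
backtrack on conflicts; CDCL additionally learns a new clause from each
conflict to prune later search. Clauses are sets of integer literals, with
-v standing for the negation of variable v:

  def dpll(clauses, assignment=frozenset()):
      simplified = []
      for c in clauses:
          if any(l in assignment for l in c):
              continue                       # clause already satisfied
          c = frozenset(l for l in c if -l not in assignment)
          if not c:
              return None                    # conflict: clause falsified
          simplified.append(c)
      if not simplified:
          return assignment                  # every clause satisfied
      for c in simplified:                   # unit propagation
          if len(c) == 1:
              (l,) = c
              return dpll(simplified, assignment | {l})
      l = next(iter(simplified[0]))          # branch on some literal
      return (dpll(simplified, assignment | {l})
              or dpll(simplified, assignment | {-l}))

  print(dpll([{1, 2}, {-1}, {-2, 3}]))      # e.g. frozenset({-1, 2, 3})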

Another topic that Harvey and I discussed, along with other people at the
conference, concerned the practical differences between using set theory
and various flavors of type theory. I don't want to start another silly
flame war over which is the "best" or "true" foundation for mathematics.
Since all the type theories I will mention have straightforward
set-theoretic interpretations, type theory is in principle no less dubious
than set theory. At issue is simply developing pragmatic approaches to
supporting mechanized reasoning and interactive theorem proving.

The issue is simply this. In ordinary mathematics, we deal with many types
of objects. An expression may denote a natural number, an integer, a list
of integers, a function from natural numbers to reals, a set of reals, a
function (like the integral) which takes a function from the reals to the
reals and a set of reals and returns a real, a group, a ring, a module over
some ring, an element of a group, a function which takes an element of a
group and a natural number and returns another element of the group, a
homomorphism between groups, a function which takes a sequence of groups
and returns a group, a function which takes a sequence of groups indexed by
some diagram and homomorphisms between them and returns a group, etc., etc.
In set theory, these are all sets, and we have to rely on separate
hypotheses and theorems to establish that these objects fall into the
indicated classes. For many purposes, however, it is useful to have a
discipline which assigns to each syntactic object a "type," indicating what
sort of thing it is. That is what type theory is designed to do.

In simple type theory, terms and types are two different sorts of
expressions. Dependent type theory is more flexible, and one can define
types that depend on terms. For example, "l : list nat" says that l is a
list of natural numbers, and "v : vec real 3" says that v is a vector of
reals of length 3. For reference, among systems with substantial
mathematical libraries, Mizar and Metamath are based on set theory; ACL2 is
very close to quantifier-free many-sorted first-order logic; HOL, HOL
Light, and Isabelle use simple type theory; and Coq, Agda, PVS, and Lean
use various forms of dependent type theory.
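
As a small illustration, here is how a type of length-indexed vectors
might be declared in Lean (Lean syntax varies across versions; this sketch
uses a recent syntax and is illustrative, not the library's definition).
Because the length is part of the type, the system tracks it automatically:

  -- A length-indexed vector: the Nat in the type records the length.
  inductive Vec (α : Type) : Nat → Type where
    | nil  : Vec α 0
    | cons : α → Vec α n → Vec α (n + 1)

  #check [1, 2, 3]                          -- [1, 2, 3] : List Nat
  #check Vec.cons 1 (Vec.cons 2 Vec.nil)    -- : Vec Nat 2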

It would take me way too far afield to compare and contrast these systems.
All are documented copiously and there is plenty of material available
online. At the meeting, Cezary Kaliszyk presented recent work on porting
the Mizar environment to Isabelle, and he, Harvey, and I talked about the
pros and cons of the Mizar approach. The associated paper, by Kaliszyk,
Pąk, and Urban, is here:

  http://cl-informatik.uibk.ac.at/users/cek/docs/16/ckkpju-cpp16-mizisa.pdf

Type theory has its downsides, in that it can be difficult to view an
object of one type as an object of another. For that, one has to "cast" or
"coerce" objects from one type to another. But the advantages of adopting a
good type discipline are vast. Once again, it will take me way too far
afield to provide an introduction to dependent type theory, and copious
references are available online. But for a quick overview of some of the
things one would like a type system to do, I suggest looking at section 2
of this paper I wrote with Leonardo de Moura, Soonho Kong, and Cody Roux:

  http://arxiv.org/abs/1505.04324

For those who want to try it out, I suggest our tutorial introduction to
dependent type theory in the Lean theorem prover:

  http://leanprover.github.io/

You can find the tutorial under "documentation". The online version lets
you experiment with Lean in your browser.

I want to emphasize that type theory is not just about user input. For
formal theorem proving, we want to do all of the following:

(1) write definitions and assertions in a nice way
(2) write proofs in a nice way
(3) find and construct proofs using automation
(4) verify the correctness of computations
(5) find definitions and theorems in databases of facts

Writing assertions is only a small part, and there is always the option of
adding front ends to any underlying discipline to make the user input and
output prettier and more natural. Even at the level of writing assertions,
however, type discipline offers important advantages:

(1) Users generally need to write much less, since a lot of information
that is left implicit in ordinary mathematical writing can be inferred from
type data and filled in automatically.

(2) Type systems are better at catching typos and minor bugs, and, when
there is such an error, providing better information to the user as to what
is wrong.

These advantages are nontrivial. When outsiders first come to type theory,
a common first reaction is that first-order logic is much simpler and more
intuitive. When faced with questions like "How is a system supposed to
figure out that this expression denotes a natural number?" or "How is a
system supposed to figure out what '+' denotes?" they respond, "Well, of
course, we can adopt conventions to make that possible." The problem is that such
conventions have to be clear, general, and flexible, because otherwise the
system will devolve into a bloody mess and we will have to invent new
conventions every time users complain that the system fails to interpret
what they have written. Investing enough time to develop and implement a
working system of conventions essentially amounts to developing a type
theory. On the basis of a lot of hard work and experimentation, we have
learned a lot about what works and what doesn't. New ideas are certainly
welcome, but they have to be tested with working implementations and
substantial mathematical data, not a priori speculation and toy examples.

In any event, the potential uses of type theory in automation have only
begun to be explored. Most automated reasoning systems work in the
first-order setting. As I said above, Isabelle's Sledgehammer works by
translating theorems from higher-order logic to first-order logic. There
are drawbacks to this: first-order theorem provers do not handle the
information encoded in the translation as efficiently as a native prover
would, and, when the search fails, it is virtually impossible to
provide useful feedback to the user. Satallax and LEO-II are theorem
provers designed to work directly with higher-order logic:

  http://page.mi.fu-berlin.de/cbenzmueller/leo/
  http://www.ps.uni-saarland.de/~cebrown/satallax/

At the CPP 2016 meeting that Harvey and I attended, Leonardo de Moura
described work in progress with Daniel Selsam to develop automation for
dependent type theory in the Lean theorem prover. In particular, the last
few slides in his talk explain the issues that arise when one tries to
implement "congruence closure," a commonly used algorithm for equational
reasoning in first-order settings, in the dependently typed setting:

  http://leanprover.github.io/presentations/20160119_CPP/#/sec-title-slide
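
For comparison, the first-order version of the algorithm is short enough to
sketch (a toy rendering, not Lean's implementation): maintain a union-find
structure over terms and, whenever two equivalence classes merge,
re-examine the function applications built over them, so that a = b
propagates to f(a) = f(b):

  class CongruenceClosure:
      """Toy congruence closure. Terms: strings for constants,
      tuples ('f', arg1, ...) for function applications."""

      def __init__(self):
          self.parent = {}
          self.use = {}        # representative -> applications touching it

      def add(self, t):
          if t in self.parent:
              return
          self.parent[t] = t
          self.use[t] = []
          if isinstance(t, tuple):
              for arg in t[1:]:
                  self.add(arg)
                  self.use[self.find(arg)].append(t)

      def find(self, t):
          while self.parent[t] != t:
              self.parent[t] = self.parent[self.parent[t]]  # path halving
              t = self.parent[t]
          return t

      def merge(self, s, t):
          self.add(s); self.add(t)
          rs, rt = self.find(s), self.find(t)
          if rs == rt:
              return
          self.parent[rs] = rt
          self.use[rt] = self.use[rt] + self.use[rs]
          # Congruence step: applications over the merged class may now
          # have become equal; if so, merge them too.
          apps = self.use[rt]
          for a in apps:
              for b in apps:
                  if a != b and self.congruent(a, b):
                      self.merge(a, b)

      def congruent(self, a, b):
          return (a[0] == b[0] and len(a) == len(b) and
                  all(self.find(x) == self.find(y)
                      for x, y in zip(a[1:], b[1:])))

      def equal(self, s, t):
          self.add(s); self.add(t)
          return self.find(s) == self.find(t)

  cc = CongruenceClosure()
  for term in ['a', 'b', ('f', 'a'), ('f', 'b')]:
      cc.add(term)
  cc.merge('a', 'b')
  print(cc.equal(('f', 'a'), ('f', 'b')))   # True, by congruence

In the dependently typed setting, by contrast, f(a) and f(b) need not even
have definitionally equal types when a = b is a nontrivial equation, which
is the sort of complication the slides discuss.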

I have said it before, and I will say it again: this is an exciting time to
be involved in formal methods in mathematics. The technology is certainly
not where we would like it to be, and there is a lot of work to be done.
But we are learning a lot and making progress, and I am looking forward to
seeing what will happen in the years to come.