[SMT-COMP] Proof Validation Track for SMT-COMP

Mon Jul 19 10:44:03 EDT 2021

Hello,

Yesterday, I mentioned the plans for the Proof Validation Track. I
know that finding a good solver independent proof format is difficult.
I would like to go with something simple with a minimal but complete
set of axioms.  Then every solver has the same burden to translate its
own proof steps into this simple format.  In this mail I want to
outline the proposal I have in mind with a few more details.  I can
elaborate further, but maybe it's better to get feedback first, if you
think this is a way this can work.

One question is how to explain the CNF encoding.  When the proof
format supports arbitrary SMT-LIB terms as atoms, this is simple: an
assert is just introducing a unit clause with one atom.  Then you add
all Tseitin-encoding clauses like "~(or a b), a, b"  and use
resolution.   A solver that doesn't use Tseitin-encoding can still
explain the clauses in its CNF conversion using these axioms and
resolution.

By allowing the prover to use SMT-LIB terms in the proof that do not
occur in the input formula, the proof calculus supports extended
resolution and arbitrary intermediate transformations, e.g., a solver
can use de-Morgan to eliminate "and" in favor of "or", or vice versa.
All these transformation rules need to be proven by the respective
solver (or a proof-postprocessor) using only the minimal set of
axioms.

I experimented a bit and it looks like the standard Tseitin-encoding
axioms for logical operators,
reflexivity/symmetry/transitivity/congruence for equality, and a few
rules for ite/distinct/chained equalities should already be complete
for QF_UF.  For UF, one would add the axioms for universal and
existential quantifiers using some syntax for Hilbert's
epsilon-operator. Other theories can be added by adding theory
specific axioms, like McCarthy's array axioms. But we could keep the
proof validation track limited to the easier logics at first.  I have
at the moment no idea how to extend it to the bitvector theory, for
example.

Replaying rewrite based proofs to this format is a bit tedious, as you
first have to prove the rules you use.  As an example, proving (= (or
a b) (or b a)) would take eight axioms and seven resolution steps.
However, this is a constant overhead so it would cause only linear
proof size increase and only for the pre-processing part.  And luckily
we can automate the tedious work of adding these proof steps :)

Finally, for a compact representation of the proof, you need to
abbreviate common subterms. For this, I would suggest using the
standard let syntax from SMT-LIB. A difficulty added by the let
encoding is that terms need to be considered equal if one is the let
expansion of the other term.

Probably most controversial in this proposal would be the question
which terms are equal, i.e. cancel each other out if used as pivot in
a resolution step.  I would like to be very strict, e.g. even "(or a b
c)" and "(or (or a b) c)" should be different, and "((or a b) :named
c)", "(or a b)", and "c" are three different terms.  A let expansion
is considered equal to the unexpanded term, though.  Otherwise, proofs
cannot be compactly represented.

I hope I don't step on too many toes by going a bit ahead in this proposal.

I would be interested in some feedback.  Especially if you are
concerned that certain features of your proofs cannot be represented
without quadratic or even exponential overhead, then I would like to
hear these concerns.

Some SMT-solvers use SAT solvers that create DRAT proofs.  While these
proofs are not easily expressible using resolution, they can be
converted to extended resolution proofs with polynomial overhead.  My
feeling is that I would rather keep the proof validator simple and let
someone provide an engine that converts DRAT to extended resolution
that DRAT-based solvers can use.  The polynomial overhead seems
unavoidable, since checking a DRAT proof incurs polynomial overhead,
if I'm not mistaken.

Regards,
  Jochen Hoenicke