SMT-LIB: Inital proposal for a common format

Mon Sep 16 21:18:21 EDT 2002

Dear Collegue,

 you have already received an email about the project of setting
up a library of problems to check the satisfiability of formulae
modulo background theories.   The library is called SMT-LIB, 
the Satisfiability Modulo Theories Library, whose central page
can be found at the following address:

             http://combination.cs.uiowa.edu/smtlib/.

 In that email, we have described the aim of the initiative and we have 
also given a list of problems to be solved in order to set up the library 
in a coherent way.  One of the major problem in that list is the following:
to establish a common format to specify the satisfiability problems in the 
library in a uniform way.  In the following of this email, we give some 
proposals for such a common format together with the reasons for our 
choices.  We hope this document will foster the discussion on this very 
important issue, which is of crucial importance to the success of the 
library.

Before diving into more technical details, we would like to point
out that in order to continue receving the emails concerning the
SMT-LIB initiative, unless you have already told us that you are 
interested, PLEASE LET ONE OF US KNOW WITHIN ONE WEEK IF YOU ARE 
INTERESTED IN THE INITIATIVE. Otherwise, we will remove your email 
from the smt-disc list.

We are very enthusiastic about the project and we are sure
that you want to join us in the exiciting discussions to come
about the SMT-LIB initiative.

Below, you will find the proposal for the common format together
with some discussion.  We hope this will start reactions and comments
and do not forget to send your benchmarks to set up (a prelinary version
of) the library.  Please notice that the initial DEADLINE for sending us 
your benchmarks or setting up a local web page is OCTOBER 31, 2002.
(To facilitate the setting up of the local web pages we will soon 
provide on the SMT web site a tar file with template pages.)

Best regards,

Silvio Ranise (Silvio.Ranise at loria.fr) and
Cesare Tinelli (tinelli at cs.uiowa.edu)

===================================================================

--------------------------------------
SMT-LIB: Proposal for a common format
--------------------------------------
Some of the people in this mailing list had already started a discussion 
on the format of the benchmark. In particular, there was a proposal by 
Shankar to use XML as a basis, and a counter proposal by Greg (Nelson) 
to use S-expressions instead.

Both proposals have their merits and need further discussion. However, 
as we mention in our first message, we believe that before deciding 
which abstract syntax to use, it is important to settle some more 
fundamental issues, such as which logic to adopt, what classes of input 
problems to consider, and how to specify background theories.

The choice of the logic is important because it affects both the kind of 
problems we consider and the abstract syntax for expressing them.
The choice of a uniform way of specifying background theories is 
important for both semantical issues such as as soundness/completeness 
of the solvers and for comparison purposes: it is hard to compare your 
results to someone else's if it is not clear which background theories 
they are using.

-------------
Preliminaries
-------------

First, let's agree on some terminology and basic assumptions, to make 
sure that there are no misunderstandings.

We assume that input problems are to be checked for (un)satisfiability, 
not validity (the difference matters for those classes of formulas that 
are not closed under negation).

Since we are interested in satisfiability (as opposed to validity), when 
we say that a formula is *quantifier-free* we do mean quantifier-free 
(no implicit universal quantifiers). 
Of course, for satisfiability purposes, a quantifier-free formula phi 
can be seen as a ground formulas over an expansion of phi's signature 
which treats phi's variables as *Skolem* constants, i.e., new constant 
symbols. So, quantifier-free formulas can also be identified with ground 
formulas with Skolem constants.

We distinguish between the underlying logic used by a solver 
(first-order, modal, higher order, etc.) and the class of formulas it 
accepts as input (ground, CNF, first-order, temporal, etc.). For 
instance, the underlying logic of a solver for linear arithmetic is 
first-order logic with equality; the class of input formulas may be 
conjunctions of equations and inequations.  A further substantial
issue is how to specify the theory in which the solver works.  For example,
one must be careful when a solver works over the combination of
two theories augmented with a new function (think of the combination
of the theory of lists and integers with the function returning the 
length of a given list).

----------------
Underlying Logic
----------------

Everybody will doubtless agree that the underlying logic should be at 
least classical first-order logic with equality (FOL-EQ).
Not so obvious is whether we should limit ourselves to unsorted FOL-EQ 
or consider some sorted version of it.
While most theoretical results on combination are expressed in the 
unsorted setting, most of the background theories and input problems are 
more naturally formulated in a sorted framework. (Some of the existing 
solvers actually have a sorted input language.)
If sorts are to be seriously considered, the next question is whether to 
use a many-sorted approach, with a flat sort hierarchy, or an 
order-sorted approach, with sorts and subsorts.
An orthogonal question is whether to consider modal logics as well. For 
instance, STeP considers a version of linear-time temporal logic.
The last question might then be whether to consider a higher-order 
logic, and which one.

PROPOSAL 1: Let's start by limiting ourselves to unsorted FOL-EQ, and 
let's consider sorted logic only later, if at all.

RATIONALE: The reason not to consider modal or higher-order logics is 
that it might be an overkill. Most existing, and most likely future, 
automated solvers are limited to FOL. (Even those that take higher-order 
formulas are automatic only on those problems that can be somehow 
reduced to first-order ones.)
The reason not to consider, at least at the beginning, a sorted FOL-EQ 
is that sorts are difficult to treat properly. Certainly, more than many 
in the community seem to think.  In fact, sharing sorts amounts to sharing
predicate symbols, which means that the existing (and well-understood)
combination methods (such as Nelson and Oppen) for disjoint signatures
need substantial modifications.  We believe that sorts should be 
considered only after an adequate theoretical treatment of sorted 
combination has been worked out by the research community.  

-----------
Input class
-----------

Again, everybody will agree that the class of benchmarks we consider 
should be at least that of conjunction of literals, or better, arbitrary 
quantifier-free formulas.
But should we also consider quantified formulas? Some solvers like 
Simplify and STeP accept arbitrary first-order formulas.
Of course, depending on what we decide about the underlying logic, we 
may also need to decide whether to consider also formulas with 
modalities or with higher-order quantifiers.

PROPOSAL 2: We should definitely consider the whole class of 
quantifier-free formulas, but we would even go all the way and allow 
problems within the full class of first-order formulas.
However, let us classify the problems in the library in two classes 
according to the fact that we are interested in checking the 
satisfiability of (1) quantifier-free or (2) full first-order formulae.
The reason for separating quantifier-free benchmarks from quantified ones 
is to facilitate the downloading and the use of the quantifier-free
benchmarks for those people that are not interested in the quantified ones.

RATIONALE: Although some of the existing solvers do not go beyond 
conjunctions of literals, we think that full Boolean combinations of 
predicates should be allowed. A lot of people (including several of us) 
are investigating efficient ways to incorporate "theory solvers" into 
SAT solvers. These people would find the library very useful.
The reason to allow benchmarks with (first-order) quantifiers is that
i) they often arise in practical applications,
ii) the mathematics behind them is well-understood, and
iii) effectively solving the problem of finding suitable 
instances of quantified variables is a major research problem which
awaits substantial work from the automated theorem proving community
(although some serious work has already been done by the SRC people
and implemented in the Simplify prover).  

REMARK: We are not sure though whether the inputs should be restricted 
or not to a specific normal form (such as CNF for qffs or PNF with CNF 
matrices for fofs).
The reason we are not sure whether to impose a specific normal form (as 
done with the DIMACS format for SAT problems, for instance) is that the 
conversion may strip the problem of structural properties that a solver 
might have exploited to process the problem faster. Also, there often 
are several conversions methods into the same type of normal form, with 
some methods being better than others on certain types of inputs.
On the other hand, having a fixed normal form may facilitate the 
implementation of parsers and the like.

--------------------
Theory specification
--------------------

There are essentially two ways to specify a theory.
One is axiomatic: a theory is (the deductive closure of) a certain set 
of sentences in the underlying logic. The other is algebraic: a theory 
is a set of models (where the definition of model depends on the 
underlying logic).

PROPOSAL 3: Specify the theories algebraically.

RATIONALE: Some solvers are complete only for theories that can be 
specified algebraically. For instance some Shostak-based approaches are 
complete with respect to so called sigma-models.
Incidentally, the choice of an algebraic specification does not rule out 
any theory T that is more naturally speficied by a set of axioms S. In 
fact, the same T can always be equivalently specified as the set of 
models of S.

We conclude by pointing out that for combined theories it is important 
to specify precisely how the theory is obtained from of its components, 
the reason is that in general there are several conceivable ways to 
combine the components.
This is perhaps obvious in the sorted case where there are several ways 
to combine the sorts of one theory with those of the other. But it is 
also true in the unsorted case.
For instance, one might have two theories T_1 and T_2 where each T_i is 
the set of X-models of some set of sentences S_i (replace X- in X-model 
with your favorite prefix: sigma-, free-, canonical-, initial-, and so on).
Then, one combined theory is the set of models whose reduct to the 
signature of T_i is (isomorphic to) a model in T_i, for each i. Another 
combined theory could be the set of X-models of (S_1 UNION S_2). The two 
theories are not typically equivalent.
The case for a precise specification is even stronger in cases of 
"augmentation", in which the combined theory is more the the sum of its 
parts, so to speak. (Think for instance of a combined theory of integers 
and finite sets equipped with a cardinality operator).