SMT-LIB: Inital proposal for a common format
Silvio.Ranise@loria.fr
Silvio.Ranise at loria.fr
Mon Sep 16 21:18:21 EDT 2002
Dear Collegue,
you have already received an email about the project of setting
up a library of problems to check the satisfiability of formulae
modulo background theories. The library is called SMT-LIB,
the Satisfiability Modulo Theories Library, whose central page
can be found at the following address:
http://combination.cs.uiowa.edu/smtlib/.
In that email, we have described the aim of the initiative and we have
also given a list of problems to be solved in order to set up the library
in a coherent way. One of the major problem in that list is the following:
to establish a common format to specify the satisfiability problems in the
library in a uniform way. In the following of this email, we give some
proposals for such a common format together with the reasons for our
choices. We hope this document will foster the discussion on this very
important issue, which is of crucial importance to the success of the
library.
Before diving into more technical details, we would like to point
out that in order to continue receving the emails concerning the
SMT-LIB initiative, unless you have already told us that you are
interested, PLEASE LET ONE OF US KNOW WITHIN ONE WEEK IF YOU ARE
INTERESTED IN THE INITIATIVE. Otherwise, we will remove your email
from the smt-disc list.
We are very enthusiastic about the project and we are sure
that you want to join us in the exiciting discussions to come
about the SMT-LIB initiative.
Below, you will find the proposal for the common format together
with some discussion. We hope this will start reactions and comments
and do not forget to send your benchmarks to set up (a prelinary version
of) the library. Please notice that the initial DEADLINE for sending us
your benchmarks or setting up a local web page is OCTOBER 31, 2002.
(To facilitate the setting up of the local web pages we will soon
provide on the SMT web site a tar file with template pages.)
Best regards,
Silvio Ranise (Silvio.Ranise at loria.fr) and
Cesare Tinelli (tinelli at cs.uiowa.edu)
===================================================================
--------------------------------------
SMT-LIB: Proposal for a common format
--------------------------------------
Some of the people in this mailing list had already started a discussion
on the format of the benchmark. In particular, there was a proposal by
Shankar to use XML as a basis, and a counter proposal by Greg (Nelson)
to use S-expressions instead.
Both proposals have their merits and need further discussion. However,
as we mention in our first message, we believe that before deciding
which abstract syntax to use, it is important to settle some more
fundamental issues, such as which logic to adopt, what classes of input
problems to consider, and how to specify background theories.
The choice of the logic is important because it affects both the kind of
problems we consider and the abstract syntax for expressing them.
The choice of a uniform way of specifying background theories is
important for both semantical issues such as as soundness/completeness
of the solvers and for comparison purposes: it is hard to compare your
results to someone else's if it is not clear which background theories
they are using.
-------------
Preliminaries
-------------
First, let's agree on some terminology and basic assumptions, to make
sure that there are no misunderstandings.
We assume that input problems are to be checked for (un)satisfiability,
not validity (the difference matters for those classes of formulas that
are not closed under negation).
Since we are interested in satisfiability (as opposed to validity), when
we say that a formula is *quantifier-free* we do mean quantifier-free
(no implicit universal quantifiers).
Of course, for satisfiability purposes, a quantifier-free formula phi
can be seen as a ground formulas over an expansion of phi's signature
which treats phi's variables as *Skolem* constants, i.e., new constant
symbols. So, quantifier-free formulas can also be identified with ground
formulas with Skolem constants.
We distinguish between the underlying logic used by a solver
(first-order, modal, higher order, etc.) and the class of formulas it
accepts as input (ground, CNF, first-order, temporal, etc.). For
instance, the underlying logic of a solver for linear arithmetic is
first-order logic with equality; the class of input formulas may be
conjunctions of equations and inequations. A further substantial
issue is how to specify the theory in which the solver works. For example,
one must be careful when a solver works over the combination of
two theories augmented with a new function (think of the combination
of the theory of lists and integers with the function returning the
length of a given list).
----------------
Underlying Logic
----------------
Everybody will doubtless agree that the underlying logic should be at
least classical first-order logic with equality (FOL-EQ).
Not so obvious is whether we should limit ourselves to unsorted FOL-EQ
or consider some sorted version of it.
While most theoretical results on combination are expressed in the
unsorted setting, most of the background theories and input problems are
more naturally formulated in a sorted framework. (Some of the existing
solvers actually have a sorted input language.)
If sorts are to be seriously considered, the next question is whether to
use a many-sorted approach, with a flat sort hierarchy, or an
order-sorted approach, with sorts and subsorts.
An orthogonal question is whether to consider modal logics as well. For
instance, STeP considers a version of linear-time temporal logic.
The last question might then be whether to consider a higher-order
logic, and which one.
PROPOSAL 1: Let's start by limiting ourselves to unsorted FOL-EQ, and
let's consider sorted logic only later, if at all.
RATIONALE: The reason not to consider modal or higher-order logics is
that it might be an overkill. Most existing, and most likely future,
automated solvers are limited to FOL. (Even those that take higher-order
formulas are automatic only on those problems that can be somehow
reduced to first-order ones.)
The reason not to consider, at least at the beginning, a sorted FOL-EQ
is that sorts are difficult to treat properly. Certainly, more than many
in the community seem to think. In fact, sharing sorts amounts to sharing
predicate symbols, which means that the existing (and well-understood)
combination methods (such as Nelson and Oppen) for disjoint signatures
need substantial modifications. We believe that sorts should be
considered only after an adequate theoretical treatment of sorted
combination has been worked out by the research community.
-----------
Input class
-----------
Again, everybody will agree that the class of benchmarks we consider
should be at least that of conjunction of literals, or better, arbitrary
quantifier-free formulas.
But should we also consider quantified formulas? Some solvers like
Simplify and STeP accept arbitrary first-order formulas.
Of course, depending on what we decide about the underlying logic, we
may also need to decide whether to consider also formulas with
modalities or with higher-order quantifiers.
PROPOSAL 2: We should definitely consider the whole class of
quantifier-free formulas, but we would even go all the way and allow
problems within the full class of first-order formulas.
However, let us classify the problems in the library in two classes
according to the fact that we are interested in checking the
satisfiability of (1) quantifier-free or (2) full first-order formulae.
The reason for separating quantifier-free benchmarks from quantified ones
is to facilitate the downloading and the use of the quantifier-free
benchmarks for those people that are not interested in the quantified ones.
RATIONALE: Although some of the existing solvers do not go beyond
conjunctions of literals, we think that full Boolean combinations of
predicates should be allowed. A lot of people (including several of us)
are investigating efficient ways to incorporate "theory solvers" into
SAT solvers. These people would find the library very useful.
The reason to allow benchmarks with (first-order) quantifiers is that
i) they often arise in practical applications,
ii) the mathematics behind them is well-understood, and
iii) effectively solving the problem of finding suitable
instances of quantified variables is a major research problem which
awaits substantial work from the automated theorem proving community
(although some serious work has already been done by the SRC people
and implemented in the Simplify prover).
REMARK: We are not sure though whether the inputs should be restricted
or not to a specific normal form (such as CNF for qffs or PNF with CNF
matrices for fofs).
The reason we are not sure whether to impose a specific normal form (as
done with the DIMACS format for SAT problems, for instance) is that the
conversion may strip the problem of structural properties that a solver
might have exploited to process the problem faster. Also, there often
are several conversions methods into the same type of normal form, with
some methods being better than others on certain types of inputs.
On the other hand, having a fixed normal form may facilitate the
implementation of parsers and the like.
--------------------
Theory specification
--------------------
There are essentially two ways to specify a theory.
One is axiomatic: a theory is (the deductive closure of) a certain set
of sentences in the underlying logic. The other is algebraic: a theory
is a set of models (where the definition of model depends on the
underlying logic).
PROPOSAL 3: Specify the theories algebraically.
RATIONALE: Some solvers are complete only for theories that can be
specified algebraically. For instance some Shostak-based approaches are
complete with respect to so called sigma-models.
Incidentally, the choice of an algebraic specification does not rule out
any theory T that is more naturally speficied by a set of axioms S. In
fact, the same T can always be equivalently specified as the set of
models of S.
We conclude by pointing out that for combined theories it is important
to specify precisely how the theory is obtained from of its components,
the reason is that in general there are several conceivable ways to
combine the components.
This is perhaps obvious in the sorted case where there are several ways
to combine the sorts of one theory with those of the other. But it is
also true in the unsorted case.
For instance, one might have two theories T_1 and T_2 where each T_i is
the set of X-models of some set of sentences S_i (replace X- in X-model
with your favorite prefix: sigma-, free-, canonical-, initial-, and so on).
Then, one combined theory is the set of models whose reduct to the
signature of T_i is (isomorphic to) a model in T_i, for each i. Another
combined theory could be the set of X-models of (S_1 UNION S_2). The two
theories are not typically equivalent.
The case for a precise specification is even stronger in cases of
"augmentation", in which the combined theory is more the the sum of its
parts, so to speak. (Think for instance of a combined theory of integers
and finite sets equipped with a cardinality operator).
More information about the SMT-LIB
mailing list