[SMT-LIB] SMT-LIB logic and language Version 2 draft
Cesare Tinelli
tinelli at cs.uiowa.edu
Fri Jul 31 01:16:30 EDT 2009
Hi all,
the draft below defines Version 2 of SMT-LIB's underlying logic and
the language for specifying theories, sublogics and benchmarks.
Its highlights are:
- a simpler abstract and concrete syntax
- some level of support for parametric types via schematic theory
declarations
- sublogic declarations based on multiple theories
Again, your feedback is highly welcome.
Cesare
------------------------------------------------------------------------
-----------------
SMT-LIB Logic and Language Version 2
Working Draft
Cesare Tinelli
30 July 2009
===========
0. Prologue
===========
This document describes the proposed Version 2 of the SMT-LIB logic
and input language. It assumes previous familiarity with Version 1.2
(described in http://goedel.cs.uiowa.edu/smtlib/papers/format-v1.2-
r06.08.30.pdf).
A more complete and self-contained document will be issued later.
The document is based on discussions within the SMT-LOGIC work group
created specifically for this purpose, and on additional feedback
from other people. The work group consists of Clark Barrett, Sylvain
Conchon, Bruno Dutertre, Jim Grundy, Leonardo de Moura, Albert
Oliveras, Aaron Stump, and Cesare Tinelli.
Substantial external feedback was provided by Sava Krstic and Philipp
Ruemmer.
Legend:
- Text enclosed in (* *) describes the motivations for new
or modified features in Version 2 with respect to Version 1.2.
- All examples of the new language are in the new concrete syntax.
- The meta syntax _some text_ and *some text* is used in this
document as a substitute for the italics and the bold font,
respectively.
====================
1. Underlying Logic
====================
The underlying logic of this new version of the SMT-LIB language is
basically the same as before: classical many-sorted logic.
The main differences are:
1) Sorts are denoted by (first-order) sort *terms* built with sort
symbols of arity >= 0.
-----------------------------------------------------------------
Ex:
- sort symbols of arity 0: Int, Index, Elem
- sort symbols of arity 1: List, Set
- sort symbols of arity 2: Array
- Sort Terms:
Int, Index, Elem, (Array Index Elem), (Set (Array Index Elem))
-----------------------------------------------------------------
(* This change is motivated by our goal to introduce some form of
parametric types into the logic, which will be done through the use
of schematic theory declarations (see later). *)
2) We drop the distinction between terms and formulas but we
introduce a distinguished Bool sort, always interpreted as the
Booleans, for what were formulas before.
(* The reason for this change is that it allows us to considerably
simplify the syntax of the language without introducing any
substantial complications for SMT-solver writers. *)
==================
2. Abstract syntax
==================
2.1 Terms
The new abstract syntax for terms is captured by the rules below.
Parentheses are meta-symbols used for grouping, As usual, the unary
suffix meta-symbols * and + indicate zero, resp. one, or more
repetitions.
Similarly to the current version, the rules are very liberal since
further restrictions are imposed orthogonally by an associated type
system.
[Annotations] alpha ::= a = v (v is intentionally left unspecified)
[Sort symbols] q ::= as before, but now with an associated
arity >= 0
[Sort terms] s ::= q s*
[Special constants] c ::= includes rationals and numerals (see later)
[Variables] x ::= as before
[Fun. Symbols] f ::= as before
[Terms] u ::= x | c
| f t*
| f^s t*
| let (x = t)+ in t
| exists (x:s)+ t | forall (x:s)+ t
[Annot. Terms] t ::= u alpha*
Main differences with the previous abstract syntax:
- sorts are now denoted by (ground) first-order terms;
- the syntactical category of predicate symbols is gone: predicate
symbols become function symbols with a Boolean return sort;
- the syntactical category of formulas is gone, formulas are now
Boolean terms;
- there is a single let construct, with one or more (parallel) bindings;
- the Boolean connectives, ite, distinct, and = are all demoted from
connectives of the language to theory function symbols (see later);
- generalized overloading of function symbols is now allowed;
- within a term, a function symbol f can be given a sort qualifier,
with the expression f^s (see later).
2.2 Type system
For convenience the words "type" and "sort" will be used
interchangeably here. The new type system is similar to the current
one but, except for the addition of type rules for sort terms (which
simply check that a sort symbol is applied to the correct number of
arguments), it has a lot less rules, thanks to the simplifications
above.
Note that despite having now sort symbols of non-zero arity, the
logic remains first-order. In particular, contrary to higher-order
typed logics, it does not have a distinguished binary sort symbol ->,
say, such that the sort term (-> s_1 s_2) denotes, in every model,
the set of total functions between (the denotations of) the sorts s_1
and s_2.
Of course, we can model total functions as a data type in this
proposed version (we already do that in Version 1.2 in the theory of
arrays with extensionality). But that is done at the level of an
individual theory, not at the level of the logic itself.
Also, the logic remains many-sorted and keeps the same model
theoretic semantics as in the previous version. A sort term, such as
(Array Real Real), denotes a (single) domain (i.e., a non-empty set
of individuals) the same way as a sort symbols do in Version 1.2.
This is in contrast with logic with parametric types, which allow
also type variables. There, a type like (Array alpha beta) where
alpha, beta are type variables, denotes a *family* of domains.
A more substantial change to the type system is that the language now
allows generalized overloading of function symbols. Overloading is
both of the "ad-hoc" kind and of "polymorphic" kind.
In the first kind, a function symbol can have two entirely different
ranks, even with different arity, e.g.,
- : Real Real and
- : Real Real Real
for arithmetic minus.
In the second kind, the arity is the same and all the different ranks
have the "same shape", e.g.,
select : (Array s1 s2) s1 s2
for any sort terms s1 and s2. Note that this is an informal
distinction. Strictly speaking, all overloading---even the second
kind---is ad-hoc because the logic does not have parametric types.
The important point is that, with generalized overloading, the type
system is not deterministic anymore: a term may have more than one sort.
Moreover, even when it does have a single sort, type inference (as
opposed to simpler type checking) may be necessary to compute that
sort. Explicit type qualifiers for function symbols, as in f^s, have
been introduced as a simple way to resolve the non-determinism.
Using type qualifiers, it is still possible to compute
deterministically the type of any term bottom-up during parsing as it
is with Version 1.2. Figuring out which function symbols to qualify
explicitly is quite simple because the only troublesome cases are
those of overloaded function symbols with same input sorts and
different output sort, as in
f : s_1 ... s_n s and
f : s_1 ... s_n s'
with n >= 0 and s, s' distinct. Let's call such function symbols
"ambiguous".
In a term of the form (f t_1 ... t_n) with f ambiguous, and only in
those, f needs to be annotated with its intended returned type. For
instance, if the return type is meant to be s, the term must be
written as (f^s t_1 ... t_n).
We will then enforce the following policy for SMT-LIB benchmarks:
Every ambiguous function symbol occurring in a benchmark
formula must be annotated with a type qualifier.
(As for occurrences of ambiguous symbols in a theory definition, see
the section on schematic theories.)
A concrete example of such situations comes from logics of lists
where the empty list constant symbol 'nil' has two or more types, say,
(List A), (List B), ...
The policy prescribes that every occurrence of nil in a formula must
be accompanied by a type qualifier. In the concrete syntax, described
later, instead of just nil we must have instead
(as nil (List A)), (as nil (List B)), ...
2.3 Constant declarations
Many theories have (in effect) some large, even infinite, set of
constant symbols all of the same type: the integer theory uses all
the numerals (0, 1, 2, ...); the real theory uses all the decimals
(0.0, 0.1, ...); some theories of strings use all the quotes-enclosed
strings ("", "abc", ...).
We identify a few such sets and treat them specially.
At the abstract syntax level we have:
n ::= all numerals (as before)
r ::= all rationals (as before)
h ::= all hexadecimals (new)
b ::= all binaries (new)
w ::= all strings (new)
The set of all special constants (which, are distinct from other
function symbols) is then:
c := n | r | h | b | w
We associate a special symbol to each of these *sets*:
N for all numerals
R for all rationals
H for all hexadecimals
B for all binaries
W for all strings
Each of these symbols can now be used to in a function symbol
declaration to declare all the constants symbols in the corresponding
set at once.
Before, function symbol declarations were defined by:
D_f ::= f s+ alpha*
| n s alpha*
| r s alpha*
Now the rules are:
C ::= N | R | H | B | W
D_f ::= f s+ alpha*
| c s alpha*
| C s alpha*
These rule allow one to declare individual constants (such as 0), or
whole sets of them (such all the numerals) at once.
The use of the special symbols C, however, is limited only to theory
declarations. In other words, they cannot be used in an :extrafuns
field of a benchmark.
(* The rationale for the extra symbols is that they introduce more
flexibility and convenience when declaring certain theories. The
reason for not allowing them in :extrafuns declarations is to avoid
unnecessarily complicating the parsing and processing of such
declarations given that a single benchmark uses only finitely-many
symbols anyway. *)
2.4 Global Function Symbol Definitions
In addition to local symbol definitions, provided by the let
construct, we now have global definitions within a theory declaration
or a benchmark.
Contrary to let, which defines only constants/variables, global
definitions can define function symbols too.
So we add another possibility to a function symbol declaration:
D_f ::= ...
| f (x:s)* t alpha*
where (x:s)* names the function's inputs and specifies their sort,
and t is a term not containing f whose free variables are among the
input variables.
The effect of a global definition
f x_1:s_1 ... x_n:s_n t (1)
(with n >= 0 and t of some sort s) in a theory declaration or a
benchmark is the same as declaring f to have rank (s_1 ... s_n s) and
adding the axiom
forall x_1:s_1 ... x_n:s_n (f x_1 ... x_n) = t (2)
in the :definition attribute (for theories) or the :assumption
attribute (for benchmarks).
(* Global definitions are added for convenience and efficiency, since
terms like (f t_1 ... t_n) in a formula can be treated as macro
applications. That is easier for a solver to do than to process the
quantified assumption (2) above.
Another important advantage of function definitions in a benchmark is
that they have global scope within the benchmark, so they can be
used, for instance, to define constant terms that can appear in both
an assumption and the main formula of the benchmark. Note that the
'let' construct cannot be used for that because it has local scope
within an :assumption or :formula attribute.
*)
2.5 Sort Declarations and definitions
Sort declarations within a theory declaration or a benchmark now need
to specify the arity of the sort symbol:
[Sort Declaration] D_s ::= q n
Similarly to global function definitions, complex sort expressions
can be given for convenience a short name with global scope in a
theory or benchmark declaration. A short name q for a sort term s
does not define a new type, it is just a synonym of s and could be
expanded to s everywhere.
For example, one can declare, say, MyList as a synonym of (List (Set
Real)).
A sort declaration is now
[Sort Declaration] D_s ::= q n
| q s
where s is a sort term. In a declaration of the form (q s), the sort
symbol q has arity 0 and is just a synonym for s. As a side
restriction, s should not be expansible to a sort term containing q---
in other words, the definition of a sort synonym cannot be circular.
==================
3. Concrete syntax
==================
The new concrete language is now included in the following subset of
Common Lisp S-expressions:
<numeral> ::= 0 | a non-empty sequence of digits not starting with 0
<rational> ::= <numeral>.0*<numeral>
<hex> ::= a non empty sequence of digits and the letters
from a to f, capitalized or not
<hexadecimal> ::= #h<hex>
------------------------------
Ex: #h0 #hA04 #h01AB #h61ff
------------------------------
<bin> ::= a non-empty sequence of 0s and 1s
<binary> ::= #bin<bin>
------------------------------
Ex: #b0 #b001 #b101011
------------------------------
<string> ::= ASCII character string in double quotes
with C-style escaped characters: \", \n, ...
<spec_constant> ::= <numeral> | <rational>
| <hexadecimal> | <binary> | <string>
<symbol> ::= any sequence of letters, digits and the characters
+ - / * = ~ ? ! . _ $ % & ^ < > @ :
that does not start with a digit
| any sequence of printable ASCII characters that starts
and ends with | and does not otherwise contain |
----------------------------------------------------------------
Ex.
+ <= x plus ** $ <sas <adf> ::: abc77 *$s&6 .kkk .8
|this is a single symbol|
|af klj^*(0asfsfe2(&*)&(#^$>>>?"']]984|
----------------------------------------------------------------
<S-expression> ::= <spec_constant> | <symbol> | ( <S-expression>* )
Main differences with the current syntax:
a) Overall, the concrete grammar is simpler.
(* This was done mostly to allow simpler parsers and so lower the
barrier to entry. *)
b) We have a special syntax (but not semantics!) for hexadecimal
constants and bitstring constants. Parsing is kept simple by the
reserved use of # for starting such constants and a one-letter
qualifier (h and b).
c) The language generated by the concrete syntax is a subset S-
expressions; in, particular, the problematic use of the symbols [ ]
{ } in Version 1.2 is gone.
(* This way, benchmarks can be processed by tools that understand S-
expressions. *)
d) The value of a user-defined attribute is an arbitrary S-
expression---as opposed to an arbitrary sequence of characters in
enclosed in { }.
e) A variable's name is any simple identifier, as opposed to just one
starting with ? or $.
(* This is for flexibility, given that the current restriction does
not really simplify parsing. Note that variables starting with ? or $
are still allowed. *)
f) Indexed identifiers in mixfix notation, such as foo[3:4], are not
allowed as symbols anymore but become expressions like (ind foo 3 4),
where 'ind' is a new keyword.
(* This is for consistency with the decision of using S-expressions
as the only kind of expressions.
The concrete syntax
( ind <simple_identifier> <index>+ )
instead of, say, just
( <simple_identifier> <index>+ )
is motivated by the need to avoid ambiguity with function applications
( <fun_symbol> <term>+ ) .
*)
g) Multi-arity function symbols are not allowed indiscriminately
anymore. Every function symbol has a fixed arity, or a finite number
of fixed arities in case of overloading.
For instance, 'and' and '=' now take exactly two arguments.
Expressions like (f x1 x2 x3 x4) with f is binary are still allowed,
but only as syntactic sugar, provided that an annotation is added in
this regard in the symbol's declaration.
The following, mutually exclusive, annotations are now reserved for
this purpose:
:left_assoc, :right_assoc and :chainable.
A function symbol f can be declared as
i) :left_assoc if it has rank of the form (s1 s2 s1)
ii) :right_assoc if it has rank of the form (s1 s2 s2)
iii) :chainable if it has rank of the form (s s Bool)
Then (f t_1 ... t_n) is syntactic sugar respectively for:
i) (f ... (f (f t_1 t_2) t_3) ... t_n)
ii) (f t_1 (f t_2 ... (f t_{n-1} t_n) ...)
iii) (and (f t_1 t_2) ... (f t_1 t_2))
[with 'and' itself assumed to be declared as :left_assoc]
For example, we can have declarations like
(+ Real Real Real :left_assoc)
(cons E (List E) (List E) :right_assoc)
(and Bool Bool Bool :left_assoc)
(= A A Bool :chainable)
(< Real Real Bool :chainable)
For simplicity of parsing, the mechanism above is limited only to
theory symbols. User-defined symbols are not allowed to have a
multiarity syntax.
(* The reason for this chance is that multiarity function symbols
complicate parsing even if they allow for flatter expressions. Also,
with Version 1.2 there as been much confusion about a number of
operators op of rank (Bool Bool Bool) on whether (op x1 x2 x3) meant
(op (op x1 x2) x3) or (and (op x1 x2) (op x2 x3)).
The proposed syntax addresses both problems. The first because it
limits the number of multiarity operators to just theory symbols. The
second because now the distinction between the two interpretations of
a multiarity operator is made explicit.
Eliminating multi-arity operators from the logic (if not from the
concrete syntax) has also another advantage. It greatly simplifies
the definition of proof systems for SMT.
*)
For convenience and backward compatibility, the predefined symbol
'distinct' as well is allowed to have more than two arguments.
The meaning of (distinct t_1 ... t_n) is the same as in Version 1.2:
the conjunction of (distinct t_i t_j) for all 1 <= i < j <= n.
The only difference is that now 'distinct' is officially just a
binary symbol and (distinct t_1 ... t_n) is defined as syntactic
sugar for the conjunction above.
h) Symbols with an arbitrary sequence of characters delimited by
vertical bars are new. They are inspired by Common Lisp (see http://
www.cs.cmu.edu/Groups/AI/html/cltl/clm/
node27.html#SECTION00630000000000000000) and were used in the
granddad (grandmom?) of all SMT solvers, Simplify.
(* Their main motivation is to be used as values of user-defined
annotations, which in Version 1.2 were restricted to arbitrary text
delimited by curly braces. *)
i) The syntax for annotations within terms has changed. It used to be
or the form
(<const> <annotation>+) or
(<fun_symbol> <an_term>+ <annotation>)
where <an_term> is an annotated term.
(* That syntax made parsing unnecessary complicated because some look-
ahead could be needed to distinguish between the two cases above. *)
The new syntax for adding annotations to terms is now
( annot <term> <annotation>+ )
where 'annot' is a new keyword.
(* This should eliminate the parsing problems above while allowing
the same level of flexibility as before (that is, terms and subterms
can be freely annotated). *)
(* Ignoring a user-defined annotation, for those systems that to do
not make use of it, should be also easy, the only complication being
that now the value of an annotation is an arbitrary S-expression,
which means that a small subparser for S-expressions is needed to
know when such a value begins and ends. [Admittedly, that is more
complicated than what we have in Version 1.2, where the value of a
user-defined attribute is any text enclosed in curly braces.]
*)
3.1 Terms
Rules:
<index> ::= <numeral>
<identifier> ::= <symbol> | ( ind <symbol> <index>+ )
<sort_term> ::= <identifier> | ( <identifier> <sort_term>+ )
--------------------------------------------------------
Ex: Int (List (Array Int Real))
(ind BitVec 3) (Set (ind Bitvec 3))
((ind FixedSizeList 4) Real)
--------------------------------------------------------
<attribute> ::= :<symbol>
-----------------------------------
Ex: :date :a2 :foo-bar :< :56
-----------------------------------
<annotation> ::= <attribute> | <attribute> <S-expression>
where <S-expression> does not start with ':'
---------------------------------------------------------
Ex: :assoc
:status unsat
:my_attribute (humpty dumpty)
:authors "Jack and Jill"
---------------------------------------------------------
<var> ::= <symbol>
<fun_symbol> ::= <identifier>
| (as <identifier> <sort_term>)
----------------------------------------------------------
NOTE: (as <identifier> <sort_term>) corresponds to the
abstract syntax expression f^s
----------------------------------------------------------
---------------------------------------------------------
Ex: + <= foo
(as nil (List Int))
(as + (Int Int Int))
----------------------------------------------------------
<quant_symbol> ::= forall | exists
<sorted_var> ::= ( <var> <sort_term> )
-------------------------------------------------------------------
Ex: (x Int) (?a (Array Int (ind BitVec 4)))
-------------------------------------------------------------------
<binding> ::= ( <var> <term> )
-----------------------------------------
Ex: (x (+ y 4))
-----------------------------------------
<term> ::= <var> | <spec_constant> | <fun_symbol>
| ( <fun_symbol> <term>+ )
| ( let ( <binding>+ ) <term> )
| ( <quant_symbol> ( <sorted_var>+ ) <term> )
| ( annot <term> <annotation>+ )
----------------------------------------------------
Ex: (forall ((x (List Int)) (y (List Int)))
(append x y) =
(ite (= x (as nil (List Int)))
y
(let ((x.h (head x)) (x.t (tail x)))
(insert x.h (append x.t y)))))
----------------------------------------------------
Note that the concrete syntax for quantifier formulas:
( <quant_symbol> ( <sorted_var>+ ) <term> )
now encloses the list of quantified variables into parentheses.
(* This change too is to simplify parsing. *)
3.2 Sort Declarations
Having introduced sort symbols of non-zero arity, sort declarations
in a benchmark now look like this:
<sort_symb_decl> ::= ( <sort_symbol> <numeral> <annotation>*)
| ( <sort_symbol> <sort_term> <annotation>*)
-----------------------------------------------------
Ex: (Int 0) ((ind BitVec 3) 0)
(List 1) ((ind FixedLengthList 1) 1)
(Array 2)
(MyList (List (Array Int Real)))
-----------------------------------------------------
3.3 Function Symbol Definitions
Global definitions of function symbols are added as an option in
function declarations (last option below):
<meta_spec_constant> ::= NUMERAL | RATIONAL | HEXADECIMAL | BINARY
| STRING
<fun_symb_decl> ::= ( <spec_constant> <sort> <annotation>* )
----------------------------------------------------------------------
Ex: (0 Int) (1.0 Real) ("" String)
----------------------------------------------------------------------
| ( <meta_spec_constant> <sort> <annotation>* )
----------------------------------------------------------------------
Ex: (NUMERAL Int) (RATIONAL Real) (STRING String)
----------------------------------------------------------------------
| ( <fun_symbol> <sort>+ <annotation>* )
----------------------------------------------------------------------
Ex: (a Int)
(+ Int Int Int :left_assoc)
(store (Array I E) I E (Array I E))
((ind extract 4 1) (ind BitVec 10) (ind BitVec 4))
----------------------------------------------------------------------
| ( <fun_symbol> ( <sorted_var>* ) <term> <annotation>* )
----------------------------------------------------------------------
Ex: (a () (* 3 b))
(f ((x Real) (y Real)) (> x (+ y y)))
----------------------------------------------------------------------
3.4 Benchmarks
The concrete syntax of benchmarks is essentially unchanged, except
for the disappearance of the :extrapreds attribute and for the
following new restrictions for the :extrasort and :extrafun attributes.
1) none of the declared symbols can be indexed,
2) none of the declared symbols can overload an already declared
symbol (including theory symbols from the benchmarks logic).
(* The rationale for the first restriction is that indexed
identifiers are used in SMT-LIB to denote infinite families of theory
symbols (as in the bitvector logics). The number of user-declared
symbols in a benchmark is always finite, so using index identifiers
is pointless, and it needlessly complicates the parsing of logics
that do not have indexed theory symbols. (It is hard to justify the
need for a user to declare an indexed identifier like (ind f 3) as
opposed to, say, f_3.)
The rationale for the second restriction is similar, considering that
we do not allow parametric sorts in benchmarks (more on this later),
which makes it impossible for a user to declare a polymorphic
function. The only possible overloading then would be ad-hoc
overloading (as in using +, say, for string concatenation). But such
overloading is hardly necessary.
*)
-------------------------------
Complete grammar for benchmarks
-------------------------------
<numeral> ::= 0 | a non-empty sequence of digits not starting with 0
<rational> ::= <numeral>.0*<numeral>
<hex> ::= a non empty sequence of digits and the letters
from a to f, capitalized or not
<hexadecimal> ::= #h<hex>
<bin> ::= a non-empty sequence of 0s and 1s
<binary> ::= #b<bin>
<string> ::= ASCII character string in double quotes
with C-style escaped characters: \", \n, ...
<spec_constant> ::= <numeral> | <rational>
| <hexadecimal> | <binary> | <string>
<symbol> ::= any sequence of letters, digits and the characters
+ - / * = ~ ? ! . _ $ % & ^ < > @ :
that does not start with a digit
| any sequence of printable ASCII characters that starts
and ends with | and does not otherwise contain |
<S-expression> ::= <spec_constant> | <symbol> | ( <S-expression>* )
<attribute> ::= :<symbol>
<annotation> ::= <attribute> | <attribute> <S-expression>
where <S-expression> does not start with ':'
<index> ::= <numeral>
<identifier> ::= <symbol> | ( ind <symbol> <index>+ )
<sort_term> ::= <identifier> | ( <identifier> <sort_term>+ )
<fun_symbol> ::= <identifier>
| (as <identifier> <sort_term>)
<quant_symbol> ::= forall | exists
<var> ::= <symbol>
<sorted_var> ::= ( <var> <sort_term> )
<binding> ::= ( <var> <term> )
<term> ::= <var> | <spec_constant> | <fun_symbol>
| ( <fun_symbol> <term>+ )
| ( let ( <binding>+ ) <term> )
| ( <quant_symbol> ( <sorted_var>+ ) <term> )
| ( annot <term> <annotation>+ )
<sort_symb_decl> ::= ( <sort_symbol> <numeral> <annotation>*)
| ( <sort_symbol> <sort_term> <annotation>*)
<meta_spec_constant> ::= NUMERAL | RATIONAL | HEXADECIMAL | BINARY |
STRING
<fun_symb_decl> ::= ( <spec_constant> <sort> <annotation>* )
| ( <meta_spec_constant> <sort> <annotation>* )
| ( <fun_symbol > <sort>+ <annotation>* )
| ( <fun_symbol> ( <sorted_var>* ) <term> <annotation>* )
<status> ::= sat | unsat | unknown
<bench_name> ::= <letter><symbol>
<logic_name> ::= <letter><symbol>
<family_name> ::= <letter><symbol>
<bench_attribute> ::= :logic <logic_name>
| :family <family_name>
| :assumption <term>
| :formula <term>
| :status <status>
| :extrasorts ( <sort_sym_decl>+ )
| :extrafuns ( <fun_sym_decl>+ )
| :notes <string>
| <annotation>
<benchmark> ::= (benchmark <bench_name> <bench_attribute>+ )
=============================
4 Theory declaration schemas
=============================
Version 2 adds some support for parametric types. The chosen approach
is middle-ground between full support at the level of the logic
itself, as done for instance in the logic of the HOL prover, and a
meta-level solution via parametric theory specifications, as done for
instance in the logic of the PVS prover.
We now replace theory declarations with _theory declaration schemas_.
These are like theory declarations except that they may contain
_function symbol declaration schemas_. The latter are in turn like
function symbol declarations except that they can use _sort
parameters_, locally scoped sort symbols of 0 arity. A new binder
called 'par' is used for to bind sort parameters.
--------------------------------------------------------------------
Ex:
(theory Array
:sorts ((Array 2))
:funs ((par (X Y) ; a function declaration schema
(select (Array X Y) X Y)) ; with sort parameters X and Y
(par (X Y)
(store (Array X Y) X Y (Array X Y)))
...
)
...
)
(theory ListWithLength
:sorts ((List 1) Int)
:funs ((par (X)
(nil (List X)))
(par (X)
(cons X (List X) (List X)))
(par (X)
(head (List X) X))
(par (X)
(length (List X) Int))
)
...
)
--------------------------------------------------------------------
A declaration schema like the two above is not a theory declaration
in the sense of Version 1.2 of SMT-LIB. Instead, it is a *family* of
theory declarations, each of which is an _instance_ of the schema.
An instance of a theory declaration schema is obtained by providing
a, possibly empty, set Q of sort symbols.
(* Actually, in general we may need instantiate a theory declaration
schema also with a set of free functions symbols, which get added to
the signature of the theory instance and are "uninterpreted".
We overlook this aspect in this draft for simplicity, but it will
treated properly in the final document.
*)
The sorts of the instance theory consists of the set S of all
(ground) sort terms over Q union R, where R is the set of sort
symbols declared in the schema itself. Note that the set S is non-
empty because, as discussed later, R will always contain the 0-arity
sort symbol Bool.
In concrete, instantiating the schematic theory declaration
ListWithLength above with Q = {}, we get a theory (declaration)
ListWithLength[Q] whose sorts are
Bool, Int,
(List Bool), (List Int),
(List (List Bool)), (List (List Int)), ...
Instantiating the schematic theory declaration Array above with Q =
{Int/0, Real/0, List/1} (where the number after the sort symbol
denotes its arity), we get a theory Array[Q] with a set S of sorts
defined inductively as follows:
- Int, Real, Bool are in S
- for all s in S, (List s) is in S
- for all s1, s2 in S, (Array s1 s2) is in S
From the set S, each function declaration schema for a function
symbol f introduces in the instance theory a family of function
symbols, all called f and whose rank is obtained from the schematic
one by instantiating the sort parameters with sorts from S.
For instance, the theory Array[Q] above has all the (concrete)
function symbols
(select (Array Int Int) Int Int),
(select (Array Int Real) Int Real),
...,
that is, all symbols with name 'select' and rank of the form
((Array s1 s2) s1 s2) for all sorts s1, s2 of Array[Q].
The input set Q of sort symbols determining an instance theory is
specified by a logic declaration as explained in Sec. 5.
(* The advantage defining instances of theory declaration schemas as
done above is that we can get, with one instantiation of the schema,
a *single* theory with arbitrarily nested sorts ---e.g., The theory
of all nested lists (List Int), (List (List Int)), ..., as opposed to
the theory of just flat lists of integers. This is convenient in
applications coming from software verification, where verification
conditions can contain arbitrarily nested data types. But it is also
crucial in providing a simple and powerful mechanism for theory
combination, as explained later.
*)
NOTE: For some applications, the instantiation mechanism above will
definitely over-generate. For instance, with it is not possible to
obtain from ListWithLength an instance theory that contains (List
Int) as the only list sort, without the nested list sorts.
This however is not a problem, either in theory or in practice. Since
a benchmark can refer only to ground sorts (sort parameters are not
allowed in it), any sorts that are in the theory but not in the
benchmark can be, for all purposes, simply ignored.
Moreover, the sublogic mechanism of SMT-LIB allows one to specify a
sublogic with a smaller set of sorts than those present in the
sublogic's theory.
The :definition field of a theory declaration schema is now used to
define, in English, for every possible instance theory, the
interpretation of its sorts and function symbols.
This definition will be typically (although in principle not
necessarily) _uniform_ in some sense wrt the set of all possible
sorts. For example, for the ListWithLength theory declaration schema
the definition may go like this:
"The sort Int is interpreted as the sort of integer numbers. For any
sort s and any interpretation of s (as a non-empty set), the
interpretation of (List s) is the set of all finite lists with
elements from s; the interpretation of 'length' is the function that
maps each list l over s to its number of elements. ..."
(* Extending the SMT-LIB underlying logic to a logic with type
variables and parametric types (as, say, in the logic of the HOL
prover) has been advocated by some people, including myself, but
questioned by others on the grounds that it unnecessarily complicates
the construction of SMT solvers, because of the need to parse and
reason about formulas with polymorphic terms and type quantifiers.
The approach adopted here looks for the time being like a good
compromise between expressibility of the SMT-LIB format and
complexity of its underlying logic.
The main advantage of this approach is that the semantics of the
logic and, as a consequence, its associated inference apparatus
remains essentially the same as in Version 1.2. The only difference
is again that sorts are now named by ground terms such as (Array Int
Real) instead of constants such as IntRealArray, say.
The main limitation is that users cannot define (new) polymorphic
function symbols in a benchmark. (Having polymorphic functions in
theories, as theory declarations schema now in effect allow, is
different because theories are typically built-in in an SMT solver.)
While this is an actual limitation, it looks like in most practical
cases it can be overcome by defining in the benchmark finitely many
monomorphic versions of the polymorphic symbol.
For instance, if a list 'append' symbol occurs in the benchmark
without being a theory symbol, one can declare an append symbol for
each of the finitely many list types occurring in the benchmark, and
possibly add a corresponding defining axiom for each such symbol in
the :assumptions attribute.
*)
4.1 Theory declaration schema for Booleans and equality
Without the distinction between formulas and terms, and with theory
declaration schemas, we can turn logical operators such as 'and',
'or', =, and so on, into theory-defined function symbols as opposed
to distinguished logical constants. Semantically, the net effect is
the same but we get much leaner abstract and concrete grammars.
This is done by introducing the theory declaration schema below, and
prescribing that every other theory declaration schema implicitly
include this one, by implicitly including the same sorts and function
symbols, with the same definition.
(theory Core
:sorts ((Bool 0))
:funs (
(true Bool) (false Bool)
(not Bool Bool)
(implies Bool Bool Bool :right_assoc)
(and Bool Bool Bool :left_assoc)
(or Bool Bool Bool :left_assoc)
(xor Bool Bool Bool :left_assoc)
(par (A)
(= A A Bool :chainable))
(par (A)
(ite Bool A A))
)
:definition
"Bool is the two-element domain of Boolean values.
For any sort s,
- (= s s Bool) is the identity relation over the domain denoted
by s.
- (ite Bool s s) is the function that returns its second argument or
its third depending on whether the first argument is true or not.
- [The other function symbols are defined as expected.]
"
)
NOTE: Refer back to Section 3 for the meaning of
the :left_assoc, :right_assoc and :chainable annotations.
As a consequence of considering formulas as terms of sort Bool, the
iff operator of Version 1.2 is now superfluous because = can be used
instead. Similarly, the old Boolean if_then_else operator is now
superfluous too because 'ite' will now do.
NOTE: The theory Empty currently in SMT-LIB is superseded by the
theory declaration schema above.
=====================
5 Logic declarations
=====================
Logic declarations are defined essentially as in Version 1.2 except
that they can now refer to more than one theory declaration schema.
In the simpler case, a logic declaration defines an instance T[Q] of
a single theory declaration schema T by specifying the input set Q of
sort symbols for T. Then, as in Version 1.2 the logic declaration
specifies any restrictions on the language of T[Q].
More generally, a logic declaration can specify a *combined* theory
by referring to several theory declaration schemas. See Section 6 for
more details on how the combined theory is determined.
The names of the theory declaration schemas the logic refers to are
listed in the value of the attribute :theories which replaces the
attribute :theory of Version 1.2.
Both the set Q of additional sort symbols and the restrictions on the
language of the instance theory are specified, in English, in
the :language attribute.
NOTE: Q can be defined as a specific, finite set of sort symbols, or
a generic one (e.g., the set of all free sort symbols of arity 0).
NOTE: The restriction on the language can also involve the sorts of
the theory instance. For example, for the theory schema ListWithInt
in Section 4, it might state that the only allowed list sorts are
those of the form (List s), where s is not a list sort. Such a
restriction would define a logic of flat lists.
============================
6. Combinations of theories
============================
Defining a precise and simple mechanism for combining theories and
sublogics in SMT-LIB is needed to contain the proliferation of
theories and sublogics. Version 2 does this only for theories.
In practice, this means that a logic can refer to a theory obtained
by combining two or more instance theories, but each logic must be
specified individually.
(* Introducing a simple modular mechanism for combining logic has
turned out to be quite challenging and so it is left to later
versions. *)
6.1 Formal semantics
Recall the difference between a theory declaration schema, standing
for a class of theories, and an individual theory, an instance of
some schema.
If T is the name of a theory declaration schema and Q is a set of
sort symbol, let T[Q] denote the instance of T obtained with Q as
explained in Section 4.
Independently from the instantiation mechanism, for our purposes, it
is best to define a theory formally as a pair (Sigma, M) where
- Sigma is a signature, a pair (S, F) where
- S is a set of sort symbols with associated arity, and
- F is a set of function symbols with associated rank, and
- M is a (possibly empty) set of many-sorted structures of
signature Sigma closed under (Sigma-)isomorphism.
The closure condition on M means that if a structure A is in M, then
every Sigma-structure isomorphic to A is also in M. It is a harmless
technical requirement that avoids some unnecessary complications later.
NOTE: Since we allow overloading, it is more convenient to see the
rank of a function as part of the symbol itself. Similarly, for the
sort symbols and their arity.
Entailment, and so satisfiability, in these theories is defined as
expected: a closed formula phi entails in T a closed formula psi (phi
|=_T psi) iff every model in M that makes phi true also makes psi true.
----------------------------------------------------------------------
Ex:
If we instantiate the theory declaration schema Array in the previous
message (which, recall, implicitly includes the Core schema) with the
sort symbols Q = {Int/0}, we get a theory T_A = Arrays[Q] = (Sigma_A,
M_A) where
o Sigma_A = ({Int/0, Bool/0, Array/2}, {...}) and
o M_A is the set of all Sigma_A-structures that interpret
- Int as *any* non-empty set,
- Bool as the Booleans,
- and:Bool,Bool -> Bool, etc. as usual
- for all sorts s, s' over {Int/0, Bool/0, Array/2}
- (Array s s') as the set of total maps from the interpretation
of s to that of s'
- =:s,s -> Bool, ite:Bool,s,s -> s,
select:(Array s s'), s -> s', ..., as expected.
RECALL: '=', 'ite', 'select', 'store' are not function symbols of
the signature. The whole '=:Bool,Bool -> Bool', '=:Int,Int -> Bool',
and so on are the function symbols.
----------------------------------------------------------------------
----------------------------------------------------------------------
Ex:
Similarly, consider a theory declaration schema for the integers like
the Int theory currently in SMT-LIB, but again with the inclusion of
the Core schema. If we instantiate it with the input sort set {Array/2},
we get a theory T_I = (Sigma_I, M_I) where
o Sigma_I = ({Int/0, Bool/0, Array/2}, {...}) and
o M_I is the set of all Sigma_I-structures that interpret
- Int as the integers,
- Bool as the Booleans,
- and:Bool,Bool -> Bool, etc. as usual
- +:Int,Int -> Int, etc. as usual
- for all sorts s, s' over {Int/0, Bool/0, Array/2}
(Array s s') as *any* non-empty set.
----------------------------------------------------------------------
Generalizing from the examples above, the semantics requires that for
any instance of a theory declaration schema, the sorts built with a
symbol not originally from the theory be interpreted as *an arbitrary
non-empty set*, for all possible such sets.
6.2 Combinations of theories
Now, for i = 1,2, let T_i = (Sigma_i, M_i) be a theory,
with Sigma_i = (S_i, F_i).
DEFINITION. The theories T_1 and T_2 are _combinable_ if they have
exactly the same sort symbols (i.e, S_1 = S_2).
--------------------------------------------------------------------
Ex: The theories T_A and T_I in the previous examples are combinable:
for each of them, the sort symbol set is {Bool/0, Int/0, Array/2}.
--------------------------------------------------------------------
NOTE: We combine theories, not theory schemas.
NOTE: The requirement on the sort symbol sets for combinable theories
is entirely out of convenience. In practice, it is no restriction at
all since, we can always get combinable theories by proper
instantiations of theory schemas.
The convenience is that combinable theories already have all the
sorts that the combined theory is going to have. Without the
requirement, the combined theory would have more sorts than its
component theories, making its definition somewhat more complicated.
(For instance, if T_A above did not have the Int sort symbol, it
would not have the (Array Int Int) sort either, and similarly for
T_I. Then, it would not be obvious how (Array Int Int) should be
interpreted in their combination.)
DEFINITION. If T_1 and T_2 are combinable and S is their set of sort
symbols, their _combination_ T_1 + T_2 is the theory T = (Sigma, M)
where Sigma = (S, F_1 union F_2) and M consists of all Sigma-
structure whose reduct to Sigma_i is isomorphic to a model of T_i,
for i=1,2.
RECALL: the reduct of a Sigma-structure A to a smaller signature
Sigma' is the structure obtained from A by
(i) forgetting about the interpretation of the function symbols not
Sigma' and
(ii) dropping all sets that are not the interpretation of a sort in
Sigma'.
---------------------------------------------------------------------
Ex:
The combination of T_A and T_I above is the theory all of whose
models interprets
- Bool as the Booleans
- Int as the integers and
- each (Array s s') as the total maps from s to s'.
- the Boolean and the arithmetic symbols as expected
- each 'select' and 'store' symbol as expected
---------------------------------------------------------------------
These notions extend trivially to the case of more than two theories
that have the same shared signature pairwise.
NOTE:
(i) The combined theory T in the definition above is well-defined.
(ii) For T_1 and T_2 whose M_i is the set of all models of a set A_i
of first-order axioms, M is the set of all models of (A_1 union A_2).
(iii) T may have an empty set of models. That is of course possible
if T_1 and T_2 are jointly inconsistent, that is if |=_T_1 phi and |
=_T_2 ~phi for some shared formula phi.
That is why the definition of theory allows it to have a non-empty
set of models. Of course, actual SMT-LIB benchmarks will use
combinations of theories that do have models.
(* A possibly useful strengthening of the definition of combinable
theories would be to also require that their shared function symbols
"have the same properties" in both theories. For instance, we may not
want a shared symbol to be commutative in one theory and non-
commutative in the other, the main reason being that we do not really
know much on how to combine theory solvers for such theories.
However, there are two problems with imposing such an additional
restriction:
(i) it is conceivable that future developments will give us methods
for theories not satisfying it,
(ii) we could not find a reasonably general formulation of it.
A possible candidate could have been:
T_1 and T_2 are combinable if S_1 = S_2 and each T_i is a
conservative extension of some theory T_0 with signature Sigma_0 =
(S_1, F_1 intersect F_2); in other words, if there exists a theory
T_0 over the shared signature such that
for all closed Sigma_0-formulas phi, |=_T_i phi iff |=_T_0 phi.
Unfortunately, this restriction is too strong for us. We sometimes
need to combine two theories where a certain sort s is of some finite
cardinality n, and so interpreted, in the first theory and
uninterpreted, and so possibly infinite, in the second.
Unfortunately, such theories are not conservative extensions of a
common theory because the property of being a finite sort is
expressible with a first-order formula (with just equality and the
logical symbols). And this formula would be entailed in the first
theory but not in the second.
*)
6.2 Logics referring to a combined theory
As mentioned in Section 5 now a logic's declaration can refer to a
combined theory by listing several theory declaration schemas
T_1, ..., T_n in its :theories attribute.
The combined theory is obtained as follows. Where Q is the set of
(additional) sort symbols specified in the :language attribute, if
any, let Q' collect all the symbols of Q and all the sort symbols of
T_1, T_2, ..., and T_n.
The combined theory is then T_1[Q'] + ... + T_n[Q'].
------------------------------------------------------------------
Ex:
(logic QF_AUFLIA
:theories (Array Int)
:language
"Closed quantifier-free formulas built over an arbitrary expansion
of the Array and Int signatures with free sort symbols of arity 0
and free function symbols, but containing only
- linear atoms, that is, atoms with no occurrences of the symbol *,
- terms of sort s, (Array s s') where s and s' are either Int or
a free sort symbol.
"
...
)
NOTE: This is exactly the same logic as QF_AUFLIA of Version 1.2,
except that now we do not need to define explicitly a combined theory
for it.
------------------------------------------------------------------
------------------------------------------------------------------
Ex:
(logic QF_LLIA
:theories (ListWithLength Int)
:language
"Closed quantifier-free formulas built over an arbitrary expansion
of the ListWithLength and Int signatures with free sort symbols of
arity 0 and free function symbols, but containing only
linear atoms, that is, atoms with no occurrences of the symbol *.
"
...
)
NOTE: The theory of this logic is the theory of linear integer
arithmetic and arbitrarily nested lists whose non-list elements are
all of sort Int, Bool, or s for some free sort symbol s.
NOTE: While the Int sort is already present in ListWithLength, the
combination with the Int theory adds its various arithmetic function
symbols to ListWithLength's 'length'.
------------------------------------------------------------------
More information about the SMT-LIB
mailing list