[SMT-LIB] Comments to the new SMT-LIB standard

Wed Jun 16 23:04:10 EDT 2010

Jochen,

thanks a lot for your feedback on SMT-LIB 2. You raise several good points, some requiring clarifications and some pointing out issues we had not realized.

Please find our belated reply below.

On 1 May 2010, at 05:00, Jochen Hoenicke wrote:

> Hello,
> 
> I am currently reading the SMT-LIB standard v2.0 and have some
> remarks/questions.
> 
> Is define-fun the new way of avoiding nested let's?

It can certainly be used instead of lets. But it is more general because it allows one to define function symbols of any arity, not just constants. Also, symbols defined with define-fun have global scope.

> Many benchmarks currently have a lot of nested flet to assign formulas
> to names.  This can be either translated by define-fun or by nested
> let terms.  The latter needs deep nesting (danger of stack-overflow
> not only in the parser but also in every code that processes the AST)
> and does not work across assert statements.  define-fun seems to be a
> good way to avoid nested lets and if I understand correctly from a
> previous mail, this is its purpose.
> 
No, that was not the intention; otherwise, we would have eliminated let constructs altogether. They are still useful to define local variables, even if, with define-fun, they are not strictly needed.

> Do solvers still have to work around stack-overflows for nested lets,
> and, or?  

As far as the standard is concerned, yes. The standard imposes no limits on the depth of a term.

> If nested lets can be replaced by define-fun and nested and
> by a multi-argument and, there is no need to support deep nestings.

Are you suggesting to add a term depth limit to the definition of the language? If so, what would that limit be? 

> I
> would do the transformation myself in the parser, but unfortunately
> there is the get-assertion function that forbids simplifying the AST.

The standard imposes no requirements on the internal representation of SMT-LIB terms. In particular, you are free to parse your input into a flat data structure (that uses internally generated intermediate symbols) if do not want to have deeply nested ASTs.

get-assertion's requirement is that the original terms should be printed back as given in input. You can always do that (as long as you do not apply destructive simplifications to your AST) regardless of whether you flatten terms internally or not.
Also, note that get-assertion is both optional and meant for the interactive mode only.

> Writing a non-recursive AST to string conversion function for deeply
> nested terms seems to be no fun at all.
> 
Possibly. Writing pretty printers is indeed no fun. Nobody likes that :)

On the other hand, what you seem to suggest, limiting term depth, is a pretty draconian design decision that would bring its own annoyances.

> Grammar issues
> -----------------
> 
> Although I cannot find it explicitly mentioned in the standard, the
> nonterminal <symbol> should obviously also match predefined symbols.
> This is necessary, e.g. to allow the terms true and false, or to allow
> every syntactic construct to match <s_expr>.

Correct.

> Then, the grammar as
> defined in the standard is ambiguous.
> 
You are right. Not only is the grammar ambiguous but the current document does a poor job in distinguishing between a "reserved word" and a "predefined symbol". 
By "reserved word" we mean, as usual, a predefined lexical token that cannot be redefined by the user. Examples, for us are 'forall' and 'let'.
By "predefined symbol" we mean a symbol that has a predefined *meaning* in a certain name space, but could be in principle redefined. Examples: 'Bool', 'true', all attribute names, etc.
We will clarify the distinction in the next release of the document. 

As for the grammar's ambiguity, the grammar in the document is intended as a formal specification of the language. It is not necessarily the grammar that one wants to use when writing a parser. For that one can use any suitable unambiguous grammar as long as the tool that uses accepts the same language in the standard (or the fragment of the language that the tool is supporting).
In particular, one can define a grammar that accepts a larger language, and then remove illegal expressions a posteriori, similarly to what is done for expressions that do not type-check.

That said, we realize that it would be good if the SMT-LIB 2 document provided an unambiguous, ready-to-use grammar. We'll try to do that too in the next release.

Until then, a quick and dirty solution for the casual parser writer might be to consider all predefined symbols as reserved words, concretely, as lexer tokens, and add them in the parser rules as needed.
For instance, defining 'true' and 'false' as tokens and including them explicitly as alternatives in the production rule that defines <term>.

A parser defined this way would not be conformant, strictly speaking. But should be able to parse correctly all the current SMT-LIB 2 benchmarks, none of which mess around with predefined symbols and reserved words. 

> First, why is Bool a predefined symbol?  and, or, not etc. are not. I
> would suggest to remove Bool as predefined symbol.

See the comment above. Bool is a distinguished sort in the language semantics, with a predefined meaning. It is not a reserved word and so does not need to be defined as a lexer token.

> Also, one could have defined "distinct" as a function in the Core
> theory by adding an attribute :pairwise similar to :chainable.
> 
We never liked having a predefined distinct operator, but it did not occur to us that we could characterize its meaning with a :pairwise attribute. That is neat. 
We'll apply this change in the next release of the document.

> If symbols include predefined symbols, what symbols are forbidden for
> identifiers?
> The following identifiers give ambiguities in the grammar:
> _ (ambiguity in identifier)
> ! as distinct let forall exists  (ambiguity in term)
> par NUMERAL DECIMAL STRING  (ambiguity in function declaration in a theory)
> Bool (ambiguity in sort)
> If I do not have a typo in my grammar, this list is exhaustive.
> My suggestion is to remove Bool from predefined symbols and disallow
> at least the symbols
> _ ! as distinct let forall exists
> in most places.  

Version 2 of SMT-LIB really defines two languages, one for theory and logic declarations and one for scripts. 

The symbols 'par', 'NUMERAL', 'DECIMAL', 'STRING' are reserved words for the first sublanguage.

The symbols '_', '!', 'as', 'let, 'forall, and 'exists', are reserved words for both.

In principle, these reserved words are enough, but it is convenient to define all the command names ('assert', 'define-logic', ...) as reserved words as well. The flexibility of being able to use a command name as, say, a function symbol is really not worth the complications that it introduces in parser generators like YACC or ANTLR. 

'Bool' is a reserved word in neither sublanguage. The same will be for 'distinct', following your earlier suggestion.

Again, we will clarify all this in the next release.

> All other symbols, e.g. push, pop, true, false, sat,
> unsat, error, can be allowed.
> 
See comment above as far as 'pop' and 'push' are concerned.

> To be precise, the function names "forall", "exists", "!", and "let"
> would not lead to any ambiguity, also "as", "distinct", and "_" could
> be used as variable names, but it would be a nightmare to write a
> parser that can distinguish a function call
> (forall ((as foo Int) (g int)) (= (f a) g))
> from a quantifier
> (forall ((as Int) (g Int)) (= (f as) g))
> 
Actually, the first term can be neither a function call nor a quantifier. But your general point still stands. In this particular case, there is no problem because 'forall' cannot be a function symbol, as you suggest.

> By the way, is the symbol |xy| different from the symbol xy?

This is a good question. Quick answer: no.

In the document, we actually meant them to be distinct. But we realize now that that conflicts with the goal of making SMT-LIB 2 a sublanguage of Common Lisp. In Common Lisp, the vertical bars are used as a quoting device, so |xy| and xy are in fact the same symbol.
Things are more complicated though because the backslash character \ is also a quoting symbol (see http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node27.html#SECTION00630000000000000000).

For simplicity, we have decided to adopt the compromise solution of *not* allowing backslashes between general bars, since we do not need that level of generality for our symbols.
This too will be addressed in the next release.

> 
> Are predefined keywords also keywords?

Yes. See comments above about predefined symbols.

> This leads to ambiguous grammar, but probably we don't want to forbid,
> e.g., the user defined annotation :notes for a term.
> If one takes the grammar literally, then all predefined attributes
> could be removed without changing the language.  I think the intention
> is, that <keyword> in attribute should not match those predefined
> keywords that appear in a production of the parent non-terminal.
> 
Actually, it is the opposite. Syntactically, predefined keywords have no special status. They could be indeed removed from the production rules that also contain <keyword>.  
This too will be clarified in the next release.

> Currently, I have 37 shift-reduce and reduce-reduce conflicts, or I
> use a complicated grammar (with four different attribute
> non-terminals) to avoid the conflicts and still allow, e.g., the
> :authors keyword in theories.
> 
> minor typos in the standard:
> ------------------------------
> 
Thanks a lot for these remarks too. Will fix as needed.

> spec_constant sometimes spelled as spec_const.
> attribute_value is unused; probably the rule for attribute should be
> attribute ::= keyword | keyword attribute_value
> :author (predefined keyword) versus :authors (info_flags)
> get-option is currently not a predefined symbol
> the predefined keywords :axioms :family :logic :script  should be removed.

> page 22: A sort parameter can be any <symbol>:  replace <symbol> with <sort>

<symbol> is actually correct. Sort parameters cannot be indexed.

> page 23: description still mentions "=", grammar does not.
> page 57: preorder enumeration should be postorder enumeration:

The text, confusingly, first says "preorder" and then "inorder". The latter is the intended one.

> (! (+ (! (x :named f)) f) :named g)  should first define f, then define g.

BTW, the term above should be
(! (+ (! x :named f) f) :named g)

> page 70: DECIMAL is sorted as if it would start with R
> 

Thanks again,

Cesare (also for Aaron and Clark)