[SMT-LIB] simplify test suite and boogie benchmarks in AUFLIA

Michal Moskal michal.moskal at gmail.com
Wed May 2 18:35:40 EDT 2007


On 5/2/07, Clark Barrett <barrett at cs.nyu.edu> wrote:
> > I've updated, as requested by Clark. The changes are:
> > - the symbols / and % are not used, uninterpreted divide and modulo
> > operators are used
> > - * is only used on constants, otherwise uninterpreted function multiply is used
>
> That's great.  I have to admit that I have somewhat mixed feelings about
> this--ideally we want to capture benchmarks in their "native" form, but at the
> same time, if non-linearity isn't really used, then it's not really a good
> non-linear benchmark either.  So I think that forcing these benchmarks to fit
> into AUFLIA is the right thing to do.

Yes... The tools are not limited to linear arithmetic. The fact that
the benchmarks don't use non-linear constructs is probably only
because Simplify doesn't handle them.
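The conversion described earlier in the thread (keep `*` only when applied to constants, otherwise fall back to uninterpreted functions) can be sketched like this. This is a toy illustration, not the actual conversion script; the symbol names `multiply`, `divide`, `modulo` and the nested-list term representation are assumptions for the example:

```python
# Toy sketch of the AUFLIA linearization rewrite: keep `*` when at
# least one argument is a numeral, otherwise replace it with an
# uninterpreted `multiply` symbol; map `/` and `%` to uninterpreted
# `divide`/`modulo` unconditionally. Terms are nested lists like
# ["*", "x", "y"] (hypothetical representation).

def is_numeral(t):
    return isinstance(t, str) and t.lstrip("-").isdigit()

def linearize(term):
    if isinstance(term, str):
        return term
    op, *args = term
    args = [linearize(a) for a in args]
    if op == "*" and not any(is_numeral(a) for a in args):
        op = "multiply"          # non-linear: use uninterpreted symbol
    elif op == "/":
        op = "divide"            # uninterpreted division
    elif op == "%":
        op = "modulo"            # uninterpreted modulo
    return [op] + args

# `(* 2 x)` stays linear, `(* x y)` does not:
print(linearize(["*", "2", "x"]))   # ['*', '2', 'x']
print(linearize(["*", "x", "y"]))   # ['multiply', 'x', 'y']
```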

>
> > It seems there is only one Boogie benchmark affected by this, namely
> > Search_Search.Reverse_Div_System.Int32.array_notnull-noinfer.smt,
> > otherwise fx7 is still able to prove the benchmarks expected to unsat.
> > It also proves the ones that were previously in the non-linear
> > directory; it seems the non-linearity was not crucial.
>
> So, let me see if I understand:
>
> 1) the benchmarks in the "valid" subdirectory have indeed been verified by you to
>    be unsat in their current form.

Yes.

> 2) the benchmarks in the "invalid" subdirectory are expected to be satisfiable
>    (are they in fact *known* to be satisfiable?)

I wouldn't say they are known to be satisfiable. They are too big to be
verified by hand, and I know of no tool that could perform such a
verification.

They are intended to be sat, but in the past I've found bugs in both
ESC/Java  and Boogie background predicates that made the queries
unsat...

> 3) the single benchmark in the "non-linear" subdirectory is of unknown
>    satisfiability (though it is in fact now a linear benchmark--i.e. properly
>    in the logic AUFLIA)

Yes.

>
> > As for the simplify test suite benchmarks, there are several that
> > are failing for me. I've put them in *_possibly_sat directories.
>
> So again, to clarify, we expect all the simplify benchmarks to be
> unsatisfiable.  However, the ones in the possibly_sat directories have not been
> verified as unsatisfiable after the conversion to linear arithmetic, right?

Yes, that's correct.

>
> > cvc3 -lang smtlib +translate now reports the benchmarks to be in
> > AUFLIA logic. BTW, it seems to translate (distinct ...) into quadratic
> > number of inequalities.
>
> Ah yes, cvc3 doesn't natively handle the distinct primitive yet.  However, this
> brings up another point.  The first occurrence of distinct is fine.  It labels
> a set of symbolic constants as distinct.  The second occurrence, however, is
> stating that a set of integers are distinct.  This is implied by the underlying
> theory and, as such, this is the sort of assumption we don't want to have to
> include in SMT-LIB.  Do you have any objection to removing these instances of
> distinct?  I can do it on my end unless you don't mind another round...

I can do that; it's not a problem.
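The quadratic blow-up mentioned above is easy to see: `(distinct t1 ... tn)` expands into n*(n-1)/2 pairwise disequalities. A minimal sketch of such an expansion (the exact output form of cvc3's translation is not shown here; this S-expression string format is illustrative):

```python
# Sketch of expanding (distinct t1 ... tn) into a conjunction of
# pairwise disequalities, which is quadratic in the number of terms.
from itertools import combinations

def expand_distinct(terms):
    """Return a pairwise-disequality encoding of (distinct terms...)."""
    pairs = [f"(not (= {a} {b}))" for a, b in combinations(terms, 2)]
    return "(and " + " ".join(pairs) + ")" if len(pairs) > 1 else pairs[0]

print(expand_distinct(["x", "y", "z"]))
# (and (not (= x y)) (not (= x z)) (not (= y z)))
```

For three terms this already yields three disequalities; for n symbolic constants the count grows as n*(n-1)/2, which is why large `distinct` declarations are costly to translate this way.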

>
> > I would like to bring up the problem of triggers in the benchmarks. I
> > was hoping for the triggers to be also present in benchmarks in the
> > competition:
> > - they are there in the original benchmarks,
> > - the SMT standard mentions triggers as a possible use for
> > annotations, it just doesn't define the syntax,
> > - changing any SMT solver that currently infers triggers so that it
> > can also read the explicit ones seems like a simple hack to me.
> >
> > I agree it might be controversial to have them introduced that late though...
> >
> > What do you think about adding them to the SMT standard anyway?
>
> For the purpose of the competition, it seems interesting both ways--with and
> without patterns.  The first tests how well patterns can be exploited and the
> second tests the ability of tools to infer their own patterns.

So the idea here would be to see how well we can do completely
automatically, without a hint from the user?

In general this is a very good idea. But remember that these queries
have been prepared for Simplify, to be processed as fast as possible.
This means a whole bunch of tricks, hacks and voodoo magic has been
put into the axiomatizations so that the triggers can be selected
correctly. Selecting a different axiomatization for performance
reasons seems like a task an order of magnitude more complicated than
merely stating the triggers.

So for me, using these benchmarks without triggers is like driving a
jet on a highway -- it will go faster than a car, but that's not
exactly the point.
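As a toy illustration of what automatic trigger inference involves, the sketch below picks the smallest non-variable subterm of a quantified body that covers all the bound variables. Real solvers use far more elaborate heuristics (and a smallest term can over-trigger in practice); the nested-list term representation is hypothetical:

```python
# Naive trigger (pattern) inference for a quantified formula: among
# the compound subterms of the body, choose a smallest one that
# mentions every bound variable. This is a deliberately simple
# heuristic, only meant to illustrate the idea.

def subterms(t):
    yield t
    if isinstance(t, list):
        for a in t[1:]:
            yield from subterms(a)

def bound_vars_in(t, bound):
    if isinstance(t, str):
        return {t} & bound
    return set().union(*(bound_vars_in(a, bound) for a in t[1:]))

def size(t):
    return 1 if isinstance(t, str) else 1 + sum(size(a) for a in t[1:])

def infer_trigger(body, bound_vars):
    bound = set(bound_vars)
    candidates = [t for t in subterms(body)
                  if isinstance(t, list) and bound_vars_in(t, bound) == bound]
    return min(candidates, key=size) if candidates else None

# forall x . f(g(x)) = x  -- the smallest covering subterm is g(x):
body = ["=", ["f", ["g", "x"]], "x"]
print(infer_trigger(body, ["x"]))   # ['g', 'x']
```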

> I think it's too late for this year--for one thing, it seems likely that only
> your tool will be able to parse the patterns meaningfully--we typically want at
> least 2 or 3 solvers in a division to make it part of the competition.
>
> However, I like the idea of making it part of the standard and then trying both
> in next year's competition.

OK!

-- 
   Michał


