[SMT-COMP] Some further questions on benchmarks status and difficulty

Tue Jun 3 10:58:49 EDT 2008

Hello list,

we're sorry to bother you again, but we have some questions and
perplexities about the status and the difficulty of some of the new
benchmarks.

First, there are several instances on nec-smt/large that are still
marked as unknown, but can be solved (with an agreement on status) by at
least three solvers: the latest version of yices and z3 (which are
public), and the current version of mathsat (which has not been released
yet). Therefore, we wanted to know if this is enough to assign them a
status (we could not report this before, since we did not know which
were the unknown instances).

Second, on QF_BV there are some instances whose status seems to depend
on the way division by zero is handled. These are the instances we have
discovered that seem to display this problem:
  QF_BV/spear/wget_v1.10.2/src_wget_vc18197.smt:  :status unsat
  QF_BV/spear/wget_v1.10.2/src_wget_vc18755.smt:  :status unsat
  QF_BV/spear/wget_v1.10.2/src_wget_vc18756.smt:  :status unsat
Both z3 1.2 and yices 1.0.12 report "sat" for them, while spear 2.3
reports unsat for the first two, and times out (1 hour on our machines)
for the third. According to our experiments, we get different results
with different semantics of division by zero which, as far as we
understand, should be undefined, meaning that it should be ok to choose
whatever semantics is liked. Is our interpretation correct? We got it
from this passage of
http://goedel.cs.uiowa.edu/smtlib/theories/BitVectors.smt:

   Note that the semantic interpretation above is underspecified because
   it does not specify the meaning of (bvudiv s t) or (bvurem s t) in
   case bv2nat([[t]]) is 0.  Since the semantics of SMT-LIB's underlying
   logic associates *total* functions to function symbols, we then
   consider as models of this theory *any* interpretation conforming to
   the specifications above (and defining bvudiv and bvurem arbitrarily
   when the second argument evaluates to 0).  Benchmarks using this
   theory should only include a :status sat or :status unsat attribute
   if the status is independent of the particular choice of model for
   the theory.

If our interpretation is correct, then maybe such instances should not
be part of the competition pool?

Third, we are not sure to understand how the difficulty of some new
instances (nec-smt/large in particular) was determined. According to the
rules, difficulty is assigned by running last year's solvers with a
timeout of 600 seconds, and then computed with the formula:

        5 * (1 - solved/total)

In the current version of the library, there is no instance in
QF_LIA/nec-smt/large with difficulty > 3. But according to our
measurements, there are several that can't be solved by any of last year
solvers within 600 seconds. For instance, we ran
nec-smt/large/checkpass/prp-1-50.smt (which is unsat) and
nec-smt/large/getoption_user/prp-18-48.smt (which is sat) on SMT-EXEC,
using a timeout of 1200 seconds, and none of last year's solvers can
solve them. But they are both marked with difficulty 3... Therefore, we
would like to know if we missed something, or if the rules have changed,
of if we just did something wrong...

Best,

Alberto, Alessandro, Anders and Roberto