[SMT-LIB] SMT-COMP update

Roberto Bruttomesso roberto.bruttomesso at gmail.com
Mon Jun 27 17:14:01 EDT 2011


Dear list,

as per the official SMT-COMP 2011 roadmap, on June 15th we have frozen
the library of benchmarks for the competition. In the last couple of
days, however, we discovered some last-minute issues with some of the
benchmarks, in particurlar in the application track (see
http://www.smtcomp.org/2011/application.shtml). We believe that we have
now fixed all the issues, and we want to inform you about the changes,
which fortunately are quite minor. In particular, we have:

- Fixed some conformance issues and translation bugs in the
 "kind_QF_UFIDL" benchmark sets. In particular:

 . There were some benchmarks making use of mixed Integer/Real
   arithmetic, due to the use of some uninterpreted functions like
   e.g. (declare-fun ___z8z___ (Int) Real).
   We have changed the output sort of such functions to be Int, and
   fixed the benchmarks accordingly

 . There were some benchmarks making use of the "div" function
   (e.g. microwave03.ec.smt2), which is not part of the QF_UFIDL logic.
   We have replaced occurrences of terms like "(div x n)" with "x"

 Notice that such changes affected the satisfiability of some queries
 in some cases. Notice also that we have not touched at all the
 "original" traces. Therefore, in a few cases, the satisfiability
 status of the postprocessed traces (which will be used for the
 competition) and the original traces will differ.

 . There are some benchmarks making use of rational numbers. For
   instance, ccp07.ec.smt2 contains a line
   (let ((.def_3129 (+ (/ 1 20) .def_3128))).
   The full list of benchmarks affected by this problem is:

     ccp06.ec.smt2
     ccp07.ec.smt2
     ccp09.ec.smt2
     ccp10.ec.smt2
     ccp11.ec.smt2
     ccp12.ec.smt2
     ccp13.ec.smt2
     ccp14.ec.smt2
     ccp15.ec.smt2
     ccp17.ec.smt2
     ccp19.ec.smt2
     ccp20.ec.smt2
     ccp21.ec.smt2
     ccp22.ec.smt2
     ccp23.ec.smt2
     ccp24.ec.smt2
     cruise_controller_10.ec.smt2
     cruise_controller_18.ec.smt2
     cruise_controller_19.ec.smt2

   The benchmarks have not been removed from the library, but they will
   not be considered for the competition.
   (Many thanks to Leonardo de Moura for reporting this problem.)

 The new version of the kind_QF_UFIDL is available from:
 http://www.smtcomp.org/2011/application/kind_QF_UFIDL_20110606.tgz

- Fixed some conformance issues in the ASASP benchmarks. The new tarball
 is available from:
 http://www.smtcomp.org/2011/application/asasp_20110606.tgz

- Fixed the results for ASASP benchmark. We initially posted results
for directories
  bench_empty, bench_hierarchy_implicit, bench_hierarchy_explicit
  but some of them turned out to be wrong (thanks to Leonardo de Moura
for spotting this issue).
  The correct result traces are available here

  http://www.smtcomp.org/2011/application/asasp_20110626_results.tar.gz

  Given the great demand of resources necessary to check these benchmarks (there
  is currently no SMT-solver able to solve them, and so we are using
an encoding into tptp2 -- the
  solving process is therefore not incremental -- the encoding and the
computational effort makes
  status computation very difficult for these suite) we determined the
status of the benchmarks up to
  a certain depth (after which "unknown" is reported). The following
are the files that, up to that depth,
  were reporting wrong answers in the previous results:

 bench_hierarchy_implicit/bench_1_1_1_random6.smt2
 bench_hierarchy_implicit/bench_1_1_1_random7.smt2
 bench_hierarchy_implicit/bench_1_1_22_random10.smt2
 bench_hierarchy_implicit/bench_1_1_22_random4.smt2
 bench_hierarchy_implicit/bench_1_1_22_random9.smt2
 bench_hierarchy_implicit/bench_1_1_24_random4.smt2
 bench_hierarchy_implicit/bench_2_1_1_random6.smt2
 bench_hierarchy_implicit/bench_2_1_1_random7.smt2
 bench_hierarchy_implicit/bench_2_1_22_random10.smt2
 bench_hierarchy_implicit/bench_2_1_22_random4.smt2
 bench_hierarchy_implicit/bench_2_1_22_random9.smt2
 bench_hierarchy_implicit/bench_2_1_24_random4.smt2

these benchmarks will be excluded from the competition.

- Fixed the result trace for the s3_srvr benchmark of the
 "blast_simplify_calls_QF_UFLIA" set.

- Added result traces for all the bechmark families. They are all
 available from http://www.smtcomp.org/2011/application.shtml
 Notice that for the ASASP benchmarks, we could produce result traces
 only for a subset of the benchmarks.
 Also, in general (for all the families) there might be result traces
 containing some "unknown" status for some of the queries.
 For the competition, we will consider only benchmarks for which we
 have results, and execute only the prefix of queries up to (not
 including) the first one with an "unknown" result. For example, if
 TRACE.smt2 has the following result file:
    sat
    unsat
    unsat
    unknown
    sat
    sat
    unsat
  Only the first 3 queries will be executed by the trace executor
  during the competition.


Finally, as regards benchmark selection for the application track, for
the QF_UFLIA, QF_LRA, QF_BV, and QF_LIA divisions, we have decided to
select all the benchmarks for the competition, whereas for the QF_UFIDL
and UFLRA divisions (corresponding to the kind_QF_UFIDL and ASASP
families respectively), we will simply select the N benchmarks
containing the largest number of "check-sat" queries (for which we have
results), where N will be determined according to the number of
competitors in such two divisions.


The SMTCOMP'11 Organizers
Roberto, Morgan, and Alberto


More information about the SMT-LIB mailing list