[SMT-LIB] SMT-COMP update
Roberto Bruttomesso
roberto.bruttomesso at gmail.com
Mon Jun 27 17:14:01 EDT 2011
Dear list,
as per the official SMT-COMP 2011 roadmap, on June 15th we have frozen
the library of benchmarks for the competition. In the last couple of
days, however, we discovered some last-minute issues with some of the
benchmarks, in particurlar in the application track (see
http://www.smtcomp.org/2011/application.shtml). We believe that we have
now fixed all the issues, and we want to inform you about the changes,
which fortunately are quite minor. In particular, we have:
- Fixed some conformance issues and translation bugs in the
"kind_QF_UFIDL" benchmark sets. In particular:
. There were some benchmarks making use of mixed Integer/Real
arithmetic, due to the use of some uninterpreted functions like
e.g. (declare-fun ___z8z___ (Int) Real).
We have changed the output sort of such functions to be Int, and
fixed the benchmarks accordingly
. There were some benchmarks making use of the "div" function
(e.g. microwave03.ec.smt2), which is not part of the QF_UFIDL logic.
We have replaced occurrences of terms like "(div x n)" with "x"
Notice that such changes affected the satisfiability of some queries
in some cases. Notice also that we have not touched at all the
"original" traces. Therefore, in a few cases, the satisfiability
status of the postprocessed traces (which will be used for the
competition) and the original traces will differ.
. There are some benchmarks making use of rational numbers. For
instance, ccp07.ec.smt2 contains a line
(let ((.def_3129 (+ (/ 1 20) .def_3128))).
The full list of benchmarks affected by this problem is:
ccp06.ec.smt2
ccp07.ec.smt2
ccp09.ec.smt2
ccp10.ec.smt2
ccp11.ec.smt2
ccp12.ec.smt2
ccp13.ec.smt2
ccp14.ec.smt2
ccp15.ec.smt2
ccp17.ec.smt2
ccp19.ec.smt2
ccp20.ec.smt2
ccp21.ec.smt2
ccp22.ec.smt2
ccp23.ec.smt2
ccp24.ec.smt2
cruise_controller_10.ec.smt2
cruise_controller_18.ec.smt2
cruise_controller_19.ec.smt2
The benchmarks have not been removed from the library, but they will
not be considered for the competition.
(Many thanks to Leonardo de Moura for reporting this problem.)
The new version of the kind_QF_UFIDL is available from:
http://www.smtcomp.org/2011/application/kind_QF_UFIDL_20110606.tgz
- Fixed some conformance issues in the ASASP benchmarks. The new tarball
is available from:
http://www.smtcomp.org/2011/application/asasp_20110606.tgz
- Fixed the results for ASASP benchmark. We initially posted results
for directories
bench_empty, bench_hierarchy_implicit, bench_hierarchy_explicit
but some of them turned out to be wrong (thanks to Leonardo de Moura
for spotting this issue).
The correct result traces are available here
http://www.smtcomp.org/2011/application/asasp_20110626_results.tar.gz
Given the great demand of resources necessary to check these benchmarks (there
is currently no SMT-solver able to solve them, and so we are using
an encoding into tptp2 -- the
solving process is therefore not incremental -- the encoding and the
computational effort makes
status computation very difficult for these suite) we determined the
status of the benchmarks up to
a certain depth (after which "unknown" is reported). The following
are the files that, up to that depth,
were reporting wrong answers in the previous results:
bench_hierarchy_implicit/bench_1_1_1_random6.smt2
bench_hierarchy_implicit/bench_1_1_1_random7.smt2
bench_hierarchy_implicit/bench_1_1_22_random10.smt2
bench_hierarchy_implicit/bench_1_1_22_random4.smt2
bench_hierarchy_implicit/bench_1_1_22_random9.smt2
bench_hierarchy_implicit/bench_1_1_24_random4.smt2
bench_hierarchy_implicit/bench_2_1_1_random6.smt2
bench_hierarchy_implicit/bench_2_1_1_random7.smt2
bench_hierarchy_implicit/bench_2_1_22_random10.smt2
bench_hierarchy_implicit/bench_2_1_22_random4.smt2
bench_hierarchy_implicit/bench_2_1_22_random9.smt2
bench_hierarchy_implicit/bench_2_1_24_random4.smt2
these benchmarks will be excluded from the competition.
- Fixed the result trace for the s3_srvr benchmark of the
"blast_simplify_calls_QF_UFLIA" set.
- Added result traces for all the bechmark families. They are all
available from http://www.smtcomp.org/2011/application.shtml
Notice that for the ASASP benchmarks, we could produce result traces
only for a subset of the benchmarks.
Also, in general (for all the families) there might be result traces
containing some "unknown" status for some of the queries.
For the competition, we will consider only benchmarks for which we
have results, and execute only the prefix of queries up to (not
including) the first one with an "unknown" result. For example, if
TRACE.smt2 has the following result file:
sat
unsat
unsat
unknown
sat
sat
unsat
Only the first 3 queries will be executed by the trace executor
during the competition.
Finally, as regards benchmark selection for the application track, for
the QF_UFLIA, QF_LRA, QF_BV, and QF_LIA divisions, we have decided to
select all the benchmarks for the competition, whereas for the QF_UFIDL
and UFLRA divisions (corresponding to the kind_QF_UFIDL and ASASP
families respectively), we will simply select the N benchmarks
containing the largest number of "check-sat" queries (for which we have
results), where N will be determined according to the number of
competitors in such two divisions.
The SMTCOMP'11 Organizers
Roberto, Morgan, and Alberto
More information about the SMT-LIB
mailing list