[SMTCOMP] request for comment
Cesare Tinelli
tinelli at cs.uiowa.edu
Thu Feb 2 01:11:52 EST 2006
Hi Aaron,
I went over the draft. I find it quite good already and I notice that it
already incorporated some of major changes suggested by various people
since SMT-COMP'05.
Here are some further comments or questions in the hope that you may
find them useful.
page 2
Is the decision to allow people to submit the results of their solver to
the demonstration section a nod to the Microsoft people? Just curious.
page 2
I'm not sure how you are going to tell that two versions of the same
solvers are not to be allowed if those versions are submitted in binary
format.
page 2
The session for system presentations on the 21st is a good idea. Is it
going to be part of CAV's program? If it after CAV's sessions that day,
have you already secured a room with the organizers? Just checking.
I would propose that you also informally organize an SMT-COMP dinner on
the 21 or the day before, compatibly with the rest of the FloC schedule.
I think it would a great way for the participants to get to know one
another, and for building a sense of community. CASC does that and it
has worked rather well, although I would not go as far as organizing an
official dinner, with a dinner registration and fee, as they do.
page 3
The new version of SMT-LIB will will be 1.2. It will be a minor revision
of the current one. I think it will be available by the end of March,
together with updated theories, logics and benchmarks as needed. For the
bitvectors, Silvio and I will propose a temporary theory and logic that
does not need the introduction of dependent types, so that we can
already have the bitvectors division at this SMT-COMP. Clark already has
a good number of benchmarks in CVCL format that should be easy to
translate in SMT-LIB 1.2.
page 3
this should be first discussed with Clark I guess. But he has made good
progress with CVCL in solving the majority of a very large set of
quantified benchmarks from NASA.
It might be good to have a division for these benchmarks, to stimulate
work on quantifiers. CVCL would not be the only one here. Clark and one
of his students have also defined a translator for Simplify, so Simplify
could be entered as well. In addition, I think that the Vampire and the
SPASS people might be interested in participating as they have been
working on those benchmarks too.
page 4
The classification into hard and easy benchmarks might be a source of
trouble and contention, because there are no clear objective measures.
You may want to spend more time here in devising a clean procedure for
deciding the difficulty of a benchmark.
Apart from that, the algorithm for choosing the benchmarks seems like a
reasonably good one, which should shield you from accusations of
favoritism.
Methodologically though, the problem I see in your proposal is that if
SMT-LIB happens to have for a given division a disproportionately large
number of benchmarks from the same class of problems (such as synthetic
benchmarks), picking randomly with a uniform distribution will give you
a biased sampling set. I'm not sure how to address that properly though.
About the scrambling issue, scrambling might be hard to justify if it
changes the structure of the problem, given that structure is an
important factor in practice. On the other hand, the only reasonable
structure preserving transformations might be renamings of free symbols,
which is probably not a deterrent for cheaters. I suppose that the only
good deterrent would be to have a large enough number of benchmarks in
SMT-LIB. So, we need more benchmarks!
Page 5
I see you now propose to give the same penalty to wrong unsat and sat
answers. Any reasons for the change?
I'm not sure that disqualifying a system after three wrong answers makes
a lot of sense. You either allow any number of wrong answers, and rely
on the scoring system to penalize, on average, buggy systems, or you
completely discount the results of a system as soon as it gives you a
wrong result. It is possible to make a good argument in favor of each of
these two choices, but I cannot see how to justify your compromise solution.
Given that
- this is not the first edition of the competition,
- you plan to have benchmarks available well in advance,
- we expect to have a correct classification of the benchmarks, and
- a solver has the option to answer "unknown" (or timeout),
I would propose to go for immediate disqualification after the first
wrong answer. Perhaps though not a disqualification from all divisions
but only from those where wrong answers were given. The rationale here
is that, since SMT solvers will most likely have specialized mechanisms
for each divisions it is possible that a solver is correct for one
division and buggy for another.
Finally, in case you are not aware of it, you may want to take a look of
what people are doing for the pseudo-boolean solver evaluation
(http://www.cril.univ-artois.fr/PB06). It might be another useful source
of suggestions.
Cheers,
Cesare
More information about the SMT-COMP
mailing list