[SMTCOMP] request for comment

Thu Feb 2 01:11:52 EST 2006

Hi Aaron,

I went over the draft. I find it quite good already and I notice that it
already incorporated some of major changes suggested by various people
since SMT-COMP'05.

Here are some further comments or questions in the hope that you may
find them useful.

page 2
Is the decision to allow people to submit the results of their solver to
the demonstration section a nod to the Microsoft people? Just curious.

page 2
I'm not sure how you are going to tell that two versions of the same 
solvers are not to be allowed if those versions are submitted in binary 
format.

page 2
The session for system presentations on the 21st is a good idea. Is it 
going to be part of CAV's program? If it after CAV's sessions that day, 
have you already secured a room with the organizers? Just checking.

I would propose that you also informally organize an SMT-COMP dinner on 
the 21 or the day before, compatibly with the rest of the FloC schedule. 
I think it would a great way for the participants to get to know one 
another, and for building a sense of community. CASC does that and it 
has worked rather well, although I would not go as far as organizing an 
official dinner, with a dinner registration and fee, as they do.

page 3
The new version of SMT-LIB will will be 1.2. It will be a minor revision 
of the current one. I think it will be available by the end of March, 
together with updated theories, logics and benchmarks as needed. For the 
bitvectors, Silvio and I will propose a temporary theory and logic that 
does not need the introduction of dependent types, so that we can 
already have the bitvectors division at this SMT-COMP. Clark already has 
a good number of benchmarks in CVCL format that should be easy to 
translate in SMT-LIB 1.2.

page 3
this should be first discussed with Clark I guess. But he has made good 
progress with CVCL in solving the majority of a very large set of 
quantified benchmarks from NASA.
It might be good to have a division for these benchmarks, to stimulate 
work on quantifiers. CVCL would not be the only one here. Clark and one 
of his students have also defined a translator for Simplify, so Simplify 
could be entered as well. In addition, I think that the Vampire and the 
SPASS people might be interested in participating as they have been 
working on those benchmarks too.

page 4
The classification into hard and easy benchmarks might be a source of 
trouble and contention, because there are no clear objective measures. 
You may want to spend more time here in devising a clean procedure for 
deciding the difficulty of a benchmark.
Apart from that, the algorithm for choosing the benchmarks seems like a 
reasonably good one, which should shield you from accusations of 
favoritism.
Methodologically though, the problem I see in your proposal is that if 
SMT-LIB happens to have for a given division a disproportionately large 
number of benchmarks from the same class of problems (such as synthetic 
benchmarks), picking randomly with a uniform distribution will give you 
a biased sampling set. I'm not sure how to address that properly though.

About the scrambling issue, scrambling might be hard to justify if it 
changes the structure of the problem, given that structure is an 
important factor in practice. On the other hand, the only reasonable 
structure preserving transformations might be renamings of free symbols, 
which is probably not a deterrent for cheaters. I suppose that the only 
good deterrent would be to have a large enough number of benchmarks in 
SMT-LIB. So, we need more benchmarks!

Page 5
I see you now propose to give the same penalty to wrong unsat and sat 
answers. Any reasons for the change?

I'm not sure that disqualifying a system after three wrong answers makes 
a lot of sense. You either allow any number of wrong answers, and rely 
on the scoring system to penalize, on average, buggy systems, or you 
completely discount the results of a system as soon as it gives you a 
wrong result. It is possible to make a good argument in favor of each of 
these two choices, but I cannot see how to justify your compromise solution.

Given that
- this is not the first edition of the competition,
- you plan to have benchmarks available well in advance,
- we expect to have a correct classification of the benchmarks, and
- a solver has the option to answer "unknown" (or timeout),
I would propose to go for immediate disqualification after the first 
wrong answer. Perhaps though not a disqualification from all divisions 
but only from those where wrong answers were given. The rationale here 
is that, since SMT solvers will most likely have specialized mechanisms 
for each divisions it is possible that a solver is correct for one 
division and buggy for another.

Finally, in case you are not aware of it, you may want to take a look of 
what people are doing for the pseudo-boolean solver evaluation 
(http://www.cril.univ-artois.fr/PB06). It might be another useful source 
of suggestions.

Cheers,

Cesare