[SMT-COMP] Fwd: [SMT-LIB] SMT-COMP 2016: Draft Rules and Timeline Available

Giles Reger giles.reger at manchester.ac.uk
Mon May 16 19:57:11 EDT 2016


[Sending on Andrei’s behalf as he is not on the list]

Hi Tjark,

you have already got a reply from Giles.

To me, competitions in our area should stimulate development of better
systems. Every unsolved problem must be a challenge. This is not what
SMT-COMP encourages. You can solve hundreds of problems nobody can
solve ... but this won't give you a point according to your rules.

Another important point for a competition is fairness. Your rules
treat unfavorably two kinds of systems:

(1) those that use methods different from other systems (and therefore
are likely to solve different problems)
(2) newcomers, because they have to compete on problems at least two
old system solve (this is in the area where many divisions have
previously had only two entrants!)

1) is not a speculation. We made preliminary tests: there is a
division where we solve over 30% more problems than all other systems
together and would not be winners according to your rules. Even more
interestingly, in this division on the union of problems solvable by
at least one other system, we solve more than any of them, which also
illustrates 2).

Note that Vampire is both 1) and 2) .

If you main motivation is 100% soundness, you did not achieve it. In
quantifier divisions, there are essentially 2 systems and we know at
least one of them is unsound (see the example from Giles).

As Giles pointed out, (mainly in quantifier divisions) giving solvers
10 hours (or 10 days) does not make a lot of new problems solvable. We
run Vampire all year round on 180+ cores and know this too well :)

If your motivation is progress in the area, changing the current rules
seems to be the only way to go.

Best, Andrei

On 15 May 2016 at 14:49, Tjark Weber <tjark.weber at it.uu.se> wrote:
> Giles,
> 
> On Fri, 2016-05-13 at 14:39 +0000, Giles Reger wrote:
>> In experiments we solve many problems with an unknown status and find
>> the notion of restricting the competition to only those with a known
>> status to be at odds with the overall aim here: to solve hard
>> problems efficiently.
>> 
>> We are more familiar with how the CASC competition is organised where
>> (i) All solvers are run on all problems (with competition timing, not
>> an inflated time limit) to establish status (rating in CASC)
>> (i) All solvable problems (by anybody) are eligible for the
>> competition
>> 
>> It seems that requiring two solvers to agree on the status of a
>> benchmark does not encourage finding new ways to solve problems but
>> only solving problems solvable by others.
> 
> I understand your point of view. Indeed, the current rules disadvantage
> solvers that can solve benchmarks not solvable (with a much higher time
> limit, nonetheless) by any other solver.
> 
> But historically, SMT-COMP has valued correct answers much more highly
> than incorrect answers. Certainly returning an incorrect answer does
> not amount to solving the benchmark. If a single solver's answer
> suffices, how can we be confident that the answer is correct?
> 
> (It appears to me that the CASC system could potentially be gamed by a
> wrapper script that runs last year's best solver, and arbitrarily
> returns "sat" or "unsat" if the solver times out.)
> 
> Opening up SMT-COMP to benchmarks with unknown status would, at the
> very least, require us to redefine when an answer is (in)correct. I am
> happy to have an open-minded discussion about this, but I do not think
> it would be appropriate to make such a substantial change on short
> notice (i.e., for SMT-COMP 2016).
> 
> Best,
> Tjark
> 



More information about the SMT-COMP mailing list