[SMT-COMP] [SMT-LIB] SMT-COMP 2016: Draft Rules and Timeline Available

Fri May 13 10:39:52 EDT 2016

Hi all,

We are planning on entering the competition for the first time this year [with our Vampire solver] and this topic of statuses has been on our minds.

In experiments we solve many problems with an unknown status and find the notion of restricting the competition to only those with a known status to be at odds with the overall aim here: to solve hard problems efficiently.

We are more familiar with how the CASC competition is organised where
(i) All solvers are run on all problems (with competition timing, not an inflated time limit) to establish status (rating in CASC)
(i) All solvable problems (by anybody) are eligible for the competition

It seems that requiring two solvers to agree on the status of a benchmark does not encourage finding new ways to solve problems but only solving problems solvable by others.

Furthermore, we observe that where systems are similar (e.g. are DPLL(T) style SMT solvers) then it seems reasonable to assume they will solve similar problems, to a certain extent. But when there are solvers implementing radically different approaches (as in our case) it is likely to be the case that the set of solved problems is quite different.

In some cases we solve about 20% of problems unsolved by other systems (based on our experiments), marking these as ineligible, from our point of view, seems unfair. As a further example, in the UFNIA division for last year’s competition only 59 problems separated the winner (CVC4) and runner up (Z3) whereas we solve 627 problems that neither system can solve.

Cheers,
Giles

On 13 May 2016, at 14:19, David R. Cok <dcok at grammatech.com<mailto:dcok at grammatech.com>> wrote:

The procedure in the past has been to run a set of solvers and to assign a status if (a) all solvers producing a sat or unsat answer agree and (b) at least two solvers produce a sat or unsat answer.

A couple years ago, after the competition in 2014, I, Aaron Stump, and Clark Barrett ran a big job with a long time out (10 hours, I think) to resolve the status of any benchmarks without a status. That task took a few months, with the rate of progress depending on how many nodes Aaron would let the task have. We succeeded on most but not all of the outstanding unknown benchmarks. It would be straightforward to do so again, but it does take some time.

- David

On 5/13/2016 8:19 AM, Christoph Wintersteiger wrote:
Hi all,

Yes, that was the reason and indeed, the rest is just basic test cases. What would be the procedure for assigning statuses? Can we just run a couple of solvers and assign a status if they agree on them?

Cheers,
Christoph

Christoph M. Wintersteiger | Researcher | Tel: +44 1223 479724 | Fax: +44 1223 479999 | research.microsoft.com/people/cwinter<http://research.microsoft.com/people/cwinter>

Microsoft Research Limited (company number 03369488) is a company registered in England and Wales whose registered office is at 21 Station Road, Cambridge, CB1 2FB

-----Original Message-----
From: smt-comp-bounces at cs.nyu.edu<mailto:smt-comp-bounces at cs.nyu.edu> [mailto:smt-comp-bounces at cs.nyu.edu] On Behalf Of Tjark Weber
Sent: 13 May 2016 13:01
To: Zhoulai Fu <wbd0730 at 163.com<mailto:wbd0730 at 163.com>>
Cc: Zhoulai <zell08v at gmail.com<mailto:zell08v at gmail.com>>; David Deharbe <david.deharbe at clearsy.com<mailto:david.deharbe at clearsy.com>>; Sylvain Conchon <sylvain.conchon at lri.fr<mailto:sylvain.conchon at lri.fr>>; smt-lib <smt-lib at cs.nyu.edu<mailto:smt-lib at cs.nyu.edu>>; smt-comp at cs.nyu.edu<mailto:smt-comp at cs.nyu.edu>
Subject: Re: [SMT-COMP] [SMT-LIB] SMT-COMP 2016: Draft Rules and Timeline Available

Zhoulai,

On Fri, 2016-05-13 at 13:26 +0800, Zhoulai Fu wrote:

I find that Alberto Griggio proposed >200 benchmarks for the SMT-COMP
15'QF_FP division, but for some reason they were not used in the final
competition.    Would you let me know whether those benchmarks will be
used in this year's competition?  In fact,  since the other 2015's
QF_FP benchmarks  mostly consist in sanity checks (e.g. "x==1e-242 /\
y == 1e-197") only and can even be solved via a quick syntax analysis
I guess, will you add more challenging floating-point benchmarks such
as Griggio's for SMT-COMP 2016?  Thanks for your help.
If I remember correctly, Griggio's benchmarks were ineligible for the competition because their satisfiability status was unknown. I do not think that this has changed until now.

Ideally, we (or the SMT-LIB maintainers) would make an attempt to determine the status of these benchmarks, and then include them in the competition. However, I am not very confident that there will be enough time to do this before the benchmark freeze for 2016.

Best,
Tjark

_______________________________________________
SMT-COMP mailing list
SMT-COMP at cs.nyu.edu<mailto:SMT-COMP at cs.nyu.edu>
https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fcs.nyu.edu%2fmailman%2flistinfo%2fsmt-comp&data=01%7c01%7ccwinter%40064d.mgd.microsoft.com%7c11c88759a8554d25588008d37b26712c%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=UuB57bytOvzy903zFi5RAsvCYXIasevdVKyeuT6Irm8%3d
_______________________________________________
SMT-LIB mailing list
SMT-LIB at cs.nyu.edu<mailto:SMT-LIB at cs.nyu.edu>
http://www.cs.nyu.edu/mailman/listinfo/smt-lib

_______________________________________________
SMT-COMP mailing list
SMT-COMP at cs.nyu.edu<mailto:SMT-COMP at cs.nyu.edu>
http://cs.nyu.edu/mailman/listinfo/smt-comp