[SMT-COMP] SMT-COMP 2016: Incorrect Answers (Unknown Track)
Giles Reger
giles.reger at manchester.ac.uk
Fri Jul 15 18:44:01 EDT 2016
Hi all,
We’re looking at the discrepancy with Z3. It seems that we may have an explanation and that Vampire should have returned ‘unknown’ here. I’m in the middle of running proof checking on our proofs for problems with unknown status and this appears to be an isolated case so far.
Best, Giles
> On 15 Jul 2016, at 19:09, Dejan Jovanović <dejan.jovanovic at sri.com> wrote:
>
> Hi Tjark,
>
> Bruno examined the data from starexec and you can see it summarized here:
>
> https://docs.google.com/spreadsheets/d/1V0Wu5RE1pHDa1inOviJ5i7xte1yo8h6eFElCKjHKfpY/edit?usp=sharing
>
> The situation is not nearly as bad as in your table.
>
> There is in fact only one benchmark where it's unclear what the answer is
> (NIA, vampire vs z3). In all other cases it is very clear that there are
> two solvers giving wrong results: Q3B and raSAT 0.3. More interestingly,
> raSAT seems to be printing unsat on timeouts (which is a bit worrying
> considering that this didn't show up in the main competition).
>
> Best, Dejan
>
> On Wed, Jul 6, 2016 at 11:18 AM Tjark Weber <tjark.weber at it.uu.se> wrote:
>
>> In the "unknown benchmarks" track, we currently have no authoritative
>> way to tell correct from incorrect answers.
>>
>> However, there were 603 conflicts (i.e., at least one solver answered
>> sat and another solver answered unsat) on 79 benchmarks. The following
>> list shows how often each solver was involved in a conflict:
>>
>> 25 ABC_default
>> 25 ABC_glucose
>> 25 Boolector
>> 25 Boolector preprop
>> 55 CVC4-master-2016-05-27-cfef263-main
>> 22 MapleSTP
>> 23 MapleSTP-mt
>> 25 mathsat-5.3.11-linux-x86_64-Main
>> 25 Minkeyrink 2016
>> 12 ProB
>> 27 Q3B
>> 50 raSAT 0.3
>> 7 raSAT 0.4 exp - final
>> 35 SMT-RAT
>> 25 stp-cms-exp-2016
>> 25 stp-cms-mt-2016
>> 25 stp-cms-st-2016
>> 24 stp-minisat-st-2016
>> 1 vampire_smt_4.1_parallel
>> 55 Yices-2.4.2
>> 67 z3-4.4.1
>>
>> Again, this does *not* necessarily mean that the solver gave an
>> incorrect answer, but merely that it disagreed with some other
>> solver(s).
>>
>> A more detailed list that includes benchmark information is attached.
>>
>> Best,
>> Tjark
>> _______________________________________________
>> SMT-COMP mailing list
>> SMT-COMP at cs.nyu.edu
>> http://cs.nyu.edu/mailman/listinfo/smt-comp
>>
> _______________________________________________
> SMT-COMP mailing list
> SMT-COMP at cs.nyu.edu
> http://cs.nyu.edu/mailman/listinfo/smt-comp
More information about the SMT-COMP
mailing list