[SMT-COMP] Contribution of QF_FPBV and QF_FPABV benchmarks
delcypher at gmail.com
Fri Apr 7 08:40:39 EDT 2017
> Is it okay to put these "unknown" benchmarks as "unsat"? I really want
> them included in the competition so solver
> writers are encouraged to try improve performance on these difficult
> benchmarks. If putting them as "unknown"
> means they won't be used in the competition I'd rather just label them
> as "unsat" and note in the description that
> this has not been confirmed.
> I understand this attitude. This is one of the reasons the Call for Comments
> asked people to comment on benchmarks with unknown status in the
> competition. I repeat the question here.
> - Benchmarks with unknown status: Previous SMT-COMPs have only used
> benchmarks whose status (sat/unsat) was known. One might argue that
> this favors imitation of existing solvers over innovation. Should
> benchmarks with unknown status be used in the competition? If so,
> how should answers for such benchmarks be scored? Can scoring for
> one solver remain independent of the answers given by other solvers,
> or should disagreements between solvers lead to negative scores for
> a solver?
> I believe from your comments that you think that benchmarks with unknown
> status should be used in the competition, do you have any other comments
> related to this question? We are keen to have a dialogue on this,
> potentially controversial, topic.
> Cheers, Giles
Yes, I believe "unknown" benchmarks should be allowed in the
competition, but with a different scoring mechanism.
Here is a sketch of a suggested scoring approach:
1. Run all tools on the "unknown" benchmarks.
2. For each benchmark:
- If one or more tools report "sat" and all other tools report
"unknown", then the benchmark is considered "sat" and all tools that
reported "sat" are awarded a point. The benchmark will then be
relabelled as "sat".
- If one or more tools report "unsat" and all other tools report
"unknown", then the benchmark is considered "unsat" and all tools that
reported "unsat" are awarded a point. The benchmark will then be
relabelled as "unsat".
- If one or more tools report "sat" and one or more tools report
"unsat", then no points are awarded for this benchmark and the
benchmark will remain labelled "unknown".
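To make the rules above concrete, here is a minimal sketch in Python. The function name and the result representation (a dict mapping tool name to answer) are my own illustration, not part of any actual SMT-COMP infrastructure:

```python
# Hypothetical sketch of the proposed per-benchmark scoring.
# `results` maps tool name -> answer ("sat", "unsat", or "unknown")
# for a single "unknown"-status benchmark. Returns the new label for
# the benchmark and the points awarded to each tool.
def score_unknown_benchmark(results):
    definite = set(results.values()) & {"sat", "unsat"}
    if len(definite) == 1:
        # All definite answers agree: relabel the benchmark and award
        # a point to every tool that gave that answer.
        label = definite.pop()
        points = {tool: (1 if ans == label else 0)
                  for tool, ans in results.items()}
        return label, points
    # Either no tool gave a definite answer, or tools conflict:
    # no points, and the benchmark stays "unknown".
    return "unknown", {tool: 0 for tool in results}
```

For example, if tools A and C report "sat" while B reports "unknown", the benchmark is relabelled "sat" and A and C each get a point; a sat/unsat conflict leaves the benchmark "unknown" with no points awarded.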
Optional idea: Handling tool conflicts
In the case of a conflict (i.e. one tool reports "sat" and another
reports "unsat") we could ask the tool authors to provide some
sort of proof that their tool gave the correct answer. If the correct
answer can be established (say, within a certain time window during the
competition) the benchmark can be relabelled and each tool's score
for the benchmark recomputed, so that the tools that gave the
correct answer get a point for the benchmark while the tools that gave
the wrong answer receive a negative score for it.
I'm not really sure this idea is workable, but if it were possible it
would encourage benchmarks labelled "unknown" to be relabelled. Even
if this idea is not workable, when conflicts happen it would be good
for tool authors to collaborate after the competition to establish
the correct answer for the benchmark and relabel it, so that it
can be used in the next competition.
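The rescoring step described above could be sketched as follows (again a hypothetical illustration, with my own function name; it assumes the conflict has been resolved to a confirmed "sat" or "unsat" status):

```python
# Hypothetical rescoring of a conflicting benchmark once its true
# status has been established (e.g. via a proof from a tool author).
# `results` maps tool name -> answer; `confirmed` is "sat" or "unsat".
def rescore_after_conflict(results, confirmed):
    points = {}
    for tool, ans in results.items():
        if ans == confirmed:
            points[tool] = 1    # correct definite answer: award a point
        elif ans == "unknown":
            points[tool] = 0    # no answer: neutral
        else:
            points[tool] = -1   # wrong definite answer: penalty
    return points
```

So a tool whose "sat" answer is later confirmed gains a point, a tool that reported "unsat" on the same benchmark is penalised, and tools that reported "unknown" are unaffected.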
Hope that helps,