[SMT-COMP] Rules: refinements for competition-wide recognitions

Thu Jul 4 14:18:39 EDT 2019

Dear participants,

while finalizing our scoring script to generate results for the
competition-wide recognitions, we noticed that the rules document was
not specific enough to support all cases/corner cases.

We updated the document in commit
https://github.com/SMT-COMP/smt-comp/commit/be20655eea60be1adb2f109038e3c6c9f228b0f7.

The changes are as follows:

Biggest lead ranking:
---------------------

The biggest lead ranking previously only took the correctly solved score
into account. It didn't handle ties.

Changes:

In case of a tie in terms of correctly solved instances, we now rank the
tied solvers according to their runtime in the corresponding division
(the division where they have the biggest lead).

Largest contribution ranking:
-----------------------------

The virtual best score vbss(D,S) as previously defined in the rules does
not compute values between 0 and 1. Further, selecting the minimum of
n_b * c_b for a benchmark b is not sufficient to measure the impact on
the VBS (in particular since it does not allow to penalize solvers that
don't solve a given benchmark).

Changes:

In a division D, we first exclude all solvers with an error score > 0
('unsound solvers'). Unsound solvers are not considered.

We further only consider divisions with more than 2 competitive sound
solvers, i.e., |S| > 2. This limit was introduced to remedy the fact
that otherwise this recognition and the biggest lead recognition may not
be distinct enough.

For division D and the set of sound competition solvers S, instead of
one score vbss(D,S), we now compute 3 different scores:

vbss_n(D,S) ... VBS correct score for solvers S
vbss_c(D,S) ... VBS CPU score for solvers S
vbss_w(D,S) ... VBS wall-clock score for solvers S

To rank solvers with respect to the impact of removing a solver s from
S, we now compute an impact rank for each score, i.e.,

impact_correctness = 1 - vbss_n(D,S-s)/vbss_n(D,S)
impact_cpu_time = 1 - vbss_c(D,S)/vbss_c(D,S-s)
impact_wall_time = 1 - vbss_w(D,S)/vbss_w(D,S-s)

Since these ranks are compared across all divisions, for the sake of
fairness, we further decided to normalize them based on the number of
competitive solver/benchmark pairs in the division.

The largest contribution winner is the solver with the highest
normalized impact_score. In case of a tie, we determine the rank of the
tied solvers based on their runtime impact.

Aina

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: </pipermail/smt-comp/attachments/20190704/23ece368/attachment.asc>