[SMT-COMP] SMT-COMP 2019: Call for Comments, Benchmarks, Solvers

Tjark Weber tjark.weber at it.uu.se
Thu Jan 31 07:50:57 EST 2019


On Wed, 2019-01-30 at 13:03 -0800, Bruno Dutertre wrote:
> I second Dejan's suggestion.

I concur. The pseudo-random selection of benchmarks was abandoned in
2015, and the weighted scoring was introduced in 2016 to compensate for
the fact that every available benchmark is now used in the competition.
If a pseudo-random selection of benchmarks is re-introduced, it would
make sense to move the complexity from the scoring formula back to the
selection algorithm. (For instance, you might base the probability for
selecting a benchmark on what is currently the benchmark's weight; or
select a number of benchmarks from each family that corresponds to the
weight of the family.)

> I'm for an Olympic-style medal table. Winners in each logic get
> a gold medal, second places get silver, third places get bronze.
> Then just publish  the overall medal table. That would be an easy
> way to summarize the results.

Such a table has obvious appeal for its simplicity and familiarity. On
the other hand,

- it treats all divisions equally. In practice, some divisions are
arguably more important (and receive more attention) than others. The
current formula attempts to model this.

- some divisions are very competitive, but many others only have a very
small number of competing solvers. Third or even second place would be
trivial to achieve in a number of divisions. The current formula does
not reward trivial solvers that can't actually solve anything.

- it treats all places in a division after the third equally. In
particular, there is no penalty for giving wrong answers; entering a
(however buggy) solver into additional divisions could only improve the
solver's overall ranking. The current formula penalizes solvers that
give incorrect answers, thereby forcing them to more carefully choose
the divisions in which they want to compete.

Of course, any one-dimensional ranking scheme that compares solvers for
different logics is inherently limited. Regardless of the specific
scheme that is going to be used, I would suggest not to over-emphasize
the competition-wide ranking.

Best,
Tjark









När du har kontakt med oss på Uppsala universitet med e-post så innebär det att vi behandlar dina personuppgifter. För att läsa mer om hur vi gör det kan du läsa här: http://www.uu.se/om-uu/dataskydd-personuppgifter/

E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy


More information about the SMT-COMP mailing list