[SMT-COMP] SMT-COMP 2019: Call for Comments, Benchmarks, Solvers

Thu Jan 31 08:36:46 EST 2019

> On 31 Jan 2019, at 12:50, Tjark Weber <tjark.weber at it.uu.se> wrote:
> 
> On Wed, 2019-01-30 at 13:03 -0800, Bruno Dutertre wrote:
>> I second Dejan's suggestion.
> 
> I concur. The pseudo-random selection of benchmarks was abandoned in
> 2015, and the weighted scoring was introduced in 2016 to compensate for
> the fact that every available benchmark is now used in the competition.
> If a pseudo-random selection of benchmarks is re-introduced, it would
> make sense to move the complexity from the scoring formula back to the
> selection algorithm. (For instance, you might base the probability for
> selecting a benchmark on what is currently the benchmark's weight; or
> select a number of benchmarks from each family that corresponds to the
> weight of the family.)

Indeed, this is something we’re discussing.

As a further point to the scoring comments, which I agree with, it’s really best to avoid making the score reliant on the performance of other solvers as it makes reproducibility impossible unless somebody has access to all solvers. We would want to make it as easy as possible for future researchers to compare their solver to the results of a previous competition.

>> I'm for an Olympic-style medal table. Winners in each logic get
>> a gold medal, second places get silver, third places get bronze.
>> Then just publish  the overall medal table. That would be an easy
>> way to summarize the results.
> 
> Such a table has obvious appeal for its simplicity and familiarity. On
> the other hand,
> 
> - it treats all divisions equally. In practice, some divisions are
> arguably more important (and receive more attention) than others. The
> current formula attempts to model this.
> 
> - some divisions are very competitive, but many others only have a very
> small number of competing solvers. Third or even second place would be
> trivial to achieve in a number of divisions. The current formula does
> not reward trivial solvers that can't actually solve anything.
> 
> - it treats all places in a division after the third equally. In
> particular, there is no penalty for giving wrong answers; entering a
> (however buggy) solver into additional divisions could only improve the
> solver's overall ranking. The current formula penalizes solvers that
> give incorrect answers, thereby forcing them to more carefully choose
> the divisions in which they want to compete.
> 
> Of course, any one-dimensional ranking scheme that compares solvers for
> different logics is inherently limited. Regardless of the specific
> scheme that is going to be used, I would suggest not to over-emphasize
> the competition-wide ranking.

One question here is whether we need an overall ranking at all. As you point out Tjark, not all divisions are `equal' but the relative importance of divisions is not always clear. As organisers we had discussed trying to merge divisions into larger entities such that we could reward fewer awards but we could not find any sensible larger grouping of divisions that could be properly justified from a pragmatic perspective. Given that the competition aims to encourage new developments, the awards should be associated with things we think that solver developers should be concentrating on. What’s unclear to me is whether we should be encouraging ‘general’ solvers able to handle all logics (or larger groups of logics) rather than ‘specialist’ solvers focussed on one or two logics - the notion of a global winner encourages general solvers.

Cheers, Giles