[SMT-COMP] SMT-COMP 2019: Call for Comments, Benchmarks, Solvers

Mohamed Iguernlala iguer.auto at gmail.com
Wed Jan 23 10:17:07 EST 2019


Hi,

We would also like to suggest adding three sub-tracks to the 'Single
Problem Track' (formerly 'Main Track'):

This would not require extra resources, just some scripts. Given a
category 'C' of this track, once all the solvers have been run on
the benchmarks in 'C', one could classify them according to three
criteria, to identify:

(1) the best solver answering 'unsat'
(2) the best solver answering 'sat'
(3) the best solver overall

Of course, the solvers' answers must not conflict for (1) and (2);
any benchmark with conflicting answers should be discarded for the
corresponding sub-track.

This distinction is already made in the SAT competition, but not in
SMT-COMP. We think it is very important in practice, because the
techniques used to prove unsatisfiability may differ from those used
to show satisfiability. For instance, some solvers only ever answer
"unsat", "unknown" or "timeout", and never "sat".

Best regards,

Mohamed Iguernlala.

-- 
Senior R&D Engineer, OCamlPro SAS
Research Associate, VALS team, LRI
LinkedIn: https://fr.linkedin.com/in/mohamed-iguernlala-71515979

On 1/23/19 7:20 AM, Aina Niemetz wrote:
> ======================================================================
>
>     14th International Satisfiability Modulo Theories Competition
>                             (SMT-COMP'19)
>
>                            July 7-8, 2019
>                           Lisbon, Portugal
>
>                           CALL FOR COMMENTS
>                          CALL FOR BENCHMARKS
>                      PRELIMINARY CALL FOR SOLVERS
>
> ======================================================================
>
>
> We are pleased to announce the 2019 edition of SMT-COMP.
>
>
> SMT-COMP is the annual competition among Satisfiability Modulo
> Theories (SMT) solvers.
>
> The goals of SMT-COMP'19 are to encourage scientific advances in the
> power and scope of solvers, to stimulate the community to explore and
> discuss shared challenges, to promote tools and their usage, to engage
> and include new members of the community (in a fun environment) and to
> support the SMT-LIB project in its efforts to promote and develop the
> SMT-LIB format and collect and collate relevant benchmarks.
>
> The results of SMT-COMP'19 will be announced at the SMT Workshop (July
> 7-8), which is affiliated with the 22nd International Conference on
> Theory and Applications of Satisfiability Testing (SAT 2019).
>
>
> SMT-COMP'19 is organized under the direction of the SMT Steering
> committee. The organizing team for SMT-COMP'19 is:
>
>    * Liana Hadarean - Amazon, USA
>    * Antti Hyvärinen - Università della Svizzera italiana, CH
>    * Aina Niemetz (co-chair) - Stanford University, USA
>    * Giles Reger (co-chair) - University of Manchester, UK
>
>
> This is a call for three things:
>
>
> CALL FOR COMMENTS:
>
>
> The organizing team is preparing the schedule and rules for 2019. To
> further the above goals, we propose to make several changes to the
> format of SMT-COMP'19.
>
> Any comments you may have on these proposed changes, on how to improve
> the competition or to redirect its focus are welcome and will be
> considered by the team. We particularly appreciate comments received
> before * February 11th, 2019 *.
>
>
> 1. Benchmark selection.
>
> Since 2015, the organizers of SMT-COMP have chosen to evaluate all
> solvers on all relevant benchmarks. As a consequence, results have
> become more predictable, and the fixed set of benchmarks may encourage
> solver developers to overfit the problem set. Overall, this looks more
> like an evaluation than a competition.
>
> Furthermore, many of the problems in SMT-LIB can now be considered
> 'easy', or at least 'unsurprising'. For example, in the last iteration
> of the competition 78% of the 258,741 main track benchmarks were
> solved by all supported solvers within the time limit (71% within 1
> second). In 7 (out of 46) logics, over 99% of benchmarks were solved
> by all solvers.
>
> To shift the focus towards challenging benchmarks, reduce the number
> of benchmarks to be able to run with a longer timeout, and bring back
> the competitive aspect, we propose the following alternative benchmark
> selection scheme:
>
> * As a first step we will remove all benchmarks in a division that
>    were solved by *all* solvers (including non-competing solvers) in
>    this division in under a second during last year's competition.
>
> * From the remaining benchmarks we will randomly select at least 300
>    benchmarks. If a division contains fewer than 300 eligible
>    benchmarks, we include all of them; if 50% of the eligible
>    benchmarks exceeds 300, we select 50% of them (see the sketch
>    after this list).
>
> * Benchmarks will be (pseudo-)randomly selected without considering
>    the difficulty, category, or family of a benchmark. Rationale:
>    difficulty is hard to quantify and easy benchmarks have already
>    been removed; categories are already reasonably distributed (around
>    78% of the non-incremental and almost 100% of the incremental
>    benchmarks are labelled 'industrial'); and the scoring formula
>    already accounts for family size by normalizing benchmark weights.
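>
> To make the selection size concrete: it reduces to
> n = min(|eligible|, max(300, |eligible| / 2)). Below is a minimal
> sketch of such a draw (illustrative only, not the organizers' actual
> script):
>
>     import random
>
>     def select_benchmarks(eligible, seed):
>         # At least 300 benchmarks; 50% of the eligible set if that is
>         # larger; the whole division if fewer than 300 are eligible.
>         n = min(len(eligible), max(300, len(eligible) // 2))
>         return random.Random(seed).sample(eligible, n)  # reproducible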
>
>
> 2. Increase time limit to 40 minutes.
>
> In 2017, the competition time limit was decreased to 20 minutes due to
> the infeasibility of running all submitted solvers on all benchmarks
> of a division. The new benchmark selection strategy enables us to
> extend the time limit back to 40 minutes.
>
>
> 3. Different application domains require different time limits.
>
> For example, software verification traditionally requires much shorter
> time limits than hardware verification. To reward solvers optimized
> for different use cases, we propose two additions this year:
>
> a. New industry challenge track.
>     This track will contain unsolved SMT-LIB benchmarks, with an
>     emphasis on those coming from industrial applications. We will also
>     include benchmarks nominated by the community as challenging and of
>     interest. In this track, solvers will run with a significantly
>     longer time limit, e.g., several hours. We additionally encourage
>     the community to submit new benchmarks for this track.
>
> b. New 24-second score.
>     In the main track the competition currently gives separate scores
>     for sequential and parallel performance. To reward tools that solve
>     problems quickly we will introduce a third '24-second score' for
>     the number of problems solved within 24 seconds (wall clock time).
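>
> As an illustration, the three scores could be computed along the
> following lines (a sketch; the per-run record format is an assumption,
> not the official scoring code):
>
>     def solved_counts(runs, timeout=2400.0):
>         # runs: (answer, cpu_time, wall_time) tuples, one per
>         # benchmark, for a single solver; 2400s is the proposed
>         # 40-minute limit.
>         solved = [(c, w) for a, c, w in runs if a in ("sat", "unsat")]
>         return {"sequential": sum(c <= timeout for c, _ in solved),
>                 "parallel":   sum(w <= timeout for _, w in solved),
>                 "24-second":  sum(w <= 24.0 for _, w in solved)}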
>
>
> 4. Mandatory system descriptions for submitted solvers.
>
> As part of a submission, SMT-COMP entrants are now required to provide
> a short (1–2 pages) description of the system. This should include a
> list of all authors of the system, their present institutional
> affiliations, and any appropriate acknowledgements. The programming
> language(s) and basic SMT solving approach employed should be
> described (e.g., lazy integration of a Nelson-Oppen combination with
> SAT, translation to SAT, etc.). System descriptions should also
> include a URL for the submitted tool's website.
>
> The motivation for this change is twofold. First, we want to improve
> transparency when submitted solvers are wrapper tools as defined by
> the rules of the competition. Second, we want to encourage
> documentation of the technical improvements that led to the current
> results.
>
>
> 5. Do not run non-competitive divisions.
>
> A division in a track is competitive if at least two substantially
> different solvers (i.e., solvers from two different teams) were
> submitted. Although the organizers may enter other solvers for
> comparison purposes, only solvers that are explicitly submitted by
> their authors determine whether a division is competitive, and are
> eligible to be designated as winners.
>
>
> 6. New experimental model generation track for QF_BV.
>
> In many SMT applications, model generation is an essential feature.
> Currently, none of the SMT-COMP tracks require model generation. One
> of the challenges is that the model format is not consistent across
> different solvers. While imposing a standard over all logics is
> challenging, there are several logics where it is straightforward. For
> this reason, this year we are planning to include a new experimental
> model generation track for QF_BV. In the future we hope to expand this
> track to other logics as a way of pushing for model standardization.
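>
> As a sketch of what this track could involve, the harness below feeds
> a solver a QF_BV query with model production enabled and prints the
> returned model (the solver command is just an example; any SMT-LIB v2
> solver that reads from stdin would do):
>
>     import subprocess
>
>     QUERY = """(set-option :produce-models true)
>     (set-logic QF_BV)
>     (declare-const x (_ BitVec 8))
>     (assert (= (bvadd x #x01) #x00))
>     (check-sat)
>     (get-model)
>     """
>
>     # 'z3 -in' reads SMT-LIB from stdin; the expected model is x = #xff.
>     out = subprocess.run(["z3", "-in"], input=QUERY,
>                          capture_output=True, text=True)
>     print(out.stdout)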
>
>
> 7. Rename tracks.
>
> We will rename 'Application Track' to 'Incremental Track' and 'Main
> Track' to 'Single Problem Track'.
>
> We believe the current names are misleading, as the current 'Main
> Track' also contains problems coming from applications. Additionally,
> calling it 'Main' de-emphasizes the importance of the other tracks
> and use cases of SMT.
>
>
>
> We would additionally like to pose the following open ended questions
> to the community:
>
>
> Q1. Should SMT-COMP include new logics?
>      * Last year strings were an experimental division (and the semantics
>        are still not fixed). Should this remain experimental? How should
>        we encourage new solvers and benchmarks for this track?
>      * A new SMT-LIB theory for reals with transcendental functions is
>        also in progress. Is it mature enough to feature in the
>        competition (as an experimental division)?
>
>
> Q2. In 2016 a new scoring scheme was introduced, in which benchmark
>      weights are normalized with respect to their family size. The goal
>      was to prevent large benchmark families from dominating the solver
>      score (as opposed to simply counting the number of correctly
>      solved instances while adding a penalty for incorrect results).
>      The current scoring system, however, now emphasizes very small
>      benchmark families (it is now these families that may dominate the
>      score). We would appreciate suggestions for a new scoring scheme
>      that strikes a balance between these two extremes.
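>
>      To make the trade-off concrete, the two extremes can be written
>      as code (illustrative only; the competition's actual formula
>      differs):
>
>          from collections import Counter
>
>          def scores(solved, family_of):
>              # solved: benchmarks solved by one solver; family_of maps
>              # every benchmark in the division to its family name.
>              fam_size = Counter(family_of.values())
>              raw = len(solved)                # large families dominate
>              normalized = sum(                # every family weighs 1, so
>                  1 / fam_size[family_of[b]]   # tiny families dominate
>                  for b in solved)
>              return raw, normalized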
>
>
> Q3. What is a benchmark family?
>      The previous question assumes benchmarks are sensibly collected
>      into families. The previous rules use the directory structure to
>      define families with each family given by a top-level directory.
>      However, it is not clear that this approach yields sensible
>      families, given how SMT-LIB is curated: at present, all benchmarks
>      submitted by one person for a given logic are placed in a single
>      directory, and there is no reason to assume that these benchmarks
>      are related. We suggest that using the lowest-level directory may
>      be more appropriate, but are there alternative definitions of
>      family that could be used?
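>
>      In code, the two candidate definitions differ only in which path
>      component is kept (the path below is hypothetical):
>
>          from pathlib import PurePosixPath
>
>          def family_top(path):
>              # Previous rules: family = top-level directory.
>              return PurePosixPath(path).parts[0]
>
>          def family_lowest(path):
>              # Suggested alternative: family = lowest-level directory.
>              return str(PurePosixPath(path).parent)
>
>          p = "20170501-Submitter/ufo/bench_17.smt2"
>          print(family_top(p))     # -> 20170501-Submitter
>          print(family_lowest(p))  # -> 20170501-Submitter/ufo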
>
>
> Q4. Is it important to have 'overall' winners?
>      If so, how should we achieve this? The current approach is
>      completely dependent on the number of logics supported by the
>      tool. If not overall winners, do we want a higher-level notion of
>      winner than a division winner?  What is the best way to
>      acknowledge the success of solvers and motivate newcomers whilst
>      keeping the competition fun?
>
>
>
> CALL FOR BENCHMARKS:
>
>
> Have interesting or hard benchmarks that can be made public? Want the
> world's best SMT solvers to compete to solve *your* problems? Submit
> your benchmarks to SMT-LIB and SMT-COMP!
>
>
> Please let us know as soon as possible if you are considering
> submitting benchmarks, even if the material is not quite ready. We
> will work in close cooperation with the SMT-LIB maintainers to
> integrate such benchmarks into SMT-LIB. The deadline for submission of
> new benchmarks to be used in the 2019 competition is
> * March 1st, 2019 *.
>
>
> We would encourage new benchmarks in the following logics (which
> appear to have 'stagnated' in the sense that the benchmarks in
> them are no longer challenging to competitive solvers):
>
> ALIA
> AUFLIA
> AUFNIRA
> NIA
> NRA
> QF_ANIA
> QF_AUFBV
> QF_AUFNIA
> QF_DT
> QF_FP
> QF_LIRA
> QF_NIRA
> QF_RDL
> QF_UFBV
> QF_UFIDL
> QF_UFLIA
> QF_UFLRA
> QF_UFNIA
> QF_UFNRA
> UFBV
> UFIDL
> UFLRA
>
>
> We would also like to extend our call for benchmarks to include
> benchmarks with some additional information:
>
> 1. For the Industrial Challenge Track we would like to receive
>     difficult benchmarks that are important to you and either unsolved,
>     or unsolved within some reasonable time limit. We would
>     particularly like benchmarks that come with a description of why
>     they are difficult/important. Of course, if this is not possible
>     then new challenging benchmarks are always appreciated.
>
> 2. We would appreciate receiving benchmarks that you want solved
>     quickly (e.g. in under 24 seconds) but that solvers currently
>     struggle with.
>     Please add the required solution time as a comment to the
>     benchmark. If we receive many benchmarks of this kind we may
>     consider a new track in the future that specifically focuses on
>     benchmarks requiring short time limits.
>
>
>
> PRELIMINARY CALL FOR SOLVERS:
>
>
> A submission deadline for solvers will be announced along with the
> rules.  However, it is useful to the organizing team to know in
> advance which and how many solvers may be entering. If you have not
> submitted a solver before, or if you think there may be unusual
> circumstances, we request that you let us know at your earliest
> convenience if you think you may be submitting one or more solvers to
> SMT-COMP'19. Note that this year we require a system description for
> all submitted solvers as part of the submission of the final solver
> versions.
>
>
>
> COMMUNICATION:
>
>
> The competition website will be at www.smtcomp.org.
>
> The SMT-COMP repository will move to GitHub at
> https://github.com/smt-comp.
>
> Public email regarding the competition may be sent to
> smt-comp at cs.nyu.edu.
>
> Announcements will be sent to both smt-comp at cs.nyu.edu and
> smt-announce at googlegroups.com.
>
>
> Sincerely,
>
> The organizing team
>