[SMT-LIB] An alternative to SMT-COMP?

Mon Dec 31 04:10:53 EST 2012

Dear all,

sorry for the long delay.

Having a festive event like SMT-COMP (with T-shirts and live screen) is 
a nice thing: it brings together people around the screen and generate 
discussions.  Attesting participation can be good when applying for 
projects: it gives a hint to reviewers that the software reaches the 
level of maturity required for the competition.  Participating requires 
some work (just like releasing a new stable version), and that is why we 
did not participate last year.  This work is unavoidable, and necessary 
from time to time.  SMT-COMP is not to blame for the fact we did not 
participate, but rather ourselves (I apologize).  Indeed, having more 
time between successive SMT-COMPs, would be more convenient.

I think that the lost of interest in the SMT-COMP may be attributed to 
predictable results.  Leaving more time between competitions may help.  
Also, running the competitions only on the categories for which solver 
developers claim improvement would allow to concentrate on the 
categories where some changes would occur; but in practice I admit this 
would be difficult to do.  We need more and better benchmarks (and 
gathering new and good benchmarks is a very difficult task): this would 
definitely increase the suspense.  The numerous practical applications 
(for example in my direct environment: Rodin, TLAPS) of SMT could 
provide many new benchmarks; perhaps people are afraid to fill the 
SMT-LIB with formulas that are redundant or trivial.  Also 
non-disclosure agreements may be a problem.

A final point about the competition, I think that there may be a way to 
present the results such that everybody gets some credit, and such that 
it does not put too much pressure on the everlasting winner.  E.g., 
maybe highlight for each solver the number of files it was first to 
solve.  Just presenting a total order per category is not very 
satisfactory.  It would be nice also to highlight the solvers that solve 
the highest number of benchmarks, regardless of the categories.

Now about the SMT-EVAL: it is a good thing for users to have a view of 
the state of the art.  And SMT-COMP only partly helps. The wiki page 
http://en.wikipedia.org/wiki/Satisfiability_Modulo_Theories already 
gives some summary on the capabilities of the various SMT solvers, and 
enriching this with efficiency information would be valuable.  For 
SMT-EVAL, it would be possible to check the solvers against all formulas 
in the SMT-LIB.  It would be nice if the SMT-EVAL committee cooperates 
with SMT developers, i.e. advertise the "snapshot" date, and interact 
with developers if needed, e.g. in case of strange results or 
difficulties to compile.  Cesare highlighted the fact that it would be 
good for the SMT-EVAL committee not to contain any developer, for 
independence; my feeling is that one SMT developer in the committee 
would perhaps ease interaction between the committee and the developers.

Best Regards, and Happy New Year,

   Pascal

On 11/21/2012 09:48 AM, Tinelli, Cesare wrote:
> Dear all,
>
> As a follow up to my previous message about the future of SMT-COMP, there have been a few suggestions recently to have a different kind of event altogether: a comparative evaluation, let's call it SMT-EVAL, instead of a competition.
>
> This would be a non-live effort aimed at comparing solvers more thoroughly by running them on a much larger set of benchmarks and exploring several features and options for each solver, with the goal of getting a broader and more accurate picture of the state of the art.
> SMT-EVAL would perhaps not require official solver submissions and focus instead on publicly available solvers.
>
> The evaluation would clearly be a major endeavor and require a lot of work and a committed team of people. On the positive side, it would span a longer period of time, not have the sort of pressure and stress associated with a live competition, and for being more research oriented, could more easily lead to publications. Last but not least, the team in charge of it would benefit from the use of the StarExec execution service to run the experiments (seehttp://www.starexec.org/starexec/public/about.jsp). StarExec is going live soon and will be initially available only to a couple of logic solver communities, with SMT being one of them.
>
> The evaluation could be timed to end before SMT'13, which is expected to be in conjunction with SAT next year, so that its results could be presented there in lieu of the competition results.
>
> David Cok would be willing to be part of the evaluation team if SMT-EVAL replaced SMT-COMP next year. Aaron Stump has expressed a strong interest in being in the team too, both for its research aspects and because it would be a great stress test for StarExec. They both think that at least another person would be needed.
> I would add that it might be good if all the members of the team were not solver developers, to avoid any possible conflicts of interest.
>
> What do you think of the idea of SMT-EVAL as a replacement for SMT-COMP in 2013?
>
> Cheers,
>
>
> Cesare
>
>
>
>
>
>
>
>
>
>   
>
> _______________________________________________
> SMT-LIB mailing list
> SMT-LIB at cs.nyu.edu
> http://www.cs.nyu.edu/mailman/listinfo/smt-lib