[SMT-COMP] Floating-Point benchmarks for SMT-COMP

Tue Apr 24 07:08:49 EDT 2018

Hi Florian,

Did you find the time to prepare those benchmark sets for SMT-LIB? 

Regarding the competition: are any of the sub-sets from applications or are they all just functional tests?

SMT-COMP organizers: do you want those benchmarks ready in time for the competition? Given that they are mostly functional tests they are probably a bit boring, but I'd be happy to add them soon/quickly.

Cheers,
Christoph

Christoph M. Wintersteiger | Senior RSDE | Tel: +44 1223 479724 | Fax: +44 1223 479999 | research.microsoft.com/~cwinter

Microsoft Research Limited (company number 03369488) is a company registered in England and Wales whose registered office is at 21 Station Road, Cambridge, CB1 2FB 

-----Original Message-----
From: Florian Schanda <florian.schanda at altran.com> 
Sent: 21 March 2018 08:51
To: Christoph Wintersteiger <cwinter at microsoft.com>
Cc: smt-comp at cs.nyu.edu; martin.brain at cs.ox.ac.uk; florian at schanda.org.uk
Subject: Floating-Point benchmarks for SMT-COMP

Hi,

We've got a large set of benchmarks to submit this year ;)

I am not sure this is the correct process to submit; but 2 years ago it was "email Christoph"... Please direct me to the correct procedure if this is wrong.

Attached is the first set (the others I will assemble after Easter since its a bit more tricky). If you're really impatient they are all public anyway on:
   https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fflorianschanda%2Fsmtlib_schanda&data=04%7C01%7Ccwinter%40microsoft.com%7C9b0e425ea5f74058dee108d58f08fb4f%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636572191140720086%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwifQ%3D%3D%7C-1&sdata=nMDwHabYwOo0QbPrlVNYnfrfwDBeEC%2FeZS10ZrQyFwY%3D&reserved=0

This set of 72,925 benchmarks is randomly generated using:
   https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fflorianschanda%2Fpympf&data=04%7C01%7Ccwinter%40microsoft.com%7C9b0e425ea5f74058dee108d58f08fb4f%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636572191140720086%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwifQ%3D%3D%7C-1&sdata=z4gyAf%2FoXvFgJL85FyU5NKsz3qBsWxfBpn76Gw1nJKk%3D&reserved=0

This is not really a performance benchmarks suite, more of a functional 
correctness suite. Every single benchmark should be correctly annotated with 
sat/unsat. It tests every operation the FP theory mentions, including all the 
fun real/float stuff. It has helped us find soundness bugs in all solvers; so 
we feel it is particularly helpful to include. The license of these is GPLv3. 
In terms of validating the correctness of the ground truth; we've checked 
(where applicable/possible) against hardware (a recent x86_64), software (GNU/
MPFR), and software (PyMPF itself). We've also looked at the results of 
MathSAT, Z3, CVC4, and Colibri and we're reasonably happy that all answers 
that differ from the ground truth are bugs in the respective solvers.

I will submit two more benchmark suites: one extracted from real-world 
industrial code; and the other one from the public spark 2014 testsuite. These 
are more tricky as they are not neatly QF_FP or QF_FPBV, these use arrays, 
functions, bitvectors, reals and ints, quantifiers and datatypes as well.

Perhaps there should be an ALL (or SPARK_VC_GENERATION_IS_SPECIAL) track? :)

	Florian