[SMT-COMP] QF_FP, min and zeros

Christoph Wintersteiger cwinter at microsoft.com
Tue Apr 25 09:12:16 EDT 2017


Yeah, FMA is a perpetual mystery :)

There is one open bug report for Z3's FMA, so it may well be that there's an actual bug there as well. But I haven't had access to a machine that supports FMA for a while now, so I haven't been able to check it against the hardware yet (the SMT-LIB benchmarks were generated on an Itanium machine with actual FMA instructions).
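In case it's useful to others: checking a single case against the hardware basically amounts to evaluating fma() on the host and comparing bit patterns with the solver's fp.fma answer. A minimal C sketch (assuming a C99 libm, whose fma() is specified as a single rounding of a*b + c; the inputs below are arbitrary placeholders, and you need to link with -lm):

    #include <inttypes.h>
    #include <math.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* placeholder inputs; substitute the values from a benchmark */
        double a = 0x1.8p+0, b = 0x1.4p+1, c = -0x1p-1;
        double r = fma(a, b, c);        /* single rounding of a*b + c */
        uint64_t bits;
        memcpy(&bits, &r, sizeof bits); /* bit-exact view of the result */
        printf("fma(a,b,c) = %a  bits = 0x%016" PRIX64 "\n", r, bits);
        return 0;
    }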

Cheers,
Christoph


Christoph M. Wintersteiger | Researcher | Tel: +44 1223 479724 | Fax: +44 1223 479999 | https://www.microsoft.com/en-us/research/people/cwinter/


Microsoft Research Limited (company number 03369488) is a company registered in England and Wales whose registered office is at 21 Station Road, Cambridge, CB1 2FB 

-----Original Message-----
From: François Bobot [mailto:francois.bobot at cea.fr] 
Sent: 24 April 2017 09:58
To: Christoph Wintersteiger <cwinter at microsoft.com>; Tjark Weber <tjark.weber at it.uu.se>
Cc: smt-comp at cs.nyu.edu
Subject: Re: [SMT-COMP] QF_FP, min and zeros

Hi,

On 21/04/2017 at 17:03, Christoph Wintersteiger wrote:
> I think that's a good idea and this is exactly what those benchmarks 
> have been designed to do. I agree that we can't make it mandatory for 
> all participants to run regression tests for all logics/theories 
> beforehand, but in the case of FP we can definitely send an extra 
> email to all participants making them aware of the latest version of these test cases, just to make sure they have a "reference" set to look at when they are unsure.
>

We really appreciate them for that; we are actually trying to understand why we don't get the same results for fma :).
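One classic source of such differences is double rounding: a*b+c rounds the product and then the sum, while fma(a,b,c) rounds only once. A small C sketch showing a concrete case (compile with -ffp-contract=off so the compiler does not fuse the expression itself, and link with -lm):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double a = 1.0 + 0x1p-29;  /* exactly representable */
        double b = 1.0 - 0x1p-29;  /* exactly representable */
        double c = -1.0;
        /* the exact product a*b = 1 - 2^-58 rounds to 1.0 in double,
         * so the unfused form loses the -2^-58 term entirely */
        printf("a*b + c    = %a\n", a * b + c);    /* prints 0x0p+0   */
        printf("fma(a,b,c) = %a\n", fma(a, b, c)); /* prints -0x1p-58 */
        return 0;
    }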

Do you mean an email with the results for the 1s timeout?


>
> On Fri, 2017-04-21 at 12:04 +0200, François Bobot wrote:
>> A similar stage could be done in our case: running all of 
>> Wintersteiger's unit tests with a timeout of 1s and eliminating the 
>> provers that give a wrong result. But these benchmarks would not be 
>> used for the prover comparison, because the real examples would be lost among the unit tests.
>>
>> What do you think of that?
>
> Note that the competition comprises 3 tracks and over 40 divisions 
> (logics). To keep this manageable, we shouldn't overcomplicate its design.

The design just adds that a division can define unit tests that will be run before the division itself with a very small timeout, so prover developers don't have to run them themselves.

>
> At this point, I am inclined to believe that unit tests (i) can serve 
> an important purpose in identifying incorrect solver implementations, 
> and (ii) do not necessarily cause "real" examples to be "lost": if 
> solving unit tests is easy, state-of-the-art solvers will only differ in how many of the real examples they can solve, so the (relative) ranking will depend on those.

They are easy if you support them, but, for example, some provers don't support fma, or don't support rounding modes other than RNE (roundNearestTiesToEven). And since the industrial examples don't use fma or non-RNE rounding modes (except for toy examples with a parametric rounding mode), it is not worth adding support for them.
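To make the rounding-mode point concrete, here is a small C illustration (not solver code) of the same division under the default round-to-nearest and under round-toward-zero, which corresponds to SMT-LIB's RTZ. Strictly, ISO C requires FENV_ACCESS for this and compiler support for the pragma varies; the volatile qualifiers keep the division from being constant-folded:

    #include <fenv.h>
    #include <stdio.h>
    #pragma STDC FENV_ACCESS ON

    int main(void) {
        volatile double x = 1.0, y = 10.0;
        printf("RNE: %a\n", x / y);    /* 0x1.999999999999ap-4 */
        fesetround(FE_TOWARDZERO);     /* SMT-LIB's RTZ */
        printf("RTZ: %a\n", x / y);    /* 0x1.9999999999999p-4 */
        fesetround(FE_TONEAREST);      /* restore the default */
        return 0;
    }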

I think it is not a problem for a prover to answer unknown on a unit test.



Cheers,

--
François

