LECTURE 1
EFFICIENCY ISSUES IN EGC

Tarski's Geometry provides a universal framework for describing most common geometric tasks. We also have the basic computational tools for Tarski's Geometry: resultants, Sturm sequences, algebraic number manipulation, cell decomposition. We might conclude from this that the geometric robustness problem within this framework is solved ``in principle''. These facts were known in the early 1980's and began to be exploited (e.g., [15] addresses topologically correct decisions in CAD primitives). But serious practical obstacles remain, mostly to do with efficiency. Consider the simplest situation: geometric (CAD) modeling involving only linear objects. Yu [21] addresses this case. Here, only rational arithmetic is needed, and yet Yu concluded that exact computation in this setting would not be practical in the foreseeable future. His reason was that the numbers in such computations become too large. But the approach of reducing all computation to exact rational arithmetic is only a first cut; at least it tells us where the difficulties might lie. We already know from the previous lecture that approximate arithmetic can be useful in exact geometric computation. In this lecture, we examine several computational techniques which speed up our computation, especially by exploiting approximate arithmetic.

1   Expression Packages

Efficient Big Number packages are clearly very important for EGC. One effort at creating an efficient Big Number core is [17,18]; the experience there suggests that the use of low-level assembly language coding can speed up a big integer package by a factor of 3. Beyond Big Number packages, there are many other important ideas that remain to be more fully explored on the arithmetic front: one is the idea of redundant arithmetic [10]. Another is the notion of incremental arithmetic [20].

For now, we focus on another key observation: arithmetic in a geometric algorithm is invariably performed by the repeated evaluation of a small number of primitives. For instance, in computing the convex hull of a planar point set, we can basically reduce all arithmetic to a single primitive,


LeftTurn(p,q,r) = (q.y - r.y)(p.x - r.x) - (p.y - r.y)(q.x - r.x).     (1)
Of course, each evaluation of this expression is performed on different choices of p, q, r. This suggests that we focus our optimization efforts on aggregates of arithmetic operations such as in (1). Such aggregates of arithmetic operations afford much greater opportunities for optimization than when we focus on isolated arithmetic operations (which is what happens if we only try to optimize the performance of Big Number packages). The software embodiment of this idea is called an expression package, built upon a Big Number package. The simplest class of expressions is based on polynomials with the operations (+,-,×). Next, we have rational functions (+,-,×,÷), and radical expressions (+,-,×,÷,√). We shall describe expression packages and their properties.
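To fix ideas, here is a minimal C++ sketch (ours, not part of LN or any of the packages discussed below) of the primitive (1), written as a single aggregate expression parametrized by the number type, so that the same code can be instantiated with machine doubles or with an exact Big Number type.

    #include <iostream>

    // A sketch (ours): the primitive (1) as one aggregate expression, parametrized
    // by the number type NT, so the same code can be instantiated with machine
    // doubles (fast, possibly wrong sign) or with an exact Big Number type.
    template <typename NT>
    struct Point { NT x, y; };

    template <typename NT>
    NT leftTurn(const Point<NT>& p, const Point<NT>& q, const Point<NT>& r) {
        return (q.y - r.y) * (p.x - r.x) - (p.y - r.y) * (q.x - r.x);   // expression (1)
    }

    int main() {
        Point<double> p{0, 0}, q{1, 0}, r{1, 1};
        std::cout << leftTurn(p, q, r) << "\n";   // prints 1 (positive: (p,q,r) is a left turn)
    }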

One key technique here is the ``compilation'' of expressions. To see what can be optimized along this line, we describe a system called LN (``Little Numbers'') from Fortune and Van Wyk [11,10,12]. It provides a simple language to define (1) numerical data types and (2) expressions. The numerical data types are basically fixed-size multiprecision numbers (only integers are available in the initial implementation). A preprocessor (also called LN) compiles such an LN program into C++ code. This code is extremely efficient, since the arithmetic is specialized at compile time to operands of known, fixed size.

Here is an example of an LN program:

    [[

    tuple Point(x,y)

    macro orient(a:Point, b:Point, c:Point) =
        sign( (....) );

    class integer = int<31>;
    Point;
    proc orient;
    ]]
    ...
    // C++ code
    ...

The portion inside [[...]] is compiled into C++ classes and methods, while code outside [[...]] is copied out verbatim. The keywords tuple and macro define (respectively) LN data types and LN expressions. The keywords class and proc produce (respectively) C++ class (or typedef) definitions and C++ methods. If the above program is stored in a file geom.e, LN will output two files, geom.h and geom.cc. These files can now be included or compiled in the usual way.

A side effect of LN's aggressive optimization for time efficiency is code bloat: the generated C++ code can be very large. Chang and Milenkovic [8] reported that the compiler ran out of space while compiling LN code for Boolean operations on planar polygons. This suggests that a less aggressive version of LN may be useful.

2   Certification and Program Filters

In the previous lecture, we exploited approximate computation in EGC through the use of BigFloat arithmetic. But another obvious avenue for exploiting approximation is machine arithmetic: machine arithmetic is fast and highly optimized. More generally, we examine the uses of programs that are ``always fast but sometimes incorrect''. Let us euphemistically call such programs useful. This seems antithetical to the whole idea of EGC. But it will actually hold the key to making EGC practical. Of course, we must deploy useful programs in our algorithms in a principled way.

The intuitive idea is easy: we must ``verify'', ``check'' or ``certify'' the output of useful programs. We will make a formal distinction among these three concepts. They are ordered in a hierarchy of decreasing difficulty, with verification having the most stringent requirement. The verification procedure takes as input a program P and some input-output specification S. The goal is to determine whether or not P has the behavior S. A checking procedure (say, specialized to a behavior S) takes an input-output pair (I,O) and determines whether or not (I,O) ∈ S. This checker does not care how the pair (I,O) is produced (in practice it probably comes from running some program P). Finally, a certifying procedure (again, specialized to S) takes the same pair (I,O) and again attempts to decide if (I,O) ∈ S. If it says ``yes'' then surely (I,O) ∈ S. If it says ``no'', it is allowed to be wrong (we call these false rejections). Of course, a trivial certifier says ``no'' to all its inputs. But useful certifiers should certainly do more than this. The main tradeoff in designing certifiers is between efficacy (the percentage of false rejections) and the efficiency of the procedure.

The area of program checking [2,3] lends some perspective on how to do this. Program checking is a great relief from the verification legacy of the 1980s. Then, under the weight of increasingly complex programs, the battle cry was to insist on ``verification'' of all programs. This task turned out to be too difficult. For one thing, practically all code based on machine arithmetic would fail the program verification process! Program checking from the 1990s takes a more realistic approach: it does not verify programs, it just checks specific input/output pairs.

But our main emphasis in EGC is on certifiers, which turn out to be very useful in practice. The term filter is used here as an alternative terminology for certifier. A filter does not have to recognize every correct input/output pair; to be useful, it only needs to recognize a substantial number of such pairs. Thus, filters are even easier to construct than checkers. Ultimately, filters are used for efficiency purposes, not for correctness purposes.

In EGC, the filters are usually numerical filters. These certify some property of a computed numerical value, typically its sign. This often amounts to computing some error bound, and comparing the computed value with this bound. When such filters aim to certify machine floating-point arithmetic, we call them floating-point filters.

There are two main classifications of numerical filters: static and dynamic. Static filters are those that can be computed at compile time, and they incur a low overhead at runtime. However, static bounds may be overly pessimistic and thus less effective. Dynamic filters exhibit the opposite characteristics: they have a higher runtime cost but are much more effective. We can also have semi-static filters which combine features of both.

Filters can be used at different levels of granularity: from individual machine operations (e.g., arithmetic operations), to subroutines (e.g., geometric primitives), to algorithms (e.g., convex hull or meshing algorithms), to entire systems.

Computing Upper Bounds in Machine Arithmetic.   In the implementation of numerical filters, we need to compute sharp upper bounds on numerical expressions. To be specific, suppose you have IEEE double values x and y and you want to compute an upper bound on |z| where z = xy. How can you do this? We can compute


z̃ ← |x ⊙ y|.     (2)
Here, |·| is computed exactly by the IEEE arithmetic, but the multiplication ⊙ is not exact. One aspect of IEEE arithmetic is that you can change the rounding modes (see Lecture I). So if we change the rounding mode to round away from 0 or round towards +∞, we will have z̃ ≥ z. Otherwise, we only know that z̃ = |z|(1+δ) where |δ| ≤ u. Here u = 2^{-53} is the ``unit of rounding'' for the arithmetic. Unfortunately, switching the rounding mode seems like an expensive proposition, not meant to be casually used in the middle of a program. So instead of relying on rounding modes, we further compute w̃ as follows:


w̃ ← z̃ ⊙ (1 + 4u).     (3)
It is assumed that overflow and underflow do not occur during the computation of w̃.

Note that 1+4u = 1+2^{-51} is exactly representable. Therefore, we know that w̃ = z̃(1+4u)(1+δ′) for some δ′ satisfying |δ′| ≤ u. Hence,


w̃ = |z|(1+δ)(1+δ′)(1+4u) ≥ |z|(1 - 2u + u²)(1+4u) = |z|(1 + 2u - 7u² + 4u³) > |z|.

Note that if any of the operations ⊕, ⊖ or ⊘ is used in place of ⊙ in (2), the same argument still shows that w̃ is an upper bound on the actual value.

We summarize this result:

Lemma 1  Let E be any rational numerical expression and let Ẽ be the IEEE double approximation to E obtained by direct evaluation. Assume the input numbers in E are IEEE doubles and E has k ≥ 1 operations.
(i) We can compute an IEEE double MaxAbs(E) which satisfies |E| ≤ MaxAbs(E) in 3k machine operations.
(ii) If all the input values are non-negative, 2k machine operations suffice.
(iii) The value Ẽ is available as a side effect of computing MaxAbs(E), at the cost of storing the result.

Proof.  We simply replace each rational operation in E by at most 3 machine operations: we count 2 flops to compute z̃ in equation (2), and 1 flop to compute w̃ in (3). In case the input numbers are non-negative, z̃ needs only 1 machine operation (the absolute value can be omitted). Q.E.D.
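To make the construction behind Lemma 1 concrete, here is a small C++ sketch (ours, not from LN or any library) that evaluates the LeftTurn expression (1) in doubles while carrying along an upper bound on the absolute value of the exact result: leaf bounds are absolute values, and each operation combines the child bounds and inflates by the exactly representable factor 1+4u, as in (2) and (3). The names Bounded, leaf, sub, mul are ours.

    #include <cmath>
    #include <cstdio>

    // u = 2^{-53} is the unit of rounding; 1 + 4u = 1 + 2^{-51} is exactly representable.
    static const double U = std::ldexp(1.0, -53);
    static const double INFLATE = 1.0 + 4.0 * U;

    // A value together with an upper bound on the absolute value of the exact
    // quantity it approximates (in the spirit of MaxAbs(E) from Lemma 1).
    // Overflow and underflow are assumed not to occur.
    struct Bounded {
        double val;  // approximate value, computed in IEEE doubles
        double abs;  // upper bound on the absolute value of the exact result
    };

    Bounded leaf(double x) { return { x, std::fabs(x) }; }
    Bounded sub(const Bounded& a, const Bounded& b) {
        return { a.val - b.val, (a.abs + b.abs) * INFLATE };   // |F - G| <= |F| + |G|
    }
    Bounded mul(const Bounded& a, const Bounded& b) {
        return { a.val * b.val, (a.abs * b.abs) * INFLATE };
    }

    int main() {
        // The LeftTurn expression (1) on some sample coordinates.
        double px = 1e8, py = 2e8, qx = 3e8 + 1, qy = 4e8, rx = 5e8, ry = 6e8 + 3;
        Bounded E = sub(mul(sub(leaf(qy), leaf(ry)), sub(leaf(px), leaf(rx))),
                        mul(sub(leaf(py), leaf(ry)), sub(leaf(qx), leaf(rx))));
        std::printf("approx value = %.17g, upper bound on |E| = %.17g\n", E.val, E.abs);
    }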

Model for Checking and Filtering.   We give a simple formal model for checking and filtering. Assume I and O are some sets called the input and output spaces. In numerical problems, it is often possible to identify I and O with R^n for some n. A computational problem is simply a subset π ⊆ I×O. A program is simply a partial function P: I → O. So P(x) may be undefined for some x, denoted P(x)↑; if P(x) is defined, we write P(x)↓. We say P is a partial algorithm for π if for all x ∈ I, if P(x)↓ then (x,P(x)) ∈ π. An algorithm A for π is a partial algorithm that happens to be a total function. A filter for π is a total program F: I×O → {0,1} such that F(x,y) = 1 implies (x,y) ∈ π. A checker C for π is a filter for π such that if C(x,y) = 0 then (x,y) ∉ π. Thus, a checker is a filter, but not necessarily vice-versa. Finally, a filtered program for π is a pair (P,F) such that P is a total program and F is a filter for π. We view (P,F) as a new program P_F such that, on input x, if F(x,P(x)) = 1 then P_F(x) = P(x); otherwise P_F(x)↑. Thus P_F is a partial algorithm for π. We want to ``anchor'' a filtered program (P,F) for π with an algorithm A for π. The triple (P,F,A) then constitutes a program P_{F,A} for π: on any input x, we define P_{F,A}(x) = P_F(x) if P_F(x)↓, otherwise P_{F,A}(x) = A(x).

Note that this model ought to be augmented by complexity considerations to show its true intent. Logically, the algorithm (P,F,A) is no different from A alone. But if C_A(x) denotes the complexity of the algorithm A on input x, we have


C_{(P,F,A)}(x) = C_P(x) + C_F(x,P(x)) + d,
where
d = 0 if F(x,P(x)) = 1, and d = C_A(x) otherwise.
The algorithm (P,F,A) may be more efficient than A, especially if C_A(x) is expensive and the filter is so efficacious that most of the time d = 0.

We consider the filtered program (P,F) for π because P is presumably some useful program for π. We can also consider filter cascades: let F be a sequence


F = (F_1, …, F_n)     (4)
where each F_i is a filter for π. We call F an (n-level) filter cascade for π. We view F as a filter for π by defining F(x,y) = max_{1≤i≤n} F_i(x,y). Thus F(x,y) = 1 iff F_i(x,y) = 1 for some i. In practice, we evaluate F(x,y) by searching for the first i such that F_i(x,y) = 1, and output 0 if there is none. Funke et al. [13,7] are among the first to exploit multi-level filter cascades. In practice, a filter cascade (4) would have the property that each F_i is more ``effective'' but less efficient than F_{i-1}.
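The anchored program P_{F,A} with a filter cascade is easy to phrase in code. The following C++ sketch (ours; the names Filter and runFiltered are hypothetical) tries the filters F_1, ..., F_n in order on the pair (x, P(x)) and falls back to the infallible algorithm A only if every filter rejects.

    #include <cmath>
    #include <cstdio>
    #include <functional>
    #include <vector>

    // A filter takes an (input, output) pair and returns true ("certified") or
    // false ("don't know").  These type names are ours, for illustration only.
    template <typename In, typename Out>
    using Filter = std::function<bool(const In&, const Out&)>;

    // The anchored program P_{F,A}: run the fast-but-possibly-wrong program P,
    // try the filters F_1, ..., F_n of the cascade in order, and if none of them
    // certifies the pair (x, P(x)), discard it and run the infallible algorithm A.
    template <typename In, typename Out>
    Out runFiltered(const In& x,
                    const std::function<Out(const In&)>& P,
                    const std::vector<Filter<In, Out>>& cascade,
                    const std::function<Out(const In&)>& A) {
        Out y = P(x);
        for (const auto& F : cascade)
            if (F(x, y)) return y;      // some F_i accepted: keep P's answer
        return A(x);                    // every filter rejected: fall back to A
    }

    int main() {
        // Toy instance: P approximates sqrt; the single filter accepts an answer y
        // whenever y*y is within 1e-9 of the input; A stands in for an exact method.
        std::function<double(const double&)> P = [](const double& v) { return std::sqrt(v); };
        std::function<double(const double&)> A = P;   // placeholder for the exact algorithm
        std::vector<Filter<double, double>> cascade = {
            [](const double& v, const double& y) { return std::fabs(y * y - v) < 1e-9; }
        };
        std::printf("%.15f\n", runFiltered<double, double>(2.0, P, cascade, A));
    }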

Exercises

Exercise 1.1:

(a) Write a C/C++ program which performs a sequence of arithmetic operations, and reports on the types of IEEE exceptions turned on by each.

(b) Perform some timing to determine the cost of testing for these exceptions. ¨

Exercise 1.2:

(a) Write a C/C++ program which turns on the various rounding modes, and prints the results of rounding.

(b) Perform some timing to determine the cost of switching rounding modes. ¨

  (End of Exercise)

3   Static Filters

Fortune and Van Wyk [12] were the first to implement and quantify the efficacy of filters for exact geometric computation. Their filter was implemented via the LN preprocessor system. We now look at the simple filter they implemented (which we dub the ``FvW Filter''), and some of their experimental results.

The FvW Filter.   Static error bounds are easily maintained for a polynomial expression E with integer values at the leaves. Let Ẽ denote the IEEE double value obtained by direct evaluation of E using IEEE double operations. Fortune and Van Wyk compute a bound MaxErr(E) on the absolute error,


|E - Ẽ| ≤ MaxErr(E).     (5)
It is easy to use this bound as a filter to certify the sign of Ẽ: if |Ẽ| > MaxErr(E) then sign(Ẽ) = sign(E). Otherwise, we must resort to some fallback action. Unless we specify otherwise, the default action is to immediately use an infallible method, namely computing exactly using a Big Number package.
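As a toy illustration (ours, not LN output), the following C++ fragment filters the sign of the LeftTurn expression (1) with a static bound, under the assumption that every coordinate is an integer of magnitude at most 2^{31} stored exactly in a double. Under that assumption the differences are exact and only the two products and the final subtraction round, giving an absolute error below 2^{13}; we use the comfortably safe constant 2^{14}.

    #include <cstdio>

    // Static floating-point filter for the sign of LeftTurn(p,q,r) in (1).
    // Assumption (ours, for illustration): |coordinate| <= 2^31, integer, exact in double.
    // Then |E - Etilde| < 2^13, and 2^14 = 16384 is a safe static error bound.
    const double MAX_ERR = 16384.0;

    // Returns +1 or -1 if the sign is certified; 2 means the filter failed and we
    // must fall back (e.g., to exact Big Number evaluation).
    int filteredLeftTurn(double px, double py, double qx, double qy,
                         double rx, double ry) {
        double E = (qy - ry) * (px - rx) - (py - ry) * (qx - rx);
        if (E >  MAX_ERR) return +1;
        if (E < -MAX_ERR) return -1;
        return 2;
    }

    int main() {
        std::printf("%d\n", filteredLeftTurn(0, 0, 1 << 20, 1, 1 << 21, 3));  // prints 1
    }

Systematic rules for deriving such bounds, via the parameters MaxErr(E) and MaxLen(E), are described next.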

Let us now see how to compute MaxErr(E). It turns out that we also need the magnitude of E. The base-2 logarithm of the magnitude is bounded by MaxLen(E). Thus, we say that the FvW Filter has two filter parameters,


MaxErr(E),    MaxLen(E).     (6)
We assume that each input variable x is assigned an upper bound MaxLen(x) on its bit length. Inductively, if F and G are polynomial expressions, then MaxLen(E) and MaxErr(E) are defined using the rules in Table 1.

Expression E | MaxLen(E) | MaxErr(E)
Variable x | MaxLen(x) (given) | max{0, 2^{MaxLen(x)-53}}
E = F±G | 1 + max{MaxLen(F), MaxLen(G)} | MaxErr(F) + MaxErr(G) + 2^{MaxLen(F±G)-53}
E = FG | MaxLen(F) + MaxLen(G) | MaxErr(F)MaxLen(G) + MaxErr(G)MaxLen(F) + 2^{MaxLen(FG)-53}

Table 1: Parameters for the FvW Filter

Observe that the formulas in Table 1 assume exact arithmetic. In an implementation, we have to be careful to compute upper bounds on these formulas. We assume that the filter has failed in case of an overflow; it is easy to see that no underflow occurs when evaluating these formulas. Checking for exceptions has an extra overhead. Since MaxLen(E) is an integer, we can evaluate the corresponding formulas exactly using IEEE arithmetic. But the formulas for MaxErr(E) will incur error, and we need to use some form of Lemma 1.

Framework for Measuring Filter Efficacy.   We want to quantify the efficacy of the FvW Filter. Consider the primitive of determining the sign of a 4×4 integer determinant. First look at the unfiltered performance of this primitive. We use the IEEE machine double evaluation of this determinant (with possibly incorrect sign) as the base line for speed; this is standard procedure. This base performance is then compared to the performance of some standard (off-the-shelf) Big Integer package. This serves as the top line for speed. The numbers cited in the paper are for the Big Integer package in LEDA (circa 1995), but the general conclusions for other packages are apparently not much different. For random 31-bit integers, the top line shows a 60-fold time increase over the base line. We will say


s = 60     (7)
in this case; the symbol s (or s(31)) reminds us that this is the ``slowdown'' factor. Clearly, s is a function of the bit length L as well. For instance, with random 53-bit signed integers, the factor s becomes 100. Next, still with L = 31, but using static filters implemented in LN, the factor s ranges from 13.7 to 21.8, over various platforms and CPU speeds [12, Figure 14]. For simplicity, we simply say s = 20, for some mythical combination of these platforms and CPUs. Thus the static filters improve the performance of exact arithmetic by the factor


f = 60/20 = 3.     (8)
In general, using unfiltered exact integer arithmetic as base line, the symbol f denotes the ``filtered improvement''. We use it as a measure of the efficacy of filtering.

In summary, the above experimental framework is clearly quite general, and aims to reduce the efficacy of a particular filter to a single number f. In general, the framework requires the following choices: (1) a ``test algorithm'' (we picked the sign of 4×4 determinants), (2) the ``base line'' (the standard is IEEE double arithmetic), (3) the ``top line'' (we picked LEDA's Big Integer), (4) the input data (we consider random 31-bit integers). Its simplicity means that we can extract this f factor from almost any1 published experimental paper. Another measure of efficacy is the fraction r of correct approximate values Ẽ which fail to pass the filter. Unfortunately, this data is less readily available in the papers we cite.

For a true complexity model, we need to introduce size parameters. In EGC, two size parameters are of interest: the combinatorial size n and the bit size L. Hence all these parameters ought to be written as s(n,L), f(n,L), etc.

Realistic versus Synthetic Problems.   Static filters have an efficacy factor f = 3 (see (8)) in evaluating the sign of randomly generated 4-dimensional matrices (L = 31). The paper [7] calls such problems ``synthetic benchmarks''. So it would be interesting to see the performance of filters on ``realistic benchmarks'', meaning actual algorithms for natural problems that we want to solve. But even here, there are degrees of realism. Let us2 equate realistic benchmarks with algorithms for problems such as convex hulls, triangulations, etc. We look at one such benchmark from Sugihara below.

The point of realistic benchmarks is that they will generally involve a significant amount of non-numeric computation. Hence the f-factor in such settings may not look as good as in our synthetic ones. To quantify this, suppose that a fraction


b    (0 ≤ b ≤ 1)     (9)
of the running time of the algorithm is attributable to numerical computation. After replacing the machine arithmetic with exact integer arithmetic, the overall time becomes (1-b) + bs = 1+(s-1)b. With filtered arithmetic, the time becomes 1+(s-1)b/f. Thus the efficacy factor f′ for the algorithm is really


f′ := (1+(s-1)b) / (1+(s-1)b/f).

It is easy to verify that f′ < f (assuming f > 1).
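For example, taking s = 60 and f = 3 from (7) and (8), a numerical fraction b = 0.5 gives f′ = 30.5/10.83 ≈ 2.8, while b = 0.2 gives f′ = 12.8/4.93 ≈ 2.6.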

Note that our derivation assumes the original running time is normalized to 1. This normalization is valid because all the factors s, f that we use are ratios and are not affected by the normalization.

The factor b is empirical, of course. But even so, how can we estimate it? For instance, for 2- and 3-dimensional Delaunay triangulations, Fortune and Van Wyk [12] noted that b ∈ [0.2, 0.5]. Burnikel, Funke and Schirra [7] suggest a very simple idea for obtaining b: simply execute the test program with each arithmetic operation repeated c > 1 times. This gives a new timing for the test program,


T(c) = (1-b) + cb.
Now, by plotting the running time T(c) against c, we obtain b as the slope.
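For example, if repeating each arithmetic operation twice (c = 2) raises the normalized running time to T(2) = 1.3, then the slope gives b = 0.3.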

Static Filters for 4-Dimensional Convex Hulls.   To see static filters in action in the context of an actual algorithm, we follow a study of Sugihara [19]. He uses the well-known beneath/beyond algorithm for computing the convex hull of a set of 4-dimensional points, augmented with symbolic perturbation techniques. Using static filters, Sugihara reported a 100-fold speedup for nondegenerate inputs, and a 3-fold speedup for degenerate inputs (see the table below).

Let us briefly review the method: let the input point set be S and suppose the points are sorted by their x-coordinates: P_1, …, P_n. Let S_i be the set of the first i points from this sorted list, and let CH(S_i) denote their convex hull. In general, the facets of the convex hull are 3-dimensional tetrahedra. The complexity of CH(S) can be Θ(n²). Since this beneath/beyond algorithm runs in O(n²) time, it is optimal in the worst case. We begin by constructing CH(S_5). In general, we show how to transform CH(S_{i-1}) into CH(S_i). This amounts to deleting the boundary tetrahedra that are visible from P_i and adding new tetrahedra with apexes at P_i.

The main primitive concerns 5 points A, B, C, D, E in R^4. The sign of the following 5×5 determinant


D = | 1  Ax  Ay  Az  Au |
    | 1  Bx  By  Bz  Bu |
    | 1  Cx  Cy  Cz  Cu |
    | 1  Dx  Dy  Dz  Du |
    | 1  Ex  Ey  Ez  Eu |
determines whether the tetrahedron (A,B,C,D) is visible from E. Of course, this is the primitive for all standard convex hull algorithms, not just the beneath/beyond method. If the coordinates are L-bit integers, the Hadamard bound says that the absolute value of the determinant is at most 25√5·2^{4L}, and thus it is a (6+4L)-bit integer. If L = 28, it suffices to use 4 computer words. Using IEEE 64-bit doubles, the mantissa is 53 bits, so δ, the relative error in a number, is at most 2^{-53} < 4×10^{-15}.

The determinant can be evaluated as a 4×4 determinant A = (a_ij) with 4! = 24 terms. Let us define


d := max_{i,j,k,l} |a_{1i} a_{2j} a_{3k} a_{4l}|,
where i,j,k,l range over all permutations of {1,2,3,4}. The computed determinant det(A′) is only an approximation to the true determinant det(A), where A′ is some perturbed version of A. It can be shown that if we define


D := (24/10^{15})·d,
then


|det(A′) − det(A)| ≤ D.
Thus, if the absolute value of det(A′) is greater than D, we have the correct sign.
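For instance, if all entries a_ij are bounded by 2^{20} in absolute value, then d ≤ 2^{80} and D ≤ (24/10^{15})·2^{80} ≈ 2.9×10^{10}; whenever the computed determinant exceeds this threshold in absolute value, its sign is certified without any exact computation.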

[Display table of values from Sugihara here]

[COMPUTE the PHI factor]

Exercises

Exercise 1.1:

Determine the value of b for any realistic algorithm of your choice. Does this b depend on the size of the input? ¨

  (End of Exercise)

4   Dynamic Filters

To improve the quality of static filters, we can use runtime information about the actual values of the variables, and dynamically compute the error bounds. We can again use the rules for MaxErr(E) and MaxLen(E) found in Table 1 for the static error. The only difference lies in the base case: for each variable x, MaxErr(x) and MaxLen(x) can be computed directly from the value of x.

Dynamic Version of the FvW Filter.   It is easy enough to convert the FvW filter into a dynamic one: looking at Table 1, we see that the only modification is that in the base case, we directly compute MaxLen(x) and MaxErr(x) for a variable x. Let us estimate the cost of this dynamic filter. We already noted that MaxLen(E) can be computed directly using the formulas in Table 1, since they involve only integer arithmetic. This takes 1 or 2 operations. But for MaxErr(E), we need to appeal to Lemma 1. It is easy to see that since the values are all non-negative, we at most double the operation counts in the formulas of Table 1. The worst case is the formula for E = FG:


MaxErr(E) = MaxErr(F)MaxLen(G) + MaxErr(G)MaxLen(F) + 2^{MaxLen(FG)-53}.
The value 2^{MaxLen(FG)-53} can be computed exactly in 2 flops. There remain 4 other exact operations, which require 2×4 = 8 flops. Hence the bound here is 10 flops. Adding the single operation needed to compute MaxLen(F)+MaxLen(G), we obtain 11 flops. A similar analysis for E = F±G yields 8 flops.

After computing these filter parameters, we need to check whether the filter predicate is satisfied:


|Ẽ| > MaxErr(E).
Assuming Ẽ is available (this may incur storage costs not counted above), this check requires up to 2 operations: one to compute the absolute value of Ẽ and one for the comparison. Alternatively, if Ẽ is not available, we can replace |Ẽ| by 2^{MaxLen(E)}. In general, we also need to check for floating-point exceptions at the end of computing the filter parameters (the filter is assumed to have failed when an exception occurred). We may be able to avoid the exception handling, e.g., static analysis may tell us that no exceptions can occur.

Fortune and Van Wyk gave somewhat similar numbers, which we quote: let f_e count the extra runtime operations, including comparisons; let f_r count the runtime operations for storing intermediate results. In the LN implementation, 12 ≤ f_e ≤ 14 and 24 ≤ f_r ≤ 33. This overhead of 36 ≤ f_e+f_r ≤ 47 is incurred even when the filter successfully certifies the approximate result Ẽ; otherwise, we must also add the cost of exact computation, etc. Hence f_e+f_r is a lower bound on the s-factor when using filtered arithmetic in LN. Roughly the same bound of s = 48 was measured for LEDA's system.

Other ideas can be brought into play: we need not immediately invoke the dynamic filter. We can still maintain a static filter, and perform the more expensive runtime filter only when the static filter fails. And when the dynamic filter fails, we may still resort to other, less expensive computation instead of jumping immediately to some failsafe but expensive Big Integer/Rational computation. The layering of these stages of computation is called cascaded filtering by Burnikel, Funke and Schirra (BFS) [6]. This technique seems to pay off especially in degenerate situations. We next describe the BFS filter.

The BFS Filter.   This is a dynamic filter, but it can also be described as ``semi-static'' (or ``semi-dynamic'') because one of its two computed parameters is statically determined. Let E be a radical expression, i.e., one involving +, -, ×, ÷, √. Again, let Ẽ be the machine IEEE double value computed from E in the straightforward manner (this time, with division and square roots). In contrast to the FvW Filter, the filter parameters are now


MaxAbs(E),     Ind(E).
The first is easy to understand: MaxAbs(E) is simply an upper bound on |E|. The second, called the index of E, is a natural number whose interpretation is harder to characterize. Together, they satisfy the following invariant:


|E - Ẽ| ≤ MaxAbs(E)·Ind(E)·2^{-53}.     (10)
Of course, the value 2^{-53} may be replaced by the unit roundoff u for other floating-point systems.

Table 2 gives the recursive rules for maintaining MaxAbs(E) and Ind(E). The base case (E is a variable) is covered by the first two rows: notice that they distinguish between exact and rounded input variables. A variable x is exact if its value is representable without error as an IEEE double. In any case, x is assumed not to lie in the overflow range, so that the following holds:


|round(x) − x| ≤ |x|·2^{-53}.
Also note that the bounds are computed using IEEE machine arithmetic, denoted


⊕,    ⊖,    ⊙,    ⊘,    √̃(·).
The question arises: what happens when the operations lead to overflow or underflow in computing the bound parameters? It can be shown that underflows in ⊕, ⊖ and √̃(·) can be ignored, and in the case of ⊙ and ⊘, we just have to add a small constant MinDbl = 2^{-1022} to MaxAbs(E).

Expression E | MaxAbs(E) | Ind(E)
Exact variable x | x | 0
Approx. variable x | round(x) | 1
E = F±G | MaxAbs(F) ⊕ MaxAbs(G) | 1 + max{Ind(F), Ind(G)}
E = FG | MaxAbs(F) ⊙ MaxAbs(G) | 1 + Ind(F) + Ind(G)
E = F/G | (|Ẽ| ⊕ (MaxAbs(F) ⊘ MaxAbs(G))) / ((|G̃| ⊘ MaxAbs(G)) ⊖ (Ind(G)+1)·2^{-53}) | 1 + max{Ind(F), Ind(G)+1}
E = √F | (MaxAbs(F) ⊘ F̃) ⊙ Ẽ if F̃ > 0;   √̃(MaxAbs(F)) ⊙ 2^{26} if F̃ = 0 | 1 + Ind(F)

Table 2: Parameters of the BFS Filter

Under the invariant (10), we can use the following criterion for certifying the sign of Ẽ:


|Ẽ| > MaxAbs(E)·Ind(E)·2^{-53}.     (11)
Of course, this criterion should be implemented using machine arithmetic (see (3) and the notes there). One can even certify the exactness of Ẽ under certain conditions. If E is a polynomial expression (i.e., involving +, -, × only), then E = Ẽ provided


1 > MaxAbs(E)·Ind(E)·2^{-52}.     (12)
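To illustrate the rules of Table 2, here is a small C++ sketch (ours, not the BFS or LEDA implementation) maintaining the triple (Ẽ, MaxAbs(E), Ind(E)) for the operations +, -, × and certifying the sign via (11); division and square root are omitted for brevity.

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    const double U = std::ldexp(1.0, -53);   // 2^{-53}

    // An expression value together with the BFS filter parameters of Table 2.
    struct BFS {
        double val;     // Etilde: straightforward IEEE double evaluation
        double maxAbs;  // MaxAbs(E), computed in machine arithmetic as in Table 2
        int    ind;     // Ind(E)
    };

    BFS exactVar(double x) { return { x, std::fabs(x), 0 }; }    // exact input variable
    BFS add(const BFS& f, const BFS& g) {
        return { f.val + g.val, f.maxAbs + g.maxAbs, 1 + std::max(f.ind, g.ind) };
    }
    BFS sub(const BFS& f, const BFS& g) {
        return { f.val - g.val, f.maxAbs + g.maxAbs, 1 + std::max(f.ind, g.ind) };
    }
    BFS mul(const BFS& f, const BFS& g) {
        return { f.val * g.val, f.maxAbs * g.maxAbs, 1 + f.ind + g.ind };
    }

    // Criterion (11): returns +1/-1 if the sign of E is certified, 2 if the filter fails.
    int certifiedSign(const BFS& e) {
        double err = e.maxAbs * e.ind * U;
        if (e.val >  err) return +1;
        if (e.val < -err) return -1;
        return 2;   // fall back to a more powerful filter or to exact computation
    }

    int main() {
        // The LeftTurn expression (1) on double coordinates, filtered a la BFS.
        double px = 0, py = 0, qx = 1e7, qy = 1, rx = 2e7, ry = 3;
        BFS E = sub(mul(sub(exactVar(qy), exactVar(ry)), sub(exactVar(px), exactVar(rx))),
                    mul(sub(exactVar(py), exactVar(ry)), sub(exactVar(qx), exactVar(rx))));
        std::printf("sign = %d\n", certifiedSign(E));
    }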

Correctness of the BFS Filter.   Let us verify the rules just for the case of addition/subtraction.

Finally, we look at some experimental results. Table 3 shows the s-factor (recall that this is the slowdown factor compared to IEEE machine arithmetic) for the unfiltered and filtered cases. In both cases, the underlying Big Integer package is LEDA's. The last column adds compilation to the filtered case. It is based on an expression compiler, EXPCOMP, somewhat in the spirit of LN. At L = 32, the f-factor (recall this is the speedup due to filtering) is 65.7/2.9 = 22.7. When compilation is used, it improves to f = 65.7/1.9 = 34.6. [Note: the reader might be tempted to deduce from these numbers that the BFS filter is more efficacious than the FvW Filter. But the use of different Big Integer packages, platforms, compilers, etc., does not justify this conclusion.]

BitLength L | Unfiltered s | BFS Filter s | BFS Compiled s
8 | 42.8 | 2.9 | 1.9
16 | 46.2 | 2.9 | 1.9
32 | 65.7 | 2.9 | 1.9
40 | 123.3 | 2.9 | 1.8
48 | 125.1 | 2.9 | 1.8

Table 3: Random 3×3 Determinants

While the above results look good, it is possible to create situations where filters are ineffective. Instead of using matrices with randomly generated integer entries, we can use degenerate determinants as input. The results recorded in Table 4 indicate that the filters then have very little effect. Indeed, we might have expected them to slow down the computation, since the filtering effort is strictly extra overhead. In contrast, for random data, the filter is almost always effective in avoiding exact computation.

BitLength L | Unfiltered s | BFS Filter s | BFS Compiled s
8 | 37.9 | 2.4 | 1.4
16 | 45.3 | 2.4 | 1.4
32 | 56.3 | 56.5 | 58.4
40 | 117.4 | 119.4 | 117.5
48 | 135.2 | 136.5 | 135.1

Table 4: Degenerate 3×3 Determinants

The original paper [6] describes more experimental results, including the performance of the BFS filter in the context of algorithms for computing Voronoi diagrams and triangulations of simple polygons.

Exercises

Exercise 1.1:

Implement and perform a direct comparison of the FvW Filter and the BFS Filter to determine their efficacy (i.e., their f-factors) for sign of small determinants. ¨

  (End of Exercise)

5   Root Bounds

We now derive some root bounds on algebraic expressions.

6   Sign Determination in Modular Arithmetic

Modular arithmetic (i.e., residue number systems) is an attractive way to implement exact arithmetic because it allows each digit to be computed independently of the others. That is, it avoids the problem of carries, and thus is amenable to parallel execution in machine precision (assuming each digit fits into a machine word). Unfortunately, determining the sign of a number in modular representation seems to be non-trivial. We describe a solution provided by Brönnimann et al. [4].
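To illustrate the carry-free nature of a residue number system (a sketch of ours; it does not address the sign-determination problem solved in [4]): each ``digit'' is the residue modulo one of several pairwise coprime moduli, arithmetic proceeds independently digit by digit, and the integer can be recovered by Chinese remaindering as long as it remains below the product of the moduli.

    #include <cstdint>
    #include <cstdio>

    // Three pairwise coprime moduli; together they represent integers in [0, 99*101*103).
    const std::int64_t M[3] = { 99, 101, 103 };

    struct RNS { std::int64_t r[3]; };                     // one "digit" per modulus

    RNS toRNS(std::int64_t x) {
        RNS a; for (int i = 0; i < 3; ++i) a.r[i] = x % M[i]; return a;
    }
    // Digit-wise operations: no carries propagate between the three positions,
    // so the digits could be computed independently (e.g., in parallel).
    RNS add(const RNS& a, const RNS& b) {
        RNS c; for (int i = 0; i < 3; ++i) c.r[i] = (a.r[i] + b.r[i]) % M[i]; return c;
    }
    RNS mul(const RNS& a, const RNS& b) {
        RNS c; for (int i = 0; i < 3; ++i) c.r[i] = (a.r[i] * b.r[i]) % M[i]; return c;
    }
    // Recover the integer by the Chinese Remainder Theorem (brute force is fine
    // at this toy size); valid only while the true value stays below 99*101*103.
    std::int64_t fromRNS(const RNS& a) {
        for (std::int64_t x = 0; x < M[0] * M[1] * M[2]; ++x)
            if (x % M[0] == a.r[0] && x % M[1] == a.r[1] && x % M[2] == a.r[2]) return x;
        return -1;
    }

    int main() {
        RNS a = toRNS(1234), b = toRNS(567);
        std::printf("%lld\n", (long long)fromRNS(add(mul(a, b), a)));  // 1234*567 + 1234 = 700912
    }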

7   Partial Compilation of Expressions

Sometimes, some of the arguments of an expression are fixed while others vary. We can exploit this by constructing ``partially compiled expressions''. To see an example of this phenomenon, we look at the Beneath-Beyond Algorithm for convex hulls.

See [Page 147ff in Edelsbrunner].

Let S be a set of points in d dimensions. For simplicity, assume the points are in general position. We sort the points by their first coordinate, and add them in this order. The method (called Beneath-Beyond) begins with the simplex formed by the first d+1 points. At each step, we add a new point p to the current hull H:
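The following C++ sketch (ours) shows what partial compilation could look like for the visibility primitive of Section 3: once the facet points A, B, C, D are fixed, the 5×5 determinant is a linear function of the coordinates of the new point E, so its five coefficients (the cofactors along E's row) can be precomputed once per facet; each subsequent visibility test then costs only four multiplications and four additions.

    #include <cstdio>

    // 3x3 and 4x4 determinants by cofactor expansion along the first row.
    double det3(double a, double b, double c, double d, double e, double f,
                double g, double h, double i) {
        return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g);
    }
    double det4(const double m[4][4]) {
        double d = 0, sign = 1;
        for (int j = 0; j < 4; ++j, sign = -sign) {
            double s[9]; int k = 0;
            for (int r = 1; r < 4; ++r)
                for (int c = 0; c < 4; ++c)
                    if (c != j) s[k++] = m[r][c];
            d += sign * m[0][j] * det3(s[0], s[1], s[2], s[3], s[4], s[5], s[6], s[7], s[8]);
        }
        return d;
    }

    // "Partially compiled" visibility primitive: the 5x5 determinant with rows
    // (1,A), (1,B), (1,C), (1,D), (1,E) is linear in E = (x,y,z,u):
    //     det = c[0] + c[1]*x + c[2]*y + c[3]*z + c[4]*u,
    // where c[j] is the signed 4x4 minor obtained by deleting E's row and column j.
    struct Facet { double c[5]; };

    Facet compileFacet(const double A[4], const double B[4], const double C[4], const double D[4]) {
        const double* P[4] = { A, B, C, D };
        Facet f;
        double sign = 1;   // cofactor signs along the last (5th) row: +, -, +, -, +
        for (int j = 0; j < 5; ++j, sign = -sign) {
            double m[4][4];
            for (int r = 0; r < 4; ++r) {
                int k = 0;
                for (int c = 0; c < 5; ++c)
                    if (c != j) m[r][k++] = (c == 0 ? 1.0 : P[r][c-1]);
            }
            f.c[j] = sign * det4(m);
        }
        return f;
    }

    // Per-point test: evaluate the precomputed linear form at E (equals the 5x5 determinant).
    double visibility(const Facet& f, const double E[4]) {
        return f.c[0] + f.c[1]*E[0] + f.c[2]*E[1] + f.c[3]*E[2] + f.c[4]*E[3];
    }

    int main() {
        double A[4] = {0,0,0,0}, B[4] = {1,0,0,0}, C[4] = {0,1,0,0}, D[4] = {0,0,1,0};
        Facet f = compileFacet(A, B, C, D);
        double E1[4] = {0,0,0,1}, E2[4] = {0,0,0,-1};
        std::printf("%g %g\n", visibility(f, E1), visibility(f, E2));  // opposite signs
    }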

8   Other Topics

1) Precision Driven Computation: e.g., CORE Library.

2) A New Computational Paradigm: robustness as a computational resource; the speed/robustness tradeoff curve; how to admit non-robust but useful code (checking as an antidote to program verification).

3) Root Bounds for Radical Expressions.

Exercises

Exercise 1.1:

Program the partial evaluation idea for the convex hull algorithm. ¨

  (End of Exercise)

References

[1]
F. Avnaim, J.-D. Boissonnat, O. Devillers, F. Preparata, and M. Yvinec. Evaluating signs of determinants using single-precision arithmetic. Algorithmica, 17(2):111-132, 1997.

[2]
M. Blum and S. Kannan. Designing programs that check their work. J. of the ACM, 42(1):269-291, Jan. 1995.

[3]
M. Blum, M. Luby, and R. Rubinfeld. Self-testing and self-correcting programs, with applications to numerical programs. J. of Computer and System Sciences, 47:549-595, 1993.

[4]
H. Brönnimann, I. Z. Emiris, V. Y. Pan, and S. Pion. Sign determination in residue number systems. Theor. Computer Science, 210:173-197, 1999.

[5]
H. Brönnimann and M. Yvinec. Efficient exact evaluation of signs of determinants. Algorithmica, 27:21-56, 2000.

[6]
C. Burnikel, S. Funke, and M. Seel. Exact arithmetic using cascaded computation. ACM Symp. on Computational Geometry, 14:175-183, 1998.

[7]
C. Burnikel and S. Funke. Exact geometric computation using cascading. To appear in a special issue of IJCGA, 2000.

[8]
J. D. Chang and V. Milenkovic. An experiment using LN for exact geometric computations. Proceed. 5th Canadian Conference on Computational Geometry, pages 67-72, 1993. University of Waterloo.

[9]
K. L. Clarkson. Safe and effective determinant evaluation. IEEE Foundations of Computer Science, 33:387-395, 1992.

[10]
S. J. Fortune and C. J. van Wyk. Efficient exact arithmetic for computational geometry. In Proc. 9th ACM Symp. on Computational Geom., pages 163-172, 1993.

[11]
S. J. Fortune and C. J. van Wyk. LN User Manual, 1993. AT&T Bell Laboratories.

[12]
S. J. Fortune and C. J. van Wyk. Static analysis yields efficient exact integer arithmetic for computational geometry. ACM Transactions on Graphics, 15(3):223-248, 1996.

[13]
S. Funke. Exact arithmetic using cascaded computation. Master's thesis, Max Planck Institute for Computer Science, Saarbrücken, Germany, 1997.

[14]
LiDIA Homepage, 1998. LiDIA: an efficient multiprecision number package for computational number theory. URL http://www.informatik.th-darmstadt.de/TI/LiDIA/.

[15]
S. Ocken, J. T. Schwartz, and M. Sharir. Precise implementation of CAD primitives using rational parameterization of standard surfaces. In J. Hopcroft, J. Schwartz, and M. Sharir, editors, Planning, Geometry, and Complexity of Robot Motion, pages 245-266. Ablex Pub. Corp., Norwood, NJ, 1987.

[16]
V. Y. Pan and Y. Yu. Certified computation of the sign of a matrix determinant. Proc. 10th ACM-SIAM Symp. on Discrete Algorithms (SODA99), pages 715-724, 1999. Accepted for Algorithmica.

[17]
B. Serpette, J. Vuillemin, and J. Hervé. BigNum: a portable and efficient package for arbitrary-precision arithmetic. Research Report 2, Digital Paris Research Laboratory, May, 1989.

[18]
M. Shand, P. Bertin, and J. Vuillemin. Hardware speedups in long integer multiplication. 2nd ACM Symposium on Parallel Algorithms and Architectures, 1990. Crete.

[19]
K. Sugihara. Exact computation of 4-D convex hulls with perturbation and acceleration. Proc. 7th Pacific Conference on Computer Graphics and Applications (Pacific Graphics '99), 1999. October 5-7, 1999, Seoul, Korea.

[20]
C. K. Yap and T. Dubé. The exact computation paradigm. In D.-Z. Du and F. K. Hwang, editors, Computing in Euclidean Geometry, pages 452-486. World Scientific Press, Singapore, 1995. 2nd edition.

[21]
J. Yu. Exact arithmetic solid modeling. Ph.D. dissertation, Department of Computer Science, Purdue University, West Lafayette, IN 47907, June 1992. Technical Report No. CSD-TR-92-037.


Footnotes:

1 Alternatively, we could require these numbers as the minimum standard for any experimental paper on filters.

2 While holding on tightly to our theoretician's hat!

