Tarski's Geometry provides a universal framework for describing most common geometric tasks. We also have the basic computational tools for Tarski's Geometry: resultants, Sturm sequences, algebraic number manipulation, cell decomposition. We might conclude from this that the geometric robustness problem within this framework is solved ``in principle''. These facts were known in the early 1980s and began to be exploited (e.g., [15] addresses topologically correct decisions in CAD primitives). But serious practical obstacles remain, mostly to do with efficiency. Consider the simplest situation: geometric (CAD) modeling involving only linear objects. Yu [21] addresses this case. Here, only rational arithmetic is needed, and yet Yu concluded that exact computation in this setting would not be practical in the foreseeable future. His reason was that the numbers in such computations become too large. But the approach of reducing all computation to exact rational arithmetic is only a first cut - at least it tells us where the difficulties might lie. We already know from the previous lecture that approximate arithmetic can be useful in exact geometric computation. In this lecture, we examine several computational techniques that speed up our computation, especially by exploiting approximate arithmetic.
Efficient Big Number packages are clearly very important for EGC. One effort at creating an efficient Big Number core is [17,18]; the experience there suggests that the use of low-level assembly language coding can speed up a big integer package by a factor of 3. Beyond Big Number packages, there are many other important ideas that remain to be more fully explored on the arithmetic front: one is the idea of redundant arithmetic [10]. Another is the notion of incremental arithmetic [20].
For now, we focus on another key observation: arithmetic in a geometric algorithm is invariably performed by the repeated evaluation of a small number of primitives. For instance, in computing the convex hull of a planar point set, we can basically reduce all arithmetic to a single primitive,
$$ \mathrm{Orientation}(a,b,c) \;=\; \mathrm{sign}\,\det\begin{pmatrix} 1 & a_x & a_y \\ 1 & b_x & b_y \\ 1 & c_x & c_y \end{pmatrix}. \qquad (1) $$
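For concreteness, here is a minimal sketch (our code, not from any of the cited systems) of this primitive evaluated naively in IEEE doubles; this is exactly the kind of evaluation whose sign a filter must later certify.

#include <cstdio>

// Naive IEEE double evaluation of the orientation primitive (1): positive if (a, b, c)
// make a left turn, negative for a right turn, zero if the points appear collinear.
// Because of rounding error, the returned sign can be wrong for nearly degenerate inputs.
int orient2d(double ax, double ay, double bx, double by, double cx, double cy) {
    double det = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    return (det > 0) - (det < 0);
}

int main() {
    // A nearly collinear triple: the computed sign is not trustworthy without a filter.
    std::printf("%d\n", orient2d(0.0, 0.0, 1.0, 0.0, 0.5, 1e-17));
    return 0;
}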
One key technique here is the ``compilation'' of expressions. To see what can be optimized along this line, we describe a system called LN (``Little Numbers'') from Fortune and Van Wyk [11,10,12]. It provides a simple language to define (1) numerical data types and (2) expressions. The numerical data types are basically fixed-size multiprecision numbers (only integers are available in the initial implementation). A preprocessor (also called LN) compiles such an LN program into C++ code. This code is extremely efficient since the sizes of all intermediate quantities are known at compile time, so the arithmetic can be emitted as straight-line inline code, with no loops, function calls or dynamic memory allocation.
Here is an example of an LN program:
[[
tuple Point(x,y)
macro orient(a:Point, b:Point, c:Point) =
sign( ( ....);
class integer = int<31>;
Point;
proc orient;
]]
...
// C++ code
...
The portion inside [[...]] is compiled into C++ classes and methods, while code outside [[...]] is copied verbatim. The keywords tuple and macro define (respectively) LN data types and LN expressions. The keywords class and proc produce (respectively) C++ class (or typedef) definitions and C++ methods. If the above program is stored in a file geom.e, LN will output two files, geom.h and geom.cc. These files can then be included or compiled in the usual way.
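To get a feel for what such generated code buys, here is a hypothetical hand-written analogue (ours, not LN's actual output) of what a preprocessor can emit for orient when the coordinates are declared as int<31>: every intermediate range is known in advance, so the sign can be computed exactly with straight-line fixed-width integer code and no loops or allocation. (__int128 is a GCC/Clang extension.)

#include <cstdint>

// Straight-line exact evaluation of orient(a, b, c) for 31-bit integer coordinates.
// Differences fit comfortably in 64-bit words; each product needs at most about 66 bits,
// so the final subtraction is carried out in 128-bit integer arithmetic.
int orient_exact(std::int32_t ax, std::int32_t ay, std::int32_t bx, std::int32_t by,
                 std::int32_t cx, std::int32_t cy) {
    std::int64_t dx1 = std::int64_t(bx) - ax;
    std::int64_t dy1 = std::int64_t(by) - ay;
    std::int64_t dx2 = std::int64_t(cx) - ax;
    std::int64_t dy2 = std::int64_t(cy) - ay;
    __int128 det = __int128(dx1) * dy2 - __int128(dx2) * dy1;
    return (det > 0) - (det < 0);
}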
A side effect of LN's aggressive optimization for time efficiency is code bloat - the generated C++ code can be very large. Milenkovic and Chang [8] reported that the compiler ran out of space while compiling LN code for Boolean operations on planar polygons. This suggests that a less aggressive version of LN may be useful.
In the previous lecture, we exploited approximate computation in EGC through the use of BigFloat arithmetic. But another obvious avenue for exploiting approximation is machine arithmetic: it is fast and highly optimized. More generally, we examine the use of programs that are ``always fast but sometimes incorrect''. Let us euphemistically call such programs useful. This seems antithetical to the whole idea of EGC, but it actually holds the key to making EGC practical. Of course, we must deploy useful programs in our algorithms in a principled way.
The intuitive idea is easy: we must ``verify'', ``check'' or ``certify'' the output of useful programs. We will make a formal distinction among these three concepts. They are ordered in a hierarchy of decreasing difficulty, with verification having the most stringent requirement. The verification procedure takes as input a program P and some input-output specification S. The goal is to determine whether or not P has the behavior S. A checking procedure (say, specialized to a behavior S) takes an input-output pair (I, O) and determines whether or not (I, O) ∈ S. This checker does not care how the pair (I, O) is produced (in practice it probably comes from running some program P). Finally, a certifying procedure (again, specialized to S) takes the same input pair (I, O) and again attempts to decide if (I, O) ∈ S. If it says ``yes'' then surely (I, O) ∈ S. If it says ``no'', it is allowed to be wrong (we call these false rejections). Of course, a trivial certifier is one that says ``no'' to all its inputs; useful certifiers should certainly do more than this. The main tradeoff in designing certifiers is between efficacy (the percentage of false rejections) and the efficiency of the procedure.
The area of program checking [2,3] lends some perspective on how to do this. Program checking is a great relief from the verification legacy of the 1980s. Then, under the weight of increasingly complex programs, the battle cry was to insist on ``verification'' of all programs. This task turned out to be too difficult. For one thing, practically all code based on machine arithmetic ought to fail the program verification process! Program checking, from the 1990s, takes a more realistic approach: it does not verify programs, it just checks specific input/output pairs.
But our main emphasis in EGC is on certifiers, which turn out to be very useful in practice. The term filter is used here as alternative terminology for certifier. A filter does not have to recognize that any particular input/output pair is correct - to be useful, it only needs to recognize this for a substantial fraction of such pairs. Thus, filters are even easier to construct than checkers. Ultimately, filters are used for efficiency purposes, not for correctness purposes.
In EGC, the filters are usually numerical filters. These certify some property of a computed numerical value, typically its sign. This often amounts to computing some error bound and comparing the computed value with this bound. When such filters aim to certify machine floating-point arithmetic, we call them floating-point filters.
There are two main classifications of numerical filters: static and dynamic. Static filters are those whose bounds can be computed at compile time, and they incur a low overhead at runtime. However, static bounds may be overly pessimistic and thus less effective. Dynamic filters exhibit the opposite characteristics: they have higher runtime cost but are much more effective. We can also have semi-static filters, which combine features of both.
Filters can be used at different levels of granularity: from individual machine operations (e.g., arithmetic operations), to subroutines (e.g., geometric primitives), to algorithms (e.g., convex hull or meshing algorithms), to entire systems.
Computing Upper Bounds in Machine Arithmetic. In the implementation of numerical filters, we need to compute sharp upper bounds on numerical expressions. To be specific, suppose you have IEEE double values x and y and you want to compute an upper bound on |z| where z = xy. How can you do this? We can compute the machine product
$$ \tilde{z} \;=\; x \otimes y \;=\; z\,(1+\delta), \qquad |\delta| \le u = 2^{-53}, \qquad (2) $$
where ⊗ denotes IEEE double multiplication (rounded to nearest) and u is the unit roundoff, and then the candidate bound
$$ \tilde{w} \;=\; |\tilde{z}| \otimes (1 + 4u). \qquad (3) $$
Note that 1 + 4u = 1 + 2^{-51} is exactly representable. Therefore, we know that w̃ = |z̃|(1+4u)(1+δ′) for some δ′ satisfying |δ′| ≤ u. Hence,
$$ \tilde{w} \;\ge\; |\tilde{z}|\,(1+4u)(1-u) \;\ge\; |z|\,(1-u)(1+4u)(1-u) \;\ge\; |z|. $$
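A minimal sketch of this computation in C++ (our code; it assumes round-to-nearest and that no overflow or underflow occurs):

#include <cmath>
#include <cstdio>

// Upper bound on |x*y| computed entirely in IEEE double arithmetic, following the
// derivation above: multiply the rounded product's magnitude by 1 + 4u, where
// u = 2^-53 and 1 + 4u = 1 + 2^-51 is exactly representable.
double upper_bound_abs_product(double x, double y) {
    const double one_plus_4u = 1.0 + 0x1p-51;
    double zt = x * y;                     // zt = xy(1 + d), |d| <= u
    return std::fabs(zt) * one_plus_4u;    // wt >= |xy|
}

int main() {
    double x = 1.0 / 3.0, y = 3.0;
    std::printf("|x*y| <= %.17g\n", upper_bound_abs_product(x, y));
    return 0;
}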
We summarize this result:
Lemma 1
Let E be any rational numerical expression and let Ẽ be the IEEE double approximation of E. Assume the input numbers in E are IEEE doubles and E has k ≥ 1 operations.
(i)
We can compute an IEEE double MaxAbs(E) which satisfies |E| ≤ MaxAbs(E) in 3k machine operations.
(ii)
If all the input values are positive, 2k machine operations suffice.
(iii)
The value Ẽ is available as a side effect of computing MaxAbs(E), at the cost of storing the result.
Model for Checking and Filtering. We give a simple formal model for checking and filtering. Assume I and O are some sets called the input and output spaces. In numerical problems, it is often possible to identify I and O with ℝⁿ for some n. A computational problem is simply a subset π ⊆ I×O. A program is simply a partial function P: I → O. So P(x) may be undefined for some x, denoted P(x)↑; if P(x) is defined, we write P(x)↓. We say P is a partial algorithm for π if for all x ∈ I, P(x)↓ implies (x, P(x)) ∈ π. An algorithm A for π is a partial algorithm that happens to be a total function. A filter for π is a total program F: I×O → {0,1} such that F(x,y) = 1 implies (x,y) ∈ π. A checker C for π is a filter for π such that C(x,y) = 0 implies (x,y) ∉ π. Thus, a checker is a filter, but not necessarily vice-versa. Finally, a filtered program for π is a pair (P,F) such that P is a total program and F is a filter for π. We view (P,F) as a new program P_F such that, on input x, if F(x,P(x)) = 1 then P_F(x) = P(x); otherwise P_F(x)↑. Thus P_F is a partial algorithm for π. We want to ``anchor'' a filtered program (P,F) for π with an algorithm A for π. The triple (P,F,A) then constitutes a program P_{F,A} for π: on any input x, we define P_{F,A}(x) = P_F(x) if P_F(x)↓, and P_{F,A}(x) = A(x) otherwise.
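A minimal sketch of the (P, F, A) construction in C++ (the names FilteredProgram, P, F, A are just our rendering of the definitions above):

#include <functional>

// P is a fast but possibly incorrect ("useful") program, F is a filter certifying
// (input, output) pairs, and A is a slow but always-correct algorithm for the same problem.
template <typename In, typename Out>
struct FilteredProgram {
    std::function<Out(const In&)> P;               // useful program
    std::function<bool(const In&, const Out&)> F;  // filter: true means (x, y) is certified
    std::function<Out(const In&)> A;               // exact fallback algorithm

    // This is P_{F,A}: return P(x) when the filter certifies it, otherwise run A.
    Out operator()(const In& x) const {
        Out y = P(x);
        if (F(x, y)) return y;
        return A(x);
    }
};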
Note that this model ought to be augmented by complexity considerations to show its true intent. Logically, the algorithm (P,F,A) is no different from A alone. But if C_A(x) denotes the complexity of the algorithm A on input x (and similarly C_P and C_F for the program and the filter), we have
$$ C_{P_{F,A}}(x) \;=\; \begin{cases} C_P(x) + C_F(x, P(x)) & \text{if } F(x, P(x)) = 1, \\ C_P(x) + C_F(x, P(x)) + C_A(x) & \text{otherwise.} \end{cases} $$
We consider the filtered program (P,F) for π because P is presumably some useful program for π. We can also consider filter cascades: let F be a sequence
$$ F \;=\; (F_1, F_2, \ldots, F_k) \qquad (4) $$
of filters for π, ordered from cheapest to most expensive. On a pair (x, y), the filters are applied in turn; the pair is certified as soon as some F_i accepts it, and rejected only if every filter fails (a sketch follows).
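The cascade itself can be sketched the same way (again, the names are ours):

#include <functional>
#include <vector>

// A filter cascade (F_1, ..., F_k): the pair (x, y) is certified as soon as some F_i
// accepts it, and rejected only if every filter fails, in which case the caller falls
// back to exact computation.
template <typename In, typename Out>
struct FilterCascade {
    std::vector<std::function<bool(const In&, const Out&)>> filters;

    bool operator()(const In& x, const Out& y) const {
        for (const auto& F : filters)
            if (F(x, y)) return true;
        return false;
    }
};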
(a) Write a C/C++ program which performs a sequence of
arithmetic operations, and reports on the types
of IEEE exceptions turned on by each.
(b) Perform some timing to determine the cost of testing for these exceptions. ¨
(a) Write a C/C++ program which turns on the
various rounding modes, and prints the results of rounding.
(b) Perform some timing to determine the cost of switching rounding modes. ¨
(End of Exercise)
Fortune and Van Wyk [12] were the first to implement and quantify the efficacy of filters for exact geometric computation. Their filter was implemented via the LN preprocessor system. We now look at the simple filter they implemented (which we dub the ``FvW Filter'') and some of their experimental results.
The FvW Filter. Static error bounds are easily maintained for a polynomial expression E with integer values at the leaves. Let Ẽ denote the IEEE double value obtained by direct evaluation of E using IEEE double operations. Fortune and Van Wyk compute a bound MaxErr(E) on the absolute error,
$$ |E - \tilde{E}| \;\le\; \mathrm{MaxErr}(E). \qquad (5) $$
Let us now see how to compute MaxErr(E). It turns out that we also need the magnitude of E; the base-2 logarithm of the magnitude is bounded by MaxLen(E). Thus, we say that the FvW Filter has two filter parameters,
$$ \mathrm{MaxLen}(E) \quad\text{and}\quad \mathrm{MaxErr}(E), \qquad (6) $$
whose recursive rules are given in Table 1.
Expression E | MaxLen(E) | MaxErr(E)
Variable x   | MaxLen(x) given                | max{0, 2^{MaxLen(x)-53}}
E = F ± G    | 1 + max{MaxLen(F), MaxLen(G)}  | MaxErr(F) + MaxErr(G) + 2^{MaxLen(F±G)-53}
E = FG       | MaxLen(F) + MaxLen(G)          | MaxErr(F)·2^{MaxLen(G)} + MaxErr(G)·2^{MaxLen(F)} + 2^{MaxLen(FG)-53}
Table 1: Parameters for the FvW Filter
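Here is a minimal sketch of this static bookkeeping in C++, following the rules of Table 1 for L-bit integer inputs (the names Param, var, addsub and mul are ours). In LN this propagation happens in the preprocessor at compile time; here it is written as ordinary code for clarity.

#include <algorithm>
#include <cmath>

struct Param {
    int    maxLen;  // MaxLen(E): bound on log2 |E|
    double maxErr;  // MaxErr(E): bound on |E - Etilde|
};

// Base case from Table 1: an input variable known to be an L-bit integer
// (conservative for small integers, whose double representations are exact).
Param var(int L) {
    return { L, std::max(0.0, std::ldexp(1.0, L - 53)) };
}

// E = F +/- G.
Param addsub(const Param& f, const Param& g) {
    int len = 1 + std::max(f.maxLen, g.maxLen);
    return { len, f.maxErr + g.maxErr + std::ldexp(1.0, len - 53) };
}

// E = F * G.
Param mul(const Param& f, const Param& g) {
    int len = f.maxLen + g.maxLen;
    return { len, f.maxErr * std::ldexp(1.0, g.maxLen)
                + g.maxErr * std::ldexp(1.0, f.maxLen)
                + std::ldexp(1.0, len - 53) };
}

// Static parameters for the orientation determinant (1) with 31-bit integer coordinates:
// differences, then products, then the final subtraction.
Param orient2dParams() {
    Param x    = var(31);
    Param diff = addsub(x, x);
    Param prod = mul(diff, diff);
    return addsub(prod, prod);
}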
Framework for Measuring Filter Efficacy. We want to quantify the efficacy of the FvW Filter. Consider the primitive of determining the sign of a 4×4 integer determinant. First, look at the unfiltered performance of this primitive. We use the IEEE machine double evaluation of this determinant (with possibly incorrect sign) as the base line for speed; this is standard procedure. This base performance is then compared to the performance of some standard (off-the-shelf) Big Integer package, which serves as the top line for speed. The numbers cited in the paper are for the Big Integer package in LEDA (circa 1995), but the general conclusion for other packages is apparently not much different. For random 31-bit integers, the top line shows a 60-fold time increase over the base line. We will say that
the slowdown factor of a method is
$$ s \;=\; \frac{\text{time of the method}}{\text{time of the base line (IEEE double) evaluation}}, \qquad (7) $$
so that the unfiltered exact evaluation has s = 60 here. The efficacy of a filter is then measured by the factor
$$ f \;=\; \frac{s_{\text{unfiltered}}}{s_{\text{filtered}}}. \qquad (8) $$
In summary, the above experimental framework is clearly quite general, and aims to reduce the efficacy of a particular filter to a single number f. In general, the framework requires the following choices: (1) a ``test algorithm'' (we picked the sign of 4×4 determinants), (2) the ``base line'' (the standard is IEEE double arithmetic), (3) the ``top line'' (we picked LEDA's Big Integer), and (4) the input data (we consider random 31-bit integers). Its simplicity means that we can extract this f factor from almost any1 published experimental paper. Another measure of efficacy is the fraction r of correct approximate values Ẽ that fail to pass the filter. Unfortunately, this data is less readily available in the papers we cite.
For a true complexity model, we need to introduce size parameters. In EGC, two size parameters are of interest: the combinatorial size n and the bit size L. Hence all these parameters ought to be written as s(n,L), f(n,L), etc.
Realistic versus Synthetic Problems. Static filters have an efficacy factor of f = 3 (see (8)) when evaluating the sign of randomly generated 4×4 matrices (L = 31). The paper [7] calls such problems ``synthetic benchmarks''. So it would be interesting to see the performance of filters on ``realistic benchmarks'', meaning actual algorithms for natural problems that we want to solve. But even here, there are degrees of realism. Let us2 equate realistic benchmarks with algorithms for problems such as convex hulls, triangulations, etc. We look at one of these benchmarks, from Sugihara, below.
The point of realistic benchmarks is that they will generally involve a significant amount of non-numeric computation. Hence the f-factor in such settings may not look as good as in our synthetic ones. To quantify this, suppose that a fraction
$$ b, \qquad 0 \le b \le 1, \qquad (9) $$
of the total running time is spent on numerical computation. If the unfiltered and filtered versions of the arithmetic have slowdown factors s and s/f respectively, then the speedup observed for the whole algorithm is only
$$ \frac{(1-b) + b\,s}{(1-b) + b\,s/f}. $$
Note that our derivation assumes the original running time is one unit! This normalization is valid because all the factors s, f that we use are ratios and are not affected by the normalization.
The factor b is empirical, of course. But even so, how can we estimate it? For instance, for 2- and 3-dimensional Delaunay triangulations, Fortune and Van Wyk [12] noted that b ∈ [0.2, 0.5]. Burnikel, Funke and Schirra [7] suggest a very simple idea for obtaining b: simply execute the test program with each arithmetic operation repeated c > 1 times. This gives a new (normalized) timing for the test program,
$$ T_c \;=\; (1-b) + c\,b \;=\; 1 + (c-1)\,b, \qquad\text{hence}\qquad b \;=\; \frac{T_c - 1}{c - 1}. $$
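For instance (with purely hypothetical numbers), if each arithmetic operation is executed c = 2 times and the normalized running time grows to T_2 = 1.3, then b = (1.3 − 1)/(2 − 1) = 0.3.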
Static Filters for 4-Dimensional Convex Hulls. To see static filters in action in the context of an actual algorithm, we follow a study of Sugihara [19]. He uses the well-known beneath/beyond algorithm for computing the convex hull of a set of 4-dimensional points, augmented with symbolic perturbation techniques. By using static filters, Sugihara reported a 100-fold speedup for nondegenerate inputs and a 3-fold speedup for degenerate inputs (see table below).
Let us briefly review the method: let the input point set be S and suppose the points are sorted by their x-coordinates: P_1, …, P_n. Let S_i be the set of the first i points from this sorted list, and let CH(S_i) denote their convex hull. In general, the facets of the convex hull are 3-dimensional tetrahedra. The complexity of CH(S) can be Θ(n²). Since the beneath/beyond algorithm runs in O(n²) time, it is optimal in the worst case. We begin by constructing CH(S_5). In general, we show how to transform CH(S_{i-1}) into CH(S_i). This amounts to deleting the boundary tetrahedra that are visible from P_i and adding new tetrahedra with apex at P_i.
The main primitive concerns 5 points A, B, C, D, E in ℝ⁴: the sign of the 5×5 determinant
$$ \det\begin{pmatrix} 1 & A_1 & A_2 & A_3 & A_4 \\ 1 & B_1 & B_2 & B_3 & B_4 \\ 1 & C_1 & C_2 & C_3 & C_4 \\ 1 & D_1 & D_2 & D_3 & D_4 \\ 1 & E_1 & E_2 & E_3 & E_4 \end{pmatrix} $$
tells us on which side of the hyperplane through A, B, C, D the point E lies.
The determinant can be evaluated as a 4×4 determinant A = (a_{ij}), whose entries are coordinate differences and whose expansion has 4! = 24 terms. Sugihara statically bounds the magnitudes of the a_{ij} from the bit length of the input coordinates and from these derives a bound ε on the rounding error of the straightforward floating-point evaluation of det A. The computed sign is then accepted whenever
$$ |\,\widetilde{\det A}\,| \;>\; \epsilon. $$
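A minimal sketch of this reduction and of the naive floating-point evaluation whose sign the static filter certifies (our code, not Sugihara's):

#include <array>

using P4 = std::array<double, 4>;

// 3x3 determinant, expanded directly.
double det3(const double m[3][3]) {
    return m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
         - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
         + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]);
}

// 4x4 determinant by cofactor expansion along the first row (4! = 24 terms in total).
double det4(const double a[4][4]) {
    double d = 0, sgn = 1;
    for (int c = 0; c < 4; ++c, sgn = -sgn) {
        double m[3][3];
        for (int i = 1; i < 4; ++i)
            for (int j = 0, k = 0; j < 4; ++j)
                if (j != c) m[i-1][k++] = a[i][j];
        d += sgn * a[0][c] * det3(m);
    }
    return d;
}

// Reduce the 5x5 orientation determinant for A..E to the 4x4 determinant of the
// coordinate differences a[i][j] = (i-th of B,C,D,E)[j] - A[j], evaluated in doubles.
// A filter compares |result| against a precomputed error bound before trusting its sign.
double orient4d(const P4& A, const P4& B, const P4& C, const P4& D, const P4& E) {
    const P4* Q[4] = { &B, &C, &D, &E };
    double a[4][4];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            a[i][j] = (*Q[i])[j] - A[j];
    return det4(a);
}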
[Display table of values from Sugihara here]
Determine the value of b for any realistic algorithm of your choice. Does this b depend on the size of the input? ¨
(End of Exercise)
To improve the quality of the static filters, we can use runtime information about the actual values of the variables, and dynamically compute the error bounds. We can again use MaxErr(E) and MaxLen(E) as found in Table 1 for static error. The only difference lies in the base case: for each variable x, the MaxErr(x) and MaxLen(x) can be directly computed from the value of x.
Dynamic Version of the FvW Filter. It is easy enough to convert the FvW Filter into a dynamic one: looking at Table 1, we see that the only modification is that in the base case, we can directly compute MaxLen(x) and MaxErr(x) for a variable x from its value. Let us estimate the cost of this dynamic filter. We already noted that MaxLen(E) can be computed directly from the formulas in Table 1, since they involve only integer arithmetic; this takes 1 or 2 operations. But for MaxErr(E), we need to appeal to Lemma 1. It is easy to see that, since the values are all non-negative, we at most double the operation counts in the formulas of Table 1. The worst case is the formula for E = FG:
$$ \mathrm{MaxErr}(FG) \;=\; \mathrm{MaxErr}(F)\cdot 2^{\mathrm{MaxLen}(G)} + \mathrm{MaxErr}(G)\cdot 2^{\mathrm{MaxLen}(F)} + 2^{\mathrm{MaxLen}(FG)-53}. $$
After computing these filter parameters, we need to check whether the filter predicate is satisfied:
$$ |\tilde{E}| \;>\; \mathrm{MaxErr}(E). $$
Fortune and Van Wyk gave somewhat similar numbers, which we quote: let f_e count the extra runtime operations, including comparisons, and let f_r count the runtime operations for storing intermediate results. In the LN implementation, 12 ≤ f_e ≤ 14 and 24 ≤ f_r ≤ 33. This overhead of 36 ≤ f_e + f_r ≤ 47 is incurred even when the filter successfully certifies the approximate result Ẽ; otherwise, we may also have to add the cost of exact computation, etc. Hence f_e + f_r is a lower bound on the s-factor when using filtered arithmetic in LN. Roughly the same bound of s = 48 was measured for LEDA's system.
Other ideas can be brought into play: we need not immediately invoke the dynamic filter. We can still maintain a static filter and perform the more expensive runtime filtering only when the static filter fails. And when the dynamic filter fails, we may still resort to other, less expensive computations instead of jumping immediately to some failsafe but expensive Big Integer/Rational computation. The layering of these stages of computation is called cascaded filtering by Burnikel, Funke and Schirra (BFS) [6]. This technique seems to pay off especially in degenerate situations. We next describe the BFS filter.
The BFS Filter. This is a dynamic filter, but it can also be described as ``semi-static'' (or ``semi-dynamic'') because one of its two computed parameters is statically determined. Let E be a radical expression, i.e., one involving +, −, ×, ÷, √. Again, let Ẽ be the machine IEEE double value computed from E in the straightforward manner (this time, with divisions and square roots). In contrast to the FvW Filter, the filter parameters are now
$$ \mathrm{MaxAbs}(E) \quad\text{and}\quad \mathrm{Ind}(E), $$
which are maintained so that
$$ |E - \tilde{E}| \;\le\; \mathrm{Ind}(E)\cdot \mathrm{MaxAbs}(E)\cdot 2^{-53}. \qquad (10) $$
Table 2 gives the recursive rules for maintaining MaxAbs(E) and Ind(E). The base case (E is a variable) is covered by the first two rows; notice that they distinguish between exact and rounded input variables. A variable x is exact if its value is representable without error by an IEEE double. In any case, x is assumed not to lie in the overflow range, so that the following holds: for an exact variable, x̃ = x and the error is zero, while for a rounded variable,
$$ |\,\mathrm{round}(x) - x\,| \;\le\; |x|\cdot 2^{-53}, $$
so that (10) holds at the leaves with Ind(x) = 0 and Ind(x) = 1, respectively.
Expression E        | MaxAbs(E)                                                                       | Ind(E)
Exact variable x    | x                                                                               | 0
Approx. variable x  | round(x)                                                                        | 1
E = F ± G           | MaxAbs(F) ⊕ MaxAbs(G)                                                           | 1 + max{Ind(F), Ind(G)}
E = FG              | MaxAbs(F) ⊙ MaxAbs(G)                                                           | 1 + Ind(F) + Ind(G)
E = F/G             | [ |Ẽ| ⊕ (MaxAbs(F) ⊘ MaxAbs(G)) ] / [ (|G̃| ⊘ MaxAbs(G)) ⊖ (Ind(G)+1)·2^{-53} ]   | 1 + max{Ind(F), Ind(G)+1}
E = √F              | (case distinction on F̃; see [6])                                                | 1 + Ind(F)
Table 2: Parameters for the BFS Filter. Here ⊕, ⊖, ⊙, ⊘ denote the IEEE double (rounded) counterparts of +, −, ×, ÷.
Under the assumption (10), we can use the following criteria for certifying the sign of Ẽ:
$$ \tilde{E} \;>\; \mathrm{Ind}(E)\cdot\mathrm{MaxAbs}(E)\cdot 2^{-53} \;\Longrightarrow\; E > 0, \qquad (11) $$
$$ \tilde{E} \;<\; -\,\mathrm{Ind}(E)\cdot\mathrm{MaxAbs}(E)\cdot 2^{-53} \;\Longrightarrow\; E < 0. \qquad (12) $$
If neither criterion applies, the filter fails to certify the sign and we must fall back to a more expensive computation.
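The following C++ sketch implements the bookkeeping of Table 2 for +, − and × together with criteria (11) and (12). The names Bfs and certify_sign are ours; division, square roots, and the careful accounting of the rounding incurred by the parameter computations themselves are omitted (see [6] for the full treatment).

#include <algorithm>
#include <cmath>

struct Bfs {
    double val;     // Etilde: the machine-arithmetic approximation of E
    double maxAbs;  // MaxAbs(E), itself computed with machine operations
    int    ind;     // Ind(E), known statically
};

// Exact input variable: MaxAbs is taken as |x| so that it bounds the magnitude.
Bfs exact_var(double x) { return { x, std::fabs(x), 0 }; }

// The C++ operators on double are precisely the rounded operations of Table 2.
Bfs add(const Bfs& f, const Bfs& g) { return { f.val + g.val, f.maxAbs + g.maxAbs, 1 + std::max(f.ind, g.ind) }; }
Bfs sub(const Bfs& f, const Bfs& g) { return { f.val - g.val, f.maxAbs + g.maxAbs, 1 + std::max(f.ind, g.ind) }; }
Bfs mul(const Bfs& f, const Bfs& g) { return { f.val * g.val, f.maxAbs * g.maxAbs, 1 + f.ind + g.ind }; }

// Criteria (11)/(12): the sign is certified when |val| exceeds Ind(E)*MaxAbs(E)*2^-53.
// Returns +1 or -1 when certified, and 0 when the filter fails (fall back to exact arithmetic).
int certify_sign(const Bfs& e) {
    double err = e.ind * e.maxAbs * 0x1p-53;
    if (e.val >  err) return +1;
    if (e.val < -err) return -1;
    return 0;
}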
Correctness of the BFS Filter. Let us verify the rules of Table 2 just for the case of addition/subtraction.
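Here is a sketch of that verification (our rendering), using (10) for F and G, the fact that MaxAbs(F) and MaxAbs(G) bound |F̃| and |G̃|, and ignoring the second-order rounding incurred when the parameters themselves are combined by machine operations; the exact accounting is carried out in [6]. For E = F ± G we have Ẽ = F̃ ⊕ G̃, so
$$
\begin{aligned}
|\tilde{E} - E| \;&\le\; |\tilde F \oplus \tilde G - (\tilde F \pm \tilde G)| \;+\; |\tilde F - F| \;+\; |\tilde G - G| \\
&\le\; 2^{-53}\,|\tilde F \pm \tilde G| \;+\; \mathrm{Ind}(F)\,\mathrm{MaxAbs}(F)\,2^{-53} \;+\; \mathrm{Ind}(G)\,\mathrm{MaxAbs}(G)\,2^{-53} \\
&\le\; \bigl(1 + \max\{\mathrm{Ind}(F), \mathrm{Ind}(G)\}\bigr)\,\bigl(\mathrm{MaxAbs}(F) + \mathrm{MaxAbs}(G)\bigr)\,2^{-53} \\
&\approx\; \mathrm{Ind}(E)\,\mathrm{MaxAbs}(E)\,2^{-53},
\end{aligned}
$$
where the last step absorbs the rounding of MaxAbs(F) ⊕ MaxAbs(G) into the index bookkeeping.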
Finally, we look at some experimental results. Table 3 shows the s-factor (recall that this is the slowdown factor relative to IEEE machine arithmetic) for the unfiltered and the filtered cases. In both cases, the underlying Big Integer package is LEDA's. The last column adds compilation to the filtered case; it is based on an expression compiler, EXPCOMP, somewhat in the spirit of LN. At L = 32, the f-factor (recall that this is the speedup due to filtering) is 65.7/2.9 = 22.7. When compilation is used, it improves to f = 65.7/1.9 = 34.6. [Note: the reader might be tempted to deduce from these numbers that the BFS Filter is more efficacious than the FvW Filter. But the use of different Big Integer packages, platforms, compilers, etc., does not justify this conclusion.]
BitLength L | Unfiltered s | BFS Filter s | BFS Compiled s
8           | 42.8         | 2.9          | 1.9
16          | 46.2         | 2.9          | 1.9
32          | 65.7         | 2.9          | 1.9
40          | 123.3        | 2.9          | 1.8
48          | 125.1        | 2.9          | 1.8
Table 3: s-factors for random inputs
While the above results look good, it is possible to create situations in which filters are ineffective. Instead of using matrices with randomly generated integer entries, we can use degenerate determinants as input. The results recorded in Table 4 indicate that the filters now have very little effect. Indeed, we might have expected them to slow down the computation, since the filtering effort is strictly extra overhead. In contrast, for random data, the filter is almost always effective in avoiding exact computation.
BitLength L | Unfiltered s | BFS Filter s | BFS Compiled s
8           | 37.9         | 2.4          | 1.4
16          | 45.3         | 2.4          | 1.4
32          | 56.3         | 56.5         | 58.4
40          | 117.4        | 119.4        | 117.5
48          | 135.2        | 136.5        | 135.1
Table 4: s-factors for degenerate inputs
The original paper [6] describes more experimental results, including the performance of the BFS filter in the context of algorithms for computing Voronoi diagrams and triangulation of simple polygons.
Implement and perform a direct comparison of the FvW Filter and the BFS Filter to determine their efficacy (i.e., their f-factors) for the sign of small determinants. ¨
(End of Exercise)
We now derive some root bounds on algebraic expressions.
Modular arithmetic (i.e., residue number systems) is an attractive way of implementing arithmetic because it allows each digit to be computed independently of the others. That is, it avoids the problem of carries and is thus amenable to parallel execution in machine precision (assuming each digit fits into a machine word). Unfortunately, determining the sign of a number in modular representation seems to be non-trivial. We describe a solution provided by Brönnimann et al [4].
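A minimal sketch of the representation (our code; the moduli are illustrative primes below 2^31, and sign determination is deliberately left out):

#include <array>
#include <cstdint>

// Residue number system: a value is represented by its residues modulo pairwise coprime
// moduli. Addition and multiplication act on each residue independently (no carries),
// so the "digits" can be processed in parallel in machine precision. Recovering the sign
// of the represented value from its residues is the nontrivial step addressed in [4].
constexpr std::array<std::uint64_t, 3> MOD = { 1000000007u, 1000000009u, 998244353u };

using Rns = std::array<std::uint64_t, 3>;

Rns to_rns(std::uint64_t x) {
    Rns r{};
    for (std::size_t i = 0; i < MOD.size(); ++i) r[i] = x % MOD[i];
    return r;
}

Rns add(const Rns& a, const Rns& b) {
    Rns r{};
    for (std::size_t i = 0; i < MOD.size(); ++i) r[i] = (a[i] + b[i]) % MOD[i];
    return r;
}

Rns mul(const Rns& a, const Rns& b) {
    Rns r{};
    for (std::size_t i = 0; i < MOD.size(); ++i) r[i] = (a[i] * b[i]) % MOD[i];  // products stay below 2^60
    return r;
}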
Sometimes, some of the arguments of an expression are fixed while others vary. We can exploit this by constructing ``partially compiled expressions''. To see an example of this phenomenon, we look at the Beneath-Beyond Algorithm for convex hulls.
See [Page 147ff in Edelsbrunner].
Let S be a set of points in d dimensions. FOR SIMPLICITY, ASSUME GENERAL POSITION. We sort the points by their first coordinate and add them in this order. The method (called Beneath-Beyond) begins with the simplex on the first d+1 points. At each step, we add a new point p to the current hull H: we find the facets of H visible from p, delete them, and add new facets formed by p and the horizon ridges (a sketch of the partial-evaluation idea for the visibility test is given below).
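Here is a minimal sketch of the partial-evaluation idea for the visibility test in d = 4 (our code): the supporting hyperplane of a facet is fixed once the facet is created, while the query point varies, so we cache the hyperplane coefficients per facet and each test becomes a single dot product instead of a full 5×5 determinant.

#include <array>

constexpr int D = 4;
using Point = std::array<double, D>;

struct Facet {
    // Coefficients of the supporting hyperplane h(x) = h[0] + h[1] x_1 + ... + h[D] x_D,
    // computed once when the facet is created (the "partially compiled" part of the predicate).
    std::array<double, D + 1> h;
};

// Visibility of p from the facet reduces to the sign of the partially evaluated expression.
// In a filtered implementation, |s| would be compared against an error bound before its
// sign is trusted.
double side(const Facet& f, const Point& p) {
    double s = f.h[0];
    for (int i = 0; i < D; ++i) s += f.h[i + 1] * p[i];
    return s;
}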
1) Precision Driven Computation: e.g., CORE Library.
2) A New Computational Paradigm: - robustness as a computational resource - speed/robustness tradeoff curve - How to Admit Non-robustness but useful code: Checking as an antidote to program verification.
3) Root Bounds for Radical Expressions.
Program the partial evaluation idea for the convex hull algorithm. ¨
(End of Exercise)
1 Alternatively, we could require these numbers as the minimum standard for any experimental paper on filters.
2 While holding on tightly to our theoretician's hat!