Let A = (a_ij) ∈ ℂ^{m×n}. Assume B := A*A has eigenvalues μ_1, μ_2, …, μ_n.
Proof.
Let x = (x_1, …, x_n)^T ∈ ℂ^n be a non-zero vector.
(a) We have

    ||Ax||_∞ = max_i |∑_j a_ij x_j| ≤ (max_i ∑_j |a_ij|) ||x||_∞.

The inequality becomes an equality if we choose x_j = |a_{i0,j}| / a_{i0,j}, where i0 is the index that achieves the maximum in max_i ∑_j |a_ij|. This proves

    N_∞(A) = max_i ∑_j |a_ij|.
(b) Again, we have

    ||Ax||_1 = ∑_i |∑_j a_ij x_j| ≤ ∑_i ∑_j |a_ij| |x_j|        (1)
             = ∑_j |x_j| ∑_i |a_ij| ≤ (max_j ∑_i |a_ij|) ||x||_1.  (2)

Suppose j0 is the index that maximizes the expression max_j ∑_i |a_ij|, and we choose x = e_{j0} (the unit vector with 1 in the j0-th position). Then inequality (1) becomes an equality because each inner sum reduces to a single term, while inequality (2) becomes an equality because of our choice of j0. This proves

    N_1(A) = max_j ∑_i |a_ij|.
(c)
For any μ and x ≠ 0, if Bx = μx then μ x*x = x*Bx = (x*A*)(Ax) = ||Ax||_2^2. Hence

    μ = ||Ax||_2^2 / ||x||_2^2 ≥ 0.

This shows that the eigenvalues μ_i of B are real and non-negative.
It is well-known that for any Hermitian matrix B (i.e., B* = B), there exists a unitary matrix T (i.e., T*T = I) such that B = T*DT where D is the diagonal matrix Diag(μ_1, …, μ_n).
Without loss of generality, assume μ_1 is the largest eigenvalue.
For any x with ||x||_2 = 1, we have

    ||Ax||_2^2 = x*A*Ax = x*T*DTx = y*Dy        (where y := Tx)
               = ∑_i μ_i |y_i|^2
               ≤ μ_1 ∑_i |y_i|^2
               = μ_1        (since ||y||_2 = ||x||_2 = 1).

The final inequality is an equality if y = e_1 (the unit vector with 1 in the first component). This proves

    N_2(A) = √(max_i μ_i).
Q.E.D.
We read (a) and (b) as saying:
N_∞(A) is the maximum of the 1-norms of the rows of A;
N_1(A) is the maximum of the 1-norms of the columns of A.
To remember whether it is rows or columns, use this device: the symbol "∞" has a dominant horizontal direction, suggesting rows; likewise, the symbol "1" suggests columns.
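As a quick illustration of (a) and (b), the two norms can be computed directly from these descriptions; the matrix below is hypothetical sample data, not from the text:

```python
# N_inf(A): maximum over rows of the 1-norm of the row.
# N_1(A):   maximum over columns of the 1-norm of the column.

def n_inf(A):
    """Maximum row sum of absolute entries."""
    return max(sum(abs(x) for x in row) for row in A)

def n_1(A):
    """Maximum column sum of absolute entries."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

A = [[1, -2, 3],
     [-4, 5, -6]]

print(n_inf(A))  # row sums 6 and 15 -> 15
print(n_1(A))    # column sums 5, 7, 9 -> 9
```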
It is not easy to compute N_2(A). Hence, the following connection between N_2(A) and the Frobenius norm ||A||_F of A is useful.
Lemma 3

    √n · N_2(A) ≥ ||A||_F ≥ N_2(A).
Proof.
Note that

    ||A||_F^2 = ∑_i ∑_j |a_ij|^2 = Tr(A*A).

Writing B = A*A = T*DT as in the above proof, we see

    Tr(B) = Tr(T*DT) = Tr(D) = ∑_i μ_i.
Therefore
    n · max_i μ_i ≥ Tr(B) ≥ max_i μ_i.
The result follows by taking square-roots.
Q.E.D.
These bounds on N_2(A) can be computed in O(n^2) arithmetic operations.
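A numerical spot-check of Lemma 3 on a random matrix; NumPy availability is assumed here, and the matrix is arbitrary test data:

```python
import numpy as np

# Spot-check of Lemma 3: sqrt(n) * N2(A) >= ||A||_F >= N2(A).
# N2(A) is the largest singular value (expensive in general);
# the Frobenius norm needs only O(n^2) work, giving cheap two-sided bounds.

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))

n2 = np.linalg.norm(A, 2)        # N2(A), the spectral norm
fro = np.linalg.norm(A, 'fro')   # ||A||_F, computable in O(n^2)

assert fro >= n2 - 1e-12
assert np.sqrt(n) * n2 >= fro - 1e-12
print(n2, fro)
```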
2 Elements of Error Analysis
Let u be the unit roundoff for the floating point number system under discussion (see the chapter on floating point numbers). In the floating point system F(b, t), u = b^{1-t}/2. Recall that the standard error model says that the approximate value x̃ of any floating point arithmetic operation is related to the exact value x by the relation

    x̃ = x(1 + δ),    |δ| ≤ u.
We develop some basic tools for the error analysis within this model.
The following is a slight extension of Higham [5, p. 69]. In the following, we assume that n is small enough so that 2nu < 1. For any n ≥ 1 and real ε, define

    γ_n(ε) := nε / (1 - nε).

When ε = u, we simply write γ_n := γ_n(u). Note that

    γ_n = nu / (1 - nu).
Lemma 4
(i) γ_n(ε) is a strictly increasing function for ε < 1/n.
(ii) If 1 > 2nu then |γ_n(ε)| ≤ γ_n iff ε ∈ [-u/(1-2nu), u].
(iii) Assume 1 > 2nu. For any real θ, we have |θ| ≤ γ_n iff θ = γ_n(ε) for some ε ∈ [-u/(1-2nu), u].
Proof.
(i) If ε < ε′ < 1/n, then we verify that 1 + γ_n(ε) = 1/(1-nε) < 1/(1-nε′) = 1 + γ_n(ε′). Note that ε, ε′ can be negative.
(ii) Since 1 > 2nu, we have -u/(1-2nu) < 0 < u. We check that γ_n(-u/(1-2nu)) = -γ_n(u) = -γ_n. By part (i), it follows that for all ε ∈ [-u/(1-2nu), u], we have -γ_n ≤ γ_n(ε) ≤ γ_n.
(iii) This is a consequence of (ii).
Q.E.D.
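The claims of Lemma 4 are easy to probe numerically; a small sketch for IEEE double precision (u = 2^-53 is assumed for the illustration):

```python
# gamma_n(eps) = n*eps/(1 - n*eps); gamma_n := gamma_n(u).
# Check monotonicity (Lemma 4(i)) and the interval endpoints (Lemma 4(ii)).

u = 2.0 ** -53

def gamma(n, eps):
    return n * eps / (1.0 - n * eps)

n = 10
lo = -u / (1 - 2 * n * u)       # left endpoint of the interval in (ii)
gn = gamma(n, u)

# monotone on sample points below 1/n = 0.1
samples = [-1e-3, 0.0, 1e-3, 1e-2]
values = [gamma(n, e) for e in samples]
assert values == sorted(values)

# gamma_n(-u/(1-2nu)) = -gamma_n and gamma_n(u) = gamma_n
assert abs(gamma(n, lo) + gn) < 1e-18
assert gamma(n, u) == gn
print(gn)
```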
Lemma 5
Let |δ_i| ≤ u, ρ_i = ±1 and 1 > 2nu. Then

    ∏_{i=1}^{n} (1 + δ_i)^{ρ_i} = 1 + γ_n(ε)

for some ε such that |γ_n(ε)| ≤ γ_n.
Proof.
Suppose n = 1. Then

    (1 + δ_1)^{ρ_1} = 1 + γ_1(ε)

where ε = δ_1/(1+δ_1) if ρ_1 = 1, and ε = -δ_1 if ρ_1 = -1 (since 1/(1+δ_1) = 1/(1-(-δ_1)) = 1 + γ_1(-δ_1)). In either case ε ∈ [-u/(1-2u), u]. Thus |γ_1(ε)| ≤ γ_1 by Lemma 4(ii), as required.
Let n ≥ 2. By induction,

    ∏_{i=1}^{n-1} (1 + δ_i)^{ρ_i} = 1 + γ_{n-1}(ε)

and

    (1 + δ_n)^{ρ_n} = 1 + γ_1(ε′)

where |γ_{n-1}(ε)| ≤ γ_{n-1} and |γ_1(ε′)| ≤ γ_1. It suffices to show that

    1 + γ_n(ε″) = (1 + γ_{n-1}(ε))(1 + γ_1(ε′))    (6)

for some ε″ satisfying |γ_n(ε″)| ≤ γ_n.
Alternatively, we must show that ε″ lies in the range [-u/(1-2nu), u], given that ε ∈ [-u/(1-2(n-1)u), u] and ε′ ∈ [-u/(1-2u), u]. It follows from the definition of γ_n that (6) amounts to showing

    ε″ = ((n-1)/n) · ε(1-ε′) + (1/n) · ε′.

Thus ε″ is a convex combination of ε(1-ε′) and ε′.
There are three cases to consider:
Case A.
If ε′ ≥ 0, then both ε(1-ε′) and ε′ lie in the range [-u/(1-2nu), u]. So their convex combination ε″ lies in the range [-u/(1-2nu), u].
Case B.
If ε′ < 0 and ε ≥ 0 then

    ε″ ≤ ((n-1)/n) · ε(1-ε′) ≤ ((n-1)/n) · u · (1 + u/(1-2u)) ≤ u

and

    ε″ ≥ (1/n) · ε′ ≥ -u/(n(1-2u)) ≥ -u/(1-2nu).
Case C.
Finally, assume ε′ < 0 and ε < 0. Then ε″ < 0 and we need to show that ε″ ≥ -u/(1-2nu). Since ε″ is a convex combination of ε(1-ε′) and ε′, it suffices to show that both ε(1-ε′) and ε′ are at least -u/(1-2nu). Clearly, ε′ ≥ -u/(1-2u) ≥ -u/(1-2nu). Also,

    ε(1-ε′) ≥ (-u/(1-2(n-1)u)) · (1 + u/(1-2u)) ≥ -u/(1-2nu)

where the last inequality may be verified directly.
Q.E.D.
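Lemma 5 can be verified in exact rational arithmetic (so no new rounding errors obscure the bound); a sketch with randomly drawn δ_i, assuming double-precision u = 2^-53:

```python
import random
from fractions import Fraction

# Lemma 5 check: prod_{i=1}^n (1+d_i)^{r_i} = 1 + theta with
# |theta| <= gamma_n, for |d_i| <= u, r_i = +/-1, and 2nu < 1.
# Fractions keep the product exact, so the assertion tests the lemma itself.

u = Fraction(1, 2**53)
n = 20
gamma_n = n * u / (1 - n * u)

random.seed(1)
for _ in range(100):
    prod = Fraction(1)
    for _ in range(n):
        # random d with |d| <= u, exactly representable as a rational
        d = Fraction(random.randint(-2**53, 2**53), 2**106)
        if random.random() < 0.5:
            prod *= 1 + d          # r_i = +1
        else:
            prod /= 1 + d          # r_i = -1
    assert abs(prod - 1) <= gamma_n
print(float(gamma_n))
```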
On Evaluation of Expressions.
We consider expressions E in the basic arithmetic operations, generalized to allow summation (∑_{i=1}^n) or multi-product (∏_{i=1}^n). E.g.,

    E = 3x_2 + x_3 - (x_1 x_2 x_3) + ∑_{i=1}^{5} x_i.
An expression gives a mathematical value, but we are also interested in its evaluation on a computer. In this case, the order of evaluating the expression is important. This order is unique for a fully parenthesized expression. But in general, for any E, we define the natural evaluation of E to be the direct left-to-right evaluation of the expression, respecting any parentheses when present. Each summation or multi-product notation is linearized in the obvious way, by increasing indices, and evaluated left-to-right. E.g., ∑_{i=1}^n x_i is simply a shorthand for x_1 + x_2 + … + x_n, whose natural evaluation order is

    ((…((x_1 + x_2) + x_3) + … + x_{n-1}) + x_n).
|
Suppose E′ is any equivalent expression, i.e., obtained from E by applying any valid laws of arithmetic (associativity, commutativity, distributivity, etc.). Because of finite precision arithmetic, the evaluation of E′ and E need not produce the same result. The simplest example uses the law of associativity:

    E = (x+y)+z,    E′ = x+(y+z).    (7)
Another example is the following, using the law of distributivity:

    E = (x+y)(v+w),    E′ = xv + xw + yv + yw.    (8)
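In IEEE double precision even a single re-association changes the result; a standard concrete instance:

```python
# Two natural evaluations of mathematically equal sums that differ
# in IEEE double precision arithmetic.

x, y, z = 0.1, 0.2, 0.3
e1 = (x + y) + z   # left-to-right
e2 = x + (y + z)   # re-associated

print(e1, e2)      # 0.6000000000000001 vs 0.6
assert e1 != e2
```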
In numerical analysis, we tend not to exploit distributivity because it can render E′ very different from E. Comparing the errors in the natural evaluations of E and E′ can be complicated. One hint of this is the fact that the number of operations in E and E′ can be exponentially different:
    E = ∏_i (1 + x_i),    E′ = 1 + ∑_i x_i + ∑_{i<j} x_i x_j + … + ∏_i x_i.
Therefore, we focus on one class of mathematically equivalent expressions, called re-ordering equivalent. Two expressions are re-ordering equivalent if they can be obtained from each other by applying the laws of associativity and commutativity only. If E is re-ordering equivalent to E′, then their natural evaluations incur the same number of arithmetic operations. An approximate value Ẽ for E is called a natural value (resp. re-ordering value) if it is obtained by the natural evaluation (resp. a re-ordering evaluation) of E.
Lemma 6
Let E = ∑_{i=1}^n x_i y_i.
(i) The natural value Ẽ of E satisfies Ẽ = ∑_{i=1}^n x_i y_i (1+θ_i) where |θ_1| ≤ γ_n and for i = 2, …, n, |θ_i| ≤ γ_{n-i+2}.
(ii) If E′ is any re-ordering equivalent expression, its natural value Ẽ′ satisfies Ẽ′ = ∑_{i=1}^n x_i y_i (1+θ_i) where each |θ_i| ≤ γ_n.
Proof.
Let E_j = ∑_{i=1}^j x_i y_i for j = 1, …, n.
(i) If n = 1, Ẽ = x_1 y_1 (1+θ_1), |θ_1| ≤ γ_1. This has the required form. For n ≥ 2, we inductively have

    Ẽ_{n-1} = ∑_{i=1}^{n-1} x_i y_i (1 + θ′_i)

where |θ′_1| ≤ γ_{n-1} and for i = 2, …, n-1, |θ′_i| ≤ γ_{n-i+1}. From

    Ẽ = (Ẽ_{n-1} + x_n y_n (1+θ′_n))(1+δ)
      = ∑_{i=1}^{n} x_i y_i (1 + θ′_i)(1+δ),

where |θ′_n| ≤ γ_1 accounts for the multiplication x_n y_n and |δ| ≤ u for the final addition, we obtain θ_i with 1+θ_i = (1+θ′_i)(1+δ), where |θ_i| satisfies the lemma.
(ii) This is immediate from part (i).
Q.E.D.
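A forward-error consequence of Lemma 6 is |Ẽ - E| ≤ γ_n ∑ |x_i y_i|, which is easy to check against exact rational arithmetic; the data below is random and hypothetical:

```python
import random
from fractions import Fraction

# Check the forward bound implied by Lemma 6:
#   |E~ - E| <= gamma_n * sum |x_i y_i|,
# comparing the natural float evaluation with an exact rational sum.

u = 2.0 ** -53
random.seed(2)
n = 50
x = [random.uniform(-1, 1) for _ in range(n)]
y = [random.uniform(-1, 1) for _ in range(n)]

e_float = 0.0                      # natural left-to-right evaluation
for xi, yi in zip(x, y):
    e_float += xi * yi

e_exact = sum(Fraction(xi) * Fraction(yi) for xi, yi in zip(x, y))
gamma_n = n * u / (1 - n * u)
bound = gamma_n * sum(abs(xi * yi) for xi, yi in zip(x, y))

err = abs(float(Fraction(e_float) - e_exact))
print(err, bound)
assert err <= bound
```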
Lemma 7
Let E = (z - ∑_{i=1}^{n-1} x_i y_i)/y_n. If E′ is any re-ordering equivalent expression, its natural value Ẽ′ satisfies

    Ẽ′ (1+θ_n) = (z - ∑_{i=1}^{n-1} x_i y_i (1+θ_i)) / y_n

where each |θ_i| ≤ γ_n.
Proof.
Similar to the proof of Lemma 6, with the final division contributing the factor (1+θ_n); we leave the details to the reader.
Q.E.D.
3 Conditioning
Let f: X® Y be a continuous function on the normed
spaces X, Y. The condition number kf(x)
of an input instance x Î X for problem f
measures the sensitivity of f(x) to perturbations in x.
In particular, ``ill or well conditioning'' refers to the
relative largeness or smallness of kf(x).
We can work with an absolute or a relative notion of
conditioning. We follow [8] in the
following definitions.
There is an absolute and a relative concept of conditioning.
The relative notion is more important, but since the
absolute notion is simpler, we also will present it.
The absolute conditioning function of f is defined by

    κ_f^A(x) := lim_{δ→0} sup_{||δx|| ≤ δ}  ||δf|| / ||δx||.    (9)

Here, we write δf for f(x+δx) - f(x).
If f is differentiable, write J(x) = J_f(x) for the Jacobian of f at x. The (i,j)-th entry of J(x) is ∂f_i/∂x_j (x). Then the quantities δx and δf become (respectively) the infinitesimals dx and J(x)dx. Thus

    κ_f^A(x) = ||J_f(x)||    (10)

where ||J_f(x)|| is the derived norm for the norms on X and Y.
The relative conditioning function of f is defined as

    κ_f^R(x) := lim_{δ→0} sup_{||δx|| ≤ δ}  ( ||δf|| / ||f(x)|| ) / ( ||δx|| / ||x|| ).    (11)

Again when f is differentiable, we get

    κ_f^R(x) = ||J_f(x)|| · ||x|| / ||f(x)||.    (12)

In the following, we simply write κ_f(x) to refer to the relative conditioning function.
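For a differentiable scalar function, (12) reduces to κ_f(x) = |f′(x)| |x| / |f(x)|; a minimal sketch for f(x) = √x, whose relative condition number is 1/2 for every x > 0:

```python
import math

# Relative condition number via (12) for scalar differentiable f:
#   kappa_f(x) = |f'(x)| * |x| / |f(x)|.
# For f(x) = sqrt(x): f'(x) = 1/(2 sqrt(x)), so kappa = 1/2 for all x > 0.

def rel_cond(f, df, x):
    return abs(df(x)) * abs(x) / abs(f(x))

k = rel_cond(math.sqrt, lambda t: 0.5 / math.sqrt(t), 7.0)
print(k)
assert abs(k - 0.5) < 1e-12
```

The square root is thus very well conditioned, even though evaluating it near 0 looks delicate at first glance.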
Condition Number of a Matrix.
Let f: ℂ^n → ℂ^m where f(x) = Ax and A is an m×n matrix. Applying the definitions directly, we obtain

    κ_A(x) = lim_{δ→0} sup_{||δx|| ≤ δ} ( ||δf|| / ||f(x)|| ) / ( ||δx|| / ||x|| )
           = lim_{δ→0} sup_{||δx|| ≤ δ} ( ||A(x+δx) - Ax|| / ||Ax|| ) / ( ||δx|| / ||x|| )
           = lim_{δ→0} sup_{||δx|| ≤ δ} ( ||A δx|| · ||x|| ) / ( ||Ax|| · ||δx|| )
           = ||A|| · ||x|| / ||Ax||.
If A is square and A^{-1} exists, then we have ||x|| = ||A^{-1}Ax|| ≤ ||A^{-1}|| · ||Ax||. Plugging this bound for ||x|| into the above condition number, we obtain

    κ_A(x) ≤ ||A|| · ||A^{-1}||.
|
The nice thing about the right-hand side is that it is independent of x. This quantity is so useful that it deserves its own name and notation:

    κ(A) := ||A|| · ||A^{-1}||

is the condition number of the square matrix A.
This number κ(A) is also relevant when we want to solve the linear equation Ax = b where b is given and we need to compute x. This is just the original problem with A^{-1} in place of A, but because of the way we define the condition number of a matrix (note κ(A^{-1}) = κ(A)), κ(A) can serve to bound the conditioning of this problem as well. For the componentwise analysis below, we write |A| for the matrix [|a_ij|] of absolute entries.
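As a numerical illustration (NumPy assumed), the Hilbert matrix is a classic ill-conditioned example; already at n = 4 its 2-norm condition number is about 1.5e4:

```python
import numpy as np

# kappa(A) = ||A|| * ||A^-1||, here in the 2-norm, for the 4x4 Hilbert
# matrix H with H[i][j] = 1/(i+j+1).

n = 4
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
kappa = np.linalg.norm(H, 2) * np.linalg.norm(np.linalg.inv(H), 2)
print(kappa)

assert kappa > 1e4                     # ill-conditioned even at n = 4
# agrees with NumPy's built-in condition number
assert abs(kappa - np.linalg.cond(H, 2)) / kappa < 1e-6
```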
Distance to a Singularity.
The numerical stability of a numerical problem is influenced by its distance to the nearest singularity. For linear systems, this distance can be defined as follows: for a nonsingular matrix A and any matrix norm N, the closest distance from A to a singular matrix is defined as

    d(A) := inf_S N(A - S)

where S ranges over all singular matrices. We next characterize this distance. This result, attributed to Turing, was known to Banach in the 1920s. Its generalization to arbitrary norms is from Gastinel (1966).
First, we note a simple fact:
Lemma 8
For any square matrix A and any derived norm N, if N(A) < 1 then I + A is invertible.
Proof.
This is equivalent to showing that (I+A)x = 0 has the unique solution x = 0. If (I+A)x = 0 then x = -Ax and hence N(x) = N(Ax) ≤ N(A)N(x). We claim that N(x) = 0, since otherwise N(A) < 1 implies N(x) ≤ N(A)N(x) < N(x), a contradiction. Thus x = 0.
Q.E.D.
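A quick sanity check of Lemma 8 in the 2-norm (NumPy assumed; the stronger quantitative fact σ_min(I+A) ≥ 1 - ||A||_2 follows from Weyl's inequality for singular values):

```python
import numpy as np

# Lemma 8 spot-check: if ||A||_2 < 1 then I + A is invertible; indeed the
# smallest singular value of I + A is at least 1 - ||A||_2 > 0.

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))
A *= 0.9 / np.linalg.norm(A, 2)      # rescale so ||A||_2 = 0.9 < 1

smin = np.linalg.svd(np.eye(5) + A, compute_uv=False)[-1]
print(smin)
assert smin >= 1 - 0.9 - 1e-9        # hence I + A is invertible
```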
Theorem 9 [Turing-Gastinel]
For non-singular A and derived norm N,

    d(A) = inf_S { N(A - S) } = 1 / N(A^{-1})

where S ranges over all singular matrices.
Proof.
We first prove that d(A) ≥ 1/N(A^{-1}). Let S be any matrix such that N(S-A) < 1/N(A^{-1}). It suffices to show that S is non-singular. Writing D for S-A, we have N(A^{-1}D) ≤ N(A^{-1})N(D) < 1. Thus (I + A^{-1}D) is invertible and we see that

    S = A + D = A(I + A^{-1}D).

As a product of two non-singular matrices, S is non-singular.
To show that d(A) ≤ 1/N(A^{-1}), we construct a singular matrix S such that N(S-A) = 1/N(A^{-1}). We pick two vectors x, y such that

    y* A^{-1} x = N(A^{-1}),    N(xy*) = 1.
We check that

    (A - xy*/N(A^{-1})) A^{-1}x = x - x (y* A^{-1} x)/N(A^{-1}) = 0.

This proves that S = A - xy*/N(A^{-1}) is singular. Moreover, N(A-S) = N(xy*)/N(A^{-1}) = 1/N(A^{-1}).
Q.E.D.
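In the 2-norm this theorem is concrete: d(A) = 1/||A^{-1}||_2 is the smallest singular value of A, and zeroing the smallest singular piece of the SVD yields a singular matrix at exactly that distance. A sketch, assuming NumPy:

```python
import numpy as np

# Turing-Gastinel in the 2-norm: d(A) = 1/||A^-1||_2 = smallest singular
# value of A; subtracting the smallest rank-one SVD piece makes A singular.

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))

U, s, Vt = np.linalg.svd(A)
d1 = s[-1]                                      # smallest singular value
d2 = 1.0 / np.linalg.norm(np.linalg.inv(A), 2)  # 1 / ||A^-1||_2
assert abs(d1 - d2) <= 1e-8 * d1 + 1e-14

S = A - s[-1] * np.outer(U[:, -1], Vt[-1])      # rank-one perturbation
# S is (numerically) singular, at distance d(A) from A
assert np.linalg.svd(S, compute_uv=False)[-1] <= 1e-12 * s[0]
assert abs(np.linalg.norm(A - S, 2) - d1) <= 1e-8 * d1 + 1e-14
```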
Component-wise Distance.
For a non-singular square matrix A, define (see [3])

    d_T(A) := inf_S { N(S-A) / N(A) }    (13)

where S ranges over all singular matrices. Thus d_T(A) is the relative distance from A to the nearest singular matrix S. The subscript refers to Turing. Rewrite the definition of d_T(A) in (13) as

    d_T(A) = inf_S { α : N(S-A) = α N(A) }

where S ranges over all singular matrices.
We can further rewrite it as

    d_T(A) = inf_{S,α} { α : N(S-A) ≤ α N(A) }.

The componentwise analogue of this definition is therefore

    d_C(A) := inf_{S,α} { α : |S-A| ≤ α |A| }

where again S ranges over all singular matrices. A characterization of d_C(A) is given by Rohn (1990) (see [3, p. 66]).
4 Convex Hulls and Partial Compilation of Expressions
Sometimes, some of the arguments of an expression are fixed while others vary. We can exploit this by constructing "partially compiled expressions". To see an example of this phenomenon, we look at the Beneath-Beyond Algorithm for convex hulls. See [Page 147ff in Edelsbrunner]. Let S be a set of points in d dimensions. For simplicity, assume the points are in general position. We sort the points by their first coordinate, and add them in this order. The method (called Beneath-Beyond) begins with a simplex on the first d+1 points. At each step, we add a new point p to the current hull H:
Prove the 5 properties stated for the norm N′ derived from N_m, N_n.
Let A be a square matrix and N a matrix norm.
(i) Show that N(A) ≥ |μ| for any eigenvalue μ of A.
(ii) Let us show that the above inequality cannot be improved to an equality. Let

    A = ( 1  1 )
        ( 0  1 ).

Then the eigenvalues of A satisfy |μ_1| = |μ_2| = 1. Prove that for any norm N, we have N(A) > 1. Hint: Write A = I + E where I is the identity matrix. What is A^k for any k ∈ ℕ? Show that if N(A) ≤ 1 then N(E) = 0, a contradiction.
(iii) (Fritz John)
Show that for all square matrices A and all ε > 0, there exists a norm N such that N(A) ≤ ε + max_μ |μ| where μ ranges over the eigenvalues of A.
(a) Give specific numerical values for the expressions E and E′ in (7) so that their natural evaluations result in different answers. Assume IEEE arithmetic.
(b) Do the same for (8).
Improve Lemma 5 in the case ρ_i = 1 for all i.
(Trefethen)
Determine the absolute and relative conditioning functions of the following functions.
(i) f(x) = x/2 for x ∈ ℂ.
(ii) f(x) = √x for x > 0.
(iii) f(x,y) = x - y for (x,y) ∈ ℂ^2. Use the ∞-norm on ℂ^2.
5 OLD
REFERENCE:
Certified Computation of the Sign of a Matrix Determinant, Dario A. Bini, Victor Y. Pan and Yanqiang Yu. 1998, abstract.
An approach to computing the sign of a determinant D of an n×n matrix M is to numerically compute an approximate value A for D, estimate the roundoff error E, and certify the computed sign of A in case |A| > E. Unfortunately, |D| varies from 0 to ||M||^n, and the known techniques for error analysis give error bounds that are large even when |D| is small. A predecessor to this paper by Pan, Yu and Stewart uses this approach. The present paper avoids estimating the roundoff error E, and instead solves the simpler task of estimating the minimum distance 1/||M^{-1}|| from M to a singular matrix. The method uses O(n^3) single precision flops. Based on this estimate, in case the sign is not certified, a decision can be made to increase the precision of the numerical computation or to fall back on guaranteed symbolic methods. Thus, this algorithm can be used as a kind of filter.
Other work:
Clarkson proposed a general algorithm based on Gram-Schmidt orthogonalization; the certification method is rather complicated to program. For n ≤ 4, Avnaim et al. proposed an efficient "single-precision" method. Modular methods have been proposed by Brönnimann et al.
6 Roundoff Errors in Gaussian Elimination
Let A be an n×n matrix, with A_k its k-th column and a_{i,j} its (i,j)-th entry. The following matrix operator norms are used: the 2-norm ||A||_2, the row norm ||A||_∞ = max_i ∑_j |a_{i,j}|, and the column norm ||A||_1 = max_j ∑_i |a_{i,j}|. Also, let |A| denote the matrix with (i,j)-th entry |a_{i,j}|. A triangular matrix is unit (resp. proper) if its diagonal entries are all 1 (resp. 0).
Theorem 10
(Higham, Theorem 9.3, p. 175): Let

    A = P A′ Q,    Ã = A′ + E = L̃ Ũ

where the matrices are all n×n, P and Q are permutation matrices, L̃ is unit lower triangular and Ũ is upper triangular. If L̃ and Ũ are computed numerically from A by Gaussian elimination with complete pivoting, then

    |E| ≤ γ_n |L̃| · |Ũ|

where γ_n = nu/(1-nu) and u is the unit roundoff.
Since det(A) = det(P) det(A′) det(Q), and det(P) = ±1 and det(Q) = ±1 are known, it is enough to determine the sign of det(A′).
Corollary 11
For any operator norm,

    ||E|| ≤ e := || |L̃| · |Ũ| || · γ_n.

Computing the column and row norms uses O(n^2) arithmetic operations.
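The elementwise bound of Theorem 10 can be observed directly; below is a minimal sketch of Gaussian elimination with complete pivoting (an illustration, not an efficient implementation; NumPy is assumed):

```python
import numpy as np

# Minimal GE with complete pivoting, checking the elementwise bound
#   |L~ U~ - P A Q| <= gamma_n |L~| |U~|   (Theorem 10).

def lu_complete(A):
    n = A.shape[0]
    A = A.copy()
    P = np.arange(n)
    Q = np.arange(n)
    for k in range(n - 1):
        # pick the largest remaining entry of the trailing block as pivot
        i, j = divmod(int(np.argmax(np.abs(A[k:, k:]))), n - k)
        i += k; j += k
        A[[k, i], :] = A[[i, k], :]; P[[k, i]] = P[[i, k]]
        A[:, [k, j]] = A[:, [j, k]]; Q[[k, j]] = Q[[j, k]]
        A[k+1:, k] /= A[k, k]                       # multipliers
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return P, Q, np.tril(A, -1) + np.eye(n), np.triu(A)

rng = np.random.default_rng(4)
n = 8
A = rng.standard_normal((n, n))
P, Q, L, U = lu_complete(A)
Ap = A[np.ix_(P, Q)]                 # the permuted matrix P A Q

u_round = 2.0 ** -53
gamma_n = n * u_round / (1 - n * u_round)
E = L @ U - Ap
ok = np.all(np.abs(E) <= gamma_n * (np.abs(L) @ np.abs(U)) + 1e-300)
print(ok)
assert ok
```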
References
[1] F. Avnaim, J.-D. Boissonnat, O. Devillers, F. Preparata, and M. Yvinec. Evaluating signs of determinants using single-precision arithmetic. Algorithmica, 17(2):111-132, 1997.
[2] H. Brönnimann and M. Yvinec. Efficient exact evaluation of signs of determinants. Algorithmica, 27:21-56, 2000.
[3] F. Chaitin-Chatelin and V. Frayssé. Lectures on Finite Precision Computations. Society for Industrial and Applied Mathematics, Philadelphia, 1996.
[4] K. L. Clarkson. Safe and effective determinant evaluation. IEEE Foundations of Computer Science, 33:387-395, 1992.
[5] N. J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, 1996.
[6] F. John. Lectures on Advanced Numerical Analysis. Gordon and Breach, New York, 1967.
[7] V. Y. Pan and Y. Yu. Certified computation of the sign of a matrix determinant. Proc. 10th ACM-SIAM Symp. on Discrete Algorithms (SODA99), pages 715-724, 1999. Accepted for Algorithmica.
[8] L. N. Trefethen and D. Bau, III. Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia, 1997.
Footnotes:
1. We rely on the fact that these properties fit familiar "mathematical patterns" such as the triangle inequality.
File translated from
TEX
by
TTH,
version 2.78. On 26 Apr 2001, 13:46.