Corrections to First Printing

Thanks to Joe Darcy, David Scott and Antoine Trux for pointing out many of the following corrections to the first printing (hardback), which are all corrected in the second printing (soft cover).

Here is a list of Minor Typos.

Here is a list of Corrections to Exercises.

In Chapter 5, the round-to-nearest rule was not stated correctly for real numbers x strictly between Nmax and Nmax + ulp(Nmax)/2: these round down to Nmax, not up to infinity.
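Python floats are IEEE doubles on essentially all current platforms, so the corrected rule can be checked directly. The following is a minimal sketch, assuming Python 3.9 or later (for math.ulp) and a platform whose float addition is correctly rounded to double (e.g. x86-64 with SSE2). For doubles, ulp(Nmax)/2 = 2^970.

```python
import math
import sys

NMAX = sys.float_info.max        # largest finite IEEE double, (2 - 2**-52) * 2**1023
HALF_ULP = math.ulp(NMAX) / 2    # ulp(Nmax)/2, which is 2**970 for doubles

# Exact sums strictly between Nmax and Nmax + ulp(Nmax)/2 round down to Nmax.
print(NMAX + 1.0 == NMAX)            # True
print(NMAX + HALF_ULP / 2 == NMAX)   # True: exact sum is a quarter ulp above Nmax
# An exact tie rounds to the even neighbor 2**1024, which is out of range: overflow.
print(NMAX + HALF_ULP)               # inf
```

Only the exact midpoint and values beyond it overflow; everything below it is returned as Nmax.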

Nick Higham pointed out that Theorem 5.1 can be strengthened; the "<=" result for round-to-nearest can be replaced by "<". The reason is that the inequality |x| >= 2^E can be replaced by |x| > 2^E when x is not a floating point number. On the other hand, if x is a floating point number, the rounding error is zero.
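Spelled out, assuming the book's notation (precision p, machine epsilon ε = 2^{-(p-1)}, and x in the normalized range with 2^E <= |x| < 2^{E+1}), the argument runs roughly as follows; this is a sketch of the reasoning, not the book's own proof.

```latex
\begin{align*}
  |\mathrm{round}(x) - x| &\le \tfrac{1}{2}\,\mathrm{ulp}(x) = \tfrac{1}{2}\,\epsilon\,2^{E}, \\
  \frac{|\mathrm{round}(x) - x|}{|x|}
    &\le \frac{\epsilon\,2^{E}}{2\,|x|} < \frac{\epsilon\,2^{E}}{2 \cdot 2^{E}} = \frac{\epsilon}{2}
    && \text{if $x$ is not a floating point number, since then } |x| > 2^{E}, \\
  \frac{|\mathrm{round}(x) - x|}{|x|} &= 0 < \frac{\epsilon}{2}
    && \text{if $x$ is a floating point number.}
\end{align*}
```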

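In Chapter 7, the statement that "any operation with a NaN is an invalid operation" is not correct: an arithmetic operation such as addition or multiplication with a quiet NaN operand does not signal the invalid operation exception. It is correct, however, that the result is a NaN.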
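Python's binary floats do not expose the IEEE exception flags, so this sketch uses the standard decimal module instead. It is decimal rather than binary arithmetic, but it follows the same NaN rules and reports the invalid operation condition as the InvalidOperation signal. The helper signals_invalid is a hypothetical name introduced for illustration.

```python
from decimal import Decimal, InvalidOperation, localcontext

def signals_invalid(op):
    """Return True if evaluating op() raises the decimal InvalidOperation signal."""
    with localcontext() as ctx:
        ctx.traps[InvalidOperation] = True   # turn the invalid flag into an exception
        try:
            op()
        except InvalidOperation:
            return True
    return False

qnan = Decimal("NaN")        # quiet NaN
inf = Decimal("Infinity")

print(qnan + 1)                                      # NaN
print(signals_invalid(lambda: qnan + 1))             # False: a quiet NaN propagates silently
print(signals_invalid(lambda: qnan * 0))             # False
print(signals_invalid(lambda: inf - inf))            # True: a genuinely invalid operation
print(signals_invalid(lambda: Decimal("sNaN") + 1))  # True: a signaling NaN operand
```

The NaN result is the same in every case; only genuinely invalid operations (and signaling NaN operands) raise the flag.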

Also in Chapter 7, the discussion of overflow and underflow is not quite correct. The corrected text is here.

In Chapter 10, regarding the comment in the middle of page 62 about the possibility that a compiler optimization could change the behavior of Program 3: although such an optimization was permitted in traditional C, it is not permitted by the C89 or C99 standards.
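Traditional (K&R) C allowed a compiler to rearrange expressions built from associative and commutative operators such as + even when they were parenthesized; C89 withdrew that permission. Assuming regrouping is the kind of optimization at issue, this short sketch shows why it matters for floating point. Python floats are IEEE doubles, so the values match what double arithmetic in C would give.

```python
# Floating point addition is not associative: regrouping a sum can change its value.
a, b, c = 1.0, 1e-16, 1e-16

left = (a + b) + c    # each tiny term is below half an ulp of 1.0 and is lost: 1.0
right = a + (b + c)   # the tiny terms combine first and survive: 1.0000000000000002

print(left, right, left == right)   # 1.0 1.0000000000000002 False
```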

At the end of Chapter 13 it is stated that no modern microprocessor provides 128-bit quadruple precision in hardware. In fact, the IBM 390 does provide it. The ongoing revision of the IEEE standard (since published as IEEE 754-2008) specifies a 128-bit quadruple precision format.
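For comparison with the familiar formats, this sketch lists the parameters of the 128-bit format (precision p = 113, Emax = 16383, as in IEEE 754-2008 binary128) next to single and double. It uses exact rational arithmetic so that nothing overflows. The formulas ε = 2^(1-p) and Nmax = (2 - ε)·2^Emax follow the usual definitions; the table layout and the helper eps_and_nmax are illustrative only.

```python
from fractions import Fraction

# (total bits, precision p including the hidden bit, Emax) for three IEEE binary formats
FORMATS = {
    "single": (32, 24, 127),
    "double": (64, 53, 1023),
    "quadruple": (128, 113, 16383),
}

def eps_and_nmax(p, emax):
    """Exact machine epsilon 2**(1 - p) and largest finite number (2 - eps) * 2**emax."""
    eps = Fraction(1, 2 ** (p - 1))
    return eps, (2 - eps) * 2 ** emax

for name, (bits, p, emax) in FORMATS.items():
    eps, nmax = eps_and_nmax(p, emax)
    assert nmax == 2 ** (emax + 1) - 2 ** (emax + 1 - p)   # Nmax is just below 2**(Emax+1)
    exp_bits = bits - p    # 1 sign bit + (p - 1) stored fraction bits; the rest is exponent
    print(f"{name}: {bits} bits, p = {p}, exponent field = {exp_bits} bits, "
          f"eps = {float(eps):.3e}, Nmax = 2**{emax + 1} - 2**{emax + 1 - p}")
```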

Corrections to Second Printing

In the URL for reference [BH+02], "~dbailey" should be "~dhbailey"

In the URL for reference [Ede94], "www.math" should be "www-math"

Additional Comment on the Numerical Examples

The numerical results in Chapters 10-13 were obtained using a Sun workstation, which has 64-bit floating point registers. As explained in the book, the results could be different on a PC using 80-bit floating point registers, and the results for which accurate answers are not obtained because of cancellation or, more generally, instability ARE different on a PC. However, a PC does not produce much more accurate final answers, since results are rounded to the 32-bit single or 64-bit double format when they are stored in memory.
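The effect can be imitated without x87 hardware. The sketch below uses a hypothetical helper, round_to_bits, that models only significand length (53 bits for double, 64 bits for the 80-bit extended format) and ignores exponent range. It evaluates (a + b) - a with a = 10^16 and b = 1, a computation that suffers total cancellation in double.

```python
from fractions import Fraction

def round_to_bits(x, p):
    """Round the rational x to the nearest number with a p-bit significand, ties to even.

    Only precision is modeled; the exponent range is unbounded, so there is
    no overflow or underflow here.
    """
    if x == 0:
        return Fraction(0)
    sign = 1 if x > 0 else -1
    x = abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:      # make e = floor(log2(x)), so 2**e <= x < 2**(e + 1)
        e -= 1
    scale = Fraction(2) ** (p - 1 - e)
    m = round(x * scale)          # Fraction.__round__ rounds half to even
    return sign * m / scale

DOUBLE, EXTENDED = 53, 64         # significand bits: 64-bit double, 80-bit extended register
a, b = Fraction(10**16), Fraction(1)

# Sun: 64-bit registers, every result rounded to double.
s = round_to_bits(a + b, DOUBLE)
sun = round_to_bits(s - a, DOUBLE)

# PC: sum kept in an 80-bit register, only the final result stored as a double.
t = round_to_bits(a + b, EXTENDED)
pc_register = round_to_bits(round_to_bits(t - a, EXTENDED), DOUBLE)

# PC: sum stored to memory (rounded to double) before the subtraction.
u = round_to_bits(t, DOUBLE)
pc_memory = round_to_bits(round_to_bits(u - a, EXTENDED), DOUBLE)

print(sun, pc_register, pc_memory)   # 0 1 0   (the exact answer is 1)
```

The answers differ, but the extra accuracy survives only while the intermediate stays in a register; once it is stored as a double, as in the last case, the PC gives the same answer as the Sun.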