Further Details: Floating point arithmetic




 

Roundoff error is bounded in terms of the machine precision $\varepsilon$, which is the smallest value satisfying

$$|fl(a \oplus b) - (a \oplus b)| \le \varepsilon \cdot |a \oplus b| ,$$

where $a$ and $b$ are floating-point numbers, $\oplus$ is any one of the four operations $+$, $-$, $\times$ and $\div$, and $fl(a \oplus b)$ is the floating-point result of $a \oplus b$. Machine epsilon, $\varepsilon$, is the smallest value for which this inequality is true for all $\oplus$, and for all $a$ and $b$ such that $a \oplus b$ is neither too large (magnitude exceeds the overflow threshold) nor too small (is nonzero with magnitude less than the underflow threshold) to be represented accurately in the machine. We also assume $\varepsilon$ bounds the relative error in unary operations like square root:

$$|fl(\sqrt{a}) - \sqrt{a}| \le \varepsilon \cdot |\sqrt{a}| .$$

A precise characterization of $\varepsilon$ depends on the details of the machine arithmetic and sometimes even of the compiler. For example, if addition and subtraction are implemented without a guard digit, we must redefine $\varepsilon$ to be the smallest number such that

$$|fl(a \pm b) - (a \pm b)| \le \varepsilon \cdot (|a| + |b|) .$$
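The first bound can be spot-checked numerically. The following is a minimal sketch, not part of LAPACK, assuming Python floats are IEEE double precision with round-to-nearest, so that the unit roundoff is 2^-53. The helper names `relative_error` and `max_relative_error` are illustrative only. Exact rational arithmetic stands in for the true result of each operation, and the operand ranges are chosen so that overflow and underflow cannot occur.

```python
from fractions import Fraction
import random

# Assumption: IEEE double precision with round-to-nearest.
EPS = Fraction(1, 2**53)

def relative_error(a, b, op):
    """Exact relative error of the floating-point result of a (op) b."""
    fa, fb = Fraction(a), Fraction(b)   # floats convert to Fraction exactly
    exact = {"+": fa + fb, "-": fa - fb, "*": fa * fb, "/": fa / fb}[op]
    computed = {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]
    if exact == 0:
        return Fraction(0)              # a - a is computed exactly
    return abs(Fraction(computed) - exact) / abs(exact)

def max_relative_error(trials=2000, seed=0):
    """Largest relative error observed over random operands and all four operations."""
    rng = random.Random(seed)
    worst = Fraction(0)
    for _ in range(trials):
        a = rng.uniform(-1e3, 1e3)
        b = rng.uniform(1e-3, 1e3)      # b > 0, so division is always defined
        for op in "+-*/":
            worst = max(worst, relative_error(a, b, op))
    return worst

worst = max_relative_error()
print(float(worst), "<=", float(EPS), ":", worst <= EPS)
```

A random search of this kind can only fail to find a counterexample; it does not prove the bound, and it does not exercise the square-root inequality or the weaker bound for arithmetic without a guard digit.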

To ensure portability, machine parameters such as machine epsilon, the overflow threshold and the underflow threshold are computed at runtime by the auxiliary routine xLAMCH. The alternative, keeping a fixed table of machine parameter values, would degrade portability because the table would have to be changed when moving from one machine, or even one compiler, to another.
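To illustrate what determining a machine parameter at runtime can look like, here is a simplified sketch in Python. It is not the algorithm used by xLAMCH, and the function name `spacing_at_one` is hypothetical. It finds the gap between 1.0 and the next larger floating-point number by repeated halving, then halves that gap once more to obtain the relative error bound, which is valid only under the assumption of round-to-nearest arithmetic.

```python
import sys

def spacing_at_one():
    """Return the gap between 1.0 and the next larger float, found by halving."""
    gap = 1.0
    # Stop as soon as adding half the current gap no longer changes 1.0.
    while 1.0 + gap / 2.0 > 1.0:
        gap /= 2.0
    return gap

gap = spacing_at_one()
unit_roundoff = gap / 2.0   # assumption: round-to-nearest arithmetic

print(gap == sys.float_info.epsilon)   # the language's own constant is the gap
print(unit_roundoff)
```

Note the factor of two: many languages report the gap at 1.0 as their "epsilon", whereas the $\varepsilon$ used in the error bounds of this section is the maximum relative rounding error, which is half of that gap when rounding to nearest.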

Actually, most machines, but not yet all, do have the same machine parameters because they implement IEEE Standard Floating Point Arithmetic [5][4], which exactly specifies floating-point number representations and operations. For these machines, including all modern workstations and PCs, the values of these parameters are given in Table 4.1.

  
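Table 4.1: Values of Machine Parameters in IEEE Floating Point Arithmetic

Machine parameter                    Single Precision (32 bits)            Double Precision (64 bits)
Machine epsilon = xLAMCH('E')        2^-24 ≈ 5.96 × 10^-8                  2^-53 ≈ 1.11 × 10^-16
Underflow threshold = xLAMCH('U')    2^-126 ≈ 1.18 × 10^-38                2^-1022 ≈ 2.23 × 10^-308
Overflow threshold = xLAMCH('O')     2^128 (1 - epsilon) ≈ 3.40 × 10^38    2^1024 (1 - epsilon) ≈ 1.79 × 10^308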
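The IEEE double-precision parameters can be cross-checked on any machine whose Python floats are IEEE doubles, which is an assumption of this sketch. It rebuilds the underflow threshold 2^-1022 and the overflow threshold 2^1024 (1 - epsilon), with epsilon = 2^-53, and compares them with the constants the interpreter reports in `sys.float_info`.

```python
import sys

eps = 2.0 ** -53                      # IEEE double machine epsilon (unit roundoff)
underflow_threshold = 2.0 ** -1022    # smallest positive normalized double

# 2^1024 itself is not representable, so 2^1024 * (1 - eps) is formed
# as (2 - 2*eps) * 2^1023, which involves only representable values.
overflow_threshold = (2.0 - 2.0 * eps) * 2.0 ** 1023

print(underflow_threshold == sys.float_info.min)
print(overflow_threshold == sys.float_info.max)
```

The single-precision parameters cannot be checked this way with the standard library alone, because Python has no built-in single-precision float type.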

As stated above, we will ignore overflow and underflow in discussing error bounds. Reference [18] discusses extending error bounds to include underflow, and shows that for many common computations, when underflow occurs it is less significant than roundoff. Overflow generally causes an error message and stops execution, so the error bounds do not apply.

Therefore, most of our error bounds will simply be proportional to machine epsilon. This means, for example, that if the same problem is solved in double precision and single precision, the error bound in double precision will be smaller than the error bound in single precision by a factor of $\varepsilon_{\rm double} / \varepsilon_{\rm single}$. In IEEE arithmetic, this ratio is $2^{-53} / 2^{-24} \approx 10^{-9}$, meaning that one expects the double precision answer to have approximately nine more decimal digits correct than the single precision answer.
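The arithmetic behind the "nine more decimal digits" estimate is short enough to write out. This sketch assumes the IEEE values 2^-24 and 2^-53 for single and double precision machine epsilon; the variable names are illustrative.

```python
import math

eps_single = 2.0 ** -24   # IEEE single precision machine epsilon
eps_double = 2.0 ** -53   # IEEE double precision machine epsilon

ratio = eps_double / eps_single      # exactly 2^-29, roughly 1.9e-9
extra_digits = -math.log10(ratio)    # 29 * log10(2), between 8.7 and 8.8

print(ratio, extra_digits)
```

The estimate applies to the error bounds, which scale with machine epsilon; the error actually observed in a particular computation may be smaller than its bound in either precision.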

LAPACK routines are generally insensitive to the details of rounding, like their counterparts in LINPACK and EISPACK. One newer algorithm (xLASV2) can return significantly more accurate results if addition and subtraction have a guard digit (see the end of section 4.9).







Tue Nov 29 14:03:33 EST 1994