next up previous contents index
Next: New Sources of Error Up: Accuracy and Stability Previous: Accuracy and Stability

Sources of Error in Numerical Calculations

 

          The effects of two sources of error can be measured by the bounds in this chapter: roundoff error and input error. Roundoff error arises from rounding results of floating-point operations during the algorithm. Input error is error in the input to the algorithm from prior calculations or measurements. We describe roundoff error first, and then input error.

Almost all the error bounds ScaLAPACK provides are multiples of relative machine precision,     which we abbreviate by tex2html_wrap_inline17202. Relative machine precision (epsilon) bounds the roundoff in individual floating-point operations. It may be loosely defined as the largest relative error    in any floating-point operation that neither overflows nor underflows. (Overflow means the result is too large to represent accurately, and underflow means the result is too small to represent accurately.) Relative machine precision (epsilon) is available either by the function call   PSLAMCH(ICTXT, 'Epsilon') (or simply PSLAMCH(ICTXT, 'E'))gif in single precision, or by the function call PDLAMCH(ICTXT, 'Epsilon') (or PDLAMCH(ICTXT, 'E')) in double precision. See section 6.1 and Table 6.1 for a discussion of common values of machine epsilon.      

PDLAMCH(ICTXT,`E') returns a single value for the selected machine parameter `E' on all processes within the context ICTXT. If these processes are running on a network of heterogeneous processors , with different floating-point arithmetics, then a ``safe'' common value is returned, the maximum value of machine epsilon for all the processors.

In case of overflow,   there are two common system responses: stopping with an error message, or returning tex2html_wrap_inline17191 and continuing to compute. The latter is the default response of IEEE standard floating-point arithmetic [7, 8]  , the most commonly used arithmetic. It is possible to change this default to abort with an error message, which is often useful for debugging.

In contrast to LAPACK, ScaLAPACK can take advantage of arithmetic with tex2html_wrap_inline17191   to accelerate the routines that compute eigenvalues of symmetric matrices using PxLAIECT (the drivers PxSYEVX and PxSYGVX, and their complex counterparts).           PxLAIECT comes in two different versions, one in which arithmetic with tex2html_wrap_inline17191 is available (the default) and one in which it is not. When tex2html_wrap_inline17191 is available, the inner loop of PxLAIECT is accelerated by removing a branch to test for and avoid division by zero. This speed advantage is realized only when arithmetic with tex2html_wrap_inline17191 is as fast as arithmetic with normalized floating-point numbers; this is usually but not always the case [42]. The compile time flag NO_IEEE can be used during installation to run without using tex2html_wrap_inline17191; see the ScaLAPACK Installation Guide for details [24].

Since underflow is almost always less significant than roundoff, we will not consider it further in this section (but see section 6.1).

Bounds on input errors may be easily incorporated into most ScaLAPACK error bounds. Suppose the input data is accurate to, say, five decimal digits (we discuss exactly what this means in section 6.3). Then one simply replaces tex2html_wrap_inline17202 by tex2html_wrap_inline17220 in the error bounds.

Further Details: Floating-Point Arithmetic 

    Roundoff error is bounded in terms of the relative machine precision tex2html_wrap_inline17202,   which is the smallest value satisfying
displaymath17196
where a and b are floating-point numbers , tex2html_wrap_inline17228 is any one of the four operations +, -, tex2html_wrap_inline12420, and tex2html_wrap_inline17236, and tex2html_wrap_inline17238 is the floating-point result of tex2html_wrap_inline17240. Relative machine precision, tex2html_wrap_inline17202, is the smallest value for which this inequality is true for all tex2html_wrap_inline17228, and for all a and b such that tex2html_wrap_inline17240 is neither too large (magnitude exceeds the overflow threshold)   nor too small (is nonzero with magnitude less than the underflow threshold)   to be represented accurately in the machine. We also assume tex2html_wrap_inline17202 bounds the relative error in unary   operations such as square root:
displaymath17197

A precise characterization of tex2html_wrap_inline17202 depends on the details of the machine arithmetic and sometimes even of the compiler. For example, if addition and subtraction are implemented without a guard digit,gif we must redefine tex2html_wrap_inline17202 to be the smallest number such that
displaymath17198

In order to assure portability , machine parameters such as relative machine precision (epsilon), the overflow threshold and underflow threshold are computed at runtime by the auxiliary    routine PxLAMCH. The alternative, keeping a fixed table of machine parameter values, would degrade portability because the table would have to be changed when moving from one machine or combination of machines, or even one compiler, to another.

Most machines (but not yet all) do have the same machine parameters because they implement IEEE Standard Floating Point Arithmetic   [7, 8], which exactly specifies floating-point number representations and operations. For these machines, including all modern workstations and PCs,gif the values of these parameters are given in Table 6.1.

Unfortunately, machines claiming to implement IEEE arithmetic may still compute different results from the same program and input. Here are some examples. Intel processors have 80-bit floating-point registers, and the fastest way to use them is to evaluate all results to 80-bit accuracy until they are stored back to memory in 32-bit or 64-bit format. The IBM RS/6000 has a fused multiply-add instruction that evaluates a+b*c with one rounding error instead of two. The DEC Alpha's default (fast) mode is to flush underflowed values to zero instead of returning subnormal numbers, which is the default demanded by the IEEE standard;     in this mode the DEC Alpha aborts if it encounters a subnormal number. In all these cases machines may be made to operate absolutely identically, for example, by rounding all intermediate results back to single or double on an Intel machine, or by doing subnormal arithmetic carefully and slowly on a DEC Alpha. These heterogeneities lead to errors encountered only in parallel computing; see section 6.2 for further discussion.

  table4445
Table 6.1: Values of machine parameters in IEEE floating-point arithmetic

As stated above, we will ignore underflow in discussing error bounds. Reference [37] discusses extending error bounds to include underflow and shows that for many common computations, when underflow occurs, it is less significant than roundoff.

Overflow historically resulted in an error message and stopped execution, in which case our error bounds would not apply. But with IEEE floating-point arithmetic,   the default is that overflow returns tex2html_wrap_inline17191 and execution continues. Indeed, with IEEE arithmetic machines can continue to compute past overflows, even division by zero, square roots of negative numbers, etc., by producing tex2html_wrap_inline17191 and NaN (``Not a Number'') symbols according to special rules of arithmetic. The default on many systems is to continue computing with these symbols. Routine PxLAIECT exploits this arithmetic to accelerate the computations of eigenvalues, as discussed above. It is also possible to stop with an error message when overflow occurs, a feature that is often useful for debugging. The user should consult the system manual to see how to turn error messages on or off.                            

Most of our error bounds will simply be proportional to relative machine precision (epsilon). This means, for example, that if the same problem in solved in double precision and single precision, the error bound in double precision will be smaller than the error bound in single precision by a factor of tex2html_wrap_inline17278. In IEEE arithmetic, this ratio is tex2html_wrap_inline17280, meaning that one expects the double-precision answer to have approximately nine more decimal digits correct than the single-precision answer.

Like their counterparts in LAPACK. ScaLAPACK routines are generally insensitive to the details of rounding, provided all processes perform arithmetic identically. The one exception is PxLAIECT, as mentioned above. The next section discusses what can happen when processes do not perform arithmetic identically, that is, are heterogeneous.


next up previous contents index
Next: New Sources of Error Up: Accuracy and Stability Previous: Accuracy and Stability

Susan Blackford
Tue May 13 09:21:01 EDT 1997