The effects of two sources of error can be measured by the bounds in
this chapter: *roundoff error* and *input error*. Roundoff error
arises from rounding results of floating-point operations during
the algorithm.
Input error is error in the input to the algorithm from prior calculations or
measurements. We describe roundoff error first, and then input error.

Almost all the error bounds ScaLAPACK provides are multiples of
*relative machine precision*,
which we abbreviate
by . Relative machine precision (epsilon) bounds the roundoff in
individual floating-point operations.
It may be loosely defined as the largest relative error
in any floating-point operation that neither overflows nor underflows.
(Overflow means the result is too
large to represent accurately, and underflow means the result is too
small to represent accurately.) Relative machine precision (epsilon) is
available either by
the function call
PSLAMCH(ICTXT, 'Epsilon') (or simply
PSLAMCH(ICTXT, 'E')) in single
precision, or
by the function call PDLAMCH(ICTXT, 'Epsilon') (or PDLAMCH(ICTXT, 'E'))
in double precision.
See section 6.1 and
Table 6.1 for a discussion of common values of machine
epsilon.

PDLAMCH(ICTXT,`E') returns a single value for the selected machine parameter `E' on all processes within the context ICTXT. If these processes are running on a network of heterogeneous processors , with different floating-point arithmetics, then a ``safe'' common value is returned, the maximum value of machine epsilon for all the processors.

In case of overflow, there are two common system responses: stopping with an error message, or returning and continuing to compute. The latter is the default response of IEEE standard floating-point arithmetic [7, 8] , the most commonly used arithmetic. It is possible to change this default to abort with an error message, which is often useful for debugging.

In contrast to LAPACK, ScaLAPACK can take advantage of arithmetic with to accelerate the routines that compute eigenvalues of symmetric matrices using PxLAIECT (the drivers PxSYEVX and PxSYGVX, and their complex counterparts). PxLAIECT comes in two different versions, one in which arithmetic with is available (the default) and one in which it is not. When is available, the inner loop of PxLAIECT is accelerated by removing a branch to test for and avoid division by zero. This speed advantage is realized only when arithmetic with is as fast as arithmetic with normalized floating-point numbers; this is usually but not always the case [42]. The compile time flag NO_IEEE can be used during installation to run without using ; see the ScaLAPACK Installation Guide for details [24].

Since underflow is almost always less significant than roundoff, we will not consider it further in this section (but see section 6.1).

Bounds on *input errors*
may be easily incorporated
into most ScaLAPACK error bounds.
Suppose the input data is accurate to, say, five decimal digits
(we discuss exactly what
this means in section 6.3). Then one simply replaces
by in the error bounds.

**Further Details: Floating-Point Arithmetic**

Roundoff error is bounded in terms of the *relative machine precision*
,
which is the smallest value satisfying

where *a* and *b* are floating-point numbers ,
is any one of the four operations +, -, , and ,
and is the floating-point result of .
Relative machine precision, , is the smallest value for which this inequality
is true for all , and for all *a* and *b* such that
is neither too large (magnitude exceeds the overflow
threshold)
nor too small
(is nonzero with magnitude less than the underflow threshold)
to be represented accurately in the machine.
We also assume bounds the relative error in unary
operations such as square root:

A precise characterization of depends on the details of the
machine arithmetic and sometimes even of the compiler.
For example, if addition and
subtraction are implemented without a guard digit,
we must redefine to be the smallest number
such that

In order to assure portability , machine parameters such as relative machine precision (epsilon), the overflow threshold and underflow threshold are computed at runtime by the auxiliary routine PxLAMCH. The alternative, keeping a fixed table of machine parameter values, would degrade portability because the table would have to be changed when moving from one machine or combination of machines, or even one compiler, to another.

Most machines (but not yet all) do have the same machine parameters because they implement IEEE Standard Floating Point Arithmetic [7, 8], which exactly specifies floating-point number representations and operations. For these machines, including all modern workstations and PCs, the values of these parameters are given in Table 6.1.

Unfortunately, machines claiming to implement IEEE arithmetic may still
compute different results from the same program and input.
Here are some examples.
Intel processors have 80-bit floating-point registers, and the fastest
way to use them is to evaluate all results to 80-bit accuracy until they
are stored back to memory in 32-bit or 64-bit format.
The IBM RS/6000 has a fused multiply-add instruction that evaluates
*a*+*b***c* with one rounding error instead of two. The DEC Alpha's default (fast)
mode is to flush underflowed values to zero instead of returning
subnormal numbers, which is the default demanded by the IEEE standard;
in this mode the DEC Alpha aborts if it encounters a subnormal number.
In all these cases machines may be made to operate absolutely identically,
for example, by rounding all intermediate results back to single or double on
an Intel machine, or by doing subnormal arithmetic carefully and slowly
on a DEC Alpha.
These heterogeneities lead to errors encountered only in parallel computing;
see section 6.2
for further discussion.

**Table 6.1:** Values of machine parameters in IEEE floating-point arithmetic

As stated above, we will ignore underflow in discussing error bounds. Reference [37] discusses extending error bounds to include underflow and shows that for many common computations, when underflow occurs, it is less significant than roundoff.

Overflow historically resulted in an error message and stopped execution, in which case our error bounds would not apply. But with IEEE floating-point arithmetic, the default is that overflow returns and execution continues. Indeed, with IEEE arithmetic machines can continue to compute past overflows, even division by zero, square roots of negative numbers, etc., by producing and NaN (``Not a Number'') symbols according to special rules of arithmetic. The default on many systems is to continue computing with these symbols. Routine PxLAIECT exploits this arithmetic to accelerate the computations of eigenvalues, as discussed above. It is also possible to stop with an error message when overflow occurs, a feature that is often useful for debugging. The user should consult the system manual to see how to turn error messages on or off.

Most of our error bounds will simply be proportional to relative machine precision (epsilon). This means, for example, that if the same problem in solved in double precision and single precision, the error bound in double precision will be smaller than the error bound in single precision by a factor of . In IEEE arithmetic, this ratio is , meaning that one expects the double-precision answer to have approximately nine more decimal digits correct than the single-precision answer.

Like their counterparts in LAPACK.
ScaLAPACK routines are generally insensitive to the details of rounding,
provided all processes perform arithmetic identically.
The one exception is PxLAIECT, as mentioned above.
The next section discusses what can happen when processes do not perform
arithmetic identically, that is, are *heterogeneous*.

Tue May 13 09:21:01 EDT 1997