Error Bounds for Fast Level 3 BLAS

The Level 3 BLAS specifications [40] define the input, output,
and calling sequence for each routine, but leave the implementation
free, subject to the requirement that the routines be
numerically stable.
Level 3 BLAS implementations can therefore be
built using matrix multiplication algorithms that achieve a more
favorable operation count (for suitable dimensions) than the standard
multiplication technique, provided that these "fast" algorithms are
numerically stable. The simplest fast matrix multiplication
technique is Strassen's method, which can
multiply two ** n**-by-

The effect on the results in this chapter of using a fast Level 3 BLAS
implementation can be explained as follows. In general, reasonably
implemented fast Level 3 BLAS preserve all the bounds presented here
(except those at the end of subsection 4.10), but the constant
**p(n)** may increase somewhat. Also, the iterative refinement
routine xyyRFS may take more steps to converge.
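
As a rough illustration of the kind of fast algorithm discussed above, the following Python sketch implements Strassen's method for square matrices whose order is a power of two, and measures the normwise error of a computed product against an exactly computed one. It is not how any Level 3 BLAS library is implemented. The function names (`strassen`, `scaled_error`), the `cutoff` parameter, and the choice to scale the error by the product of the infinity norms of the inputs are all assumptions made for this example.

```python
import random
from fractions import Fraction


def matmul(A, B):
    """Conventional matrix product (standard triple loop)."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[l] * B[l][j] for l in range(inner)) for j in range(cols)]
            for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(A):
    """Split a square matrix of even order into its four quadrants."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])


def strassen(A, B, cutoff=8):
    """Strassen product of two n-by-n matrices; n is assumed a power of two.

    Blocks of order <= cutoff fall back to the conventional product.
    """
    n = len(A)
    if n <= cutoff:
        return matmul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of the conventional eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


def norm_inf(A):
    """Infinity norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)


def scaled_error(C_hat, A, B):
    """Infinity norm of C_hat - A*B, divided by norm(A) * norm(B).

    The reference product is formed exactly in rational arithmetic,
    so only the rounding errors in C_hat are measured.
    """
    FA = [[Fraction(x) for x in row] for row in A]
    FB = [[Fraction(x) for x in row] for row in B]
    exact = matmul(FA, FB)
    diff = [[Fraction(c) - e for c, e in zip(rc, re)]
            for rc, re in zip(C_hat, exact)]
    return float(norm_inf(diff)) / (norm_inf(A) * norm_inf(B))


if __name__ == "__main__":
    random.seed(0)
    n = 16
    A = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    B = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    print("conventional:", scaled_error(matmul(A, B), A, B))
    print("Strassen:    ", scaled_error(strassen(A, B, cutoff=2), A, B))
```

Comparing the two printed values on the same inputs gives a feel for how the error of a fast method relates to that of the conventional product, though a single random experiment says nothing about worst-case behavior.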

By reasonably implemented fast Level 3 BLAS we mean implementations that satisfy the following conditions.
Here, **c_{i}** denotes a constant that depends on the specified matrix dimensions.

(1) If **A** is

(2) The computed solution to the triangular systems **TX=B**, where

For conventional Level 3 BLAS implementations these conditions hold with

For further details, and references to fast multiplication techniques, see [27].