next up previous contents index
Next: Documentation and Software Conventions Up: Accuracy and Stability Previous: Further Details: Error Bounds   Contents   Index


Error Bounds for Fast Level 3 BLAS

The Level 3 BLAS specifications [40] specify the input, output and calling sequence for each routine, but allow freedom of implementation, subject to the requirement that the routines be numerically stable. Level 3 BLAS implementations can therefore be built using matrix multiplication algorithms that achieve a more favorable operation count (for suitable dimensions) than the standard multiplication technique, provided that these ``fast'' algorithms are numerically stable. The simplest fast matrix multiplication technique is Strassen's method, which can multiply two n-by-n matrices in fewer than $4.7 n^{\log_2 7}$ operations, where $\log_2 7 \approx 2.807$.

The effect on the results in this chapter of using a fast Level 3 BLAS implementation can be explained as follows. In general, reasonably implemented fast Level 3 BLAS preserve all the bounds presented here (except those at the end of subsection 4.10), but the constant p(n) may increase somewhat. Also, the iterative refinement routine xyyRFS may take more steps to converge.

This is what we mean by reasonably implemented fast Level 3 BLAS. Here, ci denotes a constant depending on the specified matrix dimensions.

(1) If A is m-by-n, B is n-by-p and $\widehat C$ is the computed approximation to C=AB, then

\begin{displaymath}
\Vert \widehat C- AB \Vert _{\infty} \le c_1(m,n,p) \epsilon \Vert A\Vert _{\infty}\Vert B\Vert _{\infty} + O(\epsilon ^2).
\end{displaymath}

(2) The computed solution $\widehat{X}$ to the triangular systems TX=B, where T is m-by-m and B is m-by-p, satisfies

\begin{displaymath}
\Vert T \widehat X- B \Vert _{\infty} \le c_2(m,p) \epsilon...
...rt _{\infty} \Vert\widehat X\Vert _{\infty}
+ O(\epsilon ^2).
\end{displaymath}

For conventional Level 3 BLAS implementations these conditions hold with c1(m,n,p) = n2 and c2(m,p)= m(m+1). Strassen's method satisfies these bounds for slightly larger c1 and c2.

For further details, and references to fast multiplication techniques, see [27].


next up previous contents index
Next: Documentation and Software Conventions Up: Accuracy and Stability Previous: Further Details: Error Bounds   Contents   Index
Susan Blackford
1999-10-01