To derive a block form of Cholesky
factorization, we partition the matrices as shown in Figure
4, in which the diagonal blocks of *A* and *U* are square,
but of differing sizes. We assume that the first block has already been
factored as $A_{11} = U_{11}^T U_{11}$, and that we now want to determine
the second block column of *U* consisting of the blocks $U_{12}$ and
$U_{22}$.
Equating submatrices in the second block of columns, we obtain

$$A_{12} = U_{11}^T U_{12}, \qquad A_{22} = U_{12}^T U_{12} + U_{22}^T U_{22}.$$
Hence, since $U_{11}$ has already been computed, we can compute $U_{12}$ as
the solution to the equation

$$U_{11}^T U_{12} = A_{12}$$
by a call to the Level 3 BLAS routine STRSM; and then we can compute $U_{22}$
from

$$U_{22}^T U_{22} = A_{22} - U_{12}^T U_{12}.$$
This involves first updating the symmetric submatrix $A_{22}$ (forming
$A_{22} - U_{12}^T U_{12}$) by a call to the Level 3 BLAS routine SSYRK,
and then computing its Cholesky factorization.
Since Fortran 77 does not allow recursion, this factorization is performed by
a separate routine that uses Level 2 rather than Level 3 BLAS, named SPOTF2
in Figure 3.
In this way, successive blocks of columns of *U* are computed.
The LAPACK-style code for the block algorithm is shown in Figure
3.
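The three steps above can be sketched in pure Python (the original routine is in Fortran 77). This is a minimal illustration, not the LAPACK implementation: the function names `chol_unblocked` and `chol_blocked` are made up for this sketch, and the loops stand in for what LAPACK delegates to STRSM, SSYRK, and SPOTF2.

```python
import math

def chol_unblocked(A):
    """Unblocked Cholesky (the role SPOTF2 plays): return the upper
    triangular U with A = U^T U, for a symmetric positive definite A
    given as a list of rows."""
    n = len(A)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entry: subtract the contributions of the rows above.
        U[i][i] = math.sqrt(A[i][i] - sum(U[k][i] ** 2 for k in range(i)))
        # Remaining entries of row i of U.
        for j in range(i + 1, n):
            U[i][j] = (A[i][j]
                       - sum(U[k][i] * U[k][j] for k in range(i))) / U[i][i]
    return U

def chol_blocked(A, nb):
    """Blocked Cholesky: process block columns of width nb, following the
    three steps described in the text."""
    n = len(A)
    U = [[0.0] * n for _ in range(n)]
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        # Step 1 (STRSM's role): solve U11^T U12 = A12 for U12 by forward
        # substitution; U11 = U[:j0][:j0] was computed in earlier steps.
        for c in range(j0, j1):
            for i in range(j0):
                s = A[i][c] - sum(U[k][i] * U[k][c] for k in range(i))
                U[i][c] = s / U[i][i]
        # Step 2 (SSYRK's role): form the updated block A22 - U12^T U12.
        S = [[A[j0 + i][j0 + c]
              - sum(U[k][j0 + i] * U[k][j0 + c] for k in range(j0))
              for c in range(j1 - j0)] for i in range(j1 - j0)]
        # Step 3 (SPOTF2's role): factor the small block S = U22^T U22.
        U22 = chol_unblocked(S)
        for i in range(j1 - j0):
            for c in range(i, j1 - j0):
                U[j0 + i][j0 + c] = U22[i][c]
    return U
```

Note that the last block column may be narrower than `nb` when `nb` does not divide *n*, which is why the diagonal blocks in the partition are square but of differing sizes.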
This code runs at 49 megaflops on an IBM 3090, more than double the speed of
the LINPACK code.
On a CRAY Y-MP, the use of Level 3 BLAS squeezes a little more performance
out of one processor, but makes a large improvement when using all 8
processors.
Table 2 summarizes the results.

**Figure 3:** The body of the ``LAPACK-style'' routine SPOFA for block Cholesky
factorization. In this code fragment, `nb` denotes the width of
the blocks.

**Figure 2:** The body of the ``LAPACK-style'' routine SPOFA for Cholesky
factorization.

**Figure 1:** The body of the LINPACK routine SPOFA for Cholesky factorization.

**Figure 4:** Partitioning of *A*, $U^T$, and *U* into blocks. It is assumed that
the first block has already been factored as $A_{11} = U_{11}^T U_{11}$, and we
next want to determine the block column consisting of $U_{12}$ and $U_{22}$.
Note that the diagonal blocks of *A* and *U* are square matrices.

**Table 2:** Speed (Megaflops) of Cholesky Factorization for *n* = 500
