next up previous
Next: Examples of Block Algorithms Up: Block Algorithms and Their Previous: Block Algorithms and Their

Deriving a Block Algorithm

 

To derive a block form of Cholesky factorization, we partition the matrices as shown in Figure 4, in which the diagonal blocks of A and U are square, but of differing sizes. We assume that the first block has already been factored as tex2html_wrap_inline1801, and that we now want to determine the second block column of U consisting of the blocks tex2html_wrap_inline1805 and tex2html_wrap_inline1807. Equating submatrices in the second block of columns, we obtain
eqnarray167
Hence, since tex2html_wrap_inline1791 has already been computed, we can compute tex2html_wrap_inline1805 as the solution to the equation
eqnarray179
by a call to the Level 3 BLAS routine STRSM; and then we can compute tex2html_wrap_inline1807 from
eqnarray185

This involves first updating the symmetric submatrix tex2html_wrap_inline1815 by a call to the Level 3 BLAS routine SSYRK, and then computing its Cholesky factorization. Since Fortran 77 does not allow recursion, a separate routine must be called (using Level 2 BLAS rather than Level 3), named SPOTF2 in Figure 3. In this way, successive blocks of columns of U are computed. The LAPACK-style code for the block algorithm is shown in Figure 3. This code runs at 49 megaflops on an IBM 3090, more than double the speed of the LINPACK code. On a CRAY Y-MP, the use of Level 3 BLAS squeezes a little more performance out of one processor, but makes a large improvement when using all 8 processors. Table 2 summarizes the results.

      figure196
Figure 3: The body of the ``LAPACK-style'' routine SPOFA for block Cholesky factorization. In this code fragment, nb denotes the width of the blocks.
Figure 2: The body of the ``LAPACK-style'' routine SPOFA for Cholesky factorization.
Figure 1: The body of the LINPACK routine SPOFA for Cholesky factorization.

  figure212
Figure 4: Partitioning of A, tex2html_wrap_inline1821, and U into blocks. It is assumed that the first block has already been factored as tex2html_wrap_inline1801, and we next want to determine the block column consisting of tex2html_wrap_inline1805 and tex2html_wrap_inline1807. Note that the diagonal blocks of A and U are square matrices.

  table221
Table 2: Speed (Megaflops) of Cholesky Factorization tex2html_wrap_inline1761 for n = 500


next up previous
Next: Examples of Block Algorithms Up: Block Algorithms and Their Previous: Block Algorithms and Their

Jack Dongarra
Sun Feb 9 10:05:05 EST 1997