next up previous
Next: Data Distribution Up: LU Factorization Previous: LU Factorization

Derivation of a Block Algorithm for LU Factorization

  Suppose the tex2html_wrap_inline1863 matrix A is partitioned as shown in Figure 5, and we seek a factorization A=LU, where the partitioning of L and U is also shown in Figure 5. Then we may write,
    eqnarray274
where tex2html_wrap_inline1765 is tex2html_wrap_inline1875, tex2html_wrap_inline1877 is tex2html_wrap_inline1879, tex2html_wrap_inline1881 is tex2html_wrap_inline1883, and tex2html_wrap_inline1815 is tex2html_wrap_inline1887. tex2html_wrap_inline1889 and tex2html_wrap_inline1891 are lower triangular matrices with 1s on the main diagonal, and tex2html_wrap_inline1791 and tex2html_wrap_inline1807 are upper triangular matrices.

  figure302
Figure 5: Block LU factorization of the partitioned matrix A. tex2html_wrap_inline1765 is tex2html_wrap_inline1875, tex2html_wrap_inline1877 is tex2html_wrap_inline1879, tex2html_wrap_inline1881 is tex2html_wrap_inline1883, and tex2html_wrap_inline1815 is tex2html_wrap_inline1887. tex2html_wrap_inline1889 and tex2html_wrap_inline1891 are lower triangular matrices with 1's on the main diagonal, and tex2html_wrap_inline1791 and tex2html_wrap_inline1807 are upper triangular matrices.

Equations 1 and 2 taken together perform an LU factorization on the first tex2html_wrap_inline1923 panel of A (i.e., tex2html_wrap_inline1765 and tex2html_wrap_inline1881). Once this is completed, the matrices tex2html_wrap_inline1889, tex2html_wrap_inline1933, and tex2html_wrap_inline1791 are known, and the lower triangular system in Eq. 3 can be solved to give tex2html_wrap_inline1805. Finally, we rearrange Eq. 4 as,
 equation324
From this equation we see that the problem of finding tex2html_wrap_inline1891 and tex2html_wrap_inline1807 reduces to finding the LU factorization of the tex2html_wrap_inline1887 matrix tex2html_wrap_inline1945. This can be done by applying the steps outlined above to tex2html_wrap_inline1945 instead of to A. Repeating these steps K times, where
 equation337
where tex2html_wrap_inline1953 denotes the least integer greater than or equal to x, we obtain the LU factorization of the original tex2html_wrap_inline1863 matrix A. For an in-place algorithm, A is overwritten by L and U - the 1s on the diagonal of L do not need to be stored explicitly. Similarly, when A is updated by Eq. 5 this may also be done in place.

After k of these K steps, the first kr columns of L and the first kr rows of U have been evaluated, and matrix A has been updated to the form shown in Figure 6, in which panel B is tex2html_wrap_inline1987 and C is tex2html_wrap_inline1991. Step k+1 then proceeds as follows,

  1. factor B to form the next panel of L, performing partial pivoting over rows if necessary (see Figure 14). This evaluates the matrices tex2html_wrap_inline1999, tex2html_wrap_inline2001, and tex2html_wrap_inline2003 in Figure 6.
  2. solve the triangular system tex2html_wrap_inline2005 to get the next row of blocks of U.
  3. do a rank-r update on the trailing submatrix E, replacing it with tex2html_wrap_inline2013.

  figure347
Figure 6: Stage k+1 of the block LU factorization algorithm showing how the panels B and C, and the trailing submatrix E are updated. The trapezoidal submatrices L and U have already been factored in previous steps. L has kr columns, and U has kr rows. In the step shown another r columns of L and r rows of U are evaluated.

The LAPACK implementation of this form of LU factorization uses the Level 3 BLAS routines xTRSM and xGEMM to perform the triangular solve and rank-r update. We can regard the algorithm as acting on matrices that have been partitioned into blocks of tex2html_wrap_inline1875 elements, as shown in Figure 7.

  figure353
Figure 7: Block-partitioned matrix A. Each block tex2html_wrap_inline2049 consists of tex2html_wrap_inline1875 matrix elements.


next up previous
Next: Data Distribution Up: LU Factorization Previous: LU Factorization

Jack Dongarra
Sun Feb 9 10:05:05 EST 1997