   Next: Data Distribution Up: LU Factorization Previous: LU Factorization

## Derivation of a Block Algorithm for LU Factorization

Suppose the matrix A is partitioned as shown in Figure 5, and we seek a factorization A=LU, where the partitioning of L and U is also shown in Figure 5. Then we may write, where is , is , is , and is . and are lower triangular matrices with 1s on the main diagonal, and and are upper triangular matrices. Figure 5: Block LU factorization of the partitioned matrix A. is , is , is , and is . and are lower triangular matrices with 1's on the main diagonal, and and are upper triangular matrices.

Equations 1 and 2 taken together perform an LU factorization on the first panel of A (i.e., and ). Once this is completed, the matrices , , and are known, and the lower triangular system in Eq. 3 can be solved to give . Finally, we rearrange Eq. 4 as, From this equation we see that the problem of finding and reduces to finding the LU factorization of the matrix . This can be done by applying the steps outlined above to instead of to A. Repeating these steps K times, where where denotes the least integer greater than or equal to x, we obtain the LU factorization of the original matrix A. For an in-place algorithm, A is overwritten by L and U - the 1s on the diagonal of L do not need to be stored explicitly. Similarly, when A is updated by Eq. 5 this may also be done in place.

After k of these K steps, the first kr columns of L and the first kr rows of U have been evaluated, and matrix A has been updated to the form shown in Figure 6, in which panel B is and C is . Step k+1 then proceeds as follows,

1. factor B to form the next panel of L, performing partial pivoting over rows if necessary (see Figure 14). This evaluates the matrices , , and in Figure 6.
2. solve the triangular system to get the next row of blocks of U.
3. do a rank-r update on the trailing submatrix E, replacing it with . Figure 6: Stage k+1 of the block LU factorization algorithm showing how the panels B and C, and the trailing submatrix E are updated. The trapezoidal submatrices L and U have already been factored in previous steps. L has kr columns, and U has kr rows. In the step shown another r columns of L and r rows of U are evaluated.

The LAPACK implementation of this form of LU factorization uses the Level 3 BLAS routines xTRSM and xGEMM to perform the triangular solve and rank-r update. We can regard the algorithm as acting on matrices that have been partitioned into blocks of elements, as shown in Figure 7. Figure 7: Block-partitioned matrix A. Each block consists of matrix elements.   Next: Data Distribution Up: LU Factorization Previous: LU Factorization

Jack Dongarra
Sun Feb 9 10:05:05 EST 1997