LU Factorization   Next: QR Factorization Up: Factorization Routines Previous: Factorization Routines

## LU Factorization

The LU factorization applies a sequence of Gaussian eliminations to form , where and are matrices, and is an matrix. is unit lower triangular (lower triangular with 1's on the main diagonal), is upper triangular, and is a permutation matrix, which is stored in a vector.

At the -th step of the computation ( ), it is assumed that the submatrix of ( , ) is to be partitioned as follows, where the block is , is , is , and is . is a unit lower triangular matrix, and is an upper triangular matrix.

At first, a sequence of Gaussian eliminations is performed on the first panel of (i.e., and ). Once this is completed, the matrices , , and are known, and we can rearrange the block equations, The LU factorization can be done by recursively applying the steps outlined above to the matrix . Figure 3 shows a snapshot of the block LU factorization. It shows how the column panel, and , and the row panel, and , are computed, and how the trailing submatrix is updated. In the figure, the shaded areas represent data for which the corresponding computations are completed. Later, row interchanges will be applied to and . Figure 3: A snapshot of block LU factorization.

The computation of the above steps in the LAPACK routine, DGETRF, involves the following operations:

1. DGETF2: Apply the LU factorization on an column panel of (i.e., and ).

• [ Repeat times ( ) ]

• IDAMAX: find the (absolute) maximum element of the -th column and its location

• DSWAP: interchange the -th row with the row which holds the maximum

• DSCAL: scale the -th column of the matrix

• DGER: update the trailing submatrix

2. DLASWP: Apply row interchanges to the left and the right of the panel.

3. DTRSM: Compute the row panel of , 4. DGEMM: Update the rest of the matrix, , The corresponding parallel implementation of the ScaLAPACK routine, PDGETRF, proceeds as follows:

1. PDGETF2: The current column of processes performs the LU factorization on an panel of (i.e., and ).

• [ Repeat times ( ) ]

• PDAMAX: find the (absolute) maximum value of the -th column and its location (pivot information will be stored on the column of processes)

• PDLASWP: interchange the -th row with the row which holds the maximum

• PDSCAL: scale the -th column of the matrix

• PDGER: broadcast the -th row columnwise ( elements) in the current column of processes and update the trailing submatrix

• Every process in the current process column broadcasts the same pivot information rowwise to all columns of processes.

2. PDLASWP: All processes apply row interchanges to the left and the right of the current panel.

3. PDTRSM: is broadcast along the current row of processes, which converts the row panel to .

4. PDGEMM: The column panel is broadcast rowwise across all columns of processes. The row panel is broadcast columnwise down all rows of processes. Then, all processes update their local portions of the matrix, .   Next: QR Factorization Up: Factorization Routines Previous: Factorization Routines

Susan Ostrouchov
Fri Apr 28 09:37:26 EDT 1995