Name

HPL_pdrpanllN Left-looking recursive panel factorization.

Synopsis

#include <hpl.h>

void HPL_pdrpanllN( HPL_T_panel * PANEL, const int M, const int N, const int ICOFF, double * WORK );

Description

HPL_pdrpanllN recursively factorizes a panel of columns using the recursive Left-looking variant of the one-dimensional algorithm. The lower triangular N0-by-N0 upper block of the panel is stored in no-transpose form (i.e., just like the input matrix itself).

A bi-directional exchange performs the combined swap and broadcast operations for one column of the panel at once, which yields fewer, slightly larger messages than usual. On P processes and assuming bi-directional links, the running time of this function when N equals N0 (the full panel width) can be approximated by

    N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) + N0^2 * ( M - N0/3 ) * gam2-3

where M is the local number of rows of the panel, lat and bdwth are the latency and bandwidth of the network for double precision real words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS rate of execution. The recursion indeed allows the panel factorization to come close to Level 3 BLAS performance. On many modern machines, however, this operation is latency bound, meaning that its cost can be estimated by the latency term alone, N0 * log_2( P ) * lat. Mono-directional links double this communication cost.
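The recursive left-looking structure can be illustrated with a small serial C sketch. It is not HPL's implementation: it works on one column-major m-by-ncol array in local memory, performs no MPI communication, does not use HPL_T_panel, and applies each row swap directly rather than through the distributed swap and broadcast. The names rpanll, update_from_left, swap_rows and lu_residual are hypothetical, and the ndiv and nbmin arguments are assumptions loosely analogous to HPL's NDIV and NBMIN tuning parameters. Each piece of the panel first receives the updates from all pieces to its left (a unit lower triangular solve followed by a matrix multiply, the roles dtrsm and dgemm would play with a BLAS) and is then factored recursively with partial pivoting.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Column-major element (i, j) of an array named A with leading dimension lda. */
#define A_(i, j) A[(size_t)(j) * (size_t)lda + (size_t)(i)]

/* Swap rows r1 and r2 across all ncol columns of the panel, so the already
   factored L columns and the not yet updated columns stay consistent. */
static void swap_rows(double *A, int lda, int ncol, int r1, int r2) {
    if (r1 == r2) return;
    for (int j = 0; j < ncol; ++j) {
        double t = A_(r1, j);
        A_(r1, j) = A_(r2, j);
        A_(r2, j) = t;
    }
}

/* Left-looking update of columns [s, s+w) by the factored columns [c0, s):
   a unit lower triangular solve on rows [c0, s) (the dtrsm step), then
   A(s:m, s:s+w) -= L(s:m, c0:s) * U(c0:s, s:s+w) (the dgemm step). */
static void update_from_left(double *A, int m, int lda, int c0, int s, int w) {
    for (int j = s; j < s + w; ++j) {
        for (int k = c0; k < s; ++k)          /* forward substitution */
            for (int i = k + 1; i < s; ++i)
                A_(i, j) -= A_(i, k) * A_(k, j);
        for (int k = c0; k < s; ++k)          /* update of the rows below */
            for (int i = s; i < m; ++i)
                A_(i, j) -= A_(i, k) * A_(k, j);
    }
}

/* Factor columns [c0, c0+n) of the m-by-ncol panel A (m >= ncol), assuming
   they already hold the updates from columns [0, c0). The columns are cut
   into ndiv pieces; each piece gets the left-looking update from the pieces
   to its left, then is factored recursively. Pieces of at most nbmin columns
   are factored one column at a time. On return A holds unit lower L and
   upper U, and ipiv[j] is the row swapped with row j.
   Returns 0 on success, -1 if an exactly zero pivot is found. */
static int rpanll(double *A, int m, int lda, int ncol, int c0, int n,
                  int ndiv, int nbmin, int *ipiv) {
    assert(ndiv >= 2 && nbmin >= 1 && m >= ncol);
    if (n <= nbmin) {
        for (int j = c0; j < c0 + n; ++j) {
            update_from_left(A, m, lda, c0, j, 1);
            int p = j;                        /* partial pivoting */
            for (int i = j + 1; i < m; ++i)
                if (fabs(A_(i, j)) > fabs(A_(p, j))) p = i;
            ipiv[j] = p;
            if (A_(p, j) == 0.0) return -1;
            swap_rows(A, lda, ncol, j, p);
            for (int i = j + 1; i < m; ++i)   /* scale to form column j of L */
                A_(i, j) /= A_(j, j);
        }
        return 0;
    }
    int nb = (n + ndiv - 1) / ndiv;           /* width of each piece */
    for (int s = c0; s < c0 + n; s += nb) {
        int w = (c0 + n - s < nb) ? c0 + n - s : nb;
        update_from_left(A, m, lda, c0, s, w);
        if (rpanll(A, m, lda, ncol, s, w, ndiv, nbmin, ipiv) != 0) return -1;
    }
    return 0;
}

/* Check helper (assumes lda == m): applies the recorded swaps to A0 in
   place, in order, and returns max |(P*A0)(i,j) - (L*U)(i,j)|. */
static double lu_residual(const double *LU, double *A0, int m, int ncol,
                          const int *ipiv) {
    for (int j = 0; j < ncol; ++j) swap_rows(A0, m, ncol, j, ipiv[j]);
    double err = 0.0;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < ncol; ++j) {
            double sum = 0.0;
            for (int k = 0; k <= (i < j ? i : j); ++k) {
                double l = (k == i) ? 1.0 : LU[(size_t)k * m + i]; /* L(i,k) */
                sum += l * LU[(size_t)j * m + k];                 /* U(k,j) */
            }
            double d = fabs(sum - A0[(size_t)j * m + i]);
            if (d > err) err = d;
        }
    return err;
}
```

Calling rpanll(A, m, m, n, 0, n, 2, 1, ipiv) on a tall m-by-n column-major array factors the whole panel; lu_residual can then confirm that the row-permuted copy of the input equals L*U up to rounding.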

Arguments

PANEL   (local input/output)          HPL_T_panel *
        On entry, PANEL points to the data structure containing the
        panel information.
M       (local input)                 const int
        On entry, M specifies the local number of rows of sub(A).
N       (local input)                 const int
        On entry, N specifies the local number of columns of sub(A).
ICOFF   (global input)                const int
        On entry, ICOFF specifies the row and column offset of sub(A)
        in A.
WORK    (local workspace)             double *
        On entry, WORK is a work array of size at least 2*(4+2*N0).

See Also

HPL_dlocmax, HPL_dlocswpN, HPL_dlocswpT, HPL_pdmxswp, HPL_pdpancrN, HPL_pdpancrT, HPL_pdpanllN, HPL_pdpanllT, HPL_pdpanrlN, HPL_pdpanrlT, HPL_pdrpancrN, HPL_pdrpancrT, HPL_pdrpanllT, HPL_pdrpanrlN, HPL_pdrpanrlT, HPL_pdfact