|
ScaLAPACK
2.0.2
ScaLAPACK: Scalable Linear Algebra PACKage
|
00001 SUBROUTINE PDLARF( SIDE, M, N, V, IV, JV, DESCV, INCV, TAU, 00002 $ C, IC, JC, DESCC, WORK ) 00003 * 00004 * -- ScaLAPACK auxiliary routine (version 1.7) -- 00005 * University of Tennessee, Knoxville, Oak Ridge National Laboratory, 00006 * and University of California, Berkeley. 00007 * May 25, 2001 00008 * 00009 * .. Scalar Arguments .. 00010 CHARACTER SIDE 00011 INTEGER IC, INCV, IV, JC, JV, M, N 00012 * .. 00013 * .. Array Arguments .. 00014 INTEGER DESCC( * ), DESCV( * ) 00015 DOUBLE PRECISION C( * ), TAU( * ), V( * ), WORK( * ) 00016 * .. 00017 * 00018 * Purpose 00019 * ======= 00020 * 00021 * PDLARF applies a real elementary reflector Q (or Q**T) to a real 00022 * M-by-N distributed matrix sub( C ) = C(IC:IC+M-1,JC:JC+N-1), from 00023 * either the left or the right. Q is represented in the form 00024 * 00025 * Q = I - tau * v * v' 00026 * 00027 * where tau is a real scalar and v is a real vector. 00028 * 00029 * If tau = 0, then Q is taken to be the unit matrix. 00030 * 00031 * Notes 00032 * ===== 00033 * 00034 * Each global data object is described by an associated description 00035 * vector. This vector stores the information required to establish 00036 * the mapping between an object element and its corresponding process 00037 * and memory location. 00038 * 00039 * Let A be a generic term for any 2D block cyclicly distributed array. 00040 * Such a global array has an associated description vector DESCA. 00041 * In the following comments, the character _ should be read as 00042 * "of the global array". 00043 * 00044 * NOTATION STORED IN EXPLANATION 00045 * --------------- -------------- -------------------------------------- 00046 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 00047 * DTYPE_A = 1. 00048 * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 00049 * the BLACS process grid A is distribu- 00050 * ted over. The context itself is glo- 00051 * bal, but the handle (the integer 00052 * value) may vary. 00053 * M_A (global) DESCA( M_ ) The number of rows in the global 00054 * array A. 00055 * N_A (global) DESCA( N_ ) The number of columns in the global 00056 * array A. 00057 * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 00058 * the rows of the array. 00059 * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 00060 * the columns of the array. 00061 * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 00062 * row of the array A is distributed. 00063 * CSRC_A (global) DESCA( CSRC_ ) The process column over which the 00064 * first column of the array A is 00065 * distributed. 00066 * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 00067 * array. LLD_A >= MAX(1,LOCr(M_A)). 00068 * 00069 * Let K be the number of rows or columns of a distributed matrix, 00070 * and assume that its process grid has dimension p x q. 00071 * LOCr( K ) denotes the number of elements of K that a process 00072 * would receive if K were distributed over the p processes of its 00073 * process column. 00074 * Similarly, LOCc( K ) denotes the number of elements of K that a 00075 * process would receive if K were distributed over the q processes of 00076 * its process row. 00077 * The values of LOCr() and LOCc() may be determined via a call to the 00078 * ScaLAPACK tool function, NUMROC: 00079 * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 00080 * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 00081 * An upper bound for these quantities may be computed by: 00082 * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 00083 * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 00084 * 00085 * Because vectors may be viewed as a subclass of matrices, a 00086 * distributed vector is considered to be a distributed matrix. 00087 * 00088 * Restrictions 00089 * ============ 00090 * 00091 * If SIDE = 'Left' and INCV = 1, then the row process having the first 00092 * entry V(IV,JV) must also have the first row of sub( C ). Moreover, 00093 * MOD(IV-1,MB_V) must be equal to MOD(IC-1,MB_C), if INCV=M_V, only 00094 * the last equality must be satisfied. 00095 * 00096 * If SIDE = 'Right' and INCV = M_V then the column process having the 00097 * first entry V(IV,JV) must also have the first column of sub( C ) and 00098 * MOD(JV-1,NB_V) must be equal to MOD(JC-1,NB_C), if INCV = 1 only the 00099 * last equality must be satisfied. 00100 * 00101 * Arguments 00102 * ========= 00103 * 00104 * SIDE (global input) CHARACTER 00105 * = 'L': form Q * sub( C ), 00106 * = 'R': form sub( C ) * Q, Q = Q**T. 00107 * 00108 * M (global input) INTEGER 00109 * The number of rows to be operated on i.e the number of rows 00110 * of the distributed submatrix sub( C ). M >= 0. 00111 * 00112 * N (global input) INTEGER 00113 * The number of columns to be operated on i.e the number of 00114 * columns of the distributed submatrix sub( C ). N >= 0. 00115 * 00116 * V (local input) DOUBLE PRECISION pointer into the local memory 00117 * to an array of dimension (LLD_V,*) containing the local 00118 * pieces of the distributed vectors V representing the 00119 * Householder transformation Q, 00120 * V(IV:IV+M-1,JV) if SIDE = 'L' and INCV = 1, 00121 * V(IV,JV:JV+M-1) if SIDE = 'L' and INCV = M_V, 00122 * V(IV:IV+N-1,JV) if SIDE = 'R' and INCV = 1, 00123 * V(IV,JV:JV+N-1) if SIDE = 'R' and INCV = M_V, 00124 * 00125 * The vector v in the representation of Q. V is not used if 00126 * TAU = 0. 00127 * 00128 * IV (global input) INTEGER 00129 * The row index in the global array V indicating the first 00130 * row of sub( V ). 00131 * 00132 * JV (global input) INTEGER 00133 * The column index in the global array V indicating the 00134 * first column of sub( V ). 00135 * 00136 * DESCV (global and local input) INTEGER array of dimension DLEN_. 00137 * The array descriptor for the distributed matrix V. 00138 * 00139 * INCV (global input) INTEGER 00140 * The global increment for the elements of V. Only two values 00141 * of INCV are supported in this version, namely 1 and M_V. 00142 * INCV must not be zero. 00143 * 00144 * TAU (local input) DOUBLE PRECISION array, dimension LOCc(JV) if 00145 * INCV = 1, and LOCr(IV) otherwise. This array contains the 00146 * Householder scalars related to the Householder vectors. 00147 * TAU is tied to the distributed matrix V. 00148 * 00149 * C (local input/local output) DOUBLE PRECISION pointer into the 00150 * local memory to an array of dimension (LLD_C, LOCc(JC+N-1) ), 00151 * containing the local pieces of sub( C ). On exit, sub( C ) 00152 * is overwritten by the Q * sub( C ) if SIDE = 'L', or 00153 * sub( C ) * Q if SIDE = 'R'. 00154 * 00155 * IC (global input) INTEGER 00156 * The row index in the global array C indicating the first 00157 * row of sub( C ). 00158 * 00159 * JC (global input) INTEGER 00160 * The column index in the global array C indicating the 00161 * first column of sub( C ). 00162 * 00163 * DESCC (global and local input) INTEGER array of dimension DLEN_. 00164 * The array descriptor for the distributed matrix C. 00165 * 00166 * WORK (local workspace) DOUBLE PRECISION array, dimension (LWORK) 00167 * If INCV = 1, 00168 * if SIDE = 'L', 00169 * if IVCOL = ICCOL, 00170 * LWORK >= NqC0 00171 * else 00172 * LWORK >= MpC0 + MAX( 1, NqC0 ) 00173 * end if 00174 * else if SIDE = 'R', 00175 * LWORK >= NqC0 + MAX( MAX( 1, MpC0 ), NUMROC( NUMROC( 00176 * N+ICOFFC,NB_V,0,0,NPCOL ),NB_V,0,0,LCMQ ) ) 00177 * end if 00178 * else if INCV = M_V, 00179 * if SIDE = 'L', 00180 * LWORK >= MpC0 + MAX( MAX( 1, NqC0 ), NUMROC( NUMROC( 00181 * M+IROFFC,MB_V,0,0,NPROW ),MB_V,0,0,LCMP ) ) 00182 * else if SIDE = 'R', 00183 * if IVROW = ICROW, 00184 * LWORK >= MpC0 00185 * else 00186 * LWORK >= NqC0 + MAX( 1, MpC0 ) 00187 * end if 00188 * end if 00189 * end if 00190 * 00191 * where LCM is the least common multiple of NPROW and NPCOL and 00192 * LCM = ILCM( NPROW, NPCOL ), LCMP = LCM / NPROW, 00193 * LCMQ = LCM / NPCOL, 00194 * 00195 * IROFFC = MOD( IC-1, MB_C ), ICOFFC = MOD( JC-1, NB_C ), 00196 * ICROW = INDXG2P( IC, MB_C, MYROW, RSRC_C, NPROW ), 00197 * ICCOL = INDXG2P( JC, NB_C, MYCOL, CSRC_C, NPCOL ), 00198 * MpC0 = NUMROC( M+IROFFC, MB_C, MYROW, ICROW, NPROW ), 00199 * NqC0 = NUMROC( N+ICOFFC, NB_C, MYCOL, ICCOL, NPCOL ), 00200 * 00201 * ILCM, INDXG2P and NUMROC are ScaLAPACK tool functions; 00202 * MYROW, MYCOL, NPROW and NPCOL can be determined by calling 00203 * the subroutine BLACS_GRIDINFO. 00204 * 00205 * Alignment requirements 00206 * ====================== 00207 * 00208 * The distributed submatrices V(IV:*, JV:*) and C(IC:IC+M-1,JC:JC+N-1) 00209 * must verify some alignment properties, namely the following 00210 * expressions should be true: 00211 * 00212 * MB_V = NB_V, 00213 * 00214 * If INCV = 1, 00215 * If SIDE = 'Left', 00216 * ( MB_V.EQ.MB_C .AND. IROFFV.EQ.IROFFC .AND. IVROW.EQ.ICROW ) 00217 * If SIDE = 'Right', 00218 * ( MB_V.EQ.NB_A .AND. MB_V.EQ.NB_C .AND. IROFFV.EQ.ICOFFC ) 00219 * else if INCV = M_V, 00220 * If SIDE = 'Left', 00221 * ( MB_V.EQ.NB_V .AND. MB_V.EQ.MB_C .AND. ICOFFV.EQ.IROFFC ) 00222 * If SIDE = 'Right', 00223 * ( NB_V.EQ.NB_C .AND. ICOFFV.EQ.ICOFFC .AND. IVCOL.EQ.ICCOL ) 00224 * end if 00225 * 00226 * ===================================================================== 00227 * 00228 * .. Parameters .. 00229 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 00230 $ LLD_, MB_, M_, NB_, N_, RSRC_ 00231 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 00232 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 00233 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 00234 DOUBLE PRECISION ONE, ZERO 00235 PARAMETER ( ONE = 1.0D+0, ZERO = 0.0D+0 ) 00236 * .. 00237 * .. Local Scalars .. 00238 LOGICAL CCBLCK, CRBLCK 00239 CHARACTER COLBTOP, ROWBTOP 00240 INTEGER ICCOL, ICOFF, ICROW, ICTXT, IIC, IIV, IOFFC, 00241 $ IOFFV, IPW, IROFF, IVCOL, IVROW, JJC, JJV, LDC, 00242 $ LDV, MYCOL, MYROW, MP, NCC, NCV, NPCOL, NPROW, 00243 $ NQ, RDEST 00244 DOUBLE PRECISION TAULOC 00245 * .. 00246 * .. External Subroutines .. 00247 EXTERNAL BLACS_GRIDINFO, DCOPY, DGEBR2D, DGEBS2D, 00248 $ DGEMV, DGER, DGERV2D, DGESD2D, 00249 $ DGSUM2D, DLASET, INFOG2L, PB_TOPGET, 00250 $ PBDTRNV 00251 * .. 00252 * .. External Functions .. 00253 LOGICAL LSAME 00254 INTEGER NUMROC 00255 EXTERNAL LSAME, NUMROC 00256 * .. 00257 * .. Intrinsic Functions .. 00258 INTRINSIC MIN, MOD 00259 * .. 00260 * .. Executable Statements .. 00261 * 00262 * Quick return if possible 00263 * 00264 IF( M.LE.0 .OR. N.LE.0 ) 00265 $ RETURN 00266 * 00267 * Get grid parameters. 00268 * 00269 ICTXT = DESCC( CTXT_ ) 00270 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 00271 * 00272 * Figure local indexes 00273 * 00274 CALL INFOG2L( IC, JC, DESCC, NPROW, NPCOL, MYROW, MYCOL, IIC, JJC, 00275 $ ICROW, ICCOL ) 00276 CALL INFOG2L( IV, JV, DESCV, NPROW, NPCOL, MYROW, MYCOL, IIV, JJV, 00277 $ IVROW, IVCOL ) 00278 NCC = NUMROC( DESCC( N_ ), DESCC( NB_ ), MYCOL, DESCC( CSRC_ ), 00279 $ NPCOL ) 00280 NCV = NUMROC( DESCV( N_ ), DESCV( NB_ ), MYCOL, DESCV( CSRC_ ), 00281 $ NPCOL ) 00282 LDC = DESCC( LLD_ ) 00283 LDV = DESCV( LLD_ ) 00284 IIC = MIN( IIC, LDC ) 00285 IIV = MIN( IIV, LDV ) 00286 JJC = MIN( JJC, NCC ) 00287 JJV = MIN( JJV, NCV ) 00288 IOFFC = IIC+(JJC-1)*LDC 00289 IOFFV = IIV+(JJV-1)*LDV 00290 * 00291 IROFF = MOD( IC-1, DESCC( MB_ ) ) 00292 ICOFF = MOD( JC-1, DESCC( NB_ ) ) 00293 MP = NUMROC( M+IROFF, DESCC( MB_ ), MYROW, ICROW, NPROW ) 00294 NQ = NUMROC( N+ICOFF, DESCC( NB_ ), MYCOL, ICCOL, NPCOL ) 00295 IF( MYROW.EQ.ICROW ) 00296 $ MP = MP - IROFF 00297 IF( MYCOL.EQ.ICCOL ) 00298 $ NQ = NQ - ICOFF 00299 * 00300 * Is sub( C ) only distributed over a process row ? 00301 * 00302 CRBLCK = ( M.LE.(DESCC( MB_ )-IROFF) ) 00303 * 00304 * Is sub( C ) only distributed over a process column ? 00305 * 00306 CCBLCK = ( N.LE.(DESCC( NB_ )-ICOFF) ) 00307 * 00308 IF( LSAME( SIDE, 'L' ) ) THEN 00309 * 00310 IF( CRBLCK ) THEN 00311 RDEST = ICROW 00312 ELSE 00313 RDEST = -1 00314 END IF 00315 * 00316 IF( CCBLCK ) THEN 00317 * 00318 * sub( C ) is distributed over a process column 00319 * 00320 IF( DESCV( M_ ).EQ.INCV ) THEN 00321 * 00322 * Transpose row vector V 00323 * 00324 IPW = MP+1 00325 CALL PBDTRNV( ICTXT, 'Rowwise', 'Transpose', M, 00326 $ DESCV( NB_ ), IROFF, V( IOFFV ), LDV, ZERO, 00327 $ WORK, 1, IVROW, IVCOL, ICROW, ICCOL, 00328 $ WORK( IPW ) ) 00329 * 00330 * Perform the local computation within a process column 00331 * 00332 IF( MYCOL.EQ.ICCOL ) THEN 00333 * 00334 IF( MYROW.EQ.IVROW ) THEN 00335 * 00336 CALL DGEBS2D( ICTXT, 'Columnwise', ' ', 1, 1, 00337 $ TAU( IIV ), 1 ) 00338 TAULOC = TAU( IIV ) 00339 * 00340 ELSE 00341 * 00342 CALL DGEBR2D( ICTXT, 'Columnwise', ' ', 1, 1, 00343 $ TAULOC, 1, IVROW, MYCOL ) 00344 * 00345 END IF 00346 * 00347 IF( TAULOC.NE.ZERO ) THEN 00348 * 00349 * w := sub( C )' * v 00350 * 00351 IF( MP.GT.0 ) THEN 00352 CALL DGEMV( 'Transpose', MP, NQ, ONE, 00353 $ C( IOFFC ), LDC, WORK, 1, ZERO, 00354 $ WORK( IPW ), 1 ) 00355 ELSE 00356 CALL DLASET( 'All', NQ, 1, ZERO, ZERO, 00357 $ WORK( IPW ), MAX( 1, NQ ) ) 00358 END IF 00359 CALL DGSUM2D( ICTXT, 'Columnwise', ' ', NQ, 1, 00360 $ WORK( IPW ), MAX( 1, NQ ), RDEST, 00361 $ MYCOL ) 00362 * 00363 * sub( C ) := sub( C ) - v * w' 00364 * 00365 CALL DGER( MP, NQ, -TAULOC, WORK, 1, WORK( IPW ), 00366 $ 1, C( IOFFC ), LDC ) 00367 END IF 00368 * 00369 END IF 00370 * 00371 ELSE 00372 * 00373 * V is a column vector 00374 * 00375 IF( IVCOL.EQ.ICCOL ) THEN 00376 * 00377 * Perform the local computation within a process column 00378 * 00379 IF( MYCOL.EQ.ICCOL ) THEN 00380 * 00381 TAULOC = TAU( JJV ) 00382 * 00383 IF( TAULOC.NE.ZERO ) THEN 00384 * 00385 * w := sub( C )' * v 00386 * 00387 IF( MP.GT.0 ) THEN 00388 CALL DGEMV( 'Transpose', MP, NQ, ONE, 00389 $ C( IOFFC ), LDC, V( IOFFV ), 1, 00390 $ ZERO, WORK, 1 ) 00391 ELSE 00392 CALL DLASET( 'All', NQ, 1, ZERO, ZERO, 00393 $ WORK, MAX( 1, NQ ) ) 00394 END IF 00395 CALL DGSUM2D( ICTXT, 'Columnwise', ' ', NQ, 1, 00396 $ WORK, MAX( 1, NQ ), RDEST, MYCOL ) 00397 * 00398 * sub( C ) := sub( C ) - v * w' 00399 * 00400 CALL DGER( MP, NQ, -TAULOC, V( IOFFV ), 1, WORK, 00401 $ 1, C( IOFFC ), LDC ) 00402 END IF 00403 * 00404 END IF 00405 * 00406 ELSE 00407 * 00408 * Send V and TAU to the process column ICCOL 00409 * 00410 IF( MYCOL.EQ.IVCOL ) THEN 00411 * 00412 IPW = MP+1 00413 CALL DCOPY( MP, V( IOFFV ), 1, WORK, 1 ) 00414 WORK( IPW ) = TAU( JJV ) 00415 CALL DGESD2D( ICTXT, IPW, 1, WORK, IPW, MYROW, 00416 $ ICCOL ) 00417 * 00418 ELSE IF( MYCOL.EQ.ICCOL ) THEN 00419 * 00420 IPW = MP+1 00421 CALL DGERV2D( ICTXT, IPW, 1, WORK, IPW, MYROW, 00422 $ IVCOL ) 00423 TAULOC = WORK( IPW ) 00424 * 00425 IF( TAULOC.NE.ZERO ) THEN 00426 * 00427 * w := sub( C )' * v 00428 * 00429 IF( MP.GT.0 ) THEN 00430 CALL DGEMV( 'Transpose', MP, NQ, ONE, 00431 $ C( IOFFC ), LDC, WORK, 1, ZERO, 00432 $ WORK( IPW ), 1 ) 00433 ELSE 00434 CALL DLASET( 'All', NQ, 1, ZERO, ZERO, 00435 $ WORK( IPW ), MAX( 1, NQ ) ) 00436 END IF 00437 CALL DGSUM2D( ICTXT, 'Columnwise', ' ', NQ, 1, 00438 $ WORK( IPW ), MAX( 1, NQ ), RDEST, 00439 $ MYCOL ) 00440 * 00441 * sub( C ) := sub( C ) - v * w' 00442 * 00443 CALL DGER( MP, NQ, -TAULOC, WORK, 1, 00444 $ WORK( IPW ), 1, C( IOFFC ), LDC ) 00445 END IF 00446 * 00447 END IF 00448 * 00449 END IF 00450 * 00451 END IF 00452 * 00453 ELSE 00454 * 00455 * sub( C ) is a proper distributed matrix 00456 * 00457 IF( DESCV( M_ ).EQ.INCV ) THEN 00458 * 00459 * Transpose and broadcast row vector V 00460 * 00461 IPW = MP+1 00462 CALL PBDTRNV( ICTXT, 'Rowwise', 'Transpose', M, 00463 $ DESCV( NB_ ), IROFF, V( IOFFV ), LDV, ZERO, 00464 $ WORK, 1, IVROW, IVCOL, ICROW, -1, 00465 $ WORK( IPW ) ) 00466 * 00467 * Perform the local computation within a process column 00468 * 00469 IF( MYROW.EQ.IVROW ) THEN 00470 * 00471 CALL DGEBS2D( ICTXT, 'Columnwise', ' ', 1, 1, 00472 $ TAU( IIV ), 1 ) 00473 TAULOC = TAU( IIV ) 00474 * 00475 ELSE 00476 * 00477 CALL DGEBR2D( ICTXT, 'Columnwise', ' ', 1, 1, TAULOC, 00478 $ 1, IVROW, MYCOL ) 00479 * 00480 END IF 00481 * 00482 IF( TAULOC.NE.ZERO ) THEN 00483 * 00484 * w := sub( C )' * v 00485 * 00486 IF( MP.GT.0 ) THEN 00487 IF( IOFFC.GT.0 ) 00488 $ CALL DGEMV( 'Transpose', MP, NQ, ONE, 00489 $ C( IOFFC ), LDC, WORK, 1, ZERO, 00490 $ WORK( IPW ), 1 ) 00491 ELSE 00492 CALL DLASET( 'All', NQ, 1, ZERO, ZERO, 00493 $ WORK( IPW ), MAX( 1, NQ ) ) 00494 END IF 00495 CALL DGSUM2D( ICTXT, 'Columnwise', ' ', NQ, 1, 00496 $ WORK( IPW ), MAX( 1, NQ ), RDEST, 00497 $ MYCOL ) 00498 * 00499 * sub( C ) := sub( C ) - v * w' 00500 * 00501 IF( IOFFC.GT.0 ) 00502 $ CALL DGER( MP, NQ, -TAULOC, WORK, 1, WORK( IPW ), 00503 $ 1, C( IOFFC ), LDC ) 00504 END IF 00505 * 00506 ELSE 00507 * 00508 * Broadcast column vector V 00509 * 00510 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 00511 IF( MYCOL.EQ.IVCOL ) THEN 00512 * 00513 IPW = MP+1 00514 CALL DCOPY( MP, V( IOFFV ), 1, WORK, 1 ) 00515 WORK(IPW) = TAU( JJV ) 00516 CALL DGEBS2D( ICTXT, 'Rowwise', ROWBTOP, IPW, 1, 00517 $ WORK, IPW ) 00518 TAULOC = TAU( JJV ) 00519 * 00520 ELSE 00521 * 00522 IPW = MP+1 00523 CALL DGEBR2D( ICTXT, 'Rowwise', ROWBTOP, IPW, 1, WORK, 00524 $ IPW, MYROW, IVCOL ) 00525 TAULOC = WORK( IPW ) 00526 * 00527 END IF 00528 * 00529 IF( TAULOC.NE.ZERO ) THEN 00530 * 00531 * w := sub( C )' * v 00532 * 00533 IF( MP.GT.0 ) THEN 00534 IF( IOFFC.GT.0 ) 00535 $ CALL DGEMV( 'Transpose', MP, NQ, ONE, 00536 $ C( IOFFC ), LDC, WORK, 1, ZERO, 00537 $ WORK( IPW ), 1 ) 00538 ELSE 00539 CALL DLASET( 'All', NQ, 1, ZERO, ZERO, 00540 $ WORK( IPW ), MAX( 1, NQ ) ) 00541 END IF 00542 CALL DGSUM2D( ICTXT, 'Columnwise', ' ', NQ, 1, 00543 $ WORK( IPW ), MAX( 1, NQ ), RDEST, 00544 $ MYCOL ) 00545 * 00546 * sub( C ) := sub( C ) - v * w' 00547 * 00548 IF( IOFFC.GT.0 ) 00549 $ CALL DGER( MP, NQ, -TAULOC, WORK, 1, WORK( IPW ), 00550 $ 1, C( IOFFC ), LDC ) 00551 END IF 00552 * 00553 END IF 00554 * 00555 END IF 00556 * 00557 ELSE 00558 * 00559 IF( CCBLCK ) THEN 00560 RDEST = MYROW 00561 ELSE 00562 RDEST = -1 00563 END IF 00564 * 00565 IF( CRBLCK ) THEN 00566 * 00567 * sub( C ) is distributed over a process row 00568 * 00569 IF( DESCV( M_ ).EQ.INCV ) THEN 00570 * 00571 * V is a row vector 00572 * 00573 IF( IVROW.EQ.ICROW ) THEN 00574 * 00575 * Perform the local computation within a process row 00576 * 00577 IF( MYROW.EQ.ICROW ) THEN 00578 * 00579 TAULOC = TAU( IIV ) 00580 * 00581 IF( TAULOC.NE.ZERO ) THEN 00582 * 00583 * w := sub( C ) * v 00584 * 00585 IF( NQ.GT.0 ) THEN 00586 CALL DGEMV( 'No transpose', MP, NQ, ONE, 00587 $ C( IOFFC ), LDC, V( IOFFV ), LDV, 00588 $ ZERO, WORK, 1 ) 00589 ELSE 00590 CALL DLASET( 'All', MP, 1, ZERO, ZERO, 00591 $ WORK, MAX( 1, MP ) ) 00592 END IF 00593 CALL DGSUM2D( ICTXT, 'Rowwise', ' ', MP, 1, 00594 $ WORK, MAX( 1, MP ), RDEST, ICCOL ) 00595 * 00596 * sub( C ) := sub( C ) - w * v' 00597 * 00598 IF( IOFFV.GT.0 .AND. IOFFC.GT.0 ) 00599 $ CALL DGER( MP, NQ, -TAULOC, WORK, 1, 00600 $ V( IOFFV ), LDV, C( IOFFC ), LDC ) 00601 END IF 00602 * 00603 END IF 00604 * 00605 ELSE 00606 * 00607 * Send V and TAU to the process row ICROW 00608 * 00609 IF( MYROW.EQ.IVROW ) THEN 00610 * 00611 IPW = NQ+1 00612 CALL DCOPY( NQ, V( IOFFV ), LDV, WORK, 1 ) 00613 WORK(IPW) = TAU( IIV ) 00614 CALL DGESD2D( ICTXT, IPW, 1, WORK, IPW, ICROW, 00615 $ MYCOL ) 00616 * 00617 ELSE IF( MYROW.EQ.ICROW ) THEN 00618 * 00619 IPW = NQ+1 00620 CALL DGERV2D( ICTXT, IPW, 1, WORK, IPW, IVROW, 00621 $ MYCOL ) 00622 TAULOC = WORK( IPW ) 00623 * 00624 IF( TAULOC.NE.ZERO ) THEN 00625 * 00626 * w := sub( C ) * v 00627 * 00628 IF( NQ.GT.0 ) THEN 00629 CALL DGEMV( 'No transpose', MP, NQ, ONE, 00630 $ C( IOFFC ), LDC, WORK, 1, ZERO, 00631 $ WORK( IPW ), 1 ) 00632 ELSE 00633 CALL DLASET( 'All', MP, 1, ZERO, ZERO, 00634 $ WORK( IPW ), MAX( 1, MP ) ) 00635 END IF 00636 CALL DGSUM2D( ICTXT, 'Rowwise', ' ', MP, 1, 00637 $ WORK( IPW ), MAX( 1, MP ), RDEST, 00638 $ ICCOL ) 00639 * 00640 * sub( C ) := sub( C ) - w * v' 00641 * 00642 CALL DGER( MP, NQ, -TAULOC, WORK( IPW ), 1, 00643 $ WORK, 1, C( IOFFC ), LDC ) 00644 END IF 00645 * 00646 END IF 00647 * 00648 END IF 00649 * 00650 ELSE 00651 * 00652 * Transpose column vector V 00653 * 00654 IPW = NQ+1 00655 CALL PBDTRNV( ICTXT, 'Columnwise', 'Transpose', N, 00656 $ DESCV( MB_ ), ICOFF, V( IOFFV ), 1, ZERO, 00657 $ WORK, 1, IVROW, IVCOL, ICROW, ICCOL, 00658 $ WORK( IPW ) ) 00659 * 00660 * Perform the local computation within a process column 00661 * 00662 IF( MYROW.EQ.ICROW ) THEN 00663 * 00664 IF( MYCOL.EQ.IVCOL ) THEN 00665 * 00666 CALL DGEBS2D( ICTXT, 'Rowwise', ' ', 1, 1, 00667 $ TAU( JJV ), 1 ) 00668 TAULOC = TAU( JJV ) 00669 * 00670 ELSE 00671 * 00672 CALL DGEBR2D( ICTXT, 'Rowwise', ' ', 1, 1, TAULOC, 00673 $ 1, MYROW, IVCOL ) 00674 * 00675 END IF 00676 * 00677 IF( TAULOC.NE.ZERO ) THEN 00678 * 00679 * w := sub( C ) * v 00680 * 00681 IF( NQ.GT.0 ) THEN 00682 CALL DGEMV( 'No transpose', MP, NQ, ONE, 00683 $ C( IOFFC ), LDC, WORK, 1, ZERO, 00684 $ WORK( IPW ), 1 ) 00685 ELSE 00686 CALL DLASET( 'All', MP, 1, ZERO, ZERO, 00687 $ WORK( IPW ), MAX( 1, MP ) ) 00688 END IF 00689 CALL DGSUM2D( ICTXT, 'Rowwise', ' ', MP, 1, 00690 $ WORK( IPW ), MAX( 1, MP ), RDEST, 00691 $ ICCOL ) 00692 * 00693 * sub( C ) := sub( C ) - w * v' 00694 * 00695 CALL DGER( MP, NQ, -TAULOC, WORK( IPW ), 1, WORK, 00696 $ 1, C( IOFFC ), LDC ) 00697 END IF 00698 * 00699 END IF 00700 * 00701 END IF 00702 * 00703 ELSE 00704 * 00705 * sub( C ) is a proper distributed matrix 00706 * 00707 IF( DESCV( M_ ).EQ.INCV ) THEN 00708 * 00709 * Broadcast row vector V 00710 * 00711 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', 00712 $ COLBTOP ) 00713 IF( MYROW.EQ.IVROW ) THEN 00714 * 00715 IPW = NQ+1 00716 IF( IOFFV.GT.0 ) 00717 $ CALL DCOPY( NQ, V( IOFFV ), LDV, WORK, 1 ) 00718 WORK(IPW) = TAU( IIV ) 00719 CALL DGEBS2D( ICTXT, 'Columnwise', COLBTOP, IPW, 1, 00720 $ WORK, IPW ) 00721 TAULOC = TAU( IIV ) 00722 * 00723 ELSE 00724 * 00725 IPW = NQ+1 00726 CALL DGEBR2D( ICTXT, 'Columnwise', COLBTOP, IPW, 1, 00727 $ WORK, IPW, IVROW, MYCOL ) 00728 TAULOC = WORK( IPW ) 00729 * 00730 END IF 00731 * 00732 IF( TAULOC.NE.ZERO ) THEN 00733 * 00734 * w := sub( C ) * v 00735 * 00736 IF( NQ.GT.0 ) THEN 00737 CALL DGEMV( 'No Transpose', MP, NQ, ONE, 00738 $ C( IOFFC ), LDC, WORK, 1, ZERO, 00739 $ WORK( IPW ), 1 ) 00740 ELSE 00741 CALL DLASET( 'All', MP, 1, ZERO, ZERO, 00742 $ WORK( IPW ), MAX( 1, MP ) ) 00743 END IF 00744 CALL DGSUM2D( ICTXT, 'Rowwise', ' ', MP, 1, 00745 $ WORK( IPW ), MAX( 1, MP ), RDEST, 00746 $ ICCOL ) 00747 * 00748 * sub( C ) := sub( C ) - w * v' 00749 * 00750 IF( IOFFC.GT.0 ) 00751 $ CALL DGER( MP, NQ, -TAULOC, WORK( IPW ), 1, WORK, 00752 $ 1, C( IOFFC ), LDC ) 00753 END IF 00754 * 00755 ELSE 00756 * 00757 * Transpose and broadcast column vector V 00758 * 00759 IPW = NQ+1 00760 CALL PBDTRNV( ICTXT, 'Columnwise', 'Transpose', N, 00761 $ DESCV( MB_ ), ICOFF, V( IOFFV ), 1, ZERO, 00762 $ WORK, 1, IVROW, IVCOL, -1, ICCOL, 00763 $ WORK( IPW ) ) 00764 * 00765 * Perform the local computation within a process column 00766 * 00767 IF( MYCOL.EQ.IVCOL ) THEN 00768 * 00769 CALL DGEBS2D( ICTXT, 'Rowwise', ' ', 1, 1, TAU( JJV ), 00770 $ 1 ) 00771 TAULOC = TAU( JJV ) 00772 * 00773 ELSE 00774 * 00775 CALL DGEBR2D( ICTXT, 'Rowwise', ' ', 1, 1, TAULOC, 1, 00776 $ MYROW, IVCOL ) 00777 * 00778 END IF 00779 * 00780 IF( TAULOC.NE.ZERO ) THEN 00781 * 00782 * w := sub( C ) * v 00783 * 00784 IF( NQ.GT.0 ) THEN 00785 CALL DGEMV( 'No transpose', MP, NQ, ONE, 00786 $ C( IOFFC ), LDC, WORK, 1, ZERO, 00787 $ WORK( IPW ), 1 ) 00788 ELSE 00789 CALL DLASET( 'All', MP, 1, ZERO, ZERO, WORK( IPW ), 00790 $ MAX( 1, MP ) ) 00791 END IF 00792 CALL DGSUM2D( ICTXT, 'Rowwise', ' ', MP, 1, 00793 $ WORK( IPW ), MAX( 1, MP ), RDEST, 00794 $ ICCOL ) 00795 * 00796 * sub( C ) := sub( C ) - w * v' 00797 * 00798 CALL DGER( MP, NQ, -TAULOC, WORK( IPW ), 1, WORK, 1, 00799 $ C( IOFFC ), LDC ) 00800 END IF 00801 * 00802 END IF 00803 * 00804 END IF 00805 * 00806 END IF 00807 * 00808 RETURN 00809 * 00810 * End of PDLARF 00811 * 00812 END