|
ScaLAPACK
2.0.2
ScaLAPACK: Scalable Linear Algebra PACKage
|
00001 SUBROUTINE PZUNMLQ( SIDE, TRANS, M, N, K, A, IA, JA, DESCA, TAU, 00002 $ C, IC, JC, DESCC, WORK, LWORK, INFO ) 00003 * 00004 * -- ScaLAPACK routine (version 1.7) -- 00005 * University of Tennessee, Knoxville, Oak Ridge National Laboratory, 00006 * and University of California, Berkeley. 00007 * May 25, 2001 00008 * 00009 * .. Scalar Arguments .. 00010 CHARACTER SIDE, TRANS 00011 INTEGER IA, IC, INFO, JA, JC, K, LWORK, M, N 00012 * .. 00013 * .. Array Arguments .. 00014 INTEGER DESCA( * ), DESCC( * ) 00015 COMPLEX*16 A( * ), C( * ), TAU( * ), WORK( * ) 00016 * .. 00017 * 00018 * Purpose 00019 * ======= 00020 * 00021 * PZUNMLQ overwrites the general complex M-by-N distributed matrix 00022 * sub( C ) = C(IC:IC+M-1,JC:JC+N-1) with 00023 * 00024 * SIDE = 'L' SIDE = 'R' 00025 * TRANS = 'N': Q * sub( C ) sub( C ) * Q 00026 * TRANS = 'C': Q**H * sub( C sub( C ) * Q**H 00027 * 00028 * where Q is a complex unitary distributed matrix defined as the 00029 * product of K elementary reflectors 00030 * 00031 * Q = H(k)' . . . H(2)' H(1)' 00032 * 00033 * as returned by PZGELQF. Q is of order M if SIDE = 'L' and of order N 00034 * if SIDE = 'R'. 00035 * 00036 * Notes 00037 * ===== 00038 * 00039 * Each global data object is described by an associated description 00040 * vector. This vector stores the information required to establish 00041 * the mapping between an object element and its corresponding process 00042 * and memory location. 00043 * 00044 * Let A be a generic term for any 2D block cyclicly distributed array. 00045 * Such a global array has an associated description vector DESCA. 00046 * In the following comments, the character _ should be read as 00047 * "of the global array". 00048 * 00049 * NOTATION STORED IN EXPLANATION 00050 * --------------- -------------- -------------------------------------- 00051 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 00052 * DTYPE_A = 1. 00053 * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 00054 * the BLACS process grid A is distribu- 00055 * ted over. The context itself is glo- 00056 * bal, but the handle (the integer 00057 * value) may vary. 00058 * M_A (global) DESCA( M_ ) The number of rows in the global 00059 * array A. 00060 * N_A (global) DESCA( N_ ) The number of columns in the global 00061 * array A. 00062 * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 00063 * the rows of the array. 00064 * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 00065 * the columns of the array. 00066 * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 00067 * row of the array A is distributed. 00068 * CSRC_A (global) DESCA( CSRC_ ) The process column over which the 00069 * first column of the array A is 00070 * distributed. 00071 * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 00072 * array. LLD_A >= MAX(1,LOCr(M_A)). 00073 * 00074 * Let K be the number of rows or columns of a distributed matrix, 00075 * and assume that its process grid has dimension p x q. 00076 * LOCr( K ) denotes the number of elements of K that a process 00077 * would receive if K were distributed over the p processes of its 00078 * process column. 00079 * Similarly, LOCc( K ) denotes the number of elements of K that a 00080 * process would receive if K were distributed over the q processes of 00081 * its process row. 00082 * The values of LOCr() and LOCc() may be determined via a call to the 00083 * ScaLAPACK tool function, NUMROC: 00084 * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 00085 * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 00086 * An upper bound for these quantities may be computed by: 00087 * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 00088 * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 00089 * 00090 * Arguments 00091 * ========= 00092 * 00093 * SIDE (global input) CHARACTER 00094 * = 'L': apply Q or Q**H from the Left; 00095 * = 'R': apply Q or Q**H from the Right. 00096 * 00097 * TRANS (global input) CHARACTER 00098 * = 'N': No transpose, apply Q; 00099 * = 'C': Conjugate transpose, apply Q**H. 00100 * 00101 * M (global input) INTEGER 00102 * The number of rows to be operated on i.e the number of rows 00103 * of the distributed submatrix sub( C ). M >= 0. 00104 * 00105 * N (global input) INTEGER 00106 * The number of columns to be operated on i.e the number of 00107 * columns of the distributed submatrix sub( C ). N >= 0. 00108 * 00109 * K (global input) INTEGER 00110 * The number of elementary reflectors whose product defines the 00111 * matrix Q. If SIDE = 'L', M >= K >= 0, if SIDE = 'R', 00112 * N >= K >= 0. 00113 * 00114 * A (local input) COMPLEX*16 pointer into the local memory 00115 * to an array of dimension (LLD_A,LOCc(JA+M-1)) if SIDE='L', 00116 * and (LLD_A,LOCc(JA+N-1)) if SIDE='R', where 00117 * LLD_A >= max(1,LOCr(IA+K-1)); On entry, the i-th row must 00118 * contain the vector which defines the elementary reflector 00119 * H(i), IA <= i <= IA+K-1, as returned by PZGELQF in the 00120 * K rows of its distributed matrix argument A(IA:IA+K-1,JA:*). 00121 * A(IA:IA+K-1,JA:*) is modified by the routine but restored on 00122 * exit. 00123 * 00124 * IA (global input) INTEGER 00125 * The row index in the global array A indicating the first 00126 * row of sub( A ). 00127 * 00128 * JA (global input) INTEGER 00129 * The column index in the global array A indicating the 00130 * first column of sub( A ). 00131 * 00132 * DESCA (global and local input) INTEGER array of dimension DLEN_. 00133 * The array descriptor for the distributed matrix A. 00134 * 00135 * TAU (local input) COMPLEX*16, array, dimension LOCc(IA+K-1). 00136 * This array contains the scalar factors TAU(i) of the 00137 * elementary reflectors H(i) as returned by PZGELQF. 00138 * TAU is tied to the distributed matrix A. 00139 * 00140 * C (local input/local output) COMPLEX*16 pointer into the 00141 * local memory to an array of dimension (LLD_C,LOCc(JC+N-1)). 00142 * On entry, the local pieces of the distributed matrix sub(C). 00143 * On exit, sub( C ) is overwritten by Q*sub( C ) or Q'*sub( C ) 00144 * or sub( C )*Q' or sub( C )*Q. 00145 * 00146 * IC (global input) INTEGER 00147 * The row index in the global array C indicating the first 00148 * row of sub( C ). 00149 * 00150 * JC (global input) INTEGER 00151 * The column index in the global array C indicating the 00152 * first column of sub( C ). 00153 * 00154 * DESCC (global and local input) INTEGER array of dimension DLEN_. 00155 * The array descriptor for the distributed matrix C. 00156 * 00157 * WORK (local workspace/local output) COMPLEX*16 array, 00158 * dimension (LWORK) 00159 * On exit, WORK(1) returns the minimal and optimal LWORK. 00160 * 00161 * LWORK (local or global input) INTEGER 00162 * The dimension of the array WORK. 00163 * LWORK is local input and must be at least 00164 * if SIDE = 'L', 00165 * LWORK >= MAX( (MB_A*(MB_A-1))/2, ( MpC0 + MAX( MqA0 + 00166 * NUMROC( NUMROC( M+IROFFC, MB_A, 0, 0, NPROW ), 00167 * MB_A, 0, 0, LCMP ), NqC0 ) )*MB_A ) + 00168 * MB_A * MB_A 00169 * else if SIDE = 'R', 00170 * LWORK >= MAX( (MB_A*(MB_A-1))/2, (MpC0 + NqC0)*MB_A ) + 00171 * MB_A * MB_A 00172 * end if 00173 * 00174 * where LCMP = LCM / NPROW with LCM = ICLM( NPROW, NPCOL ), 00175 * 00176 * IROFFA = MOD( IA-1, MB_A ), ICOFFA = MOD( JA-1, NB_A ), 00177 * IACOL = INDXG2P( JA, NB_A, MYCOL, CSRC_A, NPCOL ), 00178 * MqA0 = NUMROC( M+ICOFFA, NB_A, MYCOL, IACOL, NPCOL ), 00179 * 00180 * IROFFC = MOD( IC-1, MB_C ), ICOFFC = MOD( JC-1, NB_C ), 00181 * ICROW = INDXG2P( IC, MB_C, MYROW, RSRC_C, NPROW ), 00182 * ICCOL = INDXG2P( JC, NB_C, MYCOL, CSRC_C, NPCOL ), 00183 * MpC0 = NUMROC( M+IROFFC, MB_C, MYROW, ICROW, NPROW ), 00184 * NqC0 = NUMROC( N+ICOFFC, NB_C, MYCOL, ICCOL, NPCOL ), 00185 * 00186 * ILCM, INDXG2P and NUMROC are ScaLAPACK tool functions; 00187 * MYROW, MYCOL, NPROW and NPCOL can be determined by calling 00188 * the subroutine BLACS_GRIDINFO. 00189 * 00190 * If LWORK = -1, then LWORK is global input and a workspace 00191 * query is assumed; the routine only calculates the minimum 00192 * and optimal size for all work arrays. Each of these 00193 * values is returned in the first entry of the corresponding 00194 * work array, and no error message is issued by PXERBLA. 00195 * 00196 * 00197 * INFO (global output) INTEGER 00198 * = 0: successful exit 00199 * < 0: If the i-th argument is an array and the j-entry had 00200 * an illegal value, then INFO = -(i*100+j), if the i-th 00201 * argument is a scalar and had an illegal value, then 00202 * INFO = -i. 00203 * 00204 * Alignment requirements 00205 * ====================== 00206 * 00207 * The distributed submatrices A(IA:*, JA:*) and C(IC:IC+M-1,JC:JC+N-1) 00208 * must verify some alignment properties, namely the following 00209 * expressions should be true: 00210 * 00211 * If SIDE = 'L', 00212 * ( NB_A.EQ.MB_C .AND. ICOFFA.EQ.IROFFC ) 00213 * If SIDE = 'R', 00214 * ( NB_A.EQ.NB_C .AND. ICOFFA.EQ.ICOFFC .AND. IACOL.EQ.ICCOL ) 00215 * 00216 * ===================================================================== 00217 * 00218 * .. Parameters .. 00219 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 00220 $ LLD_, MB_, M_, NB_, N_, RSRC_ 00221 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 00222 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 00223 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 00224 * .. 00225 * .. Local Scalars .. 00226 LOGICAL LEFT, LQUERY, NOTRAN 00227 CHARACTER COLBTOP, ROWBTOP, TRANST 00228 INTEGER I, I1, I2, I3, IACOL, IB, ICC, ICCOL, ICOFFA, 00229 $ ICOFFC, ICROW, ICTXT, IINFO, IPW, IROFFC, JCC, 00230 $ LCM, LCMP, LWMIN, MI, MPC0, MQA0, MYCOL, MYROW, 00231 $ NI, NPCOL, NPROW, NQ, NQC0 00232 * .. 00233 * .. Local Arrays .. 00234 INTEGER IDUM1( 4 ), IDUM2( 4 ) 00235 * .. 00236 * .. External Subroutines .. 00237 EXTERNAL BLACS_GRIDINFO, CHK1MAT, PCHK2MAT, PB_TOPGET, 00238 $ PB_TOPSET, PXERBLA, PZLARFB, PZLARFT, 00239 $ PZUNML2 00240 * .. 00241 * .. External Functions .. 00242 LOGICAL LSAME 00243 INTEGER ICEIL, ILCM, INDXG2P, NUMROC 00244 EXTERNAL ICEIL, ILCM, INDXG2P, LSAME, NUMROC 00245 * .. 00246 * .. Intrinsic Functions .. 00247 INTRINSIC DBLE, DCMPLX, ICHAR, MAX, MIN, MOD 00248 * .. 00249 * .. Executable Statements .. 00250 * 00251 * Get grid parameters 00252 * 00253 ICTXT = DESCA( CTXT_ ) 00254 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 00255 * 00256 * Test the input parameters 00257 * 00258 INFO = 0 00259 IF( NPROW.EQ.-1 ) THEN 00260 INFO = -(900+CTXT_) 00261 ELSE 00262 LEFT = LSAME( SIDE, 'L' ) 00263 NOTRAN = LSAME( TRANS, 'N' ) 00264 * 00265 * NQ is the order of Q 00266 * 00267 IF( LEFT ) THEN 00268 NQ = M 00269 CALL CHK1MAT( K, 5, M, 3, IA, JA, DESCA, 9, INFO ) 00270 ELSE 00271 NQ = N 00272 CALL CHK1MAT( K, 5, N, 4, IA, JA, DESCA, 9, INFO ) 00273 END IF 00274 CALL CHK1MAT( M, 3, N, 4, IC, JC, DESCC, 14, INFO ) 00275 IF( INFO.EQ.0 ) THEN 00276 ICOFFA = MOD( JA-1, DESCA( NB_ ) ) 00277 IROFFC = MOD( IC-1, DESCC( MB_ ) ) 00278 ICOFFC = MOD( JC-1, DESCC( NB_ ) ) 00279 IACOL = INDXG2P( JA, DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), 00280 $ NPCOL ) 00281 ICROW = INDXG2P( IC, DESCC( MB_ ), MYROW, DESCC( RSRC_ ), 00282 $ NPROW ) 00283 ICCOL = INDXG2P( JC, DESCC( NB_ ), MYCOL, DESCC( CSRC_ ), 00284 $ NPCOL ) 00285 MPC0 = NUMROC( M+IROFFC, DESCC( MB_ ), MYROW, ICROW, NPROW ) 00286 NQC0 = NUMROC( N+ICOFFC, DESCC( NB_ ), MYCOL, ICCOL, NPCOL ) 00287 * 00288 IF( LEFT ) THEN 00289 MQA0 = NUMROC( M+ICOFFA, DESCA( NB_ ), MYCOL, IACOL, 00290 $ NPCOL ) 00291 LCM = ILCM( NPROW, NPCOL ) 00292 LCMP = LCM / NPROW 00293 LWMIN = MAX( ( DESCA( MB_ ) * ( DESCA( MB_ ) - 1 ) ) 00294 $ / 2, ( MPC0 + MAX( MQA0 + NUMROC( NUMROC( 00295 $ M+IROFFC, DESCA( MB_ ), 0, 0, NPROW ), 00296 $ DESCA( MB_ ), 0, 0, LCMP ), NQC0 ) ) * 00297 $ DESCA( MB_ ) ) + DESCA( MB_ ) * DESCA( MB_ ) 00298 ELSE 00299 LWMIN = MAX( ( DESCA( MB_ ) * ( DESCA( MB_ ) - 1 ) ) / 2, 00300 $ ( MPC0 + NQC0 ) * DESCA( MB_ ) ) + 00301 $ DESCA( MB_ ) * DESCA( MB_ ) 00302 END IF 00303 * 00304 WORK( 1 ) = DCMPLX( DBLE( LWMIN ) ) 00305 LQUERY = ( LWORK.EQ.-1 ) 00306 IF( .NOT.LEFT .AND. .NOT.LSAME( SIDE, 'R' ) ) THEN 00307 INFO = -1 00308 ELSE IF( .NOT.NOTRAN .AND. .NOT.LSAME( TRANS, 'C' ) ) THEN 00309 INFO = -2 00310 ELSE IF( K.LT.0 .OR. K.GT.NQ ) THEN 00311 INFO = -5 00312 ELSE IF( LEFT .AND. DESCA( NB_ ).NE.DESCC( MB_ ) ) THEN 00313 INFO = -(900+NB_) 00314 ELSE IF( LEFT .AND. ICOFFA.NE.IROFFC ) THEN 00315 INFO = -12 00316 ELSE IF( .NOT.LEFT .AND. ICOFFA.NE.ICOFFC ) THEN 00317 INFO = -13 00318 ELSE IF( .NOT.LEFT .AND. IACOL.NE.ICCOL ) THEN 00319 INFO = -13 00320 ELSE IF( .NOT.LEFT .AND. DESCA( NB_ ).NE.DESCC( NB_ ) ) THEN 00321 INFO = -(1400+NB_) 00322 ELSE IF( ICTXT.NE.DESCC( CTXT_ ) ) THEN 00323 INFO = -(1400+CTXT_) 00324 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 00325 INFO = -16 00326 END IF 00327 END IF 00328 IF( LEFT ) THEN 00329 IDUM1( 1 ) = ICHAR( 'L' ) 00330 ELSE 00331 IDUM1( 1 ) = ICHAR( 'R' ) 00332 END IF 00333 IDUM2( 1 ) = 1 00334 IF( NOTRAN ) THEN 00335 IDUM1( 2 ) = ICHAR( 'N' ) 00336 ELSE 00337 IDUM1( 2 ) = ICHAR( 'C' ) 00338 END IF 00339 IDUM2( 2 ) = 2 00340 IDUM1( 3 ) = K 00341 IDUM2( 3 ) = 5 00342 IF( LWORK.EQ.-1 ) THEN 00343 IDUM1( 4 ) = -1 00344 ELSE 00345 IDUM1( 4 ) = 1 00346 END IF 00347 IDUM2( 4 ) = 16 00348 IF( LEFT ) THEN 00349 CALL PCHK2MAT( K, 5, M, 3, IA, JA, DESCA, 9, M, 3, N, 4, IC, 00350 $ JC, DESCC, 14, 4, IDUM1, IDUM2, INFO ) 00351 ELSE 00352 CALL PCHK2MAT( K, 5, N, 4, IA, JA, DESCA, 9, M, 3, N, 4, IC, 00353 $ JC, DESCC, 14, 4, IDUM1, IDUM2, INFO ) 00354 END IF 00355 END IF 00356 * 00357 IF( INFO.NE.0 ) THEN 00358 CALL PXERBLA( ICTXT, 'PZUNMLQ', -INFO ) 00359 RETURN 00360 ELSE IF( LQUERY ) THEN 00361 RETURN 00362 END IF 00363 * 00364 * Quick return if possible 00365 * 00366 IF( M.EQ.0 .OR. N.EQ.0 .OR. K.EQ.0 ) 00367 $ RETURN 00368 * 00369 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 00370 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 00371 * 00372 IF( ( LEFT .AND. NOTRAN ) .OR. 00373 $ ( .NOT.LEFT .AND. .NOT.NOTRAN ) ) THEN 00374 I1 = MIN( ICEIL( IA, DESCA( MB_ ) ) * DESCA( MB_ ), IA+K-1 ) 00375 $ + 1 00376 I2 = IA + K - 1 00377 I3 = DESCA( MB_ ) 00378 ELSE 00379 I1 = MAX( ( (IA+K-2) / DESCA( MB_ ) ) * DESCA( MB_ ) + 1, IA ) 00380 I2 = MIN( ICEIL( IA, DESCA( MB_ ) ) * DESCA( MB_ ), IA+K-1 ) 00381 $ + 1 00382 I3 = -DESCA( MB_ ) 00383 END IF 00384 * 00385 IF( LEFT ) THEN 00386 NI = N 00387 JCC = JC 00388 ELSE 00389 MI = M 00390 ICC = IC 00391 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ' ' ) 00392 IF( NOTRAN ) THEN 00393 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'D-ring' ) 00394 ELSE 00395 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'I-ring' ) 00396 END IF 00397 END IF 00398 * 00399 IF( NOTRAN ) THEN 00400 TRANST = 'C' 00401 ELSE 00402 TRANST = 'N' 00403 END IF 00404 * 00405 IF( ( LEFT .AND. NOTRAN ) .OR. ( .NOT.LEFT .AND. .NOT.NOTRAN ) ) 00406 $ CALL PZUNML2( SIDE, TRANS, M, N, I1-IA, A, IA, JA, DESCA, TAU, 00407 $ C, IC, JC, DESCC, WORK, LWORK, IINFO ) 00408 * 00409 IPW = DESCA( MB_ ) * DESCA( MB_ ) + 1 00410 DO 10 I = I1, I2, I3 00411 IB = MIN( DESCA( MB_ ), K-I+IA ) 00412 * 00413 * Form the triangular factor of the block reflector 00414 * H = H(i) H(i+1) . . . H(i+ib-1) 00415 * 00416 CALL PZLARFT( 'Forward', 'Rowwise', NQ-I+IA, IB, A, I, JA+I-IA, 00417 $ DESCA, TAU, WORK, WORK( IPW ) ) 00418 IF( LEFT ) THEN 00419 * 00420 * H or H' is applied to C(ic+i-ia:ic+m-1,jc:jc+n-1) 00421 * 00422 MI = M - I + IA 00423 ICC = IC + I - IA 00424 ELSE 00425 * 00426 * H or H' is applied to C(ic:ic+m-1,jc+i-ia:jc+n-1) 00427 * 00428 NI = N - I + IA 00429 JCC = JC + I - IA 00430 END IF 00431 * 00432 * Apply H or H' 00433 * 00434 CALL PZLARFB( SIDE, TRANST, 'Forward', 'Rowwise', MI, NI, IB, 00435 $ A, I, JA+I-IA, DESCA, WORK, C, ICC, JCC, DESCC, 00436 $ WORK( IPW ) ) 00437 10 CONTINUE 00438 * 00439 IF( ( LEFT .AND. .NOT.NOTRAN ) .OR. ( .NOT.LEFT .AND. NOTRAN ) ) 00440 $ CALL PZUNML2( SIDE, TRANS, M, N, I2-IA, A, IA, JA, DESCA, TAU, 00441 $ C, IC, JC, DESCC, WORK, LWORK, IINFO ) 00442 * 00443 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 00444 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 00445 * 00446 WORK( 1 ) = DCMPLX( DBLE( LWMIN ) ) 00447 * 00448 RETURN 00449 * 00450 * End of PZUNMLQ 00451 * 00452 END