|
ScaLAPACK
2.0.2
ScaLAPACK: Scalable Linear Algebra PACKage
|
00001 SUBROUTINE PSTZRZF( M, N, A, IA, JA, DESCA, TAU, WORK, LWORK, 00002 $ INFO ) 00003 * 00004 * -- ScaLAPACK routine (version 1.7) -- 00005 * University of Tennessee, Knoxville, Oak Ridge National Laboratory, 00006 * and University of California, Berkeley. 00007 * May 25, 2001 00008 * 00009 * .. Scalar Arguments .. 00010 INTEGER IA, INFO, JA, LWORK, M, N 00011 * .. 00012 * .. Array Arguments .. 00013 INTEGER DESCA( * ) 00014 REAL A( * ), TAU( * ), WORK( * ) 00015 * .. 00016 * 00017 * Purpose 00018 * ======= 00019 * 00020 * PSTZRZF reduces the M-by-N ( M<=N ) real upper trapezoidal matrix 00021 * sub( A ) = A(IA:IA+M-1,JA:JA+N-1) to upper triangular form by means 00022 * of orthogonal transformations. 00023 * 00024 * The upper trapezoidal matrix sub( A ) is factored as 00025 * 00026 * sub( A ) = ( R 0 ) * Z, 00027 * 00028 * where Z is an N-by-N orthogonal matrix and R is an M-by-M upper 00029 * triangular matrix. 00030 * 00031 * Notes 00032 * ===== 00033 * 00034 * Each global data object is described by an associated description 00035 * vector. This vector stores the information required to establish 00036 * the mapping between an object element and its corresponding process 00037 * and memory location. 00038 * 00039 * Let A be a generic term for any 2D block cyclicly distributed array. 00040 * Such a global array has an associated description vector DESCA. 00041 * In the following comments, the character _ should be read as 00042 * "of the global array". 00043 * 00044 * NOTATION STORED IN EXPLANATION 00045 * --------------- -------------- -------------------------------------- 00046 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 00047 * DTYPE_A = 1. 00048 * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 00049 * the BLACS process grid A is distribu- 00050 * ted over. The context itself is glo- 00051 * bal, but the handle (the integer 00052 * value) may vary. 00053 * M_A (global) DESCA( M_ ) The number of rows in the global 00054 * array A. 00055 * N_A (global) DESCA( N_ ) The number of columns in the global 00056 * array A. 00057 * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 00058 * the rows of the array. 00059 * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 00060 * the columns of the array. 00061 * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 00062 * row of the array A is distributed. 00063 * CSRC_A (global) DESCA( CSRC_ ) The process column over which the 00064 * first column of the array A is 00065 * distributed. 00066 * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 00067 * array. LLD_A >= MAX(1,LOCr(M_A)). 00068 * 00069 * Let K be the number of rows or columns of a distributed matrix, 00070 * and assume that its process grid has dimension p x q. 00071 * LOCr( K ) denotes the number of elements of K that a process 00072 * would receive if K were distributed over the p processes of its 00073 * process column. 00074 * Similarly, LOCc( K ) denotes the number of elements of K that a 00075 * process would receive if K were distributed over the q processes of 00076 * its process row. 00077 * The values of LOCr() and LOCc() may be determined via a call to the 00078 * ScaLAPACK tool function, NUMROC: 00079 * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 00080 * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 00081 * An upper bound for these quantities may be computed by: 00082 * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 00083 * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 00084 * 00085 * Arguments 00086 * ========= 00087 * 00088 * M (global input) INTEGER 00089 * The number of rows to be operated on, i.e. the number of rows 00090 * of the distributed submatrix sub( A ). M >= 0. 00091 * 00092 * N (global input) INTEGER 00093 * The number of columns to be operated on, i.e. the number of 00094 * columns of the distributed submatrix sub( A ). N >= 0. 00095 * 00096 * A (local input/local output) REAL pointer into the 00097 * local memory to an array of dimension (LLD_A, LOCc(JA+N-1)). 00098 * On entry, the local pieces of the M-by-N distributed matrix 00099 * sub( A ) which is to be factored. On exit, the leading M-by-M 00100 * upper triangular part of sub( A ) contains the upper trian- 00101 * gular matrix R, and elements M+1 to N of the first M rows of 00102 * sub( A ), with the array TAU, represent the orthogonal matrix 00103 * Z as a product of M elementary reflectors. 00104 * 00105 * IA (global input) INTEGER 00106 * The row index in the global array A indicating the first 00107 * row of sub( A ). 00108 * 00109 * JA (global input) INTEGER 00110 * The column index in the global array A indicating the 00111 * first column of sub( A ). 00112 * 00113 * DESCA (global and local input) INTEGER array of dimension DLEN_. 00114 * The array descriptor for the distributed matrix A. 00115 * 00116 * TAU (local output) REAL, array, dimension LOCr(IA+M-1) 00117 * This array contains the scalar factors of the elementary 00118 * reflectors. TAU is tied to the distributed matrix A. 00119 * 00120 * WORK (local workspace/local output) REAL array, 00121 * dimension (LWORK) 00122 * On exit, WORK(1) returns the minimal and optimal LWORK. 00123 * 00124 * LWORK (local or global input) INTEGER 00125 * The dimension of the array WORK. 00126 * LWORK is local input and must be at least 00127 * LWORK >= MB_A * ( Mp0 + Nq0 + MB_A ), where 00128 * 00129 * IROFF = MOD( IA-1, MB_A ), ICOFF = MOD( JA-1, NB_A ), 00130 * IAROW = INDXG2P( IA, MB_A, MYROW, RSRC_A, NPROW ), 00131 * IACOL = INDXG2P( JA, NB_A, MYCOL, CSRC_A, NPCOL ), 00132 * Mp0 = NUMROC( M+IROFF, MB_A, MYROW, IAROW, NPROW ), 00133 * Nq0 = NUMROC( N+ICOFF, NB_A, MYCOL, IACOL, NPCOL ), 00134 * 00135 * and NUMROC, INDXG2P are ScaLAPACK tool functions; 00136 * MYROW, MYCOL, NPROW and NPCOL can be determined by calling 00137 * the subroutine BLACS_GRIDINFO. 00138 * 00139 * If LWORK = -1, then LWORK is global input and a workspace 00140 * query is assumed; the routine only calculates the minimum 00141 * and optimal size for all work arrays. Each of these 00142 * values is returned in the first entry of the corresponding 00143 * work array, and no error message is issued by PXERBLA. 00144 * 00145 * INFO (global output) INTEGER 00146 * = 0: successful exit 00147 * < 0: If the i-th argument is an array and the j-entry had 00148 * an illegal value, then INFO = -(i*100+j), if the i-th 00149 * argument is a scalar and had an illegal value, then 00150 * INFO = -i. 00151 * 00152 * Further Details 00153 * =============== 00154 * 00155 * The factorization is obtained by Householder's method. The kth 00156 * transformation matrix, Z( k ), which is used to introduce zeros into 00157 * the (m - k + 1)th row of sub( A ), is given in the form 00158 * 00159 * Z( k ) = ( I 0 ), 00160 * ( 0 T( k ) ) 00161 * 00162 * where 00163 * 00164 * T( k ) = I - tau*u( k )*u( k )', u( k ) = ( 1 ), 00165 * ( 0 ) 00166 * ( z( k ) ) 00167 * 00168 * tau is a scalar and z( k ) is an ( n - m ) element vector. 00169 * tau and z( k ) are chosen to annihilate the elements of the kth row 00170 * of sub( A ). 00171 * 00172 * The scalar tau is returned in the kth element of TAU and the vector 00173 * u( k ) in the kth row of sub( A ), such that the elements of z( k ) 00174 * are in a( k, m + 1 ), ..., a( k, n ). The elements of R are returned 00175 * in the upper triangular part of sub( A ). 00176 * 00177 * Z is given by 00178 * 00179 * Z = Z( 1 ) * Z( 2 ) * ... * Z( m ). 00180 * 00181 * ===================================================================== 00182 * 00183 * .. Parameters .. 00184 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 00185 $ LLD_, MB_, M_, NB_, N_, RSRC_ 00186 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 00187 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 00188 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 00189 REAL ZERO 00190 PARAMETER ( ZERO = 0.0E+0 ) 00191 * .. 00192 * .. Local Scalars .. 00193 LOGICAL LQUERY 00194 CHARACTER COLBTOP, ROWBTOP 00195 INTEGER I, IACOL, IAROW, IB, ICTXT, IIA, IL, IN, IPW, 00196 $ IROFFA, J, JM1, L, LWMIN, MP0, MYCOL, MYROW, 00197 $ NPCOL, NPROW, NQ0 00198 * .. 00199 * .. Local Arrays .. 00200 INTEGER IDUM1( 1 ), IDUM2( 1 ) 00201 * .. 00202 * .. External Subroutines .. 00203 EXTERNAL BLACS_GRIDINFO, CHK1MAT, INFOG1L, PCHK1MAT, 00204 $ PSLATRZ, PSLARZB, PSLARZT, PB_TOPGET, 00205 $ PB_TOPSET, PXERBLA 00206 * .. 00207 * .. External Functions .. 00208 INTEGER ICEIL, INDXG2P, NUMROC 00209 EXTERNAL ICEIL, INDXG2P, NUMROC 00210 * .. 00211 * .. Intrinsic Functions .. 00212 INTRINSIC MAX, MIN, MOD, REAL 00213 * .. 00214 * .. Executable Statements .. 00215 * 00216 * Get grid parameters 00217 * 00218 ICTXT = DESCA( CTXT_ ) 00219 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 00220 * 00221 * Test the input parameters 00222 * 00223 INFO = 0 00224 IF( NPROW.EQ.-1 ) THEN 00225 INFO = -(600+CTXT_) 00226 ELSE 00227 CALL CHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, INFO ) 00228 IF( INFO.EQ.0 ) THEN 00229 IROFFA = MOD( IA-1, DESCA( MB_ ) ) 00230 IAROW = INDXG2P( IA, DESCA( MB_ ), MYROW, DESCA( RSRC_ ), 00231 $ NPROW ) 00232 IACOL = INDXG2P( JA, DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), 00233 $ NPCOL ) 00234 MP0 = NUMROC( M+IROFFA, DESCA( MB_ ), MYROW, IAROW, NPROW ) 00235 NQ0 = NUMROC( N+MOD( JA-1, DESCA( NB_ ) ), DESCA( NB_ ), 00236 $ MYCOL, IACOL, NPCOL ) 00237 LWMIN = DESCA( MB_ ) * ( MP0 + NQ0 + DESCA( MB_ ) ) 00238 * 00239 WORK( 1 ) = REAL( LWMIN ) 00240 LQUERY = ( LWORK.EQ.-1 ) 00241 IF( N.LT.M ) THEN 00242 INFO = -2 00243 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 00244 INFO = -9 00245 END IF 00246 END IF 00247 IF( LQUERY ) THEN 00248 IDUM1( 1 ) = -1 00249 ELSE 00250 IDUM1( 1 ) = 1 00251 END IF 00252 IDUM2( 1 ) = 9 00253 CALL PCHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, 1, IDUM1, IDUM2, 00254 $ INFO ) 00255 END IF 00256 * 00257 IF( INFO.NE.0 ) THEN 00258 CALL PXERBLA( ICTXT, 'PSTZRZF', -INFO ) 00259 RETURN 00260 ELSE IF( LQUERY ) THEN 00261 RETURN 00262 END IF 00263 * 00264 * Quick return if possible 00265 * 00266 IF( M.EQ.0 .OR. N.EQ.0 ) 00267 $ RETURN 00268 * 00269 IF( M.EQ.N ) THEN 00270 * 00271 CALL INFOG1L( IA, DESCA( MB_ ), NPROW, MYROW, DESCA( RSRC_ ), 00272 $ IIA, IAROW ) 00273 IF( MYROW.EQ.IAROW ) 00274 $ MP0 = MP0 - IROFFA 00275 DO 10 I = IIA, IIA+MP0-1 00276 TAU( I ) = ZERO 00277 10 CONTINUE 00278 * 00279 ELSE 00280 * 00281 L = N-M 00282 JM1 = JA + MIN( M+1, N ) - 1 00283 IPW = DESCA( MB_ ) * DESCA( MB_ ) + 1 00284 IN = MIN( ICEIL( IA, DESCA( MB_ ) ) * DESCA( MB_ ), IA+M-1 ) 00285 IL = MAX( ( (IA+M-2) / DESCA( MB_ ) ) * DESCA( MB_ ) + 1, IA ) 00286 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 00287 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 00288 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ' ' ) 00289 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'D-ring' ) 00290 * 00291 * Use blocked code initially 00292 * 00293 DO 20 I = IL, IN+1, -DESCA( MB_ ) 00294 IB = MIN( IA+M-I, DESCA( MB_ ) ) 00295 J = JA + I - IA 00296 * 00297 * Compute the complete orthogonal factorization of the current 00298 * block A(i:i+ib-1,j:ja+n-1) 00299 * 00300 CALL PSLATRZ( IB, JA+N-J, L, A, I, J, DESCA, TAU, WORK ) 00301 * 00302 IF( I.GT.IA ) THEN 00303 * 00304 * Form the triangular factor of the block reflector 00305 * H = H(i+ib-1) . . . H(i+1) H(i) 00306 * 00307 CALL PSLARZT( 'Backward', 'Rowwise', L, IB, A, I, JM1, 00308 $ DESCA, TAU, WORK, WORK( IPW ) ) 00309 * 00310 * Apply H to A(ia:i-1,j:ja+n-1) from the right 00311 * 00312 CALL PSLARZB( 'Right', 'No transpose', 'Backward', 00313 $ 'Rowwise', I-IA, JA+N-J, IB, L, A, I, JM1, 00314 $ DESCA, WORK, A, IA, J, DESCA, WORK( IPW ) ) 00315 END IF 00316 * 00317 20 CONTINUE 00318 * 00319 * Use unblocked code to factor the last or only block 00320 * 00321 CALL PSLATRZ( IN-IA+1, N, N-M, A, IA, JA, DESCA, TAU, WORK ) 00322 * 00323 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 00324 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 00325 * 00326 END IF 00327 * 00328 WORK( 1 ) = REAL( LWMIN ) 00329 * 00330 RETURN 00331 * 00332 * End of PSTZRZF 00333 * 00334 END