|
ScaLAPACK
2.0.2
ScaLAPACK: Scalable Linear Algebra PACKage
|
00001 SUBROUTINE PZPOEQU( N, A, IA, JA, DESCA, SR, SC, SCOND, AMAX, 00002 $ INFO ) 00003 * 00004 * -- ScaLAPACK routine (version 1.7) -- 00005 * University of Tennessee, Knoxville, Oak Ridge National Laboratory, 00006 * and University of California, Berkeley. 00007 * May 1, 1997 00008 * 00009 * .. Scalar Arguments .. 00010 INTEGER IA, INFO, JA, N 00011 DOUBLE PRECISION AMAX, SCOND 00012 * .. 00013 * .. Array Arguments .. 00014 INTEGER DESCA( * ) 00015 DOUBLE PRECISION SC( * ), SR( * ) 00016 COMPLEX*16 A( * ) 00017 * .. 00018 * 00019 * Purpose 00020 * ======= 00021 * 00022 * PZPOEQU computes row and column scalings intended to 00023 * equilibrate a distributed Hermitian positive definite matrix 00024 * sub( A ) = A(IA:IA+N-1,JA:JA+N-1) and reduce its condition number 00025 * (with respect to the two-norm). SR and SC contain the scale 00026 * factors, S(i) = 1/sqrt(A(i,i)), chosen so that the scaled distri- 00027 * buted matrix B with elements B(i,j) = S(i)*A(i,j)*S(j) has ones on 00028 * the diagonal. This choice of SR and SC puts the condition number 00029 * of B within a factor N of the smallest possible condition number 00030 * over all possible diagonal scalings. 00031 * 00032 * The scaling factor are stored along process rows in SR and along 00033 * process columns in SC. The duplication of information simplifies 00034 * greatly the application of the factors. 00035 * 00036 * Notes 00037 * ===== 00038 * 00039 * Each global data object is described by an associated description 00040 * vector. This vector stores the information required to establish 00041 * the mapping between an object element and its corresponding process 00042 * and memory location. 00043 * 00044 * Let A be a generic term for any 2D block cyclicly distributed array. 00045 * Such a global array has an associated description vector DESCA. 00046 * In the following comments, the character _ should be read as 00047 * "of the global array". 00048 * 00049 * NOTATION STORED IN EXPLANATION 00050 * --------------- -------------- -------------------------------------- 00051 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 00052 * DTYPE_A = 1. 00053 * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 00054 * the BLACS process grid A is distribu- 00055 * ted over. The context itself is glo- 00056 * bal, but the handle (the integer 00057 * value) may vary. 00058 * M_A (global) DESCA( M_ ) The number of rows in the global 00059 * array A. 00060 * N_A (global) DESCA( N_ ) The number of columns in the global 00061 * array A. 00062 * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 00063 * the rows of the array. 00064 * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 00065 * the columns of the array. 00066 * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 00067 * row of the array A is distributed. 00068 * CSRC_A (global) DESCA( CSRC_ ) The process column over which the 00069 * first column of the array A is 00070 * distributed. 00071 * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 00072 * array. LLD_A >= MAX(1,LOCr(M_A)). 00073 * 00074 * Let K be the number of rows or columns of a distributed matrix, 00075 * and assume that its process grid has dimension p x q. 00076 * LOCr( K ) denotes the number of elements of K that a process 00077 * would receive if K were distributed over the p processes of its 00078 * process column. 00079 * Similarly, LOCc( K ) denotes the number of elements of K that a 00080 * process would receive if K were distributed over the q processes of 00081 * its process row. 00082 * The values of LOCr() and LOCc() may be determined via a call to the 00083 * ScaLAPACK tool function, NUMROC: 00084 * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 00085 * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 00086 * An upper bound for these quantities may be computed by: 00087 * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 00088 * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 00089 * 00090 * Arguments 00091 * ========= 00092 * 00093 * N (global input) INTEGER 00094 * The number of rows and columns to be operated on i.e the 00095 * order of the distributed submatrix sub( A ). N >= 0. 00096 * 00097 * A (local input) COMPLEX*16 pointer into the local memory to an 00098 * array of local dimension ( LLD_A, LOCc(JA+N-1) ), the 00099 * N-by-N Hermitian positive definite distributed matrix 00100 * sub( A ) whose scaling factors are to be computed. Only the 00101 * diagonal elements of sub( A ) are referenced. 00102 * 00103 * IA (global input) INTEGER 00104 * The row index in the global array A indicating the first 00105 * row of sub( A ). 00106 * 00107 * JA (global input) INTEGER 00108 * The column index in the global array A indicating the 00109 * first column of sub( A ). 00110 * 00111 * DESCA (global and local input) INTEGER array of dimension DLEN_. 00112 * The array descriptor for the distributed matrix A. 00113 * 00114 * SR (local output) DOUBLE PRECISION array, dimension LOCr(M_A) 00115 * If INFO = 0, SR(IA:IA+N-1) contains the row scale factors 00116 * for sub( A ). SR is aligned with the distributed matrix A, 00117 * and replicated across every process column. SR is tied to the 00118 * distributed matrix A. 00119 * 00120 * SC (local output) DOUBLE PRECISION array, dimension LOCc(N_A) 00121 * If INFO = 0, SC(JA:JA+N-1) contains the column scale factors 00122 * for A(IA:IA+M-1,JA:JA+N-1). SC is aligned with the distribu- 00123 * ted matrix A, and replicated down every process row. SC is 00124 * tied to the distributed matrix A. 00125 * 00126 * SCOND (global output) DOUBLE PRECISION 00127 * If INFO = 0, SCOND contains the ratio of the smallest SR(i) 00128 * (or SC(j)) to the largest SR(i) (or SC(j)), with 00129 * IA <= i <= IA+N-1 and JA <= j <= JA+N-1. If SCOND >= 0.1 00130 * and AMAX is neither too large nor too small, it is not worth 00131 * scaling by SR (or SC). 00132 * 00133 * AMAX (global output) DOUBLE PRECISION 00134 * Absolute value of largest matrix element. If AMAX is very 00135 * close to overflow or very close to underflow, the matrix 00136 * should be scaled. 00137 * 00138 * INFO (global output) INTEGER 00139 * = 0: successful exit 00140 * < 0: If the i-th argument is an array and the j-entry had 00141 * an illegal value, then INFO = -(i*100+j), if the i-th 00142 * argument is a scalar and had an illegal value, then 00143 * INFO = -i. 00144 * > 0: If INFO = K, the K-th diagonal entry of sub( A ) is 00145 * nonpositive. 00146 * 00147 * ===================================================================== 00148 * 00149 * .. Parameters .. 00150 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 00151 $ LLD_, MB_, M_, NB_, N_, RSRC_ 00152 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 00153 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 00154 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 00155 DOUBLE PRECISION ZERO, ONE 00156 PARAMETER ( ZERO = 0.0D+0, ONE = 1.0D+0 ) 00157 * .. 00158 * .. Local Scalars .. 00159 CHARACTER ALLCTOP, COLCTOP, ROWCTOP 00160 INTEGER IACOL, IAROW, ICOFF, ICTXT, ICURCOL, ICURROW, 00161 $ IDUMM, II, IIA, IOFFA, IOFFD, IROFF, J, JB, JJ, 00162 $ JJA, JN, LDA, LL, MYCOL, MYROW, NP, NPCOL, 00163 $ NPROW, NQ 00164 DOUBLE PRECISION AII, SMIN 00165 * .. 00166 * .. Local Arrays .. 00167 INTEGER DESCSC( DLEN_ ), DESCSR( DLEN_ ) 00168 * .. 00169 * .. External Subroutines .. 00170 EXTERNAL BLACS_GRIDINFO, CHK1MAT, DESCSET, DGAMN2D, 00171 $ DGAMX2D, DGSUM2D, IGAMN2D, INFOG2L, 00172 $ PCHK1MAT, PB_TOPGET, PXERBLA 00173 * .. 00174 * .. External Functions .. 00175 INTEGER ICEIL, NUMROC 00176 DOUBLE PRECISION PDLAMCH 00177 EXTERNAL ICEIL, NUMROC, PDLAMCH 00178 * .. 00179 * .. Intrinsic Functions .. 00180 INTRINSIC DBLE, MAX, MIN, MOD, SQRT 00181 * .. 00182 * .. Executable Statements .. 00183 * 00184 * Get grid parameters 00185 * 00186 ICTXT = DESCA( CTXT_ ) 00187 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 00188 * 00189 * Test the input parameters. 00190 * 00191 INFO = 0 00192 IF( NPROW.EQ.-1 ) THEN 00193 INFO = -(500+CTXT_) 00194 ELSE 00195 CALL CHK1MAT( N, 1, N, 1, IA, JA, DESCA, 5, INFO ) 00196 CALL PCHK1MAT( N, 1, N, 1, IA, JA, DESCA, 5, 0, IDUMM, IDUMM, 00197 $ INFO ) 00198 END IF 00199 * 00200 IF( INFO.NE.0 ) THEN 00201 CALL PXERBLA( ICTXT, 'PZPOEQU', -INFO ) 00202 RETURN 00203 END IF 00204 * 00205 * Quick return if possible 00206 * 00207 IF( N.EQ.0 ) THEN 00208 SCOND = ONE 00209 AMAX = ZERO 00210 RETURN 00211 END IF 00212 * 00213 CALL PB_TOPGET( ICTXT, 'Combine', 'All', ALLCTOP ) 00214 CALL PB_TOPGET( ICTXT, 'Combine', 'Rowwise', ROWCTOP ) 00215 CALL PB_TOPGET( ICTXT, 'Combine', 'Columnwise', COLCTOP ) 00216 * 00217 * Compute some local indexes 00218 * 00219 CALL INFOG2L( IA, JA, DESCA, NPROW, NPCOL, MYROW, MYCOL, IIA, JJA, 00220 $ IAROW, IACOL ) 00221 IROFF = MOD( IA-1, DESCA( MB_ ) ) 00222 ICOFF = MOD( JA-1, DESCA( NB_ ) ) 00223 NP = NUMROC( N+IROFF, DESCA( MB_ ), MYROW, IAROW, NPROW ) 00224 NQ = NUMROC( N+ICOFF, DESCA( NB_ ), MYCOL, IACOL, NPCOL ) 00225 IF( MYROW.EQ.IAROW ) 00226 $ NP = NP - IROFF 00227 IF( MYCOL.EQ.IACOL ) 00228 $ NQ = NQ - ICOFF 00229 JN = MIN( ICEIL( JA, DESCA( NB_ ) ) * DESCA( NB_ ), JA+N-1 ) 00230 LDA = DESCA( LLD_ ) 00231 * 00232 * Assign descriptors for SR and SC arrays 00233 * 00234 CALL DESCSET( DESCSR, N, 1, DESCA( MB_ ), 1, 0, 0, ICTXT, 00235 $ MAX( 1, NP ) ) 00236 CALL DESCSET( DESCSC, 1, N, 1, DESCA( NB_ ), 0, 0, ICTXT, 1 ) 00237 * 00238 * Initialize the scaling factors to zero. 00239 * 00240 DO 10 II = IIA, IIA+NP-1 00241 SR( II ) = ZERO 00242 10 CONTINUE 00243 * 00244 DO 20 JJ = JJA, JJA+NQ-1 00245 SC( JJ ) = ZERO 00246 20 CONTINUE 00247 * 00248 * Find the minimum and maximum diagonal elements. 00249 * Handle first block separately. 00250 * 00251 II = IIA 00252 JJ = JJA 00253 JB = JN-JA+1 00254 SMIN = ONE / PDLAMCH( ICTXT, 'S' ) 00255 AMAX = ZERO 00256 * 00257 IOFFA = II+(JJ-1)*LDA 00258 IF( MYROW.EQ.IAROW .AND. MYCOL.EQ.IACOL ) THEN 00259 IOFFD = IOFFA 00260 DO 30 LL = 0, JB-1 00261 AII = DBLE( A( IOFFD ) ) 00262 SR( II+LL ) = AII 00263 SC( JJ+LL ) = AII 00264 SMIN = MIN( SMIN, AII ) 00265 AMAX = MAX( AMAX, AII ) 00266 IF( AII.LE.ZERO .AND. INFO.EQ.0 ) 00267 $ INFO = LL + 1 00268 IOFFD = IOFFD + LDA + 1 00269 30 CONTINUE 00270 END IF 00271 * 00272 IF( MYROW.EQ.IAROW ) THEN 00273 II = II + JB 00274 IOFFA = IOFFA + JB 00275 END IF 00276 IF( MYCOL.EQ.IACOL ) THEN 00277 JJ = JJ + JB 00278 IOFFA = IOFFA + JB*LDA 00279 END IF 00280 ICURROW = MOD( IAROW+1, NPROW ) 00281 ICURCOL = MOD( IACOL+1, NPCOL ) 00282 * 00283 * Loop over remaining blocks of columns 00284 * 00285 DO 50 J = JN+1, JA+N-1, DESCA( NB_ ) 00286 JB = MIN( N-J+JA, DESCA( NB_ ) ) 00287 * 00288 IF( MYROW.EQ.ICURROW .AND. MYCOL.EQ.ICURCOL ) THEN 00289 IOFFD = IOFFA 00290 DO 40 LL = 0, JB-1 00291 AII = DBLE( A( IOFFD ) ) 00292 SR( II+LL ) = AII 00293 SC( JJ+LL ) = AII 00294 SMIN = MIN( SMIN, AII ) 00295 AMAX = MAX( AMAX, AII ) 00296 IF( AII.LE.ZERO .AND. INFO.EQ.0 ) 00297 $ INFO = J + LL - JA + 1 00298 IOFFD = IOFFD + LDA + 1 00299 40 CONTINUE 00300 END IF 00301 * 00302 IF( MYROW.EQ.ICURROW ) THEN 00303 II = II + JB 00304 IOFFA = IOFFA + JB 00305 END IF 00306 IF( MYCOL.EQ.ICURCOL ) THEN 00307 JJ = JJ + JB 00308 IOFFA = IOFFA + JB*LDA 00309 END IF 00310 ICURROW = MOD( ICURROW+1, NPROW ) 00311 ICURCOL = MOD( ICURCOL+1, NPCOL ) 00312 * 00313 50 CONTINUE 00314 * 00315 * Compute scaling factors 00316 * 00317 CALL DGSUM2D( ICTXT, 'Columnwise', COLCTOP, 1, NQ, SC( JJA ), 00318 $ 1, -1, MYCOL ) 00319 CALL DGSUM2D( ICTXT, 'Rowwise', ROWCTOP, NP, 1, SR( IIA ), 00320 $ MAX( 1, NP ), -1, MYCOL ) 00321 * 00322 CALL DGAMX2D( ICTXT, 'All', ALLCTOP, 1, 1, AMAX, 1, IDUMM, IDUMM, 00323 $ -1, -1, MYCOL ) 00324 CALL DGAMN2D( ICTXT, 'All', ALLCTOP, 1, 1, SMIN, 1, IDUMM, IDUMM, 00325 $ -1, -1, MYCOL ) 00326 * 00327 IF( SMIN.LE.ZERO ) THEN 00328 * 00329 * Find the first non-positive diagonal element and return. 00330 * 00331 CALL IGAMN2D( ICTXT, 'All', ALLCTOP, 1, 1, INFO, 1, II, JJ, -1, 00332 $ -1, MYCOL ) 00333 RETURN 00334 * 00335 ELSE 00336 * 00337 * Set the scale factors to the reciprocals 00338 * of the diagonal elements. 00339 * 00340 DO 60 II = IIA, IIA+NP-1 00341 SR( II ) = ONE / SQRT( SR( II ) ) 00342 60 CONTINUE 00343 * 00344 DO 70 JJ = JJA, JJA+NQ-1 00345 SC( JJ ) = ONE / SQRT( SC( JJ ) ) 00346 70 CONTINUE 00347 * 00348 * Compute SCOND = min(S(I)) / max(S(I)) 00349 * 00350 SCOND = SQRT( SMIN ) / SQRT( AMAX ) 00351 * 00352 END IF 00353 * 00354 RETURN 00355 * 00356 * End of PZPOEQU 00357 * 00358 END