|
ScaLAPACK
2.0.2
ScaLAPACK: Scalable Linear Algebra PACKage
|
00001 SUBROUTINE PDPOTRF( UPLO, N, A, IA, JA, DESCA, INFO ) 00002 * 00003 * -- ScaLAPACK routine (version 1.7) -- 00004 * University of Tennessee, Knoxville, Oak Ridge National Laboratory, 00005 * and University of California, Berkeley. 00006 * May 25, 2001 00007 * 00008 * .. Scalar Arguments .. 00009 CHARACTER UPLO 00010 INTEGER IA, INFO, JA, N 00011 * .. 00012 * .. Array Arguments .. 00013 INTEGER DESCA( * ) 00014 DOUBLE PRECISION A( * ) 00015 * .. 00016 * 00017 * Purpose 00018 * ======= 00019 * 00020 * PDPOTRF computes the Cholesky factorization of an N-by-N real 00021 * symmetric positive definite distributed matrix sub( A ) denoting 00022 * A(IA:IA+N-1, JA:JA+N-1). 00023 * 00024 * The factorization has the form 00025 * 00026 * sub( A ) = U' * U , if UPLO = 'U', or 00027 * 00028 * sub( A ) = L * L', if UPLO = 'L', 00029 * 00030 * where U is an upper triangular matrix and L is lower triangular. 00031 * 00032 * Notes 00033 * ===== 00034 * 00035 * Each global data object is described by an associated description 00036 * vector. This vector stores the information required to establish 00037 * the mapping between an object element and its corresponding process 00038 * and memory location. 00039 * 00040 * Let A be a generic term for any 2D block cyclicly distributed array. 00041 * Such a global array has an associated description vector DESCA. 00042 * In the following comments, the character _ should be read as 00043 * "of the global array". 00044 * 00045 * NOTATION STORED IN EXPLANATION 00046 * --------------- -------------- -------------------------------------- 00047 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 00048 * DTYPE_A = 1. 00049 * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 00050 * the BLACS process grid A is distribu- 00051 * ted over. The context itself is glo- 00052 * bal, but the handle (the integer 00053 * value) may vary. 00054 * M_A (global) DESCA( M_ ) The number of rows in the global 00055 * array A. 00056 * N_A (global) DESCA( N_ ) The number of columns in the global 00057 * array A. 00058 * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 00059 * the rows of the array. 00060 * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 00061 * the columns of the array. 00062 * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 00063 * row of the array A is distributed. 00064 * CSRC_A (global) DESCA( CSRC_ ) The process column over which the 00065 * first column of the array A is 00066 * distributed. 00067 * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 00068 * array. LLD_A >= MAX(1,LOCr(M_A)). 00069 * 00070 * Let K be the number of rows or columns of a distributed matrix, 00071 * and assume that its process grid has dimension p x q. 00072 * LOCr( K ) denotes the number of elements of K that a process 00073 * would receive if K were distributed over the p processes of its 00074 * process column. 00075 * Similarly, LOCc( K ) denotes the number of elements of K that a 00076 * process would receive if K were distributed over the q processes of 00077 * its process row. 00078 * The values of LOCr() and LOCc() may be determined via a call to the 00079 * ScaLAPACK tool function, NUMROC: 00080 * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 00081 * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 00082 * An upper bound for these quantities may be computed by: 00083 * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 00084 * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 00085 * 00086 * This routine requires square block decomposition ( MB_A = NB_A ). 00087 * 00088 * Arguments 00089 * ========= 00090 * 00091 * UPLO (global input) CHARACTER 00092 * = 'U': Upper triangle of sub( A ) is stored; 00093 * = 'L': Lower triangle of sub( A ) is stored. 00094 * 00095 * N (global input) INTEGER 00096 * The number of rows and columns to be operated on, i.e. the 00097 * order of the distributed submatrix sub( A ). N >= 0. 00098 * 00099 * A (local input/local output) DOUBLE PRECISION pointer into the 00100 * local memory to an array of dimension (LLD_A, LOCc(JA+N-1)). 00101 * On entry, this array contains the local pieces of the 00102 * N-by-N symmetric distributed matrix sub( A ) to be factored. 00103 * If UPLO = 'U', the leading N-by-N upper triangular part of 00104 * sub( A ) contains the upper triangular part of the matrix, 00105 * and its strictly lower triangular part is not referenced. 00106 * If UPLO = 'L', the leading N-by-N lower triangular part of 00107 * sub( A ) contains the lower triangular part of the distribu- 00108 * ted matrix, and its strictly upper triangular part is not 00109 * referenced. On exit, if UPLO = 'U', the upper triangular 00110 * part of the distributed matrix contains the Cholesky factor 00111 * U, if UPLO = 'L', the lower triangular part of the distribu- 00112 * ted matrix contains the Cholesky factor L. 00113 * 00114 * IA (global input) INTEGER 00115 * The row index in the global array A indicating the first 00116 * row of sub( A ). 00117 * 00118 * JA (global input) INTEGER 00119 * The column index in the global array A indicating the 00120 * first column of sub( A ). 00121 * 00122 * DESCA (global and local input) INTEGER array of dimension DLEN_. 00123 * The array descriptor for the distributed matrix A. 00124 * 00125 * INFO (global output) INTEGER 00126 * = 0: successful exit 00127 * < 0: If the i-th argument is an array and the j-entry had 00128 * an illegal value, then INFO = -(i*100+j), if the i-th 00129 * argument is a scalar and had an illegal value, then 00130 * INFO = -i. 00131 * > 0: If INFO = K, the leading minor of order K, 00132 * A(IA:IA+K-1,JA:JA+K-1) is not positive definite, and 00133 * the factorization could not be completed. 00134 * 00135 * ===================================================================== 00136 * 00137 * .. Parameters .. 00138 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 00139 $ LLD_, MB_, M_, NB_, N_, RSRC_ 00140 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 00141 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 00142 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 00143 DOUBLE PRECISION ONE 00144 PARAMETER ( ONE = 1.0D+0 ) 00145 * .. 00146 * .. Local Scalars .. 00147 LOGICAL UPPER 00148 CHARACTER COLBTOP, ROWBTOP 00149 INTEGER I, ICOFF, ICTXT, IROFF, J, JB, JN, MYCOL, 00150 $ MYROW, NPCOL, NPROW 00151 * .. 00152 * .. Local Arrays .. 00153 INTEGER IDUM1( 1 ), IDUM2( 1 ) 00154 * .. 00155 * .. External Subroutines .. 00156 EXTERNAL BLACS_GRIDINFO, CHK1MAT, PCHK1MAT, PB_TOPGET, 00157 $ PB_TOPSET, PDPOTF2, PDSYRK, PDTRSM, 00158 $ PXERBLA 00159 * .. 00160 * .. External Functions .. 00161 LOGICAL LSAME 00162 INTEGER ICEIL 00163 EXTERNAL ICEIL, LSAME 00164 * .. 00165 * .. Intrinsic Functions .. 00166 INTRINSIC ICHAR, MIN, MOD 00167 * .. 00168 * .. Executable Statements .. 00169 * 00170 * Get grid parameters 00171 * 00172 ICTXT = DESCA( CTXT_ ) 00173 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 00174 * 00175 * Test the input parameters 00176 * 00177 INFO = 0 00178 IF( NPROW.EQ.-1 ) THEN 00179 INFO = -(600+CTXT_) 00180 ELSE 00181 CALL CHK1MAT( N, 2, N, 2, IA, JA, DESCA, 6, INFO ) 00182 UPPER = LSAME( UPLO, 'U' ) 00183 IF( INFO.EQ.0 ) THEN 00184 IROFF = MOD( IA-1, DESCA( MB_ ) ) 00185 ICOFF = MOD( JA-1, DESCA( NB_ ) ) 00186 IF ( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN 00187 INFO = -1 00188 ELSE IF( IROFF.NE.0 ) THEN 00189 INFO = -4 00190 ELSE IF( ICOFF.NE.0 ) THEN 00191 INFO = -5 00192 ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN 00193 INFO = -(600+NB_) 00194 END IF 00195 END IF 00196 IF( UPPER ) THEN 00197 IDUM1( 1 ) = ICHAR( 'U' ) 00198 ELSE 00199 IDUM1( 1 ) = ICHAR( 'L' ) 00200 END IF 00201 IDUM2( 1 ) = 1 00202 CALL PCHK1MAT( N, 2, N, 2, IA, JA, DESCA, 6, 1, IDUM1, IDUM2, 00203 $ INFO ) 00204 END IF 00205 * 00206 IF( INFO.NE.0 ) THEN 00207 CALL PXERBLA( ICTXT, 'PDPOTRF', -INFO ) 00208 RETURN 00209 END IF 00210 * 00211 * Quick return if possible 00212 * 00213 IF( N.EQ.0 ) 00214 $ RETURN 00215 * 00216 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 00217 CALL PB_TOPGET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 00218 * 00219 IF( UPPER ) THEN 00220 * 00221 * Split-ring topology for the communication along process 00222 * columns, 1-tree topology along process rows. 00223 * 00224 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ' ' ) 00225 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', 'S-ring' ) 00226 * 00227 * A is upper triangular, compute Cholesky factorization A = U'*U. 00228 * 00229 * Handle the first block of columns separately 00230 * 00231 JN = MIN( ICEIL( JA, DESCA( NB_ ) )*DESCA(NB_), JA+N-1 ) 00232 JB = JN - JA + 1 00233 * 00234 * Perform unblocked Cholesky factorization on JB block 00235 * 00236 CALL PDPOTF2( UPLO, JB, A, IA, JA, DESCA, INFO ) 00237 IF( INFO.NE.0 ) 00238 $ GO TO 30 00239 * 00240 IF( JB+1.LE.N ) THEN 00241 * 00242 * Form the row panel of U using the triangular solver 00243 * 00244 CALL PDTRSM( 'Left', UPLO, 'Transpose', 'Non-Unit', 00245 $ JB, N-JB, ONE, A, IA, JA, DESCA, A, IA, JA+JB, 00246 $ DESCA ) 00247 * 00248 * Update the trailing matrix, A = A - U'*U 00249 * 00250 CALL PDSYRK( UPLO, 'Transpose', N-JB, JB, -ONE, A, IA, 00251 $ JA+JB, DESCA, ONE, A, IA+JB, JA+JB, DESCA ) 00252 END IF 00253 * 00254 * Loop over remaining block of columns 00255 * 00256 DO 10 J = JN+1, JA+N-1, DESCA( NB_ ) 00257 JB = MIN( N-J+JA, DESCA( NB_ ) ) 00258 I = IA + J - JA 00259 * 00260 * Perform unblocked Cholesky factorization on JB block 00261 * 00262 CALL PDPOTF2( UPLO, JB, A, I, J, DESCA, INFO ) 00263 IF( INFO.NE.0 ) THEN 00264 INFO = INFO + J - JA 00265 GO TO 30 00266 END IF 00267 * 00268 IF( J-JA+JB+1.LE.N ) THEN 00269 * 00270 * Form the row panel of U using the triangular solver 00271 * 00272 CALL PDTRSM( 'Left', UPLO, 'Transpose', 'Non-Unit', 00273 $ JB, N-J-JB+JA, ONE, A, I, J, DESCA, A, 00274 $ I, J+JB, DESCA ) 00275 * 00276 * Update the trailing matrix, A = A - U'*U 00277 * 00278 CALL PDSYRK( UPLO, 'Transpose', N-J-JB+JA, JB, 00279 $ -ONE, A, I, J+JB, DESCA, ONE, A, I+JB, 00280 $ J+JB, DESCA ) 00281 END IF 00282 10 CONTINUE 00283 * 00284 ELSE 00285 * 00286 * 1-tree topology for the communication along process columns, 00287 * Split-ring topology along process rows. 00288 * 00289 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', 'S-ring' ) 00290 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', ' ' ) 00291 * 00292 * A is lower triangular, compute Cholesky factorization A = L*L' 00293 * (right-looking) 00294 * 00295 * Handle the first block of columns separately 00296 * 00297 JN = MIN( ICEIL( JA, DESCA( NB_ ) )*DESCA( NB_ ), JA+N-1 ) 00298 JB = JN - JA + 1 00299 * 00300 * Perform unblocked Cholesky factorization on JB block 00301 * 00302 CALL PDPOTF2( UPLO, JB, A, IA, JA, DESCA, INFO ) 00303 IF( INFO.NE.0 ) 00304 $ GO TO 30 00305 * 00306 IF( JB+1.LE.N ) THEN 00307 * 00308 * Form the column panel of L using the triangular solver 00309 * 00310 CALL PDTRSM( 'Right', UPLO, 'Transpose', 'Non-Unit', 00311 $ N-JB, JB, ONE, A, IA, JA, DESCA, A, IA+JB, JA, 00312 $ DESCA ) 00313 * 00314 * Update the trailing matrix, A = A - L*L' 00315 * 00316 CALL PDSYRK( UPLO, 'No Transpose', N-JB, JB, -ONE, A, IA+JB, 00317 $ JA, DESCA, ONE, A, IA+JB, JA+JB, DESCA ) 00318 * 00319 END IF 00320 * 00321 DO 20 J = JN+1, JA+N-1, DESCA( NB_ ) 00322 JB = MIN( N-J+JA, DESCA( NB_ ) ) 00323 I = IA + J - JA 00324 * 00325 * Perform unblocked Cholesky factorization on JB block 00326 * 00327 CALL PDPOTF2( UPLO, JB, A, I, J, DESCA, INFO ) 00328 IF( INFO.NE.0 ) THEN 00329 INFO = INFO + J - JA 00330 GO TO 30 00331 END IF 00332 * 00333 IF( J-JA+JB+1.LE.N ) THEN 00334 * 00335 * Form the column panel of L using the triangular solver 00336 * 00337 CALL PDTRSM( 'Right', UPLO, 'Transpose', 'Non-Unit', 00338 $ N-J-JB+JA, JB, ONE, A, I, J, DESCA, A, I+JB, 00339 $ J, DESCA ) 00340 * 00341 * Update the trailing matrix, A = A - L*L' 00342 * 00343 CALL PDSYRK( UPLO, 'No Transpose', N-J-JB+JA, JB, -ONE, 00344 $ A, I+JB, J, DESCA, ONE, A, I+JB, J+JB, 00345 $ DESCA ) 00346 * 00347 END IF 00348 20 CONTINUE 00349 * 00350 END IF 00351 * 00352 30 CONTINUE 00353 * 00354 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Rowwise', ROWBTOP ) 00355 CALL PB_TOPSET( ICTXT, 'Broadcast', 'Columnwise', COLBTOP ) 00356 * 00357 RETURN 00358 * 00359 * End of PDPOTRF 00360 * 00361 END