|
ScaLAPACK
2.0.2
ScaLAPACK: Scalable Linear Algebra PACKage
|
00001 SUBROUTINE PZDTTRS( TRANS, N, NRHS, DL, D, DU, JA, DESCA, B, IB, 00002 $ DESCB, AF, LAF, WORK, LWORK, INFO ) 00003 * 00004 * 00005 * 00006 * -- ScaLAPACK routine (version 1.7) -- 00007 * University of Tennessee, Knoxville, Oak Ridge National Laboratory, 00008 * and University of California, Berkeley. 00009 * August 7, 2001 00010 * 00011 * .. Scalar Arguments .. 00012 CHARACTER TRANS 00013 INTEGER IB, INFO, JA, LAF, LWORK, N, NRHS 00014 * .. 00015 * .. Array Arguments .. 00016 INTEGER DESCA( * ), DESCB( * ) 00017 COMPLEX*16 AF( * ), B( * ), D( * ), DL( * ), DU( * ), 00018 $ WORK( * ) 00019 * .. 00020 * 00021 * 00022 * Purpose 00023 * ======= 00024 * 00025 * PZDTTRS solves a system of linear equations 00026 * 00027 * A(1:N, JA:JA+N-1) * X = B(IB:IB+N-1, 1:NRHS) 00028 * or 00029 * A(1:N, JA:JA+N-1)' * X = B(IB:IB+N-1, 1:NRHS) 00030 * 00031 * where A(1:N, JA:JA+N-1) is the matrix used to produce the factors 00032 * stored in A(1:N,JA:JA+N-1) and AF by PZDTTRF. 00033 * A(1:N, JA:JA+N-1) is an N-by-N complex 00034 * tridiagonal diagonally dominant-like distributed 00035 * matrix. 00036 * 00037 * Routine PZDTTRF MUST be called first. 00038 * 00039 * ===================================================================== 00040 * 00041 * Arguments 00042 * ========= 00043 * 00044 * 00045 * TRANS (global input) CHARACTER 00046 * = 'N': Solve with A(1:N, JA:JA+N-1); 00047 * = 'C': Solve with conjugate_transpose( A(1:N, JA:JA+N-1) ); 00048 * 00049 * N (global input) INTEGER 00050 * The number of rows and columns to be operated on, i.e. the 00051 * order of the distributed submatrix A(1:N, JA:JA+N-1). N >= 0. 00052 * 00053 * NRHS (global input) INTEGER 00054 * The number of right hand sides, i.e., the number of columns 00055 * of the distributed submatrix B(IB:IB+N-1, 1:NRHS). 00056 * NRHS >= 0. 00057 * 00058 * DL (local input/local output) COMPLEX*16 pointer to local 00059 * part of global vector storing the lower diagonal of the 00060 * matrix. Globally, DL(1) is not referenced, and DL must be 00061 * aligned with D. 00062 * Must be of size >= DESCA( NB_ ). 00063 * On exit, this array contains information containing the 00064 * factors of the matrix. 00065 * 00066 * D (local input/local output) COMPLEX*16 pointer to local 00067 * part of global vector storing the main diagonal of the 00068 * matrix. 00069 * On exit, this array contains information containing the 00070 * factors of the matrix. 00071 * Must be of size >= DESCA( NB_ ). 00072 * 00073 * DU (local input/local output) COMPLEX*16 pointer to local 00074 * part of global vector storing the upper diagonal of the 00075 * matrix. Globally, DU(n) is not referenced, and DU must be 00076 * aligned with D. 00077 * On exit, this array contains information containing the 00078 * factors of the matrix. 00079 * Must be of size >= DESCA( NB_ ). 00080 * 00081 * JA (global input) INTEGER 00082 * The index in the global array A that points to the start of 00083 * the matrix to be operated on (which may be either all of A 00084 * or a submatrix of A). 00085 * 00086 * DESCA (global and local input) INTEGER array of dimension DLEN. 00087 * if 1D type (DTYPE_A=501 or 502), DLEN >= 7; 00088 * if 2D type (DTYPE_A=1), DLEN >= 9. 00089 * The array descriptor for the distributed matrix A. 00090 * Contains information of mapping of A to memory. Please 00091 * see NOTES below for full description and options. 00092 * 00093 * B (local input/local output) COMPLEX*16 pointer into 00094 * local memory to an array of local lead dimension lld_b>=NB. 00095 * On entry, this array contains the 00096 * the local pieces of the right hand sides 00097 * B(IB:IB+N-1, 1:NRHS). 00098 * On exit, this contains the local piece of the solutions 00099 * distributed matrix X. 00100 * 00101 * IB (global input) INTEGER 00102 * The row index in the global array B that points to the first 00103 * row of the matrix to be operated on (which may be either 00104 * all of B or a submatrix of B). 00105 * 00106 * DESCB (global and local input) INTEGER array of dimension DLEN. 00107 * if 1D type (DTYPE_B=502), DLEN >=7; 00108 * if 2D type (DTYPE_B=1), DLEN >= 9. 00109 * The array descriptor for the distributed matrix B. 00110 * Contains information of mapping of B to memory. Please 00111 * see NOTES below for full description and options. 00112 * 00113 * AF (local output) COMPLEX*16 array, dimension LAF. 00114 * Auxiliary Fillin Space. 00115 * Fillin is created during the factorization routine 00116 * PZDTTRF and this is stored in AF. If a linear system 00117 * is to be solved using PZDTTRS after the factorization 00118 * routine, AF *must not be altered* after the factorization. 00119 * 00120 * LAF (local input) INTEGER 00121 * Size of user-input Auxiliary Fillin space AF. Must be >= 00122 * 2*(NB+2) 00123 * If LAF is not large enough, an error code will be returned 00124 * and the minimum acceptable size will be returned in AF( 1 ) 00125 * 00126 * WORK (local workspace/local output) 00127 * COMPLEX*16 temporary workspace. This space may 00128 * be overwritten in between calls to routines. WORK must be 00129 * the size given in LWORK. 00130 * On exit, WORK( 1 ) contains the minimal LWORK. 00131 * 00132 * LWORK (local input or global input) INTEGER 00133 * Size of user-input workspace WORK. 00134 * If LWORK is too small, the minimal acceptable size will be 00135 * returned in WORK(1) and an error code is returned. LWORK>= 00136 * 10*NPCOL+4*NRHS 00137 * 00138 * INFO (local output) INTEGER 00139 * = 0: successful exit 00140 * < 0: If the i-th argument is an array and the j-entry had 00141 * an illegal value, then INFO = -(i*100+j), if the i-th 00142 * argument is a scalar and had an illegal value, then 00143 * INFO = -i. 00144 * 00145 * ===================================================================== 00146 * 00147 * 00148 * Restrictions 00149 * ============ 00150 * 00151 * The following are restrictions on the input parameters. Some of these 00152 * are temporary and will be removed in future releases, while others 00153 * may reflect fundamental technical limitations. 00154 * 00155 * Non-cyclic restriction: VERY IMPORTANT! 00156 * P*NB>= mod(JA-1,NB)+N. 00157 * The mapping for matrices must be blocked, reflecting the nature 00158 * of the divide and conquer algorithm as a task-parallel algorithm. 00159 * This formula in words is: no processor may have more than one 00160 * chunk of the matrix. 00161 * 00162 * Blocksize cannot be too small: 00163 * If the matrix spans more than one processor, the following 00164 * restriction on NB, the size of each block on each processor, 00165 * must hold: 00166 * NB >= 2 00167 * The bulk of parallel computation is done on the matrix of size 00168 * O(NB) on each processor. If this is too small, divide and conquer 00169 * is a poor choice of algorithm. 00170 * 00171 * Submatrix reference: 00172 * JA = IB 00173 * Alignment restriction that prevents unnecessary communication. 00174 * 00175 * 00176 * ===================================================================== 00177 * 00178 * 00179 * Notes 00180 * ===== 00181 * 00182 * If the factorization routine and the solve routine are to be called 00183 * separately (to solve various sets of righthand sides using the same 00184 * coefficient matrix), the auxiliary space AF *must not be altered* 00185 * between calls to the factorization routine and the solve routine. 00186 * 00187 * The best algorithm for solving banded and tridiagonal linear systems 00188 * depends on a variety of parameters, especially the bandwidth. 00189 * Currently, only algorithms designed for the case N/P >> bw are 00190 * implemented. These go by many names, including Divide and Conquer, 00191 * Partitioning, domain decomposition-type, etc. 00192 * For tridiagonal matrices, it is obvious: N/P >> bw(=1), and so D&C 00193 * algorithms are the appropriate choice. 00194 * 00195 * Algorithm description: Divide and Conquer 00196 * 00197 * The Divide and Conqer algorithm assumes the matrix is narrowly 00198 * banded compared with the number of equations. In this situation, 00199 * it is best to distribute the input matrix A one-dimensionally, 00200 * with columns atomic and rows divided amongst the processes. 00201 * The basic algorithm divides the tridiagonal matrix up into 00202 * P pieces with one stored on each processor, 00203 * and then proceeds in 2 phases for the factorization or 3 for the 00204 * solution of a linear system. 00205 * 1) Local Phase: 00206 * The individual pieces are factored independently and in 00207 * parallel. These factors are applied to the matrix creating 00208 * fillin, which is stored in a non-inspectable way in auxiliary 00209 * space AF. Mathematically, this is equivalent to reordering 00210 * the matrix A as P A P^T and then factoring the principal 00211 * leading submatrix of size equal to the sum of the sizes of 00212 * the matrices factored on each processor. The factors of 00213 * these submatrices overwrite the corresponding parts of A 00214 * in memory. 00215 * 2) Reduced System Phase: 00216 * A small ((P-1)) system is formed representing 00217 * interaction of the larger blocks, and is stored (as are its 00218 * factors) in the space AF. A parallel Block Cyclic Reduction 00219 * algorithm is used. For a linear system, a parallel front solve 00220 * followed by an analagous backsolve, both using the structure 00221 * of the factored matrix, are performed. 00222 * 3) Backsubsitution Phase: 00223 * For a linear system, a local backsubstitution is performed on 00224 * each processor in parallel. 00225 * 00226 * 00227 * Descriptors 00228 * =========== 00229 * 00230 * Descriptors now have *types* and differ from ScaLAPACK 1.0. 00231 * 00232 * Note: tridiagonal codes can use either the old two dimensional 00233 * or new one-dimensional descriptors, though the processor grid in 00234 * both cases *must be one-dimensional*. We describe both types below. 00235 * 00236 * Each global data object is described by an associated description 00237 * vector. This vector stores the information required to establish 00238 * the mapping between an object element and its corresponding process 00239 * and memory location. 00240 * 00241 * Let A be a generic term for any 2D block cyclicly distributed array. 00242 * Such a global array has an associated description vector DESCA. 00243 * In the following comments, the character _ should be read as 00244 * "of the global array". 00245 * 00246 * NOTATION STORED IN EXPLANATION 00247 * --------------- -------------- -------------------------------------- 00248 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 00249 * DTYPE_A = 1. 00250 * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 00251 * the BLACS process grid A is distribu- 00252 * ted over. The context itself is glo- 00253 * bal, but the handle (the integer 00254 * value) may vary. 00255 * M_A (global) DESCA( M_ ) The number of rows in the global 00256 * array A. 00257 * N_A (global) DESCA( N_ ) The number of columns in the global 00258 * array A. 00259 * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 00260 * the rows of the array. 00261 * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 00262 * the columns of the array. 00263 * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 00264 * row of the array A is distributed. 00265 * CSRC_A (global) DESCA( CSRC_ ) The process column over which the 00266 * first column of the array A is 00267 * distributed. 00268 * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 00269 * array. LLD_A >= MAX(1,LOCr(M_A)). 00270 * 00271 * Let K be the number of rows or columns of a distributed matrix, 00272 * and assume that its process grid has dimension p x q. 00273 * LOCr( K ) denotes the number of elements of K that a process 00274 * would receive if K were distributed over the p processes of its 00275 * process column. 00276 * Similarly, LOCc( K ) denotes the number of elements of K that a 00277 * process would receive if K were distributed over the q processes of 00278 * its process row. 00279 * The values of LOCr() and LOCc() may be determined via a call to the 00280 * ScaLAPACK tool function, NUMROC: 00281 * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 00282 * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 00283 * An upper bound for these quantities may be computed by: 00284 * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 00285 * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 00286 * 00287 * 00288 * One-dimensional descriptors: 00289 * 00290 * One-dimensional descriptors are a new addition to ScaLAPACK since 00291 * version 1.0. They simplify and shorten the descriptor for 1D 00292 * arrays. 00293 * 00294 * Since ScaLAPACK supports two-dimensional arrays as the fundamental 00295 * object, we allow 1D arrays to be distributed either over the 00296 * first dimension of the array (as if the grid were P-by-1) or the 00297 * 2nd dimension (as if the grid were 1-by-P). This choice is 00298 * indicated by the descriptor type (501 or 502) 00299 * as described below. 00300 * However, for tridiagonal matrices, since the objects being 00301 * distributed are the individual vectors storing the diagonals, we 00302 * have adopted the convention that both the P-by-1 descriptor and 00303 * the 1-by-P descriptor are allowed and are equivalent for 00304 * tridiagonal matrices. Thus, for tridiagonal matrices, 00305 * DTYPE_A = 501 or 502 can be used interchangeably 00306 * without any other change. 00307 * We require that the distributed vectors storing the diagonals of a 00308 * tridiagonal matrix be aligned with each other. Because of this, a 00309 * single descriptor, DESCA, serves to describe the distribution of 00310 * of all diagonals simultaneously. 00311 * 00312 * IMPORTANT NOTE: the actual BLACS grid represented by the 00313 * CTXT entry in the descriptor may be *either* P-by-1 or 1-by-P 00314 * irrespective of which one-dimensional descriptor type 00315 * (501 or 502) is input. 00316 * This routine will interpret the grid properly either way. 00317 * ScaLAPACK routines *do not support intercontext operations* so that 00318 * the grid passed to a single ScaLAPACK routine *must be the same* 00319 * for all array descriptors passed to that routine. 00320 * 00321 * NOTE: In all cases where 1D descriptors are used, 2D descriptors 00322 * may also be used, since a one-dimensional array is a special case 00323 * of a two-dimensional array with one dimension of size unity. 00324 * The two-dimensional array used in this case *must* be of the 00325 * proper orientation: 00326 * If the appropriate one-dimensional descriptor is DTYPEA=501 00327 * (1 by P type), then the two dimensional descriptor must 00328 * have a CTXT value that refers to a 1 by P BLACS grid; 00329 * If the appropriate one-dimensional descriptor is DTYPEA=502 00330 * (P by 1 type), then the two dimensional descriptor must 00331 * have a CTXT value that refers to a P by 1 BLACS grid. 00332 * 00333 * 00334 * Summary of allowed descriptors, types, and BLACS grids: 00335 * DTYPE 501 502 1 1 00336 * BLACS grid 1xP or Px1 1xP or Px1 1xP Px1 00337 * ----------------------------------------------------- 00338 * A OK OK OK NO 00339 * B NO OK NO OK 00340 * 00341 * Note that a consequence of this chart is that it is not possible 00342 * for *both* DTYPE_A and DTYPE_B to be 2D_type(1), as these lead 00343 * to opposite requirements for the orientation of the BLACS grid, 00344 * and as noted before, the *same* BLACS context must be used in 00345 * all descriptors in a single ScaLAPACK subroutine call. 00346 * 00347 * Let A be a generic term for any 1D block cyclicly distributed array. 00348 * Such a global array has an associated description vector DESCA. 00349 * In the following comments, the character _ should be read as 00350 * "of the global array". 00351 * 00352 * NOTATION STORED IN EXPLANATION 00353 * --------------- ---------- ------------------------------------------ 00354 * DTYPE_A(global) DESCA( 1 ) The descriptor type. For 1D grids, 00355 * TYPE_A = 501: 1-by-P grid. 00356 * TYPE_A = 502: P-by-1 grid. 00357 * CTXT_A (global) DESCA( 2 ) The BLACS context handle, indicating 00358 * the BLACS process grid A is distribu- 00359 * ted over. The context itself is glo- 00360 * bal, but the handle (the integer 00361 * value) may vary. 00362 * N_A (global) DESCA( 3 ) The size of the array dimension being 00363 * distributed. 00364 * NB_A (global) DESCA( 4 ) The blocking factor used to distribute 00365 * the distributed dimension of the array. 00366 * SRC_A (global) DESCA( 5 ) The process row or column over which the 00367 * first row or column of the array 00368 * is distributed. 00369 * Ignored DESCA( 6 ) Ignored for tridiagonal matrices. 00370 * Reserved DESCA( 7 ) Reserved for future use. 00371 * 00372 * 00373 * 00374 * ===================================================================== 00375 * 00376 * Code Developer: Andrew J. Cleary, University of Tennessee. 00377 * Current address: Lawrence Livermore National Labs. 00378 * This version released: August, 2001. 00379 * 00380 * ===================================================================== 00381 * 00382 * .. 00383 * .. Parameters .. 00384 DOUBLE PRECISION ONE, ZERO 00385 PARAMETER ( ONE = 1.0D+0 ) 00386 PARAMETER ( ZERO = 0.0D+0 ) 00387 COMPLEX*16 CONE, CZERO 00388 PARAMETER ( CONE = ( 1.0D+0, 0.0D+0 ) ) 00389 PARAMETER ( CZERO = ( 0.0D+0, 0.0D+0 ) ) 00390 INTEGER INT_ONE 00391 PARAMETER ( INT_ONE = 1 ) 00392 INTEGER DESCMULT, BIGNUM 00393 PARAMETER (DESCMULT = 100, BIGNUM = DESCMULT * DESCMULT) 00394 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 00395 $ LLD_, MB_, M_, NB_, N_, RSRC_ 00396 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 00397 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 00398 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 00399 * .. 00400 * .. Local Scalars .. 00401 INTEGER CSRC, FIRST_PROC, ICTXT, ICTXT_NEW, ICTXT_SAVE, 00402 $ IDUM2, IDUM3, JA_NEW, LLDA, LLDB, MYCOL, MYROW, 00403 $ MY_NUM_COLS, NB, NP, NPCOL, NPROW, NP_SAVE, 00404 $ ODD_SIZE, PART_OFFSET, PART_SIZE, 00405 $ RETURN_CODE, STORE_M_B, STORE_N_A, TEMP, 00406 $ WORK_SIZE_MIN 00407 * .. 00408 * .. Local Arrays .. 00409 INTEGER DESCA_1XP( 7 ), DESCB_PX1( 7 ), 00410 $ PARAM_CHECK( 15, 3 ) 00411 * .. 00412 * .. External Subroutines .. 00413 EXTERNAL BLACS_GRIDINFO, DESC_CONVERT, GLOBCHK, PXERBLA, 00414 $ PZDTTRSV, RESHAPE 00415 * .. 00416 * .. External Functions .. 00417 LOGICAL LSAME 00418 INTEGER NUMROC 00419 COMPLEX*16 ZDOTC 00420 EXTERNAL LSAME, NUMROC, ZDOTC 00421 * .. 00422 * .. Intrinsic Functions .. 00423 INTRINSIC ICHAR, MIN, MOD 00424 * .. 00425 * .. Executable Statements .. 00426 * 00427 * Test the input parameters 00428 * 00429 INFO = 0 00430 * 00431 * Convert descriptor into standard form for easy access to 00432 * parameters, check that grid is of right shape. 00433 * 00434 DESCA_1XP( 1 ) = 501 00435 DESCB_PX1( 1 ) = 502 00436 * 00437 TEMP = DESCA( DTYPE_ ) 00438 IF( TEMP .EQ. 502 ) THEN 00439 * Temporarily set the descriptor type to 1xP type 00440 DESCA( DTYPE_ ) = 501 00441 ENDIF 00442 * 00443 CALL DESC_CONVERT( DESCA, DESCA_1XP, RETURN_CODE ) 00444 * 00445 DESCA( DTYPE_ ) = TEMP 00446 * 00447 IF( RETURN_CODE .NE. 0) THEN 00448 INFO = -( 8*100 + 2 ) 00449 ENDIF 00450 * 00451 CALL DESC_CONVERT( DESCB, DESCB_PX1, RETURN_CODE ) 00452 * 00453 IF( RETURN_CODE .NE. 0) THEN 00454 INFO = -( 11*100 + 2 ) 00455 ENDIF 00456 * 00457 * Consistency checks for DESCA and DESCB. 00458 * 00459 * Context must be the same 00460 IF( DESCA_1XP( 2 ) .NE. DESCB_PX1( 2 ) ) THEN 00461 INFO = -( 11*100 + 2 ) 00462 ENDIF 00463 * 00464 * These are alignment restrictions that may or may not be removed 00465 * in future releases. -Andy Cleary, April 14, 1996. 00466 * 00467 * Block sizes must be the same 00468 IF( DESCA_1XP( 4 ) .NE. DESCB_PX1( 4 ) ) THEN 00469 INFO = -( 11*100 + 4 ) 00470 ENDIF 00471 * 00472 * Source processor must be the same 00473 * 00474 IF( DESCA_1XP( 5 ) .NE. DESCB_PX1( 5 ) ) THEN 00475 INFO = -( 11*100 + 5 ) 00476 ENDIF 00477 * 00478 * Get values out of descriptor for use in code. 00479 * 00480 ICTXT = DESCA_1XP( 2 ) 00481 CSRC = DESCA_1XP( 5 ) 00482 NB = DESCA_1XP( 4 ) 00483 LLDA = DESCA_1XP( 6 ) 00484 STORE_N_A = DESCA_1XP( 3 ) 00485 LLDB = DESCB_PX1( 6 ) 00486 STORE_M_B = DESCB_PX1( 3 ) 00487 * 00488 * Get grid parameters 00489 * 00490 * 00491 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 00492 NP = NPROW * NPCOL 00493 * 00494 * 00495 * 00496 IF( LSAME( TRANS, 'N' ) ) THEN 00497 IDUM2 = ICHAR( 'N' ) 00498 ELSE IF ( LSAME( TRANS, 'C' ) ) THEN 00499 IDUM2 = ICHAR( 'C' ) 00500 ELSE 00501 INFO = -1 00502 END IF 00503 * 00504 IF( LWORK .LT. -1) THEN 00505 INFO = -15 00506 ELSE IF ( LWORK .EQ. -1 ) THEN 00507 IDUM3 = -1 00508 ELSE 00509 IDUM3 = 1 00510 ENDIF 00511 * 00512 IF( N .LT. 0 ) THEN 00513 INFO = -2 00514 ENDIF 00515 * 00516 IF( N+JA-1 .GT. STORE_N_A ) THEN 00517 INFO = -( 8*100 + 6 ) 00518 ENDIF 00519 * 00520 IF( N+IB-1 .GT. STORE_M_B ) THEN 00521 INFO = -( 11*100 + 3 ) 00522 ENDIF 00523 * 00524 IF( LLDB .LT. NB ) THEN 00525 INFO = -( 11*100 + 6 ) 00526 ENDIF 00527 * 00528 IF( NRHS .LT. 0 ) THEN 00529 INFO = -3 00530 ENDIF 00531 * 00532 * Current alignment restriction 00533 * 00534 IF( JA .NE. IB) THEN 00535 INFO = -7 00536 ENDIF 00537 * 00538 * Argument checking that is specific to Divide & Conquer routine 00539 * 00540 IF( NPROW .NE. 1 ) THEN 00541 INFO = -( 8*100+2 ) 00542 ENDIF 00543 * 00544 IF( N .GT. NP*NB-MOD( JA-1, NB )) THEN 00545 INFO = -( 2 ) 00546 CALL PXERBLA( ICTXT, 00547 $ 'PZDTTRS, D&C alg.: only 1 block per proc', 00548 $ -INFO ) 00549 RETURN 00550 ENDIF 00551 * 00552 IF((JA+N-1.GT.NB) .AND. ( NB.LT.2*INT_ONE )) THEN 00553 INFO = -( 8*100+4 ) 00554 CALL PXERBLA( ICTXT, 00555 $ 'PZDTTRS, D&C alg.: NB too small', 00556 $ -INFO ) 00557 RETURN 00558 ENDIF 00559 * 00560 * 00561 WORK_SIZE_MIN = 00562 $ 10*NPCOL+4*NRHS 00563 * 00564 WORK( 1 ) = WORK_SIZE_MIN 00565 * 00566 IF( LWORK .LT. WORK_SIZE_MIN ) THEN 00567 IF( LWORK .NE. -1 ) THEN 00568 INFO = -15 00569 CALL PXERBLA( ICTXT, 00570 $ 'PZDTTRS: worksize error', 00571 $ -INFO ) 00572 ENDIF 00573 RETURN 00574 ENDIF 00575 * 00576 * Pack params and positions into arrays for global consistency check 00577 * 00578 PARAM_CHECK( 15, 1 ) = DESCB(5) 00579 PARAM_CHECK( 14, 1 ) = DESCB(4) 00580 PARAM_CHECK( 13, 1 ) = DESCB(3) 00581 PARAM_CHECK( 12, 1 ) = DESCB(2) 00582 PARAM_CHECK( 11, 1 ) = DESCB(1) 00583 PARAM_CHECK( 10, 1 ) = IB 00584 PARAM_CHECK( 9, 1 ) = DESCA(5) 00585 PARAM_CHECK( 8, 1 ) = DESCA(4) 00586 PARAM_CHECK( 7, 1 ) = DESCA(3) 00587 PARAM_CHECK( 6, 1 ) = DESCA(1) 00588 PARAM_CHECK( 5, 1 ) = JA 00589 PARAM_CHECK( 4, 1 ) = NRHS 00590 PARAM_CHECK( 3, 1 ) = N 00591 PARAM_CHECK( 2, 1 ) = IDUM3 00592 PARAM_CHECK( 1, 1 ) = IDUM2 00593 * 00594 PARAM_CHECK( 15, 2 ) = 1105 00595 PARAM_CHECK( 14, 2 ) = 1104 00596 PARAM_CHECK( 13, 2 ) = 1103 00597 PARAM_CHECK( 12, 2 ) = 1102 00598 PARAM_CHECK( 11, 2 ) = 1101 00599 PARAM_CHECK( 10, 2 ) = 10 00600 PARAM_CHECK( 9, 2 ) = 805 00601 PARAM_CHECK( 8, 2 ) = 804 00602 PARAM_CHECK( 7, 2 ) = 803 00603 PARAM_CHECK( 6, 2 ) = 801 00604 PARAM_CHECK( 5, 2 ) = 7 00605 PARAM_CHECK( 4, 2 ) = 3 00606 PARAM_CHECK( 3, 2 ) = 2 00607 PARAM_CHECK( 2, 2 ) = 15 00608 PARAM_CHECK( 1, 2 ) = 1 00609 * 00610 * Want to find errors with MIN( ), so if no error, set it to a big 00611 * number. If there already is an error, multiply by the the 00612 * descriptor multiplier. 00613 * 00614 IF( INFO.GE.0 ) THEN 00615 INFO = BIGNUM 00616 ELSE IF( INFO.LT.-DESCMULT ) THEN 00617 INFO = -INFO 00618 ELSE 00619 INFO = -INFO * DESCMULT 00620 END IF 00621 * 00622 * Check consistency across processors 00623 * 00624 CALL GLOBCHK( ICTXT, 15, PARAM_CHECK, 15, 00625 $ PARAM_CHECK( 1, 3 ), INFO ) 00626 * 00627 * Prepare output: set info = 0 if no error, and divide by DESCMULT 00628 * if error is not in a descriptor entry. 00629 * 00630 IF( INFO.EQ.BIGNUM ) THEN 00631 INFO = 0 00632 ELSE IF( MOD( INFO, DESCMULT ) .EQ. 0 ) THEN 00633 INFO = -INFO / DESCMULT 00634 ELSE 00635 INFO = -INFO 00636 END IF 00637 * 00638 IF( INFO.LT.0 ) THEN 00639 CALL PXERBLA( ICTXT, 'PZDTTRS', -INFO ) 00640 RETURN 00641 END IF 00642 * 00643 * Quick return if possible 00644 * 00645 IF( N.EQ.0 ) 00646 $ RETURN 00647 * 00648 IF( NRHS.EQ.0 ) 00649 $ RETURN 00650 * 00651 * 00652 * Adjust addressing into matrix space to properly get into 00653 * the beginning part of the relevant data 00654 * 00655 PART_OFFSET = NB*( (JA-1)/(NPCOL*NB) ) 00656 * 00657 IF ( (MYCOL-CSRC) .LT. (JA-PART_OFFSET-1)/NB ) THEN 00658 PART_OFFSET = PART_OFFSET + NB 00659 ENDIF 00660 * 00661 IF ( MYCOL .LT. CSRC ) THEN 00662 PART_OFFSET = PART_OFFSET - NB 00663 ENDIF 00664 * 00665 * Form a new BLACS grid (the "standard form" grid) with only procs 00666 * holding part of the matrix, of size 1xNP where NP is adjusted, 00667 * starting at csrc=0, with JA modified to reflect dropped procs. 00668 * 00669 * First processor to hold part of the matrix: 00670 * 00671 FIRST_PROC = MOD( ( JA-1 )/NB+CSRC, NPCOL ) 00672 * 00673 * Calculate new JA one while dropping off unused processors. 00674 * 00675 JA_NEW = MOD( JA-1, NB ) + 1 00676 * 00677 * Save and compute new value of NP 00678 * 00679 NP_SAVE = NP 00680 NP = ( JA_NEW+N-2 )/NB + 1 00681 * 00682 * Call utility routine that forms "standard-form" grid 00683 * 00684 CALL RESHAPE( ICTXT, INT_ONE, ICTXT_NEW, INT_ONE, 00685 $ FIRST_PROC, INT_ONE, NP ) 00686 * 00687 * Use new context from standard grid as context. 00688 * 00689 ICTXT_SAVE = ICTXT 00690 ICTXT = ICTXT_NEW 00691 DESCA_1XP( 2 ) = ICTXT_NEW 00692 DESCB_PX1( 2 ) = ICTXT_NEW 00693 * 00694 * Get information about new grid. 00695 * 00696 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 00697 * 00698 * Drop out processors that do not have part of the matrix. 00699 * 00700 IF( MYROW .LT. 0 ) THEN 00701 GOTO 1234 00702 ENDIF 00703 * 00704 * ******************************** 00705 * Values reused throughout routine 00706 * 00707 * User-input value of partition size 00708 * 00709 PART_SIZE = NB 00710 * 00711 * Number of columns in each processor 00712 * 00713 MY_NUM_COLS = NUMROC( N, PART_SIZE, MYCOL, 0, NPCOL ) 00714 * 00715 * Offset in columns to beginning of main partition in each proc 00716 * 00717 IF ( MYCOL .EQ. 0 ) THEN 00718 PART_OFFSET = PART_OFFSET+MOD( JA_NEW-1, PART_SIZE ) 00719 MY_NUM_COLS = MY_NUM_COLS - MOD(JA_NEW-1, PART_SIZE ) 00720 ENDIF 00721 * 00722 * Size of main (or odd) partition in each processor 00723 * 00724 ODD_SIZE = MY_NUM_COLS 00725 IF ( MYCOL .LT. NP-1 ) THEN 00726 ODD_SIZE = ODD_SIZE - INT_ONE 00727 ENDIF 00728 * 00729 * 00730 * 00731 * Begin main code 00732 * 00733 INFO = 0 00734 * 00735 * Call frontsolve routine 00736 * 00737 IF( LSAME( TRANS, 'N' ) ) THEN 00738 * 00739 CALL PZDTTRSV( 'L', 'N', N, NRHS, DL( PART_OFFSET+1 ), 00740 $ D( PART_OFFSET+1 ), DU( PART_OFFSET+1 ), JA_NEW, 00741 $ DESCA_1XP, B, IB, DESCB_PX1, AF, LAF, WORK, 00742 $ LWORK, INFO ) 00743 * 00744 ELSE 00745 * 00746 CALL PZDTTRSV( 'U', 'C', N, NRHS, DL( PART_OFFSET+1 ), 00747 $ D( PART_OFFSET+1 ), DU( PART_OFFSET+1 ), JA_NEW, 00748 $ DESCA_1XP, B, IB, DESCB_PX1, AF, LAF, WORK, 00749 $ LWORK, INFO ) 00750 * 00751 ENDIF 00752 * 00753 * Call backsolve routine 00754 * 00755 IF( LSAME( TRANS, 'C' ) ) THEN 00756 * 00757 CALL PZDTTRSV( 'L', 'C', N, NRHS, DL( PART_OFFSET+1 ), 00758 $ D( PART_OFFSET+1 ), DU( PART_OFFSET+1 ), JA_NEW, 00759 $ DESCA_1XP, B, IB, DESCB_PX1, AF, LAF, WORK, 00760 $ LWORK, INFO ) 00761 * 00762 ELSE 00763 * 00764 CALL PZDTTRSV( 'U', 'N', N, NRHS, DL( PART_OFFSET+1 ), 00765 $ D( PART_OFFSET+1 ), DU( PART_OFFSET+1 ), JA_NEW, 00766 $ DESCA_1XP, B, IB, DESCB_PX1, AF, LAF, WORK, 00767 $ LWORK, INFO ) 00768 * 00769 ENDIF 00770 1000 CONTINUE 00771 * 00772 * 00773 * Free BLACS space used to hold standard-form grid. 00774 * 00775 IF( ICTXT_SAVE .NE. ICTXT_NEW ) THEN 00776 CALL BLACS_GRIDEXIT( ICTXT_NEW ) 00777 ENDIF 00778 * 00779 1234 CONTINUE 00780 * 00781 * Restore saved input parameters 00782 * 00783 ICTXT = ICTXT_SAVE 00784 NP = NP_SAVE 00785 * 00786 * Output minimum worksize 00787 * 00788 WORK( 1 ) = WORK_SIZE_MIN 00789 * 00790 * 00791 RETURN 00792 * 00793 * End of PZDTTRS 00794 * 00795 END