SUBROUTINE PZLAHQR( WANTT, WANTZ, N, ILO, IHI, A, DESCA, W, ILOZ, $ IHIZ, Z, DESCZ, WORK, LWORK, IWORK, ILWORK, $ INFO ) * * -- ScaLAPACK routine (version 1.7.3) -- * University of Tennessee, Knoxville, Oak Ridge National Laboratory, * and University of California, Berkeley. * 1.7.3: March 22, 2006 * modification suggested by Mark Fahey and Greg Henry * 1.7.0: July 31, 2001 * * .. Scalar Arguments .. LOGICAL WANTT, WANTZ INTEGER IHI, IHIZ, ILO, ILOZ, ILWORK, INFO, LWORK, N * .. * .. Array Arguments .. INTEGER DESCA( * ), DESCZ( * ), IWORK( * ) COMPLEX*16 A( * ), W( * ), WORK( * ), Z( * ) * .. * * Purpose * ======= * * PZLAHQR is an auxiliary routine used to find the Schur decomposition * and or eigenvalues of a matrix already in Hessenberg form from * cols ILO to IHI. * If Z = I, and WANTT=WANTZ=.TRUE., H gets replaced with Z'HZ, * with Z'Z=I, and H in Schur form. * * Notes * ===== * * Each global data object is described by an associated description * vector. This vector stores the information required to establish * the mapping between an object element and its corresponding process * and memory location. * * Let A be a generic term for any 2D block cyclicly distributed array. * Such a global array has an associated description vector DESCA. * In the following comments, the character _ should be read as * "of the global array". * * NOTATION STORED IN EXPLANATION * --------------- -------------- -------------------------------------- * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, * DTYPE_A = 1. * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating * the BLACS process grid A is distribu- * ted over. The context itself is glo- * bal, but the handle (the integer * value) may vary. * M_A (global) DESCA( M_ ) The number of rows in the global * array A. * N_A (global) DESCA( N_ ) The number of columns in the global * array A. * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute * the rows of the array. * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute * the columns of the array. * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first * row of the array A is distributed. * CSRC_A (global) DESCA( CSRC_ ) The process column over which the * first column of the array A is * distributed. * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local * array. LLD_A >= MAX(1,LOCp(M_A)). * * Let K be the number of rows or columns of a distributed matrix, * and assume that its process grid has dimension p x q. * LOCp( K ) denotes the number of elements of K that a process * would receive if K were distributed over the p processes of its * process column. * Similarly, LOCq( K ) denotes the number of elements of K that a * process would receive if K were distributed over the q processes of * its process row. * The values of LOCp() and LOCq() may be determined via a call to the * ScaLAPACK tool function, NUMROC: * LOCp( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), * LOCq( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). * An upper bound for these quantities may be computed by: * LOCp( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A * LOCq( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A * * Arguments * ========= * * WANTT (global input) LOGICAL * = .TRUE. : the full Schur form T is required; * = .FALSE.: only eigenvalues are required. * * WANTZ (global input) LOGICAL * = .TRUE. : the matrix of Schur vectors Z is required; * = .FALSE.: Schur vectors are not required. * * N (global input) INTEGER * The order of the Hessenberg matrix A (and Z if WANTZ). * N >= 0. * * ILO (global input) INTEGER * IHI (global input) INTEGER * It is assumed that A is already upper quasi-triangular in * rows and columns IHI+1:N, and that A(ILO,ILO-1) = 0 (unless * ILO = 1). PZLAHQR works primarily with the Hessenberg * submatrix in rows and columns ILO to IHI, but applies * transformations to all of H if WANTT is .TRUE.. * 1 <= ILO <= max(1,IHI); IHI <= N. * * A (global input/output) COMPLEX*16 array, dimension * (DESCA(LLD_),*) * On entry, the upper Hessenberg matrix A. * On exit, if WANTT is .TRUE., A is upper triangular in rows * and columns ILO:IHI. If WANTT is .FALSE., the contents of * A are unspecified on exit. * * DESCA (global and local input) INTEGER array of dimension DLEN_. * The array descriptor for the distributed matrix A. * * W (global replicated output) COMPLEX*16 array, dimension (N) * The computed eigenvalues ILO to IHI are stored in the * corresponding elements of W. If WANTT is .TRUE., the * eigenvalues are stored in the same order as on the diagonal * of the Schur form returned in A. A may be returned with * larger diagonal blocks until the next release. * * ILOZ (global input) INTEGER * IHIZ (global input) INTEGER * Specify the rows of Z to which transformations must be * applied if WANTZ is .TRUE.. * 1 <= ILOZ <= ILO; IHI <= IHIZ <= N. * * Z (global input/output) COMPLEX*16 array. * If WANTZ is .TRUE., on entry Z must contain the current * matrix Z of transformations accumulated by PZHSEQR, and on * exit Z has been updated; transformations are applied only to * the submatrix Z(ILOZ:IHIZ,ILO:IHI). * If WANTZ is .FALSE., Z is not referenced. * * DESCZ (global and local input) INTEGER array of dimension DLEN_. * The array descriptor for the distributed matrix Z. * * WORK (local output) COMPLEX*16 array of size LWORK * (Unless LWORK=-1, in which case WORK must be at least size 1) * * LWORK (local input) INTEGER * WORK(LWORK) is a local array and LWORK is assumed big enough * so that LWORK >= 3*N + * MAX( 2*MAX(DESCZ(LLD_),DESCA(LLD_)) + 2*LOCq(N), * 7*Ceil(N/HBL)/LCM(NPROW,NPCOL)) + * MAX( 2*N, (8*LCM(NPROW,NPCOL)+2)**2 ) * If LWORK=-1, then WORK(1) gets set to the above number and * the code returns immediately. * * IWORK (global and local input) INTEGER array of size ILWORK * This will hold some of the IBLK integer arrays. * This is held as a place holder for a future release. * Currently unreferenced. * * ILWORK (local input) INTEGER * This will hold the size of the IWORK array. * This is held as a place holder for a future release. * Currently unreferenced. * * INFO (global output) INTEGER * < 0: parameter number -INFO incorrect or inconsistent * = 0: successful exit * > 0: PZLAHQR failed to compute all the eigenvalues ILO to IHI * in a total of 30*(IHI-ILO+1) iterations; if INFO = i, * elements i+1:ihi of W contains those eigenvalues * which have been successfully computed. * * Logic: * This algorithm is very similar to DLAHQR. Unlike DLAHQR, * instead of sending one double shift through the largest * unreduced submatrix, this algorithm sends multiple double shifts * and spaces them apart so that there can be parallelism across * several processor row/columns. Another critical difference is * that this algorithm aggregrates multiple transforms together in * order to apply them in a block fashion. * * Important Local Variables: * IBLK = The maximum number of bulges that can be computed. * Currently fixed. Future releases this won't be fixed. * HBL = The square block size (HBL=DESCA(MB_)=DESCA(NB_)) * ROTN = The number of transforms to block together * NBULGE = The number of bulges that will be attempted on the * current submatrix. * IBULGE = The current number of bulges started. * K1(*),K2(*) = The current bulge loops from K1(*) to K2(*). * * Subroutines: * From LAPACK, this routine calls: * ZLAHQR -> Serial QR used to determine shifts and * eigenvalues * ZLARFG -> Determine the Householder transforms * * This ScaLAPACK, this routine calls: * PZLACONSB -> To determine where to start each iteration * ZLAMSH -> Sends multiple shifts through a small * submatrix to see how the consecutive * subdiagonals change (if PZLACONSB indicates * we can start a run in the middle) * PZLAWIL -> Given the shift, get the transformation * PZLACP3 -> Parallel array to local replicated array copy * & back. * ZLAREF -> Row/column reflector applier. Core routine * here. * PZLASMSUB -> Finds negligible subdiagonal elements. * * Current Notes and/or Restrictions: * 1.) This code requires the distributed block size to be square * and at least six (6); unlike simpler codes like LU, this * algorithm is extremely sensitive to block size. Unwise * choices of too small a block size can lead to bad * performance. * 2.) This code requires A and Z to be distributed identically * and have identical contxts. A future version may allow Z to * have a different contxt to 1D row map it to all nodes (so no * communication on Z is necessary.) * 3.) This code does not currently block the initial transforms * so that none of the rows or columns for any bulge are * completed until all are started. To offset pipeline * start-up it is recommended that at least 2*LCM(NPROW,NPCOL) * bulges are used (if possible) * 4.) The maximum number of bulges currently supported is fixed at * 32. In future versions this will be limited only by the * incoming WORK and IWORK array. * 5.) The matrix A must be in upper Hessenberg form. If elements * below the subdiagonal are nonzero, the resulting transforms * may be nonsimilar. This is also true with the LAPACK * routine ZLAHQR. * 6.) For this release, this code has only been tested for * RSRC_=CSRC_=0, but it has been written for the general case. * 7.) Currently, all the eigenvalues are distributed to all the * nodes. Future releases will probably distribute the * eigenvalues by the column partitioning. * 8.) The internals of this routine are subject to change. * 9.) To optimize this for your architecture, try tuning ZLAREF. * 10.) This code has only been tested for WANTZ = .TRUE. and may * behave unpredictably for WANTZ set to .FALSE. * * Further Details * =============== * * Contributed by Mark Fahey, June, 2000. * * ===================================================================== * * .. Parameters .. INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DT_, $ LLD_, MB_, M_, NB_, N_, RSRC_ PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DT_ = 1, $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) DOUBLE PRECISION RONE PARAMETER ( RONE = 1.0D+0 ) COMPLEX*16 ZERO, ONE PARAMETER ( ZERO = ( 0.0D+0, 0.0D+0 ), $ ONE = ( 1.0D+0, 0.0D+0 ) ) DOUBLE PRECISION CONST PARAMETER ( CONST = 1.50D+0 ) INTEGER IBLK PARAMETER ( IBLK = 32 ) * .. * .. Local Scalars .. LOGICAL SKIP INTEGER CONTXT, DOWN, HBL, I, I1, I2, IAFIRST, IBULGE, $ ICBUF, ICOL, ICOL1, ICOL2, IDIA, IERR, II, $ IRBUF, IROW, IROW1, IROW2, ISPEC, ISTART, $ ISTARTCOL, ISTARTROW, ISTOP, ISUB, ISUP, $ ITERMAX, ITMP1, ITMP2, ITN, ITS, IZBUF, J, $ JAFIRST, JBLK, JJ, K, KI, L, LCMRC, LDA, LDZ, $ LEFT, LIHIH, LIHIZ, LILOH, LILOZ, LOCALI1, $ LOCALI2, LOCALK, LOCALM, M, MODKM1, MYCOL, $ MYROW, NBULGE, NH, NODE, NPCOL, NPROW, NQ, NR, $ NUM, NZ, RIGHT, ROTN, UP, VECSIDX DOUBLE PRECISION CS, OVFL, S, SMLNUM, ULP, UNFL COMPLEX*16 CDUM, H10, H11, H22, H33, H43H34, H44, SN, SUM, $ T1, T1COPY, T2, T3, V1SAVE, V2, V2SAVE, V3, $ V3SAVE * .. * .. Local Arrays .. INTEGER ICURCOL( IBLK ), ICURROW( IBLK ), K1( IBLK ), $ K2( IBLK ), KCOL( IBLK ), KP2COL( IBLK ), $ KP2ROW( IBLK ), KROW( IBLK ) COMPLEX*16 S1( 2*IBLK, 2*IBLK ), SMALLA( 6, 6, IBLK ), $ VCOPY( 3 ) * .. * .. External Functions .. INTEGER ILCM, NUMROC DOUBLE PRECISION PDLAMCH EXTERNAL ILCM, NUMROC, PDLAMCH * .. * .. External Subroutines .. EXTERNAL BLACS_GRIDINFO, IGAMN2D, IGEBR2D, IGEBS2D, $ INFOG1L, INFOG2L, PDLABAD, PXERBLA, PZLACONSB, $ PZLACP3, PZLASMSUB, PZLAWIL, PZROT, ZCOPY, $ ZGEBR2D, ZGEBS2D, ZGERV2D, ZGESD2D, ZGSUM2D, $ ZLAHQR2, ZLAMSH, ZLANV2, ZLAREF, ZLARFG * .. * .. Intrinsic Functions .. * INTRINSIC ABS, DBLE, DCONJG, DIMAG, MAX, MIN, MOD * .. * .. Statement Functions .. DOUBLE PRECISION CABS1 * .. * .. Statement Function definitions .. CABS1( CDUM ) = ABS( DBLE( CDUM ) ) + ABS( DIMAG( CDUM ) ) * .. * .. Executable Statements .. * INFO = 0 * ITERMAX = 30*( IHI-ILO+1 ) IF( N.EQ.0 ) $ RETURN * * NODE (IAFIRST,JAFIRST) OWNS A(1,1) * HBL = DESCA( MB_ ) CONTXT = DESCA( CTXT_ ) LDA = DESCA( LLD_ ) IAFIRST = DESCA( RSRC_ ) JAFIRST = DESCA( CSRC_ ) LDZ = DESCZ( LLD_ ) CALL BLACS_GRIDINFO( CONTXT, NPROW, NPCOL, MYROW, MYCOL ) NODE = MYROW*NPCOL + MYCOL NUM = NPROW*NPCOL LEFT = MOD( MYCOL+NPCOL-1, NPCOL ) RIGHT = MOD( MYCOL+1, NPCOL ) UP = MOD( MYROW+NPROW-1, NPROW ) DOWN = MOD( MYROW+1, NPROW ) LCMRC = ILCM( NPROW, NPCOL ) IF( ( NPROW.LE.3 ) .OR. ( NPCOL.LE.3 ) ) THEN SKIP = .TRUE. ELSE SKIP = .FALSE. END IF * * Determine the number of columns we have so we can check workspace * NQ = NUMROC( N, HBL, MYCOL, JAFIRST, NPCOL ) JJ = N / HBL IF( JJ*HBL.LT.N ) $ JJ = JJ + 1 JJ = 7*JJ / LCMRC JJ = 3*N + MAX( 2*MAX( LDA, LDZ )+2*NQ, JJ ) JJ = JJ + MAX( 2*N, ( 8*LCMRC+2 )**2 ) IF( LWORK.EQ.-1 ) THEN WORK( 1 ) = JJ RETURN END IF IF( LWORK.LT.JJ ) THEN INFO = -14 END IF IF( DESCZ( CTXT_ ).NE.DESCA( CTXT_ ) ) THEN INFO = -( 1300+CTXT_ ) END IF IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN INFO = -( 700+NB_ ) END IF IF( DESCZ( MB_ ).NE.DESCZ( NB_ ) ) THEN INFO = -( 1300+NB_ ) END IF IF( DESCA( MB_ ).NE.DESCZ( MB_ ) ) THEN INFO = -( 1300+MB_ ) END IF IF( ( DESCA( RSRC_ ).NE.0 ) .OR. ( DESCA( CSRC_ ).NE.0 ) ) THEN INFO = -( 700+RSRC_ ) END IF IF( ( DESCZ( RSRC_ ).NE.0 ) .OR. ( DESCZ( CSRC_ ).NE.0 ) ) THEN INFO = -( 1300+RSRC_ ) END IF IF( ( ILO.GT.N ) .OR. ( ILO.LT.1 ) ) THEN INFO = -4 END IF IF( ( IHI.GT.N ) .OR. ( IHI.LT.1 ) ) THEN INFO = -5 END IF IF( HBL.LT.5 ) THEN INFO = -( 700+MB_ ) END IF CALL IGAMN2D( CONTXT, 'ALL', ' ', 1, 1, INFO, 1, ITMP1, ITMP2, -1, $ -1, -1 ) IF( INFO.LT.0 ) THEN CALL PXERBLA( CONTXT, 'PZLAHQR', -INFO ) RETURN END IF * * Set work array indices * VECSIDX = 0 IDIA = 3*N ISUB = 3*N ISUP = 3*N IRBUF = 3*N ICBUF = 3*N IZBUF = 5*N * * Find a value for ROTN * ROTN = HBL / 3 ROTN = MIN( ROTN, HBL-2 ) ROTN = MAX( ROTN, 1 ) * IF( ILO.EQ.IHI ) THEN CALL INFOG2L( ILO, ILO, DESCA, NPROW, NPCOL, MYROW, MYCOL, $ IROW, ICOL, II, JJ ) IF( ( MYROW.EQ.II ) .AND. ( MYCOL.EQ.JJ ) ) THEN W( ILO ) = A( ( ICOL-1 )*LDA+IROW ) ELSE W( ILO ) = ZERO END IF RETURN END IF * NH = IHI - ILO + 1 NZ = IHIZ - ILOZ + 1 * CALL INFOG1L( ILOZ, HBL, NPROW, MYROW, IAFIRST, LILOZ, LIHIZ ) LIHIZ = NUMROC( IHIZ, HBL, MYROW, IAFIRST, NPROW ) * * Set machine-dependent constants for the stopping criterion. * If NORM(H) <= SQRT(OVFL), overflow should not occur. * UNFL = PDLAMCH( CONTXT, 'SAFE MINIMUM' ) OVFL = RONE / UNFL CALL PDLABAD( CONTXT, UNFL, OVFL ) ULP = PDLAMCH( CONTXT, 'PRECISION' ) SMLNUM = UNFL*( NH / ULP ) * * I1 and I2 are the indices of the first row and last column of H * to which transformations must be applied. If eigenvalues only are * being computed, I1 and I2 are set inside the main loop. * IF( WANTT ) THEN I1 = 1 I2 = N END IF * * ITN is the total number of QR iterations allowed. * ITN = ITERMAX * * The main loop begins here. I is the loop index and decreases from * IHI to ILO in steps of our schur block size (<=2*IBLK). Each * iteration of the loop works with the active submatrix in rows * and columns L to I. Eigenvalues I+1 to IHI have already * converged. Either L = ILO or the global A(L,L-1) is negligible * so that the matrix splits. * I = IHI 10 CONTINUE L = ILO IF( I.LT.ILO ) $ GO TO 570 * * Perform QR iterations on rows and columns ILO to I until a * submatrix of order 1 or 2 splits off at the bottom because a * subdiagonal element has become negligible. * DO 540 ITS = 0, ITN * * Look for a single small subdiagonal element. * CALL PZLASMSUB( A, DESCA, I, L, K, SMLNUM, WORK( IRBUF+1 ), $ LWORK-IRBUF ) L = K * IF( L.GT.ILO ) THEN * * H(L,L-1) is negligible * CALL INFOG2L( L, L-1, DESCA, NPROW, NPCOL, MYROW, MYCOL, $ IROW, ICOL, ITMP1, ITMP2 ) IF( ( MYROW.EQ.ITMP1 ) .AND. ( MYCOL.EQ.ITMP2 ) ) THEN A( ( ICOL-1 )*LDA+IROW ) = ZERO END IF WORK( ISUB+L-1 ) = ZERO END IF * * Exit from loop if a submatrix of order 1 or 2 has split off. * IF( WANTT ) THEN * For Schur form, use 2x2 blocks IF( L.GE.I-1 ) THEN GO TO 550 END IF ELSE * If we don't want the Schur form, use bigger blocks. IF( L.GE.I-( 2*IBLK-1 ) ) THEN GO TO 550 END IF END IF * * Now the active submatrix is in rows and columns L to I. If * eigenvalues only are being computed, only the active submatrix * need be transformed. * IF( .NOT.WANTT ) THEN I1 = L I2 = I END IF * * Copy submatrix of size 2*JBLK and prepare to do generalized * Wilkinson shift or an exceptional shift * JBLK = MIN( IBLK, ( ( I-L+1 ) / 2 )-1 ) IF( JBLK.GT.LCMRC ) THEN * * Make sure it's divisible by LCM (we want even workloads!) * JBLK = JBLK - MOD( JBLK, LCMRC ) END IF JBLK = MIN( JBLK, 2*LCMRC ) JBLK = MAX( JBLK, 1 ) * CALL PZLACP3( 2*JBLK, I-2*JBLK+1, A, DESCA, S1, 2*IBLK, -1, -1, $ 0 ) IF( ( ITS.EQ.20 .OR. ITS.EQ.40 ) .AND. ( JBLK.GT.1 ) ) THEN * * Exceptional shift. * DO 20 II = 2*JBLK, 2, -1 S1( II, II ) = CONST*( CABS1( S1( II, II ) )+ $ CABS1( S1( II, II-1 ) ) ) S1( II, II-1 ) = ZERO S1( II-1, II ) = ZERO 20 CONTINUE S1( 1, 1 ) = CONST*CABS1( S1( 1, 1 ) ) ELSE CALL ZLAHQR2( .FALSE., .FALSE., 2*JBLK, 1, 2*JBLK, S1, $ 2*IBLK, WORK( IRBUF+1 ), 1, 2*JBLK, Z, LDZ, $ IERR ) * * Prepare to use Wilkinson's double shift * H44 = S1( 2*JBLK, 2*JBLK ) H33 = S1( 2*JBLK-1, 2*JBLK-1 ) H43H34 = S1( 2*JBLK-1, 2*JBLK )*S1( 2*JBLK, 2*JBLK-1 ) * END IF * * Look for two consecutive small subdiagonal elements: * PZLACONSB is the routine that does this. * CALL PZLACONSB( A, DESCA, I, L, M, H44, H33, H43H34, $ WORK( IRBUF+1 ), LWORK-IRBUF ) * * Double-shift QR step * * NBULGE is the number of bulges that will be attempted * ISTOP = MIN( M+ROTN-1-MOD( M-( M / HBL )*HBL-1, ROTN ), I-2 ) ISTOP = MIN( ISTOP, M+HBL-3-MOD( M-1, HBL ) ) ISTOP = MIN( ISTOP, I2-2 ) ISTOP = MAX( ISTOP, M ) NBULGE = ( I-1-ISTOP ) / HBL * * Do not exceed maximum determined. * NBULGE = MIN( NBULGE, JBLK ) IF( NBULGE.GT.LCMRC ) THEN * * Make sure it's divisible by LCM (we want even workloads!) * NBULGE = NBULGE - MOD( NBULGE, LCMRC ) END IF NBULGE = MAX( NBULGE, 1 ) * * If we are starting in the middle because of consecutive small * subdiagonal elements, we need to see how many bulges we * can send through without breaking the consecutive small * subdiagonal property. * IF( ( NBULGE.GT.1 ) .AND. ( M.GT.L ) ) THEN * * Copy a chunk of elements from global A(M-1:,M-1:) * CALL INFOG2L( M+2, M+2, DESCA, NPROW, NPCOL, MYROW, MYCOL, $ IROW1, ICOL1, ITMP1, ITMP2 ) II = MIN( 4*NBULGE+2, N-M+2 ) CALL PZLACP3( II, M-1, A, DESCA, WORK( IRBUF+1 ), II, ITMP1, $ ITMP2, 0 ) IF( ( MYROW.EQ.ITMP1 ) .AND. ( MYCOL.EQ.ITMP2 ) ) THEN * * Find a new NBULGE based on the bulges we have. * CALL ZLAMSH( S1, 2*IBLK, NBULGE, JBLK, WORK( IRBUF+1 ), $ II, II, ULP ) IF( NUM.GT.1 ) THEN CALL IGEBS2D( CONTXT, 'ALL', ' ', 1, 1, NBULGE, 1 ) END IF ELSE * * Everyone needs to receive the new NBULGE * CALL IGEBR2D( CONTXT, 'ALL', ' ', 1, 1, NBULGE, 1, ITMP1, $ ITMP2 ) END IF END IF * * IBULGE is the number of bulges going so far * IBULGE = 1 * * "A" row defs : main row transforms from LOCALK to LOCALI2 * CALL INFOG1L( M, HBL, NPCOL, MYCOL, JAFIRST, ITMP1, LOCALK ) LOCALK = NQ CALL INFOG1L( 1, HBL, NPCOL, MYCOL, JAFIRST, ICOL1, LOCALI2 ) LOCALI2 = NUMROC( I2, HBL, MYCOL, JAFIRST, NPCOL ) * * "A" col defs : main col transforms from LOCALI1 to LOCALM * CALL INFOG1L( I1, HBL, NPROW, MYROW, IAFIRST, LOCALI1, ICOL1 ) CALL INFOG1L( 1, HBL, NPROW, MYROW, IAFIRST, LOCALM, ICOL1 ) ICOL1 = NUMROC( MIN( M+3, I ), HBL, MYROW, IAFIRST, NPROW ) * * Which row & column will start the bulges * ISTARTROW = MOD( ( M+1 ) / HBL, NPROW ) + IAFIRST ISTARTCOL = MOD( ( M+1 ) / HBL, NPCOL ) + JAFIRST * CALL INFOG1L( M, HBL, NPROW, MYROW, IAFIRST, II, ITMP2 ) CALL INFOG1L( M, HBL, NPCOL, MYCOL, JAFIRST, JJ, ITMP2 ) CALL INFOG1L( 1, HBL, NPROW, MYROW, IAFIRST, ISTOP, $ KP2ROW( 1 ) ) KP2ROW( 1 ) = NUMROC( M+2, HBL, MYROW, IAFIRST, NPROW ) CALL INFOG1L( 1, HBL, NPCOL, MYCOL, JAFIRST, ISTOP, $ KP2COL( 1 ) ) KP2COL( 1 ) = NUMROC( M+2, HBL, MYCOL, JAFIRST, NPCOL ) * * Set all values for bulges. All bulges are stored in * intermediate steps as loops over KI. Their current "task" * over the global M to I-1 values is always K1(KI) to K2(KI). * However, because there are many bulges, K1(KI) & K2(KI) might * go past that range while later bulges (KI+1,KI+2,etc..) are * finishing up. Even if ROTN=1, in order to minimize border * communication sometimes K1(KI)=HBL-2 & K2(KI)=HBL-1 so both * border messages can be handled at once. * * Rules: * If MOD(K1(KI)-1,HBL) < HBL-2 then MOD(K2(KI)-1,HBL)