SUBROUTINE PSLAHQR( WANTT, WANTZ, N, ILO, IHI, A, DESCA, WR, WI, $ ILOZ, IHIZ, Z, DESCZ, WORK, LWORK, IWORK, $ ILWORK, INFO ) * * -- ScaLAPACK auxiliary routine (version 1.5) -- * University of Tennessee, Knoxville, Oak Ridge National Laboratory, * and University of California, Berkeley. * May 1, 1997 * * .. Scalar Arguments .. LOGICAL WANTT, WANTZ INTEGER IHI, IHIZ, ILO, ILOZ, ILWORK, INFO, LWORK, N, $ ROTN * .. * .. Array Arguments .. INTEGER DESCA( * ), DESCZ( * ), IWORK( * ) REAL A( * ), WI( * ), WORK( * ), WR( * ), Z( * ) * .. * * Purpose * ======= * * PSLAHQR is an auxiliary routine used to find the Schur decomposition * and or eigenvalues of a matrix already in Hessenberg form from * cols ILO to IHI. * If Z = I, and WANTT=WANTZ=.TRUE., H gets replaced with Z'HZ, * with Z'Z=I, and H in Schur form. * * Notes * ===== * * Each global data object is described by an associated description * vector. This vector stores the information required to establish * the mapping between an object element and its corresponding process * and memory location. * * Let A be a generic term for any 2D block cyclicly distributed array. * Such a global array has an associated description vector DESCA. * In the following comments, the character _ should be read as * "of the global array". * * NOTATION STORED IN EXPLANATION * --------------- -------------- -------------------------------------- * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, * DTYPE_A = 1. * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating * the BLACS process grid A is distribu- * ted over. The context itself is glo- * bal, but the handle (the integer * value) may vary. * M_A (global) DESCA( M_ ) The number of rows in the global * array A. * N_A (global) DESCA( N_ ) The number of columns in the global * array A. * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute * the rows of the array. * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute * the columns of the array. * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first * row of the array A is distributed. * CSRC_A (global) DESCA( CSRC_ ) The process column over which the * first column of the array A is * distributed. * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local * array. LLD_A >= MAX(1,LOCp(M_A)). * * Let K be the number of rows or columns of a distributed matrix, * and assume that its process grid has dimension p x q. * LOCp( K ) denotes the number of elements of K that a process * would receive if K were distributed over the p processes of its * process column. * Similarly, LOCq( K ) denotes the number of elements of K that a * process would receive if K were distributed over the q processes of * its process row. * The values of LOCp() and LOCq() may be determined via a call to the * ScaLAPACK tool function, NUMROC: * LOCp( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), * LOCq( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). * An upper bound for these quantities may be computed by: * LOCp( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A * LOCq( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A * * Arguments * ========= * * WANTT (global input) LOGICAL * = .TRUE. : the full Schur form T is required; * = .FALSE.: only eigenvalues are required. * * WANTZ (global input) LOGICAL * = .TRUE. : the matrix of Schur vectors Z is required; * = .FALSE.: Schur vectors are not required. * * N (global input) INTEGER * The order of the Hessenberg matrix A (and Z if WANTZ). * N >= 0. * * ILO (global input) INTEGER * IHI (global input) INTEGER * It is assumed that A is already upper quasi-triangular in * rows and columns IHI+1:N, and that A(ILO,ILO-1) = 0 (unless * ILO = 1). PSLAHQR works primarily with the Hessenberg * submatrix in rows and columns ILO to IHI, but applies * transformations to all of H if WANTT is .TRUE.. * 1 <= ILO <= max(1,IHI); IHI <= N. * * A (global input/output) REAL array, dimension * (DESCA(LLD_),*) * On entry, the upper Hessenberg matrix A. * On exit, if WANTT is .TRUE., A is upper quasi-triangular in * rows and columns ILO:IHI, with any 2-by-2 or larger diagonal * blocks not yet in standard form. If WANTT is .FALSE., the * contents of A are unspecified on exit. * * DESCA (global and local input) INTEGER array of dimension DLEN_. * The array descriptor for the distributed matrix A. * * WR (global replicated output) REAL array, dimension (N) * WI (global replicated output) REAL array, dimension (N) * The real and imaginary parts, respectively, of the computed * eigenvalues ILO to IHI are stored in the corresponding * elements of WR and WI. If two eigenvalues are computed as a * complex conjugate pair, they are stored in consecutive * elements of WR and WI, say the i-th and (i+1)th, with * WI(i) > 0 and WI(i+1) < 0. If WANTT is .TRUE., the * eigenvalues are stored in the same order as on the diagonal * of the Schur form returned in A. A may be returned with * larger diagonal blocks until the next release. * * ILOZ (global input) INTEGER * IHIZ (global input) INTEGER * Specify the rows of Z to which transformations must be * applied if WANTZ is .TRUE.. * 1 <= ILOZ <= ILO; IHI <= IHIZ <= N. * * Z (global input/output) REAL array. * If WANTZ is .TRUE., on entry Z must contain the current * matrix Z of transformations accumulated by PDHSEQR, and on * exit Z has been updated; transformations are applied only to * the submatrix Z(ILOZ:IHIZ,ILO:IHI). * If WANTZ is .FALSE., Z is not referenced. * * DESCZ (global and local input) INTEGER array of dimension DLEN_. * The array descriptor for the distributed matrix Z. * * WORK (local output) REAL array of size LWORK * (Unless LWORK=-1, in which case WORK must be at least size 1) * * LWORK (local input) INTEGER * WORK(LWORK) is a local array and LWORK is assumed big enough * so that LWORK >= 3*N + * MAX( 2*MAX(DESCZ(LLD_),DESCA(LLD_)) + 2*LOCq(N), * 7*Ceil(N/HBL)/LCM(NPROW,NPCOL)) + * MAX( 2*N, (8*LCM(NPROW,NPCOL)+2)**2 ) * If LWORK=-1, then WORK(1) gets set to the above number and * the code returns immediately. * * IWORK (global and local input) INTEGER array of size ILWORK * This will hold some of the IBLK integer arrays. * This is held as a place holder for a future release. * Currently unreferenced. * * ILWORK (local input) INTEGER * This will hold the size of the IWORK array. * This is held as a place holder for a future release. * Currently unreferenced. * * INFO (global output) INTEGER * < 0: parameter number -INFO incorrect or inconsistent * = 0: successful exit * > 0: PSLAHQR failed to compute all the eigenvalues ILO to IHI * in a total of 30*(IHI-ILO+1) iterations; if INFO = i, * elements i+1:ihi of WR and WI contain those eigenvalues * which have been successfully computed. * * Logic: * This algorithm is very similar to _LAHQR. Unlike _LAHQR, * instead of sending one double shift through the largest * unreduced submatrix, this algorithm sends multiple double shifts * and spaces them apart so that there can be parallelism across * several processor row/columns. Another critical difference is * that this algorithm aggregrates multiple transforms together in * order to apply them in a block fashion. * * Important Local Variables: * IBLK = The maximum number of bulges that can be computed. * Currently fixed. Future releases this won't be fixed. * HBL = The square block size (HBL=DESCA(MB_)=DESCA(NB_)) * ROTN = The number of transforms to block together * NBULGE = The number of bulges that will be attempted on the * current submatrix. * IBULGE = The current number of bulges started. * K1(*),K2(*) = The current bulge loops from K1(*) to K2(*). * * Subroutines: * From LAPACK, this routine calls: * SLAHQR -> Serial QR used to determine shifts and eigenvalues * SLARFG -> Determine the Householder transforms * * This ScaLAPACK, this routine calls: * PSLACONSB -> To determine where to start each iteration * SLAMSH -> Sends multiple shifts through a small submatrix to * see how the consecutive subdiagonals change (if * PSLACONSB indicates we can start a run in the middle) * PSLAWIL -> Given the shift, get the transformation * SLASORTE -> Pair up eigenvalues so that reals are paired. * PSLACP3 -> Parallel array to local replicated array copy & * back. * SLAREF -> Row/column reflector applier. Core routine * here. * PSLASMSUB -> Finds negligible subdiagonal elements. * * Current Notes and/or Restrictions: * 1.) This code requires the distributed block size to be square * and at least six (6); unlike simpler codes like LU, this * algorithm is extremely sensitive to block size. Unwise * choices of too small a block size can lead to bad * performance. * 2.) This code requires A and Z to be distributed identically * and have identical contxts. A future version may allow Z to * have a different contxt to 1D row map it to all nodes (so no * communication on Z is necessary.) * 3.) This release currently does not have a routine for * resolving the Schur blocks into regular 2x2 form after * this code is completed. Because of this, a significant * performance impact is required while the deflation is done * by sometimes a single column of processors. * 4.) This code does not currently block the initial transforms * so that none of the rows or columns for any bulge are * completed until all are started. To offset pipeline * start-up it is recommended that at least 2*LCM(NPROW,NPCOL) * bulges are used (if possible) * 5.) The maximum number of bulges currently supported is fixed at * 32. In future versions this will be limited only by the * incoming WORK and IWORK array. * 6.) The matrix A must be in upper Hessenberg form. If elements * below the subdiagonal are nonzero, the resulting transforms * may be nonsimilar. This is also true with the LAPACK * routine SLAHQR. * 7.) For this release, this code has only been tested for * RSRC_=CSRC_=0, but it has been written for the general case. * 8.) Currently, all the eigenvalues are distributed to all the * nodes. Future releases will probably distribute the * eigenvalues by the column partitioning. * 9.) The internals of this routine are subject to change. * 10.) To optimize this for your architecture, try tuning SLAREF. * 11.) This code has only been tested for WANTZ = .TRUE. and may * behave unpredictably for WANTZ set to .FALSE. * * Implemented by: G. Henry, May 1, 1997 * * ===================================================================== * * .. Parameters .. INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DT_, $ LLD_, MB_, M_, NB_, N_, RSRC_ PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DT_ = 1, $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) REAL ZERO, ONE, HALF PARAMETER ( ZERO = 0.0E+0, ONE = 1.0E+0, HALF = 0.5E+0 ) REAL CONST PARAMETER ( CONST = 1.50E+0 ) INTEGER IBLK PARAMETER ( IBLK = 32 ) * .. * .. Local Scalars .. LOGICAL SKIP INTEGER CONTXT, DOWN, HBL, I, I1, I2, IAFIRST, $ IBULGE, ICBUF, ICOL, ICOL1, ICOL2, IDIA, $ IERR, II, IRBUF, IROW, IROW1, IROW2, ISPEC, $ ISTART, ISTARTCOL, ISTARTROW, ISTOP, ISUB, $ ISUP, ITERMAX, ITMP1, ITMP2, ITN, ITS, $ IZBUF, J, JAFIRST, JBLK, JJ, K, KI, L, LCMRC, $ LDA, LDZ, LEFT, LIHIH, LIHIZ, LILOH, LILOZ, $ LOCALI1, LOCALI2, LOCALK, LOCALM, M, MODKM1, $ MYCOL, MYROW, NBULGE, NH, NODE, NPCOL, $ NPROW, NQ, NR, NUM, NZ, RIGHT, UP, VECSIDX REAL AVE, DISC, H00, H10, H11, H12, $ H21, H22, H33, H43H34, H44, OVFL, S, $ SMLNUM, SUM, T1, T1COPY, T2, T3, $ ULP, UNFL, V1SAVE, V2, V2SAVE, $ V3, V3SAVE * .. * .. Local Arrays .. INTEGER ICURCOL( IBLK ), ICURROW( IBLK ), K1( IBLK ), $ K2( IBLK ), KCOL( IBLK ), KP2COL( IBLK ), $ KP2ROW( IBLK ), KROW( IBLK ) REAL S1( 2*IBLK, 2*IBLK ), SMALLA( 6, 6, IBLK ), $ VCOPY( 3 ) * .. * .. External Functions .. INTEGER ILCM, NUMROC REAL PSLAMCH EXTERNAL ILCM, NUMROC, PSLAMCH * .. * .. External Subroutines .. EXTERNAL BLACS_GRIDINFO, SCOPY, SGEBR2D, $ SGEBS2D, SGERV2D, SGESD2D, SGSUM2D, SLAHQR, $ SLAREF, SLARFG, SLASORTE, SLAMSH, IGAMN2D, $ IGEBR2D, IGEBS2D, INFOG1L, INFOG2L, PSLABAD, $ PSLACONSB, PSLACP3, PSLASMSUB, PSLAWIL, $ PXERBLA * .. * .. Intrinsic Functions .. * INTRINSIC ABS, DABS, DBLE, MAX, MIN, MOD, SIGN, SQRT * .. * .. Executable Statements .. * INFO = 0 * ITERMAX = 30*( IHI-ILO+1 ) IF( N.EQ.0 ) $ RETURN * * NODE (IAFIRST,JAFIRST) OWNS A(1,1) * HBL = DESCA( MB_ ) CONTXT = DESCA( CTXT_ ) LDA = DESCA( LLD_ ) IAFIRST = DESCA( RSRC_ ) JAFIRST = DESCA( CSRC_ ) LDZ = DESCZ( LLD_ ) CALL BLACS_GRIDINFO( CONTXT, NPROW, NPCOL, MYROW, MYCOL ) NODE = MYROW*NPCOL + MYCOL NUM = NPROW*NPCOL LEFT = MOD( MYCOL+NPCOL-1, NPCOL ) RIGHT = MOD( MYCOL+1, NPCOL ) UP = MOD( MYROW+NPROW-1, NPROW ) DOWN = MOD( MYROW+1, NPROW ) LCMRC = ILCM( NPROW, NPCOL ) IF( ( NPROW.LE.3 ) .OR. ( NPCOL.LE.3 ) ) THEN SKIP = .TRUE. ELSE SKIP = .FALSE. END IF * * Determine the number of columns we have so we can check workspace * NQ = NUMROC( N, HBL, MYCOL, JAFIRST, NPCOL ) JJ = N / HBL IF( JJ*HBL.LT.N ) $ JJ = JJ + 1 JJ = 7*JJ / LCMRC JJ = 3*N+MAX( 2*MAX( LDA, LDZ )+2*NQ, JJ ) JJ = JJ + MAX( 2*N, (8*LCMRC+2)**2 ) IF ( LWORK .EQ. -1 ) THEN WORK(1) = JJ RETURN END IF IF ( LWORK .LT. JJ ) THEN INFO = -15 END IF IF( DESCZ( CTXT_ ).NE.DESCA( CTXT_ ) ) THEN INFO = -( 1300+CTXT_ ) END IF IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN INFO = -( 700+NB_ ) END IF IF( DESCZ( MB_ ).NE.DESCZ( NB_ ) ) THEN INFO = -( 1300+NB_ ) END IF IF( DESCA( MB_ ).NE.DESCZ( MB_ ) ) THEN INFO = -( 1300+MB_ ) END IF IF( ( DESCA( RSRC_ ).NE.0 ) .OR. ( DESCA( CSRC_ ).NE.0 ) ) THEN INFO = -( 700+RSRC_ ) END IF IF( ( DESCZ( RSRC_ ).NE.0 ) .OR. ( DESCZ( CSRC_ ).NE.0 ) ) THEN INFO = -( 1300+RSRC_ ) END IF IF( ( ILO.GT.N ) .OR. ( ILO.LT.1 ) ) THEN INFO = -4 END IF IF( ( IHI.GT.N ) .OR. ( IHI.LT.1 ) ) THEN INFO = -5 END IF IF( HBL.LT.5 ) THEN INFO = -( 700+MB_ ) END IF CALL IGAMN2D( CONTXT, 'ALL', ' ', 1, 1, INFO, 1, ITMP1, ITMP2, -1, $ -1, -1 ) IF( INFO.LT.0 ) THEN CALL PXERBLA( CONTXT, 'PSLAHQR', -INFO ) RETURN END IF * * Set work array indices * VECSIDX = 0 IDIA = 3*N ISUB = 3*N ISUP = 3*N IRBUF = 3*N ICBUF = 3*N IZBUF = 5*N * * Find a value for ROTN * ROTN = HBL / 3 ROTN = MIN( ROTN, HBL-2 ) ROTN = MAX( ROTN, 1 ) * IF( ILO.EQ.IHI ) THEN CALL INFOG2L( ILO, ILO, DESCA, NPROW, NPCOL, MYROW, MYCOL, $ IROW, ICOL, II, JJ ) IF( ( MYROW.EQ.II ) .AND. ( MYCOL.EQ.JJ ) ) THEN WR( ILO ) = A( ( ICOL-1 )*LDA+IROW ) ELSE WR( ILO ) = ZERO END IF WI( ILO ) = ZERO RETURN END IF * NH = IHI - ILO + 1 NZ = IHIZ - ILOZ + 1 * CALL INFOG1L( ILOZ, HBL, NPROW, MYROW, IAFIRST, LILOZ, LIHIZ ) LIHIZ = NUMROC( IHIZ, HBL, MYROW, IAFIRST, NPROW ) * * Set machine-dependent constants for the stopping criterion. * If NORM(H) <= SQRT(OVFL), overflow should not occur. * UNFL = PSLAMCH( CONTXT, 'SAFE MINIMUM' ) OVFL = ONE / UNFL CALL PSLABAD( CONTXT, UNFL, OVFL ) ULP = PSLAMCH( CONTXT, 'PRECISION' ) SMLNUM = UNFL*( NH / ULP ) * * I1 and I2 are the indices of the first row and last column of H * to which transformations must be applied. If eigenvalues only are * being computed, I1 and I2 are set inside the main loop. * IF( WANTT ) THEN I1 = 1 I2 = N END IF * * ITN is the total number of QR iterations allowed. * ITN = ITERMAX * * The main loop begins here. I is the loop index and decreases from * IHI to ILO in steps of our schur block size (<=2*IBLK). Each * iteration of the loop works with the active submatrix in rows * and columns L to I. Eigenvalues I+1 to IHI have already * converged. Either L = ILO or the global A(L,L-1) is negligible * so that the matrix splits. * I = IHI 10 CONTINUE L = ILO IF( I.LT.ILO ) $ GO TO 670 * * Perform QR iterations on rows and columns ILO to I until a * submatrix of order 1 or 2 splits off at the bottom because a * subdiagonal element has become negligible. * DO 640 ITS = 0, ITN * * Look for a single small subdiagonal element. * CALL PSLASMSUB( A, DESCA, I, L, K, SMLNUM, WORK( IRBUF+1 ), $ LWORK-IRBUF ) L = K * IF( L.GT.ILO ) THEN * * H(L,L-1) is negligible * CALL INFOG2L( L, L-1, DESCA, NPROW, NPCOL, MYROW, MYCOL, $ IROW, ICOL, ITMP1, ITMP2 ) IF( ( MYROW.EQ.ITMP1 ) .AND. ( MYCOL.EQ.ITMP2 ) ) THEN A( ( ICOL-1 )*LDA+IROW ) = ZERO END IF WORK( ISUB+L-1 ) = ZERO END IF * * Exit from loop if a submatrix of order 1 or 2 has split off. * IF ( WANTT ) THEN * For Schur form, use 2x2 blocks IF ( L .GE. I-1 ) THEN GO TO 650 END IF ELSE * If we don't want the Schur form, use bigger blocks. IF ( L .GE. I-( 2*IBLK-1) ) THEN GO TO 650 END IF END IF * * Now the active submatrix is in rows and columns L to I. If * eigenvalues only are being computed, only the active submatrix * need be transformed. * IF( .NOT.WANTT ) THEN I1 = L I2 = I END IF * * Copy submatrix of size 2*JBLK and prepare to do generalized * Wilkinson shift or an exceptional shift * JBLK = MIN( IBLK, ( ( I-L+1 ) / 2 )-1 ) IF( JBLK.GT.LCMRC ) THEN * * Make sure it's divisible by LCM (we want even workloads!) * JBLK = JBLK - MOD( JBLK, LCMRC ) END IF JBLK = MIN( JBLK, 2*LCMRC ) JBLK = MAX( JBLK, 1 ) * CALL PSLACP3( 2*JBLK, I-2*JBLK+1, A, DESCA, S1, 2*IBLK, -1, -1, $ 0 ) IF( (ITS.EQ.20 .OR. ITS.EQ.40) .AND. (JBLK.GT.1) ) THEN * * Exceptional shift. * DO 20 II = 2*JBLK, 1, -1 S1( II, II ) = CONST*( ABS( S1( II, II ) )+ $ ABS( S1( II, II-1 ) ) ) S1( II, II-1 ) = ZERO S1( II-1, II ) = ZERO 20 CONTINUE ELSE CALL SLAHQR( .FALSE., .FALSE., 2*JBLK, 1, 2*JBLK, S1, $ 2*IBLK, WORK( IRBUF+1 ), WORK( ICBUF+1 ), 1, $ 2*JBLK, Z, LDZ, IERR ) * * Prepare to use Wilkinson's double shift * H44 = S1( 2*JBLK, 2*JBLK ) H33 = S1( 2*JBLK-1, 2*JBLK-1 ) H43H34 = S1( 2*JBLK-1, 2*JBLK )*S1( 2*JBLK, 2*JBLK-1 ) IF( ( JBLK.GT.1 ) .AND. ( ITS.GT.30 ) ) THEN S = S1( 2*JBLK-1, 2*JBLK-2 ) DISC = ( H33-H44 )*HALF DISC = DISC*DISC + H43H34 IF( DISC.GT.ZERO ) THEN * * Real roots: Use Wilkinson's shift twice * DISC = SQRT( DISC ) AVE = HALF*( H33+H44 ) IF( ABS( H33 )-ABS( H44 ).GT.ZERO ) THEN H33 = H33*H44 - H43H34 H44 = H33 / ( SIGN( DISC, AVE )+AVE ) ELSE H44 = SIGN( DISC, AVE ) + AVE END IF H33 = H44 H43H34 = ZERO END IF END IF END IF * * Look for two consecutive small subdiagonal elements: * PSLACONSB is the routine that does this. * CALL PSLACONSB( A, DESCA, I, L, M, H44, H33, H43H34, $ WORK( IRBUF+1 ), LWORK-IRBUF ) * * Double-shift QR step * * NBULGE is the number of bulges that will be attempted * ISTOP = MIN( M+ROTN-1-MOD( M-( M / HBL )*HBL-1, ROTN ), I-2 ) ISTOP = MIN( ISTOP, M+HBL-3-MOD( M-1, HBL ) ) ISTOP = MIN( ISTOP, I2-2 ) ISTOP = MAX( ISTOP, M ) NBULGE = ( I-1-ISTOP ) / HBL * * Do not exceed maximum determined. * NBULGE = MIN( NBULGE, JBLK ) IF( NBULGE.GT.LCMRC ) THEN * * Make sure it's divisible by LCM (we want even workloads!) * NBULGE = NBULGE - MOD( NBULGE, LCMRC ) END IF NBULGE = MAX( NBULGE, 1 ) * IF( ( ITS.NE.20 ) .AND. ( ITS.NE.40 ) .AND. ( NBULGE.GT.1 ) ) $ THEN * * sort the eigenpairs so that they are in twos for double * shifts. only call if several need sorting * CALL SLASORTE( S1( 2*( JBLK-NBULGE )+1, $ 2*( JBLK-NBULGE )+1 ), 2*IBLK, 2*NBULGE, $ WORK( IRBUF+1 ), IERR ) END IF * * If we are starting in the middle because of consecutive small * subdiagonal elements, we need to see how many bulges we * can send through without breaking the consecutive small * subdiagonal property. * IF( ( NBULGE.GT.1 ) .AND. ( M.GT.L ) ) THEN * * Copy a chunk of elements from global A(M-1:,M-1:) * CALL INFOG2L( M+2, M+2, DESCA, NPROW, NPCOL, MYROW, MYCOL, $ IROW1, ICOL1, ITMP1, ITMP2 ) II = MIN( 4*NBULGE+2, N-M+2 ) CALL PSLACP3( II, M-1, A, DESCA, WORK( IRBUF+1 ), II, ITMP1, $ ITMP2, 0 ) IF( ( MYROW.EQ.ITMP1 ) .AND. ( MYCOL.EQ.ITMP2 ) ) THEN * * Find a new NBULGE based on the bulges we have. * CALL SLAMSH( S1, 2*IBLK, NBULGE, JBLK, WORK( IRBUF+1 ), $ II, II, ULP ) IF( NUM.GT.1 ) THEN CALL IGEBS2D( CONTXT, 'ALL', ' ', 1, 1, NBULGE, 1 ) END IF ELSE * * Everyone needs to receive the new NBULGE * CALL IGEBR2D( CONTXT, 'ALL', ' ', 1, 1, NBULGE, 1, ITMP1, $ ITMP2 ) END IF END IF * * IBULGE is the number of bulges going so far * IBULGE = 1 * * "A" row defs : main row transforms from LOCALK to LOCALI2 * CALL INFOG1L( M, HBL, NPCOL, MYCOL, JAFIRST, ITMP1, LOCALK ) LOCALK = NQ CALL INFOG1L( 1, HBL, NPCOL, MYCOL, JAFIRST, ICOL1, LOCALI2 ) LOCALI2 = NUMROC( I2, HBL, MYCOL, JAFIRST, NPCOL ) * * "A" col defs : main col transforms from LOCALI1 to LOCALM * CALL INFOG1L( I1, HBL, NPROW, MYROW, IAFIRST, LOCALI1, $ ICOL1 ) CALL INFOG1L( 1, HBL, NPROW, MYROW, IAFIRST, LOCALM, ICOL1 ) ICOL1 = NUMROC( MIN( M+3, I ), HBL, MYROW, IAFIRST, NPROW ) * * Which row & column will start the bulges * ISTARTROW = MOD( ( M+1 ) / HBL, NPROW ) + IAFIRST ISTARTCOL = MOD( ( M+1 ) / HBL, NPCOL ) + JAFIRST * CALL INFOG1L( M, HBL, NPROW, MYROW, IAFIRST, II, ITMP2 ) CALL INFOG1L( M, HBL, NPCOL, MYCOL, JAFIRST, JJ, ITMP2 ) CALL INFOG1L( 1, HBL, NPROW, MYROW, IAFIRST, ISTOP, $ KP2ROW( 1 ) ) KP2ROW( 1 ) = NUMROC( M+2, HBL, MYROW, IAFIRST, NPROW ) CALL INFOG1L( 1, HBL, NPCOL, MYCOL, JAFIRST, ISTOP, $ KP2COL( 1 ) ) KP2COL( 1 ) = NUMROC( M+2, HBL, MYCOL, JAFIRST, NPCOL ) * * Set all values for bulges. All bulges are stored in * intermediate steps as loops over KI. Their current "task" * over the global M to I-1 values is always K1(KI) to K2(KI). * However, because there are many bulges, K1(KI) & K2(KI) might * go past that range while later bulges (KI+1,KI+2,etc..) are * finishing up. Even if ROTN=1, in order to minimize border * communication sometimes K1(KI)=HBL-2 & K2(KI)=HBL-1 so both * border messages can be handled at once. * * Rules: * If MOD(K1(KI)-1,HBL) < HBL-2 then MOD(K2(KI)-1,HBL)<HBL-2 * If MOD(K1(KI)-1,HBL) = HBL-1 then MOD(K2(KI)-1,HBL)=HBL-1 * K2(KI)-K1(KI) <= ROTN * * We first hit a border when MOD(K1(KI)-1,HBL)=HBL-2 and we hit * it again when MOD(K1(KI)-1,HBL)=HBL-1. * DO 30 KI = 1, NBULGE K1( KI ) = M ISTOP = MIN( M+ROTN-1-MOD( M-( M / HBL )*HBL-1, ROTN ), $ I-2 ) ISTOP = MIN( ISTOP, M+HBL-3-MOD( M-1, HBL ) ) ISTOP = MIN( ISTOP, I2-2 ) ISTOP = MAX( ISTOP, M ) IF( ( MOD( M-1, HBL ).EQ.HBL-2 ) .AND. $ ( ISTOP.LT.MIN( I-2, I2-2 ) ) ) THEN ISTOP = ISTOP + 1 END IF K2( KI ) = ISTOP ICURROW( KI ) = ISTARTROW ICURCOL( KI ) = ISTARTCOL KROW( KI ) = II KCOL( KI ) = JJ IF( KI.GT.1 ) $ KP2ROW( KI ) = KP2ROW( 1 ) IF( KI.GT.1 ) $ KP2COL( KI ) = KP2COL( 1 ) 30 CONTINUE * * Get first transform on node who owns M+2,M+2 * ITMP1 = ISTARTROW ITMP2 = ISTARTCOL CALL PSLAWIL( ITMP1, ITMP2, M, A, DESCA, H44, H33, H43H34, $ VCOPY ) V1SAVE = VCOPY( 1 ) V2SAVE = VCOPY( 2 ) V3SAVE = VCOPY( 3 ) * * The main implicit shift Francis loops over the bulges starts * here! * IF( K2( IBULGE ).LE.I-1 ) THEN 40 CONTINUE IF( ( K1( IBULGE ).GE.M+5 ) .AND. ( IBULGE.LT.NBULGE ) ) $ THEN IF( ( MOD( K2( IBULGE )+2, HBL ).EQ.MOD( K2( IBULGE+1 )+ $ 2, HBL ) ) .AND. ( K1( 1 ).LE.I-1 ) ) THEN H44 = S1( 2*JBLK-2*IBULGE, 2*JBLK-2*IBULGE ) H33 = S1( 2*JBLK-2*IBULGE-1, 2*JBLK-2*IBULGE-1 ) H43H34 = S1( 2*JBLK-2*IBULGE-1, 2*JBLK-2*IBULGE )* $ S1( 2*JBLK-2*IBULGE, 2*JBLK-2*IBULGE-1 ) ITMP1 = ISTARTROW ITMP2 = ISTARTCOL CALL PSLAWIL( ITMP1, ITMP2, M, A, DESCA, H44, H33, $ H43H34, VCOPY ) V1SAVE = VCOPY( 1 ) V2SAVE = VCOPY( 2 ) V3SAVE = VCOPY( 3 ) IBULGE = IBULGE + 1 END IF END IF * * When we hit a border, there are row and column transforms that * overlap over several processors and the code gets very * "congested." As a remedy, when we first hit a border, a 6x6 * *local* matrix is generated on one node (called SMALLA) and * work is done on that. At the end of the border, the data is * passed back and everything stays a lot simpler. * DO 120 KI = 1, IBULGE * ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) K = ISTART MODKM1 = MOD( K-1, HBL ) IF( MODKM1.GE.HBL-2 ) THEN IF( ( MODKM1.EQ.HBL-2 ) .AND. ( K.LT.I-1 ) ) THEN * * Copy 6 elements from global A(K-1:K+4,K-1:K+4) * ITMP1 = ICURROW( KI ) ITMP2 = ICURCOL( KI ) CALL PSLACP3( MIN( 6, N-K+2 ), K-1, A, DESCA, $ SMALLA( 1, 1, KI ), 6, ITMP1, ITMP2, $ 0 ) END IF IF( ( MODKM1.EQ.HBL-1 ) .AND. ( K.EQ.M ) ) THEN * * Copy 6 elements from global A(K-2:K+3,K-2:K+3) * CALL INFOG2L( K+1, K+1, DESCA, NPROW, NPCOL, MYROW, $ MYCOL, IROW1, ICOL1, ITMP1, ITMP2 ) CALL PSLACP3( MIN( 6, N-K+3 ), K-2, A, DESCA, $ SMALLA( 1, 1, KI ), 6, ITMP1, ITMP2, $ 0 ) END IF END IF * * * SLAHQR used to have a single row application and a single * column application to H. Here we do something a little * more clever. We break each transformation down into 3 * parts: * 1.) The minimum amount of work it takes to determine * a group of ROTN transformations (this is on * the critical path.) (Loops 50-120) * (the data is broadcast now: loops 180-240) * 2.) The small work it takes so that each of the rows * and columns is at the same place. For example, * all ROTN row transforms are all complete * through some column TMP. (Loops 250-260) * 3.) The majority of the row and column transforms * are then applied in a block fashion. * (row transforms are in loops 280-380) * (col transforms are in loops 400-540) * * Each of these three parts are further subdivided into 3 * parts: * A.) Work at the start of a border when * MOD(ISTART-1,HBL) = HBL-2 * B.) Work at the end of a border when * MOD(ISTART-1,HBL) = HBL-1 * C.) Work in the middle of the block when * MOD(ISTART-1,HBL) < HBL-2 * * Further optimization is met with the boolean SKIP. A border * communication can be broken into several parts for efficient * parallelism: * Loop over all the bulges, just sending the data out * Loop over all the bulges, just doing the work * Loop over all the bulges, just sending the data back. * * IF( ( MYROW.EQ.ICURROW( KI ) ) .AND. $ ( MYCOL.EQ.ICURCOL( KI ) ) .AND. $ ( MODKM1.EQ.HBL-2 ) .AND. $ ( ISTART.LT.MIN( I-1, ISTOP+1 ) ) ) THEN K = ISTART NR = MIN( 3, I-K+1 ) IF( K.GT.M ) THEN CALL SCOPY( NR, SMALLA( 2, 1, KI ), 1, VCOPY, 1 ) ELSE VCOPY( 1 ) = V1SAVE VCOPY( 2 ) = V2SAVE VCOPY( 3 ) = V3SAVE END IF CALL SLARFG( NR, VCOPY( 1 ), VCOPY( 2 ), 1, T1COPY ) IF( K.GT.M ) THEN SMALLA( 2, 1, KI ) = VCOPY( 1 ) SMALLA( 3, 1, KI ) = ZERO IF( K.LT.I-1 ) $ SMALLA( 4, 1, KI ) = ZERO ELSE IF( M.GT.L ) THEN SMALLA( 2, 1, KI ) = -SMALLA( 2, 1, KI ) END IF V2 = VCOPY( 2 ) T2 = T1COPY*V2 WORK( VECSIDX+( K-1 )*3+1 ) = VCOPY( 2 ) WORK( VECSIDX+( K-1 )*3+2 ) = VCOPY( 3 ) WORK( VECSIDX+( K-1 )*3+3 ) = T1COPY IF( NR.EQ.3 ) THEN * * Do some work so next step is ready... * T1 = T1COPY V3 = VCOPY( 3 ) T3 = T1*V3 ITMP1 = MIN( 6, I2+2-K ) ITMP2 = MAX( I1-K+2, 1 ) DO 50 J = 2, ITMP1 SUM = SMALLA( 2, J, KI ) + $ V2*SMALLA( 3, J, KI ) + $ V3*SMALLA( 4, J, KI ) SMALLA( 2, J, KI ) = SMALLA( 2, J, KI ) - SUM*T1 SMALLA( 3, J, KI ) = SMALLA( 3, J, KI ) - SUM*T2 SMALLA( 4, J, KI ) = SMALLA( 4, J, KI ) - SUM*T3 50 CONTINUE DO 60 J = ITMP2, 5 SUM = SMALLA( J, 2, KI ) + $ V2*SMALLA( J, 3, KI ) + $ V3*SMALLA( J, 4, KI ) SMALLA( J, 2, KI ) = SMALLA( J, 2, KI ) - SUM*T1 SMALLA( J, 3, KI ) = SMALLA( J, 3, KI ) - SUM*T2 SMALLA( J, 4, KI ) = SMALLA( J, 4, KI ) - SUM*T3 60 CONTINUE END IF END IF * IF( ( MOD( ISTOP-1, HBL ).EQ.HBL-1 ) .AND. $ ( MYROW.EQ.ICURROW( KI ) ) .AND. $ ( MYCOL.EQ.ICURCOL( KI ) ) .AND. $ ( ISTART.LE.MIN( I, ISTOP ) ) ) THEN K = ISTOP NR = MIN( 3, I-K+1 ) IF( K.GT.M ) THEN CALL SCOPY( NR, SMALLA( 3, 2, KI ), 1, VCOPY, 1 ) ELSE VCOPY( 1 ) = V1SAVE VCOPY( 2 ) = V2SAVE VCOPY( 3 ) = V3SAVE END IF CALL SLARFG( NR, VCOPY( 1 ), VCOPY( 2 ), 1, T1COPY ) IF( K.GT.M ) THEN SMALLA( 3, 2, KI ) = VCOPY( 1 ) SMALLA( 4, 2, KI ) = ZERO IF( K.LT.I-1 ) $ SMALLA( 5, 2, KI ) = ZERO * * Set a subdiagonal to zero now if it's possible * IF( ( K-2.GT.M ) .AND. ( MOD( K-1, HBL ).GT.1 ) ) $ THEN H11 = SMALLA( 1, 1, KI ) H10 = SMALLA( 2, 1, KI ) H22 = SMALLA( 2, 2, KI ) SUM = ABS( H11 ) + ABS( H22 ) IF( ABS( H10 ).LE.MAX( ULP*SUM, SMLNUM ) ) THEN SMALLA( 2, 1, KI ) = ZERO END IF END IF ELSE IF( M.GT.L ) THEN SMALLA( 3, 2, KI ) = -SMALLA( 3, 2, KI ) END IF V2 = VCOPY( 2 ) T2 = T1COPY*V2 WORK( VECSIDX+( K-1 )*3+1 ) = VCOPY( 2 ) WORK( VECSIDX+( K-1 )*3+2 ) = VCOPY( 3 ) WORK( VECSIDX+( K-1 )*3+3 ) = T1COPY IF( NR.EQ.3 ) THEN * * Do some work so next step is ready... * T1 = T1COPY V3 = VCOPY( 3 ) T3 = T1*V3 ITMP1 = MIN( 6, I2-K+3 ) ITMP2 = MAX( I1-K+3, 1 ) DO 70 J = 3, ITMP1 SUM = SMALLA( 3, J, KI ) + $ V2*SMALLA( 4, J, KI ) + $ V3*SMALLA( 5, J, KI ) SMALLA( 3, J, KI ) = SMALLA( 3, J, KI ) - SUM*T1 SMALLA( 4, J, KI ) = SMALLA( 4, J, KI ) - SUM*T2 SMALLA( 5, J, KI ) = SMALLA( 5, J, KI ) - SUM*T3 70 CONTINUE DO 80 J = ITMP2, 6 SUM = SMALLA( J, 3, KI ) + $ V2*SMALLA( J, 4, KI ) + $ V3*SMALLA( J, 5, KI ) SMALLA( J, 3, KI ) = SMALLA( J, 3, KI ) - SUM*T1 SMALLA( J, 4, KI ) = SMALLA( J, 4, KI ) - SUM*T2 SMALLA( J, 5, KI ) = SMALLA( J, 5, KI ) - SUM*T3 80 CONTINUE END IF END IF * IF( ( MODKM1.EQ.0 ) .AND. ( ISTART.LE.I-1 ) .AND. $ ( MYROW.EQ.ICURROW( KI ) ) .AND. $ ( RIGHT.EQ.ICURCOL( KI ) ) ) THEN * * (IROW1,ICOL1) is (I,J)-coordinates of H(ISTART,ISTART) * IROW1 = KROW( KI ) ICOL1 = KCOL( KI ) IF( ISTART.GT.M ) THEN VCOPY( 1 ) = SMALLA( 4, 3, KI ) VCOPY( 2 ) = SMALLA( 5, 3, KI ) VCOPY( 3 ) = SMALLA( 6, 3, KI ) NR = MIN( 3, I-ISTART+1 ) CALL SLARFG( NR, VCOPY( 1 ), VCOPY( 2 ), 1, $ T1COPY ) A( ( ICOL1-2 )*LDA+IROW1 ) = VCOPY( 1 ) A( ( ICOL1-2 )*LDA+IROW1+1 ) = ZERO IF( ISTART.LT.I-1 ) THEN A( ( ICOL1-2 )*LDA+IROW1+2 ) = ZERO END IF ELSE IF( M.GT.L ) THEN A( ( ICOL1-2 )*LDA+IROW1 ) = -A( ( ICOL1-2 )* $ LDA+IROW1 ) END IF END IF END IF * IF( ( MYROW.EQ.ICURROW( KI ) ) .AND. $ ( MYCOL.EQ.ICURCOL( KI ) ) .AND. $ ( ( ( MODKM1.EQ.HBL-2 ) .AND. ( ISTART.EQ.I- $ 1 ) ) .OR. ( ( MODKM1.LT.HBL-2 ) .AND. ( ISTART.LE.I- $ 1 ) ) ) ) THEN * * (IROW1,ICOL1) is (I,J)-coordinates of H(ISTART,ISTART) * IROW1 = KROW( KI ) ICOL1 = KCOL( KI ) DO 110 K = ISTART, ISTOP * * Create and do these transforms * NR = MIN( 3, I-K+1 ) IF( K.GT.M ) THEN IF( MOD( K-1, HBL ).EQ.0 ) THEN VCOPY( 1 ) = SMALLA( 4, 3, KI ) VCOPY( 2 ) = SMALLA( 5, 3, KI ) VCOPY( 3 ) = SMALLA( 6, 3, KI ) ELSE VCOPY( 1 ) = A( ( ICOL1-2 )*LDA+IROW1 ) VCOPY( 2 ) = A( ( ICOL1-2 )*LDA+IROW1+1 ) IF( NR.EQ.3 ) THEN VCOPY( 3 ) = A( ( ICOL1-2 )*LDA+IROW1+2 ) END IF END IF ELSE VCOPY( 1 ) = V1SAVE VCOPY( 2 ) = V2SAVE VCOPY( 3 ) = V3SAVE END IF IF( NR.EQ.3 ) THEN CALL SLARFG( NR, VCOPY( 1 ), VCOPY( 2 ), 1, $ T1COPY ) ELSE CALL SLARFG( NR, VCOPY( 1 ), VCOPY( 2 ), 1, $ T1COPY ) END IF * CALL SLARFG( NR, VCOPY( 1 ), VCOPY( 2 ), 1, * $ T1COPY ) IF( K.GT.M ) THEN IF( MOD( K-1, HBL ).GT.0 ) THEN A( ( ICOL1-2 )*LDA+IROW1 ) = VCOPY( 1 ) A( ( ICOL1-2 )*LDA+IROW1+1 ) = ZERO IF( K.LT.I-1 ) THEN A( ( ICOL1-2 )*LDA+IROW1+2 ) = ZERO END IF * * Set a subdiagonal to zero now if it's possible * IF( ( IROW1.GT.2 ) .AND. ( ICOL1.GT.2 ) .AND. $ ( K-2.GT.M ) .AND. ( MOD( K-1, $ HBL ).GT.1 ) ) THEN H11 = A( ( ICOL1-3 )*LDA+IROW1-2 ) H10 = A( ( ICOL1-3 )*LDA+IROW1-1 ) H22 = A( ( ICOL1-2 )*LDA+IROW1-1 ) SUM = ABS( H11 ) + ABS( H22 ) IF( ABS( H10 ).LE.MAX( ULP*SUM, SMLNUM ) ) $ THEN A( ( ICOL1-3 )*LDA+IROW1-1 ) = ZERO END IF END IF END IF ELSE IF( M.GT.L ) THEN IF( MOD( K-1, HBL ).GT.0 ) THEN A( ( ICOL1-2 )*LDA+IROW1 ) = -A( ( ICOL1-2 )* $ LDA+IROW1 ) END IF END IF V2 = VCOPY( 2 ) T2 = T1COPY*V2 WORK( VECSIDX+( K-1 )*3+1 ) = VCOPY( 2 ) WORK( VECSIDX+( K-1 )*3+2 ) = VCOPY( 3 ) WORK( VECSIDX+( K-1 )*3+3 ) = T1COPY T1 = T1COPY IF( K.LT.ISTOP ) THEN * * Do some work so next step is ready... * V3 = VCOPY( 3 ) T3 = T1*V3 DO 90 J = ( ICOL1-1 )*LDA + IROW1, $ ( MIN( K2( KI )+1, I-1 )+ICOL1-K-1 )* $ LDA + IROW1, LDA SUM = A( J ) + V2*A( J+1 ) + V3*A( J+2 ) A( J ) = A( J ) - SUM*T1 A( J+1 ) = A( J+1 ) - SUM*T2 A( J+2 ) = A( J+2 ) - SUM*T3 90 CONTINUE DO 100 J = IROW1 + 1, IROW1 + 3 SUM = A( ( ICOL1-1 )*LDA+J ) + $ V2*A( ICOL1*LDA+J ) + $ V3*A( ( ICOL1+1 )*LDA+J ) A( ( ICOL1-1 )*LDA+J ) = A( ( ICOL1-1 )*LDA+ $ J ) - SUM*T1 A( ICOL1*LDA+J ) = A( ICOL1*LDA+J ) - SUM*T2 A( ( ICOL1+1 )*LDA+J ) = A( ( ICOL1+1 )*LDA+ $ J ) - SUM*T3 100 CONTINUE END IF IROW1 = IROW1 + 1 ICOL1 = ICOL1 + 1 110 CONTINUE END IF 120 CONTINUE * * First part of applying the transforms is complete. * Broadcasts of the Householder data is done here. * DO 180 KI = 1, IBULGE * ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) * * Broadcast Householder information from the block * IF( ( MYROW.EQ.ICURROW( KI ) ) .AND. ( NPCOL.GT.1 ) .AND. $ ( ISTART.LE.ISTOP ) ) THEN IF( MYCOL.NE.ICURCOL( KI ) ) THEN CALL SGEBR2D( CONTXT, 'ROW', ' ', $ 3*( ISTOP-ISTART+1 ), 1, $ WORK( VECSIDX+( ISTART-1 )*3+1 ), $ 3*( ISTOP-ISTART+1 ), MYROW, $ ICURCOL( KI ) ) ELSE CALL SGEBS2D( CONTXT, 'ROW', ' ', $ 3*( ISTOP-ISTART+1 ), 1, $ WORK( VECSIDX+( ISTART-1 )*3+1 ), $ 3*( ISTOP-ISTART+1 ) ) END IF END IF 180 CONTINUE * * Now do column transforms and finish work * DO 240 KI = 1, IBULGE * ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) * IF( ( MYCOL.EQ.ICURCOL( KI ) ) .AND. ( NPROW.GT.1 ) .AND. $ ( ISTART.LE.ISTOP ) ) THEN IF( MYROW.NE.ICURROW( KI ) ) THEN CALL SGEBR2D( CONTXT, 'COL', ' ', $ 3*( ISTOP-ISTART+1 ), 1, $ WORK( VECSIDX+( ISTART-1 )*3+1 ), $ 3*( ISTOP-ISTART+1 ), ICURROW( KI ), $ MYCOL ) ELSE CALL SGEBS2D( CONTXT, 'COL', ' ', $ 3*( ISTOP-ISTART+1 ), 1, $ WORK( VECSIDX+( ISTART-1 )*3+1 ), $ 3*( ISTOP-ISTART+1 ) ) END IF END IF 240 CONTINUE * * * Now do make up work to have things in block fashion * DO 260 KI = 1, IBULGE ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) * MODKM1 = MOD( ISTART-1, HBL ) IF( ( MYROW.EQ.ICURROW( KI ) ) .AND. $ ( MYCOL.EQ.ICURCOL( KI ) ) .AND. $ ( ( ( MODKM1.EQ.HBL-2 ) .AND. ( ISTART.EQ.I- $ 1 ) ) .OR. ( ( MODKM1.LT.HBL-2 ) .AND. ( ISTART.LE.I- $ 1 ) ) ) ) THEN * * (IROW1,ICOL1) is (I,J)-coordinates of H(ISTART,ISTART) * IROW1 = KROW( KI ) ICOL1 = KCOL( KI ) DO 250 K = ISTART, ISTOP * * Catch up on column & border work * NR = MIN( 3, I-K+1 ) V2 = WORK( VECSIDX+( K-1 )*3+1 ) V3 = WORK( VECSIDX+( K-1 )*3+2 ) T1 = WORK( VECSIDX+( K-1 )*3+3 ) T2 = T1*V2 IF( K.LT.ISTOP ) THEN * * Do some work so next step is ready... * T3 = T1*V3 CALL SLAREF( 'Col', A, LDA, .FALSE., Z, LDZ, $ .FALSE., ICOL1, ICOL1, ISTART, $ ISTOP, MIN( ISTART+1, I )-K+IROW1, $ IROW1, LILOZ, LIHIZ, $ WORK( VECSIDX+1 ), V2, V3, T1, T2, $ T3 ) IROW1 = IROW1 + 1 ICOL1 = ICOL1 + 1 ELSE IF( ( NR.EQ.3 ) .AND. ( MOD( K-1, $ HBL ).LT.HBL-2 ) ) THEN T3 = T1*V3 CALL SLAREF( 'Row', A, LDA, .FALSE., Z, LDZ, $ .FALSE., IROW1, IROW1, ISTART, $ ISTOP, ICOL1, MIN( MIN( K2( KI ) $ +1, I-1 ), I2 )-K+ICOL1, LILOZ, $ LIHIZ, WORK( VECSIDX+1 ), V2, $ V3, T1, T2, T3 ) END IF END IF 250 CONTINUE END IF * * Send SMALLA back again. * K = ISTART MODKM1 = MOD( K-1, HBL ) IF( ( MODKM1.GE.HBL-2 ) .AND. ( K.LE.I-1 ) ) THEN IF( ( MODKM1.EQ.HBL-2 ) .AND. ( K.LT.I-1 ) ) THEN * * Copy 6 elements from global A(K-1:K+4,K-1:K+4) * ITMP1 = ICURROW(KI) ITMP2 = ICURCOL(KI) CALL PSLACP3( MIN( 6, N-K+2 ), K-1, A, DESCA, $ SMALLA( 1, 1, KI ), 6, ITMP1, ITMP2, $ 1 ) * END IF IF( MODKM1.EQ.HBL-1 ) THEN * * Copy 6 elements from global A(K-2:K+3,K-2:K+3) * ITMP1 = ICURROW( KI ) ITMP2 = ICURCOL( KI ) CALL PSLACP3( MIN( 6, N-K+3 ), K-2, A, DESCA, $ SMALLA( 1, 1, KI ), 6, ITMP1, ITMP2, $ 1 ) END IF END IF * 260 CONTINUE * 270 CONTINUE * * Now start major set of block ROW reflections * DO 280 KI = 1, IBULGE IF( ( MYROW.NE.ICURROW( KI ) ) .AND. $ ( DOWN.NE.ICURROW( KI ) ) )GO TO 280 ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) * IF( ( ISTOP.GT.ISTART ) .AND. $ ( MOD( ISTART-1, HBL ).LT.HBL-2 ) .AND. $ ( ICURROW( KI ).EQ.MYROW ) ) THEN IROW1 = MIN( K2( KI )+1, I-1 ) + 1 CALL INFOG1L( IROW1, HBL, NPCOL, MYCOL, JAFIRST, $ ITMP1, ITMP2 ) ITMP2 = LOCALI2 II = KROW( KI ) CALL SLAREF( 'Row', A, LDA, WANTZ, Z, LDZ, .TRUE., II, $ II, ISTART, ISTOP, ITMP1, ITMP2, LILOZ, $ LIHIZ, WORK( VECSIDX+1 ), V2, V3, T1, T2, $ T3 ) END IF 280 CONTINUE * DO 320 KI = 1, IBULGE IF( KROW( KI ).GT.KP2ROW( KI ) ) $ GO TO 320 IF( ( MYROW.NE.ICURROW( KI ) ) .AND. $ ( DOWN.NE.ICURROW( KI ) ) )GO TO 320 ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) IF( ( ISTART.EQ.ISTOP ) .OR. $ ( MOD( ISTART-1, HBL ).GE.HBL-2 ) .OR. $ ( ICURROW( KI ).NE.MYROW ) ) THEN DO 310 K = ISTART, ISTOP V2 = WORK( VECSIDX+( K-1 )*3+1 ) V3 = WORK( VECSIDX+( K-1 )*3+2 ) T1 = WORK( VECSIDX+( K-1 )*3+3 ) NR = MIN( 3, I-K+1 ) T2 = T1*V2 IF( ( NR.EQ.3 ) .AND. ( KROW( KI ).LE. $ KP2ROW( KI ) ) ) THEN T3 = T1*V3 IF( ( K.LT.ISTOP ) .AND. $ ( MOD( K-1, HBL ).LT.HBL-2 ) ) THEN ITMP1 = MIN( K2( KI )+1, I-1 ) + 1 ELSE IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN ITMP1 = MIN( K2( KI )+1, I-1 ) + 1 END IF IF( MOD( K-1, HBL ).EQ.HBL-2 ) THEN ITMP1 = MIN( K+4, I2 ) + 1 END IF IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN ITMP1 = MIN( K+3, I2 ) + 1 END IF END IF * * Find local coor of rows K through K+2 * IROW1 = KROW( KI ) IROW2 = KP2ROW( KI ) IF( ( K.GT.ISTART ) .AND. $ ( MOD( K-1, HBL ).GE.HBL-2 ) ) THEN IF( DOWN.EQ.ICURROW( KI ) ) THEN IROW1 = IROW1 + 1 END IF IF( MYROW.EQ.ICURROW( KI ) ) THEN IROW2 = IROW2 + 1 END IF END IF CALL INFOG1L( ITMP1, HBL, NPCOL, MYCOL, $ JAFIRST, ICOL1, ICOL2 ) ICOL2 = LOCALI2 IF( ( MOD( K-1, HBL ).LT.HBL-2 ) .OR. $ ( NPROW.EQ.1 ) ) THEN CALL SLAREF( 'Row', A, LDA, WANTZ, Z, LDZ, $ .FALSE., IROW1, IROW1, ISTART, $ ISTOP, ICOL1, ICOL2, LILOZ, $ LIHIZ, WORK( VECSIDX+1 ), V2, $ V3, T1, T2, T3 ) END IF IF( ( MOD( K-1, HBL ).EQ.HBL-2 ) .AND. $ ( NPROW.GT.1 ) ) THEN IF( IROW1.NE.IROW2 ) THEN CALL SGESD2D( CONTXT, 2, ICOL2-ICOL1+1, $ A( ( ICOL1-1 )*LDA+IROW1 ), $ LDA, DOWN, MYCOL ) IF( SKIP .AND. ( ISTART.EQ.ISTOP ) ) THEN CALL SGERV2D( CONTXT, 2, ICOL2-ICOL1+1, $ A( ( ICOL1-1 )*LDA+ $ IROW1 ), LDA, DOWN, $ MYCOL ) END IF ELSE IF( SKIP ) THEN CALL SGERV2D( CONTXT, 2, ICOL2-ICOL1+1, $ WORK( IRBUF+1 ), 2, UP, $ MYCOL ) DO 290 J = ICOL1, ICOL2 SUM = WORK( IRBUF+2*( J-ICOL1 )+1 ) + $ V2*WORK( IRBUF+2*( J-ICOL1 )+2 ) $ + V3*A( ( J-1 )*LDA+IROW1 ) WORK( IRBUF+2*( J-ICOL1 )+1 ) $ = WORK( IRBUF+2*( J-ICOL1 )+1 ) - $ SUM*T1 WORK( IRBUF+2*( J-ICOL1 )+2 ) $ = WORK( IRBUF+2*( J-ICOL1 )+2 ) - $ SUM*T2 A( ( J-1 )*LDA+IROW1 ) = A( ( J-1 )* $ LDA+IROW1 ) - SUM*T3 290 CONTINUE IF( ISTART.EQ.ISTOP ) THEN CALL SGESD2D( CONTXT, 2, ICOL2-ICOL1+1, $ WORK( IRBUF+1 ), 2, UP, $ MYCOL ) END IF END IF END IF IF( ( MOD( K-1, HBL ).EQ.HBL-1 ) .AND. $ ( NPROW.GT.1 ) ) THEN IF( IROW1.EQ.IROW2 ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGESD2D( CONTXT, 2, ICOL2-ICOL1+1, $ A( ( ICOL1-1 )*LDA+IROW1- $ 1 ), LDA, DOWN, MYCOL ) END IF IF( SKIP ) THEN CALL SGERV2D( CONTXT, 2, ICOL2-ICOL1+1, $ A( ( ICOL1-1 )*LDA+IROW1- $ 1 ), LDA, DOWN, MYCOL ) END IF ELSE IF( SKIP ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGERV2D( CONTXT, 2, ICOL2-ICOL1+1, $ WORK( IRBUF+1 ), 2, UP, $ MYCOL ) END IF DO 300 J = ICOL1, ICOL2 SUM = WORK( IRBUF+2*( J-ICOL1 )+2 ) + $ V2*A( ( J-1 )*LDA+IROW1 ) + $ V3*A( ( J-1 )*LDA+IROW1+1 ) WORK( IRBUF+2*( J-ICOL1 )+2 ) $ = WORK( IRBUF+2*( J-ICOL1 )+2 ) - $ SUM*T1 A( ( J-1 )*LDA+IROW1 ) = A( ( J-1 )* $ LDA+IROW1 ) - SUM*T2 A( ( J-1 )*LDA+IROW1+1 ) = A( ( J-1 )* $ LDA+IROW1+1 ) - SUM*T3 300 CONTINUE CALL SGESD2D( CONTXT, 2, ICOL2-ICOL1+1, $ WORK( IRBUF+1 ), 2, UP, $ MYCOL ) * END IF END IF END IF 310 CONTINUE END IF 320 CONTINUE * IF( SKIP ) $ GO TO 390 * DO 360 KI = 1, IBULGE IF( KROW( KI ).GT.KP2ROW( KI ) ) $ GO TO 360 IF( ( MYROW.NE.ICURROW( KI ) ) .AND. $ ( DOWN.NE.ICURROW( KI ) ) )GO TO 360 ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) IF( ( ISTART.EQ.ISTOP ) .OR. $ ( MOD( ISTART-1, HBL ).GE.HBL-2 ) .OR. $ ( ICURROW( KI ).NE.MYROW ) ) THEN DO 350 K = ISTART, ISTOP V2 = WORK( VECSIDX+( K-1 )*3+1 ) V3 = WORK( VECSIDX+( K-1 )*3+2 ) T1 = WORK( VECSIDX+( K-1 )*3+3 ) NR = MIN( 3, I-K+1 ) T2 = T1*V2 IF( ( NR.EQ.3 ) .AND. ( KROW( KI ).LE. $ KP2ROW( KI ) ) ) THEN T3 = T1*V3 IF( ( K.LT.ISTOP ) .AND. $ ( MOD( K-1, HBL ).LT.HBL-2 ) ) THEN ITMP1 = MIN( K2( KI )+1, I-1 ) + 1 ELSE IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN ITMP1 = MIN( K2( KI )+1, I-1 ) + 1 END IF IF( MOD( K-1, HBL ).EQ.HBL-2 ) THEN ITMP1 = MIN( K+4, I2 ) + 1 END IF IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN ITMP1 = MIN( K+3, I2 ) + 1 END IF END IF * * Find local coor of rows K through K+2 * IROW1 = KROW( KI ) IROW2 = KP2ROW( KI ) IF( ( K.GT.ISTART ) .AND. $ ( MOD( K-1, HBL ).GE.HBL-2 ) ) THEN IF( DOWN.EQ.ICURROW( KI ) ) THEN IROW1 = IROW1 + 1 END IF IF( MYROW.EQ.ICURROW( KI ) ) THEN IROW2 = IROW2 + 1 END IF END IF CALL INFOG1L( ITMP1, HBL, NPCOL, MYCOL, $ JAFIRST, ICOL1, ICOL2 ) ICOL2 = LOCALI2 IF( ( MOD( K-1, HBL ).EQ.HBL-2 ) .AND. $ ( NPROW.GT.1 ) ) THEN IF( IROW1.EQ.IROW2 ) THEN CALL SGERV2D( CONTXT, 2, ICOL2-ICOL1+1, $ WORK( IRBUF+1 ), 2, UP, $ MYCOL ) DO 330 J = ICOL1, ICOL2 SUM = WORK( IRBUF+2*( J-ICOL1 )+1 ) + $ V2*WORK( IRBUF+2*( J-ICOL1 )+2 ) $ + V3*A( ( J-1 )*LDA+IROW1 ) WORK( IRBUF+2*( J-ICOL1 )+1 ) $ = WORK( IRBUF+2*( J-ICOL1 )+1 ) - $ SUM*T1 WORK( IRBUF+2*( J-ICOL1 )+2 ) $ = WORK( IRBUF+2*( J-ICOL1 )+2 ) - $ SUM*T2 A( ( J-1 )*LDA+IROW1 ) = A( ( J-1 )* $ LDA+IROW1 ) - SUM*T3 330 CONTINUE IF( ISTART.EQ.ISTOP ) THEN CALL SGESD2D( CONTXT, 2, ICOL2-ICOL1+1, $ WORK( IRBUF+1 ), 2, UP, $ MYCOL ) END IF END IF END IF IF( ( MOD( K-1, HBL ).EQ.HBL-1 ) .AND. $ ( NPROW.GT.1 ) ) THEN IF( IROW1.NE.IROW2 ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGERV2D( CONTXT, 2, ICOL2-ICOL1+1, $ WORK( IRBUF+1 ), 2, UP, $ MYCOL ) END IF DO 340 J = ICOL1, ICOL2 SUM = WORK( IRBUF+2*( J-ICOL1 )+2 ) + $ V2*A( ( J-1 )*LDA+IROW1 ) + $ V3*A( ( J-1 )*LDA+IROW1+1 ) WORK( IRBUF+2*( J-ICOL1 )+2 ) $ = WORK( IRBUF+2*( J-ICOL1 )+2 ) - $ SUM*T1 A( ( J-1 )*LDA+IROW1 ) = A( ( J-1 )* $ LDA+IROW1 ) - SUM*T2 A( ( J-1 )*LDA+IROW1+1 ) = A( ( J-1 )* $ LDA+IROW1+1 ) - SUM*T3 340 CONTINUE CALL SGESD2D( CONTXT, 2, ICOL2-ICOL1+1, $ WORK( IRBUF+1 ), 2, UP, $ MYCOL ) END IF END IF END IF 350 CONTINUE END IF 360 CONTINUE * DO 380 KI = 1, IBULGE IF( KROW( KI ).GT.KP2ROW( KI ) ) $ GO TO 380 IF( ( MYROW.NE.ICURROW( KI ) ) .AND. $ ( DOWN.NE.ICURROW( KI ) ) )GO TO 380 ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) IF( ( ISTART.EQ.ISTOP ) .OR. $ ( MOD( ISTART-1, HBL ).GE.HBL-2 ) .OR. $ ( ICURROW( KI ).NE.MYROW ) ) THEN DO 370 K = ISTART, ISTOP V2 = WORK( VECSIDX+( K-1 )*3+1 ) V3 = WORK( VECSIDX+( K-1 )*3+2 ) T1 = WORK( VECSIDX+( K-1 )*3+3 ) NR = MIN( 3, I-K+1 ) T2 = T1*V2 IF( ( NR.EQ.3 ) .AND. ( KROW( KI ).LE. $ KP2ROW( KI ) ) ) THEN T3 = T1*V3 IF( ( K.LT.ISTOP ) .AND. $ ( MOD( K-1, HBL ).LT.HBL-2 ) ) THEN ITMP1 = MIN( K2( KI )+1, I-1 ) + 1 ELSE IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN ITMP1 = MIN( K2( KI )+1, I-1 ) + 1 END IF IF( MOD( K-1, HBL ).EQ.HBL-2 ) THEN ITMP1 = MIN( K+4, I2 ) + 1 END IF IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN ITMP1 = MIN( K+3, I2 ) + 1 END IF END IF * * Find local coor of rows K through K+2 * IROW1 = KROW( KI ) IROW2 = KP2ROW( KI ) IF( ( K.GT.ISTART ) .AND. $ ( MOD( K-1, HBL ).GE.HBL-2 ) ) THEN IF( DOWN.EQ.ICURROW( KI ) ) THEN IROW1 = IROW1 + 1 END IF IF( MYROW.EQ.ICURROW( KI ) ) THEN IROW2 = IROW2 + 1 END IF END IF CALL INFOG1L( ITMP1, HBL, NPCOL, MYCOL, $ JAFIRST, ICOL1, ICOL2 ) ICOL2 = LOCALI2 IF( ( MOD( K-1, HBL ).EQ.HBL-2 ) .AND. $ ( NPROW.GT.1 ) ) THEN IF( IROW1.NE.IROW2 ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGERV2D( CONTXT, 2, ICOL2-ICOL1+1, $ A( ( ICOL1-1 )*LDA+ $ IROW1 ), LDA, DOWN, $ MYCOL ) END IF END IF END IF IF( ( MOD( K-1, HBL ).EQ.HBL-1 ) .AND. $ ( NPROW.GT.1 ) ) THEN IF( IROW1.EQ.IROW2 ) THEN CALL SGERV2D( CONTXT, 2, ICOL2-ICOL1+1, $ A( ( ICOL1-1 )*LDA+IROW1- $ 1 ), LDA, DOWN, MYCOL ) END IF END IF END IF 370 CONTINUE END IF 380 CONTINUE * 390 CONTINUE * * Now start major set of block COL reflections * DO 400 KI = 1, IBULGE IF( ( MYCOL.NE.ICURCOL( KI ) ) .AND. $ ( RIGHT.NE.ICURCOL( KI ) ) )GO TO 400 ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) * IF( ( ( MOD( ISTART-1, HBL ).LT.HBL-2 ) .OR. ( NPCOL.EQ. $ 1 ) ) .AND. ( ICURCOL( KI ).EQ.MYCOL ) .AND. $ ( I-ISTOP+1.GE.3 ) ) THEN K = ISTART IF( ( K.LT.ISTOP ) .AND. ( MOD( K-1, $ HBL ).LT.HBL-2 ) ) THEN ITMP1 = MIN( ISTART+1, I ) - 1 ELSE IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN ITMP1 = MIN( K+3, I ) END IF IF( MOD( K-1, HBL ).EQ.HBL-2 ) THEN ITMP1 = MAX( I1, K-1 ) - 1 END IF IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN ITMP1 = MAX( I1, K-2 ) - 1 END IF END IF * ICOL1 = KCOL( KI ) CALL INFOG1L( I1, HBL, NPROW, MYROW, IAFIRST, $ IROW1, IROW2 ) IROW2 = NUMROC( ITMP1, HBL, MYROW, IAFIRST, NPROW ) IF( IROW1.LE.IROW2 ) THEN ITMP2 = IROW2 ELSE ITMP2 = -1 END IF CALL SLAREF( 'Col', A, LDA, WANTZ, Z, LDZ, .TRUE., $ ICOL1, ICOL1, ISTART, ISTOP, IROW1, $ IROW2, LILOZ, LIHIZ, WORK( VECSIDX+1 ), $ V2, V3, T1, T2, T3 ) K = ISTOP IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN * * Do from ITMP1+1 to MIN(K+3,I) * IF( MOD( K-1, HBL ).LT.HBL-3 ) THEN IROW1 = ITMP2 + 1 IF( MOD( ( ITMP1 / HBL ), NPROW ).EQ.MYROW ) $ THEN IF( ITMP2.GT.0 ) THEN IROW2 = ITMP2 + MIN( K+3, I ) - ITMP1 ELSE IROW2 = IROW1 - 1 END IF ELSE IROW2 = IROW1 - 1 END IF ELSE CALL INFOG1L( ITMP1+1, HBL, NPROW, MYROW, $ IAFIRST, IROW1, IROW2 ) IROW2 = NUMROC( MIN( K+3, I ), HBL, MYROW, $ IAFIRST, NPROW ) END IF V2 = WORK( VECSIDX+( K-1 )*3+1 ) V3 = WORK( VECSIDX+( K-1 )*3+2 ) T1 = WORK( VECSIDX+( K-1 )*3+3 ) T2 = T1*V2 T3 = T1*V3 ICOL1 = KCOL( KI ) + ISTOP - ISTART CALL SLAREF( 'Col', A, LDA, .FALSE., Z, LDZ, $ .FALSE., ICOL1, ICOL1, ISTART, ISTOP, $ IROW1, IROW2, LILOZ, LIHIZ, $ WORK( VECSIDX+1 ), V2, V3, T1, T2, $ T3 ) END IF END IF 400 CONTINUE * DO 460 KI = 1, IBULGE IF( KCOL( KI ).GT.KP2COL( KI ) ) $ GO TO 460 IF( ( MYCOL.NE.ICURCOL( KI ) ) .AND. $ ( RIGHT.NE.ICURCOL( KI ) ) )GO TO 460 ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) IF( MOD( ISTART-1, HBL ).GE.HBL-2 ) THEN * * INFO is found in a buffer * ISPEC = 1 ELSE * * All INFO is local * ISPEC = 0 END IF DO 450 K = ISTART, ISTOP * V2 = WORK( VECSIDX+( K-1 )*3+1 ) V3 = WORK( VECSIDX+( K-1 )*3+2 ) T1 = WORK( VECSIDX+( K-1 )*3+3 ) NR = MIN( 3, I-K+1 ) T2 = T1*V2 IF( ( NR.EQ.3 ) .AND. ( KCOL( KI ).LE.KP2COL( KI ) ) ) $ THEN T3 = T1*V3 * IF( ( K.LT.ISTOP ) .AND. $ ( MOD( K-1, HBL ).LT.HBL-2 ) ) THEN ITMP1 = MIN( ISTART+1, I ) - 1 ELSE IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN ITMP1 = MIN( K+3, I ) END IF IF( MOD( K-1, HBL ).EQ.HBL-2 ) THEN ITMP1 = MAX( I1, K-1 ) - 1 END IF IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN ITMP1 = MAX( I1, K-2 ) - 1 END IF END IF IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN ICOL1 = KCOL( KI ) + K - ISTART ICOL2 = KP2COL( KI ) + K - ISTART ELSE ICOL1 = KCOL( KI ) ICOL2 = KP2COL( KI ) IF( K.GT.ISTART ) THEN IF( RIGHT.EQ.ICURCOL( KI ) ) THEN ICOL1 = ICOL1 + 1 END IF IF( MYCOL.EQ.ICURCOL( KI ) ) THEN ICOL2 = ICOL2 + 1 END IF END IF END IF CALL INFOG1L( I1, HBL, NPROW, MYROW, IAFIRST, $ IROW1, IROW2 ) IROW2 = NUMROC( ITMP1, HBL, MYROW, IAFIRST, $ NPROW ) IF( ( MOD( K-1, HBL ).EQ.HBL-2 ) .AND. $ ( NPCOL.GT.1 ) ) THEN IF( ICOL1.NE.ICOL2 ) THEN CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ A( ( ICOL1-1 )*LDA+IROW1 ), $ LDA, MYROW, RIGHT ) IF( ( ISTART.EQ.ISTOP ) .AND. SKIP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ A( ( ICOL1-1 )*LDA+IROW1 ), $ LDA, MYROW, RIGHT ) END IF ELSE IF( SKIP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( ICBUF+1 ), IROW2-IROW1+1, $ MYROW, LEFT ) II = ICBUF - IROW1 + 1 JJ = ICBUF + IROW2 - 2*IROW1 + 2 DO 410 J = IROW1, IROW2 SUM = WORK( II+J ) + V2*WORK( JJ+J ) + $ V3*A( ( ICOL1-1 )*LDA+J ) WORK( II+J ) = WORK( II+J ) - SUM*T1 WORK( JJ+J ) = WORK( JJ+J ) - SUM*T2 A( ( ICOL1-1 )*LDA+J ) = A( ( ICOL1-1 )* $ LDA+J ) - SUM*T3 410 CONTINUE IF( ISTART.EQ.ISTOP ) THEN CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( ICBUF+1 ), $ IROW2-IROW1+1, MYROW, LEFT ) END IF END IF END IF IF( ( MOD( K-1, HBL ).EQ.HBL-1 ) .AND. $ ( NPCOL.GT.1 ) ) THEN IF( ICOL1.EQ.ICOL2 ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ A( ( ICOL1-2 )*LDA+IROW1 ), $ LDA, MYROW, RIGHT ) END IF IF( SKIP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ A( ( ICOL1-2 )*LDA+IROW1 ), $ LDA, MYROW, RIGHT ) END IF ELSE IF( SKIP ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( ICBUF+1 ), $ IROW2-IROW1+1, MYROW, LEFT ) END IF II = ICBUF + IROW2 - 2*IROW1 + 2 DO 420 J = IROW1, IROW2 SUM = WORK( J+II ) + $ V2*A( ( ICOL1-1 )*LDA+J ) + $ V3*A( ICOL1*LDA+J ) WORK( J+II ) = WORK( J+II ) - SUM*T1 A( ( ICOL1-1 )*LDA+J ) = A( ( ICOL1-1 )* $ LDA+J ) - SUM*T2 A( ICOL1*LDA+J ) = A( ICOL1*LDA+J ) - $ SUM*T3 420 CONTINUE CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( ICBUF+1 ), IROW2-IROW1+1, $ MYROW, LEFT ) END IF END IF * * If we want Z and we haven't already done any Z * IF( ( WANTZ ) .AND. ( MOD( K-1, $ HBL ).GE.HBL-2 ) .AND. ( NPCOL.GT.1 ) ) THEN * * Accumulate transformations in the matrix Z * IROW1 = LILOZ IROW2 = LIHIZ IF( MOD( K-1, HBL ).EQ.HBL-2 ) THEN IF( ICOL1.NE.ICOL2 ) THEN CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ Z( ( ICOL1-1 )*LDZ+IROW1 ), $ LDZ, MYROW, RIGHT ) IF( ( ISTART.EQ.ISTOP ) .AND. SKIP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ Z( ( ICOL1-1 )*LDZ+ $ IROW1 ), LDZ, MYROW, $ RIGHT ) END IF ELSE IF( SKIP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( IZBUF+1 ), $ IROW2-IROW1+1, MYROW, LEFT ) ICOL1 = ( ICOL1-1 )*LDZ II = IZBUF - IROW1 + 1 JJ = IZBUF + IROW2 - 2*IROW1 + 2 DO 430 J = IROW1, IROW2 SUM = WORK( II+J ) + V2*WORK( JJ+J ) + $ V3*Z( ICOL1+J ) WORK( II+J ) = WORK( II+J ) - SUM*T1 WORK( JJ+J ) = WORK( JJ+J ) - SUM*T2 Z( ICOL1+J ) = Z( ICOL1+J ) - SUM*T3 430 CONTINUE IF( ISTART.EQ.ISTOP ) THEN CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( IZBUF+1 ), $ IROW2-IROW1+1, MYROW, $ LEFT ) END IF END IF END IF IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN IF( ICOL1.EQ.ICOL2 ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ Z( ( ICOL1-2 )*LDZ+ $ IROW1 ), LDZ, MYROW, $ RIGHT ) END IF IF( SKIP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ Z( ( ICOL1-2 )*LDZ+ $ IROW1 ), LDZ, MYROW, $ RIGHT ) END IF ELSE IF( SKIP ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( IZBUF+1 ), $ IROW2-IROW1+1, MYROW, $ LEFT ) END IF ICOL1 = ( ICOL1-1 )*LDZ II = IZBUF + IROW2 - 2*IROW1 + 2 DO 440 J = IROW1, IROW2 SUM = WORK( II+J ) + V2*Z( J+ICOL1 ) + $ V3*Z( J+ICOL1+LDZ ) WORK( II+J ) = WORK( II+J ) - SUM*T1 Z( J+ICOL1 ) = Z( J+ICOL1 ) - SUM*T2 Z( J+ICOL1+LDZ ) = Z( J+ICOL1+LDZ ) - $ SUM*T3 440 CONTINUE CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( IZBUF+1 ), $ IROW2-IROW1+1, MYROW, LEFT ) END IF END IF END IF END IF 450 CONTINUE 460 CONTINUE * IF( SKIP ) $ GO TO 550 * DO 520 KI = 1, IBULGE IF( KCOL( KI ).GT.KP2COL( KI ) ) $ GO TO 520 IF( ( MYCOL.NE.ICURCOL( KI ) ) .AND. $ ( RIGHT.NE.ICURCOL( KI ) ) )GO TO 520 ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) IF( MOD( ISTART-1, HBL ).GE.HBL-2 ) THEN * * INFO is found in a buffer * ISPEC = 1 ELSE * * All INFO is local * ISPEC = 0 END IF DO 510 K = ISTART, ISTOP * V2 = WORK( VECSIDX+( K-1 )*3+1 ) V3 = WORK( VECSIDX+( K-1 )*3+2 ) T1 = WORK( VECSIDX+( K-1 )*3+3 ) NR = MIN( 3, I-K+1 ) T2 = T1*V2 IF( ( NR.EQ.3 ) .AND. ( KCOL( KI ).LE.KP2COL( KI ) ) ) $ THEN T3 = T1*V3 * IF( ( K.LT.ISTOP ) .AND. $ ( MOD( K-1, HBL ).LT.HBL-2 ) ) THEN ITMP1 = MIN( ISTART+1, I ) - 1 ELSE IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN ITMP1 = MIN( K+3, I ) END IF IF( MOD( K-1, HBL ).EQ.HBL-2 ) THEN ITMP1 = MAX( I1, K-1 ) - 1 END IF IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN ITMP1 = MAX( I1, K-2 ) - 1 END IF END IF IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN ICOL1 = KCOL( KI ) + K - ISTART ICOL2 = KP2COL( KI ) + K - ISTART ELSE ICOL1 = KCOL( KI ) ICOL2 = KP2COL( KI ) IF( K.GT.ISTART ) THEN IF( RIGHT.EQ.ICURCOL( KI ) ) THEN ICOL1 = ICOL1 + 1 END IF IF( MYCOL.EQ.ICURCOL( KI ) ) THEN ICOL2 = ICOL2 + 1 END IF END IF END IF CALL INFOG1L( I1, HBL, NPROW, MYROW, IAFIRST, $ IROW1, IROW2 ) IROW2 = NUMROC( ITMP1, HBL, MYROW, IAFIRST, $ NPROW ) IF( ( MOD( K-1, HBL ).EQ.HBL-2 ) .AND. $ ( NPCOL.GT.1 ) ) THEN IF( ICOL1.EQ.ICOL2 ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( ICBUF+1 ), IROW2-IROW1+1, $ MYROW, LEFT ) II = ICBUF - IROW1 + 1 JJ = ICBUF + IROW2 - 2*IROW1 + 2 DO 470 J = IROW1, IROW2 SUM = WORK( II+J ) + V2*WORK( JJ+J ) + $ V3*A( ( ICOL1-1 )*LDA+J ) WORK( II+J ) = WORK( II+J ) - SUM*T1 WORK( JJ+J ) = WORK( JJ+J ) - SUM*T2 A( ( ICOL1-1 )*LDA+J ) = A( ( ICOL1-1 )* $ LDA+J ) - SUM*T3 470 CONTINUE IF( ISTART.EQ.ISTOP ) THEN CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( ICBUF+1 ), $ IROW2-IROW1+1, MYROW, LEFT ) END IF END IF END IF IF( ( MOD( K-1, HBL ).EQ.HBL-1 ) .AND. $ ( NPCOL.GT.1 ) ) THEN IF( ICOL1.NE.ICOL2 ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( ICBUF+1 ), $ IROW2-IROW1+1, MYROW, LEFT ) END IF II = ICBUF + IROW2 - 2*IROW1 + 2 DO 480 J = IROW1, IROW2 SUM = WORK( J+II ) + $ V2*A( ( ICOL1-1 )*LDA+J ) + $ V3*A( ICOL1*LDA+J ) WORK( J+II ) = WORK( J+II ) - SUM*T1 A( ( ICOL1-1 )*LDA+J ) = A( ( ICOL1-1 )* $ LDA+J ) - SUM*T2 A( ICOL1*LDA+J ) = A( ICOL1*LDA+J ) - $ SUM*T3 480 CONTINUE CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( ICBUF+1 ), IROW2-IROW1+1, $ MYROW, LEFT ) END IF END IF * * * If we want Z and we haven't already done any Z IF( ( WANTZ ) .AND. ( MOD( K-1, $ HBL ).GE.HBL-2 ) .AND. ( NPCOL.GT.1 ) ) THEN * * Accumulate transformations in the matrix Z * IROW1 = LILOZ IROW2 = LIHIZ IF( MOD( K-1, HBL ).EQ.HBL-2 ) THEN IF( ICOL1.EQ.ICOL2 ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( IZBUF+1 ), $ IROW2-IROW1+1, MYROW, LEFT ) ICOL1 = ( ICOL1-1 )*LDZ II = IZBUF - IROW1 + 1 JJ = IZBUF + IROW2 - 2*IROW1 + 2 DO 490 J = IROW1, IROW2 SUM = WORK( II+J ) + V2*WORK( JJ+J ) + $ V3*Z( ICOL1+J ) WORK( II+J ) = WORK( II+J ) - SUM*T1 WORK( JJ+J ) = WORK( JJ+J ) - SUM*T2 Z( ICOL1+J ) = Z( ICOL1+J ) - SUM*T3 490 CONTINUE IF( ISTART.EQ.ISTOP ) THEN CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( IZBUF+1 ), $ IROW2-IROW1+1, MYROW, $ LEFT ) END IF END IF END IF IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN IF( ICOL1.NE.ICOL2 ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( IZBUF+1 ), $ IROW2-IROW1+1, MYROW, $ LEFT ) END IF ICOL1 = ( ICOL1-1 )*LDZ II = IZBUF + IROW2 - 2*IROW1 + 2 DO 500 J = IROW1, IROW2 SUM = WORK( II+J ) + V2*Z( J+ICOL1 ) + $ V3*Z( J+ICOL1+LDZ ) WORK( II+J ) = WORK( II+J ) - SUM*T1 Z( J+ICOL1 ) = Z( J+ICOL1 ) - SUM*T2 Z( J+ICOL1+LDZ ) = Z( J+ICOL1+LDZ ) - $ SUM*T3 500 CONTINUE CALL SGESD2D( CONTXT, IROW2-IROW1+1, 2, $ WORK( IZBUF+1 ), $ IROW2-IROW1+1, MYROW, LEFT ) END IF END IF END IF END IF 510 CONTINUE 520 CONTINUE DO 540 KI = 1, IBULGE IF( KCOL( KI ).GT.KP2COL( KI ) ) $ GO TO 540 IF( ( MYCOL.NE.ICURCOL( KI ) ) .AND. $ ( RIGHT.NE.ICURCOL( KI ) ) )GO TO 540 ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) IF( MOD( ISTART-1, HBL ).GE.HBL-2 ) THEN * * INFO is found in a buffer * ISPEC = 1 ELSE * * All INFO is local * ISPEC = 0 END IF DO 530 K = ISTART, ISTOP * V2 = WORK( VECSIDX+( K-1 )*3+1 ) V3 = WORK( VECSIDX+( K-1 )*3+2 ) T1 = WORK( VECSIDX+( K-1 )*3+3 ) NR = MIN( 3, I-K+1 ) T2 = T1*V2 IF( ( NR.EQ.3 ) .AND. ( KCOL( KI ).LE.KP2COL( KI ) ) ) $ THEN T3 = T1*V3 * IF( ( K.LT.ISTOP ) .AND. $ ( MOD( K-1, HBL ).LT.HBL-2 ) ) THEN ITMP1 = MIN( ISTART+1, I ) - 1 ELSE IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN ITMP1 = MIN( K+3, I ) END IF IF( MOD( K-1, HBL ).EQ.HBL-2 ) THEN ITMP1 = MAX( I1, K-1 ) - 1 END IF IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN ITMP1 = MAX( I1, K-2 ) - 1 END IF END IF IF( MOD( K-1, HBL ).LT.HBL-2 ) THEN ICOL1 = KCOL( KI ) + K - ISTART ICOL2 = KP2COL( KI ) + K - ISTART ELSE ICOL1 = KCOL( KI ) ICOL2 = KP2COL( KI ) IF( K.GT.ISTART ) THEN IF( RIGHT.EQ.ICURCOL( KI ) ) THEN ICOL1 = ICOL1 + 1 END IF IF( MYCOL.EQ.ICURCOL( KI ) ) THEN ICOL2 = ICOL2 + 1 END IF END IF END IF CALL INFOG1L( I1, HBL, NPROW, MYROW, IAFIRST, $ IROW1, IROW2 ) IROW2 = NUMROC( ITMP1, HBL, MYROW, IAFIRST, $ NPROW ) IF( ( MOD( K-1, HBL ).EQ.HBL-2 ) .AND. $ ( NPCOL.GT.1 ) ) THEN IF( ICOL1.NE.ICOL2 ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ A( ( ICOL1-1 )*LDA+IROW1 ), $ LDA, MYROW, RIGHT ) END IF END IF END IF IF( ( MOD( K-1, HBL ).EQ.HBL-1 ) .AND. $ ( NPCOL.GT.1 ) ) THEN IF( ICOL1.EQ.ICOL2 ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ A( ( ICOL1-2 )*LDA+IROW1 ), $ LDA, MYROW, RIGHT ) END IF END IF * * If we want Z and we haven't already done any Z * IF( ( WANTZ ) .AND. ( MOD( K-1, $ HBL ).GE.HBL-2 ) .AND. ( NPCOL.GT.1 ) ) THEN * * Accumulate transformations in the matrix Z * IROW1 = LILOZ IROW2 = LIHIZ IF( MOD( K-1, HBL ).EQ.HBL-2 ) THEN IF( ICOL1.NE.ICOL2 ) THEN IF( ISTART.EQ.ISTOP ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ Z( ( ICOL1-1 )*LDZ+ $ IROW1 ), LDZ, MYROW, $ RIGHT ) END IF END IF END IF IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN IF( ICOL1.EQ.ICOL2 ) THEN CALL SGERV2D( CONTXT, IROW2-IROW1+1, 2, $ Z( ( ICOL1-2 )*LDZ+IROW1 ), $ LDZ, MYROW, RIGHT ) END IF END IF END IF END IF 530 CONTINUE 540 CONTINUE * * Column work done * 550 CONTINUE * * Now do NR=2 work * DO 630 KI = 1, IBULGE ISTART = MAX( K1( KI ), M ) ISTOP = MIN( K2( KI ), I-1 ) IF( MOD( ISTART-1, HBL ).GE.HBL-2 ) THEN * * INFO is found in a buffer * ISPEC = 1 ELSE * * All INFO is local * ISPEC = 0 END IF * DO 620 K = ISTART, ISTOP * V2 = WORK( VECSIDX+( K-1 )*3+1 ) V3 = WORK( VECSIDX+( K-1 )*3+2 ) T1 = WORK( VECSIDX+( K-1 )*3+3 ) NR = MIN( 3, I-K+1 ) T2 = T1*V2 IF( NR.EQ.2 ) THEN * * Apply G from the left to transform the rows of the matrix * in columns K to I2. * CALL INFOG1L( K, HBL, NPCOL, MYCOL, JAFIRST, $ LILOH, LIHIH ) LIHIH = LOCALI2 CALL INFOG1L( 1, HBL, NPROW, MYROW, IAFIRST, ITMP2, $ ITMP1 ) ITMP1 = NUMROC( K+1, HBL, MYROW, IAFIRST, NPROW ) IF( ICURROW( KI ).EQ.MYROW ) THEN IF( ( ISPEC.EQ.0 ) .OR. ( NPROW.EQ.1 ) .OR. $ ( MOD( K-1, HBL ).EQ.HBL-2 ) ) THEN ITMP1 = ITMP1 - 1 DO 560 J = ( LILOH-1 )*LDA, $ ( LIHIH-1 )*LDA, LDA SUM = A( ITMP1+J ) + V2*A( ITMP1+1+J ) A( ITMP1+J ) = A( ITMP1+J ) - SUM*T1 A( ITMP1+1+J ) = A( ITMP1+1+J ) - SUM*T2 560 CONTINUE ELSE IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN CALL SGERV2D( CONTXT, 1, LIHIH-LILOH+1, $ WORK( IRBUF+1 ), 1, UP, $ MYCOL ) DO 570 J = LILOH, LIHIH SUM = WORK( IRBUF+J-LILOH+1 ) + $ V2*A( ( J-1 )*LDA+ITMP1 ) WORK( IRBUF+J-LILOH+1 ) = WORK( IRBUF+ $ J-LILOH+1 ) - SUM*T1 A( ( J-1 )*LDA+ITMP1 ) = A( ( J-1 )* $ LDA+ITMP1 ) - SUM*T2 570 CONTINUE CALL SGESD2D( CONTXT, 1, LIHIH-LILOH+1, $ WORK( IRBUF+1 ), 1, UP, $ MYCOL ) END IF END IF ELSE IF( ( MOD( K-1, HBL ).EQ.HBL-1 ) .AND. $ ( ICURROW( KI ).EQ.DOWN ) ) THEN CALL SGESD2D( CONTXT, 1, LIHIH-LILOH+1, $ A( ( LILOH-1 )*LDA+ITMP1 ), $ LDA, DOWN, MYCOL ) CALL SGERV2D( CONTXT, 1, LIHIH-LILOH+1, $ A( ( LILOH-1 )*LDA+ITMP1 ), $ LDA, DOWN, MYCOL ) END IF END IF * * Apply G from the right to transform the columns of the * matrix in rows I1 to MIN(K+3,I). * CALL INFOG1L( I1, HBL, NPROW, MYROW, IAFIRST, $ LILOH, LIHIH ) LIHIH = NUMROC( I, HBL, MYROW, IAFIRST, NPROW ) * IF( ICURCOL( KI ).EQ.MYCOL ) THEN * LOCAL A(LILOZ:LIHIZ,KCOL:KCOL+2) IF( ( ISPEC.EQ.0 ) .OR. ( NPCOL.EQ.1 ) .OR. $ ( MOD( K-1, HBL ).EQ.HBL-2 ) ) THEN CALL INFOG1L( K, HBL, NPCOL, MYCOL, JAFIRST, $ ITMP1, ITMP2 ) ITMP2 = NUMROC( K+1, HBL, MYCOL, JAFIRST, $ NPCOL ) DO 580 J = LILOH, LIHIH SUM = A( ( ITMP1-1 )*LDA+J ) + $ V2*A( ITMP1*LDA+J ) A( ( ITMP1-1 )*LDA+J ) = A( ( ITMP1-1 )* $ LDA+J ) - SUM*T1 A( ITMP1*LDA+J ) = A( ITMP1*LDA+J ) - $ SUM*T2 580 CONTINUE ELSE ITMP1 = KCOL( KI ) IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN CALL SGERV2D( CONTXT, LIHIH-LILOH+1, 1, $ WORK( ICBUF+1 ), $ LIHIH-LILOH+1, MYROW, LEFT ) DO 590 J = LILOH, LIHIH SUM = WORK( ICBUF+J ) + $ V2*A( ( ITMP1-1 )*LDA+J ) WORK( ICBUF+J ) = WORK( ICBUF+J ) - $ SUM*T1 A( ( ITMP1-1 )*LDA+J ) $ = A( ( ITMP1-1 )*LDA+J ) - SUM*T2 590 CONTINUE CALL SGESD2D( CONTXT, LIHIH-LILOH+1, 1, $ WORK( ICBUF+1 ), $ LIHIH-LILOH+1, MYROW, LEFT ) END IF END IF ELSE IF( ( MOD( K-1, HBL ).EQ.HBL-1 ) .AND. $ ( ICURCOL( KI ).EQ.RIGHT ) ) THEN ITMP1 = KCOL( KI ) CALL SGESD2D( CONTXT, LIHIH-LILOH+1, 1, $ A( ( ITMP1-1 )*LDA+LILOH ), $ LDA, MYROW, RIGHT ) CALL INFOG1L( K, HBL, NPCOL, MYCOL, JAFIRST, $ ITMP1, ITMP2 ) ITMP2 = NUMROC( K+1, HBL, MYCOL, JAFIRST, $ NPCOL ) CALL SGERV2D( CONTXT, LIHIH-LILOH+1, 1, $ A( ( ITMP1-1 )*LDA+LILOH ), $ LDA, MYROW, RIGHT ) END IF END IF * IF( WANTZ ) THEN * * Accumulate transformations in the matrix Z * IF( ICURCOL( KI ).EQ.MYCOL ) THEN * LOCAL Z(LILOZ:LIHIZ,KCOL:KCOL+2) IF( ( ISPEC.EQ.0 ) .OR. ( NPCOL.EQ.1 ) .OR. $ ( MOD( K-1, HBL ).EQ.HBL-2 ) ) THEN ITMP1 = KCOL( KI ) + K - ISTART ITMP1 = ( ITMP1-1 )*LDZ DO 600 J = LILOZ, LIHIZ SUM = Z( J+ITMP1 ) + $ V2*Z( J+ITMP1+LDZ ) Z( J+ITMP1 ) = Z( J+ITMP1 ) - SUM*T1 Z( J+ITMP1+LDZ ) = Z( J+ITMP1+LDZ ) - $ SUM*T2 600 CONTINUE ELSE ITMP1 = KCOL( KI ) * IF WE ACTUALLY OWN COLUMN K IF( MOD( K-1, HBL ).EQ.HBL-1 ) THEN CALL SGERV2D( CONTXT, LIHIZ-LILOZ+1, 1, $ WORK( IZBUF+1 ), LDZ, $ MYROW, LEFT ) ITMP1 = ( ITMP1-1 )*LDZ DO 610 J = LILOZ, LIHIZ SUM = WORK( IZBUF+J ) + $ V2*Z( J+ITMP1 ) WORK( IZBUF+J ) = WORK( IZBUF+J ) - $ SUM*T1 Z( J+ITMP1 ) = Z( J+ITMP1 ) - SUM*T2 610 CONTINUE CALL SGESD2D( CONTXT, LIHIZ-LILOZ+1, 1, $ WORK( IZBUF+1 ), LDZ, $ MYROW, LEFT ) END IF END IF ELSE * * NO WORK BUT NEED TO UPDATE ANYWAY???? * IF( ( MOD( K-1, HBL ).EQ.HBL-1 ) .AND. $ ( ICURCOL( KI ).EQ.RIGHT ) ) THEN ITMP1 = KCOL( KI ) ITMP1 = ( ITMP1-1 )*LDZ CALL SGESD2D( CONTXT, LIHIZ-LILOZ+1, 1, $ Z( LILOZ+ITMP1 ), LDZ, $ MYROW, RIGHT ) CALL SGERV2D( CONTXT, LIHIZ-LILOZ+1, 1, $ Z( LILOZ+ITMP1 ), LDZ, $ MYROW, RIGHT ) END IF END IF END IF END IF 620 CONTINUE * * Adjust local information for this bulge * IF( NPROW.EQ.1 ) THEN KROW( KI ) = KROW( KI ) + K2( KI ) - K1( KI ) + 1 KP2ROW( KI ) = KP2ROW( KI ) + K2( KI ) - K1( KI ) + 1 END IF IF( ( MOD( K1( KI )-1, HBL ).LT.HBL-2 ) .AND. $ ( ICURROW( KI ).EQ.MYROW ) .AND. ( NPROW.GT.1 ) ) $ THEN KROW( KI ) = KROW( KI ) + K2( KI ) - K1( KI ) + 1 END IF IF( ( MOD( K2( KI ), HBL ).LT.HBL-2 ) .AND. $ ( ICURROW( KI ).EQ.MYROW ) .AND. ( NPROW.GT.1 ) ) $ THEN KP2ROW( KI ) = KP2ROW( KI ) + K2( KI ) - K1( KI ) + 1 END IF IF( ( MOD( K1( KI )-1, HBL ).GE.HBL-2 ) .AND. $ ( ( MYROW.EQ.ICURROW( KI ) ) .OR. ( DOWN.EQ. $ ICURROW( KI ) ) ) .AND. ( NPROW.GT.1 ) ) THEN CALL INFOG1L( K2( KI )+1, HBL, NPROW, MYROW, IAFIRST, $ KROW( KI ), ITMP2 ) END IF IF( ( MOD( K2( KI ), HBL ).GE.HBL-2 ) .AND. $ ( ( MYROW.EQ.ICURROW( KI ) ) .OR. ( UP.EQ. $ ICURROW( KI ) ) ) .AND. ( NPROW.GT.1 ) ) THEN KP2ROW( KI ) = NUMROC( K2( KI )+3, HBL, MYROW, $ IAFIRST, NPROW ) END IF IF( NPCOL.EQ.1 ) THEN KCOL( KI ) = KCOL( KI ) + K2( KI ) - K1( KI ) + 1 KP2COL( KI ) = KP2COL( KI ) + K2( KI ) - K1( KI ) + 1 END IF IF( ( MOD( K1( KI )-1, HBL ).LT.HBL-2 ) .AND. $ ( ICURCOL( KI ).EQ.MYCOL ) .AND. ( NPCOL.GT.1 ) ) $ THEN KCOL( KI ) = KCOL( KI ) + K2( KI ) - K1( KI ) + 1 END IF IF( ( MOD( K2( KI ), HBL ).LT.HBL-2 ) .AND. $ ( ICURCOL( KI ).EQ.MYCOL ) .AND. ( NPCOL.GT.1 ) ) $ THEN KP2COL( KI ) = KP2COL( KI ) + K2( KI ) - K1( KI ) + 1 END IF IF( ( MOD( K1( KI )-1, HBL ).GE.HBL-2 ) .AND. $ ( ( MYCOL.EQ.ICURCOL( KI ) ) .OR. ( RIGHT.EQ. $ ICURCOL( KI ) ) ) .AND. ( NPCOL.GT.1 ) ) THEN CALL INFOG1L( K2( KI )+1, HBL, NPCOL, MYCOL, JAFIRST, $ KCOL( KI ), ITMP2 ) END IF IF( ( MOD( K2( KI ), HBL ).GE.HBL-2 ) .AND. $ ( ( MYCOL.EQ.ICURCOL( KI ) ) .OR. ( LEFT.EQ. $ ICURCOL( KI ) ) ) .AND. ( NPCOL.GT.1 ) ) THEN KP2COL( KI ) = NUMROC( K2( KI )+3, HBL, MYCOL, $ JAFIRST, NPCOL ) END IF K1( KI ) = K2( KI ) + 1 ISTOP = MIN( K1( KI )+ROTN-1, I-2 ) ISTOP = MIN( ISTOP, K1( KI )+HBL-3- $ MOD( K1( KI )-1, HBL ) ) ISTOP = MIN( ISTOP, I2-2 ) ISTOP = MAX( ISTOP, K1( KI ) ) IF( ( MOD( K1( KI )-1, HBL ).EQ.HBL-2 ) .AND. $ ( ISTOP.LT.MIN( I-2, I2-2 ) ) ) THEN ISTOP = ISTOP + 1 END IF K2( KI ) = ISTOP IF( K1( KI ).LE.ISTOP ) THEN IF( ( MOD( K1( KI )-1, HBL ).EQ.HBL-2 ) .AND. $ ( I-K1( KI ).GT.1 ) ) THEN * * Next step switches rows & cols * ICURROW( KI ) = MOD( ICURROW( KI )+1, NPROW ) ICURCOL( KI ) = MOD( ICURCOL( KI )+1, NPCOL ) END IF END IF 630 CONTINUE IF( K2( IBULGE ).LE.I-1 ) $ GO TO 40 END IF * 640 CONTINUE * * Failure to converge in remaining number of iterations * INFO = I RETURN * 650 CONTINUE * IF( L.EQ.I ) THEN * * H(I,I-1) is negligible: one eigenvalue has converged. * CALL INFOG2L( I, I, DESCA, NPROW, NPCOL, MYROW, MYCOL, IROW, $ ICOL, ITMP1, ITMP2 ) IF( ( MYROW.EQ.ITMP1 ) .AND. ( MYCOL.EQ.ITMP2 ) ) THEN WR( I ) = A( ( ICOL-1 )*LDA+IROW ) ELSE WR( I ) = ZERO END IF WI( I ) = ZERO ELSE IF( L.EQ.I-1 ) THEN * * H(I-1,I-2) is negligible: a pair of eigenvalues have converged. * WR( I-1 ) = ZERO WR( I ) = ZERO WI( I-1 ) = ZERO WI( I ) = ZERO MODKM1 = MOD( I-1+HBL, HBL ) CALL INFOG2L( I-1, I-1, DESCA, NPROW, NPCOL, MYROW, MYCOL, $ IROW1, ICOL1, II, JJ ) IF( ( MYROW.EQ.II ) .AND. ( MYCOL.EQ.JJ ) ) THEN IF( MODKM1.NE.0 ) THEN H11 = A( ( ICOL1-1 )*LDA+IROW1 ) H21 = A( ( ICOL1-1 )*LDA+IROW1+1 ) H12 = A( ICOL1*LDA+IROW1 ) H22 = A( ICOL1*LDA+IROW1+1 ) ELSE IF( NPROW.GT.1 ) THEN CALL SGERV2D( CONTXT, 1, 1, H21, 1, DOWN, MYCOL ) ELSE H21 = A( ( ICOL1-1 )*LDA+IROW1+1 ) END IF IF( NPCOL.GT.1 ) THEN CALL SGERV2D( CONTXT, 1, 1, H12, 1, MYROW, RIGHT ) ELSE H12 = A( ICOL1*LDA+IROW1 ) END IF IF( NUM.GT.1 ) THEN CALL SGERV2D( CONTXT, 1, 1, H22, 1, DOWN, RIGHT ) ELSE H22 = A( ICOL1*LDA+IROW1+1 ) END IF END IF H00 = HALF*( H11+H22 ) H10 = H11*H22 - H12*H21 ELSE IF( MODKM1.EQ.0 ) THEN IF( ( NPROW.GT.1 ) .AND. ( MYCOL.EQ.JJ ) .AND. $ ( UP.EQ.II ) ) THEN CALL INFOG2L( I, I-1, DESCA, NPROW, NPCOL, MYROW, $ MYCOL, IROW1, ICOL1, ITMP1, ITMP2 ) CALL SGESD2D( CONTXT, 1, 1, $ A( ( ICOL1-1 )*LDA+IROW1 ), 1, II, JJ ) END IF IF( ( NPCOL.GT.1 ) .AND. ( LEFT.EQ.JJ ) .AND. $ ( MYROW.EQ.II ) ) THEN CALL INFOG2L( I-1, I, DESCA, NPROW, NPCOL, MYROW, $ MYCOL, IROW1, ICOL1, ITMP1, ITMP2 ) CALL SGESD2D( CONTXT, 1, 1, $ A( ( ICOL1-1 )*LDA+IROW1 ), 1, II, JJ ) END IF IF( ( NUM.GT.1 ) .AND. ( LEFT.EQ.JJ ) .AND. $ ( UP.EQ.II ) ) THEN CALL INFOG2L( I, I, DESCA, NPROW, NPCOL, MYROW, MYCOL, $ IROW1, ICOL1, ITMP1, ITMP2 ) CALL SGESD2D( CONTXT, 1, 1, $ A( ( ICOL1-1 )*LDA+IROW1 ), 1, II, JJ ) END IF END IF H00 = ZERO H10 = ZERO END IF H21 = H00*H00 - H10 IF( H21.GE.ZERO ) THEN H21 = SQRT( H21 ) WR( I-1 ) = H00 + H21 WI( I-1 ) = ZERO WR( I ) = H00 - H21 WI( I ) = ZERO ELSE H21 = SQRT( ABS( H21 ) ) WR( I-1 ) = H00 WI( I-1 ) = H21 WR( I ) = H00 WI( I ) = -H21 END IF ELSE * * Find the eigenvalues in H(L:I,L:I), L < I-1 * JBLK = I - L + 1 IF( JBLK.LE.2*IBLK ) THEN CALL PSLACP3( I-L+1, L, A, DESCA, S1, 2*IBLK, 0, 0, 0 ) CALL SLAHQR( .FALSE., .FALSE., JBLK, 1, JBLK, S1, 2*IBLK, $ WR( L ), WI( L ), 1, JBLK, Z, LDZ, IERR ) IF( NODE.NE.0 ) THEN * * Erase the eigenvalues * DO 660 K = L, I WR( K ) = ZERO WI( K ) = ZERO 660 CONTINUE END IF END IF END IF * * Decrement number of remaining iterations, and return to start of * the main loop with new value of I. * ITN = ITN - ITS I = L - 1 GO TO 10 * 670 CONTINUE CALL SGSUM2D( CONTXT, 'All', ' ', N, 1, WR, N, -1, -1 ) CALL SGSUM2D( CONTXT, 'All', ' ', N, 1, WI, N, -1, -1 ) RETURN * * END OF PSLAHQR * END