SUBROUTINE PSLAHQR( WANTT, WANTZ, N, ILO, IHI, A, DESCA, WR, WI,
$ ILOZ, IHIZ, Z, DESCZ, WORK, LWORK, IWORK,
$ ILWORK, INFO )
*
* -- ScaLAPACK auxiliary routine (version 1.5) --
* University of Tennessee, Knoxville, Oak Ridge National Laboratory,
* and University of California, Berkeley.
* May 1, 1997
*
* .. Scalar Arguments ..
LOGICAL WANTT, WANTZ
INTEGER IHI, IHIZ, ILO, ILOZ, ILWORK, INFO, LWORK, N,
$ ROTN
* ..
* .. Array Arguments ..
INTEGER DESCA( * ), DESCZ( * ), IWORK( * )
REAL A( * ), WI( * ), WORK( * ), WR( * ), Z( * )
* ..
*
* Purpose
* =======
*
* PSLAHQR is an auxiliary routine used to find the Schur decomposition
* and or eigenvalues of a matrix already in Hessenberg form from
* cols ILO to IHI.
* If Z = I, and WANTT=WANTZ=.TRUE., H gets replaced with Z'HZ,
* with Z'Z=I, and H in Schur form.
*
* Notes
* =====
*
* Each global data object is described by an associated description
* vector. This vector stores the information required to establish
* the mapping between an object element and its corresponding process
* and memory location.
*
* Let A be a generic term for any 2D block cyclicly distributed array.
* Such a global array has an associated description vector DESCA.
* In the following comments, the character _ should be read as
* "of the global array".
*
* NOTATION STORED IN EXPLANATION
* --------------- -------------- --------------------------------------
* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case,
* DTYPE_A = 1.
* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
* the BLACS process grid A is distribu-
* ted over. The context itself is glo-
* bal, but the handle (the integer
* value) may vary.
* M_A (global) DESCA( M_ ) The number of rows in the global
* array A.
* N_A (global) DESCA( N_ ) The number of columns in the global
* array A.
* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute
* the rows of the array.
* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute
* the columns of the array.
* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
* row of the array A is distributed.
* CSRC_A (global) DESCA( CSRC_ ) The process column over which the
* first column of the array A is
* distributed.
* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local
* array. LLD_A >= MAX(1,LOCp(M_A)).
*
* Let K be the number of rows or columns of a distributed matrix,
* and assume that its process grid has dimension p x q.
* LOCp( K ) denotes the number of elements of K that a process
* would receive if K were distributed over the p processes of its
* process column.
* Similarly, LOCq( K ) denotes the number of elements of K that a
* process would receive if K were distributed over the q processes of
* its process row.
* The values of LOCp() and LOCq() may be determined via a call to the
* ScaLAPACK tool function, NUMROC:
* LOCp( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
* LOCq( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
* An upper bound for these quantities may be computed by:
* LOCp( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
* LOCq( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
*
* Arguments
* =========
*
* WANTT (global input) LOGICAL
* = .TRUE. : the full Schur form T is required;
* = .FALSE.: only eigenvalues are required.
*
* WANTZ (global input) LOGICAL
* = .TRUE. : the matrix of Schur vectors Z is required;
* = .FALSE.: Schur vectors are not required.
*
* N (global input) INTEGER
* The order of the Hessenberg matrix A (and Z if WANTZ).
* N >= 0.
*
* ILO (global input) INTEGER
* IHI (global input) INTEGER
* It is assumed that A is already upper quasi-triangular in
* rows and columns IHI+1:N, and that A(ILO,ILO-1) = 0 (unless
* ILO = 1). PSLAHQR works primarily with the Hessenberg
* submatrix in rows and columns ILO to IHI, but applies
* transformations to all of H if WANTT is .TRUE..
* 1 <= ILO <= max(1,IHI); IHI <= N.
*
* A (global input/output) REAL array, dimension
* (DESCA(LLD_),*)
* On entry, the upper Hessenberg matrix A.
* On exit, if WANTT is .TRUE., A is upper quasi-triangular in
* rows and columns ILO:IHI, with any 2-by-2 or larger diagonal
* blocks not yet in standard form. If WANTT is .FALSE., the
* contents of A are unspecified on exit.
*
* DESCA (global and local input) INTEGER array of dimension DLEN_.
* The array descriptor for the distributed matrix A.
*
* WR (global replicated output) REAL array, dimension (N)
* WI (global replicated output) REAL array, dimension (N)
* The real and imaginary parts, respectively, of the computed
* eigenvalues ILO to IHI are stored in the corresponding
* elements of WR and WI. If two eigenvalues are computed as a
* complex conjugate pair, they are stored in consecutive
* elements of WR and WI, say the i-th and (i+1)th, with
* WI(i) > 0 and WI(i+1) < 0. If WANTT is .TRUE., the
* eigenvalues are stored in the same order as on the diagonal
* of the Schur form returned in A. A may be returned with
* larger diagonal blocks until the next release.
*
* ILOZ (global input) INTEGER
* IHIZ (global input) INTEGER
* Specify the rows of Z to which transformations must be
* applied if WANTZ is .TRUE..
* 1 <= ILOZ <= ILO; IHI <= IHIZ <= N.
*
* Z (global input/output) REAL array.
* If WANTZ is .TRUE., on entry Z must contain the current
* matrix Z of transformations accumulated by PDHSEQR, and on
* exit Z has been updated; transformations are applied only to
* the submatrix Z(ILOZ:IHIZ,ILO:IHI).
* If WANTZ is .FALSE., Z is not referenced.
*
* DESCZ (global and local input) INTEGER array of dimension DLEN_.
* The array descriptor for the distributed matrix Z.
*
* WORK (local output) REAL array of size LWORK
* (Unless LWORK=-1, in which case WORK must be at least size 1)
*
* LWORK (local input) INTEGER
* WORK(LWORK) is a local array and LWORK is assumed big enough
* so that LWORK >= 3*N +
* MAX( 2*MAX(DESCZ(LLD_),DESCA(LLD_)) + 2*LOCq(N),
* 7*Ceil(N/HBL)/LCM(NPROW,NPCOL)) +
* MAX( 2*N, (8*LCM(NPROW,NPCOL)+2)**2 )
* If LWORK=-1, then WORK(1) gets set to the above number and
* the code returns immediately.
*
* IWORK (global and local input) INTEGER array of size ILWORK
* This will hold some of the IBLK integer arrays.
* This is held as a place holder for a future release.
* Currently unreferenced.
*
* ILWORK (local input) INTEGER
* This will hold the size of the IWORK array.
* This is held as a place holder for a future release.
* Currently unreferenced.
*
* INFO (global output) INTEGER
* < 0: parameter number -INFO incorrect or inconsistent
* = 0: successful exit
* > 0: PSLAHQR failed to compute all the eigenvalues ILO to IHI
* in a total of 30*(IHI-ILO+1) iterations; if INFO = i,
* elements i+1:ihi of WR and WI contain those eigenvalues
* which have been successfully computed.
*
* Logic:
* This algorithm is very similar to _LAHQR. Unlike _LAHQR,
* instead of sending one double shift through the largest
* unreduced submatrix, this algorithm sends multiple double shifts
* and spaces them apart so that there can be parallelism across
* several processor row/columns. Another critical difference is
* that this algorithm aggregrates multiple transforms together in
* order to apply them in a block fashion.
*
* Important Local Variables:
* IBLK = The maximum number of bulges that can be computed.
* Currently fixed. Future releases this won't be fixed.
* HBL = The square block size (HBL=DESCA(MB_)=DESCA(NB_))
* ROTN = The number of transforms to block together
* NBULGE = The number of bulges that will be attempted on the
* current submatrix.
* IBULGE = The current number of bulges started.
* K1(*),K2(*) = The current bulge loops from K1(*) to K2(*).
*
* Subroutines:
* From LAPACK, this routine calls:
* SLAHQR -> Serial QR used to determine shifts and eigenvalues
* SLARFG -> Determine the Householder transforms
*
* This ScaLAPACK, this routine calls:
* PSLACONSB -> To determine where to start each iteration
* SLAMSH -> Sends multiple shifts through a small submatrix to
* see how the consecutive subdiagonals change (if
* PSLACONSB indicates we can start a run in the middle)
* PSLAWIL -> Given the shift, get the transformation
* SLASORTE -> Pair up eigenvalues so that reals are paired.
* PSLACP3 -> Parallel array to local replicated array copy &
* back.
* SLAREF -> Row/column reflector applier. Core routine
* here.
* PSLASMSUB -> Finds negligible subdiagonal elements.
*
* Current Notes and/or Restrictions:
* 1.) This code requires the distributed block size to be square
* and at least six (6); unlike simpler codes like LU, this
* algorithm is extremely sensitive to block size. Unwise
* choices of too small a block size can lead to bad
* performance.
* 2.) This code requires A and Z to be distributed identically
* and have identical contxts. A future version may allow Z to
* have a different contxt to 1D row map it to all nodes (so no
* communication on Z is necessary.)
* 3.) This release currently does not have a routine for
* resolving the Schur blocks into regular 2x2 form after
* this code is completed. Because of this, a significant
* performance impact is required while the deflation is done
* by sometimes a single column of processors.
* 4.) This code does not currently block the initial transforms
* so that none of the rows or columns for any bulge are
* completed until all are started. To offset pipeline
* start-up it is recommended that at least 2*LCM(NPROW,NPCOL)
* bulges are used (if possible)
* 5.) The maximum number of bulges currently supported is fixed at
* 32. In future versions this will be limited only by the
* incoming WORK and IWORK array.
* 6.) The matrix A must be in upper Hessenberg form. If elements
* below the subdiagonal are nonzero, the resulting transforms
* may be nonsimilar. This is also true with the LAPACK
* routine SLAHQR.
* 7.) For this release, this code has only been tested for
* RSRC_=CSRC_=0, but it has been written for the general case.
* 8.) Currently, all the eigenvalues are distributed to all the
* nodes. Future releases will probably distribute the
* eigenvalues by the column partitioning.
* 9.) The internals of this routine are subject to change.
* 10.) To optimize this for your architecture, try tuning SLAREF.
* 11.) This code has only been tested for WANTZ = .TRUE. and may
* behave unpredictably for WANTZ set to .FALSE.
*
* Implemented by: G. Henry, May 1, 1997
*
* =====================================================================
*
* .. Parameters ..
INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DT_,
$ LLD_, MB_, M_, NB_, N_, RSRC_
PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DT_ = 1,
$ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
$ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
REAL ZERO, ONE, HALF
PARAMETER ( ZERO = 0.0E+0, ONE = 1.0E+0, HALF = 0.5E+0 )
REAL CONST
PARAMETER ( CONST = 1.50E+0 )
INTEGER IBLK
PARAMETER ( IBLK = 32 )
* ..
* .. Local Scalars ..
LOGICAL SKIP
INTEGER CONTXT, DOWN, HBL, I, I1, I2, IAFIRST,
$ IBULGE, ICBUF, ICOL, ICOL1, ICOL2, IDIA,
$ IERR, II, IRBUF, IROW, IROW1, IROW2, ISPEC,
$ ISTART, ISTARTCOL, ISTARTROW, ISTOP, ISUB,
$ ISUP, ITERMAX, ITMP1, ITMP2, ITN, ITS,
$ IZBUF, J, JAFIRST, JBLK, JJ, K, KI, L, LCMRC,
$ LDA, LDZ, LEFT, LIHIH, LIHIZ, LILOH, LILOZ,
$ LOCALI1, LOCALI2, LOCALK, LOCALM, M, MODKM1,
$ MYCOL, MYROW, NBULGE, NH, NODE, NPCOL,
$ NPROW, NQ, NR, NUM, NZ, RIGHT, UP, VECSIDX
REAL AVE, DISC, H00, H10, H11, H12,
$ H21, H22, H33, H43H34, H44, OVFL, S,
$ SMLNUM, SUM, T1, T1COPY, T2, T3,
$ ULP, UNFL, V1SAVE, V2, V2SAVE,
$ V3, V3SAVE
* ..
* .. Local Arrays ..
INTEGER ICURCOL( IBLK ), ICURROW( IBLK ), K1( IBLK ),
$ K2( IBLK ), KCOL( IBLK ), KP2COL( IBLK ),
$ KP2ROW( IBLK ), KROW( IBLK )
REAL S1( 2*IBLK, 2*IBLK ), SMALLA( 6, 6, IBLK ),
$ VCOPY( 3 )
* ..
* .. External Functions ..
INTEGER ILCM, NUMROC
REAL PSLAMCH
EXTERNAL ILCM, NUMROC, PSLAMCH
* ..
* .. External Subroutines ..
EXTERNAL BLACS_GRIDINFO, SCOPY, SGEBR2D,
$ SGEBS2D, SGERV2D, SGESD2D, SGSUM2D, SLAHQR,
$ SLAREF, SLARFG, SLASORTE, SLAMSH, IGAMN2D,
$ IGEBR2D, IGEBS2D, INFOG1L, INFOG2L, PSLABAD,
$ PSLACONSB, PSLACP3, PSLASMSUB, PSLAWIL,
$ PXERBLA
* ..
* .. Intrinsic Functions ..
*
INTRINSIC ABS, DABS, DBLE, MAX, MIN, MOD, SIGN, SQRT
* ..
* .. Executable Statements ..
*
INFO = 0
*
ITERMAX = 30*( IHI-ILO+1 )
IF( N.EQ.0 )
$ RETURN
*
* NODE (IAFIRST,JAFIRST) OWNS A(1,1)
*
HBL = DESCA( MB_ )
CONTXT = DESCA( CTXT_ )
LDA = DESCA( LLD_ )
IAFIRST = DESCA( RSRC_ )
JAFIRST = DESCA( CSRC_ )
LDZ = DESCZ( LLD_ )
CALL BLACS_GRIDINFO( CONTXT, NPROW, NPCOL, MYROW, MYCOL )
NODE = MYROW*NPCOL + MYCOL
NUM = NPROW*NPCOL
LEFT = MOD( MYCOL+NPCOL-1, NPCOL )
RIGHT = MOD( MYCOL+1, NPCOL )
UP = MOD( MYROW+NPROW-1, NPROW )
DOWN = MOD( MYROW+1, NPROW )
LCMRC = ILCM( NPROW, NPCOL )
IF( ( NPROW.LE.3 ) .OR. ( NPCOL.LE.3 ) ) THEN
SKIP = .TRUE.
ELSE
SKIP = .FALSE.
END IF
*
* Determine the number of columns we have so we can check workspace
*
NQ = NUMROC( N, HBL, MYCOL, JAFIRST, NPCOL )
JJ = N / HBL
IF( JJ*HBL.LT.N )
$ JJ = JJ + 1
JJ = 7*JJ / LCMRC
JJ = 3*N+MAX( 2*MAX( LDA, LDZ )+2*NQ, JJ )
JJ = JJ + MAX( 2*N, (8*LCMRC+2)**2 )
IF ( LWORK .EQ. -1 ) THEN
WORK(1) = JJ
RETURN
END IF
IF ( LWORK .LT. JJ ) THEN
INFO = -15
END IF
IF( DESCZ( CTXT_ ).NE.DESCA( CTXT_ ) ) THEN
INFO = -( 1300+CTXT_ )
END IF
IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN
INFO = -( 700+NB_ )
END IF
IF( DESCZ( MB_ ).NE.DESCZ( NB_ ) ) THEN
INFO = -( 1300+NB_ )
END IF
IF( DESCA( MB_ ).NE.DESCZ( MB_ ) ) THEN
INFO = -( 1300+MB_ )
END IF
IF( ( DESCA( RSRC_ ).NE.0 ) .OR. ( DESCA( CSRC_ ).NE.0 ) ) THEN
INFO = -( 700+RSRC_ )
END IF
IF( ( DESCZ( RSRC_ ).NE.0 ) .OR. ( DESCZ( CSRC_ ).NE.0 ) ) THEN
INFO = -( 1300+RSRC_ )
END IF
IF( ( ILO.GT.N ) .OR. ( ILO.LT.1 ) ) THEN
INFO = -4
END IF
IF( ( IHI.GT.N ) .OR. ( IHI.LT.1 ) ) THEN
INFO = -5
END IF
IF( HBL.LT.5 ) THEN
INFO = -( 700+MB_ )
END IF
CALL IGAMN2D( CONTXT, 'ALL', ' ', 1, 1, INFO, 1, ITMP1, ITMP2, -1,
$ -1, -1 )
IF( INFO.LT.0 ) THEN
CALL PXERBLA( CONTXT, 'PSLAHQR', -INFO )
RETURN
END IF
*
* Set work array indices
*
VECSIDX = 0
IDIA = 3*N
ISUB = 3*N
ISUP = 3*N
IRBUF = 3*N
ICBUF = 3*N
IZBUF = 5*N
*
* Find a value for ROTN
*
ROTN = HBL / 3
ROTN = MIN( ROTN, HBL-2 )
ROTN = MAX( ROTN, 1 )
*
IF( ILO.EQ.IHI ) THEN
CALL INFOG2L( ILO, ILO, DESCA, NPROW, NPCOL, MYROW, MYCOL,
$ IROW, ICOL, II, JJ )
IF( ( MYROW.EQ.II ) .AND. ( MYCOL.EQ.JJ ) ) THEN
WR( ILO ) = A( ( ICOL-1 )*LDA+IROW )
ELSE
WR( ILO ) = ZERO
END IF
WI( ILO ) = ZERO
RETURN
END IF
*
NH = IHI - ILO + 1
NZ = IHIZ - ILOZ + 1
*
CALL INFOG1L( ILOZ, HBL, NPROW, MYROW, IAFIRST, LILOZ, LIHIZ )
LIHIZ = NUMROC( IHIZ, HBL, MYROW, IAFIRST, NPROW )
*
* Set machine-dependent constants for the stopping criterion.
* If NORM(H) <= SQRT(OVFL), overflow should not occur.
*
UNFL = PSLAMCH( CONTXT, 'SAFE MINIMUM' )
OVFL = ONE / UNFL
CALL PSLABAD( CONTXT, UNFL, OVFL )
ULP = PSLAMCH( CONTXT, 'PRECISION' )
SMLNUM = UNFL*( NH / ULP )
*
* I1 and I2 are the indices of the first row and last column of H
* to which transformations must be applied. If eigenvalues only are
* being computed, I1 and I2 are set inside the main loop.
*
IF( WANTT ) THEN
I1 = 1
I2 = N
END IF
*
* ITN is the total number of QR iterations allowed.
*
ITN = ITERMAX
*
* The main loop begins here. I is the loop index and decreases from
* IHI to ILO in steps of our schur block size (<=2*IBLK). Each
* iteration of the loop works with the active submatrix in rows
* and columns L to I. Eigenvalues I+1 to IHI have already
* converged. Either L = ILO or the global A(L,L-1) is negligible
* so that the matrix splits.
*
I = IHI
10 CONTINUE
L = ILO
IF( I.LT.ILO )
$ GO TO 670
*
* Perform QR iterations on rows and columns ILO to I until a
* submatrix of order 1 or 2 splits off at the bottom because a
* subdiagonal element has become negligible.
*
DO 640 ITS = 0, ITN
*
* Look for a single small subdiagonal element.
*
CALL PSLASMSUB( A, DESCA, I, L, K, SMLNUM, WORK( IRBUF+1 ),
$ LWORK-IRBUF )
L = K
*
IF( L.GT.ILO ) THEN
*
* H(L,L-1) is negligible
*
CALL INFOG2L( L, L-1, DESCA, NPROW, NPCOL, MYROW, MYCOL,
$ IROW, ICOL, ITMP1, ITMP2 )
IF( ( MYROW.EQ.ITMP1 ) .AND. ( MYCOL.EQ.ITMP2 ) ) THEN
A( ( ICOL-1 )*LDA+IROW ) = ZERO
END IF
WORK( ISUB+L-1 ) = ZERO
END IF
*
* Exit from loop if a submatrix of order 1 or 2 has split off.
*
IF ( WANTT ) THEN
* For Schur form, use 2x2 blocks
IF ( L .GE. I-1 ) THEN
GO TO 650
END IF
ELSE
* If we don't want the Schur form, use bigger blocks.
IF ( L .GE. I-( 2*IBLK-1) ) THEN
GO TO 650
END IF
END IF
*
* Now the active submatrix is in rows and columns L to I. If
* eigenvalues only are being computed, only the active submatrix
* need be transformed.
*
IF( .NOT.WANTT ) THEN
I1 = L
I2 = I
END IF
*
* Copy submatrix of size 2*JBLK and prepare to do generalized
* Wilkinson shift or an exceptional shift
*
JBLK = MIN( IBLK, ( ( I-L+1 ) / 2 )-1 )
IF( JBLK.GT.LCMRC ) THEN
*
* Make sure it's divisible by LCM (we want even workloads!)
*
JBLK = JBLK - MOD( JBLK, LCMRC )
END IF
JBLK = MIN( JBLK, 2*LCMRC )
JBLK = MAX( JBLK, 1 )
*
CALL PSLACP3( 2*JBLK, I-2*JBLK+1, A, DESCA, S1, 2*IBLK, -1, -1,
$ 0 )
IF( (ITS.EQ.20 .OR. ITS.EQ.40) .AND. (JBLK.GT.1) ) THEN
*
* Exceptional shift.
*
DO 20 II = 2*JBLK, 1, -1
S1( II, II ) = CONST*( ABS( S1( II, II ) )+
$ ABS( S1( II, II-1 ) ) )
S1( II, II-1 ) = ZERO
S1( II-1, II ) = ZERO
20 CONTINUE
ELSE
CALL SLAHQR( .FALSE., .FALSE., 2*JBLK, 1, 2*JBLK, S1,
$ 2*IBLK, WORK( IRBUF+1 ), WORK( ICBUF+1 ), 1,
$ 2*JBLK, Z, LDZ, IERR )
*
* Prepare to use Wilkinson's double shift
*
H44 = S1( 2*JBLK, 2*JBLK )
H33 = S1( 2*JBLK-1, 2*JBLK-1 )
H43H34 = S1( 2*JBLK-1, 2*JBLK )*S1( 2*JBLK, 2*JBLK-1 )
IF( ( JBLK.GT.1 ) .AND. ( ITS.GT.30 ) ) THEN
S = S1( 2*JBLK-1, 2*JBLK-2 )
DISC = ( H33-H44 )*HALF
DISC = DISC*DISC + H43H34
IF( DISC.GT.ZERO ) THEN
*
* Real roots: Use Wilkinson's shift twice
*
DISC = SQRT( DISC )
AVE = HALF*( H33+H44 )
IF( ABS( H33 )-ABS( H44 ).GT.ZERO ) THEN
H33 = H33*H44 - H43H34
H44 = H33 / ( SIGN( DISC, AVE )+AVE )
ELSE
H44 = SIGN( DISC, AVE ) + AVE
END IF
H33 = H44
H43H34 = ZERO
END IF
END IF
END IF
*
* Look for two consecutive small subdiagonal elements:
* PSLACONSB is the routine that does this.
*
CALL PSLACONSB( A, DESCA, I, L, M, H44, H33, H43H34,
$ WORK( IRBUF+1 ), LWORK-IRBUF )
*
* Double-shift QR step
*
* NBULGE is the number of bulges that will be attempted
*
ISTOP = MIN( M+ROTN-1-MOD( M-( M / HBL )*HBL-1, ROTN ), I-2 )
ISTOP = MIN( ISTOP, M+HBL-3-MOD( M-1, HBL ) )
ISTOP = MIN( ISTOP, I2-2 )
ISTOP = MAX( ISTOP, M )
NBULGE = ( I-1-ISTOP ) / HBL
*
* Do not exceed maximum determined.
*
NBULGE = MIN( NBULGE, JBLK )
IF( NBULGE.GT.LCMRC ) THEN
*
* Make sure it's divisible by LCM (we want even workloads!)
*
NBULGE = NBULGE - MOD( NBULGE, LCMRC )
END IF
NBULGE = MAX( NBULGE, 1 )
*
IF( ( ITS.NE.20 ) .AND. ( ITS.NE.40 ) .AND. ( NBULGE.GT.1 ) )
$ THEN
*
* sort the eigenpairs so that they are in twos for double
* shifts. only call if several need sorting
*
CALL SLASORTE( S1( 2*( JBLK-NBULGE )+1,
$ 2*( JBLK-NBULGE )+1 ), 2*IBLK, 2*NBULGE,
$ WORK( IRBUF+1 ), IERR )
END IF
*
* If we are starting in the middle because of consecutive small
* subdiagonal elements, we need to see how many bulges we
* can send through without breaking the consecutive small
* subdiagonal property.
*
IF( ( NBULGE.GT.1 ) .AND. ( M.GT.L ) ) THEN
*
* Copy a chunk of elements from global A(M-1:,M-1:)
*
CALL INFOG2L( M+2, M+2, DESCA, NPROW, NPCOL, MYROW, MYCOL,
$ IROW1, ICOL1, ITMP1, ITMP2 )
II = MIN( 4*NBULGE+2, N-M+2 )
CALL PSLACP3( II, M-1, A, DESCA, WORK( IRBUF+1 ), II, ITMP1,
$ ITMP2, 0 )
IF( ( MYROW.EQ.ITMP1 ) .AND. ( MYCOL.EQ.ITMP2 ) ) THEN
*
* Find a new NBULGE based on the bulges we have.
*
CALL SLAMSH( S1, 2*IBLK, NBULGE, JBLK, WORK( IRBUF+1 ),
$ II, II, ULP )
IF( NUM.GT.1 ) THEN
CALL IGEBS2D( CONTXT, 'ALL', ' ', 1, 1, NBULGE, 1 )
END IF
ELSE
*
* Everyone needs to receive the new NBULGE
*
CALL IGEBR2D( CONTXT, 'ALL', ' ', 1, 1, NBULGE, 1, ITMP1,
$ ITMP2 )
END IF
END IF
*
* IBULGE is the number of bulges going so far
*
IBULGE = 1
*
* "A" row defs : main row transforms from LOCALK to LOCALI2
*
CALL INFOG1L( M, HBL, NPCOL, MYCOL, JAFIRST, ITMP1, LOCALK )
LOCALK = NQ
CALL INFOG1L( 1, HBL, NPCOL, MYCOL, JAFIRST, ICOL1, LOCALI2 )
LOCALI2 = NUMROC( I2, HBL, MYCOL, JAFIRST, NPCOL )
*
* "A" col defs : main col transforms from LOCALI1 to LOCALM
*
CALL INFOG1L( I1, HBL, NPROW, MYROW, IAFIRST, LOCALI1,
$ ICOL1 )
CALL INFOG1L( 1, HBL, NPROW, MYROW, IAFIRST, LOCALM, ICOL1 )
ICOL1 = NUMROC( MIN( M+3, I ), HBL, MYROW, IAFIRST, NPROW )
*
* Which row & column will start the bulges
*
ISTARTROW = MOD( ( M+1 ) / HBL, NPROW ) + IAFIRST
ISTARTCOL = MOD( ( M+1 ) / HBL, NPCOL ) + JAFIRST
*
CALL INFOG1L( M, HBL, NPROW, MYROW, IAFIRST, II, ITMP2 )
CALL INFOG1L( M, HBL, NPCOL, MYCOL, JAFIRST, JJ, ITMP2 )
CALL INFOG1L( 1, HBL, NPROW, MYROW, IAFIRST, ISTOP,
$ KP2ROW( 1 ) )
KP2ROW( 1 ) = NUMROC( M+2, HBL, MYROW, IAFIRST, NPROW )
CALL INFOG1L( 1, HBL, NPCOL, MYCOL, JAFIRST, ISTOP,
$ KP2COL( 1 ) )
KP2COL( 1 ) = NUMROC( M+2, HBL, MYCOL, JAFIRST, NPCOL )
*
* Set all values for bulges. All bulges are stored in
* intermediate steps as loops over KI. Their current "task"
* over the global M to I-1 values is always K1(KI) to K2(KI).
* However, because there are many bulges, K1(KI) & K2(KI) might
* go past that range while later bulges (KI+1,KI+2,etc..) are
* finishing up. Even if ROTN=1, in order to minimize border
* communication sometimes K1(KI)=HBL-2 & K2(KI)=HBL-1 so both
* border messages can be handled at once.
*
* Rules:
* If MOD(K1(KI)-1,HBL) < HBL-2 then MOD(K2(KI)-1,HBL)