SUBROUTINE PDGEBAL( JOB, N, A, DESCA, ILO, IHI, SCALE, INFO )
*
* Contribution from the Department of Computing Science and HPC2N,
* Umea University, Sweden
*
* -- ScaLAPACK computational routine (version 2.0.1) --
* University of Tennessee, Knoxville, Oak Ridge National Laboratory,
* Univ. of Colorado Denver and University of California, Berkeley.
* January, 2012
*
IMPLICIT NONE
*
* .. Scalar Arguments ..
CHARACTER JOB
INTEGER IHI, ILO, INFO, N
* ..
* .. Array Arguments ..
INTEGER DESCA( * )
DOUBLE PRECISION A( * ), SCALE( * )
* ..
*
* Purpose
* =======
*
* PDGEBAL balances a general real matrix A. This involves, first,
* permuting A by a similarity transformation to isolate eigenvalues
* in the first 1 to ILO-1 and last IHI+1 to N elements on the
* diagonal; and second, applying a diagonal similarity transformation
* to rows and columns ILO to IHI to make the rows and columns as
* close in norm as possible. Both steps are optional.
*
* Balancing may reduce the 1-norm of the matrix, and improve the
* accuracy of the computed eigenvalues and/or eigenvectors.
*
* Notes
* =====
*
* Each global data object is described by an associated description
* vector. This vector stores the information required to establish
* the mapping between an object element and its corresponding process
* and memory location.
*
* Let A be a generic term for any 2D block cyclicly distributed array.
* Such a global array has an associated description vector DESCA.
* In the following comments, the character _ should be read as
* "of the global array".
*
* NOTATION STORED IN EXPLANATION
* --------------- -------------- --------------------------------------
* DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case,
* DTYPE_A = 1.
* CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
* the BLACS process grid A is distribu-
* ted over. The context itself is glo-
* bal, but the handle (the integer
* value) may vary.
* M_A (global) DESCA( M_ ) The number of rows in the global
* array A.
* N_A (global) DESCA( N_ ) The number of columns in the global
* array A.
* MB_A (global) DESCA( MB_ ) The blocking factor used to distribute
* the rows of the array.
* NB_A (global) DESCA( NB_ ) The blocking factor used to distribute
* the columns of the array.
* RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
* row of the array A is distributed.
* CSRC_A (global) DESCA( CSRC_ ) The process column over which the
* first column of the array A is
* distributed.
* LLD_A (local) DESCA( LLD_ ) The leading dimension of the local
* array. LLD_A >= MAX(1,LOCr(M_A)).
*
* Let K be the number of rows or columns of a distributed matrix,
* and assume that its process grid has dimension p x q.
* LOCr( K ) denotes the number of elements of K that a process
* would receive if K were distributed over the p processes of its
* process column.
* Similarly, LOCc( K ) denotes the number of elements of K that a
* process would receive if K were distributed over the q processes of
* its process row.
* The values of LOCr() and LOCc() may be determined via a call to the
* ScaLAPACK tool function, NUMROC:
* LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
* LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
* An upper bound for these quantities may be computed by:
* LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
* LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
*
*
* Arguments
* =========
*
* JOB (global input) CHARACTER*1
* Specifies the operations to be performed on A:
* = 'N': none: simply set ILO = 1, IHI = N, SCALE(I) = 1.0
* for i = 1,...,N;
* = 'P': permute only;
* = 'S': scale only;
* = 'B': both permute and scale.
*
* N (global input) INTEGER
* The order of the matrix A. N >= 0.
*
* A (local input/output) DOUBLE PRECISION array, dimension
* (DESCA(LLD_,LOCc(N))
* On entry, the input matrix A.
* On exit, A is overwritten by the balanced matrix.
* If JOB = 'N', A is not referenced.
* See Further Details.
*
* DESCA (global and local input) INTEGER array of dimension DLEN_.
* The array descriptor for the distributed matrix A.
*
* ILO (global output) INTEGER
* IHI (global output) INTEGER
* ILO and IHI are set to integers such that on exit
* A(i,j) = 0 if i > j and j = 1,...,ILO-1 or I = IHI+1,...,N.
* If JOB = 'N' or 'S', ILO = 1 and IHI = N.
*
* SCALE (global output) DOUBLE PRECISION array, dimension (N)
* Details of the permutations and scaling factors applied to
* A. If P(j) is the index of the row and column interchanged
* with row and column j and D(j) is the scaling factor
* applied to row and column j, then
* SCALE(j) = P(j) for j = 1,...,ILO-1
* = D(j) for j = ILO,...,IHI
* = P(j) for j = IHI+1,...,N.
* The order in which the interchanges are made is N to IHI+1,
* then 1 to ILO-1.
*
* INFO (global output) INTEGER
* = 0: successful exit.
* < 0: if INFO = -i, the i-th argument had an illegal value.
*
* Further Details
* ===============
*
* The permutations consist of row and column interchanges which put
* the matrix in the form
*
* ( T1 X Y )
* P A P = ( 0 B Z )
* ( 0 0 T2 )
*
* where T1 and T2 are upper triangular matrices whose eigenvalues lie
* along the diagonal. The column indices ILO and IHI mark the starting
* and ending columns of the submatrix B. Balancing consists of applying
* a diagonal similarity transformation inv(D) * B * D to make the
* 1-norms of each row of B and its corresponding column nearly equal.
* The output matrix is
*
* ( T1 X*D Y )
* ( 0 inv(D)*B*D inv(D)*Z ).
* ( 0 0 T2 )
*
* Information about the permutations P and the diagonal matrix D is
* returned in the vector SCALE.
*
* This subroutine is based on the EISPACK routine BALANC. In principle,
* the parallelism is extracted by using PBLAS and BLACS routines for
* the permutation and balancing.
*
* Modified by Tzu-Yi Chen, Computer Science Division, University of
* California at Berkeley, USA
*
* Parallel version by Robert Granat and Meiyue Shao, Department of
* Computing Science and HPC2N, Umea University, Sweden
*
* =====================================================================
*
* .. Parameters ..
INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
$ LLD_, MB_, M_, NB_, N_, RSRC_
PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1,
$ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
$ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
DOUBLE PRECISION ZERO, ONE
PARAMETER ( ZERO = 0.0D+0, ONE = 1.0D+0 )
DOUBLE PRECISION SCLFAC
PARAMETER ( SCLFAC = 2.0D+0 )
DOUBLE PRECISION FACTOR
PARAMETER ( FACTOR = 0.95D+0 )
* ..
* .. Local Scalars ..
LOGICAL NOCONV
INTEGER I, ICA, IEXC, IRA, J, K, L, M, LLDA,
$ ICTXT, NPROW, NPCOL, MYROW, MYCOL, II, JJ,
$ ARSRC, ACSRC
DOUBLE PRECISION C, CA, F, G, R, RA, S, SFMAX1, SFMAX2, SFMIN1,
$ SFMIN2, ELEM
* ..
* .. Local Arrays ..
DOUBLE PRECISION CR( 2 )
* ..
* .. External Functions ..
LOGICAL DISNAN, LSAME
INTEGER IDAMAX
DOUBLE PRECISION DLAMCH
EXTERNAL DISNAN, LSAME, DLAMCH
* ..
* .. External Subroutines ..
EXTERNAL PDSCAL, PDSWAP, PDAMAX, PXERBLA,
$ BLACS_GRIDINFO, CHK1MAT, DGSUM2D,
$ INFOG2L, PDELGET
* ..
* .. Intrinsic Functions ..
INTRINSIC ABS, MAX, MIN
* ..
* .. Executable Statements ..
INFO = 0
ICTXT = DESCA( CTXT_ )
CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
* Test the input parameters.
*
IF( .NOT.LSAME( JOB, 'N' ) .AND. .NOT.LSAME( JOB, 'P' ) .AND.
$ .NOT.LSAME( JOB, 'S' ) .AND. .NOT.LSAME( JOB, 'B' ) ) THEN
INFO = -1
ELSE IF( N.LT.0 ) THEN
INFO = -2
ELSE
CALL CHK1MAT( N, 2, N, 2, 1, 1, DESCA, 4, INFO )
END IF
IF( INFO.NE.0 ) THEN
CALL PXERBLA( ICTXT, 'PDGEBAL', -INFO )
RETURN
END IF
*
* Extract local leading dimension of A.
*
LLDA = DESCA( LLD_ )
*
K = 1
L = N
*
IF( N.EQ.0 )
$ GO TO 210
*
IF( LSAME( JOB, 'N' ) ) THEN
DO 10 I = 1, N
SCALE( I ) = ONE
10 CONTINUE
GO TO 210
END IF
*
IF( LSAME( JOB, 'S' ) )
$ GO TO 120
*
* Permutation to isolate eigenvalues if possible.
*
GO TO 50
*
* Row and column exchange.
*
20 CONTINUE
SCALE( M ) = J
IF( J.EQ.M )
$ GO TO 30
*
CALL PDSWAP( L, A, 1, J, DESCA, 1, A, 1, M, DESCA, 1 )
CALL PDSWAP( N-K+1, A, J, K, DESCA, DESCA(M_), A, M, K, DESCA,
$ DESCA(M_) )
*
30 CONTINUE
GO TO ( 40, 80 )IEXC
*
* Search for rows isolating an eigenvalue and push them down.
*
40 CONTINUE
IF( L.EQ.1 )
$ GO TO 210
L = L - 1
*
50 CONTINUE
DO 70 J = L, 1, -1
*
DO 60 I = 1, L
IF( I.EQ.J )
$ GO TO 60
*
* All processors need the information to make correct decisions.
*
CALL PDELGET( 'All', '1-Tree', ELEM, A, J, I, DESCA )
IF( ELEM.NE.ZERO )
$ GO TO 70
60 CONTINUE
*
M = L
IEXC = 1
GO TO 20
70 CONTINUE
*
GO TO 90
*
* Search for columns isolating an eigenvalue and push them left.
*
80 CONTINUE
K = K + 1
*
90 CONTINUE
DO 110 J = K, L
*
DO 100 I = K, L
IF( I.EQ.J )
$ GO TO 100
*
* All processors need the information to make correct decisions.
*
CALL PDELGET( 'All', '1-Tree', ELEM, A, I, J, DESCA )
IF( ELEM.NE.ZERO )
$ GO TO 110
100 CONTINUE
*
M = K
IEXC = 2
GO TO 20
110 CONTINUE
*
120 CONTINUE
DO 130 I = K, L
SCALE( I ) = ONE
130 CONTINUE
*
IF( LSAME( JOB, 'P' ) )
$ GO TO 210
*
* Balance the submatrix in rows K to L.
*
* Iterative loop for norm reduction.
*
SFMIN1 = DLAMCH( 'S' ) / DLAMCH( 'P' )
SFMAX1 = ONE / SFMIN1
SFMIN2 = SFMIN1*SCLFAC
SFMAX2 = ONE / SFMIN2
140 CONTINUE
NOCONV = .FALSE.
*
DO 200 I = K, L
C = ZERO
R = ZERO
*
* Compute local partial values of R and C in parallel and combine
* with a call to the BLACS global summation routine distributing
* information to all processors.
*
DO 150 J = K, L
IF( J.EQ.I )
$ GO TO 150
CALL INFOG2L( J, I, DESCA, NPROW, NPCOL, MYROW,
$ MYCOL, II, JJ, ARSRC, ACSRC )
IF( MYROW.EQ.ARSRC .AND. MYCOL.EQ.ACSRC ) THEN
C = C + ABS( A( II + (JJ-1)*LLDA ) )
END IF
CALL INFOG2L( I, J, DESCA, NPROW, NPCOL, MYROW,
$ MYCOL, II, JJ, ARSRC, ACSRC )
IF( MYROW.EQ.ARSRC .AND. MYCOL.EQ.ACSRC ) THEN
R = R + ABS( A( II + (JJ-1)*LLDA ) )
END IF
150 CONTINUE
CR( 1 ) = C
CR( 2 ) = R
CALL DGSUM2D( ICTXT, 'All', '1-Tree', 2, 1, CR, 2, -1, -1 )
C = CR( 1 )
R = CR( 2 )
*
* Find global maximum absolute values and indices in parallel.
*
CALL PDAMAX( L, CA, ICA, A, 1, I, DESCA, 1 )
CALL PDAMAX( N-K+1, RA, IRA, A, I, K, DESCA, DESCA(M_) )
*
* Guard against zero C or R due to underflow.
*
IF( C.EQ.ZERO .OR. R.EQ.ZERO )
$ GO TO 200
G = R / SCLFAC
F = ONE
S = C + R
160 CONTINUE
IF( C.GE.G .OR. MAX( F, C, CA ).GE.SFMAX2 .OR.
$ MIN( R, G, RA ).LE.SFMIN2 )GO TO 170
IF( DISNAN( C+F+CA+R+G+RA ) ) THEN
*
* Exit if NaN to avoid infinite loop
*
INFO = -3
CALL PXERBLA( ICTXT, 'PDGEBAL', -INFO )
RETURN
END IF
F = F*SCLFAC
C = C*SCLFAC
CA = CA*SCLFAC
R = R / SCLFAC
G = G / SCLFAC
RA = RA / SCLFAC
GO TO 160
*
170 CONTINUE
G = C / SCLFAC
180 CONTINUE
IF( G.LT.R .OR. MAX( R, RA ).GE.SFMAX2 .OR.
$ MIN( F, C, G, CA ).LE.SFMIN2 )GO TO 190
F = F / SCLFAC
C = C / SCLFAC
G = G / SCLFAC
CA = CA / SCLFAC
R = R*SCLFAC
RA = RA*SCLFAC
GO TO 180
*
* Now balance.
*
190 CONTINUE
IF( ( C+R ).GE.FACTOR*S )
$ GO TO 200
IF( F.LT.ONE .AND. SCALE( I ).LT.ONE ) THEN
IF( F*SCALE( I ).LE.SFMIN1 )
$ GO TO 200
END IF
IF( F.GT.ONE .AND. SCALE( I ).GT.ONE ) THEN
IF( SCALE( I ).GE.SFMAX1 / F )
$ GO TO 200
END IF
G = ONE / F
SCALE( I ) = SCALE( I )*F
NOCONV = .TRUE.
*
CALL PDSCAL( N-K+1, G, A, I, K, DESCA, DESCA(M_) )
CALL PDSCAL( L, F, A, 1, I, DESCA, 1 )
*
200 CONTINUE
*
IF( NOCONV )
$ GO TO 140
*
210 CONTINUE
ILO = K
IHI = L
*
RETURN
*
* End of PDGEBAL
*
END