ScaLAPACK 2.1  2.1 ScaLAPACK: Scalable Linear Algebra PACKage
psgebal.f
Go to the documentation of this file.
1  SUBROUTINE psgebal( JOB, N, A, DESCA, ILO, IHI, SCALE, INFO )
2 *
3 * Contribution from the Department of Computing Science and HPC2N,
4 * Umea University, Sweden
5 *
6 * -- ScaLAPACK computational routine (version 2.0.1) --
7 * University of Tennessee, Knoxville, Oak Ridge National Laboratory,
8 * Univ. of Colorado Denver and University of California, Berkeley.
9 * January, 2012
10 *
11  IMPLICIT NONE
12 *
13 * .. Scalar Arguments ..
14  CHARACTER JOB
15  INTEGER IHI, ILO, INFO, N
16 * ..
17 * .. Array Arguments ..
18  INTEGER DESCA( * )
19  REAL A( * ), SCALE( * )
20 * ..
21 *
22 * Purpose
23 * =======
24 *
25 * PSGEBAL balances a general real matrix A. This involves, first,
26 * permuting A by a similarity transformation to isolate eigenvalues
27 * in the first 1 to ILO-1 and last IHI+1 to N elements on the
28 * diagonal; and second, applying a diagonal similarity transformation
29 * to rows and columns ILO to IHI to make the rows and columns as
30 * close in norm as possible. Both steps are optional.
31 *
32 * Balancing may reduce the 1-norm of the matrix, and improve the
33 * accuracy of the computed eigenvalues and/or eigenvectors.
34 *
35 * Notes
36 * =====
37 *
38 * Each global data object is described by an associated description
39 * vector. This vector stores the information required to establish
40 * the mapping between an object element and its corresponding process
41 * and memory location.
42 *
43 * Let A be a generic term for any 2D block cyclicly distributed array.
44 * Such a global array has an associated description vector DESCA.
45 * In the following comments, the character _ should be read as
46 * "of the global array".
47 *
48 * NOTATION STORED IN EXPLANATION
49 * --------------- -------------- --------------------------------------
50 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case,
51 * DTYPE_A = 1.
52 * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
53 * the BLACS process grid A is distribu-
54 * ted over. The context itself is glo-
55 * bal, but the handle (the integer
56 * value) may vary.
57 * M_A (global) DESCA( M_ ) The number of rows in the global
58 * array A.
59 * N_A (global) DESCA( N_ ) The number of columns in the global
60 * array A.
61 * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute
62 * the rows of the array.
63 * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute
64 * the columns of the array.
65 * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
66 * row of the array A is distributed.
67 * CSRC_A (global) DESCA( CSRC_ ) The process column over which the
68 * first column of the array A is
69 * distributed.
70 * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local
71 * array. LLD_A >= MAX(1,LOCr(M_A)).
72 *
73 * Let K be the number of rows or columns of a distributed matrix,
74 * and assume that its process grid has dimension p x q.
75 * LOCr( K ) denotes the number of elements of K that a process
76 * would receive if K were distributed over the p processes of its
77 * process column.
78 * Similarly, LOCc( K ) denotes the number of elements of K that a
79 * process would receive if K were distributed over the q processes of
80 * its process row.
81 * The values of LOCr() and LOCc() may be determined via a call to the
82 * ScaLAPACK tool function, NUMROC:
83 * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
84 * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
85 * An upper bound for these quantities may be computed by:
86 * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
87 * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
88 *
89 *
90 * Arguments
91 * =========
92 *
93 * JOB (global input) CHARACTER*1
94 * Specifies the operations to be performed on A:
95 * = 'N': none: simply set ILO = 1, IHI = N, SCALE(I) = 1.0
96 * for i = 1,...,N;
97 * = 'P': permute only;
98 * = 'S': scale only;
99 * = 'B': both permute and scale.
100 *
101 * N (global input) INTEGER
102 * The order of the matrix A. N >= 0.
103 *
104 * A (local input/output) REAL array, dimension
105 * (DESCA(LLD_,LOCc(N))
106 * On entry, the input matrix A.
107 * On exit, A is overwritten by the balanced matrix.
108 * If JOB = 'N', A is not referenced.
109 * See Further Details.
110 *
111 * DESCA (global and local input) INTEGER array of dimension DLEN_.
112 * The array descriptor for the distributed matrix A.
113 *
114 * ILO (global output) INTEGER
115 * IHI (global output) INTEGER
116 * ILO and IHI are set to integers such that on exit
117 * A(i,j) = 0 if i > j and j = 1,...,ILO-1 or I = IHI+1,...,N.
118 * If JOB = 'N' or 'S', ILO = 1 and IHI = N.
119 *
120 * SCALE (global output) REAL array, dimension (N)
121 * Details of the permutations and scaling factors applied to
122 * A. If P(j) is the index of the row and column interchanged
123 * with row and column j and D(j) is the scaling factor
124 * applied to row and column j, then
125 * SCALE(j) = P(j) for j = 1,...,ILO-1
126 * = D(j) for j = ILO,...,IHI
127 * = P(j) for j = IHI+1,...,N.
128 * The order in which the interchanges are made is N to IHI+1,
129 * then 1 to ILO-1.
130 *
131 * INFO (global output) INTEGER
132 * = 0: successful exit.
133 * < 0: if INFO = -i, the i-th argument had an illegal value.
134 *
135 * Further Details
136 * ===============
137 *
138 * The permutations consist of row and column interchanges which put
139 * the matrix in the form
140 *
141 * ( T1 X Y )
142 * P A P = ( 0 B Z )
143 * ( 0 0 T2 )
144 *
145 * where T1 and T2 are upper triangular matrices whose eigenvalues lie
146 * along the diagonal. The column indices ILO and IHI mark the starting
147 * and ending columns of the submatrix B. Balancing consists of applying
148 * a diagonal similarity transformation inv(D) * B * D to make the
149 * 1-norms of each row of B and its corresponding column nearly equal.
150 * The output matrix is
151 *
152 * ( T1 X*D Y )
153 * ( 0 inv(D)*B*D inv(D)*Z ).
154 * ( 0 0 T2 )
155 *
156 * Information about the permutations P and the diagonal matrix D is
157 * returned in the vector SCALE.
158 *
159 * This subroutine is based on the EISPACK routine BALANC. In principle,
160 * the parallelism is extracted by using PBLAS and BLACS routines for
161 * the permutation and balancing.
162 *
163 * Modified by Tzu-Yi Chen, Computer Science Division, University of
164 * California at Berkeley, USA
165 *
166 * Parallel version by Robert Granat and Meiyue Shao, Department of
167 * Computing Science and HPC2N, Umea University, Sweden
168 *
169 * =====================================================================
170 *
171 * .. Parameters ..
172  INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
173  \$ LLD_, MB_, M_, NB_, N_, RSRC_
174  parameter( block_cyclic_2d = 1, dlen_ = 9, dtype_ = 1,
175  \$ ctxt_ = 2, m_ = 3, n_ = 4, mb_ = 5, nb_ = 6,
176  \$ rsrc_ = 7, csrc_ = 8, lld_ = 9 )
177  REAL ZERO, ONE
178  parameter( zero = 0.0e+0, one = 1.0e+0 )
179  REAL SCLFAC
180  parameter( sclfac = 2.0e+0 )
181  REAL FACTOR
182  parameter( factor = 0.95e+0 )
183 * ..
184 * .. Local Scalars ..
185  LOGICAL NOCONV
186  INTEGER I, ICA, IEXC, IRA, J, K, L, M, LLDA,
187  \$ ICTXT, NPROW, NPCOL, MYROW, MYCOL, II, JJ,
188  \$ ARSRC, ACSRC
189  REAL C, CA, F, G, R, RA, S, SFMAX1, SFMAX2, SFMIN1,
190  \$ SFMIN2, ELEM
191 * ..
192 * .. Local Arrays ..
193  REAL CR( 2 )
194 * ..
195 * .. External Functions ..
196  LOGICAL SISNAN, LSAME
197  INTEGER IDAMAX
198  REAL SLAMCH
199  EXTERNAL sisnan, lsame, slamch
200 * ..
201 * .. External Subroutines ..
202  EXTERNAL psscal, psswap, psamax, pxerbla,
203  \$ blacs_gridinfo, chk1mat, sgsum2d,
204  \$ infog2l, pselget
205 * ..
206 * .. Intrinsic Functions ..
207  INTRINSIC abs, max, min
208 * ..
209 * .. Executable Statements ..
210  info = 0
211  ictxt = desca( ctxt_ )
212  CALL blacs_gridinfo( ictxt, nprow, npcol, myrow, mycol )
213 *
214 * Test the input parameters.
215 *
216  IF( .NOT.lsame( job, 'N' ) .AND. .NOT.lsame( job, 'P' ) .AND.
217  \$ .NOT.lsame( job, 'S' ) .AND. .NOT.lsame( job, 'B' ) ) THEN
218  info = -1
219  ELSE IF( n.LT.0 ) THEN
220  info = -2
221  ELSE
222  CALL chk1mat( n, 2, n, 2, 1, 1, desca, 4, info )
223  END IF
224  IF( info.NE.0 ) THEN
225  CALL pxerbla( ictxt, 'PSGEBAL', -info )
226  RETURN
227  END IF
228 *
229 * Extract local leading dimension of A.
230 *
231  llda = desca( lld_ )
232 *
233  k = 1
234  l = n
235 *
236  IF( n.EQ.0 )
237  \$ GO TO 210
238 *
239  IF( lsame( job, 'N' ) ) THEN
240  DO 10 i = 1, n
241  scale( i ) = one
242  10 CONTINUE
243  GO TO 210
244  END IF
245 *
246  IF( lsame( job, 'S' ) )
247  \$ GO TO 120
248 *
249 * Permutation to isolate eigenvalues if possible.
250 *
251  GO TO 50
252 *
253 * Row and column exchange.
254 *
255  20 CONTINUE
256  scale( m ) = j
257  IF( j.EQ.m )
258  \$ GO TO 30
259 *
260  CALL psswap( l, a, 1, j, desca, 1, a, 1, m, desca, 1 )
261  CALL psswap( n-k+1, a, j, k, desca, desca(m_), a, m, k, desca,
262  \$ desca(m_) )
263 *
264  30 CONTINUE
265  GO TO ( 40, 80 )iexc
266 *
267 * Search for rows isolating an eigenvalue and push them down.
268 *
269  40 CONTINUE
270  IF( l.EQ.1 )
271  \$ GO TO 210
272  l = l - 1
273 *
274  50 CONTINUE
275  DO 70 j = l, 1, -1
276 *
277  DO 60 i = 1, l
278  IF( i.EQ.j )
279  \$ GO TO 60
280 *
281 * All processors need the information to make correct decisions.
282 *
283  CALL pselget( 'All', '1-Tree', elem, a, j, i, desca )
284  IF( elem.NE.zero )
285  \$ GO TO 70
286  60 CONTINUE
287 *
288  m = l
289  iexc = 1
290  GO TO 20
291  70 CONTINUE
292 *
293  GO TO 90
294 *
295 * Search for columns isolating an eigenvalue and push them left.
296 *
297  80 CONTINUE
298  k = k + 1
299 *
300  90 CONTINUE
301  DO 110 j = k, l
302 *
303  DO 100 i = k, l
304  IF( i.EQ.j )
305  \$ GO TO 100
306 *
307 * All processors need the information to make correct decisions.
308 *
309  CALL pselget( 'All', '1-Tree', elem, a, i, j, desca )
310  IF( elem.NE.zero )
311  \$ GO TO 110
312  100 CONTINUE
313 *
314  m = k
315  iexc = 2
316  GO TO 20
317  110 CONTINUE
318 *
319  120 CONTINUE
320  DO 130 i = k, l
321  scale( i ) = one
322  130 CONTINUE
323 *
324  IF( lsame( job, 'P' ) )
325  \$ GO TO 210
326 *
327 * Balance the submatrix in rows K to L.
328 *
329 * Iterative loop for norm reduction.
330 *
331  sfmin1 = slamch( 'S' ) / slamch( 'P' )
332  sfmax1 = one / sfmin1
333  sfmin2 = sfmin1*sclfac
334  sfmax2 = one / sfmin2
335  140 CONTINUE
336  noconv = .false.
337 *
338  DO 200 i = k, l
339  c = zero
340  r = zero
341 *
342 * Compute local partial values of R and C in parallel and combine
343 * with a call to the BLACS global summation routine distributing
344 * information to all processors.
345 *
346  DO 150 j = k, l
347  IF( j.EQ.i )
348  \$ GO TO 150
349  CALL infog2l( j, i, desca, nprow, npcol, myrow,
350  \$ mycol, ii, jj, arsrc, acsrc )
351  IF( myrow.EQ.arsrc .AND. mycol.EQ.acsrc ) THEN
352  c = c + abs( a( ii + (jj-1)*llda ) )
353  END IF
354  CALL infog2l( i, j, desca, nprow, npcol, myrow,
355  \$ mycol, ii, jj, arsrc, acsrc )
356  IF( myrow.EQ.arsrc .AND. mycol.EQ.acsrc ) THEN
357  r = r + abs( a( ii + (jj-1)*llda ) )
358  END IF
359  150 CONTINUE
360  cr( 1 ) = c
361  cr( 2 ) = r
362  CALL sgsum2d( ictxt, 'All', '1-Tree', 2, 1, cr, 2, -1, -1 )
363  c = cr( 1 )
364  r = cr( 2 )
365 *
366 * Find global maximum absolute values and indices in parallel.
367 *
368  CALL psamax( l, ca, ica, a, 1, i, desca, 1 )
369  CALL psamax( n-k+1, ra, ira, a, i, k, desca, desca(m_) )
370 *
371 * Guard against zero C or R due to underflow.
372 *
373  IF( c.EQ.zero .OR. r.EQ.zero )
374  \$ GO TO 200
375  g = r / sclfac
376  f = one
377  s = c + r
378  160 CONTINUE
379  IF( c.GE.g .OR. max( f, c, ca ).GE.sfmax2 .OR.
380  \$ min( r, g, ra ).LE.sfmin2 )GO TO 170
381  IF( sisnan( c+f+ca+r+g+ra ) ) THEN
382 *
383 * Exit if NaN to avoid infinite loop
384 *
385  info = -3
386  CALL pxerbla( ictxt, 'PDGEBAL', -info )
387  RETURN
388  END IF
389  f = f*sclfac
390  c = c*sclfac
391  ca = ca*sclfac
392  r = r / sclfac
393  g = g / sclfac
394  ra = ra / sclfac
395  GO TO 160
396 *
397  170 CONTINUE
398  g = c / sclfac
399  180 CONTINUE
400  IF( g.LT.r .OR. max( r, ra ).GE.sfmax2 .OR.
401  \$ min( f, c, g, ca ).LE.sfmin2 )GO TO 190
402  f = f / sclfac
403  c = c / sclfac
404  g = g / sclfac
405  ca = ca / sclfac
406  r = r*sclfac
407  ra = ra*sclfac
408  GO TO 180
409 *
410 * Now balance.
411 *
412  190 CONTINUE
413  IF( ( c+r ).GE.factor*s )
414  \$ GO TO 200
415  IF( f.LT.one .AND. scale( i ).LT.one ) THEN
416  IF( f*scale( i ).LE.sfmin1 )
417  \$ GO TO 200
418  END IF
419  IF( f.GT.one .AND. scale( i ).GT.one ) THEN
420  IF( scale( i ).GE.sfmax1 / f )
421  \$ GO TO 200
422  END IF
423  g = one / f
424  scale( i ) = scale( i )*f
425  noconv = .true.
426 *
427  CALL psscal( n-k+1, g, a, i, k, desca, desca(m_) )
428  CALL psscal( l, f, a, 1, i, desca, 1 )
429 *
430  200 CONTINUE
431 *
432  IF( noconv )
433  \$ GO TO 140
434 *
435  210 CONTINUE
436  ilo = k
437  ihi = l
438 *
439  RETURN
440 *
441 * End of PSGEBAL
442 *
443  END
max
#define max(A, B)
Definition: pcgemr.c:180
psgebal
subroutine psgebal(JOB, N, A, DESCA, ILO, IHI, SCALE, INFO)
Definition: psgebal.f:2
infog2l
subroutine infog2l(GRINDX, GCINDX, DESC, NPROW, NPCOL, MYROW, MYCOL, LRINDX, LCINDX, RSRC, CSRC)
Definition: infog2l.f:3
pselget
subroutine pselget(SCOPE, TOP, ALPHA, A, IA, JA, DESCA)
Definition: pselget.f:2
chk1mat
subroutine chk1mat(MA, MAPOS0, NA, NAPOS0, IA, JA, DESCA, DESCAPOS0, INFO)
Definition: chk1mat.f:3
pxerbla
subroutine pxerbla(ICTXT, SRNAME, INFO)
Definition: pxerbla.f:2
min
#define min(A, B)
Definition: pcgemr.c:181