LAPACK  3.6.1
LAPACK: Linear Algebra PACKage
sgesvj.f
Go to the documentation of this file.
1 *> \brief \b SGESVJ
2 *
3 * =========== DOCUMENTATION ===========
4 *
5 * Online html documentation available at
6 * http://www.netlib.org/lapack/explore-html/
7 *
8 *> \htmlonly
9 *> Download SGESVJ + dependencies
10 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/sgesvj.f">
11 *> [TGZ]</a>
12 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/sgesvj.f">
13 *> [ZIP]</a>
14 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/sgesvj.f">
15 *> [TXT]</a>
16 *> \endhtmlonly
17 *
18 * Definition:
19 * ===========
20 *
21 * SUBROUTINE SGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
22 * LDV, WORK, LWORK, INFO )
23 *
24 * .. Scalar Arguments ..
25 * INTEGER INFO, LDA, LDV, LWORK, M, MV, N
26 * CHARACTER*1 JOBA, JOBU, JOBV
27 * ..
28 * .. Array Arguments ..
29 * REAL A( LDA, * ), SVA( N ), V( LDV, * ),
30 * $ WORK( LWORK )
31 * ..
32 *
33 *
34 *> \par Purpose:
35 * =============
36 *>
37 *> \verbatim
38 *>
39 *> SGESVJ computes the singular value decomposition (SVD) of a real
40 *> M-by-N matrix A, where M >= N. The SVD of A is written as
41 *> [++] [xx] [x0] [xx]
42 *> A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx]
43 *> [++] [xx]
44 *> where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
45 *> matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
46 *> of SIGMA are the singular values of A. The columns of U and V are the
47 *> left and the right singular vectors of A, respectively.
48 *> SGESVJ can sometimes compute tiny singular values and their singular vectors much
49 *> more accurately than other SVD routines, see below under Further Details.
50 *> \endverbatim
51 *
52 * Arguments:
53 * ==========
54 *
55 *> \param[in] JOBA
56 *> \verbatim
57 *> JOBA is CHARACTER* 1
58 *> Specifies the structure of A.
59 *> = 'L': The input matrix A is lower triangular;
60 *> = 'U': The input matrix A is upper triangular;
61 *> = 'G': The input matrix A is general M-by-N matrix, M >= N.
62 *> \endverbatim
63 *>
64 *> \param[in] JOBU
65 *> \verbatim
66 *> JOBU is CHARACTER*1
67 *> Specifies whether to compute the left singular vectors
68 *> (columns of U):
69 *> = 'U': The left singular vectors corresponding to the nonzero
70 *> singular values are computed and returned in the leading
71 *> columns of A. See more details in the description of A.
72 *> The default numerical orthogonality threshold is set to
73 *> approximately TOL=CTOL*EPS, CTOL=SQRT(M), EPS=SLAMCH('E').
74 *> = 'C': Analogous to JOBU='U', except that user can control the
75 *> level of numerical orthogonality of the computed left
76 *> singular vectors. TOL can be set to TOL = CTOL*EPS, where
77 *> CTOL is given on input in the array WORK.
78 *> No CTOL smaller than ONE is allowed. CTOL greater
79 *> than 1 / EPS is meaningless. The option 'C'
80 *> can be used if M*EPS is satisfactory orthogonality
81 *> of the computed left singular vectors, so CTOL=M could
82 *> save few sweeps of Jacobi rotations.
83 *> See the descriptions of A and WORK(1).
84 *> = 'N': The matrix U is not computed. However, see the
85 *> description of A.
86 *> \endverbatim
87 *>
88 *> \param[in] JOBV
89 *> \verbatim
90 *> JOBV is CHARACTER*1
91 *> Specifies whether to compute the right singular vectors, that
92 *> is, the matrix V:
93 *> = 'V' : the matrix V is computed and returned in the array V
94 *> = 'A' : the Jacobi rotations are applied to the MV-by-N
95 *> array V. In other words, the right singular vector
96 *> matrix V is not computed explicitly; instead it is
97 *> applied to an MV-by-N matrix initially stored in the
98 *> first MV rows of V.
99 *> = 'N' : the matrix V is not computed and the array V is not
100 *> referenced
101 *> \endverbatim
102 *>
103 *> \param[in] M
104 *> \verbatim
105 *> M is INTEGER
106 *> The number of rows of the input matrix A. 1/SLAMCH('E') > M >= 0.
107 *> \endverbatim
108 *>
109 *> \param[in] N
110 *> \verbatim
111 *> N is INTEGER
112 *> The number of columns of the input matrix A.
113 *> M >= N >= 0.
114 *> \endverbatim
115 *>
116 *> \param[in,out] A
117 *> \verbatim
118 *> A is REAL array, dimension (LDA,N)
119 *> On entry, the M-by-N matrix A.
120 *> On exit,
121 *> If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C':
122 *> If INFO .EQ. 0 :
123 *> RANKA orthonormal columns of U are returned in the
124 *> leading RANKA columns of the array A. Here RANKA <= N
125 *> is the number of computed singular values of A that are
126 *> above the underflow threshold SLAMCH('S'). The singular
127 *> vectors corresponding to underflowed or zero singular
128 *> values are not computed. The value of RANKA is returned
129 *> in the array WORK as RANKA=NINT(WORK(2)). Also see the
130 *> descriptions of SVA and WORK. The computed columns of U
131 *> are mutually numerically orthogonal up to approximately
132 *> TOL=SQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'),
133 *> see the description of JOBU.
134 *> If INFO .GT. 0,
135 *> the procedure SGESVJ did not converge in the given number
136 *> of iterations (sweeps). In that case, the computed
137 *> columns of U may not be orthogonal up to TOL. The output
138 *> U (stored in A), SIGMA (given by the computed singular
139 *> values in SVA(1:N)) and V is still a decomposition of the
140 *> input matrix A in the sense that the residual
141 *> ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
142 *> If JOBU .EQ. 'N':
143 *> If INFO .EQ. 0 :
144 *> Note that the left singular vectors are 'for free' in the
145 *> one-sided Jacobi SVD algorithm. However, if only the
146 *> singular values are needed, the level of numerical
147 *> orthogonality of U is not an issue and iterations are
148 *> stopped when the columns of the iterated matrix are
149 *> numerically orthogonal up to approximately M*EPS. Thus,
150 *> on exit, A contains the columns of U scaled with the
151 *> corresponding singular values.
152 *> If INFO .GT. 0 :
153 *> the procedure SGESVJ did not converge in the given number
154 *> of iterations (sweeps).
155 *> \endverbatim
156 *>
157 *> \param[in] LDA
158 *> \verbatim
159 *> LDA is INTEGER
160 *> The leading dimension of the array A. LDA >= max(1,M).
161 *> \endverbatim
162 *>
163 *> \param[out] SVA
164 *> \verbatim
165 *> SVA is REAL array, dimension (N)
166 *> On exit,
167 *> If INFO .EQ. 0 :
168 *> depending on the value SCALE = WORK(1), we have:
169 *> If SCALE .EQ. ONE:
170 *> SVA(1:N) contains the computed singular values of A.
171 *> During the computation SVA contains the Euclidean column
172 *> norms of the iterated matrices in the array A.
173 *> If SCALE .NE. ONE:
174 *> The singular values of A are SCALE*SVA(1:N), and this
175 *> factored representation is due to the fact that some of the
176 *> singular values of A might underflow or overflow.
177 *>
178 *> If INFO .GT. 0 :
179 *> the procedure SGESVJ did not converge in the given number of
180 *> iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
181 *> \endverbatim
182 *>
183 *> \param[in] MV
184 *> \verbatim
185 *> MV is INTEGER
186 *> If JOBV .EQ. 'A', then the product of Jacobi rotations in SGESVJ
187 *> is applied to the first MV rows of V. See the description of JOBV.
188 *> \endverbatim
189 *>
190 *> \param[in,out] V
191 *> \verbatim
192 *> V is REAL array, dimension (LDV,N)
193 *> If JOBV = 'V', then V contains on exit the N-by-N matrix of
194 *> the right singular vectors;
195 *> If JOBV = 'A', then V contains the product of the computed right
196 *> singular vector matrix and the initial matrix in
197 *> the array V.
198 *> If JOBV = 'N', then V is not referenced.
199 *> \endverbatim
200 *>
201 *> \param[in] LDV
202 *> \verbatim
203 *> LDV is INTEGER
204 *> The leading dimension of the array V, LDV .GE. 1.
205 *> If JOBV .EQ. 'V', then LDV .GE. max(1,N).
206 *> If JOBV .EQ. 'A', then LDV .GE. max(1,MV) .
207 *> \endverbatim
208 *>
209 *> \param[in,out] WORK
210 *> \verbatim
211 *> WORK is REAL array, dimension max(4,M+N).
212 *> On entry,
213 *> If JOBU .EQ. 'C' :
214 *> WORK(1) = CTOL, where CTOL defines the threshold for convergence.
215 *> The process stops if all columns of A are mutually
216 *> orthogonal up to CTOL*EPS, EPS=SLAMCH('E').
217 *> It is required that CTOL >= ONE, i.e. it is not
218 *> allowed to force the routine to obtain orthogonality
219 *> below EPSILON.
220 *> On exit,
221 *> WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
222 *> are the computed singular vcalues of A.
223 *> (See description of SVA().)
224 *> WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
225 *> singular values.
226 *> WORK(3) = NINT(WORK(3)) is the number of the computed singular
227 *> values that are larger than the underflow threshold.
228 *> WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
229 *> rotations needed for numerical convergence.
230 *> WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
231 *> This is useful information in cases when SGESVJ did
232 *> not converge, as it can be used to estimate whether
233 *> the output is stil useful and for post festum analysis.
234 *> WORK(6) = the largest absolute value over all sines of the
235 *> Jacobi rotation angles in the last sweep. It can be
236 *> useful for a post festum analysis.
237 *> \endverbatim
238 *>
239 *> \param[in] LWORK
240 *> \verbatim
241 *> LWORK is INTEGER
242 *> length of WORK, WORK >= MAX(6,M+N)
243 *> \endverbatim
244 *>
245 *> \param[out] INFO
246 *> \verbatim
247 *> INFO is INTEGER
248 *> = 0 : successful exit.
249 *> < 0 : if INFO = -i, then the i-th argument had an illegal value
250 *> > 0 : SGESVJ did not converge in the maximal allowed number (30)
251 *> of sweeps. The output may still be useful. See the
252 *> description of WORK.
253 *> \endverbatim
254 *
255 * Authors:
256 * ========
257 *
258 *> \author Univ. of Tennessee
259 *> \author Univ. of California Berkeley
260 *> \author Univ. of Colorado Denver
261 *> \author NAG Ltd.
262 *
263 *> \date November 2015
264 *
265 *> \ingroup realGEcomputational
266 *
267 *> \par Further Details:
268 * =====================
269 *>
270 *> The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
271 *> rotations. The rotations are implemented as fast scaled rotations of
272 *> Anda and Park [1]. In the case of underflow of the Jacobi angle, a
273 *> modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
274 *> column interchanges of de Rijk [2]. The relative accuracy of the computed
275 *> singular values and the accuracy of the computed singular vectors (in
276 *> angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
277 *> The condition number that determines the accuracy in the full rank case
278 *> is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
279 *> spectral condition number. The best performance of this Jacobi SVD
280 *> procedure is achieved if used in an accelerated version of Drmac and
281 *> Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
282 *> Some tunning parameters (marked with [TP]) are available for the
283 *> implementer. \n
284 *> The computational range for the nonzero singular values is the machine
285 *> number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
286 *> denormalized singular values can be computed with the corresponding
287 *> gradual loss of accurate digits.
288 *>
289 *> \par Contributors:
290 * ==================
291 *>
292 *> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
293 *>
294 *> \par References:
295 * ================
296 *>
297 *> [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling. \n
298 *> SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174. \n\n
299 *> [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
300 *> singular value decomposition on a vector computer. \n
301 *> SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371. \n\n
302 *> [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR. \n
303 *> [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
304 *> value computation in floating point arithmetic. \n
305 *> SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222. \n\n
306 *> [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I. \n
307 *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342. \n
308 *> LAPACK Working note 169. \n\n
309 *> [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II. \n
310 *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362. \n
311 *> LAPACK Working note 170. \n\n
312 *> [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
313 *> QSVD, (H,K)-SVD computations.\n
314 *> Department of Mathematics, University of Zagreb, 2008.
315 *>
316 *> \par Bugs, Examples and Comments:
317 * =================================
318 *>
319 *> Please report all bugs and send interesting test examples and comments to
320 *> drmac@math.hr. Thank you.
321 *
322 * =====================================================================
323  SUBROUTINE sgesvj( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
324  $ ldv, work, lwork, info )
325 *
326 * -- LAPACK computational routine (version 3.6.0) --
327 * -- LAPACK is a software package provided by Univ. of Tennessee, --
328 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
329 * November 2015
330 *
331 * .. Scalar Arguments ..
332  INTEGER INFO, LDA, LDV, LWORK, M, MV, N
333  CHARACTER*1 JOBA, JOBU, JOBV
334 * ..
335 * .. Array Arguments ..
336  REAL A( lda, * ), SVA( n ), V( ldv, * ),
337  $ work( lwork )
338 * ..
339 *
340 * =====================================================================
341 *
342 * .. Local Parameters ..
343  REAL ZERO, HALF, ONE
344  parameter ( zero = 0.0e0, half = 0.5e0, one = 1.0e0)
345  INTEGER NSWEEP
346  parameter ( nsweep = 30 )
347 * ..
348 * .. Local Scalars ..
349  REAL AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
350  $ bigtheta, cs, ctol, epsln, large, mxaapq,
351  $ mxsinj, rootbig, rooteps, rootsfmin, roottol,
352  $ skl, sfmin, small, sn, t, temp1, theta,
353  $ thsign, tol
354  INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
355  $ iswrot, jbc, jgl, kbl, lkahead, mvl, n2, n34,
356  $ n4, nbl, notrot, p, pskipped, q, rowskip,
357  $ swband
358  LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
359  $ rsvec, uctol, upper
360 * ..
361 * .. Local Arrays ..
362  REAL FASTR( 5 )
363 * ..
364 * .. Intrinsic Functions ..
365  INTRINSIC abs, max, min, float, sign, sqrt
366 * ..
367 * .. External Functions ..
368 * ..
369 * from BLAS
370  REAL SDOT, SNRM2
371  EXTERNAL sdot, snrm2
372  INTEGER ISAMAX
373  EXTERNAL isamax
374 * from LAPACK
375  REAL SLAMCH
376  EXTERNAL slamch
377  LOGICAL LSAME
378  EXTERNAL lsame
379 * ..
380 * .. External Subroutines ..
381 * ..
382 * from BLAS
383  EXTERNAL saxpy, scopy, srotm, sscal, sswap
384 * from LAPACK
385  EXTERNAL slascl, slaset, slassq, xerbla
386 *
387  EXTERNAL sgsvj0, sgsvj1
388 * ..
389 * .. Executable Statements ..
390 *
391 * Test the input arguments
392 *
393  lsvec = lsame( jobu, 'U' )
394  uctol = lsame( jobu, 'C' )
395  rsvec = lsame( jobv, 'V' )
396  applv = lsame( jobv, 'A' )
397  upper = lsame( joba, 'U' )
398  lower = lsame( joba, 'L' )
399 *
400  IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
401  info = -1
402  ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
403  info = -2
404  ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
405  info = -3
406  ELSE IF( m.LT.0 ) THEN
407  info = -4
408  ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
409  info = -5
410  ELSE IF( lda.LT.m ) THEN
411  info = -7
412  ELSE IF( mv.LT.0 ) THEN
413  info = -9
414  ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
415  $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
416  info = -11
417  ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
418  info = -12
419  ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
420  info = -13
421  ELSE
422  info = 0
423  END IF
424 *
425 * #:(
426  IF( info.NE.0 ) THEN
427  CALL xerbla( 'SGESVJ', -info )
428  RETURN
429  END IF
430 *
431 * #:) Quick return for void matrix
432 *
433  IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
434 *
435 * Set numerical parameters
436 * The stopping criterion for Jacobi rotations is
437 *
438 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
439 *
440 * where EPS is the round-off and CTOL is defined as follows:
441 *
442  IF( uctol ) THEN
443 * ... user controlled
444  ctol = work( 1 )
445  ELSE
446 * ... default
447  IF( lsvec .OR. rsvec .OR. applv ) THEN
448  ctol = sqrt( float( m ) )
449  ELSE
450  ctol = float( m )
451  END IF
452  END IF
453 * ... and the machine dependent parameters are
454 *[!] (Make sure that SLAMCH() works properly on the target machine.)
455 *
456  epsln = slamch( 'Epsilon' )
457  rooteps = sqrt( epsln )
458  sfmin = slamch( 'SafeMinimum' )
459  rootsfmin = sqrt( sfmin )
460  small = sfmin / epsln
461  big = slamch( 'Overflow' )
462 * BIG = ONE / SFMIN
463  rootbig = one / rootsfmin
464  large = big / sqrt( float( m*n ) )
465  bigtheta = one / rooteps
466 *
467  tol = ctol*epsln
468  roottol = sqrt( tol )
469 *
470  IF( float( m )*epsln.GE.one ) THEN
471  info = -4
472  CALL xerbla( 'SGESVJ', -info )
473  RETURN
474  END IF
475 *
476 * Initialize the right singular vector matrix.
477 *
478  IF( rsvec ) THEN
479  mvl = n
480  CALL slaset( 'A', mvl, n, zero, one, v, ldv )
481  ELSE IF( applv ) THEN
482  mvl = mv
483  END IF
484  rsvec = rsvec .OR. applv
485 *
486 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
487 *(!) If necessary, scale A to protect the largest singular value
488 * from overflow. It is possible that saving the largest singular
489 * value destroys the information about the small ones.
490 * This initial scaling is almost minimal in the sense that the
491 * goal is to make sure that no column norm overflows, and that
492 * SQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
493 * in A are detected, the procedure returns with INFO=-6.
494 *
495  skl = one / sqrt( float( m )*float( n ) )
496  noscale = .true.
497  goscale = .true.
498 *
499  IF( lower ) THEN
500 * the input matrix is M-by-N lower triangular (trapezoidal)
501  DO 1874 p = 1, n
502  aapp = zero
503  aaqq = one
504  CALL slassq( m-p+1, a( p, p ), 1, aapp, aaqq )
505  IF( aapp.GT.big ) THEN
506  info = -6
507  CALL xerbla( 'SGESVJ', -info )
508  RETURN
509  END IF
510  aaqq = sqrt( aaqq )
511  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
512  sva( p ) = aapp*aaqq
513  ELSE
514  noscale = .false.
515  sva( p ) = aapp*( aaqq*skl )
516  IF( goscale ) THEN
517  goscale = .false.
518  DO 1873 q = 1, p - 1
519  sva( q ) = sva( q )*skl
520  1873 CONTINUE
521  END IF
522  END IF
523  1874 CONTINUE
524  ELSE IF( upper ) THEN
525 * the input matrix is M-by-N upper triangular (trapezoidal)
526  DO 2874 p = 1, n
527  aapp = zero
528  aaqq = one
529  CALL slassq( p, a( 1, p ), 1, aapp, aaqq )
530  IF( aapp.GT.big ) THEN
531  info = -6
532  CALL xerbla( 'SGESVJ', -info )
533  RETURN
534  END IF
535  aaqq = sqrt( aaqq )
536  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
537  sva( p ) = aapp*aaqq
538  ELSE
539  noscale = .false.
540  sva( p ) = aapp*( aaqq*skl )
541  IF( goscale ) THEN
542  goscale = .false.
543  DO 2873 q = 1, p - 1
544  sva( q ) = sva( q )*skl
545  2873 CONTINUE
546  END IF
547  END IF
548  2874 CONTINUE
549  ELSE
550 * the input matrix is M-by-N general dense
551  DO 3874 p = 1, n
552  aapp = zero
553  aaqq = one
554  CALL slassq( m, a( 1, p ), 1, aapp, aaqq )
555  IF( aapp.GT.big ) THEN
556  info = -6
557  CALL xerbla( 'SGESVJ', -info )
558  RETURN
559  END IF
560  aaqq = sqrt( aaqq )
561  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
562  sva( p ) = aapp*aaqq
563  ELSE
564  noscale = .false.
565  sva( p ) = aapp*( aaqq*skl )
566  IF( goscale ) THEN
567  goscale = .false.
568  DO 3873 q = 1, p - 1
569  sva( q ) = sva( q )*skl
570  3873 CONTINUE
571  END IF
572  END IF
573  3874 CONTINUE
574  END IF
575 *
576  IF( noscale )skl = one
577 *
578 * Move the smaller part of the spectrum from the underflow threshold
579 *(!) Start by determining the position of the nonzero entries of the
580 * array SVA() relative to ( SFMIN, BIG ).
581 *
582  aapp = zero
583  aaqq = big
584  DO 4781 p = 1, n
585  IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
586  aapp = max( aapp, sva( p ) )
587  4781 CONTINUE
588 *
589 * #:) Quick return for zero matrix
590 *
591  IF( aapp.EQ.zero ) THEN
592  IF( lsvec )CALL slaset( 'G', m, n, zero, one, a, lda )
593  work( 1 ) = one
594  work( 2 ) = zero
595  work( 3 ) = zero
596  work( 4 ) = zero
597  work( 5 ) = zero
598  work( 6 ) = zero
599  RETURN
600  END IF
601 *
602 * #:) Quick return for one-column matrix
603 *
604  IF( n.EQ.1 ) THEN
605  IF( lsvec )CALL slascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
606  $ a( 1, 1 ), lda, ierr )
607  work( 1 ) = one / skl
608  IF( sva( 1 ).GE.sfmin ) THEN
609  work( 2 ) = one
610  ELSE
611  work( 2 ) = zero
612  END IF
613  work( 3 ) = zero
614  work( 4 ) = zero
615  work( 5 ) = zero
616  work( 6 ) = zero
617  RETURN
618  END IF
619 *
620 * Protect small singular values from underflow, and try to
621 * avoid underflows/overflows in computing Jacobi rotations.
622 *
623  sn = sqrt( sfmin / epsln )
624  temp1 = sqrt( big / float( n ) )
625  IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
626  $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
627  temp1 = min( big, temp1 / aapp )
628 * AAQQ = AAQQ*TEMP1
629 * AAPP = AAPP*TEMP1
630  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
631  temp1 = min( sn / aaqq, big / ( aapp*sqrt( float( n ) ) ) )
632 * AAQQ = AAQQ*TEMP1
633 * AAPP = AAPP*TEMP1
634  ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
635  temp1 = max( sn / aaqq, temp1 / aapp )
636 * AAQQ = AAQQ*TEMP1
637 * AAPP = AAPP*TEMP1
638  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
639  temp1 = min( sn / aaqq, big / ( sqrt( float( n ) )*aapp ) )
640 * AAQQ = AAQQ*TEMP1
641 * AAPP = AAPP*TEMP1
642  ELSE
643  temp1 = one
644  END IF
645 *
646 * Scale, if necessary
647 *
648  IF( temp1.NE.one ) THEN
649  CALL slascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
650  END IF
651  skl = temp1*skl
652  IF( skl.NE.one ) THEN
653  CALL slascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
654  skl = one / skl
655  END IF
656 *
657 * Row-cyclic Jacobi SVD algorithm with column pivoting
658 *
659  emptsw = ( n*( n-1 ) ) / 2
660  notrot = 0
661  fastr( 1 ) = zero
662 *
663 * A is represented in factored form A = A * diag(WORK), where diag(WORK)
664 * is initialized to identity. WORK is updated during fast scaled
665 * rotations.
666 *
667  DO 1868 q = 1, n
668  work( q ) = one
669  1868 CONTINUE
670 *
671 *
672  swband = 3
673 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
674 * if SGESVJ is used as a computational routine in the preconditioned
675 * Jacobi SVD algorithm SGESVJ. For sweeps i=1:SWBAND the procedure
676 * works on pivots inside a band-like region around the diagonal.
677 * The boundaries are determined dynamically, based on the number of
678 * pivots above a threshold.
679 *
680  kbl = min( 8, n )
681 *[TP] KBL is a tuning parameter that defines the tile size in the
682 * tiling of the p-q loops of pivot pairs. In general, an optimal
683 * value of KBL depends on the matrix dimensions and on the
684 * parameters of the computer's memory.
685 *
686  nbl = n / kbl
687  IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
688 *
689  blskip = kbl**2
690 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
691 *
692  rowskip = min( 5, kbl )
693 *[TP] ROWSKIP is a tuning parameter.
694 *
695  lkahead = 1
696 *[TP] LKAHEAD is a tuning parameter.
697 *
698 * Quasi block transformations, using the lower (upper) triangular
699 * structure of the input matrix. The quasi-block-cycling usually
700 * invokes cubic convergence. Big part of this cycle is done inside
701 * canonical subspaces of dimensions less than M.
702 *
703  IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
704 *[TP] The number of partition levels and the actual partition are
705 * tuning parameters.
706  n4 = n / 4
707  n2 = n / 2
708  n34 = 3*n4
709  IF( applv ) THEN
710  q = 0
711  ELSE
712  q = 1
713  END IF
714 *
715  IF( lower ) THEN
716 *
717 * This works very well on lower triangular matrices, in particular
718 * in the framework of the preconditioned Jacobi SVD (xGEJSV).
719 * The idea is simple:
720 * [+ 0 0 0] Note that Jacobi transformations of [0 0]
721 * [+ + 0 0] [0 0]
722 * [+ + x 0] actually work on [x 0] [x 0]
723 * [+ + x x] [x x]. [x x]
724 *
725  CALL sgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
726  $ work( n34+1 ), sva( n34+1 ), mvl,
727  $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
728  $ 2, work( n+1 ), lwork-n, ierr )
729 *
730  CALL sgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
731  $ work( n2+1 ), sva( n2+1 ), mvl,
732  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
733  $ work( n+1 ), lwork-n, ierr )
734 *
735  CALL sgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
736  $ work( n2+1 ), sva( n2+1 ), mvl,
737  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
738  $ work( n+1 ), lwork-n, ierr )
739 *
740  CALL sgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
741  $ work( n4+1 ), sva( n4+1 ), mvl,
742  $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
743  $ work( n+1 ), lwork-n, ierr )
744 *
745  CALL sgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
746  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
747  $ ierr )
748 *
749  CALL sgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
750  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
751  $ lwork-n, ierr )
752 *
753 *
754  ELSE IF( upper ) THEN
755 *
756 *
757  CALL sgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
758  $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
759  $ ierr )
760 *
761  CALL sgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
762  $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
763  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
764  $ ierr )
765 *
766  CALL sgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
767  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
768  $ lwork-n, ierr )
769 *
770  CALL sgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
771  $ work( n2+1 ), sva( n2+1 ), mvl,
772  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
773  $ work( n+1 ), lwork-n, ierr )
774 
775  END IF
776 *
777  END IF
778 *
779 * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
780 *
781  DO 1993 i = 1, nsweep
782 *
783 * .. go go go ...
784 *
785  mxaapq = zero
786  mxsinj = zero
787  iswrot = 0
788 *
789  notrot = 0
790  pskipped = 0
791 *
792 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
793 * 1 <= p < q <= N. This is the first step toward a blocked implementation
794 * of the rotations. New implementation, based on block transformations,
795 * is under development.
796 *
797  DO 2000 ibr = 1, nbl
798 *
799  igl = ( ibr-1 )*kbl + 1
800 *
801  DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
802 *
803  igl = igl + ir1*kbl
804 *
805  DO 2001 p = igl, min( igl+kbl-1, n-1 )
806 *
807 * .. de Rijk's pivoting
808 *
809  q = isamax( n-p+1, sva( p ), 1 ) + p - 1
810  IF( p.NE.q ) THEN
811  CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
812  IF( rsvec )CALL sswap( mvl, v( 1, p ), 1,
813  $ v( 1, q ), 1 )
814  temp1 = sva( p )
815  sva( p ) = sva( q )
816  sva( q ) = temp1
817  temp1 = work( p )
818  work( p ) = work( q )
819  work( q ) = temp1
820  END IF
821 *
822  IF( ir1.EQ.0 ) THEN
823 *
824 * Column norms are periodically updated by explicit
825 * norm computation.
826 * Caveat:
827 * Unfortunately, some BLAS implementations compute SNRM2(M,A(1,p),1)
828 * as SQRT(SDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
829 * overflow for ||A(:,p)||_2 > SQRT(overflow_threshold), and to
830 * underflow for ||A(:,p)||_2 < SQRT(underflow_threshold).
831 * Hence, SNRM2 cannot be trusted, not even in the case when
832 * the true norm is far from the under(over)flow boundaries.
833 * If properly implemented SNRM2 is available, the IF-THEN-ELSE
834 * below should read "AAPP = SNRM2( M, A(1,p), 1 ) * WORK(p)".
835 *
836  IF( ( sva( p ).LT.rootbig ) .AND.
837  $ ( sva( p ).GT.rootsfmin ) ) THEN
838  sva( p ) = snrm2( m, a( 1, p ), 1 )*work( p )
839  ELSE
840  temp1 = zero
841  aapp = one
842  CALL slassq( m, a( 1, p ), 1, temp1, aapp )
843  sva( p ) = temp1*sqrt( aapp )*work( p )
844  END IF
845  aapp = sva( p )
846  ELSE
847  aapp = sva( p )
848  END IF
849 *
850  IF( aapp.GT.zero ) THEN
851 *
852  pskipped = 0
853 *
854  DO 2002 q = p + 1, min( igl+kbl-1, n )
855 *
856  aaqq = sva( q )
857 *
858  IF( aaqq.GT.zero ) THEN
859 *
860  aapp0 = aapp
861  IF( aaqq.GE.one ) THEN
862  rotok = ( small*aapp ).LE.aaqq
863  IF( aapp.LT.( big / aaqq ) ) THEN
864  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
865  $ q ), 1 )*work( p )*work( q ) /
866  $ aaqq ) / aapp
867  ELSE
868  CALL scopy( m, a( 1, p ), 1,
869  $ work( n+1 ), 1 )
870  CALL slascl( 'G', 0, 0, aapp,
871  $ work( p ), m, 1,
872  $ work( n+1 ), lda, ierr )
873  aapq = sdot( m, work( n+1 ), 1,
874  $ a( 1, q ), 1 )*work( q ) / aaqq
875  END IF
876  ELSE
877  rotok = aapp.LE.( aaqq / small )
878  IF( aapp.GT.( small / aaqq ) ) THEN
879  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
880  $ q ), 1 )*work( p )*work( q ) /
881  $ aaqq ) / aapp
882  ELSE
883  CALL scopy( m, a( 1, q ), 1,
884  $ work( n+1 ), 1 )
885  CALL slascl( 'G', 0, 0, aaqq,
886  $ work( q ), m, 1,
887  $ work( n+1 ), lda, ierr )
888  aapq = sdot( m, work( n+1 ), 1,
889  $ a( 1, p ), 1 )*work( p ) / aapp
890  END IF
891  END IF
892 *
893  mxaapq = max( mxaapq, abs( aapq ) )
894 *
895 * TO rotate or NOT to rotate, THAT is the question ...
896 *
897  IF( abs( aapq ).GT.tol ) THEN
898 *
899 * .. rotate
900 *[RTD] ROTATED = ROTATED + ONE
901 *
902  IF( ir1.EQ.0 ) THEN
903  notrot = 0
904  pskipped = 0
905  iswrot = iswrot + 1
906  END IF
907 *
908  IF( rotok ) THEN
909 *
910  aqoap = aaqq / aapp
911  apoaq = aapp / aaqq
912  theta = -half*abs( aqoap-apoaq ) / aapq
913 *
914  IF( abs( theta ).GT.bigtheta ) THEN
915 *
916  t = half / theta
917  fastr( 3 ) = t*work( p ) / work( q )
918  fastr( 4 ) = -t*work( q ) /
919  $ work( p )
920  CALL srotm( m, a( 1, p ), 1,
921  $ a( 1, q ), 1, fastr )
922  IF( rsvec )CALL srotm( mvl,
923  $ v( 1, p ), 1,
924  $ v( 1, q ), 1,
925  $ fastr )
926  sva( q ) = aaqq*sqrt( max( zero,
927  $ one+t*apoaq*aapq ) )
928  aapp = aapp*sqrt( max( zero,
929  $ one-t*aqoap*aapq ) )
930  mxsinj = max( mxsinj, abs( t ) )
931 *
932  ELSE
933 *
934 * .. choose correct signum for THETA and rotate
935 *
936  thsign = -sign( one, aapq )
937  t = one / ( theta+thsign*
938  $ sqrt( one+theta*theta ) )
939  cs = sqrt( one / ( one+t*t ) )
940  sn = t*cs
941 *
942  mxsinj = max( mxsinj, abs( sn ) )
943  sva( q ) = aaqq*sqrt( max( zero,
944  $ one+t*apoaq*aapq ) )
945  aapp = aapp*sqrt( max( zero,
946  $ one-t*aqoap*aapq ) )
947 *
948  apoaq = work( p ) / work( q )
949  aqoap = work( q ) / work( p )
950  IF( work( p ).GE.one ) THEN
951  IF( work( q ).GE.one ) THEN
952  fastr( 3 ) = t*apoaq
953  fastr( 4 ) = -t*aqoap
954  work( p ) = work( p )*cs
955  work( q ) = work( q )*cs
956  CALL srotm( m, a( 1, p ), 1,
957  $ a( 1, q ), 1,
958  $ fastr )
959  IF( rsvec )CALL srotm( mvl,
960  $ v( 1, p ), 1, v( 1, q ),
961  $ 1, fastr )
962  ELSE
963  CALL saxpy( m, -t*aqoap,
964  $ a( 1, q ), 1,
965  $ a( 1, p ), 1 )
966  CALL saxpy( m, cs*sn*apoaq,
967  $ a( 1, p ), 1,
968  $ a( 1, q ), 1 )
969  work( p ) = work( p )*cs
970  work( q ) = work( q ) / cs
971  IF( rsvec ) THEN
972  CALL saxpy( mvl, -t*aqoap,
973  $ v( 1, q ), 1,
974  $ v( 1, p ), 1 )
975  CALL saxpy( mvl,
976  $ cs*sn*apoaq,
977  $ v( 1, p ), 1,
978  $ v( 1, q ), 1 )
979  END IF
980  END IF
981  ELSE
982  IF( work( q ).GE.one ) THEN
983  CALL saxpy( m, t*apoaq,
984  $ a( 1, p ), 1,
985  $ a( 1, q ), 1 )
986  CALL saxpy( m, -cs*sn*aqoap,
987  $ a( 1, q ), 1,
988  $ a( 1, p ), 1 )
989  work( p ) = work( p ) / cs
990  work( q ) = work( q )*cs
991  IF( rsvec ) THEN
992  CALL saxpy( mvl, t*apoaq,
993  $ v( 1, p ), 1,
994  $ v( 1, q ), 1 )
995  CALL saxpy( mvl,
996  $ -cs*sn*aqoap,
997  $ v( 1, q ), 1,
998  $ v( 1, p ), 1 )
999  END IF
1000  ELSE
1001  IF( work( p ).GE.work( q ) )
1002  $ THEN
1003  CALL saxpy( m, -t*aqoap,
1004  $ a( 1, q ), 1,
1005  $ a( 1, p ), 1 )
1006  CALL saxpy( m, cs*sn*apoaq,
1007  $ a( 1, p ), 1,
1008  $ a( 1, q ), 1 )
1009  work( p ) = work( p )*cs
1010  work( q ) = work( q ) / cs
1011  IF( rsvec ) THEN
1012  CALL saxpy( mvl,
1013  $ -t*aqoap,
1014  $ v( 1, q ), 1,
1015  $ v( 1, p ), 1 )
1016  CALL saxpy( mvl,
1017  $ cs*sn*apoaq,
1018  $ v( 1, p ), 1,
1019  $ v( 1, q ), 1 )
1020  END IF
1021  ELSE
1022  CALL saxpy( m, t*apoaq,
1023  $ a( 1, p ), 1,
1024  $ a( 1, q ), 1 )
1025  CALL saxpy( m,
1026  $ -cs*sn*aqoap,
1027  $ a( 1, q ), 1,
1028  $ a( 1, p ), 1 )
1029  work( p ) = work( p ) / cs
1030  work( q ) = work( q )*cs
1031  IF( rsvec ) THEN
1032  CALL saxpy( mvl,
1033  $ t*apoaq, v( 1, p ),
1034  $ 1, v( 1, q ), 1 )
1035  CALL saxpy( mvl,
1036  $ -cs*sn*aqoap,
1037  $ v( 1, q ), 1,
1038  $ v( 1, p ), 1 )
1039  END IF
1040  END IF
1041  END IF
1042  END IF
1043  END IF
1044 *
1045  ELSE
1046 * .. have to use modified Gram-Schmidt like transformation
1047  CALL scopy( m, a( 1, p ), 1,
1048  $ work( n+1 ), 1 )
1049  CALL slascl( 'G', 0, 0, aapp, one, m,
1050  $ 1, work( n+1 ), lda,
1051  $ ierr )
1052  CALL slascl( 'G', 0, 0, aaqq, one, m,
1053  $ 1, a( 1, q ), lda, ierr )
1054  temp1 = -aapq*work( p ) / work( q )
1055  CALL saxpy( m, temp1, work( n+1 ), 1,
1056  $ a( 1, q ), 1 )
1057  CALL slascl( 'G', 0, 0, one, aaqq, m,
1058  $ 1, a( 1, q ), lda, ierr )
1059  sva( q ) = aaqq*sqrt( max( zero,
1060  $ one-aapq*aapq ) )
1061  mxsinj = max( mxsinj, sfmin )
1062  END IF
1063 * END IF ROTOK THEN ... ELSE
1064 *
1065 * In the case of cancellation in updating SVA(q), SVA(p)
1066 * recompute SVA(q), SVA(p).
1067 *
1068  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1069  $ THEN
1070  IF( ( aaqq.LT.rootbig ) .AND.
1071  $ ( aaqq.GT.rootsfmin ) ) THEN
1072  sva( q ) = snrm2( m, a( 1, q ), 1 )*
1073  $ work( q )
1074  ELSE
1075  t = zero
1076  aaqq = one
1077  CALL slassq( m, a( 1, q ), 1, t,
1078  $ aaqq )
1079  sva( q ) = t*sqrt( aaqq )*work( q )
1080  END IF
1081  END IF
1082  IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1083  IF( ( aapp.LT.rootbig ) .AND.
1084  $ ( aapp.GT.rootsfmin ) ) THEN
1085  aapp = snrm2( m, a( 1, p ), 1 )*
1086  $ work( p )
1087  ELSE
1088  t = zero
1089  aapp = one
1090  CALL slassq( m, a( 1, p ), 1, t,
1091  $ aapp )
1092  aapp = t*sqrt( aapp )*work( p )
1093  END IF
1094  sva( p ) = aapp
1095  END IF
1096 *
1097  ELSE
1098 * A(:,p) and A(:,q) already numerically orthogonal
1099  IF( ir1.EQ.0 )notrot = notrot + 1
1100 *[RTD] SKIPPED = SKIPPED + 1
1101  pskipped = pskipped + 1
1102  END IF
1103  ELSE
1104 * A(:,q) is zero column
1105  IF( ir1.EQ.0 )notrot = notrot + 1
1106  pskipped = pskipped + 1
1107  END IF
1108 *
1109  IF( ( i.LE.swband ) .AND.
1110  $ ( pskipped.GT.rowskip ) ) THEN
1111  IF( ir1.EQ.0 )aapp = -aapp
1112  notrot = 0
1113  GO TO 2103
1114  END IF
1115 *
1116  2002 CONTINUE
1117 * END q-LOOP
1118 *
1119  2103 CONTINUE
1120 * bailed out of q-loop
1121 *
1122  sva( p ) = aapp
1123 *
1124  ELSE
1125  sva( p ) = aapp
1126  IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1127  $ notrot = notrot + min( igl+kbl-1, n ) - p
1128  END IF
1129 *
1130  2001 CONTINUE
1131 * end of the p-loop
1132 * end of doing the block ( ibr, ibr )
1133  1002 CONTINUE
1134 * end of ir1-loop
1135 *
1136 * ... go to the off diagonal blocks
1137 *
1138  igl = ( ibr-1 )*kbl + 1
1139 *
1140  DO 2010 jbc = ibr + 1, nbl
1141 *
1142  jgl = ( jbc-1 )*kbl + 1
1143 *
1144 * doing the block at ( ibr, jbc )
1145 *
1146  ijblsk = 0
1147  DO 2100 p = igl, min( igl+kbl-1, n )
1148 *
1149  aapp = sva( p )
1150  IF( aapp.GT.zero ) THEN
1151 *
1152  pskipped = 0
1153 *
1154  DO 2200 q = jgl, min( jgl+kbl-1, n )
1155 *
1156  aaqq = sva( q )
1157  IF( aaqq.GT.zero ) THEN
1158  aapp0 = aapp
1159 *
1160 * .. M x 2 Jacobi SVD ..
1161 *
1162 * Safe Gram matrix computation
1163 *
1164  IF( aaqq.GE.one ) THEN
1165  IF( aapp.GE.aaqq ) THEN
1166  rotok = ( small*aapp ).LE.aaqq
1167  ELSE
1168  rotok = ( small*aaqq ).LE.aapp
1169  END IF
1170  IF( aapp.LT.( big / aaqq ) ) THEN
1171  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
1172  $ q ), 1 )*work( p )*work( q ) /
1173  $ aaqq ) / aapp
1174  ELSE
1175  CALL scopy( m, a( 1, p ), 1,
1176  $ work( n+1 ), 1 )
1177  CALL slascl( 'G', 0, 0, aapp,
1178  $ work( p ), m, 1,
1179  $ work( n+1 ), lda, ierr )
1180  aapq = sdot( m, work( n+1 ), 1,
1181  $ a( 1, q ), 1 )*work( q ) / aaqq
1182  END IF
1183  ELSE
1184  IF( aapp.GE.aaqq ) THEN
1185  rotok = aapp.LE.( aaqq / small )
1186  ELSE
1187  rotok = aaqq.LE.( aapp / small )
1188  END IF
1189  IF( aapp.GT.( small / aaqq ) ) THEN
1190  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
1191  $ q ), 1 )*work( p )*work( q ) /
1192  $ aaqq ) / aapp
1193  ELSE
1194  CALL scopy( m, a( 1, q ), 1,
1195  $ work( n+1 ), 1 )
1196  CALL slascl( 'G', 0, 0, aaqq,
1197  $ work( q ), m, 1,
1198  $ work( n+1 ), lda, ierr )
1199  aapq = sdot( m, work( n+1 ), 1,
1200  $ a( 1, p ), 1 )*work( p ) / aapp
1201  END IF
1202  END IF
1203 *
1204  mxaapq = max( mxaapq, abs( aapq ) )
1205 *
1206 * TO rotate or NOT to rotate, THAT is the question ...
1207 *
1208  IF( abs( aapq ).GT.tol ) THEN
1209  notrot = 0
1210 *[RTD] ROTATED = ROTATED + 1
1211  pskipped = 0
1212  iswrot = iswrot + 1
1213 *
1214  IF( rotok ) THEN
1215 *
1216  aqoap = aaqq / aapp
1217  apoaq = aapp / aaqq
1218  theta = -half*abs( aqoap-apoaq ) / aapq
1219  IF( aaqq.GT.aapp0 )theta = -theta
1220 *
1221  IF( abs( theta ).GT.bigtheta ) THEN
1222  t = half / theta
1223  fastr( 3 ) = t*work( p ) / work( q )
1224  fastr( 4 ) = -t*work( q ) /
1225  $ work( p )
1226  CALL srotm( m, a( 1, p ), 1,
1227  $ a( 1, q ), 1, fastr )
1228  IF( rsvec )CALL srotm( mvl,
1229  $ v( 1, p ), 1,
1230  $ v( 1, q ), 1,
1231  $ fastr )
1232  sva( q ) = aaqq*sqrt( max( zero,
1233  $ one+t*apoaq*aapq ) )
1234  aapp = aapp*sqrt( max( zero,
1235  $ one-t*aqoap*aapq ) )
1236  mxsinj = max( mxsinj, abs( t ) )
1237  ELSE
1238 *
1239 * .. choose correct signum for THETA and rotate
1240 *
1241  thsign = -sign( one, aapq )
1242  IF( aaqq.GT.aapp0 )thsign = -thsign
1243  t = one / ( theta+thsign*
1244  $ sqrt( one+theta*theta ) )
1245  cs = sqrt( one / ( one+t*t ) )
1246  sn = t*cs
1247  mxsinj = max( mxsinj, abs( sn ) )
1248  sva( q ) = aaqq*sqrt( max( zero,
1249  $ one+t*apoaq*aapq ) )
1250  aapp = aapp*sqrt( max( zero,
1251  $ one-t*aqoap*aapq ) )
1252 *
1253  apoaq = work( p ) / work( q )
1254  aqoap = work( q ) / work( p )
1255  IF( work( p ).GE.one ) THEN
1256 *
1257  IF( work( q ).GE.one ) THEN
1258  fastr( 3 ) = t*apoaq
1259  fastr( 4 ) = -t*aqoap
1260  work( p ) = work( p )*cs
1261  work( q ) = work( q )*cs
1262  CALL srotm( m, a( 1, p ), 1,
1263  $ a( 1, q ), 1,
1264  $ fastr )
1265  IF( rsvec )CALL srotm( mvl,
1266  $ v( 1, p ), 1, v( 1, q ),
1267  $ 1, fastr )
1268  ELSE
1269  CALL saxpy( m, -t*aqoap,
1270  $ a( 1, q ), 1,
1271  $ a( 1, p ), 1 )
1272  CALL saxpy( m, cs*sn*apoaq,
1273  $ a( 1, p ), 1,
1274  $ a( 1, q ), 1 )
1275  IF( rsvec ) THEN
1276  CALL saxpy( mvl, -t*aqoap,
1277  $ v( 1, q ), 1,
1278  $ v( 1, p ), 1 )
1279  CALL saxpy( mvl,
1280  $ cs*sn*apoaq,
1281  $ v( 1, p ), 1,
1282  $ v( 1, q ), 1 )
1283  END IF
1284  work( p ) = work( p )*cs
1285  work( q ) = work( q ) / cs
1286  END IF
1287  ELSE
1288  IF( work( q ).GE.one ) THEN
1289  CALL saxpy( m, t*apoaq,
1290  $ a( 1, p ), 1,
1291  $ a( 1, q ), 1 )
1292  CALL saxpy( m, -cs*sn*aqoap,
1293  $ a( 1, q ), 1,
1294  $ a( 1, p ), 1 )
1295  IF( rsvec ) THEN
1296  CALL saxpy( mvl, t*apoaq,
1297  $ v( 1, p ), 1,
1298  $ v( 1, q ), 1 )
1299  CALL saxpy( mvl,
1300  $ -cs*sn*aqoap,
1301  $ v( 1, q ), 1,
1302  $ v( 1, p ), 1 )
1303  END IF
1304  work( p ) = work( p ) / cs
1305  work( q ) = work( q )*cs
1306  ELSE
1307  IF( work( p ).GE.work( q ) )
1308  $ THEN
1309  CALL saxpy( m, -t*aqoap,
1310  $ a( 1, q ), 1,
1311  $ a( 1, p ), 1 )
1312  CALL saxpy( m, cs*sn*apoaq,
1313  $ a( 1, p ), 1,
1314  $ a( 1, q ), 1 )
1315  work( p ) = work( p )*cs
1316  work( q ) = work( q ) / cs
1317  IF( rsvec ) THEN
1318  CALL saxpy( mvl,
1319  $ -t*aqoap,
1320  $ v( 1, q ), 1,
1321  $ v( 1, p ), 1 )
1322  CALL saxpy( mvl,
1323  $ cs*sn*apoaq,
1324  $ v( 1, p ), 1,
1325  $ v( 1, q ), 1 )
1326  END IF
1327  ELSE
1328  CALL saxpy( m, t*apoaq,
1329  $ a( 1, p ), 1,
1330  $ a( 1, q ), 1 )
1331  CALL saxpy( m,
1332  $ -cs*sn*aqoap,
1333  $ a( 1, q ), 1,
1334  $ a( 1, p ), 1 )
1335  work( p ) = work( p ) / cs
1336  work( q ) = work( q )*cs
1337  IF( rsvec ) THEN
1338  CALL saxpy( mvl,
1339  $ t*apoaq, v( 1, p ),
1340  $ 1, v( 1, q ), 1 )
1341  CALL saxpy( mvl,
1342  $ -cs*sn*aqoap,
1343  $ v( 1, q ), 1,
1344  $ v( 1, p ), 1 )
1345  END IF
1346  END IF
1347  END IF
1348  END IF
1349  END IF
1350 *
1351  ELSE
1352  IF( aapp.GT.aaqq ) THEN
1353  CALL scopy( m, a( 1, p ), 1,
1354  $ work( n+1 ), 1 )
1355  CALL slascl( 'G', 0, 0, aapp, one,
1356  $ m, 1, work( n+1 ), lda,
1357  $ ierr )
1358  CALL slascl( 'G', 0, 0, aaqq, one,
1359  $ m, 1, a( 1, q ), lda,
1360  $ ierr )
1361  temp1 = -aapq*work( p ) / work( q )
1362  CALL saxpy( m, temp1, work( n+1 ),
1363  $ 1, a( 1, q ), 1 )
1364  CALL slascl( 'G', 0, 0, one, aaqq,
1365  $ m, 1, a( 1, q ), lda,
1366  $ ierr )
1367  sva( q ) = aaqq*sqrt( max( zero,
1368  $ one-aapq*aapq ) )
1369  mxsinj = max( mxsinj, sfmin )
1370  ELSE
1371  CALL scopy( m, a( 1, q ), 1,
1372  $ work( n+1 ), 1 )
1373  CALL slascl( 'G', 0, 0, aaqq, one,
1374  $ m, 1, work( n+1 ), lda,
1375  $ ierr )
1376  CALL slascl( 'G', 0, 0, aapp, one,
1377  $ m, 1, a( 1, p ), lda,
1378  $ ierr )
1379  temp1 = -aapq*work( q ) / work( p )
1380  CALL saxpy( m, temp1, work( n+1 ),
1381  $ 1, a( 1, p ), 1 )
1382  CALL slascl( 'G', 0, 0, one, aapp,
1383  $ m, 1, a( 1, p ), lda,
1384  $ ierr )
1385  sva( p ) = aapp*sqrt( max( zero,
1386  $ one-aapq*aapq ) )
1387  mxsinj = max( mxsinj, sfmin )
1388  END IF
1389  END IF
1390 * END IF ROTOK THEN ... ELSE
1391 *
1392 * In the case of cancellation in updating SVA(q)
1393 * .. recompute SVA(q)
1394  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1395  $ THEN
1396  IF( ( aaqq.LT.rootbig ) .AND.
1397  $ ( aaqq.GT.rootsfmin ) ) THEN
1398  sva( q ) = snrm2( m, a( 1, q ), 1 )*
1399  $ work( q )
1400  ELSE
1401  t = zero
1402  aaqq = one
1403  CALL slassq( m, a( 1, q ), 1, t,
1404  $ aaqq )
1405  sva( q ) = t*sqrt( aaqq )*work( q )
1406  END IF
1407  END IF
1408  IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1409  IF( ( aapp.LT.rootbig ) .AND.
1410  $ ( aapp.GT.rootsfmin ) ) THEN
1411  aapp = snrm2( m, a( 1, p ), 1 )*
1412  $ work( p )
1413  ELSE
1414  t = zero
1415  aapp = one
1416  CALL slassq( m, a( 1, p ), 1, t,
1417  $ aapp )
1418  aapp = t*sqrt( aapp )*work( p )
1419  END IF
1420  sva( p ) = aapp
1421  END IF
1422 * end of OK rotation
1423  ELSE
1424  notrot = notrot + 1
1425 *[RTD] SKIPPED = SKIPPED + 1
1426  pskipped = pskipped + 1
1427  ijblsk = ijblsk + 1
1428  END IF
1429  ELSE
1430  notrot = notrot + 1
1431  pskipped = pskipped + 1
1432  ijblsk = ijblsk + 1
1433  END IF
1434 *
1435  IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1436  $ THEN
1437  sva( p ) = aapp
1438  notrot = 0
1439  GO TO 2011
1440  END IF
1441  IF( ( i.LE.swband ) .AND.
1442  $ ( pskipped.GT.rowskip ) ) THEN
1443  aapp = -aapp
1444  notrot = 0
1445  GO TO 2203
1446  END IF
1447 *
1448  2200 CONTINUE
1449 * end of the q-loop
1450  2203 CONTINUE
1451 *
1452  sva( p ) = aapp
1453 *
1454  ELSE
1455 *
1456  IF( aapp.EQ.zero )notrot = notrot +
1457  $ min( jgl+kbl-1, n ) - jgl + 1
1458  IF( aapp.LT.zero )notrot = 0
1459 *
1460  END IF
1461 *
1462  2100 CONTINUE
1463 * end of the p-loop
1464  2010 CONTINUE
1465 * end of the jbc-loop
1466  2011 CONTINUE
1467 *2011 bailed out of the jbc-loop
1468  DO 2012 p = igl, min( igl+kbl-1, n )
1469  sva( p ) = abs( sva( p ) )
1470  2012 CONTINUE
1471 ***
1472  2000 CONTINUE
1473 *2000 :: end of the ibr-loop
1474 *
1475 * .. update SVA(N)
1476  IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1477  $ THEN
1478  sva( n ) = snrm2( m, a( 1, n ), 1 )*work( n )
1479  ELSE
1480  t = zero
1481  aapp = one
1482  CALL slassq( m, a( 1, n ), 1, t, aapp )
1483  sva( n ) = t*sqrt( aapp )*work( n )
1484  END IF
1485 *
1486 * Additional steering devices
1487 *
1488  IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1489  $ ( iswrot.LE.n ) ) )swband = i
1490 *
1491  IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.sqrt( float( n ) )*
1492  $ tol ) .AND. ( float( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1493  GO TO 1994
1494  END IF
1495 *
1496  IF( notrot.GE.emptsw )GO TO 1994
1497 *
1498  1993 CONTINUE
1499 * end i=1:NSWEEP loop
1500 *
1501 * #:( Reaching this point means that the procedure has not converged.
1502  info = nsweep - 1
1503  GO TO 1995
1504 *
1505  1994 CONTINUE
1506 * #:) Reaching this point means numerical convergence after the i-th
1507 * sweep.
1508 *
1509  info = 0
1510 * #:) INFO = 0 confirms successful iterations.
1511  1995 CONTINUE
1512 *
1513 * Sort the singular values and find how many are above
1514 * the underflow threshold.
1515 *
1516  n2 = 0
1517  n4 = 0
1518  DO 5991 p = 1, n - 1
1519  q = isamax( n-p+1, sva( p ), 1 ) + p - 1
1520  IF( p.NE.q ) THEN
1521  temp1 = sva( p )
1522  sva( p ) = sva( q )
1523  sva( q ) = temp1
1524  temp1 = work( p )
1525  work( p ) = work( q )
1526  work( q ) = temp1
1527  CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1528  IF( rsvec )CALL sswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1529  END IF
1530  IF( sva( p ).NE.zero ) THEN
1531  n4 = n4 + 1
1532  IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1533  END IF
1534  5991 CONTINUE
1535  IF( sva( n ).NE.zero ) THEN
1536  n4 = n4 + 1
1537  IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1538  END IF
1539 *
1540 * Normalize the left singular vectors.
1541 *
1542  IF( lsvec .OR. uctol ) THEN
1543  DO 1998 p = 1, n2
1544  CALL sscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1545  1998 CONTINUE
1546  END IF
1547 *
1548 * Scale the product of Jacobi rotations (assemble the fast rotations).
1549 *
1550  IF( rsvec ) THEN
1551  IF( applv ) THEN
1552  DO 2398 p = 1, n
1553  CALL sscal( mvl, work( p ), v( 1, p ), 1 )
1554  2398 CONTINUE
1555  ELSE
1556  DO 2399 p = 1, n
1557  temp1 = one / snrm2( mvl, v( 1, p ), 1 )
1558  CALL sscal( mvl, temp1, v( 1, p ), 1 )
1559  2399 CONTINUE
1560  END IF
1561  END IF
1562 *
1563 * Undo scaling, if necessary (and possible).
1564  IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl ) ) )
1565  $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1566  $ ( sfmin / skl ) ) ) ) THEN
1567  DO 2400 p = 1, n
1568  sva( p ) = skl*sva( p )
1569  2400 CONTINUE
1570  skl = one
1571  END IF
1572 *
1573  work( 1 ) = skl
1574 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1575 * then some of the singular values may overflow or underflow and
1576 * the spectrum is given in this factored representation.
1577 *
1578  work( 2 ) = float( n4 )
1579 * N4 is the number of computed nonzero singular values of A.
1580 *
1581  work( 3 ) = float( n2 )
1582 * N2 is the number of singular values of A greater than SFMIN.
1583 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1584 * that may carry some information.
1585 *
1586  work( 4 ) = float( i )
1587 * i is the index of the last sweep before declaring convergence.
1588 *
1589  work( 5 ) = mxaapq
1590 * MXAAPQ is the largest absolute value of scaled pivots in the
1591 * last sweep
1592 *
1593  work( 6 ) = mxsinj
1594 * MXSINJ is the largest absolute value of the sines of Jacobi angles
1595 * in the last sweep
1596 *
1597  RETURN
1598 * ..
1599 * .. END OF SGESVJ
1600 * ..
1601  END
subroutine slassq(N, X, INCX, SCALE, SUMSQ)
SLASSQ updates a sum of squares represented in scaled form.
Definition: slassq.f:105
subroutine sgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, WORK, LWORK, INFO)
SGESVJ
Definition: sgesvj.f:325
subroutine srotm(N, SX, INCX, SY, INCY, SPARAM)
SROTM
Definition: srotm.f:101
subroutine slascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: slascl.f:145
subroutine sgsvj1(JOBV, M, N, N1, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
SGSVJ1 pre-processor for the routine sgesvj, applies Jacobi rotations targeting only particular pivot...
Definition: sgsvj1.f:238
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine slaset(UPLO, M, N, ALPHA, BETA, A, LDA)
SLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: slaset.f:112
subroutine saxpy(N, SA, SX, INCX, SY, INCY)
SAXPY
Definition: saxpy.f:54
subroutine sgsvj0(JOBV, M, N, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
SGSVJ0 pre-processor for the routine sgesvj.
Definition: sgsvj0.f:220
subroutine sscal(N, SA, SX, INCX)
SSCAL
Definition: sscal.f:55
subroutine sswap(N, SX, INCX, SY, INCY)
SSWAP
Definition: sswap.f:53
subroutine scopy(N, SX, INCX, SY, INCY)
SCOPY
Definition: scopy.f:53