LAPACK  3.6.1
LAPACK: Linear Algebra PACKage
dgesvj.f
Go to the documentation of this file.
1 *> \brief \b DGESVJ
2 *
3 * =========== DOCUMENTATION ===========
4 *
5 * Online html documentation available at
6 * http://www.netlib.org/lapack/explore-html/
7 *
8 *> \htmlonly
9 *> Download DGESVJ + dependencies
10 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/dgesvj.f">
11 *> [TGZ]</a>
12 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/dgesvj.f">
13 *> [ZIP]</a>
14 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/dgesvj.f">
15 *> [TXT]</a>
16 *> \endhtmlonly
17 *
18 * Definition:
19 * ===========
20 *
21 * SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
22 * LDV, WORK, LWORK, INFO )
23 *
24 * .. Scalar Arguments ..
25 * INTEGER INFO, LDA, LDV, LWORK, M, MV, N
26 * CHARACTER*1 JOBA, JOBU, JOBV
27 * ..
28 * .. Array Arguments ..
29 * DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
30 * $ WORK( LWORK )
31 * ..
32 *
33 *
34 *> \par Purpose:
35 * =============
36 *>
37 *> \verbatim
38 *>
39 *> DGESVJ computes the singular value decomposition (SVD) of a real
40 *> M-by-N matrix A, where M >= N. The SVD of A is written as
41 *> [++] [xx] [x0] [xx]
42 *> A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx]
43 *> [++] [xx]
44 *> where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
45 *> matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
46 *> of SIGMA are the singular values of A. The columns of U and V are the
47 *> left and the right singular vectors of A, respectively.
48 *> DGESVJ can sometimes compute tiny singular values and their singular vectors much
49 *> more accurately than other SVD routines, see below under Further Details.
50 *> \endverbatim
51 *
52 * Arguments:
53 * ==========
54 *
55 *> \param[in] JOBA
56 *> \verbatim
57 *> JOBA is CHARACTER* 1
58 *> Specifies the structure of A.
59 *> = 'L': The input matrix A is lower triangular;
60 *> = 'U': The input matrix A is upper triangular;
61 *> = 'G': The input matrix A is general M-by-N matrix, M >= N.
62 *> \endverbatim
63 *>
64 *> \param[in] JOBU
65 *> \verbatim
66 *> JOBU is CHARACTER*1
67 *> Specifies whether to compute the left singular vectors
68 *> (columns of U):
69 *> = 'U': The left singular vectors corresponding to the nonzero
70 *> singular values are computed and returned in the leading
71 *> columns of A. See more details in the description of A.
72 *> The default numerical orthogonality threshold is set to
73 *> approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
74 *> = 'C': Analogous to JOBU='U', except that user can control the
75 *> level of numerical orthogonality of the computed left
76 *> singular vectors. TOL can be set to TOL = CTOL*EPS, where
77 *> CTOL is given on input in the array WORK.
78 *> No CTOL smaller than ONE is allowed. CTOL greater
79 *> than 1 / EPS is meaningless. The option 'C'
80 *> can be used if M*EPS is satisfactory orthogonality
81 *> of the computed left singular vectors, so CTOL=M could
82 *> save few sweeps of Jacobi rotations.
83 *> See the descriptions of A and WORK(1).
84 *> = 'N': The matrix U is not computed. However, see the
85 *> description of A.
86 *> \endverbatim
87 *>
88 *> \param[in] JOBV
89 *> \verbatim
90 *> JOBV is CHARACTER*1
91 *> Specifies whether to compute the right singular vectors, that
92 *> is, the matrix V:
93 *> = 'V' : the matrix V is computed and returned in the array V
94 *> = 'A' : the Jacobi rotations are applied to the MV-by-N
95 *> array V. In other words, the right singular vector
96 *> matrix V is not computed explicitly, instead it is
97 *> applied to an MV-by-N matrix initially stored in the
98 *> first MV rows of V.
99 *> = 'N' : the matrix V is not computed and the array V is not
100 *> referenced
101 *> \endverbatim
102 *>
103 *> \param[in] M
104 *> \verbatim
105 *> M is INTEGER
106 *> The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
107 *> \endverbatim
108 *>
109 *> \param[in] N
110 *> \verbatim
111 *> N is INTEGER
112 *> The number of columns of the input matrix A.
113 *> M >= N >= 0.
114 *> \endverbatim
115 *>
116 *> \param[in,out] A
117 *> \verbatim
118 *> A is DOUBLE PRECISION array, dimension (LDA,N)
119 *> On entry, the M-by-N matrix A.
120 *> On exit :
121 *> If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C' :
122 *> If INFO .EQ. 0 :
123 *> RANKA orthonormal columns of U are returned in the
124 *> leading RANKA columns of the array A. Here RANKA <= N
125 *> is the number of computed singular values of A that are
126 *> above the underflow threshold DLAMCH('S'). The singular
127 *> vectors corresponding to underflowed or zero singular
128 *> values are not computed. The value of RANKA is returned
129 *> in the array WORK as RANKA=NINT(WORK(2)). Also see the
130 *> descriptions of SVA and WORK. The computed columns of U
131 *> are mutually numerically orthogonal up to approximately
132 *> TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'),
133 *> see the description of JOBU.
134 *> If INFO .GT. 0 :
135 *> the procedure DGESVJ did not converge in the given number
136 *> of iterations (sweeps). In that case, the computed
137 *> columns of U may not be orthogonal up to TOL. The output
138 *> U (stored in A), SIGMA (given by the computed singular
139 *> values in SVA(1:N)) and V is still a decomposition of the
140 *> input matrix A in the sense that the residual
141 *> ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
142 *>
143 *> If JOBU .EQ. 'N' :
144 *> If INFO .EQ. 0 :
145 *> Note that the left singular vectors are 'for free' in the
146 *> one-sided Jacobi SVD algorithm. However, if only the
147 *> singular values are needed, the level of numerical
148 *> orthogonality of U is not an issue and iterations are
149 *> stopped when the columns of the iterated matrix are
150 *> numerically orthogonal up to approximately M*EPS. Thus,
151 *> on exit, A contains the columns of U scaled with the
152 *> corresponding singular values.
153 *> If INFO .GT. 0 :
154 *> the procedure DGESVJ did not converge in the given number
155 *> of iterations (sweeps).
156 *> \endverbatim
157 *>
158 *> \param[in] LDA
159 *> \verbatim
160 *> LDA is INTEGER
161 *> The leading dimension of the array A. LDA >= max(1,M).
162 *> \endverbatim
163 *>
164 *> \param[out] SVA
165 *> \verbatim
166 *> SVA is DOUBLE PRECISION array, dimension (N)
167 *> On exit :
168 *> If INFO .EQ. 0 :
169 *> depending on the value SCALE = WORK(1), we have:
170 *> If SCALE .EQ. ONE :
171 *> SVA(1:N) contains the computed singular values of A.
172 *> During the computation SVA contains the Euclidean column
173 *> norms of the iterated matrices in the array A.
174 *> If SCALE .NE. ONE :
175 *> The singular values of A are SCALE*SVA(1:N), and this
176 *> factored representation is due to the fact that some of the
177 *> singular values of A might underflow or overflow.
178 *> If INFO .GT. 0 :
179 *> the procedure DGESVJ did not converge in the given number of
180 *> iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
181 *> \endverbatim
182 *>
183 *> \param[in] MV
184 *> \verbatim
185 *> MV is INTEGER
186 *> If JOBV .EQ. 'A', then the product of Jacobi rotations in DGESVJ
187 *> is applied to the first MV rows of V. See the description of JOBV.
188 *> \endverbatim
189 *>
190 *> \param[in,out] V
191 *> \verbatim
192 *> V is DOUBLE PRECISION array, dimension (LDV,N)
193 *> If JOBV = 'V', then V contains on exit the N-by-N matrix of
194 *> the right singular vectors;
195 *> If JOBV = 'A', then V contains the product of the computed right
196 *> singular vector matrix and the initial matrix in
197 *> the array V.
198 *> If JOBV = 'N', then V is not referenced.
199 *> \endverbatim
200 *>
201 *> \param[in] LDV
202 *> \verbatim
203 *> LDV is INTEGER
204 *> The leading dimension of the array V, LDV .GE. 1.
205 *> If JOBV .EQ. 'V', then LDV .GE. max(1,N).
206 *> If JOBV .EQ. 'A', then LDV .GE. max(1,MV) .
207 *> \endverbatim
208 *>
209 *> \param[in,out] WORK
210 *> \verbatim
211 *> WORK is DOUBLE PRECISION array, dimension max(4,M+N).
212 *> On entry :
213 *> If JOBU .EQ. 'C' :
214 *> WORK(1) = CTOL, where CTOL defines the threshold for convergence.
215 *> The process stops if all columns of A are mutually
216 *> orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
217 *> It is required that CTOL >= ONE, i.e. it is not
218 *> allowed to force the routine to obtain orthogonality
219 *> below EPS.
220 *> On exit :
221 *> WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
222 *> are the computed singular values of A.
223 *> (See description of SVA().)
224 *> WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
225 *> singular values.
226 *> WORK(3) = NINT(WORK(3)) is the number of the computed singular
227 *> values that are larger than the underflow threshold.
228 *> WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
229 *> rotations needed for numerical convergence.
230 *> WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
231 *> This is useful information in cases when DGESVJ did
232 *> not converge, as it can be used to estimate whether
233 *> the output is stil useful and for post festum analysis.
234 *> WORK(6) = the largest absolute value over all sines of the
235 *> Jacobi rotation angles in the last sweep. It can be
236 *> useful for a post festum analysis.
237 *> \endverbatim
238 *>
239 *> \param[in] LWORK
240 *> \verbatim
241 *> LWORK is INTEGER
242 *> length of WORK, WORK >= MAX(6,M+N)
243 *> \endverbatim
244 *>
245 *> \param[out] INFO
246 *> \verbatim
247 *> INFO is INTEGER
248 *> = 0 : successful exit.
249 *> < 0 : if INFO = -i, then the i-th argument had an illegal value
250 *> > 0 : DGESVJ did not converge in the maximal allowed number (30)
251 *> of sweeps. The output may still be useful. See the
252 *> description of WORK.
253 *> \endverbatim
254 *
255 * Authors:
256 * ========
257 *
258 *> \author Univ. of Tennessee
259 *> \author Univ. of California Berkeley
260 *> \author Univ. of Colorado Denver
261 *> \author NAG Ltd.
262 *
263 *> \date November 2015
264 *
265 *> \ingroup doubleGEcomputational
266 *
267 *> \par Further Details:
268 * =====================
269 *>
270 *> \verbatim
271 *>
272 *> The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
273 *> rotations. The rotations are implemented as fast scaled rotations of
274 *> Anda and Park [1]. In the case of underflow of the Jacobi angle, a
275 *> modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
276 *> column interchanges of de Rijk [2]. The relative accuracy of the computed
277 *> singular values and the accuracy of the computed singular vectors (in
278 *> angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
279 *> The condition number that determines the accuracy in the full rank case
280 *> is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
281 *> spectral condition number. The best performance of this Jacobi SVD
282 *> procedure is achieved if used in an accelerated version of Drmac and
283 *> Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
284 *> Some tunning parameters (marked with [TP]) are available for the
285 *> implementer.
286 *> The computational range for the nonzero singular values is the machine
287 *> number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
288 *> denormalized singular values can be computed with the corresponding
289 *> gradual loss of accurate digits.
290 *> \endverbatim
291 *
292 *> \par Contributors:
293 * ==================
294 *>
295 *> \verbatim
296 *>
297 *> ============
298 *>
299 *> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
300 *> \endverbatim
301 *
302 *> \par References:
303 * ================
304 *>
305 *> \verbatim
306 *>
307 *> [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
308 *> SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
309 *> [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
310 *> singular value decomposition on a vector computer.
311 *> SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
312 *> [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
313 *> [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
314 *> value computation in floating point arithmetic.
315 *> SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
316 *> [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
317 *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
318 *> LAPACK Working note 169.
319 *> [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
320 *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
321 *> LAPACK Working note 170.
322 *> [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
323 *> QSVD, (H,K)-SVD computations.
324 *> Department of Mathematics, University of Zagreb, 2008.
325 *> \endverbatim
326 *
327 *> \par Bugs, examples and comments:
328 * =================================
329 *>
330 *> \verbatim
331 *> ===========================
332 *> Please report all bugs and send interesting test examples and comments to
333 *> drmac@math.hr. Thank you.
334 *> \endverbatim
335 *>
336 * =====================================================================
337  SUBROUTINE dgesvj( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
338  $ ldv, work, lwork, info )
339 *
340 * -- LAPACK computational routine (version 3.6.0) --
341 * -- LAPACK is a software package provided by Univ. of Tennessee, --
342 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
343 * November 2015
344 *
345 * .. Scalar Arguments ..
346  INTEGER INFO, LDA, LDV, LWORK, M, MV, N
347  CHARACTER*1 JOBA, JOBU, JOBV
348 * ..
349 * .. Array Arguments ..
350  DOUBLE PRECISION A( lda, * ), SVA( n ), V( ldv, * ),
351  $ work( lwork )
352 * ..
353 *
354 * =====================================================================
355 *
356 * .. Local Parameters ..
357  DOUBLE PRECISION ZERO, HALF, ONE
358  parameter ( zero = 0.0d0, half = 0.5d0, one = 1.0d0)
359  INTEGER NSWEEP
360  parameter ( nsweep = 30 )
361 * ..
362 * .. Local Scalars ..
363  DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
364  $ bigtheta, cs, ctol, epsln, large, mxaapq,
365  $ mxsinj, rootbig, rooteps, rootsfmin, roottol,
366  $ skl, sfmin, small, sn, t, temp1, theta,
367  $ thsign, tol
368  INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
369  $ iswrot, jbc, jgl, kbl, lkahead, mvl, n2, n34,
370  $ n4, nbl, notrot, p, pskipped, q, rowskip,
371  $ swband
372  LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
373  $ rsvec, uctol, upper
374 * ..
375 * .. Local Arrays ..
376  DOUBLE PRECISION FASTR( 5 )
377 * ..
378 * .. Intrinsic Functions ..
379  INTRINSIC dabs, max, min, dble, dsign, dsqrt
380 * ..
381 * .. External Functions ..
382 * ..
383 * from BLAS
384  DOUBLE PRECISION DDOT, DNRM2
385  EXTERNAL ddot, dnrm2
386  INTEGER IDAMAX
387  EXTERNAL idamax
388 * from LAPACK
389  DOUBLE PRECISION DLAMCH
390  EXTERNAL dlamch
391  LOGICAL LSAME
392  EXTERNAL lsame
393 * ..
394 * .. External Subroutines ..
395 * ..
396 * from BLAS
397  EXTERNAL daxpy, dcopy, drotm, dscal, dswap
398 * from LAPACK
399  EXTERNAL dlascl, dlaset, dlassq, xerbla
400 *
401  EXTERNAL dgsvj0, dgsvj1
402 * ..
403 * .. Executable Statements ..
404 *
405 * Test the input arguments
406 *
407  lsvec = lsame( jobu, 'U' )
408  uctol = lsame( jobu, 'C' )
409  rsvec = lsame( jobv, 'V' )
410  applv = lsame( jobv, 'A' )
411  upper = lsame( joba, 'U' )
412  lower = lsame( joba, 'L' )
413 *
414  IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
415  info = -1
416  ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
417  info = -2
418  ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
419  info = -3
420  ELSE IF( m.LT.0 ) THEN
421  info = -4
422  ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
423  info = -5
424  ELSE IF( lda.LT.m ) THEN
425  info = -7
426  ELSE IF( mv.LT.0 ) THEN
427  info = -9
428  ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
429  $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
430  info = -11
431  ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
432  info = -12
433  ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
434  info = -13
435  ELSE
436  info = 0
437  END IF
438 *
439 * #:(
440  IF( info.NE.0 ) THEN
441  CALL xerbla( 'DGESVJ', -info )
442  RETURN
443  END IF
444 *
445 * #:) Quick return for void matrix
446 *
447  IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
448 *
449 * Set numerical parameters
450 * The stopping criterion for Jacobi rotations is
451 *
452 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
453 *
454 * where EPS is the round-off and CTOL is defined as follows:
455 *
456  IF( uctol ) THEN
457 * ... user controlled
458  ctol = work( 1 )
459  ELSE
460 * ... default
461  IF( lsvec .OR. rsvec .OR. applv ) THEN
462  ctol = dsqrt( dble( m ) )
463  ELSE
464  ctol = dble( m )
465  END IF
466  END IF
467 * ... and the machine dependent parameters are
468 *[!] (Make sure that DLAMCH() works properly on the target machine.)
469 *
470  epsln = dlamch( 'Epsilon' )
471  rooteps = dsqrt( epsln )
472  sfmin = dlamch( 'SafeMinimum' )
473  rootsfmin = dsqrt( sfmin )
474  small = sfmin / epsln
475  big = dlamch( 'Overflow' )
476 * BIG = ONE / SFMIN
477  rootbig = one / rootsfmin
478  large = big / dsqrt( dble( m*n ) )
479  bigtheta = one / rooteps
480 *
481  tol = ctol*epsln
482  roottol = dsqrt( tol )
483 *
484  IF( dble( m )*epsln.GE.one ) THEN
485  info = -4
486  CALL xerbla( 'DGESVJ', -info )
487  RETURN
488  END IF
489 *
490 * Initialize the right singular vector matrix.
491 *
492  IF( rsvec ) THEN
493  mvl = n
494  CALL dlaset( 'A', mvl, n, zero, one, v, ldv )
495  ELSE IF( applv ) THEN
496  mvl = mv
497  END IF
498  rsvec = rsvec .OR. applv
499 *
500 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
501 *(!) If necessary, scale A to protect the largest singular value
502 * from overflow. It is possible that saving the largest singular
503 * value destroys the information about the small ones.
504 * This initial scaling is almost minimal in the sense that the
505 * goal is to make sure that no column norm overflows, and that
506 * DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
507 * in A are detected, the procedure returns with INFO=-6.
508 *
509  skl= one / dsqrt( dble( m )*dble( n ) )
510  noscale = .true.
511  goscale = .true.
512 *
513  IF( lower ) THEN
514 * the input matrix is M-by-N lower triangular (trapezoidal)
515  DO 1874 p = 1, n
516  aapp = zero
517  aaqq = one
518  CALL dlassq( m-p+1, a( p, p ), 1, aapp, aaqq )
519  IF( aapp.GT.big ) THEN
520  info = -6
521  CALL xerbla( 'DGESVJ', -info )
522  RETURN
523  END IF
524  aaqq = dsqrt( aaqq )
525  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
526  sva( p ) = aapp*aaqq
527  ELSE
528  noscale = .false.
529  sva( p ) = aapp*( aaqq*skl)
530  IF( goscale ) THEN
531  goscale = .false.
532  DO 1873 q = 1, p - 1
533  sva( q ) = sva( q )*skl
534  1873 CONTINUE
535  END IF
536  END IF
537  1874 CONTINUE
538  ELSE IF( upper ) THEN
539 * the input matrix is M-by-N upper triangular (trapezoidal)
540  DO 2874 p = 1, n
541  aapp = zero
542  aaqq = one
543  CALL dlassq( p, a( 1, p ), 1, aapp, aaqq )
544  IF( aapp.GT.big ) THEN
545  info = -6
546  CALL xerbla( 'DGESVJ', -info )
547  RETURN
548  END IF
549  aaqq = dsqrt( aaqq )
550  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
551  sva( p ) = aapp*aaqq
552  ELSE
553  noscale = .false.
554  sva( p ) = aapp*( aaqq*skl)
555  IF( goscale ) THEN
556  goscale = .false.
557  DO 2873 q = 1, p - 1
558  sva( q ) = sva( q )*skl
559  2873 CONTINUE
560  END IF
561  END IF
562  2874 CONTINUE
563  ELSE
564 * the input matrix is M-by-N general dense
565  DO 3874 p = 1, n
566  aapp = zero
567  aaqq = one
568  CALL dlassq( m, a( 1, p ), 1, aapp, aaqq )
569  IF( aapp.GT.big ) THEN
570  info = -6
571  CALL xerbla( 'DGESVJ', -info )
572  RETURN
573  END IF
574  aaqq = dsqrt( aaqq )
575  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
576  sva( p ) = aapp*aaqq
577  ELSE
578  noscale = .false.
579  sva( p ) = aapp*( aaqq*skl)
580  IF( goscale ) THEN
581  goscale = .false.
582  DO 3873 q = 1, p - 1
583  sva( q ) = sva( q )*skl
584  3873 CONTINUE
585  END IF
586  END IF
587  3874 CONTINUE
588  END IF
589 *
590  IF( noscale )skl= one
591 *
592 * Move the smaller part of the spectrum from the underflow threshold
593 *(!) Start by determining the position of the nonzero entries of the
594 * array SVA() relative to ( SFMIN, BIG ).
595 *
596  aapp = zero
597  aaqq = big
598  DO 4781 p = 1, n
599  IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
600  aapp = max( aapp, sva( p ) )
601  4781 CONTINUE
602 *
603 * #:) Quick return for zero matrix
604 *
605  IF( aapp.EQ.zero ) THEN
606  IF( lsvec )CALL dlaset( 'G', m, n, zero, one, a, lda )
607  work( 1 ) = one
608  work( 2 ) = zero
609  work( 3 ) = zero
610  work( 4 ) = zero
611  work( 5 ) = zero
612  work( 6 ) = zero
613  RETURN
614  END IF
615 *
616 * #:) Quick return for one-column matrix
617 *
618  IF( n.EQ.1 ) THEN
619  IF( lsvec )CALL dlascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
620  $ a( 1, 1 ), lda, ierr )
621  work( 1 ) = one / skl
622  IF( sva( 1 ).GE.sfmin ) THEN
623  work( 2 ) = one
624  ELSE
625  work( 2 ) = zero
626  END IF
627  work( 3 ) = zero
628  work( 4 ) = zero
629  work( 5 ) = zero
630  work( 6 ) = zero
631  RETURN
632  END IF
633 *
634 * Protect small singular values from underflow, and try to
635 * avoid underflows/overflows in computing Jacobi rotations.
636 *
637  sn = dsqrt( sfmin / epsln )
638  temp1 = dsqrt( big / dble( n ) )
639  IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
640  $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
641  temp1 = min( big, temp1 / aapp )
642 * AAQQ = AAQQ*TEMP1
643 * AAPP = AAPP*TEMP1
644  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
645  temp1 = min( sn / aaqq, big / ( aapp*dsqrt( dble( n ) ) ) )
646 * AAQQ = AAQQ*TEMP1
647 * AAPP = AAPP*TEMP1
648  ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
649  temp1 = max( sn / aaqq, temp1 / aapp )
650 * AAQQ = AAQQ*TEMP1
651 * AAPP = AAPP*TEMP1
652  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
653  temp1 = min( sn / aaqq, big / ( dsqrt( dble( n ) )*aapp ) )
654 * AAQQ = AAQQ*TEMP1
655 * AAPP = AAPP*TEMP1
656  ELSE
657  temp1 = one
658  END IF
659 *
660 * Scale, if necessary
661 *
662  IF( temp1.NE.one ) THEN
663  CALL dlascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
664  END IF
665  skl= temp1*skl
666  IF( skl.NE.one ) THEN
667  CALL dlascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
668  skl= one / skl
669  END IF
670 *
671 * Row-cyclic Jacobi SVD algorithm with column pivoting
672 *
673  emptsw = ( n*( n-1 ) ) / 2
674  notrot = 0
675  fastr( 1 ) = zero
676 *
677 * A is represented in factored form A = A * diag(WORK), where diag(WORK)
678 * is initialized to identity. WORK is updated during fast scaled
679 * rotations.
680 *
681  DO 1868 q = 1, n
682  work( q ) = one
683  1868 CONTINUE
684 *
685 *
686  swband = 3
687 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
688 * if DGESVJ is used as a computational routine in the preconditioned
689 * Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
690 * works on pivots inside a band-like region around the diagonal.
691 * The boundaries are determined dynamically, based on the number of
692 * pivots above a threshold.
693 *
694  kbl = min( 8, n )
695 *[TP] KBL is a tuning parameter that defines the tile size in the
696 * tiling of the p-q loops of pivot pairs. In general, an optimal
697 * value of KBL depends on the matrix dimensions and on the
698 * parameters of the computer's memory.
699 *
700  nbl = n / kbl
701  IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
702 *
703  blskip = kbl**2
704 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
705 *
706  rowskip = min( 5, kbl )
707 *[TP] ROWSKIP is a tuning parameter.
708 *
709  lkahead = 1
710 *[TP] LKAHEAD is a tuning parameter.
711 *
712 * Quasi block transformations, using the lower (upper) triangular
713 * structure of the input matrix. The quasi-block-cycling usually
714 * invokes cubic convergence. Big part of this cycle is done inside
715 * canonical subspaces of dimensions less than M.
716 *
717  IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
718 *[TP] The number of partition levels and the actual partition are
719 * tuning parameters.
720  n4 = n / 4
721  n2 = n / 2
722  n34 = 3*n4
723  IF( applv ) THEN
724  q = 0
725  ELSE
726  q = 1
727  END IF
728 *
729  IF( lower ) THEN
730 *
731 * This works very well on lower triangular matrices, in particular
732 * in the framework of the preconditioned Jacobi SVD (xGEJSV).
733 * The idea is simple:
734 * [+ 0 0 0] Note that Jacobi transformations of [0 0]
735 * [+ + 0 0] [0 0]
736 * [+ + x 0] actually work on [x 0] [x 0]
737 * [+ + x x] [x x]. [x x]
738 *
739  CALL dgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
740  $ work( n34+1 ), sva( n34+1 ), mvl,
741  $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
742  $ 2, work( n+1 ), lwork-n, ierr )
743 *
744  CALL dgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
745  $ work( n2+1 ), sva( n2+1 ), mvl,
746  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
747  $ work( n+1 ), lwork-n, ierr )
748 *
749  CALL dgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
750  $ work( n2+1 ), sva( n2+1 ), mvl,
751  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
752  $ work( n+1 ), lwork-n, ierr )
753 *
754  CALL dgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
755  $ work( n4+1 ), sva( n4+1 ), mvl,
756  $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
757  $ work( n+1 ), lwork-n, ierr )
758 *
759  CALL dgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
760  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
761  $ ierr )
762 *
763  CALL dgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
764  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
765  $ lwork-n, ierr )
766 *
767 *
768  ELSE IF( upper ) THEN
769 *
770 *
771  CALL dgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
772  $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
773  $ ierr )
774 *
775  CALL dgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
776  $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
777  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
778  $ ierr )
779 *
780  CALL dgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
781  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
782  $ lwork-n, ierr )
783 *
784  CALL dgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
785  $ work( n2+1 ), sva( n2+1 ), mvl,
786  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
787  $ work( n+1 ), lwork-n, ierr )
788 
789  END IF
790 *
791  END IF
792 *
793 * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
794 *
795  DO 1993 i = 1, nsweep
796 *
797 * .. go go go ...
798 *
799  mxaapq = zero
800  mxsinj = zero
801  iswrot = 0
802 *
803  notrot = 0
804  pskipped = 0
805 *
806 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
807 * 1 <= p < q <= N. This is the first step toward a blocked implementation
808 * of the rotations. New implementation, based on block transformations,
809 * is under development.
810 *
811  DO 2000 ibr = 1, nbl
812 *
813  igl = ( ibr-1 )*kbl + 1
814 *
815  DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
816 *
817  igl = igl + ir1*kbl
818 *
819  DO 2001 p = igl, min( igl+kbl-1, n-1 )
820 *
821 * .. de Rijk's pivoting
822 *
823  q = idamax( n-p+1, sva( p ), 1 ) + p - 1
824  IF( p.NE.q ) THEN
825  CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
826  IF( rsvec )CALL dswap( mvl, v( 1, p ), 1,
827  $ v( 1, q ), 1 )
828  temp1 = sva( p )
829  sva( p ) = sva( q )
830  sva( q ) = temp1
831  temp1 = work( p )
832  work( p ) = work( q )
833  work( q ) = temp1
834  END IF
835 *
836  IF( ir1.EQ.0 ) THEN
837 *
838 * Column norms are periodically updated by explicit
839 * norm computation.
840 * Caveat:
841 * Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
842 * as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
843 * overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
844 * underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
845 * Hence, DNRM2 cannot be trusted, not even in the case when
846 * the true norm is far from the under(over)flow boundaries.
847 * If properly implemented DNRM2 is available, the IF-THEN-ELSE
848 * below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
849 *
850  IF( ( sva( p ).LT.rootbig ) .AND.
851  $ ( sva( p ).GT.rootsfmin ) ) THEN
852  sva( p ) = dnrm2( m, a( 1, p ), 1 )*work( p )
853  ELSE
854  temp1 = zero
855  aapp = one
856  CALL dlassq( m, a( 1, p ), 1, temp1, aapp )
857  sva( p ) = temp1*dsqrt( aapp )*work( p )
858  END IF
859  aapp = sva( p )
860  ELSE
861  aapp = sva( p )
862  END IF
863 *
864  IF( aapp.GT.zero ) THEN
865 *
866  pskipped = 0
867 *
868  DO 2002 q = p + 1, min( igl+kbl-1, n )
869 *
870  aaqq = sva( q )
871 *
872  IF( aaqq.GT.zero ) THEN
873 *
874  aapp0 = aapp
875  IF( aaqq.GE.one ) THEN
876  rotok = ( small*aapp ).LE.aaqq
877  IF( aapp.LT.( big / aaqq ) ) THEN
878  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
879  $ q ), 1 )*work( p )*work( q ) /
880  $ aaqq ) / aapp
881  ELSE
882  CALL dcopy( m, a( 1, p ), 1,
883  $ work( n+1 ), 1 )
884  CALL dlascl( 'G', 0, 0, aapp,
885  $ work( p ), m, 1,
886  $ work( n+1 ), lda, ierr )
887  aapq = ddot( m, work( n+1 ), 1,
888  $ a( 1, q ), 1 )*work( q ) / aaqq
889  END IF
890  ELSE
891  rotok = aapp.LE.( aaqq / small )
892  IF( aapp.GT.( small / aaqq ) ) THEN
893  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
894  $ q ), 1 )*work( p )*work( q ) /
895  $ aaqq ) / aapp
896  ELSE
897  CALL dcopy( m, a( 1, q ), 1,
898  $ work( n+1 ), 1 )
899  CALL dlascl( 'G', 0, 0, aaqq,
900  $ work( q ), m, 1,
901  $ work( n+1 ), lda, ierr )
902  aapq = ddot( m, work( n+1 ), 1,
903  $ a( 1, p ), 1 )*work( p ) / aapp
904  END IF
905  END IF
906 *
907  mxaapq = max( mxaapq, dabs( aapq ) )
908 *
909 * TO rotate or NOT to rotate, THAT is the question ...
910 *
911  IF( dabs( aapq ).GT.tol ) THEN
912 *
913 * .. rotate
914 *[RTD] ROTATED = ROTATED + ONE
915 *
916  IF( ir1.EQ.0 ) THEN
917  notrot = 0
918  pskipped = 0
919  iswrot = iswrot + 1
920  END IF
921 *
922  IF( rotok ) THEN
923 *
924  aqoap = aaqq / aapp
925  apoaq = aapp / aaqq
926  theta = -half*dabs(aqoap-apoaq)/aapq
927 *
928  IF( dabs( theta ).GT.bigtheta ) THEN
929 *
930  t = half / theta
931  fastr( 3 ) = t*work( p ) / work( q )
932  fastr( 4 ) = -t*work( q ) /
933  $ work( p )
934  CALL drotm( m, a( 1, p ), 1,
935  $ a( 1, q ), 1, fastr )
936  IF( rsvec )CALL drotm( mvl,
937  $ v( 1, p ), 1,
938  $ v( 1, q ), 1,
939  $ fastr )
940  sva( q ) = aaqq*dsqrt( max( zero,
941  $ one+t*apoaq*aapq ) )
942  aapp = aapp*dsqrt( max( zero,
943  $ one-t*aqoap*aapq ) )
944  mxsinj = max( mxsinj, dabs( t ) )
945 *
946  ELSE
947 *
948 * .. choose correct signum for THETA and rotate
949 *
950  thsign = -dsign( one, aapq )
951  t = one / ( theta+thsign*
952  $ dsqrt( one+theta*theta ) )
953  cs = dsqrt( one / ( one+t*t ) )
954  sn = t*cs
955 *
956  mxsinj = max( mxsinj, dabs( sn ) )
957  sva( q ) = aaqq*dsqrt( max( zero,
958  $ one+t*apoaq*aapq ) )
959  aapp = aapp*dsqrt( max( zero,
960  $ one-t*aqoap*aapq ) )
961 *
962  apoaq = work( p ) / work( q )
963  aqoap = work( q ) / work( p )
964  IF( work( p ).GE.one ) THEN
965  IF( work( q ).GE.one ) THEN
966  fastr( 3 ) = t*apoaq
967  fastr( 4 ) = -t*aqoap
968  work( p ) = work( p )*cs
969  work( q ) = work( q )*cs
970  CALL drotm( m, a( 1, p ), 1,
971  $ a( 1, q ), 1,
972  $ fastr )
973  IF( rsvec )CALL drotm( mvl,
974  $ v( 1, p ), 1, v( 1, q ),
975  $ 1, fastr )
976  ELSE
977  CALL daxpy( m, -t*aqoap,
978  $ a( 1, q ), 1,
979  $ a( 1, p ), 1 )
980  CALL daxpy( m, cs*sn*apoaq,
981  $ a( 1, p ), 1,
982  $ a( 1, q ), 1 )
983  work( p ) = work( p )*cs
984  work( q ) = work( q ) / cs
985  IF( rsvec ) THEN
986  CALL daxpy( mvl, -t*aqoap,
987  $ v( 1, q ), 1,
988  $ v( 1, p ), 1 )
989  CALL daxpy( mvl,
990  $ cs*sn*apoaq,
991  $ v( 1, p ), 1,
992  $ v( 1, q ), 1 )
993  END IF
994  END IF
995  ELSE
996  IF( work( q ).GE.one ) THEN
997  CALL daxpy( m, t*apoaq,
998  $ a( 1, p ), 1,
999  $ a( 1, q ), 1 )
1000  CALL daxpy( m, -cs*sn*aqoap,
1001  $ a( 1, q ), 1,
1002  $ a( 1, p ), 1 )
1003  work( p ) = work( p ) / cs
1004  work( q ) = work( q )*cs
1005  IF( rsvec ) THEN
1006  CALL daxpy( mvl, t*apoaq,
1007  $ v( 1, p ), 1,
1008  $ v( 1, q ), 1 )
1009  CALL daxpy( mvl,
1010  $ -cs*sn*aqoap,
1011  $ v( 1, q ), 1,
1012  $ v( 1, p ), 1 )
1013  END IF
1014  ELSE
1015  IF( work( p ).GE.work( q ) )
1016  $ THEN
1017  CALL daxpy( m, -t*aqoap,
1018  $ a( 1, q ), 1,
1019  $ a( 1, p ), 1 )
1020  CALL daxpy( m, cs*sn*apoaq,
1021  $ a( 1, p ), 1,
1022  $ a( 1, q ), 1 )
1023  work( p ) = work( p )*cs
1024  work( q ) = work( q ) / cs
1025  IF( rsvec ) THEN
1026  CALL daxpy( mvl,
1027  $ -t*aqoap,
1028  $ v( 1, q ), 1,
1029  $ v( 1, p ), 1 )
1030  CALL daxpy( mvl,
1031  $ cs*sn*apoaq,
1032  $ v( 1, p ), 1,
1033  $ v( 1, q ), 1 )
1034  END IF
1035  ELSE
1036  CALL daxpy( m, t*apoaq,
1037  $ a( 1, p ), 1,
1038  $ a( 1, q ), 1 )
1039  CALL daxpy( m,
1040  $ -cs*sn*aqoap,
1041  $ a( 1, q ), 1,
1042  $ a( 1, p ), 1 )
1043  work( p ) = work( p ) / cs
1044  work( q ) = work( q )*cs
1045  IF( rsvec ) THEN
1046  CALL daxpy( mvl,
1047  $ t*apoaq, v( 1, p ),
1048  $ 1, v( 1, q ), 1 )
1049  CALL daxpy( mvl,
1050  $ -cs*sn*aqoap,
1051  $ v( 1, q ), 1,
1052  $ v( 1, p ), 1 )
1053  END IF
1054  END IF
1055  END IF
1056  END IF
1057  END IF
1058 *
1059  ELSE
1060 * .. have to use modified Gram-Schmidt like transformation
1061  CALL dcopy( m, a( 1, p ), 1,
1062  $ work( n+1 ), 1 )
1063  CALL dlascl( 'G', 0, 0, aapp, one, m,
1064  $ 1, work( n+1 ), lda,
1065  $ ierr )
1066  CALL dlascl( 'G', 0, 0, aaqq, one, m,
1067  $ 1, a( 1, q ), lda, ierr )
1068  temp1 = -aapq*work( p ) / work( q )
1069  CALL daxpy( m, temp1, work( n+1 ), 1,
1070  $ a( 1, q ), 1 )
1071  CALL dlascl( 'G', 0, 0, one, aaqq, m,
1072  $ 1, a( 1, q ), lda, ierr )
1073  sva( q ) = aaqq*dsqrt( max( zero,
1074  $ one-aapq*aapq ) )
1075  mxsinj = max( mxsinj, sfmin )
1076  END IF
1077 * END IF ROTOK THEN ... ELSE
1078 *
1079 * In the case of cancellation in updating SVA(q), SVA(p)
1080 * recompute SVA(q), SVA(p).
1081 *
1082  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1083  $ THEN
1084  IF( ( aaqq.LT.rootbig ) .AND.
1085  $ ( aaqq.GT.rootsfmin ) ) THEN
1086  sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1087  $ work( q )
1088  ELSE
1089  t = zero
1090  aaqq = one
1091  CALL dlassq( m, a( 1, q ), 1, t,
1092  $ aaqq )
1093  sva( q ) = t*dsqrt( aaqq )*work( q )
1094  END IF
1095  END IF
1096  IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1097  IF( ( aapp.LT.rootbig ) .AND.
1098  $ ( aapp.GT.rootsfmin ) ) THEN
1099  aapp = dnrm2( m, a( 1, p ), 1 )*
1100  $ work( p )
1101  ELSE
1102  t = zero
1103  aapp = one
1104  CALL dlassq( m, a( 1, p ), 1, t,
1105  $ aapp )
1106  aapp = t*dsqrt( aapp )*work( p )
1107  END IF
1108  sva( p ) = aapp
1109  END IF
1110 *
1111  ELSE
1112 * A(:,p) and A(:,q) already numerically orthogonal
1113  IF( ir1.EQ.0 )notrot = notrot + 1
1114 *[RTD] SKIPPED = SKIPPED + 1
1115  pskipped = pskipped + 1
1116  END IF
1117  ELSE
1118 * A(:,q) is zero column
1119  IF( ir1.EQ.0 )notrot = notrot + 1
1120  pskipped = pskipped + 1
1121  END IF
1122 *
1123  IF( ( i.LE.swband ) .AND.
1124  $ ( pskipped.GT.rowskip ) ) THEN
1125  IF( ir1.EQ.0 )aapp = -aapp
1126  notrot = 0
1127  GO TO 2103
1128  END IF
1129 *
1130  2002 CONTINUE
1131 * END q-LOOP
1132 *
1133  2103 CONTINUE
1134 * bailed out of q-loop
1135 *
1136  sva( p ) = aapp
1137 *
1138  ELSE
1139  sva( p ) = aapp
1140  IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1141  $ notrot = notrot + min( igl+kbl-1, n ) - p
1142  END IF
1143 *
1144  2001 CONTINUE
1145 * end of the p-loop
1146 * end of doing the block ( ibr, ibr )
1147  1002 CONTINUE
1148 * end of ir1-loop
1149 *
1150 * ... go to the off diagonal blocks
1151 *
1152  igl = ( ibr-1 )*kbl + 1
1153 *
1154  DO 2010 jbc = ibr + 1, nbl
1155 *
1156  jgl = ( jbc-1 )*kbl + 1
1157 *
1158 * doing the block at ( ibr, jbc )
1159 *
1160  ijblsk = 0
1161  DO 2100 p = igl, min( igl+kbl-1, n )
1162 *
1163  aapp = sva( p )
1164  IF( aapp.GT.zero ) THEN
1165 *
1166  pskipped = 0
1167 *
1168  DO 2200 q = jgl, min( jgl+kbl-1, n )
1169 *
1170  aaqq = sva( q )
1171  IF( aaqq.GT.zero ) THEN
1172  aapp0 = aapp
1173 *
1174 * .. M x 2 Jacobi SVD ..
1175 *
1176 * Safe Gram matrix computation
1177 *
1178  IF( aaqq.GE.one ) THEN
1179  IF( aapp.GE.aaqq ) THEN
1180  rotok = ( small*aapp ).LE.aaqq
1181  ELSE
1182  rotok = ( small*aaqq ).LE.aapp
1183  END IF
1184  IF( aapp.LT.( big / aaqq ) ) THEN
1185  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1186  $ q ), 1 )*work( p )*work( q ) /
1187  $ aaqq ) / aapp
1188  ELSE
1189  CALL dcopy( m, a( 1, p ), 1,
1190  $ work( n+1 ), 1 )
1191  CALL dlascl( 'G', 0, 0, aapp,
1192  $ work( p ), m, 1,
1193  $ work( n+1 ), lda, ierr )
1194  aapq = ddot( m, work( n+1 ), 1,
1195  $ a( 1, q ), 1 )*work( q ) / aaqq
1196  END IF
1197  ELSE
1198  IF( aapp.GE.aaqq ) THEN
1199  rotok = aapp.LE.( aaqq / small )
1200  ELSE
1201  rotok = aaqq.LE.( aapp / small )
1202  END IF
1203  IF( aapp.GT.( small / aaqq ) ) THEN
1204  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1205  $ q ), 1 )*work( p )*work( q ) /
1206  $ aaqq ) / aapp
1207  ELSE
1208  CALL dcopy( m, a( 1, q ), 1,
1209  $ work( n+1 ), 1 )
1210  CALL dlascl( 'G', 0, 0, aaqq,
1211  $ work( q ), m, 1,
1212  $ work( n+1 ), lda, ierr )
1213  aapq = ddot( m, work( n+1 ), 1,
1214  $ a( 1, p ), 1 )*work( p ) / aapp
1215  END IF
1216  END IF
1217 *
1218  mxaapq = max( mxaapq, dabs( aapq ) )
1219 *
1220 * TO rotate or NOT to rotate, THAT is the question ...
1221 *
1222  IF( dabs( aapq ).GT.tol ) THEN
1223  notrot = 0
1224 *[RTD] ROTATED = ROTATED + 1
1225  pskipped = 0
1226  iswrot = iswrot + 1
1227 *
1228  IF( rotok ) THEN
1229 *
1230  aqoap = aaqq / aapp
1231  apoaq = aapp / aaqq
1232  theta = -half*dabs(aqoap-apoaq)/aapq
1233  IF( aaqq.GT.aapp0 )theta = -theta
1234 *
1235  IF( dabs( theta ).GT.bigtheta ) THEN
1236  t = half / theta
1237  fastr( 3 ) = t*work( p ) / work( q )
1238  fastr( 4 ) = -t*work( q ) /
1239  $ work( p )
1240  CALL drotm( m, a( 1, p ), 1,
1241  $ a( 1, q ), 1, fastr )
1242  IF( rsvec )CALL drotm( mvl,
1243  $ v( 1, p ), 1,
1244  $ v( 1, q ), 1,
1245  $ fastr )
1246  sva( q ) = aaqq*dsqrt( max( zero,
1247  $ one+t*apoaq*aapq ) )
1248  aapp = aapp*dsqrt( max( zero,
1249  $ one-t*aqoap*aapq ) )
1250  mxsinj = max( mxsinj, dabs( t ) )
1251  ELSE
1252 *
1253 * .. choose correct signum for THETA and rotate
1254 *
1255  thsign = -dsign( one, aapq )
1256  IF( aaqq.GT.aapp0 )thsign = -thsign
1257  t = one / ( theta+thsign*
1258  $ dsqrt( one+theta*theta ) )
1259  cs = dsqrt( one / ( one+t*t ) )
1260  sn = t*cs
1261  mxsinj = max( mxsinj, dabs( sn ) )
1262  sva( q ) = aaqq*dsqrt( max( zero,
1263  $ one+t*apoaq*aapq ) )
1264  aapp = aapp*dsqrt( max( zero,
1265  $ one-t*aqoap*aapq ) )
1266 *
1267  apoaq = work( p ) / work( q )
1268  aqoap = work( q ) / work( p )
1269  IF( work( p ).GE.one ) THEN
1270 *
1271  IF( work( q ).GE.one ) THEN
1272  fastr( 3 ) = t*apoaq
1273  fastr( 4 ) = -t*aqoap
1274  work( p ) = work( p )*cs
1275  work( q ) = work( q )*cs
1276  CALL drotm( m, a( 1, p ), 1,
1277  $ a( 1, q ), 1,
1278  $ fastr )
1279  IF( rsvec )CALL drotm( mvl,
1280  $ v( 1, p ), 1, v( 1, q ),
1281  $ 1, fastr )
1282  ELSE
1283  CALL daxpy( m, -t*aqoap,
1284  $ a( 1, q ), 1,
1285  $ a( 1, p ), 1 )
1286  CALL daxpy( m, cs*sn*apoaq,
1287  $ a( 1, p ), 1,
1288  $ a( 1, q ), 1 )
1289  IF( rsvec ) THEN
1290  CALL daxpy( mvl, -t*aqoap,
1291  $ v( 1, q ), 1,
1292  $ v( 1, p ), 1 )
1293  CALL daxpy( mvl,
1294  $ cs*sn*apoaq,
1295  $ v( 1, p ), 1,
1296  $ v( 1, q ), 1 )
1297  END IF
1298  work( p ) = work( p )*cs
1299  work( q ) = work( q ) / cs
1300  END IF
1301  ELSE
1302  IF( work( q ).GE.one ) THEN
1303  CALL daxpy( m, t*apoaq,
1304  $ a( 1, p ), 1,
1305  $ a( 1, q ), 1 )
1306  CALL daxpy( m, -cs*sn*aqoap,
1307  $ a( 1, q ), 1,
1308  $ a( 1, p ), 1 )
1309  IF( rsvec ) THEN
1310  CALL daxpy( mvl, t*apoaq,
1311  $ v( 1, p ), 1,
1312  $ v( 1, q ), 1 )
1313  CALL daxpy( mvl,
1314  $ -cs*sn*aqoap,
1315  $ v( 1, q ), 1,
1316  $ v( 1, p ), 1 )
1317  END IF
1318  work( p ) = work( p ) / cs
1319  work( q ) = work( q )*cs
1320  ELSE
1321  IF( work( p ).GE.work( q ) )
1322  $ THEN
1323  CALL daxpy( m, -t*aqoap,
1324  $ a( 1, q ), 1,
1325  $ a( 1, p ), 1 )
1326  CALL daxpy( m, cs*sn*apoaq,
1327  $ a( 1, p ), 1,
1328  $ a( 1, q ), 1 )
1329  work( p ) = work( p )*cs
1330  work( q ) = work( q ) / cs
1331  IF( rsvec ) THEN
1332  CALL daxpy( mvl,
1333  $ -t*aqoap,
1334  $ v( 1, q ), 1,
1335  $ v( 1, p ), 1 )
1336  CALL daxpy( mvl,
1337  $ cs*sn*apoaq,
1338  $ v( 1, p ), 1,
1339  $ v( 1, q ), 1 )
1340  END IF
1341  ELSE
1342  CALL daxpy( m, t*apoaq,
1343  $ a( 1, p ), 1,
1344  $ a( 1, q ), 1 )
1345  CALL daxpy( m,
1346  $ -cs*sn*aqoap,
1347  $ a( 1, q ), 1,
1348  $ a( 1, p ), 1 )
1349  work( p ) = work( p ) / cs
1350  work( q ) = work( q )*cs
1351  IF( rsvec ) THEN
1352  CALL daxpy( mvl,
1353  $ t*apoaq, v( 1, p ),
1354  $ 1, v( 1, q ), 1 )
1355  CALL daxpy( mvl,
1356  $ -cs*sn*aqoap,
1357  $ v( 1, q ), 1,
1358  $ v( 1, p ), 1 )
1359  END IF
1360  END IF
1361  END IF
1362  END IF
1363  END IF
1364 *
1365  ELSE
1366  IF( aapp.GT.aaqq ) THEN
1367  CALL dcopy( m, a( 1, p ), 1,
1368  $ work( n+1 ), 1 )
1369  CALL dlascl( 'G', 0, 0, aapp, one,
1370  $ m, 1, work( n+1 ), lda,
1371  $ ierr )
1372  CALL dlascl( 'G', 0, 0, aaqq, one,
1373  $ m, 1, a( 1, q ), lda,
1374  $ ierr )
1375  temp1 = -aapq*work( p ) / work( q )
1376  CALL daxpy( m, temp1, work( n+1 ),
1377  $ 1, a( 1, q ), 1 )
1378  CALL dlascl( 'G', 0, 0, one, aaqq,
1379  $ m, 1, a( 1, q ), lda,
1380  $ ierr )
1381  sva( q ) = aaqq*dsqrt( max( zero,
1382  $ one-aapq*aapq ) )
1383  mxsinj = max( mxsinj, sfmin )
1384  ELSE
1385  CALL dcopy( m, a( 1, q ), 1,
1386  $ work( n+1 ), 1 )
1387  CALL dlascl( 'G', 0, 0, aaqq, one,
1388  $ m, 1, work( n+1 ), lda,
1389  $ ierr )
1390  CALL dlascl( 'G', 0, 0, aapp, one,
1391  $ m, 1, a( 1, p ), lda,
1392  $ ierr )
1393  temp1 = -aapq*work( q ) / work( p )
1394  CALL daxpy( m, temp1, work( n+1 ),
1395  $ 1, a( 1, p ), 1 )
1396  CALL dlascl( 'G', 0, 0, one, aapp,
1397  $ m, 1, a( 1, p ), lda,
1398  $ ierr )
1399  sva( p ) = aapp*dsqrt( max( zero,
1400  $ one-aapq*aapq ) )
1401  mxsinj = max( mxsinj, sfmin )
1402  END IF
1403  END IF
1404 * END IF ROTOK THEN ... ELSE
1405 *
1406 * In the case of cancellation in updating SVA(q)
1407 * .. recompute SVA(q)
1408  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1409  $ THEN
1410  IF( ( aaqq.LT.rootbig ) .AND.
1411  $ ( aaqq.GT.rootsfmin ) ) THEN
1412  sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1413  $ work( q )
1414  ELSE
1415  t = zero
1416  aaqq = one
1417  CALL dlassq( m, a( 1, q ), 1, t,
1418  $ aaqq )
1419  sva( q ) = t*dsqrt( aaqq )*work( q )
1420  END IF
1421  END IF
1422  IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1423  IF( ( aapp.LT.rootbig ) .AND.
1424  $ ( aapp.GT.rootsfmin ) ) THEN
1425  aapp = dnrm2( m, a( 1, p ), 1 )*
1426  $ work( p )
1427  ELSE
1428  t = zero
1429  aapp = one
1430  CALL dlassq( m, a( 1, p ), 1, t,
1431  $ aapp )
1432  aapp = t*dsqrt( aapp )*work( p )
1433  END IF
1434  sva( p ) = aapp
1435  END IF
1436 * end of OK rotation
1437  ELSE
1438  notrot = notrot + 1
1439 *[RTD] SKIPPED = SKIPPED + 1
1440  pskipped = pskipped + 1
1441  ijblsk = ijblsk + 1
1442  END IF
1443  ELSE
1444  notrot = notrot + 1
1445  pskipped = pskipped + 1
1446  ijblsk = ijblsk + 1
1447  END IF
1448 *
1449  IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1450  $ THEN
1451  sva( p ) = aapp
1452  notrot = 0
1453  GO TO 2011
1454  END IF
1455  IF( ( i.LE.swband ) .AND.
1456  $ ( pskipped.GT.rowskip ) ) THEN
1457  aapp = -aapp
1458  notrot = 0
1459  GO TO 2203
1460  END IF
1461 *
1462  2200 CONTINUE
1463 * end of the q-loop
1464  2203 CONTINUE
1465 *
1466  sva( p ) = aapp
1467 *
1468  ELSE
1469 *
1470  IF( aapp.EQ.zero )notrot = notrot +
1471  $ min( jgl+kbl-1, n ) - jgl + 1
1472  IF( aapp.LT.zero )notrot = 0
1473 *
1474  END IF
1475 *
1476  2100 CONTINUE
1477 * end of the p-loop
1478  2010 CONTINUE
1479 * end of the jbc-loop
1480  2011 CONTINUE
1481 *2011 bailed out of the jbc-loop
1482  DO 2012 p = igl, min( igl+kbl-1, n )
1483  sva( p ) = dabs( sva( p ) )
1484  2012 CONTINUE
1485 ***
1486  2000 CONTINUE
1487 *2000 :: end of the ibr-loop
1488 *
1489 * .. update SVA(N)
1490  IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1491  $ THEN
1492  sva( n ) = dnrm2( m, a( 1, n ), 1 )*work( n )
1493  ELSE
1494  t = zero
1495  aapp = one
1496  CALL dlassq( m, a( 1, n ), 1, t, aapp )
1497  sva( n ) = t*dsqrt( aapp )*work( n )
1498  END IF
1499 *
1500 * Additional steering devices
1501 *
1502  IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1503  $ ( iswrot.LE.n ) ) )swband = i
1504 *
1505  IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.dsqrt( dble( n ) )*
1506  $ tol ) .AND. ( dble( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1507  GO TO 1994
1508  END IF
1509 *
1510  IF( notrot.GE.emptsw )GO TO 1994
1511 *
1512  1993 CONTINUE
1513 * end i=1:NSWEEP loop
1514 *
1515 * #:( Reaching this point means that the procedure has not converged.
1516  info = nsweep - 1
1517  GO TO 1995
1518 *
1519  1994 CONTINUE
1520 * #:) Reaching this point means numerical convergence after the i-th
1521 * sweep.
1522 *
1523  info = 0
1524 * #:) INFO = 0 confirms successful iterations.
1525  1995 CONTINUE
1526 *
1527 * Sort the singular values and find how many are above
1528 * the underflow threshold.
1529 *
1530  n2 = 0
1531  n4 = 0
1532  DO 5991 p = 1, n - 1
1533  q = idamax( n-p+1, sva( p ), 1 ) + p - 1
1534  IF( p.NE.q ) THEN
1535  temp1 = sva( p )
1536  sva( p ) = sva( q )
1537  sva( q ) = temp1
1538  temp1 = work( p )
1539  work( p ) = work( q )
1540  work( q ) = temp1
1541  CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1542  IF( rsvec )CALL dswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1543  END IF
1544  IF( sva( p ).NE.zero ) THEN
1545  n4 = n4 + 1
1546  IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1547  END IF
1548  5991 CONTINUE
1549  IF( sva( n ).NE.zero ) THEN
1550  n4 = n4 + 1
1551  IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1552  END IF
1553 *
1554 * Normalize the left singular vectors.
1555 *
1556  IF( lsvec .OR. uctol ) THEN
1557  DO 1998 p = 1, n2
1558  CALL dscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1559  1998 CONTINUE
1560  END IF
1561 *
1562 * Scale the product of Jacobi rotations (assemble the fast rotations).
1563 *
1564  IF( rsvec ) THEN
1565  IF( applv ) THEN
1566  DO 2398 p = 1, n
1567  CALL dscal( mvl, work( p ), v( 1, p ), 1 )
1568  2398 CONTINUE
1569  ELSE
1570  DO 2399 p = 1, n
1571  temp1 = one / dnrm2( mvl, v( 1, p ), 1 )
1572  CALL dscal( mvl, temp1, v( 1, p ), 1 )
1573  2399 CONTINUE
1574  END IF
1575  END IF
1576 *
1577 * Undo scaling, if necessary (and possible).
1578  IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl) ) )
1579  $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1580  $ ( sfmin / skl) ) ) ) THEN
1581  DO 2400 p = 1, n
1582  sva( p ) = skl*sva( p )
1583  2400 CONTINUE
1584  skl= one
1585  END IF
1586 *
1587  work( 1 ) = skl
1588 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1589 * then some of the singular values may overflow or underflow and
1590 * the spectrum is given in this factored representation.
1591 *
1592  work( 2 ) = dble( n4 )
1593 * N4 is the number of computed nonzero singular values of A.
1594 *
1595  work( 3 ) = dble( n2 )
1596 * N2 is the number of singular values of A greater than SFMIN.
1597 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1598 * that may carry some information.
1599 *
1600  work( 4 ) = dble( i )
1601 * i is the index of the last sweep before declaring convergence.
1602 *
1603  work( 5 ) = mxaapq
1604 * MXAAPQ is the largest absolute value of scaled pivots in the
1605 * last sweep
1606 *
1607  work( 6 ) = mxsinj
1608 * MXSINJ is the largest absolute value of the sines of Jacobi angles
1609 * in the last sweep
1610 *
1611  RETURN
1612 * ..
1613 * .. END OF DGESVJ
1614 * ..
1615  END
subroutine dlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: dlaset.f:112
subroutine dgsvj0(JOBV, M, N, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
DGSVJ0 pre-processor for the routine dgesvj.
Definition: dgsvj0.f:220
subroutine dcopy(N, DX, INCX, DY, INCY)
DCOPY
Definition: dcopy.f:53
subroutine dgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, WORK, LWORK, INFO)
DGESVJ
Definition: dgesvj.f:339
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:145
subroutine daxpy(N, DA, DX, INCX, DY, INCY)
DAXPY
Definition: daxpy.f:54
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine dswap(N, DX, INCX, DY, INCY)
DSWAP
Definition: dswap.f:53
subroutine drotm(N, DX, INCX, DY, INCY, DPARAM)
DROTM
Definition: drotm.f:100
subroutine dscal(N, DA, DX, INCX)
DSCAL
Definition: dscal.f:55
subroutine dlassq(N, X, INCX, SCALE, SUMSQ)
DLASSQ updates a sum of squares represented in scaled form.
Definition: dlassq.f:105
subroutine dgsvj1(JOBV, M, N, N1, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
DGSVJ1 pre-processor for the routine dgesvj, applies Jacobi rotations targeting only particular pivot...
Definition: dgsvj1.f:238