LAPACK 3.12.0
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches
dgesvj.f
Go to the documentation of this file.
1*> \brief \b DGESVJ
2*
3* =========== DOCUMENTATION ===========
4*
5* Online html documentation available at
6* http://www.netlib.org/lapack/explore-html/
7*
8*> \htmlonly
9*> Download DGESVJ + dependencies
10*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/dgesvj.f">
11*> [TGZ]</a>
12*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/dgesvj.f">
13*> [ZIP]</a>
14*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/dgesvj.f">
15*> [TXT]</a>
16*> \endhtmlonly
17*
18* Definition:
19* ===========
20*
21* SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
22* LDV, WORK, LWORK, INFO )
23*
24* .. Scalar Arguments ..
25* INTEGER INFO, LDA, LDV, LWORK, M, MV, N
26* CHARACTER*1 JOBA, JOBU, JOBV
27* ..
28* .. Array Arguments ..
29* DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
30* $ WORK( LWORK )
31* ..
32*
33*
34*> \par Purpose:
35* =============
36*>
37*> \verbatim
38*>
39*> DGESVJ computes the singular value decomposition (SVD) of a real
40*> M-by-N matrix A, where M >= N. The SVD of A is written as
41*> [++] [xx] [x0] [xx]
42*> A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx]
43*> [++] [xx]
44*> where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
45*> matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
46*> of SIGMA are the singular values of A. The columns of U and V are the
47*> left and the right singular vectors of A, respectively.
48*> DGESVJ can sometimes compute tiny singular values and their singular vectors much
49*> more accurately than other SVD routines, see below under Further Details.
50*> \endverbatim
51*
52* Arguments:
53* ==========
54*
55*> \param[in] JOBA
56*> \verbatim
57*> JOBA is CHARACTER*1
58*> Specifies the structure of A.
59*> = 'L': The input matrix A is lower triangular;
60*> = 'U': The input matrix A is upper triangular;
61*> = 'G': The input matrix A is general M-by-N matrix, M >= N.
62*> \endverbatim
63*>
64*> \param[in] JOBU
65*> \verbatim
66*> JOBU is CHARACTER*1
67*> Specifies whether to compute the left singular vectors
68*> (columns of U):
69*> = 'U': The left singular vectors corresponding to the nonzero
70*> singular values are computed and returned in the leading
71*> columns of A. See more details in the description of A.
72*> The default numerical orthogonality threshold is set to
73*> approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
74*> = 'C': Analogous to JOBU='U', except that user can control the
75*> level of numerical orthogonality of the computed left
76*> singular vectors. TOL can be set to TOL = CTOL*EPS, where
77*> CTOL is given on input in the array WORK.
78*> No CTOL smaller than ONE is allowed. CTOL greater
79*> than 1 / EPS is meaningless. The option 'C'
80*> can be used if M*EPS is satisfactory orthogonality
81*> of the computed left singular vectors, so CTOL=M could
82*> save few sweeps of Jacobi rotations.
83*> See the descriptions of A and WORK(1).
84*> = 'N': The matrix U is not computed. However, see the
85*> description of A.
86*> \endverbatim
87*>
88*> \param[in] JOBV
89*> \verbatim
90*> JOBV is CHARACTER*1
91*> Specifies whether to compute the right singular vectors, that
92*> is, the matrix V:
93*> = 'V': the matrix V is computed and returned in the array V
94*> = 'A': the Jacobi rotations are applied to the MV-by-N
95*> array V. In other words, the right singular vector
96*> matrix V is not computed explicitly, instead it is
97*> applied to an MV-by-N matrix initially stored in the
98*> first MV rows of V.
99*> = 'N': the matrix V is not computed and the array V is not
100*> referenced
101*> \endverbatim
102*>
103*> \param[in] M
104*> \verbatim
105*> M is INTEGER
106*> The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
107*> \endverbatim
108*>
109*> \param[in] N
110*> \verbatim
111*> N is INTEGER
112*> The number of columns of the input matrix A.
113*> M >= N >= 0.
114*> \endverbatim
115*>
116*> \param[in,out] A
117*> \verbatim
118*> A is DOUBLE PRECISION array, dimension (LDA,N)
119*> On entry, the M-by-N matrix A.
120*> On exit :
121*> If JOBU = 'U' .OR. JOBU = 'C' :
122*> If INFO = 0 :
123*> RANKA orthonormal columns of U are returned in the
124*> leading RANKA columns of the array A. Here RANKA <= N
125*> is the number of computed singular values of A that are
126*> above the underflow threshold DLAMCH('S'). The singular
127*> vectors corresponding to underflowed or zero singular
128*> values are not computed. The value of RANKA is returned
129*> in the array WORK as RANKA=NINT(WORK(2)). Also see the
130*> descriptions of SVA and WORK. The computed columns of U
131*> are mutually numerically orthogonal up to approximately
132*> TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'),
133*> see the description of JOBU.
134*> If INFO > 0 :
135*> the procedure DGESVJ did not converge in the given number
136*> of iterations (sweeps). In that case, the computed
137*> columns of U may not be orthogonal up to TOL. The output
138*> U (stored in A), SIGMA (given by the computed singular
139*> values in SVA(1:N)) and V is still a decomposition of the
140*> input matrix A in the sense that the residual
141*> ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
142*>
143*> If JOBU = 'N' :
144*> If INFO = 0 :
145*> Note that the left singular vectors are 'for free' in the
146*> one-sided Jacobi SVD algorithm. However, if only the
147*> singular values are needed, the level of numerical
148*> orthogonality of U is not an issue and iterations are
149*> stopped when the columns of the iterated matrix are
150*> numerically orthogonal up to approximately M*EPS. Thus,
151*> on exit, A contains the columns of U scaled with the
152*> corresponding singular values.
153*> If INFO > 0 :
154*> the procedure DGESVJ did not converge in the given number
155*> of iterations (sweeps).
156*> \endverbatim
157*>
158*> \param[in] LDA
159*> \verbatim
160*> LDA is INTEGER
161*> The leading dimension of the array A. LDA >= max(1,M).
162*> \endverbatim
163*>
164*> \param[out] SVA
165*> \verbatim
166*> SVA is DOUBLE PRECISION array, dimension (N)
167*> On exit :
168*> If INFO = 0 :
169*> depending on the value SCALE = WORK(1), we have:
170*> If SCALE = ONE :
171*> SVA(1:N) contains the computed singular values of A.
172*> During the computation SVA contains the Euclidean column
173*> norms of the iterated matrices in the array A.
174*> If SCALE .NE. ONE :
175*> The singular values of A are SCALE*SVA(1:N), and this
176*> factored representation is due to the fact that some of the
177*> singular values of A might underflow or overflow.
178*> If INFO > 0 :
179*> the procedure DGESVJ did not converge in the given number of
180*> iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
181*> \endverbatim
182*>
183*> \param[in] MV
184*> \verbatim
185*> MV is INTEGER
186*> If JOBV = 'A', then the product of Jacobi rotations in DGESVJ
187*> is applied to the first MV rows of V. See the description of JOBV.
188*> \endverbatim
189*>
190*> \param[in,out] V
191*> \verbatim
192*> V is DOUBLE PRECISION array, dimension (LDV,N)
193*> If JOBV = 'V', then V contains on exit the N-by-N matrix of
194*> the right singular vectors;
195*> If JOBV = 'A', then V contains the product of the computed right
196*> singular vector matrix and the initial matrix in
197*> the array V.
198*> If JOBV = 'N', then V is not referenced.
199*> \endverbatim
200*>
201*> \param[in] LDV
202*> \verbatim
203*> LDV is INTEGER
204*> The leading dimension of the array V, LDV >= 1.
205*> If JOBV = 'V', then LDV >= max(1,N).
206*> If JOBV = 'A', then LDV >= max(1,MV) .
207*> \endverbatim
208*>
209*> \param[in,out] WORK
210*> \verbatim
211*> WORK is DOUBLE PRECISION array, dimension (LWORK)
212*> On entry :
213*> If JOBU = 'C' :
214*> WORK(1) = CTOL, where CTOL defines the threshold for convergence.
215*> The process stops if all columns of A are mutually
216*> orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
217*> It is required that CTOL >= ONE, i.e. it is not
218*> allowed to force the routine to obtain orthogonality
219*> below EPS.
220*> On exit :
221*> WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
222*> are the computed singular values of A.
223*> (See description of SVA().)
224*> WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
225*> singular values.
226*> WORK(3) = NINT(WORK(3)) is the number of the computed singular
227*> values that are larger than the underflow threshold.
228*> WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
229*> rotations needed for numerical convergence.
230*> WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
231*> This is useful information in cases when DGESVJ did
232*> not converge, as it can be used to estimate whether
233*> the output is still useful and for post festum analysis.
234*> WORK(6) = the largest absolute value over all sines of the
235*> Jacobi rotation angles in the last sweep. It can be
236*> useful for a post festum analysis.
237*> \endverbatim
238*>
239*> \param[in] LWORK
240*> \verbatim
241*> LWORK is INTEGER
242*> length of WORK, WORK >= MAX(6,M+N)
243*> \endverbatim
244*>
245*> \param[out] INFO
246*> \verbatim
247*> INFO is INTEGER
248*> = 0: successful exit.
249*> < 0: if INFO = -i, then the i-th argument had an illegal value
250*> > 0: DGESVJ did not converge in the maximal allowed number (30)
251*> of sweeps. The output may still be useful. See the
252*> description of WORK.
253*> \endverbatim
254*
255* Authors:
256* ========
257*
258*> \author Univ. of Tennessee
259*> \author Univ. of California Berkeley
260*> \author Univ. of Colorado Denver
261*> \author NAG Ltd.
262*
263*> \ingroup gesvj
264*
265*> \par Further Details:
266* =====================
267*>
268*> \verbatim
269*>
270*> The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
271*> rotations. The rotations are implemented as fast scaled rotations of
272*> Anda and Park [1]. In the case of underflow of the Jacobi angle, a
273*> modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
274*> column interchanges of de Rijk [2]. The relative accuracy of the computed
275*> singular values and the accuracy of the computed singular vectors (in
276*> angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
277*> The condition number that determines the accuracy in the full rank case
278*> is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
279*> spectral condition number. The best performance of this Jacobi SVD
280*> procedure is achieved if used in an accelerated version of Drmac and
281*> Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
282*> Some tuning parameters (marked with [TP]) are available for the
283*> implementer.
284*> The computational range for the nonzero singular values is the machine
285*> number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
286*> denormalized singular values can be computed with the corresponding
287*> gradual loss of accurate digits.
288*> \endverbatim
289*
290*> \par Contributors:
291* ==================
292*>
293*> \verbatim
294*>
295*> ============
296*>
297*> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
298*> \endverbatim
299*
300*> \par References:
301* ================
302*>
303*> \verbatim
304*>
305*> [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
306*> SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
307*> [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
308*> singular value decomposition on a vector computer.
309*> SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
310*> [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
311*> [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
312*> value computation in floating point arithmetic.
313*> SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
314*> [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
315*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
316*> LAPACK Working note 169.
317*> [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
318*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
319*> LAPACK Working note 170.
320*> [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
321*> QSVD, (H,K)-SVD computations.
322*> Department of Mathematics, University of Zagreb, 2008.
323*> \endverbatim
324*
325*> \par Bugs, examples and comments:
326* =================================
327*>
328*> \verbatim
329*> ===========================
330*> Please report all bugs and send interesting test examples and comments to
331*> drmac@math.hr. Thank you.
332*> \endverbatim
333*>
334* =====================================================================
335 SUBROUTINE dgesvj( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
336 $ LDV, WORK, LWORK, INFO )
337*
338* -- LAPACK computational routine --
339* -- LAPACK is a software package provided by Univ. of Tennessee, --
340* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
341*
342* .. Scalar Arguments ..
343 INTEGER INFO, LDA, LDV, LWORK, M, MV, N
344 CHARACTER*1 JOBA, JOBU, JOBV
345* ..
346* .. Array Arguments ..
347 DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
348 $ work( lwork )
349* ..
350*
351* =====================================================================
352*
353* .. Local Parameters ..
354 DOUBLE PRECISION ZERO, HALF, ONE
355 parameter( zero = 0.0d0, half = 0.5d0, one = 1.0d0)
356 INTEGER NSWEEP
357 parameter( nsweep = 30 )
358* ..
359* .. Local Scalars ..
360 DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
361 $ bigtheta, cs, ctol, epsln, large, mxaapq,
362 $ mxsinj, rootbig, rooteps, rootsfmin, roottol,
363 $ skl, sfmin, small, sn, t, temp1, theta,
364 $ thsign, tol
365 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
366 $ iswrot, jbc, jgl, kbl, lkahead, mvl, n2, n34,
367 $ n4, nbl, notrot, p, pskipped, q, rowskip,
368 $ swband
369 LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
370 $ rsvec, uctol, upper
371* ..
372* .. Local Arrays ..
373 DOUBLE PRECISION FASTR( 5 )
374* ..
375* .. Intrinsic Functions ..
376 INTRINSIC dabs, max, min, dble, dsign, dsqrt
377* ..
378* .. External Functions ..
379* ..
380* from BLAS
381 DOUBLE PRECISION DDOT, DNRM2
382 EXTERNAL ddot, dnrm2
383 INTEGER IDAMAX
384 EXTERNAL idamax
385* from LAPACK
386 DOUBLE PRECISION DLAMCH
387 EXTERNAL dlamch
388 LOGICAL LSAME
389 EXTERNAL lsame
390* ..
391* .. External Subroutines ..
392* ..
393* from BLAS
394 EXTERNAL daxpy, dcopy, drotm, dscal, dswap
395* from LAPACK
396 EXTERNAL dlascl, dlaset, dlassq, xerbla
397*
398 EXTERNAL dgsvj0, dgsvj1
399* ..
400* .. Executable Statements ..
401*
402* Test the input arguments
403*
404 lsvec = lsame( jobu, 'U' )
405 uctol = lsame( jobu, 'C' )
406 rsvec = lsame( jobv, 'V' )
407 applv = lsame( jobv, 'A' )
408 upper = lsame( joba, 'U' )
409 lower = lsame( joba, 'L' )
410*
411 IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
412 info = -1
413 ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
414 info = -2
415 ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
416 info = -3
417 ELSE IF( m.LT.0 ) THEN
418 info = -4
419 ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
420 info = -5
421 ELSE IF( lda.LT.m ) THEN
422 info = -7
423 ELSE IF( mv.LT.0 ) THEN
424 info = -9
425 ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
426 $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
427 info = -11
428 ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
429 info = -12
430 ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
431 info = -13
432 ELSE
433 info = 0
434 END IF
435*
436* #:(
437 IF( info.NE.0 ) THEN
438 CALL xerbla( 'DGESVJ', -info )
439 RETURN
440 END IF
441*
442* #:) Quick return for void matrix
443*
444 IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
445*
446* Set numerical parameters
447* The stopping criterion for Jacobi rotations is
448*
449* max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
450*
451* where EPS is the round-off and CTOL is defined as follows:
452*
453 IF( uctol ) THEN
454* ... user controlled
455 ctol = work( 1 )
456 ELSE
457* ... default
458 IF( lsvec .OR. rsvec .OR. applv ) THEN
459 ctol = dsqrt( dble( m ) )
460 ELSE
461 ctol = dble( m )
462 END IF
463 END IF
464* ... and the machine dependent parameters are
465*[!] (Make sure that DLAMCH() works properly on the target machine.)
466*
467 epsln = dlamch( 'Epsilon' )
468 rooteps = dsqrt( epsln )
469 sfmin = dlamch( 'SafeMinimum' )
470 rootsfmin = dsqrt( sfmin )
471 small = sfmin / epsln
472 big = dlamch( 'Overflow' )
473* BIG = ONE / SFMIN
474 rootbig = one / rootsfmin
475 large = big / dsqrt( dble( m*n ) )
476 bigtheta = one / rooteps
477*
478 tol = ctol*epsln
479 roottol = dsqrt( tol )
480*
481 IF( dble( m )*epsln.GE.one ) THEN
482 info = -4
483 CALL xerbla( 'DGESVJ', -info )
484 RETURN
485 END IF
486*
487* Initialize the right singular vector matrix.
488*
489 IF( rsvec ) THEN
490 mvl = n
491 CALL dlaset( 'A', mvl, n, zero, one, v, ldv )
492 ELSE IF( applv ) THEN
493 mvl = mv
494 END IF
495 rsvec = rsvec .OR. applv
496*
497* Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
498*(!) If necessary, scale A to protect the largest singular value
499* from overflow. It is possible that saving the largest singular
500* value destroys the information about the small ones.
501* This initial scaling is almost minimal in the sense that the
502* goal is to make sure that no column norm overflows, and that
503* DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
504* in A are detected, the procedure returns with INFO=-6.
505*
506 skl= one / dsqrt( dble( m )*dble( n ) )
507 noscale = .true.
508 goscale = .true.
509*
510 IF( lower ) THEN
511* the input matrix is M-by-N lower triangular (trapezoidal)
512 DO 1874 p = 1, n
513 aapp = zero
514 aaqq = one
515 CALL dlassq( m-p+1, a( p, p ), 1, aapp, aaqq )
516 IF( aapp.GT.big ) THEN
517 info = -6
518 CALL xerbla( 'DGESVJ', -info )
519 RETURN
520 END IF
521 aaqq = dsqrt( aaqq )
522 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
523 sva( p ) = aapp*aaqq
524 ELSE
525 noscale = .false.
526 sva( p ) = aapp*( aaqq*skl)
527 IF( goscale ) THEN
528 goscale = .false.
529 DO 1873 q = 1, p - 1
530 sva( q ) = sva( q )*skl
531 1873 CONTINUE
532 END IF
533 END IF
534 1874 CONTINUE
535 ELSE IF( upper ) THEN
536* the input matrix is M-by-N upper triangular (trapezoidal)
537 DO 2874 p = 1, n
538 aapp = zero
539 aaqq = one
540 CALL dlassq( p, a( 1, p ), 1, aapp, aaqq )
541 IF( aapp.GT.big ) THEN
542 info = -6
543 CALL xerbla( 'DGESVJ', -info )
544 RETURN
545 END IF
546 aaqq = dsqrt( aaqq )
547 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
548 sva( p ) = aapp*aaqq
549 ELSE
550 noscale = .false.
551 sva( p ) = aapp*( aaqq*skl)
552 IF( goscale ) THEN
553 goscale = .false.
554 DO 2873 q = 1, p - 1
555 sva( q ) = sva( q )*skl
556 2873 CONTINUE
557 END IF
558 END IF
559 2874 CONTINUE
560 ELSE
561* the input matrix is M-by-N general dense
562 DO 3874 p = 1, n
563 aapp = zero
564 aaqq = one
565 CALL dlassq( m, a( 1, p ), 1, aapp, aaqq )
566 IF( aapp.GT.big ) THEN
567 info = -6
568 CALL xerbla( 'DGESVJ', -info )
569 RETURN
570 END IF
571 aaqq = dsqrt( aaqq )
572 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
573 sva( p ) = aapp*aaqq
574 ELSE
575 noscale = .false.
576 sva( p ) = aapp*( aaqq*skl)
577 IF( goscale ) THEN
578 goscale = .false.
579 DO 3873 q = 1, p - 1
580 sva( q ) = sva( q )*skl
581 3873 CONTINUE
582 END IF
583 END IF
584 3874 CONTINUE
585 END IF
586*
587 IF( noscale )skl= one
588*
589* Move the smaller part of the spectrum from the underflow threshold
590*(!) Start by determining the position of the nonzero entries of the
591* array SVA() relative to ( SFMIN, BIG ).
592*
593 aapp = zero
594 aaqq = big
595 DO 4781 p = 1, n
596 IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
597 aapp = max( aapp, sva( p ) )
598 4781 CONTINUE
599*
600* #:) Quick return for zero matrix
601*
602 IF( aapp.EQ.zero ) THEN
603 IF( lsvec )CALL dlaset( 'G', m, n, zero, one, a, lda )
604 work( 1 ) = one
605 work( 2 ) = zero
606 work( 3 ) = zero
607 work( 4 ) = zero
608 work( 5 ) = zero
609 work( 6 ) = zero
610 RETURN
611 END IF
612*
613* #:) Quick return for one-column matrix
614*
615 IF( n.EQ.1 ) THEN
616 IF( lsvec )CALL dlascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
617 $ a( 1, 1 ), lda, ierr )
618 work( 1 ) = one / skl
619 IF( sva( 1 ).GE.sfmin ) THEN
620 work( 2 ) = one
621 ELSE
622 work( 2 ) = zero
623 END IF
624 work( 3 ) = zero
625 work( 4 ) = zero
626 work( 5 ) = zero
627 work( 6 ) = zero
628 RETURN
629 END IF
630*
631* Protect small singular values from underflow, and try to
632* avoid underflows/overflows in computing Jacobi rotations.
633*
634 sn = dsqrt( sfmin / epsln )
635 temp1 = dsqrt( big / dble( n ) )
636 IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
637 $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
638 temp1 = min( big, temp1 / aapp )
639* AAQQ = AAQQ*TEMP1
640* AAPP = AAPP*TEMP1
641 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
642 temp1 = min( sn / aaqq, big / ( aapp*dsqrt( dble( n ) ) ) )
643* AAQQ = AAQQ*TEMP1
644* AAPP = AAPP*TEMP1
645 ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
646 temp1 = max( sn / aaqq, temp1 / aapp )
647* AAQQ = AAQQ*TEMP1
648* AAPP = AAPP*TEMP1
649 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
650 temp1 = min( sn / aaqq, big / ( dsqrt( dble( n ) )*aapp ) )
651* AAQQ = AAQQ*TEMP1
652* AAPP = AAPP*TEMP1
653 ELSE
654 temp1 = one
655 END IF
656*
657* Scale, if necessary
658*
659 IF( temp1.NE.one ) THEN
660 CALL dlascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
661 END IF
662 skl= temp1*skl
663 IF( skl.NE.one ) THEN
664 CALL dlascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
665 skl= one / skl
666 END IF
667*
668* Row-cyclic Jacobi SVD algorithm with column pivoting
669*
670 emptsw = ( n*( n-1 ) ) / 2
671 notrot = 0
672 fastr( 1 ) = zero
673*
674* A is represented in factored form A = A * diag(WORK), where diag(WORK)
675* is initialized to identity. WORK is updated during fast scaled
676* rotations.
677*
678 DO 1868 q = 1, n
679 work( q ) = one
680 1868 CONTINUE
681*
682*
683 swband = 3
684*[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
685* if DGESVJ is used as a computational routine in the preconditioned
686* Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
687* works on pivots inside a band-like region around the diagonal.
688* The boundaries are determined dynamically, based on the number of
689* pivots above a threshold.
690*
691 kbl = min( 8, n )
692*[TP] KBL is a tuning parameter that defines the tile size in the
693* tiling of the p-q loops of pivot pairs. In general, an optimal
694* value of KBL depends on the matrix dimensions and on the
695* parameters of the computer's memory.
696*
697 nbl = n / kbl
698 IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
699*
700 blskip = kbl**2
701*[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
702*
703 rowskip = min( 5, kbl )
704*[TP] ROWSKIP is a tuning parameter.
705*
706 lkahead = 1
707*[TP] LKAHEAD is a tuning parameter.
708*
709* Quasi block transformations, using the lower (upper) triangular
710* structure of the input matrix. The quasi-block-cycling usually
711* invokes cubic convergence. Big part of this cycle is done inside
712* canonical subspaces of dimensions less than M.
713*
714 IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
715*[TP] The number of partition levels and the actual partition are
716* tuning parameters.
717 n4 = n / 4
718 n2 = n / 2
719 n34 = 3*n4
720 IF( applv ) THEN
721 q = 0
722 ELSE
723 q = 1
724 END IF
725*
726 IF( lower ) THEN
727*
728* This works very well on lower triangular matrices, in particular
729* in the framework of the preconditioned Jacobi SVD (xGEJSV).
730* The idea is simple:
731* [+ 0 0 0] Note that Jacobi transformations of [0 0]
732* [+ + 0 0] [0 0]
733* [+ + x 0] actually work on [x 0] [x 0]
734* [+ + x x] [x x]. [x x]
735*
736 CALL dgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
737 $ work( n34+1 ), sva( n34+1 ), mvl,
738 $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
739 $ 2, work( n+1 ), lwork-n, ierr )
740*
741 CALL dgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
742 $ work( n2+1 ), sva( n2+1 ), mvl,
743 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
744 $ work( n+1 ), lwork-n, ierr )
745*
746 CALL dgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
747 $ work( n2+1 ), sva( n2+1 ), mvl,
748 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
749 $ work( n+1 ), lwork-n, ierr )
750*
751 CALL dgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
752 $ work( n4+1 ), sva( n4+1 ), mvl,
753 $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
754 $ work( n+1 ), lwork-n, ierr )
755*
756 CALL dgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
757 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
758 $ ierr )
759*
760 CALL dgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
761 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
762 $ lwork-n, ierr )
763*
764*
765 ELSE IF( upper ) THEN
766*
767*
768 CALL dgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
769 $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
770 $ ierr )
771*
772 CALL dgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
773 $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
774 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
775 $ ierr )
776*
777 CALL dgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
778 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
779 $ lwork-n, ierr )
780*
781 CALL dgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
782 $ work( n2+1 ), sva( n2+1 ), mvl,
783 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
784 $ work( n+1 ), lwork-n, ierr )
785
786 END IF
787*
788 END IF
789*
790* .. Row-cyclic pivot strategy with de Rijk's pivoting ..
791*
792 DO 1993 i = 1, nsweep
793*
794* .. go go go ...
795*
796 mxaapq = zero
797 mxsinj = zero
798 iswrot = 0
799*
800 notrot = 0
801 pskipped = 0
802*
803* Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
804* 1 <= p < q <= N. This is the first step toward a blocked implementation
805* of the rotations. New implementation, based on block transformations,
806* is under development.
807*
808 DO 2000 ibr = 1, nbl
809*
810 igl = ( ibr-1 )*kbl + 1
811*
812 DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
813*
814 igl = igl + ir1*kbl
815*
816 DO 2001 p = igl, min( igl+kbl-1, n-1 )
817*
818* .. de Rijk's pivoting
819*
820 q = idamax( n-p+1, sva( p ), 1 ) + p - 1
821 IF( p.NE.q ) THEN
822 CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
823 IF( rsvec )CALL dswap( mvl, v( 1, p ), 1,
824 $ v( 1, q ), 1 )
825 temp1 = sva( p )
826 sva( p ) = sva( q )
827 sva( q ) = temp1
828 temp1 = work( p )
829 work( p ) = work( q )
830 work( q ) = temp1
831 END IF
832*
833 IF( ir1.EQ.0 ) THEN
834*
835* Column norms are periodically updated by explicit
836* norm computation.
837* Caveat:
838* Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
839* as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
840* overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
841* underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
842* Hence, DNRM2 cannot be trusted, not even in the case when
843* the true norm is far from the under(over)flow boundaries.
844* If properly implemented DNRM2 is available, the IF-THEN-ELSE
845* below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
846*
847 IF( ( sva( p ).LT.rootbig ) .AND.
848 $ ( sva( p ).GT.rootsfmin ) ) THEN
849 sva( p ) = dnrm2( m, a( 1, p ), 1 )*work( p )
850 ELSE
851 temp1 = zero
852 aapp = one
853 CALL dlassq( m, a( 1, p ), 1, temp1, aapp )
854 sva( p ) = temp1*dsqrt( aapp )*work( p )
855 END IF
856 aapp = sva( p )
857 ELSE
858 aapp = sva( p )
859 END IF
860*
861 IF( aapp.GT.zero ) THEN
862*
863 pskipped = 0
864*
865 DO 2002 q = p + 1, min( igl+kbl-1, n )
866*
867 aaqq = sva( q )
868*
869 IF( aaqq.GT.zero ) THEN
870*
871 aapp0 = aapp
872 IF( aaqq.GE.one ) THEN
873 rotok = ( small*aapp ).LE.aaqq
874 IF( aapp.LT.( big / aaqq ) ) THEN
875 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
876 $ q ), 1 )*work( p )*work( q ) /
877 $ aaqq ) / aapp
878 ELSE
879 CALL dcopy( m, a( 1, p ), 1,
880 $ work( n+1 ), 1 )
881 CALL dlascl( 'G', 0, 0, aapp,
882 $ work( p ), m, 1,
883 $ work( n+1 ), lda, ierr )
884 aapq = ddot( m, work( n+1 ), 1,
885 $ a( 1, q ), 1 )*work( q ) / aaqq
886 END IF
887 ELSE
888 rotok = aapp.LE.( aaqq / small )
889 IF( aapp.GT.( small / aaqq ) ) THEN
890 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
891 $ q ), 1 )*work( p )*work( q ) /
892 $ aaqq ) / aapp
893 ELSE
894 CALL dcopy( m, a( 1, q ), 1,
895 $ work( n+1 ), 1 )
896 CALL dlascl( 'G', 0, 0, aaqq,
897 $ work( q ), m, 1,
898 $ work( n+1 ), lda, ierr )
899 aapq = ddot( m, work( n+1 ), 1,
900 $ a( 1, p ), 1 )*work( p ) / aapp
901 END IF
902 END IF
903*
904 mxaapq = max( mxaapq, dabs( aapq ) )
905*
906* TO rotate or NOT to rotate, THAT is the question ...
907*
908 IF( dabs( aapq ).GT.tol ) THEN
909*
910* .. rotate
911*[RTD] ROTATED = ROTATED + ONE
912*
913 IF( ir1.EQ.0 ) THEN
914 notrot = 0
915 pskipped = 0
916 iswrot = iswrot + 1
917 END IF
918*
919 IF( rotok ) THEN
920*
921 aqoap = aaqq / aapp
922 apoaq = aapp / aaqq
923 theta = -half*dabs(aqoap-apoaq)/aapq
924*
925 IF( dabs( theta ).GT.bigtheta ) THEN
926*
927 t = half / theta
928 fastr( 3 ) = t*work( p ) / work( q )
929 fastr( 4 ) = -t*work( q ) /
930 $ work( p )
931 CALL drotm( m, a( 1, p ), 1,
932 $ a( 1, q ), 1, fastr )
933 IF( rsvec )CALL drotm( mvl,
934 $ v( 1, p ), 1,
935 $ v( 1, q ), 1,
936 $ fastr )
937 sva( q ) = aaqq*dsqrt( max( zero,
938 $ one+t*apoaq*aapq ) )
939 aapp = aapp*dsqrt( max( zero,
940 $ one-t*aqoap*aapq ) )
941 mxsinj = max( mxsinj, dabs( t ) )
942*
943 ELSE
944*
945* .. choose correct signum for THETA and rotate
946*
947 thsign = -dsign( one, aapq )
948 t = one / ( theta+thsign*
949 $ dsqrt( one+theta*theta ) )
950 cs = dsqrt( one / ( one+t*t ) )
951 sn = t*cs
952*
953 mxsinj = max( mxsinj, dabs( sn ) )
954 sva( q ) = aaqq*dsqrt( max( zero,
955 $ one+t*apoaq*aapq ) )
956 aapp = aapp*dsqrt( max( zero,
957 $ one-t*aqoap*aapq ) )
958*
959 apoaq = work( p ) / work( q )
960 aqoap = work( q ) / work( p )
961 IF( work( p ).GE.one ) THEN
962 IF( work( q ).GE.one ) THEN
963 fastr( 3 ) = t*apoaq
964 fastr( 4 ) = -t*aqoap
965 work( p ) = work( p )*cs
966 work( q ) = work( q )*cs
967 CALL drotm( m, a( 1, p ), 1,
968 $ a( 1, q ), 1,
969 $ fastr )
970 IF( rsvec )CALL drotm( mvl,
971 $ v( 1, p ), 1, v( 1, q ),
972 $ 1, fastr )
973 ELSE
974 CALL daxpy( m, -t*aqoap,
975 $ a( 1, q ), 1,
976 $ a( 1, p ), 1 )
977 CALL daxpy( m, cs*sn*apoaq,
978 $ a( 1, p ), 1,
979 $ a( 1, q ), 1 )
980 work( p ) = work( p )*cs
981 work( q ) = work( q ) / cs
982 IF( rsvec ) THEN
983 CALL daxpy( mvl, -t*aqoap,
984 $ v( 1, q ), 1,
985 $ v( 1, p ), 1 )
986 CALL daxpy( mvl,
987 $ cs*sn*apoaq,
988 $ v( 1, p ), 1,
989 $ v( 1, q ), 1 )
990 END IF
991 END IF
992 ELSE
993 IF( work( q ).GE.one ) THEN
994 CALL daxpy( m, t*apoaq,
995 $ a( 1, p ), 1,
996 $ a( 1, q ), 1 )
997 CALL daxpy( m, -cs*sn*aqoap,
998 $ a( 1, q ), 1,
999 $ a( 1, p ), 1 )
1000 work( p ) = work( p ) / cs
1001 work( q ) = work( q )*cs
1002 IF( rsvec ) THEN
1003 CALL daxpy( mvl, t*apoaq,
1004 $ v( 1, p ), 1,
1005 $ v( 1, q ), 1 )
1006 CALL daxpy( mvl,
1007 $ -cs*sn*aqoap,
1008 $ v( 1, q ), 1,
1009 $ v( 1, p ), 1 )
1010 END IF
1011 ELSE
1012 IF( work( p ).GE.work( q ) )
1013 $ THEN
1014 CALL daxpy( m, -t*aqoap,
1015 $ a( 1, q ), 1,
1016 $ a( 1, p ), 1 )
1017 CALL daxpy( m, cs*sn*apoaq,
1018 $ a( 1, p ), 1,
1019 $ a( 1, q ), 1 )
1020 work( p ) = work( p )*cs
1021 work( q ) = work( q ) / cs
1022 IF( rsvec ) THEN
1023 CALL daxpy( mvl,
1024 $ -t*aqoap,
1025 $ v( 1, q ), 1,
1026 $ v( 1, p ), 1 )
1027 CALL daxpy( mvl,
1028 $ cs*sn*apoaq,
1029 $ v( 1, p ), 1,
1030 $ v( 1, q ), 1 )
1031 END IF
1032 ELSE
1033 CALL daxpy( m, t*apoaq,
1034 $ a( 1, p ), 1,
1035 $ a( 1, q ), 1 )
1036 CALL daxpy( m,
1037 $ -cs*sn*aqoap,
1038 $ a( 1, q ), 1,
1039 $ a( 1, p ), 1 )
1040 work( p ) = work( p ) / cs
1041 work( q ) = work( q )*cs
1042 IF( rsvec ) THEN
1043 CALL daxpy( mvl,
1044 $ t*apoaq, v( 1, p ),
1045 $ 1, v( 1, q ), 1 )
1046 CALL daxpy( mvl,
1047 $ -cs*sn*aqoap,
1048 $ v( 1, q ), 1,
1049 $ v( 1, p ), 1 )
1050 END IF
1051 END IF
1052 END IF
1053 END IF
1054 END IF
1055*
1056 ELSE
1057* .. have to use modified Gram-Schmidt like transformation
1058 CALL dcopy( m, a( 1, p ), 1,
1059 $ work( n+1 ), 1 )
1060 CALL dlascl( 'G', 0, 0, aapp, one, m,
1061 $ 1, work( n+1 ), lda,
1062 $ ierr )
1063 CALL dlascl( 'G', 0, 0, aaqq, one, m,
1064 $ 1, a( 1, q ), lda, ierr )
1065 temp1 = -aapq*work( p ) / work( q )
1066 CALL daxpy( m, temp1, work( n+1 ), 1,
1067 $ a( 1, q ), 1 )
1068 CALL dlascl( 'G', 0, 0, one, aaqq, m,
1069 $ 1, a( 1, q ), lda, ierr )
1070 sva( q ) = aaqq*dsqrt( max( zero,
1071 $ one-aapq*aapq ) )
1072 mxsinj = max( mxsinj, sfmin )
1073 END IF
1074* END IF ROTOK THEN ... ELSE
1075*
1076* In the case of cancellation in updating SVA(q), SVA(p)
1077* recompute SVA(q), SVA(p).
1078*
1079 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1080 $ THEN
1081 IF( ( aaqq.LT.rootbig ) .AND.
1082 $ ( aaqq.GT.rootsfmin ) ) THEN
1083 sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1084 $ work( q )
1085 ELSE
1086 t = zero
1087 aaqq = one
1088 CALL dlassq( m, a( 1, q ), 1, t,
1089 $ aaqq )
1090 sva( q ) = t*dsqrt( aaqq )*work( q )
1091 END IF
1092 END IF
1093 IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1094 IF( ( aapp.LT.rootbig ) .AND.
1095 $ ( aapp.GT.rootsfmin ) ) THEN
1096 aapp = dnrm2( m, a( 1, p ), 1 )*
1097 $ work( p )
1098 ELSE
1099 t = zero
1100 aapp = one
1101 CALL dlassq( m, a( 1, p ), 1, t,
1102 $ aapp )
1103 aapp = t*dsqrt( aapp )*work( p )
1104 END IF
1105 sva( p ) = aapp
1106 END IF
1107*
1108 ELSE
1109* A(:,p) and A(:,q) already numerically orthogonal
1110 IF( ir1.EQ.0 )notrot = notrot + 1
1111*[RTD] SKIPPED = SKIPPED + 1
1112 pskipped = pskipped + 1
1113 END IF
1114 ELSE
1115* A(:,q) is zero column
1116 IF( ir1.EQ.0 )notrot = notrot + 1
1117 pskipped = pskipped + 1
1118 END IF
1119*
1120 IF( ( i.LE.swband ) .AND.
1121 $ ( pskipped.GT.rowskip ) ) THEN
1122 IF( ir1.EQ.0 )aapp = -aapp
1123 notrot = 0
1124 GO TO 2103
1125 END IF
1126*
1127 2002 CONTINUE
1128* END q-LOOP
1129*
1130 2103 CONTINUE
1131* bailed out of q-loop
1132*
1133 sva( p ) = aapp
1134*
1135 ELSE
1136 sva( p ) = aapp
1137 IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1138 $ notrot = notrot + min( igl+kbl-1, n ) - p
1139 END IF
1140*
1141 2001 CONTINUE
1142* end of the p-loop
1143* end of doing the block ( ibr, ibr )
1144 1002 CONTINUE
1145* end of ir1-loop
1146*
1147* ... go to the off diagonal blocks
1148*
1149 igl = ( ibr-1 )*kbl + 1
1150*
1151 DO 2010 jbc = ibr + 1, nbl
1152*
1153 jgl = ( jbc-1 )*kbl + 1
1154*
1155* doing the block at ( ibr, jbc )
1156*
1157 ijblsk = 0
1158 DO 2100 p = igl, min( igl+kbl-1, n )
1159*
1160 aapp = sva( p )
1161 IF( aapp.GT.zero ) THEN
1162*
1163 pskipped = 0
1164*
1165 DO 2200 q = jgl, min( jgl+kbl-1, n )
1166*
1167 aaqq = sva( q )
1168 IF( aaqq.GT.zero ) THEN
1169 aapp0 = aapp
1170*
1171* .. M x 2 Jacobi SVD ..
1172*
1173* Safe Gram matrix computation
1174*
1175 IF( aaqq.GE.one ) THEN
1176 IF( aapp.GE.aaqq ) THEN
1177 rotok = ( small*aapp ).LE.aaqq
1178 ELSE
1179 rotok = ( small*aaqq ).LE.aapp
1180 END IF
1181 IF( aapp.LT.( big / aaqq ) ) THEN
1182 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1183 $ q ), 1 )*work( p )*work( q ) /
1184 $ aaqq ) / aapp
1185 ELSE
1186 CALL dcopy( m, a( 1, p ), 1,
1187 $ work( n+1 ), 1 )
1188 CALL dlascl( 'G', 0, 0, aapp,
1189 $ work( p ), m, 1,
1190 $ work( n+1 ), lda, ierr )
1191 aapq = ddot( m, work( n+1 ), 1,
1192 $ a( 1, q ), 1 )*work( q ) / aaqq
1193 END IF
1194 ELSE
1195 IF( aapp.GE.aaqq ) THEN
1196 rotok = aapp.LE.( aaqq / small )
1197 ELSE
1198 rotok = aaqq.LE.( aapp / small )
1199 END IF
1200 IF( aapp.GT.( small / aaqq ) ) THEN
1201 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1202 $ q ), 1 )*work( p )*work( q ) /
1203 $ aaqq ) / aapp
1204 ELSE
1205 CALL dcopy( m, a( 1, q ), 1,
1206 $ work( n+1 ), 1 )
1207 CALL dlascl( 'G', 0, 0, aaqq,
1208 $ work( q ), m, 1,
1209 $ work( n+1 ), lda, ierr )
1210 aapq = ddot( m, work( n+1 ), 1,
1211 $ a( 1, p ), 1 )*work( p ) / aapp
1212 END IF
1213 END IF
1214*
1215 mxaapq = max( mxaapq, dabs( aapq ) )
1216*
1217* TO rotate or NOT to rotate, THAT is the question ...
1218*
1219 IF( dabs( aapq ).GT.tol ) THEN
1220 notrot = 0
1221*[RTD] ROTATED = ROTATED + 1
1222 pskipped = 0
1223 iswrot = iswrot + 1
1224*
1225 IF( rotok ) THEN
1226*
1227 aqoap = aaqq / aapp
1228 apoaq = aapp / aaqq
1229 theta = -half*dabs(aqoap-apoaq)/aapq
1230 IF( aaqq.GT.aapp0 )theta = -theta
1231*
1232 IF( dabs( theta ).GT.bigtheta ) THEN
1233 t = half / theta
1234 fastr( 3 ) = t*work( p ) / work( q )
1235 fastr( 4 ) = -t*work( q ) /
1236 $ work( p )
1237 CALL drotm( m, a( 1, p ), 1,
1238 $ a( 1, q ), 1, fastr )
1239 IF( rsvec )CALL drotm( mvl,
1240 $ v( 1, p ), 1,
1241 $ v( 1, q ), 1,
1242 $ fastr )
1243 sva( q ) = aaqq*dsqrt( max( zero,
1244 $ one+t*apoaq*aapq ) )
1245 aapp = aapp*dsqrt( max( zero,
1246 $ one-t*aqoap*aapq ) )
1247 mxsinj = max( mxsinj, dabs( t ) )
1248 ELSE
1249*
1250* .. choose correct signum for THETA and rotate
1251*
1252 thsign = -dsign( one, aapq )
1253 IF( aaqq.GT.aapp0 )thsign = -thsign
1254 t = one / ( theta+thsign*
1255 $ dsqrt( one+theta*theta ) )
1256 cs = dsqrt( one / ( one+t*t ) )
1257 sn = t*cs
1258 mxsinj = max( mxsinj, dabs( sn ) )
1259 sva( q ) = aaqq*dsqrt( max( zero,
1260 $ one+t*apoaq*aapq ) )
1261 aapp = aapp*dsqrt( max( zero,
1262 $ one-t*aqoap*aapq ) )
1263*
1264 apoaq = work( p ) / work( q )
1265 aqoap = work( q ) / work( p )
1266 IF( work( p ).GE.one ) THEN
1267*
1268 IF( work( q ).GE.one ) THEN
1269 fastr( 3 ) = t*apoaq
1270 fastr( 4 ) = -t*aqoap
1271 work( p ) = work( p )*cs
1272 work( q ) = work( q )*cs
1273 CALL drotm( m, a( 1, p ), 1,
1274 $ a( 1, q ), 1,
1275 $ fastr )
1276 IF( rsvec )CALL drotm( mvl,
1277 $ v( 1, p ), 1, v( 1, q ),
1278 $ 1, fastr )
1279 ELSE
1280 CALL daxpy( m, -t*aqoap,
1281 $ a( 1, q ), 1,
1282 $ a( 1, p ), 1 )
1283 CALL daxpy( m, cs*sn*apoaq,
1284 $ a( 1, p ), 1,
1285 $ a( 1, q ), 1 )
1286 IF( rsvec ) THEN
1287 CALL daxpy( mvl, -t*aqoap,
1288 $ v( 1, q ), 1,
1289 $ v( 1, p ), 1 )
1290 CALL daxpy( mvl,
1291 $ cs*sn*apoaq,
1292 $ v( 1, p ), 1,
1293 $ v( 1, q ), 1 )
1294 END IF
1295 work( p ) = work( p )*cs
1296 work( q ) = work( q ) / cs
1297 END IF
1298 ELSE
1299 IF( work( q ).GE.one ) THEN
1300 CALL daxpy( m, t*apoaq,
1301 $ a( 1, p ), 1,
1302 $ a( 1, q ), 1 )
1303 CALL daxpy( m, -cs*sn*aqoap,
1304 $ a( 1, q ), 1,
1305 $ a( 1, p ), 1 )
1306 IF( rsvec ) THEN
1307 CALL daxpy( mvl, t*apoaq,
1308 $ v( 1, p ), 1,
1309 $ v( 1, q ), 1 )
1310 CALL daxpy( mvl,
1311 $ -cs*sn*aqoap,
1312 $ v( 1, q ), 1,
1313 $ v( 1, p ), 1 )
1314 END IF
1315 work( p ) = work( p ) / cs
1316 work( q ) = work( q )*cs
1317 ELSE
1318 IF( work( p ).GE.work( q ) )
1319 $ THEN
1320 CALL daxpy( m, -t*aqoap,
1321 $ a( 1, q ), 1,
1322 $ a( 1, p ), 1 )
1323 CALL daxpy( m, cs*sn*apoaq,
1324 $ a( 1, p ), 1,
1325 $ a( 1, q ), 1 )
1326 work( p ) = work( p )*cs
1327 work( q ) = work( q ) / cs
1328 IF( rsvec ) THEN
1329 CALL daxpy( mvl,
1330 $ -t*aqoap,
1331 $ v( 1, q ), 1,
1332 $ v( 1, p ), 1 )
1333 CALL daxpy( mvl,
1334 $ cs*sn*apoaq,
1335 $ v( 1, p ), 1,
1336 $ v( 1, q ), 1 )
1337 END IF
1338 ELSE
1339 CALL daxpy( m, t*apoaq,
1340 $ a( 1, p ), 1,
1341 $ a( 1, q ), 1 )
1342 CALL daxpy( m,
1343 $ -cs*sn*aqoap,
1344 $ a( 1, q ), 1,
1345 $ a( 1, p ), 1 )
1346 work( p ) = work( p ) / cs
1347 work( q ) = work( q )*cs
1348 IF( rsvec ) THEN
1349 CALL daxpy( mvl,
1350 $ t*apoaq, v( 1, p ),
1351 $ 1, v( 1, q ), 1 )
1352 CALL daxpy( mvl,
1353 $ -cs*sn*aqoap,
1354 $ v( 1, q ), 1,
1355 $ v( 1, p ), 1 )
1356 END IF
1357 END IF
1358 END IF
1359 END IF
1360 END IF
1361*
1362 ELSE
1363 IF( aapp.GT.aaqq ) THEN
1364 CALL dcopy( m, a( 1, p ), 1,
1365 $ work( n+1 ), 1 )
1366 CALL dlascl( 'G', 0, 0, aapp, one,
1367 $ m, 1, work( n+1 ), lda,
1368 $ ierr )
1369 CALL dlascl( 'G', 0, 0, aaqq, one,
1370 $ m, 1, a( 1, q ), lda,
1371 $ ierr )
1372 temp1 = -aapq*work( p ) / work( q )
1373 CALL daxpy( m, temp1, work( n+1 ),
1374 $ 1, a( 1, q ), 1 )
1375 CALL dlascl( 'G', 0, 0, one, aaqq,
1376 $ m, 1, a( 1, q ), lda,
1377 $ ierr )
1378 sva( q ) = aaqq*dsqrt( max( zero,
1379 $ one-aapq*aapq ) )
1380 mxsinj = max( mxsinj, sfmin )
1381 ELSE
1382 CALL dcopy( m, a( 1, q ), 1,
1383 $ work( n+1 ), 1 )
1384 CALL dlascl( 'G', 0, 0, aaqq, one,
1385 $ m, 1, work( n+1 ), lda,
1386 $ ierr )
1387 CALL dlascl( 'G', 0, 0, aapp, one,
1388 $ m, 1, a( 1, p ), lda,
1389 $ ierr )
1390 temp1 = -aapq*work( q ) / work( p )
1391 CALL daxpy( m, temp1, work( n+1 ),
1392 $ 1, a( 1, p ), 1 )
1393 CALL dlascl( 'G', 0, 0, one, aapp,
1394 $ m, 1, a( 1, p ), lda,
1395 $ ierr )
1396 sva( p ) = aapp*dsqrt( max( zero,
1397 $ one-aapq*aapq ) )
1398 mxsinj = max( mxsinj, sfmin )
1399 END IF
1400 END IF
1401* END IF ROTOK THEN ... ELSE
1402*
1403* In the case of cancellation in updating SVA(q)
1404* .. recompute SVA(q)
1405 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1406 $ THEN
1407 IF( ( aaqq.LT.rootbig ) .AND.
1408 $ ( aaqq.GT.rootsfmin ) ) THEN
1409 sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1410 $ work( q )
1411 ELSE
1412 t = zero
1413 aaqq = one
1414 CALL dlassq( m, a( 1, q ), 1, t,
1415 $ aaqq )
1416 sva( q ) = t*dsqrt( aaqq )*work( q )
1417 END IF
1418 END IF
1419 IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1420 IF( ( aapp.LT.rootbig ) .AND.
1421 $ ( aapp.GT.rootsfmin ) ) THEN
1422 aapp = dnrm2( m, a( 1, p ), 1 )*
1423 $ work( p )
1424 ELSE
1425 t = zero
1426 aapp = one
1427 CALL dlassq( m, a( 1, p ), 1, t,
1428 $ aapp )
1429 aapp = t*dsqrt( aapp )*work( p )
1430 END IF
1431 sva( p ) = aapp
1432 END IF
1433* end of OK rotation
1434 ELSE
1435 notrot = notrot + 1
1436*[RTD] SKIPPED = SKIPPED + 1
1437 pskipped = pskipped + 1
1438 ijblsk = ijblsk + 1
1439 END IF
1440 ELSE
1441 notrot = notrot + 1
1442 pskipped = pskipped + 1
1443 ijblsk = ijblsk + 1
1444 END IF
1445*
1446 IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1447 $ THEN
1448 sva( p ) = aapp
1449 notrot = 0
1450 GO TO 2011
1451 END IF
1452 IF( ( i.LE.swband ) .AND.
1453 $ ( pskipped.GT.rowskip ) ) THEN
1454 aapp = -aapp
1455 notrot = 0
1456 GO TO 2203
1457 END IF
1458*
1459 2200 CONTINUE
1460* end of the q-loop
1461 2203 CONTINUE
1462*
1463 sva( p ) = aapp
1464*
1465 ELSE
1466*
1467 IF( aapp.EQ.zero )notrot = notrot +
1468 $ min( jgl+kbl-1, n ) - jgl + 1
1469 IF( aapp.LT.zero )notrot = 0
1470*
1471 END IF
1472*
1473 2100 CONTINUE
1474* end of the p-loop
1475 2010 CONTINUE
1476* end of the jbc-loop
1477 2011 CONTINUE
1478*2011 bailed out of the jbc-loop
1479 DO 2012 p = igl, min( igl+kbl-1, n )
1480 sva( p ) = dabs( sva( p ) )
1481 2012 CONTINUE
1482***
1483 2000 CONTINUE
1484*2000 :: end of the ibr-loop
1485*
1486* .. update SVA(N)
1487 IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1488 $ THEN
1489 sva( n ) = dnrm2( m, a( 1, n ), 1 )*work( n )
1490 ELSE
1491 t = zero
1492 aapp = one
1493 CALL dlassq( m, a( 1, n ), 1, t, aapp )
1494 sva( n ) = t*dsqrt( aapp )*work( n )
1495 END IF
1496*
1497* Additional steering devices
1498*
1499 IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1500 $ ( iswrot.LE.n ) ) )swband = i
1501*
1502 IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.dsqrt( dble( n ) )*
1503 $ tol ) .AND. ( dble( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1504 GO TO 1994
1505 END IF
1506*
1507 IF( notrot.GE.emptsw )GO TO 1994
1508*
1509 1993 CONTINUE
1510* end i=1:NSWEEP loop
1511*
1512* #:( Reaching this point means that the procedure has not converged.
1513 info = nsweep - 1
1514 GO TO 1995
1515*
1516 1994 CONTINUE
1517* #:) Reaching this point means numerical convergence after the i-th
1518* sweep.
1519*
1520 info = 0
1521* #:) INFO = 0 confirms successful iterations.
1522 1995 CONTINUE
1523*
1524* Sort the singular values and find how many are above
1525* the underflow threshold.
1526*
1527 n2 = 0
1528 n4 = 0
1529 DO 5991 p = 1, n - 1
1530 q = idamax( n-p+1, sva( p ), 1 ) + p - 1
1531 IF( p.NE.q ) THEN
1532 temp1 = sva( p )
1533 sva( p ) = sva( q )
1534 sva( q ) = temp1
1535 temp1 = work( p )
1536 work( p ) = work( q )
1537 work( q ) = temp1
1538 CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1539 IF( rsvec )CALL dswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1540 END IF
1541 IF( sva( p ).NE.zero ) THEN
1542 n4 = n4 + 1
1543 IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1544 END IF
1545 5991 CONTINUE
1546 IF( sva( n ).NE.zero ) THEN
1547 n4 = n4 + 1
1548 IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1549 END IF
1550*
1551* Normalize the left singular vectors.
1552*
1553 IF( lsvec .OR. uctol ) THEN
1554 DO 1998 p = 1, n2
1555 CALL dscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1556 1998 CONTINUE
1557 END IF
1558*
1559* Scale the product of Jacobi rotations (assemble the fast rotations).
1560*
1561 IF( rsvec ) THEN
1562 IF( applv ) THEN
1563 DO 2398 p = 1, n
1564 CALL dscal( mvl, work( p ), v( 1, p ), 1 )
1565 2398 CONTINUE
1566 ELSE
1567 DO 2399 p = 1, n
1568 temp1 = one / dnrm2( mvl, v( 1, p ), 1 )
1569 CALL dscal( mvl, temp1, v( 1, p ), 1 )
1570 2399 CONTINUE
1571 END IF
1572 END IF
1573*
1574* Undo scaling, if necessary (and possible).
1575 IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl) ) )
1576 $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1577 $ ( sfmin / skl) ) ) ) THEN
1578 DO 2400 p = 1, n
1579 sva( p ) = skl*sva( p )
1580 2400 CONTINUE
1581 skl= one
1582 END IF
1583*
1584 work( 1 ) = skl
1585* The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1586* then some of the singular values may overflow or underflow and
1587* the spectrum is given in this factored representation.
1588*
1589 work( 2 ) = dble( n4 )
1590* N4 is the number of computed nonzero singular values of A.
1591*
1592 work( 3 ) = dble( n2 )
1593* N2 is the number of singular values of A greater than SFMIN.
1594* If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1595* that may carry some information.
1596*
1597 work( 4 ) = dble( i )
1598* i is the index of the last sweep before declaring convergence.
1599*
1600 work( 5 ) = mxaapq
1601* MXAAPQ is the largest absolute value of scaled pivots in the
1602* last sweep
1603*
1604 work( 6 ) = mxsinj
1605* MXSINJ is the largest absolute value of the sines of Jacobi angles
1606* in the last sweep
1607*
1608 RETURN
1609* ..
1610* .. END OF DGESVJ
1611* ..
1612 END
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine daxpy(n, da, dx, incx, dy, incy)
DAXPY
Definition daxpy.f:89
subroutine dcopy(n, dx, incx, dy, incy)
DCOPY
Definition dcopy.f:82
subroutine dgesvj(joba, jobu, jobv, m, n, a, lda, sva, mv, v, ldv, work, lwork, info)
DGESVJ
Definition dgesvj.f:337
subroutine dgsvj0(jobv, m, n, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
DGSVJ0 pre-processor for the routine dgesvj.
Definition dgsvj0.f:218
subroutine dgsvj1(jobv, m, n, n1, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
DGSVJ1 pre-processor for the routine dgesvj, applies Jacobi rotations targeting only particular pivot...
Definition dgsvj1.f:236
subroutine dlascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition dlascl.f:143
subroutine dlaset(uplo, m, n, alpha, beta, a, lda)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition dlaset.f:110
subroutine dlassq(n, x, incx, scale, sumsq)
DLASSQ updates a sum of squares represented in scaled form.
Definition dlassq.f90:124
subroutine drotm(n, dx, incx, dy, incy, dparam)
DROTM
Definition drotm.f:96
subroutine dscal(n, da, dx, incx)
DSCAL
Definition dscal.f:79
subroutine dswap(n, dx, incx, dy, incy)
DSWAP
Definition dswap.f:82