LAPACK 3.12.0
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches

◆ dgesvj()

subroutine dgesvj ( character*1  joba,
character*1  jobu,
character*1  jobv,
integer  m,
integer  n,
double precision, dimension( lda, * )  a,
integer  lda,
double precision, dimension( n )  sva,
integer  mv,
double precision, dimension( ldv, * )  v,
integer  ldv,
double precision, dimension( lwork )  work,
integer  lwork,
integer  info 
)

DGESVJ

Download DGESVJ + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 DGESVJ computes the singular value decomposition (SVD) of a real
 M-by-N matrix A, where M >= N. The SVD of A is written as
                                    [++]   [xx]   [x0]   [xx]
              A = U * SIGMA * V^t,  [++] = [xx] * [ox] * [xx]
                                    [++]   [xx]
 where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
 matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
 of SIGMA are the singular values of A. The columns of U and V are the
 left and the right singular vectors of A, respectively.
 DGESVJ can sometimes compute tiny singular values and their singular vectors much
 more accurately than other SVD routines, see below under Further Details.
Parameters
[in]JOBA
          JOBA is CHARACTER*1
          Specifies the structure of A.
          = 'L': The input matrix A is lower triangular;
          = 'U': The input matrix A is upper triangular;
          = 'G': The input matrix A is general M-by-N matrix, M >= N.
[in]JOBU
          JOBU is CHARACTER*1
          Specifies whether to compute the left singular vectors
          (columns of U):
          = 'U': The left singular vectors corresponding to the nonzero
                 singular values are computed and returned in the leading
                 columns of A. See more details in the description of A.
                 The default numerical orthogonality threshold is set to
                 approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
          = 'C': Analogous to JOBU='U', except that user can control the
                 level of numerical orthogonality of the computed left
                 singular vectors. TOL can be set to TOL = CTOL*EPS, where
                 CTOL is given on input in the array WORK.
                 No CTOL smaller than ONE is allowed. CTOL greater
                 than 1 / EPS is meaningless. The option 'C'
                 can be used if M*EPS is satisfactory orthogonality
                 of the computed left singular vectors, so CTOL=M could
                 save few sweeps of Jacobi rotations.
                 See the descriptions of A and WORK(1).
          = 'N': The matrix U is not computed. However, see the
                 description of A.
[in]JOBV
          JOBV is CHARACTER*1
          Specifies whether to compute the right singular vectors, that
          is, the matrix V:
          = 'V':  the matrix V is computed and returned in the array V
          = 'A':  the Jacobi rotations are applied to the MV-by-N
                  array V. In other words, the right singular vector
                  matrix V is not computed explicitly, instead it is
                  applied to an MV-by-N matrix initially stored in the
                  first MV rows of V.
          = 'N':  the matrix V is not computed and the array V is not
                  referenced
[in]M
          M is INTEGER
          The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.
          M >= N >= 0.
[in,out]A
          A is DOUBLE PRECISION array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit :
          If JOBU = 'U' .OR. JOBU = 'C' :
                 If INFO = 0 :
                 RANKA orthonormal columns of U are returned in the
                 leading RANKA columns of the array A. Here RANKA <= N
                 is the number of computed singular values of A that are
                 above the underflow threshold DLAMCH('S'). The singular
                 vectors corresponding to underflowed or zero singular
                 values are not computed. The value of RANKA is returned
                 in the array WORK as RANKA=NINT(WORK(2)). Also see the
                 descriptions of SVA and WORK. The computed columns of U
                 are mutually numerically orthogonal up to approximately
                 TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'),
                 see the description of JOBU.
                 If INFO > 0 :
                 the procedure DGESVJ did not converge in the given number
                 of iterations (sweeps). In that case, the computed
                 columns of U may not be orthogonal up to TOL. The output
                 U (stored in A), SIGMA (given by the computed singular
                 values in SVA(1:N)) and V is still a decomposition of the
                 input matrix A in the sense that the residual
                 ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.

          If JOBU = 'N' :
                 If INFO = 0 :
                 Note that the left singular vectors are 'for free' in the
                 one-sided Jacobi SVD algorithm. However, if only the
                 singular values are needed, the level of numerical
                 orthogonality of U is not an issue and iterations are
                 stopped when the columns of the iterated matrix are
                 numerically orthogonal up to approximately M*EPS. Thus,
                 on exit, A contains the columns of U scaled with the
                 corresponding singular values.
                 If INFO > 0 :
                 the procedure DGESVJ did not converge in the given number
                 of iterations (sweeps).
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is DOUBLE PRECISION array, dimension (N)
          On exit :
          If INFO = 0 :
          depending on the value SCALE = WORK(1), we have:
                 If SCALE = ONE :
                 SVA(1:N) contains the computed singular values of A.
                 During the computation SVA contains the Euclidean column
                 norms of the iterated matrices in the array A.
                 If SCALE .NE. ONE :
                 The singular values of A are SCALE*SVA(1:N), and this
                 factored representation is due to the fact that some of the
                 singular values of A might underflow or overflow.
          If INFO > 0 :
          the procedure DGESVJ did not converge in the given number of
          iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
[in]MV
          MV is INTEGER
          If JOBV = 'A', then the product of Jacobi rotations in DGESVJ
          is applied to the first MV rows of V. See the description of JOBV.
[in,out]V
          V is DOUBLE PRECISION array, dimension (LDV,N)
          If JOBV = 'V', then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'A', then V contains the product of the computed right
                         singular vector matrix and the initial matrix in
                         the array V.
          If JOBV = 'N', then V is not referenced.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V, LDV >= 1.
          If JOBV = 'V', then LDV >= max(1,N).
          If JOBV = 'A', then LDV >= max(1,MV) .
[in,out]WORK
          WORK is DOUBLE PRECISION array, dimension (LWORK)
          On entry :
          If JOBU = 'C' :
          WORK(1) = CTOL, where CTOL defines the threshold for convergence.
                    The process stops if all columns of A are mutually
                    orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
                    It is required that CTOL >= ONE, i.e. it is not
                    allowed to force the routine to obtain orthogonality
                    below EPS.
          On exit :
          WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
                    are the computed singular values of A.
                    (See description of SVA().)
          WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
                    singular values.
          WORK(3) = NINT(WORK(3)) is the number of the computed singular
                    values that are larger than the underflow threshold.
          WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
                    rotations needed for numerical convergence.
          WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
                    This is useful information in cases when DGESVJ did
                    not converge, as it can be used to estimate whether
                    the output is still useful and for post festum analysis.
          WORK(6) = the largest absolute value over all sines of the
                    Jacobi rotation angles in the last sweep. It can be
                    useful for a post festum analysis.
[in]LWORK
          LWORK is INTEGER
          length of WORK, WORK >= MAX(6,M+N)
[out]INFO
          INFO is INTEGER
          = 0:  successful exit.
          < 0:  if INFO = -i, then the i-th argument had an illegal value
          > 0:  DGESVJ did not converge in the maximal allowed number (30)
                of sweeps. The output may still be useful. See the
                description of WORK.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
  The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
  rotations. The rotations are implemented as fast scaled rotations of
  Anda and Park [1]. In the case of underflow of the Jacobi angle, a
  modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
  column interchanges of de Rijk [2]. The relative accuracy of the computed
  singular values and the accuracy of the computed singular vectors (in
  angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
  The condition number that determines the accuracy in the full rank case
  is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
  spectral condition number. The best performance of this Jacobi SVD
  procedure is achieved if used in an  accelerated version of Drmac and
  Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
  Some tuning parameters (marked with [TP]) are available for the
  implementer.
  The computational range for the nonzero singular values is the  machine
  number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
  denormalized singular values can be computed with the corresponding
  gradual loss of accurate digits.
Contributors:
  ============

  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
 [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
     SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
 [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
     singular value decomposition on a vector computer.
     SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
 [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
 [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
     value computation in floating point arithmetic.
     SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
 [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
  ===========================
  Please report all bugs and send interesting test examples and comments to
  drmac@math.hr. Thank you.

Definition at line 335 of file dgesvj.f.

337*
338* -- LAPACK computational routine --
339* -- LAPACK is a software package provided by Univ. of Tennessee, --
340* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
341*
342* .. Scalar Arguments ..
343 INTEGER INFO, LDA, LDV, LWORK, M, MV, N
344 CHARACTER*1 JOBA, JOBU, JOBV
345* ..
346* .. Array Arguments ..
347 DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
348 $ WORK( LWORK )
349* ..
350*
351* =====================================================================
352*
353* .. Local Parameters ..
354 DOUBLE PRECISION ZERO, HALF, ONE
355 parameter( zero = 0.0d0, half = 0.5d0, one = 1.0d0)
356 INTEGER NSWEEP
357 parameter( nsweep = 30 )
358* ..
359* .. Local Scalars ..
360 DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
361 $ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ,
362 $ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL,
363 $ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA,
364 $ THSIGN, TOL
365 INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
366 $ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34,
367 $ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP,
368 $ SWBAND
369 LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
370 $ RSVEC, UCTOL, UPPER
371* ..
372* .. Local Arrays ..
373 DOUBLE PRECISION FASTR( 5 )
374* ..
375* .. Intrinsic Functions ..
376 INTRINSIC dabs, max, min, dble, dsign, dsqrt
377* ..
378* .. External Functions ..
379* ..
380* from BLAS
381 DOUBLE PRECISION DDOT, DNRM2
382 EXTERNAL ddot, dnrm2
383 INTEGER IDAMAX
384 EXTERNAL idamax
385* from LAPACK
386 DOUBLE PRECISION DLAMCH
387 EXTERNAL dlamch
388 LOGICAL LSAME
389 EXTERNAL lsame
390* ..
391* .. External Subroutines ..
392* ..
393* from BLAS
394 EXTERNAL daxpy, dcopy, drotm, dscal, dswap
395* from LAPACK
396 EXTERNAL dlascl, dlaset, dlassq, xerbla
397*
398 EXTERNAL dgsvj0, dgsvj1
399* ..
400* .. Executable Statements ..
401*
402* Test the input arguments
403*
404 lsvec = lsame( jobu, 'U' )
405 uctol = lsame( jobu, 'C' )
406 rsvec = lsame( jobv, 'V' )
407 applv = lsame( jobv, 'A' )
408 upper = lsame( joba, 'U' )
409 lower = lsame( joba, 'L' )
410*
411 IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
412 info = -1
413 ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
414 info = -2
415 ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
416 info = -3
417 ELSE IF( m.LT.0 ) THEN
418 info = -4
419 ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
420 info = -5
421 ELSE IF( lda.LT.m ) THEN
422 info = -7
423 ELSE IF( mv.LT.0 ) THEN
424 info = -9
425 ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
426 $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
427 info = -11
428 ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
429 info = -12
430 ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
431 info = -13
432 ELSE
433 info = 0
434 END IF
435*
436* #:(
437 IF( info.NE.0 ) THEN
438 CALL xerbla( 'DGESVJ', -info )
439 RETURN
440 END IF
441*
442* #:) Quick return for void matrix
443*
444 IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
445*
446* Set numerical parameters
447* The stopping criterion for Jacobi rotations is
448*
449* max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
450*
451* where EPS is the round-off and CTOL is defined as follows:
452*
453 IF( uctol ) THEN
454* ... user controlled
455 ctol = work( 1 )
456 ELSE
457* ... default
458 IF( lsvec .OR. rsvec .OR. applv ) THEN
459 ctol = dsqrt( dble( m ) )
460 ELSE
461 ctol = dble( m )
462 END IF
463 END IF
464* ... and the machine dependent parameters are
465*[!] (Make sure that DLAMCH() works properly on the target machine.)
466*
467 epsln = dlamch( 'Epsilon' )
468 rooteps = dsqrt( epsln )
469 sfmin = dlamch( 'SafeMinimum' )
470 rootsfmin = dsqrt( sfmin )
471 small = sfmin / epsln
472 big = dlamch( 'Overflow' )
473* BIG = ONE / SFMIN
474 rootbig = one / rootsfmin
475 large = big / dsqrt( dble( m*n ) )
476 bigtheta = one / rooteps
477*
478 tol = ctol*epsln
479 roottol = dsqrt( tol )
480*
481 IF( dble( m )*epsln.GE.one ) THEN
482 info = -4
483 CALL xerbla( 'DGESVJ', -info )
484 RETURN
485 END IF
486*
487* Initialize the right singular vector matrix.
488*
489 IF( rsvec ) THEN
490 mvl = n
491 CALL dlaset( 'A', mvl, n, zero, one, v, ldv )
492 ELSE IF( applv ) THEN
493 mvl = mv
494 END IF
495 rsvec = rsvec .OR. applv
496*
497* Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
498*(!) If necessary, scale A to protect the largest singular value
499* from overflow. It is possible that saving the largest singular
500* value destroys the information about the small ones.
501* This initial scaling is almost minimal in the sense that the
502* goal is to make sure that no column norm overflows, and that
503* DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
504* in A are detected, the procedure returns with INFO=-6.
505*
506 skl= one / dsqrt( dble( m )*dble( n ) )
507 noscale = .true.
508 goscale = .true.
509*
510 IF( lower ) THEN
511* the input matrix is M-by-N lower triangular (trapezoidal)
512 DO 1874 p = 1, n
513 aapp = zero
514 aaqq = one
515 CALL dlassq( m-p+1, a( p, p ), 1, aapp, aaqq )
516 IF( aapp.GT.big ) THEN
517 info = -6
518 CALL xerbla( 'DGESVJ', -info )
519 RETURN
520 END IF
521 aaqq = dsqrt( aaqq )
522 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
523 sva( p ) = aapp*aaqq
524 ELSE
525 noscale = .false.
526 sva( p ) = aapp*( aaqq*skl)
527 IF( goscale ) THEN
528 goscale = .false.
529 DO 1873 q = 1, p - 1
530 sva( q ) = sva( q )*skl
531 1873 CONTINUE
532 END IF
533 END IF
534 1874 CONTINUE
535 ELSE IF( upper ) THEN
536* the input matrix is M-by-N upper triangular (trapezoidal)
537 DO 2874 p = 1, n
538 aapp = zero
539 aaqq = one
540 CALL dlassq( p, a( 1, p ), 1, aapp, aaqq )
541 IF( aapp.GT.big ) THEN
542 info = -6
543 CALL xerbla( 'DGESVJ', -info )
544 RETURN
545 END IF
546 aaqq = dsqrt( aaqq )
547 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
548 sva( p ) = aapp*aaqq
549 ELSE
550 noscale = .false.
551 sva( p ) = aapp*( aaqq*skl)
552 IF( goscale ) THEN
553 goscale = .false.
554 DO 2873 q = 1, p - 1
555 sva( q ) = sva( q )*skl
556 2873 CONTINUE
557 END IF
558 END IF
559 2874 CONTINUE
560 ELSE
561* the input matrix is M-by-N general dense
562 DO 3874 p = 1, n
563 aapp = zero
564 aaqq = one
565 CALL dlassq( m, a( 1, p ), 1, aapp, aaqq )
566 IF( aapp.GT.big ) THEN
567 info = -6
568 CALL xerbla( 'DGESVJ', -info )
569 RETURN
570 END IF
571 aaqq = dsqrt( aaqq )
572 IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
573 sva( p ) = aapp*aaqq
574 ELSE
575 noscale = .false.
576 sva( p ) = aapp*( aaqq*skl)
577 IF( goscale ) THEN
578 goscale = .false.
579 DO 3873 q = 1, p - 1
580 sva( q ) = sva( q )*skl
581 3873 CONTINUE
582 END IF
583 END IF
584 3874 CONTINUE
585 END IF
586*
587 IF( noscale )skl= one
588*
589* Move the smaller part of the spectrum from the underflow threshold
590*(!) Start by determining the position of the nonzero entries of the
591* array SVA() relative to ( SFMIN, BIG ).
592*
593 aapp = zero
594 aaqq = big
595 DO 4781 p = 1, n
596 IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
597 aapp = max( aapp, sva( p ) )
598 4781 CONTINUE
599*
600* #:) Quick return for zero matrix
601*
602 IF( aapp.EQ.zero ) THEN
603 IF( lsvec )CALL dlaset( 'G', m, n, zero, one, a, lda )
604 work( 1 ) = one
605 work( 2 ) = zero
606 work( 3 ) = zero
607 work( 4 ) = zero
608 work( 5 ) = zero
609 work( 6 ) = zero
610 RETURN
611 END IF
612*
613* #:) Quick return for one-column matrix
614*
615 IF( n.EQ.1 ) THEN
616 IF( lsvec )CALL dlascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
617 $ a( 1, 1 ), lda, ierr )
618 work( 1 ) = one / skl
619 IF( sva( 1 ).GE.sfmin ) THEN
620 work( 2 ) = one
621 ELSE
622 work( 2 ) = zero
623 END IF
624 work( 3 ) = zero
625 work( 4 ) = zero
626 work( 5 ) = zero
627 work( 6 ) = zero
628 RETURN
629 END IF
630*
631* Protect small singular values from underflow, and try to
632* avoid underflows/overflows in computing Jacobi rotations.
633*
634 sn = dsqrt( sfmin / epsln )
635 temp1 = dsqrt( big / dble( n ) )
636 IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
637 $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
638 temp1 = min( big, temp1 / aapp )
639* AAQQ = AAQQ*TEMP1
640* AAPP = AAPP*TEMP1
641 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
642 temp1 = min( sn / aaqq, big / ( aapp*dsqrt( dble( n ) ) ) )
643* AAQQ = AAQQ*TEMP1
644* AAPP = AAPP*TEMP1
645 ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
646 temp1 = max( sn / aaqq, temp1 / aapp )
647* AAQQ = AAQQ*TEMP1
648* AAPP = AAPP*TEMP1
649 ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
650 temp1 = min( sn / aaqq, big / ( dsqrt( dble( n ) )*aapp ) )
651* AAQQ = AAQQ*TEMP1
652* AAPP = AAPP*TEMP1
653 ELSE
654 temp1 = one
655 END IF
656*
657* Scale, if necessary
658*
659 IF( temp1.NE.one ) THEN
660 CALL dlascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
661 END IF
662 skl= temp1*skl
663 IF( skl.NE.one ) THEN
664 CALL dlascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
665 skl= one / skl
666 END IF
667*
668* Row-cyclic Jacobi SVD algorithm with column pivoting
669*
670 emptsw = ( n*( n-1 ) ) / 2
671 notrot = 0
672 fastr( 1 ) = zero
673*
674* A is represented in factored form A = A * diag(WORK), where diag(WORK)
675* is initialized to identity. WORK is updated during fast scaled
676* rotations.
677*
678 DO 1868 q = 1, n
679 work( q ) = one
680 1868 CONTINUE
681*
682*
683 swband = 3
684*[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
685* if DGESVJ is used as a computational routine in the preconditioned
686* Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
687* works on pivots inside a band-like region around the diagonal.
688* The boundaries are determined dynamically, based on the number of
689* pivots above a threshold.
690*
691 kbl = min( 8, n )
692*[TP] KBL is a tuning parameter that defines the tile size in the
693* tiling of the p-q loops of pivot pairs. In general, an optimal
694* value of KBL depends on the matrix dimensions and on the
695* parameters of the computer's memory.
696*
697 nbl = n / kbl
698 IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
699*
700 blskip = kbl**2
701*[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
702*
703 rowskip = min( 5, kbl )
704*[TP] ROWSKIP is a tuning parameter.
705*
706 lkahead = 1
707*[TP] LKAHEAD is a tuning parameter.
708*
709* Quasi block transformations, using the lower (upper) triangular
710* structure of the input matrix. The quasi-block-cycling usually
711* invokes cubic convergence. Big part of this cycle is done inside
712* canonical subspaces of dimensions less than M.
713*
714 IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
715*[TP] The number of partition levels and the actual partition are
716* tuning parameters.
717 n4 = n / 4
718 n2 = n / 2
719 n34 = 3*n4
720 IF( applv ) THEN
721 q = 0
722 ELSE
723 q = 1
724 END IF
725*
726 IF( lower ) THEN
727*
728* This works very well on lower triangular matrices, in particular
729* in the framework of the preconditioned Jacobi SVD (xGEJSV).
730* The idea is simple:
731* [+ 0 0 0] Note that Jacobi transformations of [0 0]
732* [+ + 0 0] [0 0]
733* [+ + x 0] actually work on [x 0] [x 0]
734* [+ + x x] [x x]. [x x]
735*
736 CALL dgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
737 $ work( n34+1 ), sva( n34+1 ), mvl,
738 $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
739 $ 2, work( n+1 ), lwork-n, ierr )
740*
741 CALL dgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
742 $ work( n2+1 ), sva( n2+1 ), mvl,
743 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
744 $ work( n+1 ), lwork-n, ierr )
745*
746 CALL dgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
747 $ work( n2+1 ), sva( n2+1 ), mvl,
748 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
749 $ work( n+1 ), lwork-n, ierr )
750*
751 CALL dgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
752 $ work( n4+1 ), sva( n4+1 ), mvl,
753 $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
754 $ work( n+1 ), lwork-n, ierr )
755*
756 CALL dgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
757 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
758 $ ierr )
759*
760 CALL dgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
761 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
762 $ lwork-n, ierr )
763*
764*
765 ELSE IF( upper ) THEN
766*
767*
768 CALL dgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
769 $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
770 $ ierr )
771*
772 CALL dgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
773 $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
774 $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
775 $ ierr )
776*
777 CALL dgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
778 $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
779 $ lwork-n, ierr )
780*
781 CALL dgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
782 $ work( n2+1 ), sva( n2+1 ), mvl,
783 $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
784 $ work( n+1 ), lwork-n, ierr )
785
786 END IF
787*
788 END IF
789*
790* .. Row-cyclic pivot strategy with de Rijk's pivoting ..
791*
792 DO 1993 i = 1, nsweep
793*
794* .. go go go ...
795*
796 mxaapq = zero
797 mxsinj = zero
798 iswrot = 0
799*
800 notrot = 0
801 pskipped = 0
802*
803* Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
804* 1 <= p < q <= N. This is the first step toward a blocked implementation
805* of the rotations. New implementation, based on block transformations,
806* is under development.
807*
808 DO 2000 ibr = 1, nbl
809*
810 igl = ( ibr-1 )*kbl + 1
811*
812 DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
813*
814 igl = igl + ir1*kbl
815*
816 DO 2001 p = igl, min( igl+kbl-1, n-1 )
817*
818* .. de Rijk's pivoting
819*
820 q = idamax( n-p+1, sva( p ), 1 ) + p - 1
821 IF( p.NE.q ) THEN
822 CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
823 IF( rsvec )CALL dswap( mvl, v( 1, p ), 1,
824 $ v( 1, q ), 1 )
825 temp1 = sva( p )
826 sva( p ) = sva( q )
827 sva( q ) = temp1
828 temp1 = work( p )
829 work( p ) = work( q )
830 work( q ) = temp1
831 END IF
832*
833 IF( ir1.EQ.0 ) THEN
834*
835* Column norms are periodically updated by explicit
836* norm computation.
837* Caveat:
838* Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
839* as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
840* overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
841* underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
842* Hence, DNRM2 cannot be trusted, not even in the case when
843* the true norm is far from the under(over)flow boundaries.
844* If properly implemented DNRM2 is available, the IF-THEN-ELSE
845* below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
846*
847 IF( ( sva( p ).LT.rootbig ) .AND.
848 $ ( sva( p ).GT.rootsfmin ) ) THEN
849 sva( p ) = dnrm2( m, a( 1, p ), 1 )*work( p )
850 ELSE
851 temp1 = zero
852 aapp = one
853 CALL dlassq( m, a( 1, p ), 1, temp1, aapp )
854 sva( p ) = temp1*dsqrt( aapp )*work( p )
855 END IF
856 aapp = sva( p )
857 ELSE
858 aapp = sva( p )
859 END IF
860*
861 IF( aapp.GT.zero ) THEN
862*
863 pskipped = 0
864*
865 DO 2002 q = p + 1, min( igl+kbl-1, n )
866*
867 aaqq = sva( q )
868*
869 IF( aaqq.GT.zero ) THEN
870*
871 aapp0 = aapp
872 IF( aaqq.GE.one ) THEN
873 rotok = ( small*aapp ).LE.aaqq
874 IF( aapp.LT.( big / aaqq ) ) THEN
875 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
876 $ q ), 1 )*work( p )*work( q ) /
877 $ aaqq ) / aapp
878 ELSE
879 CALL dcopy( m, a( 1, p ), 1,
880 $ work( n+1 ), 1 )
881 CALL dlascl( 'G', 0, 0, aapp,
882 $ work( p ), m, 1,
883 $ work( n+1 ), lda, ierr )
884 aapq = ddot( m, work( n+1 ), 1,
885 $ a( 1, q ), 1 )*work( q ) / aaqq
886 END IF
887 ELSE
888 rotok = aapp.LE.( aaqq / small )
889 IF( aapp.GT.( small / aaqq ) ) THEN
890 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
891 $ q ), 1 )*work( p )*work( q ) /
892 $ aaqq ) / aapp
893 ELSE
894 CALL dcopy( m, a( 1, q ), 1,
895 $ work( n+1 ), 1 )
896 CALL dlascl( 'G', 0, 0, aaqq,
897 $ work( q ), m, 1,
898 $ work( n+1 ), lda, ierr )
899 aapq = ddot( m, work( n+1 ), 1,
900 $ a( 1, p ), 1 )*work( p ) / aapp
901 END IF
902 END IF
903*
904 mxaapq = max( mxaapq, dabs( aapq ) )
905*
906* TO rotate or NOT to rotate, THAT is the question ...
907*
908 IF( dabs( aapq ).GT.tol ) THEN
909*
910* .. rotate
911*[RTD] ROTATED = ROTATED + ONE
912*
913 IF( ir1.EQ.0 ) THEN
914 notrot = 0
915 pskipped = 0
916 iswrot = iswrot + 1
917 END IF
918*
919 IF( rotok ) THEN
920*
921 aqoap = aaqq / aapp
922 apoaq = aapp / aaqq
923 theta = -half*dabs(aqoap-apoaq)/aapq
924*
925 IF( dabs( theta ).GT.bigtheta ) THEN
926*
927 t = half / theta
928 fastr( 3 ) = t*work( p ) / work( q )
929 fastr( 4 ) = -t*work( q ) /
930 $ work( p )
931 CALL drotm( m, a( 1, p ), 1,
932 $ a( 1, q ), 1, fastr )
933 IF( rsvec )CALL drotm( mvl,
934 $ v( 1, p ), 1,
935 $ v( 1, q ), 1,
936 $ fastr )
937 sva( q ) = aaqq*dsqrt( max( zero,
938 $ one+t*apoaq*aapq ) )
939 aapp = aapp*dsqrt( max( zero,
940 $ one-t*aqoap*aapq ) )
941 mxsinj = max( mxsinj, dabs( t ) )
942*
943 ELSE
944*
945* .. choose correct signum for THETA and rotate
946*
947 thsign = -dsign( one, aapq )
948 t = one / ( theta+thsign*
949 $ dsqrt( one+theta*theta ) )
950 cs = dsqrt( one / ( one+t*t ) )
951 sn = t*cs
952*
953 mxsinj = max( mxsinj, dabs( sn ) )
954 sva( q ) = aaqq*dsqrt( max( zero,
955 $ one+t*apoaq*aapq ) )
956 aapp = aapp*dsqrt( max( zero,
957 $ one-t*aqoap*aapq ) )
958*
959 apoaq = work( p ) / work( q )
960 aqoap = work( q ) / work( p )
961 IF( work( p ).GE.one ) THEN
962 IF( work( q ).GE.one ) THEN
963 fastr( 3 ) = t*apoaq
964 fastr( 4 ) = -t*aqoap
965 work( p ) = work( p )*cs
966 work( q ) = work( q )*cs
967 CALL drotm( m, a( 1, p ), 1,
968 $ a( 1, q ), 1,
969 $ fastr )
970 IF( rsvec )CALL drotm( mvl,
971 $ v( 1, p ), 1, v( 1, q ),
972 $ 1, fastr )
973 ELSE
974 CALL daxpy( m, -t*aqoap,
975 $ a( 1, q ), 1,
976 $ a( 1, p ), 1 )
977 CALL daxpy( m, cs*sn*apoaq,
978 $ a( 1, p ), 1,
979 $ a( 1, q ), 1 )
980 work( p ) = work( p )*cs
981 work( q ) = work( q ) / cs
982 IF( rsvec ) THEN
983 CALL daxpy( mvl, -t*aqoap,
984 $ v( 1, q ), 1,
985 $ v( 1, p ), 1 )
986 CALL daxpy( mvl,
987 $ cs*sn*apoaq,
988 $ v( 1, p ), 1,
989 $ v( 1, q ), 1 )
990 END IF
991 END IF
992 ELSE
993 IF( work( q ).GE.one ) THEN
994 CALL daxpy( m, t*apoaq,
995 $ a( 1, p ), 1,
996 $ a( 1, q ), 1 )
997 CALL daxpy( m, -cs*sn*aqoap,
998 $ a( 1, q ), 1,
999 $ a( 1, p ), 1 )
1000 work( p ) = work( p ) / cs
1001 work( q ) = work( q )*cs
1002 IF( rsvec ) THEN
1003 CALL daxpy( mvl, t*apoaq,
1004 $ v( 1, p ), 1,
1005 $ v( 1, q ), 1 )
1006 CALL daxpy( mvl,
1007 $ -cs*sn*aqoap,
1008 $ v( 1, q ), 1,
1009 $ v( 1, p ), 1 )
1010 END IF
1011 ELSE
1012 IF( work( p ).GE.work( q ) )
1013 $ THEN
1014 CALL daxpy( m, -t*aqoap,
1015 $ a( 1, q ), 1,
1016 $ a( 1, p ), 1 )
1017 CALL daxpy( m, cs*sn*apoaq,
1018 $ a( 1, p ), 1,
1019 $ a( 1, q ), 1 )
1020 work( p ) = work( p )*cs
1021 work( q ) = work( q ) / cs
1022 IF( rsvec ) THEN
1023 CALL daxpy( mvl,
1024 $ -t*aqoap,
1025 $ v( 1, q ), 1,
1026 $ v( 1, p ), 1 )
1027 CALL daxpy( mvl,
1028 $ cs*sn*apoaq,
1029 $ v( 1, p ), 1,
1030 $ v( 1, q ), 1 )
1031 END IF
1032 ELSE
1033 CALL daxpy( m, t*apoaq,
1034 $ a( 1, p ), 1,
1035 $ a( 1, q ), 1 )
1036 CALL daxpy( m,
1037 $ -cs*sn*aqoap,
1038 $ a( 1, q ), 1,
1039 $ a( 1, p ), 1 )
1040 work( p ) = work( p ) / cs
1041 work( q ) = work( q )*cs
1042 IF( rsvec ) THEN
1043 CALL daxpy( mvl,
1044 $ t*apoaq, v( 1, p ),
1045 $ 1, v( 1, q ), 1 )
1046 CALL daxpy( mvl,
1047 $ -cs*sn*aqoap,
1048 $ v( 1, q ), 1,
1049 $ v( 1, p ), 1 )
1050 END IF
1051 END IF
1052 END IF
1053 END IF
1054 END IF
1055*
1056 ELSE
1057* .. have to use modified Gram-Schmidt like transformation
1058 CALL dcopy( m, a( 1, p ), 1,
1059 $ work( n+1 ), 1 )
1060 CALL dlascl( 'G', 0, 0, aapp, one, m,
1061 $ 1, work( n+1 ), lda,
1062 $ ierr )
1063 CALL dlascl( 'G', 0, 0, aaqq, one, m,
1064 $ 1, a( 1, q ), lda, ierr )
1065 temp1 = -aapq*work( p ) / work( q )
1066 CALL daxpy( m, temp1, work( n+1 ), 1,
1067 $ a( 1, q ), 1 )
1068 CALL dlascl( 'G', 0, 0, one, aaqq, m,
1069 $ 1, a( 1, q ), lda, ierr )
1070 sva( q ) = aaqq*dsqrt( max( zero,
1071 $ one-aapq*aapq ) )
1072 mxsinj = max( mxsinj, sfmin )
1073 END IF
1074* END IF ROTOK THEN ... ELSE
1075*
1076* In the case of cancellation in updating SVA(q), SVA(p)
1077* recompute SVA(q), SVA(p).
1078*
1079 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1080 $ THEN
1081 IF( ( aaqq.LT.rootbig ) .AND.
1082 $ ( aaqq.GT.rootsfmin ) ) THEN
1083 sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1084 $ work( q )
1085 ELSE
1086 t = zero
1087 aaqq = one
1088 CALL dlassq( m, a( 1, q ), 1, t,
1089 $ aaqq )
1090 sva( q ) = t*dsqrt( aaqq )*work( q )
1091 END IF
1092 END IF
1093 IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1094 IF( ( aapp.LT.rootbig ) .AND.
1095 $ ( aapp.GT.rootsfmin ) ) THEN
1096 aapp = dnrm2( m, a( 1, p ), 1 )*
1097 $ work( p )
1098 ELSE
1099 t = zero
1100 aapp = one
1101 CALL dlassq( m, a( 1, p ), 1, t,
1102 $ aapp )
1103 aapp = t*dsqrt( aapp )*work( p )
1104 END IF
1105 sva( p ) = aapp
1106 END IF
1107*
1108 ELSE
1109* A(:,p) and A(:,q) already numerically orthogonal
1110 IF( ir1.EQ.0 )notrot = notrot + 1
1111*[RTD] SKIPPED = SKIPPED + 1
1112 pskipped = pskipped + 1
1113 END IF
1114 ELSE
1115* A(:,q) is zero column
1116 IF( ir1.EQ.0 )notrot = notrot + 1
1117 pskipped = pskipped + 1
1118 END IF
1119*
1120 IF( ( i.LE.swband ) .AND.
1121 $ ( pskipped.GT.rowskip ) ) THEN
1122 IF( ir1.EQ.0 )aapp = -aapp
1123 notrot = 0
1124 GO TO 2103
1125 END IF
1126*
1127 2002 CONTINUE
1128* END q-LOOP
1129*
1130 2103 CONTINUE
1131* bailed out of q-loop
1132*
1133 sva( p ) = aapp
1134*
1135 ELSE
1136 sva( p ) = aapp
1137 IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1138 $ notrot = notrot + min( igl+kbl-1, n ) - p
1139 END IF
1140*
1141 2001 CONTINUE
1142* end of the p-loop
1143* end of doing the block ( ibr, ibr )
1144 1002 CONTINUE
1145* end of ir1-loop
1146*
1147* ... go to the off diagonal blocks
1148*
1149 igl = ( ibr-1 )*kbl + 1
1150*
1151 DO 2010 jbc = ibr + 1, nbl
1152*
1153 jgl = ( jbc-1 )*kbl + 1
1154*
1155* doing the block at ( ibr, jbc )
1156*
1157 ijblsk = 0
1158 DO 2100 p = igl, min( igl+kbl-1, n )
1159*
1160 aapp = sva( p )
1161 IF( aapp.GT.zero ) THEN
1162*
1163 pskipped = 0
1164*
1165 DO 2200 q = jgl, min( jgl+kbl-1, n )
1166*
1167 aaqq = sva( q )
1168 IF( aaqq.GT.zero ) THEN
1169 aapp0 = aapp
1170*
1171* .. M x 2 Jacobi SVD ..
1172*
1173* Safe Gram matrix computation
1174*
1175 IF( aaqq.GE.one ) THEN
1176 IF( aapp.GE.aaqq ) THEN
1177 rotok = ( small*aapp ).LE.aaqq
1178 ELSE
1179 rotok = ( small*aaqq ).LE.aapp
1180 END IF
1181 IF( aapp.LT.( big / aaqq ) ) THEN
1182 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1183 $ q ), 1 )*work( p )*work( q ) /
1184 $ aaqq ) / aapp
1185 ELSE
1186 CALL dcopy( m, a( 1, p ), 1,
1187 $ work( n+1 ), 1 )
1188 CALL dlascl( 'G', 0, 0, aapp,
1189 $ work( p ), m, 1,
1190 $ work( n+1 ), lda, ierr )
1191 aapq = ddot( m, work( n+1 ), 1,
1192 $ a( 1, q ), 1 )*work( q ) / aaqq
1193 END IF
1194 ELSE
1195 IF( aapp.GE.aaqq ) THEN
1196 rotok = aapp.LE.( aaqq / small )
1197 ELSE
1198 rotok = aaqq.LE.( aapp / small )
1199 END IF
1200 IF( aapp.GT.( small / aaqq ) ) THEN
1201 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1202 $ q ), 1 )*work( p )*work( q ) /
1203 $ aaqq ) / aapp
1204 ELSE
1205 CALL dcopy( m, a( 1, q ), 1,
1206 $ work( n+1 ), 1 )
1207 CALL dlascl( 'G', 0, 0, aaqq,
1208 $ work( q ), m, 1,
1209 $ work( n+1 ), lda, ierr )
1210 aapq = ddot( m, work( n+1 ), 1,
1211 $ a( 1, p ), 1 )*work( p ) / aapp
1212 END IF
1213 END IF
1214*
1215 mxaapq = max( mxaapq, dabs( aapq ) )
1216*
1217* TO rotate or NOT to rotate, THAT is the question ...
1218*
1219 IF( dabs( aapq ).GT.tol ) THEN
1220 notrot = 0
1221*[RTD] ROTATED = ROTATED + 1
1222 pskipped = 0
1223 iswrot = iswrot + 1
1224*
1225 IF( rotok ) THEN
1226*
1227 aqoap = aaqq / aapp
1228 apoaq = aapp / aaqq
1229 theta = -half*dabs(aqoap-apoaq)/aapq
1230 IF( aaqq.GT.aapp0 )theta = -theta
1231*
1232 IF( dabs( theta ).GT.bigtheta ) THEN
1233 t = half / theta
1234 fastr( 3 ) = t*work( p ) / work( q )
1235 fastr( 4 ) = -t*work( q ) /
1236 $ work( p )
1237 CALL drotm( m, a( 1, p ), 1,
1238 $ a( 1, q ), 1, fastr )
1239 IF( rsvec )CALL drotm( mvl,
1240 $ v( 1, p ), 1,
1241 $ v( 1, q ), 1,
1242 $ fastr )
1243 sva( q ) = aaqq*dsqrt( max( zero,
1244 $ one+t*apoaq*aapq ) )
1245 aapp = aapp*dsqrt( max( zero,
1246 $ one-t*aqoap*aapq ) )
1247 mxsinj = max( mxsinj, dabs( t ) )
1248 ELSE
1249*
1250* .. choose correct signum for THETA and rotate
1251*
1252 thsign = -dsign( one, aapq )
1253 IF( aaqq.GT.aapp0 )thsign = -thsign
1254 t = one / ( theta+thsign*
1255 $ dsqrt( one+theta*theta ) )
1256 cs = dsqrt( one / ( one+t*t ) )
1257 sn = t*cs
1258 mxsinj = max( mxsinj, dabs( sn ) )
1259 sva( q ) = aaqq*dsqrt( max( zero,
1260 $ one+t*apoaq*aapq ) )
1261 aapp = aapp*dsqrt( max( zero,
1262 $ one-t*aqoap*aapq ) )
1263*
1264 apoaq = work( p ) / work( q )
1265 aqoap = work( q ) / work( p )
1266 IF( work( p ).GE.one ) THEN
1267*
1268 IF( work( q ).GE.one ) THEN
1269 fastr( 3 ) = t*apoaq
1270 fastr( 4 ) = -t*aqoap
1271 work( p ) = work( p )*cs
1272 work( q ) = work( q )*cs
1273 CALL drotm( m, a( 1, p ), 1,
1274 $ a( 1, q ), 1,
1275 $ fastr )
1276 IF( rsvec )CALL drotm( mvl,
1277 $ v( 1, p ), 1, v( 1, q ),
1278 $ 1, fastr )
1279 ELSE
1280 CALL daxpy( m, -t*aqoap,
1281 $ a( 1, q ), 1,
1282 $ a( 1, p ), 1 )
1283 CALL daxpy( m, cs*sn*apoaq,
1284 $ a( 1, p ), 1,
1285 $ a( 1, q ), 1 )
1286 IF( rsvec ) THEN
1287 CALL daxpy( mvl, -t*aqoap,
1288 $ v( 1, q ), 1,
1289 $ v( 1, p ), 1 )
1290 CALL daxpy( mvl,
1291 $ cs*sn*apoaq,
1292 $ v( 1, p ), 1,
1293 $ v( 1, q ), 1 )
1294 END IF
1295 work( p ) = work( p )*cs
1296 work( q ) = work( q ) / cs
1297 END IF
1298 ELSE
1299 IF( work( q ).GE.one ) THEN
1300 CALL daxpy( m, t*apoaq,
1301 $ a( 1, p ), 1,
1302 $ a( 1, q ), 1 )
1303 CALL daxpy( m, -cs*sn*aqoap,
1304 $ a( 1, q ), 1,
1305 $ a( 1, p ), 1 )
1306 IF( rsvec ) THEN
1307 CALL daxpy( mvl, t*apoaq,
1308 $ v( 1, p ), 1,
1309 $ v( 1, q ), 1 )
1310 CALL daxpy( mvl,
1311 $ -cs*sn*aqoap,
1312 $ v( 1, q ), 1,
1313 $ v( 1, p ), 1 )
1314 END IF
1315 work( p ) = work( p ) / cs
1316 work( q ) = work( q )*cs
1317 ELSE
1318 IF( work( p ).GE.work( q ) )
1319 $ THEN
1320 CALL daxpy( m, -t*aqoap,
1321 $ a( 1, q ), 1,
1322 $ a( 1, p ), 1 )
1323 CALL daxpy( m, cs*sn*apoaq,
1324 $ a( 1, p ), 1,
1325 $ a( 1, q ), 1 )
1326 work( p ) = work( p )*cs
1327 work( q ) = work( q ) / cs
1328 IF( rsvec ) THEN
1329 CALL daxpy( mvl,
1330 $ -t*aqoap,
1331 $ v( 1, q ), 1,
1332 $ v( 1, p ), 1 )
1333 CALL daxpy( mvl,
1334 $ cs*sn*apoaq,
1335 $ v( 1, p ), 1,
1336 $ v( 1, q ), 1 )
1337 END IF
1338 ELSE
1339 CALL daxpy( m, t*apoaq,
1340 $ a( 1, p ), 1,
1341 $ a( 1, q ), 1 )
1342 CALL daxpy( m,
1343 $ -cs*sn*aqoap,
1344 $ a( 1, q ), 1,
1345 $ a( 1, p ), 1 )
1346 work( p ) = work( p ) / cs
1347 work( q ) = work( q )*cs
1348 IF( rsvec ) THEN
1349 CALL daxpy( mvl,
1350 $ t*apoaq, v( 1, p ),
1351 $ 1, v( 1, q ), 1 )
1352 CALL daxpy( mvl,
1353 $ -cs*sn*aqoap,
1354 $ v( 1, q ), 1,
1355 $ v( 1, p ), 1 )
1356 END IF
1357 END IF
1358 END IF
1359 END IF
1360 END IF
1361*
1362 ELSE
1363 IF( aapp.GT.aaqq ) THEN
1364 CALL dcopy( m, a( 1, p ), 1,
1365 $ work( n+1 ), 1 )
1366 CALL dlascl( 'G', 0, 0, aapp, one,
1367 $ m, 1, work( n+1 ), lda,
1368 $ ierr )
1369 CALL dlascl( 'G', 0, 0, aaqq, one,
1370 $ m, 1, a( 1, q ), lda,
1371 $ ierr )
1372 temp1 = -aapq*work( p ) / work( q )
1373 CALL daxpy( m, temp1, work( n+1 ),
1374 $ 1, a( 1, q ), 1 )
1375 CALL dlascl( 'G', 0, 0, one, aaqq,
1376 $ m, 1, a( 1, q ), lda,
1377 $ ierr )
1378 sva( q ) = aaqq*dsqrt( max( zero,
1379 $ one-aapq*aapq ) )
1380 mxsinj = max( mxsinj, sfmin )
1381 ELSE
1382 CALL dcopy( m, a( 1, q ), 1,
1383 $ work( n+1 ), 1 )
1384 CALL dlascl( 'G', 0, 0, aaqq, one,
1385 $ m, 1, work( n+1 ), lda,
1386 $ ierr )
1387 CALL dlascl( 'G', 0, 0, aapp, one,
1388 $ m, 1, a( 1, p ), lda,
1389 $ ierr )
1390 temp1 = -aapq*work( q ) / work( p )
1391 CALL daxpy( m, temp1, work( n+1 ),
1392 $ 1, a( 1, p ), 1 )
1393 CALL dlascl( 'G', 0, 0, one, aapp,
1394 $ m, 1, a( 1, p ), lda,
1395 $ ierr )
1396 sva( p ) = aapp*dsqrt( max( zero,
1397 $ one-aapq*aapq ) )
1398 mxsinj = max( mxsinj, sfmin )
1399 END IF
1400 END IF
1401* END IF ROTOK THEN ... ELSE
1402*
1403* In the case of cancellation in updating SVA(q)
1404* .. recompute SVA(q)
1405 IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1406 $ THEN
1407 IF( ( aaqq.LT.rootbig ) .AND.
1408 $ ( aaqq.GT.rootsfmin ) ) THEN
1409 sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1410 $ work( q )
1411 ELSE
1412 t = zero
1413 aaqq = one
1414 CALL dlassq( m, a( 1, q ), 1, t,
1415 $ aaqq )
1416 sva( q ) = t*dsqrt( aaqq )*work( q )
1417 END IF
1418 END IF
1419 IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1420 IF( ( aapp.LT.rootbig ) .AND.
1421 $ ( aapp.GT.rootsfmin ) ) THEN
1422 aapp = dnrm2( m, a( 1, p ), 1 )*
1423 $ work( p )
1424 ELSE
1425 t = zero
1426 aapp = one
1427 CALL dlassq( m, a( 1, p ), 1, t,
1428 $ aapp )
1429 aapp = t*dsqrt( aapp )*work( p )
1430 END IF
1431 sva( p ) = aapp
1432 END IF
1433* end of OK rotation
1434 ELSE
1435 notrot = notrot + 1
1436*[RTD] SKIPPED = SKIPPED + 1
1437 pskipped = pskipped + 1
1438 ijblsk = ijblsk + 1
1439 END IF
1440 ELSE
1441 notrot = notrot + 1
1442 pskipped = pskipped + 1
1443 ijblsk = ijblsk + 1
1444 END IF
1445*
1446 IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1447 $ THEN
1448 sva( p ) = aapp
1449 notrot = 0
1450 GO TO 2011
1451 END IF
1452 IF( ( i.LE.swband ) .AND.
1453 $ ( pskipped.GT.rowskip ) ) THEN
1454 aapp = -aapp
1455 notrot = 0
1456 GO TO 2203
1457 END IF
1458*
1459 2200 CONTINUE
1460* end of the q-loop
1461 2203 CONTINUE
1462*
1463 sva( p ) = aapp
1464*
1465 ELSE
1466*
1467 IF( aapp.EQ.zero )notrot = notrot +
1468 $ min( jgl+kbl-1, n ) - jgl + 1
1469 IF( aapp.LT.zero )notrot = 0
1470*
1471 END IF
1472*
1473 2100 CONTINUE
1474* end of the p-loop
1475 2010 CONTINUE
1476* end of the jbc-loop
1477 2011 CONTINUE
1478*2011 bailed out of the jbc-loop
1479 DO 2012 p = igl, min( igl+kbl-1, n )
1480 sva( p ) = dabs( sva( p ) )
1481 2012 CONTINUE
1482***
1483 2000 CONTINUE
1484*2000 :: end of the ibr-loop
1485*
1486* .. update SVA(N)
1487 IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1488 $ THEN
1489 sva( n ) = dnrm2( m, a( 1, n ), 1 )*work( n )
1490 ELSE
1491 t = zero
1492 aapp = one
1493 CALL dlassq( m, a( 1, n ), 1, t, aapp )
1494 sva( n ) = t*dsqrt( aapp )*work( n )
1495 END IF
1496*
1497* Additional steering devices
1498*
1499 IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1500 $ ( iswrot.LE.n ) ) )swband = i
1501*
1502 IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.dsqrt( dble( n ) )*
1503 $ tol ) .AND. ( dble( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1504 GO TO 1994
1505 END IF
1506*
1507 IF( notrot.GE.emptsw )GO TO 1994
1508*
1509 1993 CONTINUE
1510* end i=1:NSWEEP loop
1511*
1512* #:( Reaching this point means that the procedure has not converged.
1513 info = nsweep - 1
1514 GO TO 1995
1515*
1516 1994 CONTINUE
1517* #:) Reaching this point means numerical convergence after the i-th
1518* sweep.
1519*
1520 info = 0
1521* #:) INFO = 0 confirms successful iterations.
1522 1995 CONTINUE
1523*
1524* Sort the singular values and find how many are above
1525* the underflow threshold.
1526*
1527 n2 = 0
1528 n4 = 0
1529 DO 5991 p = 1, n - 1
1530 q = idamax( n-p+1, sva( p ), 1 ) + p - 1
1531 IF( p.NE.q ) THEN
1532 temp1 = sva( p )
1533 sva( p ) = sva( q )
1534 sva( q ) = temp1
1535 temp1 = work( p )
1536 work( p ) = work( q )
1537 work( q ) = temp1
1538 CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1539 IF( rsvec )CALL dswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1540 END IF
1541 IF( sva( p ).NE.zero ) THEN
1542 n4 = n4 + 1
1543 IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1544 END IF
1545 5991 CONTINUE
1546 IF( sva( n ).NE.zero ) THEN
1547 n4 = n4 + 1
1548 IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1549 END IF
1550*
1551* Normalize the left singular vectors.
1552*
1553 IF( lsvec .OR. uctol ) THEN
1554 DO 1998 p = 1, n2
1555 CALL dscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1556 1998 CONTINUE
1557 END IF
1558*
1559* Scale the product of Jacobi rotations (assemble the fast rotations).
1560*
1561 IF( rsvec ) THEN
1562 IF( applv ) THEN
1563 DO 2398 p = 1, n
1564 CALL dscal( mvl, work( p ), v( 1, p ), 1 )
1565 2398 CONTINUE
1566 ELSE
1567 DO 2399 p = 1, n
1568 temp1 = one / dnrm2( mvl, v( 1, p ), 1 )
1569 CALL dscal( mvl, temp1, v( 1, p ), 1 )
1570 2399 CONTINUE
1571 END IF
1572 END IF
1573*
1574* Undo scaling, if necessary (and possible).
1575 IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl) ) )
1576 $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1577 $ ( sfmin / skl) ) ) ) THEN
1578 DO 2400 p = 1, n
1579 sva( p ) = skl*sva( p )
1580 2400 CONTINUE
1581 skl= one
1582 END IF
1583*
1584 work( 1 ) = skl
1585* The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1586* then some of the singular values may overflow or underflow and
1587* the spectrum is given in this factored representation.
1588*
1589 work( 2 ) = dble( n4 )
1590* N4 is the number of computed nonzero singular values of A.
1591*
1592 work( 3 ) = dble( n2 )
1593* N2 is the number of singular values of A greater than SFMIN.
1594* If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1595* that may carry some information.
1596*
1597 work( 4 ) = dble( i )
1598* i is the index of the last sweep before declaring convergence.
1599*
1600 work( 5 ) = mxaapq
1601* MXAAPQ is the largest absolute value of scaled pivots in the
1602* last sweep
1603*
1604 work( 6 ) = mxsinj
1605* MXSINJ is the largest absolute value of the sines of Jacobi angles
1606* in the last sweep
1607*
1608 RETURN
1609* ..
1610* .. END OF DGESVJ
1611* ..
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine daxpy(n, da, dx, incx, dy, incy)
DAXPY
Definition daxpy.f:89
subroutine dcopy(n, dx, incx, dy, incy)
DCOPY
Definition dcopy.f:82
double precision function ddot(n, dx, incx, dy, incy)
DDOT
Definition ddot.f:82
subroutine dgsvj0(jobv, m, n, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
DGSVJ0 pre-processor for the routine dgesvj.
Definition dgsvj0.f:218
subroutine dgsvj1(jobv, m, n, n1, a, lda, d, sva, mv, v, ldv, eps, sfmin, tol, nsweep, work, lwork, info)
DGSVJ1 pre-processor for the routine dgesvj, applies Jacobi rotations targeting only particular pivot...
Definition dgsvj1.f:236
integer function idamax(n, dx, incx)
IDAMAX
Definition idamax.f:71
double precision function dlamch(cmach)
DLAMCH
Definition dlamch.f:69
subroutine dlascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition dlascl.f:143
subroutine dlaset(uplo, m, n, alpha, beta, a, lda)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition dlaset.f:110
subroutine dlassq(n, x, incx, scale, sumsq)
DLASSQ updates a sum of squares represented in scaled form.
Definition dlassq.f90:124
logical function lsame(ca, cb)
LSAME
Definition lsame.f:48
real(wp) function dnrm2(n, x, incx)
DNRM2
Definition dnrm2.f90:89
subroutine drotm(n, dx, incx, dy, incy, dparam)
DROTM
Definition drotm.f:96
subroutine dscal(n, da, dx, incx)
DSCAL
Definition dscal.f:79
subroutine dswap(n, dx, incx, dy, incy)
DSWAP
Definition dswap.f:82
Here is the call graph for this function:
Here is the caller graph for this function: