LAPACK  3.8.0
LAPACK: Linear Algebra PACKage

◆ dgesvj()

subroutine dgesvj ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
integer  M,
integer  N,
double precision, dimension( lda, * )  A,
integer  LDA,
double precision, dimension( n )  SVA,
integer  MV,
double precision, dimension( ldv, * )  V,
integer  LDV,
double precision, dimension( lwork )  WORK,
integer  LWORK,
integer  INFO 
)

DGESVJ

Download DGESVJ + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 DGESVJ computes the singular value decomposition (SVD) of a real
 M-by-N matrix A, where M >= N. The SVD of A is written as
                                    [++]   [xx]   [x0]   [xx]
              A = U * SIGMA * V^t,  [++] = [xx] * [ox] * [xx]
                                    [++]   [xx]
 where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
 matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
 of SIGMA are the singular values of A. The columns of U and V are the
 left and the right singular vectors of A, respectively.
 DGESVJ can sometimes compute tiny singular values and their singular vectors much
 more accurately than other SVD routines, see below under Further Details.
Parameters
[in]JOBA
          JOBA is CHARACTER*1
          Specifies the structure of A.
          = 'L': The input matrix A is lower triangular;
          = 'U': The input matrix A is upper triangular;
          = 'G': The input matrix A is general M-by-N matrix, M >= N.
[in]JOBU
          JOBU is CHARACTER*1
          Specifies whether to compute the left singular vectors
          (columns of U):
          = 'U': The left singular vectors corresponding to the nonzero
                 singular values are computed and returned in the leading
                 columns of A. See more details in the description of A.
                 The default numerical orthogonality threshold is set to
                 approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
          = 'C': Analogous to JOBU='U', except that user can control the
                 level of numerical orthogonality of the computed left
                 singular vectors. TOL can be set to TOL = CTOL*EPS, where
                 CTOL is given on input in the array WORK.
                 No CTOL smaller than ONE is allowed. CTOL greater
                 than 1 / EPS is meaningless. The option 'C'
                 can be used if M*EPS is satisfactory orthogonality
                 of the computed left singular vectors, so CTOL=M could
                 save few sweeps of Jacobi rotations.
                 See the descriptions of A and WORK(1).
          = 'N': The matrix U is not computed. However, see the
                 description of A.
[in]JOBV
          JOBV is CHARACTER*1
          Specifies whether to compute the right singular vectors, that
          is, the matrix V:
          = 'V' : the matrix V is computed and returned in the array V
          = 'A' : the Jacobi rotations are applied to the MV-by-N
                  array V. In other words, the right singular vector
                  matrix V is not computed explicitly, instead it is
                  applied to an MV-by-N matrix initially stored in the
                  first MV rows of V.
          = 'N' : the matrix V is not computed and the array V is not
                  referenced
[in]M
          M is INTEGER
          The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.
          M >= N >= 0.
[in,out]A
          A is DOUBLE PRECISION array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit :
          If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C' :
                 If INFO .EQ. 0 :
                 RANKA orthonormal columns of U are returned in the
                 leading RANKA columns of the array A. Here RANKA <= N
                 is the number of computed singular values of A that are
                 above the underflow threshold DLAMCH('S'). The singular
                 vectors corresponding to underflowed or zero singular
                 values are not computed. The value of RANKA is returned
                 in the array WORK as RANKA=NINT(WORK(2)). Also see the
                 descriptions of SVA and WORK. The computed columns of U
                 are mutually numerically orthogonal up to approximately
                 TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'),
                 see the description of JOBU.
                 If INFO .GT. 0 :
                 the procedure DGESVJ did not converge in the given number
                 of iterations (sweeps). In that case, the computed
                 columns of U may not be orthogonal up to TOL. The output
                 U (stored in A), SIGMA (given by the computed singular
                 values in SVA(1:N)) and V is still a decomposition of the
                 input matrix A in the sense that the residual
                 ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.

          If JOBU .EQ. 'N' :
                 If INFO .EQ. 0 :
                 Note that the left singular vectors are 'for free' in the
                 one-sided Jacobi SVD algorithm. However, if only the
                 singular values are needed, the level of numerical
                 orthogonality of U is not an issue and iterations are
                 stopped when the columns of the iterated matrix are
                 numerically orthogonal up to approximately M*EPS. Thus,
                 on exit, A contains the columns of U scaled with the
                 corresponding singular values.
                 If INFO .GT. 0 :
                 the procedure DGESVJ did not converge in the given number
                 of iterations (sweeps).
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is DOUBLE PRECISION array, dimension (N)
          On exit :
          If INFO .EQ. 0 :
          depending on the value SCALE = WORK(1), we have:
                 If SCALE .EQ. ONE :
                 SVA(1:N) contains the computed singular values of A.
                 During the computation SVA contains the Euclidean column
                 norms of the iterated matrices in the array A.
                 If SCALE .NE. ONE :
                 The singular values of A are SCALE*SVA(1:N), and this
                 factored representation is due to the fact that some of the
                 singular values of A might underflow or overflow.
          If INFO .GT. 0 :
          the procedure DGESVJ did not converge in the given number of
          iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
[in]MV
          MV is INTEGER
          If JOBV .EQ. 'A', then the product of Jacobi rotations in DGESVJ
          is applied to the first MV rows of V. See the description of JOBV.
[in,out]V
          V is DOUBLE PRECISION array, dimension (LDV,N)
          If JOBV = 'V', then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'A', then V contains the product of the computed right
                         singular vector matrix and the initial matrix in
                         the array V.
          If JOBV = 'N', then V is not referenced.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V, LDV .GE. 1.
          If JOBV .EQ. 'V', then LDV .GE. max(1,N).
          If JOBV .EQ. 'A', then LDV .GE. max(1,MV) .
[in,out]WORK
          WORK is DOUBLE PRECISION array, dimension (LWORK)
          On entry :
          If JOBU .EQ. 'C' :
          WORK(1) = CTOL, where CTOL defines the threshold for convergence.
                    The process stops if all columns of A are mutually
                    orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
                    It is required that CTOL >= ONE, i.e. it is not
                    allowed to force the routine to obtain orthogonality
                    below EPS.
          On exit :
          WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
                    are the computed singular values of A.
                    (See description of SVA().)
          WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
                    singular values.
          WORK(3) = NINT(WORK(3)) is the number of the computed singular
                    values that are larger than the underflow threshold.
          WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
                    rotations needed for numerical convergence.
          WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
                    This is useful information in cases when DGESVJ did
                    not converge, as it can be used to estimate whether
                    the output is stil useful and for post festum analysis.
          WORK(6) = the largest absolute value over all sines of the
                    Jacobi rotation angles in the last sweep. It can be
                    useful for a post festum analysis.
[in]LWORK
          LWORK is INTEGER
          length of WORK, WORK >= MAX(6,M+N)
[out]INFO
          INFO is INTEGER
          = 0 : successful exit.
          < 0 : if INFO = -i, then the i-th argument had an illegal value
          > 0 : DGESVJ did not converge in the maximal allowed number (30)
                of sweeps. The output may still be useful. See the
                description of WORK.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
June 2017
Further Details:
  The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
  rotations. The rotations are implemented as fast scaled rotations of
  Anda and Park [1]. In the case of underflow of the Jacobi angle, a
  modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
  column interchanges of de Rijk [2]. The relative accuracy of the computed
  singular values and the accuracy of the computed singular vectors (in
  angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
  The condition number that determines the accuracy in the full rank case
  is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
  spectral condition number. The best performance of this Jacobi SVD
  procedure is achieved if used in an  accelerated version of Drmac and
  Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
  Some tunning parameters (marked with [TP]) are available for the
  implementer.
  The computational range for the nonzero singular values is the  machine
  number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
  denormalized singular values can be computed with the corresponding
  gradual loss of accurate digits.
Contributors:
  ============

  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
 [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
     SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
 [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
     singular value decomposition on a vector computer.
     SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
 [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
 [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
     value computation in floating point arithmetic.
     SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
 [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
  ===========================
  Please report all bugs and send interesting test examples and comments to
  drmac@math.hr. Thank you.

Definition at line 339 of file dgesvj.f.

339 *
340 * -- LAPACK computational routine (version 3.7.1) --
341 * -- LAPACK is a software package provided by Univ. of Tennessee, --
342 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
343 * June 2017
344 *
345 * .. Scalar Arguments ..
346  INTEGER info, lda, ldv, lwork, m, mv, n
347  CHARACTER*1 joba, jobu, jobv
348 * ..
349 * .. Array Arguments ..
350  DOUBLE PRECISION a( lda, * ), sva( n ), v( ldv, * ),
351  $ work( lwork )
352 * ..
353 *
354 * =====================================================================
355 *
356 * .. Local Parameters ..
357  DOUBLE PRECISION zero, half, one
358  parameter( zero = 0.0d0, half = 0.5d0, one = 1.0d0)
359  INTEGER nsweep
360  parameter( nsweep = 30 )
361 * ..
362 * .. Local Scalars ..
363  DOUBLE PRECISION aapp, aapp0, aapq, aaqq, apoaq, aqoap, big,
364  $ bigtheta, cs, ctol, epsln, large, mxaapq,
365  $ mxsinj, rootbig, rooteps, rootsfmin, roottol,
366  $ skl, sfmin, small, sn, t, temp1, theta,
367  $ thsign, tol
368  INTEGER blskip, emptsw, i, ibr, ierr, igl, ijblsk, ir1,
369  $ iswrot, jbc, jgl, kbl, lkahead, mvl, n2, n34,
370  $ n4, nbl, notrot, p, pskipped, q, rowskip,
371  $ swband
372  LOGICAL applv, goscale, lower, lsvec, noscale, rotok,
373  $ rsvec, uctol, upper
374 * ..
375 * .. Local Arrays ..
376  DOUBLE PRECISION fastr( 5 )
377 * ..
378 * .. Intrinsic Functions ..
379  INTRINSIC dabs, max, min, dble, dsign, dsqrt
380 * ..
381 * .. External Functions ..
382 * ..
383 * from BLAS
384  DOUBLE PRECISION ddot, dnrm2
385  EXTERNAL ddot, dnrm2
386  INTEGER idamax
387  EXTERNAL idamax
388 * from LAPACK
389  DOUBLE PRECISION dlamch
390  EXTERNAL dlamch
391  LOGICAL lsame
392  EXTERNAL lsame
393 * ..
394 * .. External Subroutines ..
395 * ..
396 * from BLAS
397  EXTERNAL daxpy, dcopy, drotm, dscal, dswap
398 * from LAPACK
399  EXTERNAL dlascl, dlaset, dlassq, xerbla
400 *
401  EXTERNAL dgsvj0, dgsvj1
402 * ..
403 * .. Executable Statements ..
404 *
405 * Test the input arguments
406 *
407  lsvec = lsame( jobu, 'U' )
408  uctol = lsame( jobu, 'C' )
409  rsvec = lsame( jobv, 'V' )
410  applv = lsame( jobv, 'A' )
411  upper = lsame( joba, 'U' )
412  lower = lsame( joba, 'L' )
413 *
414  IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
415  info = -1
416  ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
417  info = -2
418  ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
419  info = -3
420  ELSE IF( m.LT.0 ) THEN
421  info = -4
422  ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
423  info = -5
424  ELSE IF( lda.LT.m ) THEN
425  info = -7
426  ELSE IF( mv.LT.0 ) THEN
427  info = -9
428  ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
429  $ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
430  info = -11
431  ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
432  info = -12
433  ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
434  info = -13
435  ELSE
436  info = 0
437  END IF
438 *
439 * #:(
440  IF( info.NE.0 ) THEN
441  CALL xerbla( 'DGESVJ', -info )
442  RETURN
443  END IF
444 *
445 * #:) Quick return for void matrix
446 *
447  IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
448 *
449 * Set numerical parameters
450 * The stopping criterion for Jacobi rotations is
451 *
452 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
453 *
454 * where EPS is the round-off and CTOL is defined as follows:
455 *
456  IF( uctol ) THEN
457 * ... user controlled
458  ctol = work( 1 )
459  ELSE
460 * ... default
461  IF( lsvec .OR. rsvec .OR. applv ) THEN
462  ctol = dsqrt( dble( m ) )
463  ELSE
464  ctol = dble( m )
465  END IF
466  END IF
467 * ... and the machine dependent parameters are
468 *[!] (Make sure that DLAMCH() works properly on the target machine.)
469 *
470  epsln = dlamch( 'Epsilon' )
471  rooteps = dsqrt( epsln )
472  sfmin = dlamch( 'SafeMinimum' )
473  rootsfmin = dsqrt( sfmin )
474  small = sfmin / epsln
475  big = dlamch( 'Overflow' )
476 * BIG = ONE / SFMIN
477  rootbig = one / rootsfmin
478  large = big / dsqrt( dble( m*n ) )
479  bigtheta = one / rooteps
480 *
481  tol = ctol*epsln
482  roottol = dsqrt( tol )
483 *
484  IF( dble( m )*epsln.GE.one ) THEN
485  info = -4
486  CALL xerbla( 'DGESVJ', -info )
487  RETURN
488  END IF
489 *
490 * Initialize the right singular vector matrix.
491 *
492  IF( rsvec ) THEN
493  mvl = n
494  CALL dlaset( 'A', mvl, n, zero, one, v, ldv )
495  ELSE IF( applv ) THEN
496  mvl = mv
497  END IF
498  rsvec = rsvec .OR. applv
499 *
500 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
501 *(!) If necessary, scale A to protect the largest singular value
502 * from overflow. It is possible that saving the largest singular
503 * value destroys the information about the small ones.
504 * This initial scaling is almost minimal in the sense that the
505 * goal is to make sure that no column norm overflows, and that
506 * DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
507 * in A are detected, the procedure returns with INFO=-6.
508 *
509  skl= one / dsqrt( dble( m )*dble( n ) )
510  noscale = .true.
511  goscale = .true.
512 *
513  IF( lower ) THEN
514 * the input matrix is M-by-N lower triangular (trapezoidal)
515  DO 1874 p = 1, n
516  aapp = zero
517  aaqq = one
518  CALL dlassq( m-p+1, a( p, p ), 1, aapp, aaqq )
519  IF( aapp.GT.big ) THEN
520  info = -6
521  CALL xerbla( 'DGESVJ', -info )
522  RETURN
523  END IF
524  aaqq = dsqrt( aaqq )
525  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
526  sva( p ) = aapp*aaqq
527  ELSE
528  noscale = .false.
529  sva( p ) = aapp*( aaqq*skl)
530  IF( goscale ) THEN
531  goscale = .false.
532  DO 1873 q = 1, p - 1
533  sva( q ) = sva( q )*skl
534  1873 CONTINUE
535  END IF
536  END IF
537  1874 CONTINUE
538  ELSE IF( upper ) THEN
539 * the input matrix is M-by-N upper triangular (trapezoidal)
540  DO 2874 p = 1, n
541  aapp = zero
542  aaqq = one
543  CALL dlassq( p, a( 1, p ), 1, aapp, aaqq )
544  IF( aapp.GT.big ) THEN
545  info = -6
546  CALL xerbla( 'DGESVJ', -info )
547  RETURN
548  END IF
549  aaqq = dsqrt( aaqq )
550  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
551  sva( p ) = aapp*aaqq
552  ELSE
553  noscale = .false.
554  sva( p ) = aapp*( aaqq*skl)
555  IF( goscale ) THEN
556  goscale = .false.
557  DO 2873 q = 1, p - 1
558  sva( q ) = sva( q )*skl
559  2873 CONTINUE
560  END IF
561  END IF
562  2874 CONTINUE
563  ELSE
564 * the input matrix is M-by-N general dense
565  DO 3874 p = 1, n
566  aapp = zero
567  aaqq = one
568  CALL dlassq( m, a( 1, p ), 1, aapp, aaqq )
569  IF( aapp.GT.big ) THEN
570  info = -6
571  CALL xerbla( 'DGESVJ', -info )
572  RETURN
573  END IF
574  aaqq = dsqrt( aaqq )
575  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
576  sva( p ) = aapp*aaqq
577  ELSE
578  noscale = .false.
579  sva( p ) = aapp*( aaqq*skl)
580  IF( goscale ) THEN
581  goscale = .false.
582  DO 3873 q = 1, p - 1
583  sva( q ) = sva( q )*skl
584  3873 CONTINUE
585  END IF
586  END IF
587  3874 CONTINUE
588  END IF
589 *
590  IF( noscale )skl= one
591 *
592 * Move the smaller part of the spectrum from the underflow threshold
593 *(!) Start by determining the position of the nonzero entries of the
594 * array SVA() relative to ( SFMIN, BIG ).
595 *
596  aapp = zero
597  aaqq = big
598  DO 4781 p = 1, n
599  IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
600  aapp = max( aapp, sva( p ) )
601  4781 CONTINUE
602 *
603 * #:) Quick return for zero matrix
604 *
605  IF( aapp.EQ.zero ) THEN
606  IF( lsvec )CALL dlaset( 'G', m, n, zero, one, a, lda )
607  work( 1 ) = one
608  work( 2 ) = zero
609  work( 3 ) = zero
610  work( 4 ) = zero
611  work( 5 ) = zero
612  work( 6 ) = zero
613  RETURN
614  END IF
615 *
616 * #:) Quick return for one-column matrix
617 *
618  IF( n.EQ.1 ) THEN
619  IF( lsvec )CALL dlascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
620  $ a( 1, 1 ), lda, ierr )
621  work( 1 ) = one / skl
622  IF( sva( 1 ).GE.sfmin ) THEN
623  work( 2 ) = one
624  ELSE
625  work( 2 ) = zero
626  END IF
627  work( 3 ) = zero
628  work( 4 ) = zero
629  work( 5 ) = zero
630  work( 6 ) = zero
631  RETURN
632  END IF
633 *
634 * Protect small singular values from underflow, and try to
635 * avoid underflows/overflows in computing Jacobi rotations.
636 *
637  sn = dsqrt( sfmin / epsln )
638  temp1 = dsqrt( big / dble( n ) )
639  IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
640  $ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
641  temp1 = min( big, temp1 / aapp )
642 * AAQQ = AAQQ*TEMP1
643 * AAPP = AAPP*TEMP1
644  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
645  temp1 = min( sn / aaqq, big / ( aapp*dsqrt( dble( n ) ) ) )
646 * AAQQ = AAQQ*TEMP1
647 * AAPP = AAPP*TEMP1
648  ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
649  temp1 = max( sn / aaqq, temp1 / aapp )
650 * AAQQ = AAQQ*TEMP1
651 * AAPP = AAPP*TEMP1
652  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
653  temp1 = min( sn / aaqq, big / ( dsqrt( dble( n ) )*aapp ) )
654 * AAQQ = AAQQ*TEMP1
655 * AAPP = AAPP*TEMP1
656  ELSE
657  temp1 = one
658  END IF
659 *
660 * Scale, if necessary
661 *
662  IF( temp1.NE.one ) THEN
663  CALL dlascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
664  END IF
665  skl= temp1*skl
666  IF( skl.NE.one ) THEN
667  CALL dlascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
668  skl= one / skl
669  END IF
670 *
671 * Row-cyclic Jacobi SVD algorithm with column pivoting
672 *
673  emptsw = ( n*( n-1 ) ) / 2
674  notrot = 0
675  fastr( 1 ) = zero
676 *
677 * A is represented in factored form A = A * diag(WORK), where diag(WORK)
678 * is initialized to identity. WORK is updated during fast scaled
679 * rotations.
680 *
681  DO 1868 q = 1, n
682  work( q ) = one
683  1868 CONTINUE
684 *
685 *
686  swband = 3
687 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
688 * if DGESVJ is used as a computational routine in the preconditioned
689 * Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
690 * works on pivots inside a band-like region around the diagonal.
691 * The boundaries are determined dynamically, based on the number of
692 * pivots above a threshold.
693 *
694  kbl = min( 8, n )
695 *[TP] KBL is a tuning parameter that defines the tile size in the
696 * tiling of the p-q loops of pivot pairs. In general, an optimal
697 * value of KBL depends on the matrix dimensions and on the
698 * parameters of the computer's memory.
699 *
700  nbl = n / kbl
701  IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
702 *
703  blskip = kbl**2
704 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
705 *
706  rowskip = min( 5, kbl )
707 *[TP] ROWSKIP is a tuning parameter.
708 *
709  lkahead = 1
710 *[TP] LKAHEAD is a tuning parameter.
711 *
712 * Quasi block transformations, using the lower (upper) triangular
713 * structure of the input matrix. The quasi-block-cycling usually
714 * invokes cubic convergence. Big part of this cycle is done inside
715 * canonical subspaces of dimensions less than M.
716 *
717  IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
718 *[TP] The number of partition levels and the actual partition are
719 * tuning parameters.
720  n4 = n / 4
721  n2 = n / 2
722  n34 = 3*n4
723  IF( applv ) THEN
724  q = 0
725  ELSE
726  q = 1
727  END IF
728 *
729  IF( lower ) THEN
730 *
731 * This works very well on lower triangular matrices, in particular
732 * in the framework of the preconditioned Jacobi SVD (xGEJSV).
733 * The idea is simple:
734 * [+ 0 0 0] Note that Jacobi transformations of [0 0]
735 * [+ + 0 0] [0 0]
736 * [+ + x 0] actually work on [x 0] [x 0]
737 * [+ + x x] [x x]. [x x]
738 *
739  CALL dgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
740  $ work( n34+1 ), sva( n34+1 ), mvl,
741  $ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
742  $ 2, work( n+1 ), lwork-n, ierr )
743 *
744  CALL dgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
745  $ work( n2+1 ), sva( n2+1 ), mvl,
746  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
747  $ work( n+1 ), lwork-n, ierr )
748 *
749  CALL dgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
750  $ work( n2+1 ), sva( n2+1 ), mvl,
751  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
752  $ work( n+1 ), lwork-n, ierr )
753 *
754  CALL dgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
755  $ work( n4+1 ), sva( n4+1 ), mvl,
756  $ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
757  $ work( n+1 ), lwork-n, ierr )
758 *
759  CALL dgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
760  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
761  $ ierr )
762 *
763  CALL dgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
764  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
765  $ lwork-n, ierr )
766 *
767 *
768  ELSE IF( upper ) THEN
769 *
770 *
771  CALL dgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
772  $ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
773  $ ierr )
774 *
775  CALL dgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
776  $ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
777  $ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
778  $ ierr )
779 *
780  CALL dgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
781  $ ldv, epsln, sfmin, tol, 1, work( n+1 ),
782  $ lwork-n, ierr )
783 *
784  CALL dgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
785  $ work( n2+1 ), sva( n2+1 ), mvl,
786  $ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
787  $ work( n+1 ), lwork-n, ierr )
788 
789  END IF
790 *
791  END IF
792 *
793 * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
794 *
795  DO 1993 i = 1, nsweep
796 *
797 * .. go go go ...
798 *
799  mxaapq = zero
800  mxsinj = zero
801  iswrot = 0
802 *
803  notrot = 0
804  pskipped = 0
805 *
806 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
807 * 1 <= p < q <= N. This is the first step toward a blocked implementation
808 * of the rotations. New implementation, based on block transformations,
809 * is under development.
810 *
811  DO 2000 ibr = 1, nbl
812 *
813  igl = ( ibr-1 )*kbl + 1
814 *
815  DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
816 *
817  igl = igl + ir1*kbl
818 *
819  DO 2001 p = igl, min( igl+kbl-1, n-1 )
820 *
821 * .. de Rijk's pivoting
822 *
823  q = idamax( n-p+1, sva( p ), 1 ) + p - 1
824  IF( p.NE.q ) THEN
825  CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
826  IF( rsvec )CALL dswap( mvl, v( 1, p ), 1,
827  $ v( 1, q ), 1 )
828  temp1 = sva( p )
829  sva( p ) = sva( q )
830  sva( q ) = temp1
831  temp1 = work( p )
832  work( p ) = work( q )
833  work( q ) = temp1
834  END IF
835 *
836  IF( ir1.EQ.0 ) THEN
837 *
838 * Column norms are periodically updated by explicit
839 * norm computation.
840 * Caveat:
841 * Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
842 * as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
843 * overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
844 * underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
845 * Hence, DNRM2 cannot be trusted, not even in the case when
846 * the true norm is far from the under(over)flow boundaries.
847 * If properly implemented DNRM2 is available, the IF-THEN-ELSE
848 * below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
849 *
850  IF( ( sva( p ).LT.rootbig ) .AND.
851  $ ( sva( p ).GT.rootsfmin ) ) THEN
852  sva( p ) = dnrm2( m, a( 1, p ), 1 )*work( p )
853  ELSE
854  temp1 = zero
855  aapp = one
856  CALL dlassq( m, a( 1, p ), 1, temp1, aapp )
857  sva( p ) = temp1*dsqrt( aapp )*work( p )
858  END IF
859  aapp = sva( p )
860  ELSE
861  aapp = sva( p )
862  END IF
863 *
864  IF( aapp.GT.zero ) THEN
865 *
866  pskipped = 0
867 *
868  DO 2002 q = p + 1, min( igl+kbl-1, n )
869 *
870  aaqq = sva( q )
871 *
872  IF( aaqq.GT.zero ) THEN
873 *
874  aapp0 = aapp
875  IF( aaqq.GE.one ) THEN
876  rotok = ( small*aapp ).LE.aaqq
877  IF( aapp.LT.( big / aaqq ) ) THEN
878  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
879  $ q ), 1 )*work( p )*work( q ) /
880  $ aaqq ) / aapp
881  ELSE
882  CALL dcopy( m, a( 1, p ), 1,
883  $ work( n+1 ), 1 )
884  CALL dlascl( 'G', 0, 0, aapp,
885  $ work( p ), m, 1,
886  $ work( n+1 ), lda, ierr )
887  aapq = ddot( m, work( n+1 ), 1,
888  $ a( 1, q ), 1 )*work( q ) / aaqq
889  END IF
890  ELSE
891  rotok = aapp.LE.( aaqq / small )
892  IF( aapp.GT.( small / aaqq ) ) THEN
893  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
894  $ q ), 1 )*work( p )*work( q ) /
895  $ aaqq ) / aapp
896  ELSE
897  CALL dcopy( m, a( 1, q ), 1,
898  $ work( n+1 ), 1 )
899  CALL dlascl( 'G', 0, 0, aaqq,
900  $ work( q ), m, 1,
901  $ work( n+1 ), lda, ierr )
902  aapq = ddot( m, work( n+1 ), 1,
903  $ a( 1, p ), 1 )*work( p ) / aapp
904  END IF
905  END IF
906 *
907  mxaapq = max( mxaapq, dabs( aapq ) )
908 *
909 * TO rotate or NOT to rotate, THAT is the question ...
910 *
911  IF( dabs( aapq ).GT.tol ) THEN
912 *
913 * .. rotate
914 *[RTD] ROTATED = ROTATED + ONE
915 *
916  IF( ir1.EQ.0 ) THEN
917  notrot = 0
918  pskipped = 0
919  iswrot = iswrot + 1
920  END IF
921 *
922  IF( rotok ) THEN
923 *
924  aqoap = aaqq / aapp
925  apoaq = aapp / aaqq
926  theta = -half*dabs(aqoap-apoaq)/aapq
927 *
928  IF( dabs( theta ).GT.bigtheta ) THEN
929 *
930  t = half / theta
931  fastr( 3 ) = t*work( p ) / work( q )
932  fastr( 4 ) = -t*work( q ) /
933  $ work( p )
934  CALL drotm( m, a( 1, p ), 1,
935  $ a( 1, q ), 1, fastr )
936  IF( rsvec )CALL drotm( mvl,
937  $ v( 1, p ), 1,
938  $ v( 1, q ), 1,
939  $ fastr )
940  sva( q ) = aaqq*dsqrt( max( zero,
941  $ one+t*apoaq*aapq ) )
942  aapp = aapp*dsqrt( max( zero,
943  $ one-t*aqoap*aapq ) )
944  mxsinj = max( mxsinj, dabs( t ) )
945 *
946  ELSE
947 *
948 * .. choose correct signum for THETA and rotate
949 *
950  thsign = -dsign( one, aapq )
951  t = one / ( theta+thsign*
952  $ dsqrt( one+theta*theta ) )
953  cs = dsqrt( one / ( one+t*t ) )
954  sn = t*cs
955 *
956  mxsinj = max( mxsinj, dabs( sn ) )
957  sva( q ) = aaqq*dsqrt( max( zero,
958  $ one+t*apoaq*aapq ) )
959  aapp = aapp*dsqrt( max( zero,
960  $ one-t*aqoap*aapq ) )
961 *
962  apoaq = work( p ) / work( q )
963  aqoap = work( q ) / work( p )
964  IF( work( p ).GE.one ) THEN
965  IF( work( q ).GE.one ) THEN
966  fastr( 3 ) = t*apoaq
967  fastr( 4 ) = -t*aqoap
968  work( p ) = work( p )*cs
969  work( q ) = work( q )*cs
970  CALL drotm( m, a( 1, p ), 1,
971  $ a( 1, q ), 1,
972  $ fastr )
973  IF( rsvec )CALL drotm( mvl,
974  $ v( 1, p ), 1, v( 1, q ),
975  $ 1, fastr )
976  ELSE
977  CALL daxpy( m, -t*aqoap,
978  $ a( 1, q ), 1,
979  $ a( 1, p ), 1 )
980  CALL daxpy( m, cs*sn*apoaq,
981  $ a( 1, p ), 1,
982  $ a( 1, q ), 1 )
983  work( p ) = work( p )*cs
984  work( q ) = work( q ) / cs
985  IF( rsvec ) THEN
986  CALL daxpy( mvl, -t*aqoap,
987  $ v( 1, q ), 1,
988  $ v( 1, p ), 1 )
989  CALL daxpy( mvl,
990  $ cs*sn*apoaq,
991  $ v( 1, p ), 1,
992  $ v( 1, q ), 1 )
993  END IF
994  END IF
995  ELSE
996  IF( work( q ).GE.one ) THEN
997  CALL daxpy( m, t*apoaq,
998  $ a( 1, p ), 1,
999  $ a( 1, q ), 1 )
1000  CALL daxpy( m, -cs*sn*aqoap,
1001  $ a( 1, q ), 1,
1002  $ a( 1, p ), 1 )
1003  work( p ) = work( p ) / cs
1004  work( q ) = work( q )*cs
1005  IF( rsvec ) THEN
1006  CALL daxpy( mvl, t*apoaq,
1007  $ v( 1, p ), 1,
1008  $ v( 1, q ), 1 )
1009  CALL daxpy( mvl,
1010  $ -cs*sn*aqoap,
1011  $ v( 1, q ), 1,
1012  $ v( 1, p ), 1 )
1013  END IF
1014  ELSE
1015  IF( work( p ).GE.work( q ) )
1016  $ THEN
1017  CALL daxpy( m, -t*aqoap,
1018  $ a( 1, q ), 1,
1019  $ a( 1, p ), 1 )
1020  CALL daxpy( m, cs*sn*apoaq,
1021  $ a( 1, p ), 1,
1022  $ a( 1, q ), 1 )
1023  work( p ) = work( p )*cs
1024  work( q ) = work( q ) / cs
1025  IF( rsvec ) THEN
1026  CALL daxpy( mvl,
1027  $ -t*aqoap,
1028  $ v( 1, q ), 1,
1029  $ v( 1, p ), 1 )
1030  CALL daxpy( mvl,
1031  $ cs*sn*apoaq,
1032  $ v( 1, p ), 1,
1033  $ v( 1, q ), 1 )
1034  END IF
1035  ELSE
1036  CALL daxpy( m, t*apoaq,
1037  $ a( 1, p ), 1,
1038  $ a( 1, q ), 1 )
1039  CALL daxpy( m,
1040  $ -cs*sn*aqoap,
1041  $ a( 1, q ), 1,
1042  $ a( 1, p ), 1 )
1043  work( p ) = work( p ) / cs
1044  work( q ) = work( q )*cs
1045  IF( rsvec ) THEN
1046  CALL daxpy( mvl,
1047  $ t*apoaq, v( 1, p ),
1048  $ 1, v( 1, q ), 1 )
1049  CALL daxpy( mvl,
1050  $ -cs*sn*aqoap,
1051  $ v( 1, q ), 1,
1052  $ v( 1, p ), 1 )
1053  END IF
1054  END IF
1055  END IF
1056  END IF
1057  END IF
1058 *
1059  ELSE
1060 * .. have to use modified Gram-Schmidt like transformation
1061  CALL dcopy( m, a( 1, p ), 1,
1062  $ work( n+1 ), 1 )
1063  CALL dlascl( 'G', 0, 0, aapp, one, m,
1064  $ 1, work( n+1 ), lda,
1065  $ ierr )
1066  CALL dlascl( 'G', 0, 0, aaqq, one, m,
1067  $ 1, a( 1, q ), lda, ierr )
1068  temp1 = -aapq*work( p ) / work( q )
1069  CALL daxpy( m, temp1, work( n+1 ), 1,
1070  $ a( 1, q ), 1 )
1071  CALL dlascl( 'G', 0, 0, one, aaqq, m,
1072  $ 1, a( 1, q ), lda, ierr )
1073  sva( q ) = aaqq*dsqrt( max( zero,
1074  $ one-aapq*aapq ) )
1075  mxsinj = max( mxsinj, sfmin )
1076  END IF
1077 * END IF ROTOK THEN ... ELSE
1078 *
1079 * In the case of cancellation in updating SVA(q), SVA(p)
1080 * recompute SVA(q), SVA(p).
1081 *
1082  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1083  $ THEN
1084  IF( ( aaqq.LT.rootbig ) .AND.
1085  $ ( aaqq.GT.rootsfmin ) ) THEN
1086  sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1087  $ work( q )
1088  ELSE
1089  t = zero
1090  aaqq = one
1091  CALL dlassq( m, a( 1, q ), 1, t,
1092  $ aaqq )
1093  sva( q ) = t*dsqrt( aaqq )*work( q )
1094  END IF
1095  END IF
1096  IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1097  IF( ( aapp.LT.rootbig ) .AND.
1098  $ ( aapp.GT.rootsfmin ) ) THEN
1099  aapp = dnrm2( m, a( 1, p ), 1 )*
1100  $ work( p )
1101  ELSE
1102  t = zero
1103  aapp = one
1104  CALL dlassq( m, a( 1, p ), 1, t,
1105  $ aapp )
1106  aapp = t*dsqrt( aapp )*work( p )
1107  END IF
1108  sva( p ) = aapp
1109  END IF
1110 *
1111  ELSE
1112 * A(:,p) and A(:,q) already numerically orthogonal
1113  IF( ir1.EQ.0 )notrot = notrot + 1
1114 *[RTD] SKIPPED = SKIPPED + 1
1115  pskipped = pskipped + 1
1116  END IF
1117  ELSE
1118 * A(:,q) is zero column
1119  IF( ir1.EQ.0 )notrot = notrot + 1
1120  pskipped = pskipped + 1
1121  END IF
1122 *
1123  IF( ( i.LE.swband ) .AND.
1124  $ ( pskipped.GT.rowskip ) ) THEN
1125  IF( ir1.EQ.0 )aapp = -aapp
1126  notrot = 0
1127  GO TO 2103
1128  END IF
1129 *
1130  2002 CONTINUE
1131 * END q-LOOP
1132 *
1133  2103 CONTINUE
1134 * bailed out of q-loop
1135 *
1136  sva( p ) = aapp
1137 *
1138  ELSE
1139  sva( p ) = aapp
1140  IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1141  $ notrot = notrot + min( igl+kbl-1, n ) - p
1142  END IF
1143 *
1144  2001 CONTINUE
1145 * end of the p-loop
1146 * end of doing the block ( ibr, ibr )
1147  1002 CONTINUE
1148 * end of ir1-loop
1149 *
1150 * ... go to the off diagonal blocks
1151 *
1152  igl = ( ibr-1 )*kbl + 1
1153 *
1154  DO 2010 jbc = ibr + 1, nbl
1155 *
1156  jgl = ( jbc-1 )*kbl + 1
1157 *
1158 * doing the block at ( ibr, jbc )
1159 *
1160  ijblsk = 0
1161  DO 2100 p = igl, min( igl+kbl-1, n )
1162 *
1163  aapp = sva( p )
1164  IF( aapp.GT.zero ) THEN
1165 *
1166  pskipped = 0
1167 *
1168  DO 2200 q = jgl, min( jgl+kbl-1, n )
1169 *
1170  aaqq = sva( q )
1171  IF( aaqq.GT.zero ) THEN
1172  aapp0 = aapp
1173 *
1174 * .. M x 2 Jacobi SVD ..
1175 *
1176 * Safe Gram matrix computation
1177 *
1178  IF( aaqq.GE.one ) THEN
1179  IF( aapp.GE.aaqq ) THEN
1180  rotok = ( small*aapp ).LE.aaqq
1181  ELSE
1182  rotok = ( small*aaqq ).LE.aapp
1183  END IF
1184  IF( aapp.LT.( big / aaqq ) ) THEN
1185  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1186  $ q ), 1 )*work( p )*work( q ) /
1187  $ aaqq ) / aapp
1188  ELSE
1189  CALL dcopy( m, a( 1, p ), 1,
1190  $ work( n+1 ), 1 )
1191  CALL dlascl( 'G', 0, 0, aapp,
1192  $ work( p ), m, 1,
1193  $ work( n+1 ), lda, ierr )
1194  aapq = ddot( m, work( n+1 ), 1,
1195  $ a( 1, q ), 1 )*work( q ) / aaqq
1196  END IF
1197  ELSE
1198  IF( aapp.GE.aaqq ) THEN
1199  rotok = aapp.LE.( aaqq / small )
1200  ELSE
1201  rotok = aaqq.LE.( aapp / small )
1202  END IF
1203  IF( aapp.GT.( small / aaqq ) ) THEN
1204  aapq = ( ddot( m, a( 1, p ), 1, a( 1,
1205  $ q ), 1 )*work( p )*work( q ) /
1206  $ aaqq ) / aapp
1207  ELSE
1208  CALL dcopy( m, a( 1, q ), 1,
1209  $ work( n+1 ), 1 )
1210  CALL dlascl( 'G', 0, 0, aaqq,
1211  $ work( q ), m, 1,
1212  $ work( n+1 ), lda, ierr )
1213  aapq = ddot( m, work( n+1 ), 1,
1214  $ a( 1, p ), 1 )*work( p ) / aapp
1215  END IF
1216  END IF
1217 *
1218  mxaapq = max( mxaapq, dabs( aapq ) )
1219 *
1220 * TO rotate or NOT to rotate, THAT is the question ...
1221 *
1222  IF( dabs( aapq ).GT.tol ) THEN
1223  notrot = 0
1224 *[RTD] ROTATED = ROTATED + 1
1225  pskipped = 0
1226  iswrot = iswrot + 1
1227 *
1228  IF( rotok ) THEN
1229 *
1230  aqoap = aaqq / aapp
1231  apoaq = aapp / aaqq
1232  theta = -half*dabs(aqoap-apoaq)/aapq
1233  IF( aaqq.GT.aapp0 )theta = -theta
1234 *
1235  IF( dabs( theta ).GT.bigtheta ) THEN
1236  t = half / theta
1237  fastr( 3 ) = t*work( p ) / work( q )
1238  fastr( 4 ) = -t*work( q ) /
1239  $ work( p )
1240  CALL drotm( m, a( 1, p ), 1,
1241  $ a( 1, q ), 1, fastr )
1242  IF( rsvec )CALL drotm( mvl,
1243  $ v( 1, p ), 1,
1244  $ v( 1, q ), 1,
1245  $ fastr )
1246  sva( q ) = aaqq*dsqrt( max( zero,
1247  $ one+t*apoaq*aapq ) )
1248  aapp = aapp*dsqrt( max( zero,
1249  $ one-t*aqoap*aapq ) )
1250  mxsinj = max( mxsinj, dabs( t ) )
1251  ELSE
1252 *
1253 * .. choose correct signum for THETA and rotate
1254 *
1255  thsign = -dsign( one, aapq )
1256  IF( aaqq.GT.aapp0 )thsign = -thsign
1257  t = one / ( theta+thsign*
1258  $ dsqrt( one+theta*theta ) )
1259  cs = dsqrt( one / ( one+t*t ) )
1260  sn = t*cs
1261  mxsinj = max( mxsinj, dabs( sn ) )
1262  sva( q ) = aaqq*dsqrt( max( zero,
1263  $ one+t*apoaq*aapq ) )
1264  aapp = aapp*dsqrt( max( zero,
1265  $ one-t*aqoap*aapq ) )
1266 *
1267  apoaq = work( p ) / work( q )
1268  aqoap = work( q ) / work( p )
1269  IF( work( p ).GE.one ) THEN
1270 *
1271  IF( work( q ).GE.one ) THEN
1272  fastr( 3 ) = t*apoaq
1273  fastr( 4 ) = -t*aqoap
1274  work( p ) = work( p )*cs
1275  work( q ) = work( q )*cs
1276  CALL drotm( m, a( 1, p ), 1,
1277  $ a( 1, q ), 1,
1278  $ fastr )
1279  IF( rsvec )CALL drotm( mvl,
1280  $ v( 1, p ), 1, v( 1, q ),
1281  $ 1, fastr )
1282  ELSE
1283  CALL daxpy( m, -t*aqoap,
1284  $ a( 1, q ), 1,
1285  $ a( 1, p ), 1 )
1286  CALL daxpy( m, cs*sn*apoaq,
1287  $ a( 1, p ), 1,
1288  $ a( 1, q ), 1 )
1289  IF( rsvec ) THEN
1290  CALL daxpy( mvl, -t*aqoap,
1291  $ v( 1, q ), 1,
1292  $ v( 1, p ), 1 )
1293  CALL daxpy( mvl,
1294  $ cs*sn*apoaq,
1295  $ v( 1, p ), 1,
1296  $ v( 1, q ), 1 )
1297  END IF
1298  work( p ) = work( p )*cs
1299  work( q ) = work( q ) / cs
1300  END IF
1301  ELSE
1302  IF( work( q ).GE.one ) THEN
1303  CALL daxpy( m, t*apoaq,
1304  $ a( 1, p ), 1,
1305  $ a( 1, q ), 1 )
1306  CALL daxpy( m, -cs*sn*aqoap,
1307  $ a( 1, q ), 1,
1308  $ a( 1, p ), 1 )
1309  IF( rsvec ) THEN
1310  CALL daxpy( mvl, t*apoaq,
1311  $ v( 1, p ), 1,
1312  $ v( 1, q ), 1 )
1313  CALL daxpy( mvl,
1314  $ -cs*sn*aqoap,
1315  $ v( 1, q ), 1,
1316  $ v( 1, p ), 1 )
1317  END IF
1318  work( p ) = work( p ) / cs
1319  work( q ) = work( q )*cs
1320  ELSE
1321  IF( work( p ).GE.work( q ) )
1322  $ THEN
1323  CALL daxpy( m, -t*aqoap,
1324  $ a( 1, q ), 1,
1325  $ a( 1, p ), 1 )
1326  CALL daxpy( m, cs*sn*apoaq,
1327  $ a( 1, p ), 1,
1328  $ a( 1, q ), 1 )
1329  work( p ) = work( p )*cs
1330  work( q ) = work( q ) / cs
1331  IF( rsvec ) THEN
1332  CALL daxpy( mvl,
1333  $ -t*aqoap,
1334  $ v( 1, q ), 1,
1335  $ v( 1, p ), 1 )
1336  CALL daxpy( mvl,
1337  $ cs*sn*apoaq,
1338  $ v( 1, p ), 1,
1339  $ v( 1, q ), 1 )
1340  END IF
1341  ELSE
1342  CALL daxpy( m, t*apoaq,
1343  $ a( 1, p ), 1,
1344  $ a( 1, q ), 1 )
1345  CALL daxpy( m,
1346  $ -cs*sn*aqoap,
1347  $ a( 1, q ), 1,
1348  $ a( 1, p ), 1 )
1349  work( p ) = work( p ) / cs
1350  work( q ) = work( q )*cs
1351  IF( rsvec ) THEN
1352  CALL daxpy( mvl,
1353  $ t*apoaq, v( 1, p ),
1354  $ 1, v( 1, q ), 1 )
1355  CALL daxpy( mvl,
1356  $ -cs*sn*aqoap,
1357  $ v( 1, q ), 1,
1358  $ v( 1, p ), 1 )
1359  END IF
1360  END IF
1361  END IF
1362  END IF
1363  END IF
1364 *
1365  ELSE
1366  IF( aapp.GT.aaqq ) THEN
1367  CALL dcopy( m, a( 1, p ), 1,
1368  $ work( n+1 ), 1 )
1369  CALL dlascl( 'G', 0, 0, aapp, one,
1370  $ m, 1, work( n+1 ), lda,
1371  $ ierr )
1372  CALL dlascl( 'G', 0, 0, aaqq, one,
1373  $ m, 1, a( 1, q ), lda,
1374  $ ierr )
1375  temp1 = -aapq*work( p ) / work( q )
1376  CALL daxpy( m, temp1, work( n+1 ),
1377  $ 1, a( 1, q ), 1 )
1378  CALL dlascl( 'G', 0, 0, one, aaqq,
1379  $ m, 1, a( 1, q ), lda,
1380  $ ierr )
1381  sva( q ) = aaqq*dsqrt( max( zero,
1382  $ one-aapq*aapq ) )
1383  mxsinj = max( mxsinj, sfmin )
1384  ELSE
1385  CALL dcopy( m, a( 1, q ), 1,
1386  $ work( n+1 ), 1 )
1387  CALL dlascl( 'G', 0, 0, aaqq, one,
1388  $ m, 1, work( n+1 ), lda,
1389  $ ierr )
1390  CALL dlascl( 'G', 0, 0, aapp, one,
1391  $ m, 1, a( 1, p ), lda,
1392  $ ierr )
1393  temp1 = -aapq*work( q ) / work( p )
1394  CALL daxpy( m, temp1, work( n+1 ),
1395  $ 1, a( 1, p ), 1 )
1396  CALL dlascl( 'G', 0, 0, one, aapp,
1397  $ m, 1, a( 1, p ), lda,
1398  $ ierr )
1399  sva( p ) = aapp*dsqrt( max( zero,
1400  $ one-aapq*aapq ) )
1401  mxsinj = max( mxsinj, sfmin )
1402  END IF
1403  END IF
1404 * END IF ROTOK THEN ... ELSE
1405 *
1406 * In the case of cancellation in updating SVA(q)
1407 * .. recompute SVA(q)
1408  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1409  $ THEN
1410  IF( ( aaqq.LT.rootbig ) .AND.
1411  $ ( aaqq.GT.rootsfmin ) ) THEN
1412  sva( q ) = dnrm2( m, a( 1, q ), 1 )*
1413  $ work( q )
1414  ELSE
1415  t = zero
1416  aaqq = one
1417  CALL dlassq( m, a( 1, q ), 1, t,
1418  $ aaqq )
1419  sva( q ) = t*dsqrt( aaqq )*work( q )
1420  END IF
1421  END IF
1422  IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1423  IF( ( aapp.LT.rootbig ) .AND.
1424  $ ( aapp.GT.rootsfmin ) ) THEN
1425  aapp = dnrm2( m, a( 1, p ), 1 )*
1426  $ work( p )
1427  ELSE
1428  t = zero
1429  aapp = one
1430  CALL dlassq( m, a( 1, p ), 1, t,
1431  $ aapp )
1432  aapp = t*dsqrt( aapp )*work( p )
1433  END IF
1434  sva( p ) = aapp
1435  END IF
1436 * end of OK rotation
1437  ELSE
1438  notrot = notrot + 1
1439 *[RTD] SKIPPED = SKIPPED + 1
1440  pskipped = pskipped + 1
1441  ijblsk = ijblsk + 1
1442  END IF
1443  ELSE
1444  notrot = notrot + 1
1445  pskipped = pskipped + 1
1446  ijblsk = ijblsk + 1
1447  END IF
1448 *
1449  IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1450  $ THEN
1451  sva( p ) = aapp
1452  notrot = 0
1453  GO TO 2011
1454  END IF
1455  IF( ( i.LE.swband ) .AND.
1456  $ ( pskipped.GT.rowskip ) ) THEN
1457  aapp = -aapp
1458  notrot = 0
1459  GO TO 2203
1460  END IF
1461 *
1462  2200 CONTINUE
1463 * end of the q-loop
1464  2203 CONTINUE
1465 *
1466  sva( p ) = aapp
1467 *
1468  ELSE
1469 *
1470  IF( aapp.EQ.zero )notrot = notrot +
1471  $ min( jgl+kbl-1, n ) - jgl + 1
1472  IF( aapp.LT.zero )notrot = 0
1473 *
1474  END IF
1475 *
1476  2100 CONTINUE
1477 * end of the p-loop
1478  2010 CONTINUE
1479 * end of the jbc-loop
1480  2011 CONTINUE
1481 *2011 bailed out of the jbc-loop
1482  DO 2012 p = igl, min( igl+kbl-1, n )
1483  sva( p ) = dabs( sva( p ) )
1484  2012 CONTINUE
1485 ***
1486  2000 CONTINUE
1487 *2000 :: end of the ibr-loop
1488 *
1489 * .. update SVA(N)
1490  IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1491  $ THEN
1492  sva( n ) = dnrm2( m, a( 1, n ), 1 )*work( n )
1493  ELSE
1494  t = zero
1495  aapp = one
1496  CALL dlassq( m, a( 1, n ), 1, t, aapp )
1497  sva( n ) = t*dsqrt( aapp )*work( n )
1498  END IF
1499 *
1500 * Additional steering devices
1501 *
1502  IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1503  $ ( iswrot.LE.n ) ) )swband = i
1504 *
1505  IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.dsqrt( dble( n ) )*
1506  $ tol ) .AND. ( dble( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1507  GO TO 1994
1508  END IF
1509 *
1510  IF( notrot.GE.emptsw )GO TO 1994
1511 *
1512  1993 CONTINUE
1513 * end i=1:NSWEEP loop
1514 *
1515 * #:( Reaching this point means that the procedure has not converged.
1516  info = nsweep - 1
1517  GO TO 1995
1518 *
1519  1994 CONTINUE
1520 * #:) Reaching this point means numerical convergence after the i-th
1521 * sweep.
1522 *
1523  info = 0
1524 * #:) INFO = 0 confirms successful iterations.
1525  1995 CONTINUE
1526 *
1527 * Sort the singular values and find how many are above
1528 * the underflow threshold.
1529 *
1530  n2 = 0
1531  n4 = 0
1532  DO 5991 p = 1, n - 1
1533  q = idamax( n-p+1, sva( p ), 1 ) + p - 1
1534  IF( p.NE.q ) THEN
1535  temp1 = sva( p )
1536  sva( p ) = sva( q )
1537  sva( q ) = temp1
1538  temp1 = work( p )
1539  work( p ) = work( q )
1540  work( q ) = temp1
1541  CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1542  IF( rsvec )CALL dswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1543  END IF
1544  IF( sva( p ).NE.zero ) THEN
1545  n4 = n4 + 1
1546  IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1547  END IF
1548  5991 CONTINUE
1549  IF( sva( n ).NE.zero ) THEN
1550  n4 = n4 + 1
1551  IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1552  END IF
1553 *
1554 * Normalize the left singular vectors.
1555 *
1556  IF( lsvec .OR. uctol ) THEN
1557  DO 1998 p = 1, n2
1558  CALL dscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1559  1998 CONTINUE
1560  END IF
1561 *
1562 * Scale the product of Jacobi rotations (assemble the fast rotations).
1563 *
1564  IF( rsvec ) THEN
1565  IF( applv ) THEN
1566  DO 2398 p = 1, n
1567  CALL dscal( mvl, work( p ), v( 1, p ), 1 )
1568  2398 CONTINUE
1569  ELSE
1570  DO 2399 p = 1, n
1571  temp1 = one / dnrm2( mvl, v( 1, p ), 1 )
1572  CALL dscal( mvl, temp1, v( 1, p ), 1 )
1573  2399 CONTINUE
1574  END IF
1575  END IF
1576 *
1577 * Undo scaling, if necessary (and possible).
1578  IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl) ) )
1579  $ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1580  $ ( sfmin / skl) ) ) ) THEN
1581  DO 2400 p = 1, n
1582  sva( p ) = skl*sva( p )
1583  2400 CONTINUE
1584  skl= one
1585  END IF
1586 *
1587  work( 1 ) = skl
1588 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1589 * then some of the singular values may overflow or underflow and
1590 * the spectrum is given in this factored representation.
1591 *
1592  work( 2 ) = dble( n4 )
1593 * N4 is the number of computed nonzero singular values of A.
1594 *
1595  work( 3 ) = dble( n2 )
1596 * N2 is the number of singular values of A greater than SFMIN.
1597 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1598 * that may carry some information.
1599 *
1600  work( 4 ) = dble( i )
1601 * i is the index of the last sweep before declaring convergence.
1602 *
1603  work( 5 ) = mxaapq
1604 * MXAAPQ is the largest absolute value of scaled pivots in the
1605 * last sweep
1606 *
1607  work( 6 ) = mxsinj
1608 * MXSINJ is the largest absolute value of the sines of Jacobi angles
1609 * in the last sweep
1610 *
1611  RETURN
1612 * ..
1613 * .. END OF DGESVJ
1614 * ..
double precision function dlamch(CMACH)
DLAMCH
Definition: dlamch.f:65
subroutine dcopy(N, DX, INCX, DY, INCY)
DCOPY
Definition: dcopy.f:84
integer function idamax(N, DX, INCX)
IDAMAX
Definition: idamax.f:73
double precision function ddot(N, DX, INCX, DY, INCY)
DDOT
Definition: ddot.f:84
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:145
subroutine daxpy(N, DA, DX, INCX, DY, INCY)
DAXPY
Definition: daxpy.f:91
subroutine dlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: dlaset.f:112
subroutine dswap(N, DX, INCX, DY, INCY)
DSWAP
Definition: dswap.f:84
subroutine dlassq(N, X, INCX, SCALE, SUMSQ)
DLASSQ updates a sum of squares represented in scaled form.
Definition: dlassq.f:105
subroutine drotm(N, DX, INCX, DY, INCY, DPARAM)
DROTM
Definition: drotm.f:98
double precision function dnrm2(N, X, INCX)
DNRM2
Definition: dnrm2.f:76
subroutine dgsvj0(JOBV, M, N, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
DGSVJ0 pre-processor for the routine dgesvj.
Definition: dgsvj0.f:220
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine dgsvj1(JOBV, M, N, N1, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
DGSVJ1 pre-processor for the routine dgesvj, applies Jacobi rotations targeting only particular pivot...
Definition: dgsvj1.f:238
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine dscal(N, DA, DX, INCX)
DSCAL
Definition: dscal.f:81
Here is the call graph for this function:
Here is the caller graph for this function: