◆ dgesvj()

subroutine dgesvj	(	character*1	joba,
		character*1	jobu,
		character*1	jobv,
		integer	m,
		integer	n,
		double precision, dimension( lda, * )	a,
		integer	lda,
		double precision, dimension( n )	sva,
		integer	mv,
		double precision, dimension( ldv, * )	v,
		integer	ldv,
		double precision, dimension( lwork )	work,
		integer	lwork,
		integer	info
	)

DGESVJ

Purpose:

 DGESVJ computes the singular value decomposition (SVD) of a real
 M-by-N matrix A, where M >= N. The SVD of A is written as
                                    [++]   [xx]   [x0]   [xx]
              A = U * SIGMA * V^t,  [++] = [xx] * [0x] * [xx]
                                    [++]   [xx]
 where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
 matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
 of SIGMA are the singular values of A. The columns of U and V are the
 left and the right singular vectors of A, respectively.
 DGESVJ can sometimes compute tiny singular values and their singular vectors much
 more accurately than other SVD routines, see below under Further Details.
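
As an illustration of the decomposition described above, here is a minimal pure-Python sketch of a one-sided (Hestenes) Jacobi SVD. It is not the DGESVJ implementation: it omits the scaling, pivoting, fast scaled rotations and tiling of the routine, and the function name `one_sided_jacobi_svd`, its arguments and the assumed value of EPS are hypothetical choices made for this example only.

```python
import math


def one_sided_jacobi_svd(a, max_sweeps=30, tol=None):
    """Sketch of a one-sided Jacobi SVD for an m-by-n matrix with m >= n.

    `a` is a list of m rows. Returns (u, sigma, v) such that
    a[i][j] ~= sum_k u[i][k] * sigma[k] * v[j][k].
    """
    m, n = len(a), len(a[0])
    eps = 2.0 ** -53                      # assumed IEEE double round-off
    if tol is None:
        tol = math.sqrt(m) * eps          # same form as the default TOL
    # Work on columns; v starts as the identity (stored by columns).
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    v = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if alpha == 0.0 or beta == 0.0:
                    continue
                # Skip pairs that are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Tangent of the rotation angle (smaller root).
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (
                    abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                cs = 1.0 / math.sqrt(1.0 + t * t)
                sn = cs * t
                # Apply the same rotation to the columns of A and of V.
                for mat in (cols, v):
                    xp, xq = mat[p], mat[q]
                    mat[p] = [cs * x - sn * y for x, y in zip(xp, xq)]
                    mat[q] = [sn * x + cs * y for x, y in zip(xp, xq)]
        if not rotated:
            break

    # Column norms are the singular values; normalized columns form U.
    sigma = [math.sqrt(sum(x * x for x in c)) for c in cols]
    order = sorted(range(n), key=lambda j: -sigma[j])
    u_cols = [[x / sigma[j] if sigma[j] > 0.0 else 0.0 for x in cols[j]]
              for j in order]
    v_cols = [v[j] for j in order]
    u = [[u_cols[k][i] for k in range(n)] for i in range(m)]
    vm = [[v_cols[k][i] for k in range(n)] for i in range(n)]
    return u, [sigma[j] for j in order], vm
```

Calling `one_sided_jacobi_svd([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])` returns a 3-by-2 `u`, two singular values in decreasing order, and a 2-by-2 `v`. Unlike DGESVJ, this sketch leaves a zero column in `u` for a zero singular value rather than reporting a rank.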

Parameters

[in]	JOBA	JOBA is CHARACTER*1 Specifies the structure of A.
		= 'L': The input matrix A is lower triangular;
		= 'U': The input matrix A is upper triangular;
		= 'G': The input matrix A is a general M-by-N matrix, M >= N.
[in]	JOBU	JOBU is CHARACTER*1 Specifies whether to compute the left singular vectors (columns of U):
		= 'U': The left singular vectors corresponding to the nonzero singular values are computed and returned in the leading columns of A. See more details in the description of A. The default numerical orthogonality threshold is set to approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
		= 'C': Analogous to JOBU='U', except that the user can control the level of numerical orthogonality of the computed left singular vectors. TOL can be set to TOL = CTOL*EPS, where CTOL is given on input in the array WORK. No CTOL smaller than ONE is allowed. CTOL greater than 1 / EPS is meaningless. The option 'C' can be used if M*EPS is a satisfactory orthogonality of the computed left singular vectors, so CTOL=M could save a few sweeps of Jacobi rotations. See the descriptions of A and WORK(1).
		= 'N': The matrix U is not computed. However, see the description of A.
[in]	JOBV	JOBV is CHARACTER*1 Specifies whether to compute the right singular vectors, that is, the matrix V:
		= 'V': The matrix V is computed and returned in the array V.
		= 'A': The Jacobi rotations are applied to the MV-by-N array V. In other words, the right singular vector matrix V is not computed explicitly; instead it is applied to an MV-by-N matrix initially stored in the first MV rows of V.
		= 'N': The matrix V is not computed and the array V is not referenced.
[in]	M	M is INTEGER The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
[in]	N	N is INTEGER The number of columns of the input matrix A. M >= N >= 0.
[in,out]	A	A is DOUBLE PRECISION array, dimension (LDA,N) On entry, the M-by-N matrix A. On exit:
		If JOBU = 'U' .OR. JOBU = 'C':
			If INFO = 0: RANKA orthonormal columns of U are returned in the leading RANKA columns of the array A. Here RANKA <= N is the number of computed singular values of A that are above the underflow threshold DLAMCH('S'). The singular vectors corresponding to underflowed or zero singular values are not computed. The value of RANKA is returned in the array WORK as RANKA=NINT(WORK(2)). Also see the descriptions of SVA and WORK. The computed columns of U are mutually numerically orthogonal up to approximately TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'), see the description of JOBU.
			If INFO > 0: the procedure DGESVJ did not converge in the given number of iterations (sweeps). In that case, the computed columns of U may not be orthogonal up to TOL. The output U (stored in A), SIGMA (given by the computed singular values in SVA(1:N)) and V are still a decomposition of the input matrix A in the sense that the residual ||A - SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
		If JOBU = 'N':
			If INFO = 0: Note that the left singular vectors are 'for free' in the one-sided Jacobi SVD algorithm. However, if only the singular values are needed, the level of numerical orthogonality of U is not an issue and iterations are stopped when the columns of the iterated matrix are numerically orthogonal up to approximately M*EPS. Thus, on exit, A contains the columns of U scaled with the corresponding singular values.
			If INFO > 0: the procedure DGESVJ did not converge in the given number of iterations (sweeps).
[in]	LDA	LDA is INTEGER The leading dimension of the array A. LDA >= max(1,M).
[out]	SVA	SVA is DOUBLE PRECISION array, dimension (N) On exit:
		If INFO = 0: depending on the value SCALE = WORK(1), we have:
			If SCALE = ONE: SVA(1:N) contains the computed singular values of A. During the computation SVA contains the Euclidean column norms of the iterated matrices in the array A.
			If SCALE .NE. ONE: The singular values of A are SCALE*SVA(1:N), and this factored representation is due to the fact that some of the singular values of A might underflow or overflow.
		If INFO > 0: the procedure DGESVJ did not converge in the given number of iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
[in]	MV	MV is INTEGER If JOBV = 'A', then the product of Jacobi rotations in DGESVJ is applied to the first MV rows of V. See the description of JOBV.
[in,out]	V	V is DOUBLE PRECISION array, dimension (LDV,N) If JOBV = 'V', then V contains on exit the N-by-N matrix of the right singular vectors; If JOBV = 'A', then V contains the product of the computed right singular vector matrix and the initial matrix in the array V. If JOBV = 'N', then V is not referenced.
[in]	LDV	LDV is INTEGER The leading dimension of the array V, LDV >= 1. If JOBV = 'V', then LDV >= max(1,N). If JOBV = 'A', then LDV >= max(1,MV).
[in,out]	WORK	WORK is DOUBLE PRECISION array, dimension (LWORK)
		On entry:
			If JOBU = 'C': WORK(1) = CTOL, where CTOL defines the threshold for convergence. The process stops if all columns of A are mutually orthogonal up to CTOL*EPS, EPS=DLAMCH('E'). It is required that CTOL >= ONE, i.e. it is not allowed to force the routine to obtain orthogonality below EPS. Note that the argument check in the source listing rejects WORK(1) <= ONE, so CTOL = ONE itself is reported as an illegal value (INFO = -12).
		On exit:
			WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N) are the computed singular values of A. (See the description of SVA.)
			NINT(WORK(2)) is the number of computed nonzero singular values.
			NINT(WORK(3)) is the number of computed singular values that are larger than the underflow threshold.
			NINT(WORK(4)) is the number of sweeps of Jacobi rotations needed for numerical convergence.
			WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep. This is useful information in cases when DGESVJ did not converge, as it can be used to estimate whether the output is still useful and for post festum analysis.
			WORK(6) = the largest absolute value over all sines of the Jacobi rotation angles in the last sweep. It can be useful for a post festum analysis.
[in]	LWORK	LWORK is INTEGER The length of WORK, LWORK >= MAX(6,M+N).
[out]	INFO	INFO is INTEGER
		= 0: successful exit.
		< 0: if INFO = -i, then the i-th argument had an illegal value.
		> 0: DGESVJ did not converge in the maximal allowed number (30) of sweeps. The output may still be useful. See the description of WORK.
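
The output conventions for SVA and WORK described above can be summarized in a short Python sketch. The helper names `min_lwork` and `interpret_output` are hypothetical, and the inputs are plain Python lists standing in for the Fortran arrays (so `work[0]` corresponds to WORK(1)); nothing here calls LAPACK.

```python
def min_lwork(m, n):
    """Smallest workspace length allowed by the documentation: MAX(6, M+N)."""
    return max(6, m + n)


def interpret_output(sva, work):
    """Decode SVA(1:N) and WORK(1:6) as documented for DGESVJ on exit.

    For the integer-valued entries, round() plays the role of NINT here,
    since those WORK entries hold whole numbers stored as doubles.
    """
    scale = work[0]
    return {
        # The singular values are SCALE*SVA(1:N); SCALE is ONE unless the
        # routine had to rescale to avoid underflow or overflow.
        "singular_values": [scale * s for s in sva],
        "nonzero": round(work[1]),
        "above_underflow": round(work[2]),
        "sweeps": round(work[3]),
        "max_abs_cosine": work[4],
        "max_abs_sine": work[5],
    }
```

For example, with placeholder values `sva = [2.0, 0.5]` and `work[0] = 4.0`, the decoded singular values are `[8.0, 2.0]`. When INFO > 0, the `max_abs_cosine` entry is the quantity the documentation suggests inspecting to judge whether the output is still usable.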

Author: Univ. of Tennessee; Univ. of California Berkeley; Univ. of Colorado Denver; NAG Ltd.

Further Details:

  The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
  rotations. The rotations are implemented as fast scaled rotations of
  Anda and Park [1]. In the case of underflow of the Jacobi angle, a
  modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
  column interchanges of de Rijk [2]. The relative accuracy of the computed
  singular values and the accuracy of the computed singular vectors (in
  angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
  The condition number that determines the accuracy in the full rank case
  is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
  spectral condition number. The best performance of this Jacobi SVD
  procedure is achieved if used in an accelerated version of Drmac and
  Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
  Some tuning parameters (marked with [TP]) are available for the
  implementer.
  The computational range for the nonzero singular values is the machine
  number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
  denormalized singular values can be computed with the corresponding
  gradual loss of accurate digits.
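
The scalar arithmetic behind a single Jacobi plane rotation can be sketched in Python as follows. The formulas follow the rotation branch of the source listing (THETA, THSIGN, T, CS, SN), but the fast scaled form, the DROTM/DAXPY updates and the WORK scaling are left out. The function names, the assumed value of EPS, and the choice to return `cs = 1` in the large-THETA branch are simplifications made for this example; the sketch also assumes `aapp >= aaqq > 0` and a nonzero `aapq`, which the pivoting in the listing appears intended to arrange.

```python
import math


def rotation_from_norms(aapp, aaqq, aapq, eps=2.0 ** -53):
    """Return (t, cs, sn) for the rotation of one pivot pair (p, q).

    aapp, aaqq: Euclidean norms of columns p and q, with aapp >= aaqq > 0.
    aapq:       cosine of the angle between the two columns (nonzero).
    """
    aqoap = aaqq / aapp
    apoaq = aapp / aaqq
    # Cotangent of twice the rotation angle.
    theta = -0.5 * abs(aqoap - apoaq) / aapq
    if abs(theta) > 1.0 / math.sqrt(eps):
        # Tiny angle: first-order tangent; cs is taken as 1 in this sketch.
        t = 0.5 / theta
        return t, 1.0, t
    thsign = -math.copysign(1.0, aapq)
    t = 1.0 / (theta + thsign * math.sqrt(1.0 + theta * theta))
    cs = math.sqrt(1.0 / (1.0 + t * t))
    return t, cs, t * cs


def apply_rotation(x, y, cs, sn):
    """Rotate two columns: x' = cs*x - sn*y, y' = sn*x + cs*y."""
    xn = [cs * a - sn * b for a, b in zip(x, y)]
    yn = [sn * a + cs * b for a, b in zip(x, y)]
    return xn, yn
```

Applied to the columns `x = [3.0, 0.0]` and `y = [1.0, 1.0]`, with their norms and cosine passed to `rotation_from_norms`, the rotated pair is orthogonal up to round-off. Accumulating the same rotations on an identity matrix yields the matrix V mentioned at the start of this section.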

Contributors:

  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)

References:

 [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
     SIAM J. Matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
 [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
     singular value decomposition on a vector computer.
     SIAM J. Sci. Stat. Comp., Vol. 10 (1989), pp. 359-371.
 [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
 [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
     value computation in floating point arithmetic.
     SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
 [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 29, No. 4 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 29, No. 4 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.

Bugs, examples and comments:

  Please report all bugs and send interesting test examples and comments to
  drmac@math.hr. Thank you.

Definition at line 335 of file dgesvj.f.

*
*  -- LAPACK computational routine --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*
*     .. Scalar Arguments ..
      INTEGER            INFO, LDA, LDV, LWORK, M, MV, N
      CHARACTER*1        JOBA, JOBU, JOBV
*     ..
*     .. Array Arguments ..
      DOUBLE PRECISION   A( LDA, * ), SVA( N ), V( LDV, * ),
     $                   WORK( LWORK )
*     ..
*
*  =====================================================================
*
*     .. Local Parameters ..
      DOUBLE PRECISION   ZERO, HALF, ONE
      parameter( zero = 0.0d0, half = 0.5d0, one = 1.0d0)
      INTEGER            NSWEEP
      parameter( nsweep = 30 )
*     ..
*     .. Local Scalars ..
      DOUBLE PRECISION   AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
     $                   BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ,
     $                   MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL,
     $                   SKL, SFMIN, SMALL, SN, T, TEMP1, THETA,
     $                   THSIGN, TOL
      INTEGER            BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
     $                   ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34,
     $                   N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP,
     $                   SWBAND
      LOGICAL            APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
     $                   RSVEC, UCTOL, UPPER
*     ..
*     .. Local Arrays ..
      DOUBLE PRECISION   FASTR( 5 )
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          dabs, max, min, dble, dsign, dsqrt
*     ..
*     .. External Functions ..
*     ..
*     from BLAS
      DOUBLE PRECISION   DDOT, DNRM2
      EXTERNAL           ddot, dnrm2
      INTEGER            IDAMAX
      EXTERNAL           idamax
*     from LAPACK
      DOUBLE PRECISION   DLAMCH
      EXTERNAL           dlamch
      LOGICAL            LSAME
      EXTERNAL           lsame
*     ..
*     .. External Subroutines ..
*     ..
*     from BLAS
      EXTERNAL           daxpy, dcopy, drotm, dscal, dswap
*     from LAPACK
      EXTERNAL           dlascl, dlaset, dlassq, xerbla
*
      EXTERNAL           dgsvj0, dgsvj1
*     ..
*     .. Executable Statements ..
*
*     Test the input arguments
*
      lsvec = lsame( jobu, 'U' )
      uctol = lsame( jobu, 'C' )
      rsvec = lsame( jobv, 'V' )
      applv = lsame( jobv, 'A' )
      upper = lsame( joba, 'U' )
      lower = lsame( joba, 'L' )
*
      IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
         info = -1
      ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
         info = -2
      ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
         info = -3
      ELSE IF( m.LT.0 ) THEN
         info = -4
      ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
         info = -5
      ELSE IF( lda.LT.m ) THEN
         info = -7
      ELSE IF( mv.LT.0 ) THEN
         info = -9
      ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
     $         ( applv .AND. ( ldv.LT.mv ) ) ) THEN
         info = -11
      ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
         info = -12
      ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
         info = -13
      ELSE
         info = 0
      END IF
*
*     #:(
      IF( info.NE.0 ) THEN
         CALL xerbla( 'DGESVJ', -info )
         RETURN
      END IF
*
* #:) Quick return for void matrix
*
      IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
*
*     Set numerical parameters
*     The stopping criterion for Jacobi rotations is
*
*     max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
*
*     where EPS is the round-off and CTOL is defined as follows:
*
      IF( uctol ) THEN
*        ... user controlled
         ctol = work( 1 )
      ELSE
*        ... default
         IF( lsvec .OR. rsvec .OR. applv ) THEN
            ctol = dsqrt( dble( m ) )
         ELSE
            ctol = dble( m )
         END IF
      END IF
*     ... and the machine dependent parameters are
*[!]  (Make sure that DLAMCH() works properly on the target machine.)
*
      epsln = dlamch( 'Epsilon' )
      rooteps = dsqrt( epsln )
      sfmin = dlamch( 'SafeMinimum' )
      rootsfmin = dsqrt( sfmin )
      small = sfmin / epsln
      big = dlamch( 'Overflow' )
*     BIG         = ONE    / SFMIN
      rootbig = one / rootsfmin
      large = big / dsqrt( dble( m*n ) )
      bigtheta = one / rooteps
*
      tol = ctol*epsln
      roottol = dsqrt( tol )
*
      IF( dble( m )*epsln.GE.one ) THEN
         info = -4
         CALL xerbla( 'DGESVJ', -info )
         RETURN
      END IF
*
*     Initialize the right singular vector matrix.
*
      IF( rsvec ) THEN
         mvl = n
         CALL dlaset( 'A', mvl, n, zero, one, v, ldv )
      ELSE IF( applv ) THEN
         mvl = mv
      END IF
      rsvec = rsvec .OR. applv
*
*     Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
*(!)  If necessary, scale A to protect the largest singular value
*     from overflow. It is possible that saving the largest singular
*     value destroys the information about the small ones.
*     This initial scaling is almost minimal in the sense that the
*     goal is to make sure that no column norm overflows, and that
*     DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
*     in A are detected, the procedure returns with INFO=-6.
*
      skl= one / dsqrt( dble( m )*dble( n ) )
      noscale = .true.
      goscale = .true.
*
      IF( lower ) THEN
*        the input matrix is M-by-N lower triangular (trapezoidal)
         DO 1874 p = 1, n
            aapp = zero
            aaqq = one
            CALL dlassq( m-p+1, a( p, p ), 1, aapp, aaqq )
            IF( aapp.GT.big ) THEN
               info = -6
               CALL xerbla( 'DGESVJ', -info )
               RETURN
            END IF
            aaqq = dsqrt( aaqq )
            IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
               sva( p ) = aapp*aaqq
            ELSE
               noscale = .false.
               sva( p ) = aapp*( aaqq*skl)
               IF( goscale ) THEN
                  goscale = .false.
                  DO 1873 q = 1, p - 1
                     sva( q ) = sva( q )*skl
 1873             CONTINUE
               END IF
            END IF
 1874    CONTINUE
      ELSE IF( upper ) THEN
*        the input matrix is M-by-N upper triangular (trapezoidal)
         DO 2874 p = 1, n
            aapp = zero
            aaqq = one
            CALL dlassq( p, a( 1, p ), 1, aapp, aaqq )
            IF( aapp.GT.big ) THEN
               info = -6
               CALL xerbla( 'DGESVJ', -info )
               RETURN
            END IF
            aaqq = dsqrt( aaqq )
            IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
               sva( p ) = aapp*aaqq
            ELSE
               noscale = .false.
               sva( p ) = aapp*( aaqq*skl)
               IF( goscale ) THEN
                  goscale = .false.
                  DO 2873 q = 1, p - 1
                     sva( q ) = sva( q )*skl
 2873             CONTINUE
               END IF
            END IF
 2874    CONTINUE
      ELSE
*        the input matrix is M-by-N general dense
         DO 3874 p = 1, n
            aapp = zero
            aaqq = one
            CALL dlassq( m, a( 1, p ), 1, aapp, aaqq )
            IF( aapp.GT.big ) THEN
               info = -6
               CALL xerbla( 'DGESVJ', -info )
               RETURN
            END IF
            aaqq = dsqrt( aaqq )
            IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
               sva( p ) = aapp*aaqq
            ELSE
               noscale = .false.
               sva( p ) = aapp*( aaqq*skl)
               IF( goscale ) THEN
                  goscale = .false.
                  DO 3873 q = 1, p - 1
                     sva( q ) = sva( q )*skl
 3873             CONTINUE
               END IF
            END IF
 3874    CONTINUE
      END IF
*
      IF( noscale )skl= one
*
*     Move the smaller part of the spectrum from the underflow threshold
*(!)  Start by determining the position of the nonzero entries of the
*     array SVA() relative to ( SFMIN, BIG ).
*
      aapp = zero
      aaqq = big
      DO 4781 p = 1, n
         IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
         aapp = max( aapp, sva( p ) )
 4781 CONTINUE
*
* #:) Quick return for zero matrix
*
      IF( aapp.EQ.zero ) THEN
         IF( lsvec )CALL dlaset( 'G', m, n, zero, one, a, lda )
         work( 1 ) = one
         work( 2 ) = zero
         work( 3 ) = zero
         work( 4 ) = zero
         work( 5 ) = zero
         work( 6 ) = zero
         RETURN
      END IF
*
* #:) Quick return for one-column matrix
*
      IF( n.EQ.1 ) THEN
         IF( lsvec )CALL dlascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
     $                           a( 1, 1 ), lda, ierr )
         work( 1 ) = one / skl
         IF( sva( 1 ).GE.sfmin ) THEN
            work( 2 ) = one
         ELSE
            work( 2 ) = zero
         END IF
         work( 3 ) = zero
         work( 4 ) = zero
         work( 5 ) = zero
         work( 6 ) = zero
         RETURN
      END IF
*
*     Protect small singular values from underflow, and try to
*     avoid underflows/overflows in computing Jacobi rotations.
*
      sn = dsqrt( sfmin / epsln )
      temp1 = dsqrt( big / dble( n ) )
      IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
     $    ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
         temp1 = min( big, temp1 / aapp )
*         AAQQ  = AAQQ*TEMP1
*         AAPP  = AAPP*TEMP1
      ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
         temp1 = min( sn / aaqq, big / ( aapp*dsqrt( dble( n ) ) ) )
*         AAQQ  = AAQQ*TEMP1
*         AAPP  = AAPP*TEMP1
      ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
         temp1 = max( sn / aaqq, temp1 / aapp )
*         AAQQ  = AAQQ*TEMP1
*         AAPP  = AAPP*TEMP1
      ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
         temp1 = min( sn / aaqq, big / ( dsqrt( dble( n ) )*aapp ) )
*         AAQQ  = AAQQ*TEMP1
*         AAPP  = AAPP*TEMP1
      ELSE
         temp1 = one
      END IF
*
*     Scale, if necessary
*
      IF( temp1.NE.one ) THEN
         CALL dlascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
      END IF
      skl= temp1*skl
      IF( skl.NE.one ) THEN
         CALL dlascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
         skl= one / skl
      END IF
*
*     Row-cyclic Jacobi SVD algorithm with column pivoting
*
      emptsw = ( n*( n-1 ) ) / 2
      notrot = 0
      fastr( 1 ) = zero
*
*     A is represented in factored form A = A * diag(WORK), where diag(WORK)
*     is initialized to identity. WORK is updated during fast scaled
*     rotations.
*
      DO 1868 q = 1, n
         work( q ) = one
 1868 CONTINUE
*
*
      swband = 3
*[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
*     if DGESVJ is used as a computational routine in the preconditioned
*     Jacobi SVD algorithm DGEJSV. For sweeps i=1:SWBAND the procedure
*     works on pivots inside a band-like region around the diagonal.
*     The boundaries are determined dynamically, based on the number of
*     pivots above a threshold.
*
      kbl = min( 8, n )
*[TP] KBL is a tuning parameter that defines the tile size in the
*     tiling of the p-q loops of pivot pairs. In general, an optimal
*     value of KBL depends on the matrix dimensions and on the
*     parameters of the computer's memory.
*
      nbl = n / kbl
      IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
*
      blskip = kbl**2
*[TP] BLSKIP is a tuning parameter that depends on SWBAND and KBL.
*
      rowskip = min( 5, kbl )
*[TP] ROWSKIP is a tuning parameter.
*
      lkahead = 1
*[TP] LKAHEAD is a tuning parameter.
*
*     Quasi block transformations, using the lower (upper) triangular
*     structure of the input matrix. The quasi-block-cycling usually
*     invokes cubic convergence. A big part of this cycle is done inside
*     canonical subspaces of dimensions less than M.
*
      IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
*[TP] The number of partition levels and the actual partition are
*     tuning parameters.
         n4 = n / 4
         n2 = n / 2
         n34 = 3*n4
         IF( applv ) THEN
            q = 0
         ELSE
            q = 1
         END IF
*
         IF( lower ) THEN
*
*     This works very well on lower triangular matrices, in particular
*     in the framework of the preconditioned Jacobi SVD (xGEJSV).
*     The idea is simple:
*     [+ 0 0 0]   Note that Jacobi transformations of [0 0]
*     [+ + 0 0]                                       [0 0]
*     [+ + x 0]   actually work on [x 0]              [x 0]
*     [+ + x x]                    [x x].             [x x]
*
            CALL dgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
     $                   work( n34+1 ), sva( n34+1 ), mvl,
     $                   v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
     $                   2, work( n+1 ), lwork-n, ierr )
*
            CALL dgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
     $                   work( n2+1 ), sva( n2+1 ), mvl,
     $                   v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
     $                   work( n+1 ), lwork-n, ierr )
*
            CALL dgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
     $                   work( n2+1 ), sva( n2+1 ), mvl,
     $                   v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
     $                   work( n+1 ), lwork-n, ierr )
*
            CALL dgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
     $                   work( n4+1 ), sva( n4+1 ), mvl,
     $                   v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
     $                   work( n+1 ), lwork-n, ierr )
*
            CALL dgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
     $                   epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
     $                   ierr )
*
            CALL dgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
     $                   ldv, epsln, sfmin, tol, 1, work( n+1 ),
     $                   lwork-n, ierr )
*
*
         ELSE IF( upper ) THEN
*
*
            CALL dgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
     $                   epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
     $                   ierr )
*
            CALL dgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
     $                   sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
     $                   epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
     $                   ierr )
*
            CALL dgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
     $                   ldv, epsln, sfmin, tol, 1, work( n+1 ),
     $                   lwork-n, ierr )
*
            CALL dgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
     $                   work( n2+1 ), sva( n2+1 ), mvl,
     $                   v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
     $                   work( n+1 ), lwork-n, ierr )
 
         END IF
*
      END IF
*
*     .. Row-cyclic pivot strategy with de Rijk's pivoting ..
*
      DO 1993 i = 1, nsweep
*
*     .. go go go ...
*
         mxaapq = zero
         mxsinj = zero
         iswrot = 0
*
         notrot = 0
         pskipped = 0
*
*     Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
*     1 <= p < q <= N. This is the first step toward a blocked implementation
*     of the rotations. New implementation, based on block transformations,
*     is under development.
*
         DO 2000 ibr = 1, nbl
*
            igl = ( ibr-1 )*kbl + 1
*
            DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
*
               igl = igl + ir1*kbl
*
               DO 2001 p = igl, min( igl+kbl-1, n-1 )
*
*     .. de Rijk's pivoting
*
                  q = idamax( n-p+1, sva( p ), 1 ) + p - 1
                  IF( p.NE.q ) THEN
                     CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
                     IF( rsvec )CALL dswap( mvl, v( 1, p ), 1,
     $                                      v( 1, q ), 1 )
                     temp1 = sva( p )
                     sva( p ) = sva( q )
                     sva( q ) = temp1
                     temp1 = work( p )
                     work( p ) = work( q )
                     work( q ) = temp1
                  END IF
*
                  IF( ir1.EQ.0 ) THEN
*
*        Column norms are periodically updated by explicit
*        norm computation.
*        Caveat:
*        Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
*        as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
*        overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
*        underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
*        Hence, DNRM2 cannot be trusted, not even in the case when
*        the true norm is far from the under(over)flow boundaries.
*        If properly implemented DNRM2 is available, the IF-THEN-ELSE
*        below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
*
                     IF( ( sva( p ).LT.rootbig ) .AND.
     $                   ( sva( p ).GT.rootsfmin ) ) THEN
                        sva( p ) = dnrm2( m, a( 1, p ), 1 )*work( p )
                     ELSE
                        temp1 = zero
                        aapp = one
                        CALL dlassq( m, a( 1, p ), 1, temp1, aapp )
                        sva( p ) = temp1*dsqrt( aapp )*work( p )
                     END IF
                     aapp = sva( p )
                  ELSE
                     aapp = sva( p )
                  END IF
*
                  IF( aapp.GT.zero ) THEN
*
                     pskipped = 0
*
                     DO 2002 q = p + 1, min( igl+kbl-1, n )
*
                        aaqq = sva( q )
*
                        IF( aaqq.GT.zero ) THEN
*
                           aapp0 = aapp
                           IF( aaqq.GE.one ) THEN
                              rotok = ( small*aapp ).LE.aaqq
                              IF( aapp.LT.( big / aaqq ) ) THEN
                                 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
     $                                  q ), 1 )*work( p )*work( q ) /
     $                                  aaqq ) / aapp
                              ELSE
                                 CALL dcopy( m, a( 1, p ), 1,
     $                                       work( n+1 ), 1 )
                                 CALL dlascl( 'G', 0, 0, aapp,
     $                                        work( p ), m, 1,
     $                                        work( n+1 ), lda, ierr )
                                 aapq = ddot( m, work( n+1 ), 1,
     $                                  a( 1, q ), 1 )*work( q ) / aaqq
                              END IF
                           ELSE
                              rotok = aapp.LE.( aaqq / small )
                              IF( aapp.GT.( small / aaqq ) ) THEN
                                 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
     $                                  q ), 1 )*work( p )*work( q ) /
     $                                  aaqq ) / aapp
                              ELSE
                                 CALL dcopy( m, a( 1, q ), 1,
     $                                       work( n+1 ), 1 )
                                 CALL dlascl( 'G', 0, 0, aaqq,
     $                                        work( q ), m, 1,
     $                                        work( n+1 ), lda, ierr )
                                 aapq = ddot( m, work( n+1 ), 1,
     $                                  a( 1, p ), 1 )*work( p ) / aapp
                              END IF
                           END IF
*
                           mxaapq = max( mxaapq, dabs( aapq ) )
*
*        TO rotate or NOT to rotate, THAT is the question ...
*
                           IF( dabs( aapq ).GT.tol ) THEN
*
*           .. rotate
*[RTD]      ROTATED = ROTATED + ONE
*
                              IF( ir1.EQ.0 ) THEN
                                 notrot = 0
                                 pskipped = 0
                                 iswrot = iswrot + 1
                              END IF
*
                              IF( rotok ) THEN
*
                                 aqoap = aaqq / aapp
                                 apoaq = aapp / aaqq
                                 theta = -half*dabs(aqoap-apoaq)/aapq
*
                                 IF( dabs( theta ).GT.bigtheta ) THEN
*
                                    t = half / theta
                                    fastr( 3 ) = t*work( p ) / work( q )
                                    fastr( 4 ) = -t*work( q ) /
     $                                           work( p )
                                    CALL drotm( m, a( 1, p ), 1,
     $                                          a( 1, q ), 1, fastr )
                                    IF( rsvec )CALL drotm( mvl,
     $                                              v( 1, p ), 1,
     $                                              v( 1, q ), 1,
     $                                              fastr )
                                    sva( q ) = aaqq*dsqrt( max( zero,
     $                                         one+t*apoaq*aapq ) )
                                    aapp = aapp*dsqrt( max( zero,
     $                                     one-t*aqoap*aapq ) )
                                    mxsinj = max( mxsinj, dabs( t ) )
*
                                 ELSE
*
*                 .. choose correct signum for THETA and rotate
*
                                    thsign = -dsign( one, aapq )
                                    t = one / ( theta+thsign*
     $                                  dsqrt( one+theta*theta ) )
                                    cs = dsqrt( one / ( one+t*t ) )
                                    sn = t*cs
*
                                    mxsinj = max( mxsinj, dabs( sn ) )
                                    sva( q ) = aaqq*dsqrt( max( zero,
     $                                         one+t*apoaq*aapq ) )
                                    aapp = aapp*dsqrt( max( zero,
     $                                     one-t*aqoap*aapq ) )
*
                                    apoaq = work( p ) / work( q )
                                    aqoap = work( q ) / work( p )
                                    IF( work( p ).GE.one ) THEN
                                       IF( work( q ).GE.one ) THEN
                                          fastr( 3 ) = t*apoaq
                                          fastr( 4 ) = -t*aqoap
                                          work( p ) = work( p )*cs
                                          work( q ) = work( q )*cs
                                          CALL drotm( m, a( 1, p ), 1,
     $                                                a( 1, q ), 1,
     $                                                fastr )
                                          IF( rsvec )CALL drotm( mvl,
     $                                        v( 1, p ), 1, v( 1, q ),
     $                                        1, fastr )
                                       ELSE
                                          CALL daxpy( m, -t*aqoap,
     $                                                a( 1, q ), 1,
     $                                                a( 1, p ), 1 )
                                          CALL daxpy( m, cs*sn*apoaq,
     $                                                a( 1, p ), 1,
     $                                                a( 1, q ), 1 )
                                          work( p ) = work( p )*cs
                                          work( q ) = work( q ) / cs
                                          IF( rsvec ) THEN
                                             CALL daxpy( mvl, -t*aqoap,
     $                                                   v( 1, q ), 1,
     $                                                   v( 1, p ), 1 )
                                             CALL daxpy( mvl,
     $                                                   cs*sn*apoaq,
     $                                                   v( 1, p ), 1,
     $                                                   v( 1, q ), 1 )
                                          END IF
                                       END IF
                                    ELSE
                                       IF( work( q ).GE.one ) THEN
                                          CALL daxpy( m, t*apoaq,
     $                                                a( 1, p ), 1,
     $                                                a( 1, q ), 1 )
                                          CALL daxpy( m, -cs*sn*aqoap,
     $                                                a( 1, q ), 1,
     $                                                a( 1, p ), 1 )
                                          work( p ) = work( p ) / cs
                                          work( q ) = work( q )*cs
                                          IF( rsvec ) THEN
                                             CALL daxpy( mvl, t*apoaq,
     $                                                   v( 1, p ), 1,
     $                                                   v( 1, q ), 1 )
                                             CALL daxpy( mvl,
     $                                                   -cs*sn*aqoap,
     $                                                   v( 1, q ), 1,
     $                                                   v( 1, p ), 1 )
                                          END IF
                                       ELSE
                                          IF( work( p ).GE.work( q ) )
     $                                        THEN
                                             CALL daxpy( m, -t*aqoap,
     $                                                   a( 1, q ), 1,
     $                                                   a( 1, p ), 1 )
                                             CALL daxpy( m, cs*sn*apoaq,
     $                                                   a( 1, p ), 1,
     $                                                   a( 1, q ), 1 )
                                             work( p ) = work( p )*cs
                                             work( q ) = work( q ) / cs
                                             IF( rsvec ) THEN
                                                CALL daxpy( mvl,
     $                                               -t*aqoap,
     $                                               v( 1, q ), 1,
     $                                               v( 1, p ), 1 )
                                                CALL daxpy( mvl,
     $                                               cs*sn*apoaq,
     $                                               v( 1, p ), 1,
     $                                               v( 1, q ), 1 )
                                             END IF
                                          ELSE
                                             CALL daxpy( m, t*apoaq,
     $                                                   a( 1, p ), 1,
     $                                                   a( 1, q ), 1 )
                                             CALL daxpy( m,
     $                                                   -cs*sn*aqoap,
     $                                                   a( 1, q ), 1,
     $                                                   a( 1, p ), 1 )
                                             work( p ) = work( p ) / cs
                                             work( q ) = work( q )*cs
                                             IF( rsvec ) THEN
                                                CALL daxpy( mvl,
     $                                               t*apoaq, v( 1, p ),
     $                                               1, v( 1, q ), 1 )
                                                CALL daxpy( mvl,
     $                                               -cs*sn*aqoap,
     $                                               v( 1, q ), 1,
     $                                               v( 1, p ), 1 )
                                             END IF
                                          END IF
                                       END IF
                                    END IF
                                 END IF
*
                              ELSE
*              .. have to use modified Gram-Schmidt like transformation
                                 CALL dcopy( m, a( 1, p ), 1,
     $                                       work( n+1 ), 1 )
                                 CALL dlascl( 'G', 0, 0, aapp, one, m,
     $                                        1, work( n+1 ), lda,
     $                                        ierr )
                                 CALL dlascl( 'G', 0, 0, aaqq, one, m,
     $                                        1, a( 1, q ), lda, ierr )
                                 temp1 = -aapq*work( p ) / work( q )
                                 CALL daxpy( m, temp1, work( n+1 ), 1,
     $                                       a( 1, q ), 1 )
                                 CALL dlascl( 'G', 0, 0, one, aaqq, m,
     $                                        1, a( 1, q ), lda, ierr )
                                 sva( q ) = aaqq*dsqrt( max( zero,
     $                                      one-aapq*aapq ) )
                                 mxsinj = max( mxsinj, sfmin )
                              END IF
*           END IF ROTOK THEN ... ELSE
*
*           In the case of cancellation in updating SVA(q), SVA(p)
*           recompute SVA(q), SVA(p).
*
                              IF( ( sva( q ) / aaqq )**2.LE.rooteps )
     $                            THEN
                                 IF( ( aaqq.LT.rootbig ) .AND.
     $                               ( aaqq.GT.rootsfmin ) ) THEN
                                    sva( q ) = dnrm2( m, a( 1, q ), 1 )*
     $                                         work( q )
                                 ELSE
                                    t = zero
                                    aaqq = one
                                    CALL dlassq( m, a( 1, q ), 1, t,
     $                                           aaqq )
                                    sva( q ) = t*dsqrt( aaqq )*work( q )
                                 END IF
                              END IF
                              IF( ( aapp / aapp0 ).LE.rooteps ) THEN
                                 IF( ( aapp.LT.rootbig ) .AND.
     $                               ( aapp.GT.rootsfmin ) ) THEN
                                    aapp = dnrm2( m, a( 1, p ), 1 )*
     $                                     work( p )
                                 ELSE
                                    t = zero
                                    aapp = one
                                    CALL dlassq( m, a( 1, p ), 1, t,
     $                                           aapp )
                                    aapp = t*dsqrt( aapp )*work( p )
                                 END IF
                                 sva( p ) = aapp
                              END IF
*
                           ELSE
*        A(:,p) and A(:,q) already numerically orthogonal
                              IF( ir1.EQ.0 )notrot = notrot + 1
*[RTD]      SKIPPED  = SKIPPED  + 1
                              pskipped = pskipped + 1
                           END IF
                        ELSE
*        A(:,q) is zero column
                           IF( ir1.EQ.0 )notrot = notrot + 1
                           pskipped = pskipped + 1
                        END IF
*
                        IF( ( i.LE.swband ) .AND.
     $                      ( pskipped.GT.rowskip ) ) THEN
                           IF( ir1.EQ.0 )aapp = -aapp
                           notrot = 0
                           GO TO 2103
                        END IF
*
 2002                CONTINUE
*     END q-LOOP
*
 2103                CONTINUE
*     bailed out of q-loop
*
                     sva( p ) = aapp
*
                  ELSE
                     sva( p ) = aapp
                     IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
     $                   notrot = notrot + min( igl+kbl-1, n ) - p
                  END IF
*
 2001          CONTINUE
*     end of the p-loop
*     end of doing the block ( ibr, ibr )
 1002       CONTINUE
*     end of ir1-loop
*
* ... go to the off diagonal blocks
*
            igl = ( ibr-1 )*kbl + 1
*
            DO 2010 jbc = ibr + 1, nbl
*
               jgl = ( jbc-1 )*kbl + 1
*
*        doing the block at ( ibr, jbc )
*
               ijblsk = 0
               DO 2100 p = igl, min( igl+kbl-1, n )
*
                  aapp = sva( p )
                  IF( aapp.GT.zero ) THEN
*
                     pskipped = 0
*
                     DO 2200 q = jgl, min( jgl+kbl-1, n )
*
                        aaqq = sva( q )
                        IF( aaqq.GT.zero ) THEN
                           aapp0 = aapp
*
*     .. M x 2 Jacobi SVD ..
*
*        Safe Gram matrix computation
*
                           IF( aaqq.GE.one ) THEN
                              IF( aapp.GE.aaqq ) THEN
                                 rotok = ( small*aapp ).LE.aaqq
                              ELSE
                                 rotok = ( small*aaqq ).LE.aapp
                              END IF
                              IF( aapp.LT.( big / aaqq ) ) THEN
                                 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
     $                                  q ), 1 )*work( p )*work( q ) /
     $                                  aaqq ) / aapp
                              ELSE
                                 CALL dcopy( m, a( 1, p ), 1,
     $                                       work( n+1 ), 1 )
                                 CALL dlascl( 'G', 0, 0, aapp,
     $                                        work( p ), m, 1,
     $                                        work( n+1 ), lda, ierr )
                                 aapq = ddot( m, work( n+1 ), 1,
     $                                  a( 1, q ), 1 )*work( q ) / aaqq
                              END IF
                           ELSE
                              IF( aapp.GE.aaqq ) THEN
                                 rotok = aapp.LE.( aaqq / small )
                              ELSE
                                 rotok = aaqq.LE.( aapp / small )
                              END IF
                              IF( aapp.GT.( small / aaqq ) ) THEN
                                 aapq = ( ddot( m, a( 1, p ), 1, a( 1,
     $                                  q ), 1 )*work( p )*work( q ) /
     $                                  aaqq ) / aapp
                              ELSE
                                 CALL dcopy( m, a( 1, q ), 1,
     $                                       work( n+1 ), 1 )
                                 CALL dlascl( 'G', 0, 0, aaqq,
     $                                        work( q ), m, 1,
     $                                        work( n+1 ), lda, ierr )
                                 aapq = ddot( m, work( n+1 ), 1,
     $                                  a( 1, p ), 1 )*work( p ) / aapp
                              END IF
                           END IF
*
                           mxaapq = max( mxaapq, dabs( aapq ) )
*
*        TO rotate or NOT to rotate, THAT is the question ...
*
                           IF( dabs( aapq ).GT.tol ) THEN
                              notrot = 0
*[RTD]      ROTATED  = ROTATED + 1
                              pskipped = 0
                              iswrot = iswrot + 1
*
                              IF( rotok ) THEN
*
                                 aqoap = aaqq / aapp
                                 apoaq = aapp / aaqq
                                 theta = -half*dabs(aqoap-apoaq)/aapq
                                 IF( aaqq.GT.aapp0 )theta = -theta
*
                                 IF( dabs( theta ).GT.bigtheta ) THEN
                                    t = half / theta
                                    fastr( 3 ) = t*work( p ) / work( q )
                                    fastr( 4 ) = -t*work( q ) /
     $                                           work( p )
                                    CALL drotm( m, a( 1, p ), 1,
     $                                          a( 1, q ), 1, fastr )
                                    IF( rsvec )CALL drotm( mvl,
     $                                              v( 1, p ), 1,
     $                                              v( 1, q ), 1,
     $                                              fastr )
                                    sva( q ) = aaqq*dsqrt( max( zero,
     $                                         one+t*apoaq*aapq ) )
                                    aapp = aapp*dsqrt( max( zero,
     $                                     one-t*aqoap*aapq ) )
                                    mxsinj = max( mxsinj, dabs( t ) )
                                 ELSE
*
*                 .. choose correct signum for THETA and rotate
*
                                    thsign = -dsign( one, aapq )
                                    IF( aaqq.GT.aapp0 )thsign = -thsign
                                    t = one / ( theta+thsign*
     $                                  dsqrt( one+theta*theta ) )
                                    cs = dsqrt( one / ( one+t*t ) )
                                    sn = t*cs
                                    mxsinj = max( mxsinj, dabs( sn ) )
                                    sva( q ) = aaqq*dsqrt( max( zero,
     $                                         one+t*apoaq*aapq ) )
                                    aapp = aapp*dsqrt( max( zero,
     $                                     one-t*aqoap*aapq ) )
*
                                    apoaq = work( p ) / work( q )
                                    aqoap = work( q ) / work( p )
                                    IF( work( p ).GE.one ) THEN
*
                                       IF( work( q ).GE.one ) THEN
                                          fastr( 3 ) = t*apoaq
                                          fastr( 4 ) = -t*aqoap
                                          work( p ) = work( p )*cs
                                          work( q ) = work( q )*cs
                                          CALL drotm( m, a( 1, p ), 1,
     $                                                a( 1, q ), 1,
     $                                                fastr )
                                          IF( rsvec )CALL drotm( mvl,
     $                                        v( 1, p ), 1, v( 1, q ),
     $                                        1, fastr )
                                       ELSE
                                          CALL daxpy( m, -t*aqoap,
     $                                                a( 1, q ), 1,
     $                                                a( 1, p ), 1 )
                                          CALL daxpy( m, cs*sn*apoaq,
     $                                                a( 1, p ), 1,
     $                                                a( 1, q ), 1 )
                                          IF( rsvec ) THEN
                                             CALL daxpy( mvl, -t*aqoap,
     $                                                   v( 1, q ), 1,
     $                                                   v( 1, p ), 1 )
                                             CALL daxpy( mvl,
     $                                                   cs*sn*apoaq,
     $                                                   v( 1, p ), 1,
     $                                                   v( 1, q ), 1 )
                                          END IF
                                          work( p ) = work( p )*cs
                                          work( q ) = work( q ) / cs
                                       END IF
                                    ELSE
                                       IF( work( q ).GE.one ) THEN
                                          CALL daxpy( m, t*apoaq,
     $                                                a( 1, p ), 1,
     $                                                a( 1, q ), 1 )
                                          CALL daxpy( m, -cs*sn*aqoap,
     $                                                a( 1, q ), 1,
     $                                                a( 1, p ), 1 )
                                          IF( rsvec ) THEN
                                             CALL daxpy( mvl, t*apoaq,
     $                                                   v( 1, p ), 1,
     $                                                   v( 1, q ), 1 )
                                             CALL daxpy( mvl,
     $                                                   -cs*sn*aqoap,
     $                                                   v( 1, q ), 1,
     $                                                   v( 1, p ), 1 )
                                          END IF
                                          work( p ) = work( p ) / cs
                                          work( q ) = work( q )*cs
                                       ELSE
                                          IF( work( p ).GE.work( q ) )
     $                                        THEN
                                             CALL daxpy( m, -t*aqoap,
     $                                                   a( 1, q ), 1,
     $                                                   a( 1, p ), 1 )
                                             CALL daxpy( m, cs*sn*apoaq,
     $                                                   a( 1, p ), 1,
     $                                                   a( 1, q ), 1 )
                                             work( p ) = work( p )*cs
                                             work( q ) = work( q ) / cs
                                             IF( rsvec ) THEN
                                                CALL daxpy( mvl,
     $                                               -t*aqoap,
     $                                               v( 1, q ), 1,
     $                                               v( 1, p ), 1 )
                                                CALL daxpy( mvl,
     $                                               cs*sn*apoaq,
     $                                               v( 1, p ), 1,
     $                                               v( 1, q ), 1 )
                                             END IF
                                          ELSE
                                             CALL daxpy( m, t*apoaq,
     $                                                   a( 1, p ), 1,
     $                                                   a( 1, q ), 1 )
                                             CALL daxpy( m,
     $                                                   -cs*sn*aqoap,
     $                                                   a( 1, q ), 1,
     $                                                   a( 1, p ), 1 )
                                             work( p ) = work( p ) / cs
                                             work( q ) = work( q )*cs
                                             IF( rsvec ) THEN
                                                CALL daxpy( mvl,
     $                                               t*apoaq, v( 1, p ),
     $                                               1, v( 1, q ), 1 )
                                                CALL daxpy( mvl,
     $                                               -cs*sn*aqoap,
     $                                               v( 1, q ), 1,
     $                                               v( 1, p ), 1 )
                                             END IF
                                          END IF
                                       END IF
                                    END IF
                                 END IF
*
                              ELSE
                                 IF( aapp.GT.aaqq ) THEN
                                    CALL dcopy( m, a( 1, p ), 1,
     $                                          work( n+1 ), 1 )
                                    CALL dlascl( 'G', 0, 0, aapp, one,
     $                                           m, 1, work( n+1 ), lda,
     $                                           ierr )
                                    CALL dlascl( 'G', 0, 0, aaqq, one,
     $                                           m, 1, a( 1, q ), lda,
     $                                           ierr )
                                    temp1 = -aapq*work( p ) / work( q )
                                    CALL daxpy( m, temp1, work( n+1 ),
     $                                          1, a( 1, q ), 1 )
                                    CALL dlascl( 'G', 0, 0, one, aaqq,
     $                                           m, 1, a( 1, q ), lda,
     $                                           ierr )
                                    sva( q ) = aaqq*dsqrt( max( zero,
     $                                         one-aapq*aapq ) )
                                    mxsinj = max( mxsinj, sfmin )
                                 ELSE
                                    CALL dcopy( m, a( 1, q ), 1,
     $                                          work( n+1 ), 1 )
                                    CALL dlascl( 'G', 0, 0, aaqq, one,
     $                                           m, 1, work( n+1 ), lda,
     $                                           ierr )
                                    CALL dlascl( 'G', 0, 0, aapp, one,
     $                                           m, 1, a( 1, p ), lda,
     $                                           ierr )
                                    temp1 = -aapq*work( q ) / work( p )
                                    CALL daxpy( m, temp1, work( n+1 ),
     $                                          1, a( 1, p ), 1 )
                                    CALL dlascl( 'G', 0, 0, one, aapp,
     $                                           m, 1, a( 1, p ), lda,
     $                                           ierr )
                                    sva( p ) = aapp*dsqrt( max( zero,
     $                                         one-aapq*aapq ) )
                                    mxsinj = max( mxsinj, sfmin )
                                 END IF
                              END IF
*           END IF ROTOK THEN ... ELSE
*
*           In the case of cancellation in updating SVA(q), SVA(p)
*           recompute SVA(q), SVA(p).
                              IF( ( sva( q ) / aaqq )**2.LE.rooteps )
     $                            THEN
                                 IF( ( aaqq.LT.rootbig ) .AND.
     $                               ( aaqq.GT.rootsfmin ) ) THEN
                                    sva( q ) = dnrm2( m, a( 1, q ), 1 )*
     $                                         work( q )
                                 ELSE
                                    t = zero
                                    aaqq = one
                                    CALL dlassq( m, a( 1, q ), 1, t,
     $                                           aaqq )
                                    sva( q ) = t*dsqrt( aaqq )*work( q )
                                 END IF
                              END IF
                              IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
                                 IF( ( aapp.LT.rootbig ) .AND.
     $                               ( aapp.GT.rootsfmin ) ) THEN
                                    aapp = dnrm2( m, a( 1, p ), 1 )*
     $                                     work( p )
                                 ELSE
                                    t = zero
                                    aapp = one
                                    CALL dlassq( m, a( 1, p ), 1, t,
     $                                           aapp )
                                    aapp = t*dsqrt( aapp )*work( p )
                                 END IF
                                 sva( p ) = aapp
                              END IF
*              end of OK rotation
                           ELSE
                              notrot = notrot + 1
*[RTD]      SKIPPED  = SKIPPED  + 1
                              pskipped = pskipped + 1
                              ijblsk = ijblsk + 1
                           END IF
                        ELSE
                           notrot = notrot + 1
                           pskipped = pskipped + 1
                           ijblsk = ijblsk + 1
                        END IF
*
                        IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
     $                      THEN
                           sva( p ) = aapp
                           notrot = 0
                           GO TO 2011
                        END IF
                        IF( ( i.LE.swband ) .AND.
     $                      ( pskipped.GT.rowskip ) ) THEN
                           aapp = -aapp
                           notrot = 0
                           GO TO 2203
                        END IF
*
 2200                CONTINUE
*        end of the q-loop
 2203                CONTINUE
*
                     sva( p ) = aapp
*
                  ELSE
*
                     IF( aapp.EQ.zero )notrot = notrot +
     $                   min( jgl+kbl-1, n ) - jgl + 1
                     IF( aapp.LT.zero )notrot = 0
*
                  END IF
*
 2100          CONTINUE
*     end of the p-loop
 2010       CONTINUE
*     end of the jbc-loop
 2011       CONTINUE
*2011 bailed out of the jbc-loop
            DO 2012 p = igl, min( igl+kbl-1, n )
               sva( p ) = dabs( sva( p ) )
 2012       CONTINUE
***
 2000    CONTINUE
*2000 :: end of the ibr-loop
*
*     .. update SVA(N)
         IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
     $       THEN
            sva( n ) = dnrm2( m, a( 1, n ), 1 )*work( n )
         ELSE
            t = zero
            aapp = one
            CALL dlassq( m, a( 1, n ), 1, t, aapp )
            sva( n ) = t*dsqrt( aapp )*work( n )
         END IF
*
*     Additional steering devices
*
         IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
     $       ( iswrot.LE.n ) ) )swband = i
*
         IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.dsqrt( dble( n ) )*
     $       tol ) .AND. ( dble( n )*mxaapq*mxsinj.LT.tol ) ) THEN
            GO TO 1994
         END IF
*
         IF( notrot.GE.emptsw )GO TO 1994
*
 1993 CONTINUE
*     end i=1:NSWEEP loop
*
* #:( Reaching this point means that the procedure has not converged.
      info = nsweep - 1
      GO TO 1995
*
 1994 CONTINUE
* #:) Reaching this point means numerical convergence after the i-th
*     sweep.
*
      info = 0
* #:) INFO = 0 confirms successful iterations.
 1995 CONTINUE
*
*     Sort the singular values and find how many are above
*     the underflow threshold.
*
      n2 = 0
      n4 = 0
      DO 5991 p = 1, n - 1
         q = idamax( n-p+1, sva( p ), 1 ) + p - 1
         IF( p.NE.q ) THEN
            temp1 = sva( p )
            sva( p ) = sva( q )
            sva( q ) = temp1
            temp1 = work( p )
            work( p ) = work( q )
            work( q ) = temp1
            CALL dswap( m, a( 1, p ), 1, a( 1, q ), 1 )
            IF( rsvec )CALL dswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
         END IF
         IF( sva( p ).NE.zero ) THEN
            n4 = n4 + 1
            IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
         END IF
 5991 CONTINUE
      IF( sva( n ).NE.zero ) THEN
         n4 = n4 + 1
         IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
      END IF
*
*     Normalize the left singular vectors.
*
      IF( lsvec .OR. uctol ) THEN
         DO 1998 p = 1, n2
            CALL dscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
 1998    CONTINUE
      END IF
*
*     Scale the product of Jacobi rotations (assemble the fast rotations).
*
      IF( rsvec ) THEN
         IF( applv ) THEN
            DO 2398 p = 1, n
               CALL dscal( mvl, work( p ), v( 1, p ), 1 )
 2398       CONTINUE
         ELSE
            DO 2399 p = 1, n
               temp1 = one / dnrm2( mvl, v( 1, p ), 1 )
               CALL dscal( mvl, temp1, v( 1, p ), 1 )
 2399       CONTINUE
         END IF
      END IF
*
*     Undo scaling, if necessary (and possible).
      IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl) ) )
     $    .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
     $    ( sfmin / skl) ) ) ) THEN
         DO 2400 p = 1, n
            sva( p ) = skl*sva( p )
 2400    CONTINUE
         skl= one
      END IF
*
      work( 1 ) = skl
*     The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
*     then some of the singular values may overflow or underflow and
*     the spectrum is given in this factored representation.
*
      work( 2 ) = dble( n4 )
*     N4 is the number of computed nonzero singular values of A.
*
      work( 3 ) = dble( n2 )
*     N2 is the number of singular values of A greater than SFMIN.
*     If N2<N, SVA(N2+1:N) contains ZEROS and/or denormalized numbers
*     that may carry some information.
*
      work( 4 ) = dble( i )
*     i is the index of the last sweep before declaring convergence.
*
      work( 5 ) = mxaapq
*     MXAAPQ is the largest absolute value of scaled pivots in the
*     last sweep
*
      work( 6 ) = mxsinj
*     MXSINJ is the largest absolute value of the sines of Jacobi angles
*     in the last sweep
*
      RETURN
*     ..
*     .. END OF DGESVJ
*     ..
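The rotation branch of the "M x 2 Jacobi SVD" step in the listing above (the ROTOK case of the off-diagonal block) can be tried out in isolation. The following is a minimal Python sketch, not part of DGESVJ. The function name `jacobi_rotate_pair` is hypothetical, the columns are plain lists without the WORK(p), WORK(q) scalings, and the BIGTHETA shortcut and the ROTOK safeguards are left out. The formulas for THETA, THSIGN, T, CS, SN and the two norm updates are taken from the listing.

```python
import math


def jacobi_rotate_pair(x, y):
    """Orthogonalize two nonzero columns x and y with one plane rotation.

    Returns the rotated columns and their updated Euclidean norms.
    Assumption: no column scaling, no overflow/underflow safeguards.
    """
    aapp = math.sqrt(sum(v * v for v in x))
    aaqq = math.sqrt(sum(v * v for v in y))
    # Scaled pivot: cosine of the angle between the two columns.
    aapq = sum(a * b for a, b in zip(x, y)) / (aapp * aaqq)
    if aapq == 0.0:
        # Already orthogonal, nothing to rotate.
        return list(x), list(y), aapp, aaqq

    aqoap = aaqq / aapp
    apoaq = aapp / aaqq
    theta = -0.5 * abs(aqoap - apoaq) / aapq
    thsign = -math.copysign(1.0, aapq)
    if aaqq > aapp:
        # Same sign flips as in the off-diagonal block of the listing.
        theta = -theta
        thsign = -thsign

    t = 1.0 / (theta + thsign * math.sqrt(1.0 + theta * theta))
    cs = math.sqrt(1.0 / (1.0 + t * t))
    sn = t * cs

    # Plain (unscaled) form of the rotation that DROTM / DAXPY apply.
    new_x = [cs * a - sn * b for a, b in zip(x, y)]
    new_y = [sn * a + cs * b for a, b in zip(x, y)]

    # Norm updates as in the listing, without recomputing the norms.
    new_aapp = aapp * math.sqrt(max(0.0, 1.0 - t * aqoap * aapq))
    new_aaqq = aaqq * math.sqrt(max(0.0, 1.0 + t * apoaq * aapq))
    return new_x, new_y, new_aapp, new_aaqq
```

In exact arithmetic the rotated columns are orthogonal and the updated norms equal the norms of the rotated columns. This is why the routine can update SVA(q) and AAPP from the rotation parameters alone, and only falls back to DNRM2 or DLASSQ when cancellation is detected.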

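When ROTOK is false, the listing replaces the rotation by a "modified Gram-Schmidt like transformation": the smaller column is made orthogonal to the larger one, which is left unchanged. The sketch below illustrates the AAPP > AAQQ case in Python under simplifying assumptions. The name `mgs_fallback` is hypothetical, the WORK(p), WORK(q) scalings are taken to be one, and AAPQ is recomputed here from the normalized columns instead of being passed in from the Gram matrix step.

```python
import math


def mgs_fallback(x, y):
    """Remove from y its component along x; x itself is not modified.

    Mirrors the DCOPY / DLASCL / DAXPY / DLASCL sequence of the listing
    for the case AAPP > AAQQ, assuming unit column scalings.
    """
    aapp = math.sqrt(sum(v * v for v in x))
    aaqq = math.sqrt(sum(v * v for v in y))

    xs = [v / aapp for v in x]  # scaled copy of column p (WORK(N+1:N+M))
    ys = [v / aaqq for v in y]  # column q scaled to unit norm
    aapq = sum(a * b for a, b in zip(xs, ys))

    # DAXPY step with TEMP1 = -AAPQ (unit scalings assumed).
    ys = [b - aapq * a for a, b in zip(xs, ys)]

    new_y = [aaqq * v for v in ys]  # undo the scaling of column q
    new_aaqq = aaqq * math.sqrt(max(0.0, 1.0 - aapq * aapq))
    return new_y, new_aaqq
```

For example, with x = (3, 4) and y = (1, 2) the projection leaves y = (-0.32, 0.24), whose norm 0.4 agrees with AAQQ*sqrt(1 - AAPQ**2). Working on columns scaled to unit norm keeps the intermediate products away from overflow and underflow when the two norms differ by many orders of magnitude.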
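The post-processing after label 1995 sorts the singular values in decreasing order by repeated selection of the largest remaining entry (IDAMAX) and counts N4 and N2, which are returned in WORK(2) and WORK(3). A rough Python sketch of that bookkeeping follows. The name `sort_and_count` is hypothetical, the column swaps of A and V are represented only by a permutation list, and `SFMIN` uses `sys.float_info.min` as an assumed stand-in for the safe minimum returned by DLAMCH('S').

```python
import sys

SFMIN = sys.float_info.min  # assumed stand-in for DLAMCH('S')


def sort_and_count(sva, skl, sfmin):
    """Selection-sort sva in decreasing order and count N2 and N4.

    Returns (sorted values, permutation, n2, n4). perm[p] is the
    original index of the value that ends up in position p.
    """
    sva = list(sva)
    n = len(sva)
    perm = list(range(n))
    n2 = 0  # values with sva*skl above the underflow threshold
    n4 = 0  # nonzero values
    for p in range(n):
        if p < n - 1:
            # First index of the largest magnitude in sva[p:], like IDAMAX.
            q = max(range(p, n), key=lambda k: abs(sva[k]))
            if q != p:
                sva[p], sva[q] = sva[q], sva[p]
                perm[p], perm[q] = perm[q], perm[p]
        if sva[p] != 0.0:
            n4 += 1
            if sva[p] * skl > sfmin:
                n2 += 1
    return sva, perm, n2, n4
```

A caller of DGESVJ would read these counts back from WORK(2) and WORK(3), and would multiply SVA by WORK(1) to obtain the singular values whenever WORK(1) differs from one.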