LAPACK  3.8.0
LAPACK: Linear Algebra PACKage

◆ dgejsv()

subroutine dgejsv ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
character*1  JOBR,
character*1  JOBT,
character*1  JOBP,
integer  M,
integer  N,
double precision, dimension( lda, * )  A,
integer  LDA,
double precision, dimension( n )  SVA,
double precision, dimension( ldu, * )  U,
integer  LDU,
double precision, dimension( ldv, * )  V,
integer  LDV,
double precision, dimension( lwork )  WORK,
integer  LWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

DGEJSV

Download DGEJSV + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
 matrix [A], where M >= N. The SVD of [A] is written as

              [A] = [U] * [SIGMA] * [V]^t,

 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
 diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
 [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
 the singular values of [A]. The columns of [U] and [V] are the left and
 the right singular vectors of [A], respectively. The matrices [U] and [V]
 are computed and stored in the arrays U and V, respectively. The diagonal
 of [SIGMA] is computed and stored in the array SVA.
 DGEJSV can sometimes compute tiny singular values and their singular vectors much
 more accurately than other SVD routines, see below under Further Details.
Parameters
[in]JOBA
          JOBA is CHARACTER*1
        Specifies the level of accuracy:
       = 'C': This option works well (high relative accuracy) if A = B * D,
             with well-conditioned B and arbitrary diagonal matrix D.
             The accuracy cannot be spoiled by COLUMN scaling. The
             accuracy of the computed output depends on the condition of
             B, and the procedure aims at the best theoretical accuracy.
             The relative error max_{i=1:N}|d sigma_i| / sigma_i is
             bounded by f(M,N)*epsilon* cond(B), independent of D.
             The input matrix is preprocessed with the QRF with column
             pivoting. This initial preprocessing and preconditioning by
             a rank revealing QR factorization is common for all values of
             JOBA. Additional actions are specified as follows:
       = 'E': Computation as with 'C' with an additional estimate of the
             condition number of B. It provides a realistic error bound.
       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
             D1, D2, and well-conditioned matrix C, this option gives
             higher accuracy than the 'C' option. If the structure of the
             input matrix is not known, and relative accuracy is
             desirable, then this option is advisable. The input matrix A
             is preprocessed with QR factorization with FULL (row and
             column) pivoting.
       = 'G'  Computation as with 'F' with an additional estimate of the
             condition number of B, where A=D*B. If A has heavily weighted
             rows, then using this condition number gives too pessimistic
             error bound.
       = 'A': Small singular values are the noise and the matrix is treated
             as numerically rank deficient. The error in the computed
             singular values is bounded by f(m,n)*epsilon*||A||.
             The computed SVD A = U * S * V^t restores A up to
             f(m,n)*epsilon*||A||.
             This gives the procedure the licence to discard (set to zero)
             all singular values below N*epsilon*||A||.
       = 'R': Similar as in 'A'. Rank revealing property of the initial
             QR factorization is used do reveal (using triangular factor)
             a gap sigma_{r+1} < epsilon * sigma_r in which case the
             numerical RANK is declared to be r. The SVD is computed with
             absolute error bounds, but more accurately than with 'A'.
[in]JOBU
          JOBU is CHARACTER*1
        Specifies whether to compute the columns of U:
       = 'U': N columns of U are returned in the array U.
       = 'F': full set of M left sing. vectors is returned in the array U.
       = 'W': U may be used as workspace of length M*N. See the description
             of U.
       = 'N': U is not computed.
[in]JOBV
          JOBV is CHARACTER*1
        Specifies whether to compute the matrix V:
       = 'V': N columns of V are returned in the array V; Jacobi rotations
             are not explicitly accumulated.
       = 'J': N columns of V are returned in the array V, but they are
             computed as the product of Jacobi rotations. This option is
             allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
       = 'W': V may be used as workspace of length N*N. See the description
             of V.
       = 'N': V is not computed.
[in]JOBR
          JOBR is CHARACTER*1
        Specifies the RANGE for the singular values. Issues the licence to
        set to zero small positive singular values if they are outside
        specified range. If A .NE. 0 is scaled so that the largest singular
        value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
        the licence to kill columns of A whose norm in c*A is less than
        DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
        where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
       = 'N': Do not kill small columns of c*A. This option assumes that
             BLAS and QR factorizations and triangular solvers are
             implemented to work in that range. If the condition of A
             is greater than BIG, use DGESVJ.
       = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
             (roughly, as described above). This option is recommended.
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~
        For computing the singular values in the FULL range [SFMIN,BIG]
        use DGESVJ.
[in]JOBT
          JOBT is CHARACTER*1
        If the matrix is square then the procedure may determine to use
        transposed A if A^t seems to be better with respect to convergence.
        If the matrix is not square, JOBT is ignored. This is subject to
        changes in the future.
        The decision is based on two values of entropy over the adjoint
        orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
       = 'T': transpose if entropy test indicates possibly faster
        convergence of Jacobi process if A^t is taken as input. If A is
        replaced with A^t, then the row pivoting is included automatically.
       = 'N': do not speculate.
        This option can be used to compute only the singular values, or the
        full SVD (U, SIGMA and V). For only one set of singular vectors
        (U or V), the caller should provide both U and V, as one of the
        matrices is used as workspace if the matrix A is transposed.
        The implementer can easily remove this constraint and make the
        code more complicated. See the descriptions of U and V.
[in]JOBP
          JOBP is CHARACTER*1
        Issues the licence to introduce structured perturbations to drown
        denormalized numbers. This licence should be active if the
        denormals are poorly implemented, causing slow computation,
        especially in cases of fast convergence (!). For details see [1,2].
        For the sake of simplicity, this perturbations are included only
        when the full SVD or only the singular values are requested. The
        implementer/user can easily add the perturbation for the cases of
        computing one set of singular vectors.
       = 'P': introduce perturbation
       = 'N': do not perturb
[in]M
          M is INTEGER
         The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
         The number of columns of the input matrix A. M >= N >= 0.
[in,out]A
          A is DOUBLE PRECISION array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is DOUBLE PRECISION array, dimension (N)
          On exit,
          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
            computation SVA contains Euclidean column norms of the
            iterated matrices in the array A.
          - For WORK(1) .NE. WORK(2): The singular values of A are
            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
            sigma_max(A) overflows or if small singular values have been
            saved from underflow by scaling the input matrix A.
          - If JOBR='R' then some of the singular values may be returned
            as exact zeros obtained by "set to zero" because they are
            below the numerical rank threshold or are denormalized numbers.
[out]U
          U is DOUBLE PRECISION array, dimension ( LDU, N )
          If JOBU = 'U', then U contains on exit the M-by-N matrix of
                         the left singular vectors.
          If JOBU = 'F', then U contains on exit the M-by-M matrix of
                         the left singular vectors, including an ONB
                         of the orthogonal complement of the Range(A).
          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
                         then U is used as workspace if the procedure
                         replaces A with A^t. In that case, [V] is computed
                         in U as left singular vectors of A^t and then
                         copied back to the V array. This 'W' option is just
                         a reminder to the caller that in this case U is
                         reserved as workspace of length N*N.
          If JOBU = 'N'  U is not referenced, unless JOBT='T'.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U,  LDU >= 1.
          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
[out]V
          V is DOUBLE PRECISION array, dimension ( LDV, N )
          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
                         then V is used as workspace if the pprocedure
                         replaces A with A^t. In that case, [U] is computed
                         in V as right singular vectors of A^t and then
                         copied back to the U array. This 'W' option is just
                         a reminder to the caller that in this case V is
                         reserved as workspace of length N*N.
          If JOBV = 'N'  V is not referenced, unless JOBT='T'.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V,  LDV >= 1.
          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
[out]WORK
          WORK is DOUBLE PRECISION array, dimension (LWORK)
          On exit, if N.GT.0 .AND. M.GT.0 (else not referenced),
          WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
                    that SCALE*SVA(1:N) are the computed singular values
                    of A. (See the description of SVA().)
          WORK(2) = See the description of WORK(1).
          WORK(3) = SCONDA is an estimate for the condition number of
                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
                    SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
                    It is computed using DPOCON. It holds
                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
                    where R is the triangular factor from the QRF of A.
                    However, if R is truncated and the numerical rank is
                    determined to be strictly smaller than N, SCONDA is
                    returned as -1, thus indicating that the smallest
                    singular values might be lost.

          If full SVD is needed, the following two condition numbers are
          useful for the analysis of the algorithm. They are provied for
          a developer/implementer who is familiar with the details of
          the method.

          WORK(4) = an estimate of the scaled condition number of the
                    triangular factor in the first QR factorization.
          WORK(5) = an estimate of the scaled condition number of the
                    triangular factor in the second QR factorization.
          The following two parameters are computed if JOBT .EQ. 'T'.
          They are provided for a developer/implementer who is familiar
          with the details of the method.

          WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
                    of diag(A^t*A) / Trace(A^t*A) taken as point in the
                    probability simplex.
          WORK(7) = the entropy of A*A^t.
[in]LWORK
          LWORK is INTEGER
          Length of WORK to confirm proper allocation of work space.
          LWORK depends on the job:

          If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
            -> .. no scaled condition estimate required (JOBE.EQ.'N'):
               LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
               block size for DGEQP3 and DGEQRF.
               In general, optimal LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).
            -> .. an estimate of the scaled condition number of A is
               required (JOBA='E', 'G'). In this case, LWORK is the maximum
               of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
               In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF),
                                                     N+N*N+LWORK(DPOCON),7).

          If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
            -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
               where NB is the optimal block size for DGEQP3, DGEQRF, DGELQF,
               DORMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON),
                       N+LWORK(DGELQF), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).

          If SIGMA and the left singular vectors are needed
            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
            -> For optimal performance:
               if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
               if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
               where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
                        2*N+LWORK(DGEQRF), N+LWORK(DORMQR)).
               Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or
               M*NB (for JOBU.EQ.'F').

          If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and
            -> if JOBV.EQ.'V'
               the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).
            -> if JOBV.EQ.'J' the minimal requirement is
               LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
            -> For optimal performance, LWORK should be additionally
               larger than N+M*NB, where NB is the optimal block size
               for DORMQR.
[out]IWORK
          IWORK is INTEGER array, dimension (M+3*N).
          On exit,
          IWORK(1) = the numerical rank determined after the initial
                     QR factorization with pivoting. See the descriptions
                     of JOBA and JOBR.
          IWORK(2) = the number of the computed nonzero singular values
          IWORK(3) = if nonzero, a warning message:
                     If IWORK(3).EQ.1 then some of the column norms of A
                     were denormalized floats. The requested high accuracy
                     is not warranted by the data.
[out]INFO
          INFO is INTEGER
           < 0  : if INFO = -i, then the i-th argument had an illegal value.
           = 0 :  successful exit;
           > 0 :  DGEJSV  did not converge in the maximal allowed number
                  of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
June 2016
Further Details:
  DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,
  DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an
  additional row pivoting can be used as a preprocessor, which in some
  cases results in much higher accuracy. An example is matrix A with the
  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
  diagonal matrices and C is well-conditioned matrix. In that case, complete
  pivoting in the first QR factorizations provides accuracy dependent on the
  condition number of C, and independent of D1, D2. Such higher accuracy is
  not completely understood theoretically, but it works well in practice.
  Further, if A can be written as A = B*D, with well-conditioned B and some
  diagonal D, then the high accuracy is guaranteed, both theoretically and
  in software, independent of D. For more details see [1], [2].
     The computational range for the singular values can be the full range
  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
  & LAPACK routines called by DGEJSV are implemented to work in that range.
  If that is not the case, then the restriction for safe computation with
  the singular values in the range of normalized IEEE numbers is that the
  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
  overflow. This code (DGEJSV) is best used in this restricted range,
  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
  returned as zeros. See JOBR for details on this.
     Further, this implementation is somewhat slower than the one described
  in [1,2] due to replacement of some non-LAPACK components, and because
  the choice of some tuning parameters in the iterative part (DGESVJ) is
  left to the implementer on a particular machine.
     The rank revealing QR factorization (in this code: DGEQP3) should be
  implemented as in [3]. We have a new version of DGEQP3 under development
  that is more robust than the current one in LAPACK, with a cleaner cut in
  rank deficient cases. It will be available in the SIGMA library [4].
  If M is much larger than N, it is obvious that the initial QRF with
  column pivoting can be preprocessed by the QRF without pivoting. That
  well known trick is not used in DGEJSV because in some cases heavy row
  weighting can be treated with complete pivoting. The overhead in cases
  M much larger than N is then only due to pivoting, but the benefits in
  terms of accuracy have prevailed. The implementer/user can incorporate
  this extra QRF step easily. The implementer can also improve data movement
  (matrix transpose, matrix copy, matrix transposed copy) - this
  implementation of DGEJSV uses only the simplest, naive data movement.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
 [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
     factorization software - a case study.
     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
     LAPACK Working note 176.
 [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 478 of file dgejsv.f.

478 *
479 * -- LAPACK computational routine (version 3.7.1) --
480 * -- LAPACK is a software package provided by Univ. of Tennessee, --
481 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
482 * June 2016
483 *
484 * .. Scalar Arguments ..
485  IMPLICIT NONE
486  INTEGER info, lda, ldu, ldv, lwork, m, n
487 * ..
488 * .. Array Arguments ..
489  DOUBLE PRECISION a( lda, * ), sva( n ), u( ldu, * ), v( ldv, * ),
490  $ work( lwork )
491  INTEGER iwork( * )
492  CHARACTER*1 joba, jobp, jobr, jobt, jobu, jobv
493 * ..
494 *
495 * ===========================================================================
496 *
497 * .. Local Parameters ..
498  DOUBLE PRECISION zero, one
499  parameter( zero = 0.0d0, one = 1.0d0 )
500 * ..
501 * .. Local Scalars ..
502  DOUBLE PRECISION aapp, aaqq, aatmax, aatmin, big, big1, cond_ok,
503  $ condr1, condr2, entra, entrat, epsln, maxprj, scalem,
504  $ sconda, sfmin, small, temp1, uscal1, uscal2, xsc
505  INTEGER ierr, n1, nr, numrank, p, q, warning
506  LOGICAL almort, defr, errest, goscal, jracc, kill, lsvec,
507  $ l2aber, l2kill, l2pert, l2rank, l2tran,
508  $ noscal, rowpiv, rsvec, transp
509 * ..
510 * .. Intrinsic Functions ..
511  INTRINSIC dabs, dlog, max, min, dble, idnint, dsign, dsqrt
512 * ..
513 * .. External Functions ..
514  DOUBLE PRECISION dlamch, dnrm2
515  INTEGER idamax
516  LOGICAL lsame
517  EXTERNAL idamax, lsame, dlamch, dnrm2
518 * ..
519 * .. External Subroutines ..
520  EXTERNAL dcopy, dgelqf, dgeqp3, dgeqrf, dlacpy, dlascl,
523 *
524  EXTERNAL dgesvj
525 * ..
526 *
527 * Test the input arguments
528 *
529  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
530  jracc = lsame( jobv, 'J' )
531  rsvec = lsame( jobv, 'V' ) .OR. jracc
532  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
533  l2rank = lsame( joba, 'R' )
534  l2aber = lsame( joba, 'A' )
535  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
536  l2tran = lsame( jobt, 'T' )
537  l2kill = lsame( jobr, 'R' )
538  defr = lsame( jobr, 'N' )
539  l2pert = lsame( jobp, 'P' )
540 *
541  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
542  $ errest .OR. lsame( joba, 'C' ) )) THEN
543  info = - 1
544  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
545  $ lsame( jobu, 'W' )) ) THEN
546  info = - 2
547  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
548  $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
549  info = - 3
550  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
551  info = - 4
552  ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
553  info = - 5
554  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
555  info = - 6
556  ELSE IF ( m .LT. 0 ) THEN
557  info = - 7
558  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
559  info = - 8
560  ELSE IF ( lda .LT. m ) THEN
561  info = - 10
562  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
563  info = - 13
564  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
565  info = - 15
566  ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
567  & (lwork .LT. max(7,4*n+1,2*m+n))) .OR.
568  & (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
569  & (lwork .LT. max(7,4*n+n*n,2*m+n))) .OR.
570  & (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
571  & .OR.
572  & (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
573  & .OR.
574  & (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
575  & (lwork.LT.max(2*m+n,6*n+2*n*n)))
576  & .OR. (lsvec .AND. rsvec .AND. jracc .AND.
577  & lwork.LT.max(2*m+n,4*n+n*n,2*n+n*n+6)))
578  & THEN
579  info = - 17
580  ELSE
581 * #:)
582  info = 0
583  END IF
584 *
585  IF ( info .NE. 0 ) THEN
586 * #:(
587  CALL xerbla( 'DGEJSV', - info )
588  RETURN
589  END IF
590 *
591 * Quick return for void matrix (Y3K safe)
592 * #:)
593  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
594  iwork(1:3) = 0
595  work(1:7) = 0
596  RETURN
597  ENDIF
598 *
599 * Determine whether the matrix U should be M x N or M x M
600 *
601  IF ( lsvec ) THEN
602  n1 = n
603  IF ( lsame( jobu, 'F' ) ) n1 = m
604  END IF
605 *
606 * Set numerical parameters
607 *
608 *! NOTE: Make sure DLAMCH() does not fail on the target architecture.
609 *
610  epsln = dlamch('Epsilon')
611  sfmin = dlamch('SafeMinimum')
612  small = sfmin / epsln
613  big = dlamch('O')
614 * BIG = ONE / SFMIN
615 *
616 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
617 *
618 *(!) If necessary, scale SVA() to protect the largest norm from
619 * overflow. It is possible that this scaling pushes the smallest
620 * column norm left from the underflow threshold (extreme case).
621 *
622  scalem = one / dsqrt(dble(m)*dble(n))
623  noscal = .true.
624  goscal = .true.
625  DO 1874 p = 1, n
626  aapp = zero
627  aaqq = one
628  CALL dlassq( m, a(1,p), 1, aapp, aaqq )
629  IF ( aapp .GT. big ) THEN
630  info = - 9
631  CALL xerbla( 'DGEJSV', -info )
632  RETURN
633  END IF
634  aaqq = dsqrt(aaqq)
635  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
636  sva(p) = aapp * aaqq
637  ELSE
638  noscal = .false.
639  sva(p) = aapp * ( aaqq * scalem )
640  IF ( goscal ) THEN
641  goscal = .false.
642  CALL dscal( p-1, scalem, sva, 1 )
643  END IF
644  END IF
645  1874 CONTINUE
646 *
647  IF ( noscal ) scalem = one
648 *
649  aapp = zero
650  aaqq = big
651  DO 4781 p = 1, n
652  aapp = max( aapp, sva(p) )
653  IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
654  4781 CONTINUE
655 *
656 * Quick return for zero M x N matrix
657 * #:)
658  IF ( aapp .EQ. zero ) THEN
659  IF ( lsvec ) CALL dlaset( 'G', m, n1, zero, one, u, ldu )
660  IF ( rsvec ) CALL dlaset( 'G', n, n, zero, one, v, ldv )
661  work(1) = one
662  work(2) = one
663  IF ( errest ) work(3) = one
664  IF ( lsvec .AND. rsvec ) THEN
665  work(4) = one
666  work(5) = one
667  END IF
668  IF ( l2tran ) THEN
669  work(6) = zero
670  work(7) = zero
671  END IF
672  iwork(1) = 0
673  iwork(2) = 0
674  iwork(3) = 0
675  RETURN
676  END IF
677 *
678 * Issue warning if denormalized column norms detected. Override the
679 * high relative accuracy request. Issue licence to kill columns
680 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
681 * #:(
682  warning = 0
683  IF ( aaqq .LE. sfmin ) THEN
684  l2rank = .true.
685  l2kill = .true.
686  warning = 1
687  END IF
688 *
689 * Quick return for one-column matrix
690 * #:)
691  IF ( n .EQ. 1 ) THEN
692 *
693  IF ( lsvec ) THEN
694  CALL dlascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
695  CALL dlacpy( 'A', m, 1, a, lda, u, ldu )
696 * computing all M left singular vectors of the M x 1 matrix
697  IF ( n1 .NE. n ) THEN
698  CALL dgeqrf( m, n, u,ldu, work, work(n+1),lwork-n,ierr )
699  CALL dorgqr( m,n1,1, u,ldu,work,work(n+1),lwork-n,ierr )
700  CALL dcopy( m, a(1,1), 1, u(1,1), 1 )
701  END IF
702  END IF
703  IF ( rsvec ) THEN
704  v(1,1) = one
705  END IF
706  IF ( sva(1) .LT. (big*scalem) ) THEN
707  sva(1) = sva(1) / scalem
708  scalem = one
709  END IF
710  work(1) = one / scalem
711  work(2) = one
712  IF ( sva(1) .NE. zero ) THEN
713  iwork(1) = 1
714  IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
715  iwork(2) = 1
716  ELSE
717  iwork(2) = 0
718  END IF
719  ELSE
720  iwork(1) = 0
721  iwork(2) = 0
722  END IF
723  iwork(3) = 0
724  IF ( errest ) work(3) = one
725  IF ( lsvec .AND. rsvec ) THEN
726  work(4) = one
727  work(5) = one
728  END IF
729  IF ( l2tran ) THEN
730  work(6) = zero
731  work(7) = zero
732  END IF
733  RETURN
734 *
735  END IF
736 *
737  transp = .false.
738  l2tran = l2tran .AND. ( m .EQ. n )
739 *
740  aatmax = -one
741  aatmin = big
742  IF ( rowpiv .OR. l2tran ) THEN
743 *
744 * Compute the row norms, needed to determine row pivoting sequence
745 * (in the case of heavily row weighted A, row pivoting is strongly
746 * advised) and to collect information needed to compare the
747 * structures of A * A^t and A^t * A (in the case L2TRAN.EQ..TRUE.).
748 *
749  IF ( l2tran ) THEN
750  DO 1950 p = 1, m
751  xsc = zero
752  temp1 = one
753  CALL dlassq( n, a(p,1), lda, xsc, temp1 )
754 * DLASSQ gets both the ell_2 and the ell_infinity norm
755 * in one pass through the vector
756  work(m+n+p) = xsc * scalem
757  work(n+p) = xsc * (scalem*dsqrt(temp1))
758  aatmax = max( aatmax, work(n+p) )
759  IF (work(n+p) .NE. zero) aatmin = min(aatmin,work(n+p))
760  1950 CONTINUE
761  ELSE
762  DO 1904 p = 1, m
763  work(m+n+p) = scalem*dabs( a(p,idamax(n,a(p,1),lda)) )
764  aatmax = max( aatmax, work(m+n+p) )
765  aatmin = min( aatmin, work(m+n+p) )
766  1904 CONTINUE
767  END IF
768 *
769  END IF
770 *
771 * For square matrix A try to determine whether A^t would be better
772 * input for the preconditioned Jacobi SVD, with faster convergence.
773 * The decision is based on an O(N) function of the vector of column
774 * and row norms of A, based on the Shannon entropy. This should give
775 * the right choice in most cases when the difference actually matters.
776 * It may fail and pick the slower converging side.
777 *
778  entra = zero
779  entrat = zero
780  IF ( l2tran ) THEN
781 *
782  xsc = zero
783  temp1 = one
784  CALL dlassq( n, sva, 1, xsc, temp1 )
785  temp1 = one / temp1
786 *
787  entra = zero
788  DO 1113 p = 1, n
789  big1 = ( ( sva(p) / xsc )**2 ) * temp1
790  IF ( big1 .NE. zero ) entra = entra + big1 * dlog(big1)
791  1113 CONTINUE
792  entra = - entra / dlog(dble(n))
793 *
794 * Now, SVA().^2/Trace(A^t * A) is a point in the probability simplex.
795 * It is derived from the diagonal of A^t * A. Do the same with the
796 * diagonal of A * A^t, compute the entropy of the corresponding
797 * probability distribution. Note that A * A^t and A^t * A have the
798 * same trace.
799 *
800  entrat = zero
801  DO 1114 p = n+1, n+m
802  big1 = ( ( work(p) / xsc )**2 ) * temp1
803  IF ( big1 .NE. zero ) entrat = entrat + big1 * dlog(big1)
804  1114 CONTINUE
805  entrat = - entrat / dlog(dble(m))
806 *
807 * Analyze the entropies and decide A or A^t. Smaller entropy
808 * usually means better input for the algorithm.
809 *
810  transp = ( entrat .LT. entra )
811 *
812 * If A^t is better than A, transpose A.
813 *
814  IF ( transp ) THEN
815 * In an optimal implementation, this trivial transpose
816 * should be replaced with faster transpose.
817  DO 1115 p = 1, n - 1
818  DO 1116 q = p + 1, n
819  temp1 = a(q,p)
820  a(q,p) = a(p,q)
821  a(p,q) = temp1
822  1116 CONTINUE
823  1115 CONTINUE
824  DO 1117 p = 1, n
825  work(m+n+p) = sva(p)
826  sva(p) = work(n+p)
827  1117 CONTINUE
828  temp1 = aapp
829  aapp = aatmax
830  aatmax = temp1
831  temp1 = aaqq
832  aaqq = aatmin
833  aatmin = temp1
834  kill = lsvec
835  lsvec = rsvec
836  rsvec = kill
837  IF ( lsvec ) n1 = n
838 *
839  rowpiv = .true.
840  END IF
841 *
842  END IF
843 * END IF L2TRAN
844 *
845 * Scale the matrix so that its maximal singular value remains less
846 * than DSQRT(BIG) -- the matrix is scaled so that its maximal column
847 * has Euclidean norm equal to DSQRT(BIG/N). The only reason to keep
848 * DSQRT(BIG) instead of BIG is the fact that DGEJSV uses LAPACK and
849 * BLAS routines that, in some implementations, are not capable of
850 * working in the full interval [SFMIN,BIG] and that they may provoke
851 * overflows in the intermediate results. If the singular values spread
852 * from SFMIN to BIG, then DGESVJ will compute them. So, in that case,
853 * one should use DGESVJ instead of DGEJSV.
854 *
855  big1 = dsqrt( big )
856  temp1 = dsqrt( big / dble(n) )
857 *
858  CALL dlascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
859  IF ( aaqq .GT. (aapp * sfmin) ) THEN
860  aaqq = ( aaqq / aapp ) * temp1
861  ELSE
862  aaqq = ( aaqq * temp1 ) / aapp
863  END IF
864  temp1 = temp1 * scalem
865  CALL dlascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
866 *
867 * To undo scaling at the end of this procedure, multiply the
868 * computed singular values with USCAL2 / USCAL1.
869 *
870  uscal1 = temp1
871  uscal2 = aapp
872 *
873  IF ( l2kill ) THEN
874 * L2KILL enforces computation of nonzero singular values in
875 * the restricted range of condition number of the initial A,
876 * sigma_max(A) / sigma_min(A) approx. DSQRT(BIG)/DSQRT(SFMIN).
877  xsc = dsqrt( sfmin )
878  ELSE
879  xsc = small
880 *
881 * Now, if the condition number of A is too big,
882 * sigma_max(A) / sigma_min(A) .GT. DSQRT(BIG/N) * EPSLN / SFMIN,
883 * as a precaution measure, the full SVD is computed using DGESVJ
884 * with accumulated Jacobi rotations. This provides numerically
885 * more robust computation, at the cost of slightly increased run
886 * time. Depending on the concrete implementation of BLAS and LAPACK
887 * (i.e. how they behave in presence of extreme ill-conditioning) the
888 * implementor may decide to remove this switch.
889  IF ( ( aaqq.LT.dsqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
890  jracc = .true.
891  END IF
892 *
893  END IF
894  IF ( aaqq .LT. xsc ) THEN
895  DO 700 p = 1, n
896  IF ( sva(p) .LT. xsc ) THEN
897  CALL dlaset( 'A', m, 1, zero, zero, a(1,p), lda )
898  sva(p) = zero
899  END IF
900  700 CONTINUE
901  END IF
902 *
903 * Preconditioning using QR factorization with pivoting
904 *
905  IF ( rowpiv ) THEN
906 * Optional row permutation (Bjoerck row pivoting):
907 * A result by Cox and Higham shows that the Bjoerck's
908 * row pivoting combined with standard column pivoting
909 * has similar effect as Powell-Reid complete pivoting.
910 * The ell-infinity norms of A are made nonincreasing.
911  DO 1952 p = 1, m - 1
912  q = idamax( m-p+1, work(m+n+p), 1 ) + p - 1
913  iwork(2*n+p) = q
914  IF ( p .NE. q ) THEN
915  temp1 = work(m+n+p)
916  work(m+n+p) = work(m+n+q)
917  work(m+n+q) = temp1
918  END IF
919  1952 CONTINUE
920  CALL dlaswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
921  END IF
922 *
923 * End of the preparation phase (scaling, optional sorting and
924 * transposing, optional flushing of small columns).
925 *
926 * Preconditioning
927 *
928 * If the full SVD is needed, the right singular vectors are computed
929 * from a matrix equation, and for that we need theoretical analysis
930 * of the Businger-Golub pivoting. So we use DGEQP3 as the first RR QRF.
931 * In all other cases the first RR QRF can be chosen by other criteria
932 * (eg speed by replacing global with restricted window pivoting, such
933 * as in SGEQPX from TOMS # 782). Good results will be obtained using
934 * SGEQPX with properly (!) chosen numerical parameters.
935 * Any improvement of DGEQP3 improves overal performance of DGEJSV.
936 *
937 * A * P1 = Q1 * [ R1^t 0]^t:
938  DO 1963 p = 1, n
939 * .. all columns are free columns
940  iwork(p) = 0
941  1963 CONTINUE
942  CALL dgeqp3( m,n,a,lda, iwork,work, work(n+1),lwork-n, ierr )
943 *
944 * The upper triangular matrix R1 from the first QRF is inspected for
945 * rank deficiency and possibilities for deflation, or possible
946 * ill-conditioning. Depending on the user specified flag L2RANK,
947 * the procedure explores possibilities to reduce the numerical
948 * rank by inspecting the computed upper triangular factor. If
949 * L2RANK or L2ABER are up, then DGEJSV will compute the SVD of
950 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
951 *
952  nr = 1
953  IF ( l2aber ) THEN
954 * Standard absolute error bound suffices. All sigma_i with
955 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
956 * agressive enforcement of lower numerical rank by introducing a
957 * backward error of the order of N*EPSLN*||A||.
958  temp1 = dsqrt(dble(n))*epsln
959  DO 3001 p = 2, n
960  IF ( dabs(a(p,p)) .GE. (temp1*dabs(a(1,1))) ) THEN
961  nr = nr + 1
962  ELSE
963  GO TO 3002
964  END IF
965  3001 CONTINUE
966  3002 CONTINUE
967  ELSE IF ( l2rank ) THEN
968 * .. similarly as above, only slightly more gentle (less agressive).
969 * Sudden drop on the diagonal of R1 is used as the criterion for
970 * close-to-rank-deficient.
971  temp1 = dsqrt(sfmin)
972  DO 3401 p = 2, n
973  IF ( ( dabs(a(p,p)) .LT. (epsln*dabs(a(p-1,p-1))) ) .OR.
974  $ ( dabs(a(p,p)) .LT. small ) .OR.
975  $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3402
976  nr = nr + 1
977  3401 CONTINUE
978  3402 CONTINUE
979 *
980  ELSE
981 * The goal is high relative accuracy. However, if the matrix
982 * has high scaled condition number the relative accuracy is in
983 * general not feasible. Later on, a condition number estimator
984 * will be deployed to estimate the scaled condition number.
985 * Here we just remove the underflowed part of the triangular
986 * factor. This prevents the situation in which the code is
987 * working hard to get the accuracy not warranted by the data.
988  temp1 = dsqrt(sfmin)
989  DO 3301 p = 2, n
990  IF ( ( dabs(a(p,p)) .LT. small ) .OR.
991  $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3302
992  nr = nr + 1
993  3301 CONTINUE
994  3302 CONTINUE
995 *
996  END IF
997 *
998  almort = .false.
999  IF ( nr .EQ. n ) THEN
1000  maxprj = one
1001  DO 3051 p = 2, n
1002  temp1 = dabs(a(p,p)) / sva(iwork(p))
1003  maxprj = min( maxprj, temp1 )
1004  3051 CONTINUE
1005  IF ( maxprj**2 .GE. one - dble(n)*epsln ) almort = .true.
1006  END IF
1007 *
1008 *
1009  sconda = - one
1010  condr1 = - one
1011  condr2 = - one
1012 *
1013  IF ( errest ) THEN
1014  IF ( n .EQ. nr ) THEN
1015  IF ( rsvec ) THEN
1016 * .. V is available as workspace
1017  CALL dlacpy( 'U', n, n, a, lda, v, ldv )
1018  DO 3053 p = 1, n
1019  temp1 = sva(iwork(p))
1020  CALL dscal( p, one/temp1, v(1,p), 1 )
1021  3053 CONTINUE
1022  CALL dpocon( 'U', n, v, ldv, one, temp1,
1023  $ work(n+1), iwork(2*n+m+1), ierr )
1024  ELSE IF ( lsvec ) THEN
1025 * .. U is available as workspace
1026  CALL dlacpy( 'U', n, n, a, lda, u, ldu )
1027  DO 3054 p = 1, n
1028  temp1 = sva(iwork(p))
1029  CALL dscal( p, one/temp1, u(1,p), 1 )
1030  3054 CONTINUE
1031  CALL dpocon( 'U', n, u, ldu, one, temp1,
1032  $ work(n+1), iwork(2*n+m+1), ierr )
1033  ELSE
1034  CALL dlacpy( 'U', n, n, a, lda, work(n+1), n )
1035  DO 3052 p = 1, n
1036  temp1 = sva(iwork(p))
1037  CALL dscal( p, one/temp1, work(n+(p-1)*n+1), 1 )
1038  3052 CONTINUE
1039 * .. the columns of R are scaled to have unit Euclidean lengths.
1040  CALL dpocon( 'U', n, work(n+1), n, one, temp1,
1041  $ work(n+n*n+1), iwork(2*n+m+1), ierr )
1042  END IF
1043  sconda = one / dsqrt(temp1)
1044 * SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
1045 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1046  ELSE
1047  sconda = - one
1048  END IF
1049  END IF
1050 *
1051  l2pert = l2pert .AND. ( dabs( a(1,1)/a(nr,nr) ) .GT. dsqrt(big1) )
1052 * If there is no violent scaling, artificial perturbation is not needed.
1053 *
1054 * Phase 3:
1055 *
1056  IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1057 *
1058 * Singular Values only
1059 *
1060 * .. transpose A(1:NR,1:N)
1061  DO 1946 p = 1, min( n-1, nr )
1062  CALL dcopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1063  1946 CONTINUE
1064 *
1065 * The following two DO-loops introduce small relative perturbation
1066 * into the strict upper triangle of the lower triangular matrix.
1067 * Small entries below the main diagonal are also changed.
1068 * This modification is useful if the computing environment does not
1069 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1070 * annoying denormalized numbers in case of strongly scaled matrices.
1071 * The perturbation is structured so that it does not introduce any
1072 * new perturbation of the singular values, and it does not destroy
1073 * the job done by the preconditioner.
1074 * The licence for this perturbation is in the variable L2PERT, which
1075 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1076 *
1077  IF ( .NOT. almort ) THEN
1078 *
1079  IF ( l2pert ) THEN
1080 * XSC = DSQRT(SMALL)
1081  xsc = epsln / dble(n)
1082  DO 4947 q = 1, nr
1083  temp1 = xsc*dabs(a(q,q))
1084  DO 4949 p = 1, n
1085  IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1086  $ .OR. ( p .LT. q ) )
1087  $ a(p,q) = dsign( temp1, a(p,q) )
1088  4949 CONTINUE
1089  4947 CONTINUE
1090  ELSE
1091  CALL dlaset( 'U', nr-1,nr-1, zero,zero, a(1,2),lda )
1092  END IF
1093 *
1094 * .. second preconditioning using the QR factorization
1095 *
1096  CALL dgeqrf( n,nr, a,lda, work, work(n+1),lwork-n, ierr )
1097 *
1098 * .. and transpose upper to lower triangular
1099  DO 1948 p = 1, nr - 1
1100  CALL dcopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1101  1948 CONTINUE
1102 *
1103  END IF
1104 *
1105 * Row-cyclic Jacobi SVD algorithm with column pivoting
1106 *
1107 * .. again some perturbation (a "background noise") is added
1108 * to drown denormals
1109  IF ( l2pert ) THEN
1110 * XSC = DSQRT(SMALL)
1111  xsc = epsln / dble(n)
1112  DO 1947 q = 1, nr
1113  temp1 = xsc*dabs(a(q,q))
1114  DO 1949 p = 1, nr
1115  IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1116  $ .OR. ( p .LT. q ) )
1117  $ a(p,q) = dsign( temp1, a(p,q) )
1118  1949 CONTINUE
1119  1947 CONTINUE
1120  ELSE
1121  CALL dlaset( 'U', nr-1, nr-1, zero, zero, a(1,2), lda )
1122  END IF
1123 *
1124 * .. and one-sided Jacobi rotations are started on a lower
1125 * triangular matrix (plus perturbation which is ignored in
1126 * the part which destroys triangular form (confusing?!))
1127 *
1128  CALL dgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1129  $ n, v, ldv, work, lwork, info )
1130 *
1131  scalem = work(1)
1132  numrank = idnint(work(2))
1133 *
1134 *
1135  ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1136 *
1137 * -> Singular Values and Right Singular Vectors <-
1138 *
1139  IF ( almort ) THEN
1140 *
1141 * .. in this case NR equals N
1142  DO 1998 p = 1, nr
1143  CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1144  1998 CONTINUE
1145  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1146 *
1147  CALL dgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1148  $ work, lwork, info )
1149  scalem = work(1)
1150  numrank = idnint(work(2))
1151 
1152  ELSE
1153 *
1154 * .. two more QR factorizations ( one QRF is not enough, two require
1155 * accumulated product of Jacobi rotations, three are perfect )
1156 *
1157  CALL dlaset( 'Lower', nr-1, nr-1, zero, zero, a(2,1), lda )
1158  CALL dgelqf( nr, n, a, lda, work, work(n+1), lwork-n, ierr)
1159  CALL dlacpy( 'Lower', nr, nr, a, lda, v, ldv )
1160  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1161  CALL dgeqrf( nr, nr, v, ldv, work(n+1), work(2*n+1),
1162  $ lwork-2*n, ierr )
1163  DO 8998 p = 1, nr
1164  CALL dcopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1165  8998 CONTINUE
1166  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1167 *
1168  CALL dgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1169  $ ldu, work(n+1), lwork, info )
1170  scalem = work(n+1)
1171  numrank = idnint(work(n+2))
1172  IF ( nr .LT. n ) THEN
1173  CALL dlaset( 'A',n-nr, nr, zero,zero, v(nr+1,1), ldv )
1174  CALL dlaset( 'A',nr, n-nr, zero,zero, v(1,nr+1), ldv )
1175  CALL dlaset( 'A',n-nr,n-nr,zero,one, v(nr+1,nr+1), ldv )
1176  END IF
1177 *
1178  CALL dormlq( 'Left', 'Transpose', n, n, nr, a, lda, work,
1179  $ v, ldv, work(n+1), lwork-n, ierr )
1180 *
1181  END IF
1182 *
1183  DO 8991 p = 1, n
1184  CALL dcopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1185  8991 CONTINUE
1186  CALL dlacpy( 'All', n, n, a, lda, v, ldv )
1187 *
1188  IF ( transp ) THEN
1189  CALL dlacpy( 'All', n, n, v, ldv, u, ldu )
1190  END IF
1191 *
1192  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1193 *
1194 * .. Singular Values and Left Singular Vectors ..
1195 *
1196 * .. second preconditioning step to avoid need to accumulate
1197 * Jacobi rotations in the Jacobi iterations.
1198  DO 1965 p = 1, nr
1199  CALL dcopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1200  1965 CONTINUE
1201  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1202 *
1203  CALL dgeqrf( n, nr, u, ldu, work(n+1), work(2*n+1),
1204  $ lwork-2*n, ierr )
1205 *
1206  DO 1967 p = 1, nr - 1
1207  CALL dcopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1208  1967 CONTINUE
1209  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1210 *
1211  CALL dgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1212  $ lda, work(n+1), lwork-n, info )
1213  scalem = work(n+1)
1214  numrank = idnint(work(n+2))
1215 *
1216  IF ( nr .LT. m ) THEN
1217  CALL dlaset( 'A', m-nr, nr,zero, zero, u(nr+1,1), ldu )
1218  IF ( nr .LT. n1 ) THEN
1219  CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1), ldu )
1220  CALL dlaset( 'A',m-nr,n1-nr,zero,one,u(nr+1,nr+1), ldu )
1221  END IF
1222  END IF
1223 *
1224  CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1225  $ ldu, work(n+1), lwork-n, ierr )
1226 *
1227  IF ( rowpiv )
1228  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1229 *
1230  DO 1974 p = 1, n1
1231  xsc = one / dnrm2( m, u(1,p), 1 )
1232  CALL dscal( m, xsc, u(1,p), 1 )
1233  1974 CONTINUE
1234 *
1235  IF ( transp ) THEN
1236  CALL dlacpy( 'All', n, n, u, ldu, v, ldv )
1237  END IF
1238 *
1239  ELSE
1240 *
1241 * .. Full SVD ..
1242 *
1243  IF ( .NOT. jracc ) THEN
1244 *
1245  IF ( .NOT. almort ) THEN
1246 *
1247 * Second Preconditioning Step (QRF [with pivoting])
1248 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1249 * equivalent to an LQF CALL. Since in many libraries the QRF
1250 * seems to be better optimized than the LQF, we do explicit
1251 * transpose and use the QRF. This is subject to changes in an
1252 * optimized implementation of DGEJSV.
1253 *
1254  DO 1968 p = 1, nr
1255  CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1256  1968 CONTINUE
1257 *
1258 * .. the following two loops perturb small entries to avoid
1259 * denormals in the second QR factorization, where they are
1260 * as good as zeros. This is done to avoid painfully slow
1261 * computation with denormals. The relative size of the perturbation
1262 * is a parameter that can be changed by the implementer.
1263 * This perturbation device will be obsolete on machines with
1264 * properly implemented arithmetic.
1265 * To switch it off, set L2PERT=.FALSE. To remove it from the
1266 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1267 * The following two loops should be blocked and fused with the
1268 * transposed copy above.
1269 *
1270  IF ( l2pert ) THEN
1271  xsc = dsqrt(small)
1272  DO 2969 q = 1, nr
1273  temp1 = xsc*dabs( v(q,q) )
1274  DO 2968 p = 1, n
1275  IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1276  $ .OR. ( p .LT. q ) )
1277  $ v(p,q) = dsign( temp1, v(p,q) )
1278  IF ( p .LT. q ) v(p,q) = - v(p,q)
1279  2968 CONTINUE
1280  2969 CONTINUE
1281  ELSE
1282  CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1283  END IF
1284 *
1285 * Estimate the row scaled condition number of R1
1286 * (If R1 is rectangular, N > NR, then the condition number
1287 * of the leading NR x NR submatrix is estimated.)
1288 *
1289  CALL dlacpy( 'L', nr, nr, v, ldv, work(2*n+1), nr )
1290  DO 3950 p = 1, nr
1291  temp1 = dnrm2(nr-p+1,work(2*n+(p-1)*nr+p),1)
1292  CALL dscal(nr-p+1,one/temp1,work(2*n+(p-1)*nr+p),1)
1293  3950 CONTINUE
1294  CALL dpocon('Lower',nr,work(2*n+1),nr,one,temp1,
1295  $ work(2*n+nr*nr+1),iwork(m+2*n+1),ierr)
1296  condr1 = one / dsqrt(temp1)
1297 * .. here need a second oppinion on the condition number
1298 * .. then assume worst case scenario
1299 * R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1300 * more conservative <=> CONDR1 .LT. DSQRT(DBLE(N))
1301 *
1302  cond_ok = dsqrt(dble(nr))
1303 *[TP] COND_OK is a tuning parameter.
1304 
1305  IF ( condr1 .LT. cond_ok ) THEN
1306 * .. the second QRF without pivoting. Note: in an optimized
1307 * implementation, this QRF should be implemented as the QRF
1308 * of a lower triangular matrix.
1309 * R1^t = Q2 * R2
1310  CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1311  $ lwork-2*n, ierr )
1312 *
1313  IF ( l2pert ) THEN
1314  xsc = dsqrt(small)/epsln
1315  DO 3959 p = 2, nr
1316  DO 3958 q = 1, p - 1
1317  temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1318  IF ( dabs(v(q,p)) .LE. temp1 )
1319  $ v(q,p) = dsign( temp1, v(q,p) )
1320  3958 CONTINUE
1321  3959 CONTINUE
1322  END IF
1323 *
1324  IF ( nr .NE. n )
1325  $ CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1326 * .. save ...
1327 *
1328 * .. this transposed copy should be better than naive
1329  DO 1969 p = 1, nr - 1
1330  CALL dcopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1331  1969 CONTINUE
1332 *
1333  condr2 = condr1
1334 *
1335  ELSE
1336 *
1337 * .. ill-conditioned case: second QRF with pivoting
1338 * Note that windowed pivoting would be equaly good
1339 * numerically, and more run-time efficient. So, in
1340 * an optimal implementation, the next call to DGEQP3
1341 * should be replaced with eg. CALL SGEQPX (ACM TOMS #782)
1342 * with properly (carefully) chosen parameters.
1343 *
1344 * R1^t * P2 = Q2 * R2
1345  DO 3003 p = 1, nr
1346  iwork(n+p) = 0
1347  3003 CONTINUE
1348  CALL dgeqp3( n, nr, v, ldv, iwork(n+1), work(n+1),
1349  $ work(2*n+1), lwork-2*n, ierr )
1350 ** CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1351 ** $ LWORK-2*N, IERR )
1352  IF ( l2pert ) THEN
1353  xsc = dsqrt(small)
1354  DO 3969 p = 2, nr
1355  DO 3968 q = 1, p - 1
1356  temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1357  IF ( dabs(v(q,p)) .LE. temp1 )
1358  $ v(q,p) = dsign( temp1, v(q,p) )
1359  3968 CONTINUE
1360  3969 CONTINUE
1361  END IF
1362 *
1363  CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1364 *
1365  IF ( l2pert ) THEN
1366  xsc = dsqrt(small)
1367  DO 8970 p = 2, nr
1368  DO 8971 q = 1, p - 1
1369  temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1370  v(p,q) = - dsign( temp1, v(q,p) )
1371  8971 CONTINUE
1372  8970 CONTINUE
1373  ELSE
1374  CALL dlaset( 'L',nr-1,nr-1,zero,zero,v(2,1),ldv )
1375  END IF
1376 * Now, compute R2 = L3 * Q3, the LQ factorization.
1377  CALL dgelqf( nr, nr, v, ldv, work(2*n+n*nr+1),
1378  $ work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1379 * .. and estimate the condition number
1380  CALL dlacpy( 'L',nr,nr,v,ldv,work(2*n+n*nr+nr+1),nr )
1381  DO 4950 p = 1, nr
1382  temp1 = dnrm2( p, work(2*n+n*nr+nr+p), nr )
1383  CALL dscal( p, one/temp1, work(2*n+n*nr+nr+p), nr )
1384  4950 CONTINUE
1385  CALL dpocon( 'L',nr,work(2*n+n*nr+nr+1),nr,one,temp1,
1386  $ work(2*n+n*nr+nr+nr*nr+1),iwork(m+2*n+1),ierr )
1387  condr2 = one / dsqrt(temp1)
1388 *
1389  IF ( condr2 .GE. cond_ok ) THEN
1390 * .. save the Householder vectors used for Q3
1391 * (this overwrittes the copy of R2, as it will not be
1392 * needed in this branch, but it does not overwritte the
1393 * Huseholder vectors of Q2.).
1394  CALL dlacpy( 'U', nr, nr, v, ldv, work(2*n+1), n )
1395 * .. and the rest of the information on Q3 is in
1396 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1397  END IF
1398 *
1399  END IF
1400 *
1401  IF ( l2pert ) THEN
1402  xsc = dsqrt(small)
1403  DO 4968 q = 2, nr
1404  temp1 = xsc * v(q,q)
1405  DO 4969 p = 1, q - 1
1406 * V(p,q) = - DSIGN( TEMP1, V(q,p) )
1407  v(p,q) = - dsign( temp1, v(p,q) )
1408  4969 CONTINUE
1409  4968 CONTINUE
1410  ELSE
1411  CALL dlaset( 'U', nr-1,nr-1, zero,zero, v(1,2), ldv )
1412  END IF
1413 *
1414 * Second preconditioning finished; continue with Jacobi SVD
1415 * The input matrix is lower trinagular.
1416 *
1417 * Recover the right singular vectors as solution of a well
1418 * conditioned triangular matrix equation.
1419 *
1420  IF ( condr1 .LT. cond_ok ) THEN
1421 *
1422  CALL dgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u,
1423  $ ldu,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,info )
1424  scalem = work(2*n+n*nr+nr+1)
1425  numrank = idnint(work(2*n+n*nr+nr+2))
1426  DO 3970 p = 1, nr
1427  CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1428  CALL dscal( nr, sva(p), v(1,p), 1 )
1429  3970 CONTINUE
1430 
1431 * .. pick the right matrix equation and solve it
1432 *
1433  IF ( nr .EQ. n ) THEN
1434 * :)) .. best case, R1 is inverted. The solution of this matrix
1435 * equation is Q2*V2 = the product of the Jacobi rotations
1436 * used in DGESVJ, premultiplied with the orthogonal matrix
1437 * from the second QR factorization.
1438  CALL dtrsm( 'L','U','N','N', nr,nr,one, a,lda, v,ldv )
1439  ELSE
1440 * .. R1 is well conditioned, but non-square. Transpose(R2)
1441 * is inverted to get the product of the Jacobi rotations
1442 * used in DGESVJ. The Q-factor from the second QR
1443 * factorization is then built in explicitly.
1444  CALL dtrsm('L','U','T','N',nr,nr,one,work(2*n+1),
1445  $ n,v,ldv)
1446  IF ( nr .LT. n ) THEN
1447  CALL dlaset('A',n-nr,nr,zero,zero,v(nr+1,1),ldv)
1448  CALL dlaset('A',nr,n-nr,zero,zero,v(1,nr+1),ldv)
1449  CALL dlaset('A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv)
1450  END IF
1451  CALL dormqr('L','N',n,n,nr,work(2*n+1),n,work(n+1),
1452  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1453  END IF
1454 *
1455  ELSE IF ( condr2 .LT. cond_ok ) THEN
1456 *
1457 * :) .. the input matrix A is very likely a relative of
1458 * the Kahan matrix :)
1459 * The matrix R2 is inverted. The solution of the matrix equation
1460 * is Q3^T*V3 = the product of the Jacobi rotations (appplied to
1461 * the lower triangular L3 from the LQ factorization of
1462 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1463  CALL dgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1464  $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1465  scalem = work(2*n+n*nr+nr+1)
1466  numrank = idnint(work(2*n+n*nr+nr+2))
1467  DO 3870 p = 1, nr
1468  CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1469  CALL dscal( nr, sva(p), u(1,p), 1 )
1470  3870 CONTINUE
1471  CALL dtrsm('L','U','N','N',nr,nr,one,work(2*n+1),n,u,ldu)
1472 * .. apply the permutation from the second QR factorization
1473  DO 873 q = 1, nr
1474  DO 872 p = 1, nr
1475  work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1476  872 CONTINUE
1477  DO 874 p = 1, nr
1478  u(p,q) = work(2*n+n*nr+nr+p)
1479  874 CONTINUE
1480  873 CONTINUE
1481  IF ( nr .LT. n ) THEN
1482  CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1483  CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1484  CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1485  END IF
1486  CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1487  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1488  ELSE
1489 * Last line of defense.
1490 * #:( This is a rather pathological case: no scaled condition
1491 * improvement after two pivoted QR factorizations. Other
1492 * possibility is that the rank revealing QR factorization
1493 * or the condition estimator has failed, or the COND_OK
1494 * is set very close to ONE (which is unnecessary). Normally,
1495 * this branch should never be executed, but in rare cases of
1496 * failure of the RRQR or condition estimator, the last line of
1497 * defense ensures that DGEJSV completes the task.
1498 * Compute the full SVD of L3 using DGESVJ with explicit
1499 * accumulation of Jacobi rotations.
1500  CALL dgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1501  $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1502  scalem = work(2*n+n*nr+nr+1)
1503  numrank = idnint(work(2*n+n*nr+nr+2))
1504  IF ( nr .LT. n ) THEN
1505  CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1506  CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1507  CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1508  END IF
1509  CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1510  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1511 *
1512  CALL dormlq( 'L', 'T', nr, nr, nr, work(2*n+1), n,
1513  $ work(2*n+n*nr+1), u, ldu, work(2*n+n*nr+nr+1),
1514  $ lwork-2*n-n*nr-nr, ierr )
1515  DO 773 q = 1, nr
1516  DO 772 p = 1, nr
1517  work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1518  772 CONTINUE
1519  DO 774 p = 1, nr
1520  u(p,q) = work(2*n+n*nr+nr+p)
1521  774 CONTINUE
1522  773 CONTINUE
1523 *
1524  END IF
1525 *
1526 * Permute the rows of V using the (column) permutation from the
1527 * first QRF. Also, scale the columns to make them unit in
1528 * Euclidean norm. This applies to all cases.
1529 *
1530  temp1 = dsqrt(dble(n)) * epsln
1531  DO 1972 q = 1, n
1532  DO 972 p = 1, n
1533  work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1534  972 CONTINUE
1535  DO 973 p = 1, n
1536  v(p,q) = work(2*n+n*nr+nr+p)
1537  973 CONTINUE
1538  xsc = one / dnrm2( n, v(1,q), 1 )
1539  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1540  $ CALL dscal( n, xsc, v(1,q), 1 )
1541  1972 CONTINUE
1542 * At this moment, V contains the right singular vectors of A.
1543 * Next, assemble the left singular vector matrix U (M x N).
1544  IF ( nr .LT. m ) THEN
1545  CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1546  IF ( nr .LT. n1 ) THEN
1547  CALL dlaset('A',nr,n1-nr,zero,zero,u(1,nr+1),ldu)
1548  CALL dlaset('A',m-nr,n1-nr,zero,one,u(nr+1,nr+1),ldu)
1549  END IF
1550  END IF
1551 *
1552 * The Q matrix from the first QRF is built into the left singular
1553 * matrix U. This applies to all cases.
1554 *
1555  CALL dormqr( 'Left', 'No_Tr', m, n1, n, a, lda, work, u,
1556  $ ldu, work(n+1), lwork-n, ierr )
1557 
1558 * The columns of U are normalized. The cost is O(M*N) flops.
1559  temp1 = dsqrt(dble(m)) * epsln
1560  DO 1973 p = 1, nr
1561  xsc = one / dnrm2( m, u(1,p), 1 )
1562  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1563  $ CALL dscal( m, xsc, u(1,p), 1 )
1564  1973 CONTINUE
1565 *
1566 * If the initial QRF is computed with row pivoting, the left
1567 * singular vectors must be adjusted.
1568 *
1569  IF ( rowpiv )
1570  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1571 *
1572  ELSE
1573 *
1574 * .. the initial matrix A has almost orthogonal columns and
1575 * the second QRF is not needed
1576 *
1577  CALL dlacpy( 'Upper', n, n, a, lda, work(n+1), n )
1578  IF ( l2pert ) THEN
1579  xsc = dsqrt(small)
1580  DO 5970 p = 2, n
1581  temp1 = xsc * work( n + (p-1)*n + p )
1582  DO 5971 q = 1, p - 1
1583  work(n+(q-1)*n+p)=-dsign(temp1,work(n+(p-1)*n+q))
1584  5971 CONTINUE
1585  5970 CONTINUE
1586  ELSE
1587  CALL dlaset( 'Lower',n-1,n-1,zero,zero,work(n+2),n )
1588  END IF
1589 *
1590  CALL dgesvj( 'Upper', 'U', 'N', n, n, work(n+1), n, sva,
1591  $ n, u, ldu, work(n+n*n+1), lwork-n-n*n, info )
1592 *
1593  scalem = work(n+n*n+1)
1594  numrank = idnint(work(n+n*n+2))
1595  DO 6970 p = 1, n
1596  CALL dcopy( n, work(n+(p-1)*n+1), 1, u(1,p), 1 )
1597  CALL dscal( n, sva(p), work(n+(p-1)*n+1), 1 )
1598  6970 CONTINUE
1599 *
1600  CALL dtrsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1601  $ one, a, lda, work(n+1), n )
1602  DO 6972 p = 1, n
1603  CALL dcopy( n, work(n+p), n, v(iwork(p),1), ldv )
1604  6972 CONTINUE
1605  temp1 = dsqrt(dble(n))*epsln
1606  DO 6971 p = 1, n
1607  xsc = one / dnrm2( n, v(1,p), 1 )
1608  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1609  $ CALL dscal( n, xsc, v(1,p), 1 )
1610  6971 CONTINUE
1611 *
1612 * Assemble the left singular vector matrix U (M x N).
1613 *
1614  IF ( n .LT. m ) THEN
1615  CALL dlaset( 'A', m-n, n, zero, zero, u(n+1,1), ldu )
1616  IF ( n .LT. n1 ) THEN
1617  CALL dlaset( 'A',n, n1-n, zero, zero, u(1,n+1),ldu )
1618  CALL dlaset( 'A',m-n,n1-n, zero, one,u(n+1,n+1),ldu )
1619  END IF
1620  END IF
1621  CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1622  $ ldu, work(n+1), lwork-n, ierr )
1623  temp1 = dsqrt(dble(m))*epsln
1624  DO 6973 p = 1, n1
1625  xsc = one / dnrm2( m, u(1,p), 1 )
1626  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1627  $ CALL dscal( m, xsc, u(1,p), 1 )
1628  6973 CONTINUE
1629 *
1630  IF ( rowpiv )
1631  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1632 *
1633  END IF
1634 *
1635 * end of the >> almost orthogonal case << in the full SVD
1636 *
1637  ELSE
1638 *
1639 * This branch deploys a preconditioned Jacobi SVD with explicitly
1640 * accumulated rotations. It is included as optional, mainly for
1641 * experimental purposes. It does perfom well, and can also be used.
1642 * In this implementation, this branch will be automatically activated
1643 * if the condition number sigma_max(A) / sigma_min(A) is predicted
1644 * to be greater than the overflow threshold. This is because the
1645 * a posteriori computation of the singular vectors assumes robust
1646 * implementation of BLAS and some LAPACK procedures, capable of working
1647 * in presence of extreme values. Since that is not always the case, ...
1648 *
1649  DO 7968 p = 1, nr
1650  CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1651  7968 CONTINUE
1652 *
1653  IF ( l2pert ) THEN
1654  xsc = dsqrt(small/epsln)
1655  DO 5969 q = 1, nr
1656  temp1 = xsc*dabs( v(q,q) )
1657  DO 5968 p = 1, n
1658  IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1659  $ .OR. ( p .LT. q ) )
1660  $ v(p,q) = dsign( temp1, v(p,q) )
1661  IF ( p .LT. q ) v(p,q) = - v(p,q)
1662  5968 CONTINUE
1663  5969 CONTINUE
1664  ELSE
1665  CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1666  END IF
1667 
1668  CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1669  $ lwork-2*n, ierr )
1670  CALL dlacpy( 'L', n, nr, v, ldv, work(2*n+1), n )
1671 *
1672  DO 7969 p = 1, nr
1673  CALL dcopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1674  7969 CONTINUE
1675 
1676  IF ( l2pert ) THEN
1677  xsc = dsqrt(small/epsln)
1678  DO 9970 q = 2, nr
1679  DO 9971 p = 1, q - 1
1680  temp1 = xsc * min(dabs(u(p,p)),dabs(u(q,q)))
1681  u(p,q) = - dsign( temp1, u(q,p) )
1682  9971 CONTINUE
1683  9970 CONTINUE
1684  ELSE
1685  CALL dlaset('U', nr-1, nr-1, zero, zero, u(1,2), ldu )
1686  END IF
1687 
1688  CALL dgesvj( 'G', 'U', 'V', nr, nr, u, ldu, sva,
1689  $ n, v, ldv, work(2*n+n*nr+1), lwork-2*n-n*nr, info )
1690  scalem = work(2*n+n*nr+1)
1691  numrank = idnint(work(2*n+n*nr+2))
1692 
1693  IF ( nr .LT. n ) THEN
1694  CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1695  CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1696  CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1697  END IF
1698 
1699  CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1700  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1701 *
1702 * Permute the rows of V using the (column) permutation from the
1703 * first QRF. Also, scale the columns to make them unit in
1704 * Euclidean norm. This applies to all cases.
1705 *
1706  temp1 = dsqrt(dble(n)) * epsln
1707  DO 7972 q = 1, n
1708  DO 8972 p = 1, n
1709  work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1710  8972 CONTINUE
1711  DO 8973 p = 1, n
1712  v(p,q) = work(2*n+n*nr+nr+p)
1713  8973 CONTINUE
1714  xsc = one / dnrm2( n, v(1,q), 1 )
1715  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1716  $ CALL dscal( n, xsc, v(1,q), 1 )
1717  7972 CONTINUE
1718 *
1719 * At this moment, V contains the right singular vectors of A.
1720 * Next, assemble the left singular vector matrix U (M x N).
1721 *
1722  IF ( nr .LT. m ) THEN
1723  CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1724  IF ( nr .LT. n1 ) THEN
1725  CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1),ldu )
1726  CALL dlaset( 'A',m-nr,n1-nr, zero, one,u(nr+1,nr+1),ldu )
1727  END IF
1728  END IF
1729 *
1730  CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1731  $ ldu, work(n+1), lwork-n, ierr )
1732 *
1733  IF ( rowpiv )
1734  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1735 *
1736 *
1737  END IF
1738  IF ( transp ) THEN
1739 * .. swap U and V because the procedure worked on A^t
1740  DO 6974 p = 1, n
1741  CALL dswap( n, u(1,p), 1, v(1,p), 1 )
1742  6974 CONTINUE
1743  END IF
1744 *
1745  END IF
1746 * end of the full SVD
1747 *
1748 * Undo scaling, if necessary (and possible)
1749 *
1750  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1751  CALL dlascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
1752  uscal1 = one
1753  uscal2 = one
1754  END IF
1755 *
1756  IF ( nr .LT. n ) THEN
1757  DO 3004 p = nr+1, n
1758  sva(p) = zero
1759  3004 CONTINUE
1760  END IF
1761 *
1762  work(1) = uscal2 * scalem
1763  work(2) = uscal1
1764  IF ( errest ) work(3) = sconda
1765  IF ( lsvec .AND. rsvec ) THEN
1766  work(4) = condr1
1767  work(5) = condr2
1768  END IF
1769  IF ( l2tran ) THEN
1770  work(6) = entra
1771  work(7) = entrat
1772  END IF
1773 *
1774  iwork(1) = nr
1775  iwork(2) = numrank
1776  iwork(3) = warning
1777 *
1778  RETURN
1779 * ..
1780 * .. END OF DGEJSV
1781 * ..
subroutine dlacpy(UPLO, M, N, A, LDA, B, LDB)
DLACPY copies all or part of one two-dimensional array to another.
Definition: dlacpy.f:105
double precision function dlamch(CMACH)
DLAMCH
Definition: dlamch.f:65
subroutine dtrsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
DTRSM
Definition: dtrsm.f:183
subroutine dcopy(N, DX, INCX, DY, INCY)
DCOPY
Definition: dcopy.f:84
integer function idamax(N, DX, INCX)
IDAMAX
Definition: idamax.f:73
subroutine dgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, INFO)
DGEQP3
Definition: dgeqp3.f:153
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:145
subroutine dlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: dlaset.f:112
subroutine dgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
DGELQF
Definition: dgelqf.f:137
subroutine dswap(N, DX, INCX, DY, INCY)
DSWAP
Definition: dswap.f:84
subroutine dlassq(N, X, INCX, SCALE, SUMSQ)
DLASSQ updates a sum of squares represented in scaled form.
Definition: dlassq.f:105
subroutine dormqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
DORMQR
Definition: dormqr.f:169
subroutine dgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, WORK, LWORK, INFO)
DGESVJ
Definition: dgesvj.f:339
subroutine dorgqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
DORGQR
Definition: dorgqr.f:130
double precision function dnrm2(N, X, INCX)
DNRM2
Definition: dnrm2.f:76
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine dlaswp(N, A, LDA, K1, K2, IPIV, INCX)
DLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: dlaswp.f:117
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine dormlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
DORMLQ
Definition: dormlq.f:169
subroutine dscal(N, DA, DX, INCX)
DSCAL
Definition: dscal.f:81
subroutine dpocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, IWORK, INFO)
DPOCON
Definition: dpocon.f:123
subroutine dgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
DGEQRF
Definition: dgeqrf.f:138
Here is the call graph for this function:
Here is the caller graph for this function: