LAPACK 3.12.0
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches

◆ dgejsv()

subroutine dgejsv ( character*1  joba,
character*1  jobu,
character*1  jobv,
character*1  jobr,
character*1  jobt,
character*1  jobp,
integer  m,
integer  n,
double precision, dimension( lda, * )  a,
integer  lda,
double precision, dimension( n )  sva,
double precision, dimension( ldu, * )  u,
integer  ldu,
double precision, dimension( ldv, * )  v,
integer  ldv,
double precision, dimension( lwork )  work,
integer  lwork,
integer, dimension( * )  iwork,
integer  info 
)

DGEJSV

Download DGEJSV + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
 matrix [A], where M >= N. The SVD of [A] is written as

              [A] = [U] * [SIGMA] * [V]^t,

 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
 diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
 [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
 the singular values of [A]. The columns of [U] and [V] are the left and
 the right singular vectors of [A], respectively. The matrices [U] and [V]
 are computed and stored in the arrays U and V, respectively. The diagonal
 of [SIGMA] is computed and stored in the array SVA.
 DGEJSV can sometimes compute tiny singular values and their singular vectors much
 more accurately than other SVD routines, see below under Further Details.
Parameters
[in]JOBA
          JOBA is CHARACTER*1
        Specifies the level of accuracy:
       = 'C': This option works well (high relative accuracy) if A = B * D,
             with well-conditioned B and arbitrary diagonal matrix D.
             The accuracy cannot be spoiled by COLUMN scaling. The
             accuracy of the computed output depends on the condition of
             B, and the procedure aims at the best theoretical accuracy.
             The relative error max_{i=1:N}|d sigma_i| / sigma_i is
             bounded by f(M,N)*epsilon* cond(B), independent of D.
             The input matrix is preprocessed with the QRF with column
             pivoting. This initial preprocessing and preconditioning by
             a rank revealing QR factorization is common for all values of
             JOBA. Additional actions are specified as follows:
       = 'E': Computation as with 'C' with an additional estimate of the
             condition number of B. It provides a realistic error bound.
       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
             D1, D2, and well-conditioned matrix C, this option gives
             higher accuracy than the 'C' option. If the structure of the
             input matrix is not known, and relative accuracy is
             desirable, then this option is advisable. The input matrix A
             is preprocessed with QR factorization with FULL (row and
             column) pivoting.
       = 'G': Computation as with 'F' with an additional estimate of the
             condition number of B, where A=D*B. If A has heavily weighted
             rows, then using this condition number gives too pessimistic
             error bound.
       = 'A': Small singular values are the noise and the matrix is treated
             as numerically rank deficient. The error in the computed
             singular values is bounded by f(m,n)*epsilon*||A||.
             The computed SVD A = U * S * V^t restores A up to
             f(m,n)*epsilon*||A||.
             This gives the procedure the licence to discard (set to zero)
             all singular values below N*epsilon*||A||.
       = 'R': Similar as in 'A'. Rank revealing property of the initial
             QR factorization is used do reveal (using triangular factor)
             a gap sigma_{r+1} < epsilon * sigma_r in which case the
             numerical RANK is declared to be r. The SVD is computed with
             absolute error bounds, but more accurately than with 'A'.
[in]JOBU
          JOBU is CHARACTER*1
        Specifies whether to compute the columns of U:
       = 'U': N columns of U are returned in the array U.
       = 'F': full set of M left sing. vectors is returned in the array U.
       = 'W': U may be used as workspace of length M*N. See the description
             of U.
       = 'N': U is not computed.
[in]JOBV
          JOBV is CHARACTER*1
        Specifies whether to compute the matrix V:
       = 'V': N columns of V are returned in the array V; Jacobi rotations
             are not explicitly accumulated.
       = 'J': N columns of V are returned in the array V, but they are
             computed as the product of Jacobi rotations. This option is
             allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
       = 'W': V may be used as workspace of length N*N. See the description
             of V.
       = 'N': V is not computed.
[in]JOBR
          JOBR is CHARACTER*1
        Specifies the RANGE for the singular values. Issues the licence to
        set to zero small positive singular values if they are outside
        specified range. If A .NE. 0 is scaled so that the largest singular
        value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
        the licence to kill columns of A whose norm in c*A is less than
        DSQRT(SFMIN) (for JOBR = 'R'), or less than SMALL=SFMIN/EPSLN,
        where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
       = 'N': Do not kill small columns of c*A. This option assumes that
             BLAS and QR factorizations and triangular solvers are
             implemented to work in that range. If the condition of A
             is greater than BIG, use DGESVJ.
       = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
             (roughly, as described above). This option is recommended.
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~
        For computing the singular values in the FULL range [SFMIN,BIG]
        use DGESVJ.
[in]JOBT
          JOBT is CHARACTER*1
        If the matrix is square then the procedure may determine to use
        transposed A if A^t seems to be better with respect to convergence.
        If the matrix is not square, JOBT is ignored. This is subject to
        changes in the future.
        The decision is based on two values of entropy over the adjoint
        orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
       = 'T': transpose if entropy test indicates possibly faster
        convergence of Jacobi process if A^t is taken as input. If A is
        replaced with A^t, then the row pivoting is included automatically.
       = 'N': do not speculate.
        This option can be used to compute only the singular values, or the
        full SVD (U, SIGMA and V). For only one set of singular vectors
        (U or V), the caller should provide both U and V, as one of the
        matrices is used as workspace if the matrix A is transposed.
        The implementer can easily remove this constraint and make the
        code more complicated. See the descriptions of U and V.
[in]JOBP
          JOBP is CHARACTER*1
        Issues the licence to introduce structured perturbations to drown
        denormalized numbers. This licence should be active if the
        denormals are poorly implemented, causing slow computation,
        especially in cases of fast convergence (!). For details see [1,2].
        For the sake of simplicity, this perturbations are included only
        when the full SVD or only the singular values are requested. The
        implementer/user can easily add the perturbation for the cases of
        computing one set of singular vectors.
       = 'P': introduce perturbation
       = 'N': do not perturb
[in]M
          M is INTEGER
         The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
         The number of columns of the input matrix A. M >= N >= 0.
[in,out]A
          A is DOUBLE PRECISION array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is DOUBLE PRECISION array, dimension (N)
          On exit,
          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
            computation SVA contains Euclidean column norms of the
            iterated matrices in the array A.
          - For WORK(1) .NE. WORK(2): The singular values of A are
            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
            sigma_max(A) overflows or if small singular values have been
            saved from underflow by scaling the input matrix A.
          - If JOBR='R' then some of the singular values may be returned
            as exact zeros obtained by "set to zero" because they are
            below the numerical rank threshold or are denormalized numbers.
[out]U
          U is DOUBLE PRECISION array, dimension ( LDU, N ) or ( LDU, M )
          If JOBU = 'U', then U contains on exit the M-by-N matrix of
                         the left singular vectors.
          If JOBU = 'F', then U contains on exit the M-by-M matrix of
                         the left singular vectors, including an ONB
                         of the orthogonal complement of the Range(A).
          If JOBU = 'W'  .AND. (JOBV = 'V' .AND. JOBT = 'T' .AND. M = N),
                         then U is used as workspace if the procedure
                         replaces A with A^t. In that case, [V] is computed
                         in U as left singular vectors of A^t and then
                         copied back to the V array. This 'W' option is just
                         a reminder to the caller that in this case U is
                         reserved as workspace of length N*N.
          If JOBU = 'N'  U is not referenced, unless JOBT='T'.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U,  LDU >= 1.
          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
[out]V
          V is DOUBLE PRECISION array, dimension ( LDV, N )
          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'W', AND (JOBU = 'U' AND JOBT = 'T' AND M = N),
                         then V is used as workspace if the procedure
                         replaces A with A^t. In that case, [U] is computed
                         in V as right singular vectors of A^t and then
                         copied back to the U array. This 'W' option is just
                         a reminder to the caller that in this case V is
                         reserved as workspace of length N*N.
          If JOBV = 'N'  V is not referenced, unless JOBT='T'.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V,  LDV >= 1.
          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
[out]WORK
          WORK is DOUBLE PRECISION array, dimension (MAX(7,LWORK))
          On exit, if N > 0 .AND. M > 0 (else not referenced),
          WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
                    that SCALE*SVA(1:N) are the computed singular values
                    of A. (See the description of SVA().)
          WORK(2) = See the description of WORK(1).
          WORK(3) = SCONDA is an estimate for the condition number of
                    column equilibrated A. (If JOBA = 'E' or 'G')
                    SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
                    It is computed using DPOCON. It holds
                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
                    where R is the triangular factor from the QRF of A.
                    However, if R is truncated and the numerical rank is
                    determined to be strictly smaller than N, SCONDA is
                    returned as -1, thus indicating that the smallest
                    singular values might be lost.

          If full SVD is needed, the following two condition numbers are
          useful for the analysis of the algorithm. They are provided for
          a developer/implementer who is familiar with the details of
          the method.

          WORK(4) = an estimate of the scaled condition number of the
                    triangular factor in the first QR factorization.
          WORK(5) = an estimate of the scaled condition number of the
                    triangular factor in the second QR factorization.
          The following two parameters are computed if JOBT = 'T'.
          They are provided for a developer/implementer who is familiar
          with the details of the method.

          WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
                    of diag(A^t*A) / Trace(A^t*A) taken as point in the
                    probability simplex.
          WORK(7) = the entropy of A*A^t.
[in]LWORK
          LWORK is INTEGER
          Length of WORK to confirm proper allocation of work space.
          LWORK depends on the job:

          If only SIGMA is needed (JOBU = 'N', JOBV = 'N') and
            -> .. no scaled condition estimate required (JOBE = 'N'):
               LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
               block size for DGEQP3 and DGEQRF.
               In general, optimal LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).
            -> .. an estimate of the scaled condition number of A is
               required (JOBA='E', 'G'). In this case, LWORK is the maximum
               of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
               In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF),
                                                     N+N*N+LWORK(DPOCON),7).

          If SIGMA and the right singular vectors are needed (JOBV = 'V'),
            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
            -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
               where NB is the optimal block size for DGEQP3, DGEQRF, DGELQF,
               DORMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON),
                       N+LWORK(DGELQF), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).

          If SIGMA and the left singular vectors are needed
            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
            -> For optimal performance:
               if JOBU = 'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
               if JOBU = 'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
               where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
                        2*N+LWORK(DGEQRF), N+LWORK(DORMQR)).
               Here LWORK(DORMQR) equals N*NB (for JOBU = 'U') or
               M*NB (for JOBU = 'F').

          If the full SVD is needed: (JOBU = 'U' or JOBU = 'F') and
            -> if JOBV = 'V'
               the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).
            -> if JOBV = 'J' the minimal requirement is
               LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
            -> For optimal performance, LWORK should be additionally
               larger than N+M*NB, where NB is the optimal block size
               for DORMQR.
[out]IWORK
          IWORK is INTEGER array, dimension (MAX(3,M+3*N)).
          On exit,
          IWORK(1) = the numerical rank determined after the initial
                     QR factorization with pivoting. See the descriptions
                     of JOBA and JOBR.
          IWORK(2) = the number of the computed nonzero singular values
          IWORK(3) = if nonzero, a warning message:
                     If IWORK(3) = 1 then some of the column norms of A
                     were denormalized floats. The requested high accuracy
                     is not warranted by the data.
[out]INFO
          INFO is INTEGER
           < 0:  if INFO = -i, then the i-th argument had an illegal value.
           = 0:  successful exit;
           > 0:  DGEJSV  did not converge in the maximal allowed number
                 of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
  DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,
  DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an
  additional row pivoting can be used as a preprocessor, which in some
  cases results in much higher accuracy. An example is matrix A with the
  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
  diagonal matrices and C is well-conditioned matrix. In that case, complete
  pivoting in the first QR factorizations provides accuracy dependent on the
  condition number of C, and independent of D1, D2. Such higher accuracy is
  not completely understood theoretically, but it works well in practice.
  Further, if A can be written as A = B*D, with well-conditioned B and some
  diagonal D, then the high accuracy is guaranteed, both theoretically and
  in software, independent of D. For more details see [1], [2].
     The computational range for the singular values can be the full range
  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
  & LAPACK routines called by DGEJSV are implemented to work in that range.
  If that is not the case, then the restriction for safe computation with
  the singular values in the range of normalized IEEE numbers is that the
  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
  overflow. This code (DGEJSV) is best used in this restricted range,
  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
  returned as zeros. See JOBR for details on this.
     Further, this implementation is somewhat slower than the one described
  in [1,2] due to replacement of some non-LAPACK components, and because
  the choice of some tuning parameters in the iterative part (DGESVJ) is
  left to the implementer on a particular machine.
     The rank revealing QR factorization (in this code: DGEQP3) should be
  implemented as in [3]. We have a new version of DGEQP3 under development
  that is more robust than the current one in LAPACK, with a cleaner cut in
  rank deficient cases. It will be available in the SIGMA library [4].
  If M is much larger than N, it is obvious that the initial QRF with
  column pivoting can be preprocessed by the QRF without pivoting. That
  well known trick is not used in DGEJSV because in some cases heavy row
  weighting can be treated with complete pivoting. The overhead in cases
  M much larger than N is then only due to pivoting, but the benefits in
  terms of accuracy have prevailed. The implementer/user can incorporate
  this extra QRF step easily. The implementer can also improve data movement
  (matrix transpose, matrix copy, matrix transposed copy) - this
  implementation of DGEJSV uses only the simplest, naive data movement.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
 [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
     factorization software - a case study.
     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
     LAPACK Working note 176.
 [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 473 of file dgejsv.f.

476*
477* -- LAPACK computational routine --
478* -- LAPACK is a software package provided by Univ. of Tennessee, --
479* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
480*
481* .. Scalar Arguments ..
482 IMPLICIT NONE
483 INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
484* ..
485* .. Array Arguments ..
486 DOUBLE PRECISION A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ),
487 $ WORK( LWORK )
488 INTEGER IWORK( * )
489 CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
490* ..
491*
492* ===========================================================================
493*
494* .. Local Parameters ..
495 DOUBLE PRECISION ZERO, ONE
496 parameter( zero = 0.0d0, one = 1.0d0 )
497* ..
498* .. Local Scalars ..
499 DOUBLE PRECISION AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1, COND_OK,
500 $ CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN, MAXPRJ, SCALEM,
501 $ SCONDA, SFMIN, SMALL, TEMP1, USCAL1, USCAL2, XSC
502 INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
503 LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LSVEC,
504 $ L2ABER, L2KILL, L2PERT, L2RANK, L2TRAN,
505 $ NOSCAL, ROWPIV, RSVEC, TRANSP
506* ..
507* .. Intrinsic Functions ..
508 INTRINSIC dabs, dlog, max, min, dble, idnint, dsign, dsqrt
509* ..
510* .. External Functions ..
511 DOUBLE PRECISION DLAMCH, DNRM2
512 INTEGER IDAMAX
513 LOGICAL LSAME
514 EXTERNAL idamax, lsame, dlamch, dnrm2
515* ..
516* .. External Subroutines ..
517 EXTERNAL dcopy, dgelqf, dgeqp3, dgeqrf, dlacpy, dlascl,
520*
521 EXTERNAL dgesvj
522* ..
523*
524* Test the input arguments
525*
526 lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
527 jracc = lsame( jobv, 'J' )
528 rsvec = lsame( jobv, 'V' ) .OR. jracc
529 rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
530 l2rank = lsame( joba, 'R' )
531 l2aber = lsame( joba, 'A' )
532 errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
533 l2tran = lsame( jobt, 'T' )
534 l2kill = lsame( jobr, 'R' )
535 defr = lsame( jobr, 'N' )
536 l2pert = lsame( jobp, 'P' )
537*
538 IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
539 $ errest .OR. lsame( joba, 'C' ) )) THEN
540 info = - 1
541 ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
542 $ lsame( jobu, 'W' )) ) THEN
543 info = - 2
544 ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
545 $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
546 info = - 3
547 ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
548 info = - 4
549 ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
550 info = - 5
551 ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
552 info = - 6
553 ELSE IF ( m .LT. 0 ) THEN
554 info = - 7
555 ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
556 info = - 8
557 ELSE IF ( lda .LT. m ) THEN
558 info = - 10
559 ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
560 info = - 13
561 ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
562 info = - 15
563 ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
564 & (lwork .LT. max(7,4*n+1,2*m+n))) .OR.
565 & (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
566 & (lwork .LT. max(7,4*n+n*n,2*m+n))) .OR.
567 & (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
568 & .OR.
569 & (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
570 & .OR.
571 & (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
572 & (lwork.LT.max(2*m+n,6*n+2*n*n)))
573 & .OR. (lsvec .AND. rsvec .AND. jracc .AND.
574 & lwork.LT.max(2*m+n,4*n+n*n,2*n+n*n+6)))
575 & THEN
576 info = - 17
577 ELSE
578* #:)
579 info = 0
580 END IF
581*
582 IF ( info .NE. 0 ) THEN
583* #:(
584 CALL xerbla( 'DGEJSV', - info )
585 RETURN
586 END IF
587*
588* Quick return for void matrix (Y3K safe)
589* #:)
590 IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
591 iwork(1:3) = 0
592 work(1:7) = 0
593 RETURN
594 ENDIF
595*
596* Determine whether the matrix U should be M x N or M x M
597*
598 IF ( lsvec ) THEN
599 n1 = n
600 IF ( lsame( jobu, 'F' ) ) n1 = m
601 END IF
602*
603* Set numerical parameters
604*
605*! NOTE: Make sure DLAMCH() does not fail on the target architecture.
606*
607 epsln = dlamch('Epsilon')
608 sfmin = dlamch('SafeMinimum')
609 small = sfmin / epsln
610 big = dlamch('O')
611* BIG = ONE / SFMIN
612*
613* Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
614*
615*(!) If necessary, scale SVA() to protect the largest norm from
616* overflow. It is possible that this scaling pushes the smallest
617* column norm left from the underflow threshold (extreme case).
618*
619 scalem = one / dsqrt(dble(m)*dble(n))
620 noscal = .true.
621 goscal = .true.
622 DO 1874 p = 1, n
623 aapp = zero
624 aaqq = one
625 CALL dlassq( m, a(1,p), 1, aapp, aaqq )
626 IF ( aapp .GT. big ) THEN
627 info = - 9
628 CALL xerbla( 'DGEJSV', -info )
629 RETURN
630 END IF
631 aaqq = dsqrt(aaqq)
632 IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
633 sva(p) = aapp * aaqq
634 ELSE
635 noscal = .false.
636 sva(p) = aapp * ( aaqq * scalem )
637 IF ( goscal ) THEN
638 goscal = .false.
639 CALL dscal( p-1, scalem, sva, 1 )
640 END IF
641 END IF
642 1874 CONTINUE
643*
644 IF ( noscal ) scalem = one
645*
646 aapp = zero
647 aaqq = big
648 DO 4781 p = 1, n
649 aapp = max( aapp, sva(p) )
650 IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
651 4781 CONTINUE
652*
653* Quick return for zero M x N matrix
654* #:)
655 IF ( aapp .EQ. zero ) THEN
656 IF ( lsvec ) CALL dlaset( 'G', m, n1, zero, one, u, ldu )
657 IF ( rsvec ) CALL dlaset( 'G', n, n, zero, one, v, ldv )
658 work(1) = one
659 work(2) = one
660 IF ( errest ) work(3) = one
661 IF ( lsvec .AND. rsvec ) THEN
662 work(4) = one
663 work(5) = one
664 END IF
665 IF ( l2tran ) THEN
666 work(6) = zero
667 work(7) = zero
668 END IF
669 iwork(1) = 0
670 iwork(2) = 0
671 iwork(3) = 0
672 RETURN
673 END IF
674*
675* Issue warning if denormalized column norms detected. Override the
676* high relative accuracy request. Issue licence to kill columns
677* (set them to zero) whose norm is less than sigma_max / BIG (roughly).
678* #:(
679 warning = 0
680 IF ( aaqq .LE. sfmin ) THEN
681 l2rank = .true.
682 l2kill = .true.
683 warning = 1
684 END IF
685*
686* Quick return for one-column matrix
687* #:)
688 IF ( n .EQ. 1 ) THEN
689*
690 IF ( lsvec ) THEN
691 CALL dlascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
692 CALL dlacpy( 'A', m, 1, a, lda, u, ldu )
693* computing all M left singular vectors of the M x 1 matrix
694 IF ( n1 .NE. n ) THEN
695 CALL dgeqrf( m, n, u,ldu, work, work(n+1),lwork-n,ierr )
696 CALL dorgqr( m,n1,1, u,ldu,work,work(n+1),lwork-n,ierr )
697 CALL dcopy( m, a(1,1), 1, u(1,1), 1 )
698 END IF
699 END IF
700 IF ( rsvec ) THEN
701 v(1,1) = one
702 END IF
703 IF ( sva(1) .LT. (big*scalem) ) THEN
704 sva(1) = sva(1) / scalem
705 scalem = one
706 END IF
707 work(1) = one / scalem
708 work(2) = one
709 IF ( sva(1) .NE. zero ) THEN
710 iwork(1) = 1
711 IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
712 iwork(2) = 1
713 ELSE
714 iwork(2) = 0
715 END IF
716 ELSE
717 iwork(1) = 0
718 iwork(2) = 0
719 END IF
720 iwork(3) = 0
721 IF ( errest ) work(3) = one
722 IF ( lsvec .AND. rsvec ) THEN
723 work(4) = one
724 work(5) = one
725 END IF
726 IF ( l2tran ) THEN
727 work(6) = zero
728 work(7) = zero
729 END IF
730 RETURN
731*
732 END IF
733*
734 transp = .false.
735 l2tran = l2tran .AND. ( m .EQ. n )
736*
737 aatmax = -one
738 aatmin = big
739 IF ( rowpiv .OR. l2tran ) THEN
740*
741* Compute the row norms, needed to determine row pivoting sequence
742* (in the case of heavily row weighted A, row pivoting is strongly
743* advised) and to collect information needed to compare the
744* structures of A * A^t and A^t * A (in the case L2TRAN.EQ..TRUE.).
745*
746 IF ( l2tran ) THEN
747 DO 1950 p = 1, m
748 xsc = zero
749 temp1 = one
750 CALL dlassq( n, a(p,1), lda, xsc, temp1 )
751* DLASSQ gets both the ell_2 and the ell_infinity norm
752* in one pass through the vector
753 work(m+n+p) = xsc * scalem
754 work(n+p) = xsc * (scalem*dsqrt(temp1))
755 aatmax = max( aatmax, work(n+p) )
756 IF (work(n+p) .NE. zero) aatmin = min(aatmin,work(n+p))
757 1950 CONTINUE
758 ELSE
759 DO 1904 p = 1, m
760 work(m+n+p) = scalem*dabs( a(p,idamax(n,a(p,1),lda)) )
761 aatmax = max( aatmax, work(m+n+p) )
762 aatmin = min( aatmin, work(m+n+p) )
763 1904 CONTINUE
764 END IF
765*
766 END IF
767*
768* For square matrix A try to determine whether A^t would be better
769* input for the preconditioned Jacobi SVD, with faster convergence.
770* The decision is based on an O(N) function of the vector of column
771* and row norms of A, based on the Shannon entropy. This should give
772* the right choice in most cases when the difference actually matters.
773* It may fail and pick the slower converging side.
774*
775 entra = zero
776 entrat = zero
777 IF ( l2tran ) THEN
778*
779 xsc = zero
780 temp1 = one
781 CALL dlassq( n, sva, 1, xsc, temp1 )
782 temp1 = one / temp1
783*
784 entra = zero
785 DO 1113 p = 1, n
786 big1 = ( ( sva(p) / xsc )**2 ) * temp1
787 IF ( big1 .NE. zero ) entra = entra + big1 * dlog(big1)
788 1113 CONTINUE
789 entra = - entra / dlog(dble(n))
790*
791* Now, SVA().^2/Trace(A^t * A) is a point in the probability simplex.
792* It is derived from the diagonal of A^t * A. Do the same with the
793* diagonal of A * A^t, compute the entropy of the corresponding
794* probability distribution. Note that A * A^t and A^t * A have the
795* same trace.
796*
797 entrat = zero
798 DO 1114 p = n+1, n+m
799 big1 = ( ( work(p) / xsc )**2 ) * temp1
800 IF ( big1 .NE. zero ) entrat = entrat + big1 * dlog(big1)
801 1114 CONTINUE
802 entrat = - entrat / dlog(dble(m))
803*
804* Analyze the entropies and decide A or A^t. Smaller entropy
805* usually means better input for the algorithm.
806*
807 transp = ( entrat .LT. entra )
808*
809* If A^t is better than A, transpose A.
810*
811 IF ( transp ) THEN
812* In an optimal implementation, this trivial transpose
813* should be replaced with faster transpose.
814 DO 1115 p = 1, n - 1
815 DO 1116 q = p + 1, n
816 temp1 = a(q,p)
817 a(q,p) = a(p,q)
818 a(p,q) = temp1
819 1116 CONTINUE
820 1115 CONTINUE
821 DO 1117 p = 1, n
822 work(m+n+p) = sva(p)
823 sva(p) = work(n+p)
824 1117 CONTINUE
825 temp1 = aapp
826 aapp = aatmax
827 aatmax = temp1
828 temp1 = aaqq
829 aaqq = aatmin
830 aatmin = temp1
831 kill = lsvec
832 lsvec = rsvec
833 rsvec = kill
834 IF ( lsvec ) n1 = n
835*
836 rowpiv = .true.
837 END IF
838*
839 END IF
840* END IF L2TRAN
841*
842* Scale the matrix so that its maximal singular value remains less
843* than DSQRT(BIG) -- the matrix is scaled so that its maximal column
844* has Euclidean norm equal to DSQRT(BIG/N). The only reason to keep
845* DSQRT(BIG) instead of BIG is the fact that DGEJSV uses LAPACK and
846* BLAS routines that, in some implementations, are not capable of
847* working in the full interval [SFMIN,BIG] and that they may provoke
848* overflows in the intermediate results. If the singular values spread
849* from SFMIN to BIG, then DGESVJ will compute them. So, in that case,
850* one should use DGESVJ instead of DGEJSV.
851*
852 big1 = dsqrt( big )
853 temp1 = dsqrt( big / dble(n) )
854*
855 CALL dlascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
856 IF ( aaqq .GT. (aapp * sfmin) ) THEN
857 aaqq = ( aaqq / aapp ) * temp1
858 ELSE
859 aaqq = ( aaqq * temp1 ) / aapp
860 END IF
861 temp1 = temp1 * scalem
862 CALL dlascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
863*
864* To undo scaling at the end of this procedure, multiply the
865* computed singular values with USCAL2 / USCAL1.
866*
867 uscal1 = temp1
868 uscal2 = aapp
869*
870 IF ( l2kill ) THEN
871* L2KILL enforces computation of nonzero singular values in
872* the restricted range of condition number of the initial A,
873* sigma_max(A) / sigma_min(A) approx. DSQRT(BIG)/DSQRT(SFMIN).
874 xsc = dsqrt( sfmin )
875 ELSE
876 xsc = small
877*
878* Now, if the condition number of A is too big,
879* sigma_max(A) / sigma_min(A) .GT. DSQRT(BIG/N) * EPSLN / SFMIN,
880* as a precaution measure, the full SVD is computed using DGESVJ
881* with accumulated Jacobi rotations. This provides numerically
882* more robust computation, at the cost of slightly increased run
883* time. Depending on the concrete implementation of BLAS and LAPACK
884* (i.e. how they behave in presence of extreme ill-conditioning) the
885* implementor may decide to remove this switch.
886 IF ( ( aaqq.LT.dsqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
887 jracc = .true.
888 END IF
889*
890 END IF
891 IF ( aaqq .LT. xsc ) THEN
892 DO 700 p = 1, n
893 IF ( sva(p) .LT. xsc ) THEN
894 CALL dlaset( 'A', m, 1, zero, zero, a(1,p), lda )
895 sva(p) = zero
896 END IF
897 700 CONTINUE
898 END IF
899*
900* Preconditioning using QR factorization with pivoting
901*
902 IF ( rowpiv ) THEN
903* Optional row permutation (Bjoerck row pivoting):
904* A result by Cox and Higham shows that the Bjoerck's
905* row pivoting combined with standard column pivoting
906* has similar effect as Powell-Reid complete pivoting.
907* The ell-infinity norms of A are made nonincreasing.
908 DO 1952 p = 1, m - 1
909 q = idamax( m-p+1, work(m+n+p), 1 ) + p - 1
910 iwork(2*n+p) = q
911 IF ( p .NE. q ) THEN
912 temp1 = work(m+n+p)
913 work(m+n+p) = work(m+n+q)
914 work(m+n+q) = temp1
915 END IF
916 1952 CONTINUE
917 CALL dlaswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
918 END IF
919*
920* End of the preparation phase (scaling, optional sorting and
921* transposing, optional flushing of small columns).
922*
923* Preconditioning
924*
925* If the full SVD is needed, the right singular vectors are computed
926* from a matrix equation, and for that we need theoretical analysis
927* of the Businger-Golub pivoting. So we use DGEQP3 as the first RR QRF.
928* In all other cases the first RR QRF can be chosen by other criteria
929* (eg speed by replacing global with restricted window pivoting, such
930* as in SGEQPX from TOMS # 782). Good results will be obtained using
931* SGEQPX with properly (!) chosen numerical parameters.
932* Any improvement of DGEQP3 improves overall performance of DGEJSV.
933*
934* A * P1 = Q1 * [ R1^t 0]^t:
935 DO 1963 p = 1, n
936* .. all columns are free columns
937 iwork(p) = 0
938 1963 CONTINUE
939 CALL dgeqp3( m,n,a,lda, iwork,work, work(n+1),lwork-n, ierr )
940*
941* The upper triangular matrix R1 from the first QRF is inspected for
942* rank deficiency and possibilities for deflation, or possible
943* ill-conditioning. Depending on the user specified flag L2RANK,
944* the procedure explores possibilities to reduce the numerical
945* rank by inspecting the computed upper triangular factor. If
946* L2RANK or L2ABER are up, then DGEJSV will compute the SVD of
947* A + dA, where ||dA|| <= f(M,N)*EPSLN.
948*
949 nr = 1
950 IF ( l2aber ) THEN
951* Standard absolute error bound suffices. All sigma_i with
952* sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
953* aggressive enforcement of lower numerical rank by introducing a
954* backward error of the order of N*EPSLN*||A||.
955 temp1 = dsqrt(dble(n))*epsln
956 DO 3001 p = 2, n
957 IF ( dabs(a(p,p)) .GE. (temp1*dabs(a(1,1))) ) THEN
958 nr = nr + 1
959 ELSE
960 GO TO 3002
961 END IF
962 3001 CONTINUE
963 3002 CONTINUE
964 ELSE IF ( l2rank ) THEN
965* .. similarly as above, only slightly more gentle (less aggressive).
966* Sudden drop on the diagonal of R1 is used as the criterion for
967* close-to-rank-deficient.
968 temp1 = dsqrt(sfmin)
969 DO 3401 p = 2, n
970 IF ( ( dabs(a(p,p)) .LT. (epsln*dabs(a(p-1,p-1))) ) .OR.
971 $ ( dabs(a(p,p)) .LT. small ) .OR.
972 $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3402
973 nr = nr + 1
974 3401 CONTINUE
975 3402 CONTINUE
976*
977 ELSE
978* The goal is high relative accuracy. However, if the matrix
979* has high scaled condition number the relative accuracy is in
980* general not feasible. Later on, a condition number estimator
981* will be deployed to estimate the scaled condition number.
982* Here we just remove the underflowed part of the triangular
983* factor. This prevents the situation in which the code is
984* working hard to get the accuracy not warranted by the data.
985 temp1 = dsqrt(sfmin)
986 DO 3301 p = 2, n
987 IF ( ( dabs(a(p,p)) .LT. small ) .OR.
988 $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3302
989 nr = nr + 1
990 3301 CONTINUE
991 3302 CONTINUE
992*
993 END IF
994*
995 almort = .false.
996 IF ( nr .EQ. n ) THEN
997 maxprj = one
998 DO 3051 p = 2, n
999 temp1 = dabs(a(p,p)) / sva(iwork(p))
1000 maxprj = min( maxprj, temp1 )
1001 3051 CONTINUE
1002 IF ( maxprj**2 .GE. one - dble(n)*epsln ) almort = .true.
1003 END IF
1004*
1005*
1006 sconda = - one
1007 condr1 = - one
1008 condr2 = - one
1009*
1010 IF ( errest ) THEN
1011 IF ( n .EQ. nr ) THEN
1012 IF ( rsvec ) THEN
1013* .. V is available as workspace
1014 CALL dlacpy( 'U', n, n, a, lda, v, ldv )
1015 DO 3053 p = 1, n
1016 temp1 = sva(iwork(p))
1017 CALL dscal( p, one/temp1, v(1,p), 1 )
1018 3053 CONTINUE
1019 CALL dpocon( 'U', n, v, ldv, one, temp1,
1020 $ work(n+1), iwork(2*n+m+1), ierr )
1021 ELSE IF ( lsvec ) THEN
1022* .. U is available as workspace
1023 CALL dlacpy( 'U', n, n, a, lda, u, ldu )
1024 DO 3054 p = 1, n
1025 temp1 = sva(iwork(p))
1026 CALL dscal( p, one/temp1, u(1,p), 1 )
1027 3054 CONTINUE
1028 CALL dpocon( 'U', n, u, ldu, one, temp1,
1029 $ work(n+1), iwork(2*n+m+1), ierr )
1030 ELSE
1031 CALL dlacpy( 'U', n, n, a, lda, work(n+1), n )
1032 DO 3052 p = 1, n
1033 temp1 = sva(iwork(p))
1034 CALL dscal( p, one/temp1, work(n+(p-1)*n+1), 1 )
1035 3052 CONTINUE
1036* .. the columns of R are scaled to have unit Euclidean lengths.
1037 CALL dpocon( 'U', n, work(n+1), n, one, temp1,
1038 $ work(n+n*n+1), iwork(2*n+m+1), ierr )
1039 END IF
1040 sconda = one / dsqrt(temp1)
1041* SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
1042* N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1043 ELSE
1044 sconda = - one
1045 END IF
1046 END IF
1047*
1048 l2pert = l2pert .AND. ( dabs( a(1,1)/a(nr,nr) ) .GT. dsqrt(big1) )
1049* If there is no violent scaling, artificial perturbation is not needed.
1050*
1051* Phase 3:
1052*
1053 IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1054*
1055* Singular Values only
1056*
1057* .. transpose A(1:NR,1:N)
1058 DO 1946 p = 1, min( n-1, nr )
1059 CALL dcopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1060 1946 CONTINUE
1061*
1062* The following two DO-loops introduce small relative perturbation
1063* into the strict upper triangle of the lower triangular matrix.
1064* Small entries below the main diagonal are also changed.
1065* This modification is useful if the computing environment does not
1066* provide/allow FLUSH TO ZERO underflow, for it prevents many
1067* annoying denormalized numbers in case of strongly scaled matrices.
1068* The perturbation is structured so that it does not introduce any
1069* new perturbation of the singular values, and it does not destroy
1070* the job done by the preconditioner.
1071* The licence for this perturbation is in the variable L2PERT, which
1072* should be .FALSE. if FLUSH TO ZERO underflow is active.
1073*
1074 IF ( .NOT. almort ) THEN
1075*
1076 IF ( l2pert ) THEN
1077* XSC = DSQRT(SMALL)
1078 xsc = epsln / dble(n)
1079 DO 4947 q = 1, nr
1080 temp1 = xsc*dabs(a(q,q))
1081 DO 4949 p = 1, n
1082 IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1083 $ .OR. ( p .LT. q ) )
1084 $ a(p,q) = dsign( temp1, a(p,q) )
1085 4949 CONTINUE
1086 4947 CONTINUE
1087 ELSE
1088 CALL dlaset( 'U', nr-1,nr-1, zero,zero, a(1,2),lda )
1089 END IF
1090*
1091* .. second preconditioning using the QR factorization
1092*
1093 CALL dgeqrf( n,nr, a,lda, work, work(n+1),lwork-n, ierr )
1094*
1095* .. and transpose upper to lower triangular
1096 DO 1948 p = 1, nr - 1
1097 CALL dcopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1098 1948 CONTINUE
1099*
1100 END IF
1101*
1102* Row-cyclic Jacobi SVD algorithm with column pivoting
1103*
1104* .. again some perturbation (a "background noise") is added
1105* to drown denormals
1106 IF ( l2pert ) THEN
1107* XSC = DSQRT(SMALL)
1108 xsc = epsln / dble(n)
1109 DO 1947 q = 1, nr
1110 temp1 = xsc*dabs(a(q,q))
1111 DO 1949 p = 1, nr
1112 IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1113 $ .OR. ( p .LT. q ) )
1114 $ a(p,q) = dsign( temp1, a(p,q) )
1115 1949 CONTINUE
1116 1947 CONTINUE
1117 ELSE
1118 CALL dlaset( 'U', nr-1, nr-1, zero, zero, a(1,2), lda )
1119 END IF
1120*
1121* .. and one-sided Jacobi rotations are started on a lower
1122* triangular matrix (plus perturbation which is ignored in
1123* the part which destroys triangular form (confusing?!))
1124*
1125 CALL dgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1126 $ n, v, ldv, work, lwork, info )
1127*
1128 scalem = work(1)
1129 numrank = idnint(work(2))
1130*
1131*
1132 ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1133*
1134* -> Singular Values and Right Singular Vectors <-
1135*
1136 IF ( almort ) THEN
1137*
1138* .. in this case NR equals N
1139 DO 1998 p = 1, nr
1140 CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1141 1998 CONTINUE
1142 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1143*
1144 CALL dgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1145 $ work, lwork, info )
1146 scalem = work(1)
1147 numrank = idnint(work(2))
1148
1149 ELSE
1150*
1151* .. two more QR factorizations ( one QRF is not enough, two require
1152* accumulated product of Jacobi rotations, three are perfect )
1153*
1154 CALL dlaset( 'Lower', nr-1, nr-1, zero, zero, a(2,1), lda )
1155 CALL dgelqf( nr, n, a, lda, work, work(n+1), lwork-n, ierr)
1156 CALL dlacpy( 'Lower', nr, nr, a, lda, v, ldv )
1157 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1158 CALL dgeqrf( nr, nr, v, ldv, work(n+1), work(2*n+1),
1159 $ lwork-2*n, ierr )
1160 DO 8998 p = 1, nr
1161 CALL dcopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1162 8998 CONTINUE
1163 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1164*
1165 CALL dgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1166 $ ldu, work(n+1), lwork, info )
1167 scalem = work(n+1)
1168 numrank = idnint(work(n+2))
1169 IF ( nr .LT. n ) THEN
1170 CALL dlaset( 'A',n-nr, nr, zero,zero, v(nr+1,1), ldv )
1171 CALL dlaset( 'A',nr, n-nr, zero,zero, v(1,nr+1), ldv )
1172 CALL dlaset( 'A',n-nr,n-nr,zero,one, v(nr+1,nr+1), ldv )
1173 END IF
1174*
1175 CALL dormlq( 'Left', 'Transpose', n, n, nr, a, lda, work,
1176 $ v, ldv, work(n+1), lwork-n, ierr )
1177*
1178 END IF
1179*
1180 DO 8991 p = 1, n
1181 CALL dcopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1182 8991 CONTINUE
1183 CALL dlacpy( 'All', n, n, a, lda, v, ldv )
1184*
1185 IF ( transp ) THEN
1186 CALL dlacpy( 'All', n, n, v, ldv, u, ldu )
1187 END IF
1188*
1189 ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1190*
1191* .. Singular Values and Left Singular Vectors ..
1192*
1193* .. second preconditioning step to avoid need to accumulate
1194* Jacobi rotations in the Jacobi iterations.
1195 DO 1965 p = 1, nr
1196 CALL dcopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1197 1965 CONTINUE
1198 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1199*
1200 CALL dgeqrf( n, nr, u, ldu, work(n+1), work(2*n+1),
1201 $ lwork-2*n, ierr )
1202*
1203 DO 1967 p = 1, nr - 1
1204 CALL dcopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1205 1967 CONTINUE
1206 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1207*
1208 CALL dgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1209 $ lda, work(n+1), lwork-n, info )
1210 scalem = work(n+1)
1211 numrank = idnint(work(n+2))
1212*
1213 IF ( nr .LT. m ) THEN
1214 CALL dlaset( 'A', m-nr, nr,zero, zero, u(nr+1,1), ldu )
1215 IF ( nr .LT. n1 ) THEN
1216 CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1), ldu )
1217 CALL dlaset( 'A',m-nr,n1-nr,zero,one,u(nr+1,nr+1), ldu )
1218 END IF
1219 END IF
1220*
1221 CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1222 $ ldu, work(n+1), lwork-n, ierr )
1223*
1224 IF ( rowpiv )
1225 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1226*
1227 DO 1974 p = 1, n1
1228 xsc = one / dnrm2( m, u(1,p), 1 )
1229 CALL dscal( m, xsc, u(1,p), 1 )
1230 1974 CONTINUE
1231*
1232 IF ( transp ) THEN
1233 CALL dlacpy( 'All', n, n, u, ldu, v, ldv )
1234 END IF
1235*
1236 ELSE
1237*
1238* .. Full SVD ..
1239*
1240 IF ( .NOT. jracc ) THEN
1241*
1242 IF ( .NOT. almort ) THEN
1243*
1244* Second Preconditioning Step (QRF [with pivoting])
1245* Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1246* equivalent to an LQF CALL. Since in many libraries the QRF
1247* seems to be better optimized than the LQF, we do explicit
1248* transpose and use the QRF. This is subject to changes in an
1249* optimized implementation of DGEJSV.
1250*
1251 DO 1968 p = 1, nr
1252 CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1253 1968 CONTINUE
1254*
1255* .. the following two loops perturb small entries to avoid
1256* denormals in the second QR factorization, where they are
1257* as good as zeros. This is done to avoid painfully slow
1258* computation with denormals. The relative size of the perturbation
1259* is a parameter that can be changed by the implementer.
1260* This perturbation device will be obsolete on machines with
1261* properly implemented arithmetic.
1262* To switch it off, set L2PERT=.FALSE. To remove it from the
1263* code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1264* The following two loops should be blocked and fused with the
1265* transposed copy above.
1266*
1267 IF ( l2pert ) THEN
1268 xsc = dsqrt(small)
1269 DO 2969 q = 1, nr
1270 temp1 = xsc*dabs( v(q,q) )
1271 DO 2968 p = 1, n
1272 IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1273 $ .OR. ( p .LT. q ) )
1274 $ v(p,q) = dsign( temp1, v(p,q) )
1275 IF ( p .LT. q ) v(p,q) = - v(p,q)
1276 2968 CONTINUE
1277 2969 CONTINUE
1278 ELSE
1279 CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1280 END IF
1281*
1282* Estimate the row scaled condition number of R1
1283* (If R1 is rectangular, N > NR, then the condition number
1284* of the leading NR x NR submatrix is estimated.)
1285*
1286 CALL dlacpy( 'L', nr, nr, v, ldv, work(2*n+1), nr )
1287 DO 3950 p = 1, nr
1288 temp1 = dnrm2(nr-p+1,work(2*n+(p-1)*nr+p),1)
1289 CALL dscal(nr-p+1,one/temp1,work(2*n+(p-1)*nr+p),1)
1290 3950 CONTINUE
1291 CALL dpocon('Lower',nr,work(2*n+1),nr,one,temp1,
1292 $ work(2*n+nr*nr+1),iwork(m+2*n+1),ierr)
1293 condr1 = one / dsqrt(temp1)
1294* .. here need a second opinion on the condition number
1295* .. then assume worst case scenario
1296* R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1297* more conservative <=> CONDR1 .LT. DSQRT(DBLE(N))
1298*
1299 cond_ok = dsqrt(dble(nr))
1300*[TP] COND_OK is a tuning parameter.
1301
1302 IF ( condr1 .LT. cond_ok ) THEN
1303* .. the second QRF without pivoting. Note: in an optimized
1304* implementation, this QRF should be implemented as the QRF
1305* of a lower triangular matrix.
1306* R1^t = Q2 * R2
1307 CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1308 $ lwork-2*n, ierr )
1309*
1310 IF ( l2pert ) THEN
1311 xsc = dsqrt(small)/epsln
1312 DO 3959 p = 2, nr
1313 DO 3958 q = 1, p - 1
1314 temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1315 IF ( dabs(v(q,p)) .LE. temp1 )
1316 $ v(q,p) = dsign( temp1, v(q,p) )
1317 3958 CONTINUE
1318 3959 CONTINUE
1319 END IF
1320*
1321 IF ( nr .NE. n )
1322 $ CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1323* .. save ...
1324*
1325* .. this transposed copy should be better than naive
1326 DO 1969 p = 1, nr - 1
1327 CALL dcopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1328 1969 CONTINUE
1329*
1330 condr2 = condr1
1331*
1332 ELSE
1333*
1334* .. ill-conditioned case: second QRF with pivoting
1335* Note that windowed pivoting would be equally good
1336* numerically, and more run-time efficient. So, in
1337* an optimal implementation, the next call to DGEQP3
1338* should be replaced with eg. CALL SGEQPX (ACM TOMS #782)
1339* with properly (carefully) chosen parameters.
1340*
1341* R1^t * P2 = Q2 * R2
1342 DO 3003 p = 1, nr
1343 iwork(n+p) = 0
1344 3003 CONTINUE
1345 CALL dgeqp3( n, nr, v, ldv, iwork(n+1), work(n+1),
1346 $ work(2*n+1), lwork-2*n, ierr )
1347** CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1348** $ LWORK-2*N, IERR )
1349 IF ( l2pert ) THEN
1350 xsc = dsqrt(small)
1351 DO 3969 p = 2, nr
1352 DO 3968 q = 1, p - 1
1353 temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1354 IF ( dabs(v(q,p)) .LE. temp1 )
1355 $ v(q,p) = dsign( temp1, v(q,p) )
1356 3968 CONTINUE
1357 3969 CONTINUE
1358 END IF
1359*
1360 CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1361*
1362 IF ( l2pert ) THEN
1363 xsc = dsqrt(small)
1364 DO 8970 p = 2, nr
1365 DO 8971 q = 1, p - 1
1366 temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1367 v(p,q) = - dsign( temp1, v(q,p) )
1368 8971 CONTINUE
1369 8970 CONTINUE
1370 ELSE
1371 CALL dlaset( 'L',nr-1,nr-1,zero,zero,v(2,1),ldv )
1372 END IF
1373* Now, compute R2 = L3 * Q3, the LQ factorization.
1374 CALL dgelqf( nr, nr, v, ldv, work(2*n+n*nr+1),
1375 $ work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1376* .. and estimate the condition number
1377 CALL dlacpy( 'L',nr,nr,v,ldv,work(2*n+n*nr+nr+1),nr )
1378 DO 4950 p = 1, nr
1379 temp1 = dnrm2( p, work(2*n+n*nr+nr+p), nr )
1380 CALL dscal( p, one/temp1, work(2*n+n*nr+nr+p), nr )
1381 4950 CONTINUE
1382 CALL dpocon( 'L',nr,work(2*n+n*nr+nr+1),nr,one,temp1,
1383 $ work(2*n+n*nr+nr+nr*nr+1),iwork(m+2*n+1),ierr )
1384 condr2 = one / dsqrt(temp1)
1385*
1386 IF ( condr2 .GE. cond_ok ) THEN
1387* .. save the Householder vectors used for Q3
1388* (this overwrites the copy of R2, as it will not be
1389* needed in this branch, but it does not overwrite the
1390* Huseholder vectors of Q2.).
1391 CALL dlacpy( 'U', nr, nr, v, ldv, work(2*n+1), n )
1392* .. and the rest of the information on Q3 is in
1393* WORK(2*N+N*NR+1:2*N+N*NR+N)
1394 END IF
1395*
1396 END IF
1397*
1398 IF ( l2pert ) THEN
1399 xsc = dsqrt(small)
1400 DO 4968 q = 2, nr
1401 temp1 = xsc * v(q,q)
1402 DO 4969 p = 1, q - 1
1403* V(p,q) = - DSIGN( TEMP1, V(q,p) )
1404 v(p,q) = - dsign( temp1, v(p,q) )
1405 4969 CONTINUE
1406 4968 CONTINUE
1407 ELSE
1408 CALL dlaset( 'U', nr-1,nr-1, zero,zero, v(1,2), ldv )
1409 END IF
1410*
1411* Second preconditioning finished; continue with Jacobi SVD
1412* The input matrix is lower triangular.
1413*
1414* Recover the right singular vectors as solution of a well
1415* conditioned triangular matrix equation.
1416*
1417 IF ( condr1 .LT. cond_ok ) THEN
1418*
1419 CALL dgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u,
1420 $ ldu,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,info )
1421 scalem = work(2*n+n*nr+nr+1)
1422 numrank = idnint(work(2*n+n*nr+nr+2))
1423 DO 3970 p = 1, nr
1424 CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1425 CALL dscal( nr, sva(p), v(1,p), 1 )
1426 3970 CONTINUE
1427
1428* .. pick the right matrix equation and solve it
1429*
1430 IF ( nr .EQ. n ) THEN
1431* :)) .. best case, R1 is inverted. The solution of this matrix
1432* equation is Q2*V2 = the product of the Jacobi rotations
1433* used in DGESVJ, premultiplied with the orthogonal matrix
1434* from the second QR factorization.
1435 CALL dtrsm( 'L','U','N','N', nr,nr,one, a,lda, v,ldv )
1436 ELSE
1437* .. R1 is well conditioned, but non-square. Transpose(R2)
1438* is inverted to get the product of the Jacobi rotations
1439* used in DGESVJ. The Q-factor from the second QR
1440* factorization is then built in explicitly.
1441 CALL dtrsm('L','U','T','N',nr,nr,one,work(2*n+1),
1442 $ n,v,ldv)
1443 IF ( nr .LT. n ) THEN
1444 CALL dlaset('A',n-nr,nr,zero,zero,v(nr+1,1),ldv)
1445 CALL dlaset('A',nr,n-nr,zero,zero,v(1,nr+1),ldv)
1446 CALL dlaset('A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv)
1447 END IF
1448 CALL dormqr('L','N',n,n,nr,work(2*n+1),n,work(n+1),
1449 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1450 END IF
1451*
1452 ELSE IF ( condr2 .LT. cond_ok ) THEN
1453*
1454* :) .. the input matrix A is very likely a relative of
1455* the Kahan matrix :)
1456* The matrix R2 is inverted. The solution of the matrix equation
1457* is Q3^T*V3 = the product of the Jacobi rotations (applied to
1458* the lower triangular L3 from the LQ factorization of
1459* R2=L3*Q3), pre-multiplied with the transposed Q3.
1460 CALL dgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1461 $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1462 scalem = work(2*n+n*nr+nr+1)
1463 numrank = idnint(work(2*n+n*nr+nr+2))
1464 DO 3870 p = 1, nr
1465 CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1466 CALL dscal( nr, sva(p), u(1,p), 1 )
1467 3870 CONTINUE
1468 CALL dtrsm('L','U','N','N',nr,nr,one,work(2*n+1),n,u,ldu)
1469* .. apply the permutation from the second QR factorization
1470 DO 873 q = 1, nr
1471 DO 872 p = 1, nr
1472 work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1473 872 CONTINUE
1474 DO 874 p = 1, nr
1475 u(p,q) = work(2*n+n*nr+nr+p)
1476 874 CONTINUE
1477 873 CONTINUE
1478 IF ( nr .LT. n ) THEN
1479 CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1480 CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1481 CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1482 END IF
1483 CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1484 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1485 ELSE
1486* Last line of defense.
1487* #:( This is a rather pathological case: no scaled condition
1488* improvement after two pivoted QR factorizations. Other
1489* possibility is that the rank revealing QR factorization
1490* or the condition estimator has failed, or the COND_OK
1491* is set very close to ONE (which is unnecessary). Normally,
1492* this branch should never be executed, but in rare cases of
1493* failure of the RRQR or condition estimator, the last line of
1494* defense ensures that DGEJSV completes the task.
1495* Compute the full SVD of L3 using DGESVJ with explicit
1496* accumulation of Jacobi rotations.
1497 CALL dgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1498 $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1499 scalem = work(2*n+n*nr+nr+1)
1500 numrank = idnint(work(2*n+n*nr+nr+2))
1501 IF ( nr .LT. n ) THEN
1502 CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1503 CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1504 CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1505 END IF
1506 CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1507 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1508*
1509 CALL dormlq( 'L', 'T', nr, nr, nr, work(2*n+1), n,
1510 $ work(2*n+n*nr+1), u, ldu, work(2*n+n*nr+nr+1),
1511 $ lwork-2*n-n*nr-nr, ierr )
1512 DO 773 q = 1, nr
1513 DO 772 p = 1, nr
1514 work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1515 772 CONTINUE
1516 DO 774 p = 1, nr
1517 u(p,q) = work(2*n+n*nr+nr+p)
1518 774 CONTINUE
1519 773 CONTINUE
1520*
1521 END IF
1522*
1523* Permute the rows of V using the (column) permutation from the
1524* first QRF. Also, scale the columns to make them unit in
1525* Euclidean norm. This applies to all cases.
1526*
1527 temp1 = dsqrt(dble(n)) * epsln
1528 DO 1972 q = 1, n
1529 DO 972 p = 1, n
1530 work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1531 972 CONTINUE
1532 DO 973 p = 1, n
1533 v(p,q) = work(2*n+n*nr+nr+p)
1534 973 CONTINUE
1535 xsc = one / dnrm2( n, v(1,q), 1 )
1536 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1537 $ CALL dscal( n, xsc, v(1,q), 1 )
1538 1972 CONTINUE
1539* At this moment, V contains the right singular vectors of A.
1540* Next, assemble the left singular vector matrix U (M x N).
1541 IF ( nr .LT. m ) THEN
1542 CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1543 IF ( nr .LT. n1 ) THEN
1544 CALL dlaset('A',nr,n1-nr,zero,zero,u(1,nr+1),ldu)
1545 CALL dlaset('A',m-nr,n1-nr,zero,one,u(nr+1,nr+1),ldu)
1546 END IF
1547 END IF
1548*
1549* The Q matrix from the first QRF is built into the left singular
1550* matrix U. This applies to all cases.
1551*
1552 CALL dormqr( 'Left', 'No_Tr', m, n1, n, a, lda, work, u,
1553 $ ldu, work(n+1), lwork-n, ierr )
1554
1555* The columns of U are normalized. The cost is O(M*N) flops.
1556 temp1 = dsqrt(dble(m)) * epsln
1557 DO 1973 p = 1, nr
1558 xsc = one / dnrm2( m, u(1,p), 1 )
1559 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1560 $ CALL dscal( m, xsc, u(1,p), 1 )
1561 1973 CONTINUE
1562*
1563* If the initial QRF is computed with row pivoting, the left
1564* singular vectors must be adjusted.
1565*
1566 IF ( rowpiv )
1567 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1568*
1569 ELSE
1570*
1571* .. the initial matrix A has almost orthogonal columns and
1572* the second QRF is not needed
1573*
1574 CALL dlacpy( 'Upper', n, n, a, lda, work(n+1), n )
1575 IF ( l2pert ) THEN
1576 xsc = dsqrt(small)
1577 DO 5970 p = 2, n
1578 temp1 = xsc * work( n + (p-1)*n + p )
1579 DO 5971 q = 1, p - 1
1580 work(n+(q-1)*n+p)=-dsign(temp1,work(n+(p-1)*n+q))
1581 5971 CONTINUE
1582 5970 CONTINUE
1583 ELSE
1584 CALL dlaset( 'Lower',n-1,n-1,zero,zero,work(n+2),n )
1585 END IF
1586*
1587 CALL dgesvj( 'Upper', 'U', 'N', n, n, work(n+1), n, sva,
1588 $ n, u, ldu, work(n+n*n+1), lwork-n-n*n, info )
1589*
1590 scalem = work(n+n*n+1)
1591 numrank = idnint(work(n+n*n+2))
1592 DO 6970 p = 1, n
1593 CALL dcopy( n, work(n+(p-1)*n+1), 1, u(1,p), 1 )
1594 CALL dscal( n, sva(p), work(n+(p-1)*n+1), 1 )
1595 6970 CONTINUE
1596*
1597 CALL dtrsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1598 $ one, a, lda, work(n+1), n )
1599 DO 6972 p = 1, n
1600 CALL dcopy( n, work(n+p), n, v(iwork(p),1), ldv )
1601 6972 CONTINUE
1602 temp1 = dsqrt(dble(n))*epsln
1603 DO 6971 p = 1, n
1604 xsc = one / dnrm2( n, v(1,p), 1 )
1605 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1606 $ CALL dscal( n, xsc, v(1,p), 1 )
1607 6971 CONTINUE
1608*
1609* Assemble the left singular vector matrix U (M x N).
1610*
1611 IF ( n .LT. m ) THEN
1612 CALL dlaset( 'A', m-n, n, zero, zero, u(n+1,1), ldu )
1613 IF ( n .LT. n1 ) THEN
1614 CALL dlaset( 'A',n, n1-n, zero, zero, u(1,n+1),ldu )
1615 CALL dlaset( 'A',m-n,n1-n, zero, one,u(n+1,n+1),ldu )
1616 END IF
1617 END IF
1618 CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1619 $ ldu, work(n+1), lwork-n, ierr )
1620 temp1 = dsqrt(dble(m))*epsln
1621 DO 6973 p = 1, n1
1622 xsc = one / dnrm2( m, u(1,p), 1 )
1623 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1624 $ CALL dscal( m, xsc, u(1,p), 1 )
1625 6973 CONTINUE
1626*
1627 IF ( rowpiv )
1628 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1629*
1630 END IF
1631*
1632* end of the >> almost orthogonal case << in the full SVD
1633*
1634 ELSE
1635*
1636* This branch deploys a preconditioned Jacobi SVD with explicitly
1637* accumulated rotations. It is included as optional, mainly for
1638* experimental purposes. It does perform well, and can also be used.
1639* In this implementation, this branch will be automatically activated
1640* if the condition number sigma_max(A) / sigma_min(A) is predicted
1641* to be greater than the overflow threshold. This is because the
1642* a posteriori computation of the singular vectors assumes robust
1643* implementation of BLAS and some LAPACK procedures, capable of working
1644* in presence of extreme values. Since that is not always the case, ...
1645*
1646 DO 7968 p = 1, nr
1647 CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1648 7968 CONTINUE
1649*
1650 IF ( l2pert ) THEN
1651 xsc = dsqrt(small/epsln)
1652 DO 5969 q = 1, nr
1653 temp1 = xsc*dabs( v(q,q) )
1654 DO 5968 p = 1, n
1655 IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1656 $ .OR. ( p .LT. q ) )
1657 $ v(p,q) = dsign( temp1, v(p,q) )
1658 IF ( p .LT. q ) v(p,q) = - v(p,q)
1659 5968 CONTINUE
1660 5969 CONTINUE
1661 ELSE
1662 CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1663 END IF
1664
1665 CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1666 $ lwork-2*n, ierr )
1667 CALL dlacpy( 'L', n, nr, v, ldv, work(2*n+1), n )
1668*
1669 DO 7969 p = 1, nr
1670 CALL dcopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1671 7969 CONTINUE
1672
1673 IF ( l2pert ) THEN
1674 xsc = dsqrt(small/epsln)
1675 DO 9970 q = 2, nr
1676 DO 9971 p = 1, q - 1
1677 temp1 = xsc * min(dabs(u(p,p)),dabs(u(q,q)))
1678 u(p,q) = - dsign( temp1, u(q,p) )
1679 9971 CONTINUE
1680 9970 CONTINUE
1681 ELSE
1682 CALL dlaset('U', nr-1, nr-1, zero, zero, u(1,2), ldu )
1683 END IF
1684
1685 CALL dgesvj( 'G', 'U', 'V', nr, nr, u, ldu, sva,
1686 $ n, v, ldv, work(2*n+n*nr+1), lwork-2*n-n*nr, info )
1687 scalem = work(2*n+n*nr+1)
1688 numrank = idnint(work(2*n+n*nr+2))
1689
1690 IF ( nr .LT. n ) THEN
1691 CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1692 CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1693 CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1694 END IF
1695
1696 CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1697 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1698*
1699* Permute the rows of V using the (column) permutation from the
1700* first QRF. Also, scale the columns to make them unit in
1701* Euclidean norm. This applies to all cases.
1702*
1703 temp1 = dsqrt(dble(n)) * epsln
1704 DO 7972 q = 1, n
1705 DO 8972 p = 1, n
1706 work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1707 8972 CONTINUE
1708 DO 8973 p = 1, n
1709 v(p,q) = work(2*n+n*nr+nr+p)
1710 8973 CONTINUE
1711 xsc = one / dnrm2( n, v(1,q), 1 )
1712 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1713 $ CALL dscal( n, xsc, v(1,q), 1 )
1714 7972 CONTINUE
1715*
1716* At this moment, V contains the right singular vectors of A.
1717* Next, assemble the left singular vector matrix U (M x N).
1718*
1719 IF ( nr .LT. m ) THEN
1720 CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1721 IF ( nr .LT. n1 ) THEN
1722 CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1),ldu )
1723 CALL dlaset( 'A',m-nr,n1-nr, zero, one,u(nr+1,nr+1),ldu )
1724 END IF
1725 END IF
1726*
1727 CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1728 $ ldu, work(n+1), lwork-n, ierr )
1729*
1730 IF ( rowpiv )
1731 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1732*
1733*
1734 END IF
1735 IF ( transp ) THEN
1736* .. swap U and V because the procedure worked on A^t
1737 DO 6974 p = 1, n
1738 CALL dswap( n, u(1,p), 1, v(1,p), 1 )
1739 6974 CONTINUE
1740 END IF
1741*
1742 END IF
1743* end of the full SVD
1744*
1745* Undo scaling, if necessary (and possible)
1746*
1747 IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1748 CALL dlascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
1749 uscal1 = one
1750 uscal2 = one
1751 END IF
1752*
1753 IF ( nr .LT. n ) THEN
1754 DO 3004 p = nr+1, n
1755 sva(p) = zero
1756 3004 CONTINUE
1757 END IF
1758*
1759 work(1) = uscal2 * scalem
1760 work(2) = uscal1
1761 IF ( errest ) work(3) = sconda
1762 IF ( lsvec .AND. rsvec ) THEN
1763 work(4) = condr1
1764 work(5) = condr2
1765 END IF
1766 IF ( l2tran ) THEN
1767 work(6) = entra
1768 work(7) = entrat
1769 END IF
1770*
1771 iwork(1) = nr
1772 iwork(2) = numrank
1773 iwork(3) = warning
1774*
1775 RETURN
1776* ..
1777* .. END OF DGEJSV
1778* ..
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine dcopy(n, dx, incx, dy, incy)
DCOPY
Definition dcopy.f:82
subroutine dgelqf(m, n, a, lda, tau, work, lwork, info)
DGELQF
Definition dgelqf.f:143
subroutine dgeqp3(m, n, a, lda, jpvt, tau, work, lwork, info)
DGEQP3
Definition dgeqp3.f:151
subroutine dgeqrf(m, n, a, lda, tau, work, lwork, info)
DGEQRF
Definition dgeqrf.f:146
subroutine dgesvj(joba, jobu, jobv, m, n, a, lda, sva, mv, v, ldv, work, lwork, info)
DGESVJ
Definition dgesvj.f:337
integer function idamax(n, dx, incx)
IDAMAX
Definition idamax.f:71
subroutine dlacpy(uplo, m, n, a, lda, b, ldb)
DLACPY copies all or part of one two-dimensional array to another.
Definition dlacpy.f:103
double precision function dlamch(cmach)
DLAMCH
Definition dlamch.f:69
subroutine dlascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition dlascl.f:143
subroutine dlaset(uplo, m, n, alpha, beta, a, lda)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition dlaset.f:110
subroutine dlassq(n, x, incx, scale, sumsq)
DLASSQ updates a sum of squares represented in scaled form.
Definition dlassq.f90:124
subroutine dlaswp(n, a, lda, k1, k2, ipiv, incx)
DLASWP performs a series of row interchanges on a general rectangular matrix.
Definition dlaswp.f:115
logical function lsame(ca, cb)
LSAME
Definition lsame.f:48
real(wp) function dnrm2(n, x, incx)
DNRM2
Definition dnrm2.f90:89
subroutine dpocon(uplo, n, a, lda, anorm, rcond, work, iwork, info)
DPOCON
Definition dpocon.f:121
subroutine dscal(n, da, dx, incx)
DSCAL
Definition dscal.f:79
subroutine dswap(n, dx, incx, dy, incy)
DSWAP
Definition dswap.f:82
subroutine dtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
DTRSM
Definition dtrsm.f:181
subroutine dorgqr(m, n, k, a, lda, tau, work, lwork, info)
DORGQR
Definition dorgqr.f:128
subroutine dormlq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
DORMLQ
Definition dormlq.f:167
subroutine dormqr(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
DORMQR
Definition dormqr.f:167
Here is the call graph for this function:
Here is the caller graph for this function: