LAPACK  3.6.0
LAPACK: Linear Algebra PACKage
Collaboration diagram for double:

Functions

subroutine dgejsv (JOBA, JOBU, JOBV, JOBR, JOBT, JOBP, M, N, A, LDA, SVA, U, LDU, V, LDV, WORK, LWORK, IWORK, INFO)
 DGEJSV More...
 
subroutine dgesdd (JOBZ, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, LWORK, IWORK, INFO)
 DGESDD More...
 
subroutine dgesvd (JOBU, JOBVT, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, LWORK, INFO)
  DGESVD computes the singular value decomposition (SVD) for GE matrices More...
 
subroutine dgesvdx (JOBU, JOBVT, RANGE, M, N, A, LDA, VL, VU, IL, IU, NS, S, U, LDU, VT, LDVT, WORK, LWORK, IWORK, INFO)
  DGESVDX computes the singular value decomposition (SVD) for GE matrices More...
 

Detailed Description

This is the group of double singular value driver functions for GE matrices

Function Documentation

subroutine dgejsv ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
character*1  JOBR,
character*1  JOBT,
character*1  JOBP,
integer  M,
integer  N,
double precision, dimension( lda, * )  A,
integer  LDA,
double precision, dimension( n )  SVA,
double precision, dimension( ldu, * )  U,
integer  LDU,
double precision, dimension( ldv, * )  V,
integer  LDV,
double precision, dimension( lwork )  WORK,
integer  LWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

DGEJSV

Download DGEJSV + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
 matrix [A], where M >= N. The SVD of [A] is written as

              [A] = [U] * [SIGMA] * [V]^t,

 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
 diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
 [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
 the singular values of [A]. The columns of [U] and [V] are the left and
 the right singular vectors of [A], respectively. The matrices [U] and [V]
 are computed and stored in the arrays U and V, respectively. The diagonal
 of [SIGMA] is computed and stored in the array SVA.
 DGEJSV can sometimes compute tiny singular values and their singular vectors much
 more accurately than other SVD routines, see below under Further Details.*> 
Parameters
[in]JOBA
          JOBA is CHARACTER*1
        Specifies the level of accuracy:
       = 'C': This option works well (high relative accuracy) if A = B * D,
             with well-conditioned B and arbitrary diagonal matrix D.
             The accuracy cannot be spoiled by COLUMN scaling. The
             accuracy of the computed output depends on the condition of
             B, and the procedure aims at the best theoretical accuracy.
             The relative error max_{i=1:N}|d sigma_i| / sigma_i is
             bounded by f(M,N)*epsilon* cond(B), independent of D.
             The input matrix is preprocessed with the QRF with column
             pivoting. This initial preprocessing and preconditioning by
             a rank revealing QR factorization is common for all values of
             JOBA. Additional actions are specified as follows:
       = 'E': Computation as with 'C' with an additional estimate of the
             condition number of B. It provides a realistic error bound.
       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
             D1, D2, and well-conditioned matrix C, this option gives
             higher accuracy than the 'C' option. If the structure of the
             input matrix is not known, and relative accuracy is
             desirable, then this option is advisable. The input matrix A
             is preprocessed with QR factorization with FULL (row and
             column) pivoting.
       = 'G'  Computation as with 'F' with an additional estimate of the
             condition number of B, where A=D*B. If A has heavily weighted
             rows, then using this condition number gives too pessimistic
             error bound.
       = 'A': Small singular values are the noise and the matrix is treated
             as numerically rank defficient. The error in the computed
             singular values is bounded by f(m,n)*epsilon*||A||.
             The computed SVD A = U * S * V^t restores A up to
             f(m,n)*epsilon*||A||.
             This gives the procedure the licence to discard (set to zero)
             all singular values below N*epsilon*||A||.
       = 'R': Similar as in 'A'. Rank revealing property of the initial
             QR factorization is used do reveal (using triangular factor)
             a gap sigma_{r+1} < epsilon * sigma_r in which case the
             numerical RANK is declared to be r. The SVD is computed with
             absolute error bounds, but more accurately than with 'A'.
[in]JOBU
          JOBU is CHARACTER*1
        Specifies whether to compute the columns of U:
       = 'U': N columns of U are returned in the array U.
       = 'F': full set of M left sing. vectors is returned in the array U.
       = 'W': U may be used as workspace of length M*N. See the description
             of U.
       = 'N': U is not computed.
[in]JOBV
          JOBV is CHARACTER*1
        Specifies whether to compute the matrix V:
       = 'V': N columns of V are returned in the array V; Jacobi rotations
             are not explicitly accumulated.
       = 'J': N columns of V are returned in the array V, but they are
             computed as the product of Jacobi rotations. This option is
             allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
       = 'W': V may be used as workspace of length N*N. See the description
             of V.
       = 'N': V is not computed.
[in]JOBR
          JOBR is CHARACTER*1
        Specifies the RANGE for the singular values. Issues the licence to
        set to zero small positive singular values if they are outside
        specified range. If A .NE. 0 is scaled so that the largest singular
        value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
        the licence to kill columns of A whose norm in c*A is less than
        DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
        where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
       = 'N': Do not kill small columns of c*A. This option assumes that
             BLAS and QR factorizations and triangular solvers are
             implemented to work in that range. If the condition of A
             is greater than BIG, use DGESVJ.
       = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
             (roughly, as described above). This option is recommended.
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~
        For computing the singular values in the FULL range [SFMIN,BIG]
        use DGESVJ.
[in]JOBT
          JOBT is CHARACTER*1
        If the matrix is square then the procedure may determine to use
        transposed A if A^t seems to be better with respect to convergence.
        If the matrix is not square, JOBT is ignored. This is subject to
        changes in the future.
        The decision is based on two values of entropy over the adjoint
        orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
       = 'T': transpose if entropy test indicates possibly faster
        convergence of Jacobi process if A^t is taken as input. If A is
        replaced with A^t, then the row pivoting is included automatically.
       = 'N': do not speculate.
        This option can be used to compute only the singular values, or the
        full SVD (U, SIGMA and V). For only one set of singular vectors
        (U or V), the caller should provide both U and V, as one of the
        matrices is used as workspace if the matrix A is transposed.
        The implementer can easily remove this constraint and make the
        code more complicated. See the descriptions of U and V.
[in]JOBP
          JOBP is CHARACTER*1
        Issues the licence to introduce structured perturbations to drown
        denormalized numbers. This licence should be active if the
        denormals are poorly implemented, causing slow computation,
        especially in cases of fast convergence (!). For details see [1,2].
        For the sake of simplicity, this perturbations are included only
        when the full SVD or only the singular values are requested. The
        implementer/user can easily add the perturbation for the cases of
        computing one set of singular vectors.
       = 'P': introduce perturbation
       = 'N': do not perturb
[in]M
          M is INTEGER
         The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
         The number of columns of the input matrix A. M >= N >= 0.
[in,out]A
          A is DOUBLE PRECISION array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is DOUBLE PRECISION array, dimension (N)
          On exit,
          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
            computation SVA contains Euclidean column norms of the
            iterated matrices in the array A.
          - For WORK(1) .NE. WORK(2): The singular values of A are
            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
            sigma_max(A) overflows or if small singular values have been
            saved from underflow by scaling the input matrix A.
          - If JOBR='R' then some of the singular values may be returned
            as exact zeros obtained by "set to zero" because they are
            below the numerical rank threshold or are denormalized numbers.
[out]U
          U is DOUBLE PRECISION array, dimension ( LDU, N )
          If JOBU = 'U', then U contains on exit the M-by-N matrix of
                         the left singular vectors.
          If JOBU = 'F', then U contains on exit the M-by-M matrix of
                         the left singular vectors, including an ONB
                         of the orthogonal complement of the Range(A).
          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
                         then U is used as workspace if the procedure
                         replaces A with A^t. In that case, [V] is computed
                         in U as left singular vectors of A^t and then
                         copied back to the V array. This 'W' option is just
                         a reminder to the caller that in this case U is
                         reserved as workspace of length N*N.
          If JOBU = 'N'  U is not referenced.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U,  LDU >= 1.
          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
[out]V
          V is DOUBLE PRECISION array, dimension ( LDV, N )
          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
                         then V is used as workspace if the pprocedure
                         replaces A with A^t. In that case, [U] is computed
                         in V as right singular vectors of A^t and then
                         copied back to the U array. This 'W' option is just
                         a reminder to the caller that in this case V is
                         reserved as workspace of length N*N.
          If JOBV = 'N'  V is not referenced.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V,  LDV >= 1.
          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
[out]WORK
          WORK is DOUBLE PRECISION array, dimension at least LWORK.
          On exit, if N.GT.0 .AND. M.GT.0 (else not referenced), 
          WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
                    that SCALE*SVA(1:N) are the computed singular values
                    of A. (See the description of SVA().)
          WORK(2) = See the description of WORK(1).
          WORK(3) = SCONDA is an estimate for the condition number of
                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
                    SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
                    It is computed using DPOCON. It holds
                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
                    where R is the triangular factor from the QRF of A.
                    However, if R is truncated and the numerical rank is
                    determined to be strictly smaller than N, SCONDA is
                    returned as -1, thus indicating that the smallest
                    singular values might be lost.

          If full SVD is needed, the following two condition numbers are
          useful for the analysis of the algorithm. They are provied for
          a developer/implementer who is familiar with the details of
          the method.

          WORK(4) = an estimate of the scaled condition number of the
                    triangular factor in the first QR factorization.
          WORK(5) = an estimate of the scaled condition number of the
                    triangular factor in the second QR factorization.
          The following two parameters are computed if JOBT .EQ. 'T'.
          They are provided for a developer/implementer who is familiar
          with the details of the method.

          WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
                    of diag(A^t*A) / Trace(A^t*A) taken as point in the
                    probability simplex.
          WORK(7) = the entropy of A*A^t.
[in]LWORK
          LWORK is INTEGER
          Length of WORK to confirm proper allocation of work space.
          LWORK depends on the job:

          If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
            -> .. no scaled condition estimate required (JOBE.EQ.'N'):
               LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
               block size for DGEQP3 and DGEQRF.
               In general, optimal LWORK is computed as 
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).        
            -> .. an estimate of the scaled condition number of A is
               required (JOBA='E', 'G'). In this case, LWORK is the maximum
               of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
               ->> For optimal performance (blocked code) the optimal value 
               is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
               In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 
                                                     N+N*N+LWORK(DPOCON),7).

          If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
            -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
               where NB is the optimal block size for DGEQP3, DGEQRF, DGELQ,
               DORMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON), 
                       N+LWORK(DGELQ), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).

          If SIGMA and the left singular vectors are needed
            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
            -> For optimal performance:
               if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
               if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
               where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
                        2*N+LWORK(DGEQRF), N+LWORK(DORMQR)). 
               Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or 
               M*NB (for JOBU.EQ.'F').
               
          If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and 
            -> if JOBV.EQ.'V'  
               the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N). 
            -> if JOBV.EQ.'J' the minimal requirement is 
               LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
            -> For optimal performance, LWORK should be additionally
               larger than N+M*NB, where NB is the optimal block size
               for DORMQR.
[out]IWORK
          IWORK is INTEGER array, dimension M+3*N.
          On exit,
          IWORK(1) = the numerical rank determined after the initial
                     QR factorization with pivoting. See the descriptions
                     of JOBA and JOBR.
          IWORK(2) = the number of the computed nonzero singular values
          IWORK(3) = if nonzero, a warning message:
                     If IWORK(3).EQ.1 then some of the column norms of A
                     were denormalized floats. The requested high accuracy
                     is not warranted by the data.
[out]INFO
          INFO is INTEGER
           < 0  : if INFO = -i, then the i-th argument had an illegal value.
           = 0 :  successfull exit;
           > 0 :  DGEJSV  did not converge in the maximal allowed number
                  of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
November 2015
Further Details:
  DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,
  DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an
  additional row pivoting can be used as a preprocessor, which in some
  cases results in much higher accuracy. An example is matrix A with the
  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
  diagonal matrices and C is well-conditioned matrix. In that case, complete
  pivoting in the first QR factorizations provides accuracy dependent on the
  condition number of C, and independent of D1, D2. Such higher accuracy is
  not completely understood theoretically, but it works well in practice.
  Further, if A can be written as A = B*D, with well-conditioned B and some
  diagonal D, then the high accuracy is guaranteed, both theoretically and
  in software, independent of D. For more details see [1], [2].
     The computational range for the singular values can be the full range
  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
  & LAPACK routines called by DGEJSV are implemented to work in that range.
  If that is not the case, then the restriction for safe computation with
  the singular values in the range of normalized IEEE numbers is that the
  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
  overflow. This code (DGEJSV) is best used in this restricted range,
  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
  returned as zeros. See JOBR for details on this.
     Further, this implementation is somewhat slower than the one described
  in [1,2] due to replacement of some non-LAPACK components, and because
  the choice of some tuning parameters in the iterative part (DGESVJ) is
  left to the implementer on a particular machine.
     The rank revealing QR factorization (in this code: DGEQP3) should be
  implemented as in [3]. We have a new version of DGEQP3 under development
  that is more robust than the current one in LAPACK, with a cleaner cut in
  rank defficient cases. It will be available in the SIGMA library [4].
  If M is much larger than N, it is obvious that the inital QRF with
  column pivoting can be preprocessed by the QRF without pivoting. That
  well known trick is not used in DGEJSV because in some cases heavy row
  weighting can be treated with complete pivoting. The overhead in cases
  M much larger than N is then only due to pivoting, but the benefits in
  terms of accuracy have prevailed. The implementer/user can incorporate
  this extra QRF step easily. The implementer can also improve data movement
  (matrix transpose, matrix copy, matrix transposed copy) - this
  implementation of DGEJSV uses only the simplest, naive data movement.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
 [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
     factorization software - a case study.
     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
     LAPACK Working note 176.
 [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 477 of file dgejsv.f.

477 *
478 * -- LAPACK computational routine (version 3.6.0) --
479 * -- LAPACK is a software package provided by Univ. of Tennessee, --
480 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
481 * November 2015
482 *
483 * .. Scalar Arguments ..
484  IMPLICIT NONE
485  INTEGER info, lda, ldu, ldv, lwork, m, n
486 * ..
487 * .. Array Arguments ..
488  DOUBLE PRECISION a( lda, * ), sva( n ), u( ldu, * ), v( ldv, * ),
489  $ work( lwork )
490  INTEGER iwork( * )
491  CHARACTER*1 joba, jobp, jobr, jobt, jobu, jobv
492 * ..
493 *
494 * ===========================================================================
495 *
496 * .. Local Parameters ..
497  DOUBLE PRECISION zero, one
498  parameter( zero = 0.0d0, one = 1.0d0 )
499 * ..
500 * .. Local Scalars ..
501  DOUBLE PRECISION aapp, aaqq, aatmax, aatmin, big, big1, cond_ok,
502  $ condr1, condr2, entra, entrat, epsln, maxprj, scalem,
503  $ sconda, sfmin, small, temp1, uscal1, uscal2, xsc
504  INTEGER ierr, n1, nr, numrank, p, q, warning
505  LOGICAL almort, defr, errest, goscal, jracc, kill, lsvec,
506  $ l2aber, l2kill, l2pert, l2rank, l2tran,
507  $ noscal, rowpiv, rsvec, transp
508 * ..
509 * .. Intrinsic Functions ..
510  INTRINSIC dabs, dlog, max, min, dble, idnint, dsign, dsqrt
511 * ..
512 * .. External Functions ..
513  DOUBLE PRECISION dlamch, dnrm2
514  INTEGER idamax
515  LOGICAL lsame
516  EXTERNAL idamax, lsame, dlamch, dnrm2
517 * ..
518 * .. External Subroutines ..
519  EXTERNAL dcopy, dgelqf, dgeqp3, dgeqrf, dlacpy, dlascl,
522 *
523  EXTERNAL dgesvj
524 * ..
525 *
526 * Test the input arguments
527 *
528  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
529  jracc = lsame( jobv, 'J' )
530  rsvec = lsame( jobv, 'V' ) .OR. jracc
531  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
532  l2rank = lsame( joba, 'R' )
533  l2aber = lsame( joba, 'A' )
534  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
535  l2tran = lsame( jobt, 'T' )
536  l2kill = lsame( jobr, 'R' )
537  defr = lsame( jobr, 'N' )
538  l2pert = lsame( jobp, 'P' )
539 *
540  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
541  $ errest .OR. lsame( joba, 'C' ) )) THEN
542  info = - 1
543  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
544  $ lsame( jobu, 'W' )) ) THEN
545  info = - 2
546  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
547  $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
548  info = - 3
549  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
550  info = - 4
551  ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
552  info = - 5
553  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
554  info = - 6
555  ELSE IF ( m .LT. 0 ) THEN
556  info = - 7
557  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
558  info = - 8
559  ELSE IF ( lda .LT. m ) THEN
560  info = - 10
561  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
562  info = - 13
563  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
564  info = - 14
565  ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
566  & (lwork .LT. max(7,4*n+1,2*m+n))) .OR.
567  & (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
568  & (lwork .LT. max(7,4*n+n*n,2*m+n))) .OR.
569  & (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
570  & .OR.
571  & (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
572  & .OR.
573  & (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
574  & (lwork.LT.max(2*m+n,6*n+2*n*n)))
575  & .OR. (lsvec .AND. rsvec .AND. jracc .AND.
576  & lwork.LT.max(2*m+n,4*n+n*n,2*n+n*n+6)))
577  & THEN
578  info = - 17
579  ELSE
580 * #:)
581  info = 0
582  END IF
583 *
584  IF ( info .NE. 0 ) THEN
585 * #:(
586  CALL xerbla( 'DGEJSV', - info )
587  RETURN
588  END IF
589 *
590 * Quick return for void matrix (Y3K safe)
591 * #:)
592  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) RETURN
593 *
594 * Determine whether the matrix U should be M x N or M x M
595 *
596  IF ( lsvec ) THEN
597  n1 = n
598  IF ( lsame( jobu, 'F' ) ) n1 = m
599  END IF
600 *
601 * Set numerical parameters
602 *
603 *! NOTE: Make sure DLAMCH() does not fail on the target architecture.
604 *
605  epsln = dlamch('Epsilon')
606  sfmin = dlamch('SafeMinimum')
607  small = sfmin / epsln
608  big = dlamch('O')
609 * BIG = ONE / SFMIN
610 *
611 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
612 *
613 *(!) If necessary, scale SVA() to protect the largest norm from
614 * overflow. It is possible that this scaling pushes the smallest
615 * column norm left from the underflow threshold (extreme case).
616 *
617  scalem = one / dsqrt(dble(m)*dble(n))
618  noscal = .true.
619  goscal = .true.
620  DO 1874 p = 1, n
621  aapp = zero
622  aaqq = one
623  CALL dlassq( m, a(1,p), 1, aapp, aaqq )
624  IF ( aapp .GT. big ) THEN
625  info = - 9
626  CALL xerbla( 'DGEJSV', -info )
627  RETURN
628  END IF
629  aaqq = dsqrt(aaqq)
630  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
631  sva(p) = aapp * aaqq
632  ELSE
633  noscal = .false.
634  sva(p) = aapp * ( aaqq * scalem )
635  IF ( goscal ) THEN
636  goscal = .false.
637  CALL dscal( p-1, scalem, sva, 1 )
638  END IF
639  END IF
640  1874 CONTINUE
641 *
642  IF ( noscal ) scalem = one
643 *
644  aapp = zero
645  aaqq = big
646  DO 4781 p = 1, n
647  aapp = max( aapp, sva(p) )
648  IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
649  4781 CONTINUE
650 *
651 * Quick return for zero M x N matrix
652 * #:)
653  IF ( aapp .EQ. zero ) THEN
654  IF ( lsvec ) CALL dlaset( 'G', m, n1, zero, one, u, ldu )
655  IF ( rsvec ) CALL dlaset( 'G', n, n, zero, one, v, ldv )
656  work(1) = one
657  work(2) = one
658  IF ( errest ) work(3) = one
659  IF ( lsvec .AND. rsvec ) THEN
660  work(4) = one
661  work(5) = one
662  END IF
663  IF ( l2tran ) THEN
664  work(6) = zero
665  work(7) = zero
666  END IF
667  iwork(1) = 0
668  iwork(2) = 0
669  iwork(3) = 0
670  RETURN
671  END IF
672 *
673 * Issue warning if denormalized column norms detected. Override the
674 * high relative accuracy request. Issue licence to kill columns
675 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
676 * #:(
677  warning = 0
678  IF ( aaqq .LE. sfmin ) THEN
679  l2rank = .true.
680  l2kill = .true.
681  warning = 1
682  END IF
683 *
684 * Quick return for one-column matrix
685 * #:)
686  IF ( n .EQ. 1 ) THEN
687 *
688  IF ( lsvec ) THEN
689  CALL dlascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
690  CALL dlacpy( 'A', m, 1, a, lda, u, ldu )
691 * computing all M left singular vectors of the M x 1 matrix
692  IF ( n1 .NE. n ) THEN
693  CALL dgeqrf( m, n, u,ldu, work, work(n+1),lwork-n,ierr )
694  CALL dorgqr( m,n1,1, u,ldu,work,work(n+1),lwork-n,ierr )
695  CALL dcopy( m, a(1,1), 1, u(1,1), 1 )
696  END IF
697  END IF
698  IF ( rsvec ) THEN
699  v(1,1) = one
700  END IF
701  IF ( sva(1) .LT. (big*scalem) ) THEN
702  sva(1) = sva(1) / scalem
703  scalem = one
704  END IF
705  work(1) = one / scalem
706  work(2) = one
707  IF ( sva(1) .NE. zero ) THEN
708  iwork(1) = 1
709  IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
710  iwork(2) = 1
711  ELSE
712  iwork(2) = 0
713  END IF
714  ELSE
715  iwork(1) = 0
716  iwork(2) = 0
717  END IF
718  IF ( errest ) work(3) = one
719  IF ( lsvec .AND. rsvec ) THEN
720  work(4) = one
721  work(5) = one
722  END IF
723  IF ( l2tran ) THEN
724  work(6) = zero
725  work(7) = zero
726  END IF
727  RETURN
728 *
729  END IF
730 *
731  transp = .false.
732  l2tran = l2tran .AND. ( m .EQ. n )
733 *
734  aatmax = -one
735  aatmin = big
736  IF ( rowpiv .OR. l2tran ) THEN
737 *
738 * Compute the row norms, needed to determine row pivoting sequence
739 * (in the case of heavily row weighted A, row pivoting is strongly
740 * advised) and to collect information needed to compare the
741 * structures of A * A^t and A^t * A (in the case L2TRAN.EQ..TRUE.).
742 *
743  IF ( l2tran ) THEN
744  DO 1950 p = 1, m
745  xsc = zero
746  temp1 = one
747  CALL dlassq( n, a(p,1), lda, xsc, temp1 )
748 * DLASSQ gets both the ell_2 and the ell_infinity norm
749 * in one pass through the vector
750  work(m+n+p) = xsc * scalem
751  work(n+p) = xsc * (scalem*dsqrt(temp1))
752  aatmax = max( aatmax, work(n+p) )
753  IF (work(n+p) .NE. zero) aatmin = min(aatmin,work(n+p))
754  1950 CONTINUE
755  ELSE
756  DO 1904 p = 1, m
757  work(m+n+p) = scalem*dabs( a(p,idamax(n,a(p,1),lda)) )
758  aatmax = max( aatmax, work(m+n+p) )
759  aatmin = min( aatmin, work(m+n+p) )
760  1904 CONTINUE
761  END IF
762 *
763  END IF
764 *
765 * For square matrix A try to determine whether A^t would be better
766 * input for the preconditioned Jacobi SVD, with faster convergence.
767 * The decision is based on an O(N) function of the vector of column
768 * and row norms of A, based on the Shannon entropy. This should give
769 * the right choice in most cases when the difference actually matters.
770 * It may fail and pick the slower converging side.
771 *
772  entra = zero
773  entrat = zero
774  IF ( l2tran ) THEN
775 *
776  xsc = zero
777  temp1 = one
778  CALL dlassq( n, sva, 1, xsc, temp1 )
779  temp1 = one / temp1
780 *
781  entra = zero
782  DO 1113 p = 1, n
783  big1 = ( ( sva(p) / xsc )**2 ) * temp1
784  IF ( big1 .NE. zero ) entra = entra + big1 * dlog(big1)
785  1113 CONTINUE
786  entra = - entra / dlog(dble(n))
787 *
788 * Now, SVA().^2/Trace(A^t * A) is a point in the probability simplex.
789 * It is derived from the diagonal of A^t * A. Do the same with the
790 * diagonal of A * A^t, compute the entropy of the corresponding
791 * probability distribution. Note that A * A^t and A^t * A have the
792 * same trace.
793 *
794  entrat = zero
795  DO 1114 p = n+1, n+m
796  big1 = ( ( work(p) / xsc )**2 ) * temp1
797  IF ( big1 .NE. zero ) entrat = entrat + big1 * dlog(big1)
798  1114 CONTINUE
799  entrat = - entrat / dlog(dble(m))
800 *
801 * Analyze the entropies and decide A or A^t. Smaller entropy
802 * usually means better input for the algorithm.
803 *
804  transp = ( entrat .LT. entra )
805 *
806 * If A^t is better than A, transpose A.
807 *
808  IF ( transp ) THEN
809 * In an optimal implementation, this trivial transpose
810 * should be replaced with faster transpose.
811  DO 1115 p = 1, n - 1
812  DO 1116 q = p + 1, n
813  temp1 = a(q,p)
814  a(q,p) = a(p,q)
815  a(p,q) = temp1
816  1116 CONTINUE
817  1115 CONTINUE
818  DO 1117 p = 1, n
819  work(m+n+p) = sva(p)
820  sva(p) = work(n+p)
821  1117 CONTINUE
822  temp1 = aapp
823  aapp = aatmax
824  aatmax = temp1
825  temp1 = aaqq
826  aaqq = aatmin
827  aatmin = temp1
828  kill = lsvec
829  lsvec = rsvec
830  rsvec = kill
831  IF ( lsvec ) n1 = n
832 *
833  rowpiv = .true.
834  END IF
835 *
836  END IF
837 * END IF L2TRAN
838 *
839 * Scale the matrix so that its maximal singular value remains less
840 * than DSQRT(BIG) -- the matrix is scaled so that its maximal column
841 * has Euclidean norm equal to DSQRT(BIG/N). The only reason to keep
842 * DSQRT(BIG) instead of BIG is the fact that DGEJSV uses LAPACK and
843 * BLAS routines that, in some implementations, are not capable of
844 * working in the full interval [SFMIN,BIG] and that they may provoke
845 * overflows in the intermediate results. If the singular values spread
846 * from SFMIN to BIG, then DGESVJ will compute them. So, in that case,
847 * one should use DGESVJ instead of DGEJSV.
848 *
849  big1 = dsqrt( big )
850  temp1 = dsqrt( big / dble(n) )
851 *
852  CALL dlascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
853  IF ( aaqq .GT. (aapp * sfmin) ) THEN
854  aaqq = ( aaqq / aapp ) * temp1
855  ELSE
856  aaqq = ( aaqq * temp1 ) / aapp
857  END IF
858  temp1 = temp1 * scalem
859  CALL dlascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
860 *
861 * To undo scaling at the end of this procedure, multiply the
862 * computed singular values with USCAL2 / USCAL1.
863 *
864  uscal1 = temp1
865  uscal2 = aapp
866 *
867  IF ( l2kill ) THEN
868 * L2KILL enforces computation of nonzero singular values in
869 * the restricted range of condition number of the initial A,
870 * sigma_max(A) / sigma_min(A) approx. DSQRT(BIG)/DSQRT(SFMIN).
871  xsc = dsqrt( sfmin )
872  ELSE
873  xsc = small
874 *
875 * Now, if the condition number of A is too big,
876 * sigma_max(A) / sigma_min(A) .GT. DSQRT(BIG/N) * EPSLN / SFMIN,
877 * as a precaution measure, the full SVD is computed using DGESVJ
878 * with accumulated Jacobi rotations. This provides numerically
879 * more robust computation, at the cost of slightly increased run
880 * time. Depending on the concrete implementation of BLAS and LAPACK
881 * (i.e. how they behave in presence of extreme ill-conditioning) the
882 * implementor may decide to remove this switch.
883  IF ( ( aaqq.LT.dsqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
884  jracc = .true.
885  END IF
886 *
887  END IF
888  IF ( aaqq .LT. xsc ) THEN
889  DO 700 p = 1, n
890  IF ( sva(p) .LT. xsc ) THEN
891  CALL dlaset( 'A', m, 1, zero, zero, a(1,p), lda )
892  sva(p) = zero
893  END IF
894  700 CONTINUE
895  END IF
896 *
897 * Preconditioning using QR factorization with pivoting
898 *
899  IF ( rowpiv ) THEN
900 * Optional row permutation (Bjoerck row pivoting):
901 * A result by Cox and Higham shows that the Bjoerck's
902 * row pivoting combined with standard column pivoting
903 * has similar effect as Powell-Reid complete pivoting.
904 * The ell-infinity norms of A are made nonincreasing.
905  DO 1952 p = 1, m - 1
906  q = idamax( m-p+1, work(m+n+p), 1 ) + p - 1
907  iwork(2*n+p) = q
908  IF ( p .NE. q ) THEN
909  temp1 = work(m+n+p)
910  work(m+n+p) = work(m+n+q)
911  work(m+n+q) = temp1
912  END IF
913  1952 CONTINUE
914  CALL dlaswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
915  END IF
916 *
917 * End of the preparation phase (scaling, optional sorting and
918 * transposing, optional flushing of small columns).
919 *
920 * Preconditioning
921 *
922 * If the full SVD is needed, the right singular vectors are computed
923 * from a matrix equation, and for that we need theoretical analysis
924 * of the Businger-Golub pivoting. So we use DGEQP3 as the first RR QRF.
925 * In all other cases the first RR QRF can be chosen by other criteria
926 * (eg speed by replacing global with restricted window pivoting, such
927 * as in SGEQPX from TOMS # 782). Good results will be obtained using
928 * SGEQPX with properly (!) chosen numerical parameters.
929 * Any improvement of DGEQP3 improves overal performance of DGEJSV.
930 *
931 * A * P1 = Q1 * [ R1^t 0]^t:
932  DO 1963 p = 1, n
933 * .. all columns are free columns
934  iwork(p) = 0
935  1963 CONTINUE
936  CALL dgeqp3( m,n,a,lda, iwork,work, work(n+1),lwork-n, ierr )
937 *
938 * The upper triangular matrix R1 from the first QRF is inspected for
939 * rank deficiency and possibilities for deflation, or possible
940 * ill-conditioning. Depending on the user specified flag L2RANK,
941 * the procedure explores possibilities to reduce the numerical
942 * rank by inspecting the computed upper triangular factor. If
943 * L2RANK or L2ABER are up, then DGEJSV will compute the SVD of
944 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
945 *
946  nr = 1
947  IF ( l2aber ) THEN
948 * Standard absolute error bound suffices. All sigma_i with
949 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
950 * agressive enforcement of lower numerical rank by introducing a
951 * backward error of the order of N*EPSLN*||A||.
952  temp1 = dsqrt(dble(n))*epsln
953  DO 3001 p = 2, n
954  IF ( dabs(a(p,p)) .GE. (temp1*dabs(a(1,1))) ) THEN
955  nr = nr + 1
956  ELSE
957  GO TO 3002
958  END IF
959  3001 CONTINUE
960  3002 CONTINUE
961  ELSE IF ( l2rank ) THEN
962 * .. similarly as above, only slightly more gentle (less agressive).
963 * Sudden drop on the diagonal of R1 is used as the criterion for
964 * close-to-rank-defficient.
965  temp1 = dsqrt(sfmin)
966  DO 3401 p = 2, n
967  IF ( ( dabs(a(p,p)) .LT. (epsln*dabs(a(p-1,p-1))) ) .OR.
968  $ ( dabs(a(p,p)) .LT. small ) .OR.
969  $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3402
970  nr = nr + 1
971  3401 CONTINUE
972  3402 CONTINUE
973 *
974  ELSE
975 * The goal is high relative accuracy. However, if the matrix
976 * has high scaled condition number the relative accuracy is in
977 * general not feasible. Later on, a condition number estimator
978 * will be deployed to estimate the scaled condition number.
979 * Here we just remove the underflowed part of the triangular
980 * factor. This prevents the situation in which the code is
981 * working hard to get the accuracy not warranted by the data.
982  temp1 = dsqrt(sfmin)
983  DO 3301 p = 2, n
984  IF ( ( dabs(a(p,p)) .LT. small ) .OR.
985  $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3302
986  nr = nr + 1
987  3301 CONTINUE
988  3302 CONTINUE
989 *
990  END IF
991 *
992  almort = .false.
993  IF ( nr .EQ. n ) THEN
994  maxprj = one
995  DO 3051 p = 2, n
996  temp1 = dabs(a(p,p)) / sva(iwork(p))
997  maxprj = min( maxprj, temp1 )
998  3051 CONTINUE
999  IF ( maxprj**2 .GE. one - dble(n)*epsln ) almort = .true.
1000  END IF
1001 *
1002 *
1003  sconda = - one
1004  condr1 = - one
1005  condr2 = - one
1006 *
1007  IF ( errest ) THEN
1008  IF ( n .EQ. nr ) THEN
1009  IF ( rsvec ) THEN
1010 * .. V is available as workspace
1011  CALL dlacpy( 'U', n, n, a, lda, v, ldv )
1012  DO 3053 p = 1, n
1013  temp1 = sva(iwork(p))
1014  CALL dscal( p, one/temp1, v(1,p), 1 )
1015  3053 CONTINUE
1016  CALL dpocon( 'U', n, v, ldv, one, temp1,
1017  $ work(n+1), iwork(2*n+m+1), ierr )
1018  ELSE IF ( lsvec ) THEN
1019 * .. U is available as workspace
1020  CALL dlacpy( 'U', n, n, a, lda, u, ldu )
1021  DO 3054 p = 1, n
1022  temp1 = sva(iwork(p))
1023  CALL dscal( p, one/temp1, u(1,p), 1 )
1024  3054 CONTINUE
1025  CALL dpocon( 'U', n, u, ldu, one, temp1,
1026  $ work(n+1), iwork(2*n+m+1), ierr )
1027  ELSE
1028  CALL dlacpy( 'U', n, n, a, lda, work(n+1), n )
1029  DO 3052 p = 1, n
1030  temp1 = sva(iwork(p))
1031  CALL dscal( p, one/temp1, work(n+(p-1)*n+1), 1 )
1032  3052 CONTINUE
1033 * .. the columns of R are scaled to have unit Euclidean lengths.
1034  CALL dpocon( 'U', n, work(n+1), n, one, temp1,
1035  $ work(n+n*n+1), iwork(2*n+m+1), ierr )
1036  END IF
1037  sconda = one / dsqrt(temp1)
1038 * SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
1039 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1040  ELSE
1041  sconda = - one
1042  END IF
1043  END IF
1044 *
1045  l2pert = l2pert .AND. ( dabs( a(1,1)/a(nr,nr) ) .GT. dsqrt(big1) )
1046 * If there is no violent scaling, artificial perturbation is not needed.
1047 *
1048 * Phase 3:
1049 *
1050  IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1051 *
1052 * Singular Values only
1053 *
1054 * .. transpose A(1:NR,1:N)
1055  DO 1946 p = 1, min( n-1, nr )
1056  CALL dcopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1057  1946 CONTINUE
1058 *
1059 * The following two DO-loops introduce small relative perturbation
1060 * into the strict upper triangle of the lower triangular matrix.
1061 * Small entries below the main diagonal are also changed.
1062 * This modification is useful if the computing environment does not
1063 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1064 * annoying denormalized numbers in case of strongly scaled matrices.
1065 * The perturbation is structured so that it does not introduce any
1066 * new perturbation of the singular values, and it does not destroy
1067 * the job done by the preconditioner.
1068 * The licence for this perturbation is in the variable L2PERT, which
1069 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1070 *
1071  IF ( .NOT. almort ) THEN
1072 *
1073  IF ( l2pert ) THEN
1074 * XSC = DSQRT(SMALL)
1075  xsc = epsln / dble(n)
1076  DO 4947 q = 1, nr
1077  temp1 = xsc*dabs(a(q,q))
1078  DO 4949 p = 1, n
1079  IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1080  $ .OR. ( p .LT. q ) )
1081  $ a(p,q) = dsign( temp1, a(p,q) )
1082  4949 CONTINUE
1083  4947 CONTINUE
1084  ELSE
1085  CALL dlaset( 'U', nr-1,nr-1, zero,zero, a(1,2),lda )
1086  END IF
1087 *
1088 * .. second preconditioning using the QR factorization
1089 *
1090  CALL dgeqrf( n,nr, a,lda, work, work(n+1),lwork-n, ierr )
1091 *
1092 * .. and transpose upper to lower triangular
1093  DO 1948 p = 1, nr - 1
1094  CALL dcopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1095  1948 CONTINUE
1096 *
1097  END IF
1098 *
1099 * Row-cyclic Jacobi SVD algorithm with column pivoting
1100 *
1101 * .. again some perturbation (a "background noise") is added
1102 * to drown denormals
1103  IF ( l2pert ) THEN
1104 * XSC = DSQRT(SMALL)
1105  xsc = epsln / dble(n)
1106  DO 1947 q = 1, nr
1107  temp1 = xsc*dabs(a(q,q))
1108  DO 1949 p = 1, nr
1109  IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1110  $ .OR. ( p .LT. q ) )
1111  $ a(p,q) = dsign( temp1, a(p,q) )
1112  1949 CONTINUE
1113  1947 CONTINUE
1114  ELSE
1115  CALL dlaset( 'U', nr-1, nr-1, zero, zero, a(1,2), lda )
1116  END IF
1117 *
1118 * .. and one-sided Jacobi rotations are started on a lower
1119 * triangular matrix (plus perturbation which is ignored in
1120 * the part which destroys triangular form (confusing?!))
1121 *
1122  CALL dgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1123  $ n, v, ldv, work, lwork, info )
1124 *
1125  scalem = work(1)
1126  numrank = idnint(work(2))
1127 *
1128 *
1129  ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1130 *
1131 * -> Singular Values and Right Singular Vectors <-
1132 *
1133  IF ( almort ) THEN
1134 *
1135 * .. in this case NR equals N
1136  DO 1998 p = 1, nr
1137  CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1138  1998 CONTINUE
1139  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1140 *
1141  CALL dgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1142  $ work, lwork, info )
1143  scalem = work(1)
1144  numrank = idnint(work(2))
1145 
1146  ELSE
1147 *
1148 * .. two more QR factorizations ( one QRF is not enough, two require
1149 * accumulated product of Jacobi rotations, three are perfect )
1150 *
1151  CALL dlaset( 'Lower', nr-1, nr-1, zero, zero, a(2,1), lda )
1152  CALL dgelqf( nr, n, a, lda, work, work(n+1), lwork-n, ierr)
1153  CALL dlacpy( 'Lower', nr, nr, a, lda, v, ldv )
1154  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1155  CALL dgeqrf( nr, nr, v, ldv, work(n+1), work(2*n+1),
1156  $ lwork-2*n, ierr )
1157  DO 8998 p = 1, nr
1158  CALL dcopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1159  8998 CONTINUE
1160  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1161 *
1162  CALL dgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1163  $ ldu, work(n+1), lwork, info )
1164  scalem = work(n+1)
1165  numrank = idnint(work(n+2))
1166  IF ( nr .LT. n ) THEN
1167  CALL dlaset( 'A',n-nr, nr, zero,zero, v(nr+1,1), ldv )
1168  CALL dlaset( 'A',nr, n-nr, zero,zero, v(1,nr+1), ldv )
1169  CALL dlaset( 'A',n-nr,n-nr,zero,one, v(nr+1,nr+1), ldv )
1170  END IF
1171 *
1172  CALL dormlq( 'Left', 'Transpose', n, n, nr, a, lda, work,
1173  $ v, ldv, work(n+1), lwork-n, ierr )
1174 *
1175  END IF
1176 *
1177  DO 8991 p = 1, n
1178  CALL dcopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1179  8991 CONTINUE
1180  CALL dlacpy( 'All', n, n, a, lda, v, ldv )
1181 *
1182  IF ( transp ) THEN
1183  CALL dlacpy( 'All', n, n, v, ldv, u, ldu )
1184  END IF
1185 *
1186  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1187 *
1188 * .. Singular Values and Left Singular Vectors ..
1189 *
1190 * .. second preconditioning step to avoid need to accumulate
1191 * Jacobi rotations in the Jacobi iterations.
1192  DO 1965 p = 1, nr
1193  CALL dcopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1194  1965 CONTINUE
1195  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1196 *
1197  CALL dgeqrf( n, nr, u, ldu, work(n+1), work(2*n+1),
1198  $ lwork-2*n, ierr )
1199 *
1200  DO 1967 p = 1, nr - 1
1201  CALL dcopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1202  1967 CONTINUE
1203  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1204 *
1205  CALL dgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1206  $ lda, work(n+1), lwork-n, info )
1207  scalem = work(n+1)
1208  numrank = idnint(work(n+2))
1209 *
1210  IF ( nr .LT. m ) THEN
1211  CALL dlaset( 'A', m-nr, nr,zero, zero, u(nr+1,1), ldu )
1212  IF ( nr .LT. n1 ) THEN
1213  CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1), ldu )
1214  CALL dlaset( 'A',m-nr,n1-nr,zero,one,u(nr+1,nr+1), ldu )
1215  END IF
1216  END IF
1217 *
1218  CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1219  $ ldu, work(n+1), lwork-n, ierr )
1220 *
1221  IF ( rowpiv )
1222  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1223 *
1224  DO 1974 p = 1, n1
1225  xsc = one / dnrm2( m, u(1,p), 1 )
1226  CALL dscal( m, xsc, u(1,p), 1 )
1227  1974 CONTINUE
1228 *
1229  IF ( transp ) THEN
1230  CALL dlacpy( 'All', n, n, u, ldu, v, ldv )
1231  END IF
1232 *
1233  ELSE
1234 *
1235 * .. Full SVD ..
1236 *
1237  IF ( .NOT. jracc ) THEN
1238 *
1239  IF ( .NOT. almort ) THEN
1240 *
1241 * Second Preconditioning Step (QRF [with pivoting])
1242 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1243 * equivalent to an LQF CALL. Since in many libraries the QRF
1244 * seems to be better optimized than the LQF, we do explicit
1245 * transpose and use the QRF. This is subject to changes in an
1246 * optimized implementation of DGEJSV.
1247 *
1248  DO 1968 p = 1, nr
1249  CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1250  1968 CONTINUE
1251 *
1252 * .. the following two loops perturb small entries to avoid
1253 * denormals in the second QR factorization, where they are
1254 * as good as zeros. This is done to avoid painfully slow
1255 * computation with denormals. The relative size of the perturbation
1256 * is a parameter that can be changed by the implementer.
1257 * This perturbation device will be obsolete on machines with
1258 * properly implemented arithmetic.
1259 * To switch it off, set L2PERT=.FALSE. To remove it from the
1260 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1261 * The following two loops should be blocked and fused with the
1262 * transposed copy above.
1263 *
1264  IF ( l2pert ) THEN
1265  xsc = dsqrt(small)
1266  DO 2969 q = 1, nr
1267  temp1 = xsc*dabs( v(q,q) )
1268  DO 2968 p = 1, n
1269  IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1270  $ .OR. ( p .LT. q ) )
1271  $ v(p,q) = dsign( temp1, v(p,q) )
1272  IF ( p .LT. q ) v(p,q) = - v(p,q)
1273  2968 CONTINUE
1274  2969 CONTINUE
1275  ELSE
1276  CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1277  END IF
1278 *
1279 * Estimate the row scaled condition number of R1
1280 * (If R1 is rectangular, N > NR, then the condition number
1281 * of the leading NR x NR submatrix is estimated.)
1282 *
1283  CALL dlacpy( 'L', nr, nr, v, ldv, work(2*n+1), nr )
1284  DO 3950 p = 1, nr
1285  temp1 = dnrm2(nr-p+1,work(2*n+(p-1)*nr+p),1)
1286  CALL dscal(nr-p+1,one/temp1,work(2*n+(p-1)*nr+p),1)
1287  3950 CONTINUE
1288  CALL dpocon('Lower',nr,work(2*n+1),nr,one,temp1,
1289  $ work(2*n+nr*nr+1),iwork(m+2*n+1),ierr)
1290  condr1 = one / dsqrt(temp1)
1291 * .. here need a second oppinion on the condition number
1292 * .. then assume worst case scenario
1293 * R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1294 * more conservative <=> CONDR1 .LT. DSQRT(DBLE(N))
1295 *
1296  cond_ok = dsqrt(dble(nr))
1297 *[TP] COND_OK is a tuning parameter.
1298 
1299  IF ( condr1 .LT. cond_ok ) THEN
1300 * .. the second QRF without pivoting. Note: in an optimized
1301 * implementation, this QRF should be implemented as the QRF
1302 * of a lower triangular matrix.
1303 * R1^t = Q2 * R2
1304  CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1305  $ lwork-2*n, ierr )
1306 *
1307  IF ( l2pert ) THEN
1308  xsc = dsqrt(small)/epsln
1309  DO 3959 p = 2, nr
1310  DO 3958 q = 1, p - 1
1311  temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1312  IF ( dabs(v(q,p)) .LE. temp1 )
1313  $ v(q,p) = dsign( temp1, v(q,p) )
1314  3958 CONTINUE
1315  3959 CONTINUE
1316  END IF
1317 *
1318  IF ( nr .NE. n )
1319  $ CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1320 * .. save ...
1321 *
1322 * .. this transposed copy should be better than naive
1323  DO 1969 p = 1, nr - 1
1324  CALL dcopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1325  1969 CONTINUE
1326 *
1327  condr2 = condr1
1328 *
1329  ELSE
1330 *
1331 * .. ill-conditioned case: second QRF with pivoting
1332 * Note that windowed pivoting would be equaly good
1333 * numerically, and more run-time efficient. So, in
1334 * an optimal implementation, the next call to DGEQP3
1335 * should be replaced with eg. CALL SGEQPX (ACM TOMS #782)
1336 * with properly (carefully) chosen parameters.
1337 *
1338 * R1^t * P2 = Q2 * R2
1339  DO 3003 p = 1, nr
1340  iwork(n+p) = 0
1341  3003 CONTINUE
1342  CALL dgeqp3( n, nr, v, ldv, iwork(n+1), work(n+1),
1343  $ work(2*n+1), lwork-2*n, ierr )
1344 ** CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1345 ** $ LWORK-2*N, IERR )
1346  IF ( l2pert ) THEN
1347  xsc = dsqrt(small)
1348  DO 3969 p = 2, nr
1349  DO 3968 q = 1, p - 1
1350  temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1351  IF ( dabs(v(q,p)) .LE. temp1 )
1352  $ v(q,p) = dsign( temp1, v(q,p) )
1353  3968 CONTINUE
1354  3969 CONTINUE
1355  END IF
1356 *
1357  CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1358 *
1359  IF ( l2pert ) THEN
1360  xsc = dsqrt(small)
1361  DO 8970 p = 2, nr
1362  DO 8971 q = 1, p - 1
1363  temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1364  v(p,q) = - dsign( temp1, v(q,p) )
1365  8971 CONTINUE
1366  8970 CONTINUE
1367  ELSE
1368  CALL dlaset( 'L',nr-1,nr-1,zero,zero,v(2,1),ldv )
1369  END IF
1370 * Now, compute R2 = L3 * Q3, the LQ factorization.
1371  CALL dgelqf( nr, nr, v, ldv, work(2*n+n*nr+1),
1372  $ work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1373 * .. and estimate the condition number
1374  CALL dlacpy( 'L',nr,nr,v,ldv,work(2*n+n*nr+nr+1),nr )
1375  DO 4950 p = 1, nr
1376  temp1 = dnrm2( p, work(2*n+n*nr+nr+p), nr )
1377  CALL dscal( p, one/temp1, work(2*n+n*nr+nr+p), nr )
1378  4950 CONTINUE
1379  CALL dpocon( 'L',nr,work(2*n+n*nr+nr+1),nr,one,temp1,
1380  $ work(2*n+n*nr+nr+nr*nr+1),iwork(m+2*n+1),ierr )
1381  condr2 = one / dsqrt(temp1)
1382 *
1383  IF ( condr2 .GE. cond_ok ) THEN
1384 * .. save the Householder vectors used for Q3
1385 * (this overwrittes the copy of R2, as it will not be
1386 * needed in this branch, but it does not overwritte the
1387 * Huseholder vectors of Q2.).
1388  CALL dlacpy( 'U', nr, nr, v, ldv, work(2*n+1), n )
1389 * .. and the rest of the information on Q3 is in
1390 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1391  END IF
1392 *
1393  END IF
1394 *
1395  IF ( l2pert ) THEN
1396  xsc = dsqrt(small)
1397  DO 4968 q = 2, nr
1398  temp1 = xsc * v(q,q)
1399  DO 4969 p = 1, q - 1
1400 * V(p,q) = - DSIGN( TEMP1, V(q,p) )
1401  v(p,q) = - dsign( temp1, v(p,q) )
1402  4969 CONTINUE
1403  4968 CONTINUE
1404  ELSE
1405  CALL dlaset( 'U', nr-1,nr-1, zero,zero, v(1,2), ldv )
1406  END IF
1407 *
1408 * Second preconditioning finished; continue with Jacobi SVD
1409 * The input matrix is lower trinagular.
1410 *
1411 * Recover the right singular vectors as solution of a well
1412 * conditioned triangular matrix equation.
1413 *
1414  IF ( condr1 .LT. cond_ok ) THEN
1415 *
1416  CALL dgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u,
1417  $ ldu,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,info )
1418  scalem = work(2*n+n*nr+nr+1)
1419  numrank = idnint(work(2*n+n*nr+nr+2))
1420  DO 3970 p = 1, nr
1421  CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1422  CALL dscal( nr, sva(p), v(1,p), 1 )
1423  3970 CONTINUE
1424 
1425 * .. pick the right matrix equation and solve it
1426 *
1427  IF ( nr .EQ. n ) THEN
1428 * :)) .. best case, R1 is inverted. The solution of this matrix
1429 * equation is Q2*V2 = the product of the Jacobi rotations
1430 * used in DGESVJ, premultiplied with the orthogonal matrix
1431 * from the second QR factorization.
1432  CALL dtrsm( 'L','U','N','N', nr,nr,one, a,lda, v,ldv )
1433  ELSE
1434 * .. R1 is well conditioned, but non-square. Transpose(R2)
1435 * is inverted to get the product of the Jacobi rotations
1436 * used in DGESVJ. The Q-factor from the second QR
1437 * factorization is then built in explicitly.
1438  CALL dtrsm('L','U','T','N',nr,nr,one,work(2*n+1),
1439  $ n,v,ldv)
1440  IF ( nr .LT. n ) THEN
1441  CALL dlaset('A',n-nr,nr,zero,zero,v(nr+1,1),ldv)
1442  CALL dlaset('A',nr,n-nr,zero,zero,v(1,nr+1),ldv)
1443  CALL dlaset('A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv)
1444  END IF
1445  CALL dormqr('L','N',n,n,nr,work(2*n+1),n,work(n+1),
1446  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1447  END IF
1448 *
1449  ELSE IF ( condr2 .LT. cond_ok ) THEN
1450 *
1451 * :) .. the input matrix A is very likely a relative of
1452 * the Kahan matrix :)
1453 * The matrix R2 is inverted. The solution of the matrix equation
1454 * is Q3^T*V3 = the product of the Jacobi rotations (appplied to
1455 * the lower triangular L3 from the LQ factorization of
1456 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1457  CALL dgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1458  $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1459  scalem = work(2*n+n*nr+nr+1)
1460  numrank = idnint(work(2*n+n*nr+nr+2))
1461  DO 3870 p = 1, nr
1462  CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1463  CALL dscal( nr, sva(p), u(1,p), 1 )
1464  3870 CONTINUE
1465  CALL dtrsm('L','U','N','N',nr,nr,one,work(2*n+1),n,u,ldu)
1466 * .. apply the permutation from the second QR factorization
1467  DO 873 q = 1, nr
1468  DO 872 p = 1, nr
1469  work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1470  872 CONTINUE
1471  DO 874 p = 1, nr
1472  u(p,q) = work(2*n+n*nr+nr+p)
1473  874 CONTINUE
1474  873 CONTINUE
1475  IF ( nr .LT. n ) THEN
1476  CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1477  CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1478  CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1479  END IF
1480  CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1481  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1482  ELSE
1483 * Last line of defense.
1484 * #:( This is a rather pathological case: no scaled condition
1485 * improvement after two pivoted QR factorizations. Other
1486 * possibility is that the rank revealing QR factorization
1487 * or the condition estimator has failed, or the COND_OK
1488 * is set very close to ONE (which is unnecessary). Normally,
1489 * this branch should never be executed, but in rare cases of
1490 * failure of the RRQR or condition estimator, the last line of
1491 * defense ensures that DGEJSV completes the task.
1492 * Compute the full SVD of L3 using DGESVJ with explicit
1493 * accumulation of Jacobi rotations.
1494  CALL dgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1495  $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1496  scalem = work(2*n+n*nr+nr+1)
1497  numrank = idnint(work(2*n+n*nr+nr+2))
1498  IF ( nr .LT. n ) THEN
1499  CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1500  CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1501  CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1502  END IF
1503  CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1504  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1505 *
1506  CALL dormlq( 'L', 'T', nr, nr, nr, work(2*n+1), n,
1507  $ work(2*n+n*nr+1), u, ldu, work(2*n+n*nr+nr+1),
1508  $ lwork-2*n-n*nr-nr, ierr )
1509  DO 773 q = 1, nr
1510  DO 772 p = 1, nr
1511  work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1512  772 CONTINUE
1513  DO 774 p = 1, nr
1514  u(p,q) = work(2*n+n*nr+nr+p)
1515  774 CONTINUE
1516  773 CONTINUE
1517 *
1518  END IF
1519 *
1520 * Permute the rows of V using the (column) permutation from the
1521 * first QRF. Also, scale the columns to make them unit in
1522 * Euclidean norm. This applies to all cases.
1523 *
1524  temp1 = dsqrt(dble(n)) * epsln
1525  DO 1972 q = 1, n
1526  DO 972 p = 1, n
1527  work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1528  972 CONTINUE
1529  DO 973 p = 1, n
1530  v(p,q) = work(2*n+n*nr+nr+p)
1531  973 CONTINUE
1532  xsc = one / dnrm2( n, v(1,q), 1 )
1533  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1534  $ CALL dscal( n, xsc, v(1,q), 1 )
1535  1972 CONTINUE
1536 * At this moment, V contains the right singular vectors of A.
1537 * Next, assemble the left singular vector matrix U (M x N).
1538  IF ( nr .LT. m ) THEN
1539  CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1540  IF ( nr .LT. n1 ) THEN
1541  CALL dlaset('A',nr,n1-nr,zero,zero,u(1,nr+1),ldu)
1542  CALL dlaset('A',m-nr,n1-nr,zero,one,u(nr+1,nr+1),ldu)
1543  END IF
1544  END IF
1545 *
1546 * The Q matrix from the first QRF is built into the left singular
1547 * matrix U. This applies to all cases.
1548 *
1549  CALL dormqr( 'Left', 'No_Tr', m, n1, n, a, lda, work, u,
1550  $ ldu, work(n+1), lwork-n, ierr )
1551 
1552 * The columns of U are normalized. The cost is O(M*N) flops.
1553  temp1 = dsqrt(dble(m)) * epsln
1554  DO 1973 p = 1, nr
1555  xsc = one / dnrm2( m, u(1,p), 1 )
1556  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1557  $ CALL dscal( m, xsc, u(1,p), 1 )
1558  1973 CONTINUE
1559 *
1560 * If the initial QRF is computed with row pivoting, the left
1561 * singular vectors must be adjusted.
1562 *
1563  IF ( rowpiv )
1564  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1565 *
1566  ELSE
1567 *
1568 * .. the initial matrix A has almost orthogonal columns and
1569 * the second QRF is not needed
1570 *
1571  CALL dlacpy( 'Upper', n, n, a, lda, work(n+1), n )
1572  IF ( l2pert ) THEN
1573  xsc = dsqrt(small)
1574  DO 5970 p = 2, n
1575  temp1 = xsc * work( n + (p-1)*n + p )
1576  DO 5971 q = 1, p - 1
1577  work(n+(q-1)*n+p)=-dsign(temp1,work(n+(p-1)*n+q))
1578  5971 CONTINUE
1579  5970 CONTINUE
1580  ELSE
1581  CALL dlaset( 'Lower',n-1,n-1,zero,zero,work(n+2),n )
1582  END IF
1583 *
1584  CALL dgesvj( 'Upper', 'U', 'N', n, n, work(n+1), n, sva,
1585  $ n, u, ldu, work(n+n*n+1), lwork-n-n*n, info )
1586 *
1587  scalem = work(n+n*n+1)
1588  numrank = idnint(work(n+n*n+2))
1589  DO 6970 p = 1, n
1590  CALL dcopy( n, work(n+(p-1)*n+1), 1, u(1,p), 1 )
1591  CALL dscal( n, sva(p), work(n+(p-1)*n+1), 1 )
1592  6970 CONTINUE
1593 *
1594  CALL dtrsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1595  $ one, a, lda, work(n+1), n )
1596  DO 6972 p = 1, n
1597  CALL dcopy( n, work(n+p), n, v(iwork(p),1), ldv )
1598  6972 CONTINUE
1599  temp1 = dsqrt(dble(n))*epsln
1600  DO 6971 p = 1, n
1601  xsc = one / dnrm2( n, v(1,p), 1 )
1602  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1603  $ CALL dscal( n, xsc, v(1,p), 1 )
1604  6971 CONTINUE
1605 *
1606 * Assemble the left singular vector matrix U (M x N).
1607 *
1608  IF ( n .LT. m ) THEN
1609  CALL dlaset( 'A', m-n, n, zero, zero, u(n+1,1), ldu )
1610  IF ( n .LT. n1 ) THEN
1611  CALL dlaset( 'A',n, n1-n, zero, zero, u(1,n+1),ldu )
1612  CALL dlaset( 'A',m-n,n1-n, zero, one,u(n+1,n+1),ldu )
1613  END IF
1614  END IF
1615  CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1616  $ ldu, work(n+1), lwork-n, ierr )
1617  temp1 = dsqrt(dble(m))*epsln
1618  DO 6973 p = 1, n1
1619  xsc = one / dnrm2( m, u(1,p), 1 )
1620  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1621  $ CALL dscal( m, xsc, u(1,p), 1 )
1622  6973 CONTINUE
1623 *
1624  IF ( rowpiv )
1625  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1626 *
1627  END IF
1628 *
1629 * end of the >> almost orthogonal case << in the full SVD
1630 *
1631  ELSE
1632 *
1633 * This branch deploys a preconditioned Jacobi SVD with explicitly
1634 * accumulated rotations. It is included as optional, mainly for
1635 * experimental purposes. It does perfom well, and can also be used.
1636 * In this implementation, this branch will be automatically activated
1637 * if the condition number sigma_max(A) / sigma_min(A) is predicted
1638 * to be greater than the overflow threshold. This is because the
1639 * a posteriori computation of the singular vectors assumes robust
1640 * implementation of BLAS and some LAPACK procedures, capable of working
1641 * in presence of extreme values. Since that is not always the case, ...
1642 *
1643  DO 7968 p = 1, nr
1644  CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1645  7968 CONTINUE
1646 *
1647  IF ( l2pert ) THEN
1648  xsc = dsqrt(small/epsln)
1649  DO 5969 q = 1, nr
1650  temp1 = xsc*dabs( v(q,q) )
1651  DO 5968 p = 1, n
1652  IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1653  $ .OR. ( p .LT. q ) )
1654  $ v(p,q) = dsign( temp1, v(p,q) )
1655  IF ( p .LT. q ) v(p,q) = - v(p,q)
1656  5968 CONTINUE
1657  5969 CONTINUE
1658  ELSE
1659  CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1660  END IF
1661 
1662  CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1663  $ lwork-2*n, ierr )
1664  CALL dlacpy( 'L', n, nr, v, ldv, work(2*n+1), n )
1665 *
1666  DO 7969 p = 1, nr
1667  CALL dcopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1668  7969 CONTINUE
1669 
1670  IF ( l2pert ) THEN
1671  xsc = dsqrt(small/epsln)
1672  DO 9970 q = 2, nr
1673  DO 9971 p = 1, q - 1
1674  temp1 = xsc * min(dabs(u(p,p)),dabs(u(q,q)))
1675  u(p,q) = - dsign( temp1, u(q,p) )
1676  9971 CONTINUE
1677  9970 CONTINUE
1678  ELSE
1679  CALL dlaset('U', nr-1, nr-1, zero, zero, u(1,2), ldu )
1680  END IF
1681 
1682  CALL dgesvj( 'G', 'U', 'V', nr, nr, u, ldu, sva,
1683  $ n, v, ldv, work(2*n+n*nr+1), lwork-2*n-n*nr, info )
1684  scalem = work(2*n+n*nr+1)
1685  numrank = idnint(work(2*n+n*nr+2))
1686 
1687  IF ( nr .LT. n ) THEN
1688  CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1689  CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1690  CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1691  END IF
1692 
1693  CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1694  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1695 *
1696 * Permute the rows of V using the (column) permutation from the
1697 * first QRF. Also, scale the columns to make them unit in
1698 * Euclidean norm. This applies to all cases.
1699 *
1700  temp1 = dsqrt(dble(n)) * epsln
1701  DO 7972 q = 1, n
1702  DO 8972 p = 1, n
1703  work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1704  8972 CONTINUE
1705  DO 8973 p = 1, n
1706  v(p,q) = work(2*n+n*nr+nr+p)
1707  8973 CONTINUE
1708  xsc = one / dnrm2( n, v(1,q), 1 )
1709  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1710  $ CALL dscal( n, xsc, v(1,q), 1 )
1711  7972 CONTINUE
1712 *
1713 * At this moment, V contains the right singular vectors of A.
1714 * Next, assemble the left singular vector matrix U (M x N).
1715 *
1716  IF ( nr .LT. m ) THEN
1717  CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1718  IF ( nr .LT. n1 ) THEN
1719  CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1),ldu )
1720  CALL dlaset( 'A',m-nr,n1-nr, zero, one,u(nr+1,nr+1),ldu )
1721  END IF
1722  END IF
1723 *
1724  CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1725  $ ldu, work(n+1), lwork-n, ierr )
1726 *
1727  IF ( rowpiv )
1728  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1729 *
1730 *
1731  END IF
1732  IF ( transp ) THEN
1733 * .. swap U and V because the procedure worked on A^t
1734  DO 6974 p = 1, n
1735  CALL dswap( n, u(1,p), 1, v(1,p), 1 )
1736  6974 CONTINUE
1737  END IF
1738 *
1739  END IF
1740 * end of the full SVD
1741 *
1742 * Undo scaling, if necessary (and possible)
1743 *
1744  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1745  CALL dlascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
1746  uscal1 = one
1747  uscal2 = one
1748  END IF
1749 *
1750  IF ( nr .LT. n ) THEN
1751  DO 3004 p = nr+1, n
1752  sva(p) = zero
1753  3004 CONTINUE
1754  END IF
1755 *
1756  work(1) = uscal2 * scalem
1757  work(2) = uscal1
1758  IF ( errest ) work(3) = sconda
1759  IF ( lsvec .AND. rsvec ) THEN
1760  work(4) = condr1
1761  work(5) = condr2
1762  END IF
1763  IF ( l2tran ) THEN
1764  work(6) = entra
1765  work(7) = entrat
1766  END IF
1767 *
1768  iwork(1) = nr
1769  iwork(2) = numrank
1770  iwork(3) = warning
1771 *
1772  RETURN
1773 * ..
1774 * .. END OF DGEJSV
1775 * ..
subroutine dlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: dlaset.f:112
subroutine dgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, WORK, LWORK, INFO)
DGESVJ
Definition: dgesvj.f:339
subroutine dlacpy(UPLO, M, N, A, LDA, B, LDB)
DLACPY copies all or part of one two-dimensional array to another.
Definition: dlacpy.f:105
subroutine dgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
DGEQRF
Definition: dgeqrf.f:138
subroutine dlaswp(N, A, LDA, K1, K2, IPIV, INCX)
DLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: dlaswp.f:116
subroutine dgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, INFO)
DGEQP3
Definition: dgeqp3.f:153
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine dormqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
DORMQR
Definition: dormqr.f:169
subroutine dlassq(N, X, INCX, SCALE, SUMSQ)
DLASSQ updates a sum of squares represented in scaled form.
Definition: dlassq.f:105
subroutine dswap(N, DX, INCX, DY, INCY)
DSWAP
Definition: dswap.f:53
subroutine dormlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
DORMLQ
Definition: dormlq.f:169
double precision function dlamch(CMACH)
DLAMCH
Definition: dlamch.f:65
subroutine dorgqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
DORGQR
Definition: dorgqr.f:130
subroutine dtrsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
DTRSM
Definition: dtrsm.f:183
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine dgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
DGELQF
Definition: dgelqf.f:137
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:141
double precision function dnrm2(N, X, INCX)
DNRM2
Definition: dnrm2.f:56
integer function idamax(N, DX, INCX)
IDAMAX
Definition: idamax.f:53
subroutine dscal(N, DA, DX, INCX)
DSCAL
Definition: dscal.f:55
subroutine dcopy(N, DX, INCX, DY, INCY)
DCOPY
Definition: dcopy.f:53
subroutine dpocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, IWORK, INFO)
DPOCON
Definition: dpocon.f:123

Here is the call graph for this function:

Here is the caller graph for this function:

subroutine dgesdd ( character  JOBZ,
integer  M,
integer  N,
double precision, dimension( lda, * )  A,
integer  LDA,
double precision, dimension( * )  S,
double precision, dimension( ldu, * )  U,
integer  LDU,
double precision, dimension( ldvt, * )  VT,
integer  LDVT,
double precision, dimension( * )  WORK,
integer  LWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

DGESDD

Download DGESDD + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 DGESDD computes the singular value decomposition (SVD) of a real
 M-by-N matrix A, optionally computing the left and right singular
 vectors.  If singular vectors are desired, it uses a
 divide-and-conquer algorithm.

 The SVD is written

      A = U * SIGMA * transpose(V)

 where SIGMA is an M-by-N matrix which is zero except for its
 min(m,n) diagonal elements, U is an M-by-M orthogonal matrix, and
 V is an N-by-N orthogonal matrix.  The diagonal elements of SIGMA
 are the singular values of A; they are real and non-negative, and
 are returned in descending order.  The first min(m,n) columns of
 U and V are the left and right singular vectors of A.

 Note that the routine returns VT = V**T, not V.

 The divide and conquer algorithm makes very mild assumptions about
 floating point arithmetic. It will work on machines with a guard
 digit in add/subtract, or on those binary machines without guard
 digits which subtract like the Cray X-MP, Cray Y-MP, Cray C-90, or
 Cray-2. It could conceivably fail on hexadecimal or decimal machines
 without guard digits, but we know of none.
Parameters
[in]JOBZ
          JOBZ is CHARACTER*1
          Specifies options for computing all or part of the matrix U:
          = 'A':  all M columns of U and all N rows of V**T are
                  returned in the arrays U and VT;
          = 'S':  the first min(M,N) columns of U and the first
                  min(M,N) rows of V**T are returned in the arrays U
                  and VT;
          = 'O':  If M >= N, the first N columns of U are overwritten
                  on the array A and all rows of V**T are returned in
                  the array VT;
                  otherwise, all columns of U are returned in the
                  array U and the first M rows of V**T are overwritten
                  in the array A;
          = 'N':  no columns of U or rows of V**T are computed.
[in]M
          M is INTEGER
          The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.  N >= 0.
[in,out]A
          A is DOUBLE PRECISION array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit,
          if JOBZ = 'O',  A is overwritten with the first N columns
                          of U (the left singular vectors, stored
                          columnwise) if M >= N;
                          A is overwritten with the first M rows
                          of V**T (the right singular vectors, stored
                          rowwise) otherwise.
          if JOBZ .ne. 'O', the contents of A are destroyed.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]S
          S is DOUBLE PRECISION array, dimension (min(M,N))
          The singular values of A, sorted so that S(i) >= S(i+1).
[out]U
          U is DOUBLE PRECISION array, dimension (LDU,UCOL)
          UCOL = M if JOBZ = 'A' or JOBZ = 'O' and M < N;
          UCOL = min(M,N) if JOBZ = 'S'.
          If JOBZ = 'A' or JOBZ = 'O' and M < N, U contains the M-by-M
          orthogonal matrix U;
          if JOBZ = 'S', U contains the first min(M,N) columns of U
          (the left singular vectors, stored columnwise);
          if JOBZ = 'O' and M >= N, or JOBZ = 'N', U is not referenced.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U.  LDU >= 1; if
          JOBZ = 'S' or 'A' or JOBZ = 'O' and M < N, LDU >= M.
[out]VT
          VT is DOUBLE PRECISION array, dimension (LDVT,N)
          If JOBZ = 'A' or JOBZ = 'O' and M >= N, VT contains the
          N-by-N orthogonal matrix V**T;
          if JOBZ = 'S', VT contains the first min(M,N) rows of
          V**T (the right singular vectors, stored rowwise);
          if JOBZ = 'O' and M < N, or JOBZ = 'N', VT is not referenced.
[in]LDVT
          LDVT is INTEGER
          The leading dimension of the array VT.  LDVT >= 1; if
          JOBZ = 'A' or JOBZ = 'O' and M >= N, LDVT >= N;
          if JOBZ = 'S', LDVT >= min(M,N).
[out]WORK
          WORK is DOUBLE PRECISION array, dimension (MAX(1,LWORK))
          On exit, if INFO = 0, WORK(1) returns the optimal LWORK;
[in]LWORK
          LWORK is INTEGER
          The dimension of the array WORK. LWORK >= 1.
          If JOBZ = 'N',
            LWORK >= 3*min(M,N) + max(max(M,N),7*min(M,N)).
          If JOBZ = 'O',
            LWORK >= 3*min(M,N) + 
                     max(max(M,N),5*min(M,N)*min(M,N)+4*min(M,N)).
          If JOBZ = 'S' or 'A'
            LWORK >= min(M,N)*(7+4*min(M,N))
          For good performance, LWORK should generally be larger.
          If LWORK = -1 but other input arguments are legal, WORK(1)
          returns the optimal LWORK.
[out]IWORK
          IWORK is INTEGER array, dimension (8*min(M,N))
[out]INFO
          INFO is INTEGER
          = 0:  successful exit.
          < 0:  if INFO = -i, the i-th argument had an illegal value.
          > 0:  DBDSDC did not converge, updating process failed.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
November 2015
Contributors:
Ming Gu and Huan Ren, Computer Science Division, University of California at Berkeley, USA

Definition at line 218 of file dgesdd.f.

218 *
219 * -- LAPACK driver routine (version 3.6.0) --
220 * -- LAPACK is a software package provided by Univ. of Tennessee, --
221 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
222 * November 2015
223 *
224 * .. Scalar Arguments ..
225  CHARACTER jobz
226  INTEGER info, lda, ldu, ldvt, lwork, m, n
227 * ..
228 * .. Array Arguments ..
229  INTEGER iwork( * )
230  DOUBLE PRECISION a( lda, * ), s( * ), u( ldu, * ),
231  $ vt( ldvt, * ), work( * )
232 * ..
233 *
234 * =====================================================================
235 *
236 * .. Parameters ..
237  DOUBLE PRECISION zero, one
238  parameter( zero = 0.0d0, one = 1.0d0 )
239 * ..
240 * .. Local Scalars ..
241  LOGICAL lquery, wntqa, wntqas, wntqn, wntqo, wntqs
242  INTEGER bdspac, blk, chunk, i, ie, ierr, il,
243  $ ir, iscl, itau, itaup, itauq, iu, ivt, ldwkvt,
244  $ ldwrkl, ldwrkr, ldwrku, maxwrk, minmn, minwrk,
245  $ mnthr, nwork, wrkbl
246  DOUBLE PRECISION anrm, bignum, eps, smlnum
247 * ..
248 * .. Local Arrays ..
249  INTEGER idum( 1 )
250  DOUBLE PRECISION dum( 1 )
251 * ..
252 * .. External Subroutines ..
253  EXTERNAL dbdsdc, dgebrd, dgelqf, dgemm, dgeqrf, dlacpy,
255  $ xerbla
256 * ..
257 * .. External Functions ..
258  LOGICAL lsame
259  INTEGER ilaenv
260  DOUBLE PRECISION dlamch, dlange
261  EXTERNAL dlamch, dlange, ilaenv, lsame
262 * ..
263 * .. Intrinsic Functions ..
264  INTRINSIC int, max, min, sqrt
265 * ..
266 * .. Executable Statements ..
267 *
268 * Test the input arguments
269 *
270  info = 0
271  minmn = min( m, n )
272  wntqa = lsame( jobz, 'A' )
273  wntqs = lsame( jobz, 'S' )
274  wntqas = wntqa .OR. wntqs
275  wntqo = lsame( jobz, 'O' )
276  wntqn = lsame( jobz, 'N' )
277  lquery = ( lwork.EQ.-1 )
278 *
279  IF( .NOT.( wntqa .OR. wntqs .OR. wntqo .OR. wntqn ) ) THEN
280  info = -1
281  ELSE IF( m.LT.0 ) THEN
282  info = -2
283  ELSE IF( n.LT.0 ) THEN
284  info = -3
285  ELSE IF( lda.LT.max( 1, m ) ) THEN
286  info = -5
287  ELSE IF( ldu.LT.1 .OR. ( wntqas .AND. ldu.LT.m ) .OR.
288  $ ( wntqo .AND. m.LT.n .AND. ldu.LT.m ) ) THEN
289  info = -8
290  ELSE IF( ldvt.LT.1 .OR. ( wntqa .AND. ldvt.LT.n ) .OR.
291  $ ( wntqs .AND. ldvt.LT.minmn ) .OR.
292  $ ( wntqo .AND. m.GE.n .AND. ldvt.LT.n ) ) THEN
293  info = -10
294  END IF
295 *
296 * Compute workspace
297 * (Note: Comments in the code beginning "Workspace:" describe the
298 * minimal amount of workspace needed at that point in the code,
299 * as well as the preferred amount for good performance.
300 * NB refers to the optimal block size for the immediately
301 * following subroutine, as returned by ILAENV.)
302 *
303  IF( info.EQ.0 ) THEN
304  minwrk = 1
305  maxwrk = 1
306  IF( m.GE.n .AND. minmn.GT.0 ) THEN
307 *
308 * Compute space needed for DBDSDC
309 *
310  mnthr = int( minmn*11.0d0 / 6.0d0 )
311  IF( wntqn ) THEN
312  bdspac = 7*n
313  ELSE
314  bdspac = 3*n*n + 4*n
315  END IF
316  IF( m.GE.mnthr ) THEN
317  IF( wntqn ) THEN
318 *
319 * Path 1 (M much larger than N, JOBZ='N')
320 *
321  wrkbl = n + n*ilaenv( 1, 'DGEQRF', ' ', m, n, -1,
322  $ -1 )
323  wrkbl = max( wrkbl, 3*n+2*n*
324  $ ilaenv( 1, 'DGEBRD', ' ', n, n, -1, -1 ) )
325  maxwrk = max( wrkbl, bdspac+n )
326  minwrk = bdspac + n
327  ELSE IF( wntqo ) THEN
328 *
329 * Path 2 (M much larger than N, JOBZ='O')
330 *
331  wrkbl = n + n*ilaenv( 1, 'DGEQRF', ' ', m, n, -1, -1 )
332  wrkbl = max( wrkbl, n+n*ilaenv( 1, 'DORGQR', ' ', m,
333  $ n, n, -1 ) )
334  wrkbl = max( wrkbl, 3*n+2*n*
335  $ ilaenv( 1, 'DGEBRD', ' ', n, n, -1, -1 ) )
336  wrkbl = max( wrkbl, 3*n+n*
337  $ ilaenv( 1, 'DORMBR', 'QLN', n, n, n, -1 ) )
338  wrkbl = max( wrkbl, 3*n+n*
339  $ ilaenv( 1, 'DORMBR', 'PRT', n, n, n, -1 ) )
340  wrkbl = max( wrkbl, bdspac+3*n )
341  maxwrk = wrkbl + 2*n*n
342  minwrk = bdspac + 2*n*n + 3*n
343  ELSE IF( wntqs ) THEN
344 *
345 * Path 3 (M much larger than N, JOBZ='S')
346 *
347  wrkbl = n + n*ilaenv( 1, 'DGEQRF', ' ', m, n, -1, -1 )
348  wrkbl = max( wrkbl, n+n*ilaenv( 1, 'DORGQR', ' ', m,
349  $ n, n, -1 ) )
350  wrkbl = max( wrkbl, 3*n+2*n*
351  $ ilaenv( 1, 'DGEBRD', ' ', n, n, -1, -1 ) )
352  wrkbl = max( wrkbl, 3*n+n*
353  $ ilaenv( 1, 'DORMBR', 'QLN', n, n, n, -1 ) )
354  wrkbl = max( wrkbl, 3*n+n*
355  $ ilaenv( 1, 'DORMBR', 'PRT', n, n, n, -1 ) )
356  wrkbl = max( wrkbl, bdspac+3*n )
357  maxwrk = wrkbl + n*n
358  minwrk = bdspac + n*n + 3*n
359  ELSE IF( wntqa ) THEN
360 *
361 * Path 4 (M much larger than N, JOBZ='A')
362 *
363  wrkbl = n + n*ilaenv( 1, 'DGEQRF', ' ', m, n, -1, -1 )
364  wrkbl = max( wrkbl, n+m*ilaenv( 1, 'DORGQR', ' ', m,
365  $ m, n, -1 ) )
366  wrkbl = max( wrkbl, 3*n+2*n*
367  $ ilaenv( 1, 'DGEBRD', ' ', n, n, -1, -1 ) )
368  wrkbl = max( wrkbl, 3*n+n*
369  $ ilaenv( 1, 'DORMBR', 'QLN', n, n, n, -1 ) )
370  wrkbl = max( wrkbl, 3*n+n*
371  $ ilaenv( 1, 'DORMBR', 'PRT', n, n, n, -1 ) )
372  wrkbl = max( wrkbl, bdspac+3*n )
373  maxwrk = wrkbl + n*n
374  minwrk = bdspac + n*n + 2*n + m
375  END IF
376  ELSE
377 *
378 * Path 5 (M at least N, but not much larger)
379 *
380  wrkbl = 3*n + ( m+n )*ilaenv( 1, 'DGEBRD', ' ', m, n, -1,
381  $ -1 )
382  IF( wntqn ) THEN
383  maxwrk = max( wrkbl, bdspac+3*n )
384  minwrk = 3*n + max( m, bdspac )
385  ELSE IF( wntqo ) THEN
386  wrkbl = max( wrkbl, 3*n+n*
387  $ ilaenv( 1, 'DORMBR', 'QLN', m, n, n, -1 ) )
388  wrkbl = max( wrkbl, 3*n+n*
389  $ ilaenv( 1, 'DORMBR', 'PRT', n, n, n, -1 ) )
390  wrkbl = max( wrkbl, bdspac+3*n )
391  maxwrk = wrkbl + m*n
392  minwrk = 3*n + max( m, n*n+bdspac )
393  ELSE IF( wntqs ) THEN
394  wrkbl = max( wrkbl, 3*n+n*
395  $ ilaenv( 1, 'DORMBR', 'QLN', m, n, n, -1 ) )
396  wrkbl = max( wrkbl, 3*n+n*
397  $ ilaenv( 1, 'DORMBR', 'PRT', n, n, n, -1 ) )
398  maxwrk = max( wrkbl, bdspac+3*n )
399  minwrk = 3*n + max( m, bdspac )
400  ELSE IF( wntqa ) THEN
401  wrkbl = max( wrkbl, 3*n+m*
402  $ ilaenv( 1, 'DORMBR', 'QLN', m, m, n, -1 ) )
403  wrkbl = max( wrkbl, 3*n+n*
404  $ ilaenv( 1, 'DORMBR', 'PRT', n, n, n, -1 ) )
405  maxwrk = max( maxwrk, bdspac+3*n )
406  minwrk = 3*n + max( m, bdspac )
407  END IF
408  END IF
409  ELSE IF( minmn.GT.0 ) THEN
410 *
411 * Compute space needed for DBDSDC
412 *
413  mnthr = int( minmn*11.0d0 / 6.0d0 )
414  IF( wntqn ) THEN
415  bdspac = 7*m
416  ELSE
417  bdspac = 3*m*m + 4*m
418  END IF
419  IF( n.GE.mnthr ) THEN
420  IF( wntqn ) THEN
421 *
422 * Path 1t (N much larger than M, JOBZ='N')
423 *
424  wrkbl = m + m*ilaenv( 1, 'DGELQF', ' ', m, n, -1,
425  $ -1 )
426  wrkbl = max( wrkbl, 3*m+2*m*
427  $ ilaenv( 1, 'DGEBRD', ' ', m, m, -1, -1 ) )
428  maxwrk = max( wrkbl, bdspac+m )
429  minwrk = bdspac + m
430  ELSE IF( wntqo ) THEN
431 *
432 * Path 2t (N much larger than M, JOBZ='O')
433 *
434  wrkbl = m + m*ilaenv( 1, 'DGELQF', ' ', m, n, -1, -1 )
435  wrkbl = max( wrkbl, m+m*ilaenv( 1, 'DORGLQ', ' ', m,
436  $ n, m, -1 ) )
437  wrkbl = max( wrkbl, 3*m+2*m*
438  $ ilaenv( 1, 'DGEBRD', ' ', m, m, -1, -1 ) )
439  wrkbl = max( wrkbl, 3*m+m*
440  $ ilaenv( 1, 'DORMBR', 'QLN', m, m, m, -1 ) )
441  wrkbl = max( wrkbl, 3*m+m*
442  $ ilaenv( 1, 'DORMBR', 'PRT', m, m, m, -1 ) )
443  wrkbl = max( wrkbl, bdspac+3*m )
444  maxwrk = wrkbl + 2*m*m
445  minwrk = bdspac + 2*m*m + 3*m
446  ELSE IF( wntqs ) THEN
447 *
448 * Path 3t (N much larger than M, JOBZ='S')
449 *
450  wrkbl = m + m*ilaenv( 1, 'DGELQF', ' ', m, n, -1, -1 )
451  wrkbl = max( wrkbl, m+m*ilaenv( 1, 'DORGLQ', ' ', m,
452  $ n, m, -1 ) )
453  wrkbl = max( wrkbl, 3*m+2*m*
454  $ ilaenv( 1, 'DGEBRD', ' ', m, m, -1, -1 ) )
455  wrkbl = max( wrkbl, 3*m+m*
456  $ ilaenv( 1, 'DORMBR', 'QLN', m, m, m, -1 ) )
457  wrkbl = max( wrkbl, 3*m+m*
458  $ ilaenv( 1, 'DORMBR', 'PRT', m, m, m, -1 ) )
459  wrkbl = max( wrkbl, bdspac+3*m )
460  maxwrk = wrkbl + m*m
461  minwrk = bdspac + m*m + 3*m
462  ELSE IF( wntqa ) THEN
463 *
464 * Path 4t (N much larger than M, JOBZ='A')
465 *
466  wrkbl = m + m*ilaenv( 1, 'DGELQF', ' ', m, n, -1, -1 )
467  wrkbl = max( wrkbl, m+n*ilaenv( 1, 'DORGLQ', ' ', n,
468  $ n, m, -1 ) )
469  wrkbl = max( wrkbl, 3*m+2*m*
470  $ ilaenv( 1, 'DGEBRD', ' ', m, m, -1, -1 ) )
471  wrkbl = max( wrkbl, 3*m+m*
472  $ ilaenv( 1, 'DORMBR', 'QLN', m, m, m, -1 ) )
473  wrkbl = max( wrkbl, 3*m+m*
474  $ ilaenv( 1, 'DORMBR', 'PRT', m, m, m, -1 ) )
475  wrkbl = max( wrkbl, bdspac+3*m )
476  maxwrk = wrkbl + m*m
477  minwrk = bdspac + m*m + 3*m
478  END IF
479  ELSE
480 *
481 * Path 5t (N greater than M, but not much larger)
482 *
483  wrkbl = 3*m + ( m+n )*ilaenv( 1, 'DGEBRD', ' ', m, n, -1,
484  $ -1 )
485  IF( wntqn ) THEN
486  maxwrk = max( wrkbl, bdspac+3*m )
487  minwrk = 3*m + max( n, bdspac )
488  ELSE IF( wntqo ) THEN
489  wrkbl = max( wrkbl, 3*m+m*
490  $ ilaenv( 1, 'DORMBR', 'QLN', m, m, n, -1 ) )
491  wrkbl = max( wrkbl, 3*m+m*
492  $ ilaenv( 1, 'DORMBR', 'PRT', m, n, m, -1 ) )
493  wrkbl = max( wrkbl, bdspac+3*m )
494  maxwrk = wrkbl + m*n
495  minwrk = 3*m + max( n, m*m+bdspac )
496  ELSE IF( wntqs ) THEN
497  wrkbl = max( wrkbl, 3*m+m*
498  $ ilaenv( 1, 'DORMBR', 'QLN', m, m, n, -1 ) )
499  wrkbl = max( wrkbl, 3*m+m*
500  $ ilaenv( 1, 'DORMBR', 'PRT', m, n, m, -1 ) )
501  maxwrk = max( wrkbl, bdspac+3*m )
502  minwrk = 3*m + max( n, bdspac )
503  ELSE IF( wntqa ) THEN
504  wrkbl = max( wrkbl, 3*m+m*
505  $ ilaenv( 1, 'DORMBR', 'QLN', m, m, n, -1 ) )
506  wrkbl = max( wrkbl, 3*m+m*
507  $ ilaenv( 1, 'DORMBR', 'PRT', n, n, m, -1 ) )
508  maxwrk = max( wrkbl, bdspac+3*m )
509  minwrk = 3*m + max( n, bdspac )
510  END IF
511  END IF
512  END IF
513  maxwrk = max( maxwrk, minwrk )
514  work( 1 ) = maxwrk
515 *
516  IF( lwork.LT.minwrk .AND. .NOT.lquery ) THEN
517  info = -12
518  END IF
519  END IF
520 *
521  IF( info.NE.0 ) THEN
522  CALL xerbla( 'DGESDD', -info )
523  RETURN
524  ELSE IF( lquery ) THEN
525  RETURN
526  END IF
527 *
528 * Quick return if possible
529 *
530  IF( m.EQ.0 .OR. n.EQ.0 ) THEN
531  RETURN
532  END IF
533 *
534 * Get machine constants
535 *
536  eps = dlamch( 'P' )
537  smlnum = sqrt( dlamch( 'S' ) ) / eps
538  bignum = one / smlnum
539 *
540 * Scale A if max element outside range [SMLNUM,BIGNUM]
541 *
542  anrm = dlange( 'M', m, n, a, lda, dum )
543  iscl = 0
544  IF( anrm.GT.zero .AND. anrm.LT.smlnum ) THEN
545  iscl = 1
546  CALL dlascl( 'G', 0, 0, anrm, smlnum, m, n, a, lda, ierr )
547  ELSE IF( anrm.GT.bignum ) THEN
548  iscl = 1
549  CALL dlascl( 'G', 0, 0, anrm, bignum, m, n, a, lda, ierr )
550  END IF
551 *
552  IF( m.GE.n ) THEN
553 *
554 * A has at least as many rows as columns. If A has sufficiently
555 * more rows than columns, first reduce using the QR
556 * decomposition (if sufficient workspace available)
557 *
558  IF( m.GE.mnthr ) THEN
559 *
560  IF( wntqn ) THEN
561 *
562 * Path 1 (M much larger than N, JOBZ='N')
563 * No singular vectors to be computed
564 *
565  itau = 1
566  nwork = itau + n
567 *
568 * Compute A=Q*R
569 * (Workspace: need 2*N, prefer N+N*NB)
570 *
571  CALL dgeqrf( m, n, a, lda, work( itau ), work( nwork ),
572  $ lwork-nwork+1, ierr )
573 *
574 * Zero out below R
575 *
576  CALL dlaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ), lda )
577  ie = 1
578  itauq = ie + n
579  itaup = itauq + n
580  nwork = itaup + n
581 *
582 * Bidiagonalize R in A
583 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
584 *
585  CALL dgebrd( n, n, a, lda, s, work( ie ), work( itauq ),
586  $ work( itaup ), work( nwork ), lwork-nwork+1,
587  $ ierr )
588  nwork = ie + n
589 *
590 * Perform bidiagonal SVD, computing singular values only
591 * (Workspace: need N+BDSPAC)
592 *
593  CALL dbdsdc( 'U', 'N', n, s, work( ie ), dum, 1, dum, 1,
594  $ dum, idum, work( nwork ), iwork, info )
595 *
596  ELSE IF( wntqo ) THEN
597 *
598 * Path 2 (M much larger than N, JOBZ = 'O')
599 * N left singular vectors to be overwritten on A and
600 * N right singular vectors to be computed in VT
601 *
602  ir = 1
603 *
604 * WORK(IR) is LDWRKR by N
605 *
606  IF( lwork.GE.lda*n+n*n+3*n+bdspac ) THEN
607  ldwrkr = lda
608  ELSE
609  ldwrkr = ( lwork-n*n-3*n-bdspac ) / n
610  END IF
611  itau = ir + ldwrkr*n
612  nwork = itau + n
613 *
614 * Compute A=Q*R
615 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
616 *
617  CALL dgeqrf( m, n, a, lda, work( itau ), work( nwork ),
618  $ lwork-nwork+1, ierr )
619 *
620 * Copy R to WORK(IR), zeroing out below it
621 *
622  CALL dlacpy( 'U', n, n, a, lda, work( ir ), ldwrkr )
623  CALL dlaset( 'L', n-1, n-1, zero, zero, work( ir+1 ),
624  $ ldwrkr )
625 *
626 * Generate Q in A
627 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
628 *
629  CALL dorgqr( m, n, n, a, lda, work( itau ),
630  $ work( nwork ), lwork-nwork+1, ierr )
631  ie = itau
632  itauq = ie + n
633  itaup = itauq + n
634  nwork = itaup + n
635 *
636 * Bidiagonalize R in VT, copying result to WORK(IR)
637 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
638 *
639  CALL dgebrd( n, n, work( ir ), ldwrkr, s, work( ie ),
640  $ work( itauq ), work( itaup ), work( nwork ),
641  $ lwork-nwork+1, ierr )
642 *
643 * WORK(IU) is N by N
644 *
645  iu = nwork
646  nwork = iu + n*n
647 *
648 * Perform bidiagonal SVD, computing left singular vectors
649 * of bidiagonal matrix in WORK(IU) and computing right
650 * singular vectors of bidiagonal matrix in VT
651 * (Workspace: need N+N*N+BDSPAC)
652 *
653  CALL dbdsdc( 'U', 'I', n, s, work( ie ), work( iu ), n,
654  $ vt, ldvt, dum, idum, work( nwork ), iwork,
655  $ info )
656 *
657 * Overwrite WORK(IU) by left singular vectors of R
658 * and VT by right singular vectors of R
659 * (Workspace: need 2*N*N+3*N, prefer 2*N*N+2*N+N*NB)
660 *
661  CALL dormbr( 'Q', 'L', 'N', n, n, n, work( ir ), ldwrkr,
662  $ work( itauq ), work( iu ), n, work( nwork ),
663  $ lwork-nwork+1, ierr )
664  CALL dormbr( 'P', 'R', 'T', n, n, n, work( ir ), ldwrkr,
665  $ work( itaup ), vt, ldvt, work( nwork ),
666  $ lwork-nwork+1, ierr )
667 *
668 * Multiply Q in A by left singular vectors of R in
669 * WORK(IU), storing result in WORK(IR) and copying to A
670 * (Workspace: need 2*N*N, prefer N*N+M*N)
671 *
672  DO 10 i = 1, m, ldwrkr
673  chunk = min( m-i+1, ldwrkr )
674  CALL dgemm( 'N', 'N', chunk, n, n, one, a( i, 1 ),
675  $ lda, work( iu ), n, zero, work( ir ),
676  $ ldwrkr )
677  CALL dlacpy( 'F', chunk, n, work( ir ), ldwrkr,
678  $ a( i, 1 ), lda )
679  10 CONTINUE
680 *
681  ELSE IF( wntqs ) THEN
682 *
683 * Path 3 (M much larger than N, JOBZ='S')
684 * N left singular vectors to be computed in U and
685 * N right singular vectors to be computed in VT
686 *
687  ir = 1
688 *
689 * WORK(IR) is N by N
690 *
691  ldwrkr = n
692  itau = ir + ldwrkr*n
693  nwork = itau + n
694 *
695 * Compute A=Q*R
696 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
697 *
698  CALL dgeqrf( m, n, a, lda, work( itau ), work( nwork ),
699  $ lwork-nwork+1, ierr )
700 *
701 * Copy R to WORK(IR), zeroing out below it
702 *
703  CALL dlacpy( 'U', n, n, a, lda, work( ir ), ldwrkr )
704  CALL dlaset( 'L', n-1, n-1, zero, zero, work( ir+1 ),
705  $ ldwrkr )
706 *
707 * Generate Q in A
708 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
709 *
710  CALL dorgqr( m, n, n, a, lda, work( itau ),
711  $ work( nwork ), lwork-nwork+1, ierr )
712  ie = itau
713  itauq = ie + n
714  itaup = itauq + n
715  nwork = itaup + n
716 *
717 * Bidiagonalize R in WORK(IR)
718 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
719 *
720  CALL dgebrd( n, n, work( ir ), ldwrkr, s, work( ie ),
721  $ work( itauq ), work( itaup ), work( nwork ),
722  $ lwork-nwork+1, ierr )
723 *
724 * Perform bidiagonal SVD, computing left singular vectors
725 * of bidiagoal matrix in U and computing right singular
726 * vectors of bidiagonal matrix in VT
727 * (Workspace: need N+BDSPAC)
728 *
729  CALL dbdsdc( 'U', 'I', n, s, work( ie ), u, ldu, vt,
730  $ ldvt, dum, idum, work( nwork ), iwork,
731  $ info )
732 *
733 * Overwrite U by left singular vectors of R and VT
734 * by right singular vectors of R
735 * (Workspace: need N*N+3*N, prefer N*N+2*N+N*NB)
736 *
737  CALL dormbr( 'Q', 'L', 'N', n, n, n, work( ir ), ldwrkr,
738  $ work( itauq ), u, ldu, work( nwork ),
739  $ lwork-nwork+1, ierr )
740 *
741  CALL dormbr( 'P', 'R', 'T', n, n, n, work( ir ), ldwrkr,
742  $ work( itaup ), vt, ldvt, work( nwork ),
743  $ lwork-nwork+1, ierr )
744 *
745 * Multiply Q in A by left singular vectors of R in
746 * WORK(IR), storing result in U
747 * (Workspace: need N*N)
748 *
749  CALL dlacpy( 'F', n, n, u, ldu, work( ir ), ldwrkr )
750  CALL dgemm( 'N', 'N', m, n, n, one, a, lda, work( ir ),
751  $ ldwrkr, zero, u, ldu )
752 *
753  ELSE IF( wntqa ) THEN
754 *
755 * Path 4 (M much larger than N, JOBZ='A')
756 * M left singular vectors to be computed in U and
757 * N right singular vectors to be computed in VT
758 *
759  iu = 1
760 *
761 * WORK(IU) is N by N
762 *
763  ldwrku = n
764  itau = iu + ldwrku*n
765  nwork = itau + n
766 *
767 * Compute A=Q*R, copying result to U
768 * (Workspace: need N*N+N+M, prefer N*N+N+M*NB)
769 *
770  CALL dgeqrf( m, n, a, lda, work( itau ), work( nwork ),
771  $ lwork-nwork+1, ierr )
772  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
773 *
774 * Generate Q in U
775 * (Workspace: need N*N+N+M, prefer N*N+N+M*NB)
776  CALL dorgqr( m, m, n, u, ldu, work( itau ),
777  $ work( nwork ), lwork-nwork+1, ierr )
778 *
779 * Produce R in A, zeroing out other entries
780 *
781  CALL dlaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ), lda )
782  ie = itau
783  itauq = ie + n
784  itaup = itauq + n
785  nwork = itaup + n
786 *
787 * Bidiagonalize R in A
788 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
789 *
790  CALL dgebrd( n, n, a, lda, s, work( ie ), work( itauq ),
791  $ work( itaup ), work( nwork ), lwork-nwork+1,
792  $ ierr )
793 *
794 * Perform bidiagonal SVD, computing left singular vectors
795 * of bidiagonal matrix in WORK(IU) and computing right
796 * singular vectors of bidiagonal matrix in VT
797 * (Workspace: need N+N*N+BDSPAC)
798 *
799  CALL dbdsdc( 'U', 'I', n, s, work( ie ), work( iu ), n,
800  $ vt, ldvt, dum, idum, work( nwork ), iwork,
801  $ info )
802 *
803 * Overwrite WORK(IU) by left singular vectors of R and VT
804 * by right singular vectors of R
805 * (Workspace: need N*N+3*N, prefer N*N+2*N+N*NB)
806 *
807  CALL dormbr( 'Q', 'L', 'N', n, n, n, a, lda,
808  $ work( itauq ), work( iu ), ldwrku,
809  $ work( nwork ), lwork-nwork+1, ierr )
810  CALL dormbr( 'P', 'R', 'T', n, n, n, a, lda,
811  $ work( itaup ), vt, ldvt, work( nwork ),
812  $ lwork-nwork+1, ierr )
813 *
814 * Multiply Q in U by left singular vectors of R in
815 * WORK(IU), storing result in A
816 * (Workspace: need N*N)
817 *
818  CALL dgemm( 'N', 'N', m, n, n, one, u, ldu, work( iu ),
819  $ ldwrku, zero, a, lda )
820 *
821 * Copy left singular vectors of A from A to U
822 *
823  CALL dlacpy( 'F', m, n, a, lda, u, ldu )
824 *
825  END IF
826 *
827  ELSE
828 *
829 * M .LT. MNTHR
830 *
831 * Path 5 (M at least N, but not much larger)
832 * Reduce to bidiagonal form without QR decomposition
833 *
834  ie = 1
835  itauq = ie + n
836  itaup = itauq + n
837  nwork = itaup + n
838 *
839 * Bidiagonalize A
840 * (Workspace: need 3*N+M, prefer 3*N+(M+N)*NB)
841 *
842  CALL dgebrd( m, n, a, lda, s, work( ie ), work( itauq ),
843  $ work( itaup ), work( nwork ), lwork-nwork+1,
844  $ ierr )
845  IF( wntqn ) THEN
846 *
847 * Perform bidiagonal SVD, only computing singular values
848 * (Workspace: need N+BDSPAC)
849 *
850  CALL dbdsdc( 'U', 'N', n, s, work( ie ), dum, 1, dum, 1,
851  $ dum, idum, work( nwork ), iwork, info )
852  ELSE IF( wntqo ) THEN
853  iu = nwork
854  IF( lwork.GE.m*n+3*n+bdspac ) THEN
855 *
856 * WORK( IU ) is M by N
857 *
858  ldwrku = m
859  nwork = iu + ldwrku*n
860  CALL dlaset( 'F', m, n, zero, zero, work( iu ),
861  $ ldwrku )
862  ELSE
863 *
864 * WORK( IU ) is N by N
865 *
866  ldwrku = n
867  nwork = iu + ldwrku*n
868 *
869 * WORK(IR) is LDWRKR by N
870 *
871  ir = nwork
872  ldwrkr = ( lwork-n*n-3*n ) / n
873  END IF
874  nwork = iu + ldwrku*n
875 *
876 * Perform bidiagonal SVD, computing left singular vectors
877 * of bidiagonal matrix in WORK(IU) and computing right
878 * singular vectors of bidiagonal matrix in VT
879 * (Workspace: need N+N*N+BDSPAC)
880 *
881  CALL dbdsdc( 'U', 'I', n, s, work( ie ), work( iu ),
882  $ ldwrku, vt, ldvt, dum, idum, work( nwork ),
883  $ iwork, info )
884 *
885 * Overwrite VT by right singular vectors of A
886 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
887 *
888  CALL dormbr( 'P', 'R', 'T', n, n, n, a, lda,
889  $ work( itaup ), vt, ldvt, work( nwork ),
890  $ lwork-nwork+1, ierr )
891 *
892  IF( lwork.GE.m*n+3*n+bdspac ) THEN
893 *
894 * Overwrite WORK(IU) by left singular vectors of A
895 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
896 *
897  CALL dormbr( 'Q', 'L', 'N', m, n, n, a, lda,
898  $ work( itauq ), work( iu ), ldwrku,
899  $ work( nwork ), lwork-nwork+1, ierr )
900 *
901 * Copy left singular vectors of A from WORK(IU) to A
902 *
903  CALL dlacpy( 'F', m, n, work( iu ), ldwrku, a, lda )
904  ELSE
905 *
906 * Generate Q in A
907 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
908 *
909  CALL dorgbr( 'Q', m, n, n, a, lda, work( itauq ),
910  $ work( nwork ), lwork-nwork+1, ierr )
911 *
912 * Multiply Q in A by left singular vectors of
913 * bidiagonal matrix in WORK(IU), storing result in
914 * WORK(IR) and copying to A
915 * (Workspace: need 2*N*N, prefer N*N+M*N)
916 *
917  DO 20 i = 1, m, ldwrkr
918  chunk = min( m-i+1, ldwrkr )
919  CALL dgemm( 'N', 'N', chunk, n, n, one, a( i, 1 ),
920  $ lda, work( iu ), ldwrku, zero,
921  $ work( ir ), ldwrkr )
922  CALL dlacpy( 'F', chunk, n, work( ir ), ldwrkr,
923  $ a( i, 1 ), lda )
924  20 CONTINUE
925  END IF
926 *
927  ELSE IF( wntqs ) THEN
928 *
929 * Perform bidiagonal SVD, computing left singular vectors
930 * of bidiagonal matrix in U and computing right singular
931 * vectors of bidiagonal matrix in VT
932 * (Workspace: need N+BDSPAC)
933 *
934  CALL dlaset( 'F', m, n, zero, zero, u, ldu )
935  CALL dbdsdc( 'U', 'I', n, s, work( ie ), u, ldu, vt,
936  $ ldvt, dum, idum, work( nwork ), iwork,
937  $ info )
938 *
939 * Overwrite U by left singular vectors of A and VT
940 * by right singular vectors of A
941 * (Workspace: need 3*N, prefer 2*N+N*NB)
942 *
943  CALL dormbr( 'Q', 'L', 'N', m, n, n, a, lda,
944  $ work( itauq ), u, ldu, work( nwork ),
945  $ lwork-nwork+1, ierr )
946  CALL dormbr( 'P', 'R', 'T', n, n, n, a, lda,
947  $ work( itaup ), vt, ldvt, work( nwork ),
948  $ lwork-nwork+1, ierr )
949  ELSE IF( wntqa ) THEN
950 *
951 * Perform bidiagonal SVD, computing left singular vectors
952 * of bidiagonal matrix in U and computing right singular
953 * vectors of bidiagonal matrix in VT
954 * (Workspace: need N+BDSPAC)
955 *
956  CALL dlaset( 'F', m, m, zero, zero, u, ldu )
957  CALL dbdsdc( 'U', 'I', n, s, work( ie ), u, ldu, vt,
958  $ ldvt, dum, idum, work( nwork ), iwork,
959  $ info )
960 *
961 * Set the right corner of U to identity matrix
962 *
963  IF( m.GT.n ) THEN
964  CALL dlaset( 'F', m-n, m-n, zero, one, u( n+1, n+1 ),
965  $ ldu )
966  END IF
967 *
968 * Overwrite U by left singular vectors of A and VT
969 * by right singular vectors of A
970 * (Workspace: need N*N+2*N+M, prefer N*N+2*N+M*NB)
971 *
972  CALL dormbr( 'Q', 'L', 'N', m, m, n, a, lda,
973  $ work( itauq ), u, ldu, work( nwork ),
974  $ lwork-nwork+1, ierr )
975  CALL dormbr( 'P', 'R', 'T', n, n, m, a, lda,
976  $ work( itaup ), vt, ldvt, work( nwork ),
977  $ lwork-nwork+1, ierr )
978  END IF
979 *
980  END IF
981 *
982  ELSE
983 *
984 * A has more columns than rows. If A has sufficiently more
985 * columns than rows, first reduce using the LQ decomposition (if
986 * sufficient workspace available)
987 *
988  IF( n.GE.mnthr ) THEN
989 *
990  IF( wntqn ) THEN
991 *
992 * Path 1t (N much larger than M, JOBZ='N')
993 * No singular vectors to be computed
994 *
995  itau = 1
996  nwork = itau + m
997 *
998 * Compute A=L*Q
999 * (Workspace: need 2*M, prefer M+M*NB)
1000 *
1001  CALL dgelqf( m, n, a, lda, work( itau ), work( nwork ),
1002  $ lwork-nwork+1, ierr )
1003 *
1004 * Zero out above L
1005 *
1006  CALL dlaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ), lda )
1007  ie = 1
1008  itauq = ie + m
1009  itaup = itauq + m
1010  nwork = itaup + m
1011 *
1012 * Bidiagonalize L in A
1013 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
1014 *
1015  CALL dgebrd( m, m, a, lda, s, work( ie ), work( itauq ),
1016  $ work( itaup ), work( nwork ), lwork-nwork+1,
1017  $ ierr )
1018  nwork = ie + m
1019 *
1020 * Perform bidiagonal SVD, computing singular values only
1021 * (Workspace: need M+BDSPAC)
1022 *
1023  CALL dbdsdc( 'U', 'N', m, s, work( ie ), dum, 1, dum, 1,
1024  $ dum, idum, work( nwork ), iwork, info )
1025 *
1026  ELSE IF( wntqo ) THEN
1027 *
1028 * Path 2t (N much larger than M, JOBZ='O')
1029 * M right singular vectors to be overwritten on A and
1030 * M left singular vectors to be computed in U
1031 *
1032  ivt = 1
1033 *
1034 * IVT is M by M
1035 *
1036  il = ivt + m*m
1037  IF( lwork.GE.m*n+m*m+3*m+bdspac ) THEN
1038 *
1039 * WORK(IL) is M by N
1040 *
1041  ldwrkl = m
1042  chunk = n
1043  ELSE
1044  ldwrkl = m
1045  chunk = ( lwork-m*m ) / m
1046  END IF
1047  itau = il + ldwrkl*m
1048  nwork = itau + m
1049 *
1050 * Compute A=L*Q
1051 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1052 *
1053  CALL dgelqf( m, n, a, lda, work( itau ), work( nwork ),
1054  $ lwork-nwork+1, ierr )
1055 *
1056 * Copy L to WORK(IL), zeroing about above it
1057 *
1058  CALL dlacpy( 'L', m, m, a, lda, work( il ), ldwrkl )
1059  CALL dlaset( 'U', m-1, m-1, zero, zero,
1060  $ work( il+ldwrkl ), ldwrkl )
1061 *
1062 * Generate Q in A
1063 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1064 *
1065  CALL dorglq( m, n, m, a, lda, work( itau ),
1066  $ work( nwork ), lwork-nwork+1, ierr )
1067  ie = itau
1068  itauq = ie + m
1069  itaup = itauq + m
1070  nwork = itaup + m
1071 *
1072 * Bidiagonalize L in WORK(IL)
1073 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
1074 *
1075  CALL dgebrd( m, m, work( il ), ldwrkl, s, work( ie ),
1076  $ work( itauq ), work( itaup ), work( nwork ),
1077  $ lwork-nwork+1, ierr )
1078 *
1079 * Perform bidiagonal SVD, computing left singular vectors
1080 * of bidiagonal matrix in U, and computing right singular
1081 * vectors of bidiagonal matrix in WORK(IVT)
1082 * (Workspace: need M+M*M+BDSPAC)
1083 *
1084  CALL dbdsdc( 'U', 'I', m, s, work( ie ), u, ldu,
1085  $ work( ivt ), m, dum, idum, work( nwork ),
1086  $ iwork, info )
1087 *
1088 * Overwrite U by left singular vectors of L and WORK(IVT)
1089 * by right singular vectors of L
1090 * (Workspace: need 2*M*M+3*M, prefer 2*M*M+2*M+M*NB)
1091 *
1092  CALL dormbr( 'Q', 'L', 'N', m, m, m, work( il ), ldwrkl,
1093  $ work( itauq ), u, ldu, work( nwork ),
1094  $ lwork-nwork+1, ierr )
1095  CALL dormbr( 'P', 'R', 'T', m, m, m, work( il ), ldwrkl,
1096  $ work( itaup ), work( ivt ), m,
1097  $ work( nwork ), lwork-nwork+1, ierr )
1098 *
1099 * Multiply right singular vectors of L in WORK(IVT) by Q
1100 * in A, storing result in WORK(IL) and copying to A
1101 * (Workspace: need 2*M*M, prefer M*M+M*N)
1102 *
1103  DO 30 i = 1, n, chunk
1104  blk = min( n-i+1, chunk )
1105  CALL dgemm( 'N', 'N', m, blk, m, one, work( ivt ), m,
1106  $ a( 1, i ), lda, zero, work( il ), ldwrkl )
1107  CALL dlacpy( 'F', m, blk, work( il ), ldwrkl,
1108  $ a( 1, i ), lda )
1109  30 CONTINUE
1110 *
1111  ELSE IF( wntqs ) THEN
1112 *
1113 * Path 3t (N much larger than M, JOBZ='S')
1114 * M right singular vectors to be computed in VT and
1115 * M left singular vectors to be computed in U
1116 *
1117  il = 1
1118 *
1119 * WORK(IL) is M by M
1120 *
1121  ldwrkl = m
1122  itau = il + ldwrkl*m
1123  nwork = itau + m
1124 *
1125 * Compute A=L*Q
1126 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1127 *
1128  CALL dgelqf( m, n, a, lda, work( itau ), work( nwork ),
1129  $ lwork-nwork+1, ierr )
1130 *
1131 * Copy L to WORK(IL), zeroing out above it
1132 *
1133  CALL dlacpy( 'L', m, m, a, lda, work( il ), ldwrkl )
1134  CALL dlaset( 'U', m-1, m-1, zero, zero,
1135  $ work( il+ldwrkl ), ldwrkl )
1136 *
1137 * Generate Q in A
1138 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1139 *
1140  CALL dorglq( m, n, m, a, lda, work( itau ),
1141  $ work( nwork ), lwork-nwork+1, ierr )
1142  ie = itau
1143  itauq = ie + m
1144  itaup = itauq + m
1145  nwork = itaup + m
1146 *
1147 * Bidiagonalize L in WORK(IU), copying result to U
1148 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
1149 *
1150  CALL dgebrd( m, m, work( il ), ldwrkl, s, work( ie ),
1151  $ work( itauq ), work( itaup ), work( nwork ),
1152  $ lwork-nwork+1, ierr )
1153 *
1154 * Perform bidiagonal SVD, computing left singular vectors
1155 * of bidiagonal matrix in U and computing right singular
1156 * vectors of bidiagonal matrix in VT
1157 * (Workspace: need M+BDSPAC)
1158 *
1159  CALL dbdsdc( 'U', 'I', m, s, work( ie ), u, ldu, vt,
1160  $ ldvt, dum, idum, work( nwork ), iwork,
1161  $ info )
1162 *
1163 * Overwrite U by left singular vectors of L and VT
1164 * by right singular vectors of L
1165 * (Workspace: need M*M+3*M, prefer M*M+2*M+M*NB)
1166 *
1167  CALL dormbr( 'Q', 'L', 'N', m, m, m, work( il ), ldwrkl,
1168  $ work( itauq ), u, ldu, work( nwork ),
1169  $ lwork-nwork+1, ierr )
1170  CALL dormbr( 'P', 'R', 'T', m, m, m, work( il ), ldwrkl,
1171  $ work( itaup ), vt, ldvt, work( nwork ),
1172  $ lwork-nwork+1, ierr )
1173 *
1174 * Multiply right singular vectors of L in WORK(IL) by
1175 * Q in A, storing result in VT
1176 * (Workspace: need M*M)
1177 *
1178  CALL dlacpy( 'F', m, m, vt, ldvt, work( il ), ldwrkl )
1179  CALL dgemm( 'N', 'N', m, n, m, one, work( il ), ldwrkl,
1180  $ a, lda, zero, vt, ldvt )
1181 *
1182  ELSE IF( wntqa ) THEN
1183 *
1184 * Path 4t (N much larger than M, JOBZ='A')
1185 * N right singular vectors to be computed in VT and
1186 * M left singular vectors to be computed in U
1187 *
1188  ivt = 1
1189 *
1190 * WORK(IVT) is M by M
1191 *
1192  ldwkvt = m
1193  itau = ivt + ldwkvt*m
1194  nwork = itau + m
1195 *
1196 * Compute A=L*Q, copying result to VT
1197 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1198 *
1199  CALL dgelqf( m, n, a, lda, work( itau ), work( nwork ),
1200  $ lwork-nwork+1, ierr )
1201  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
1202 *
1203 * Generate Q in VT
1204 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1205 *
1206  CALL dorglq( n, n, m, vt, ldvt, work( itau ),
1207  $ work( nwork ), lwork-nwork+1, ierr )
1208 *
1209 * Produce L in A, zeroing out other entries
1210 *
1211  CALL dlaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ), lda )
1212  ie = itau
1213  itauq = ie + m
1214  itaup = itauq + m
1215  nwork = itaup + m
1216 *
1217 * Bidiagonalize L in A
1218 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
1219 *
1220  CALL dgebrd( m, m, a, lda, s, work( ie ), work( itauq ),
1221  $ work( itaup ), work( nwork ), lwork-nwork+1,
1222  $ ierr )
1223 *
1224 * Perform bidiagonal SVD, computing left singular vectors
1225 * of bidiagonal matrix in U and computing right singular
1226 * vectors of bidiagonal matrix in WORK(IVT)
1227 * (Workspace: need M+M*M+BDSPAC)
1228 *
1229  CALL dbdsdc( 'U', 'I', m, s, work( ie ), u, ldu,
1230  $ work( ivt ), ldwkvt, dum, idum,
1231  $ work( nwork ), iwork, info )
1232 *
1233 * Overwrite U by left singular vectors of L and WORK(IVT)
1234 * by right singular vectors of L
1235 * (Workspace: need M*M+3*M, prefer M*M+2*M+M*NB)
1236 *
1237  CALL dormbr( 'Q', 'L', 'N', m, m, m, a, lda,
1238  $ work( itauq ), u, ldu, work( nwork ),
1239  $ lwork-nwork+1, ierr )
1240  CALL dormbr( 'P', 'R', 'T', m, m, m, a, lda,
1241  $ work( itaup ), work( ivt ), ldwkvt,
1242  $ work( nwork ), lwork-nwork+1, ierr )
1243 *
1244 * Multiply right singular vectors of L in WORK(IVT) by
1245 * Q in VT, storing result in A
1246 * (Workspace: need M*M)
1247 *
1248  CALL dgemm( 'N', 'N', m, n, m, one, work( ivt ), ldwkvt,
1249  $ vt, ldvt, zero, a, lda )
1250 *
1251 * Copy right singular vectors of A from A to VT
1252 *
1253  CALL dlacpy( 'F', m, n, a, lda, vt, ldvt )
1254 *
1255  END IF
1256 *
1257  ELSE
1258 *
1259 * N .LT. MNTHR
1260 *
1261 * Path 5t (N greater than M, but not much larger)
1262 * Reduce to bidiagonal form without LQ decomposition
1263 *
1264  ie = 1
1265  itauq = ie + m
1266  itaup = itauq + m
1267  nwork = itaup + m
1268 *
1269 * Bidiagonalize A
1270 * (Workspace: need 3*M+N, prefer 3*M+(M+N)*NB)
1271 *
1272  CALL dgebrd( m, n, a, lda, s, work( ie ), work( itauq ),
1273  $ work( itaup ), work( nwork ), lwork-nwork+1,
1274  $ ierr )
1275  IF( wntqn ) THEN
1276 *
1277 * Perform bidiagonal SVD, only computing singular values
1278 * (Workspace: need M+BDSPAC)
1279 *
1280  CALL dbdsdc( 'L', 'N', m, s, work( ie ), dum, 1, dum, 1,
1281  $ dum, idum, work( nwork ), iwork, info )
1282  ELSE IF( wntqo ) THEN
1283  ldwkvt = m
1284  ivt = nwork
1285  IF( lwork.GE.m*n+3*m+bdspac ) THEN
1286 *
1287 * WORK( IVT ) is M by N
1288 *
1289  CALL dlaset( 'F', m, n, zero, zero, work( ivt ),
1290  $ ldwkvt )
1291  nwork = ivt + ldwkvt*n
1292  ELSE
1293 *
1294 * WORK( IVT ) is M by M
1295 *
1296  nwork = ivt + ldwkvt*m
1297  il = nwork
1298 *
1299 * WORK(IL) is M by CHUNK
1300 *
1301  chunk = ( lwork-m*m-3*m ) / m
1302  END IF
1303 *
1304 * Perform bidiagonal SVD, computing left singular vectors
1305 * of bidiagonal matrix in U and computing right singular
1306 * vectors of bidiagonal matrix in WORK(IVT)
1307 * (Workspace: need M*M+BDSPAC)
1308 *
1309  CALL dbdsdc( 'L', 'I', m, s, work( ie ), u, ldu,
1310  $ work( ivt ), ldwkvt, dum, idum,
1311  $ work( nwork ), iwork, info )
1312 *
1313 * Overwrite U by left singular vectors of A
1314 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1315 *
1316  CALL dormbr( 'Q', 'L', 'N', m, m, n, a, lda,
1317  $ work( itauq ), u, ldu, work( nwork ),
1318  $ lwork-nwork+1, ierr )
1319 *
1320  IF( lwork.GE.m*n+3*m+bdspac ) THEN
1321 *
1322 * Overwrite WORK(IVT) by left singular vectors of A
1323 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1324 *
1325  CALL dormbr( 'P', 'R', 'T', m, n, m, a, lda,
1326  $ work( itaup ), work( ivt ), ldwkvt,
1327  $ work( nwork ), lwork-nwork+1, ierr )
1328 *
1329 * Copy right singular vectors of A from WORK(IVT) to A
1330 *
1331  CALL dlacpy( 'F', m, n, work( ivt ), ldwkvt, a, lda )
1332  ELSE
1333 *
1334 * Generate P**T in A
1335 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1336 *
1337  CALL dorgbr( 'P', m, n, m, a, lda, work( itaup ),
1338  $ work( nwork ), lwork-nwork+1, ierr )
1339 *
1340 * Multiply Q in A by right singular vectors of
1341 * bidiagonal matrix in WORK(IVT), storing result in
1342 * WORK(IL) and copying to A
1343 * (Workspace: need 2*M*M, prefer M*M+M*N)
1344 *
1345  DO 40 i = 1, n, chunk
1346  blk = min( n-i+1, chunk )
1347  CALL dgemm( 'N', 'N', m, blk, m, one, work( ivt ),
1348  $ ldwkvt, a( 1, i ), lda, zero,
1349  $ work( il ), m )
1350  CALL dlacpy( 'F', m, blk, work( il ), m, a( 1, i ),
1351  $ lda )
1352  40 CONTINUE
1353  END IF
1354  ELSE IF( wntqs ) THEN
1355 *
1356 * Perform bidiagonal SVD, computing left singular vectors
1357 * of bidiagonal matrix in U and computing right singular
1358 * vectors of bidiagonal matrix in VT
1359 * (Workspace: need M+BDSPAC)
1360 *
1361  CALL dlaset( 'F', m, n, zero, zero, vt, ldvt )
1362  CALL dbdsdc( 'L', 'I', m, s, work( ie ), u, ldu, vt,
1363  $ ldvt, dum, idum, work( nwork ), iwork,
1364  $ info )
1365 *
1366 * Overwrite U by left singular vectors of A and VT
1367 * by right singular vectors of A
1368 * (Workspace: need 3*M, prefer 2*M+M*NB)
1369 *
1370  CALL dormbr( 'Q', 'L', 'N', m, m, n, a, lda,
1371  $ work( itauq ), u, ldu, work( nwork ),
1372  $ lwork-nwork+1, ierr )
1373  CALL dormbr( 'P', 'R', 'T', m, n, m, a, lda,
1374  $ work( itaup ), vt, ldvt, work( nwork ),
1375  $ lwork-nwork+1, ierr )
1376  ELSE IF( wntqa ) THEN
1377 *
1378 * Perform bidiagonal SVD, computing left singular vectors
1379 * of bidiagonal matrix in U and computing right singular
1380 * vectors of bidiagonal matrix in VT
1381 * (Workspace: need M+BDSPAC)
1382 *
1383  CALL dlaset( 'F', n, n, zero, zero, vt, ldvt )
1384  CALL dbdsdc( 'L', 'I', m, s, work( ie ), u, ldu, vt,
1385  $ ldvt, dum, idum, work( nwork ), iwork,
1386  $ info )
1387 *
1388 * Set the right corner of VT to identity matrix
1389 *
1390  IF( n.GT.m ) THEN
1391  CALL dlaset( 'F', n-m, n-m, zero, one, vt( m+1, m+1 ),
1392  $ ldvt )
1393  END IF
1394 *
1395 * Overwrite U by left singular vectors of A and VT
1396 * by right singular vectors of A
1397 * (Workspace: need 2*M+N, prefer 2*M+N*NB)
1398 *
1399  CALL dormbr( 'Q', 'L', 'N', m, m, n, a, lda,
1400  $ work( itauq ), u, ldu, work( nwork ),
1401  $ lwork-nwork+1, ierr )
1402  CALL dormbr( 'P', 'R', 'T', n, n, m, a, lda,
1403  $ work( itaup ), vt, ldvt, work( nwork ),
1404  $ lwork-nwork+1, ierr )
1405  END IF
1406 *
1407  END IF
1408 *
1409  END IF
1410 *
1411 * Undo scaling if necessary
1412 *
1413  IF( iscl.EQ.1 ) THEN
1414  IF( anrm.GT.bignum )
1415  $ CALL dlascl( 'G', 0, 0, bignum, anrm, minmn, 1, s, minmn,
1416  $ ierr )
1417  IF( anrm.LT.smlnum )
1418  $ CALL dlascl( 'G', 0, 0, smlnum, anrm, minmn, 1, s, minmn,
1419  $ ierr )
1420  END IF
1421 *
1422 * Return optimal workspace in WORK(1)
1423 *
1424  work( 1 ) = maxwrk
1425 *
1426  RETURN
1427 *
1428 * End of DGESDD
1429 *
subroutine dorgbr(VECT, M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
DORGBR
Definition: dorgbr.f:159
subroutine dlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: dlaset.f:112
subroutine dlacpy(UPLO, M, N, A, LDA, B, LDB)
DLACPY copies all or part of one two-dimensional array to another.
Definition: dlacpy.f:105
subroutine dbdsdc(UPLO, COMPQ, N, D, E, U, LDU, VT, LDVT, Q, IQ, WORK, IWORK, INFO)
DBDSDC
Definition: dbdsdc.f:207
subroutine dgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
DGEQRF
Definition: dgeqrf.f:138
subroutine dorglq(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
DORGLQ
Definition: dorglq.f:129
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine dormbr(VECT, SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
DORMBR
Definition: dormbr.f:197
subroutine dgebrd(M, N, A, LDA, D, E, TAUQ, TAUP, WORK, LWORK, INFO)
DGEBRD
Definition: dgebrd.f:207
subroutine dgemm(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
DGEMM
Definition: dgemm.f:189
double precision function dlange(NORM, M, N, A, LDA, WORK)
DLANGE returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value ...
Definition: dlange.f:116
double precision function dlamch(CMACH)
DLAMCH
Definition: dlamch.f:65
subroutine dorgqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
DORGQR
Definition: dorgqr.f:130
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine dgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
DGELQF
Definition: dgelqf.f:137
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:141
integer function ilaenv(ISPEC, NAME, OPTS, N1, N2, N3, N4)
Definition: tstiee.f:83

Here is the call graph for this function:

Here is the caller graph for this function:

subroutine dgesvd ( character  JOBU,
character  JOBVT,
integer  M,
integer  N,
double precision, dimension( lda, * )  A,
integer  LDA,
double precision, dimension( * )  S,
double precision, dimension( ldu, * )  U,
integer  LDU,
double precision, dimension( ldvt, * )  VT,
integer  LDVT,
double precision, dimension( * )  WORK,
integer  LWORK,
integer  INFO 
)

DGESVD computes the singular value decomposition (SVD) for GE matrices

Download DGESVD + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 DGESVD computes the singular value decomposition (SVD) of a real
 M-by-N matrix A, optionally computing the left and/or right singular
 vectors. The SVD is written

      A = U * SIGMA * transpose(V)

 where SIGMA is an M-by-N matrix which is zero except for its
 min(m,n) diagonal elements, U is an M-by-M orthogonal matrix, and
 V is an N-by-N orthogonal matrix.  The diagonal elements of SIGMA
 are the singular values of A; they are real and non-negative, and
 are returned in descending order.  The first min(m,n) columns of
 U and V are the left and right singular vectors of A.

 Note that the routine returns V**T, not V.
Parameters
[in]JOBU
          JOBU is CHARACTER*1
          Specifies options for computing all or part of the matrix U:
          = 'A':  all M columns of U are returned in array U:
          = 'S':  the first min(m,n) columns of U (the left singular
                  vectors) are returned in the array U;
          = 'O':  the first min(m,n) columns of U (the left singular
                  vectors) are overwritten on the array A;
          = 'N':  no columns of U (no left singular vectors) are
                  computed.
[in]JOBVT
          JOBVT is CHARACTER*1
          Specifies options for computing all or part of the matrix
          V**T:
          = 'A':  all N rows of V**T are returned in the array VT;
          = 'S':  the first min(m,n) rows of V**T (the right singular
                  vectors) are returned in the array VT;
          = 'O':  the first min(m,n) rows of V**T (the right singular
                  vectors) are overwritten on the array A;
          = 'N':  no rows of V**T (no right singular vectors) are
                  computed.

          JOBVT and JOBU cannot both be 'O'.
[in]M
          M is INTEGER
          The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.  N >= 0.
[in,out]A
          A is DOUBLE PRECISION array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit,
          if JOBU = 'O',  A is overwritten with the first min(m,n)
                          columns of U (the left singular vectors,
                          stored columnwise);
          if JOBVT = 'O', A is overwritten with the first min(m,n)
                          rows of V**T (the right singular vectors,
                          stored rowwise);
          if JOBU .ne. 'O' and JOBVT .ne. 'O', the contents of A
                          are destroyed.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]S
          S is DOUBLE PRECISION array, dimension (min(M,N))
          The singular values of A, sorted so that S(i) >= S(i+1).
[out]U
          U is DOUBLE PRECISION array, dimension (LDU,UCOL)
          (LDU,M) if JOBU = 'A' or (LDU,min(M,N)) if JOBU = 'S'.
          If JOBU = 'A', U contains the M-by-M orthogonal matrix U;
          if JOBU = 'S', U contains the first min(m,n) columns of U
          (the left singular vectors, stored columnwise);
          if JOBU = 'N' or 'O', U is not referenced.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U.  LDU >= 1; if
          JOBU = 'S' or 'A', LDU >= M.
[out]VT
          VT is DOUBLE PRECISION array, dimension (LDVT,N)
          If JOBVT = 'A', VT contains the N-by-N orthogonal matrix
          V**T;
          if JOBVT = 'S', VT contains the first min(m,n) rows of
          V**T (the right singular vectors, stored rowwise);
          if JOBVT = 'N' or 'O', VT is not referenced.
[in]LDVT
          LDVT is INTEGER
          The leading dimension of the array VT.  LDVT >= 1; if
          JOBVT = 'A', LDVT >= N; if JOBVT = 'S', LDVT >= min(M,N).
[out]WORK
          WORK is DOUBLE PRECISION array, dimension (MAX(1,LWORK))
          On exit, if INFO = 0, WORK(1) returns the optimal LWORK;
          if INFO > 0, WORK(2:MIN(M,N)) contains the unconverged
          superdiagonal elements of an upper bidiagonal matrix B
          whose diagonal is in S (not necessarily sorted). B
          satisfies A = U * B * VT, so it has the same singular values
          as A, and singular vectors related by U and VT.
[in]LWORK
          LWORK is INTEGER
          The dimension of the array WORK.
          LWORK >= MAX(1,5*MIN(M,N)) for the paths (see comments inside code):
             - PATH 1  (M much larger than N, JOBU='N') 
             - PATH 1t (N much larger than M, JOBVT='N')
          LWORK >= MAX(1,3*MIN(M,N)+MAX(M,N),5*MIN(M,N)) for the other paths
          For good performance, LWORK should generally be larger.

          If LWORK = -1, then a workspace query is assumed; the routine
          only calculates the optimal size of the WORK array, returns
          this value as the first entry of the WORK array, and no error
          message related to LWORK is issued by XERBLA.
[out]INFO
          INFO is INTEGER
          = 0:  successful exit.
          < 0:  if INFO = -i, the i-th argument had an illegal value.
          > 0:  if DBDSQR did not converge, INFO specifies how many
                superdiagonals of an intermediate bidiagonal form B
                did not converge to zero. See the description of WORK
                above for details.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
April 2012

Definition at line 213 of file dgesvd.f.

213 *
214 * -- LAPACK driver routine (version 3.4.1) --
215 * -- LAPACK is a software package provided by Univ. of Tennessee, --
216 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
217 * April 2012
218 *
219 * .. Scalar Arguments ..
220  CHARACTER jobu, jobvt
221  INTEGER info, lda, ldu, ldvt, lwork, m, n
222 * ..
223 * .. Array Arguments ..
224  DOUBLE PRECISION a( lda, * ), s( * ), u( ldu, * ),
225  $ vt( ldvt, * ), work( * )
226 * ..
227 *
228 * =====================================================================
229 *
230 * .. Parameters ..
231  DOUBLE PRECISION zero, one
232  parameter( zero = 0.0d0, one = 1.0d0 )
233 * ..
234 * .. Local Scalars ..
235  LOGICAL lquery, wntua, wntuas, wntun, wntuo, wntus,
236  $ wntva, wntvas, wntvn, wntvo, wntvs
237  INTEGER bdspac, blk, chunk, i, ie, ierr, ir, iscl,
238  $ itau, itaup, itauq, iu, iwork, ldwrkr, ldwrku,
239  $ maxwrk, minmn, minwrk, mnthr, ncu, ncvt, nru,
240  $ nrvt, wrkbl
241  INTEGER lwork_dgeqrf, lwork_dorgqr_n, lwork_dorgqr_m,
242  $ lwork_dgebrd, lwork_dorgbr_p, lwork_dorgbr_q,
243  $ lwork_dgelqf, lwork_dorglq_n, lwork_dorglq_m
244  DOUBLE PRECISION anrm, bignum, eps, smlnum
245 * ..
246 * .. Local Arrays ..
247  DOUBLE PRECISION dum( 1 )
248 * ..
249 * .. External Subroutines ..
250  EXTERNAL dbdsqr, dgebrd, dgelqf, dgemm, dgeqrf, dlacpy,
252  $ xerbla
253 * ..
254 * .. External Functions ..
255  LOGICAL lsame
256  INTEGER ilaenv
257  DOUBLE PRECISION dlamch, dlange
258  EXTERNAL lsame, ilaenv, dlamch, dlange
259 * ..
260 * .. Intrinsic Functions ..
261  INTRINSIC max, min, sqrt
262 * ..
263 * .. Executable Statements ..
264 *
265 * Test the input arguments
266 *
267  info = 0
268  minmn = min( m, n )
269  wntua = lsame( jobu, 'A' )
270  wntus = lsame( jobu, 'S' )
271  wntuas = wntua .OR. wntus
272  wntuo = lsame( jobu, 'O' )
273  wntun = lsame( jobu, 'N' )
274  wntva = lsame( jobvt, 'A' )
275  wntvs = lsame( jobvt, 'S' )
276  wntvas = wntva .OR. wntvs
277  wntvo = lsame( jobvt, 'O' )
278  wntvn = lsame( jobvt, 'N' )
279  lquery = ( lwork.EQ.-1 )
280 *
281  IF( .NOT.( wntua .OR. wntus .OR. wntuo .OR. wntun ) ) THEN
282  info = -1
283  ELSE IF( .NOT.( wntva .OR. wntvs .OR. wntvo .OR. wntvn ) .OR.
284  $ ( wntvo .AND. wntuo ) ) THEN
285  info = -2
286  ELSE IF( m.LT.0 ) THEN
287  info = -3
288  ELSE IF( n.LT.0 ) THEN
289  info = -4
290  ELSE IF( lda.LT.max( 1, m ) ) THEN
291  info = -6
292  ELSE IF( ldu.LT.1 .OR. ( wntuas .AND. ldu.LT.m ) ) THEN
293  info = -9
294  ELSE IF( ldvt.LT.1 .OR. ( wntva .AND. ldvt.LT.n ) .OR.
295  $ ( wntvs .AND. ldvt.LT.minmn ) ) THEN
296  info = -11
297  END IF
298 *
299 * Compute workspace
300 * (Note: Comments in the code beginning "Workspace:" describe the
301 * minimal amount of workspace needed at that point in the code,
302 * as well as the preferred amount for good performance.
303 * NB refers to the optimal block size for the immediately
304 * following subroutine, as returned by ILAENV.)
305 *
306  IF( info.EQ.0 ) THEN
307  minwrk = 1
308  maxwrk = 1
309  IF( m.GE.n .AND. minmn.GT.0 ) THEN
310 *
311 * Compute space needed for DBDSQR
312 *
313  mnthr = ilaenv( 6, 'DGESVD', jobu // jobvt, m, n, 0, 0 )
314  bdspac = 5*n
315 * Compute space needed for DGEQRF
316  CALL dgeqrf( m, n, a, lda, dum(1), dum(1), -1, ierr )
317  lwork_dgeqrf=dum(1)
318 * Compute space needed for DORGQR
319  CALL dorgqr( m, n, n, a, lda, dum(1), dum(1), -1, ierr )
320  lwork_dorgqr_n=dum(1)
321  CALL dorgqr( m, m, n, a, lda, dum(1), dum(1), -1, ierr )
322  lwork_dorgqr_m=dum(1)
323 * Compute space needed for DGEBRD
324  CALL dgebrd( n, n, a, lda, s, dum(1), dum(1),
325  $ dum(1), dum(1), -1, ierr )
326  lwork_dgebrd=dum(1)
327 * Compute space needed for DORGBR P
328  CALL dorgbr( 'P', n, n, n, a, lda, dum(1),
329  $ dum(1), -1, ierr )
330  lwork_dorgbr_p=dum(1)
331 * Compute space needed for DORGBR Q
332  CALL dorgbr( 'Q', n, n, n, a, lda, dum(1),
333  $ dum(1), -1, ierr )
334  lwork_dorgbr_q=dum(1)
335 *
336  IF( m.GE.mnthr ) THEN
337  IF( wntun ) THEN
338 *
339 * Path 1 (M much larger than N, JOBU='N')
340 *
341  maxwrk = n + lwork_dgeqrf
342  maxwrk = max( maxwrk, 3*n+lwork_dgebrd )
343  IF( wntvo .OR. wntvas )
344  $ maxwrk = max( maxwrk, 3*n+lwork_dorgbr_p )
345  maxwrk = max( maxwrk, bdspac )
346  minwrk = max( 4*n, bdspac )
347  ELSE IF( wntuo .AND. wntvn ) THEN
348 *
349 * Path 2 (M much larger than N, JOBU='O', JOBVT='N')
350 *
351  wrkbl = n + lwork_dgeqrf
352  wrkbl = max( wrkbl, n+lwork_dorgqr_n )
353  wrkbl = max( wrkbl, 3*n+lwork_dgebrd )
354  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_q )
355  wrkbl = max( wrkbl, bdspac )
356  maxwrk = max( n*n+wrkbl, n*n+m*n+n )
357  minwrk = max( 3*n+m, bdspac )
358  ELSE IF( wntuo .AND. wntvas ) THEN
359 *
360 * Path 3 (M much larger than N, JOBU='O', JOBVT='S' or
361 * 'A')
362 *
363  wrkbl = n + lwork_dgeqrf
364  wrkbl = max( wrkbl, n+lwork_dorgqr_n )
365  wrkbl = max( wrkbl, 3*n+lwork_dgebrd )
366  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_q )
367  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_p )
368  wrkbl = max( wrkbl, bdspac )
369  maxwrk = max( n*n+wrkbl, n*n+m*n+n )
370  minwrk = max( 3*n+m, bdspac )
371  ELSE IF( wntus .AND. wntvn ) THEN
372 *
373 * Path 4 (M much larger than N, JOBU='S', JOBVT='N')
374 *
375  wrkbl = n + lwork_dgeqrf
376  wrkbl = max( wrkbl, n+lwork_dorgqr_n )
377  wrkbl = max( wrkbl, 3*n+lwork_dgebrd )
378  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_q )
379  wrkbl = max( wrkbl, bdspac )
380  maxwrk = n*n + wrkbl
381  minwrk = max( 3*n+m, bdspac )
382  ELSE IF( wntus .AND. wntvo ) THEN
383 *
384 * Path 5 (M much larger than N, JOBU='S', JOBVT='O')
385 *
386  wrkbl = n + lwork_dgeqrf
387  wrkbl = max( wrkbl, n+lwork_dorgqr_n )
388  wrkbl = max( wrkbl, 3*n+lwork_dgebrd )
389  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_q )
390  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_p )
391  wrkbl = max( wrkbl, bdspac )
392  maxwrk = 2*n*n + wrkbl
393  minwrk = max( 3*n+m, bdspac )
394  ELSE IF( wntus .AND. wntvas ) THEN
395 *
396 * Path 6 (M much larger than N, JOBU='S', JOBVT='S' or
397 * 'A')
398 *
399  wrkbl = n + lwork_dgeqrf
400  wrkbl = max( wrkbl, n+lwork_dorgqr_n )
401  wrkbl = max( wrkbl, 3*n+lwork_dgebrd )
402  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_q )
403  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_p )
404  wrkbl = max( wrkbl, bdspac )
405  maxwrk = n*n + wrkbl
406  minwrk = max( 3*n+m, bdspac )
407  ELSE IF( wntua .AND. wntvn ) THEN
408 *
409 * Path 7 (M much larger than N, JOBU='A', JOBVT='N')
410 *
411  wrkbl = n + lwork_dgeqrf
412  wrkbl = max( wrkbl, n+lwork_dorgqr_m )
413  wrkbl = max( wrkbl, 3*n+lwork_dgebrd )
414  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_q )
415  wrkbl = max( wrkbl, bdspac )
416  maxwrk = n*n + wrkbl
417  minwrk = max( 3*n+m, bdspac )
418  ELSE IF( wntua .AND. wntvo ) THEN
419 *
420 * Path 8 (M much larger than N, JOBU='A', JOBVT='O')
421 *
422  wrkbl = n + lwork_dgeqrf
423  wrkbl = max( wrkbl, n+lwork_dorgqr_m )
424  wrkbl = max( wrkbl, 3*n+lwork_dgebrd )
425  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_q )
426  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_p )
427  wrkbl = max( wrkbl, bdspac )
428  maxwrk = 2*n*n + wrkbl
429  minwrk = max( 3*n+m, bdspac )
430  ELSE IF( wntua .AND. wntvas ) THEN
431 *
432 * Path 9 (M much larger than N, JOBU='A', JOBVT='S' or
433 * 'A')
434 *
435  wrkbl = n + lwork_dgeqrf
436  wrkbl = max( wrkbl, n+lwork_dorgqr_m )
437  wrkbl = max( wrkbl, 3*n+lwork_dgebrd )
438  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_q )
439  wrkbl = max( wrkbl, 3*n+lwork_dorgbr_p )
440  wrkbl = max( wrkbl, bdspac )
441  maxwrk = n*n + wrkbl
442  minwrk = max( 3*n+m, bdspac )
443  END IF
444  ELSE
445 *
446 * Path 10 (M at least N, but not much larger)
447 *
448  CALL dgebrd( m, n, a, lda, s, dum(1), dum(1),
449  $ dum(1), dum(1), -1, ierr )
450  lwork_dgebrd=dum(1)
451  maxwrk = 3*n + lwork_dgebrd
452  IF( wntus .OR. wntuo ) THEN
453  CALL dorgbr( 'Q', m, n, n, a, lda, dum(1),
454  $ dum(1), -1, ierr )
455  lwork_dorgbr_q=dum(1)
456  maxwrk = max( maxwrk, 3*n+lwork_dorgbr_q )
457  END IF
458  IF( wntua ) THEN
459  CALL dorgbr( 'Q', m, m, n, a, lda, dum(1),
460  $ dum(1), -1, ierr )
461  lwork_dorgbr_q=dum(1)
462  maxwrk = max( maxwrk, 3*n+lwork_dorgbr_q )
463  END IF
464  IF( .NOT.wntvn ) THEN
465  maxwrk = max( maxwrk, 3*n+lwork_dorgbr_p )
466  END IF
467  maxwrk = max( maxwrk, bdspac )
468  minwrk = max( 3*n+m, bdspac )
469  END IF
470  ELSE IF( minmn.GT.0 ) THEN
471 *
472 * Compute space needed for DBDSQR
473 *
474  mnthr = ilaenv( 6, 'DGESVD', jobu // jobvt, m, n, 0, 0 )
475  bdspac = 5*m
476 * Compute space needed for DGELQF
477  CALL dgelqf( m, n, a, lda, dum(1), dum(1), -1, ierr )
478  lwork_dgelqf=dum(1)
479 * Compute space needed for DORGLQ
480  CALL dorglq( n, n, m, dum(1), n, dum(1), dum(1), -1, ierr )
481  lwork_dorglq_n=dum(1)
482  CALL dorglq( m, n, m, a, lda, dum(1), dum(1), -1, ierr )
483  lwork_dorglq_m=dum(1)
484 * Compute space needed for DGEBRD
485  CALL dgebrd( m, m, a, lda, s, dum(1), dum(1),
486  $ dum(1), dum(1), -1, ierr )
487  lwork_dgebrd=dum(1)
488 * Compute space needed for DORGBR P
489  CALL dorgbr( 'P', m, m, m, a, n, dum(1),
490  $ dum(1), -1, ierr )
491  lwork_dorgbr_p=dum(1)
492 * Compute space needed for DORGBR Q
493  CALL dorgbr( 'Q', m, m, m, a, n, dum(1),
494  $ dum(1), -1, ierr )
495  lwork_dorgbr_q=dum(1)
496  IF( n.GE.mnthr ) THEN
497  IF( wntvn ) THEN
498 *
499 * Path 1t(N much larger than M, JOBVT='N')
500 *
501  maxwrk = m + lwork_dgelqf
502  maxwrk = max( maxwrk, 3*m+lwork_dgebrd )
503  IF( wntuo .OR. wntuas )
504  $ maxwrk = max( maxwrk, 3*m+lwork_dorgbr_q )
505  maxwrk = max( maxwrk, bdspac )
506  minwrk = max( 4*m, bdspac )
507  ELSE IF( wntvo .AND. wntun ) THEN
508 *
509 * Path 2t(N much larger than M, JOBU='N', JOBVT='O')
510 *
511  wrkbl = m + lwork_dgelqf
512  wrkbl = max( wrkbl, m+lwork_dorglq_m )
513  wrkbl = max( wrkbl, 3*m+lwork_dgebrd )
514  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_p )
515  wrkbl = max( wrkbl, bdspac )
516  maxwrk = max( m*m+wrkbl, m*m+m*n+m )
517  minwrk = max( 3*m+n, bdspac )
518  ELSE IF( wntvo .AND. wntuas ) THEN
519 *
520 * Path 3t(N much larger than M, JOBU='S' or 'A',
521 * JOBVT='O')
522 *
523  wrkbl = m + lwork_dgelqf
524  wrkbl = max( wrkbl, m+lwork_dorglq_m )
525  wrkbl = max( wrkbl, 3*m+lwork_dgebrd )
526  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_p )
527  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_q )
528  wrkbl = max( wrkbl, bdspac )
529  maxwrk = max( m*m+wrkbl, m*m+m*n+m )
530  minwrk = max( 3*m+n, bdspac )
531  ELSE IF( wntvs .AND. wntun ) THEN
532 *
533 * Path 4t(N much larger than M, JOBU='N', JOBVT='S')
534 *
535  wrkbl = m + lwork_dgelqf
536  wrkbl = max( wrkbl, m+lwork_dorglq_m )
537  wrkbl = max( wrkbl, 3*m+lwork_dgebrd )
538  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_p )
539  wrkbl = max( wrkbl, bdspac )
540  maxwrk = m*m + wrkbl
541  minwrk = max( 3*m+n, bdspac )
542  ELSE IF( wntvs .AND. wntuo ) THEN
543 *
544 * Path 5t(N much larger than M, JOBU='O', JOBVT='S')
545 *
546  wrkbl = m + lwork_dgelqf
547  wrkbl = max( wrkbl, m+lwork_dorglq_m )
548  wrkbl = max( wrkbl, 3*m+lwork_dgebrd )
549  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_p )
550  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_q )
551  wrkbl = max( wrkbl, bdspac )
552  maxwrk = 2*m*m + wrkbl
553  minwrk = max( 3*m+n, bdspac )
554  ELSE IF( wntvs .AND. wntuas ) THEN
555 *
556 * Path 6t(N much larger than M, JOBU='S' or 'A',
557 * JOBVT='S')
558 *
559  wrkbl = m + lwork_dgelqf
560  wrkbl = max( wrkbl, m+lwork_dorglq_m )
561  wrkbl = max( wrkbl, 3*m+lwork_dgebrd )
562  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_p )
563  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_q )
564  wrkbl = max( wrkbl, bdspac )
565  maxwrk = m*m + wrkbl
566  minwrk = max( 3*m+n, bdspac )
567  ELSE IF( wntva .AND. wntun ) THEN
568 *
569 * Path 7t(N much larger than M, JOBU='N', JOBVT='A')
570 *
571  wrkbl = m + lwork_dgelqf
572  wrkbl = max( wrkbl, m+lwork_dorglq_n )
573  wrkbl = max( wrkbl, 3*m+lwork_dgebrd )
574  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_p )
575  wrkbl = max( wrkbl, bdspac )
576  maxwrk = m*m + wrkbl
577  minwrk = max( 3*m+n, bdspac )
578  ELSE IF( wntva .AND. wntuo ) THEN
579 *
580 * Path 8t(N much larger than M, JOBU='O', JOBVT='A')
581 *
582  wrkbl = m + lwork_dgelqf
583  wrkbl = max( wrkbl, m+lwork_dorglq_n )
584  wrkbl = max( wrkbl, 3*m+lwork_dgebrd )
585  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_p )
586  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_q )
587  wrkbl = max( wrkbl, bdspac )
588  maxwrk = 2*m*m + wrkbl
589  minwrk = max( 3*m+n, bdspac )
590  ELSE IF( wntva .AND. wntuas ) THEN
591 *
592 * Path 9t(N much larger than M, JOBU='S' or 'A',
593 * JOBVT='A')
594 *
595  wrkbl = m + lwork_dgelqf
596  wrkbl = max( wrkbl, m+lwork_dorglq_n )
597  wrkbl = max( wrkbl, 3*m+lwork_dgebrd )
598  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_p )
599  wrkbl = max( wrkbl, 3*m+lwork_dorgbr_q )
600  wrkbl = max( wrkbl, bdspac )
601  maxwrk = m*m + wrkbl
602  minwrk = max( 3*m+n, bdspac )
603  END IF
604  ELSE
605 *
606 * Path 10t(N greater than M, but not much larger)
607 *
608  CALL dgebrd( m, n, a, lda, s, dum(1), dum(1),
609  $ dum(1), dum(1), -1, ierr )
610  lwork_dgebrd=dum(1)
611  maxwrk = 3*m + lwork_dgebrd
612  IF( wntvs .OR. wntvo ) THEN
613 * Compute space needed for DORGBR P
614  CALL dorgbr( 'P', m, n, m, a, n, dum(1),
615  $ dum(1), -1, ierr )
616  lwork_dorgbr_p=dum(1)
617  maxwrk = max( maxwrk, 3*m+lwork_dorgbr_p )
618  END IF
619  IF( wntva ) THEN
620  CALL dorgbr( 'P', n, n, m, a, n, dum(1),
621  $ dum(1), -1, ierr )
622  lwork_dorgbr_p=dum(1)
623  maxwrk = max( maxwrk, 3*m+lwork_dorgbr_p )
624  END IF
625  IF( .NOT.wntun ) THEN
626  maxwrk = max( maxwrk, 3*m+lwork_dorgbr_q )
627  END IF
628  maxwrk = max( maxwrk, bdspac )
629  minwrk = max( 3*m+n, bdspac )
630  END IF
631  END IF
632  maxwrk = max( maxwrk, minwrk )
633  work( 1 ) = maxwrk
634 *
635  IF( lwork.LT.minwrk .AND. .NOT.lquery ) THEN
636  info = -13
637  END IF
638  END IF
639 *
640  IF( info.NE.0 ) THEN
641  CALL xerbla( 'DGESVD', -info )
642  RETURN
643  ELSE IF( lquery ) THEN
644  RETURN
645  END IF
646 *
647 * Quick return if possible
648 *
649  IF( m.EQ.0 .OR. n.EQ.0 ) THEN
650  RETURN
651  END IF
652 *
653 * Get machine constants
654 *
655  eps = dlamch( 'P' )
656  smlnum = sqrt( dlamch( 'S' ) ) / eps
657  bignum = one / smlnum
658 *
659 * Scale A if max element outside range [SMLNUM,BIGNUM]
660 *
661  anrm = dlange( 'M', m, n, a, lda, dum )
662  iscl = 0
663  IF( anrm.GT.zero .AND. anrm.LT.smlnum ) THEN
664  iscl = 1
665  CALL dlascl( 'G', 0, 0, anrm, smlnum, m, n, a, lda, ierr )
666  ELSE IF( anrm.GT.bignum ) THEN
667  iscl = 1
668  CALL dlascl( 'G', 0, 0, anrm, bignum, m, n, a, lda, ierr )
669  END IF
670 *
671  IF( m.GE.n ) THEN
672 *
673 * A has at least as many rows as columns. If A has sufficiently
674 * more rows than columns, first reduce using the QR
675 * decomposition (if sufficient workspace available)
676 *
677  IF( m.GE.mnthr ) THEN
678 *
679  IF( wntun ) THEN
680 *
681 * Path 1 (M much larger than N, JOBU='N')
682 * No left singular vectors to be computed
683 *
684  itau = 1
685  iwork = itau + n
686 *
687 * Compute A=Q*R
688 * (Workspace: need 2*N, prefer N+N*NB)
689 *
690  CALL dgeqrf( m, n, a, lda, work( itau ), work( iwork ),
691  $ lwork-iwork+1, ierr )
692 *
693 * Zero out below R
694 *
695  CALL dlaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ), lda )
696  ie = 1
697  itauq = ie + n
698  itaup = itauq + n
699  iwork = itaup + n
700 *
701 * Bidiagonalize R in A
702 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
703 *
704  CALL dgebrd( n, n, a, lda, s, work( ie ), work( itauq ),
705  $ work( itaup ), work( iwork ), lwork-iwork+1,
706  $ ierr )
707  ncvt = 0
708  IF( wntvo .OR. wntvas ) THEN
709 *
710 * If right singular vectors desired, generate P'.
711 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
712 *
713  CALL dorgbr( 'P', n, n, n, a, lda, work( itaup ),
714  $ work( iwork ), lwork-iwork+1, ierr )
715  ncvt = n
716  END IF
717  iwork = ie + n
718 *
719 * Perform bidiagonal QR iteration, computing right
720 * singular vectors of A in A if desired
721 * (Workspace: need BDSPAC)
722 *
723  CALL dbdsqr( 'U', n, ncvt, 0, 0, s, work( ie ), a, lda,
724  $ dum, 1, dum, 1, work( iwork ), info )
725 *
726 * If right singular vectors desired in VT, copy them there
727 *
728  IF( wntvas )
729  $ CALL dlacpy( 'F', n, n, a, lda, vt, ldvt )
730 *
731  ELSE IF( wntuo .AND. wntvn ) THEN
732 *
733 * Path 2 (M much larger than N, JOBU='O', JOBVT='N')
734 * N left singular vectors to be overwritten on A and
735 * no right singular vectors to be computed
736 *
737  IF( lwork.GE.n*n+max( 4*n, bdspac ) ) THEN
738 *
739 * Sufficient workspace for a fast algorithm
740 *
741  ir = 1
742  IF( lwork.GE.max( wrkbl, lda*n+n )+lda*n ) THEN
743 *
744 * WORK(IU) is LDA by N, WORK(IR) is LDA by N
745 *
746  ldwrku = lda
747  ldwrkr = lda
748  ELSE IF( lwork.GE.max( wrkbl, lda*n+n )+n*n ) THEN
749 *
750 * WORK(IU) is LDA by N, WORK(IR) is N by N
751 *
752  ldwrku = lda
753  ldwrkr = n
754  ELSE
755 *
756 * WORK(IU) is LDWRKU by N, WORK(IR) is N by N
757 *
758  ldwrku = ( lwork-n*n-n ) / n
759  ldwrkr = n
760  END IF
761  itau = ir + ldwrkr*n
762  iwork = itau + n
763 *
764 * Compute A=Q*R
765 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
766 *
767  CALL dgeqrf( m, n, a, lda, work( itau ),
768  $ work( iwork ), lwork-iwork+1, ierr )
769 *
770 * Copy R to WORK(IR) and zero out below it
771 *
772  CALL dlacpy( 'U', n, n, a, lda, work( ir ), ldwrkr )
773  CALL dlaset( 'L', n-1, n-1, zero, zero, work( ir+1 ),
774  $ ldwrkr )
775 *
776 * Generate Q in A
777 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
778 *
779  CALL dorgqr( m, n, n, a, lda, work( itau ),
780  $ work( iwork ), lwork-iwork+1, ierr )
781  ie = itau
782  itauq = ie + n
783  itaup = itauq + n
784  iwork = itaup + n
785 *
786 * Bidiagonalize R in WORK(IR)
787 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
788 *
789  CALL dgebrd( n, n, work( ir ), ldwrkr, s, work( ie ),
790  $ work( itauq ), work( itaup ),
791  $ work( iwork ), lwork-iwork+1, ierr )
792 *
793 * Generate left vectors bidiagonalizing R
794 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
795 *
796  CALL dorgbr( 'Q', n, n, n, work( ir ), ldwrkr,
797  $ work( itauq ), work( iwork ),
798  $ lwork-iwork+1, ierr )
799  iwork = ie + n
800 *
801 * Perform bidiagonal QR iteration, computing left
802 * singular vectors of R in WORK(IR)
803 * (Workspace: need N*N+BDSPAC)
804 *
805  CALL dbdsqr( 'U', n, 0, n, 0, s, work( ie ), dum, 1,
806  $ work( ir ), ldwrkr, dum, 1,
807  $ work( iwork ), info )
808  iu = ie + n
809 *
810 * Multiply Q in A by left singular vectors of R in
811 * WORK(IR), storing result in WORK(IU) and copying to A
812 * (Workspace: need N*N+2*N, prefer N*N+M*N+N)
813 *
814  DO 10 i = 1, m, ldwrku
815  chunk = min( m-i+1, ldwrku )
816  CALL dgemm( 'N', 'N', chunk, n, n, one, a( i, 1 ),
817  $ lda, work( ir ), ldwrkr, zero,
818  $ work( iu ), ldwrku )
819  CALL dlacpy( 'F', chunk, n, work( iu ), ldwrku,
820  $ a( i, 1 ), lda )
821  10 CONTINUE
822 *
823  ELSE
824 *
825 * Insufficient workspace for a fast algorithm
826 *
827  ie = 1
828  itauq = ie + n
829  itaup = itauq + n
830  iwork = itaup + n
831 *
832 * Bidiagonalize A
833 * (Workspace: need 3*N+M, prefer 3*N+(M+N)*NB)
834 *
835  CALL dgebrd( m, n, a, lda, s, work( ie ),
836  $ work( itauq ), work( itaup ),
837  $ work( iwork ), lwork-iwork+1, ierr )
838 *
839 * Generate left vectors bidiagonalizing A
840 * (Workspace: need 4*N, prefer 3*N+N*NB)
841 *
842  CALL dorgbr( 'Q', m, n, n, a, lda, work( itauq ),
843  $ work( iwork ), lwork-iwork+1, ierr )
844  iwork = ie + n
845 *
846 * Perform bidiagonal QR iteration, computing left
847 * singular vectors of A in A
848 * (Workspace: need BDSPAC)
849 *
850  CALL dbdsqr( 'U', n, 0, m, 0, s, work( ie ), dum, 1,
851  $ a, lda, dum, 1, work( iwork ), info )
852 *
853  END IF
854 *
855  ELSE IF( wntuo .AND. wntvas ) THEN
856 *
857 * Path 3 (M much larger than N, JOBU='O', JOBVT='S' or 'A')
858 * N left singular vectors to be overwritten on A and
859 * N right singular vectors to be computed in VT
860 *
861  IF( lwork.GE.n*n+max( 4*n, bdspac ) ) THEN
862 *
863 * Sufficient workspace for a fast algorithm
864 *
865  ir = 1
866  IF( lwork.GE.max( wrkbl, lda*n+n )+lda*n ) THEN
867 *
868 * WORK(IU) is LDA by N and WORK(IR) is LDA by N
869 *
870  ldwrku = lda
871  ldwrkr = lda
872  ELSE IF( lwork.GE.max( wrkbl, lda*n+n )+n*n ) THEN
873 *
874 * WORK(IU) is LDA by N and WORK(IR) is N by N
875 *
876  ldwrku = lda
877  ldwrkr = n
878  ELSE
879 *
880 * WORK(IU) is LDWRKU by N and WORK(IR) is N by N
881 *
882  ldwrku = ( lwork-n*n-n ) / n
883  ldwrkr = n
884  END IF
885  itau = ir + ldwrkr*n
886  iwork = itau + n
887 *
888 * Compute A=Q*R
889 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
890 *
891  CALL dgeqrf( m, n, a, lda, work( itau ),
892  $ work( iwork ), lwork-iwork+1, ierr )
893 *
894 * Copy R to VT, zeroing out below it
895 *
896  CALL dlacpy( 'U', n, n, a, lda, vt, ldvt )
897  IF( n.GT.1 )
898  $ CALL dlaset( 'L', n-1, n-1, zero, zero,
899  $ vt( 2, 1 ), ldvt )
900 *
901 * Generate Q in A
902 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
903 *
904  CALL dorgqr( m, n, n, a, lda, work( itau ),
905  $ work( iwork ), lwork-iwork+1, ierr )
906  ie = itau
907  itauq = ie + n
908  itaup = itauq + n
909  iwork = itaup + n
910 *
911 * Bidiagonalize R in VT, copying result to WORK(IR)
912 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
913 *
914  CALL dgebrd( n, n, vt, ldvt, s, work( ie ),
915  $ work( itauq ), work( itaup ),
916  $ work( iwork ), lwork-iwork+1, ierr )
917  CALL dlacpy( 'L', n, n, vt, ldvt, work( ir ), ldwrkr )
918 *
919 * Generate left vectors bidiagonalizing R in WORK(IR)
920 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
921 *
922  CALL dorgbr( 'Q', n, n, n, work( ir ), ldwrkr,
923  $ work( itauq ), work( iwork ),
924  $ lwork-iwork+1, ierr )
925 *
926 * Generate right vectors bidiagonalizing R in VT
927 * (Workspace: need N*N+4*N-1, prefer N*N+3*N+(N-1)*NB)
928 *
929  CALL dorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
930  $ work( iwork ), lwork-iwork+1, ierr )
931  iwork = ie + n
932 *
933 * Perform bidiagonal QR iteration, computing left
934 * singular vectors of R in WORK(IR) and computing right
935 * singular vectors of R in VT
936 * (Workspace: need N*N+BDSPAC)
937 *
938  CALL dbdsqr( 'U', n, n, n, 0, s, work( ie ), vt, ldvt,
939  $ work( ir ), ldwrkr, dum, 1,
940  $ work( iwork ), info )
941  iu = ie + n
942 *
943 * Multiply Q in A by left singular vectors of R in
944 * WORK(IR), storing result in WORK(IU) and copying to A
945 * (Workspace: need N*N+2*N, prefer N*N+M*N+N)
946 *
947  DO 20 i = 1, m, ldwrku
948  chunk = min( m-i+1, ldwrku )
949  CALL dgemm( 'N', 'N', chunk, n, n, one, a( i, 1 ),
950  $ lda, work( ir ), ldwrkr, zero,
951  $ work( iu ), ldwrku )
952  CALL dlacpy( 'F', chunk, n, work( iu ), ldwrku,
953  $ a( i, 1 ), lda )
954  20 CONTINUE
955 *
956  ELSE
957 *
958 * Insufficient workspace for a fast algorithm
959 *
960  itau = 1
961  iwork = itau + n
962 *
963 * Compute A=Q*R
964 * (Workspace: need 2*N, prefer N+N*NB)
965 *
966  CALL dgeqrf( m, n, a, lda, work( itau ),
967  $ work( iwork ), lwork-iwork+1, ierr )
968 *
969 * Copy R to VT, zeroing out below it
970 *
971  CALL dlacpy( 'U', n, n, a, lda, vt, ldvt )
972  IF( n.GT.1 )
973  $ CALL dlaset( 'L', n-1, n-1, zero, zero,
974  $ vt( 2, 1 ), ldvt )
975 *
976 * Generate Q in A
977 * (Workspace: need 2*N, prefer N+N*NB)
978 *
979  CALL dorgqr( m, n, n, a, lda, work( itau ),
980  $ work( iwork ), lwork-iwork+1, ierr )
981  ie = itau
982  itauq = ie + n
983  itaup = itauq + n
984  iwork = itaup + n
985 *
986 * Bidiagonalize R in VT
987 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
988 *
989  CALL dgebrd( n, n, vt, ldvt, s, work( ie ),
990  $ work( itauq ), work( itaup ),
991  $ work( iwork ), lwork-iwork+1, ierr )
992 *
993 * Multiply Q in A by left vectors bidiagonalizing R
994 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
995 *
996  CALL dormbr( 'Q', 'R', 'N', m, n, n, vt, ldvt,
997  $ work( itauq ), a, lda, work( iwork ),
998  $ lwork-iwork+1, ierr )
999 *
1000 * Generate right vectors bidiagonalizing R in VT
1001 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1002 *
1003  CALL dorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1004  $ work( iwork ), lwork-iwork+1, ierr )
1005  iwork = ie + n
1006 *
1007 * Perform bidiagonal QR iteration, computing left
1008 * singular vectors of A in A and computing right
1009 * singular vectors of A in VT
1010 * (Workspace: need BDSPAC)
1011 *
1012  CALL dbdsqr( 'U', n, n, m, 0, s, work( ie ), vt, ldvt,
1013  $ a, lda, dum, 1, work( iwork ), info )
1014 *
1015  END IF
1016 *
1017  ELSE IF( wntus ) THEN
1018 *
1019  IF( wntvn ) THEN
1020 *
1021 * Path 4 (M much larger than N, JOBU='S', JOBVT='N')
1022 * N left singular vectors to be computed in U and
1023 * no right singular vectors to be computed
1024 *
1025  IF( lwork.GE.n*n+max( 4*n, bdspac ) ) THEN
1026 *
1027 * Sufficient workspace for a fast algorithm
1028 *
1029  ir = 1
1030  IF( lwork.GE.wrkbl+lda*n ) THEN
1031 *
1032 * WORK(IR) is LDA by N
1033 *
1034  ldwrkr = lda
1035  ELSE
1036 *
1037 * WORK(IR) is N by N
1038 *
1039  ldwrkr = n
1040  END IF
1041  itau = ir + ldwrkr*n
1042  iwork = itau + n
1043 *
1044 * Compute A=Q*R
1045 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1046 *
1047  CALL dgeqrf( m, n, a, lda, work( itau ),
1048  $ work( iwork ), lwork-iwork+1, ierr )
1049 *
1050 * Copy R to WORK(IR), zeroing out below it
1051 *
1052  CALL dlacpy( 'U', n, n, a, lda, work( ir ),
1053  $ ldwrkr )
1054  CALL dlaset( 'L', n-1, n-1, zero, zero,
1055  $ work( ir+1 ), ldwrkr )
1056 *
1057 * Generate Q in A
1058 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1059 *
1060  CALL dorgqr( m, n, n, a, lda, work( itau ),
1061  $ work( iwork ), lwork-iwork+1, ierr )
1062  ie = itau
1063  itauq = ie + n
1064  itaup = itauq + n
1065  iwork = itaup + n
1066 *
1067 * Bidiagonalize R in WORK(IR)
1068 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
1069 *
1070  CALL dgebrd( n, n, work( ir ), ldwrkr, s,
1071  $ work( ie ), work( itauq ),
1072  $ work( itaup ), work( iwork ),
1073  $ lwork-iwork+1, ierr )
1074 *
1075 * Generate left vectors bidiagonalizing R in WORK(IR)
1076 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
1077 *
1078  CALL dorgbr( 'Q', n, n, n, work( ir ), ldwrkr,
1079  $ work( itauq ), work( iwork ),
1080  $ lwork-iwork+1, ierr )
1081  iwork = ie + n
1082 *
1083 * Perform bidiagonal QR iteration, computing left
1084 * singular vectors of R in WORK(IR)
1085 * (Workspace: need N*N+BDSPAC)
1086 *
1087  CALL dbdsqr( 'U', n, 0, n, 0, s, work( ie ), dum,
1088  $ 1, work( ir ), ldwrkr, dum, 1,
1089  $ work( iwork ), info )
1090 *
1091 * Multiply Q in A by left singular vectors of R in
1092 * WORK(IR), storing result in U
1093 * (Workspace: need N*N)
1094 *
1095  CALL dgemm( 'N', 'N', m, n, n, one, a, lda,
1096  $ work( ir ), ldwrkr, zero, u, ldu )
1097 *
1098  ELSE
1099 *
1100 * Insufficient workspace for a fast algorithm
1101 *
1102  itau = 1
1103  iwork = itau + n
1104 *
1105 * Compute A=Q*R, copying result to U
1106 * (Workspace: need 2*N, prefer N+N*NB)
1107 *
1108  CALL dgeqrf( m, n, a, lda, work( itau ),
1109  $ work( iwork ), lwork-iwork+1, ierr )
1110  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
1111 *
1112 * Generate Q in U
1113 * (Workspace: need 2*N, prefer N+N*NB)
1114 *
1115  CALL dorgqr( m, n, n, u, ldu, work( itau ),
1116  $ work( iwork ), lwork-iwork+1, ierr )
1117  ie = itau
1118  itauq = ie + n
1119  itaup = itauq + n
1120  iwork = itaup + n
1121 *
1122 * Zero out below R in A
1123 *
1124  CALL dlaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ),
1125  $ lda )
1126 *
1127 * Bidiagonalize R in A
1128 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1129 *
1130  CALL dgebrd( n, n, a, lda, s, work( ie ),
1131  $ work( itauq ), work( itaup ),
1132  $ work( iwork ), lwork-iwork+1, ierr )
1133 *
1134 * Multiply Q in U by left vectors bidiagonalizing R
1135 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1136 *
1137  CALL dormbr( 'Q', 'R', 'N', m, n, n, a, lda,
1138  $ work( itauq ), u, ldu, work( iwork ),
1139  $ lwork-iwork+1, ierr )
1140  iwork = ie + n
1141 *
1142 * Perform bidiagonal QR iteration, computing left
1143 * singular vectors of A in U
1144 * (Workspace: need BDSPAC)
1145 *
1146  CALL dbdsqr( 'U', n, 0, m, 0, s, work( ie ), dum,
1147  $ 1, u, ldu, dum, 1, work( iwork ),
1148  $ info )
1149 *
1150  END IF
1151 *
1152  ELSE IF( wntvo ) THEN
1153 *
1154 * Path 5 (M much larger than N, JOBU='S', JOBVT='O')
1155 * N left singular vectors to be computed in U and
1156 * N right singular vectors to be overwritten on A
1157 *
1158  IF( lwork.GE.2*n*n+max( 4*n, bdspac ) ) THEN
1159 *
1160 * Sufficient workspace for a fast algorithm
1161 *
1162  iu = 1
1163  IF( lwork.GE.wrkbl+2*lda*n ) THEN
1164 *
1165 * WORK(IU) is LDA by N and WORK(IR) is LDA by N
1166 *
1167  ldwrku = lda
1168  ir = iu + ldwrku*n
1169  ldwrkr = lda
1170  ELSE IF( lwork.GE.wrkbl+( lda+n )*n ) THEN
1171 *
1172 * WORK(IU) is LDA by N and WORK(IR) is N by N
1173 *
1174  ldwrku = lda
1175  ir = iu + ldwrku*n
1176  ldwrkr = n
1177  ELSE
1178 *
1179 * WORK(IU) is N by N and WORK(IR) is N by N
1180 *
1181  ldwrku = n
1182  ir = iu + ldwrku*n
1183  ldwrkr = n
1184  END IF
1185  itau = ir + ldwrkr*n
1186  iwork = itau + n
1187 *
1188 * Compute A=Q*R
1189 * (Workspace: need 2*N*N+2*N, prefer 2*N*N+N+N*NB)
1190 *
1191  CALL dgeqrf( m, n, a, lda, work( itau ),
1192  $ work( iwork ), lwork-iwork+1, ierr )
1193 *
1194 * Copy R to WORK(IU), zeroing out below it
1195 *
1196  CALL dlacpy( 'U', n, n, a, lda, work( iu ),
1197  $ ldwrku )
1198  CALL dlaset( 'L', n-1, n-1, zero, zero,
1199  $ work( iu+1 ), ldwrku )
1200 *
1201 * Generate Q in A
1202 * (Workspace: need 2*N*N+2*N, prefer 2*N*N+N+N*NB)
1203 *
1204  CALL dorgqr( m, n, n, a, lda, work( itau ),
1205  $ work( iwork ), lwork-iwork+1, ierr )
1206  ie = itau
1207  itauq = ie + n
1208  itaup = itauq + n
1209  iwork = itaup + n
1210 *
1211 * Bidiagonalize R in WORK(IU), copying result to
1212 * WORK(IR)
1213 * (Workspace: need 2*N*N+4*N,
1214 * prefer 2*N*N+3*N+2*N*NB)
1215 *
1216  CALL dgebrd( n, n, work( iu ), ldwrku, s,
1217  $ work( ie ), work( itauq ),
1218  $ work( itaup ), work( iwork ),
1219  $ lwork-iwork+1, ierr )
1220  CALL dlacpy( 'U', n, n, work( iu ), ldwrku,
1221  $ work( ir ), ldwrkr )
1222 *
1223 * Generate left bidiagonalizing vectors in WORK(IU)
1224 * (Workspace: need 2*N*N+4*N, prefer 2*N*N+3*N+N*NB)
1225 *
1226  CALL dorgbr( 'Q', n, n, n, work( iu ), ldwrku,
1227  $ work( itauq ), work( iwork ),
1228  $ lwork-iwork+1, ierr )
1229 *
1230 * Generate right bidiagonalizing vectors in WORK(IR)
1231 * (Workspace: need 2*N*N+4*N-1,
1232 * prefer 2*N*N+3*N+(N-1)*NB)
1233 *
1234  CALL dorgbr( 'P', n, n, n, work( ir ), ldwrkr,
1235  $ work( itaup ), work( iwork ),
1236  $ lwork-iwork+1, ierr )
1237  iwork = ie + n
1238 *
1239 * Perform bidiagonal QR iteration, computing left
1240 * singular vectors of R in WORK(IU) and computing
1241 * right singular vectors of R in WORK(IR)
1242 * (Workspace: need 2*N*N+BDSPAC)
1243 *
1244  CALL dbdsqr( 'U', n, n, n, 0, s, work( ie ),
1245  $ work( ir ), ldwrkr, work( iu ),
1246  $ ldwrku, dum, 1, work( iwork ), info )
1247 *
1248 * Multiply Q in A by left singular vectors of R in
1249 * WORK(IU), storing result in U
1250 * (Workspace: need N*N)
1251 *
1252  CALL dgemm( 'N', 'N', m, n, n, one, a, lda,
1253  $ work( iu ), ldwrku, zero, u, ldu )
1254 *
1255 * Copy right singular vectors of R to A
1256 * (Workspace: need N*N)
1257 *
1258  CALL dlacpy( 'F', n, n, work( ir ), ldwrkr, a,
1259  $ lda )
1260 *
1261  ELSE
1262 *
1263 * Insufficient workspace for a fast algorithm
1264 *
1265  itau = 1
1266  iwork = itau + n
1267 *
1268 * Compute A=Q*R, copying result to U
1269 * (Workspace: need 2*N, prefer N+N*NB)
1270 *
1271  CALL dgeqrf( m, n, a, lda, work( itau ),
1272  $ work( iwork ), lwork-iwork+1, ierr )
1273  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
1274 *
1275 * Generate Q in U
1276 * (Workspace: need 2*N, prefer N+N*NB)
1277 *
1278  CALL dorgqr( m, n, n, u, ldu, work( itau ),
1279  $ work( iwork ), lwork-iwork+1, ierr )
1280  ie = itau
1281  itauq = ie + n
1282  itaup = itauq + n
1283  iwork = itaup + n
1284 *
1285 * Zero out below R in A
1286 *
1287  CALL dlaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ),
1288  $ lda )
1289 *
1290 * Bidiagonalize R in A
1291 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1292 *
1293  CALL dgebrd( n, n, a, lda, s, work( ie ),
1294  $ work( itauq ), work( itaup ),
1295  $ work( iwork ), lwork-iwork+1, ierr )
1296 *
1297 * Multiply Q in U by left vectors bidiagonalizing R
1298 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1299 *
1300  CALL dormbr( 'Q', 'R', 'N', m, n, n, a, lda,
1301  $ work( itauq ), u, ldu, work( iwork ),
1302  $ lwork-iwork+1, ierr )
1303 *
1304 * Generate right vectors bidiagonalizing R in A
1305 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1306 *
1307  CALL dorgbr( 'P', n, n, n, a, lda, work( itaup ),
1308  $ work( iwork ), lwork-iwork+1, ierr )
1309  iwork = ie + n
1310 *
1311 * Perform bidiagonal QR iteration, computing left
1312 * singular vectors of A in U and computing right
1313 * singular vectors of A in A
1314 * (Workspace: need BDSPAC)
1315 *
1316  CALL dbdsqr( 'U', n, n, m, 0, s, work( ie ), a,
1317  $ lda, u, ldu, dum, 1, work( iwork ),
1318  $ info )
1319 *
1320  END IF
1321 *
1322  ELSE IF( wntvas ) THEN
1323 *
1324 * Path 6 (M much larger than N, JOBU='S', JOBVT='S'
1325 * or 'A')
1326 * N left singular vectors to be computed in U and
1327 * N right singular vectors to be computed in VT
1328 *
1329  IF( lwork.GE.n*n+max( 4*n, bdspac ) ) THEN
1330 *
1331 * Sufficient workspace for a fast algorithm
1332 *
1333  iu = 1
1334  IF( lwork.GE.wrkbl+lda*n ) THEN
1335 *
1336 * WORK(IU) is LDA by N
1337 *
1338  ldwrku = lda
1339  ELSE
1340 *
1341 * WORK(IU) is N by N
1342 *
1343  ldwrku = n
1344  END IF
1345  itau = iu + ldwrku*n
1346  iwork = itau + n
1347 *
1348 * Compute A=Q*R
1349 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1350 *
1351  CALL dgeqrf( m, n, a, lda, work( itau ),
1352  $ work( iwork ), lwork-iwork+1, ierr )
1353 *
1354 * Copy R to WORK(IU), zeroing out below it
1355 *
1356  CALL dlacpy( 'U', n, n, a, lda, work( iu ),
1357  $ ldwrku )
1358  CALL dlaset( 'L', n-1, n-1, zero, zero,
1359  $ work( iu+1 ), ldwrku )
1360 *
1361 * Generate Q in A
1362 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1363 *
1364  CALL dorgqr( m, n, n, a, lda, work( itau ),
1365  $ work( iwork ), lwork-iwork+1, ierr )
1366  ie = itau
1367  itauq = ie + n
1368  itaup = itauq + n
1369  iwork = itaup + n
1370 *
1371 * Bidiagonalize R in WORK(IU), copying result to VT
1372 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
1373 *
1374  CALL dgebrd( n, n, work( iu ), ldwrku, s,
1375  $ work( ie ), work( itauq ),
1376  $ work( itaup ), work( iwork ),
1377  $ lwork-iwork+1, ierr )
1378  CALL dlacpy( 'U', n, n, work( iu ), ldwrku, vt,
1379  $ ldvt )
1380 *
1381 * Generate left bidiagonalizing vectors in WORK(IU)
1382 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
1383 *
1384  CALL dorgbr( 'Q', n, n, n, work( iu ), ldwrku,
1385  $ work( itauq ), work( iwork ),
1386  $ lwork-iwork+1, ierr )
1387 *
1388 * Generate right bidiagonalizing vectors in VT
1389 * (Workspace: need N*N+4*N-1,
1390 * prefer N*N+3*N+(N-1)*NB)
1391 *
1392  CALL dorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1393  $ work( iwork ), lwork-iwork+1, ierr )
1394  iwork = ie + n
1395 *
1396 * Perform bidiagonal QR iteration, computing left
1397 * singular vectors of R in WORK(IU) and computing
1398 * right singular vectors of R in VT
1399 * (Workspace: need N*N+BDSPAC)
1400 *
1401  CALL dbdsqr( 'U', n, n, n, 0, s, work( ie ), vt,
1402  $ ldvt, work( iu ), ldwrku, dum, 1,
1403  $ work( iwork ), info )
1404 *
1405 * Multiply Q in A by left singular vectors of R in
1406 * WORK(IU), storing result in U
1407 * (Workspace: need N*N)
1408 *
1409  CALL dgemm( 'N', 'N', m, n, n, one, a, lda,
1410  $ work( iu ), ldwrku, zero, u, ldu )
1411 *
1412  ELSE
1413 *
1414 * Insufficient workspace for a fast algorithm
1415 *
1416  itau = 1
1417  iwork = itau + n
1418 *
1419 * Compute A=Q*R, copying result to U
1420 * (Workspace: need 2*N, prefer N+N*NB)
1421 *
1422  CALL dgeqrf( m, n, a, lda, work( itau ),
1423  $ work( iwork ), lwork-iwork+1, ierr )
1424  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
1425 *
1426 * Generate Q in U
1427 * (Workspace: need 2*N, prefer N+N*NB)
1428 *
1429  CALL dorgqr( m, n, n, u, ldu, work( itau ),
1430  $ work( iwork ), lwork-iwork+1, ierr )
1431 *
1432 * Copy R to VT, zeroing out below it
1433 *
1434  CALL dlacpy( 'U', n, n, a, lda, vt, ldvt )
1435  IF( n.GT.1 )
1436  $ CALL dlaset( 'L', n-1, n-1, zero, zero,
1437  $ vt( 2, 1 ), ldvt )
1438  ie = itau
1439  itauq = ie + n
1440  itaup = itauq + n
1441  iwork = itaup + n
1442 *
1443 * Bidiagonalize R in VT
1444 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1445 *
1446  CALL dgebrd( n, n, vt, ldvt, s, work( ie ),
1447  $ work( itauq ), work( itaup ),
1448  $ work( iwork ), lwork-iwork+1, ierr )
1449 *
1450 * Multiply Q in U by left bidiagonalizing vectors
1451 * in VT
1452 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1453 *
1454  CALL dormbr( 'Q', 'R', 'N', m, n, n, vt, ldvt,
1455  $ work( itauq ), u, ldu, work( iwork ),
1456  $ lwork-iwork+1, ierr )
1457 *
1458 * Generate right bidiagonalizing vectors in VT
1459 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1460 *
1461  CALL dorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1462  $ work( iwork ), lwork-iwork+1, ierr )
1463  iwork = ie + n
1464 *
1465 * Perform bidiagonal QR iteration, computing left
1466 * singular vectors of A in U and computing right
1467 * singular vectors of A in VT
1468 * (Workspace: need BDSPAC)
1469 *
1470  CALL dbdsqr( 'U', n, n, m, 0, s, work( ie ), vt,
1471  $ ldvt, u, ldu, dum, 1, work( iwork ),
1472  $ info )
1473 *
1474  END IF
1475 *
1476  END IF
1477 *
1478  ELSE IF( wntua ) THEN
1479 *
1480  IF( wntvn ) THEN
1481 *
1482 * Path 7 (M much larger than N, JOBU='A', JOBVT='N')
1483 * M left singular vectors to be computed in U and
1484 * no right singular vectors to be computed
1485 *
1486  IF( lwork.GE.n*n+max( n+m, 4*n, bdspac ) ) THEN
1487 *
1488 * Sufficient workspace for a fast algorithm
1489 *
1490  ir = 1
1491  IF( lwork.GE.wrkbl+lda*n ) THEN
1492 *
1493 * WORK(IR) is LDA by N
1494 *
1495  ldwrkr = lda
1496  ELSE
1497 *
1498 * WORK(IR) is N by N
1499 *
1500  ldwrkr = n
1501  END IF
1502  itau = ir + ldwrkr*n
1503  iwork = itau + n
1504 *
1505 * Compute A=Q*R, copying result to U
1506 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1507 *
1508  CALL dgeqrf( m, n, a, lda, work( itau ),
1509  $ work( iwork ), lwork-iwork+1, ierr )
1510  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
1511 *
1512 * Copy R to WORK(IR), zeroing out below it
1513 *
1514  CALL dlacpy( 'U', n, n, a, lda, work( ir ),
1515  $ ldwrkr )
1516  CALL dlaset( 'L', n-1, n-1, zero, zero,
1517  $ work( ir+1 ), ldwrkr )
1518 *
1519 * Generate Q in U
1520 * (Workspace: need N*N+N+M, prefer N*N+N+M*NB)
1521 *
1522  CALL dorgqr( m, m, n, u, ldu, work( itau ),
1523  $ work( iwork ), lwork-iwork+1, ierr )
1524  ie = itau
1525  itauq = ie + n
1526  itaup = itauq + n
1527  iwork = itaup + n
1528 *
1529 * Bidiagonalize R in WORK(IR)
1530 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
1531 *
1532  CALL dgebrd( n, n, work( ir ), ldwrkr, s,
1533  $ work( ie ), work( itauq ),
1534  $ work( itaup ), work( iwork ),
1535  $ lwork-iwork+1, ierr )
1536 *
1537 * Generate left bidiagonalizing vectors in WORK(IR)
1538 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
1539 *
1540  CALL dorgbr( 'Q', n, n, n, work( ir ), ldwrkr,
1541  $ work( itauq ), work( iwork ),
1542  $ lwork-iwork+1, ierr )
1543  iwork = ie + n
1544 *
1545 * Perform bidiagonal QR iteration, computing left
1546 * singular vectors of R in WORK(IR)
1547 * (Workspace: need N*N+BDSPAC)
1548 *
1549  CALL dbdsqr( 'U', n, 0, n, 0, s, work( ie ), dum,
1550  $ 1, work( ir ), ldwrkr, dum, 1,
1551  $ work( iwork ), info )
1552 *
1553 * Multiply Q in U by left singular vectors of R in
1554 * WORK(IR), storing result in A
1555 * (Workspace: need N*N)
1556 *
1557  CALL dgemm( 'N', 'N', m, n, n, one, u, ldu,
1558  $ work( ir ), ldwrkr, zero, a, lda )
1559 *
1560 * Copy left singular vectors of A from A to U
1561 *
1562  CALL dlacpy( 'F', m, n, a, lda, u, ldu )
1563 *
1564  ELSE
1565 *
1566 * Insufficient workspace for a fast algorithm
1567 *
1568  itau = 1
1569  iwork = itau + n
1570 *
1571 * Compute A=Q*R, copying result to U
1572 * (Workspace: need 2*N, prefer N+N*NB)
1573 *
1574  CALL dgeqrf( m, n, a, lda, work( itau ),
1575  $ work( iwork ), lwork-iwork+1, ierr )
1576  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
1577 *
1578 * Generate Q in U
1579 * (Workspace: need N+M, prefer N+M*NB)
1580 *
1581  CALL dorgqr( m, m, n, u, ldu, work( itau ),
1582  $ work( iwork ), lwork-iwork+1, ierr )
1583  ie = itau
1584  itauq = ie + n
1585  itaup = itauq + n
1586  iwork = itaup + n
1587 *
1588 * Zero out below R in A
1589 *
1590  CALL dlaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ),
1591  $ lda )
1592 *
1593 * Bidiagonalize R in A
1594 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1595 *
1596  CALL dgebrd( n, n, a, lda, s, work( ie ),
1597  $ work( itauq ), work( itaup ),
1598  $ work( iwork ), lwork-iwork+1, ierr )
1599 *
1600 * Multiply Q in U by left bidiagonalizing vectors
1601 * in A
1602 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1603 *
1604  CALL dormbr( 'Q', 'R', 'N', m, n, n, a, lda,
1605  $ work( itauq ), u, ldu, work( iwork ),
1606  $ lwork-iwork+1, ierr )
1607  iwork = ie + n
1608 *
1609 * Perform bidiagonal QR iteration, computing left
1610 * singular vectors of A in U
1611 * (Workspace: need BDSPAC)
1612 *
1613  CALL dbdsqr( 'U', n, 0, m, 0, s, work( ie ), dum,
1614  $ 1, u, ldu, dum, 1, work( iwork ),
1615  $ info )
1616 *
1617  END IF
1618 *
1619  ELSE IF( wntvo ) THEN
1620 *
1621 * Path 8 (M much larger than N, JOBU='A', JOBVT='O')
1622 * M left singular vectors to be computed in U and
1623 * N right singular vectors to be overwritten on A
1624 *
1625  IF( lwork.GE.2*n*n+max( n+m, 4*n, bdspac ) ) THEN
1626 *
1627 * Sufficient workspace for a fast algorithm
1628 *
1629  iu = 1
1630  IF( lwork.GE.wrkbl+2*lda*n ) THEN
1631 *
1632 * WORK(IU) is LDA by N and WORK(IR) is LDA by N
1633 *
1634  ldwrku = lda
1635  ir = iu + ldwrku*n
1636  ldwrkr = lda
1637  ELSE IF( lwork.GE.wrkbl+( lda+n )*n ) THEN
1638 *
1639 * WORK(IU) is LDA by N and WORK(IR) is N by N
1640 *
1641  ldwrku = lda
1642  ir = iu + ldwrku*n
1643  ldwrkr = n
1644  ELSE
1645 *
1646 * WORK(IU) is N by N and WORK(IR) is N by N
1647 *
1648  ldwrku = n
1649  ir = iu + ldwrku*n
1650  ldwrkr = n
1651  END IF
1652  itau = ir + ldwrkr*n
1653  iwork = itau + n
1654 *
1655 * Compute A=Q*R, copying result to U
1656 * (Workspace: need 2*N*N+2*N, prefer 2*N*N+N+N*NB)
1657 *
1658  CALL dgeqrf( m, n, a, lda, work( itau ),
1659  $ work( iwork ), lwork-iwork+1, ierr )
1660  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
1661 *
1662 * Generate Q in U
1663 * (Workspace: need 2*N*N+N+M, prefer 2*N*N+N+M*NB)
1664 *
1665  CALL dorgqr( m, m, n, u, ldu, work( itau ),
1666  $ work( iwork ), lwork-iwork+1, ierr )
1667 *
1668 * Copy R to WORK(IU), zeroing out below it
1669 *
1670  CALL dlacpy( 'U', n, n, a, lda, work( iu ),
1671  $ ldwrku )
1672  CALL dlaset( 'L', n-1, n-1, zero, zero,
1673  $ work( iu+1 ), ldwrku )
1674  ie = itau
1675  itauq = ie + n
1676  itaup = itauq + n
1677  iwork = itaup + n
1678 *
1679 * Bidiagonalize R in WORK(IU), copying result to
1680 * WORK(IR)
1681 * (Workspace: need 2*N*N+4*N,
1682 * prefer 2*N*N+3*N+2*N*NB)
1683 *
1684  CALL dgebrd( n, n, work( iu ), ldwrku, s,
1685  $ work( ie ), work( itauq ),
1686  $ work( itaup ), work( iwork ),
1687  $ lwork-iwork+1, ierr )
1688  CALL dlacpy( 'U', n, n, work( iu ), ldwrku,
1689  $ work( ir ), ldwrkr )
1690 *
1691 * Generate left bidiagonalizing vectors in WORK(IU)
1692 * (Workspace: need 2*N*N+4*N, prefer 2*N*N+3*N+N*NB)
1693 *
1694  CALL dorgbr( 'Q', n, n, n, work( iu ), ldwrku,
1695  $ work( itauq ), work( iwork ),
1696  $ lwork-iwork+1, ierr )
1697 *
1698 * Generate right bidiagonalizing vectors in WORK(IR)
1699 * (Workspace: need 2*N*N+4*N-1,
1700 * prefer 2*N*N+3*N+(N-1)*NB)
1701 *
1702  CALL dorgbr( 'P', n, n, n, work( ir ), ldwrkr,
1703  $ work( itaup ), work( iwork ),
1704  $ lwork-iwork+1, ierr )
1705  iwork = ie + n
1706 *
1707 * Perform bidiagonal QR iteration, computing left
1708 * singular vectors of R in WORK(IU) and computing
1709 * right singular vectors of R in WORK(IR)
1710 * (Workspace: need 2*N*N+BDSPAC)
1711 *
1712  CALL dbdsqr( 'U', n, n, n, 0, s, work( ie ),
1713  $ work( ir ), ldwrkr, work( iu ),
1714  $ ldwrku, dum, 1, work( iwork ), info )
1715 *
1716 * Multiply Q in U by left singular vectors of R in
1717 * WORK(IU), storing result in A
1718 * (Workspace: need N*N)
1719 *
1720  CALL dgemm( 'N', 'N', m, n, n, one, u, ldu,
1721  $ work( iu ), ldwrku, zero, a, lda )
1722 *
1723 * Copy left singular vectors of A from A to U
1724 *
1725  CALL dlacpy( 'F', m, n, a, lda, u, ldu )
1726 *
1727 * Copy right singular vectors of R from WORK(IR) to A
1728 *
1729  CALL dlacpy( 'F', n, n, work( ir ), ldwrkr, a,
1730  $ lda )
1731 *
1732  ELSE
1733 *
1734 * Insufficient workspace for a fast algorithm
1735 *
1736  itau = 1
1737  iwork = itau + n
1738 *
1739 * Compute A=Q*R, copying result to U
1740 * (Workspace: need 2*N, prefer N+N*NB)
1741 *
1742  CALL dgeqrf( m, n, a, lda, work( itau ),
1743  $ work( iwork ), lwork-iwork+1, ierr )
1744  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
1745 *
1746 * Generate Q in U
1747 * (Workspace: need N+M, prefer N+M*NB)
1748 *
1749  CALL dorgqr( m, m, n, u, ldu, work( itau ),
1750  $ work( iwork ), lwork-iwork+1, ierr )
1751  ie = itau
1752  itauq = ie + n
1753  itaup = itauq + n
1754  iwork = itaup + n
1755 *
1756 * Zero out below R in A
1757 *
1758  CALL dlaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ),
1759  $ lda )
1760 *
1761 * Bidiagonalize R in A
1762 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1763 *
1764  CALL dgebrd( n, n, a, lda, s, work( ie ),
1765  $ work( itauq ), work( itaup ),
1766  $ work( iwork ), lwork-iwork+1, ierr )
1767 *
1768 * Multiply Q in U by left bidiagonalizing vectors
1769 * in A
1770 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1771 *
1772  CALL dormbr( 'Q', 'R', 'N', m, n, n, a, lda,
1773  $ work( itauq ), u, ldu, work( iwork ),
1774  $ lwork-iwork+1, ierr )
1775 *
1776 * Generate right bidiagonalizing vectors in A
1777 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1778 *
1779  CALL dorgbr( 'P', n, n, n, a, lda, work( itaup ),
1780  $ work( iwork ), lwork-iwork+1, ierr )
1781  iwork = ie + n
1782 *
1783 * Perform bidiagonal QR iteration, computing left
1784 * singular vectors of A in U and computing right
1785 * singular vectors of A in A
1786 * (Workspace: need BDSPAC)
1787 *
1788  CALL dbdsqr( 'U', n, n, m, 0, s, work( ie ), a,
1789  $ lda, u, ldu, dum, 1, work( iwork ),
1790  $ info )
1791 *
1792  END IF
1793 *
1794  ELSE IF( wntvas ) THEN
1795 *
1796 * Path 9 (M much larger than N, JOBU='A', JOBVT='S'
1797 * or 'A')
1798 * M left singular vectors to be computed in U and
1799 * N right singular vectors to be computed in VT
1800 *
1801  IF( lwork.GE.n*n+max( n+m, 4*n, bdspac ) ) THEN
1802 *
1803 * Sufficient workspace for a fast algorithm
1804 *
1805  iu = 1
1806  IF( lwork.GE.wrkbl+lda*n ) THEN
1807 *
1808 * WORK(IU) is LDA by N
1809 *
1810  ldwrku = lda
1811  ELSE
1812 *
1813 * WORK(IU) is N by N
1814 *
1815  ldwrku = n
1816  END IF
1817  itau = iu + ldwrku*n
1818  iwork = itau + n
1819 *
1820 * Compute A=Q*R, copying result to U
1821 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1822 *
1823  CALL dgeqrf( m, n, a, lda, work( itau ),
1824  $ work( iwork ), lwork-iwork+1, ierr )
1825  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
1826 *
1827 * Generate Q in U
1828 * (Workspace: need N*N+N+M, prefer N*N+N+M*NB)
1829 *
1830  CALL dorgqr( m, m, n, u, ldu, work( itau ),
1831  $ work( iwork ), lwork-iwork+1, ierr )
1832 *
1833 * Copy R to WORK(IU), zeroing out below it
1834 *
1835  CALL dlacpy( 'U', n, n, a, lda, work( iu ),
1836  $ ldwrku )
1837  CALL dlaset( 'L', n-1, n-1, zero, zero,
1838  $ work( iu+1 ), ldwrku )
1839  ie = itau
1840  itauq = ie + n
1841  itaup = itauq + n
1842  iwork = itaup + n
1843 *
1844 * Bidiagonalize R in WORK(IU), copying result to VT
1845 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
1846 *
1847  CALL dgebrd( n, n, work( iu ), ldwrku, s,
1848  $ work( ie ), work( itauq ),
1849  $ work( itaup ), work( iwork ),
1850  $ lwork-iwork+1, ierr )
1851  CALL dlacpy( 'U', n, n, work( iu ), ldwrku, vt,
1852  $ ldvt )
1853 *
1854 * Generate left bidiagonalizing vectors in WORK(IU)
1855 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
1856 *
1857  CALL dorgbr( 'Q', n, n, n, work( iu ), ldwrku,
1858  $ work( itauq ), work( iwork ),
1859  $ lwork-iwork+1, ierr )
1860 *
1861 * Generate right bidiagonalizing vectors in VT
1862 * (Workspace: need N*N+4*N-1,
1863 * prefer N*N+3*N+(N-1)*NB)
1864 *
1865  CALL dorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1866  $ work( iwork ), lwork-iwork+1, ierr )
1867  iwork = ie + n
1868 *
1869 * Perform bidiagonal QR iteration, computing left
1870 * singular vectors of R in WORK(IU) and computing
1871 * right singular vectors of R in VT
1872 * (Workspace: need N*N+BDSPAC)
1873 *
1874  CALL dbdsqr( 'U', n, n, n, 0, s, work( ie ), vt,
1875  $ ldvt, work( iu ), ldwrku, dum, 1,
1876  $ work( iwork ), info )
1877 *
1878 * Multiply Q in U by left singular vectors of R in
1879 * WORK(IU), storing result in A
1880 * (Workspace: need N*N)
1881 *
1882  CALL dgemm( 'N', 'N', m, n, n, one, u, ldu,
1883  $ work( iu ), ldwrku, zero, a, lda )
1884 *
1885 * Copy left singular vectors of A from A to U
1886 *
1887  CALL dlacpy( 'F', m, n, a, lda, u, ldu )
1888 *
1889  ELSE
1890 *
1891 * Insufficient workspace for a fast algorithm
1892 *
1893  itau = 1
1894  iwork = itau + n
1895 *
1896 * Compute A=Q*R, copying result to U
1897 * (Workspace: need 2*N, prefer N+N*NB)
1898 *
1899  CALL dgeqrf( m, n, a, lda, work( itau ),
1900  $ work( iwork ), lwork-iwork+1, ierr )
1901  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
1902 *
1903 * Generate Q in U
1904 * (Workspace: need N+M, prefer N+M*NB)
1905 *
1906  CALL dorgqr( m, m, n, u, ldu, work( itau ),
1907  $ work( iwork ), lwork-iwork+1, ierr )
1908 *
1909 * Copy R from A to VT, zeroing out below it
1910 *
1911  CALL dlacpy( 'U', n, n, a, lda, vt, ldvt )
1912  IF( n.GT.1 )
1913  $ CALL dlaset( 'L', n-1, n-1, zero, zero,
1914  $ vt( 2, 1 ), ldvt )
1915  ie = itau
1916  itauq = ie + n
1917  itaup = itauq + n
1918  iwork = itaup + n
1919 *
1920 * Bidiagonalize R in VT
1921 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1922 *
1923  CALL dgebrd( n, n, vt, ldvt, s, work( ie ),
1924  $ work( itauq ), work( itaup ),
1925  $ work( iwork ), lwork-iwork+1, ierr )
1926 *
1927 * Multiply Q in U by left bidiagonalizing vectors
1928 * in VT
1929 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1930 *
1931  CALL dormbr( 'Q', 'R', 'N', m, n, n, vt, ldvt,
1932  $ work( itauq ), u, ldu, work( iwork ),
1933  $ lwork-iwork+1, ierr )
1934 *
1935 * Generate right bidiagonalizing vectors in VT
1936 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1937 *
1938  CALL dorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1939  $ work( iwork ), lwork-iwork+1, ierr )
1940  iwork = ie + n
1941 *
1942 * Perform bidiagonal QR iteration, computing left
1943 * singular vectors of A in U and computing right
1944 * singular vectors of A in VT
1945 * (Workspace: need BDSPAC)
1946 *
1947  CALL dbdsqr( 'U', n, n, m, 0, s, work( ie ), vt,
1948  $ ldvt, u, ldu, dum, 1, work( iwork ),
1949  $ info )
1950 *
1951  END IF
1952 *
1953  END IF
1954 *
1955  END IF
1956 *
1957  ELSE
1958 *
1959 * M .LT. MNTHR
1960 *
1961 * Path 10 (M at least N, but not much larger)
1962 * Reduce to bidiagonal form without QR decomposition
1963 *
1964  ie = 1
1965  itauq = ie + n
1966  itaup = itauq + n
1967  iwork = itaup + n
1968 *
1969 * Bidiagonalize A
1970 * (Workspace: need 3*N+M, prefer 3*N+(M+N)*NB)
1971 *
1972  CALL dgebrd( m, n, a, lda, s, work( ie ), work( itauq ),
1973  $ work( itaup ), work( iwork ), lwork-iwork+1,
1974  $ ierr )
1975  IF( wntuas ) THEN
1976 *
1977 * If left singular vectors desired in U, copy result to U
1978 * and generate left bidiagonalizing vectors in U
1979 * (Workspace: need 3*N+NCU, prefer 3*N+NCU*NB)
1980 *
1981  CALL dlacpy( 'L', m, n, a, lda, u, ldu )
1982  IF( wntus )
1983  $ ncu = n
1984  IF( wntua )
1985  $ ncu = m
1986  CALL dorgbr( 'Q', m, ncu, n, u, ldu, work( itauq ),
1987  $ work( iwork ), lwork-iwork+1, ierr )
1988  END IF
1989  IF( wntvas ) THEN
1990 *
1991 * If right singular vectors desired in VT, copy result to
1992 * VT and generate right bidiagonalizing vectors in VT
1993 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1994 *
1995  CALL dlacpy( 'U', n, n, a, lda, vt, ldvt )
1996  CALL dorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1997  $ work( iwork ), lwork-iwork+1, ierr )
1998  END IF
1999  IF( wntuo ) THEN
2000 *
2001 * If left singular vectors desired in A, generate left
2002 * bidiagonalizing vectors in A
2003 * (Workspace: need 4*N, prefer 3*N+N*NB)
2004 *
2005  CALL dorgbr( 'Q', m, n, n, a, lda, work( itauq ),
2006  $ work( iwork ), lwork-iwork+1, ierr )
2007  END IF
2008  IF( wntvo ) THEN
2009 *
2010 * If right singular vectors desired in A, generate right
2011 * bidiagonalizing vectors in A
2012 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
2013 *
2014  CALL dorgbr( 'P', n, n, n, a, lda, work( itaup ),
2015  $ work( iwork ), lwork-iwork+1, ierr )
2016  END IF
2017  iwork = ie + n
2018  IF( wntuas .OR. wntuo )
2019  $ nru = m
2020  IF( wntun )
2021  $ nru = 0
2022  IF( wntvas .OR. wntvo )
2023  $ ncvt = n
2024  IF( wntvn )
2025  $ ncvt = 0
2026  IF( ( .NOT.wntuo ) .AND. ( .NOT.wntvo ) ) THEN
2027 *
2028 * Perform bidiagonal QR iteration, if desired, computing
2029 * left singular vectors in U and computing right singular
2030 * vectors in VT
2031 * (Workspace: need BDSPAC)
2032 *
2033  CALL dbdsqr( 'U', n, ncvt, nru, 0, s, work( ie ), vt,
2034  $ ldvt, u, ldu, dum, 1, work( iwork ), info )
2035  ELSE IF( ( .NOT.wntuo ) .AND. wntvo ) THEN
2036 *
2037 * Perform bidiagonal QR iteration, if desired, computing
2038 * left singular vectors in U and computing right singular
2039 * vectors in A
2040 * (Workspace: need BDSPAC)
2041 *
2042  CALL dbdsqr( 'U', n, ncvt, nru, 0, s, work( ie ), a, lda,
2043  $ u, ldu, dum, 1, work( iwork ), info )
2044  ELSE
2045 *
2046 * Perform bidiagonal QR iteration, if desired, computing
2047 * left singular vectors in A and computing right singular
2048 * vectors in VT
2049 * (Workspace: need BDSPAC)
2050 *
2051  CALL dbdsqr( 'U', n, ncvt, nru, 0, s, work( ie ), vt,
2052  $ ldvt, a, lda, dum, 1, work( iwork ), info )
2053  END IF
2054 *
2055  END IF
2056 *
2057  ELSE
2058 *
2059 * A has more columns than rows. If A has sufficiently more
2060 * columns than rows, first reduce using the LQ decomposition (if
2061 * sufficient workspace available)
2062 *
2063  IF( n.GE.mnthr ) THEN
2064 *
2065  IF( wntvn ) THEN
2066 *
2067 * Path 1t(N much larger than M, JOBVT='N')
2068 * No right singular vectors to be computed
2069 *
2070  itau = 1
2071  iwork = itau + m
2072 *
2073 * Compute A=L*Q
2074 * (Workspace: need 2*M, prefer M+M*NB)
2075 *
2076  CALL dgelqf( m, n, a, lda, work( itau ), work( iwork ),
2077  $ lwork-iwork+1, ierr )
2078 *
2079 * Zero out above L
2080 *
2081  CALL dlaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ), lda )
2082  ie = 1
2083  itauq = ie + m
2084  itaup = itauq + m
2085  iwork = itaup + m
2086 *
2087 * Bidiagonalize L in A
2088 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2089 *
2090  CALL dgebrd( m, m, a, lda, s, work( ie ), work( itauq ),
2091  $ work( itaup ), work( iwork ), lwork-iwork+1,
2092  $ ierr )
2093  IF( wntuo .OR. wntuas ) THEN
2094 *
2095 * If left singular vectors desired, generate Q
2096 * (Workspace: need 4*M, prefer 3*M+M*NB)
2097 *
2098  CALL dorgbr( 'Q', m, m, m, a, lda, work( itauq ),
2099  $ work( iwork ), lwork-iwork+1, ierr )
2100  END IF
2101  iwork = ie + m
2102  nru = 0
2103  IF( wntuo .OR. wntuas )
2104  $ nru = m
2105 *
2106 * Perform bidiagonal QR iteration, computing left singular
2107 * vectors of A in A if desired
2108 * (Workspace: need BDSPAC)
2109 *
2110  CALL dbdsqr( 'U', m, 0, nru, 0, s, work( ie ), dum, 1, a,
2111  $ lda, dum, 1, work( iwork ), info )
2112 *
2113 * If left singular vectors desired in U, copy them there
2114 *
2115  IF( wntuas )
2116  $ CALL dlacpy( 'F', m, m, a, lda, u, ldu )
2117 *
2118  ELSE IF( wntvo .AND. wntun ) THEN
2119 *
2120 * Path 2t(N much larger than M, JOBU='N', JOBVT='O')
2121 * M right singular vectors to be overwritten on A and
2122 * no left singular vectors to be computed
2123 *
2124  IF( lwork.GE.m*m+max( 4*m, bdspac ) ) THEN
2125 *
2126 * Sufficient workspace for a fast algorithm
2127 *
2128  ir = 1
2129  IF( lwork.GE.max( wrkbl, lda*n+m )+lda*m ) THEN
2130 *
2131 * WORK(IU) is LDA by N and WORK(IR) is LDA by M
2132 *
2133  ldwrku = lda
2134  chunk = n
2135  ldwrkr = lda
2136  ELSE IF( lwork.GE.max( wrkbl, lda*n+m )+m*m ) THEN
2137 *
2138 * WORK(IU) is LDA by N and WORK(IR) is M by M
2139 *
2140  ldwrku = lda
2141  chunk = n
2142  ldwrkr = m
2143  ELSE
2144 *
2145 * WORK(IU) is M by CHUNK and WORK(IR) is M by M
2146 *
2147  ldwrku = m
2148  chunk = ( lwork-m*m-m ) / m
2149  ldwrkr = m
2150  END IF
2151  itau = ir + ldwrkr*m
2152  iwork = itau + m
2153 *
2154 * Compute A=L*Q
2155 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2156 *
2157  CALL dgelqf( m, n, a, lda, work( itau ),
2158  $ work( iwork ), lwork-iwork+1, ierr )
2159 *
2160 * Copy L to WORK(IR) and zero out above it
2161 *
2162  CALL dlacpy( 'L', m, m, a, lda, work( ir ), ldwrkr )
2163  CALL dlaset( 'U', m-1, m-1, zero, zero,
2164  $ work( ir+ldwrkr ), ldwrkr )
2165 *
2166 * Generate Q in A
2167 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2168 *
2169  CALL dorglq( m, n, m, a, lda, work( itau ),
2170  $ work( iwork ), lwork-iwork+1, ierr )
2171  ie = itau
2172  itauq = ie + m
2173  itaup = itauq + m
2174  iwork = itaup + m
2175 *
2176 * Bidiagonalize L in WORK(IR)
2177 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
2178 *
2179  CALL dgebrd( m, m, work( ir ), ldwrkr, s, work( ie ),
2180  $ work( itauq ), work( itaup ),
2181  $ work( iwork ), lwork-iwork+1, ierr )
2182 *
2183 * Generate right vectors bidiagonalizing L
2184 * (Workspace: need M*M+4*M-1, prefer M*M+3*M+(M-1)*NB)
2185 *
2186  CALL dorgbr( 'P', m, m, m, work( ir ), ldwrkr,
2187  $ work( itaup ), work( iwork ),
2188  $ lwork-iwork+1, ierr )
2189  iwork = ie + m
2190 *
2191 * Perform bidiagonal QR iteration, computing right
2192 * singular vectors of L in WORK(IR)
2193 * (Workspace: need M*M+BDSPAC)
2194 *
2195  CALL dbdsqr( 'U', m, m, 0, 0, s, work( ie ),
2196  $ work( ir ), ldwrkr, dum, 1, dum, 1,
2197  $ work( iwork ), info )
2198  iu = ie + m
2199 *
2200 * Multiply right singular vectors of L in WORK(IR) by Q
2201 * in A, storing result in WORK(IU) and copying to A
2202 * (Workspace: need M*M+2*M, prefer M*M+M*N+M)
2203 *
2204  DO 30 i = 1, n, chunk
2205  blk = min( n-i+1, chunk )
2206  CALL dgemm( 'N', 'N', m, blk, m, one, work( ir ),
2207  $ ldwrkr, a( 1, i ), lda, zero,
2208  $ work( iu ), ldwrku )
2209  CALL dlacpy( 'F', m, blk, work( iu ), ldwrku,
2210  $ a( 1, i ), lda )
2211  30 CONTINUE
2212 *
2213  ELSE
2214 *
2215 * Insufficient workspace for a fast algorithm
2216 *
2217  ie = 1
2218  itauq = ie + m
2219  itaup = itauq + m
2220  iwork = itaup + m
2221 *
2222 * Bidiagonalize A
2223 * (Workspace: need 3*M+N, prefer 3*M+(M+N)*NB)
2224 *
2225  CALL dgebrd( m, n, a, lda, s, work( ie ),
2226  $ work( itauq ), work( itaup ),
2227  $ work( iwork ), lwork-iwork+1, ierr )
2228 *
2229 * Generate right vectors bidiagonalizing A
2230 * (Workspace: need 4*M, prefer 3*M+M*NB)
2231 *
2232  CALL dorgbr( 'P', m, n, m, a, lda, work( itaup ),
2233  $ work( iwork ), lwork-iwork+1, ierr )
2234  iwork = ie + m
2235 *
2236 * Perform bidiagonal QR iteration, computing right
2237 * singular vectors of A in A
2238 * (Workspace: need BDSPAC)
2239 *
2240  CALL dbdsqr( 'L', m, n, 0, 0, s, work( ie ), a, lda,
2241  $ dum, 1, dum, 1, work( iwork ), info )
2242 *
2243  END IF
2244 *
2245  ELSE IF( wntvo .AND. wntuas ) THEN
2246 *
2247 * Path 3t(N much larger than M, JOBU='S' or 'A', JOBVT='O')
2248 * M right singular vectors to be overwritten on A and
2249 * M left singular vectors to be computed in U
2250 *
2251  IF( lwork.GE.m*m+max( 4*m, bdspac ) ) THEN
2252 *
2253 * Sufficient workspace for a fast algorithm
2254 *
2255  ir = 1
2256  IF( lwork.GE.max( wrkbl, lda*n+m )+lda*m ) THEN
2257 *
2258 * WORK(IU) is LDA by N and WORK(IR) is LDA by M
2259 *
2260  ldwrku = lda
2261  chunk = n
2262  ldwrkr = lda
2263  ELSE IF( lwork.GE.max( wrkbl, lda*n+m )+m*m ) THEN
2264 *
2265 * WORK(IU) is LDA by N and WORK(IR) is M by M
2266 *
2267  ldwrku = lda
2268  chunk = n
2269  ldwrkr = m
2270  ELSE
2271 *
2272 * WORK(IU) is M by CHUNK and WORK(IR) is M by M
2273 *
2274  ldwrku = m
2275  chunk = ( lwork-m*m-m ) / m
2276  ldwrkr = m
2277  END IF
2278  itau = ir + ldwrkr*m
2279  iwork = itau + m
2280 *
2281 * Compute A=L*Q
2282 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2283 *
2284  CALL dgelqf( m, n, a, lda, work( itau ),
2285  $ work( iwork ), lwork-iwork+1, ierr )
2286 *
2287 * Copy L to U, zeroing about above it
2288 *
2289  CALL dlacpy( 'L', m, m, a, lda, u, ldu )
2290  CALL dlaset( 'U', m-1, m-1, zero, zero, u( 1, 2 ),
2291  $ ldu )
2292 *
2293 * Generate Q in A
2294 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2295 *
2296  CALL dorglq( m, n, m, a, lda, work( itau ),
2297  $ work( iwork ), lwork-iwork+1, ierr )
2298  ie = itau
2299  itauq = ie + m
2300  itaup = itauq + m
2301  iwork = itaup + m
2302 *
2303 * Bidiagonalize L in U, copying result to WORK(IR)
2304 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
2305 *
2306  CALL dgebrd( m, m, u, ldu, s, work( ie ),
2307  $ work( itauq ), work( itaup ),
2308  $ work( iwork ), lwork-iwork+1, ierr )
2309  CALL dlacpy( 'U', m, m, u, ldu, work( ir ), ldwrkr )
2310 *
2311 * Generate right vectors bidiagonalizing L in WORK(IR)
2312 * (Workspace: need M*M+4*M-1, prefer M*M+3*M+(M-1)*NB)
2313 *
2314  CALL dorgbr( 'P', m, m, m, work( ir ), ldwrkr,
2315  $ work( itaup ), work( iwork ),
2316  $ lwork-iwork+1, ierr )
2317 *
2318 * Generate left vectors bidiagonalizing L in U
2319 * (Workspace: need M*M+4*M, prefer M*M+3*M+M*NB)
2320 *
2321  CALL dorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
2322  $ work( iwork ), lwork-iwork+1, ierr )
2323  iwork = ie + m
2324 *
2325 * Perform bidiagonal QR iteration, computing left
2326 * singular vectors of L in U, and computing right
2327 * singular vectors of L in WORK(IR)
2328 * (Workspace: need M*M+BDSPAC)
2329 *
2330  CALL dbdsqr( 'U', m, m, m, 0, s, work( ie ),
2331  $ work( ir ), ldwrkr, u, ldu, dum, 1,
2332  $ work( iwork ), info )
2333  iu = ie + m
2334 *
2335 * Multiply right singular vectors of L in WORK(IR) by Q
2336 * in A, storing result in WORK(IU) and copying to A
2337 * (Workspace: need M*M+2*M, prefer M*M+M*N+M))
2338 *
2339  DO 40 i = 1, n, chunk
2340  blk = min( n-i+1, chunk )
2341  CALL dgemm( 'N', 'N', m, blk, m, one, work( ir ),
2342  $ ldwrkr, a( 1, i ), lda, zero,
2343  $ work( iu ), ldwrku )
2344  CALL dlacpy( 'F', m, blk, work( iu ), ldwrku,
2345  $ a( 1, i ), lda )
2346  40 CONTINUE
2347 *
2348  ELSE
2349 *
2350 * Insufficient workspace for a fast algorithm
2351 *
2352  itau = 1
2353  iwork = itau + m
2354 *
2355 * Compute A=L*Q
2356 * (Workspace: need 2*M, prefer M+M*NB)
2357 *
2358  CALL dgelqf( m, n, a, lda, work( itau ),
2359  $ work( iwork ), lwork-iwork+1, ierr )
2360 *
2361 * Copy L to U, zeroing out above it
2362 *
2363  CALL dlacpy( 'L', m, m, a, lda, u, ldu )
2364  CALL dlaset( 'U', m-1, m-1, zero, zero, u( 1, 2 ),
2365  $ ldu )
2366 *
2367 * Generate Q in A
2368 * (Workspace: need 2*M, prefer M+M*NB)
2369 *
2370  CALL dorglq( m, n, m, a, lda, work( itau ),
2371  $ work( iwork ), lwork-iwork+1, ierr )
2372  ie = itau
2373  itauq = ie + m
2374  itaup = itauq + m
2375  iwork = itaup + m
2376 *
2377 * Bidiagonalize L in U
2378 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2379 *
2380  CALL dgebrd( m, m, u, ldu, s, work( ie ),
2381  $ work( itauq ), work( itaup ),
2382  $ work( iwork ), lwork-iwork+1, ierr )
2383 *
2384 * Multiply right vectors bidiagonalizing L by Q in A
2385 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
2386 *
2387  CALL dormbr( 'P', 'L', 'T', m, n, m, u, ldu,
2388  $ work( itaup ), a, lda, work( iwork ),
2389  $ lwork-iwork+1, ierr )
2390 *
2391 * Generate left vectors bidiagonalizing L in U
2392 * (Workspace: need 4*M, prefer 3*M+M*NB)
2393 *
2394  CALL dorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
2395  $ work( iwork ), lwork-iwork+1, ierr )
2396  iwork = ie + m
2397 *
2398 * Perform bidiagonal QR iteration, computing left
2399 * singular vectors of A in U and computing right
2400 * singular vectors of A in A
2401 * (Workspace: need BDSPAC)
2402 *
2403  CALL dbdsqr( 'U', m, n, m, 0, s, work( ie ), a, lda,
2404  $ u, ldu, dum, 1, work( iwork ), info )
2405 *
2406  END IF
2407 *
2408  ELSE IF( wntvs ) THEN
2409 *
2410  IF( wntun ) THEN
2411 *
2412 * Path 4t(N much larger than M, JOBU='N', JOBVT='S')
2413 * M right singular vectors to be computed in VT and
2414 * no left singular vectors to be computed
2415 *
2416  IF( lwork.GE.m*m+max( 4*m, bdspac ) ) THEN
2417 *
2418 * Sufficient workspace for a fast algorithm
2419 *
2420  ir = 1
2421  IF( lwork.GE.wrkbl+lda*m ) THEN
2422 *
2423 * WORK(IR) is LDA by M
2424 *
2425  ldwrkr = lda
2426  ELSE
2427 *
2428 * WORK(IR) is M by M
2429 *
2430  ldwrkr = m
2431  END IF
2432  itau = ir + ldwrkr*m
2433  iwork = itau + m
2434 *
2435 * Compute A=L*Q
2436 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2437 *
2438  CALL dgelqf( m, n, a, lda, work( itau ),
2439  $ work( iwork ), lwork-iwork+1, ierr )
2440 *
2441 * Copy L to WORK(IR), zeroing out above it
2442 *
2443  CALL dlacpy( 'L', m, m, a, lda, work( ir ),
2444  $ ldwrkr )
2445  CALL dlaset( 'U', m-1, m-1, zero, zero,
2446  $ work( ir+ldwrkr ), ldwrkr )
2447 *
2448 * Generate Q in A
2449 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2450 *
2451  CALL dorglq( m, n, m, a, lda, work( itau ),
2452  $ work( iwork ), lwork-iwork+1, ierr )
2453  ie = itau
2454  itauq = ie + m
2455  itaup = itauq + m
2456  iwork = itaup + m
2457 *
2458 * Bidiagonalize L in WORK(IR)
2459 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
2460 *
2461  CALL dgebrd( m, m, work( ir ), ldwrkr, s,
2462  $ work( ie ), work( itauq ),
2463  $ work( itaup ), work( iwork ),
2464  $ lwork-iwork+1, ierr )
2465 *
2466 * Generate right vectors bidiagonalizing L in
2467 * WORK(IR)
2468 * (Workspace: need M*M+4*M, prefer M*M+3*M+(M-1)*NB)
2469 *
2470  CALL dorgbr( 'P', m, m, m, work( ir ), ldwrkr,
2471  $ work( itaup ), work( iwork ),
2472  $ lwork-iwork+1, ierr )
2473  iwork = ie + m
2474 *
2475 * Perform bidiagonal QR iteration, computing right
2476 * singular vectors of L in WORK(IR)
2477 * (Workspace: need M*M+BDSPAC)
2478 *
2479  CALL dbdsqr( 'U', m, m, 0, 0, s, work( ie ),
2480  $ work( ir ), ldwrkr, dum, 1, dum, 1,
2481  $ work( iwork ), info )
2482 *
2483 * Multiply right singular vectors of L in WORK(IR) by
2484 * Q in A, storing result in VT
2485 * (Workspace: need M*M)
2486 *
2487  CALL dgemm( 'N', 'N', m, n, m, one, work( ir ),
2488  $ ldwrkr, a, lda, zero, vt, ldvt )
2489 *
2490  ELSE
2491 *
2492 * Insufficient workspace for a fast algorithm
2493 *
2494  itau = 1
2495  iwork = itau + m
2496 *
2497 * Compute A=L*Q
2498 * (Workspace: need 2*M, prefer M+M*NB)
2499 *
2500  CALL dgelqf( m, n, a, lda, work( itau ),
2501  $ work( iwork ), lwork-iwork+1, ierr )
2502 *
2503 * Copy result to VT
2504 *
2505  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
2506 *
2507 * Generate Q in VT
2508 * (Workspace: need 2*M, prefer M+M*NB)
2509 *
2510  CALL dorglq( m, n, m, vt, ldvt, work( itau ),
2511  $ work( iwork ), lwork-iwork+1, ierr )
2512  ie = itau
2513  itauq = ie + m
2514  itaup = itauq + m
2515  iwork = itaup + m
2516 *
2517 * Zero out above L in A
2518 *
2519  CALL dlaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ),
2520  $ lda )
2521 *
2522 * Bidiagonalize L in A
2523 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2524 *
2525  CALL dgebrd( m, m, a, lda, s, work( ie ),
2526  $ work( itauq ), work( itaup ),
2527  $ work( iwork ), lwork-iwork+1, ierr )
2528 *
2529 * Multiply right vectors bidiagonalizing L by Q in VT
2530 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
2531 *
2532  CALL dormbr( 'P', 'L', 'T', m, n, m, a, lda,
2533  $ work( itaup ), vt, ldvt,
2534  $ work( iwork ), lwork-iwork+1, ierr )
2535  iwork = ie + m
2536 *
2537 * Perform bidiagonal QR iteration, computing right
2538 * singular vectors of A in VT
2539 * (Workspace: need BDSPAC)
2540 *
2541  CALL dbdsqr( 'U', m, n, 0, 0, s, work( ie ), vt,
2542  $ ldvt, dum, 1, dum, 1, work( iwork ),
2543  $ info )
2544 *
2545  END IF
2546 *
2547  ELSE IF( wntuo ) THEN
2548 *
2549 * Path 5t(N much larger than M, JOBU='O', JOBVT='S')
2550 * M right singular vectors to be computed in VT and
2551 * M left singular vectors to be overwritten on A
2552 *
2553  IF( lwork.GE.2*m*m+max( 4*m, bdspac ) ) THEN
2554 *
2555 * Sufficient workspace for a fast algorithm
2556 *
2557  iu = 1
2558  IF( lwork.GE.wrkbl+2*lda*m ) THEN
2559 *
2560 * WORK(IU) is LDA by M and WORK(IR) is LDA by M
2561 *
2562  ldwrku = lda
2563  ir = iu + ldwrku*m
2564  ldwrkr = lda
2565  ELSE IF( lwork.GE.wrkbl+( lda+m )*m ) THEN
2566 *
2567 * WORK(IU) is LDA by M and WORK(IR) is M by M
2568 *
2569  ldwrku = lda
2570  ir = iu + ldwrku*m
2571  ldwrkr = m
2572  ELSE
2573 *
2574 * WORK(IU) is M by M and WORK(IR) is M by M
2575 *
2576  ldwrku = m
2577  ir = iu + ldwrku*m
2578  ldwrkr = m
2579  END IF
2580  itau = ir + ldwrkr*m
2581  iwork = itau + m
2582 *
2583 * Compute A=L*Q
2584 * (Workspace: need 2*M*M+2*M, prefer 2*M*M+M+M*NB)
2585 *
2586  CALL dgelqf( m, n, a, lda, work( itau ),
2587  $ work( iwork ), lwork-iwork+1, ierr )
2588 *
2589 * Copy L to WORK(IU), zeroing out below it
2590 *
2591  CALL dlacpy( 'L', m, m, a, lda, work( iu ),
2592  $ ldwrku )
2593  CALL dlaset( 'U', m-1, m-1, zero, zero,
2594  $ work( iu+ldwrku ), ldwrku )
2595 *
2596 * Generate Q in A
2597 * (Workspace: need 2*M*M+2*M, prefer 2*M*M+M+M*NB)
2598 *
2599  CALL dorglq( m, n, m, a, lda, work( itau ),
2600  $ work( iwork ), lwork-iwork+1, ierr )
2601  ie = itau
2602  itauq = ie + m
2603  itaup = itauq + m
2604  iwork = itaup + m
2605 *
2606 * Bidiagonalize L in WORK(IU), copying result to
2607 * WORK(IR)
2608 * (Workspace: need 2*M*M+4*M,
2609 * prefer 2*M*M+3*M+2*M*NB)
2610 *
2611  CALL dgebrd( m, m, work( iu ), ldwrku, s,
2612  $ work( ie ), work( itauq ),
2613  $ work( itaup ), work( iwork ),
2614  $ lwork-iwork+1, ierr )
2615  CALL dlacpy( 'L', m, m, work( iu ), ldwrku,
2616  $ work( ir ), ldwrkr )
2617 *
2618 * Generate right bidiagonalizing vectors in WORK(IU)
2619 * (Workspace: need 2*M*M+4*M-1,
2620 * prefer 2*M*M+3*M+(M-1)*NB)
2621 *
2622  CALL dorgbr( 'P', m, m, m, work( iu ), ldwrku,
2623  $ work( itaup ), work( iwork ),
2624  $ lwork-iwork+1, ierr )
2625 *
2626 * Generate left bidiagonalizing vectors in WORK(IR)
2627 * (Workspace: need 2*M*M+4*M, prefer 2*M*M+3*M+M*NB)
2628 *
2629  CALL dorgbr( 'Q', m, m, m, work( ir ), ldwrkr,
2630  $ work( itauq ), work( iwork ),
2631  $ lwork-iwork+1, ierr )
2632  iwork = ie + m
2633 *
2634 * Perform bidiagonal QR iteration, computing left
2635 * singular vectors of L in WORK(IR) and computing
2636 * right singular vectors of L in WORK(IU)
2637 * (Workspace: need 2*M*M+BDSPAC)
2638 *
2639  CALL dbdsqr( 'U', m, m, m, 0, s, work( ie ),
2640  $ work( iu ), ldwrku, work( ir ),
2641  $ ldwrkr, dum, 1, work( iwork ), info )
2642 *
2643 * Multiply right singular vectors of L in WORK(IU) by
2644 * Q in A, storing result in VT
2645 * (Workspace: need M*M)
2646 *
2647  CALL dgemm( 'N', 'N', m, n, m, one, work( iu ),
2648  $ ldwrku, a, lda, zero, vt, ldvt )
2649 *
2650 * Copy left singular vectors of L to A
2651 * (Workspace: need M*M)
2652 *
2653  CALL dlacpy( 'F', m, m, work( ir ), ldwrkr, a,
2654  $ lda )
2655 *
2656  ELSE
2657 *
2658 * Insufficient workspace for a fast algorithm
2659 *
2660  itau = 1
2661  iwork = itau + m
2662 *
2663 * Compute A=L*Q, copying result to VT
2664 * (Workspace: need 2*M, prefer M+M*NB)
2665 *
2666  CALL dgelqf( m, n, a, lda, work( itau ),
2667  $ work( iwork ), lwork-iwork+1, ierr )
2668  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
2669 *
2670 * Generate Q in VT
2671 * (Workspace: need 2*M, prefer M+M*NB)
2672 *
2673  CALL dorglq( m, n, m, vt, ldvt, work( itau ),
2674  $ work( iwork ), lwork-iwork+1, ierr )
2675  ie = itau
2676  itauq = ie + m
2677  itaup = itauq + m
2678  iwork = itaup + m
2679 *
2680 * Zero out above L in A
2681 *
2682  CALL dlaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ),
2683  $ lda )
2684 *
2685 * Bidiagonalize L in A
2686 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2687 *
2688  CALL dgebrd( m, m, a, lda, s, work( ie ),
2689  $ work( itauq ), work( itaup ),
2690  $ work( iwork ), lwork-iwork+1, ierr )
2691 *
2692 * Multiply right vectors bidiagonalizing L by Q in VT
2693 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
2694 *
2695  CALL dormbr( 'P', 'L', 'T', m, n, m, a, lda,
2696  $ work( itaup ), vt, ldvt,
2697  $ work( iwork ), lwork-iwork+1, ierr )
2698 *
2699 * Generate left bidiagonalizing vectors of L in A
2700 * (Workspace: need 4*M, prefer 3*M+M*NB)
2701 *
2702  CALL dorgbr( 'Q', m, m, m, a, lda, work( itauq ),
2703  $ work( iwork ), lwork-iwork+1, ierr )
2704  iwork = ie + m
2705 *
2706 * Perform bidiagonal QR iteration, compute left
2707 * singular vectors of A in A and compute right
2708 * singular vectors of A in VT
2709 * (Workspace: need BDSPAC)
2710 *
2711  CALL dbdsqr( 'U', m, n, m, 0, s, work( ie ), vt,
2712  $ ldvt, a, lda, dum, 1, work( iwork ),
2713  $ info )
2714 *
2715  END IF
2716 *
2717  ELSE IF( wntuas ) THEN
2718 *
2719 * Path 6t(N much larger than M, JOBU='S' or 'A',
2720 * JOBVT='S')
2721 * M right singular vectors to be computed in VT and
2722 * M left singular vectors to be computed in U
2723 *
2724  IF( lwork.GE.m*m+max( 4*m, bdspac ) ) THEN
2725 *
2726 * Sufficient workspace for a fast algorithm
2727 *
2728  iu = 1
2729  IF( lwork.GE.wrkbl+lda*m ) THEN
2730 *
2731 * WORK(IU) is LDA by N
2732 *
2733  ldwrku = lda
2734  ELSE
2735 *
2736 * WORK(IU) is LDA by M
2737 *
2738  ldwrku = m
2739  END IF
2740  itau = iu + ldwrku*m
2741  iwork = itau + m
2742 *
2743 * Compute A=L*Q
2744 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2745 *
2746  CALL dgelqf( m, n, a, lda, work( itau ),
2747  $ work( iwork ), lwork-iwork+1, ierr )
2748 *
2749 * Copy L to WORK(IU), zeroing out above it
2750 *
2751  CALL dlacpy( 'L', m, m, a, lda, work( iu ),
2752  $ ldwrku )
2753  CALL dlaset( 'U', m-1, m-1, zero, zero,
2754  $ work( iu+ldwrku ), ldwrku )
2755 *
2756 * Generate Q in A
2757 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2758 *
2759  CALL dorglq( m, n, m, a, lda, work( itau ),
2760  $ work( iwork ), lwork-iwork+1, ierr )
2761  ie = itau
2762  itauq = ie + m
2763  itaup = itauq + m
2764  iwork = itaup + m
2765 *
2766 * Bidiagonalize L in WORK(IU), copying result to U
2767 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
2768 *
2769  CALL dgebrd( m, m, work( iu ), ldwrku, s,
2770  $ work( ie ), work( itauq ),
2771  $ work( itaup ), work( iwork ),
2772  $ lwork-iwork+1, ierr )
2773  CALL dlacpy( 'L', m, m, work( iu ), ldwrku, u,
2774  $ ldu )
2775 *
2776 * Generate right bidiagonalizing vectors in WORK(IU)
2777 * (Workspace: need M*M+4*M-1,
2778 * prefer M*M+3*M+(M-1)*NB)
2779 *
2780  CALL dorgbr( 'P', m, m, m, work( iu ), ldwrku,
2781  $ work( itaup ), work( iwork ),
2782  $ lwork-iwork+1, ierr )
2783 *
2784 * Generate left bidiagonalizing vectors in U
2785 * (Workspace: need M*M+4*M, prefer M*M+3*M+M*NB)
2786 *
2787  CALL dorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
2788  $ work( iwork ), lwork-iwork+1, ierr )
2789  iwork = ie + m
2790 *
2791 * Perform bidiagonal QR iteration, computing left
2792 * singular vectors of L in U and computing right
2793 * singular vectors of L in WORK(IU)
2794 * (Workspace: need M*M+BDSPAC)
2795 *
2796  CALL dbdsqr( 'U', m, m, m, 0, s, work( ie ),
2797  $ work( iu ), ldwrku, u, ldu, dum, 1,
2798  $ work( iwork ), info )
2799 *
2800 * Multiply right singular vectors of L in WORK(IU) by
2801 * Q in A, storing result in VT
2802 * (Workspace: need M*M)
2803 *
2804  CALL dgemm( 'N', 'N', m, n, m, one, work( iu ),
2805  $ ldwrku, a, lda, zero, vt, ldvt )
2806 *
2807  ELSE
2808 *
2809 * Insufficient workspace for a fast algorithm
2810 *
2811  itau = 1
2812  iwork = itau + m
2813 *
2814 * Compute A=L*Q, copying result to VT
2815 * (Workspace: need 2*M, prefer M+M*NB)
2816 *
2817  CALL dgelqf( m, n, a, lda, work( itau ),
2818  $ work( iwork ), lwork-iwork+1, ierr )
2819  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
2820 *
2821 * Generate Q in VT
2822 * (Workspace: need 2*M, prefer M+M*NB)
2823 *
2824  CALL dorglq( m, n, m, vt, ldvt, work( itau ),
2825  $ work( iwork ), lwork-iwork+1, ierr )
2826 *
2827 * Copy L to U, zeroing out above it
2828 *
2829  CALL dlacpy( 'L', m, m, a, lda, u, ldu )
2830  CALL dlaset( 'U', m-1, m-1, zero, zero, u( 1, 2 ),
2831  $ ldu )
2832  ie = itau
2833  itauq = ie + m
2834  itaup = itauq + m
2835  iwork = itaup + m
2836 *
2837 * Bidiagonalize L in U
2838 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2839 *
2840  CALL dgebrd( m, m, u, ldu, s, work( ie ),
2841  $ work( itauq ), work( itaup ),
2842  $ work( iwork ), lwork-iwork+1, ierr )
2843 *
2844 * Multiply right bidiagonalizing vectors in U by Q
2845 * in VT
2846 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
2847 *
2848  CALL dormbr( 'P', 'L', 'T', m, n, m, u, ldu,
2849  $ work( itaup ), vt, ldvt,
2850  $ work( iwork ), lwork-iwork+1, ierr )
2851 *
2852 * Generate left bidiagonalizing vectors in U
2853 * (Workspace: need 4*M, prefer 3*M+M*NB)
2854 *
2855  CALL dorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
2856  $ work( iwork ), lwork-iwork+1, ierr )
2857  iwork = ie + m
2858 *
2859 * Perform bidiagonal QR iteration, computing left
2860 * singular vectors of A in U and computing right
2861 * singular vectors of A in VT
2862 * (Workspace: need BDSPAC)
2863 *
2864  CALL dbdsqr( 'U', m, n, m, 0, s, work( ie ), vt,
2865  $ ldvt, u, ldu, dum, 1, work( iwork ),
2866  $ info )
2867 *
2868  END IF
2869 *
2870  END IF
2871 *
2872  ELSE IF( wntva ) THEN
2873 *
2874  IF( wntun ) THEN
2875 *
2876 * Path 7t(N much larger than M, JOBU='N', JOBVT='A')
2877 * N right singular vectors to be computed in VT and
2878 * no left singular vectors to be computed
2879 *
2880  IF( lwork.GE.m*m+max( n+m, 4*m, bdspac ) ) THEN
2881 *
2882 * Sufficient workspace for a fast algorithm
2883 *
2884  ir = 1
2885  IF( lwork.GE.wrkbl+lda*m ) THEN
2886 *
2887 * WORK(IR) is LDA by M
2888 *
2889  ldwrkr = lda
2890  ELSE
2891 *
2892 * WORK(IR) is M by M
2893 *
2894  ldwrkr = m
2895  END IF
2896  itau = ir + ldwrkr*m
2897  iwork = itau + m
2898 *
2899 * Compute A=L*Q, copying result to VT
2900 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2901 *
2902  CALL dgelqf( m, n, a, lda, work( itau ),
2903  $ work( iwork ), lwork-iwork+1, ierr )
2904  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
2905 *
2906 * Copy L to WORK(IR), zeroing out above it
2907 *
2908  CALL dlacpy( 'L', m, m, a, lda, work( ir ),
2909  $ ldwrkr )
2910  CALL dlaset( 'U', m-1, m-1, zero, zero,
2911  $ work( ir+ldwrkr ), ldwrkr )
2912 *
2913 * Generate Q in VT
2914 * (Workspace: need M*M+M+N, prefer M*M+M+N*NB)
2915 *
2916  CALL dorglq( n, n, m, vt, ldvt, work( itau ),
2917  $ work( iwork ), lwork-iwork+1, ierr )
2918  ie = itau
2919  itauq = ie + m
2920  itaup = itauq + m
2921  iwork = itaup + m
2922 *
2923 * Bidiagonalize L in WORK(IR)
2924 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
2925 *
2926  CALL dgebrd( m, m, work( ir ), ldwrkr, s,
2927  $ work( ie ), work( itauq ),
2928  $ work( itaup ), work( iwork ),
2929  $ lwork-iwork+1, ierr )
2930 *
2931 * Generate right bidiagonalizing vectors in WORK(IR)
2932 * (Workspace: need M*M+4*M-1,
2933 * prefer M*M+3*M+(M-1)*NB)
2934 *
2935  CALL dorgbr( 'P', m, m, m, work( ir ), ldwrkr,
2936  $ work( itaup ), work( iwork ),
2937  $ lwork-iwork+1, ierr )
2938  iwork = ie + m
2939 *
2940 * Perform bidiagonal QR iteration, computing right
2941 * singular vectors of L in WORK(IR)
2942 * (Workspace: need M*M+BDSPAC)
2943 *
2944  CALL dbdsqr( 'U', m, m, 0, 0, s, work( ie ),
2945  $ work( ir ), ldwrkr, dum, 1, dum, 1,
2946  $ work( iwork ), info )
2947 *
2948 * Multiply right singular vectors of L in WORK(IR) by
2949 * Q in VT, storing result in A
2950 * (Workspace: need M*M)
2951 *
2952  CALL dgemm( 'N', 'N', m, n, m, one, work( ir ),
2953  $ ldwrkr, vt, ldvt, zero, a, lda )
2954 *
2955 * Copy right singular vectors of A from A to VT
2956 *
2957  CALL dlacpy( 'F', m, n, a, lda, vt, ldvt )
2958 *
2959  ELSE
2960 *
2961 * Insufficient workspace for a fast algorithm
2962 *
2963  itau = 1
2964  iwork = itau + m
2965 *
2966 * Compute A=L*Q, copying result to VT
2967 * (Workspace: need 2*M, prefer M+M*NB)
2968 *
2969  CALL dgelqf( m, n, a, lda, work( itau ),
2970  $ work( iwork ), lwork-iwork+1, ierr )
2971  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
2972 *
2973 * Generate Q in VT
2974 * (Workspace: need M+N, prefer M+N*NB)
2975 *
2976  CALL dorglq( n, n, m, vt, ldvt, work( itau ),
2977  $ work( iwork ), lwork-iwork+1, ierr )
2978  ie = itau
2979  itauq = ie + m
2980  itaup = itauq + m
2981  iwork = itaup + m
2982 *
2983 * Zero out above L in A
2984 *
2985  CALL dlaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ),
2986  $ lda )
2987 *
2988 * Bidiagonalize L in A
2989 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2990 *
2991  CALL dgebrd( m, m, a, lda, s, work( ie ),
2992  $ work( itauq ), work( itaup ),
2993  $ work( iwork ), lwork-iwork+1, ierr )
2994 *
2995 * Multiply right bidiagonalizing vectors in A by Q
2996 * in VT
2997 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
2998 *
2999  CALL dormbr( 'P', 'L', 'T', m, n, m, a, lda,
3000  $ work( itaup ), vt, ldvt,
3001  $ work( iwork ), lwork-iwork+1, ierr )
3002  iwork = ie + m
3003 *
3004 * Perform bidiagonal QR iteration, computing right
3005 * singular vectors of A in VT
3006 * (Workspace: need BDSPAC)
3007 *
3008  CALL dbdsqr( 'U', m, n, 0, 0, s, work( ie ), vt,
3009  $ ldvt, dum, 1, dum, 1, work( iwork ),
3010  $ info )
3011 *
3012  END IF
3013 *
3014  ELSE IF( wntuo ) THEN
3015 *
3016 * Path 8t(N much larger than M, JOBU='O', JOBVT='A')
3017 * N right singular vectors to be computed in VT and
3018 * M left singular vectors to be overwritten on A
3019 *
3020  IF( lwork.GE.2*m*m+max( n+m, 4*m, bdspac ) ) THEN
3021 *
3022 * Sufficient workspace for a fast algorithm
3023 *
3024  iu = 1
3025  IF( lwork.GE.wrkbl+2*lda*m ) THEN
3026 *
3027 * WORK(IU) is LDA by M and WORK(IR) is LDA by M
3028 *
3029  ldwrku = lda
3030  ir = iu + ldwrku*m
3031  ldwrkr = lda
3032  ELSE IF( lwork.GE.wrkbl+( lda+m )*m ) THEN
3033 *
3034 * WORK(IU) is LDA by M and WORK(IR) is M by M
3035 *
3036  ldwrku = lda
3037  ir = iu + ldwrku*m
3038  ldwrkr = m
3039  ELSE
3040 *
3041 * WORK(IU) is M by M and WORK(IR) is M by M
3042 *
3043  ldwrku = m
3044  ir = iu + ldwrku*m
3045  ldwrkr = m
3046  END IF
3047  itau = ir + ldwrkr*m
3048  iwork = itau + m
3049 *
3050 * Compute A=L*Q, copying result to VT
3051 * (Workspace: need 2*M*M+2*M, prefer 2*M*M+M+M*NB)
3052 *
3053  CALL dgelqf( m, n, a, lda, work( itau ),
3054  $ work( iwork ), lwork-iwork+1, ierr )
3055  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
3056 *
3057 * Generate Q in VT
3058 * (Workspace: need 2*M*M+M+N, prefer 2*M*M+M+N*NB)
3059 *
3060  CALL dorglq( n, n, m, vt, ldvt, work( itau ),
3061  $ work( iwork ), lwork-iwork+1, ierr )
3062 *
3063 * Copy L to WORK(IU), zeroing out above it
3064 *
3065  CALL dlacpy( 'L', m, m, a, lda, work( iu ),
3066  $ ldwrku )
3067  CALL dlaset( 'U', m-1, m-1, zero, zero,
3068  $ work( iu+ldwrku ), ldwrku )
3069  ie = itau
3070  itauq = ie + m
3071  itaup = itauq + m
3072  iwork = itaup + m
3073 *
3074 * Bidiagonalize L in WORK(IU), copying result to
3075 * WORK(IR)
3076 * (Workspace: need 2*M*M+4*M,
3077 * prefer 2*M*M+3*M+2*M*NB)
3078 *
3079  CALL dgebrd( m, m, work( iu ), ldwrku, s,
3080  $ work( ie ), work( itauq ),
3081  $ work( itaup ), work( iwork ),
3082  $ lwork-iwork+1, ierr )
3083  CALL dlacpy( 'L', m, m, work( iu ), ldwrku,
3084  $ work( ir ), ldwrkr )
3085 *
3086 * Generate right bidiagonalizing vectors in WORK(IU)
3087 * (Workspace: need 2*M*M+4*M-1,
3088 * prefer 2*M*M+3*M+(M-1)*NB)
3089 *
3090  CALL dorgbr( 'P', m, m, m, work( iu ), ldwrku,
3091  $ work( itaup ), work( iwork ),
3092  $ lwork-iwork+1, ierr )
3093 *
3094 * Generate left bidiagonalizing vectors in WORK(IR)
3095 * (Workspace: need 2*M*M+4*M, prefer 2*M*M+3*M+M*NB)
3096 *
3097  CALL dorgbr( 'Q', m, m, m, work( ir ), ldwrkr,
3098  $ work( itauq ), work( iwork ),
3099  $ lwork-iwork+1, ierr )
3100  iwork = ie + m
3101 *
3102 * Perform bidiagonal QR iteration, computing left
3103 * singular vectors of L in WORK(IR) and computing
3104 * right singular vectors of L in WORK(IU)
3105 * (Workspace: need 2*M*M+BDSPAC)
3106 *
3107  CALL dbdsqr( 'U', m, m, m, 0, s, work( ie ),
3108  $ work( iu ), ldwrku, work( ir ),
3109  $ ldwrkr, dum, 1, work( iwork ), info )
3110 *
3111 * Multiply right singular vectors of L in WORK(IU) by
3112 * Q in VT, storing result in A
3113 * (Workspace: need M*M)
3114 *
3115  CALL dgemm( 'N', 'N', m, n, m, one, work( iu ),
3116  $ ldwrku, vt, ldvt, zero, a, lda )
3117 *
3118 * Copy right singular vectors of A from A to VT
3119 *
3120  CALL dlacpy( 'F', m, n, a, lda, vt, ldvt )
3121 *
3122 * Copy left singular vectors of A from WORK(IR) to A
3123 *
3124  CALL dlacpy( 'F', m, m, work( ir ), ldwrkr, a,
3125  $ lda )
3126 *
3127  ELSE
3128 *
3129 * Insufficient workspace for a fast algorithm
3130 *
3131  itau = 1
3132  iwork = itau + m
3133 *
3134 * Compute A=L*Q, copying result to VT
3135 * (Workspace: need 2*M, prefer M+M*NB)
3136 *
3137  CALL dgelqf( m, n, a, lda, work( itau ),
3138  $ work( iwork ), lwork-iwork+1, ierr )
3139  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
3140 *
3141 * Generate Q in VT
3142 * (Workspace: need M+N, prefer M+N*NB)
3143 *
3144  CALL dorglq( n, n, m, vt, ldvt, work( itau ),
3145  $ work( iwork ), lwork-iwork+1, ierr )
3146  ie = itau
3147  itauq = ie + m
3148  itaup = itauq + m
3149  iwork = itaup + m
3150 *
3151 * Zero out above L in A
3152 *
3153  CALL dlaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ),
3154  $ lda )
3155 *
3156 * Bidiagonalize L in A
3157 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
3158 *
3159  CALL dgebrd( m, m, a, lda, s, work( ie ),
3160  $ work( itauq ), work( itaup ),
3161  $ work( iwork ), lwork-iwork+1, ierr )
3162 *
3163 * Multiply right bidiagonalizing vectors in A by Q
3164 * in VT
3165 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
3166 *
3167  CALL dormbr( 'P', 'L', 'T', m, n, m, a, lda,
3168  $ work( itaup ), vt, ldvt,
3169  $ work( iwork ), lwork-iwork+1, ierr )
3170 *
3171 * Generate left bidiagonalizing vectors in A
3172 * (Workspace: need 4*M, prefer 3*M+M*NB)
3173 *
3174  CALL dorgbr( 'Q', m, m, m, a, lda, work( itauq ),
3175  $ work( iwork ), lwork-iwork+1, ierr )
3176  iwork = ie + m
3177 *
3178 * Perform bidiagonal QR iteration, computing left
3179 * singular vectors of A in A and computing right
3180 * singular vectors of A in VT
3181 * (Workspace: need BDSPAC)
3182 *
3183  CALL dbdsqr( 'U', m, n, m, 0, s, work( ie ), vt,
3184  $ ldvt, a, lda, dum, 1, work( iwork ),
3185  $ info )
3186 *
3187  END IF
3188 *
3189  ELSE IF( wntuas ) THEN
3190 *
3191 * Path 9t(N much larger than M, JOBU='S' or 'A',
3192 * JOBVT='A')
3193 * N right singular vectors to be computed in VT and
3194 * M left singular vectors to be computed in U
3195 *
3196  IF( lwork.GE.m*m+max( n+m, 4*m, bdspac ) ) THEN
3197 *
3198 * Sufficient workspace for a fast algorithm
3199 *
3200  iu = 1
3201  IF( lwork.GE.wrkbl+lda*m ) THEN
3202 *
3203 * WORK(IU) is LDA by M
3204 *
3205  ldwrku = lda
3206  ELSE
3207 *
3208 * WORK(IU) is M by M
3209 *
3210  ldwrku = m
3211  END IF
3212  itau = iu + ldwrku*m
3213  iwork = itau + m
3214 *
3215 * Compute A=L*Q, copying result to VT
3216 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
3217 *
3218  CALL dgelqf( m, n, a, lda, work( itau ),
3219  $ work( iwork ), lwork-iwork+1, ierr )
3220  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
3221 *
3222 * Generate Q in VT
3223 * (Workspace: need M*M+M+N, prefer M*M+M+N*NB)
3224 *
3225  CALL dorglq( n, n, m, vt, ldvt, work( itau ),
3226  $ work( iwork ), lwork-iwork+1, ierr )
3227 *
3228 * Copy L to WORK(IU), zeroing out above it
3229 *
3230  CALL dlacpy( 'L', m, m, a, lda, work( iu ),
3231  $ ldwrku )
3232  CALL dlaset( 'U', m-1, m-1, zero, zero,
3233  $ work( iu+ldwrku ), ldwrku )
3234  ie = itau
3235  itauq = ie + m
3236  itaup = itauq + m
3237  iwork = itaup + m
3238 *
3239 * Bidiagonalize L in WORK(IU), copying result to U
3240 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
3241 *
3242  CALL dgebrd( m, m, work( iu ), ldwrku, s,
3243  $ work( ie ), work( itauq ),
3244  $ work( itaup ), work( iwork ),
3245  $ lwork-iwork+1, ierr )
3246  CALL dlacpy( 'L', m, m, work( iu ), ldwrku, u,
3247  $ ldu )
3248 *
3249 * Generate right bidiagonalizing vectors in WORK(IU)
3250 * (Workspace: need M*M+4*M, prefer M*M+3*M+(M-1)*NB)
3251 *
3252  CALL dorgbr( 'P', m, m, m, work( iu ), ldwrku,
3253  $ work( itaup ), work( iwork ),
3254  $ lwork-iwork+1, ierr )
3255 *
3256 * Generate left bidiagonalizing vectors in U
3257 * (Workspace: need M*M+4*M, prefer M*M+3*M+M*NB)
3258 *
3259  CALL dorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
3260  $ work( iwork ), lwork-iwork+1, ierr )
3261  iwork = ie + m
3262 *
3263 * Perform bidiagonal QR iteration, computing left
3264 * singular vectors of L in U and computing right
3265 * singular vectors of L in WORK(IU)
3266 * (Workspace: need M*M+BDSPAC)
3267 *
3268  CALL dbdsqr( 'U', m, m, m, 0, s, work( ie ),
3269  $ work( iu ), ldwrku, u, ldu, dum, 1,
3270  $ work( iwork ), info )
3271 *
3272 * Multiply right singular vectors of L in WORK(IU) by
3273 * Q in VT, storing result in A
3274 * (Workspace: need M*M)
3275 *
3276  CALL dgemm( 'N', 'N', m, n, m, one, work( iu ),
3277  $ ldwrku, vt, ldvt, zero, a, lda )
3278 *
3279 * Copy right singular vectors of A from A to VT
3280 *
3281  CALL dlacpy( 'F', m, n, a, lda, vt, ldvt )
3282 *
3283  ELSE
3284 *
3285 * Insufficient workspace for a fast algorithm
3286 *
3287  itau = 1
3288  iwork = itau + m
3289 *
3290 * Compute A=L*Q, copying result to VT
3291 * (Workspace: need 2*M, prefer M+M*NB)
3292 *
3293  CALL dgelqf( m, n, a, lda, work( itau ),
3294  $ work( iwork ), lwork-iwork+1, ierr )
3295  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
3296 *
3297 * Generate Q in VT
3298 * (Workspace: need M+N, prefer M+N*NB)
3299 *
3300  CALL dorglq( n, n, m, vt, ldvt, work( itau ),
3301  $ work( iwork ), lwork-iwork+1, ierr )
3302 *
3303 * Copy L to U, zeroing out above it
3304 *
3305  CALL dlacpy( 'L', m, m, a, lda, u, ldu )
3306  CALL dlaset( 'U', m-1, m-1, zero, zero, u( 1, 2 ),
3307  $ ldu )
3308  ie = itau
3309  itauq = ie + m
3310  itaup = itauq + m
3311  iwork = itaup + m
3312 *
3313 * Bidiagonalize L in U
3314 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
3315 *
3316  CALL dgebrd( m, m, u, ldu, s, work( ie ),
3317  $ work( itauq ), work( itaup ),
3318  $ work( iwork ), lwork-iwork+1, ierr )
3319 *
3320 * Multiply right bidiagonalizing vectors in U by Q
3321 * in VT
3322 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
3323 *
3324  CALL dormbr( 'P', 'L', 'T', m, n, m, u, ldu,
3325  $ work( itaup ), vt, ldvt,
3326  $ work( iwork ), lwork-iwork+1, ierr )
3327 *
3328 * Generate left bidiagonalizing vectors in U
3329 * (Workspace: need 4*M, prefer 3*M+M*NB)
3330 *
3331  CALL dorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
3332  $ work( iwork ), lwork-iwork+1, ierr )
3333  iwork = ie + m
3334 *
3335 * Perform bidiagonal QR iteration, computing left
3336 * singular vectors of A in U and computing right
3337 * singular vectors of A in VT
3338 * (Workspace: need BDSPAC)
3339 *
3340  CALL dbdsqr( 'U', m, n, m, 0, s, work( ie ), vt,
3341  $ ldvt, u, ldu, dum, 1, work( iwork ),
3342  $ info )
3343 *
3344  END IF
3345 *
3346  END IF
3347 *
3348  END IF
3349 *
3350  ELSE
3351 *
3352 * N .LT. MNTHR
3353 *
3354 * Path 10t(N greater than M, but not much larger)
3355 * Reduce to bidiagonal form without LQ decomposition
3356 *
3357  ie = 1
3358  itauq = ie + m
3359  itaup = itauq + m
3360  iwork = itaup + m
3361 *
3362 * Bidiagonalize A
3363 * (Workspace: need 3*M+N, prefer 3*M+(M+N)*NB)
3364 *
3365  CALL dgebrd( m, n, a, lda, s, work( ie ), work( itauq ),
3366  $ work( itaup ), work( iwork ), lwork-iwork+1,
3367  $ ierr )
3368  IF( wntuas ) THEN
3369 *
3370 * If left singular vectors desired in U, copy result to U
3371 * and generate left bidiagonalizing vectors in U
3372 * (Workspace: need 4*M-1, prefer 3*M+(M-1)*NB)
3373 *
3374  CALL dlacpy( 'L', m, m, a, lda, u, ldu )
3375  CALL dorgbr( 'Q', m, m, n, u, ldu, work( itauq ),
3376  $ work( iwork ), lwork-iwork+1, ierr )
3377  END IF
3378  IF( wntvas ) THEN
3379 *
3380 * If right singular vectors desired in VT, copy result to
3381 * VT and generate right bidiagonalizing vectors in VT
3382 * (Workspace: need 3*M+NRVT, prefer 3*M+NRVT*NB)
3383 *
3384  CALL dlacpy( 'U', m, n, a, lda, vt, ldvt )
3385  IF( wntva )
3386  $ nrvt = n
3387  IF( wntvs )
3388  $ nrvt = m
3389  CALL dorgbr( 'P', nrvt, n, m, vt, ldvt, work( itaup ),
3390  $ work( iwork ), lwork-iwork+1, ierr )
3391  END IF
3392  IF( wntuo ) THEN
3393 *
3394 * If left singular vectors desired in A, generate left
3395 * bidiagonalizing vectors in A
3396 * (Workspace: need 4*M-1, prefer 3*M+(M-1)*NB)
3397 *
3398  CALL dorgbr( 'Q', m, m, n, a, lda, work( itauq ),
3399  $ work( iwork ), lwork-iwork+1, ierr )
3400  END IF
3401  IF( wntvo ) THEN
3402 *
3403 * If right singular vectors desired in A, generate right
3404 * bidiagonalizing vectors in A
3405 * (Workspace: need 4*M, prefer 3*M+M*NB)
3406 *
3407  CALL dorgbr( 'P', m, n, m, a, lda, work( itaup ),
3408  $ work( iwork ), lwork-iwork+1, ierr )
3409  END IF
3410  iwork = ie + m
3411  IF( wntuas .OR. wntuo )
3412  $ nru = m
3413  IF( wntun )
3414  $ nru = 0
3415  IF( wntvas .OR. wntvo )
3416  $ ncvt = n
3417  IF( wntvn )
3418  $ ncvt = 0
3419  IF( ( .NOT.wntuo ) .AND. ( .NOT.wntvo ) ) THEN
3420 *
3421 * Perform bidiagonal QR iteration, if desired, computing
3422 * left singular vectors in U and computing right singular
3423 * vectors in VT
3424 * (Workspace: need BDSPAC)
3425 *
3426  CALL dbdsqr( 'L', m, ncvt, nru, 0, s, work( ie ), vt,
3427  $ ldvt, u, ldu, dum, 1, work( iwork ), info )
3428  ELSE IF( ( .NOT.wntuo ) .AND. wntvo ) THEN
3429 *
3430 * Perform bidiagonal QR iteration, if desired, computing
3431 * left singular vectors in U and computing right singular
3432 * vectors in A
3433 * (Workspace: need BDSPAC)
3434 *
3435  CALL dbdsqr( 'L', m, ncvt, nru, 0, s, work( ie ), a, lda,
3436  $ u, ldu, dum, 1, work( iwork ), info )
3437  ELSE
3438 *
3439 * Perform bidiagonal QR iteration, if desired, computing
3440 * left singular vectors in A and computing right singular
3441 * vectors in VT
3442 * (Workspace: need BDSPAC)
3443 *
3444  CALL dbdsqr( 'L', m, ncvt, nru, 0, s, work( ie ), vt,
3445  $ ldvt, a, lda, dum, 1, work( iwork ), info )
3446  END IF
3447 *
3448  END IF
3449 *
3450  END IF
3451 *
3452 * If DBDSQR failed to converge, copy unconverged superdiagonals
3453 * to WORK( 2:MINMN )
3454 *
3455  IF( info.NE.0 ) THEN
3456  IF( ie.GT.2 ) THEN
3457  DO 50 i = 1, minmn - 1
3458  work( i+1 ) = work( i+ie-1 )
3459  50 CONTINUE
3460  END IF
3461  IF( ie.LT.2 ) THEN
3462  DO 60 i = minmn - 1, 1, -1
3463  work( i+1 ) = work( i+ie-1 )
3464  60 CONTINUE
3465  END IF
3466  END IF
3467 *
3468 * Undo scaling if necessary
3469 *
3470  IF( iscl.EQ.1 ) THEN
3471  IF( anrm.GT.bignum )
3472  $ CALL dlascl( 'G', 0, 0, bignum, anrm, minmn, 1, s, minmn,
3473  $ ierr )
3474  IF( info.NE.0 .AND. anrm.GT.bignum )
3475  $ CALL dlascl( 'G', 0, 0, bignum, anrm, minmn-1, 1, work( 2 ),
3476  $ minmn, ierr )
3477  IF( anrm.LT.smlnum )
3478  $ CALL dlascl( 'G', 0, 0, smlnum, anrm, minmn, 1, s, minmn,
3479  $ ierr )
3480  IF( info.NE.0 .AND. anrm.LT.smlnum )
3481  $ CALL dlascl( 'G', 0, 0, smlnum, anrm, minmn-1, 1, work( 2 ),
3482  $ minmn, ierr )
3483  END IF
3484 *
3485 * Return optimal workspace in WORK(1)
3486 *
3487  work( 1 ) = maxwrk
3488 *
3489  RETURN
3490 *
3491 * End of DGESVD
3492 *
subroutine dorgbr(VECT, M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
DORGBR
Definition: dorgbr.f:159
subroutine dlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: dlaset.f:112
subroutine dlacpy(UPLO, M, N, A, LDA, B, LDB)
DLACPY copies all or part of one two-dimensional array to another.
Definition: dlacpy.f:105
subroutine dgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
DGEQRF
Definition: dgeqrf.f:138
subroutine dorglq(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
DORGLQ
Definition: dorglq.f:129
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine dormbr(VECT, SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
DORMBR
Definition: dormbr.f:197
subroutine dgebrd(M, N, A, LDA, D, E, TAUQ, TAUP, WORK, LWORK, INFO)
DGEBRD
Definition: dgebrd.f:207
subroutine dgemm(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
DGEMM
Definition: dgemm.f:189
double precision function dlange(NORM, M, N, A, LDA, WORK)
DLANGE returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value ...
Definition: dlange.f:116
double precision function dlamch(CMACH)
DLAMCH
Definition: dlamch.f:65
subroutine dorgqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
DORGQR
Definition: dorgqr.f:130
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine dgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
DGELQF
Definition: dgelqf.f:137
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:141
integer function ilaenv(ISPEC, NAME, OPTS, N1, N2, N3, N4)
Definition: tstiee.f:83
subroutine dbdsqr(UPLO, N, NCVT, NRU, NCC, D, E, VT, LDVT, U, LDU, C, LDC, WORK, INFO)
DBDSQR
Definition: dbdsqr.f:232

Here is the call graph for this function:

Here is the caller graph for this function:

subroutine dgesvdx ( character  JOBU,
character  JOBVT,
character  RANGE,
integer  M,
integer  N,
double precision, dimension( lda, * )  A,
integer  LDA,
double precision  VL,
double precision  VU,
integer  IL,
integer  IU,
integer  NS,
double precision, dimension( * )  S,
double precision, dimension( ldu, * )  U,
integer  LDU,
double precision, dimension( ldvt, * )  VT,
integer  LDVT,
double precision, dimension( * )  WORK,
integer  LWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

DGESVDX computes the singular value decomposition (SVD) for GE matrices

Download DGESVDX + dependencies [TGZ] [ZIP] [TXT]

Purpose:
  DGESVDX computes the singular value decomposition (SVD) of a real
  M-by-N matrix A, optionally computing the left and/or right singular
  vectors. The SVD is written
 
      A = U * SIGMA * transpose(V)
 
  where SIGMA is an M-by-N matrix which is zero except for its
  min(m,n) diagonal elements, U is an M-by-M orthogonal matrix, and
  V is an N-by-N orthogonal matrix.  The diagonal elements of SIGMA
  are the singular values of A; they are real and non-negative, and
  are returned in descending order.  The first min(m,n) columns of
  U and V are the left and right singular vectors of A.
 
  DGESVDX uses an eigenvalue problem for obtaining the SVD, which 
  allows for the computation of a subset of singular values and 
  vectors. See DBDSVDX for details.
 
  Note that the routine returns V**T, not V.
Parameters
[in]JOBU
          JOBU is CHARACTER*1
          Specifies options for computing all or part of the matrix U:
          = 'V':  the first min(m,n) columns of U (the left singular
                  vectors) or as specified by RANGE are returned in 
                  the array U;
          = 'N':  no columns of U (no left singular vectors) are
                  computed.
[in]JOBVT
          JOBVT is CHARACTER*1
           Specifies options for computing all or part of the matrix
           V**T:
           = 'V':  the first min(m,