LAPACK  3.6.0
LAPACK: Linear Algebra PACKage
Collaboration diagram for real:

Functions

subroutine sgejsv (JOBA, JOBU, JOBV, JOBR, JOBT, JOBP, M, N, A, LDA, SVA, U, LDU, V, LDV, WORK, LWORK, IWORK, INFO)
 SGEJSV More...
 
subroutine sgesdd (JOBZ, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, LWORK, IWORK, INFO)
 SGESDD More...
 
subroutine sgesvd (JOBU, JOBVT, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, LWORK, INFO)
  SGESVD computes the singular value decomposition (SVD) for GE matrices More...
 
subroutine sgesvdx (JOBU, JOBVT, RANGE, M, N, A, LDA, VL, VU, IL, IU, NS, S, U, LDU, VT, LDVT, WORK, LWORK, IWORK, INFO)
  SGESVDX computes the singular value decomposition (SVD) for GE matrices More...
 

Detailed Description

This is the group of real singular value driver functions for GE matrices

Function Documentation

subroutine sgejsv ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
character*1  JOBR,
character*1  JOBT,
character*1  JOBP,
integer  M,
integer  N,
real, dimension( lda, * )  A,
integer  LDA,
real, dimension( n )  SVA,
real, dimension( ldu, * )  U,
integer  LDU,
real, dimension( ldv, * )  V,
integer  LDV,
real, dimension( lwork )  WORK,
integer  LWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

SGEJSV

Download SGEJSV + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 SGEJSV computes the singular value decomposition (SVD) of a real M-by-N
 matrix [A], where M >= N. The SVD of [A] is written as

              [A] = [U] * [SIGMA] * [V]^t,

 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
 diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
 [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
 the singular values of [A]. The columns of [U] and [V] are the left and
 the right singular vectors of [A], respectively. The matrices [U] and [V]
 are computed and stored in the arrays U and V, respectively. The diagonal
 of [SIGMA] is computed and stored in the array SVA.
 SGEJSV can sometimes compute tiny singular values and their singular vectors much
 more accurately than other SVD routines, see below under Further Details.  
Parameters
[in]JOBA
          JOBA is CHARACTER*1
         Specifies the level of accuracy:
       = 'C': This option works well (high relative accuracy) if A = B * D,
              with well-conditioned B and arbitrary diagonal matrix D.
              The accuracy cannot be spoiled by COLUMN scaling. The
              accuracy of the computed output depends on the condition of
              B, and the procedure aims at the best theoretical accuracy.
              The relative error max_{i=1:N}|d sigma_i| / sigma_i is
              bounded by f(M,N)*epsilon* cond(B), independent of D.
              The input matrix is preprocessed with the QRF with column
              pivoting. This initial preprocessing and preconditioning by
              a rank revealing QR factorization is common for all values of
              JOBA. Additional actions are specified as follows:
       = 'E': Computation as with 'C' with an additional estimate of the
              condition number of B. It provides a realistic error bound.
       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
              D1, D2, and well-conditioned matrix C, this option gives
              higher accuracy than the 'C' option. If the structure of the
              input matrix is not known, and relative accuracy is
              desirable, then this option is advisable. The input matrix A
              is preprocessed with QR factorization with FULL (row and
              column) pivoting.
       = 'G'  Computation as with 'F' with an additional estimate of the
              condition number of B, where A=D*B. If A has heavily weighted
              rows, then using this condition number gives too pessimistic
              error bound.
       = 'A': Small singular values are the noise and the matrix is treated
              as numerically rank defficient. The error in the computed
              singular values is bounded by f(m,n)*epsilon*||A||.
              The computed SVD A = U * S * V^t restores A up to
              f(m,n)*epsilon*||A||.
              This gives the procedure the licence to discard (set to zero)
              all singular values below N*epsilon*||A||.
       = 'R': Similar as in 'A'. Rank revealing property of the initial
              QR factorization is used do reveal (using triangular factor)
              a gap sigma_{r+1} < epsilon * sigma_r in which case the
              numerical RANK is declared to be r. The SVD is computed with
              absolute error bounds, but more accurately than with 'A'.
[in]JOBU
          JOBU is CHARACTER*1
         Specifies whether to compute the columns of U:
       = 'U': N columns of U are returned in the array U.
       = 'F': full set of M left sing. vectors is returned in the array U.
       = 'W': U may be used as workspace of length M*N. See the description
              of U.
       = 'N': U is not computed.
[in]JOBV
          JOBV is CHARACTER*1
         Specifies whether to compute the matrix V:
       = 'V': N columns of V are returned in the array V; Jacobi rotations
              are not explicitly accumulated.
       = 'J': N columns of V are returned in the array V, but they are
              computed as the product of Jacobi rotations. This option is
              allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
       = 'W': V may be used as workspace of length N*N. See the description
              of V.
       = 'N': V is not computed.
[in]JOBR
          JOBR is CHARACTER*1
         Specifies the RANGE for the singular values. Issues the licence to
         set to zero small positive singular values if they are outside
         specified range. If A .NE. 0 is scaled so that the largest singular
         value of c*A is around SQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
         the licence to kill columns of A whose norm in c*A is less than
         SQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
         where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
       = 'N': Do not kill small columns of c*A. This option assumes that
              BLAS and QR factorizations and triangular solvers are
              implemented to work in that range. If the condition of A
              is greater than BIG, use SGESVJ.
       = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)]
              (roughly, as described above). This option is recommended.
                                             ===========================
         For computing the singular values in the FULL range [SFMIN,BIG]
         use SGESVJ.
[in]JOBT
          JOBT is CHARACTER*1
         If the matrix is square then the procedure may determine to use
         transposed A if A^t seems to be better with respect to convergence.
         If the matrix is not square, JOBT is ignored. This is subject to
         changes in the future.
         The decision is based on two values of entropy over the adjoint
         orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
       = 'T': transpose if entropy test indicates possibly faster
         convergence of Jacobi process if A^t is taken as input. If A is
         replaced with A^t, then the row pivoting is included automatically.
       = 'N': do not speculate.
         This option can be used to compute only the singular values, or the
         full SVD (U, SIGMA and V). For only one set of singular vectors
         (U or V), the caller should provide both U and V, as one of the
         matrices is used as workspace if the matrix A is transposed.
         The implementer can easily remove this constraint and make the
         code more complicated. See the descriptions of U and V.
[in]JOBP
          JOBP is CHARACTER*1
         Issues the licence to introduce structured perturbations to drown
         denormalized numbers. This licence should be active if the
         denormals are poorly implemented, causing slow computation,
         especially in cases of fast convergence (!). For details see [1,2].
         For the sake of simplicity, this perturbations are included only
         when the full SVD or only the singular values are requested. The
         implementer/user can easily add the perturbation for the cases of
         computing one set of singular vectors.
       = 'P': introduce perturbation
       = 'N': do not perturb
[in]M
          M is INTEGER
         The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
         The number of columns of the input matrix A. M >= N >= 0.
[in,out]A
          A is REAL array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is REAL array, dimension (N)
          On exit,
          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
            computation SVA contains Euclidean column norms of the
            iterated matrices in the array A.
          - For WORK(1) .NE. WORK(2): The singular values of A are
            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
            sigma_max(A) overflows or if small singular values have been
            saved from underflow by scaling the input matrix A.
          - If JOBR='R' then some of the singular values may be returned
            as exact zeros obtained by "set to zero" because they are
            below the numerical rank threshold or are denormalized numbers.
[out]U
          U is REAL array, dimension ( LDU, N )
          If JOBU = 'U', then U contains on exit the M-by-N matrix of
                         the left singular vectors.
          If JOBU = 'F', then U contains on exit the M-by-M matrix of
                         the left singular vectors, including an ONB
                         of the orthogonal complement of the Range(A).
          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
                         then U is used as workspace if the procedure
                         replaces A with A^t. In that case, [V] is computed
                         in U as left singular vectors of A^t and then
                         copied back to the V array. This 'W' option is just
                         a reminder to the caller that in this case U is
                         reserved as workspace of length N*N.
          If JOBU = 'N'  U is not referenced.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U,  LDU >= 1.
          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
[out]V
          V is REAL array, dimension ( LDV, N )
          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
                         then V is used as workspace if the pprocedure
                         replaces A with A^t. In that case, [U] is computed
                         in V as right singular vectors of A^t and then
                         copied back to the U array. This 'W' option is just
                         a reminder to the caller that in this case V is
                         reserved as workspace of length N*N.
          If JOBV = 'N'  V is not referenced.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V,  LDV >= 1.
          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
[out]WORK
          WORK is REAL array, dimension at least LWORK.
          On exit,
          WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
                    that SCALE*SVA(1:N) are the computed singular values
                    of A. (See the description of SVA().)
          WORK(2) = See the description of WORK(1).
          WORK(3) = SCONDA is an estimate for the condition number of
                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
                    SCONDA is an estimate of SQRT(||(R^t * R)^(-1)||_1).
                    It is computed using SPOCON. It holds
                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
                    where R is the triangular factor from the QRF of A.
                    However, if R is truncated and the numerical rank is
                    determined to be strictly smaller than N, SCONDA is
                    returned as -1, thus indicating that the smallest
                    singular values might be lost.

          If full SVD is needed, the following two condition numbers are
          useful for the analysis of the algorithm. They are provied for
          a developer/implementer who is familiar with the details of
          the method.

          WORK(4) = an estimate of the scaled condition number of the
                    triangular factor in the first QR factorization.
          WORK(5) = an estimate of the scaled condition number of the
                    triangular factor in the second QR factorization.
          The following two parameters are computed if JOBT .EQ. 'T'.
          They are provided for a developer/implementer who is familiar
          with the details of the method.

          WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
                    of diag(A^t*A) / Trace(A^t*A) taken as point in the
                    probability simplex.
          WORK(7) = the entropy of A*A^t.
[in]LWORK
          LWORK is INTEGER
          Length of WORK to confirm proper allocation of work space.
          LWORK depends on the job:

          If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
            -> .. no scaled condition estimate required (JOBE.EQ.'N'):
               LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
               block size for DGEQP3 and DGEQRF.
               In general, optimal LWORK is computed as 
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).        
            -> .. an estimate of the scaled condition number of A is
               required (JOBA='E', 'G'). In this case, LWORK is the maximum
               of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
               ->> For optimal performance (blocked code) the optimal value 
               is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
               In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 
                                                     N+N*N+LWORK(DPOCON),7).

          If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
            -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
               where NB is the optimal block size for DGEQP3, DGEQRF, DGELQ,
               DORMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON), 
                       N+LWORK(DGELQ), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).

          If SIGMA and the left singular vectors are needed
            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
            -> For optimal performance:
               if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
               if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
               where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
                        2*N+LWORK(DGEQRF), N+LWORK(DORMQR)). 
               Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or 
               M*NB (for JOBU.EQ.'F').
               
          If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and 
            -> if JOBV.EQ.'V'  
               the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N). 
            -> if JOBV.EQ.'J' the minimal requirement is 
               LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
            -> For optimal performance, LWORK should be additionally
               larger than N+M*NB, where NB is the optimal block size
               for DORMQR.
[out]IWORK
          IWORK is INTEGER array, dimension M+3*N.
          On exit,
          IWORK(1) = the numerical rank determined after the initial
                     QR factorization with pivoting. See the descriptions
                     of JOBA and JOBR.
          IWORK(2) = the number of the computed nonzero singular values
          IWORK(3) = if nonzero, a warning message:
                     If IWORK(3).EQ.1 then some of the column norms of A
                     were denormalized floats. The requested high accuracy
                     is not warranted by the data.
[out]INFO
          INFO is INTEGER
           < 0  : if INFO = -i, then the i-th argument had an illegal value.
           = 0 :  successfull exit;
           > 0 :  SGEJSV  did not converge in the maximal allowed number
                  of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
November 2015
Further Details:
  SGEJSV implements a preconditioned Jacobi SVD algorithm. It uses SGEQP3,
  SGEQRF, and SGELQF as preprocessors and preconditioners. Optionally, an
  additional row pivoting can be used as a preprocessor, which in some
  cases results in much higher accuracy. An example is matrix A with the
  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
  diagonal matrices and C is well-conditioned matrix. In that case, complete
  pivoting in the first QR factorizations provides accuracy dependent on the
  condition number of C, and independent of D1, D2. Such higher accuracy is
  not completely understood theoretically, but it works well in practice.
  Further, if A can be written as A = B*D, with well-conditioned B and some
  diagonal D, then the high accuracy is guaranteed, both theoretically and
  in software, independent of D. For more details see [1], [2].
     The computational range for the singular values can be the full range
  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
  & LAPACK routines called by SGEJSV are implemented to work in that range.
  If that is not the case, then the restriction for safe computation with
  the singular values in the range of normalized IEEE numbers is that the
  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
  overflow. This code (SGEJSV) is best used in this restricted range,
  meaning that singular values of magnitude below ||A||_2 / SLAMCH('O') are
  returned as zeros. See JOBR for details on this.
     Further, this implementation is somewhat slower than the one described
  in [1,2] due to replacement of some non-LAPACK components, and because
  the choice of some tuning parameters in the iterative part (SGESVJ) is
  left to the implementer on a particular machine.
     The rank revealing QR factorization (in this code: SGEQP3) should be
  implemented as in [3]. We have a new version of SGEQP3 under development
  that is more robust than the current one in LAPACK, with a cleaner cut in
  rank defficient cases. It will be available in the SIGMA library [4].
  If M is much larger than N, it is obvious that the inital QRF with
  column pivoting can be preprocessed by the QRF without pivoting. That
  well known trick is not used in SGEJSV because in some cases heavy row
  weighting can be treated with complete pivoting. The overhead in cases
  M much larger than N is then only due to pivoting, but the benefits in
  terms of accuracy have prevailed. The implementer/user can incorporate
  this extra QRF step easily. The implementer can also improve data movement
  (matrix transpose, matrix copy, matrix transposed copy) - this
  implementation of SGEJSV uses only the simplest, naive data movement.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
 [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
     factorization software - a case study.
     ACM Trans. math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
     LAPACK Working note 176.
 [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 479 of file sgejsv.f.

479 *
480 * -- LAPACK computational routine (version 3.6.0) --
481 * -- LAPACK is a software package provided by Univ. of Tennessee, --
482 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
483 * November 2015
484 *
485 * .. Scalar Arguments ..
486  IMPLICIT NONE
487  INTEGER info, lda, ldu, ldv, lwork, m, n
488 * ..
489 * .. Array Arguments ..
490  REAL a( lda, * ), sva( n ), u( ldu, * ), v( ldv, * ),
491  $ work( lwork )
492  INTEGER iwork( * )
493  CHARACTER*1 joba, jobp, jobr, jobt, jobu, jobv
494 * ..
495 *
496 * ===========================================================================
497 *
498 * .. Local Parameters ..
499  REAL zero, one
500  parameter( zero = 0.0e0, one = 1.0e0 )
501 * ..
502 * .. Local Scalars ..
503  REAL aapp, aaqq, aatmax, aatmin, big, big1, cond_ok,
504  $ condr1, condr2, entra, entrat, epsln, maxprj, scalem,
505  $ sconda, sfmin, small, temp1, uscal1, uscal2, xsc
506  INTEGER ierr, n1, nr, numrank, p, q, warning
507  LOGICAL almort, defr, errest, goscal, jracc, kill, lsvec,
508  $ l2aber, l2kill, l2pert, l2rank, l2tran,
509  $ noscal, rowpiv, rsvec, transp
510 * ..
511 * .. Intrinsic Functions ..
512  INTRINSIC abs, alog, max, min, float, nint, sign, sqrt
513 * ..
514 * .. External Functions ..
515  REAL slamch, snrm2
516  INTEGER isamax
517  LOGICAL lsame
518  EXTERNAL isamax, lsame, slamch, snrm2
519 * ..
520 * .. External Subroutines ..
521  EXTERNAL scopy, sgelqf, sgeqp3, sgeqrf, slacpy, slascl,
524 *
525  EXTERNAL sgesvj
526 * ..
527 *
528 * Test the input arguments
529 *
530  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
531  jracc = lsame( jobv, 'J' )
532  rsvec = lsame( jobv, 'V' ) .OR. jracc
533  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
534  l2rank = lsame( joba, 'R' )
535  l2aber = lsame( joba, 'A' )
536  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
537  l2tran = lsame( jobt, 'T' )
538  l2kill = lsame( jobr, 'R' )
539  defr = lsame( jobr, 'N' )
540  l2pert = lsame( jobp, 'P' )
541 *
542  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
543  $ errest .OR. lsame( joba, 'C' ) )) THEN
544  info = - 1
545  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
546  $ lsame( jobu, 'W' )) ) THEN
547  info = - 2
548  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
549  $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
550  info = - 3
551  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
552  info = - 4
553  ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
554  info = - 5
555  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
556  info = - 6
557  ELSE IF ( m .LT. 0 ) THEN
558  info = - 7
559  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
560  info = - 8
561  ELSE IF ( lda .LT. m ) THEN
562  info = - 10
563  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
564  info = - 13
565  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
566  info = - 14
567  ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
568  $ (lwork .LT. max(7,4*n+1,2*m+n))) .OR.
569  $ (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
570  $ (lwork .LT. max(7,4*n+n*n,2*m+n))) .OR.
571  $ (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
572  $ .OR.
573  $ (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
574  $ .OR.
575  $ (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
576  $ (lwork.LT.max(2*m+n,6*n+2*n*n)))
577  $ .OR. (lsvec .AND. rsvec .AND. jracc .AND.
578  $ lwork.LT.max(2*m+n,4*n+n*n,2*n+n*n+6)))
579  $ THEN
580  info = - 17
581  ELSE
582 * #:)
583  info = 0
584  END IF
585 *
586  IF ( info .NE. 0 ) THEN
587 * #:(
588  CALL xerbla( 'SGEJSV', - info )
589  RETURN
590  END IF
591 *
592 * Quick return for void matrix (Y3K safe)
593 * #:)
594  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) RETURN
595 *
596 * Determine whether the matrix U should be M x N or M x M
597 *
598  IF ( lsvec ) THEN
599  n1 = n
600  IF ( lsame( jobu, 'F' ) ) n1 = m
601  END IF
602 *
603 * Set numerical parameters
604 *
605 *! NOTE: Make sure SLAMCH() does not fail on the target architecture.
606 *
607  epsln = slamch('Epsilon')
608  sfmin = slamch('SafeMinimum')
609  small = sfmin / epsln
610  big = slamch('O')
611 * BIG = ONE / SFMIN
612 *
613 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
614 *
615 *(!) If necessary, scale SVA() to protect the largest norm from
616 * overflow. It is possible that this scaling pushes the smallest
617 * column norm left from the underflow threshold (extreme case).
618 *
619  scalem = one / sqrt(float(m)*float(n))
620  noscal = .true.
621  goscal = .true.
622  DO 1874 p = 1, n
623  aapp = zero
624  aaqq = one
625  CALL slassq( m, a(1,p), 1, aapp, aaqq )
626  IF ( aapp .GT. big ) THEN
627  info = - 9
628  CALL xerbla( 'SGEJSV', -info )
629  RETURN
630  END IF
631  aaqq = sqrt(aaqq)
632  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
633  sva(p) = aapp * aaqq
634  ELSE
635  noscal = .false.
636  sva(p) = aapp * ( aaqq * scalem )
637  IF ( goscal ) THEN
638  goscal = .false.
639  CALL sscal( p-1, scalem, sva, 1 )
640  END IF
641  END IF
642  1874 CONTINUE
643 *
644  IF ( noscal ) scalem = one
645 *
646  aapp = zero
647  aaqq = big
648  DO 4781 p = 1, n
649  aapp = max( aapp, sva(p) )
650  IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
651  4781 CONTINUE
652 *
653 * Quick return for zero M x N matrix
654 * #:)
655  IF ( aapp .EQ. zero ) THEN
656  IF ( lsvec ) CALL slaset( 'G', m, n1, zero, one, u, ldu )
657  IF ( rsvec ) CALL slaset( 'G', n, n, zero, one, v, ldv )
658  work(1) = one
659  work(2) = one
660  IF ( errest ) work(3) = one
661  IF ( lsvec .AND. rsvec ) THEN
662  work(4) = one
663  work(5) = one
664  END IF
665  IF ( l2tran ) THEN
666  work(6) = zero
667  work(7) = zero
668  END IF
669  iwork(1) = 0
670  iwork(2) = 0
671  iwork(3) = 0
672  RETURN
673  END IF
674 *
675 * Issue warning if denormalized column norms detected. Override the
676 * high relative accuracy request. Issue licence to kill columns
677 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
678 * #:(
679  warning = 0
680  IF ( aaqq .LE. sfmin ) THEN
681  l2rank = .true.
682  l2kill = .true.
683  warning = 1
684  END IF
685 *
686 * Quick return for one-column matrix
687 * #:)
688  IF ( n .EQ. 1 ) THEN
689 *
690  IF ( lsvec ) THEN
691  CALL slascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
692  CALL slacpy( 'A', m, 1, a, lda, u, ldu )
693 * computing all M left singular vectors of the M x 1 matrix
694  IF ( n1 .NE. n ) THEN
695  CALL sgeqrf( m, n, u,ldu, work, work(n+1),lwork-n,ierr )
696  CALL sorgqr( m,n1,1, u,ldu,work,work(n+1),lwork-n,ierr )
697  CALL scopy( m, a(1,1), 1, u(1,1), 1 )
698  END IF
699  END IF
700  IF ( rsvec ) THEN
701  v(1,1) = one
702  END IF
703  IF ( sva(1) .LT. (big*scalem) ) THEN
704  sva(1) = sva(1) / scalem
705  scalem = one
706  END IF
707  work(1) = one / scalem
708  work(2) = one
709  IF ( sva(1) .NE. zero ) THEN
710  iwork(1) = 1
711  IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
712  iwork(2) = 1
713  ELSE
714  iwork(2) = 0
715  END IF
716  ELSE
717  iwork(1) = 0
718  iwork(2) = 0
719  END IF
720  IF ( errest ) work(3) = one
721  IF ( lsvec .AND. rsvec ) THEN
722  work(4) = one
723  work(5) = one
724  END IF
725  IF ( l2tran ) THEN
726  work(6) = zero
727  work(7) = zero
728  END IF
729  RETURN
730 *
731  END IF
732 *
733  transp = .false.
734  l2tran = l2tran .AND. ( m .EQ. n )
735 *
736  aatmax = -one
737  aatmin = big
738  IF ( rowpiv .OR. l2tran ) THEN
739 *
740 * Compute the row norms, needed to determine row pivoting sequence
741 * (in the case of heavily row weighted A, row pivoting is strongly
742 * advised) and to collect information needed to compare the
743 * structures of A * A^t and A^t * A (in the case L2TRAN.EQ..TRUE.).
744 *
745  IF ( l2tran ) THEN
746  DO 1950 p = 1, m
747  xsc = zero
748  temp1 = one
749  CALL slassq( n, a(p,1), lda, xsc, temp1 )
750 * SLASSQ gets both the ell_2 and the ell_infinity norm
751 * in one pass through the vector
752  work(m+n+p) = xsc * scalem
753  work(n+p) = xsc * (scalem*sqrt(temp1))
754  aatmax = max( aatmax, work(n+p) )
755  IF (work(n+p) .NE. zero) aatmin = min(aatmin,work(n+p))
756  1950 CONTINUE
757  ELSE
758  DO 1904 p = 1, m
759  work(m+n+p) = scalem*abs( a(p,isamax(n,a(p,1),lda)) )
760  aatmax = max( aatmax, work(m+n+p) )
761  aatmin = min( aatmin, work(m+n+p) )
762  1904 CONTINUE
763  END IF
764 *
765  END IF
766 *
767 * For square matrix A try to determine whether A^t would be better
768 * input for the preconditioned Jacobi SVD, with faster convergence.
769 * The decision is based on an O(N) function of the vector of column
770 * and row norms of A, based on the Shannon entropy. This should give
771 * the right choice in most cases when the difference actually matters.
772 * It may fail and pick the slower converging side.
773 *
774  entra = zero
775  entrat = zero
776  IF ( l2tran ) THEN
777 *
778  xsc = zero
779  temp1 = one
780  CALL slassq( n, sva, 1, xsc, temp1 )
781  temp1 = one / temp1
782 *
783  entra = zero
784  DO 1113 p = 1, n
785  big1 = ( ( sva(p) / xsc )**2 ) * temp1
786  IF ( big1 .NE. zero ) entra = entra + big1 * alog(big1)
787  1113 CONTINUE
788  entra = - entra / alog(float(n))
789 *
790 * Now, SVA().^2/Trace(A^t * A) is a point in the probability simplex.
791 * It is derived from the diagonal of A^t * A. Do the same with the
792 * diagonal of A * A^t, compute the entropy of the corresponding
793 * probability distribution. Note that A * A^t and A^t * A have the
794 * same trace.
795 *
796  entrat = zero
797  DO 1114 p = n+1, n+m
798  big1 = ( ( work(p) / xsc )**2 ) * temp1
799  IF ( big1 .NE. zero ) entrat = entrat + big1 * alog(big1)
800  1114 CONTINUE
801  entrat = - entrat / alog(float(m))
802 *
803 * Analyze the entropies and decide A or A^t. Smaller entropy
804 * usually means better input for the algorithm.
805 *
806  transp = ( entrat .LT. entra )
807 *
808 * If A^t is better than A, transpose A.
809 *
810  IF ( transp ) THEN
811 * In an optimal implementation, this trivial transpose
812 * should be replaced with faster transpose.
813  DO 1115 p = 1, n - 1
814  DO 1116 q = p + 1, n
815  temp1 = a(q,p)
816  a(q,p) = a(p,q)
817  a(p,q) = temp1
818  1116 CONTINUE
819  1115 CONTINUE
820  DO 1117 p = 1, n
821  work(m+n+p) = sva(p)
822  sva(p) = work(n+p)
823  1117 CONTINUE
824  temp1 = aapp
825  aapp = aatmax
826  aatmax = temp1
827  temp1 = aaqq
828  aaqq = aatmin
829  aatmin = temp1
830  kill = lsvec
831  lsvec = rsvec
832  rsvec = kill
833  IF ( lsvec ) n1 = n
834 *
835  rowpiv = .true.
836  END IF
837 *
838  END IF
839 * END IF L2TRAN
840 *
841 * Scale the matrix so that its maximal singular value remains less
842 * than SQRT(BIG) -- the matrix is scaled so that its maximal column
843 * has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
844 * SQRT(BIG) instead of BIG is the fact that SGEJSV uses LAPACK and
845 * BLAS routines that, in some implementations, are not capable of
846 * working in the full interval [SFMIN,BIG] and that they may provoke
847 * overflows in the intermediate results. If the singular values spread
848 * from SFMIN to BIG, then SGESVJ will compute them. So, in that case,
849 * one should use SGESVJ instead of SGEJSV.
850 *
851  big1 = sqrt( big )
852  temp1 = sqrt( big / float(n) )
853 *
854  CALL slascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
855  IF ( aaqq .GT. (aapp * sfmin) ) THEN
856  aaqq = ( aaqq / aapp ) * temp1
857  ELSE
858  aaqq = ( aaqq * temp1 ) / aapp
859  END IF
860  temp1 = temp1 * scalem
861  CALL slascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
862 *
863 * To undo scaling at the end of this procedure, multiply the
864 * computed singular values with USCAL2 / USCAL1.
865 *
866  uscal1 = temp1
867  uscal2 = aapp
868 *
869  IF ( l2kill ) THEN
870 * L2KILL enforces computation of nonzero singular values in
871 * the restricted range of condition number of the initial A,
872 * sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
873  xsc = sqrt( sfmin )
874  ELSE
875  xsc = small
876 *
877 * Now, if the condition number of A is too big,
878 * sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
879 * as a precaution measure, the full SVD is computed using SGESVJ
880 * with accumulated Jacobi rotations. This provides numerically
881 * more robust computation, at the cost of slightly increased run
882 * time. Depending on the concrete implementation of BLAS and LAPACK
883 * (i.e. how they behave in presence of extreme ill-conditioning) the
884 * implementor may decide to remove this switch.
885  IF ( ( aaqq.LT.sqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
886  jracc = .true.
887  END IF
888 *
889  END IF
890  IF ( aaqq .LT. xsc ) THEN
891  DO 700 p = 1, n
892  IF ( sva(p) .LT. xsc ) THEN
893  CALL slaset( 'A', m, 1, zero, zero, a(1,p), lda )
894  sva(p) = zero
895  END IF
896  700 CONTINUE
897  END IF
898 *
899 * Preconditioning using QR factorization with pivoting
900 *
901  IF ( rowpiv ) THEN
902 * Optional row permutation (Bjoerck row pivoting):
903 * A result by Cox and Higham shows that the Bjoerck's
904 * row pivoting combined with standard column pivoting
905 * has similar effect as Powell-Reid complete pivoting.
906 * The ell-infinity norms of A are made nonincreasing.
907  DO 1952 p = 1, m - 1
908  q = isamax( m-p+1, work(m+n+p), 1 ) + p - 1
909  iwork(2*n+p) = q
910  IF ( p .NE. q ) THEN
911  temp1 = work(m+n+p)
912  work(m+n+p) = work(m+n+q)
913  work(m+n+q) = temp1
914  END IF
915  1952 CONTINUE
916  CALL slaswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
917  END IF
918 *
919 * End of the preparation phase (scaling, optional sorting and
920 * transposing, optional flushing of small columns).
921 *
922 * Preconditioning
923 *
924 * If the full SVD is needed, the right singular vectors are computed
925 * from a matrix equation, and for that we need theoretical analysis
926 * of the Businger-Golub pivoting. So we use SGEQP3 as the first RR QRF.
927 * In all other cases the first RR QRF can be chosen by other criteria
928 * (eg speed by replacing global with restricted window pivoting, such
929 * as in SGEQPX from TOMS # 782). Good results will be obtained using
930 * SGEQPX with properly (!) chosen numerical parameters.
931 * Any improvement of SGEQP3 improves overal performance of SGEJSV.
932 *
933 * A * P1 = Q1 * [ R1^t 0]^t:
934  DO 1963 p = 1, n
935 * .. all columns are free columns
936  iwork(p) = 0
937  1963 CONTINUE
938  CALL sgeqp3( m,n,a,lda, iwork,work, work(n+1),lwork-n, ierr )
939 *
940 * The upper triangular matrix R1 from the first QRF is inspected for
941 * rank deficiency and possibilities for deflation, or possible
942 * ill-conditioning. Depending on the user specified flag L2RANK,
943 * the procedure explores possibilities to reduce the numerical
944 * rank by inspecting the computed upper triangular factor. If
945 * L2RANK or L2ABER are up, then SGEJSV will compute the SVD of
946 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
947 *
948  nr = 1
949  IF ( l2aber ) THEN
950 * Standard absolute error bound suffices. All sigma_i with
951 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
952 * agressive enforcement of lower numerical rank by introducing a
953 * backward error of the order of N*EPSLN*||A||.
954  temp1 = sqrt(float(n))*epsln
955  DO 3001 p = 2, n
956  IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) THEN
957  nr = nr + 1
958  ELSE
959  GO TO 3002
960  END IF
961  3001 CONTINUE
962  3002 CONTINUE
963  ELSE IF ( l2rank ) THEN
964 * .. similarly as above, only slightly more gentle (less agressive).
965 * Sudden drop on the diagonal of R1 is used as the criterion for
966 * close-to-rank-defficient.
967  temp1 = sqrt(sfmin)
968  DO 3401 p = 2, n
969  IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
970  $ ( abs(a(p,p)) .LT. small ) .OR.
971  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
972  nr = nr + 1
973  3401 CONTINUE
974  3402 CONTINUE
975 *
976  ELSE
977 * The goal is high relative accuracy. However, if the matrix
978 * has high scaled condition number the relative accuracy is in
979 * general not feasible. Later on, a condition number estimator
980 * will be deployed to estimate the scaled condition number.
981 * Here we just remove the underflowed part of the triangular
982 * factor. This prevents the situation in which the code is
983 * working hard to get the accuracy not warranted by the data.
984  temp1 = sqrt(sfmin)
985  DO 3301 p = 2, n
986  IF ( ( abs(a(p,p)) .LT. small ) .OR.
987  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
988  nr = nr + 1
989  3301 CONTINUE
990  3302 CONTINUE
991 *
992  END IF
993 *
994  almort = .false.
995  IF ( nr .EQ. n ) THEN
996  maxprj = one
997  DO 3051 p = 2, n
998  temp1 = abs(a(p,p)) / sva(iwork(p))
999  maxprj = min( maxprj, temp1 )
1000  3051 CONTINUE
1001  IF ( maxprj**2 .GE. one - float(n)*epsln ) almort = .true.
1002  END IF
1003 *
1004 *
1005  sconda = - one
1006  condr1 = - one
1007  condr2 = - one
1008 *
1009  IF ( errest ) THEN
1010  IF ( n .EQ. nr ) THEN
1011  IF ( rsvec ) THEN
1012 * .. V is available as workspace
1013  CALL slacpy( 'U', n, n, a, lda, v, ldv )
1014  DO 3053 p = 1, n
1015  temp1 = sva(iwork(p))
1016  CALL sscal( p, one/temp1, v(1,p), 1 )
1017  3053 CONTINUE
1018  CALL spocon( 'U', n, v, ldv, one, temp1,
1019  $ work(n+1), iwork(2*n+m+1), ierr )
1020  ELSE IF ( lsvec ) THEN
1021 * .. U is available as workspace
1022  CALL slacpy( 'U', n, n, a, lda, u, ldu )
1023  DO 3054 p = 1, n
1024  temp1 = sva(iwork(p))
1025  CALL sscal( p, one/temp1, u(1,p), 1 )
1026  3054 CONTINUE
1027  CALL spocon( 'U', n, u, ldu, one, temp1,
1028  $ work(n+1), iwork(2*n+m+1), ierr )
1029  ELSE
1030  CALL slacpy( 'U', n, n, a, lda, work(n+1), n )
1031  DO 3052 p = 1, n
1032  temp1 = sva(iwork(p))
1033  CALL sscal( p, one/temp1, work(n+(p-1)*n+1), 1 )
1034  3052 CONTINUE
1035 * .. the columns of R are scaled to have unit Euclidean lengths.
1036  CALL spocon( 'U', n, work(n+1), n, one, temp1,
1037  $ work(n+n*n+1), iwork(2*n+m+1), ierr )
1038  END IF
1039  sconda = one / sqrt(temp1)
1040 * SCONDA is an estimate of SQRT(||(R^t * R)^(-1)||_1).
1041 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1042  ELSE
1043  sconda = - one
1044  END IF
1045  END IF
1046 *
1047  l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. sqrt(big1) )
1048 * If there is no violent scaling, artificial perturbation is not needed.
1049 *
1050 * Phase 3:
1051 *
1052  IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1053 *
1054 * Singular Values only
1055 *
1056 * .. transpose A(1:NR,1:N)
1057  DO 1946 p = 1, min( n-1, nr )
1058  CALL scopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1059  1946 CONTINUE
1060 *
1061 * The following two DO-loops introduce small relative perturbation
1062 * into the strict upper triangle of the lower triangular matrix.
1063 * Small entries below the main diagonal are also changed.
1064 * This modification is useful if the computing environment does not
1065 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1066 * annoying denormalized numbers in case of strongly scaled matrices.
1067 * The perturbation is structured so that it does not introduce any
1068 * new perturbation of the singular values, and it does not destroy
1069 * the job done by the preconditioner.
1070 * The licence for this perturbation is in the variable L2PERT, which
1071 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1072 *
1073  IF ( .NOT. almort ) THEN
1074 *
1075  IF ( l2pert ) THEN
1076 * XSC = SQRT(SMALL)
1077  xsc = epsln / float(n)
1078  DO 4947 q = 1, nr
1079  temp1 = xsc*abs(a(q,q))
1080  DO 4949 p = 1, n
1081  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1082  $ .OR. ( p .LT. q ) )
1083  $ a(p,q) = sign( temp1, a(p,q) )
1084  4949 CONTINUE
1085  4947 CONTINUE
1086  ELSE
1087  CALL slaset( 'U', nr-1,nr-1, zero,zero, a(1,2),lda )
1088  END IF
1089 *
1090 * .. second preconditioning using the QR factorization
1091 *
1092  CALL sgeqrf( n,nr, a,lda, work, work(n+1),lwork-n, ierr )
1093 *
1094 * .. and transpose upper to lower triangular
1095  DO 1948 p = 1, nr - 1
1096  CALL scopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1097  1948 CONTINUE
1098 *
1099  END IF
1100 *
1101 * Row-cyclic Jacobi SVD algorithm with column pivoting
1102 *
1103 * .. again some perturbation (a "background noise") is added
1104 * to drown denormals
1105  IF ( l2pert ) THEN
1106 * XSC = SQRT(SMALL)
1107  xsc = epsln / float(n)
1108  DO 1947 q = 1, nr
1109  temp1 = xsc*abs(a(q,q))
1110  DO 1949 p = 1, nr
1111  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1112  $ .OR. ( p .LT. q ) )
1113  $ a(p,q) = sign( temp1, a(p,q) )
1114  1949 CONTINUE
1115  1947 CONTINUE
1116  ELSE
1117  CALL slaset( 'U', nr-1, nr-1, zero, zero, a(1,2), lda )
1118  END IF
1119 *
1120 * .. and one-sided Jacobi rotations are started on a lower
1121 * triangular matrix (plus perturbation which is ignored in
1122 * the part which destroys triangular form (confusing?!))
1123 *
1124  CALL sgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1125  $ n, v, ldv, work, lwork, info )
1126 *
1127  scalem = work(1)
1128  numrank = nint(work(2))
1129 *
1130 *
1131  ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1132 *
1133 * -> Singular Values and Right Singular Vectors <-
1134 *
1135  IF ( almort ) THEN
1136 *
1137 * .. in this case NR equals N
1138  DO 1998 p = 1, nr
1139  CALL scopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1140  1998 CONTINUE
1141  CALL slaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1142 *
1143  CALL sgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1144  $ work, lwork, info )
1145  scalem = work(1)
1146  numrank = nint(work(2))
1147 
1148  ELSE
1149 *
1150 * .. two more QR factorizations ( one QRF is not enough, two require
1151 * accumulated product of Jacobi rotations, three are perfect )
1152 *
1153  CALL slaset( 'Lower', nr-1, nr-1, zero, zero, a(2,1), lda )
1154  CALL sgelqf( nr, n, a, lda, work, work(n+1), lwork-n, ierr)
1155  CALL slacpy( 'Lower', nr, nr, a, lda, v, ldv )
1156  CALL slaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1157  CALL sgeqrf( nr, nr, v, ldv, work(n+1), work(2*n+1),
1158  $ lwork-2*n, ierr )
1159  DO 8998 p = 1, nr
1160  CALL scopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1161  8998 CONTINUE
1162  CALL slaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1163 *
1164  CALL sgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1165  $ ldu, work(n+1), lwork-n, info )
1166  scalem = work(n+1)
1167  numrank = nint(work(n+2))
1168  IF ( nr .LT. n ) THEN
1169  CALL slaset( 'A',n-nr, nr, zero,zero, v(nr+1,1), ldv )
1170  CALL slaset( 'A',nr, n-nr, zero,zero, v(1,nr+1), ldv )
1171  CALL slaset( 'A',n-nr,n-nr,zero,one, v(nr+1,nr+1), ldv )
1172  END IF
1173 *
1174  CALL sormlq( 'Left', 'Transpose', n, n, nr, a, lda, work,
1175  $ v, ldv, work(n+1), lwork-n, ierr )
1176 *
1177  END IF
1178 *
1179  DO 8991 p = 1, n
1180  CALL scopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1181  8991 CONTINUE
1182  CALL slacpy( 'All', n, n, a, lda, v, ldv )
1183 *
1184  IF ( transp ) THEN
1185  CALL slacpy( 'All', n, n, v, ldv, u, ldu )
1186  END IF
1187 *
1188  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1189 *
1190 * .. Singular Values and Left Singular Vectors ..
1191 *
1192 * .. second preconditioning step to avoid need to accumulate
1193 * Jacobi rotations in the Jacobi iterations.
1194  DO 1965 p = 1, nr
1195  CALL scopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1196  1965 CONTINUE
1197  CALL slaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1198 *
1199  CALL sgeqrf( n, nr, u, ldu, work(n+1), work(2*n+1),
1200  $ lwork-2*n, ierr )
1201 *
1202  DO 1967 p = 1, nr - 1
1203  CALL scopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1204  1967 CONTINUE
1205  CALL slaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1206 *
1207  CALL sgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1208  $ lda, work(n+1), lwork-n, info )
1209  scalem = work(n+1)
1210  numrank = nint(work(n+2))
1211 *
1212  IF ( nr .LT. m ) THEN
1213  CALL slaset( 'A', m-nr, nr,zero, zero, u(nr+1,1), ldu )
1214  IF ( nr .LT. n1 ) THEN
1215  CALL slaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1), ldu )
1216  CALL slaset( 'A',m-nr,n1-nr,zero,one,u(nr+1,nr+1), ldu )
1217  END IF
1218  END IF
1219 *
1220  CALL sormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1221  $ ldu, work(n+1), lwork-n, ierr )
1222 *
1223  IF ( rowpiv )
1224  $ CALL slaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1225 *
1226  DO 1974 p = 1, n1
1227  xsc = one / snrm2( m, u(1,p), 1 )
1228  CALL sscal( m, xsc, u(1,p), 1 )
1229  1974 CONTINUE
1230 *
1231  IF ( transp ) THEN
1232  CALL slacpy( 'All', n, n, u, ldu, v, ldv )
1233  END IF
1234 *
1235  ELSE
1236 *
1237 * .. Full SVD ..
1238 *
1239  IF ( .NOT. jracc ) THEN
1240 *
1241  IF ( .NOT. almort ) THEN
1242 *
1243 * Second Preconditioning Step (QRF [with pivoting])
1244 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1245 * equivalent to an LQF CALL. Since in many libraries the QRF
1246 * seems to be better optimized than the LQF, we do explicit
1247 * transpose and use the QRF. This is subject to changes in an
1248 * optimized implementation of SGEJSV.
1249 *
1250  DO 1968 p = 1, nr
1251  CALL scopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1252  1968 CONTINUE
1253 *
1254 * .. the following two loops perturb small entries to avoid
1255 * denormals in the second QR factorization, where they are
1256 * as good as zeros. This is done to avoid painfully slow
1257 * computation with denormals. The relative size of the perturbation
1258 * is a parameter that can be changed by the implementer.
1259 * This perturbation device will be obsolete on machines with
1260 * properly implemented arithmetic.
1261 * To switch it off, set L2PERT=.FALSE. To remove it from the
1262 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1263 * The following two loops should be blocked and fused with the
1264 * transposed copy above.
1265 *
1266  IF ( l2pert ) THEN
1267  xsc = sqrt(small)
1268  DO 2969 q = 1, nr
1269  temp1 = xsc*abs( v(q,q) )
1270  DO 2968 p = 1, n
1271  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1272  $ .OR. ( p .LT. q ) )
1273  $ v(p,q) = sign( temp1, v(p,q) )
1274  IF ( p .LT. q ) v(p,q) = - v(p,q)
1275  2968 CONTINUE
1276  2969 CONTINUE
1277  ELSE
1278  CALL slaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1279  END IF
1280 *
1281 * Estimate the row scaled condition number of R1
1282 * (If R1 is rectangular, N > NR, then the condition number
1283 * of the leading NR x NR submatrix is estimated.)
1284 *
1285  CALL slacpy( 'L', nr, nr, v, ldv, work(2*n+1), nr )
1286  DO 3950 p = 1, nr
1287  temp1 = snrm2(nr-p+1,work(2*n+(p-1)*nr+p),1)
1288  CALL sscal(nr-p+1,one/temp1,work(2*n+(p-1)*nr+p),1)
1289  3950 CONTINUE
1290  CALL spocon('Lower',nr,work(2*n+1),nr,one,temp1,
1291  $ work(2*n+nr*nr+1),iwork(m+2*n+1),ierr)
1292  condr1 = one / sqrt(temp1)
1293 * .. here need a second oppinion on the condition number
1294 * .. then assume worst case scenario
1295 * R1 is OK for inverse <=> CONDR1 .LT. FLOAT(N)
1296 * more conservative <=> CONDR1 .LT. SQRT(FLOAT(N))
1297 *
1298  cond_ok = sqrt(float(nr))
1299 *[TP] COND_OK is a tuning parameter.
1300 
1301  IF ( condr1 .LT. cond_ok ) THEN
1302 * .. the second QRF without pivoting. Note: in an optimized
1303 * implementation, this QRF should be implemented as the QRF
1304 * of a lower triangular matrix.
1305 * R1^t = Q2 * R2
1306  CALL sgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1307  $ lwork-2*n, ierr )
1308 *
1309  IF ( l2pert ) THEN
1310  xsc = sqrt(small)/epsln
1311  DO 3959 p = 2, nr
1312  DO 3958 q = 1, p - 1
1313  temp1 = xsc * min(abs(v(p,p)),abs(v(q,q)))
1314  IF ( abs(v(q,p)) .LE. temp1 )
1315  $ v(q,p) = sign( temp1, v(q,p) )
1316  3958 CONTINUE
1317  3959 CONTINUE
1318  END IF
1319 *
1320  IF ( nr .NE. n )
1321  $ CALL slacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1322 * .. save ...
1323 *
1324 * .. this transposed copy should be better than naive
1325  DO 1969 p = 1, nr - 1
1326  CALL scopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1327  1969 CONTINUE
1328 *
1329  condr2 = condr1
1330 *
1331  ELSE
1332 *
1333 * .. ill-conditioned case: second QRF with pivoting
1334 * Note that windowed pivoting would be equaly good
1335 * numerically, and more run-time efficient. So, in
1336 * an optimal implementation, the next call to SGEQP3
1337 * should be replaced with eg. CALL SGEQPX (ACM TOMS #782)
1338 * with properly (carefully) chosen parameters.
1339 *
1340 * R1^t * P2 = Q2 * R2
1341  DO 3003 p = 1, nr
1342  iwork(n+p) = 0
1343  3003 CONTINUE
1344  CALL sgeqp3( n, nr, v, ldv, iwork(n+1), work(n+1),
1345  $ work(2*n+1), lwork-2*n, ierr )
1346 ** CALL SGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1347 ** $ LWORK-2*N, IERR )
1348  IF ( l2pert ) THEN
1349  xsc = sqrt(small)
1350  DO 3969 p = 2, nr
1351  DO 3968 q = 1, p - 1
1352  temp1 = xsc * min(abs(v(p,p)),abs(v(q,q)))
1353  IF ( abs(v(q,p)) .LE. temp1 )
1354  $ v(q,p) = sign( temp1, v(q,p) )
1355  3968 CONTINUE
1356  3969 CONTINUE
1357  END IF
1358 *
1359  CALL slacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1360 *
1361  IF ( l2pert ) THEN
1362  xsc = sqrt(small)
1363  DO 8970 p = 2, nr
1364  DO 8971 q = 1, p - 1
1365  temp1 = xsc * min(abs(v(p,p)),abs(v(q,q)))
1366  v(p,q) = - sign( temp1, v(q,p) )
1367  8971 CONTINUE
1368  8970 CONTINUE
1369  ELSE
1370  CALL slaset( 'L',nr-1,nr-1,zero,zero,v(2,1),ldv )
1371  END IF
1372 * Now, compute R2 = L3 * Q3, the LQ factorization.
1373  CALL sgelqf( nr, nr, v, ldv, work(2*n+n*nr+1),
1374  $ work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1375 * .. and estimate the condition number
1376  CALL slacpy( 'L',nr,nr,v,ldv,work(2*n+n*nr+nr+1),nr )
1377  DO 4950 p = 1, nr
1378  temp1 = snrm2( p, work(2*n+n*nr+nr+p), nr )
1379  CALL sscal( p, one/temp1, work(2*n+n*nr+nr+p), nr )
1380  4950 CONTINUE
1381  CALL spocon( 'L',nr,work(2*n+n*nr+nr+1),nr,one,temp1,
1382  $ work(2*n+n*nr+nr+nr*nr+1),iwork(m+2*n+1),ierr )
1383  condr2 = one / sqrt(temp1)
1384 *
1385  IF ( condr2 .GE. cond_ok ) THEN
1386 * .. save the Householder vectors used for Q3
1387 * (this overwrittes the copy of R2, as it will not be
1388 * needed in this branch, but it does not overwritte the
1389 * Huseholder vectors of Q2.).
1390  CALL slacpy( 'U', nr, nr, v, ldv, work(2*n+1), n )
1391 * .. and the rest of the information on Q3 is in
1392 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1393  END IF
1394 *
1395  END IF
1396 *
1397  IF ( l2pert ) THEN
1398  xsc = sqrt(small)
1399  DO 4968 q = 2, nr
1400  temp1 = xsc * v(q,q)
1401  DO 4969 p = 1, q - 1
1402 * V(p,q) = - SIGN( TEMP1, V(q,p) )
1403  v(p,q) = - sign( temp1, v(p,q) )
1404  4969 CONTINUE
1405  4968 CONTINUE
1406  ELSE
1407  CALL slaset( 'U', nr-1,nr-1, zero,zero, v(1,2), ldv )
1408  END IF
1409 *
1410 * Second preconditioning finished; continue with Jacobi SVD
1411 * The input matrix is lower trinagular.
1412 *
1413 * Recover the right singular vectors as solution of a well
1414 * conditioned triangular matrix equation.
1415 *
1416  IF ( condr1 .LT. cond_ok ) THEN
1417 *
1418  CALL sgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u,
1419  $ ldu,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,info )
1420  scalem = work(2*n+n*nr+nr+1)
1421  numrank = nint(work(2*n+n*nr+nr+2))
1422  DO 3970 p = 1, nr
1423  CALL scopy( nr, v(1,p), 1, u(1,p), 1 )
1424  CALL sscal( nr, sva(p), v(1,p), 1 )
1425  3970 CONTINUE
1426 
1427 * .. pick the right matrix equation and solve it
1428 *
1429  IF ( nr .EQ. n ) THEN
1430 * :)) .. best case, R1 is inverted. The solution of this matrix
1431 * equation is Q2*V2 = the product of the Jacobi rotations
1432 * used in SGESVJ, premultiplied with the orthogonal matrix
1433 * from the second QR factorization.
1434  CALL strsm( 'L','U','N','N', nr,nr,one, a,lda, v,ldv )
1435  ELSE
1436 * .. R1 is well conditioned, but non-square. Transpose(R2)
1437 * is inverted to get the product of the Jacobi rotations
1438 * used in SGESVJ. The Q-factor from the second QR
1439 * factorization is then built in explicitly.
1440  CALL strsm('L','U','T','N',nr,nr,one,work(2*n+1),
1441  $ n,v,ldv)
1442  IF ( nr .LT. n ) THEN
1443  CALL slaset('A',n-nr,nr,zero,zero,v(nr+1,1),ldv)
1444  CALL slaset('A',nr,n-nr,zero,zero,v(1,nr+1),ldv)
1445  CALL slaset('A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv)
1446  END IF
1447  CALL sormqr('L','N',n,n,nr,work(2*n+1),n,work(n+1),
1448  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1449  END IF
1450 *
1451  ELSE IF ( condr2 .LT. cond_ok ) THEN
1452 *
1453 * :) .. the input matrix A is very likely a relative of
1454 * the Kahan matrix :)
1455 * The matrix R2 is inverted. The solution of the matrix equation
1456 * is Q3^T*V3 = the product of the Jacobi rotations (appplied to
1457 * the lower triangular L3 from the LQ factorization of
1458 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1459  CALL sgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1460  $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1461  scalem = work(2*n+n*nr+nr+1)
1462  numrank = nint(work(2*n+n*nr+nr+2))
1463  DO 3870 p = 1, nr
1464  CALL scopy( nr, v(1,p), 1, u(1,p), 1 )
1465  CALL sscal( nr, sva(p), u(1,p), 1 )
1466  3870 CONTINUE
1467  CALL strsm('L','U','N','N',nr,nr,one,work(2*n+1),n,u,ldu)
1468 * .. apply the permutation from the second QR factorization
1469  DO 873 q = 1, nr
1470  DO 872 p = 1, nr
1471  work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1472  872 CONTINUE
1473  DO 874 p = 1, nr
1474  u(p,q) = work(2*n+n*nr+nr+p)
1475  874 CONTINUE
1476  873 CONTINUE
1477  IF ( nr .LT. n ) THEN
1478  CALL slaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1479  CALL slaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1480  CALL slaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1481  END IF
1482  CALL sormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1483  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1484  ELSE
1485 * Last line of defense.
1486 * #:( This is a rather pathological case: no scaled condition
1487 * improvement after two pivoted QR factorizations. Other
1488 * possibility is that the rank revealing QR factorization
1489 * or the condition estimator has failed, or the COND_OK
1490 * is set very close to ONE (which is unnecessary). Normally,
1491 * this branch should never be executed, but in rare cases of
1492 * failure of the RRQR or condition estimator, the last line of
1493 * defense ensures that SGEJSV completes the task.
1494 * Compute the full SVD of L3 using SGESVJ with explicit
1495 * accumulation of Jacobi rotations.
1496  CALL sgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1497  $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1498  scalem = work(2*n+n*nr+nr+1)
1499  numrank = nint(work(2*n+n*nr+nr+2))
1500  IF ( nr .LT. n ) THEN
1501  CALL slaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1502  CALL slaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1503  CALL slaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1504  END IF
1505  CALL sormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1506  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1507 *
1508  CALL sormlq( 'L', 'T', nr, nr, nr, work(2*n+1), n,
1509  $ work(2*n+n*nr+1), u, ldu, work(2*n+n*nr+nr+1),
1510  $ lwork-2*n-n*nr-nr, ierr )
1511  DO 773 q = 1, nr
1512  DO 772 p = 1, nr
1513  work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1514  772 CONTINUE
1515  DO 774 p = 1, nr
1516  u(p,q) = work(2*n+n*nr+nr+p)
1517  774 CONTINUE
1518  773 CONTINUE
1519 *
1520  END IF
1521 *
1522 * Permute the rows of V using the (column) permutation from the
1523 * first QRF. Also, scale the columns to make them unit in
1524 * Euclidean norm. This applies to all cases.
1525 *
1526  temp1 = sqrt(float(n)) * epsln
1527  DO 1972 q = 1, n
1528  DO 972 p = 1, n
1529  work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1530  972 CONTINUE
1531  DO 973 p = 1, n
1532  v(p,q) = work(2*n+n*nr+nr+p)
1533  973 CONTINUE
1534  xsc = one / snrm2( n, v(1,q), 1 )
1535  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1536  $ CALL sscal( n, xsc, v(1,q), 1 )
1537  1972 CONTINUE
1538 * At this moment, V contains the right singular vectors of A.
1539 * Next, assemble the left singular vector matrix U (M x N).
1540  IF ( nr .LT. m ) THEN
1541  CALL slaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1542  IF ( nr .LT. n1 ) THEN
1543  CALL slaset('A',nr,n1-nr,zero,zero,u(1,nr+1),ldu)
1544  CALL slaset('A',m-nr,n1-nr,zero,one,u(nr+1,nr+1),ldu)
1545  END IF
1546  END IF
1547 *
1548 * The Q matrix from the first QRF is built into the left singular
1549 * matrix U. This applies to all cases.
1550 *
1551  CALL sormqr( 'Left', 'No_Tr', m, n1, n, a, lda, work, u,
1552  $ ldu, work(n+1), lwork-n, ierr )
1553 
1554 * The columns of U are normalized. The cost is O(M*N) flops.
1555  temp1 = sqrt(float(m)) * epsln
1556  DO 1973 p = 1, nr
1557  xsc = one / snrm2( m, u(1,p), 1 )
1558  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1559  $ CALL sscal( m, xsc, u(1,p), 1 )
1560  1973 CONTINUE
1561 *
1562 * If the initial QRF is computed with row pivoting, the left
1563 * singular vectors must be adjusted.
1564 *
1565  IF ( rowpiv )
1566  $ CALL slaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1567 *
1568  ELSE
1569 *
1570 * .. the initial matrix A has almost orthogonal columns and
1571 * the second QRF is not needed
1572 *
1573  CALL slacpy( 'Upper', n, n, a, lda, work(n+1), n )
1574  IF ( l2pert ) THEN
1575  xsc = sqrt(small)
1576  DO 5970 p = 2, n
1577  temp1 = xsc * work( n + (p-1)*n + p )
1578  DO 5971 q = 1, p - 1
1579  work(n+(q-1)*n+p)=-sign(temp1,work(n+(p-1)*n+q))
1580  5971 CONTINUE
1581  5970 CONTINUE
1582  ELSE
1583  CALL slaset( 'Lower',n-1,n-1,zero,zero,work(n+2),n )
1584  END IF
1585 *
1586  CALL sgesvj( 'Upper', 'U', 'N', n, n, work(n+1), n, sva,
1587  $ n, u, ldu, work(n+n*n+1), lwork-n-n*n, info )
1588 *
1589  scalem = work(n+n*n+1)
1590  numrank = nint(work(n+n*n+2))
1591  DO 6970 p = 1, n
1592  CALL scopy( n, work(n+(p-1)*n+1), 1, u(1,p), 1 )
1593  CALL sscal( n, sva(p), work(n+(p-1)*n+1), 1 )
1594  6970 CONTINUE
1595 *
1596  CALL strsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1597  $ one, a, lda, work(n+1), n )
1598  DO 6972 p = 1, n
1599  CALL scopy( n, work(n+p), n, v(iwork(p),1), ldv )
1600  6972 CONTINUE
1601  temp1 = sqrt(float(n))*epsln
1602  DO 6971 p = 1, n
1603  xsc = one / snrm2( n, v(1,p), 1 )
1604  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1605  $ CALL sscal( n, xsc, v(1,p), 1 )
1606  6971 CONTINUE
1607 *
1608 * Assemble the left singular vector matrix U (M x N).
1609 *
1610  IF ( n .LT. m ) THEN
1611  CALL slaset( 'A', m-n, n, zero, zero, u(n+1,1), ldu )
1612  IF ( n .LT. n1 ) THEN
1613  CALL slaset( 'A',n, n1-n, zero, zero, u(1,n+1),ldu )
1614  CALL slaset( 'A',m-n,n1-n, zero, one,u(n+1,n+1),ldu )
1615  END IF
1616  END IF
1617  CALL sormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1618  $ ldu, work(n+1), lwork-n, ierr )
1619  temp1 = sqrt(float(m))*epsln
1620  DO 6973 p = 1, n1
1621  xsc = one / snrm2( m, u(1,p), 1 )
1622  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1623  $ CALL sscal( m, xsc, u(1,p), 1 )
1624  6973 CONTINUE
1625 *
1626  IF ( rowpiv )
1627  $ CALL slaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1628 *
1629  END IF
1630 *
1631 * end of the >> almost orthogonal case << in the full SVD
1632 *
1633  ELSE
1634 *
1635 * This branch deploys a preconditioned Jacobi SVD with explicitly
1636 * accumulated rotations. It is included as optional, mainly for
1637 * experimental purposes. It does perfom well, and can also be used.
1638 * In this implementation, this branch will be automatically activated
1639 * if the condition number sigma_max(A) / sigma_min(A) is predicted
1640 * to be greater than the overflow threshold. This is because the
1641 * a posteriori computation of the singular vectors assumes robust
1642 * implementation of BLAS and some LAPACK procedures, capable of working
1643 * in presence of extreme values. Since that is not always the case, ...
1644 *
1645  DO 7968 p = 1, nr
1646  CALL scopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1647  7968 CONTINUE
1648 *
1649  IF ( l2pert ) THEN
1650  xsc = sqrt(small/epsln)
1651  DO 5969 q = 1, nr
1652  temp1 = xsc*abs( v(q,q) )
1653  DO 5968 p = 1, n
1654  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1655  $ .OR. ( p .LT. q ) )
1656  $ v(p,q) = sign( temp1, v(p,q) )
1657  IF ( p .LT. q ) v(p,q) = - v(p,q)
1658  5968 CONTINUE
1659  5969 CONTINUE
1660  ELSE
1661  CALL slaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1662  END IF
1663 
1664  CALL sgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1665  $ lwork-2*n, ierr )
1666  CALL slacpy( 'L', n, nr, v, ldv, work(2*n+1), n )
1667 *
1668  DO 7969 p = 1, nr
1669  CALL scopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1670  7969 CONTINUE
1671 
1672  IF ( l2pert ) THEN
1673  xsc = sqrt(small/epsln)
1674  DO 9970 q = 2, nr
1675  DO 9971 p = 1, q - 1
1676  temp1 = xsc * min(abs(u(p,p)),abs(u(q,q)))
1677  u(p,q) = - sign( temp1, u(q,p) )
1678  9971 CONTINUE
1679  9970 CONTINUE
1680  ELSE
1681  CALL slaset('U', nr-1, nr-1, zero, zero, u(1,2), ldu )
1682  END IF
1683 
1684  CALL sgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
1685  $ n, v, ldv, work(2*n+n*nr+1), lwork-2*n-n*nr, info )
1686  scalem = work(2*n+n*nr+1)
1687  numrank = nint(work(2*n+n*nr+2))
1688 
1689  IF ( nr .LT. n ) THEN
1690  CALL slaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1691  CALL slaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1692  CALL slaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1693  END IF
1694 
1695  CALL sormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1696  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1697 *
1698 * Permute the rows of V using the (column) permutation from the
1699 * first QRF. Also, scale the columns to make them unit in
1700 * Euclidean norm. This applies to all cases.
1701 *
1702  temp1 = sqrt(float(n)) * epsln
1703  DO 7972 q = 1, n
1704  DO 8972 p = 1, n
1705  work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1706  8972 CONTINUE
1707  DO 8973 p = 1, n
1708  v(p,q) = work(2*n+n*nr+nr+p)
1709  8973 CONTINUE
1710  xsc = one / snrm2( n, v(1,q), 1 )
1711  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1712  $ CALL sscal( n, xsc, v(1,q), 1 )
1713  7972 CONTINUE
1714 *
1715 * At this moment, V contains the right singular vectors of A.
1716 * Next, assemble the left singular vector matrix U (M x N).
1717 *
1718  IF ( nr .LT. m ) THEN
1719  CALL slaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1720  IF ( nr .LT. n1 ) THEN
1721  CALL slaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1),ldu )
1722  CALL slaset( 'A',m-nr,n1-nr, zero, one,u(nr+1,nr+1),ldu )
1723  END IF
1724  END IF
1725 *
1726  CALL sormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1727  $ ldu, work(n+1), lwork-n, ierr )
1728 *
1729  IF ( rowpiv )
1730  $ CALL slaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1731 *
1732 *
1733  END IF
1734  IF ( transp ) THEN
1735 * .. swap U and V because the procedure worked on A^t
1736  DO 6974 p = 1, n
1737  CALL sswap( n, u(1,p), 1, v(1,p), 1 )
1738  6974 CONTINUE
1739  END IF
1740 *
1741  END IF
1742 * end of the full SVD
1743 *
1744 * Undo scaling, if necessary (and possible)
1745 *
1746  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1747  CALL slascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
1748  uscal1 = one
1749  uscal2 = one
1750  END IF
1751 *
1752  IF ( nr .LT. n ) THEN
1753  DO 3004 p = nr+1, n
1754  sva(p) = zero
1755  3004 CONTINUE
1756  END IF
1757 *
1758  work(1) = uscal2 * scalem
1759  work(2) = uscal1
1760  IF ( errest ) work(3) = sconda
1761  IF ( lsvec .AND. rsvec ) THEN
1762  work(4) = condr1
1763  work(5) = condr2
1764  END IF
1765  IF ( l2tran ) THEN
1766  work(6) = entra
1767  work(7) = entrat
1768  END IF
1769 *
1770  iwork(1) = nr
1771  iwork(2) = numrank
1772  iwork(3) = warning
1773 *
1774  RETURN
1775 * ..
1776 * .. END OF SGEJSV
1777 * ..
real function snrm2(N, X, INCX)
SNRM2
Definition: snrm2.f:56
real function slamch(CMACH)
SLAMCH
Definition: slamch.f:69
subroutine sormlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
SORMLQ
Definition: sormlq.f:170
integer function isamax(N, SX, INCX)
ISAMAX
Definition: isamax.f:53
subroutine sormqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
SORMQR
Definition: sormqr.f:170
subroutine scopy(N, SX, INCX, SY, INCY)
SCOPY
Definition: scopy.f:53
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine sgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
SGELQF
Definition: sgelqf.f:137
subroutine sgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
SGEQRF
Definition: sgeqrf.f:138
subroutine strsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
STRSM
Definition: strsm.f:183
subroutine spocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, IWORK, INFO)
SPOCON
Definition: spocon.f:123
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine sgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, WORK, LWORK, INFO)
SGESVJ
Definition: sgesvj.f:325
subroutine sswap(N, SX, INCX, SY, INCY)
SSWAP
Definition: sswap.f:53
subroutine sscal(N, SA, SX, INCX)
SSCAL
Definition: sscal.f:55
subroutine slassq(N, X, INCX, SCALE, SUMSQ)
SLASSQ updates a sum of squares represented in scaled form.
Definition: slassq.f:105
subroutine slacpy(UPLO, M, N, A, LDA, B, LDB)
SLACPY copies all or part of one two-dimensional array to another.
Definition: slacpy.f:105
subroutine sgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, INFO)
SGEQP3
Definition: sgeqp3.f:153
subroutine slascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: slascl.f:141
subroutine sorgqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
SORGQR
Definition: sorgqr.f:130
subroutine slaswp(N, A, LDA, K1, K2, IPIV, INCX)
SLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: slaswp.f:116
subroutine slaset(UPLO, M, N, ALPHA, BETA, A, LDA)
SLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: slaset.f:112

Here is the call graph for this function:

Here is the caller graph for this function:

subroutine sgesdd ( character  JOBZ,
integer  M,
integer  N,
real, dimension( lda, * )  A,
integer  LDA,
real, dimension( * )  S,
real, dimension( ldu, * )  U,
integer  LDU,
real, dimension( ldvt, * )  VT,
integer  LDVT,
real, dimension( * )  WORK,
integer  LWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

SGESDD

Download SGESDD + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 SGESDD computes the singular value decomposition (SVD) of a real
 M-by-N matrix A, optionally computing the left and right singular
 vectors.  If singular vectors are desired, it uses a
 divide-and-conquer algorithm.

 The SVD is written

      A = U * SIGMA * transpose(V)

 where SIGMA is an M-by-N matrix which is zero except for its
 min(m,n) diagonal elements, U is an M-by-M orthogonal matrix, and
 V is an N-by-N orthogonal matrix.  The diagonal elements of SIGMA
 are the singular values of A; they are real and non-negative, and
 are returned in descending order.  The first min(m,n) columns of
 U and V are the left and right singular vectors of A.

 Note that the routine returns VT = V**T, not V.

 The divide and conquer algorithm makes very mild assumptions about
 floating point arithmetic. It will work on machines with a guard
 digit in add/subtract, or on those binary machines without guard
 digits which subtract like the Cray X-MP, Cray Y-MP, Cray C-90, or
 Cray-2. It could conceivably fail on hexadecimal or decimal machines
 without guard digits, but we know of none.
Parameters
[in]JOBZ
          JOBZ is CHARACTER*1
          Specifies options for computing all or part of the matrix U:
          = 'A':  all M columns of U and all N rows of V**T are
                  returned in the arrays U and VT;
          = 'S':  the first min(M,N) columns of U and the first
                  min(M,N) rows of V**T are returned in the arrays U
                  and VT;
          = 'O':  If M >= N, the first N columns of U are overwritten
                  on the array A and all rows of V**T are returned in
                  the array VT;
                  otherwise, all columns of U are returned in the
                  array U and the first M rows of V**T are overwritten
                  in the array A;
          = 'N':  no columns of U or rows of V**T are computed.
[in]M
          M is INTEGER
          The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.  N >= 0.
[in,out]A
          A is REAL array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit,
          if JOBZ = 'O',  A is overwritten with the first N columns
                          of U (the left singular vectors, stored
                          columnwise) if M >= N;
                          A is overwritten with the first M rows
                          of V**T (the right singular vectors, stored
                          rowwise) otherwise.
          if JOBZ .ne. 'O', the contents of A are destroyed.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]S
          S is REAL array, dimension (min(M,N))
          The singular values of A, sorted so that S(i) >= S(i+1).
[out]U
          U is REAL array, dimension (LDU,UCOL)
          UCOL = M if JOBZ = 'A' or JOBZ = 'O' and M < N;
          UCOL = min(M,N) if JOBZ = 'S'.
          If JOBZ = 'A' or JOBZ = 'O' and M < N, U contains the M-by-M
          orthogonal matrix U;
          if JOBZ = 'S', U contains the first min(M,N) columns of U
          (the left singular vectors, stored columnwise);
          if JOBZ = 'O' and M >= N, or JOBZ = 'N', U is not referenced.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U.  LDU >= 1; if
          JOBZ = 'S' or 'A' or JOBZ = 'O' and M < N, LDU >= M.
[out]VT
          VT is REAL array, dimension (LDVT,N)
          If JOBZ = 'A' or JOBZ = 'O' and M >= N, VT contains the
          N-by-N orthogonal matrix V**T;
          if JOBZ = 'S', VT contains the first min(M,N) rows of
          V**T (the right singular vectors, stored rowwise);
          if JOBZ = 'O' and M < N, or JOBZ = 'N', VT is not referenced.
[in]LDVT
          LDVT is INTEGER
          The leading dimension of the array VT.  LDVT >= 1; if
          JOBZ = 'A' or JOBZ = 'O' and M >= N, LDVT >= N;
          if JOBZ = 'S', LDVT >= min(M,N).
[out]WORK
          WORK is REAL array, dimension (MAX(1,LWORK))
          On exit, if INFO = 0, WORK(1) returns the optimal LWORK;
[in]LWORK
          LWORK is INTEGER
          The dimension of the array WORK. LWORK >= 1.
          If JOBZ = 'N',
            LWORK >= 3*min(M,N) + max(max(M,N),6*min(M,N)).
          If JOBZ = 'O',
            LWORK >= 3*min(M,N) + 
                     max(max(M,N),5*min(M,N)*min(M,N)+4*min(M,N)).
          If JOBZ = 'S' or 'A'
            LWORK >= min(M,N)*(7+4*min(M,N))
          For good performance, LWORK should generally be larger.
          If LWORK = -1 but other input arguments are legal, WORK(1)
          returns the optimal LWORK.
[out]IWORK
          IWORK is INTEGER array, dimension (8*min(M,N))
[out]INFO
          INFO is INTEGER
          = 0:  successful exit.
          < 0:  if INFO = -i, the i-th argument had an illegal value.
          > 0:  SBDSDC did not converge, updating process failed.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
November 2015
Contributors:
Ming Gu and Huan Ren, Computer Science Division, University of California at Berkeley, USA

Definition at line 218 of file sgesdd.f.

218 *
219 * -- LAPACK driver routine (version 3.6.0) --
220 * -- LAPACK is a software package provided by Univ. of Tennessee, --
221 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
222 * November 2015
223 *
224 * .. Scalar Arguments ..
225  CHARACTER jobz
226  INTEGER info, lda, ldu, ldvt, lwork, m, n
227 * ..
228 * .. Array Arguments ..
229  INTEGER iwork( * )
230  REAL a( lda, * ), s( * ), u( ldu, * ),
231  $ vt( ldvt, * ), work( * )
232 * ..
233 *
234 * =====================================================================
235 *
236 * .. Parameters ..
237  REAL zero, one
238  parameter( zero = 0.0e0, one = 1.0e0 )
239 * ..
240 * .. Local Scalars ..
241  LOGICAL lquery, wntqa, wntqas, wntqn, wntqo, wntqs
242  INTEGER bdspac, blk, chunk, i, ie, ierr, il,
243  $ ir, iscl, itau, itaup, itauq, iu, ivt, ldwkvt,
244  $ ldwrkl, ldwrkr, ldwrku, maxwrk, minmn, minwrk,
245  $ mnthr, nwork, wrkbl
246  REAL anrm, bignum, eps, smlnum
247 * ..
248 * .. Local Arrays ..
249  INTEGER idum( 1 )
250  REAL dum( 1 )
251 * ..
252 * .. External Subroutines ..
253  EXTERNAL sbdsdc, sgebrd, sgelqf, sgemm, sgeqrf, slacpy,
255  $ xerbla
256 * ..
257 * .. External Functions ..
258  LOGICAL lsame
259  INTEGER ilaenv
260  REAL slamch, slange
261  EXTERNAL ilaenv, lsame, slamch, slange
262 * ..
263 * .. Intrinsic Functions ..
264  INTRINSIC int, max, min, sqrt
265 * ..
266 * .. Executable Statements ..
267 *
268 * Test the input arguments
269 *
270  info = 0
271  minmn = min( m, n )
272  wntqa = lsame( jobz, 'A' )
273  wntqs = lsame( jobz, 'S' )
274  wntqas = wntqa .OR. wntqs
275  wntqo = lsame( jobz, 'O' )
276  wntqn = lsame( jobz, 'N' )
277  lquery = ( lwork.EQ.-1 )
278 *
279  IF( .NOT.( wntqa .OR. wntqs .OR. wntqo .OR. wntqn ) ) THEN
280  info = -1
281  ELSE IF( m.LT.0 ) THEN
282  info = -2
283  ELSE IF( n.LT.0 ) THEN
284  info = -3
285  ELSE IF( lda.LT.max( 1, m ) ) THEN
286  info = -5
287  ELSE IF( ldu.LT.1 .OR. ( wntqas .AND. ldu.LT.m ) .OR.
288  $ ( wntqo .AND. m.LT.n .AND. ldu.LT.m ) ) THEN
289  info = -8
290  ELSE IF( ldvt.LT.1 .OR. ( wntqa .AND. ldvt.LT.n ) .OR.
291  $ ( wntqs .AND. ldvt.LT.minmn ) .OR.
292  $ ( wntqo .AND. m.GE.n .AND. ldvt.LT.n ) ) THEN
293  info = -10
294  END IF
295 *
296 * Compute workspace
297 * (Note: Comments in the code beginning "Workspace:" describe the
298 * minimal amount of workspace needed at that point in the code,
299 * as well as the preferred amount for good performance.
300 * NB refers to the optimal block size for the immediately
301 * following subroutine, as returned by ILAENV.)
302 *
303  IF( info.EQ.0 ) THEN
304  minwrk = 1
305  maxwrk = 1
306  IF( m.GE.n .AND. minmn.GT.0 ) THEN
307 *
308 * Compute space needed for SBDSDC
309 *
310  mnthr = int( minmn*11.0e0 / 6.0e0 )
311  IF( wntqn ) THEN
312  bdspac = 7*n
313  ELSE
314  bdspac = 3*n*n + 4*n
315  END IF
316  IF( m.GE.mnthr ) THEN
317  IF( wntqn ) THEN
318 *
319 * Path 1 (M much larger than N, JOBZ='N')
320 *
321  wrkbl = n + n*ilaenv( 1, 'SGEQRF', ' ', m, n, -1,
322  $ -1 )
323  wrkbl = max( wrkbl, 3*n+2*n*
324  $ ilaenv( 1, 'SGEBRD', ' ', n, n, -1, -1 ) )
325  maxwrk = max( wrkbl, bdspac+n )
326  minwrk = bdspac + n
327  ELSE IF( wntqo ) THEN
328 *
329 * Path 2 (M much larger than N, JOBZ='O')
330 *
331  wrkbl = n + n*ilaenv( 1, 'SGEQRF', ' ', m, n, -1, -1 )
332  wrkbl = max( wrkbl, n+n*ilaenv( 1, 'SORGQR', ' ', m,
333  $ n, n, -1 ) )
334  wrkbl = max( wrkbl, 3*n+2*n*
335  $ ilaenv( 1, 'SGEBRD', ' ', n, n, -1, -1 ) )
336  wrkbl = max( wrkbl, 3*n+n*
337  $ ilaenv( 1, 'SORMBR', 'QLN', n, n, n, -1 ) )
338  wrkbl = max( wrkbl, 3*n+n*
339  $ ilaenv( 1, 'SORMBR', 'PRT', n, n, n, -1 ) )
340  wrkbl = max( wrkbl, bdspac+3*n )
341  maxwrk = wrkbl + 2*n*n
342  minwrk = bdspac + 2*n*n + 3*n
343  ELSE IF( wntqs ) THEN
344 *
345 * Path 3 (M much larger than N, JOBZ='S')
346 *
347  wrkbl = n + n*ilaenv( 1, 'SGEQRF', ' ', m, n, -1, -1 )
348  wrkbl = max( wrkbl, n+n*ilaenv( 1, 'SORGQR', ' ', m,
349  $ n, n, -1 ) )
350  wrkbl = max( wrkbl, 3*n+2*n*
351  $ ilaenv( 1, 'SGEBRD', ' ', n, n, -1, -1 ) )
352  wrkbl = max( wrkbl, 3*n+n*
353  $ ilaenv( 1, 'SORMBR', 'QLN', n, n, n, -1 ) )
354  wrkbl = max( wrkbl, 3*n+n*
355  $ ilaenv( 1, 'SORMBR', 'PRT', n, n, n, -1 ) )
356  wrkbl = max( wrkbl, bdspac+3*n )
357  maxwrk = wrkbl + n*n
358  minwrk = bdspac + n*n + 3*n
359  ELSE IF( wntqa ) THEN
360 *
361 * Path 4 (M much larger than N, JOBZ='A')
362 *
363  wrkbl = n + n*ilaenv( 1, 'SGEQRF', ' ', m, n, -1, -1 )
364  wrkbl = max( wrkbl, n+m*ilaenv( 1, 'SORGQR', ' ', m,
365  $ m, n, -1 ) )
366  wrkbl = max( wrkbl, 3*n+2*n*
367  $ ilaenv( 1, 'SGEBRD', ' ', n, n, -1, -1 ) )
368  wrkbl = max( wrkbl, 3*n+n*
369  $ ilaenv( 1, 'SORMBR', 'QLN', n, n, n, -1 ) )
370  wrkbl = max( wrkbl, 3*n+n*
371  $ ilaenv( 1, 'SORMBR', 'PRT', n, n, n, -1 ) )
372  wrkbl = max( wrkbl, bdspac+3*n )
373  maxwrk = wrkbl + n*n
374  minwrk = bdspac + n*n + 2*n + m
375  END IF
376  ELSE
377 *
378 * Path 5 (M at least N, but not much larger)
379 *
380  wrkbl = 3*n + ( m+n )*ilaenv( 1, 'SGEBRD', ' ', m, n, -1,
381  $ -1 )
382  IF( wntqn ) THEN
383  maxwrk = max( wrkbl, bdspac+3*n )
384  minwrk = 3*n + max( m, bdspac )
385  ELSE IF( wntqo ) THEN
386  wrkbl = max( wrkbl, 3*n+n*
387  $ ilaenv( 1, 'SORMBR', 'QLN', m, n, n, -1 ) )
388  wrkbl = max( wrkbl, 3*n+n*
389  $ ilaenv( 1, 'SORMBR', 'PRT', n, n, n, -1 ) )
390  wrkbl = max( wrkbl, bdspac+3*n )
391  maxwrk = wrkbl + m*n
392  minwrk = 3*n + max( m, n*n+bdspac )
393  ELSE IF( wntqs ) THEN
394  wrkbl = max( wrkbl, 3*n+n*
395  $ ilaenv( 1, 'SORMBR', 'QLN', m, n, n, -1 ) )
396  wrkbl = max( wrkbl, 3*n+n*
397  $ ilaenv( 1, 'SORMBR', 'PRT', n, n, n, -1 ) )
398  maxwrk = max( wrkbl, bdspac+3*n )
399  minwrk = 3*n + max( m, bdspac )
400  ELSE IF( wntqa ) THEN
401  wrkbl = max( wrkbl, 3*n+m*
402  $ ilaenv( 1, 'SORMBR', 'QLN', m, m, n, -1 ) )
403  wrkbl = max( wrkbl, 3*n+n*
404  $ ilaenv( 1, 'SORMBR', 'PRT', n, n, n, -1 ) )
405  maxwrk = max( maxwrk, bdspac+3*n )
406  minwrk = 3*n + max( m, bdspac )
407  END IF
408  END IF
409  ELSE IF ( minmn.GT.0 ) THEN
410 *
411 * Compute space needed for SBDSDC
412 *
413  mnthr = int( minmn*11.0e0 / 6.0e0 )
414  IF( wntqn ) THEN
415  bdspac = 7*m
416  ELSE
417  bdspac = 3*m*m + 4*m
418  END IF
419  IF( n.GE.mnthr ) THEN
420  IF( wntqn ) THEN
421 *
422 * Path 1t (N much larger than M, JOBZ='N')
423 *
424  wrkbl = m + m*ilaenv( 1, 'SGELQF', ' ', m, n, -1,
425  $ -1 )
426  wrkbl = max( wrkbl, 3*m+2*m*
427  $ ilaenv( 1, 'SGEBRD', ' ', m, m, -1, -1 ) )
428  maxwrk = max( wrkbl, bdspac+m )
429  minwrk = bdspac + m
430  ELSE IF( wntqo ) THEN
431 *
432 * Path 2t (N much larger than M, JOBZ='O')
433 *
434  wrkbl = m + m*ilaenv( 1, 'SGELQF', ' ', m, n, -1, -1 )
435  wrkbl = max( wrkbl, m+m*ilaenv( 1, 'SORGLQ', ' ', m,
436  $ n, m, -1 ) )
437  wrkbl = max( wrkbl, 3*m+2*m*
438  $ ilaenv( 1, 'SGEBRD', ' ', m, m, -1, -1 ) )
439  wrkbl = max( wrkbl, 3*m+m*
440  $ ilaenv( 1, 'SORMBR', 'QLN', m, m, m, -1 ) )
441  wrkbl = max( wrkbl, 3*m+m*
442  $ ilaenv( 1, 'SORMBR', 'PRT', m, m, m, -1 ) )
443  wrkbl = max( wrkbl, bdspac+3*m )
444  maxwrk = wrkbl + 2*m*m
445  minwrk = bdspac + 2*m*m + 3*m
446  ELSE IF( wntqs ) THEN
447 *
448 * Path 3t (N much larger than M, JOBZ='S')
449 *
450  wrkbl = m + m*ilaenv( 1, 'SGELQF', ' ', m, n, -1, -1 )
451  wrkbl = max( wrkbl, m+m*ilaenv( 1, 'SORGLQ', ' ', m,
452  $ n, m, -1 ) )
453  wrkbl = max( wrkbl, 3*m+2*m*
454  $ ilaenv( 1, 'SGEBRD', ' ', m, m, -1, -1 ) )
455  wrkbl = max( wrkbl, 3*m+m*
456  $ ilaenv( 1, 'SORMBR', 'QLN', m, m, m, -1 ) )
457  wrkbl = max( wrkbl, 3*m+m*
458  $ ilaenv( 1, 'SORMBR', 'PRT', m, m, m, -1 ) )
459  wrkbl = max( wrkbl, bdspac+3*m )
460  maxwrk = wrkbl + m*m
461  minwrk = bdspac + m*m + 3*m
462  ELSE IF( wntqa ) THEN
463 *
464 * Path 4t (N much larger than M, JOBZ='A')
465 *
466  wrkbl = m + m*ilaenv( 1, 'SGELQF', ' ', m, n, -1, -1 )
467  wrkbl = max( wrkbl, m+n*ilaenv( 1, 'SORGLQ', ' ', n,
468  $ n, m, -1 ) )
469  wrkbl = max( wrkbl, 3*m+2*m*
470  $ ilaenv( 1, 'SGEBRD', ' ', m, m, -1, -1 ) )
471  wrkbl = max( wrkbl, 3*m+m*
472  $ ilaenv( 1, 'SORMBR', 'QLN', m, m, m, -1 ) )
473  wrkbl = max( wrkbl, 3*m+m*
474  $ ilaenv( 1, 'SORMBR', 'PRT', m, m, m, -1 ) )
475  wrkbl = max( wrkbl, bdspac+3*m )
476  maxwrk = wrkbl + m*m
477  minwrk = bdspac + m*m + 3*m
478  END IF
479  ELSE
480 *
481 * Path 5t (N greater than M, but not much larger)
482 *
483  wrkbl = 3*m + ( m+n )*ilaenv( 1, 'SGEBRD', ' ', m, n, -1,
484  $ -1 )
485  IF( wntqn ) THEN
486  maxwrk = max( wrkbl, bdspac+3*m )
487  minwrk = 3*m + max( n, bdspac )
488  ELSE IF( wntqo ) THEN
489  wrkbl = max( wrkbl, 3*m+m*
490  $ ilaenv( 1, 'SORMBR', 'QLN', m, m, n, -1 ) )
491  wrkbl = max( wrkbl, 3*m+m*
492  $ ilaenv( 1, 'SORMBR', 'PRT', m, n, m, -1 ) )
493  wrkbl = max( wrkbl, bdspac+3*m )
494  maxwrk = wrkbl + m*n
495  minwrk = 3*m + max( n, m*m+bdspac )
496  ELSE IF( wntqs ) THEN
497  wrkbl = max( wrkbl, 3*m+m*
498  $ ilaenv( 1, 'SORMBR', 'QLN', m, m, n, -1 ) )
499  wrkbl = max( wrkbl, 3*m+m*
500  $ ilaenv( 1, 'SORMBR', 'PRT', m, n, m, -1 ) )
501  maxwrk = max( wrkbl, bdspac+3*m )
502  minwrk = 3*m + max( n, bdspac )
503  ELSE IF( wntqa ) THEN
504  wrkbl = max( wrkbl, 3*m+m*
505  $ ilaenv( 1, 'SORMBR', 'QLN', m, m, n, -1 ) )
506  wrkbl = max( wrkbl, 3*m+m*
507  $ ilaenv( 1, 'SORMBR', 'PRT', n, n, m, -1 ) )
508  maxwrk = max( wrkbl, bdspac+3*m )
509  minwrk = 3*m + max( n, bdspac )
510  END IF
511  END IF
512  END IF
513  maxwrk = max( maxwrk, minwrk )
514  work( 1 ) = maxwrk
515 *
516  IF( lwork.LT.minwrk .AND. .NOT.lquery ) THEN
517  info = -12
518  END IF
519  END IF
520 *
521  IF( info.NE.0 ) THEN
522  CALL xerbla( 'SGESDD', -info )
523  RETURN
524  ELSE IF( lquery ) THEN
525  RETURN
526  END IF
527 *
528 * Quick return if possible
529 *
530  IF( m.EQ.0 .OR. n.EQ.0 ) THEN
531  RETURN
532  END IF
533 *
534 * Get machine constants
535 *
536  eps = slamch( 'P' )
537  smlnum = sqrt( slamch( 'S' ) ) / eps
538  bignum = one / smlnum
539 *
540 * Scale A if max element outside range [SMLNUM,BIGNUM]
541 *
542  anrm = slange( 'M', m, n, a, lda, dum )
543  iscl = 0
544  IF( anrm.GT.zero .AND. anrm.LT.smlnum ) THEN
545  iscl = 1
546  CALL slascl( 'G', 0, 0, anrm, smlnum, m, n, a, lda, ierr )
547  ELSE IF( anrm.GT.bignum ) THEN
548  iscl = 1
549  CALL slascl( 'G', 0, 0, anrm, bignum, m, n, a, lda, ierr )
550  END IF
551 *
552  IF( m.GE.n ) THEN
553 *
554 * A has at least as many rows as columns. If A has sufficiently
555 * more rows than columns, first reduce using the QR
556 * decomposition (if sufficient workspace available)
557 *
558  IF( m.GE.mnthr ) THEN
559 *
560  IF( wntqn ) THEN
561 *
562 * Path 1 (M much larger than N, JOBZ='N')
563 * No singular vectors to be computed
564 *
565  itau = 1
566  nwork = itau + n
567 *
568 * Compute A=Q*R
569 * (Workspace: need 2*N, prefer N+N*NB)
570 *
571  CALL sgeqrf( m, n, a, lda, work( itau ), work( nwork ),
572  $ lwork-nwork+1, ierr )
573 *
574 * Zero out below R
575 *
576  CALL slaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ), lda )
577  ie = 1
578  itauq = ie + n
579  itaup = itauq + n
580  nwork = itaup + n
581 *
582 * Bidiagonalize R in A
583 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
584 *
585  CALL sgebrd( n, n, a, lda, s, work( ie ), work( itauq ),
586  $ work( itaup ), work( nwork ), lwork-nwork+1,
587  $ ierr )
588  nwork = ie + n
589 *
590 * Perform bidiagonal SVD, computing singular values only
591 * (Workspace: need N+BDSPAC)
592 *
593  CALL sbdsdc( 'U', 'N', n, s, work( ie ), dum, 1, dum, 1,
594  $ dum, idum, work( nwork ), iwork, info )
595 *
596  ELSE IF( wntqo ) THEN
597 *
598 * Path 2 (M much larger than N, JOBZ = 'O')
599 * N left singular vectors to be overwritten on A and
600 * N right singular vectors to be computed in VT
601 *
602  ir = 1
603 *
604 * WORK(IR) is LDWRKR by N
605 *
606  IF( lwork.GE.lda*n+n*n+3*n+bdspac ) THEN
607  ldwrkr = lda
608  ELSE
609  ldwrkr = ( lwork-n*n-3*n-bdspac ) / n
610  END IF
611  itau = ir + ldwrkr*n
612  nwork = itau + n
613 *
614 * Compute A=Q*R
615 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
616 *
617  CALL sgeqrf( m, n, a, lda, work( itau ), work( nwork ),
618  $ lwork-nwork+1, ierr )
619 *
620 * Copy R to WORK(IR), zeroing out below it
621 *
622  CALL slacpy( 'U', n, n, a, lda, work( ir ), ldwrkr )
623  CALL slaset( 'L', n-1, n-1, zero, zero, work( ir+1 ),
624  $ ldwrkr )
625 *
626 * Generate Q in A
627 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
628 *
629  CALL sorgqr( m, n, n, a, lda, work( itau ),
630  $ work( nwork ), lwork-nwork+1, ierr )
631  ie = itau
632  itauq = ie + n
633  itaup = itauq + n
634  nwork = itaup + n
635 *
636 * Bidiagonalize R in VT, copying result to WORK(IR)
637 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
638 *
639  CALL sgebrd( n, n, work( ir ), ldwrkr, s, work( ie ),
640  $ work( itauq ), work( itaup ), work( nwork ),
641  $ lwork-nwork+1, ierr )
642 *
643 * WORK(IU) is N by N
644 *
645  iu = nwork
646  nwork = iu + n*n
647 *
648 * Perform bidiagonal SVD, computing left singular vectors
649 * of bidiagonal matrix in WORK(IU) and computing right
650 * singular vectors of bidiagonal matrix in VT
651 * (Workspace: need N+N*N+BDSPAC)
652 *
653  CALL sbdsdc( 'U', 'I', n, s, work( ie ), work( iu ), n,
654  $ vt, ldvt, dum, idum, work( nwork ), iwork,
655  $ info )
656 *
657 * Overwrite WORK(IU) by left singular vectors of R
658 * and VT by right singular vectors of R
659 * (Workspace: need 2*N*N+3*N, prefer 2*N*N+2*N+N*NB)
660 *
661  CALL sormbr( 'Q', 'L', 'N', n, n, n, work( ir ), ldwrkr,
662  $ work( itauq ), work( iu ), n, work( nwork ),
663  $ lwork-nwork+1, ierr )
664  CALL sormbr( 'P', 'R', 'T', n, n, n, work( ir ), ldwrkr,
665  $ work( itaup ), vt, ldvt, work( nwork ),
666  $ lwork-nwork+1, ierr )
667 *
668 * Multiply Q in A by left singular vectors of R in
669 * WORK(IU), storing result in WORK(IR) and copying to A
670 * (Workspace: need 2*N*N, prefer N*N+M*N)
671 *
672  DO 10 i = 1, m, ldwrkr
673  chunk = min( m-i+1, ldwrkr )
674  CALL sgemm( 'N', 'N', chunk, n, n, one, a( i, 1 ),
675  $ lda, work( iu ), n, zero, work( ir ),
676  $ ldwrkr )
677  CALL slacpy( 'F', chunk, n, work( ir ), ldwrkr,
678  $ a( i, 1 ), lda )
679  10 CONTINUE
680 *
681  ELSE IF( wntqs ) THEN
682 *
683 * Path 3 (M much larger than N, JOBZ='S')
684 * N left singular vectors to be computed in U and
685 * N right singular vectors to be computed in VT
686 *
687  ir = 1
688 *
689 * WORK(IR) is N by N
690 *
691  ldwrkr = n
692  itau = ir + ldwrkr*n
693  nwork = itau + n
694 *
695 * Compute A=Q*R
696 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
697 *
698  CALL sgeqrf( m, n, a, lda, work( itau ), work( nwork ),
699  $ lwork-nwork+1, ierr )
700 *
701 * Copy R to WORK(IR), zeroing out below it
702 *
703  CALL slacpy( 'U', n, n, a, lda, work( ir ), ldwrkr )
704  CALL slaset( 'L', n-1, n-1, zero, zero, work( ir+1 ),
705  $ ldwrkr )
706 *
707 * Generate Q in A
708 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
709 *
710  CALL sorgqr( m, n, n, a, lda, work( itau ),
711  $ work( nwork ), lwork-nwork+1, ierr )
712  ie = itau
713  itauq = ie + n
714  itaup = itauq + n
715  nwork = itaup + n
716 *
717 * Bidiagonalize R in WORK(IR)
718 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
719 *
720  CALL sgebrd( n, n, work( ir ), ldwrkr, s, work( ie ),
721  $ work( itauq ), work( itaup ), work( nwork ),
722  $ lwork-nwork+1, ierr )
723 *
724 * Perform bidiagonal SVD, computing left singular vectors
725 * of bidiagoal matrix in U and computing right singular
726 * vectors of bidiagonal matrix in VT
727 * (Workspace: need N+BDSPAC)
728 *
729  CALL sbdsdc( 'U', 'I', n, s, work( ie ), u, ldu, vt,
730  $ ldvt, dum, idum, work( nwork ), iwork,
731  $ info )
732 *
733 * Overwrite U by left singular vectors of R and VT
734 * by right singular vectors of R
735 * (Workspace: need N*N+3*N, prefer N*N+2*N+N*NB)
736 *
737  CALL sormbr( 'Q', 'L', 'N', n, n, n, work( ir ), ldwrkr,
738  $ work( itauq ), u, ldu, work( nwork ),
739  $ lwork-nwork+1, ierr )
740 *
741  CALL sormbr( 'P', 'R', 'T', n, n, n, work( ir ), ldwrkr,
742  $ work( itaup ), vt, ldvt, work( nwork ),
743  $ lwork-nwork+1, ierr )
744 *
745 * Multiply Q in A by left singular vectors of R in
746 * WORK(IR), storing result in U
747 * (Workspace: need N*N)
748 *
749  CALL slacpy( 'F', n, n, u, ldu, work( ir ), ldwrkr )
750  CALL sgemm( 'N', 'N', m, n, n, one, a, lda, work( ir ),
751  $ ldwrkr, zero, u, ldu )
752 *
753  ELSE IF( wntqa ) THEN
754 *
755 * Path 4 (M much larger than N, JOBZ='A')
756 * M left singular vectors to be computed in U and
757 * N right singular vectors to be computed in VT
758 *
759  iu = 1
760 *
761 * WORK(IU) is N by N
762 *
763  ldwrku = n
764  itau = iu + ldwrku*n
765  nwork = itau + n
766 *
767 * Compute A=Q*R, copying result to U
768 * (Workspace: need N*N+N+M, prefer N*N+N+M*NB)
769 *
770  CALL sgeqrf( m, n, a, lda, work( itau ), work( nwork ),
771  $ lwork-nwork+1, ierr )
772  CALL slacpy( 'L', m, n, a, lda, u, ldu )
773 *
774 * Generate Q in U
775 * (Workspace: need N*N+N+M, prefer N*N+N+M*NB)
776  CALL sorgqr( m, m, n, u, ldu, work( itau ),
777  $ work( nwork ), lwork-nwork+1, ierr )
778 *
779 * Produce R in A, zeroing out other entries
780 *
781  CALL slaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ), lda )
782  ie = itau
783  itauq = ie + n
784  itaup = itauq + n
785  nwork = itaup + n
786 *
787 * Bidiagonalize R in A
788 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
789 *
790  CALL sgebrd( n, n, a, lda, s, work( ie ), work( itauq ),
791  $ work( itaup ), work( nwork ), lwork-nwork+1,
792  $ ierr )
793 *
794 * Perform bidiagonal SVD, computing left singular vectors
795 * of bidiagonal matrix in WORK(IU) and computing right
796 * singular vectors of bidiagonal matrix in VT
797 * (Workspace: need N+N*N+BDSPAC)
798 *
799  CALL sbdsdc( 'U', 'I', n, s, work( ie ), work( iu ), n,
800  $ vt, ldvt, dum, idum, work( nwork ), iwork,
801  $ info )
802 *
803 * Overwrite WORK(IU) by left singular vectors of R and VT
804 * by right singular vectors of R
805 * (Workspace: need N*N+3*N, prefer N*N+2*N+N*NB)
806 *
807  CALL sormbr( 'Q', 'L', 'N', n, n, n, a, lda,
808  $ work( itauq ), work( iu ), ldwrku,
809  $ work( nwork ), lwork-nwork+1, ierr )
810  CALL sormbr( 'P', 'R', 'T', n, n, n, a, lda,
811  $ work( itaup ), vt, ldvt, work( nwork ),
812  $ lwork-nwork+1, ierr )
813 *
814 * Multiply Q in U by left singular vectors of R in
815 * WORK(IU), storing result in A
816 * (Workspace: need N*N)
817 *
818  CALL sgemm( 'N', 'N', m, n, n, one, u, ldu, work( iu ),
819  $ ldwrku, zero, a, lda )
820 *
821 * Copy left singular vectors of A from A to U
822 *
823  CALL slacpy( 'F', m, n, a, lda, u, ldu )
824 *
825  END IF
826 *
827  ELSE
828 *
829 * M .LT. MNTHR
830 *
831 * Path 5 (M at least N, but not much larger)
832 * Reduce to bidiagonal form without QR decomposition
833 *
834  ie = 1
835  itauq = ie + n
836  itaup = itauq + n
837  nwork = itaup + n
838 *
839 * Bidiagonalize A
840 * (Workspace: need 3*N+M, prefer 3*N+(M+N)*NB)
841 *
842  CALL sgebrd( m, n, a, lda, s, work( ie ), work( itauq ),
843  $ work( itaup ), work( nwork ), lwork-nwork+1,
844  $ ierr )
845  IF( wntqn ) THEN
846 *
847 * Perform bidiagonal SVD, only computing singular values
848 * (Workspace: need N+BDSPAC)
849 *
850  CALL sbdsdc( 'U', 'N', n, s, work( ie ), dum, 1, dum, 1,
851  $ dum, idum, work( nwork ), iwork, info )
852  ELSE IF( wntqo ) THEN
853  iu = nwork
854  IF( lwork.GE.m*n+3*n+bdspac ) THEN
855 *
856 * WORK( IU ) is M by N
857 *
858  ldwrku = m
859  nwork = iu + ldwrku*n
860  CALL slaset( 'F', m, n, zero, zero, work( iu ),
861  $ ldwrku )
862  ELSE
863 *
864 * WORK( IU ) is N by N
865 *
866  ldwrku = n
867  nwork = iu + ldwrku*n
868 *
869 * WORK(IR) is LDWRKR by N
870 *
871  ir = nwork
872  ldwrkr = ( lwork-n*n-3*n ) / n
873  END IF
874  nwork = iu + ldwrku*n
875 *
876 * Perform bidiagonal SVD, computing left singular vectors
877 * of bidiagonal matrix in WORK(IU) and computing right
878 * singular vectors of bidiagonal matrix in VT
879 * (Workspace: need N+N*N+BDSPAC)
880 *
881  CALL sbdsdc( 'U', 'I', n, s, work( ie ), work( iu ),
882  $ ldwrku, vt, ldvt, dum, idum, work( nwork ),
883  $ iwork, info )
884 *
885 * Overwrite VT by right singular vectors of A
886 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
887 *
888  CALL sormbr( 'P', 'R', 'T', n, n, n, a, lda,
889  $ work( itaup ), vt, ldvt, work( nwork ),
890  $ lwork-nwork+1, ierr )
891 *
892  IF( lwork.GE.m*n+3*n+bdspac ) THEN
893 *
894 * Overwrite WORK(IU) by left singular vectors of A
895 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
896 *
897  CALL sormbr( 'Q', 'L', 'N', m, n, n, a, lda,
898  $ work( itauq ), work( iu ), ldwrku,
899  $ work( nwork ), lwork-nwork+1, ierr )
900 *
901 * Copy left singular vectors of A from WORK(IU) to A
902 *
903  CALL slacpy( 'F', m, n, work( iu ), ldwrku, a, lda )
904  ELSE
905 *
906 * Generate Q in A
907 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
908 *
909  CALL sorgbr( 'Q', m, n, n, a, lda, work( itauq ),
910  $ work( nwork ), lwork-nwork+1, ierr )
911 *
912 * Multiply Q in A by left singular vectors of
913 * bidiagonal matrix in WORK(IU), storing result in
914 * WORK(IR) and copying to A
915 * (Workspace: need 2*N*N, prefer N*N+M*N)
916 *
917  DO 20 i = 1, m, ldwrkr
918  chunk = min( m-i+1, ldwrkr )
919  CALL sgemm( 'N', 'N', chunk, n, n, one, a( i, 1 ),
920  $ lda, work( iu ), ldwrku, zero,
921  $ work( ir ), ldwrkr )
922  CALL slacpy( 'F', chunk, n, work( ir ), ldwrkr,
923  $ a( i, 1 ), lda )
924  20 CONTINUE
925  END IF
926 *
927  ELSE IF( wntqs ) THEN
928 *
929 * Perform bidiagonal SVD, computing left singular vectors
930 * of bidiagonal matrix in U and computing right singular
931 * vectors of bidiagonal matrix in VT
932 * (Workspace: need N+BDSPAC)
933 *
934  CALL slaset( 'F', m, n, zero, zero, u, ldu )
935  CALL sbdsdc( 'U', 'I', n, s, work( ie ), u, ldu, vt,
936  $ ldvt, dum, idum, work( nwork ), iwork,
937  $ info )
938 *
939 * Overwrite U by left singular vectors of A and VT
940 * by right singular vectors of A
941 * (Workspace: need 3*N, prefer 2*N+N*NB)
942 *
943  CALL sormbr( 'Q', 'L', 'N', m, n, n, a, lda,
944  $ work( itauq ), u, ldu, work( nwork ),
945  $ lwork-nwork+1, ierr )
946  CALL sormbr( 'P', 'R', 'T', n, n, n, a, lda,
947  $ work( itaup ), vt, ldvt, work( nwork ),
948  $ lwork-nwork+1, ierr )
949  ELSE IF( wntqa ) THEN
950 *
951 * Perform bidiagonal SVD, computing left singular vectors
952 * of bidiagonal matrix in U and computing right singular
953 * vectors of bidiagonal matrix in VT
954 * (Workspace: need N+BDSPAC)
955 *
956  CALL slaset( 'F', m, m, zero, zero, u, ldu )
957  CALL sbdsdc( 'U', 'I', n, s, work( ie ), u, ldu, vt,
958  $ ldvt, dum, idum, work( nwork ), iwork,
959  $ info )
960 *
961 * Set the right corner of U to identity matrix
962 *
963  IF( m.GT.n ) THEN
964  CALL slaset( 'F', m-n, m-n, zero, one, u( n+1, n+1 ),
965  $ ldu )
966  END IF
967 *
968 * Overwrite U by left singular vectors of A and VT
969 * by right singular vectors of A
970 * (Workspace: need N*N+2*N+M, prefer N*N+2*N+M*NB)
971 *
972  CALL sormbr( 'Q', 'L', 'N', m, m, n, a, lda,
973  $ work( itauq ), u, ldu, work( nwork ),
974  $ lwork-nwork+1, ierr )
975  CALL sormbr( 'P', 'R', 'T', n, n, m, a, lda,
976  $ work( itaup ), vt, ldvt, work( nwork ),
977  $ lwork-nwork+1, ierr )
978  END IF
979 *
980  END IF
981 *
982  ELSE
983 *
984 * A has more columns than rows. If A has sufficiently more
985 * columns than rows, first reduce using the LQ decomposition (if
986 * sufficient workspace available)
987 *
988  IF( n.GE.mnthr ) THEN
989 *
990  IF( wntqn ) THEN
991 *
992 * Path 1t (N much larger than M, JOBZ='N')
993 * No singular vectors to be computed
994 *
995  itau = 1
996  nwork = itau + m
997 *
998 * Compute A=L*Q
999 * (Workspace: need 2*M, prefer M+M*NB)
1000 *
1001  CALL sgelqf( m, n, a, lda, work( itau ), work( nwork ),
1002  $ lwork-nwork+1, ierr )
1003 *
1004 * Zero out above L
1005 *
1006  CALL slaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ), lda )
1007  ie = 1
1008  itauq = ie + m
1009  itaup = itauq + m
1010  nwork = itaup + m
1011 *
1012 * Bidiagonalize L in A
1013 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
1014 *
1015  CALL sgebrd( m, m, a, lda, s, work( ie ), work( itauq ),
1016  $ work( itaup ), work( nwork ), lwork-nwork+1,
1017  $ ierr )
1018  nwork = ie + m
1019 *
1020 * Perform bidiagonal SVD, computing singular values only
1021 * (Workspace: need M+BDSPAC)
1022 *
1023  CALL sbdsdc( 'U', 'N', m, s, work( ie ), dum, 1, dum, 1,
1024  $ dum, idum, work( nwork ), iwork, info )
1025 *
1026  ELSE IF( wntqo ) THEN
1027 *
1028 * Path 2t (N much larger than M, JOBZ='O')
1029 * M right singular vectors to be overwritten on A and
1030 * M left singular vectors to be computed in U
1031 *
1032  ivt = 1
1033 *
1034 * IVT is M by M
1035 *
1036  il = ivt + m*m
1037  IF( lwork.GE.m*n+m*m+3*m+bdspac ) THEN
1038 *
1039 * WORK(IL) is M by N
1040 *
1041  ldwrkl = m
1042  chunk = n
1043  ELSE
1044  ldwrkl = m
1045  chunk = ( lwork-m*m ) / m
1046  END IF
1047  itau = il + ldwrkl*m
1048  nwork = itau + m
1049 *
1050 * Compute A=L*Q
1051 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1052 *
1053  CALL sgelqf( m, n, a, lda, work( itau ), work( nwork ),
1054  $ lwork-nwork+1, ierr )
1055 *
1056 * Copy L to WORK(IL), zeroing about above it
1057 *
1058  CALL slacpy( 'L', m, m, a, lda, work( il ), ldwrkl )
1059  CALL slaset( 'U', m-1, m-1, zero, zero,
1060  $ work( il+ldwrkl ), ldwrkl )
1061 *
1062 * Generate Q in A
1063 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1064 *
1065  CALL sorglq( m, n, m, a, lda, work( itau ),
1066  $ work( nwork ), lwork-nwork+1, ierr )
1067  ie = itau
1068  itauq = ie + m
1069  itaup = itauq + m
1070  nwork = itaup + m
1071 *
1072 * Bidiagonalize L in WORK(IL)
1073 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
1074 *
1075  CALL sgebrd( m, m, work( il ), ldwrkl, s, work( ie ),
1076  $ work( itauq ), work( itaup ), work( nwork ),
1077  $ lwork-nwork+1, ierr )
1078 *
1079 * Perform bidiagonal SVD, computing left singular vectors
1080 * of bidiagonal matrix in U, and computing right singular
1081 * vectors of bidiagonal matrix in WORK(IVT)
1082 * (Workspace: need M+M*M+BDSPAC)
1083 *
1084  CALL sbdsdc( 'U', 'I', m, s, work( ie ), u, ldu,
1085  $ work( ivt ), m, dum, idum, work( nwork ),
1086  $ iwork, info )
1087 *
1088 * Overwrite U by left singular vectors of L and WORK(IVT)
1089 * by right singular vectors of L
1090 * (Workspace: need 2*M*M+3*M, prefer 2*M*M+2*M+M*NB)
1091 *
1092  CALL sormbr( 'Q', 'L', 'N', m, m, m, work( il ), ldwrkl,
1093  $ work( itauq ), u, ldu, work( nwork ),
1094  $ lwork-nwork+1, ierr )
1095  CALL sormbr( 'P', 'R', 'T', m, m, m, work( il ), ldwrkl,
1096  $ work( itaup ), work( ivt ), m,
1097  $ work( nwork ), lwork-nwork+1, ierr )
1098 *
1099 * Multiply right singular vectors of L in WORK(IVT) by Q
1100 * in A, storing result in WORK(IL) and copying to A
1101 * (Workspace: need 2*M*M, prefer M*M+M*N)
1102 *
1103  DO 30 i = 1, n, chunk
1104  blk = min( n-i+1, chunk )
1105  CALL sgemm( 'N', 'N', m, blk, m, one, work( ivt ), m,
1106  $ a( 1, i ), lda, zero, work( il ), ldwrkl )
1107  CALL slacpy( 'F', m, blk, work( il ), ldwrkl,
1108  $ a( 1, i ), lda )
1109  30 CONTINUE
1110 *
1111  ELSE IF( wntqs ) THEN
1112 *
1113 * Path 3t (N much larger than M, JOBZ='S')
1114 * M right singular vectors to be computed in VT and
1115 * M left singular vectors to be computed in U
1116 *
1117  il = 1
1118 *
1119 * WORK(IL) is M by M
1120 *
1121  ldwrkl = m
1122  itau = il + ldwrkl*m
1123  nwork = itau + m
1124 *
1125 * Compute A=L*Q
1126 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1127 *
1128  CALL sgelqf( m, n, a, lda, work( itau ), work( nwork ),
1129  $ lwork-nwork+1, ierr )
1130 *
1131 * Copy L to WORK(IL), zeroing out above it
1132 *
1133  CALL slacpy( 'L', m, m, a, lda, work( il ), ldwrkl )
1134  CALL slaset( 'U', m-1, m-1, zero, zero,
1135  $ work( il+ldwrkl ), ldwrkl )
1136 *
1137 * Generate Q in A
1138 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1139 *
1140  CALL sorglq( m, n, m, a, lda, work( itau ),
1141  $ work( nwork ), lwork-nwork+1, ierr )
1142  ie = itau
1143  itauq = ie + m
1144  itaup = itauq + m
1145  nwork = itaup + m
1146 *
1147 * Bidiagonalize L in WORK(IU), copying result to U
1148 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
1149 *
1150  CALL sgebrd( m, m, work( il ), ldwrkl, s, work( ie ),
1151  $ work( itauq ), work( itaup ), work( nwork ),
1152  $ lwork-nwork+1, ierr )
1153 *
1154 * Perform bidiagonal SVD, computing left singular vectors
1155 * of bidiagonal matrix in U and computing right singular
1156 * vectors of bidiagonal matrix in VT
1157 * (Workspace: need M+BDSPAC)
1158 *
1159  CALL sbdsdc( 'U', 'I', m, s, work( ie ), u, ldu, vt,
1160  $ ldvt, dum, idum, work( nwork ), iwork,
1161  $ info )
1162 *
1163 * Overwrite U by left singular vectors of L and VT
1164 * by right singular vectors of L
1165 * (Workspace: need M*M+3*M, prefer M*M+2*M+M*NB)
1166 *
1167  CALL sormbr( 'Q', 'L', 'N', m, m, m, work( il ), ldwrkl,
1168  $ work( itauq ), u, ldu, work( nwork ),
1169  $ lwork-nwork+1, ierr )
1170  CALL sormbr( 'P', 'R', 'T', m, m, m, work( il ), ldwrkl,
1171  $ work( itaup ), vt, ldvt, work( nwork ),
1172  $ lwork-nwork+1, ierr )
1173 *
1174 * Multiply right singular vectors of L in WORK(IL) by
1175 * Q in A, storing result in VT
1176 * (Workspace: need M*M)
1177 *
1178  CALL slacpy( 'F', m, m, vt, ldvt, work( il ), ldwrkl )
1179  CALL sgemm( 'N', 'N', m, n, m, one, work( il ), ldwrkl,
1180  $ a, lda, zero, vt, ldvt )
1181 *
1182  ELSE IF( wntqa ) THEN
1183 *
1184 * Path 4t (N much larger than M, JOBZ='A')
1185 * N right singular vectors to be computed in VT and
1186 * M left singular vectors to be computed in U
1187 *
1188  ivt = 1
1189 *
1190 * WORK(IVT) is M by M
1191 *
1192  ldwkvt = m
1193  itau = ivt + ldwkvt*m
1194  nwork = itau + m
1195 *
1196 * Compute A=L*Q, copying result to VT
1197 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1198 *
1199  CALL sgelqf( m, n, a, lda, work( itau ), work( nwork ),
1200  $ lwork-nwork+1, ierr )
1201  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
1202 *
1203 * Generate Q in VT
1204 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1205 *
1206  CALL sorglq( n, n, m, vt, ldvt, work( itau ),
1207  $ work( nwork ), lwork-nwork+1, ierr )
1208 *
1209 * Produce L in A, zeroing out other entries
1210 *
1211  CALL slaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ), lda )
1212  ie = itau
1213  itauq = ie + m
1214  itaup = itauq + m
1215  nwork = itaup + m
1216 *
1217 * Bidiagonalize L in A
1218 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
1219 *
1220  CALL sgebrd( m, m, a, lda, s, work( ie ), work( itauq ),
1221  $ work( itaup ), work( nwork ), lwork-nwork+1,
1222  $ ierr )
1223 *
1224 * Perform bidiagonal SVD, computing left singular vectors
1225 * of bidiagonal matrix in U and computing right singular
1226 * vectors of bidiagonal matrix in WORK(IVT)
1227 * (Workspace: need M+M*M+BDSPAC)
1228 *
1229  CALL sbdsdc( 'U', 'I', m, s, work( ie ), u, ldu,
1230  $ work( ivt ), ldwkvt, dum, idum,
1231  $ work( nwork ), iwork, info )
1232 *
1233 * Overwrite U by left singular vectors of L and WORK(IVT)
1234 * by right singular vectors of L
1235 * (Workspace: need M*M+3*M, prefer M*M+2*M+M*NB)
1236 *
1237  CALL sormbr( 'Q', 'L', 'N', m, m, m, a, lda,
1238  $ work( itauq ), u, ldu, work( nwork ),
1239  $ lwork-nwork+1, ierr )
1240  CALL sormbr( 'P', 'R', 'T', m, m, m, a, lda,
1241  $ work( itaup ), work( ivt ), ldwkvt,
1242  $ work( nwork ), lwork-nwork+1, ierr )
1243 *
1244 * Multiply right singular vectors of L in WORK(IVT) by
1245 * Q in VT, storing result in A
1246 * (Workspace: need M*M)
1247 *
1248  CALL sgemm( 'N', 'N', m, n, m, one, work( ivt ), ldwkvt,
1249  $ vt, ldvt, zero, a, lda )
1250 *
1251 * Copy right singular vectors of A from A to VT
1252 *
1253  CALL slacpy( 'F', m, n, a, lda, vt, ldvt )
1254 *
1255  END IF
1256 *
1257  ELSE
1258 *
1259 * N .LT. MNTHR
1260 *
1261 * Path 5t (N greater than M, but not much larger)
1262 * Reduce to bidiagonal form without LQ decomposition
1263 *
1264  ie = 1
1265  itauq = ie + m
1266  itaup = itauq + m
1267  nwork = itaup + m
1268 *
1269 * Bidiagonalize A
1270 * (Workspace: need 3*M+N, prefer 3*M+(M+N)*NB)
1271 *
1272  CALL sgebrd( m, n, a, lda, s, work( ie ), work( itauq ),
1273  $ work( itaup ), work( nwork ), lwork-nwork+1,
1274  $ ierr )
1275  IF( wntqn ) THEN
1276 *
1277 * Perform bidiagonal SVD, only computing singular values
1278 * (Workspace: need M+BDSPAC)
1279 *
1280  CALL sbdsdc( 'L', 'N', m, s, work( ie ), dum, 1, dum, 1,
1281  $ dum, idum, work( nwork ), iwork, info )
1282  ELSE IF( wntqo ) THEN
1283  ldwkvt = m
1284  ivt = nwork
1285  IF( lwork.GE.m*n+3*m+bdspac ) THEN
1286 *
1287 * WORK( IVT ) is M by N
1288 *
1289  CALL slaset( 'F', m, n, zero, zero, work( ivt ),
1290  $ ldwkvt )
1291  nwork = ivt + ldwkvt*n
1292  ELSE
1293 *
1294 * WORK( IVT ) is M by M
1295 *
1296  nwork = ivt + ldwkvt*m
1297  il = nwork
1298 *
1299 * WORK(IL) is M by CHUNK
1300 *
1301  chunk = ( lwork-m*m-3*m ) / m
1302  END IF
1303 *
1304 * Perform bidiagonal SVD, computing left singular vectors
1305 * of bidiagonal matrix in U and computing right singular
1306 * vectors of bidiagonal matrix in WORK(IVT)
1307 * (Workspace: need M*M+BDSPAC)
1308 *
1309  CALL sbdsdc( 'L', 'I', m, s, work( ie ), u, ldu,
1310  $ work( ivt ), ldwkvt, dum, idum,
1311  $ work( nwork ), iwork, info )
1312 *
1313 * Overwrite U by left singular vectors of A
1314 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1315 *
1316  CALL sormbr( 'Q', 'L', 'N', m, m, n, a, lda,
1317  $ work( itauq ), u, ldu, work( nwork ),
1318  $ lwork-nwork+1, ierr )
1319 *
1320  IF( lwork.GE.m*n+3*m+bdspac ) THEN
1321 *
1322 * Overwrite WORK(IVT) by left singular vectors of A
1323 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1324 *
1325  CALL sormbr( 'P', 'R', 'T', m, n, m, a, lda,
1326  $ work( itaup ), work( ivt ), ldwkvt,
1327  $ work( nwork ), lwork-nwork+1, ierr )
1328 *
1329 * Copy right singular vectors of A from WORK(IVT) to A
1330 *
1331  CALL slacpy( 'F', m, n, work( ivt ), ldwkvt, a, lda )
1332  ELSE
1333 *
1334 * Generate P**T in A
1335 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
1336 *
1337  CALL sorgbr( 'P', m, n, m, a, lda, work( itaup ),
1338  $ work( nwork ), lwork-nwork+1, ierr )
1339 *
1340 * Multiply Q in A by right singular vectors of
1341 * bidiagonal matrix in WORK(IVT), storing result in
1342 * WORK(IL) and copying to A
1343 * (Workspace: need 2*M*M, prefer M*M+M*N)
1344 *
1345  DO 40 i = 1, n, chunk
1346  blk = min( n-i+1, chunk )
1347  CALL sgemm( 'N', 'N', m, blk, m, one, work( ivt ),
1348  $ ldwkvt, a( 1, i ), lda, zero,
1349  $ work( il ), m )
1350  CALL slacpy( 'F', m, blk, work( il ), m, a( 1, i ),
1351  $ lda )
1352  40 CONTINUE
1353  END IF
1354  ELSE IF( wntqs ) THEN
1355 *
1356 * Perform bidiagonal SVD, computing left singular vectors
1357 * of bidiagonal matrix in U and computing right singular
1358 * vectors of bidiagonal matrix in VT
1359 * (Workspace: need M+BDSPAC)
1360 *
1361  CALL slaset( 'F', m, n, zero, zero, vt, ldvt )
1362  CALL sbdsdc( 'L', 'I', m, s, work( ie ), u, ldu, vt,
1363  $ ldvt, dum, idum, work( nwork ), iwork,
1364  $ info )
1365 *
1366 * Overwrite U by left singular vectors of A and VT
1367 * by right singular vectors of A
1368 * (Workspace: need 3*M, prefer 2*M+M*NB)
1369 *
1370  CALL sormbr( 'Q', 'L', 'N', m, m, n, a, lda,
1371  $ work( itauq ), u, ldu, work( nwork ),
1372  $ lwork-nwork+1, ierr )
1373  CALL sormbr( 'P', 'R', 'T', m, n, m, a, lda,
1374  $ work( itaup ), vt, ldvt, work( nwork ),
1375  $ lwork-nwork+1, ierr )
1376  ELSE IF( wntqa ) THEN
1377 *
1378 * Perform bidiagonal SVD, computing left singular vectors
1379 * of bidiagonal matrix in U and computing right singular
1380 * vectors of bidiagonal matrix in VT
1381 * (Workspace: need M+BDSPAC)
1382 *
1383  CALL slaset( 'F', n, n, zero, zero, vt, ldvt )
1384  CALL sbdsdc( 'L', 'I', m, s, work( ie ), u, ldu, vt,
1385  $ ldvt, dum, idum, work( nwork ), iwork,
1386  $ info )
1387 *
1388 * Set the right corner of VT to identity matrix
1389 *
1390  IF( n.GT.m ) THEN
1391  CALL slaset( 'F', n-m, n-m, zero, one, vt( m+1, m+1 ),
1392  $ ldvt )
1393  END IF
1394 *
1395 * Overwrite U by left singular vectors of A and VT
1396 * by right singular vectors of A
1397 * (Workspace: need 2*M+N, prefer 2*M+N*NB)
1398 *
1399  CALL sormbr( 'Q', 'L', 'N', m, m, n, a, lda,
1400  $ work( itauq ), u, ldu, work( nwork ),
1401  $ lwork-nwork+1, ierr )
1402  CALL sormbr( 'P', 'R', 'T', n, n, m, a, lda,
1403  $ work( itaup ), vt, ldvt, work( nwork ),
1404  $ lwork-nwork+1, ierr )
1405  END IF
1406 *
1407  END IF
1408 *
1409  END IF
1410 *
1411 * Undo scaling if necessary
1412 *
1413  IF( iscl.EQ.1 ) THEN
1414  IF( anrm.GT.bignum )
1415  $ CALL slascl( 'G', 0, 0, bignum, anrm, minmn, 1, s, minmn,
1416  $ ierr )
1417  IF( anrm.LT.smlnum )
1418  $ CALL slascl( 'G', 0, 0, smlnum, anrm, minmn, 1, s, minmn,
1419  $ ierr )
1420  END IF
1421 *
1422 * Return optimal workspace in WORK(1)
1423 *
1424  work( 1 ) = maxwrk
1425 *
1426  RETURN
1427 *
1428 * End of SGESDD
1429 *
real function slamch(CMACH)
SLAMCH
Definition: slamch.f:69
subroutine sgebrd(M, N, A, LDA, D, E, TAUQ, TAUP, WORK, LWORK, INFO)
SGEBRD
Definition: sgebrd.f:207
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine sorgbr(VECT, M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
SORGBR
Definition: sorgbr.f:159
subroutine sbdsdc(UPLO, COMPQ, N, D, E, U, LDU, VT, LDVT, Q, IQ, WORK, IWORK, INFO)
SBDSDC
Definition: sbdsdc.f:207
subroutine sgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
SGELQF
Definition: sgelqf.f:137
subroutine sgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
SGEQRF
Definition: sgeqrf.f:138
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine sgemm(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
SGEMM
Definition: sgemm.f:189
subroutine slacpy(UPLO, M, N, A, LDA, B, LDB)
SLACPY copies all or part of one two-dimensional array to another.
Definition: slacpy.f:105
integer function ilaenv(ISPEC, NAME, OPTS, N1, N2, N3, N4)
Definition: tstiee.f:83
subroutine slascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: slascl.f:141
subroutine sorgqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
SORGQR
Definition: sorgqr.f:130
subroutine sormbr(VECT, SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
SORMBR
Definition: sormbr.f:198
real function slange(NORM, M, N, A, LDA, WORK)
SLANGE returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value ...
Definition: slange.f:116
subroutine sorglq(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
SORGLQ
Definition: sorglq.f:129
subroutine slaset(UPLO, M, N, ALPHA, BETA, A, LDA)
SLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: slaset.f:112

Here is the call graph for this function:

Here is the caller graph for this function:

subroutine sgesvd ( character  JOBU,
character  JOBVT,
integer  M,
integer  N,
real, dimension( lda, * )  A,
integer  LDA,
real, dimension( * )  S,
real, dimension( ldu, * )  U,
integer  LDU,
real, dimension( ldvt, * )  VT,
integer  LDVT,
real, dimension( * )  WORK,
integer  LWORK,
integer  INFO 
)

SGESVD computes the singular value decomposition (SVD) for GE matrices

Download SGESVD + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 SGESVD computes the singular value decomposition (SVD) of a real
 M-by-N matrix A, optionally computing the left and/or right singular
 vectors. The SVD is written

      A = U * SIGMA * transpose(V)

 where SIGMA is an M-by-N matrix which is zero except for its
 min(m,n) diagonal elements, U is an M-by-M orthogonal matrix, and
 V is an N-by-N orthogonal matrix.  The diagonal elements of SIGMA
 are the singular values of A; they are real and non-negative, and
 are returned in descending order.  The first min(m,n) columns of
 U and V are the left and right singular vectors of A.

 Note that the routine returns V**T, not V.
Parameters
[in]JOBU
          JOBU is CHARACTER*1
          Specifies options for computing all or part of the matrix U:
          = 'A':  all M columns of U are returned in array U:
          = 'S':  the first min(m,n) columns of U (the left singular
                  vectors) are returned in the array U;
          = 'O':  the first min(m,n) columns of U (the left singular
                  vectors) are overwritten on the array A;
          = 'N':  no columns of U (no left singular vectors) are
                  computed.
[in]JOBVT
          JOBVT is CHARACTER*1
          Specifies options for computing all or part of the matrix
          V**T:
          = 'A':  all N rows of V**T are returned in the array VT;
          = 'S':  the first min(m,n) rows of V**T (the right singular
                  vectors) are returned in the array VT;
          = 'O':  the first min(m,n) rows of V**T (the right singular
                  vectors) are overwritten on the array A;
          = 'N':  no rows of V**T (no right singular vectors) are
                  computed.

          JOBVT and JOBU cannot both be 'O'.
[in]M
          M is INTEGER
          The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.  N >= 0.
[in,out]A
          A is REAL array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit,
          if JOBU = 'O',  A is overwritten with the first min(m,n)
                          columns of U (the left singular vectors,
                          stored columnwise);
          if JOBVT = 'O', A is overwritten with the first min(m,n)
                          rows of V**T (the right singular vectors,
                          stored rowwise);
          if JOBU .ne. 'O' and JOBVT .ne. 'O', the contents of A
                          are destroyed.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]S
          S is REAL array, dimension (min(M,N))
          The singular values of A, sorted so that S(i) >= S(i+1).
[out]U
          U is REAL array, dimension (LDU,UCOL)
          (LDU,M) if JOBU = 'A' or (LDU,min(M,N)) if JOBU = 'S'.
          If JOBU = 'A', U contains the M-by-M orthogonal matrix U;
          if JOBU = 'S', U contains the first min(m,n) columns of U
          (the left singular vectors, stored columnwise);
          if JOBU = 'N' or 'O', U is not referenced.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U.  LDU >= 1; if
          JOBU = 'S' or 'A', LDU >= M.
[out]VT
          VT is REAL array, dimension (LDVT,N)
          If JOBVT = 'A', VT contains the N-by-N orthogonal matrix
          V**T;
          if JOBVT = 'S', VT contains the first min(m,n) rows of
          V**T (the right singular vectors, stored rowwise);
          if JOBVT = 'N' or 'O', VT is not referenced.
[in]LDVT
          LDVT is INTEGER
          The leading dimension of the array VT.  LDVT >= 1; if
          JOBVT = 'A', LDVT >= N; if JOBVT = 'S', LDVT >= min(M,N).
[out]WORK
          WORK is REAL array, dimension (MAX(1,LWORK))
          On exit, if INFO = 0, WORK(1) returns the optimal LWORK;
          if INFO > 0, WORK(2:MIN(M,N)) contains the unconverged
          superdiagonal elements of an upper bidiagonal matrix B
          whose diagonal is in S (not necessarily sorted). B
          satisfies A = U * B * VT, so it has the same singular values
          as A, and singular vectors related by U and VT.
[in]LWORK
          LWORK is INTEGER
          The dimension of the array WORK.
          LWORK >= MAX(1,5*MIN(M,N)) for the paths (see comments inside code):
             - PATH 1  (M much larger than N, JOBU='N') 
             - PATH 1t (N much larger than M, JOBVT='N')
          LWORK >= MAX(1,3*MIN(M,N)+MAX(M,N),5*MIN(M,N)) for the other paths
          For good performance, LWORK should generally be larger.

          If LWORK = -1, then a workspace query is assumed; the routine
          only calculates the optimal size of the WORK array, returns
          this value as the first entry of the WORK array, and no error
          message related to LWORK is issued by XERBLA.
[out]INFO
          INFO is INTEGER
          = 0:  successful exit.
          < 0:  if INFO = -i, the i-th argument had an illegal value.
          > 0:  if SBDSQR did not converge, INFO specifies how many
                superdiagonals of an intermediate bidiagonal form B
                did not converge to zero. See the description of WORK
                above for details.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
April 2012

Definition at line 213 of file sgesvd.f.

213 *
214 * -- LAPACK driver routine (version 3.4.1) --
215 * -- LAPACK is a software package provided by Univ. of Tennessee, --
216 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
217 * April 2012
218 *
219 * .. Scalar Arguments ..
220  CHARACTER jobu, jobvt
221  INTEGER info, lda, ldu, ldvt, lwork, m, n
222 * ..
223 * .. Array Arguments ..
224  REAL a( lda, * ), s( * ), u( ldu, * ),
225  $ vt( ldvt, * ), work( * )
226 * ..
227 *
228 * =====================================================================
229 *
230 * .. Parameters ..
231  REAL zero, one
232  parameter( zero = 0.0e0, one = 1.0e0 )
233 * ..
234 * .. Local Scalars ..
235  LOGICAL lquery, wntua, wntuas, wntun, wntuo, wntus,
236  $ wntva, wntvas, wntvn, wntvo, wntvs
237  INTEGER bdspac, blk, chunk, i, ie, ierr, ir, iscl,
238  $ itau, itaup, itauq, iu, iwork, ldwrkr, ldwrku,
239  $ maxwrk, minmn, minwrk, mnthr, ncu, ncvt, nru,
240  $ nrvt, wrkbl
241  INTEGER lwork_sgeqrf, lwork_sorgqr_n, lwork_sorgqr_m,
242  $ lwork_sgebrd, lwork_sorgbr_p, lwork_sorgbr_q,
243  $ lwork_sgelqf, lwork_sorglq_n, lwork_sorglq_m
244  REAL anrm, bignum, eps, smlnum
245 * ..
246 * .. Local Arrays ..
247  REAL dum( 1 )
248 * ..
249 * .. External Subroutines ..
250  EXTERNAL sbdsqr, sgebrd, sgelqf, sgemm, sgeqrf, slacpy,
252  $ xerbla
253 * ..
254 * .. External Functions ..
255  LOGICAL lsame
256  INTEGER ilaenv
257  REAL slamch, slange
258  EXTERNAL lsame, ilaenv, slamch, slange
259 * ..
260 * .. Intrinsic Functions ..
261  INTRINSIC max, min, sqrt
262 * ..
263 * .. Executable Statements ..
264 *
265 * Test the input arguments
266 *
267  info = 0
268  minmn = min( m, n )
269  wntua = lsame( jobu, 'A' )
270  wntus = lsame( jobu, 'S' )
271  wntuas = wntua .OR. wntus
272  wntuo = lsame( jobu, 'O' )
273  wntun = lsame( jobu, 'N' )
274  wntva = lsame( jobvt, 'A' )
275  wntvs = lsame( jobvt, 'S' )
276  wntvas = wntva .OR. wntvs
277  wntvo = lsame( jobvt, 'O' )
278  wntvn = lsame( jobvt, 'N' )
279  lquery = ( lwork.EQ.-1 )
280 *
281  IF( .NOT.( wntua .OR. wntus .OR. wntuo .OR. wntun ) ) THEN
282  info = -1
283  ELSE IF( .NOT.( wntva .OR. wntvs .OR. wntvo .OR. wntvn ) .OR.
284  $ ( wntvo .AND. wntuo ) ) THEN
285  info = -2
286  ELSE IF( m.LT.0 ) THEN
287  info = -3
288  ELSE IF( n.LT.0 ) THEN
289  info = -4
290  ELSE IF( lda.LT.max( 1, m ) ) THEN
291  info = -6
292  ELSE IF( ldu.LT.1 .OR. ( wntuas .AND. ldu.LT.m ) ) THEN
293  info = -9
294  ELSE IF( ldvt.LT.1 .OR. ( wntva .AND. ldvt.LT.n ) .OR.
295  $ ( wntvs .AND. ldvt.LT.minmn ) ) THEN
296  info = -11
297  END IF
298 *
299 * Compute workspace
300 * (Note: Comments in the code beginning "Workspace:" describe the
301 * minimal amount of workspace needed at that point in the code,
302 * as well as the preferred amount for good performance.
303 * NB refers to the optimal block size for the immediately
304 * following subroutine, as returned by ILAENV.)
305 *
306  IF( info.EQ.0 ) THEN
307  minwrk = 1
308  maxwrk = 1
309  IF( m.GE.n .AND. minmn.GT.0 ) THEN
310 *
311 * Compute space needed for SBDSQR
312 *
313  mnthr = ilaenv( 6, 'SGESVD', jobu // jobvt, m, n, 0, 0 )
314  bdspac = 5*n
315 * Compute space needed for SGEQRF
316  CALL sgeqrf( m, n, a, lda, dum(1), dum(1), -1, ierr )
317  lwork_sgeqrf=dum(1)
318 * Compute space needed for SORGQR
319  CALL sorgqr( m, n, n, a, lda, dum(1), dum(1), -1, ierr )
320  lwork_sorgqr_n=dum(1)
321  CALL sorgqr( m, m, n, a, lda, dum(1), dum(1), -1, ierr )
322  lwork_sorgqr_m=dum(1)
323 * Compute space needed for SGEBRD
324  CALL sgebrd( n, n, a, lda, s, dum(1), dum(1),
325  $ dum(1), dum(1), -1, ierr )
326  lwork_sgebrd=dum(1)
327 * Compute space needed for SORGBR P
328  CALL sorgbr( 'P', n, n, n, a, lda, dum(1),
329  $ dum(1), -1, ierr )
330  lwork_sorgbr_p=dum(1)
331 * Compute space needed for SORGBR Q
332  CALL sorgbr( 'Q', n, n, n, a, lda, dum(1),
333  $ dum(1), -1, ierr )
334  lwork_sorgbr_q=dum(1)
335 *
336  IF( m.GE.mnthr ) THEN
337  IF( wntun ) THEN
338 *
339 * Path 1 (M much larger than N, JOBU='N')
340 *
341  maxwrk = n + lwork_sgeqrf
342  maxwrk = max( maxwrk, 3*n+lwork_sgebrd )
343  IF( wntvo .OR. wntvas )
344  $ maxwrk = max( maxwrk, 3*n+lwork_sorgbr_p )
345  maxwrk = max( maxwrk, bdspac )
346  minwrk = max( 4*n, bdspac )
347  ELSE IF( wntuo .AND. wntvn ) THEN
348 *
349 * Path 2 (M much larger than N, JOBU='O', JOBVT='N')
350 *
351  wrkbl = n + lwork_sgeqrf
352  wrkbl = max( wrkbl, n+lwork_sorgqr_n )
353  wrkbl = max( wrkbl, 3*n+lwork_sgebrd )
354  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_q )
355  wrkbl = max( wrkbl, bdspac )
356  maxwrk = max( n*n+wrkbl, n*n+m*n+n )
357  minwrk = max( 3*n+m, bdspac )
358  ELSE IF( wntuo .AND. wntvas ) THEN
359 *
360 * Path 3 (M much larger than N, JOBU='O', JOBVT='S' or
361 * 'A')
362 *
363  wrkbl = n + lwork_sgeqrf
364  wrkbl = max( wrkbl, n+lwork_sorgqr_n )
365  wrkbl = max( wrkbl, 3*n+lwork_sgebrd )
366  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_q )
367  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_p )
368  wrkbl = max( wrkbl, bdspac )
369  maxwrk = max( n*n+wrkbl, n*n+m*n+n )
370  minwrk = max( 3*n+m, bdspac )
371  ELSE IF( wntus .AND. wntvn ) THEN
372 *
373 * Path 4 (M much larger than N, JOBU='S', JOBVT='N')
374 *
375  wrkbl = n + lwork_sgeqrf
376  wrkbl = max( wrkbl, n+lwork_sorgqr_n )
377  wrkbl = max( wrkbl, 3*n+lwork_sgebrd )
378  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_q )
379  wrkbl = max( wrkbl, bdspac )
380  maxwrk = n*n + wrkbl
381  minwrk = max( 3*n+m, bdspac )
382  ELSE IF( wntus .AND. wntvo ) THEN
383 *
384 * Path 5 (M much larger than N, JOBU='S', JOBVT='O')
385 *
386  wrkbl = n + lwork_sgeqrf
387  wrkbl = max( wrkbl, n+lwork_sorgqr_n )
388  wrkbl = max( wrkbl, 3*n+lwork_sgebrd )
389  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_q )
390  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_p )
391  wrkbl = max( wrkbl, bdspac )
392  maxwrk = 2*n*n + wrkbl
393  minwrk = max( 3*n+m, bdspac )
394  ELSE IF( wntus .AND. wntvas ) THEN
395 *
396 * Path 6 (M much larger than N, JOBU='S', JOBVT='S' or
397 * 'A')
398 *
399  wrkbl = n + lwork_sgeqrf
400  wrkbl = max( wrkbl, n+lwork_sorgqr_n )
401  wrkbl = max( wrkbl, 3*n+lwork_sgebrd )
402  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_q )
403  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_p )
404  wrkbl = max( wrkbl, bdspac )
405  maxwrk = n*n + wrkbl
406  minwrk = max( 3*n+m, bdspac )
407  ELSE IF( wntua .AND. wntvn ) THEN
408 *
409 * Path 7 (M much larger than N, JOBU='A', JOBVT='N')
410 *
411  wrkbl = n + lwork_sgeqrf
412  wrkbl = max( wrkbl, n+lwork_sorgqr_m )
413  wrkbl = max( wrkbl, 3*n+lwork_sgebrd )
414  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_q )
415  wrkbl = max( wrkbl, bdspac )
416  maxwrk = n*n + wrkbl
417  minwrk = max( 3*n+m, bdspac )
418  ELSE IF( wntua .AND. wntvo ) THEN
419 *
420 * Path 8 (M much larger than N, JOBU='A', JOBVT='O')
421 *
422  wrkbl = n + lwork_sgeqrf
423  wrkbl = max( wrkbl, n+lwork_sorgqr_m )
424  wrkbl = max( wrkbl, 3*n+lwork_sgebrd )
425  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_q )
426  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_p )
427  wrkbl = max( wrkbl, bdspac )
428  maxwrk = 2*n*n + wrkbl
429  minwrk = max( 3*n+m, bdspac )
430  ELSE IF( wntua .AND. wntvas ) THEN
431 *
432 * Path 9 (M much larger than N, JOBU='A', JOBVT='S' or
433 * 'A')
434 *
435  wrkbl = n + lwork_sgeqrf
436  wrkbl = max( wrkbl, n+lwork_sorgqr_m )
437  wrkbl = max( wrkbl, 3*n+lwork_sgebrd )
438  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_q )
439  wrkbl = max( wrkbl, 3*n+lwork_sorgbr_p )
440  wrkbl = max( wrkbl, bdspac )
441  maxwrk = n*n + wrkbl
442  minwrk = max( 3*n+m, bdspac )
443  END IF
444  ELSE
445 *
446 * Path 10 (M at least N, but not much larger)
447 *
448  CALL sgebrd( m, n, a, lda, s, dum(1), dum(1),
449  $ dum(1), dum(1), -1, ierr )
450  lwork_sgebrd=dum(1)
451  maxwrk = 3*n + lwork_sgebrd
452  IF( wntus .OR. wntuo ) THEN
453  CALL sorgbr( 'Q', m, n, n, a, lda, dum(1),
454  $ dum(1), -1, ierr )
455  lwork_sorgbr_q=dum(1)
456  maxwrk = max( maxwrk, 3*n+lwork_sorgbr_q )
457  END IF
458  IF( wntua ) THEN
459  CALL sorgbr( 'Q', m, m, n, a, lda, dum(1),
460  $ dum(1), -1, ierr )
461  lwork_sorgbr_q=dum(1)
462  maxwrk = max( maxwrk, 3*n+lwork_sorgbr_q )
463  END IF
464  IF( .NOT.wntvn ) THEN
465  maxwrk = max( maxwrk, 3*n+lwork_sorgbr_p )
466  END IF
467  maxwrk = max( maxwrk, bdspac )
468  minwrk = max( 3*n+m, bdspac )
469  END IF
470  ELSE IF( minmn.GT.0 ) THEN
471 *
472 * Compute space needed for SBDSQR
473 *
474  mnthr = ilaenv( 6, 'SGESVD', jobu // jobvt, m, n, 0, 0 )
475  bdspac = 5*m
476 * Compute space needed for SGELQF
477  CALL sgelqf( m, n, a, lda, dum(1), dum(1), -1, ierr )
478  lwork_sgelqf=dum(1)
479 * Compute space needed for SORGLQ
480  CALL sorglq( n, n, m, dum(1), n, dum(1), dum(1), -1, ierr )
481  lwork_sorglq_n=dum(1)
482  CALL sorglq( m, n, m, a, lda, dum(1), dum(1), -1, ierr )
483  lwork_sorglq_m=dum(1)
484 * Compute space needed for SGEBRD
485  CALL sgebrd( m, m, a, lda, s, dum(1), dum(1),
486  $ dum(1), dum(1), -1, ierr )
487  lwork_sgebrd=dum(1)
488 * Compute space needed for SORGBR P
489  CALL sorgbr( 'P', m, m, m, a, n, dum(1),
490  $ dum(1), -1, ierr )
491  lwork_sorgbr_p=dum(1)
492 * Compute space needed for SORGBR Q
493  CALL sorgbr( 'Q', m, m, m, a, n, dum(1),
494  $ dum(1), -1, ierr )
495  lwork_sorgbr_q=dum(1)
496  IF( n.GE.mnthr ) THEN
497  IF( wntvn ) THEN
498 *
499 * Path 1t(N much larger than M, JOBVT='N')
500 *
501  maxwrk = m + lwork_sgelqf
502  maxwrk = max( maxwrk, 3*m+lwork_sgebrd )
503  IF( wntuo .OR. wntuas )
504  $ maxwrk = max( maxwrk, 3*m+lwork_sorgbr_q )
505  maxwrk = max( maxwrk, bdspac )
506  minwrk = max( 4*m, bdspac )
507  ELSE IF( wntvo .AND. wntun ) THEN
508 *
509 * Path 2t(N much larger than M, JOBU='N', JOBVT='O')
510 *
511  wrkbl = m + lwork_sgelqf
512  wrkbl = max( wrkbl, m+lwork_sorglq_m )
513  wrkbl = max( wrkbl, 3*m+lwork_sgebrd )
514  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_p )
515  wrkbl = max( wrkbl, bdspac )
516  maxwrk = max( m*m+wrkbl, m*m+m*n+m )
517  minwrk = max( 3*m+n, bdspac )
518  ELSE IF( wntvo .AND. wntuas ) THEN
519 *
520 * Path 3t(N much larger than M, JOBU='S' or 'A',
521 * JOBVT='O')
522 *
523  wrkbl = m + lwork_sgelqf
524  wrkbl = max( wrkbl, m+lwork_sorglq_m )
525  wrkbl = max( wrkbl, 3*m+lwork_sgebrd )
526  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_p )
527  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_q )
528  wrkbl = max( wrkbl, bdspac )
529  maxwrk = max( m*m+wrkbl, m*m+m*n+m )
530  minwrk = max( 3*m+n, bdspac )
531  ELSE IF( wntvs .AND. wntun ) THEN
532 *
533 * Path 4t(N much larger than M, JOBU='N', JOBVT='S')
534 *
535  wrkbl = m + lwork_sgelqf
536  wrkbl = max( wrkbl, m+lwork_sorglq_m )
537  wrkbl = max( wrkbl, 3*m+lwork_sgebrd )
538  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_p )
539  wrkbl = max( wrkbl, bdspac )
540  maxwrk = m*m + wrkbl
541  minwrk = max( 3*m+n, bdspac )
542  ELSE IF( wntvs .AND. wntuo ) THEN
543 *
544 * Path 5t(N much larger than M, JOBU='O', JOBVT='S')
545 *
546  wrkbl = m + lwork_sgelqf
547  wrkbl = max( wrkbl, m+lwork_sorglq_m )
548  wrkbl = max( wrkbl, 3*m+lwork_sgebrd )
549  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_p )
550  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_q )
551  wrkbl = max( wrkbl, bdspac )
552  maxwrk = 2*m*m + wrkbl
553  minwrk = max( 3*m+n, bdspac )
554  maxwrk = max( maxwrk, minwrk )
555  ELSE IF( wntvs .AND. wntuas ) THEN
556 *
557 * Path 6t(N much larger than M, JOBU='S' or 'A',
558 * JOBVT='S')
559 *
560  wrkbl = m + lwork_sgelqf
561  wrkbl = max( wrkbl, m+lwork_sorglq_m )
562  wrkbl = max( wrkbl, 3*m+lwork_sgebrd )
563  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_p )
564  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_q )
565  wrkbl = max( wrkbl, bdspac )
566  maxwrk = m*m + wrkbl
567  minwrk = max( 3*m+n, bdspac )
568  ELSE IF( wntva .AND. wntun ) THEN
569 *
570 * Path 7t(N much larger than M, JOBU='N', JOBVT='A')
571 *
572  wrkbl = m + lwork_sgelqf
573  wrkbl = max( wrkbl, m+lwork_sorglq_n )
574  wrkbl = max( wrkbl, 3*m+lwork_sgebrd )
575  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_p )
576  wrkbl = max( wrkbl, bdspac )
577  maxwrk = m*m + wrkbl
578  minwrk = max( 3*m+n, bdspac )
579  ELSE IF( wntva .AND. wntuo ) THEN
580 *
581 * Path 8t(N much larger than M, JOBU='O', JOBVT='A')
582 *
583  wrkbl = m + lwork_sgelqf
584  wrkbl = max( wrkbl, m+lwork_sorglq_n )
585  wrkbl = max( wrkbl, 3*m+lwork_sgebrd )
586  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_p )
587  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_q )
588  wrkbl = max( wrkbl, bdspac )
589  maxwrk = 2*m*m + wrkbl
590  minwrk = max( 3*m+n, bdspac )
591  ELSE IF( wntva .AND. wntuas ) THEN
592 *
593 * Path 9t(N much larger than M, JOBU='S' or 'A',
594 * JOBVT='A')
595 *
596  wrkbl = m + lwork_sgelqf
597  wrkbl = max( wrkbl, m+lwork_sorglq_n )
598  wrkbl = max( wrkbl, 3*m+lwork_sgebrd )
599  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_p )
600  wrkbl = max( wrkbl, 3*m+lwork_sorgbr_q )
601  wrkbl = max( wrkbl, bdspac )
602  maxwrk = m*m + wrkbl
603  minwrk = max( 3*m+n, bdspac )
604  END IF
605  ELSE
606 *
607 * Path 10t(N greater than M, but not much larger)
608 *
609  CALL sgebrd( m, n, a, lda, s, dum(1), dum(1),
610  $ dum(1), dum(1), -1, ierr )
611  lwork_sgebrd=dum(1)
612  maxwrk = 3*m + lwork_sgebrd
613  IF( wntvs .OR. wntvo ) THEN
614 * Compute space needed for SORGBR P
615  CALL sorgbr( 'P', m, n, m, a, n, dum(1),
616  $ dum(1), -1, ierr )
617  lwork_sorgbr_p=dum(1)
618  maxwrk = max( maxwrk, 3*m+lwork_sorgbr_p )
619  END IF
620  IF( wntva ) THEN
621  CALL sorgbr( 'P', n, n, m, a, n, dum(1),
622  $ dum(1), -1, ierr )
623  lwork_sorgbr_p=dum(1)
624  maxwrk = max( maxwrk, 3*m+lwork_sorgbr_p )
625  END IF
626  IF( .NOT.wntun ) THEN
627  maxwrk = max( maxwrk, 3*m+lwork_sorgbr_q )
628  END IF
629  maxwrk = max( maxwrk, bdspac )
630  minwrk = max( 3*m+n, bdspac )
631  END IF
632  END IF
633  maxwrk = max( maxwrk, minwrk )
634  work( 1 ) = maxwrk
635 *
636  IF( lwork.LT.minwrk .AND. .NOT.lquery ) THEN
637  info = -13
638  END IF
639  END IF
640 *
641  IF( info.NE.0 ) THEN
642  CALL xerbla( 'SGESVD', -info )
643  RETURN
644  ELSE IF( lquery ) THEN
645  RETURN
646  END IF
647 *
648 * Quick return if possible
649 *
650  IF( m.EQ.0 .OR. n.EQ.0 ) THEN
651  RETURN
652  END IF
653 *
654 * Get machine constants
655 *
656  eps = slamch( 'P' )
657  smlnum = sqrt( slamch( 'S' ) ) / eps
658  bignum = one / smlnum
659 *
660 * Scale A if max element outside range [SMLNUM,BIGNUM]
661 *
662  anrm = slange( 'M', m, n, a, lda, dum )
663  iscl = 0
664  IF( anrm.GT.zero .AND. anrm.LT.smlnum ) THEN
665  iscl = 1
666  CALL slascl( 'G', 0, 0, anrm, smlnum, m, n, a, lda, ierr )
667  ELSE IF( anrm.GT.bignum ) THEN
668  iscl = 1
669  CALL slascl( 'G', 0, 0, anrm, bignum, m, n, a, lda, ierr )
670  END IF
671 *
672  IF( m.GE.n ) THEN
673 *
674 * A has at least as many rows as columns. If A has sufficiently
675 * more rows than columns, first reduce using the QR
676 * decomposition (if sufficient workspace available)
677 *
678  IF( m.GE.mnthr ) THEN
679 *
680  IF( wntun ) THEN
681 *
682 * Path 1 (M much larger than N, JOBU='N')
683 * No left singular vectors to be computed
684 *
685  itau = 1
686  iwork = itau + n
687 *
688 * Compute A=Q*R
689 * (Workspace: need 2*N, prefer N+N*NB)
690 *
691  CALL sgeqrf( m, n, a, lda, work( itau ), work( iwork ),
692  $ lwork-iwork+1, ierr )
693 *
694 * Zero out below R
695 *
696  CALL slaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ), lda )
697  ie = 1
698  itauq = ie + n
699  itaup = itauq + n
700  iwork = itaup + n
701 *
702 * Bidiagonalize R in A
703 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
704 *
705  CALL sgebrd( n, n, a, lda, s, work( ie ), work( itauq ),
706  $ work( itaup ), work( iwork ), lwork-iwork+1,
707  $ ierr )
708  ncvt = 0
709  IF( wntvo .OR. wntvas ) THEN
710 *
711 * If right singular vectors desired, generate P'.
712 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
713 *
714  CALL sorgbr( 'P', n, n, n, a, lda, work( itaup ),
715  $ work( iwork ), lwork-iwork+1, ierr )
716  ncvt = n
717  END IF
718  iwork = ie + n
719 *
720 * Perform bidiagonal QR iteration, computing right
721 * singular vectors of A in A if desired
722 * (Workspace: need BDSPAC)
723 *
724  CALL sbdsqr( 'U', n, ncvt, 0, 0, s, work( ie ), a, lda,
725  $ dum, 1, dum, 1, work( iwork ), info )
726 *
727 * If right singular vectors desired in VT, copy them there
728 *
729  IF( wntvas )
730  $ CALL slacpy( 'F', n, n, a, lda, vt, ldvt )
731 *
732  ELSE IF( wntuo .AND. wntvn ) THEN
733 *
734 * Path 2 (M much larger than N, JOBU='O', JOBVT='N')
735 * N left singular vectors to be overwritten on A and
736 * no right singular vectors to be computed
737 *
738  IF( lwork.GE.n*n+max( 4*n, bdspac ) ) THEN
739 *
740 * Sufficient workspace for a fast algorithm
741 *
742  ir = 1
743  IF( lwork.GE.max( wrkbl, lda*n+n )+lda*n ) THEN
744 *
745 * WORK(IU) is LDA by N, WORK(IR) is LDA by N
746 *
747  ldwrku = lda
748  ldwrkr = lda
749  ELSE IF( lwork.GE.max( wrkbl, lda*n+n )+n*n ) THEN
750 *
751 * WORK(IU) is LDA by N, WORK(IR) is N by N
752 *
753  ldwrku = lda
754  ldwrkr = n
755  ELSE
756 *
757 * WORK(IU) is LDWRKU by N, WORK(IR) is N by N
758 *
759  ldwrku = ( lwork-n*n-n ) / n
760  ldwrkr = n
761  END IF
762  itau = ir + ldwrkr*n
763  iwork = itau + n
764 *
765 * Compute A=Q*R
766 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
767 *
768  CALL sgeqrf( m, n, a, lda, work( itau ),
769  $ work( iwork ), lwork-iwork+1, ierr )
770 *
771 * Copy R to WORK(IR) and zero out below it
772 *
773  CALL slacpy( 'U', n, n, a, lda, work( ir ), ldwrkr )
774  CALL slaset( 'L', n-1, n-1, zero, zero, work( ir+1 ),
775  $ ldwrkr )
776 *
777 * Generate Q in A
778 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
779 *
780  CALL sorgqr( m, n, n, a, lda, work( itau ),
781  $ work( iwork ), lwork-iwork+1, ierr )
782  ie = itau
783  itauq = ie + n
784  itaup = itauq + n
785  iwork = itaup + n
786 *
787 * Bidiagonalize R in WORK(IR)
788 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
789 *
790  CALL sgebrd( n, n, work( ir ), ldwrkr, s, work( ie ),
791  $ work( itauq ), work( itaup ),
792  $ work( iwork ), lwork-iwork+1, ierr )
793 *
794 * Generate left vectors bidiagonalizing R
795 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
796 *
797  CALL sorgbr( 'Q', n, n, n, work( ir ), ldwrkr,
798  $ work( itauq ), work( iwork ),
799  $ lwork-iwork+1, ierr )
800  iwork = ie + n
801 *
802 * Perform bidiagonal QR iteration, computing left
803 * singular vectors of R in WORK(IR)
804 * (Workspace: need N*N+BDSPAC)
805 *
806  CALL sbdsqr( 'U', n, 0, n, 0, s, work( ie ), dum, 1,
807  $ work( ir ), ldwrkr, dum, 1,
808  $ work( iwork ), info )
809  iu = ie + n
810 *
811 * Multiply Q in A by left singular vectors of R in
812 * WORK(IR), storing result in WORK(IU) and copying to A
813 * (Workspace: need N*N+2*N, prefer N*N+M*N+N)
814 *
815  DO 10 i = 1, m, ldwrku
816  chunk = min( m-i+1, ldwrku )
817  CALL sgemm( 'N', 'N', chunk, n, n, one, a( i, 1 ),
818  $ lda, work( ir ), ldwrkr, zero,
819  $ work( iu ), ldwrku )
820  CALL slacpy( 'F', chunk, n, work( iu ), ldwrku,
821  $ a( i, 1 ), lda )
822  10 CONTINUE
823 *
824  ELSE
825 *
826 * Insufficient workspace for a fast algorithm
827 *
828  ie = 1
829  itauq = ie + n
830  itaup = itauq + n
831  iwork = itaup + n
832 *
833 * Bidiagonalize A
834 * (Workspace: need 3*N+M, prefer 3*N+(M+N)*NB)
835 *
836  CALL sgebrd( m, n, a, lda, s, work( ie ),
837  $ work( itauq ), work( itaup ),
838  $ work( iwork ), lwork-iwork+1, ierr )
839 *
840 * Generate left vectors bidiagonalizing A
841 * (Workspace: need 4*N, prefer 3*N+N*NB)
842 *
843  CALL sorgbr( 'Q', m, n, n, a, lda, work( itauq ),
844  $ work( iwork ), lwork-iwork+1, ierr )
845  iwork = ie + n
846 *
847 * Perform bidiagonal QR iteration, computing left
848 * singular vectors of A in A
849 * (Workspace: need BDSPAC)
850 *
851  CALL sbdsqr( 'U', n, 0, m, 0, s, work( ie ), dum, 1,
852  $ a, lda, dum, 1, work( iwork ), info )
853 *
854  END IF
855 *
856  ELSE IF( wntuo .AND. wntvas ) THEN
857 *
858 * Path 3 (M much larger than N, JOBU='O', JOBVT='S' or 'A')
859 * N left singular vectors to be overwritten on A and
860 * N right singular vectors to be computed in VT
861 *
862  IF( lwork.GE.n*n+max( 4*n, bdspac ) ) THEN
863 *
864 * Sufficient workspace for a fast algorithm
865 *
866  ir = 1
867  IF( lwork.GE.max( wrkbl, lda*n+n )+lda*n ) THEN
868 *
869 * WORK(IU) is LDA by N and WORK(IR) is LDA by N
870 *
871  ldwrku = lda
872  ldwrkr = lda
873  ELSE IF( lwork.GE.max( wrkbl, lda*n+n )+n*n ) THEN
874 *
875 * WORK(IU) is LDA by N and WORK(IR) is N by N
876 *
877  ldwrku = lda
878  ldwrkr = n
879  ELSE
880 *
881 * WORK(IU) is LDWRKU by N and WORK(IR) is N by N
882 *
883  ldwrku = ( lwork-n*n-n ) / n
884  ldwrkr = n
885  END IF
886  itau = ir + ldwrkr*n
887  iwork = itau + n
888 *
889 * Compute A=Q*R
890 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
891 *
892  CALL sgeqrf( m, n, a, lda, work( itau ),
893  $ work( iwork ), lwork-iwork+1, ierr )
894 *
895 * Copy R to VT, zeroing out below it
896 *
897  CALL slacpy( 'U', n, n, a, lda, vt, ldvt )
898  IF( n.GT.1 )
899  $ CALL slaset( 'L', n-1, n-1, zero, zero,
900  $ vt( 2, 1 ), ldvt )
901 *
902 * Generate Q in A
903 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
904 *
905  CALL sorgqr( m, n, n, a, lda, work( itau ),
906  $ work( iwork ), lwork-iwork+1, ierr )
907  ie = itau
908  itauq = ie + n
909  itaup = itauq + n
910  iwork = itaup + n
911 *
912 * Bidiagonalize R in VT, copying result to WORK(IR)
913 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
914 *
915  CALL sgebrd( n, n, vt, ldvt, s, work( ie ),
916  $ work( itauq ), work( itaup ),
917  $ work( iwork ), lwork-iwork+1, ierr )
918  CALL slacpy( 'L', n, n, vt, ldvt, work( ir ), ldwrkr )
919 *
920 * Generate left vectors bidiagonalizing R in WORK(IR)
921 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
922 *
923  CALL sorgbr( 'Q', n, n, n, work( ir ), ldwrkr,
924  $ work( itauq ), work( iwork ),
925  $ lwork-iwork+1, ierr )
926 *
927 * Generate right vectors bidiagonalizing R in VT
928 * (Workspace: need N*N+4*N-1, prefer N*N+3*N+(N-1)*NB)
929 *
930  CALL sorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
931  $ work( iwork ), lwork-iwork+1, ierr )
932  iwork = ie + n
933 *
934 * Perform bidiagonal QR iteration, computing left
935 * singular vectors of R in WORK(IR) and computing right
936 * singular vectors of R in VT
937 * (Workspace: need N*N+BDSPAC)
938 *
939  CALL sbdsqr( 'U', n, n, n, 0, s, work( ie ), vt, ldvt,
940  $ work( ir ), ldwrkr, dum, 1,
941  $ work( iwork ), info )
942  iu = ie + n
943 *
944 * Multiply Q in A by left singular vectors of R in
945 * WORK(IR), storing result in WORK(IU) and copying to A
946 * (Workspace: need N*N+2*N, prefer N*N+M*N+N)
947 *
948  DO 20 i = 1, m, ldwrku
949  chunk = min( m-i+1, ldwrku )
950  CALL sgemm( 'N', 'N', chunk, n, n, one, a( i, 1 ),
951  $ lda, work( ir ), ldwrkr, zero,
952  $ work( iu ), ldwrku )
953  CALL slacpy( 'F', chunk, n, work( iu ), ldwrku,
954  $ a( i, 1 ), lda )
955  20 CONTINUE
956 *
957  ELSE
958 *
959 * Insufficient workspace for a fast algorithm
960 *
961  itau = 1
962  iwork = itau + n
963 *
964 * Compute A=Q*R
965 * (Workspace: need 2*N, prefer N+N*NB)
966 *
967  CALL sgeqrf( m, n, a, lda, work( itau ),
968  $ work( iwork ), lwork-iwork+1, ierr )
969 *
970 * Copy R to VT, zeroing out below it
971 *
972  CALL slacpy( 'U', n, n, a, lda, vt, ldvt )
973  IF( n.GT.1 )
974  $ CALL slaset( 'L', n-1, n-1, zero, zero,
975  $ vt( 2, 1 ), ldvt )
976 *
977 * Generate Q in A
978 * (Workspace: need 2*N, prefer N+N*NB)
979 *
980  CALL sorgqr( m, n, n, a, lda, work( itau ),
981  $ work( iwork ), lwork-iwork+1, ierr )
982  ie = itau
983  itauq = ie + n
984  itaup = itauq + n
985  iwork = itaup + n
986 *
987 * Bidiagonalize R in VT
988 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
989 *
990  CALL sgebrd( n, n, vt, ldvt, s, work( ie ),
991  $ work( itauq ), work( itaup ),
992  $ work( iwork ), lwork-iwork+1, ierr )
993 *
994 * Multiply Q in A by left vectors bidiagonalizing R
995 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
996 *
997  CALL sormbr( 'Q', 'R', 'N', m, n, n, vt, ldvt,
998  $ work( itauq ), a, lda, work( iwork ),
999  $ lwork-iwork+1, ierr )
1000 *
1001 * Generate right vectors bidiagonalizing R in VT
1002 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1003 *
1004  CALL sorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1005  $ work( iwork ), lwork-iwork+1, ierr )
1006  iwork = ie + n
1007 *
1008 * Perform bidiagonal QR iteration, computing left
1009 * singular vectors of A in A and computing right
1010 * singular vectors of A in VT
1011 * (Workspace: need BDSPAC)
1012 *
1013  CALL sbdsqr( 'U', n, n, m, 0, s, work( ie ), vt, ldvt,
1014  $ a, lda, dum, 1, work( iwork ), info )
1015 *
1016  END IF
1017 *
1018  ELSE IF( wntus ) THEN
1019 *
1020  IF( wntvn ) THEN
1021 *
1022 * Path 4 (M much larger than N, JOBU='S', JOBVT='N')
1023 * N left singular vectors to be computed in U and
1024 * no right singular vectors to be computed
1025 *
1026  IF( lwork.GE.n*n+max( 4*n, bdspac ) ) THEN
1027 *
1028 * Sufficient workspace for a fast algorithm
1029 *
1030  ir = 1
1031  IF( lwork.GE.wrkbl+lda*n ) THEN
1032 *
1033 * WORK(IR) is LDA by N
1034 *
1035  ldwrkr = lda
1036  ELSE
1037 *
1038 * WORK(IR) is N by N
1039 *
1040  ldwrkr = n
1041  END IF
1042  itau = ir + ldwrkr*n
1043  iwork = itau + n
1044 *
1045 * Compute A=Q*R
1046 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1047 *
1048  CALL sgeqrf( m, n, a, lda, work( itau ),
1049  $ work( iwork ), lwork-iwork+1, ierr )
1050 *
1051 * Copy R to WORK(IR), zeroing out below it
1052 *
1053  CALL slacpy( 'U', n, n, a, lda, work( ir ),
1054  $ ldwrkr )
1055  CALL slaset( 'L', n-1, n-1, zero, zero,
1056  $ work( ir+1 ), ldwrkr )
1057 *
1058 * Generate Q in A
1059 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1060 *
1061  CALL sorgqr( m, n, n, a, lda, work( itau ),
1062  $ work( iwork ), lwork-iwork+1, ierr )
1063  ie = itau
1064  itauq = ie + n
1065  itaup = itauq + n
1066  iwork = itaup + n
1067 *
1068 * Bidiagonalize R in WORK(IR)
1069 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
1070 *
1071  CALL sgebrd( n, n, work( ir ), ldwrkr, s,
1072  $ work( ie ), work( itauq ),
1073  $ work( itaup ), work( iwork ),
1074  $ lwork-iwork+1, ierr )
1075 *
1076 * Generate left vectors bidiagonalizing R in WORK(IR)
1077 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
1078 *
1079  CALL sorgbr( 'Q', n, n, n, work( ir ), ldwrkr,
1080  $ work( itauq ), work( iwork ),
1081  $ lwork-iwork+1, ierr )
1082  iwork = ie + n
1083 *
1084 * Perform bidiagonal QR iteration, computing left
1085 * singular vectors of R in WORK(IR)
1086 * (Workspace: need N*N+BDSPAC)
1087 *
1088  CALL sbdsqr( 'U', n, 0, n, 0, s, work( ie ), dum,
1089  $ 1, work( ir ), ldwrkr, dum, 1,
1090  $ work( iwork ), info )
1091 *
1092 * Multiply Q in A by left singular vectors of R in
1093 * WORK(IR), storing result in U
1094 * (Workspace: need N*N)
1095 *
1096  CALL sgemm( 'N', 'N', m, n, n, one, a, lda,
1097  $ work( ir ), ldwrkr, zero, u, ldu )
1098 *
1099  ELSE
1100 *
1101 * Insufficient workspace for a fast algorithm
1102 *
1103  itau = 1
1104  iwork = itau + n
1105 *
1106 * Compute A=Q*R, copying result to U
1107 * (Workspace: need 2*N, prefer N+N*NB)
1108 *
1109  CALL sgeqrf( m, n, a, lda, work( itau ),
1110  $ work( iwork ), lwork-iwork+1, ierr )
1111  CALL slacpy( 'L', m, n, a, lda, u, ldu )
1112 *
1113 * Generate Q in U
1114 * (Workspace: need 2*N, prefer N+N*NB)
1115 *
1116  CALL sorgqr( m, n, n, u, ldu, work( itau ),
1117  $ work( iwork ), lwork-iwork+1, ierr )
1118  ie = itau
1119  itauq = ie + n
1120  itaup = itauq + n
1121  iwork = itaup + n
1122 *
1123 * Zero out below R in A
1124 *
1125  CALL slaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ),
1126  $ lda )
1127 *
1128 * Bidiagonalize R in A
1129 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1130 *
1131  CALL sgebrd( n, n, a, lda, s, work( ie ),
1132  $ work( itauq ), work( itaup ),
1133  $ work( iwork ), lwork-iwork+1, ierr )
1134 *
1135 * Multiply Q in U by left vectors bidiagonalizing R
1136 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1137 *
1138  CALL sormbr( 'Q', 'R', 'N', m, n, n, a, lda,
1139  $ work( itauq ), u, ldu, work( iwork ),
1140  $ lwork-iwork+1, ierr )
1141  iwork = ie + n
1142 *
1143 * Perform bidiagonal QR iteration, computing left
1144 * singular vectors of A in U
1145 * (Workspace: need BDSPAC)
1146 *
1147  CALL sbdsqr( 'U', n, 0, m, 0, s, work( ie ), dum,
1148  $ 1, u, ldu, dum, 1, work( iwork ),
1149  $ info )
1150 *
1151  END IF
1152 *
1153  ELSE IF( wntvo ) THEN
1154 *
1155 * Path 5 (M much larger than N, JOBU='S', JOBVT='O')
1156 * N left singular vectors to be computed in U and
1157 * N right singular vectors to be overwritten on A
1158 *
1159  IF( lwork.GE.2*n*n+max( 4*n, bdspac ) ) THEN
1160 *
1161 * Sufficient workspace for a fast algorithm
1162 *
1163  iu = 1
1164  IF( lwork.GE.wrkbl+2*lda*n ) THEN
1165 *
1166 * WORK(IU) is LDA by N and WORK(IR) is LDA by N
1167 *
1168  ldwrku = lda
1169  ir = iu + ldwrku*n
1170  ldwrkr = lda
1171  ELSE IF( lwork.GE.wrkbl+( lda+n )*n ) THEN
1172 *
1173 * WORK(IU) is LDA by N and WORK(IR) is N by N
1174 *
1175  ldwrku = lda
1176  ir = iu + ldwrku*n
1177  ldwrkr = n
1178  ELSE
1179 *
1180 * WORK(IU) is N by N and WORK(IR) is N by N
1181 *
1182  ldwrku = n
1183  ir = iu + ldwrku*n
1184  ldwrkr = n
1185  END IF
1186  itau = ir + ldwrkr*n
1187  iwork = itau + n
1188 *
1189 * Compute A=Q*R
1190 * (Workspace: need 2*N*N+2*N, prefer 2*N*N+N+N*NB)
1191 *
1192  CALL sgeqrf( m, n, a, lda, work( itau ),
1193  $ work( iwork ), lwork-iwork+1, ierr )
1194 *
1195 * Copy R to WORK(IU), zeroing out below it
1196 *
1197  CALL slacpy( 'U', n, n, a, lda, work( iu ),
1198  $ ldwrku )
1199  CALL slaset( 'L', n-1, n-1, zero, zero,
1200  $ work( iu+1 ), ldwrku )
1201 *
1202 * Generate Q in A
1203 * (Workspace: need 2*N*N+2*N, prefer 2*N*N+N+N*NB)
1204 *
1205  CALL sorgqr( m, n, n, a, lda, work( itau ),
1206  $ work( iwork ), lwork-iwork+1, ierr )
1207  ie = itau
1208  itauq = ie + n
1209  itaup = itauq + n
1210  iwork = itaup + n
1211 *
1212 * Bidiagonalize R in WORK(IU), copying result to
1213 * WORK(IR)
1214 * (Workspace: need 2*N*N+4*N,
1215 * prefer 2*N*N+3*N+2*N*NB)
1216 *
1217  CALL sgebrd( n, n, work( iu ), ldwrku, s,
1218  $ work( ie ), work( itauq ),
1219  $ work( itaup ), work( iwork ),
1220  $ lwork-iwork+1, ierr )
1221  CALL slacpy( 'U', n, n, work( iu ), ldwrku,
1222  $ work( ir ), ldwrkr )
1223 *
1224 * Generate left bidiagonalizing vectors in WORK(IU)
1225 * (Workspace: need 2*N*N+4*N, prefer 2*N*N+3*N+N*NB)
1226 *
1227  CALL sorgbr( 'Q', n, n, n, work( iu ), ldwrku,
1228  $ work( itauq ), work( iwork ),
1229  $ lwork-iwork+1, ierr )
1230 *
1231 * Generate right bidiagonalizing vectors in WORK(IR)
1232 * (Workspace: need 2*N*N+4*N-1,
1233 * prefer 2*N*N+3*N+(N-1)*NB)
1234 *
1235  CALL sorgbr( 'P', n, n, n, work( ir ), ldwrkr,
1236  $ work( itaup ), work( iwork ),
1237  $ lwork-iwork+1, ierr )
1238  iwork = ie + n
1239 *
1240 * Perform bidiagonal QR iteration, computing left
1241 * singular vectors of R in WORK(IU) and computing
1242 * right singular vectors of R in WORK(IR)
1243 * (Workspace: need 2*N*N+BDSPAC)
1244 *
1245  CALL sbdsqr( 'U', n, n, n, 0, s, work( ie ),
1246  $ work( ir ), ldwrkr, work( iu ),
1247  $ ldwrku, dum, 1, work( iwork ), info )
1248 *
1249 * Multiply Q in A by left singular vectors of R in
1250 * WORK(IU), storing result in U
1251 * (Workspace: need N*N)
1252 *
1253  CALL sgemm( 'N', 'N', m, n, n, one, a, lda,
1254  $ work( iu ), ldwrku, zero, u, ldu )
1255 *
1256 * Copy right singular vectors of R to A
1257 * (Workspace: need N*N)
1258 *
1259  CALL slacpy( 'F', n, n, work( ir ), ldwrkr, a,
1260  $ lda )
1261 *
1262  ELSE
1263 *
1264 * Insufficient workspace for a fast algorithm
1265 *
1266  itau = 1
1267  iwork = itau + n
1268 *
1269 * Compute A=Q*R, copying result to U
1270 * (Workspace: need 2*N, prefer N+N*NB)
1271 *
1272  CALL sgeqrf( m, n, a, lda, work( itau ),
1273  $ work( iwork ), lwork-iwork+1, ierr )
1274  CALL slacpy( 'L', m, n, a, lda, u, ldu )
1275 *
1276 * Generate Q in U
1277 * (Workspace: need 2*N, prefer N+N*NB)
1278 *
1279  CALL sorgqr( m, n, n, u, ldu, work( itau ),
1280  $ work( iwork ), lwork-iwork+1, ierr )
1281  ie = itau
1282  itauq = ie + n
1283  itaup = itauq + n
1284  iwork = itaup + n
1285 *
1286 * Zero out below R in A
1287 *
1288  CALL slaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ),
1289  $ lda )
1290 *
1291 * Bidiagonalize R in A
1292 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1293 *
1294  CALL sgebrd( n, n, a, lda, s, work( ie ),
1295  $ work( itauq ), work( itaup ),
1296  $ work( iwork ), lwork-iwork+1, ierr )
1297 *
1298 * Multiply Q in U by left vectors bidiagonalizing R
1299 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1300 *
1301  CALL sormbr( 'Q', 'R', 'N', m, n, n, a, lda,
1302  $ work( itauq ), u, ldu, work( iwork ),
1303  $ lwork-iwork+1, ierr )
1304 *
1305 * Generate right vectors bidiagonalizing R in A
1306 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1307 *
1308  CALL sorgbr( 'P', n, n, n, a, lda, work( itaup ),
1309  $ work( iwork ), lwork-iwork+1, ierr )
1310  iwork = ie + n
1311 *
1312 * Perform bidiagonal QR iteration, computing left
1313 * singular vectors of A in U and computing right
1314 * singular vectors of A in A
1315 * (Workspace: need BDSPAC)
1316 *
1317  CALL sbdsqr( 'U', n, n, m, 0, s, work( ie ), a,
1318  $ lda, u, ldu, dum, 1, work( iwork ),
1319  $ info )
1320 *
1321  END IF
1322 *
1323  ELSE IF( wntvas ) THEN
1324 *
1325 * Path 6 (M much larger than N, JOBU='S', JOBVT='S'
1326 * or 'A')
1327 * N left singular vectors to be computed in U and
1328 * N right singular vectors to be computed in VT
1329 *
1330  IF( lwork.GE.n*n+max( 4*n, bdspac ) ) THEN
1331 *
1332 * Sufficient workspace for a fast algorithm
1333 *
1334  iu = 1
1335  IF( lwork.GE.wrkbl+lda*n ) THEN
1336 *
1337 * WORK(IU) is LDA by N
1338 *
1339  ldwrku = lda
1340  ELSE
1341 *
1342 * WORK(IU) is N by N
1343 *
1344  ldwrku = n
1345  END IF
1346  itau = iu + ldwrku*n
1347  iwork = itau + n
1348 *
1349 * Compute A=Q*R
1350 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1351 *
1352  CALL sgeqrf( m, n, a, lda, work( itau ),
1353  $ work( iwork ), lwork-iwork+1, ierr )
1354 *
1355 * Copy R to WORK(IU), zeroing out below it
1356 *
1357  CALL slacpy( 'U', n, n, a, lda, work( iu ),
1358  $ ldwrku )
1359  CALL slaset( 'L', n-1, n-1, zero, zero,
1360  $ work( iu+1 ), ldwrku )
1361 *
1362 * Generate Q in A
1363 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1364 *
1365  CALL sorgqr( m, n, n, a, lda, work( itau ),
1366  $ work( iwork ), lwork-iwork+1, ierr )
1367  ie = itau
1368  itauq = ie + n
1369  itaup = itauq + n
1370  iwork = itaup + n
1371 *
1372 * Bidiagonalize R in WORK(IU), copying result to VT
1373 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
1374 *
1375  CALL sgebrd( n, n, work( iu ), ldwrku, s,
1376  $ work( ie ), work( itauq ),
1377  $ work( itaup ), work( iwork ),
1378  $ lwork-iwork+1, ierr )
1379  CALL slacpy( 'U', n, n, work( iu ), ldwrku, vt,
1380  $ ldvt )
1381 *
1382 * Generate left bidiagonalizing vectors in WORK(IU)
1383 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
1384 *
1385  CALL sorgbr( 'Q', n, n, n, work( iu ), ldwrku,
1386  $ work( itauq ), work( iwork ),
1387  $ lwork-iwork+1, ierr )
1388 *
1389 * Generate right bidiagonalizing vectors in VT
1390 * (Workspace: need N*N+4*N-1,
1391 * prefer N*N+3*N+(N-1)*NB)
1392 *
1393  CALL sorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1394  $ work( iwork ), lwork-iwork+1, ierr )
1395  iwork = ie + n
1396 *
1397 * Perform bidiagonal QR iteration, computing left
1398 * singular vectors of R in WORK(IU) and computing
1399 * right singular vectors of R in VT
1400 * (Workspace: need N*N+BDSPAC)
1401 *
1402  CALL sbdsqr( 'U', n, n, n, 0, s, work( ie ), vt,
1403  $ ldvt, work( iu ), ldwrku, dum, 1,
1404  $ work( iwork ), info )
1405 *
1406 * Multiply Q in A by left singular vectors of R in
1407 * WORK(IU), storing result in U
1408 * (Workspace: need N*N)
1409 *
1410  CALL sgemm( 'N', 'N', m, n, n, one, a, lda,
1411  $ work( iu ), ldwrku, zero, u, ldu )
1412 *
1413  ELSE
1414 *
1415 * Insufficient workspace for a fast algorithm
1416 *
1417  itau = 1
1418  iwork = itau + n
1419 *
1420 * Compute A=Q*R, copying result to U
1421 * (Workspace: need 2*N, prefer N+N*NB)
1422 *
1423  CALL sgeqrf( m, n, a, lda, work( itau ),
1424  $ work( iwork ), lwork-iwork+1, ierr )
1425  CALL slacpy( 'L', m, n, a, lda, u, ldu )
1426 *
1427 * Generate Q in U
1428 * (Workspace: need 2*N, prefer N+N*NB)
1429 *
1430  CALL sorgqr( m, n, n, u, ldu, work( itau ),
1431  $ work( iwork ), lwork-iwork+1, ierr )
1432 *
1433 * Copy R to VT, zeroing out below it
1434 *
1435  CALL slacpy( 'U', n, n, a, lda, vt, ldvt )
1436  IF( n.GT.1 )
1437  $ CALL slaset( 'L', n-1, n-1, zero, zero,
1438  $ vt( 2, 1 ), ldvt )
1439  ie = itau
1440  itauq = ie + n
1441  itaup = itauq + n
1442  iwork = itaup + n
1443 *
1444 * Bidiagonalize R in VT
1445 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1446 *
1447  CALL sgebrd( n, n, vt, ldvt, s, work( ie ),
1448  $ work( itauq ), work( itaup ),
1449  $ work( iwork ), lwork-iwork+1, ierr )
1450 *
1451 * Multiply Q in U by left bidiagonalizing vectors
1452 * in VT
1453 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1454 *
1455  CALL sormbr( 'Q', 'R', 'N', m, n, n, vt, ldvt,
1456  $ work( itauq ), u, ldu, work( iwork ),
1457  $ lwork-iwork+1, ierr )
1458 *
1459 * Generate right bidiagonalizing vectors in VT
1460 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1461 *
1462  CALL sorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1463  $ work( iwork ), lwork-iwork+1, ierr )
1464  iwork = ie + n
1465 *
1466 * Perform bidiagonal QR iteration, computing left
1467 * singular vectors of A in U and computing right
1468 * singular vectors of A in VT
1469 * (Workspace: need BDSPAC)
1470 *
1471  CALL sbdsqr( 'U', n, n, m, 0, s, work( ie ), vt,
1472  $ ldvt, u, ldu, dum, 1, work( iwork ),
1473  $ info )
1474 *
1475  END IF
1476 *
1477  END IF
1478 *
1479  ELSE IF( wntua ) THEN
1480 *
1481  IF( wntvn ) THEN
1482 *
1483 * Path 7 (M much larger than N, JOBU='A', JOBVT='N')
1484 * M left singular vectors to be computed in U and
1485 * no right singular vectors to be computed
1486 *
1487  IF( lwork.GE.n*n+max( n+m, 4*n, bdspac ) ) THEN
1488 *
1489 * Sufficient workspace for a fast algorithm
1490 *
1491  ir = 1
1492  IF( lwork.GE.wrkbl+lda*n ) THEN
1493 *
1494 * WORK(IR) is LDA by N
1495 *
1496  ldwrkr = lda
1497  ELSE
1498 *
1499 * WORK(IR) is N by N
1500 *
1501  ldwrkr = n
1502  END IF
1503  itau = ir + ldwrkr*n
1504  iwork = itau + n
1505 *
1506 * Compute A=Q*R, copying result to U
1507 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1508 *
1509  CALL sgeqrf( m, n, a, lda, work( itau ),
1510  $ work( iwork ), lwork-iwork+1, ierr )
1511  CALL slacpy( 'L', m, n, a, lda, u, ldu )
1512 *
1513 * Copy R to WORK(IR), zeroing out below it
1514 *
1515  CALL slacpy( 'U', n, n, a, lda, work( ir ),
1516  $ ldwrkr )
1517  CALL slaset( 'L', n-1, n-1, zero, zero,
1518  $ work( ir+1 ), ldwrkr )
1519 *
1520 * Generate Q in U
1521 * (Workspace: need N*N+N+M, prefer N*N+N+M*NB)
1522 *
1523  CALL sorgqr( m, m, n, u, ldu, work( itau ),
1524  $ work( iwork ), lwork-iwork+1, ierr )
1525  ie = itau
1526  itauq = ie + n
1527  itaup = itauq + n
1528  iwork = itaup + n
1529 *
1530 * Bidiagonalize R in WORK(IR)
1531 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
1532 *
1533  CALL sgebrd( n, n, work( ir ), ldwrkr, s,
1534  $ work( ie ), work( itauq ),
1535  $ work( itaup ), work( iwork ),
1536  $ lwork-iwork+1, ierr )
1537 *
1538 * Generate left bidiagonalizing vectors in WORK(IR)
1539 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
1540 *
1541  CALL sorgbr( 'Q', n, n, n, work( ir ), ldwrkr,
1542  $ work( itauq ), work( iwork ),
1543  $ lwork-iwork+1, ierr )
1544  iwork = ie + n
1545 *
1546 * Perform bidiagonal QR iteration, computing left
1547 * singular vectors of R in WORK(IR)
1548 * (Workspace: need N*N+BDSPAC)
1549 *
1550  CALL sbdsqr( 'U', n, 0, n, 0, s, work( ie ), dum,
1551  $ 1, work( ir ), ldwrkr, dum, 1,
1552  $ work( iwork ), info )
1553 *
1554 * Multiply Q in U by left singular vectors of R in
1555 * WORK(IR), storing result in A
1556 * (Workspace: need N*N)
1557 *
1558  CALL sgemm( 'N', 'N', m, n, n, one, u, ldu,
1559  $ work( ir ), ldwrkr, zero, a, lda )
1560 *
1561 * Copy left singular vectors of A from A to U
1562 *
1563  CALL slacpy( 'F', m, n, a, lda, u, ldu )
1564 *
1565  ELSE
1566 *
1567 * Insufficient workspace for a fast algorithm
1568 *
1569  itau = 1
1570  iwork = itau + n
1571 *
1572 * Compute A=Q*R, copying result to U
1573 * (Workspace: need 2*N, prefer N+N*NB)
1574 *
1575  CALL sgeqrf( m, n, a, lda, work( itau ),
1576  $ work( iwork ), lwork-iwork+1, ierr )
1577  CALL slacpy( 'L', m, n, a, lda, u, ldu )
1578 *
1579 * Generate Q in U
1580 * (Workspace: need N+M, prefer N+M*NB)
1581 *
1582  CALL sorgqr( m, m, n, u, ldu, work( itau ),
1583  $ work( iwork ), lwork-iwork+1, ierr )
1584  ie = itau
1585  itauq = ie + n
1586  itaup = itauq + n
1587  iwork = itaup + n
1588 *
1589 * Zero out below R in A
1590 *
1591  CALL slaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ),
1592  $ lda )
1593 *
1594 * Bidiagonalize R in A
1595 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1596 *
1597  CALL sgebrd( n, n, a, lda, s, work( ie ),
1598  $ work( itauq ), work( itaup ),
1599  $ work( iwork ), lwork-iwork+1, ierr )
1600 *
1601 * Multiply Q in U by left bidiagonalizing vectors
1602 * in A
1603 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1604 *
1605  CALL sormbr( 'Q', 'R', 'N', m, n, n, a, lda,
1606  $ work( itauq ), u, ldu, work( iwork ),
1607  $ lwork-iwork+1, ierr )
1608  iwork = ie + n
1609 *
1610 * Perform bidiagonal QR iteration, computing left
1611 * singular vectors of A in U
1612 * (Workspace: need BDSPAC)
1613 *
1614  CALL sbdsqr( 'U', n, 0, m, 0, s, work( ie ), dum,
1615  $ 1, u, ldu, dum, 1, work( iwork ),
1616  $ info )
1617 *
1618  END IF
1619 *
1620  ELSE IF( wntvo ) THEN
1621 *
1622 * Path 8 (M much larger than N, JOBU='A', JOBVT='O')
1623 * M left singular vectors to be computed in U and
1624 * N right singular vectors to be overwritten on A
1625 *
1626  IF( lwork.GE.2*n*n+max( n+m, 4*n, bdspac ) ) THEN
1627 *
1628 * Sufficient workspace for a fast algorithm
1629 *
1630  iu = 1
1631  IF( lwork.GE.wrkbl+2*lda*n ) THEN
1632 *
1633 * WORK(IU) is LDA by N and WORK(IR) is LDA by N
1634 *
1635  ldwrku = lda
1636  ir = iu + ldwrku*n
1637  ldwrkr = lda
1638  ELSE IF( lwork.GE.wrkbl+( lda+n )*n ) THEN
1639 *
1640 * WORK(IU) is LDA by N and WORK(IR) is N by N
1641 *
1642  ldwrku = lda
1643  ir = iu + ldwrku*n
1644  ldwrkr = n
1645  ELSE
1646 *
1647 * WORK(IU) is N by N and WORK(IR) is N by N
1648 *
1649  ldwrku = n
1650  ir = iu + ldwrku*n
1651  ldwrkr = n
1652  END IF
1653  itau = ir + ldwrkr*n
1654  iwork = itau + n
1655 *
1656 * Compute A=Q*R, copying result to U
1657 * (Workspace: need 2*N*N+2*N, prefer 2*N*N+N+N*NB)
1658 *
1659  CALL sgeqrf( m, n, a, lda, work( itau ),
1660  $ work( iwork ), lwork-iwork+1, ierr )
1661  CALL slacpy( 'L', m, n, a, lda, u, ldu )
1662 *
1663 * Generate Q in U
1664 * (Workspace: need 2*N*N+N+M, prefer 2*N*N+N+M*NB)
1665 *
1666  CALL sorgqr( m, m, n, u, ldu, work( itau ),
1667  $ work( iwork ), lwork-iwork+1, ierr )
1668 *
1669 * Copy R to WORK(IU), zeroing out below it
1670 *
1671  CALL slacpy( 'U', n, n, a, lda, work( iu ),
1672  $ ldwrku )
1673  CALL slaset( 'L', n-1, n-1, zero, zero,
1674  $ work( iu+1 ), ldwrku )
1675  ie = itau
1676  itauq = ie + n
1677  itaup = itauq + n
1678  iwork = itaup + n
1679 *
1680 * Bidiagonalize R in WORK(IU), copying result to
1681 * WORK(IR)
1682 * (Workspace: need 2*N*N+4*N,
1683 * prefer 2*N*N+3*N+2*N*NB)
1684 *
1685  CALL sgebrd( n, n, work( iu ), ldwrku, s,
1686  $ work( ie ), work( itauq ),
1687  $ work( itaup ), work( iwork ),
1688  $ lwork-iwork+1, ierr )
1689  CALL slacpy( 'U', n, n, work( iu ), ldwrku,
1690  $ work( ir ), ldwrkr )
1691 *
1692 * Generate left bidiagonalizing vectors in WORK(IU)
1693 * (Workspace: need 2*N*N+4*N, prefer 2*N*N+3*N+N*NB)
1694 *
1695  CALL sorgbr( 'Q', n, n, n, work( iu ), ldwrku,
1696  $ work( itauq ), work( iwork ),
1697  $ lwork-iwork+1, ierr )
1698 *
1699 * Generate right bidiagonalizing vectors in WORK(IR)
1700 * (Workspace: need 2*N*N+4*N-1,
1701 * prefer 2*N*N+3*N+(N-1)*NB)
1702 *
1703  CALL sorgbr( 'P', n, n, n, work( ir ), ldwrkr,
1704  $ work( itaup ), work( iwork ),
1705  $ lwork-iwork+1, ierr )
1706  iwork = ie + n
1707 *
1708 * Perform bidiagonal QR iteration, computing left
1709 * singular vectors of R in WORK(IU) and computing
1710 * right singular vectors of R in WORK(IR)
1711 * (Workspace: need 2*N*N+BDSPAC)
1712 *
1713  CALL sbdsqr( 'U', n, n, n, 0, s, work( ie ),
1714  $ work( ir ), ldwrkr, work( iu ),
1715  $ ldwrku, dum, 1, work( iwork ), info )
1716 *
1717 * Multiply Q in U by left singular vectors of R in
1718 * WORK(IU), storing result in A
1719 * (Workspace: need N*N)
1720 *
1721  CALL sgemm( 'N', 'N', m, n, n, one, u, ldu,
1722  $ work( iu ), ldwrku, zero, a, lda )
1723 *
1724 * Copy left singular vectors of A from A to U
1725 *
1726  CALL slacpy( 'F', m, n, a, lda, u, ldu )
1727 *
1728 * Copy right singular vectors of R from WORK(IR) to A
1729 *
1730  CALL slacpy( 'F', n, n, work( ir ), ldwrkr, a,
1731  $ lda )
1732 *
1733  ELSE
1734 *
1735 * Insufficient workspace for a fast algorithm
1736 *
1737  itau = 1
1738  iwork = itau + n
1739 *
1740 * Compute A=Q*R, copying result to U
1741 * (Workspace: need 2*N, prefer N+N*NB)
1742 *
1743  CALL sgeqrf( m, n, a, lda, work( itau ),
1744  $ work( iwork ), lwork-iwork+1, ierr )
1745  CALL slacpy( 'L', m, n, a, lda, u, ldu )
1746 *
1747 * Generate Q in U
1748 * (Workspace: need N+M, prefer N+M*NB)
1749 *
1750  CALL sorgqr( m, m, n, u, ldu, work( itau ),
1751  $ work( iwork ), lwork-iwork+1, ierr )
1752  ie = itau
1753  itauq = ie + n
1754  itaup = itauq + n
1755  iwork = itaup + n
1756 *
1757 * Zero out below R in A
1758 *
1759  CALL slaset( 'L', n-1, n-1, zero, zero, a( 2, 1 ),
1760  $ lda )
1761 *
1762 * Bidiagonalize R in A
1763 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1764 *
1765  CALL sgebrd( n, n, a, lda, s, work( ie ),
1766  $ work( itauq ), work( itaup ),
1767  $ work( iwork ), lwork-iwork+1, ierr )
1768 *
1769 * Multiply Q in U by left bidiagonalizing vectors
1770 * in A
1771 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1772 *
1773  CALL sormbr( 'Q', 'R', 'N', m, n, n, a, lda,
1774  $ work( itauq ), u, ldu, work( iwork ),
1775  $ lwork-iwork+1, ierr )
1776 *
1777 * Generate right bidiagonalizing vectors in A
1778 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1779 *
1780  CALL sorgbr( 'P', n, n, n, a, lda, work( itaup ),
1781  $ work( iwork ), lwork-iwork+1, ierr )
1782  iwork = ie + n
1783 *
1784 * Perform bidiagonal QR iteration, computing left
1785 * singular vectors of A in U and computing right
1786 * singular vectors of A in A
1787 * (Workspace: need BDSPAC)
1788 *
1789  CALL sbdsqr( 'U', n, n, m, 0, s, work( ie ), a,
1790  $ lda, u, ldu, dum, 1, work( iwork ),
1791  $ info )
1792 *
1793  END IF
1794 *
1795  ELSE IF( wntvas ) THEN
1796 *
1797 * Path 9 (M much larger than N, JOBU='A', JOBVT='S'
1798 * or 'A')
1799 * M left singular vectors to be computed in U and
1800 * N right singular vectors to be computed in VT
1801 *
1802  IF( lwork.GE.n*n+max( n+m, 4*n, bdspac ) ) THEN
1803 *
1804 * Sufficient workspace for a fast algorithm
1805 *
1806  iu = 1
1807  IF( lwork.GE.wrkbl+lda*n ) THEN
1808 *
1809 * WORK(IU) is LDA by N
1810 *
1811  ldwrku = lda
1812  ELSE
1813 *
1814 * WORK(IU) is N by N
1815 *
1816  ldwrku = n
1817  END IF
1818  itau = iu + ldwrku*n
1819  iwork = itau + n
1820 *
1821 * Compute A=Q*R, copying result to U
1822 * (Workspace: need N*N+2*N, prefer N*N+N+N*NB)
1823 *
1824  CALL sgeqrf( m, n, a, lda, work( itau ),
1825  $ work( iwork ), lwork-iwork+1, ierr )
1826  CALL slacpy( 'L', m, n, a, lda, u, ldu )
1827 *
1828 * Generate Q in U
1829 * (Workspace: need N*N+N+M, prefer N*N+N+M*NB)
1830 *
1831  CALL sorgqr( m, m, n, u, ldu, work( itau ),
1832  $ work( iwork ), lwork-iwork+1, ierr )
1833 *
1834 * Copy R to WORK(IU), zeroing out below it
1835 *
1836  CALL slacpy( 'U', n, n, a, lda, work( iu ),
1837  $ ldwrku )
1838  CALL slaset( 'L', n-1, n-1, zero, zero,
1839  $ work( iu+1 ), ldwrku )
1840  ie = itau
1841  itauq = ie + n
1842  itaup = itauq + n
1843  iwork = itaup + n
1844 *
1845 * Bidiagonalize R in WORK(IU), copying result to VT
1846 * (Workspace: need N*N+4*N, prefer N*N+3*N+2*N*NB)
1847 *
1848  CALL sgebrd( n, n, work( iu ), ldwrku, s,
1849  $ work( ie ), work( itauq ),
1850  $ work( itaup ), work( iwork ),
1851  $ lwork-iwork+1, ierr )
1852  CALL slacpy( 'U', n, n, work( iu ), ldwrku, vt,
1853  $ ldvt )
1854 *
1855 * Generate left bidiagonalizing vectors in WORK(IU)
1856 * (Workspace: need N*N+4*N, prefer N*N+3*N+N*NB)
1857 *
1858  CALL sorgbr( 'Q', n, n, n, work( iu ), ldwrku,
1859  $ work( itauq ), work( iwork ),
1860  $ lwork-iwork+1, ierr )
1861 *
1862 * Generate right bidiagonalizing vectors in VT
1863 * (Workspace: need N*N+4*N-1,
1864 * prefer N*N+3*N+(N-1)*NB)
1865 *
1866  CALL sorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1867  $ work( iwork ), lwork-iwork+1, ierr )
1868  iwork = ie + n
1869 *
1870 * Perform bidiagonal QR iteration, computing left
1871 * singular vectors of R in WORK(IU) and computing
1872 * right singular vectors of R in VT
1873 * (Workspace: need N*N+BDSPAC)
1874 *
1875  CALL sbdsqr( 'U', n, n, n, 0, s, work( ie ), vt,
1876  $ ldvt, work( iu ), ldwrku, dum, 1,
1877  $ work( iwork ), info )
1878 *
1879 * Multiply Q in U by left singular vectors of R in
1880 * WORK(IU), storing result in A
1881 * (Workspace: need N*N)
1882 *
1883  CALL sgemm( 'N', 'N', m, n, n, one, u, ldu,
1884  $ work( iu ), ldwrku, zero, a, lda )
1885 *
1886 * Copy left singular vectors of A from A to U
1887 *
1888  CALL slacpy( 'F', m, n, a, lda, u, ldu )
1889 *
1890  ELSE
1891 *
1892 * Insufficient workspace for a fast algorithm
1893 *
1894  itau = 1
1895  iwork = itau + n
1896 *
1897 * Compute A=Q*R, copying result to U
1898 * (Workspace: need 2*N, prefer N+N*NB)
1899 *
1900  CALL sgeqrf( m, n, a, lda, work( itau ),
1901  $ work( iwork ), lwork-iwork+1, ierr )
1902  CALL slacpy( 'L', m, n, a, lda, u, ldu )
1903 *
1904 * Generate Q in U
1905 * (Workspace: need N+M, prefer N+M*NB)
1906 *
1907  CALL sorgqr( m, m, n, u, ldu, work( itau ),
1908  $ work( iwork ), lwork-iwork+1, ierr )
1909 *
1910 * Copy R from A to VT, zeroing out below it
1911 *
1912  CALL slacpy( 'U', n, n, a, lda, vt, ldvt )
1913  IF( n.GT.1 )
1914  $ CALL slaset( 'L', n-1, n-1, zero, zero,
1915  $ vt( 2, 1 ), ldvt )
1916  ie = itau
1917  itauq = ie + n
1918  itaup = itauq + n
1919  iwork = itaup + n
1920 *
1921 * Bidiagonalize R in VT
1922 * (Workspace: need 4*N, prefer 3*N+2*N*NB)
1923 *
1924  CALL sgebrd( n, n, vt, ldvt, s, work( ie ),
1925  $ work( itauq ), work( itaup ),
1926  $ work( iwork ), lwork-iwork+1, ierr )
1927 *
1928 * Multiply Q in U by left bidiagonalizing vectors
1929 * in VT
1930 * (Workspace: need 3*N+M, prefer 3*N+M*NB)
1931 *
1932  CALL sormbr( 'Q', 'R', 'N', m, n, n, vt, ldvt,
1933  $ work( itauq ), u, ldu, work( iwork ),
1934  $ lwork-iwork+1, ierr )
1935 *
1936 * Generate right bidiagonalizing vectors in VT
1937 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1938 *
1939  CALL sorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1940  $ work( iwork ), lwork-iwork+1, ierr )
1941  iwork = ie + n
1942 *
1943 * Perform bidiagonal QR iteration, computing left
1944 * singular vectors of A in U and computing right
1945 * singular vectors of A in VT
1946 * (Workspace: need BDSPAC)
1947 *
1948  CALL sbdsqr( 'U', n, n, m, 0, s, work( ie ), vt,
1949  $ ldvt, u, ldu, dum, 1, work( iwork ),
1950  $ info )
1951 *
1952  END IF
1953 *
1954  END IF
1955 *
1956  END IF
1957 *
1958  ELSE
1959 *
1960 * M .LT. MNTHR
1961 *
1962 * Path 10 (M at least N, but not much larger)
1963 * Reduce to bidiagonal form without QR decomposition
1964 *
1965  ie = 1
1966  itauq = ie + n
1967  itaup = itauq + n
1968  iwork = itaup + n
1969 *
1970 * Bidiagonalize A
1971 * (Workspace: need 3*N+M, prefer 3*N+(M+N)*NB)
1972 *
1973  CALL sgebrd( m, n, a, lda, s, work( ie ), work( itauq ),
1974  $ work( itaup ), work( iwork ), lwork-iwork+1,
1975  $ ierr )
1976  IF( wntuas ) THEN
1977 *
1978 * If left singular vectors desired in U, copy result to U
1979 * and generate left bidiagonalizing vectors in U
1980 * (Workspace: need 3*N+NCU, prefer 3*N+NCU*NB)
1981 *
1982  CALL slacpy( 'L', m, n, a, lda, u, ldu )
1983  IF( wntus )
1984  $ ncu = n
1985  IF( wntua )
1986  $ ncu = m
1987  CALL sorgbr( 'Q', m, ncu, n, u, ldu, work( itauq ),
1988  $ work( iwork ), lwork-iwork+1, ierr )
1989  END IF
1990  IF( wntvas ) THEN
1991 *
1992 * If right singular vectors desired in VT, copy result to
1993 * VT and generate right bidiagonalizing vectors in VT
1994 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
1995 *
1996  CALL slacpy( 'U', n, n, a, lda, vt, ldvt )
1997  CALL sorgbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1998  $ work( iwork ), lwork-iwork+1, ierr )
1999  END IF
2000  IF( wntuo ) THEN
2001 *
2002 * If left singular vectors desired in A, generate left
2003 * bidiagonalizing vectors in A
2004 * (Workspace: need 4*N, prefer 3*N+N*NB)
2005 *
2006  CALL sorgbr( 'Q', m, n, n, a, lda, work( itauq ),
2007  $ work( iwork ), lwork-iwork+1, ierr )
2008  END IF
2009  IF( wntvo ) THEN
2010 *
2011 * If right singular vectors desired in A, generate right
2012 * bidiagonalizing vectors in A
2013 * (Workspace: need 4*N-1, prefer 3*N+(N-1)*NB)
2014 *
2015  CALL sorgbr( 'P', n, n, n, a, lda, work( itaup ),
2016  $ work( iwork ), lwork-iwork+1, ierr )
2017  END IF
2018  iwork = ie + n
2019  IF( wntuas .OR. wntuo )
2020  $ nru = m
2021  IF( wntun )
2022  $ nru = 0
2023  IF( wntvas .OR. wntvo )
2024  $ ncvt = n
2025  IF( wntvn )
2026  $ ncvt = 0
2027  IF( ( .NOT.wntuo ) .AND. ( .NOT.wntvo ) ) THEN
2028 *
2029 * Perform bidiagonal QR iteration, if desired, computing
2030 * left singular vectors in U and computing right singular
2031 * vectors in VT
2032 * (Workspace: need BDSPAC)
2033 *
2034  CALL sbdsqr( 'U', n, ncvt, nru, 0, s, work( ie ), vt,
2035  $ ldvt, u, ldu, dum, 1, work( iwork ), info )
2036  ELSE IF( ( .NOT.wntuo ) .AND. wntvo ) THEN
2037 *
2038 * Perform bidiagonal QR iteration, if desired, computing
2039 * left singular vectors in U and computing right singular
2040 * vectors in A
2041 * (Workspace: need BDSPAC)
2042 *
2043  CALL sbdsqr( 'U', n, ncvt, nru, 0, s, work( ie ), a, lda,
2044  $ u, ldu, dum, 1, work( iwork ), info )
2045  ELSE
2046 *
2047 * Perform bidiagonal QR iteration, if desired, computing
2048 * left singular vectors in A and computing right singular
2049 * vectors in VT
2050 * (Workspace: need BDSPAC)
2051 *
2052  CALL sbdsqr( 'U', n, ncvt, nru, 0, s, work( ie ), vt,
2053  $ ldvt, a, lda, dum, 1, work( iwork ), info )
2054  END IF
2055 *
2056  END IF
2057 *
2058  ELSE
2059 *
2060 * A has more columns than rows. If A has sufficiently more
2061 * columns than rows, first reduce using the LQ decomposition (if
2062 * sufficient workspace available)
2063 *
2064  IF( n.GE.mnthr ) THEN
2065 *
2066  IF( wntvn ) THEN
2067 *
2068 * Path 1t(N much larger than M, JOBVT='N')
2069 * No right singular vectors to be computed
2070 *
2071  itau = 1
2072  iwork = itau + m
2073 *
2074 * Compute A=L*Q
2075 * (Workspace: need 2*M, prefer M+M*NB)
2076 *
2077  CALL sgelqf( m, n, a, lda, work( itau ), work( iwork ),
2078  $ lwork-iwork+1, ierr )
2079 *
2080 * Zero out above L
2081 *
2082  CALL slaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ), lda )
2083  ie = 1
2084  itauq = ie + m
2085  itaup = itauq + m
2086  iwork = itaup + m
2087 *
2088 * Bidiagonalize L in A
2089 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2090 *
2091  CALL sgebrd( m, m, a, lda, s, work( ie ), work( itauq ),
2092  $ work( itaup ), work( iwork ), lwork-iwork+1,
2093  $ ierr )
2094  IF( wntuo .OR. wntuas ) THEN
2095 *
2096 * If left singular vectors desired, generate Q
2097 * (Workspace: need 4*M, prefer 3*M+M*NB)
2098 *
2099  CALL sorgbr( 'Q', m, m, m, a, lda, work( itauq ),
2100  $ work( iwork ), lwork-iwork+1, ierr )
2101  END IF
2102  iwork = ie + m
2103  nru = 0
2104  IF( wntuo .OR. wntuas )
2105  $ nru = m
2106 *
2107 * Perform bidiagonal QR iteration, computing left singular
2108 * vectors of A in A if desired
2109 * (Workspace: need BDSPAC)
2110 *
2111  CALL sbdsqr( 'U', m, 0, nru, 0, s, work( ie ), dum, 1, a,
2112  $ lda, dum, 1, work( iwork ), info )
2113 *
2114 * If left singular vectors desired in U, copy them there
2115 *
2116  IF( wntuas )
2117  $ CALL slacpy( 'F', m, m, a, lda, u, ldu )
2118 *
2119  ELSE IF( wntvo .AND. wntun ) THEN
2120 *
2121 * Path 2t(N much larger than M, JOBU='N', JOBVT='O')
2122 * M right singular vectors to be overwritten on A and
2123 * no left singular vectors to be computed
2124 *
2125  IF( lwork.GE.m*m+max( 4*m, bdspac ) ) THEN
2126 *
2127 * Sufficient workspace for a fast algorithm
2128 *
2129  ir = 1
2130  IF( lwork.GE.max( wrkbl, lda*n+m )+lda*m ) THEN
2131 *
2132 * WORK(IU) is LDA by N and WORK(IR) is LDA by M
2133 *
2134  ldwrku = lda
2135  chunk = n
2136  ldwrkr = lda
2137  ELSE IF( lwork.GE.max( wrkbl, lda*n+m )+m*m ) THEN
2138 *
2139 * WORK(IU) is LDA by N and WORK(IR) is M by M
2140 *
2141  ldwrku = lda
2142  chunk = n
2143  ldwrkr = m
2144  ELSE
2145 *
2146 * WORK(IU) is M by CHUNK and WORK(IR) is M by M
2147 *
2148  ldwrku = m
2149  chunk = ( lwork-m*m-m ) / m
2150  ldwrkr = m
2151  END IF
2152  itau = ir + ldwrkr*m
2153  iwork = itau + m
2154 *
2155 * Compute A=L*Q
2156 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2157 *
2158  CALL sgelqf( m, n, a, lda, work( itau ),
2159  $ work( iwork ), lwork-iwork+1, ierr )
2160 *
2161 * Copy L to WORK(IR) and zero out above it
2162 *
2163  CALL slacpy( 'L', m, m, a, lda, work( ir ), ldwrkr )
2164  CALL slaset( 'U', m-1, m-1, zero, zero,
2165  $ work( ir+ldwrkr ), ldwrkr )
2166 *
2167 * Generate Q in A
2168 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2169 *
2170  CALL sorglq( m, n, m, a, lda, work( itau ),
2171  $ work( iwork ), lwork-iwork+1, ierr )
2172  ie = itau
2173  itauq = ie + m
2174  itaup = itauq + m
2175  iwork = itaup + m
2176 *
2177 * Bidiagonalize L in WORK(IR)
2178 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
2179 *
2180  CALL sgebrd( m, m, work( ir ), ldwrkr, s, work( ie ),
2181  $ work( itauq ), work( itaup ),
2182  $ work( iwork ), lwork-iwork+1, ierr )
2183 *
2184 * Generate right vectors bidiagonalizing L
2185 * (Workspace: need M*M+4*M-1, prefer M*M+3*M+(M-1)*NB)
2186 *
2187  CALL sorgbr( 'P', m, m, m, work( ir ), ldwrkr,
2188  $ work( itaup ), work( iwork ),
2189  $ lwork-iwork+1, ierr )
2190  iwork = ie + m
2191 *
2192 * Perform bidiagonal QR iteration, computing right
2193 * singular vectors of L in WORK(IR)
2194 * (Workspace: need M*M+BDSPAC)
2195 *
2196  CALL sbdsqr( 'U', m, m, 0, 0, s, work( ie ),
2197  $ work( ir ), ldwrkr, dum, 1, dum, 1,
2198  $ work( iwork ), info )
2199  iu = ie + m
2200 *
2201 * Multiply right singular vectors of L in WORK(IR) by Q
2202 * in A, storing result in WORK(IU) and copying to A
2203 * (Workspace: need M*M+2*M, prefer M*M+M*N+M)
2204 *
2205  DO 30 i = 1, n, chunk
2206  blk = min( n-i+1, chunk )
2207  CALL sgemm( 'N', 'N', m, blk, m, one, work( ir ),
2208  $ ldwrkr, a( 1, i ), lda, zero,
2209  $ work( iu ), ldwrku )
2210  CALL slacpy( 'F', m, blk, work( iu ), ldwrku,
2211  $ a( 1, i ), lda )
2212  30 CONTINUE
2213 *
2214  ELSE
2215 *
2216 * Insufficient workspace for a fast algorithm
2217 *
2218  ie = 1
2219  itauq = ie + m
2220  itaup = itauq + m
2221  iwork = itaup + m
2222 *
2223 * Bidiagonalize A
2224 * (Workspace: need 3*M+N, prefer 3*M+(M+N)*NB)
2225 *
2226  CALL sgebrd( m, n, a, lda, s, work( ie ),
2227  $ work( itauq ), work( itaup ),
2228  $ work( iwork ), lwork-iwork+1, ierr )
2229 *
2230 * Generate right vectors bidiagonalizing A
2231 * (Workspace: need 4*M, prefer 3*M+M*NB)
2232 *
2233  CALL sorgbr( 'P', m, n, m, a, lda, work( itaup ),
2234  $ work( iwork ), lwork-iwork+1, ierr )
2235  iwork = ie + m
2236 *
2237 * Perform bidiagonal QR iteration, computing right
2238 * singular vectors of A in A
2239 * (Workspace: need BDSPAC)
2240 *
2241  CALL sbdsqr( 'L', m, n, 0, 0, s, work( ie ), a, lda,
2242  $ dum, 1, dum, 1, work( iwork ), info )
2243 *
2244  END IF
2245 *
2246  ELSE IF( wntvo .AND. wntuas ) THEN
2247 *
2248 * Path 3t(N much larger than M, JOBU='S' or 'A', JOBVT='O')
2249 * M right singular vectors to be overwritten on A and
2250 * M left singular vectors to be computed in U
2251 *
2252  IF( lwork.GE.m*m+max( 4*m, bdspac ) ) THEN
2253 *
2254 * Sufficient workspace for a fast algorithm
2255 *
2256  ir = 1
2257  IF( lwork.GE.max( wrkbl, lda*n+m )+lda*m ) THEN
2258 *
2259 * WORK(IU) is LDA by N and WORK(IR) is LDA by M
2260 *
2261  ldwrku = lda
2262  chunk = n
2263  ldwrkr = lda
2264  ELSE IF( lwork.GE.max( wrkbl, lda*n+m )+m*m ) THEN
2265 *
2266 * WORK(IU) is LDA by N and WORK(IR) is M by M
2267 *
2268  ldwrku = lda
2269  chunk = n
2270  ldwrkr = m
2271  ELSE
2272 *
2273 * WORK(IU) is M by CHUNK and WORK(IR) is M by M
2274 *
2275  ldwrku = m
2276  chunk = ( lwork-m*m-m ) / m
2277  ldwrkr = m
2278  END IF
2279  itau = ir + ldwrkr*m
2280  iwork = itau + m
2281 *
2282 * Compute A=L*Q
2283 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2284 *
2285  CALL sgelqf( m, n, a, lda, work( itau ),
2286  $ work( iwork ), lwork-iwork+1, ierr )
2287 *
2288 * Copy L to U, zeroing about above it
2289 *
2290  CALL slacpy( 'L', m, m, a, lda, u, ldu )
2291  CALL slaset( 'U', m-1, m-1, zero, zero, u( 1, 2 ),
2292  $ ldu )
2293 *
2294 * Generate Q in A
2295 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2296 *
2297  CALL sorglq( m, n, m, a, lda, work( itau ),
2298  $ work( iwork ), lwork-iwork+1, ierr )
2299  ie = itau
2300  itauq = ie + m
2301  itaup = itauq + m
2302  iwork = itaup + m
2303 *
2304 * Bidiagonalize L in U, copying result to WORK(IR)
2305 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
2306 *
2307  CALL sgebrd( m, m, u, ldu, s, work( ie ),
2308  $ work( itauq ), work( itaup ),
2309  $ work( iwork ), lwork-iwork+1, ierr )
2310  CALL slacpy( 'U', m, m, u, ldu, work( ir ), ldwrkr )
2311 *
2312 * Generate right vectors bidiagonalizing L in WORK(IR)
2313 * (Workspace: need M*M+4*M-1, prefer M*M+3*M+(M-1)*NB)
2314 *
2315  CALL sorgbr( 'P', m, m, m, work( ir ), ldwrkr,
2316  $ work( itaup ), work( iwork ),
2317  $ lwork-iwork+1, ierr )
2318 *
2319 * Generate left vectors bidiagonalizing L in U
2320 * (Workspace: need M*M+4*M, prefer M*M+3*M+M*NB)
2321 *
2322  CALL sorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
2323  $ work( iwork ), lwork-iwork+1, ierr )
2324  iwork = ie + m
2325 *
2326 * Perform bidiagonal QR iteration, computing left
2327 * singular vectors of L in U, and computing right
2328 * singular vectors of L in WORK(IR)
2329 * (Workspace: need M*M+BDSPAC)
2330 *
2331  CALL sbdsqr( 'U', m, m, m, 0, s, work( ie ),
2332  $ work( ir ), ldwrkr, u, ldu, dum, 1,
2333  $ work( iwork ), info )
2334  iu = ie + m
2335 *
2336 * Multiply right singular vectors of L in WORK(IR) by Q
2337 * in A, storing result in WORK(IU) and copying to A
2338 * (Workspace: need M*M+2*M, prefer M*M+M*N+M))
2339 *
2340  DO 40 i = 1, n, chunk
2341  blk = min( n-i+1, chunk )
2342  CALL sgemm( 'N', 'N', m, blk, m, one, work( ir ),
2343  $ ldwrkr, a( 1, i ), lda, zero,
2344  $ work( iu ), ldwrku )
2345  CALL slacpy( 'F', m, blk, work( iu ), ldwrku,
2346  $ a( 1, i ), lda )
2347  40 CONTINUE
2348 *
2349  ELSE
2350 *
2351 * Insufficient workspace for a fast algorithm
2352 *
2353  itau = 1
2354  iwork = itau + m
2355 *
2356 * Compute A=L*Q
2357 * (Workspace: need 2*M, prefer M+M*NB)
2358 *
2359  CALL sgelqf( m, n, a, lda, work( itau ),
2360  $ work( iwork ), lwork-iwork+1, ierr )
2361 *
2362 * Copy L to U, zeroing out above it
2363 *
2364  CALL slacpy( 'L', m, m, a, lda, u, ldu )
2365  CALL slaset( 'U', m-1, m-1, zero, zero, u( 1, 2 ),
2366  $ ldu )
2367 *
2368 * Generate Q in A
2369 * (Workspace: need 2*M, prefer M+M*NB)
2370 *
2371  CALL sorglq( m, n, m, a, lda, work( itau ),
2372  $ work( iwork ), lwork-iwork+1, ierr )
2373  ie = itau
2374  itauq = ie + m
2375  itaup = itauq + m
2376  iwork = itaup + m
2377 *
2378 * Bidiagonalize L in U
2379 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2380 *
2381  CALL sgebrd( m, m, u, ldu, s, work( ie ),
2382  $ work( itauq ), work( itaup ),
2383  $ work( iwork ), lwork-iwork+1, ierr )
2384 *
2385 * Multiply right vectors bidiagonalizing L by Q in A
2386 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
2387 *
2388  CALL sormbr( 'P', 'L', 'T', m, n, m, u, ldu,
2389  $ work( itaup ), a, lda, work( iwork ),
2390  $ lwork-iwork+1, ierr )
2391 *
2392 * Generate left vectors bidiagonalizing L in U
2393 * (Workspace: need 4*M, prefer 3*M+M*NB)
2394 *
2395  CALL sorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
2396  $ work( iwork ), lwork-iwork+1, ierr )
2397  iwork = ie + m
2398 *
2399 * Perform bidiagonal QR iteration, computing left
2400 * singular vectors of A in U and computing right
2401 * singular vectors of A in A
2402 * (Workspace: need BDSPAC)
2403 *
2404  CALL sbdsqr( 'U', m, n, m, 0, s, work( ie ), a, lda,
2405  $ u, ldu, dum, 1, work( iwork ), info )
2406 *
2407  END IF
2408 *
2409  ELSE IF( wntvs ) THEN
2410 *
2411  IF( wntun ) THEN
2412 *
2413 * Path 4t(N much larger than M, JOBU='N', JOBVT='S')
2414 * M right singular vectors to be computed in VT and
2415 * no left singular vectors to be computed
2416 *
2417  IF( lwork.GE.m*m+max( 4*m, bdspac ) ) THEN
2418 *
2419 * Sufficient workspace for a fast algorithm
2420 *
2421  ir = 1
2422  IF( lwork.GE.wrkbl+lda*m ) THEN
2423 *
2424 * WORK(IR) is LDA by M
2425 *
2426  ldwrkr = lda
2427  ELSE
2428 *
2429 * WORK(IR) is M by M
2430 *
2431  ldwrkr = m
2432  END IF
2433  itau = ir + ldwrkr*m
2434  iwork = itau + m
2435 *
2436 * Compute A=L*Q
2437 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2438 *
2439  CALL sgelqf( m, n, a, lda, work( itau ),
2440  $ work( iwork ), lwork-iwork+1, ierr )
2441 *
2442 * Copy L to WORK(IR), zeroing out above it
2443 *
2444  CALL slacpy( 'L', m, m, a, lda, work( ir ),
2445  $ ldwrkr )
2446  CALL slaset( 'U', m-1, m-1, zero, zero,
2447  $ work( ir+ldwrkr ), ldwrkr )
2448 *
2449 * Generate Q in A
2450 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2451 *
2452  CALL sorglq( m, n, m, a, lda, work( itau ),
2453  $ work( iwork ), lwork-iwork+1, ierr )
2454  ie = itau
2455  itauq = ie + m
2456  itaup = itauq + m
2457  iwork = itaup + m
2458 *
2459 * Bidiagonalize L in WORK(IR)
2460 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
2461 *
2462  CALL sgebrd( m, m, work( ir ), ldwrkr, s,
2463  $ work( ie ), work( itauq ),
2464  $ work( itaup ), work( iwork ),
2465  $ lwork-iwork+1, ierr )
2466 *
2467 * Generate right vectors bidiagonalizing L in
2468 * WORK(IR)
2469 * (Workspace: need M*M+4*M, prefer M*M+3*M+(M-1)*NB)
2470 *
2471  CALL sorgbr( 'P', m, m, m, work( ir ), ldwrkr,
2472  $ work( itaup ), work( iwork ),
2473  $ lwork-iwork+1, ierr )
2474  iwork = ie + m
2475 *
2476 * Perform bidiagonal QR iteration, computing right
2477 * singular vectors of L in WORK(IR)
2478 * (Workspace: need M*M+BDSPAC)
2479 *
2480  CALL sbdsqr( 'U', m, m, 0, 0, s, work( ie ),
2481  $ work( ir ), ldwrkr, dum, 1, dum, 1,
2482  $ work( iwork ), info )
2483 *
2484 * Multiply right singular vectors of L in WORK(IR) by
2485 * Q in A, storing result in VT
2486 * (Workspace: need M*M)
2487 *
2488  CALL sgemm( 'N', 'N', m, n, m, one, work( ir ),
2489  $ ldwrkr, a, lda, zero, vt, ldvt )
2490 *
2491  ELSE
2492 *
2493 * Insufficient workspace for a fast algorithm
2494 *
2495  itau = 1
2496  iwork = itau + m
2497 *
2498 * Compute A=L*Q
2499 * (Workspace: need 2*M, prefer M+M*NB)
2500 *
2501  CALL sgelqf( m, n, a, lda, work( itau ),
2502  $ work( iwork ), lwork-iwork+1, ierr )
2503 *
2504 * Copy result to VT
2505 *
2506  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
2507 *
2508 * Generate Q in VT
2509 * (Workspace: need 2*M, prefer M+M*NB)
2510 *
2511  CALL sorglq( m, n, m, vt, ldvt, work( itau ),
2512  $ work( iwork ), lwork-iwork+1, ierr )
2513  ie = itau
2514  itauq = ie + m
2515  itaup = itauq + m
2516  iwork = itaup + m
2517 *
2518 * Zero out above L in A
2519 *
2520  CALL slaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ),
2521  $ lda )
2522 *
2523 * Bidiagonalize L in A
2524 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2525 *
2526  CALL sgebrd( m, m, a, lda, s, work( ie ),
2527  $ work( itauq ), work( itaup ),
2528  $ work( iwork ), lwork-iwork+1, ierr )
2529 *
2530 * Multiply right vectors bidiagonalizing L by Q in VT
2531 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
2532 *
2533  CALL sormbr( 'P', 'L', 'T', m, n, m, a, lda,
2534  $ work( itaup ), vt, ldvt,
2535  $ work( iwork ), lwork-iwork+1, ierr )
2536  iwork = ie + m
2537 *
2538 * Perform bidiagonal QR iteration, computing right
2539 * singular vectors of A in VT
2540 * (Workspace: need BDSPAC)
2541 *
2542  CALL sbdsqr( 'U', m, n, 0, 0, s, work( ie ), vt,
2543  $ ldvt, dum, 1, dum, 1, work( iwork ),
2544  $ info )
2545 *
2546  END IF
2547 *
2548  ELSE IF( wntuo ) THEN
2549 *
2550 * Path 5t(N much larger than M, JOBU='O', JOBVT='S')
2551 * M right singular vectors to be computed in VT and
2552 * M left singular vectors to be overwritten on A
2553 *
2554  IF( lwork.GE.2*m*m+max( 4*m, bdspac ) ) THEN
2555 *
2556 * Sufficient workspace for a fast algorithm
2557 *
2558  iu = 1
2559  IF( lwork.GE.wrkbl+2*lda*m ) THEN
2560 *
2561 * WORK(IU) is LDA by M and WORK(IR) is LDA by M
2562 *
2563  ldwrku = lda
2564  ir = iu + ldwrku*m
2565  ldwrkr = lda
2566  ELSE IF( lwork.GE.wrkbl+( lda+m )*m ) THEN
2567 *
2568 * WORK(IU) is LDA by M and WORK(IR) is M by M
2569 *
2570  ldwrku = lda
2571  ir = iu + ldwrku*m
2572  ldwrkr = m
2573  ELSE
2574 *
2575 * WORK(IU) is M by M and WORK(IR) is M by M
2576 *
2577  ldwrku = m
2578  ir = iu + ldwrku*m
2579  ldwrkr = m
2580  END IF
2581  itau = ir + ldwrkr*m
2582  iwork = itau + m
2583 *
2584 * Compute A=L*Q
2585 * (Workspace: need 2*M*M+2*M, prefer 2*M*M+M+M*NB)
2586 *
2587  CALL sgelqf( m, n, a, lda, work( itau ),
2588  $ work( iwork ), lwork-iwork+1, ierr )
2589 *
2590 * Copy L to WORK(IU), zeroing out below it
2591 *
2592  CALL slacpy( 'L', m, m, a, lda, work( iu ),
2593  $ ldwrku )
2594  CALL slaset( 'U', m-1, m-1, zero, zero,
2595  $ work( iu+ldwrku ), ldwrku )
2596 *
2597 * Generate Q in A
2598 * (Workspace: need 2*M*M+2*M, prefer 2*M*M+M+M*NB)
2599 *
2600  CALL sorglq( m, n, m, a, lda, work( itau ),
2601  $ work( iwork ), lwork-iwork+1, ierr )
2602  ie = itau
2603  itauq = ie + m
2604  itaup = itauq + m
2605  iwork = itaup + m
2606 *
2607 * Bidiagonalize L in WORK(IU), copying result to
2608 * WORK(IR)
2609 * (Workspace: need 2*M*M+4*M,
2610 * prefer 2*M*M+3*M+2*M*NB)
2611 *
2612  CALL sgebrd( m, m, work( iu ), ldwrku, s,
2613  $ work( ie ), work( itauq ),
2614  $ work( itaup ), work( iwork ),
2615  $ lwork-iwork+1, ierr )
2616  CALL slacpy( 'L', m, m, work( iu ), ldwrku,
2617  $ work( ir ), ldwrkr )
2618 *
2619 * Generate right bidiagonalizing vectors in WORK(IU)
2620 * (Workspace: need 2*M*M+4*M-1,
2621 * prefer 2*M*M+3*M+(M-1)*NB)
2622 *
2623  CALL sorgbr( 'P', m, m, m, work( iu ), ldwrku,
2624  $ work( itaup ), work( iwork ),
2625  $ lwork-iwork+1, ierr )
2626 *
2627 * Generate left bidiagonalizing vectors in WORK(IR)
2628 * (Workspace: need 2*M*M+4*M, prefer 2*M*M+3*M+M*NB)
2629 *
2630  CALL sorgbr( 'Q', m, m, m, work( ir ), ldwrkr,
2631  $ work( itauq ), work( iwork ),
2632  $ lwork-iwork+1, ierr )
2633  iwork = ie + m
2634 *
2635 * Perform bidiagonal QR iteration, computing left
2636 * singular vectors of L in WORK(IR) and computing
2637 * right singular vectors of L in WORK(IU)
2638 * (Workspace: need 2*M*M+BDSPAC)
2639 *
2640  CALL sbdsqr( 'U', m, m, m, 0, s, work( ie ),
2641  $ work( iu ), ldwrku, work( ir ),
2642  $ ldwrkr, dum, 1, work( iwork ), info )
2643 *
2644 * Multiply right singular vectors of L in WORK(IU) by
2645 * Q in A, storing result in VT
2646 * (Workspace: need M*M)
2647 *
2648  CALL sgemm( 'N', 'N', m, n, m, one, work( iu ),
2649  $ ldwrku, a, lda, zero, vt, ldvt )
2650 *
2651 * Copy left singular vectors of L to A
2652 * (Workspace: need M*M)
2653 *
2654  CALL slacpy( 'F', m, m, work( ir ), ldwrkr, a,
2655  $ lda )
2656 *
2657  ELSE
2658 *
2659 * Insufficient workspace for a fast algorithm
2660 *
2661  itau = 1
2662  iwork = itau + m
2663 *
2664 * Compute A=L*Q, copying result to VT
2665 * (Workspace: need 2*M, prefer M+M*NB)
2666 *
2667  CALL sgelqf( m, n, a, lda, work( itau ),
2668  $ work( iwork ), lwork-iwork+1, ierr )
2669  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
2670 *
2671 * Generate Q in VT
2672 * (Workspace: need 2*M, prefer M+M*NB)
2673 *
2674  CALL sorglq( m, n, m, vt, ldvt, work( itau ),
2675  $ work( iwork ), lwork-iwork+1, ierr )
2676  ie = itau
2677  itauq = ie + m
2678  itaup = itauq + m
2679  iwork = itaup + m
2680 *
2681 * Zero out above L in A
2682 *
2683  CALL slaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ),
2684  $ lda )
2685 *
2686 * Bidiagonalize L in A
2687 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2688 *
2689  CALL sgebrd( m, m, a, lda, s, work( ie ),
2690  $ work( itauq ), work( itaup ),
2691  $ work( iwork ), lwork-iwork+1, ierr )
2692 *
2693 * Multiply right vectors bidiagonalizing L by Q in VT
2694 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
2695 *
2696  CALL sormbr( 'P', 'L', 'T', m, n, m, a, lda,
2697  $ work( itaup ), vt, ldvt,
2698  $ work( iwork ), lwork-iwork+1, ierr )
2699 *
2700 * Generate left bidiagonalizing vectors of L in A
2701 * (Workspace: need 4*M, prefer 3*M+M*NB)
2702 *
2703  CALL sorgbr( 'Q', m, m, m, a, lda, work( itauq ),
2704  $ work( iwork ), lwork-iwork+1, ierr )
2705  iwork = ie + m
2706 *
2707 * Perform bidiagonal QR iteration, compute left
2708 * singular vectors of A in A and compute right
2709 * singular vectors of A in VT
2710 * (Workspace: need BDSPAC)
2711 *
2712  CALL sbdsqr( 'U', m, n, m, 0, s, work( ie ), vt,
2713  $ ldvt, a, lda, dum, 1, work( iwork ),
2714  $ info )
2715 *
2716  END IF
2717 *
2718  ELSE IF( wntuas ) THEN
2719 *
2720 * Path 6t(N much larger than M, JOBU='S' or 'A',
2721 * JOBVT='S')
2722 * M right singular vectors to be computed in VT and
2723 * M left singular vectors to be computed in U
2724 *
2725  IF( lwork.GE.m*m+max( 4*m, bdspac ) ) THEN
2726 *
2727 * Sufficient workspace for a fast algorithm
2728 *
2729  iu = 1
2730  IF( lwork.GE.wrkbl+lda*m ) THEN
2731 *
2732 * WORK(IU) is LDA by N
2733 *
2734  ldwrku = lda
2735  ELSE
2736 *
2737 * WORK(IU) is LDA by M
2738 *
2739  ldwrku = m
2740  END IF
2741  itau = iu + ldwrku*m
2742  iwork = itau + m
2743 *
2744 * Compute A=L*Q
2745 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2746 *
2747  CALL sgelqf( m, n, a, lda, work( itau ),
2748  $ work( iwork ), lwork-iwork+1, ierr )
2749 *
2750 * Copy L to WORK(IU), zeroing out above it
2751 *
2752  CALL slacpy( 'L', m, m, a, lda, work( iu ),
2753  $ ldwrku )
2754  CALL slaset( 'U', m-1, m-1, zero, zero,
2755  $ work( iu+ldwrku ), ldwrku )
2756 *
2757 * Generate Q in A
2758 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2759 *
2760  CALL sorglq( m, n, m, a, lda, work( itau ),
2761  $ work( iwork ), lwork-iwork+1, ierr )
2762  ie = itau
2763  itauq = ie + m
2764  itaup = itauq + m
2765  iwork = itaup + m
2766 *
2767 * Bidiagonalize L in WORK(IU), copying result to U
2768 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
2769 *
2770  CALL sgebrd( m, m, work( iu ), ldwrku, s,
2771  $ work( ie ), work( itauq ),
2772  $ work( itaup ), work( iwork ),
2773  $ lwork-iwork+1, ierr )
2774  CALL slacpy( 'L', m, m, work( iu ), ldwrku, u,
2775  $ ldu )
2776 *
2777 * Generate right bidiagonalizing vectors in WORK(IU)
2778 * (Workspace: need M*M+4*M-1,
2779 * prefer M*M+3*M+(M-1)*NB)
2780 *
2781  CALL sorgbr( 'P', m, m, m, work( iu ), ldwrku,
2782  $ work( itaup ), work( iwork ),
2783  $ lwork-iwork+1, ierr )
2784 *
2785 * Generate left bidiagonalizing vectors in U
2786 * (Workspace: need M*M+4*M, prefer M*M+3*M+M*NB)
2787 *
2788  CALL sorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
2789  $ work( iwork ), lwork-iwork+1, ierr )
2790  iwork = ie + m
2791 *
2792 * Perform bidiagonal QR iteration, computing left
2793 * singular vectors of L in U and computing right
2794 * singular vectors of L in WORK(IU)
2795 * (Workspace: need M*M+BDSPAC)
2796 *
2797  CALL sbdsqr( 'U', m, m, m, 0, s, work( ie ),
2798  $ work( iu ), ldwrku, u, ldu, dum, 1,
2799  $ work( iwork ), info )
2800 *
2801 * Multiply right singular vectors of L in WORK(IU) by
2802 * Q in A, storing result in VT
2803 * (Workspace: need M*M)
2804 *
2805  CALL sgemm( 'N', 'N', m, n, m, one, work( iu ),
2806  $ ldwrku, a, lda, zero, vt, ldvt )
2807 *
2808  ELSE
2809 *
2810 * Insufficient workspace for a fast algorithm
2811 *
2812  itau = 1
2813  iwork = itau + m
2814 *
2815 * Compute A=L*Q, copying result to VT
2816 * (Workspace: need 2*M, prefer M+M*NB)
2817 *
2818  CALL sgelqf( m, n, a, lda, work( itau ),
2819  $ work( iwork ), lwork-iwork+1, ierr )
2820  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
2821 *
2822 * Generate Q in VT
2823 * (Workspace: need 2*M, prefer M+M*NB)
2824 *
2825  CALL sorglq( m, n, m, vt, ldvt, work( itau ),
2826  $ work( iwork ), lwork-iwork+1, ierr )
2827 *
2828 * Copy L to U, zeroing out above it
2829 *
2830  CALL slacpy( 'L', m, m, a, lda, u, ldu )
2831  CALL slaset( 'U', m-1, m-1, zero, zero, u( 1, 2 ),
2832  $ ldu )
2833  ie = itau
2834  itauq = ie + m
2835  itaup = itauq + m
2836  iwork = itaup + m
2837 *
2838 * Bidiagonalize L in U
2839 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2840 *
2841  CALL sgebrd( m, m, u, ldu, s, work( ie ),
2842  $ work( itauq ), work( itaup ),
2843  $ work( iwork ), lwork-iwork+1, ierr )
2844 *
2845 * Multiply right bidiagonalizing vectors in U by Q
2846 * in VT
2847 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
2848 *
2849  CALL sormbr( 'P', 'L', 'T', m, n, m, u, ldu,
2850  $ work( itaup ), vt, ldvt,
2851  $ work( iwork ), lwork-iwork+1, ierr )
2852 *
2853 * Generate left bidiagonalizing vectors in U
2854 * (Workspace: need 4*M, prefer 3*M+M*NB)
2855 *
2856  CALL sorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
2857  $ work( iwork ), lwork-iwork+1, ierr )
2858  iwork = ie + m
2859 *
2860 * Perform bidiagonal QR iteration, computing left
2861 * singular vectors of A in U and computing right
2862 * singular vectors of A in VT
2863 * (Workspace: need BDSPAC)
2864 *
2865  CALL sbdsqr( 'U', m, n, m, 0, s, work( ie ), vt,
2866  $ ldvt, u, ldu, dum, 1, work( iwork ),
2867  $ info )
2868 *
2869  END IF
2870 *
2871  END IF
2872 *
2873  ELSE IF( wntva ) THEN
2874 *
2875  IF( wntun ) THEN
2876 *
2877 * Path 7t(N much larger than M, JOBU='N', JOBVT='A')
2878 * N right singular vectors to be computed in VT and
2879 * no left singular vectors to be computed
2880 *
2881  IF( lwork.GE.m*m+max( n+m, 4*m, bdspac ) ) THEN
2882 *
2883 * Sufficient workspace for a fast algorithm
2884 *
2885  ir = 1
2886  IF( lwork.GE.wrkbl+lda*m ) THEN
2887 *
2888 * WORK(IR) is LDA by M
2889 *
2890  ldwrkr = lda
2891  ELSE
2892 *
2893 * WORK(IR) is M by M
2894 *
2895  ldwrkr = m
2896  END IF
2897  itau = ir + ldwrkr*m
2898  iwork = itau + m
2899 *
2900 * Compute A=L*Q, copying result to VT
2901 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
2902 *
2903  CALL sgelqf( m, n, a, lda, work( itau ),
2904  $ work( iwork ), lwork-iwork+1, ierr )
2905  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
2906 *
2907 * Copy L to WORK(IR), zeroing out above it
2908 *
2909  CALL slacpy( 'L', m, m, a, lda, work( ir ),
2910  $ ldwrkr )
2911  CALL slaset( 'U', m-1, m-1, zero, zero,
2912  $ work( ir+ldwrkr ), ldwrkr )
2913 *
2914 * Generate Q in VT
2915 * (Workspace: need M*M+M+N, prefer M*M+M+N*NB)
2916 *
2917  CALL sorglq( n, n, m, vt, ldvt, work( itau ),
2918  $ work( iwork ), lwork-iwork+1, ierr )
2919  ie = itau
2920  itauq = ie + m
2921  itaup = itauq + m
2922  iwork = itaup + m
2923 *
2924 * Bidiagonalize L in WORK(IR)
2925 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
2926 *
2927  CALL sgebrd( m, m, work( ir ), ldwrkr, s,
2928  $ work( ie ), work( itauq ),
2929  $ work( itaup ), work( iwork ),
2930  $ lwork-iwork+1, ierr )
2931 *
2932 * Generate right bidiagonalizing vectors in WORK(IR)
2933 * (Workspace: need M*M+4*M-1,
2934 * prefer M*M+3*M+(M-1)*NB)
2935 *
2936  CALL sorgbr( 'P', m, m, m, work( ir ), ldwrkr,
2937  $ work( itaup ), work( iwork ),
2938  $ lwork-iwork+1, ierr )
2939  iwork = ie + m
2940 *
2941 * Perform bidiagonal QR iteration, computing right
2942 * singular vectors of L in WORK(IR)
2943 * (Workspace: need M*M+BDSPAC)
2944 *
2945  CALL sbdsqr( 'U', m, m, 0, 0, s, work( ie ),
2946  $ work( ir ), ldwrkr, dum, 1, dum, 1,
2947  $ work( iwork ), info )
2948 *
2949 * Multiply right singular vectors of L in WORK(IR) by
2950 * Q in VT, storing result in A
2951 * (Workspace: need M*M)
2952 *
2953  CALL sgemm( 'N', 'N', m, n, m, one, work( ir ),
2954  $ ldwrkr, vt, ldvt, zero, a, lda )
2955 *
2956 * Copy right singular vectors of A from A to VT
2957 *
2958  CALL slacpy( 'F', m, n, a, lda, vt, ldvt )
2959 *
2960  ELSE
2961 *
2962 * Insufficient workspace for a fast algorithm
2963 *
2964  itau = 1
2965  iwork = itau + m
2966 *
2967 * Compute A=L*Q, copying result to VT
2968 * (Workspace: need 2*M, prefer M+M*NB)
2969 *
2970  CALL sgelqf( m, n, a, lda, work( itau ),
2971  $ work( iwork ), lwork-iwork+1, ierr )
2972  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
2973 *
2974 * Generate Q in VT
2975 * (Workspace: need M+N, prefer M+N*NB)
2976 *
2977  CALL sorglq( n, n, m, vt, ldvt, work( itau ),
2978  $ work( iwork ), lwork-iwork+1, ierr )
2979  ie = itau
2980  itauq = ie + m
2981  itaup = itauq + m
2982  iwork = itaup + m
2983 *
2984 * Zero out above L in A
2985 *
2986  CALL slaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ),
2987  $ lda )
2988 *
2989 * Bidiagonalize L in A
2990 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
2991 *
2992  CALL sgebrd( m, m, a, lda, s, work( ie ),
2993  $ work( itauq ), work( itaup ),
2994  $ work( iwork ), lwork-iwork+1, ierr )
2995 *
2996 * Multiply right bidiagonalizing vectors in A by Q
2997 * in VT
2998 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
2999 *
3000  CALL sormbr( 'P', 'L', 'T', m, n, m, a, lda,
3001  $ work( itaup ), vt, ldvt,
3002  $ work( iwork ), lwork-iwork+1, ierr )
3003  iwork = ie + m
3004 *
3005 * Perform bidiagonal QR iteration, computing right
3006 * singular vectors of A in VT
3007 * (Workspace: need BDSPAC)
3008 *
3009  CALL sbdsqr( 'U', m, n, 0, 0, s, work( ie ), vt,
3010  $ ldvt, dum, 1, dum, 1, work( iwork ),
3011  $ info )
3012 *
3013  END IF
3014 *
3015  ELSE IF( wntuo ) THEN
3016 *
3017 * Path 8t(N much larger than M, JOBU='O', JOBVT='A')
3018 * N right singular vectors to be computed in VT and
3019 * M left singular vectors to be overwritten on A
3020 *
3021  IF( lwork.GE.2*m*m+max( n+m, 4*m, bdspac ) ) THEN
3022 *
3023 * Sufficient workspace for a fast algorithm
3024 *
3025  iu = 1
3026  IF( lwork.GE.wrkbl+2*lda*m ) THEN
3027 *
3028 * WORK(IU) is LDA by M and WORK(IR) is LDA by M
3029 *
3030  ldwrku = lda
3031  ir = iu + ldwrku*m
3032  ldwrkr = lda
3033  ELSE IF( lwork.GE.wrkbl+( lda+m )*m ) THEN
3034 *
3035 * WORK(IU) is LDA by M and WORK(IR) is M by M
3036 *
3037  ldwrku = lda
3038  ir = iu + ldwrku*m
3039  ldwrkr = m
3040  ELSE
3041 *
3042 * WORK(IU) is M by M and WORK(IR) is M by M
3043 *
3044  ldwrku = m
3045  ir = iu + ldwrku*m
3046  ldwrkr = m
3047  END IF
3048  itau = ir + ldwrkr*m
3049  iwork = itau + m
3050 *
3051 * Compute A=L*Q, copying result to VT
3052 * (Workspace: need 2*M*M+2*M, prefer 2*M*M+M+M*NB)
3053 *
3054  CALL sgelqf( m, n, a, lda, work( itau ),
3055  $ work( iwork ), lwork-iwork+1, ierr )
3056  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
3057 *
3058 * Generate Q in VT
3059 * (Workspace: need 2*M*M+M+N, prefer 2*M*M+M+N*NB)
3060 *
3061  CALL sorglq( n, n, m, vt, ldvt, work( itau ),
3062  $ work( iwork ), lwork-iwork+1, ierr )
3063 *
3064 * Copy L to WORK(IU), zeroing out above it
3065 *
3066  CALL slacpy( 'L', m, m, a, lda, work( iu ),
3067  $ ldwrku )
3068  CALL slaset( 'U', m-1, m-1, zero, zero,
3069  $ work( iu+ldwrku ), ldwrku )
3070  ie = itau
3071  itauq = ie + m
3072  itaup = itauq + m
3073  iwork = itaup + m
3074 *
3075 * Bidiagonalize L in WORK(IU), copying result to
3076 * WORK(IR)
3077 * (Workspace: need 2*M*M+4*M,
3078 * prefer 2*M*M+3*M+2*M*NB)
3079 *
3080  CALL sgebrd( m, m, work( iu ), ldwrku, s,
3081  $ work( ie ), work( itauq ),
3082  $ work( itaup ), work( iwork ),
3083  $ lwork-iwork+1, ierr )
3084  CALL slacpy( 'L', m, m, work( iu ), ldwrku,
3085  $ work( ir ), ldwrkr )
3086 *
3087 * Generate right bidiagonalizing vectors in WORK(IU)
3088 * (Workspace: need 2*M*M+4*M-1,
3089 * prefer 2*M*M+3*M+(M-1)*NB)
3090 *
3091  CALL sorgbr( 'P', m, m, m, work( iu ), ldwrku,
3092  $ work( itaup ), work( iwork ),
3093  $ lwork-iwork+1, ierr )
3094 *
3095 * Generate left bidiagonalizing vectors in WORK(IR)
3096 * (Workspace: need 2*M*M+4*M, prefer 2*M*M+3*M+M*NB)
3097 *
3098  CALL sorgbr( 'Q', m, m, m, work( ir ), ldwrkr,
3099  $ work( itauq ), work( iwork ),
3100  $ lwork-iwork+1, ierr )
3101  iwork = ie + m
3102 *
3103 * Perform bidiagonal QR iteration, computing left
3104 * singular vectors of L in WORK(IR) and computing
3105 * right singular vectors of L in WORK(IU)
3106 * (Workspace: need 2*M*M+BDSPAC)
3107 *
3108  CALL sbdsqr( 'U', m, m, m, 0, s, work( ie ),
3109  $ work( iu ), ldwrku, work( ir ),
3110  $ ldwrkr, dum, 1, work( iwork ), info )
3111 *
3112 * Multiply right singular vectors of L in WORK(IU) by
3113 * Q in VT, storing result in A
3114 * (Workspace: need M*M)
3115 *
3116  CALL sgemm( 'N', 'N', m, n, m, one, work( iu ),
3117  $ ldwrku, vt, ldvt, zero, a, lda )
3118 *
3119 * Copy right singular vectors of A from A to VT
3120 *
3121  CALL slacpy( 'F', m, n, a, lda, vt, ldvt )
3122 *
3123 * Copy left singular vectors of A from WORK(IR) to A
3124 *
3125  CALL slacpy( 'F', m, m, work( ir ), ldwrkr, a,
3126  $ lda )
3127 *
3128  ELSE
3129 *
3130 * Insufficient workspace for a fast algorithm
3131 *
3132  itau = 1
3133  iwork = itau + m
3134 *
3135 * Compute A=L*Q, copying result to VT
3136 * (Workspace: need 2*M, prefer M+M*NB)
3137 *
3138  CALL sgelqf( m, n, a, lda, work( itau ),
3139  $ work( iwork ), lwork-iwork+1, ierr )
3140  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
3141 *
3142 * Generate Q in VT
3143 * (Workspace: need M+N, prefer M+N*NB)
3144 *
3145  CALL sorglq( n, n, m, vt, ldvt, work( itau ),
3146  $ work( iwork ), lwork-iwork+1, ierr )
3147  ie = itau
3148  itauq = ie + m
3149  itaup = itauq + m
3150  iwork = itaup + m
3151 *
3152 * Zero out above L in A
3153 *
3154  CALL slaset( 'U', m-1, m-1, zero, zero, a( 1, 2 ),
3155  $ lda )
3156 *
3157 * Bidiagonalize L in A
3158 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
3159 *
3160  CALL sgebrd( m, m, a, lda, s, work( ie ),
3161  $ work( itauq ), work( itaup ),
3162  $ work( iwork ), lwork-iwork+1, ierr )
3163 *
3164 * Multiply right bidiagonalizing vectors in A by Q
3165 * in VT
3166 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
3167 *
3168  CALL sormbr( 'P', 'L', 'T', m, n, m, a, lda,
3169  $ work( itaup ), vt, ldvt,
3170  $ work( iwork ), lwork-iwork+1, ierr )
3171 *
3172 * Generate left bidiagonalizing vectors in A
3173 * (Workspace: need 4*M, prefer 3*M+M*NB)
3174 *
3175  CALL sorgbr( 'Q', m, m, m, a, lda, work( itauq ),
3176  $ work( iwork ), lwork-iwork+1, ierr )
3177  iwork = ie + m
3178 *
3179 * Perform bidiagonal QR iteration, computing left
3180 * singular vectors of A in A and computing right
3181 * singular vectors of A in VT
3182 * (Workspace: need BDSPAC)
3183 *
3184  CALL sbdsqr( 'U', m, n, m, 0, s, work( ie ), vt,
3185  $ ldvt, a, lda, dum, 1, work( iwork ),
3186  $ info )
3187 *
3188  END IF
3189 *
3190  ELSE IF( wntuas ) THEN
3191 *
3192 * Path 9t(N much larger than M, JOBU='S' or 'A',
3193 * JOBVT='A')
3194 * N right singular vectors to be computed in VT and
3195 * M left singular vectors to be computed in U
3196 *
3197  IF( lwork.GE.m*m+max( n+m, 4*m, bdspac ) ) THEN
3198 *
3199 * Sufficient workspace for a fast algorithm
3200 *
3201  iu = 1
3202  IF( lwork.GE.wrkbl+lda*m ) THEN
3203 *
3204 * WORK(IU) is LDA by M
3205 *
3206  ldwrku = lda
3207  ELSE
3208 *
3209 * WORK(IU) is M by M
3210 *
3211  ldwrku = m
3212  END IF
3213  itau = iu + ldwrku*m
3214  iwork = itau + m
3215 *
3216 * Compute A=L*Q, copying result to VT
3217 * (Workspace: need M*M+2*M, prefer M*M+M+M*NB)
3218 *
3219  CALL sgelqf( m, n, a, lda, work( itau ),
3220  $ work( iwork ), lwork-iwork+1, ierr )
3221  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
3222 *
3223 * Generate Q in VT
3224 * (Workspace: need M*M+M+N, prefer M*M+M+N*NB)
3225 *
3226  CALL sorglq( n, n, m, vt, ldvt, work( itau ),
3227  $ work( iwork ), lwork-iwork+1, ierr )
3228 *
3229 * Copy L to WORK(IU), zeroing out above it
3230 *
3231  CALL slacpy( 'L', m, m, a, lda, work( iu ),
3232  $ ldwrku )
3233  CALL slaset( 'U', m-1, m-1, zero, zero,
3234  $ work( iu+ldwrku ), ldwrku )
3235  ie = itau
3236  itauq = ie + m
3237  itaup = itauq + m
3238  iwork = itaup + m
3239 *
3240 * Bidiagonalize L in WORK(IU), copying result to U
3241 * (Workspace: need M*M+4*M, prefer M*M+3*M+2*M*NB)
3242 *
3243  CALL sgebrd( m, m, work( iu ), ldwrku, s,
3244  $ work( ie ), work( itauq ),
3245  $ work( itaup ), work( iwork ),
3246  $ lwork-iwork+1, ierr )
3247  CALL slacpy( 'L', m, m, work( iu ), ldwrku, u,
3248  $ ldu )
3249 *
3250 * Generate right bidiagonalizing vectors in WORK(IU)
3251 * (Workspace: need M*M+4*M, prefer M*M+3*M+(M-1)*NB)
3252 *
3253  CALL sorgbr( 'P', m, m, m, work( iu ), ldwrku,
3254  $ work( itaup ), work( iwork ),
3255  $ lwork-iwork+1, ierr )
3256 *
3257 * Generate left bidiagonalizing vectors in U
3258 * (Workspace: need M*M+4*M, prefer M*M+3*M+M*NB)
3259 *
3260  CALL sorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
3261  $ work( iwork ), lwork-iwork+1, ierr )
3262  iwork = ie + m
3263 *
3264 * Perform bidiagonal QR iteration, computing left
3265 * singular vectors of L in U and computing right
3266 * singular vectors of L in WORK(IU)
3267 * (Workspace: need M*M+BDSPAC)
3268 *
3269  CALL sbdsqr( 'U', m, m, m, 0, s, work( ie ),
3270  $ work( iu ), ldwrku, u, ldu, dum, 1,
3271  $ work( iwork ), info )
3272 *
3273 * Multiply right singular vectors of L in WORK(IU) by
3274 * Q in VT, storing result in A
3275 * (Workspace: need M*M)
3276 *
3277  CALL sgemm( 'N', 'N', m, n, m, one, work( iu ),
3278  $ ldwrku, vt, ldvt, zero, a, lda )
3279 *
3280 * Copy right singular vectors of A from A to VT
3281 *
3282  CALL slacpy( 'F', m, n, a, lda, vt, ldvt )
3283 *
3284  ELSE
3285 *
3286 * Insufficient workspace for a fast algorithm
3287 *
3288  itau = 1
3289  iwork = itau + m
3290 *
3291 * Compute A=L*Q, copying result to VT
3292 * (Workspace: need 2*M, prefer M+M*NB)
3293 *
3294  CALL sgelqf( m, n, a, lda, work( itau ),
3295  $ work( iwork ), lwork-iwork+1, ierr )
3296  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
3297 *
3298 * Generate Q in VT
3299 * (Workspace: need M+N, prefer M+N*NB)
3300 *
3301  CALL sorglq( n, n, m, vt, ldvt, work( itau ),
3302  $ work( iwork ), lwork-iwork+1, ierr )
3303 *
3304 * Copy L to U, zeroing out above it
3305 *
3306  CALL slacpy( 'L', m, m, a, lda, u, ldu )
3307  CALL slaset( 'U', m-1, m-1, zero, zero, u( 1, 2 ),
3308  $ ldu )
3309  ie = itau
3310  itauq = ie + m
3311  itaup = itauq + m
3312  iwork = itaup + m
3313 *
3314 * Bidiagonalize L in U
3315 * (Workspace: need 4*M, prefer 3*M+2*M*NB)
3316 *
3317  CALL sgebrd( m, m, u, ldu, s, work( ie ),
3318  $ work( itauq ), work( itaup ),
3319  $ work( iwork ), lwork-iwork+1, ierr )
3320 *
3321 * Multiply right bidiagonalizing vectors in U by Q
3322 * in VT
3323 * (Workspace: need 3*M+N, prefer 3*M+N*NB)
3324 *
3325  CALL sormbr( 'P', 'L', 'T', m, n, m, u, ldu,
3326  $ work( itaup ), vt, ldvt,
3327  $ work( iwork ), lwork-iwork+1, ierr )
3328 *
3329 * Generate left bidiagonalizing vectors in U
3330 * (Workspace: need 4*M, prefer 3*M+M*NB)
3331 *
3332  CALL sorgbr( 'Q', m, m, m, u, ldu, work( itauq ),
3333  $ work( iwork ), lwork-iwork+1, ierr )
3334  iwork = ie + m
3335 *
3336 * Perform bidiagonal QR iteration, computing left
3337 * singular vectors of A in U and computing right
3338 * singular vectors of A in VT
3339 * (Workspace: need BDSPAC)
3340 *
3341  CALL sbdsqr( 'U', m, n, m, 0, s, work( ie ), vt,
3342  $ ldvt, u, ldu, dum, 1, work( iwork ),
3343  $ info )
3344 *
3345  END IF
3346 *
3347  END IF
3348 *
3349  END IF
3350 *
3351  ELSE
3352 *
3353 * N .LT. MNTHR
3354 *
3355 * Path 10t(N greater than M, but not much larger)
3356 * Reduce to bidiagonal form without LQ decomposition
3357 *
3358  ie = 1
3359  itauq = ie + m
3360  itaup = itauq + m
3361  iwork = itaup + m
3362 *
3363 * Bidiagonalize A
3364 * (Workspace: need 3*M+N, prefer 3*M+(M+N)*NB)
3365 *
3366  CALL sgebrd( m, n, a, lda, s, work( ie ), work( itauq ),
3367  $ work( itaup ), work( iwork ), lwork-iwork+1,
3368  $ ierr )
3369  IF( wntuas ) THEN
3370 *
3371 * If left singular vectors desired in U, copy result to U
3372 * and generate left bidiagonalizing vectors in U
3373 * (Workspace: need 4*M-1, prefer 3*M+(M-1)*NB)
3374 *
3375  CALL slacpy( 'L', m, m, a, lda, u, ldu )
3376  CALL sorgbr( 'Q', m, m, n, u, ldu, work( itauq ),
3377  $ work( iwork ), lwork-iwork+1, ierr )
3378  END IF
3379  IF( wntvas ) THEN
3380 *
3381 * If right singular vectors desired in VT, copy result to
3382 * VT and generate right bidiagonalizing vectors in VT
3383 * (Workspace: need 3*M+NRVT, prefer 3*M+NRVT*NB)
3384 *
3385  CALL slacpy( 'U', m, n, a, lda, vt, ldvt )
3386  IF( wntva )
3387  $ nrvt = n
3388  IF( wntvs )
3389  $ nrvt = m
3390  CALL sorgbr( 'P', nrvt, n, m, vt, ldvt, work( itaup ),
3391  $ work( iwork ), lwork-iwork+1, ierr )
3392  END IF
3393  IF( wntuo ) THEN
3394 *
3395 * If left singular vectors desired in A, generate left
3396 * bidiagonalizing vectors in A
3397 * (Workspace: need 4*M-1, prefer 3*M+(M-1)*NB)
3398 *
3399  CALL sorgbr( 'Q', m, m, n, a, lda, work( itauq ),
3400  $ work( iwork ), lwork-iwork+1, ierr )
3401  END IF
3402  IF( wntvo ) THEN
3403 *
3404 * If right singular vectors desired in A, generate right
3405 * bidiagonalizing vectors in A
3406 * (Workspace: need 4*M, prefer 3*M+M*NB)
3407 *
3408  CALL sorgbr( 'P', m, n, m, a, lda, work( itaup ),
3409  $ work( iwork ), lwork-iwork+1, ierr )
3410  END IF
3411  iwork = ie + m
3412  IF( wntuas .OR. wntuo )
3413  $ nru = m
3414  IF( wntun )
3415  $ nru = 0
3416  IF( wntvas .OR. wntvo )
3417  $ ncvt = n
3418  IF( wntvn )
3419  $ ncvt = 0
3420  IF( ( .NOT.wntuo ) .AND. ( .NOT.wntvo ) ) THEN
3421 *
3422 * Perform bidiagonal QR iteration, if desired, computing
3423 * left singular vectors in U and computing right singular
3424 * vectors in VT
3425 * (Workspace: need BDSPAC)
3426 *
3427  CALL sbdsqr( 'L', m, ncvt, nru, 0, s, work( ie ), vt,
3428  $ ldvt, u, ldu, dum, 1, work( iwork ), info )
3429  ELSE IF( ( .NOT.wntuo ) .AND. wntvo ) THEN
3430 *
3431 * Perform bidiagonal QR iteration, if desired, computing
3432 * left singular vectors in U and computing right singular
3433 * vectors in A
3434 * (Workspace: need BDSPAC)
3435 *
3436  CALL sbdsqr( 'L', m, ncvt, nru, 0, s, work( ie ), a, lda,
3437  $ u, ldu, dum, 1, work( iwork ), info )
3438  ELSE
3439 *
3440 * Perform bidiagonal QR iteration, if desired, computing
3441 * left singular vectors in A and computing right singular
3442 * vectors in VT
3443 * (Workspace: need BDSPAC)
3444 *
3445  CALL sbdsqr( 'L', m, ncvt, nru, 0, s, work( ie ), vt,
3446  $ ldvt, a, lda, dum, 1, work( iwork ), info )
3447  END IF
3448 *
3449  END IF
3450 *
3451  END IF
3452 *
3453 * If SBDSQR failed to converge, copy unconverged superdiagonals
3454 * to WORK( 2:MINMN )
3455 *
3456  IF( info.NE.0 ) THEN
3457  IF( ie.GT.2 ) THEN
3458  DO 50 i = 1, minmn - 1
3459  work( i+1 ) = work( i+ie-1 )
3460  50 CONTINUE
3461  END IF
3462  IF( ie.LT.2 ) THEN
3463  DO 60 i = minmn - 1, 1, -1
3464  work( i+1 ) = work( i+ie-1 )
3465  60 CONTINUE
3466  END IF
3467  END IF
3468 *
3469 * Undo scaling if necessary
3470 *
3471  IF( iscl.EQ.1 ) THEN
3472  IF( anrm.GT.bignum )
3473  $ CALL slascl( 'G', 0, 0, bignum, anrm, minmn, 1, s, minmn,
3474  $ ierr )
3475  IF( info.NE.0 .AND. anrm.GT.bignum )
3476  $ CALL slascl( 'G', 0, 0, bignum, anrm, minmn-1, 1, work( 2 ),
3477  $ minmn, ierr )
3478  IF( anrm.LT.smlnum )
3479  $ CALL slascl( 'G', 0, 0, smlnum, anrm, minmn, 1, s, minmn,
3480  $ ierr )
3481  IF( info.NE.0 .AND. anrm.LT.smlnum )
3482  $ CALL slascl( 'G', 0, 0, smlnum, anrm, minmn-1, 1, work( 2 ),
3483  $ minmn, ierr )
3484  END IF
3485 *
3486 * Return optimal workspace in WORK(1)
3487 *
3488  work( 1 ) = maxwrk
3489 *
3490  RETURN
3491 *
3492 * End of SGESVD
3493 *
real function slamch(CMACH)
SLAMCH
Definition: slamch.f:69
subroutine sgebrd(M, N, A, LDA, D, E, TAUQ, TAUP, WORK, LWORK, INFO)
SGEBRD
Definition: sgebrd.f:207
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine sbdsqr(UPLO, N, NCVT, NRU, NCC, D, E, VT, LDVT, U, LDU, C, LDC, WORK, INFO)
SBDSQR
Definition: sbdsqr.f:232
subroutine sorgbr(VECT, M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
SORGBR
Definition: sorgbr.f:159
subroutine sgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
SGELQF
Definition: sgelqf.f:137
subroutine sgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
SGEQRF
Definition: sgeqrf.f:138
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine sgemm(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
SGEMM
Definition: sgemm.f:189
subroutine slacpy(UPLO, M, N, A, LDA, B, LDB)
SLACPY copies all or part of one two-dimensional array to another.
Definition: slacpy.f:105
integer function ilaenv(ISPEC, NAME, OPTS, N1, N2, N3, N4)
Definition: tstiee.f:83
subroutine slascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: slascl.f:141
subroutine sorgqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
SORGQR
Definition: sorgqr.f:130
subroutine sormbr(VECT, SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
SORMBR
Definition: sormbr.f:198
real function slange(NORM, M, N, A, LDA, WORK)
SLANGE returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value ...
Definition: slange.f:116
subroutine sorglq(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
SORGLQ
Definition: sorglq.f:129
subroutine slaset(UPLO, M, N, ALPHA, BETA, A, LDA)
SLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: slaset.f:112

Here is the call graph for this function:

Here is the caller graph for this function:

subroutine sgesvdx ( character  JOBU,
character  JOBVT,
character  RANGE,
integer  M,
integer  N,
real, dimension( lda, * )  A,
integer  LDA,
real  VL,
real  VU,
integer  IL,
integer  IU,
integer  NS,
real, dimension( * )  S,
real, dimension( ldu, * )  U,
integer  LDU,
real, dimension( ldvt, * )  VT,
integer  LDVT,
real, dimension( * )  WORK,
integer  LWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

SGESVDX computes the singular value decomposition (SVD) for GE matrices

Download SGESVDX + dependencies [TGZ] [ZIP] [TXT]

Purpose:
  SGESVDX computes the singular value decomposition (SVD) of a real
  M-by-N matrix A, optionally computing the left and/or right singular
  vectors. The SVD is written
 
      A = U * SIGMA * transpose(V)
 
  where SIGMA is an M-by-N matrix which is zero except for its
  min(m,n) diagonal elements, U is an M-by-M orthogonal matrix, and
  V is an N-by-N orthogonal matrix.  The diagonal elements of SIGMA
  are the singular values of A; they are real and non-negative, and
  are returned in descending order.  The first min(m,n) columns of
  U and V are the left and right singular vectors of A.
 
  SGESVDX uses an eigenvalue problem for obtaining the SVD, which 
  allows for the computation of a subset of singular values and 
  vectors. See SBDSVDX for details.
 
  Note that the routine returns V**T, not V.
Parameters
[in]JOBU
          JOBU is CHARACTER*1
          Specifies options for computing all or part of the matrix U:
          = 'V':  the first min(m,n) columns of U (the left singular
                  vectors) or as specified by RANGE are returned in 
                  the array U;
          = 'N':  no columns of U (no left singular vectors) are
                  computed.
[in]JOBVT
          JOBVT is CHARACTER*1
           Specifies options for computing all or part of the matrix
           V**T:
           = 'V':  the first min(m,n) rows of V**T (the right singular
                   vectors) or as specified by RANGE are returned in 
                   the array VT;
           = 'N':  no rows of V**T (no right singular vectors) are
                   computed.
[in]RANGE
          RANGE is CHARACTER*1
          = 'A': all singular values will be found.
          = 'V': all singular values in the half-open interval (VL,VU]
                 will be found.
          = 'I': the IL-th through IU-th singular values will be found. 
[in]M