LAPACK  3.6.0
LAPACK: Linear Algebra PACKage
Collaboration diagram for complex:

Functions

subroutine cgejsv (JOBA, JOBU, JOBV, JOBR, JOBT, JOBP, M, N, A, LDA, SVA, U, LDU, V, LDV, CWORK, LWORK, RWORK, LRWORK, IWORK, INFO)
 CGEJSV More...
 
subroutine cgesdd (JOBZ, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, LWORK, RWORK, IWORK, INFO)
 CGESDD More...
 
subroutine cgesvd (JOBU, JOBVT, M, N, A, LDA, S, U, LDU, VT, LDVT, WORK, LWORK, RWORK, INFO)
  CGESVD computes the singular value decomposition (SVD) for GE matrices More...
 
subroutine cgesvdx (JOBU, JOBVT, RANGE, M, N, A, LDA, VL, VU, IL, IU, NS, S, U, LDU, VT, LDVT, WORK, LWORK, RWORK, IWORK, INFO)
  CGESVDX computes the singular value decomposition (SVD) for GE matrices More...
 

Detailed Description

This is the group of complex singular value driver functions for GE matrices

Function Documentation

subroutine cgejsv ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
character*1  JOBR,
character*1  JOBT,
character*1  JOBP,
integer  M,
integer  N,
complex, dimension( lda, * )  A,
integer  LDA,
real, dimension( n )  SVA,
complex, dimension( ldu, * )  U,
integer  LDU,
complex, dimension( ldv, * )  V,
integer  LDV,
complex, dimension( lwork )  CWORK,
integer  LWORK,
real, dimension( * )  RWORK,
integer  LRWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

CGEJSV

Download CGEJSV + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 CGEJSV computes the singular value decomposition (SVD) of a real M-by-N
 matrix [A], where M >= N. The SVD of [A] is written as

              [A] = [U] * [SIGMA] * [V]^*,

 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
 diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
 [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
 the singular values of [A]. The columns of [U] and [V] are the left and
 the right singular vectors of [A], respectively. The matrices [U] and [V]
 are computed and stored in the arrays U and V, respectively. The diagonal
 of [SIGMA] is computed and stored in the array SVA.

  Arguments:
  ==========

 \param[in] JOBA
 \verbatim
          JOBA is CHARACTER*1
         Specifies the level of accuracy:
       = 'C': This option works well (high relative accuracy) if A = B * D,
              with well-conditioned B and arbitrary diagonal matrix D.
              The accuracy cannot be spoiled by COLUMN scaling. The
              accuracy of the computed output depends on the condition of
              B, and the procedure aims at the best theoretical accuracy.
              The relative error max_{i=1:N}|d sigma_i| / sigma_i is
              bounded by f(M,N)*epsilon* cond(B), independent of D.
              The input matrix is preprocessed with the QRF with column
              pivoting. This initial preprocessing and preconditioning by
              a rank revealing QR factorization is common for all values of
              JOBA. Additional actions are specified as follows:
       = 'E': Computation as with 'C' with an additional estimate of the
              condition number of B. It provides a realistic error bound.
       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
              D1, D2, and well-conditioned matrix C, this option gives
              higher accuracy than the 'C' option. If the structure of the
              input matrix is not known, and relative accuracy is
              desirable, then this option is advisable. The input matrix A
              is preprocessed with QR factorization with FULL (row and
              column) pivoting.
       = 'G'  Computation as with 'F' with an additional estimate of the
              condition number of B, where A=D*B. If A has heavily weighted
              rows, then using this condition number gives too pessimistic
              error bound.
       = 'A': Small singular values are the noise and the matrix is treated
              as numerically rank defficient. The error in the computed
              singular values is bounded by f(m,n)*epsilon*||A||.
              The computed SVD A = U * S * V^* restores A up to
              f(m,n)*epsilon*||A||.
              This gives the procedure the licence to discard (set to zero)
              all singular values below N*epsilon*||A||.
       = 'R': Similar as in 'A'. Rank revealing property of the initial
              QR factorization is used do reveal (using triangular factor)
              a gap sigma_{r+1} < epsilon * sigma_r in which case the
              numerical RANK is declared to be r. The SVD is computed with
              absolute error bounds, but more accurately than with 'A'.
Parameters
[in]JOBU
          JOBU is CHARACTER*1
         Specifies whether to compute the columns of U:
       = 'U': N columns of U are returned in the array U.
       = 'F': full set of M left sing. vectors is returned in the array U.
       = 'W': U may be used as workspace of length M*N. See the description
              of U.
       = 'N': U is not computed.
[in]JOBV
          JOBV is CHARACTER*1
         Specifies whether to compute the matrix V:
       = 'V': N columns of V are returned in the array V; Jacobi rotations
              are not explicitly accumulated.
       = 'J': N columns of V are returned in the array V, but they are
              computed as the product of Jacobi rotations. This option is
              allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
       = 'W': V may be used as workspace of length N*N. See the description
              of V.
       = 'N': V is not computed.
[in]JOBR
          JOBR is CHARACTER*1
         Specifies the RANGE for the singular values. Issues the licence to
         set to zero small positive singular values if they are outside
         specified range. If A .NE. 0 is scaled so that the largest singular
         value of c*A is around SQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
         the licence to kill columns of A whose norm in c*A is less than
         SQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
         where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
       = 'N': Do not kill small columns of c*A. This option assumes that
              BLAS and QR factorizations and triangular solvers are
              implemented to work in that range. If the condition of A
              is greater than BIG, use CGESVJ.
       = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)]
              (roughly, as described above). This option is recommended.
                                             ===========================
         For computing the singular values in the FULL range [SFMIN,BIG]
         use CGESVJ.
[in]JOBT
          JOBT is CHARACTER*1
         If the matrix is square then the procedure may determine to use
         transposed A if A^* seems to be better with respect to convergence.
         If the matrix is not square, JOBT is ignored. This is subject to
         changes in the future.
         The decision is based on two values of entropy over the adjoint
         orbit of A^* * A. See the descriptions of WORK(6) and WORK(7).
       = 'T': transpose if entropy test indicates possibly faster
         convergence of Jacobi process if A^* is taken as input. If A is
         replaced with A^*, then the row pivoting is included automatically.
       = 'N': do not speculate.
         This option can be used to compute only the singular values, or the
         full SVD (U, SIGMA and V). For only one set of singular vectors
         (U or V), the caller should provide both U and V, as one of the
         matrices is used as workspace if the matrix A is transposed.
         The implementer can easily remove this constraint and make the
         code more complicated. See the descriptions of U and V.
[in]JOBP
          JOBP is CHARACTER*1
         Issues the licence to introduce structured perturbations to drown
         denormalized numbers. This licence should be active if the
         denormals are poorly implemented, causing slow computation,
         especially in cases of fast convergence (!). For details see [1,2].
         For the sake of simplicity, this perturbations are included only
         when the full SVD or only the singular values are requested. The
         implementer/user can easily add the perturbation for the cases of
         computing one set of singular vectors.
       = 'P': introduce perturbation
       = 'N': do not perturb
[in]M
          M is INTEGER
         The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
         The number of columns of the input matrix A. M >= N >= 0.
[in,out]A
          A is COMPLEX array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is REAL array, dimension (N)
          On exit,
          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
            computation SVA contains Euclidean column norms of the
            iterated matrices in the array A.
          - For WORK(1) .NE. WORK(2): The singular values of A are
            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
            sigma_max(A) overflows or if small singular values have been
            saved from underflow by scaling the input matrix A.
          - If JOBR='R' then some of the singular values may be returned
            as exact zeros obtained by "set to zero" because they are
            below the numerical rank threshold or are denormalized numbers.
[out]U
          U is COMPLEX array, dimension ( LDU, N )
          If JOBU = 'U', then U contains on exit the M-by-N matrix of
                         the left singular vectors.
          If JOBU = 'F', then U contains on exit the M-by-M matrix of
                         the left singular vectors, including an ONB
                         of the orthogonal complement of the Range(A).
          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
                         then U is used as workspace if the procedure
                         replaces A with A^*. In that case, [V] is computed
                         in U as left singular vectors of A^* and then
                         copied back to the V array. This 'W' option is just
                         a reminder to the caller that in this case U is
                         reserved as workspace of length N*N.
          If JOBU = 'N'  U is not referenced.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U,  LDU >= 1.
          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
[out]V
          V is COMPLEX array, dimension ( LDV, N )
          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
                         then V is used as workspace if the pprocedure
                         replaces A with A^*. In that case, [U] is computed
                         in V as right singular vectors of A^* and then
                         copied back to the U array. This 'W' option is just
                         a reminder to the caller that in this case V is
                         reserved as workspace of length N*N.
          If JOBV = 'N'  V is not referenced.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V,  LDV >= 1.
          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
[out]CWORK
          CWORK is COMPLEX array, dimension at least LWORK.     
[in]LWORK
          LWORK is INTEGER
          Length of CWORK to confirm proper allocation of workspace.
          LWORK depends on the job:

          1. If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
            1.1 .. no scaled condition estimate required (JOBE.EQ.'N'):
               LWORK >= 2*N+1. This is the minimal requirement.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= N + (N+1)*NB. Here NB is the optimal
               block size for CGEQP3 and CGEQRF.
               In general, optimal LWORK is computed as 
               LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CGEQRF)).        
            1.2. .. an estimate of the scaled condition number of A is
               required (JOBA='E', or 'G'). In this case, LWORK the minimal
               requirement is LWORK >= N*N + 3*N.
               ->> For optimal performance (blocked code) the optimal value 
               is LWORK >= max(N+(N+1)*NB, N*N+3*N).
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CGEQRF), 
                                                     N+N*N+LWORK(CPOCON)).

          2. If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
             (JOBU.EQ.'N')
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance, LWORK >= max(N+(N+1)*NB, 3*N,2*N+N*NB),
               where NB is the optimal block size for CGEQP3, CGEQRF, CGELQ,
               CUNMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3), N+LWORK(CPOCON), N+LWORK(CGESVJ),
                       N+LWORK(CGELQF), 2*N+LWORK(CGEQRF), N+LWORK(CUNMLQ)).

          3. If SIGMA and the left singular vectors are needed
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance:
               if JOBU.EQ.'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB),
               where NB is the optimal block size for CGEQP3, CGEQRF, CUNMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CPOCON),
                        2*N+LWORK(CGEQRF), N+LWORK(CUNMQR)). 
               
          4. If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and 
            4.1. if JOBV.EQ.'V'  
               the minimal requirement is LWORK >= 5*N+2*N*N. 
            4.2. if JOBV.EQ.'J' the minimal requirement is 
               LWORK >= 4*N+N*N.
            In both cases, the allocated CWORK can accomodate blocked runs
            of CGEQP3, CGEQRF, CGELQF, SUNMQR, CUNMLQ.
[out]RWORK
          RWORK is REAL array, dimension at least LRWORK.
          On exit,
          RWORK(1) = Determines the scaling factor SCALE = RWORK(2) / RWORK(1)
                    such that SCALE*SVA(1:N) are the computed singular values
                    of A. (See the description of SVA().)
          RWORK(2) = See the description of RWORK(1).
          RWORK(3) = SCONDA is an estimate for the condition number of
                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
                    SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
                    It is computed using SPOCON. It holds
                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
                    where R is the triangular factor from the QRF of A.
                    However, if R is truncated and the numerical rank is
                    determined to be strictly smaller than N, SCONDA is
                    returned as -1, thus indicating that the smallest
                    singular values might be lost.

          If full SVD is needed, the following two condition numbers are
          useful for the analysis of the algorithm. They are provied for
          a developer/implementer who is familiar with the details of
          the method.

          RWORK(4) = an estimate of the scaled condition number of the
                    triangular factor in the first QR factorization.
          RWORK(5) = an estimate of the scaled condition number of the
                    triangular factor in the second QR factorization.
          The following two parameters are computed if JOBT .EQ. 'T'.
          They are provided for a developer/implementer who is familiar
          with the details of the method.
          RWORK(6) = the entropy of A^* * A :: this is the Shannon entropy
                    of diag(A^* * A) / Trace(A^* * A) taken as point in the
                    probability simplex.
          RWORK(7) = the entropy of A * A^*. (See the description of RWORK(6).)
[in]LRWORK
          LRWORK is INTEGER
          Length of RWORK to confirm proper allocation of workspace.
          LRWORK depends on the job:

       1. If only singular values are requested i.e. if 
          LSAME(JOBU,'N') .AND. LSAME(JOBV,'N') 
          then:
          1.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, N + 2 * M ). 
          1.2. Otherwise, LRWORK  = max( 7, 2 * N ).
       2. If singular values with the right singular vectors are requested
          i.e. if 
          (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND. 
          .NOT.(LSAME(JOBU,'U').OR.LSAME(JOBU,'F'))
          then:
          2.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, N + 2 * M ). 
          2.2. Otherwise, LRWORK  = max( 7, 2 * N ).      
       3. If singular values with the left singular vectors are requested, i.e. if    
          (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
          .NOT.(LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
          then:
          3.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, N + 2 * M ). 
          3.2. Otherwise, LRWORK  = max( 7, 2 * N ).    
       4. If singular values with both the left and the right singular vectors 
          are requested, i.e. if     
          (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
          (LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
          then:
          4.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, N + 2 * M ). 
          4.2. Otherwise, LRWORK  = max( 7, 2 * N ).    
[out]IWORK
          IWORK is INTEGER array, of dimension:
                If LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), then 
                the dimension of IWORK is max( 3, 2 * N + M ).
                Otherwise, the dimension of IWORK is 
                -> max( 3, 2*N ) for full SVD
                -> max( 3, N ) for singular values only or singular
                   values with one set of singular vectors (left or right)
          On exit,
          IWORK(1) = the numerical rank determined after the initial
                     QR factorization with pivoting. See the descriptions
                     of JOBA and JOBR.
          IWORK(2) = the number of the computed nonzero singular values
          IWORK(3) = if nonzero, a warning message:
                     If IWORK(3).EQ.1 then some of the column norms of A
                     were denormalized floats. The requested high accuracy
                     is not warranted by the data.
[out]INFO
          INFO is INTEGER
           < 0  : if INFO = -i, then the i-th argument had an illegal value.
           = 0 :  successfull exit;
           > 0 :  CGEJSV  did not converge in the maximal allowed number
                  of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
November 2015
Further Details:
  CGEJSV implements a preconditioned Jacobi SVD algorithm. It uses CGEQP3,
  CGEQRF, and CGELQF as preprocessors and preconditioners. Optionally, an
  additional row pivoting can be used as a preprocessor, which in some
  cases results in much higher accuracy. An example is matrix A with the
  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
  diagonal matrices and C is well-conditioned matrix. In that case, complete
  pivoting in the first QR factorizations provides accuracy dependent on the
  condition number of C, and independent of D1, D2. Such higher accuracy is
  not completely understood theoretically, but it works well in practice.
  Further, if A can be written as A = B*D, with well-conditioned B and some
  diagonal D, then the high accuracy is guaranteed, both theoretically and
  in software, independent of D. For more details see [1], [2].
     The computational range for the singular values can be the full range
  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
  & LAPACK routines called by CGEJSV are implemented to work in that range.
  If that is not the case, then the restriction for safe computation with
  the singular values in the range of normalized IEEE numbers is that the
  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
  overflow. This code (CGEJSV) is best used in this restricted range,
  meaning that singular values of magnitude below ||A||_2 / SLAMCH('O') are
  returned as zeros. See JOBR for details on this.
     Further, this implementation is somewhat slower than the one described
  in [1,2] due to replacement of some non-LAPACK components, and because
  the choice of some tuning parameters in the iterative part (CGESVJ) is
  left to the implementer on a particular machine.
     The rank revealing QR factorization (in this code: CGEQP3) should be
  implemented as in [3]. We have a new version of CGEQP3 under development
  that is more robust than the current one in LAPACK, with a cleaner cut in
  rank defficient cases. It will be available in the SIGMA library [4].
  If M is much larger than N, it is obvious that the inital QRF with
  column pivoting can be preprocessed by the QRF without pivoting. That
  well known trick is not used in CGEJSV because in some cases heavy row
  weighting can be treated with complete pivoting. The overhead in cases
  M much larger than N is then only due to pivoting, but the benefits in
  terms of accuracy have prevailed. The implementer/user can incorporate
  this extra QRF step easily. The implementer can also improve data movement
  (matrix transpose, matrix copy, matrix transposed copy) - this
  implementation of CGEJSV uses only the simplest, naive data movement.  \par Contributors: 
  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)  \par References: 
@verbatim 

 [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
     factorization software - a case study.
     ACM Trans. math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
     LAPACK Working note 176.
 [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 518 of file cgejsv.f.

518 *
519 * -- LAPACK computational routine (version 3.6.0) --
520 * -- LAPACK is a software package provided by Univ. of Tennessee, --
521 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
522 * November 2015
523 *
524 * .. Scalar Arguments ..
525  IMPLICIT NONE
526  INTEGER info, lda, ldu, ldv, lwork, lrwork, m, n
527 * ..
528 * .. Array Arguments ..
529  COMPLEX a( lda, * ), u( ldu, * ), v( ldv, * ), cwork( lwork )
530  REAL sva( n ), rwork( * )
531  INTEGER iwork( * )
532  CHARACTER*1 joba, jobp, jobr, jobt, jobu, jobv
533 * ..
534 *
535 * ===========================================================================
536 *
537 * .. Local Parameters ..
538  REAL zero, one
539  parameter( zero = 0.0e0, one = 1.0e0 )
540  COMPLEX czero, cone
541  parameter( czero = ( 0.0e0, 0.0e0 ), cone = ( 1.0e0, 0.0e0 ) )
542 * ..
543 * .. Local Scalars ..
544  COMPLEX ctemp
545  REAL aapp, aaqq, aatmax, aatmin, big, big1, cond_ok,
546  $ condr1, condr2, entra, entrat, epsln, maxprj, scalem,
547  $ sconda, sfmin, small, temp1, uscal1, uscal2, xsc
548  INTEGER ierr, n1, nr, numrank, p, q, warning
549  LOGICAL almort, defr, errest, goscal, jracc, kill, lsvec,
550  $ l2aber, l2kill, l2pert, l2rank, l2tran,
551  $ noscal, rowpiv, rsvec, transp
552 * ..
553 * .. Intrinsic Functions ..
554  INTRINSIC abs, conjg, alog, amax1, amin1, cmplx, float,
555  $ max0, min0, nint, sign, sqrt
556 * ..
557 * .. External Functions ..
558  REAL slamch, scnrm2
559  INTEGER isamax
560  LOGICAL lsame
561  EXTERNAL isamax, lsame, slamch, scnrm2
562 * ..
563 * .. External Subroutines ..
564  EXTERNAL ccopy, cgelqf, cgeqp3, cgeqrf, clacpy, clascl,
567 *
568  EXTERNAL cgesvj
569 * ..
570 *
571 * Test the input arguments
572 *
573  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
574  jracc = lsame( jobv, 'J' )
575  rsvec = lsame( jobv, 'V' ) .OR. jracc
576  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
577  l2rank = lsame( joba, 'R' )
578  l2aber = lsame( joba, 'A' )
579  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
580  l2tran = lsame( jobt, 'T' )
581  l2kill = lsame( jobr, 'R' )
582  defr = lsame( jobr, 'N' )
583  l2pert = lsame( jobp, 'P' )
584 *
585  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
586  $ errest .OR. lsame( joba, 'C' ) )) THEN
587  info = - 1
588  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
589  $ lsame( jobu, 'W' )) ) THEN
590  info = - 2
591  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
592  $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
593  info = - 3
594  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
595  info = - 4
596  ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
597  info = - 5
598  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
599  info = - 6
600  ELSE IF ( m .LT. 0 ) THEN
601  info = - 7
602  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
603  info = - 8
604  ELSE IF ( lda .LT. m ) THEN
605  info = - 10
606  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
607  info = - 13
608  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
609  info = - 15
610  ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
611  $ (lwork .LT. 2*n+1)) .OR.
612  $ (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
613  $ (lwork .LT. n*n+3*n)) .OR.
614  $ (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. 3*n))
615  $ .OR.
616  $ (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. 3*n))
617  $ .OR.
618  $ (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
619  $ (lwork.LT.5*n+2*n*n))
620  $ .OR. (lsvec .AND. rsvec .AND. jracc .AND.
621  $ lwork.LT.4*n+n*n))
622  $ THEN
623  info = - 17
624  ELSE IF ( lrwork.LT. max0(n+2*m,7)) THEN
625  info = -19
626  ELSE
627 * #:)
628  info = 0
629  END IF
630 *
631  IF ( info .NE. 0 ) THEN
632 * #:(
633  CALL xerbla( 'CGEJSV', - info )
634  RETURN
635  END IF
636 *
637 * Quick return for void matrix (Y3K safe)
638 * #:)
639  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) RETURN
640 *
641 * Determine whether the matrix U should be M x N or M x M
642 *
643  IF ( lsvec ) THEN
644  n1 = n
645  IF ( lsame( jobu, 'F' ) ) n1 = m
646  END IF
647 *
648 * Set numerical parameters
649 *
650 *! NOTE: Make sure SLAMCH() does not fail on the target architecture.
651 *
652  epsln = slamch('Epsilon')
653  sfmin = slamch('SafeMinimum')
654  small = sfmin / epsln
655  big = slamch('O')
656 * BIG = ONE / SFMIN
657 *
658 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
659 *
660 *(!) If necessary, scale SVA() to protect the largest norm from
661 * overflow. It is possible that this scaling pushes the smallest
662 * column norm left from the underflow threshold (extreme case).
663 *
664  scalem = one / sqrt(float(m)*float(n))
665  noscal = .true.
666  goscal = .true.
667  DO 1874 p = 1, n
668  aapp = zero
669  aaqq = one
670  CALL classq( m, a(1,p), 1, aapp, aaqq )
671  IF ( aapp .GT. big ) THEN
672  info = - 9
673  CALL xerbla( 'CGEJSV', -info )
674  RETURN
675  END IF
676  aaqq = sqrt(aaqq)
677  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
678  sva(p) = aapp * aaqq
679  ELSE
680  noscal = .false.
681  sva(p) = aapp * ( aaqq * scalem )
682  IF ( goscal ) THEN
683  goscal = .false.
684  CALL sscal( p-1, scalem, sva, 1 )
685  END IF
686  END IF
687  1874 CONTINUE
688 *
689  IF ( noscal ) scalem = one
690 *
691  aapp = zero
692  aaqq = big
693  DO 4781 p = 1, n
694  aapp = amax1( aapp, sva(p) )
695  IF ( sva(p) .NE. zero ) aaqq = amin1( aaqq, sva(p) )
696  4781 CONTINUE
697 *
698 * Quick return for zero M x N matrix
699 * #:)
700  IF ( aapp .EQ. zero ) THEN
701  IF ( lsvec ) CALL claset( 'G', m, n1, czero, cone, u, ldu )
702  IF ( rsvec ) CALL claset( 'G', n, n, czero, cone, v, ldv )
703  rwork(1) = one
704  rwork(2) = one
705  IF ( errest ) rwork(3) = one
706  IF ( lsvec .AND. rsvec ) THEN
707  rwork(4) = one
708  rwork(5) = one
709  END IF
710  IF ( l2tran ) THEN
711  rwork(6) = zero
712  rwork(7) = zero
713  END IF
714  iwork(1) = 0
715  iwork(2) = 0
716  iwork(3) = 0
717  RETURN
718  END IF
719 *
720 * Issue warning if denormalized column norms detected. Override the
721 * high relative accuracy request. Issue licence to kill columns
722 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
723 * #:(
724  warning = 0
725  IF ( aaqq .LE. sfmin ) THEN
726  l2rank = .true.
727  l2kill = .true.
728  warning = 1
729  END IF
730 *
731 * Quick return for one-column matrix
732 * #:)
733  IF ( n .EQ. 1 ) THEN
734 *
735  IF ( lsvec ) THEN
736  CALL clascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
737  CALL clacpy( 'A', m, 1, a, lda, u, ldu )
738 * computing all M left singular vectors of the M x 1 matrix
739  IF ( n1 .NE. n ) THEN
740  CALL cgeqrf( m, n, u,ldu, cwork, cwork(n+1),lwork-n,ierr )
741  CALL cungqr( m,n1,1, u,ldu,cwork,cwork(n+1),lwork-n,ierr )
742  CALL ccopy( m, a(1,1), 1, u(1,1), 1 )
743  END IF
744  END IF
745  IF ( rsvec ) THEN
746  v(1,1) = cone
747  END IF
748  IF ( sva(1) .LT. (big*scalem) ) THEN
749  sva(1) = sva(1) / scalem
750  scalem = one
751  END IF
752  rwork(1) = one / scalem
753  rwork(2) = one
754  IF ( sva(1) .NE. zero ) THEN
755  iwork(1) = 1
756  IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
757  iwork(2) = 1
758  ELSE
759  iwork(2) = 0
760  END IF
761  ELSE
762  iwork(1) = 0
763  iwork(2) = 0
764  END IF
765  iwork(3) = 0
766  IF ( errest ) rwork(3) = one
767  IF ( lsvec .AND. rsvec ) THEN
768  rwork(4) = one
769  rwork(5) = one
770  END IF
771  IF ( l2tran ) THEN
772  rwork(6) = zero
773  rwork(7) = zero
774  END IF
775  RETURN
776 *
777  END IF
778 *
779  transp = .false.
780  l2tran = l2tran .AND. ( m .EQ. n )
781 *
782  aatmax = -one
783  aatmin = big
784  IF ( rowpiv .OR. l2tran ) THEN
785 *
786 * Compute the row norms, needed to determine row pivoting sequence
787 * (in the case of heavily row weighted A, row pivoting is strongly
788 * advised) and to collect information needed to compare the
789 * structures of A * A^* and A^* * A (in the case L2TRAN.EQ..TRUE.).
790 *
791  IF ( l2tran ) THEN
792  DO 1950 p = 1, m
793  xsc = zero
794  temp1 = one
795  CALL classq( n, a(p,1), lda, xsc, temp1 )
796 * CLASSQ gets both the ell_2 and the ell_infinity norm
797 * in one pass through the vector
798  rwork(m+n+p) = xsc * scalem
799  rwork(n+p) = xsc * (scalem*sqrt(temp1))
800  aatmax = amax1( aatmax, rwork(n+p) )
801  IF (rwork(n+p) .NE. zero)
802  $ aatmin = amin1(aatmin,rwork(n+p))
803  1950 CONTINUE
804  ELSE
805  DO 1904 p = 1, m
806  rwork(m+n+p) = scalem*abs( a(p,isamax(n,a(p,1),lda)) )
807  aatmax = amax1( aatmax, rwork(m+n+p) )
808  aatmin = amin1( aatmin, rwork(m+n+p) )
809  1904 CONTINUE
810  END IF
811 *
812  END IF
813 *
814 * For square matrix A try to determine whether A^* would be better
815 * input for the preconditioned Jacobi SVD, with faster convergence.
816 * The decision is based on an O(N) function of the vector of column
817 * and row norms of A, based on the Shannon entropy. This should give
818 * the right choice in most cases when the difference actually matters.
819 * It may fail and pick the slower converging side.
820 *
821  entra = zero
822  entrat = zero
823  IF ( l2tran ) THEN
824 *
825  xsc = zero
826  temp1 = one
827  CALL classq( n, sva, 1, xsc, temp1 )
828  temp1 = one / temp1
829 *
830  entra = zero
831  DO 1113 p = 1, n
832  big1 = ( ( sva(p) / xsc )**2 ) * temp1
833  IF ( big1 .NE. zero ) entra = entra + big1 * alog(big1)
834  1113 CONTINUE
835  entra = - entra / alog(float(n))
836 *
837 * Now, SVA().^2/Trace(A^* * A) is a point in the probability simplex.
838 * It is derived from the diagonal of A^* * A. Do the same with the
839 * diagonal of A * A^*, compute the entropy of the corresponding
840 * probability distribution. Note that A * A^* and A^* * A have the
841 * same trace.
842 *
843  entrat = zero
844  DO 1114 p = n+1, n+m
845  big1 = ( ( rwork(p) / xsc )**2 ) * temp1
846  IF ( big1 .NE. zero ) entrat = entrat + big1 * alog(big1)
847  1114 CONTINUE
848  entrat = - entrat / alog(float(m))
849 *
850 * Analyze the entropies and decide A or A^*. Smaller entropy
851 * usually means better input for the algorithm.
852 *
853  transp = ( entrat .LT. entra )
854  transp = .true.
855 *
856 * If A^* is better than A, take the adjoint of A.
857 *
858  IF ( transp ) THEN
859 * In an optimal implementation, this trivial transpose
860 * should be replaced with faster transpose.
861  DO 1115 p = 1, n - 1
862  a(p,p) = conjg(a(p,p))
863  DO 1116 q = p + 1, n
864  ctemp = conjg(a(q,p))
865  a(q,p) = conjg(a(p,q))
866  a(p,q) = ctemp
867  1116 CONTINUE
868  1115 CONTINUE
869  a(n,n) = conjg(a(n,n))
870  DO 1117 p = 1, n
871  rwork(m+n+p) = sva(p)
872  sva(p) = rwork(n+p)
873 * previously computed row 2-norms are now column 2-norms
874 * of the transposed matrix
875  1117 CONTINUE
876  temp1 = aapp
877  aapp = aatmax
878  aatmax = temp1
879  temp1 = aaqq
880  aaqq = aatmin
881  aatmin = temp1
882  kill = lsvec
883  lsvec = rsvec
884  rsvec = kill
885  IF ( lsvec ) n1 = n
886 *
887  rowpiv = .true.
888  END IF
889 *
890  END IF
891 * END IF L2TRAN
892 *
893 * Scale the matrix so that its maximal singular value remains less
894 * than SQRT(BIG) -- the matrix is scaled so that its maximal column
895 * has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
896 * SQRT(BIG) instead of BIG is the fact that CGEJSV uses LAPACK and
897 * BLAS routines that, in some implementations, are not capable of
898 * working in the full interval [SFMIN,BIG] and that they may provoke
899 * overflows in the intermediate results. If the singular values spread
900 * from SFMIN to BIG, then CGESVJ will compute them. So, in that case,
901 * one should use CGESVJ instead of CGEJSV.
902 *
903  big1 = sqrt( big )
904  temp1 = sqrt( big / float(n) )
905 *
906  CALL clascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
907  IF ( aaqq .GT. (aapp * sfmin) ) THEN
908  aaqq = ( aaqq / aapp ) * temp1
909  ELSE
910  aaqq = ( aaqq * temp1 ) / aapp
911  END IF
912  temp1 = temp1 * scalem
913  CALL clascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
914 *
915 * To undo scaling at the end of this procedure, multiply the
916 * computed singular values with USCAL2 / USCAL1.
917 *
918  uscal1 = temp1
919  uscal2 = aapp
920 *
921  IF ( l2kill ) THEN
922 * L2KILL enforces computation of nonzero singular values in
923 * the restricted range of condition number of the initial A,
924 * sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
925  xsc = sqrt( sfmin )
926  ELSE
927  xsc = small
928 *
929 * Now, if the condition number of A is too big,
930 * sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
931 * as a precaution measure, the full SVD is computed using CGESVJ
932 * with accumulated Jacobi rotations. This provides numerically
933 * more robust computation, at the cost of slightly increased run
934 * time. Depending on the concrete implementation of BLAS and LAPACK
935 * (i.e. how they behave in presence of extreme ill-conditioning) the
936 * implementor may decide to remove this switch.
937  IF ( ( aaqq.LT.sqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
938  jracc = .true.
939  END IF
940 *
941  END IF
942  IF ( aaqq .LT. xsc ) THEN
943  DO 700 p = 1, n
944  IF ( sva(p) .LT. xsc ) THEN
945  CALL claset( 'A', m, 1, czero, czero, a(1,p), lda )
946  sva(p) = zero
947  END IF
948  700 CONTINUE
949  END IF
950 *
951 * Preconditioning using QR factorization with pivoting
952 *
953  IF ( rowpiv ) THEN
954 * Optional row permutation (Bjoerck row pivoting):
955 * A result by Cox and Higham shows that the Bjoerck's
956 * row pivoting combined with standard column pivoting
957 * has similar effect as Powell-Reid complete pivoting.
958 * The ell-infinity norms of A are made nonincreasing.
959  DO 1952 p = 1, m - 1
960  q = isamax( m-p+1, rwork(m+n+p), 1 ) + p - 1
961  iwork(2*n+p) = q
962  IF ( p .NE. q ) THEN
963  temp1 = rwork(m+n+p)
964  rwork(m+n+p) = rwork(m+n+q)
965  rwork(m+n+q) = temp1
966  END IF
967  1952 CONTINUE
968  CALL claswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
969  END IF
970 *
971 * End of the preparation phase (scaling, optional sorting and
972 * transposing, optional flushing of small columns).
973 *
974 * Preconditioning
975 *
976 * If the full SVD is needed, the right singular vectors are computed
977 * from a matrix equation, and for that we need theoretical analysis
978 * of the Businger-Golub pivoting. So we use CGEQP3 as the first RR QRF.
979 * In all other cases the first RR QRF can be chosen by other criteria
980 * (eg speed by replacing global with restricted window pivoting, such
981 * as in xGEQPX from TOMS # 782). Good results will be obtained using
982 * xGEQPX with properly (!) chosen numerical parameters.
983 * Any improvement of CGEQP3 improves overal performance of CGEJSV.
984 *
985 * A * P1 = Q1 * [ R1^* 0]^*:
986  DO 1963 p = 1, n
987 * .. all columns are free columns
988  iwork(p) = 0
989  1963 CONTINUE
990  CALL cgeqp3( m, n, a, lda, iwork, cwork, cwork(n+1), lwork-n,
991  $ rwork, ierr )
992 *
993 * The upper triangular matrix R1 from the first QRF is inspected for
994 * rank deficiency and possibilities for deflation, or possible
995 * ill-conditioning. Depending on the user specified flag L2RANK,
996 * the procedure explores possibilities to reduce the numerical
997 * rank by inspecting the computed upper triangular factor. If
998 * L2RANK or L2ABER are up, then CGEJSV will compute the SVD of
999 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
1000 *
1001  nr = 1
1002  IF ( l2aber ) THEN
1003 * Standard absolute error bound suffices. All sigma_i with
1004 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
1005 * agressive enforcement of lower numerical rank by introducing a
1006 * backward error of the order of N*EPSLN*||A||.
1007  temp1 = sqrt(float(n))*epsln
1008  DO 3001 p = 2, n
1009  IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) THEN
1010  nr = nr + 1
1011  ELSE
1012  GO TO 3002
1013  END IF
1014  3001 CONTINUE
1015  3002 CONTINUE
1016  ELSE IF ( l2rank ) THEN
1017 * .. similarly as above, only slightly more gentle (less agressive).
1018 * Sudden drop on the diagonal of R1 is used as the criterion for
1019 * close-to-rank-defficient.
1020  temp1 = sqrt(sfmin)
1021  DO 3401 p = 2, n
1022  IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
1023  $ ( abs(a(p,p)) .LT. small ) .OR.
1024  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
1025  nr = nr + 1
1026  3401 CONTINUE
1027  3402 CONTINUE
1028 *
1029  ELSE
1030 * The goal is high relative accuracy. However, if the matrix
1031 * has high scaled condition number the relative accuracy is in
1032 * general not feasible. Later on, a condition number estimator
1033 * will be deployed to estimate the scaled condition number.
1034 * Here we just remove the underflowed part of the triangular
1035 * factor. This prevents the situation in which the code is
1036 * working hard to get the accuracy not warranted by the data.
1037  temp1 = sqrt(sfmin)
1038  DO 3301 p = 2, n
1039  IF ( ( abs(a(p,p)) .LT. small ) .OR.
1040  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
1041  nr = nr + 1
1042  3301 CONTINUE
1043  3302 CONTINUE
1044 *
1045  END IF
1046 *
1047  almort = .false.
1048  IF ( nr .EQ. n ) THEN
1049  maxprj = one
1050  DO 3051 p = 2, n
1051  temp1 = abs(a(p,p)) / sva(iwork(p))
1052  maxprj = amin1( maxprj, temp1 )
1053  3051 CONTINUE
1054  IF ( maxprj**2 .GE. one - float(n)*epsln ) almort = .true.
1055  END IF
1056 *
1057 *
1058  sconda = - one
1059  condr1 = - one
1060  condr2 = - one
1061 *
1062  IF ( errest ) THEN
1063  IF ( n .EQ. nr ) THEN
1064  IF ( rsvec ) THEN
1065 * .. V is available as workspace
1066  CALL clacpy( 'U', n, n, a, lda, v, ldv )
1067  DO 3053 p = 1, n
1068  temp1 = sva(iwork(p))
1069  CALL csscal( p, one/temp1, v(1,p), 1 )
1070  3053 CONTINUE
1071  CALL cpocon( 'U', n, v, ldv, one, temp1,
1072  $ cwork(n+1), rwork, ierr )
1073 *
1074  ELSE IF ( lsvec ) THEN
1075 * .. U is available as workspace
1076  CALL clacpy( 'U', n, n, a, lda, u, ldu )
1077  DO 3054 p = 1, n
1078  temp1 = sva(iwork(p))
1079  CALL csscal( p, one/temp1, u(1,p), 1 )
1080  3054 CONTINUE
1081  CALL cpocon( 'U', n, u, ldu, one, temp1,
1082  $ cwork(n+1), rwork, ierr )
1083  ELSE
1084  CALL clacpy( 'U', n, n, a, lda, cwork(n+1), n )
1085  DO 3052 p = 1, n
1086  temp1 = sva(iwork(p))
1087  CALL csscal( p, one/temp1, cwork(n+(p-1)*n+1), 1 )
1088  3052 CONTINUE
1089 * .. the columns of R are scaled to have unit Euclidean lengths.
1090  CALL cpocon( 'U', n, cwork(n+1), n, one, temp1,
1091  $ cwork(n+n*n+1), rwork, ierr )
1092 *
1093  END IF
1094  sconda = one / sqrt(temp1)
1095 * SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
1096 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1097  ELSE
1098  sconda = - one
1099  END IF
1100  END IF
1101 *
1102  l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. sqrt(big1) )
1103 * If there is no violent scaling, artificial perturbation is not needed.
1104 *
1105 * Phase 3:
1106 *
1107  IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1108 *
1109 * Singular Values only
1110 *
1111 * .. transpose A(1:NR,1:N)
1112  DO 1946 p = 1, min0( n-1, nr )
1113  CALL ccopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1114  CALL clacgv( n-p+1, a(p,p), 1 )
1115  1946 CONTINUE
1116  IF ( nr .EQ. n ) a(n,n) = conjg(a(n,n))
1117 *
1118 * The following two DO-loops introduce small relative perturbation
1119 * into the strict upper triangle of the lower triangular matrix.
1120 * Small entries below the main diagonal are also changed.
1121 * This modification is useful if the computing environment does not
1122 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1123 * annoying denormalized numbers in case of strongly scaled matrices.
1124 * The perturbation is structured so that it does not introduce any
1125 * new perturbation of the singular values, and it does not destroy
1126 * the job done by the preconditioner.
1127 * The licence for this perturbation is in the variable L2PERT, which
1128 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1129 *
1130  IF ( .NOT. almort ) THEN
1131 *
1132  IF ( l2pert ) THEN
1133 * XSC = SQRT(SMALL)
1134  xsc = epsln / float(n)
1135  DO 4947 q = 1, nr
1136  ctemp = cmplx(xsc*abs(a(q,q)),zero)
1137  DO 4949 p = 1, n
1138  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1139  $ .OR. ( p .LT. q ) )
1140 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1141  $ a(p,q) = ctemp
1142  4949 CONTINUE
1143  4947 CONTINUE
1144  ELSE
1145  CALL claset( 'U', nr-1,nr-1, czero,czero, a(1,2),lda )
1146  END IF
1147 *
1148 * .. second preconditioning using the QR factorization
1149 *
1150  CALL cgeqrf( n,nr, a,lda, cwork, cwork(n+1),lwork-n, ierr )
1151 *
1152 * .. and transpose upper to lower triangular
1153  DO 1948 p = 1, nr - 1
1154  CALL ccopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1155  CALL clacgv( nr-p+1, a(p,p), 1 )
1156  1948 CONTINUE
1157 *
1158  END IF
1159 *
1160 * Row-cyclic Jacobi SVD algorithm with column pivoting
1161 *
1162 * .. again some perturbation (a "background noise") is added
1163 * to drown denormals
1164  IF ( l2pert ) THEN
1165 * XSC = SQRT(SMALL)
1166  xsc = epsln / float(n)
1167  DO 1947 q = 1, nr
1168  ctemp = cmplx(xsc*abs(a(q,q)),zero)
1169  DO 1949 p = 1, nr
1170  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1171  $ .OR. ( p .LT. q ) )
1172 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1173  $ a(p,q) = ctemp
1174  1949 CONTINUE
1175  1947 CONTINUE
1176  ELSE
1177  CALL claset( 'U', nr-1, nr-1, czero, czero, a(1,2), lda )
1178  END IF
1179 *
1180 * .. and one-sided Jacobi rotations are started on a lower
1181 * triangular matrix (plus perturbation which is ignored in
1182 * the part which destroys triangular form (confusing?!))
1183 *
1184  CALL cgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1185  $ n, v, ldv, cwork, lwork, rwork, lrwork, info )
1186 *
1187  scalem = rwork(1)
1188  numrank = nint(rwork(2))
1189 *
1190 *
1191  ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1192 *
1193 * -> Singular Values and Right Singular Vectors <-
1194 *
1195  IF ( almort ) THEN
1196 *
1197 * .. in this case NR equals N
1198  DO 1998 p = 1, nr
1199  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1200  CALL clacgv( n-p+1, v(p,p), 1 )
1201  1998 CONTINUE
1202  CALL claset( 'Upper', nr-1,nr-1, czero, czero, v(1,2), ldv )
1203 *
1204  CALL cgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1205  $ cwork, lwork, rwork, lrwork, info )
1206  scalem = rwork(1)
1207  numrank = nint(rwork(2))
1208 
1209  ELSE
1210 *
1211 * .. two more QR factorizations ( one QRF is not enough, two require
1212 * accumulated product of Jacobi rotations, three are perfect )
1213 *
1214  CALL claset( 'Lower', nr-1,nr-1, czero, czero, a(2,1), lda )
1215  CALL cgelqf( nr,n, a, lda, cwork, cwork(n+1), lwork-n, ierr)
1216  CALL clacpy( 'Lower', nr, nr, a, lda, v, ldv )
1217  CALL claset( 'Upper', nr-1,nr-1, czero, czero, v(1,2), ldv )
1218  CALL cgeqrf( nr, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1219  $ lwork-2*n, ierr )
1220  DO 8998 p = 1, nr
1221  CALL ccopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1222  CALL clacgv( nr-p+1, v(p,p), 1 )
1223  8998 CONTINUE
1224  CALL claset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1225 *
1226  CALL cgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1227  $ ldu, cwork(n+1), lwork-n, rwork, lrwork, info )
1228  scalem = rwork(1)
1229  numrank = nint(rwork(2))
1230  IF ( nr .LT. n ) THEN
1231  CALL claset( 'A',n-nr, nr, czero,czero, v(nr+1,1), ldv )
1232  CALL claset( 'A',nr, n-nr, czero,czero, v(1,nr+1), ldv )
1233  CALL claset( 'A',n-nr,n-nr,czero,cone, v(nr+1,nr+1),ldv )
1234  END IF
1235 *
1236  CALL cunmlq( 'Left', 'C', n, n, nr, a, lda, cwork,
1237  $ v, ldv, cwork(n+1), lwork-n, ierr )
1238 *
1239  END IF
1240 *
1241  DO 8991 p = 1, n
1242  CALL ccopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1243  8991 CONTINUE
1244  CALL clacpy( 'All', n, n, a, lda, v, ldv )
1245 *
1246  IF ( transp ) THEN
1247  CALL clacpy( 'All', n, n, v, ldv, u, ldu )
1248  END IF
1249 *
1250  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1251 *
1252 * .. Singular Values and Left Singular Vectors ..
1253 *
1254 * .. second preconditioning step to avoid need to accumulate
1255 * Jacobi rotations in the Jacobi iterations.
1256  DO 1965 p = 1, nr
1257  CALL ccopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1258  CALL clacgv( n-p+1, u(p,p), 1 )
1259  1965 CONTINUE
1260  CALL claset( 'Upper', nr-1, nr-1, czero, czero, u(1,2), ldu )
1261 *
1262  CALL cgeqrf( n, nr, u, ldu, cwork(n+1), cwork(2*n+1),
1263  $ lwork-2*n, ierr )
1264 *
1265  DO 1967 p = 1, nr - 1
1266  CALL ccopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1267  CALL clacgv( n-p+1, u(p,p), 1 )
1268  1967 CONTINUE
1269  CALL claset( 'Upper', nr-1, nr-1, czero, czero, u(1,2), ldu )
1270 *
1271  CALL cgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1272  $ lda, cwork(n+1), lwork-n, rwork, lrwork, info )
1273  scalem = rwork(1)
1274  numrank = nint(rwork(2))
1275 *
1276  IF ( nr .LT. m ) THEN
1277  CALL claset( 'A', m-nr, nr,czero, czero, u(nr+1,1), ldu )
1278  IF ( nr .LT. n1 ) THEN
1279  CALL claset( 'A',nr, n1-nr, czero, czero, u(1,nr+1),ldu )
1280  CALL claset( 'A',m-nr,n1-nr,czero,cone,u(nr+1,nr+1),ldu )
1281  END IF
1282  END IF
1283 *
1284  CALL cunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1285  $ ldu, cwork(n+1), lwork-n, ierr )
1286 *
1287  IF ( rowpiv )
1288  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1289 *
1290  DO 1974 p = 1, n1
1291  xsc = one / scnrm2( m, u(1,p), 1 )
1292  CALL csscal( m, xsc, u(1,p), 1 )
1293  1974 CONTINUE
1294 *
1295  IF ( transp ) THEN
1296  CALL clacpy( 'All', n, n, u, ldu, v, ldv )
1297  END IF
1298 *
1299  ELSE
1300 *
1301 * .. Full SVD ..
1302 *
1303  IF ( .NOT. jracc ) THEN
1304 *
1305  IF ( .NOT. almort ) THEN
1306 *
1307 * Second Preconditioning Step (QRF [with pivoting])
1308 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1309 * equivalent to an LQF CALL. Since in many libraries the QRF
1310 * seems to be better optimized than the LQF, we do explicit
1311 * transpose and use the QRF. This is subject to changes in an
1312 * optimized implementation of CGEJSV.
1313 *
1314  DO 1968 p = 1, nr
1315  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1316  CALL clacgv( n-p+1, v(p,p), 1 )
1317  1968 CONTINUE
1318 *
1319 * .. the following two loops perturb small entries to avoid
1320 * denormals in the second QR factorization, where they are
1321 * as good as zeros. This is done to avoid painfully slow
1322 * computation with denormals. The relative size of the perturbation
1323 * is a parameter that can be changed by the implementer.
1324 * This perturbation device will be obsolete on machines with
1325 * properly implemented arithmetic.
1326 * To switch it off, set L2PERT=.FALSE. To remove it from the
1327 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1328 * The following two loops should be blocked and fused with the
1329 * transposed copy above.
1330 *
1331  IF ( l2pert ) THEN
1332  xsc = sqrt(small)
1333  DO 2969 q = 1, nr
1334  ctemp = cmplx(xsc*abs( v(q,q) ),zero)
1335  DO 2968 p = 1, n
1336  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1337  $ .OR. ( p .LT. q ) )
1338 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1339  $ v(p,q) = ctemp
1340  IF ( p .LT. q ) v(p,q) = - v(p,q)
1341  2968 CONTINUE
1342  2969 CONTINUE
1343  ELSE
1344  CALL claset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1345  END IF
1346 *
1347 * Estimate the row scaled condition number of R1
1348 * (If R1 is rectangular, N > NR, then the condition number
1349 * of the leading NR x NR submatrix is estimated.)
1350 *
1351  CALL clacpy( 'L', nr, nr, v, ldv, cwork(2*n+1), nr )
1352  DO 3950 p = 1, nr
1353  temp1 = scnrm2(nr-p+1,cwork(2*n+(p-1)*nr+p),1)
1354  CALL csscal(nr-p+1,one/temp1,cwork(2*n+(p-1)*nr+p),1)
1355  3950 CONTINUE
1356  CALL cpocon('Lower',nr,cwork(2*n+1),nr,one,temp1,
1357  $ cwork(2*n+nr*nr+1),rwork,ierr)
1358  condr1 = one / sqrt(temp1)
1359 * .. here need a second oppinion on the condition number
1360 * .. then assume worst case scenario
1361 * R1 is OK for inverse <=> CONDR1 .LT. FLOAT(N)
1362 * more conservative <=> CONDR1 .LT. SQRT(FLOAT(N))
1363 *
1364  cond_ok = sqrt(sqrt(float(nr)))
1365 *[TP] COND_OK is a tuning parameter.
1366 *
1367  IF ( condr1 .LT. cond_ok ) THEN
1368 * .. the second QRF without pivoting. Note: in an optimized
1369 * implementation, this QRF should be implemented as the QRF
1370 * of a lower triangular matrix.
1371 * R1^* = Q2 * R2
1372  CALL cgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1373  $ lwork-2*n, ierr )
1374 *
1375  IF ( l2pert ) THEN
1376  xsc = sqrt(small)/epsln
1377  DO 3959 p = 2, nr
1378  DO 3958 q = 1, p - 1
1379  ctemp=cmplx(xsc*amin1(abs(v(p,p)),abs(v(q,q))),
1380  $ zero)
1381  IF ( abs(v(q,p)) .LE. temp1 )
1382 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1383  $ v(q,p) = ctemp
1384  3958 CONTINUE
1385  3959 CONTINUE
1386  END IF
1387 *
1388  IF ( nr .NE. n )
1389  $ CALL clacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1390 * .. save ...
1391 *
1392 * .. this transposed copy should be better than naive
1393  DO 1969 p = 1, nr - 1
1394  CALL ccopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1395  CALL clacgv(nr-p+1, v(p,p), 1 )
1396  1969 CONTINUE
1397  v(nr,nr)=conjg(v(nr,nr))
1398 *
1399  condr2 = condr1
1400 *
1401  ELSE
1402 *
1403 * .. ill-conditioned case: second QRF with pivoting
1404 * Note that windowed pivoting would be equaly good
1405 * numerically, and more run-time efficient. So, in
1406 * an optimal implementation, the next call to CGEQP3
1407 * should be replaced with eg. CALL CGEQPX (ACM TOMS #782)
1408 * with properly (carefully) chosen parameters.
1409 *
1410 * R1^* * P2 = Q2 * R2
1411  DO 3003 p = 1, nr
1412  iwork(n+p) = 0
1413  3003 CONTINUE
1414  CALL cgeqp3( n, nr, v, ldv, iwork(n+1), cwork(n+1),
1415  $ cwork(2*n+1), lwork-2*n, rwork, ierr )
1416 ** CALL CGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1417 ** $ LWORK-2*N, IERR )
1418  IF ( l2pert ) THEN
1419  xsc = sqrt(small)
1420  DO 3969 p = 2, nr
1421  DO 3968 q = 1, p - 1
1422  ctemp=cmplx(xsc*amin1(abs(v(p,p)),abs(v(q,q))),
1423  $ zero)
1424  IF ( abs(v(q,p)) .LE. temp1 )
1425 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1426  $ v(q,p) = ctemp
1427  3968 CONTINUE
1428  3969 CONTINUE
1429  END IF
1430 *
1431  CALL clacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1432 *
1433  IF ( l2pert ) THEN
1434  xsc = sqrt(small)
1435  DO 8970 p = 2, nr
1436  DO 8971 q = 1, p - 1
1437  ctemp=cmplx(xsc*amin1(abs(v(p,p)),abs(v(q,q))),
1438  $ zero)
1439 * V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) )
1440  v(p,q) = - ctemp
1441  8971 CONTINUE
1442  8970 CONTINUE
1443  ELSE
1444  CALL claset( 'L',nr-1,nr-1,czero,czero,v(2,1),ldv )
1445  END IF
1446 * Now, compute R2 = L3 * Q3, the LQ factorization.
1447  CALL cgelqf( nr, nr, v, ldv, cwork(2*n+n*nr+1),
1448  $ cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1449 * .. and estimate the condition number
1450  CALL clacpy( 'L',nr,nr,v,ldv,cwork(2*n+n*nr+nr+1),nr )
1451  DO 4950 p = 1, nr
1452  temp1 = scnrm2( p, cwork(2*n+n*nr+nr+p), nr )
1453  CALL csscal( p, one/temp1, cwork(2*n+n*nr+nr+p), nr )
1454  4950 CONTINUE
1455  CALL cpocon( 'L',nr,cwork(2*n+n*nr+nr+1),nr,one,temp1,
1456  $ cwork(2*n+n*nr+nr+nr*nr+1),rwork,ierr )
1457  condr2 = one / sqrt(temp1)
1458 *
1459 *
1460  IF ( condr2 .GE. cond_ok ) THEN
1461 * .. save the Householder vectors used for Q3
1462 * (this overwrittes the copy of R2, as it will not be
1463 * needed in this branch, but it does not overwritte the
1464 * Huseholder vectors of Q2.).
1465  CALL clacpy( 'U', nr, nr, v, ldv, cwork(2*n+1), n )
1466 * .. and the rest of the information on Q3 is in
1467 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1468  END IF
1469 *
1470  END IF
1471 *
1472  IF ( l2pert ) THEN
1473  xsc = sqrt(small)
1474  DO 4968 q = 2, nr
1475  ctemp = xsc * v(q,q)
1476  DO 4969 p = 1, q - 1
1477 * V(p,q) = - SIGN( TEMP1, V(q,p) )
1478 * V(p,q) = - TEMP1*( V(p,q) / ABS(V(p,q)) )
1479  v(p,q) = - ctemp
1480  4969 CONTINUE
1481  4968 CONTINUE
1482  ELSE
1483  CALL claset( 'U', nr-1,nr-1, czero,czero, v(1,2), ldv )
1484  END IF
1485 *
1486 * Second preconditioning finished; continue with Jacobi SVD
1487 * The input matrix is lower trinagular.
1488 *
1489 * Recover the right singular vectors as solution of a well
1490 * conditioned triangular matrix equation.
1491 *
1492  IF ( condr1 .LT. cond_ok ) THEN
1493 *
1494  CALL cgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u, ldu,
1495  $ cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,rwork,
1496  $ lrwork, info )
1497  scalem = rwork(1)
1498  numrank = nint(rwork(2))
1499  DO 3970 p = 1, nr
1500  CALL ccopy( nr, v(1,p), 1, u(1,p), 1 )
1501  CALL csscal( nr, sva(p), v(1,p), 1 )
1502  3970 CONTINUE
1503 
1504 * .. pick the right matrix equation and solve it
1505 *
1506  IF ( nr .EQ. n ) THEN
1507 * :)) .. best case, R1 is inverted. The solution of this matrix
1508 * equation is Q2*V2 = the product of the Jacobi rotations
1509 * used in CGESVJ, premultiplied with the orthogonal matrix
1510 * from the second QR factorization.
1511  CALL ctrsm('L','U','N','N', nr,nr,cone, a,lda, v,ldv)
1512  ELSE
1513 * .. R1 is well conditioned, but non-square. Adjoint of R2
1514 * is inverted to get the product of the Jacobi rotations
1515 * used in CGESVJ. The Q-factor from the second QR
1516 * factorization is then built in explicitly.
1517  CALL ctrsm('L','U','C','N',nr,nr,cone,cwork(2*n+1),
1518  $ n,v,ldv)
1519  IF ( nr .LT. n ) THEN
1520  CALL claset('A',n-nr,nr,zero,czero,v(nr+1,1),ldv)
1521  CALL claset('A',nr,n-nr,zero,czero,v(1,nr+1),ldv)
1522  CALL claset('A',n-nr,n-nr,zero,cone,v(nr+1,nr+1),ldv)
1523  END IF
1524  CALL cunmqr('L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1525  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1526  END IF
1527 *
1528  ELSE IF ( condr2 .LT. cond_ok ) THEN
1529 *
1530 * The matrix R2 is inverted. The solution of the matrix equation
1531 * is Q3^* * V3 = the product of the Jacobi rotations (appplied to
1532 * the lower triangular L3 from the LQ factorization of
1533 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1534  CALL cgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1535  $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1536  $ rwork, lrwork, info )
1537  scalem = rwork(1)
1538  numrank = nint(rwork(2))
1539  DO 3870 p = 1, nr
1540  CALL ccopy( nr, v(1,p), 1, u(1,p), 1 )
1541  CALL csscal( nr, sva(p), u(1,p), 1 )
1542  3870 CONTINUE
1543  CALL ctrsm('L','U','N','N',nr,nr,cone,cwork(2*n+1),n,
1544  $ u,ldu)
1545 * .. apply the permutation from the second QR factorization
1546  DO 873 q = 1, nr
1547  DO 872 p = 1, nr
1548  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1549  872 CONTINUE
1550  DO 874 p = 1, nr
1551  u(p,q) = cwork(2*n+n*nr+nr+p)
1552  874 CONTINUE
1553  873 CONTINUE
1554  IF ( nr .LT. n ) THEN
1555  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1556  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1557  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1558  END IF
1559  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1560  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1561  ELSE
1562 * Last line of defense.
1563 * #:( This is a rather pathological case: no scaled condition
1564 * improvement after two pivoted QR factorizations. Other
1565 * possibility is that the rank revealing QR factorization
1566 * or the condition estimator has failed, or the COND_OK
1567 * is set very close to ONE (which is unnecessary). Normally,
1568 * this branch should never be executed, but in rare cases of
1569 * failure of the RRQR or condition estimator, the last line of
1570 * defense ensures that CGEJSV completes the task.
1571 * Compute the full SVD of L3 using CGESVJ with explicit
1572 * accumulation of Jacobi rotations.
1573  CALL cgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1574  $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1575  $ rwork, lrwork, info )
1576  scalem = rwork(1)
1577  numrank = nint(rwork(2))
1578  IF ( nr .LT. n ) THEN
1579  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1580  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1581  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1582  END IF
1583  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1584  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1585 *
1586  CALL cunmlq( 'L', 'C', nr, nr, nr, cwork(2*n+1), n,
1587  $ cwork(2*n+n*nr+1), u, ldu, cwork(2*n+n*nr+nr+1),
1588  $ lwork-2*n-n*nr-nr, ierr )
1589  DO 773 q = 1, nr
1590  DO 772 p = 1, nr
1591  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1592  772 CONTINUE
1593  DO 774 p = 1, nr
1594  u(p,q) = cwork(2*n+n*nr+nr+p)
1595  774 CONTINUE
1596  773 CONTINUE
1597 *
1598  END IF
1599 *
1600 * Permute the rows of V using the (column) permutation from the
1601 * first QRF. Also, scale the columns to make them unit in
1602 * Euclidean norm. This applies to all cases.
1603 *
1604  temp1 = sqrt(float(n)) * epsln
1605  DO 1972 q = 1, n
1606  DO 972 p = 1, n
1607  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1608  972 CONTINUE
1609  DO 973 p = 1, n
1610  v(p,q) = cwork(2*n+n*nr+nr+p)
1611  973 CONTINUE
1612  xsc = one / scnrm2( n, v(1,q), 1 )
1613  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1614  $ CALL csscal( n, xsc, v(1,q), 1 )
1615  1972 CONTINUE
1616 * At this moment, V contains the right singular vectors of A.
1617 * Next, assemble the left singular vector matrix U (M x N).
1618  IF ( nr .LT. m ) THEN
1619  CALL claset('A', m-nr, nr, czero, czero, u(nr+1,1), ldu)
1620  IF ( nr .LT. n1 ) THEN
1621  CALL claset('A',nr,n1-nr,czero,czero,u(1,nr+1),ldu)
1622  CALL claset('A',m-nr,n1-nr,czero,cone,
1623  $ u(nr+1,nr+1),ldu)
1624  END IF
1625  END IF
1626 *
1627 * The Q matrix from the first QRF is built into the left singular
1628 * matrix U. This applies to all cases.
1629 *
1630  CALL cunmqr( 'Left', 'No_Tr', m, n1, n, a, lda, cwork, u,
1631  $ ldu, cwork(n+1), lwork-n, ierr )
1632 
1633 * The columns of U are normalized. The cost is O(M*N) flops.
1634  temp1 = sqrt(float(m)) * epsln
1635  DO 1973 p = 1, nr
1636  xsc = one / scnrm2( m, u(1,p), 1 )
1637  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1638  $ CALL csscal( m, xsc, u(1,p), 1 )
1639  1973 CONTINUE
1640 *
1641 * If the initial QRF is computed with row pivoting, the left
1642 * singular vectors must be adjusted.
1643 *
1644  IF ( rowpiv )
1645  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1646 *
1647  ELSE
1648 *
1649 * .. the initial matrix A has almost orthogonal columns and
1650 * the second QRF is not needed
1651 *
1652  CALL clacpy( 'Upper', n, n, a, lda, cwork(n+1), n )
1653  IF ( l2pert ) THEN
1654  xsc = sqrt(small)
1655  DO 5970 p = 2, n
1656  ctemp = xsc * cwork( n + (p-1)*n + p )
1657  DO 5971 q = 1, p - 1
1658 * CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) /
1659 * $ ABS(CWORK(N+(p-1)*N+q)) )
1660  cwork(n+(q-1)*n+p)=-ctemp
1661  5971 CONTINUE
1662  5970 CONTINUE
1663  ELSE
1664  CALL claset( 'Lower',n-1,n-1,czero,czero,cwork(n+2),n )
1665  END IF
1666 *
1667  CALL cgesvj( 'Upper', 'U', 'N', n, n, cwork(n+1), n, sva,
1668  $ n, u, ldu, cwork(n+n*n+1), lwork-n-n*n, rwork, lrwork,
1669  $ info )
1670 *
1671  scalem = rwork(1)
1672  numrank = nint(rwork(2))
1673  DO 6970 p = 1, n
1674  CALL ccopy( n, cwork(n+(p-1)*n+1), 1, u(1,p), 1 )
1675  CALL csscal( n, sva(p), cwork(n+(p-1)*n+1), 1 )
1676  6970 CONTINUE
1677 *
1678  CALL ctrsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1679  $ cone, a, lda, cwork(n+1), n )
1680  DO 6972 p = 1, n
1681  CALL ccopy( n, cwork(n+p), n, v(iwork(p),1), ldv )
1682  6972 CONTINUE
1683  temp1 = sqrt(float(n))*epsln
1684  DO 6971 p = 1, n
1685  xsc = one / scnrm2( n, v(1,p), 1 )
1686  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1687  $ CALL csscal( n, xsc, v(1,p), 1 )
1688  6971 CONTINUE
1689 *
1690 * Assemble the left singular vector matrix U (M x N).
1691 *
1692  IF ( n .LT. m ) THEN
1693  CALL claset( 'A', m-n, n, czero, czero, u(n+1,1), ldu )
1694  IF ( n .LT. n1 ) THEN
1695  CALL claset('A',n, n1-n, czero, czero, u(1,n+1),ldu)
1696  CALL claset( 'A',m-n,n1-n, czero, cone,u(n+1,n+1),ldu)
1697  END IF
1698  END IF
1699  CALL cunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1700  $ ldu, cwork(n+1), lwork-n, ierr )
1701  temp1 = sqrt(float(m))*epsln
1702  DO 6973 p = 1, n1
1703  xsc = one / scnrm2( m, u(1,p), 1 )
1704  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1705  $ CALL csscal( m, xsc, u(1,p), 1 )
1706  6973 CONTINUE
1707 *
1708  IF ( rowpiv )
1709  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1710 *
1711  END IF
1712 *
1713 * end of the >> almost orthogonal case << in the full SVD
1714 *
1715  ELSE
1716 *
1717 * This branch deploys a preconditioned Jacobi SVD with explicitly
1718 * accumulated rotations. It is included as optional, mainly for
1719 * experimental purposes. It does perfom well, and can also be used.
1720 * In this implementation, this branch will be automatically activated
1721 * if the condition number sigma_max(A) / sigma_min(A) is predicted
1722 * to be greater than the overflow threshold. This is because the
1723 * a posteriori computation of the singular vectors assumes robust
1724 * implementation of BLAS and some LAPACK procedures, capable of working
1725 * in presence of extreme values. Since that is not always the case, ...
1726 *
1727  DO 7968 p = 1, nr
1728  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1729  CALL clacgv( n-p+1, v(p,p), 1 )
1730  7968 CONTINUE
1731 *
1732  IF ( l2pert ) THEN
1733  xsc = sqrt(small/epsln)
1734  DO 5969 q = 1, nr
1735  ctemp = cmplx(xsc*abs( v(q,q) ),zero)
1736  DO 5968 p = 1, n
1737  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1738  $ .OR. ( p .LT. q ) )
1739 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1740  $ v(p,q) = ctemp
1741  IF ( p .LT. q ) v(p,q) = - v(p,q)
1742  5968 CONTINUE
1743  5969 CONTINUE
1744  ELSE
1745  CALL claset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1746  END IF
1747 
1748  CALL cgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1749  $ lwork-2*n, ierr )
1750  CALL clacpy( 'L', n, nr, v, ldv, cwork(2*n+1), n )
1751 *
1752  DO 7969 p = 1, nr
1753  CALL ccopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1754  CALL clacgv( nr-p+1, u(p,p), 1 )
1755  7969 CONTINUE
1756 
1757  IF ( l2pert ) THEN
1758  xsc = sqrt(small/epsln)
1759  DO 9970 q = 2, nr
1760  DO 9971 p = 1, q - 1
1761  ctemp = cmplx(xsc * amin1(abs(u(p,p)),abs(u(q,q))),
1762  $ zero)
1763 * U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) )
1764  u(p,q) = - ctemp
1765  9971 CONTINUE
1766  9970 CONTINUE
1767  ELSE
1768  CALL claset('U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1769  END IF
1770 
1771  CALL cgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
1772  $ n, v, ldv, cwork(2*n+n*nr+1), lwork-2*n-n*nr,
1773  $ rwork, lrwork, info )
1774  scalem = rwork(1)
1775  numrank = nint(rwork(2))
1776 
1777  IF ( nr .LT. n ) THEN
1778  CALL claset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1779  CALL claset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1780  CALL claset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1781  END IF
1782 
1783  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1784  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1785 *
1786 * Permute the rows of V using the (column) permutation from the
1787 * first QRF. Also, scale the columns to make them unit in
1788 * Euclidean norm. This applies to all cases.
1789 *
1790  temp1 = sqrt(float(n)) * epsln
1791  DO 7972 q = 1, n
1792  DO 8972 p = 1, n
1793  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1794  8972 CONTINUE
1795  DO 8973 p = 1, n
1796  v(p,q) = cwork(2*n+n*nr+nr+p)
1797  8973 CONTINUE
1798  xsc = one / scnrm2( n, v(1,q), 1 )
1799  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1800  $ CALL csscal( n, xsc, v(1,q), 1 )
1801  7972 CONTINUE
1802 *
1803 * At this moment, V contains the right singular vectors of A.
1804 * Next, assemble the left singular vector matrix U (M x N).
1805 *
1806  IF ( nr .LT. m ) THEN
1807  CALL claset( 'A', m-nr, nr, czero, czero, u(nr+1,1), ldu )
1808  IF ( nr .LT. n1 ) THEN
1809  CALL claset('A',nr, n1-nr, czero, czero, u(1,nr+1),ldu)
1810  CALL claset('A',m-nr,n1-nr, czero, cone,u(nr+1,nr+1),ldu)
1811  END IF
1812  END IF
1813 *
1814  CALL cunmqr( 'Left', 'No Tr', m, n1, n, a, lda, cwork, u,
1815  $ ldu, cwork(n+1), lwork-n, ierr )
1816 *
1817  IF ( rowpiv )
1818  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1819 *
1820 *
1821  END IF
1822  IF ( transp ) THEN
1823 * .. swap U and V because the procedure worked on A^*
1824  DO 6974 p = 1, n
1825  CALL cswap( n, u(1,p), 1, v(1,p), 1 )
1826  6974 CONTINUE
1827  END IF
1828 *
1829  END IF
1830 * end of the full SVD
1831 *
1832 * Undo scaling, if necessary (and possible)
1833 *
1834  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1835  CALL clascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
1836  uscal1 = one
1837  uscal2 = one
1838  END IF
1839 *
1840  IF ( nr .LT. n ) THEN
1841  DO 3004 p = nr+1, n
1842  sva(p) = zero
1843  3004 CONTINUE
1844  END IF
1845 *
1846  rwork(1) = uscal2 * scalem
1847  rwork(2) = uscal1
1848  IF ( errest ) rwork(3) = sconda
1849  IF ( lsvec .AND. rsvec ) THEN
1850  rwork(4) = condr1
1851  rwork(5) = condr2
1852  END IF
1853  IF ( l2tran ) THEN
1854  rwork(6) = entra
1855  rwork(7) = entrat
1856  END IF
1857 *
1858  iwork(1) = nr
1859  iwork(2) = numrank
1860  iwork(3) = warning
1861 *
1862  RETURN
1863 * ..
1864 * .. END OF CGEJSV
1865 * ..
subroutine claswp(N, A, LDA, K1, K2, IPIV, INCX)
CLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: claswp.f:116
subroutine claset(UPLO, M, N, ALPHA, BETA, A, LDA)
CLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: claset.f:108
subroutine cgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, RWORK, INFO)
CGEQP3
Definition: cgeqp3.f:161
real function slamch(CMACH)
SLAMCH
Definition: slamch.f:69
subroutine cgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
CGEQRF
Definition: cgeqrf.f:138
integer function isamax(N, SX, INCX)
ISAMAX
Definition: isamax.f:53
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine cunmqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
CUNMQR
Definition: cunmqr.f:170
subroutine csscal(N, SA, CX, INCX)
CSSCAL
Definition: csscal.f:54
subroutine ctrsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
CTRSM
Definition: ctrsm.f:182
subroutine clacgv(N, X, INCX)
CLACGV conjugates a complex vector.
Definition: clacgv.f:76
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
real function scnrm2(N, X, INCX)
SCNRM2
Definition: scnrm2.f:56
subroutine clacpy(UPLO, M, N, A, LDA, B, LDB)
CLACPY copies all or part of one two-dimensional array to another.
Definition: clacpy.f:105
subroutine cgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
CGELQF
Definition: cgelqf.f:137
subroutine sscal(N, SA, SX, INCX)
SSCAL
Definition: sscal.f:55
subroutine cunmlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
CUNMLQ
Definition: cunmlq.f:170
subroutine classq(N, X, INCX, SCALE, SUMSQ)
CLASSQ updates a sum of squares represented in scaled form.
Definition: classq.f:108
subroutine clascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
CLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: clascl.f:141
subroutine cgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, CWORK, LWORK, RWORK, LRWORK, INFO)
CGESVJ
Definition: cgesvj.f:328
subroutine ccopy(N, CX, INCX, CY, INCY)
CCOPY
Definition: ccopy.f:52
subroutine cpocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, RWORK, INFO)
CPOCON
Definition: cpocon.f:123
subroutine cungqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
CUNGQR
Definition: cungqr.f:130
subroutine cswap(N, CX, INCX, CY, INCY)
CSWAP
Definition: cswap.f:52

Here is the call graph for this function:

Here is the caller graph for this function:

subroutine cgesdd ( character  JOBZ,
integer  M,
integer  N,
complex, dimension( lda, * )  A,
integer  LDA,
real, dimension( * )  S,
complex, dimension( ldu, * )  U,
integer  LDU,
complex, dimension( ldvt, * )  VT,
integer  LDVT,
complex, dimension( * )  WORK,
integer  LWORK,
real, dimension( * )  RWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

CGESDD

Download CGESDD + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 CGESDD computes the singular value decomposition (SVD) of a complex
 M-by-N matrix A, optionally computing the left and/or right singular
 vectors, by using divide-and-conquer method. The SVD is written

      A = U * SIGMA * conjugate-transpose(V)

 where SIGMA is an M-by-N matrix which is zero except for its
 min(m,n) diagonal elements, U is an M-by-M unitary matrix, and
 V is an N-by-N unitary matrix.  The diagonal elements of SIGMA
 are the singular values of A; they are real and non-negative, and
 are returned in descending order.  The first min(m,n) columns of
 U and V are the left and right singular vectors of A.

 Note that the routine returns VT = V**H, not V.

 The divide and conquer algorithm makes very mild assumptions about
 floating point arithmetic. It will work on machines with a guard
 digit in add/subtract, or on those binary machines without guard
 digits which subtract like the Cray X-MP, Cray Y-MP, Cray C-90, or
 Cray-2. It could conceivably fail on hexadecimal or decimal machines
 without guard digits, but we know of none.
Parameters
[in]JOBZ
          JOBZ is CHARACTER*1
          Specifies options for computing all or part of the matrix U:
          = 'A':  all M columns of U and all N rows of V**H are
                  returned in the arrays U and VT;
          = 'S':  the first min(M,N) columns of U and the first
                  min(M,N) rows of V**H are returned in the arrays U
                  and VT;
          = 'O':  If M >= N, the first N columns of U are overwritten
                  in the array A and all rows of V**H are returned in
                  the array VT;
                  otherwise, all columns of U are returned in the
                  array U and the first M rows of V**H are overwritten
                  in the array A;
          = 'N':  no columns of U or rows of V**H are computed.
[in]M
          M is INTEGER
          The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.  N >= 0.
[in,out]A
          A is COMPLEX array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit,
          if JOBZ = 'O',  A is overwritten with the first N columns
                          of U (the left singular vectors, stored
                          columnwise) if M >= N;
                          A is overwritten with the first M rows
                          of V**H (the right singular vectors, stored
                          rowwise) otherwise.
          if JOBZ .ne. 'O', the contents of A are destroyed.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]S
          S is REAL array, dimension (min(M,N))
          The singular values of A, sorted so that S(i) >= S(i+1).
[out]U
          U is COMPLEX array, dimension (LDU,UCOL)
          UCOL = M if JOBZ = 'A' or JOBZ = 'O' and M < N;
          UCOL = min(M,N) if JOBZ = 'S'.
          If JOBZ = 'A' or JOBZ = 'O' and M < N, U contains the M-by-M
          unitary matrix U;
          if JOBZ = 'S', U contains the first min(M,N) columns of U
          (the left singular vectors, stored columnwise);
          if JOBZ = 'O' and M >= N, or JOBZ = 'N', U is not referenced.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U.  LDU >= 1; if
          JOBZ = 'S' or 'A' or JOBZ = 'O' and M < N, LDU >= M.
[out]VT
          VT is COMPLEX array, dimension (LDVT,N)
          If JOBZ = 'A' or JOBZ = 'O' and M >= N, VT contains the
          N-by-N unitary matrix V**H;
          if JOBZ = 'S', VT contains the first min(M,N) rows of
          V**H (the right singular vectors, stored rowwise);
          if JOBZ = 'O' and M < N, or JOBZ = 'N', VT is not referenced.
[in]LDVT
          LDVT is INTEGER
          The leading dimension of the array VT.  LDVT >= 1; if
          JOBZ = 'A' or JOBZ = 'O' and M >= N, LDVT >= N;
          if JOBZ = 'S', LDVT >= min(M,N).
[out]WORK
          WORK is COMPLEX array, dimension (MAX(1,LWORK))
          On exit, if INFO = 0, WORK(1) returns the optimal LWORK.
[in]LWORK
          LWORK is INTEGER
          The dimension of the array WORK. LWORK >= 1.
          if JOBZ = 'N', LWORK >= 2*min(M,N)+max(M,N).
          if JOBZ = 'O',
                LWORK >= 2*min(M,N)*min(M,N)+2*min(M,N)+max(M,N).
          if JOBZ = 'S' or 'A',
                LWORK >= min(M,N)*min(M,N)+2*min(M,N)+max(M,N).
          For good performance, LWORK should generally be larger.

          If LWORK = -1, a workspace query is assumed.  The optimal
          size for the WORK array is calculated and stored in WORK(1),
          and no other work except argument checking is performed.
[out]RWORK
          RWORK is REAL array, dimension (MAX(1,LRWORK))
          If JOBZ = 'N', LRWORK >= 7*min(M,N).
          Otherwise, 
          LRWORK >= min(M,N)*max(5*min(M,N)+7,2*max(M,N)+2*min(M,N)+1)
[out]IWORK
          IWORK is INTEGER array, dimension (8*min(M,N))
[out]INFO
          INFO is INTEGER
          = 0:  successful exit.
          < 0:  if INFO = -i, the i-th argument had an illegal value.
          > 0:  The updating process of SBDSDC did not converge.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
November 2015
Contributors:
Ming Gu and Huan Ren, Computer Science Division, University of California at Berkeley, USA

Definition at line 224 of file cgesdd.f.

224 *
225 * -- LAPACK driver routine (version 3.6.0) --
226 * -- LAPACK is a software package provided by Univ. of Tennessee, --
227 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
228 * November 2015
229 *
230 * .. Scalar Arguments ..
231  CHARACTER jobz
232  INTEGER info, lda, ldu, ldvt, lwork, m, n
233 * ..
234 * .. Array Arguments ..
235  INTEGER iwork( * )
236  REAL rwork( * ), s( * )
237  COMPLEX a( lda, * ), u( ldu, * ), vt( ldvt, * ),
238  $ work( * )
239 * ..
240 *
241 * =====================================================================
242 *
243 * .. Parameters ..
244  INTEGER lquerv
245  parameter( lquerv = -1 )
246  COMPLEX czero, cone
247  parameter( czero = ( 0.0e+0, 0.0e+0 ),
248  $ cone = ( 1.0e+0, 0.0e+0 ) )
249  REAL zero, one
250  parameter( zero = 0.0e+0, one = 1.0e+0 )
251 * ..
252 * .. Local Scalars ..
253  LOGICAL wntqa, wntqas, wntqn, wntqo, wntqs
254  INTEGER blk, chunk, i, ie, ierr, il, ir, iru, irvt,
255  $ iscl, itau, itaup, itauq, iu, ivt, ldwkvt,
256  $ ldwrkl, ldwrkr, ldwrku, maxwrk, minmn, minwrk,
257  $ mnthr1, mnthr2, nrwork, nwork, wrkbl
258  REAL anrm, bignum, eps, smlnum
259 * ..
260 * .. Local Arrays ..
261  INTEGER idum( 1 )
262  REAL dum( 1 )
263 * ..
264 * .. External Subroutines ..
265  EXTERNAL cgebrd, cgelqf, cgemm, cgeqrf, clacp2, clacpy,
268 * ..
269 * .. External Functions ..
270  LOGICAL lsame
271  INTEGER ilaenv
272  REAL clange, slamch
273  EXTERNAL clange, slamch, ilaenv, lsame
274 * ..
275 * .. Intrinsic Functions ..
276  INTRINSIC int, max, min, sqrt
277 * ..
278 * .. Executable Statements ..
279 *
280 * Test the input arguments
281 *
282  info = 0
283  minmn = min( m, n )
284  mnthr1 = int( minmn*17.0 / 9.0 )
285  mnthr2 = int( minmn*5.0 / 3.0 )
286  wntqa = lsame( jobz, 'A' )
287  wntqs = lsame( jobz, 'S' )
288  wntqas = wntqa .OR. wntqs
289  wntqo = lsame( jobz, 'O' )
290  wntqn = lsame( jobz, 'N' )
291  minwrk = 1
292  maxwrk = 1
293 *
294  IF( .NOT.( wntqa .OR. wntqs .OR. wntqo .OR. wntqn ) ) THEN
295  info = -1
296  ELSE IF( m.LT.0 ) THEN
297  info = -2
298  ELSE IF( n.LT.0 ) THEN
299  info = -3
300  ELSE IF( lda.LT.max( 1, m ) ) THEN
301  info = -5
302  ELSE IF( ldu.LT.1 .OR. ( wntqas .AND. ldu.LT.m ) .OR.
303  $ ( wntqo .AND. m.LT.n .AND. ldu.LT.m ) ) THEN
304  info = -8
305  ELSE IF( ldvt.LT.1 .OR. ( wntqa .AND. ldvt.LT.n ) .OR.
306  $ ( wntqs .AND. ldvt.LT.minmn ) .OR.
307  $ ( wntqo .AND. m.GE.n .AND. ldvt.LT.n ) ) THEN
308  info = -10
309  END IF
310 *
311 * Compute workspace
312 * (Note: Comments in the code beginning "Workspace:" describe the
313 * minimal amount of workspace needed at that point in the code,
314 * as well as the preferred amount for good performance.
315 * CWorkspace refers to complex workspace, and RWorkspace to
316 * real workspace. NB refers to the optimal block size for the
317 * immediately following subroutine, as returned by ILAENV.)
318 *
319  IF( info.EQ.0 .AND. m.GT.0 .AND. n.GT.0 ) THEN
320  IF( m.GE.n ) THEN
321 *
322 * There is no complex work space needed for bidiagonal SVD
323 * The real work space needed for bidiagonal SVD is BDSPAC
324 * for computing singular values and singular vectors; BDSPAN
325 * for computing singular values only.
326 * BDSPAC = 5*N*N + 7*N
327 * BDSPAN = MAX(7*N+4, 3*N+2+SMLSIZ*(SMLSIZ+8))
328 *
329  IF( m.GE.mnthr1 ) THEN
330  IF( wntqn ) THEN
331 *
332 * Path 1 (M much larger than N, JOBZ='N')
333 *
334  maxwrk = n + n*ilaenv( 1, 'CGEQRF', ' ', m, n, -1,
335  $ -1 )
336  maxwrk = max( maxwrk, 2*n+2*n*
337  $ ilaenv( 1, 'CGEBRD', ' ', n, n, -1, -1 ) )
338  minwrk = 3*n
339  ELSE IF( wntqo ) THEN
340 *
341 * Path 2 (M much larger than N, JOBZ='O')
342 *
343  wrkbl = n + n*ilaenv( 1, 'CGEQRF', ' ', m, n, -1, -1 )
344  wrkbl = max( wrkbl, n+n*ilaenv( 1, 'CUNGQR', ' ', m,
345  $ n, n, -1 ) )
346  wrkbl = max( wrkbl, 2*n+2*n*
347  $ ilaenv( 1, 'CGEBRD', ' ', n, n, -1, -1 ) )
348  wrkbl = max( wrkbl, 2*n+n*
349  $ ilaenv( 1, 'CUNMBR', 'QLN', n, n, n, -1 ) )
350  wrkbl = max( wrkbl, 2*n+n*
351  $ ilaenv( 1, 'CUNMBR', 'PRC', n, n, n, -1 ) )
352  maxwrk = m*n + n*n + wrkbl
353  minwrk = 2*n*n + 3*n
354  ELSE IF( wntqs ) THEN
355 *
356 * Path 3 (M much larger than N, JOBZ='S')
357 *
358  wrkbl = n + n*ilaenv( 1, 'CGEQRF', ' ', m, n, -1, -1 )
359  wrkbl = max( wrkbl, n+n*ilaenv( 1, 'CUNGQR', ' ', m,
360  $ n, n, -1 ) )
361  wrkbl = max( wrkbl, 2*n+2*n*
362  $ ilaenv( 1, 'CGEBRD', ' ', n, n, -1, -1 ) )
363  wrkbl = max( wrkbl, 2*n+n*
364  $ ilaenv( 1, 'CUNMBR', 'QLN', n, n, n, -1 ) )
365  wrkbl = max( wrkbl, 2*n+n*
366  $ ilaenv( 1, 'CUNMBR', 'PRC', n, n, n, -1 ) )
367  maxwrk = n*n + wrkbl
368  minwrk = n*n + 3*n
369  ELSE IF( wntqa ) THEN
370 *
371 * Path 4 (M much larger than N, JOBZ='A')
372 *
373  wrkbl = n + n*ilaenv( 1, 'CGEQRF', ' ', m, n, -1, -1 )
374  wrkbl = max( wrkbl, n+m*ilaenv( 1, 'CUNGQR', ' ', m,
375  $ m, n, -1 ) )
376  wrkbl = max( wrkbl, 2*n+2*n*
377  $ ilaenv( 1, 'CGEBRD', ' ', n, n, -1, -1 ) )
378  wrkbl = max( wrkbl, 2*n+n*
379  $ ilaenv( 1, 'CUNMBR', 'QLN', n, n, n, -1 ) )
380  wrkbl = max( wrkbl, 2*n+n*
381  $ ilaenv( 1, 'CUNMBR', 'PRC', n, n, n, -1 ) )
382  maxwrk = n*n + wrkbl
383  minwrk = n*n + 2*n + m
384  END IF
385  ELSE IF( m.GE.mnthr2 ) THEN
386 *
387 * Path 5 (M much larger than N, but not as much as MNTHR1)
388 *
389  maxwrk = 2*n + ( m+n )*ilaenv( 1, 'CGEBRD', ' ', m, n,
390  $ -1, -1 )
391  minwrk = 2*n + m
392  IF( wntqo ) THEN
393  maxwrk = max( maxwrk, 2*n+n*
394  $ ilaenv( 1, 'CUNGBR', 'P', n, n, n, -1 ) )
395  maxwrk = max( maxwrk, 2*n+n*
396  $ ilaenv( 1, 'CUNGBR', 'Q', m, n, n, -1 ) )
397  maxwrk = maxwrk + m*n
398  minwrk = minwrk + n*n
399  ELSE IF( wntqs ) THEN
400  maxwrk = max( maxwrk, 2*n+n*
401  $ ilaenv( 1, 'CUNGBR', 'P', n, n, n, -1 ) )
402  maxwrk = max( maxwrk, 2*n+n*
403  $ ilaenv( 1, 'CUNGBR', 'Q', m, n, n, -1 ) )
404  ELSE IF( wntqa ) THEN
405  maxwrk = max( maxwrk, 2*n+n*
406  $ ilaenv( 1, 'CUNGBR', 'P', n, n, n, -1 ) )
407  maxwrk = max( maxwrk, 2*n+m*
408  $ ilaenv( 1, 'CUNGBR', 'Q', m, m, n, -1 ) )
409  END IF
410  ELSE
411 *
412 * Path 6 (M at least N, but not much larger)
413 *
414  maxwrk = 2*n + ( m+n )*ilaenv( 1, 'CGEBRD', ' ', m, n,
415  $ -1, -1 )
416  minwrk = 2*n + m
417  IF( wntqo ) THEN
418  maxwrk = max( maxwrk, 2*n+n*
419  $ ilaenv( 1, 'CUNMBR', 'PRC', n, n, n, -1 ) )
420  maxwrk = max( maxwrk, 2*n+n*
421  $ ilaenv( 1, 'CUNMBR', 'QLN', m, n, n, -1 ) )
422  maxwrk = maxwrk + m*n
423  minwrk = minwrk + n*n
424  ELSE IF( wntqs ) THEN
425  maxwrk = max( maxwrk, 2*n+n*
426  $ ilaenv( 1, 'CUNMBR', 'PRC', n, n, n, -1 ) )
427  maxwrk = max( maxwrk, 2*n+n*
428  $ ilaenv( 1, 'CUNMBR', 'QLN', m, n, n, -1 ) )
429  ELSE IF( wntqa ) THEN
430  maxwrk = max( maxwrk, 2*n+n*
431  $ ilaenv( 1, 'CUNGBR', 'PRC', n, n, n, -1 ) )
432  maxwrk = max( maxwrk, 2*n+m*
433  $ ilaenv( 1, 'CUNGBR', 'QLN', m, m, n, -1 ) )
434  END IF
435  END IF
436  ELSE
437 *
438 * There is no complex work space needed for bidiagonal SVD
439 * The real work space needed for bidiagonal SVD is BDSPAC
440 * for computing singular values and singular vectors; BDSPAN
441 * for computing singular values only.
442 * BDSPAC = 5*M*M + 7*M
443 * BDSPAN = MAX(7*M+4, 3*M+2+SMLSIZ*(SMLSIZ+8))
444 *
445  IF( n.GE.mnthr1 ) THEN
446  IF( wntqn ) THEN
447 *
448 * Path 1t (N much larger than M, JOBZ='N')
449 *
450  maxwrk = m + m*ilaenv( 1, 'CGELQF', ' ', m, n, -1,
451  $ -1 )
452  maxwrk = max( maxwrk, 2*m+2*m*
453  $ ilaenv( 1, 'CGEBRD', ' ', m, m, -1, -1 ) )
454  minwrk = 3*m
455  ELSE IF( wntqo ) THEN
456 *
457 * Path 2t (N much larger than M, JOBZ='O')
458 *
459  wrkbl = m + m*ilaenv( 1, 'CGELQF', ' ', m, n, -1, -1 )
460  wrkbl = max( wrkbl, m+m*ilaenv( 1, 'CUNGLQ', ' ', m,
461  $ n, m, -1 ) )
462  wrkbl = max( wrkbl, 2*m+2*m*
463  $ ilaenv( 1, 'CGEBRD', ' ', m, m, -1, -1 ) )
464  wrkbl = max( wrkbl, 2*m+m*
465  $ ilaenv( 1, 'CUNMBR', 'PRC', m, m, m, -1 ) )
466  wrkbl = max( wrkbl, 2*m+m*
467  $ ilaenv( 1, 'CUNMBR', 'QLN', m, m, m, -1 ) )
468  maxwrk = m*n + m*m + wrkbl
469  minwrk = 2*m*m + 3*m
470  ELSE IF( wntqs ) THEN
471 *
472 * Path 3t (N much larger than M, JOBZ='S')
473 *
474  wrkbl = m + m*ilaenv( 1, 'CGELQF', ' ', m, n, -1, -1 )
475  wrkbl = max( wrkbl, m+m*ilaenv( 1, 'CUNGLQ', ' ', m,
476  $ n, m, -1 ) )
477  wrkbl = max( wrkbl, 2*m+2*m*
478  $ ilaenv( 1, 'CGEBRD', ' ', m, m, -1, -1 ) )
479  wrkbl = max( wrkbl, 2*m+m*
480  $ ilaenv( 1, 'CUNMBR', 'PRC', m, m, m, -1 ) )
481  wrkbl = max( wrkbl, 2*m+m*
482  $ ilaenv( 1, 'CUNMBR', 'QLN', m, m, m, -1 ) )
483  maxwrk = m*m + wrkbl
484  minwrk = m*m + 3*m
485  ELSE IF( wntqa ) THEN
486 *
487 * Path 4t (N much larger than M, JOBZ='A')
488 *
489  wrkbl = m + m*ilaenv( 1, 'CGELQF', ' ', m, n, -1, -1 )
490  wrkbl = max( wrkbl, m+n*ilaenv( 1, 'CUNGLQ', ' ', n,
491  $ n, m, -1 ) )
492  wrkbl = max( wrkbl, 2*m+2*m*
493  $ ilaenv( 1, 'CGEBRD', ' ', m, m, -1, -1 ) )
494  wrkbl = max( wrkbl, 2*m+m*
495  $ ilaenv( 1, 'CUNMBR', 'PRC', m, m, m, -1 ) )
496  wrkbl = max( wrkbl, 2*m+m*
497  $ ilaenv( 1, 'CUNMBR', 'QLN', m, m, m, -1 ) )
498  maxwrk = m*m + wrkbl
499  minwrk = m*m + 2*m + n
500  END IF
501  ELSE IF( n.GE.mnthr2 ) THEN
502 *
503 * Path 5t (N much larger than M, but not as much as MNTHR1)
504 *
505  maxwrk = 2*m + ( m+n )*ilaenv( 1, 'CGEBRD', ' ', m, n,
506  $ -1, -1 )
507  minwrk = 2*m + n
508  IF( wntqo ) THEN
509  maxwrk = max( maxwrk, 2*m+m*
510  $ ilaenv( 1, 'CUNGBR', 'P', m, n, m, -1 ) )
511  maxwrk = max( maxwrk, 2*m+m*
512  $ ilaenv( 1, 'CUNGBR', 'Q', m, m, n, -1 ) )
513  maxwrk = maxwrk + m*n
514  minwrk = minwrk + m*m
515  ELSE IF( wntqs ) THEN
516  maxwrk = max( maxwrk, 2*m+m*
517  $ ilaenv( 1, 'CUNGBR', 'P', m, n, m, -1 ) )
518  maxwrk = max( maxwrk, 2*m+m*
519  $ ilaenv( 1, 'CUNGBR', 'Q', m, m, n, -1 ) )
520  ELSE IF( wntqa ) THEN
521  maxwrk = max( maxwrk, 2*m+n*
522  $ ilaenv( 1, 'CUNGBR', 'P', n, n, m, -1 ) )
523  maxwrk = max( maxwrk, 2*m+m*
524  $ ilaenv( 1, 'CUNGBR', 'Q', m, m, n, -1 ) )
525  END IF
526  ELSE
527 *
528 * Path 6t (N greater than M, but not much larger)
529 *
530  maxwrk = 2*m + ( m+n )*ilaenv( 1, 'CGEBRD', ' ', m, n,
531  $ -1, -1 )
532  minwrk = 2*m + n
533  IF( wntqo ) THEN
534  maxwrk = max( maxwrk, 2*m+m*
535  $ ilaenv( 1, 'CUNMBR', 'PRC', m, n, m, -1 ) )
536  maxwrk = max( maxwrk, 2*m+m*
537  $ ilaenv( 1, 'CUNMBR', 'QLN', m, m, n, -1 ) )
538  maxwrk = maxwrk + m*n
539  minwrk = minwrk + m*m
540  ELSE IF( wntqs ) THEN
541  maxwrk = max( maxwrk, 2*m+m*
542  $ ilaenv( 1, 'CUNGBR', 'PRC', m, n, m, -1 ) )
543  maxwrk = max( maxwrk, 2*m+m*
544  $ ilaenv( 1, 'CUNGBR', 'QLN', m, m, n, -1 ) )
545  ELSE IF( wntqa ) THEN
546  maxwrk = max( maxwrk, 2*m+n*
547  $ ilaenv( 1, 'CUNGBR', 'PRC', n, n, m, -1 ) )
548  maxwrk = max( maxwrk, 2*m+m*
549  $ ilaenv( 1, 'CUNGBR', 'QLN', m, m, n, -1 ) )
550  END IF
551  END IF
552  END IF
553  maxwrk = max( maxwrk, minwrk )
554  END IF
555  IF( info.EQ.0 ) THEN
556  work( 1 ) = maxwrk
557  IF( lwork.LT.minwrk .AND. lwork.NE.lquerv )
558  $ info = -13
559  END IF
560 *
561 * Quick returns
562 *
563  IF( info.NE.0 ) THEN
564  CALL xerbla( 'CGESDD', -info )
565  RETURN
566  END IF
567  IF( lwork.EQ.lquerv )
568  $ RETURN
569  IF( m.EQ.0 .OR. n.EQ.0 ) THEN
570  RETURN
571  END IF
572 *
573 * Get machine constants
574 *
575  eps = slamch( 'P' )
576  smlnum = sqrt( slamch( 'S' ) ) / eps
577  bignum = one / smlnum
578 *
579 * Scale A if max element outside range [SMLNUM,BIGNUM]
580 *
581  anrm = clange( 'M', m, n, a, lda, dum )
582  iscl = 0
583  IF( anrm.GT.zero .AND. anrm.LT.smlnum ) THEN
584  iscl = 1
585  CALL clascl( 'G', 0, 0, anrm, smlnum, m, n, a, lda, ierr )
586  ELSE IF( anrm.GT.bignum ) THEN
587  iscl = 1
588  CALL clascl( 'G', 0, 0, anrm, bignum, m, n, a, lda, ierr )
589  END IF
590 *
591  IF( m.GE.n ) THEN
592 *
593 * A has at least as many rows as columns. If A has sufficiently
594 * more rows than columns, first reduce using the QR
595 * decomposition (if sufficient workspace available)
596 *
597  IF( m.GE.mnthr1 ) THEN
598 *
599  IF( wntqn ) THEN
600 *
601 * Path 1 (M much larger than N, JOBZ='N')
602 * No singular vectors to be computed
603 *
604  itau = 1
605  nwork = itau + n
606 *
607 * Compute A=Q*R
608 * (CWorkspace: need 2*N, prefer N+N*NB)
609 * (RWorkspace: need 0)
610 *
611  CALL cgeqrf( m, n, a, lda, work( itau ), work( nwork ),
612  $ lwork-nwork+1, ierr )
613 *
614 * Zero out below R
615 *
616  CALL claset( 'L', n-1, n-1, czero, czero, a( 2, 1 ),
617  $ lda )
618  ie = 1
619  itauq = 1
620  itaup = itauq + n
621  nwork = itaup + n
622 *
623 * Bidiagonalize R in A
624 * (CWorkspace: need 3*N, prefer 2*N+2*N*NB)
625 * (RWorkspace: need N)
626 *
627  CALL cgebrd( n, n, a, lda, s, rwork( ie ), work( itauq ),
628  $ work( itaup ), work( nwork ), lwork-nwork+1,
629  $ ierr )
630  nrwork = ie + n
631 *
632 * Perform bidiagonal SVD, compute singular values only
633 * (CWorkspace: 0)
634 * (RWorkspace: need BDSPAN)
635 *
636  CALL sbdsdc( 'U', 'N', n, s, rwork( ie ), dum, 1, dum, 1,
637  $ dum, idum, rwork( nrwork ), iwork, info )
638 *
639  ELSE IF( wntqo ) THEN
640 *
641 * Path 2 (M much larger than N, JOBZ='O')
642 * N left singular vectors to be overwritten on A and
643 * N right singular vectors to be computed in VT
644 *
645  iu = 1
646 *
647 * WORK(IU) is N by N
648 *
649  ldwrku = n
650  ir = iu + ldwrku*n
651  IF( lwork.GE.m*n+n*n+3*n ) THEN
652 *
653 * WORK(IR) is M by N
654 *
655  ldwrkr = m
656  ELSE
657  ldwrkr = ( lwork-n*n-3*n ) / n
658  END IF
659  itau = ir + ldwrkr*n
660  nwork = itau + n
661 *
662 * Compute A=Q*R
663 * (CWorkspace: need N*N+2*N, prefer M*N+N+N*NB)
664 * (RWorkspace: 0)
665 *
666  CALL cgeqrf( m, n, a, lda, work( itau ), work( nwork ),
667  $ lwork-nwork+1, ierr )
668 *
669 * Copy R to WORK( IR ), zeroing out below it
670 *
671  CALL clacpy( 'U', n, n, a, lda, work( ir ), ldwrkr )
672  CALL claset( 'L', n-1, n-1, czero, czero, work( ir+1 ),
673  $ ldwrkr )
674 *
675 * Generate Q in A
676 * (CWorkspace: need 2*N, prefer N+N*NB)
677 * (RWorkspace: 0)
678 *
679  CALL cungqr( m, n, n, a, lda, work( itau ),
680  $ work( nwork ), lwork-nwork+1, ierr )
681  ie = 1
682  itauq = itau
683  itaup = itauq + n
684  nwork = itaup + n
685 *
686 * Bidiagonalize R in WORK(IR)
687 * (CWorkspace: need N*N+3*N, prefer M*N+2*N+2*N*NB)
688 * (RWorkspace: need N)
689 *
690  CALL cgebrd( n, n, work( ir ), ldwrkr, s, rwork( ie ),
691  $ work( itauq ), work( itaup ), work( nwork ),
692  $ lwork-nwork+1, ierr )
693 *
694 * Perform bidiagonal SVD, computing left singular vectors
695 * of R in WORK(IRU) and computing right singular vectors
696 * of R in WORK(IRVT)
697 * (CWorkspace: need 0)
698 * (RWorkspace: need BDSPAC)
699 *
700  iru = ie + n
701  irvt = iru + n*n
702  nrwork = irvt + n*n
703  CALL sbdsdc( 'U', 'I', n, s, rwork( ie ), rwork( iru ),
704  $ n, rwork( irvt ), n, dum, idum,
705  $ rwork( nrwork ), iwork, info )
706 *
707 * Copy real matrix RWORK(IRU) to complex matrix WORK(IU)
708 * Overwrite WORK(IU) by the left singular vectors of R
709 * (CWorkspace: need 2*N*N+3*N, prefer M*N+N*N+2*N+N*NB)
710 * (RWorkspace: 0)
711 *
712  CALL clacp2( 'F', n, n, rwork( iru ), n, work( iu ),
713  $ ldwrku )
714  CALL cunmbr( 'Q', 'L', 'N', n, n, n, work( ir ), ldwrkr,
715  $ work( itauq ), work( iu ), ldwrku,
716  $ work( nwork ), lwork-nwork+1, ierr )
717 *
718 * Copy real matrix RWORK(IRVT) to complex matrix VT
719 * Overwrite VT by the right singular vectors of R
720 * (CWorkspace: need N*N+3*N, prefer M*N+2*N+N*NB)
721 * (RWorkspace: 0)
722 *
723  CALL clacp2( 'F', n, n, rwork( irvt ), n, vt, ldvt )
724  CALL cunmbr( 'P', 'R', 'C', n, n, n, work( ir ), ldwrkr,
725  $ work( itaup ), vt, ldvt, work( nwork ),
726  $ lwork-nwork+1, ierr )
727 *
728 * Multiply Q in A by left singular vectors of R in
729 * WORK(IU), storing result in WORK(IR) and copying to A
730 * (CWorkspace: need 2*N*N, prefer N*N+M*N)
731 * (RWorkspace: 0)
732 *
733  DO 10 i = 1, m, ldwrkr
734  chunk = min( m-i+1, ldwrkr )
735  CALL cgemm( 'N', 'N', chunk, n, n, cone, a( i, 1 ),
736  $ lda, work( iu ), ldwrku, czero,
737  $ work( ir ), ldwrkr )
738  CALL clacpy( 'F', chunk, n, work( ir ), ldwrkr,
739  $ a( i, 1 ), lda )
740  10 CONTINUE
741 *
742  ELSE IF( wntqs ) THEN
743 *
744 * Path 3 (M much larger than N, JOBZ='S')
745 * N left singular vectors to be computed in U and
746 * N right singular vectors to be computed in VT
747 *
748  ir = 1
749 *
750 * WORK(IR) is N by N
751 *
752  ldwrkr = n
753  itau = ir + ldwrkr*n
754  nwork = itau + n
755 *
756 * Compute A=Q*R
757 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
758 * (RWorkspace: 0)
759 *
760  CALL cgeqrf( m, n, a, lda, work( itau ), work( nwork ),
761  $ lwork-nwork+1, ierr )
762 *
763 * Copy R to WORK(IR), zeroing out below it
764 *
765  CALL clacpy( 'U', n, n, a, lda, work( ir ), ldwrkr )
766  CALL claset( 'L', n-1, n-1, czero, czero, work( ir+1 ),
767  $ ldwrkr )
768 *
769 * Generate Q in A
770 * (CWorkspace: need 2*N, prefer N+N*NB)
771 * (RWorkspace: 0)
772 *
773  CALL cungqr( m, n, n, a, lda, work( itau ),
774  $ work( nwork ), lwork-nwork+1, ierr )
775  ie = 1
776  itauq = itau
777  itaup = itauq + n
778  nwork = itaup + n
779 *
780 * Bidiagonalize R in WORK(IR)
781 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+2*N*NB)
782 * (RWorkspace: need N)
783 *
784  CALL cgebrd( n, n, work( ir ), ldwrkr, s, rwork( ie ),
785  $ work( itauq ), work( itaup ), work( nwork ),
786  $ lwork-nwork+1, ierr )
787 *
788 * Perform bidiagonal SVD, computing left singular vectors
789 * of bidiagonal matrix in RWORK(IRU) and computing right
790 * singular vectors of bidiagonal matrix in RWORK(IRVT)
791 * (CWorkspace: need 0)
792 * (RWorkspace: need BDSPAC)
793 *
794  iru = ie + n
795  irvt = iru + n*n
796  nrwork = irvt + n*n
797  CALL sbdsdc( 'U', 'I', n, s, rwork( ie ), rwork( iru ),
798  $ n, rwork( irvt ), n, dum, idum,
799  $ rwork( nrwork ), iwork, info )
800 *
801 * Copy real matrix RWORK(IRU) to complex matrix U
802 * Overwrite U by left singular vectors of R
803 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+N*NB)
804 * (RWorkspace: 0)
805 *
806  CALL clacp2( 'F', n, n, rwork( iru ), n, u, ldu )
807  CALL cunmbr( 'Q', 'L', 'N', n, n, n, work( ir ), ldwrkr,
808  $ work( itauq ), u, ldu, work( nwork ),
809  $ lwork-nwork+1, ierr )
810 *
811 * Copy real matrix RWORK(IRVT) to complex matrix VT
812 * Overwrite VT by right singular vectors of R
813 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+N*NB)
814 * (RWorkspace: 0)
815 *
816  CALL clacp2( 'F', n, n, rwork( irvt ), n, vt, ldvt )
817  CALL cunmbr( 'P', 'R', 'C', n, n, n, work( ir ), ldwrkr,
818  $ work( itaup ), vt, ldvt, work( nwork ),
819  $ lwork-nwork+1, ierr )
820 *
821 * Multiply Q in A by left singular vectors of R in
822 * WORK(IR), storing result in U
823 * (CWorkspace: need N*N)
824 * (RWorkspace: 0)
825 *
826  CALL clacpy( 'F', n, n, u, ldu, work( ir ), ldwrkr )
827  CALL cgemm( 'N', 'N', m, n, n, cone, a, lda, work( ir ),
828  $ ldwrkr, czero, u, ldu )
829 *
830  ELSE IF( wntqa ) THEN
831 *
832 * Path 4 (M much larger than N, JOBZ='A')
833 * M left singular vectors to be computed in U and
834 * N right singular vectors to be computed in VT
835 *
836  iu = 1
837 *
838 * WORK(IU) is N by N
839 *
840  ldwrku = n
841  itau = iu + ldwrku*n
842  nwork = itau + n
843 *
844 * Compute A=Q*R, copying result to U
845 * (CWorkspace: need 2*N, prefer N+N*NB)
846 * (RWorkspace: 0)
847 *
848  CALL cgeqrf( m, n, a, lda, work( itau ), work( nwork ),
849  $ lwork-nwork+1, ierr )
850  CALL clacpy( 'L', m, n, a, lda, u, ldu )
851 *
852 * Generate Q in U
853 * (CWorkspace: need N+M, prefer N+M*NB)
854 * (RWorkspace: 0)
855 *
856  CALL cungqr( m, m, n, u, ldu, work( itau ),
857  $ work( nwork ), lwork-nwork+1, ierr )
858 *
859 * Produce R in A, zeroing out below it
860 *
861  CALL claset( 'L', n-1, n-1, czero, czero, a( 2, 1 ),
862  $ lda )
863  ie = 1
864  itauq = itau
865  itaup = itauq + n
866  nwork = itaup + n
867 *
868 * Bidiagonalize R in A
869 * (CWorkspace: need 3*N, prefer 2*N+2*N*NB)
870 * (RWorkspace: need N)
871 *
872  CALL cgebrd( n, n, a, lda, s, rwork( ie ), work( itauq ),
873  $ work( itaup ), work( nwork ), lwork-nwork+1,
874  $ ierr )
875  iru = ie + n
876  irvt = iru + n*n
877  nrwork = irvt + n*n
878 *
879 * Perform bidiagonal SVD, computing left singular vectors
880 * of bidiagonal matrix in RWORK(IRU) and computing right
881 * singular vectors of bidiagonal matrix in RWORK(IRVT)
882 * (CWorkspace: need 0)
883 * (RWorkspace: need BDSPAC)
884 *
885  CALL sbdsdc( 'U', 'I', n, s, rwork( ie ), rwork( iru ),
886  $ n, rwork( irvt ), n, dum, idum,
887  $ rwork( nrwork ), iwork, info )
888 *
889 * Copy real matrix RWORK(IRU) to complex matrix WORK(IU)
890 * Overwrite WORK(IU) by left singular vectors of R
891 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+N*NB)
892 * (RWorkspace: 0)
893 *
894  CALL clacp2( 'F', n, n, rwork( iru ), n, work( iu ),
895  $ ldwrku )
896  CALL cunmbr( 'Q', 'L', 'N', n, n, n, a, lda,
897  $ work( itauq ), work( iu ), ldwrku,
898  $ work( nwork ), lwork-nwork+1, ierr )
899 *
900 * Copy real matrix RWORK(IRVT) to complex matrix VT
901 * Overwrite VT by right singular vectors of R
902 * (CWorkspace: need 3*N, prefer 2*N+N*NB)
903 * (RWorkspace: 0)
904 *
905  CALL clacp2( 'F', n, n, rwork( irvt ), n, vt, ldvt )
906  CALL cunmbr( 'P', 'R', 'C', n, n, n, a, lda,
907  $ work( itaup ), vt, ldvt, work( nwork ),
908  $ lwork-nwork+1, ierr )
909 *
910 * Multiply Q in U by left singular vectors of R in
911 * WORK(IU), storing result in A
912 * (CWorkspace: need N*N)
913 * (RWorkspace: 0)
914 *
915  CALL cgemm( 'N', 'N', m, n, n, cone, u, ldu, work( iu ),
916  $ ldwrku, czero, a, lda )
917 *
918 * Copy left singular vectors of A from A to U
919 *
920  CALL clacpy( 'F', m, n, a, lda, u, ldu )
921 *
922  END IF
923 *
924  ELSE IF( m.GE.mnthr2 ) THEN
925 *
926 * MNTHR2 <= M < MNTHR1
927 *
928 * Path 5 (M much larger than N, but not as much as MNTHR1)
929 * Reduce to bidiagonal form without QR decomposition, use
930 * CUNGBR and matrix multiplication to compute singular vectors
931 *
932  ie = 1
933  nrwork = ie + n
934  itauq = 1
935  itaup = itauq + n
936  nwork = itaup + n
937 *
938 * Bidiagonalize A
939 * (CWorkspace: need 2*N+M, prefer 2*N+(M+N)*NB)
940 * (RWorkspace: need N)
941 *
942  CALL cgebrd( m, n, a, lda, s, rwork( ie ), work( itauq ),
943  $ work( itaup ), work( nwork ), lwork-nwork+1,
944  $ ierr )
945  IF( wntqn ) THEN
946 *
947 * Compute singular values only
948 * (Cworkspace: 0)
949 * (Rworkspace: need BDSPAN)
950 *
951  CALL sbdsdc( 'U', 'N', n, s, rwork( ie ), dum, 1, dum, 1,
952  $ dum, idum, rwork( nrwork ), iwork, info )
953  ELSE IF( wntqo ) THEN
954  iu = nwork
955  iru = nrwork
956  irvt = iru + n*n
957  nrwork = irvt + n*n
958 *
959 * Copy A to VT, generate P**H
960 * (Cworkspace: need 2*N, prefer N+N*NB)
961 * (Rworkspace: 0)
962 *
963  CALL clacpy( 'U', n, n, a, lda, vt, ldvt )
964  CALL cungbr( 'P', n, n, n, vt, ldvt, work( itaup ),
965  $ work( nwork ), lwork-nwork+1, ierr )
966 *
967 * Generate Q in A
968 * (CWorkspace: need 2*N, prefer N+N*NB)
969 * (RWorkspace: 0)
970 *
971  CALL cungbr( 'Q', m, n, n, a, lda, work( itauq ),
972  $ work( nwork ), lwork-nwork+1, ierr )
973 *
974  IF( lwork.GE.m*n+3*n ) THEN
975 *
976 * WORK( IU ) is M by N
977 *
978  ldwrku = m
979  ELSE
980 *
981 * WORK(IU) is LDWRKU by N
982 *
983  ldwrku = ( lwork-3*n ) / n
984  END IF
985  nwork = iu + ldwrku*n
986 *
987 * Perform bidiagonal SVD, computing left singular vectors
988 * of bidiagonal matrix in RWORK(IRU) and computing right
989 * singular vectors of bidiagonal matrix in RWORK(IRVT)
990 * (CWorkspace: need 0)
991 * (RWorkspace: need BDSPAC)
992 *
993  CALL sbdsdc( 'U', 'I', n, s, rwork( ie ), rwork( iru ),
994  $ n, rwork( irvt ), n, dum, idum,
995  $ rwork( nrwork ), iwork, info )
996 *
997 * Multiply real matrix RWORK(IRVT) by P**H in VT,
998 * storing the result in WORK(IU), copying to VT
999 * (Cworkspace: need 0)
1000 * (Rworkspace: need 3*N*N)
1001 *
1002  CALL clarcm( n, n, rwork( irvt ), n, vt, ldvt,
1003  $ work( iu ), ldwrku, rwork( nrwork ) )
1004  CALL clacpy( 'F', n, n, work( iu ), ldwrku, vt, ldvt )
1005 *
1006 * Multiply Q in A by real matrix RWORK(IRU), storing the
1007 * result in WORK(IU), copying to A
1008 * (CWorkspace: need N*N, prefer M*N)
1009 * (Rworkspace: need 3*N*N, prefer N*N+2*M*N)
1010 *
1011  nrwork = irvt
1012  DO 20 i = 1, m, ldwrku
1013  chunk = min( m-i+1, ldwrku )
1014  CALL clacrm( chunk, n, a( i, 1 ), lda, rwork( iru ),
1015  $ n, work( iu ), ldwrku, rwork( nrwork ) )
1016  CALL clacpy( 'F', chunk, n, work( iu ), ldwrku,
1017  $ a( i, 1 ), lda )
1018  20 CONTINUE
1019 *
1020  ELSE IF( wntqs ) THEN
1021 *
1022 * Copy A to VT, generate P**H
1023 * (Cworkspace: need 2*N, prefer N+N*NB)
1024 * (Rworkspace: 0)
1025 *
1026  CALL clacpy( 'U', n, n, a, lda, vt, ldvt )
1027  CALL cungbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1028  $ work( nwork ), lwork-nwork+1, ierr )
1029 *
1030 * Copy A to U, generate Q
1031 * (Cworkspace: need 2*N, prefer N+N*NB)
1032 * (Rworkspace: 0)
1033 *
1034  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1035  CALL cungbr( 'Q', m, n, n, u, ldu, work( itauq ),
1036  $ work( nwork ), lwork-nwork+1, ierr )
1037 *
1038 * Perform bidiagonal SVD, computing left singular vectors
1039 * of bidiagonal matrix in RWORK(IRU) and computing right
1040 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1041 * (CWorkspace: need 0)
1042 * (RWorkspace: need BDSPAC)
1043 *
1044  iru = nrwork
1045  irvt = iru + n*n
1046  nrwork = irvt + n*n
1047  CALL sbdsdc( 'U', 'I', n, s, rwork( ie ), rwork( iru ),
1048  $ n, rwork( irvt ), n, dum, idum,
1049  $ rwork( nrwork ), iwork, info )
1050 *
1051 * Multiply real matrix RWORK(IRVT) by P**H in VT,
1052 * storing the result in A, copying to VT
1053 * (Cworkspace: need 0)
1054 * (Rworkspace: need 3*N*N)
1055 *
1056  CALL clarcm( n, n, rwork( irvt ), n, vt, ldvt, a, lda,
1057  $ rwork( nrwork ) )
1058  CALL clacpy( 'F', n, n, a, lda, vt, ldvt )
1059 *
1060 * Multiply Q in U by real matrix RWORK(IRU), storing the
1061 * result in A, copying to U
1062 * (CWorkspace: need 0)
1063 * (Rworkspace: need N*N+2*M*N)
1064 *
1065  nrwork = irvt
1066  CALL clacrm( m, n, u, ldu, rwork( iru ), n, a, lda,
1067  $ rwork( nrwork ) )
1068  CALL clacpy( 'F', m, n, a, lda, u, ldu )
1069  ELSE
1070 *
1071 * Copy A to VT, generate P**H
1072 * (Cworkspace: need 2*N, prefer N+N*NB)
1073 * (Rworkspace: 0)
1074 *
1075  CALL clacpy( 'U', n, n, a, lda, vt, ldvt )
1076  CALL cungbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1077  $ work( nwork ), lwork-nwork+1, ierr )
1078 *
1079 * Copy A to U, generate Q
1080 * (Cworkspace: need 2*N, prefer N+N*NB)
1081 * (Rworkspace: 0)
1082 *
1083  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1084  CALL cungbr( 'Q', m, m, n, u, ldu, work( itauq ),
1085  $ work( nwork ), lwork-nwork+1, ierr )
1086 *
1087 * Perform bidiagonal SVD, computing left singular vectors
1088 * of bidiagonal matrix in RWORK(IRU) and computing right
1089 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1090 * (CWorkspace: need 0)
1091 * (RWorkspace: need BDSPAC)
1092 *
1093  iru = nrwork
1094  irvt = iru + n*n
1095  nrwork = irvt + n*n
1096  CALL sbdsdc( 'U', 'I', n, s, rwork( ie ), rwork( iru ),
1097  $ n, rwork( irvt ), n, dum, idum,
1098  $ rwork( nrwork ), iwork, info )
1099 *
1100 * Multiply real matrix RWORK(IRVT) by P**H in VT,
1101 * storing the result in A, copying to VT
1102 * (Cworkspace: need 0)
1103 * (Rworkspace: need 3*N*N)
1104 *
1105  CALL clarcm( n, n, rwork( irvt ), n, vt, ldvt, a, lda,
1106  $ rwork( nrwork ) )
1107  CALL clacpy( 'F', n, n, a, lda, vt, ldvt )
1108 *
1109 * Multiply Q in U by real matrix RWORK(IRU), storing the
1110 * result in A, copying to U
1111 * (CWorkspace: 0)
1112 * (Rworkspace: need 3*N*N)
1113 *
1114  nrwork = irvt
1115  CALL clacrm( m, n, u, ldu, rwork( iru ), n, a, lda,
1116  $ rwork( nrwork ) )
1117  CALL clacpy( 'F', m, n, a, lda, u, ldu )
1118  END IF
1119 *
1120  ELSE
1121 *
1122 * M .LT. MNTHR2
1123 *
1124 * Path 6 (M at least N, but not much larger)
1125 * Reduce to bidiagonal form without QR decomposition
1126 * Use CUNMBR to compute singular vectors
1127 *
1128  ie = 1
1129  nrwork = ie + n
1130  itauq = 1
1131  itaup = itauq + n
1132  nwork = itaup + n
1133 *
1134 * Bidiagonalize A
1135 * (CWorkspace: need 2*N+M, prefer 2*N+(M+N)*NB)
1136 * (RWorkspace: need N)
1137 *
1138  CALL cgebrd( m, n, a, lda, s, rwork( ie ), work( itauq ),
1139  $ work( itaup ), work( nwork ), lwork-nwork+1,
1140  $ ierr )
1141  IF( wntqn ) THEN
1142 *
1143 * Compute singular values only
1144 * (Cworkspace: 0)
1145 * (Rworkspace: need BDSPAN)
1146 *
1147  CALL sbdsdc( 'U', 'N', n, s, rwork( ie ), dum, 1, dum, 1,
1148  $ dum, idum, rwork( nrwork ), iwork, info )
1149  ELSE IF( wntqo ) THEN
1150  iu = nwork
1151  iru = nrwork
1152  irvt = iru + n*n
1153  nrwork = irvt + n*n
1154  IF( lwork.GE.m*n+3*n ) THEN
1155 *
1156 * WORK( IU ) is M by N
1157 *
1158  ldwrku = m
1159  ELSE
1160 *
1161 * WORK( IU ) is LDWRKU by N
1162 *
1163  ldwrku = ( lwork-3*n ) / n
1164  END IF
1165  nwork = iu + ldwrku*n
1166 *
1167 * Perform bidiagonal SVD, computing left singular vectors
1168 * of bidiagonal matrix in RWORK(IRU) and computing right
1169 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1170 * (CWorkspace: need 0)
1171 * (RWorkspace: need BDSPAC)
1172 *
1173  CALL sbdsdc( 'U', 'I', n, s, rwork( ie ), rwork( iru ),
1174  $ n, rwork( irvt ), n, dum, idum,
1175  $ rwork( nrwork ), iwork, info )
1176 *
1177 * Copy real matrix RWORK(IRVT) to complex matrix VT
1178 * Overwrite VT by right singular vectors of A
1179 * (Cworkspace: need 2*N, prefer N+N*NB)
1180 * (Rworkspace: need 0)
1181 *
1182  CALL clacp2( 'F', n, n, rwork( irvt ), n, vt, ldvt )
1183  CALL cunmbr( 'P', 'R', 'C', n, n, n, a, lda,
1184  $ work( itaup ), vt, ldvt, work( nwork ),
1185  $ lwork-nwork+1, ierr )
1186 *
1187  IF( lwork.GE.m*n+3*n ) THEN
1188 *
1189 * Copy real matrix RWORK(IRU) to complex matrix WORK(IU)
1190 * Overwrite WORK(IU) by left singular vectors of A, copying
1191 * to A
1192 * (Cworkspace: need M*N+2*N, prefer M*N+N+N*NB)
1193 * (Rworkspace: need 0)
1194 *
1195  CALL claset( 'F', m, n, czero, czero, work( iu ),
1196  $ ldwrku )
1197  CALL clacp2( 'F', n, n, rwork( iru ), n, work( iu ),
1198  $ ldwrku )
1199  CALL cunmbr( 'Q', 'L', 'N', m, n, n, a, lda,
1200  $ work( itauq ), work( iu ), ldwrku,
1201  $ work( nwork ), lwork-nwork+1, ierr )
1202  CALL clacpy( 'F', m, n, work( iu ), ldwrku, a, lda )
1203  ELSE
1204 *
1205 * Generate Q in A
1206 * (Cworkspace: need 2*N, prefer N+N*NB)
1207 * (Rworkspace: need 0)
1208 *
1209  CALL cungbr( 'Q', m, n, n, a, lda, work( itauq ),
1210  $ work( nwork ), lwork-nwork+1, ierr )
1211 *
1212 * Multiply Q in A by real matrix RWORK(IRU), storing the
1213 * result in WORK(IU), copying to A
1214 * (CWorkspace: need N*N, prefer M*N)
1215 * (Rworkspace: need 3*N*N, prefer N*N+2*M*N)
1216 *
1217  nrwork = irvt
1218  DO 30 i = 1, m, ldwrku
1219  chunk = min( m-i+1, ldwrku )
1220  CALL clacrm( chunk, n, a( i, 1 ), lda,
1221  $ rwork( iru ), n, work( iu ), ldwrku,
1222  $ rwork( nrwork ) )
1223  CALL clacpy( 'F', chunk, n, work( iu ), ldwrku,
1224  $ a( i, 1 ), lda )
1225  30 CONTINUE
1226  END IF
1227 *
1228  ELSE IF( wntqs ) THEN
1229 *
1230 * Perform bidiagonal SVD, computing left singular vectors
1231 * of bidiagonal matrix in RWORK(IRU) and computing right
1232 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1233 * (CWorkspace: need 0)
1234 * (RWorkspace: need BDSPAC)
1235 *
1236  iru = nrwork
1237  irvt = iru + n*n
1238  nrwork = irvt + n*n
1239  CALL sbdsdc( 'U', 'I', n, s, rwork( ie ), rwork( iru ),
1240  $ n, rwork( irvt ), n, dum, idum,
1241  $ rwork( nrwork ), iwork, info )
1242 *
1243 * Copy real matrix RWORK(IRU) to complex matrix U
1244 * Overwrite U by left singular vectors of A
1245 * (CWorkspace: need 3*N, prefer 2*N+N*NB)
1246 * (RWorkspace: 0)
1247 *
1248  CALL claset( 'F', m, n, czero, czero, u, ldu )
1249  CALL clacp2( 'F', n, n, rwork( iru ), n, u, ldu )
1250  CALL cunmbr( 'Q', 'L', 'N', m, n, n, a, lda,
1251  $ work( itauq ), u, ldu, work( nwork ),
1252  $ lwork-nwork+1, ierr )
1253 *
1254 * Copy real matrix RWORK(IRVT) to complex matrix VT
1255 * Overwrite VT by right singular vectors of A
1256 * (CWorkspace: need 3*N, prefer 2*N+N*NB)
1257 * (RWorkspace: 0)
1258 *
1259  CALL clacp2( 'F', n, n, rwork( irvt ), n, vt, ldvt )
1260  CALL cunmbr( 'P', 'R', 'C', n, n, n, a, lda,
1261  $ work( itaup ), vt, ldvt, work( nwork ),
1262  $ lwork-nwork+1, ierr )
1263  ELSE
1264 *
1265 * Perform bidiagonal SVD, computing left singular vectors
1266 * of bidiagonal matrix in RWORK(IRU) and computing right
1267 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1268 * (CWorkspace: need 0)
1269 * (RWorkspace: need BDSPAC)
1270 *
1271  iru = nrwork
1272  irvt = iru + n*n
1273  nrwork = irvt + n*n
1274  CALL sbdsdc( 'U', 'I', n, s, rwork( ie ), rwork( iru ),
1275  $ n, rwork( irvt ), n, dum, idum,
1276  $ rwork( nrwork ), iwork, info )
1277 *
1278 * Set the right corner of U to identity matrix
1279 *
1280  CALL claset( 'F', m, m, czero, czero, u, ldu )
1281  IF( m.GT.n ) THEN
1282  CALL claset( 'F', m-n, m-n, czero, cone,
1283  $ u( n+1, n+1 ), ldu )
1284  END IF
1285 *
1286 * Copy real matrix RWORK(IRU) to complex matrix U
1287 * Overwrite U by left singular vectors of A
1288 * (CWorkspace: need 2*N+M, prefer 2*N+M*NB)
1289 * (RWorkspace: 0)
1290 *
1291  CALL clacp2( 'F', n, n, rwork( iru ), n, u, ldu )
1292  CALL cunmbr( 'Q', 'L', 'N', m, m, n, a, lda,
1293  $ work( itauq ), u, ldu, work( nwork ),
1294  $ lwork-nwork+1, ierr )
1295 *
1296 * Copy real matrix RWORK(IRVT) to complex matrix VT
1297 * Overwrite VT by right singular vectors of A
1298 * (CWorkspace: need 3*N, prefer 2*N+N*NB)
1299 * (RWorkspace: 0)
1300 *
1301  CALL clacp2( 'F', n, n, rwork( irvt ), n, vt, ldvt )
1302  CALL cunmbr( 'P', 'R', 'C', n, n, n, a, lda,
1303  $ work( itaup ), vt, ldvt, work( nwork ),
1304  $ lwork-nwork+1, ierr )
1305  END IF
1306 *
1307  END IF
1308 *
1309  ELSE
1310 *
1311 * A has more columns than rows. If A has sufficiently more
1312 * columns than rows, first reduce using the LQ decomposition (if
1313 * sufficient workspace available)
1314 *
1315  IF( n.GE.mnthr1 ) THEN
1316 *
1317  IF( wntqn ) THEN
1318 *
1319 * Path 1t (N much larger than M, JOBZ='N')
1320 * No singular vectors to be computed
1321 *
1322  itau = 1
1323  nwork = itau + m
1324 *
1325 * Compute A=L*Q
1326 * (CWorkspace: need 2*M, prefer M+M*NB)
1327 * (RWorkspace: 0)
1328 *
1329  CALL cgelqf( m, n, a, lda, work( itau ), work( nwork ),
1330  $ lwork-nwork+1, ierr )
1331 *
1332 * Zero out above L
1333 *
1334  CALL claset( 'U', m-1, m-1, czero, czero, a( 1, 2 ),
1335  $ lda )
1336  ie = 1
1337  itauq = 1
1338  itaup = itauq + m
1339  nwork = itaup + m
1340 *
1341 * Bidiagonalize L in A
1342 * (CWorkspace: need 3*M, prefer 2*M+2*M*NB)
1343 * (RWorkspace: need M)
1344 *
1345  CALL cgebrd( m, m, a, lda, s, rwork( ie ), work( itauq ),
1346  $ work( itaup ), work( nwork ), lwork-nwork+1,
1347  $ ierr )
1348  nrwork = ie + m
1349 *
1350 * Perform bidiagonal SVD, compute singular values only
1351 * (CWorkspace: 0)
1352 * (RWorkspace: need BDSPAN)
1353 *
1354  CALL sbdsdc( 'U', 'N', m, s, rwork( ie ), dum, 1, dum, 1,
1355  $ dum, idum, rwork( nrwork ), iwork, info )
1356 *
1357  ELSE IF( wntqo ) THEN
1358 *
1359 * Path 2t (N much larger than M, JOBZ='O')
1360 * M right singular vectors to be overwritten on A and
1361 * M left singular vectors to be computed in U
1362 *
1363  ivt = 1
1364  ldwkvt = m
1365 *
1366 * WORK(IVT) is M by M
1367 *
1368  il = ivt + ldwkvt*m
1369  IF( lwork.GE.m*n+m*m+3*m ) THEN
1370 *
1371 * WORK(IL) M by N
1372 *
1373  ldwrkl = m
1374  chunk = n
1375  ELSE
1376 *
1377 * WORK(IL) is M by CHUNK
1378 *
1379  ldwrkl = m
1380  chunk = ( lwork-m*m-3*m ) / m
1381  END IF
1382  itau = il + ldwrkl*chunk
1383  nwork = itau + m
1384 *
1385 * Compute A=L*Q
1386 * (CWorkspace: need 2*M, prefer M+M*NB)
1387 * (RWorkspace: 0)
1388 *
1389  CALL cgelqf( m, n, a, lda, work( itau ), work( nwork ),
1390  $ lwork-nwork+1, ierr )
1391 *
1392 * Copy L to WORK(IL), zeroing about above it
1393 *
1394  CALL clacpy( 'L', m, m, a, lda, work( il ), ldwrkl )
1395  CALL claset( 'U', m-1, m-1, czero, czero,
1396  $ work( il+ldwrkl ), ldwrkl )
1397 *
1398 * Generate Q in A
1399 * (CWorkspace: need M*M+2*M, prefer M*M+M+M*NB)
1400 * (RWorkspace: 0)
1401 *
1402  CALL cunglq( m, n, m, a, lda, work( itau ),
1403  $ work( nwork ), lwork-nwork+1, ierr )
1404  ie = 1
1405  itauq = itau
1406  itaup = itauq + m
1407  nwork = itaup + m
1408 *
1409 * Bidiagonalize L in WORK(IL)
1410 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+2*M*NB)
1411 * (RWorkspace: need M)
1412 *
1413  CALL cgebrd( m, m, work( il ), ldwrkl, s, rwork( ie ),
1414  $ work( itauq ), work( itaup ), work( nwork ),
1415  $ lwork-nwork+1, ierr )
1416 *
1417 * Perform bidiagonal SVD, computing left singular vectors
1418 * of bidiagonal matrix in RWORK(IRU) and computing right
1419 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1420 * (CWorkspace: need 0)
1421 * (RWorkspace: need BDSPAC)
1422 *
1423  iru = ie + m
1424  irvt = iru + m*m
1425  nrwork = irvt + m*m
1426  CALL sbdsdc( 'U', 'I', m, s, rwork( ie ), rwork( iru ),
1427  $ m, rwork( irvt ), m, dum, idum,
1428  $ rwork( nrwork ), iwork, info )
1429 *
1430 * Copy real matrix RWORK(IRU) to complex matrix WORK(IU)
1431 * Overwrite WORK(IU) by the left singular vectors of L
1432 * (CWorkspace: need N*N+3*N, prefer M*N+2*N+N*NB)
1433 * (RWorkspace: 0)
1434 *
1435  CALL clacp2( 'F', m, m, rwork( iru ), m, u, ldu )
1436  CALL cunmbr( 'Q', 'L', 'N', m, m, m, work( il ), ldwrkl,
1437  $ work( itauq ), u, ldu, work( nwork ),
1438  $ lwork-nwork+1, ierr )
1439 *
1440 * Copy real matrix RWORK(IRVT) to complex matrix WORK(IVT)
1441 * Overwrite WORK(IVT) by the right singular vectors of L
1442 * (CWorkspace: need N*N+3*N, prefer M*N+2*N+N*NB)
1443 * (RWorkspace: 0)
1444 *
1445  CALL clacp2( 'F', m, m, rwork( irvt ), m, work( ivt ),
1446  $ ldwkvt )
1447  CALL cunmbr( 'P', 'R', 'C', m, m, m, work( il ), ldwrkl,
1448  $ work( itaup ), work( ivt ), ldwkvt,
1449  $ work( nwork ), lwork-nwork+1, ierr )
1450 *
1451 * Multiply right singular vectors of L in WORK(IL) by Q
1452 * in A, storing result in WORK(IL) and copying to A
1453 * (CWorkspace: need 2*M*M, prefer M*M+M*N))
1454 * (RWorkspace: 0)
1455 *
1456  DO 40 i = 1, n, chunk
1457  blk = min( n-i+1, chunk )
1458  CALL cgemm( 'N', 'N', m, blk, m, cone, work( ivt ), m,
1459  $ a( 1, i ), lda, czero, work( il ),
1460  $ ldwrkl )
1461  CALL clacpy( 'F', m, blk, work( il ), ldwrkl,
1462  $ a( 1, i ), lda )
1463  40 CONTINUE
1464 *
1465  ELSE IF( wntqs ) THEN
1466 *
1467 * Path 3t (N much larger than M, JOBZ='S')
1468 * M right singular vectors to be computed in VT and
1469 * M left singular vectors to be computed in U
1470 *
1471  il = 1
1472 *
1473 * WORK(IL) is M by M
1474 *
1475  ldwrkl = m
1476  itau = il + ldwrkl*m
1477  nwork = itau + m
1478 *
1479 * Compute A=L*Q
1480 * (CWorkspace: need 2*M, prefer M+M*NB)
1481 * (RWorkspace: 0)
1482 *
1483  CALL cgelqf( m, n, a, lda, work( itau ), work( nwork ),
1484  $ lwork-nwork+1, ierr )
1485 *
1486 * Copy L to WORK(IL), zeroing out above it
1487 *
1488  CALL clacpy( 'L', m, m, a, lda, work( il ), ldwrkl )
1489  CALL claset( 'U', m-1, m-1, czero, czero,
1490  $ work( il+ldwrkl ), ldwrkl )
1491 *
1492 * Generate Q in A
1493 * (CWorkspace: need M*M+2*M, prefer M*M+M+M*NB)
1494 * (RWorkspace: 0)
1495 *
1496  CALL cunglq( m, n, m, a, lda, work( itau ),
1497  $ work( nwork ), lwork-nwork+1, ierr )
1498  ie = 1
1499  itauq = itau
1500  itaup = itauq + m
1501  nwork = itaup + m
1502 *
1503 * Bidiagonalize L in WORK(IL)
1504 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+2*M*NB)
1505 * (RWorkspace: need M)
1506 *
1507  CALL cgebrd( m, m, work( il ), ldwrkl, s, rwork( ie ),
1508  $ work( itauq ), work( itaup ), work( nwork ),
1509  $ lwork-nwork+1, ierr )
1510 *
1511 * Perform bidiagonal SVD, computing left singular vectors
1512 * of bidiagonal matrix in RWORK(IRU) and computing right
1513 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1514 * (CWorkspace: need 0)
1515 * (RWorkspace: need BDSPAC)
1516 *
1517  iru = ie + m
1518  irvt = iru + m*m
1519  nrwork = irvt + m*m
1520  CALL sbdsdc( 'U', 'I', m, s, rwork( ie ), rwork( iru ),
1521  $ m, rwork( irvt ), m, dum, idum,
1522  $ rwork( nrwork ), iwork, info )
1523 *
1524 * Copy real matrix RWORK(IRU) to complex matrix U
1525 * Overwrite U by left singular vectors of L
1526 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+M*NB)
1527 * (RWorkspace: 0)
1528 *
1529  CALL clacp2( 'F', m, m, rwork( iru ), m, u, ldu )
1530  CALL cunmbr( 'Q', 'L', 'N', m, m, m, work( il ), ldwrkl,
1531  $ work( itauq ), u, ldu, work( nwork ),
1532  $ lwork-nwork+1, ierr )
1533 *
1534 * Copy real matrix RWORK(IRVT) to complex matrix VT
1535 * Overwrite VT by left singular vectors of L
1536 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+M*NB)
1537 * (RWorkspace: 0)
1538 *
1539  CALL clacp2( 'F', m, m, rwork( irvt ), m, vt, ldvt )
1540  CALL cunmbr( 'P', 'R', 'C', m, m, m, work( il ), ldwrkl,
1541  $ work( itaup ), vt, ldvt, work( nwork ),
1542  $ lwork-nwork+1, ierr )
1543 *
1544 * Copy VT to WORK(IL), multiply right singular vectors of L
1545 * in WORK(IL) by Q in A, storing result in VT
1546 * (CWorkspace: need M*M)
1547 * (RWorkspace: 0)
1548 *
1549  CALL clacpy( 'F', m, m, vt, ldvt, work( il ), ldwrkl )
1550  CALL cgemm( 'N', 'N', m, n, m, cone, work( il ), ldwrkl,
1551  $ a, lda, czero, vt, ldvt )
1552 *
1553  ELSE IF( wntqa ) THEN
1554 *
1555 * Path 9t (N much larger than M, JOBZ='A')
1556 * N right singular vectors to be computed in VT and
1557 * M left singular vectors to be computed in U
1558 *
1559  ivt = 1
1560 *
1561 * WORK(IVT) is M by M
1562 *
1563  ldwkvt = m
1564  itau = ivt + ldwkvt*m
1565  nwork = itau + m
1566 *
1567 * Compute A=L*Q, copying result to VT
1568 * (CWorkspace: need 2*M, prefer M+M*NB)
1569 * (RWorkspace: 0)
1570 *
1571  CALL cgelqf( m, n, a, lda, work( itau ), work( nwork ),
1572  $ lwork-nwork+1, ierr )
1573  CALL clacpy( 'U', m, n, a, lda, vt, ldvt )
1574 *
1575 * Generate Q in VT
1576 * (CWorkspace: need M+N, prefer M+N*NB)
1577 * (RWorkspace: 0)
1578 *
1579  CALL cunglq( n, n, m, vt, ldvt, work( itau ),
1580  $ work( nwork ), lwork-nwork+1, ierr )
1581 *
1582 * Produce L in A, zeroing out above it
1583 *
1584  CALL claset( 'U', m-1, m-1, czero, czero, a( 1, 2 ),
1585  $ lda )
1586  ie = 1
1587  itauq = itau
1588  itaup = itauq + m
1589  nwork = itaup + m
1590 *
1591 * Bidiagonalize L in A
1592 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+2*M*NB)
1593 * (RWorkspace: need M)
1594 *
1595  CALL cgebrd( m, m, a, lda, s, rwork( ie ), work( itauq ),
1596  $ work( itaup ), work( nwork ), lwork-nwork+1,
1597  $ ierr )
1598 *
1599 * Perform bidiagonal SVD, computing left singular vectors
1600 * of bidiagonal matrix in RWORK(IRU) and computing right
1601 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1602 * (CWorkspace: need 0)
1603 * (RWorkspace: need BDSPAC)
1604 *
1605  iru = ie + m
1606  irvt = iru + m*m
1607  nrwork = irvt + m*m
1608  CALL sbdsdc( 'U', 'I', m, s, rwork( ie ), rwork( iru ),
1609  $ m, rwork( irvt ), m, dum, idum,
1610  $ rwork( nrwork ), iwork, info )
1611 *
1612 * Copy real matrix RWORK(IRU) to complex matrix U
1613 * Overwrite U by left singular vectors of L
1614 * (CWorkspace: need 3*M, prefer 2*M+M*NB)
1615 * (RWorkspace: 0)
1616 *
1617  CALL clacp2( 'F', m, m, rwork( iru ), m, u, ldu )
1618  CALL cunmbr( 'Q', 'L', 'N', m, m, m, a, lda,
1619  $ work( itauq ), u, ldu, work( nwork ),
1620  $ lwork-nwork+1, ierr )
1621 *
1622 * Copy real matrix RWORK(IRVT) to complex matrix WORK(IVT)
1623 * Overwrite WORK(IVT) by right singular vectors of L
1624 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+M*NB)
1625 * (RWorkspace: 0)
1626 *
1627  CALL clacp2( 'F', m, m, rwork( irvt ), m, work( ivt ),
1628  $ ldwkvt )
1629  CALL cunmbr( 'P', 'R', 'C', m, m, m, a, lda,
1630  $ work( itaup ), work( ivt ), ldwkvt,
1631  $ work( nwork ), lwork-nwork+1, ierr )
1632 *
1633 * Multiply right singular vectors of L in WORK(IVT) by
1634 * Q in VT, storing result in A
1635 * (CWorkspace: need M*M)
1636 * (RWorkspace: 0)
1637 *
1638  CALL cgemm( 'N', 'N', m, n, m, cone, work( ivt ),
1639  $ ldwkvt, vt, ldvt, czero, a, lda )
1640 *
1641 * Copy right singular vectors of A from A to VT
1642 *
1643  CALL clacpy( 'F', m, n, a, lda, vt, ldvt )
1644 *
1645  END IF
1646 *
1647  ELSE IF( n.GE.mnthr2 ) THEN
1648 *
1649 * MNTHR2 <= N < MNTHR1
1650 *
1651 * Path 5t (N much larger than M, but not as much as MNTHR1)
1652 * Reduce to bidiagonal form without QR decomposition, use
1653 * CUNGBR and matrix multiplication to compute singular vectors
1654 *
1655 *
1656  ie = 1
1657  nrwork = ie + m
1658  itauq = 1
1659  itaup = itauq + m
1660  nwork = itaup + m
1661 *
1662 * Bidiagonalize A
1663 * (CWorkspace: need 2*M+N, prefer 2*M+(M+N)*NB)
1664 * (RWorkspace: M)
1665 *
1666  CALL cgebrd( m, n, a, lda, s, rwork( ie ), work( itauq ),
1667  $ work( itaup ), work( nwork ), lwork-nwork+1,
1668  $ ierr )
1669 *
1670  IF( wntqn ) THEN
1671 *
1672 * Compute singular values only
1673 * (Cworkspace: 0)
1674 * (Rworkspace: need BDSPAN)
1675 *
1676  CALL sbdsdc( 'L', 'N', m, s, rwork( ie ), dum, 1, dum, 1,
1677  $ dum, idum, rwork( nrwork ), iwork, info )
1678  ELSE IF( wntqo ) THEN
1679  irvt = nrwork
1680  iru = irvt + m*m
1681  nrwork = iru + m*m
1682  ivt = nwork
1683 *
1684 * Copy A to U, generate Q
1685 * (Cworkspace: need 2*M, prefer M+M*NB)
1686 * (Rworkspace: 0)
1687 *
1688  CALL clacpy( 'L', m, m, a, lda, u, ldu )
1689  CALL cungbr( 'Q', m, m, n, u, ldu, work( itauq ),
1690  $ work( nwork ), lwork-nwork+1, ierr )
1691 *
1692 * Generate P**H in A
1693 * (Cworkspace: need 2*M, prefer M+M*NB)
1694 * (Rworkspace: 0)
1695 *
1696  CALL cungbr( 'P', m, n, m, a, lda, work( itaup ),
1697  $ work( nwork ), lwork-nwork+1, ierr )
1698 *
1699  ldwkvt = m
1700  IF( lwork.GE.m*n+3*m ) THEN
1701 *
1702 * WORK( IVT ) is M by N
1703 *
1704  nwork = ivt + ldwkvt*n
1705  chunk = n
1706  ELSE
1707 *
1708 * WORK( IVT ) is M by CHUNK
1709 *
1710  chunk = ( lwork-3*m ) / m
1711  nwork = ivt + ldwkvt*chunk
1712  END IF
1713 *
1714 * Perform bidiagonal SVD, computing left singular vectors
1715 * of bidiagonal matrix in RWORK(IRU) and computing right
1716 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1717 * (CWorkspace: need 0)
1718 * (RWorkspace: need BDSPAC)
1719 *
1720  CALL sbdsdc( 'L', 'I', m, s, rwork( ie ), rwork( iru ),
1721  $ m, rwork( irvt ), m, dum, idum,
1722  $ rwork( nrwork ), iwork, info )
1723 *
1724 * Multiply Q in U by real matrix RWORK(IRVT)
1725 * storing the result in WORK(IVT), copying to U
1726 * (Cworkspace: need 0)
1727 * (Rworkspace: need 2*M*M)
1728 *
1729  CALL clacrm( m, m, u, ldu, rwork( iru ), m, work( ivt ),
1730  $ ldwkvt, rwork( nrwork ) )
1731  CALL clacpy( 'F', m, m, work( ivt ), ldwkvt, u, ldu )
1732 *
1733 * Multiply RWORK(IRVT) by P**H in A, storing the
1734 * result in WORK(IVT), copying to A
1735 * (CWorkspace: need M*M, prefer M*N)
1736 * (Rworkspace: need 2*M*M, prefer 2*M*N)
1737 *
1738  nrwork = iru
1739  DO 50 i = 1, n, chunk
1740  blk = min( n-i+1, chunk )
1741  CALL clarcm( m, blk, rwork( irvt ), m, a( 1, i ), lda,
1742  $ work( ivt ), ldwkvt, rwork( nrwork ) )
1743  CALL clacpy( 'F', m, blk, work( ivt ), ldwkvt,
1744  $ a( 1, i ), lda )
1745  50 CONTINUE
1746  ELSE IF( wntqs ) THEN
1747 *
1748 * Copy A to U, generate Q
1749 * (Cworkspace: need 2*M, prefer M+M*NB)
1750 * (Rworkspace: 0)
1751 *
1752  CALL clacpy( 'L', m, m, a, lda, u, ldu )
1753  CALL cungbr( 'Q', m, m, n, u, ldu, work( itauq ),
1754  $ work( nwork ), lwork-nwork+1, ierr )
1755 *
1756 * Copy A to VT, generate P**H
1757 * (Cworkspace: need 2*M, prefer M+M*NB)
1758 * (Rworkspace: 0)
1759 *
1760  CALL clacpy( 'U', m, n, a, lda, vt, ldvt )
1761  CALL cungbr( 'P', m, n, m, vt, ldvt, work( itaup ),
1762  $ work( nwork ), lwork-nwork+1, ierr )
1763 *
1764 * Perform bidiagonal SVD, computing left singular vectors
1765 * of bidiagonal matrix in RWORK(IRU) and computing right
1766 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1767 * (CWorkspace: need 0)
1768 * (RWorkspace: need BDSPAC)
1769 *
1770  irvt = nrwork
1771  iru = irvt + m*m
1772  nrwork = iru + m*m
1773  CALL sbdsdc( 'L', 'I', m, s, rwork( ie ), rwork( iru ),
1774  $ m, rwork( irvt ), m, dum, idum,
1775  $ rwork( nrwork ), iwork, info )
1776 *
1777 * Multiply Q in U by real matrix RWORK(IRU), storing the
1778 * result in A, copying to U
1779 * (CWorkspace: need 0)
1780 * (Rworkspace: need 3*M*M)
1781 *
1782  CALL clacrm( m, m, u, ldu, rwork( iru ), m, a, lda,
1783  $ rwork( nrwork ) )
1784  CALL clacpy( 'F', m, m, a, lda, u, ldu )
1785 *
1786 * Multiply real matrix RWORK(IRVT) by P**H in VT,
1787 * storing the result in A, copying to VT
1788 * (Cworkspace: need 0)
1789 * (Rworkspace: need M*M+2*M*N)
1790 *
1791  nrwork = iru
1792  CALL clarcm( m, n, rwork( irvt ), m, vt, ldvt, a, lda,
1793  $ rwork( nrwork ) )
1794  CALL clacpy( 'F', m, n, a, lda, vt, ldvt )
1795  ELSE
1796 *
1797 * Copy A to U, generate Q
1798 * (Cworkspace: need 2*M, prefer M+M*NB)
1799 * (Rworkspace: 0)
1800 *
1801  CALL clacpy( 'L', m, m, a, lda, u, ldu )
1802  CALL cungbr( 'Q', m, m, n, u, ldu, work( itauq ),
1803  $ work( nwork ), lwork-nwork+1, ierr )
1804 *
1805 * Copy A to VT, generate P**H
1806 * (Cworkspace: need 2*M, prefer M+M*NB)
1807 * (Rworkspace: 0)
1808 *
1809  CALL clacpy( 'U', m, n, a, lda, vt, ldvt )
1810  CALL cungbr( 'P', n, n, m, vt, ldvt, work( itaup ),
1811  $ work( nwork ), lwork-nwork+1, ierr )
1812 *
1813 * Perform bidiagonal SVD, computing left singular vectors
1814 * of bidiagonal matrix in RWORK(IRU) and computing right
1815 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1816 * (CWorkspace: need 0)
1817 * (RWorkspace: need BDSPAC)
1818 *
1819  irvt = nrwork
1820  iru = irvt + m*m
1821  nrwork = iru + m*m
1822  CALL sbdsdc( 'L', 'I', m, s, rwork( ie ), rwork( iru ),
1823  $ m, rwork( irvt ), m, dum, idum,
1824  $ rwork( nrwork ), iwork, info )
1825 *
1826 * Multiply Q in U by real matrix RWORK(IRU), storing the
1827 * result in A, copying to U
1828 * (CWorkspace: need 0)
1829 * (Rworkspace: need 3*M*M)
1830 *
1831  CALL clacrm( m, m, u, ldu, rwork( iru ), m, a, lda,
1832  $ rwork( nrwork ) )
1833  CALL clacpy( 'F', m, m, a, lda, u, ldu )
1834 *
1835 * Multiply real matrix RWORK(IRVT) by P**H in VT,
1836 * storing the result in A, copying to VT
1837 * (Cworkspace: need 0)
1838 * (Rworkspace: need M*M+2*M*N)
1839 *
1840  CALL clarcm( m, n, rwork( irvt ), m, vt, ldvt, a, lda,
1841  $ rwork( nrwork ) )
1842  CALL clacpy( 'F', m, n, a, lda, vt, ldvt )
1843  END IF
1844 *
1845  ELSE
1846 *
1847 * N .LT. MNTHR2
1848 *
1849 * Path 6t (N greater than M, but not much larger)
1850 * Reduce to bidiagonal form without LQ decomposition
1851 * Use CUNMBR to compute singular vectors
1852 *
1853  ie = 1
1854  nrwork = ie + m
1855  itauq = 1
1856  itaup = itauq + m
1857  nwork = itaup + m
1858 *
1859 * Bidiagonalize A
1860 * (CWorkspace: need 2*M+N, prefer 2*M+(M+N)*NB)
1861 * (RWorkspace: M)
1862 *
1863  CALL cgebrd( m, n, a, lda, s, rwork( ie ), work( itauq ),
1864  $ work( itaup ), work( nwork ), lwork-nwork+1,
1865  $ ierr )
1866  IF( wntqn ) THEN
1867 *
1868 * Compute singular values only
1869 * (Cworkspace: 0)
1870 * (Rworkspace: need BDSPAN)
1871 *
1872  CALL sbdsdc( 'L', 'N', m, s, rwork( ie ), dum, 1, dum, 1,
1873  $ dum, idum, rwork( nrwork ), iwork, info )
1874  ELSE IF( wntqo ) THEN
1875  ldwkvt = m
1876  ivt = nwork
1877  IF( lwork.GE.m*n+3*m ) THEN
1878 *
1879 * WORK( IVT ) is M by N
1880 *
1881  CALL claset( 'F', m, n, czero, czero, work( ivt ),
1882  $ ldwkvt )
1883  nwork = ivt + ldwkvt*n
1884  ELSE
1885 *
1886 * WORK( IVT ) is M by CHUNK
1887 *
1888  chunk = ( lwork-3*m ) / m
1889  nwork = ivt + ldwkvt*chunk
1890  END IF
1891 *
1892 * Perform bidiagonal SVD, computing left singular vectors
1893 * of bidiagonal matrix in RWORK(IRU) and computing right
1894 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1895 * (CWorkspace: need 0)
1896 * (RWorkspace: need BDSPAC)
1897 *
1898  irvt = nrwork
1899  iru = irvt + m*m
1900  nrwork = iru + m*m
1901  CALL sbdsdc( 'L', 'I', m, s, rwork( ie ), rwork( iru ),
1902  $ m, rwork( irvt ), m, dum, idum,
1903  $ rwork( nrwork ), iwork, info )
1904 *
1905 * Copy real matrix RWORK(IRU) to complex matrix U
1906 * Overwrite U by left singular vectors of A
1907 * (Cworkspace: need 2*M, prefer M+M*NB)
1908 * (Rworkspace: need 0)
1909 *
1910  CALL clacp2( 'F', m, m, rwork( iru ), m, u, ldu )
1911  CALL cunmbr( 'Q', 'L', 'N', m, m, n, a, lda,
1912  $ work( itauq ), u, ldu, work( nwork ),
1913  $ lwork-nwork+1, ierr )
1914 *
1915  IF( lwork.GE.m*n+3*m ) THEN
1916 *
1917 * Copy real matrix RWORK(IRVT) to complex matrix WORK(IVT)
1918 * Overwrite WORK(IVT) by right singular vectors of A,
1919 * copying to A
1920 * (Cworkspace: need M*N+2*M, prefer M*N+M+M*NB)
1921 * (Rworkspace: need 0)
1922 *
1923  CALL clacp2( 'F', m, m, rwork( irvt ), m, work( ivt ),
1924  $ ldwkvt )
1925  CALL cunmbr( 'P', 'R', 'C', m, n, m, a, lda,
1926  $ work( itaup ), work( ivt ), ldwkvt,
1927  $ work( nwork ), lwork-nwork+1, ierr )
1928  CALL clacpy( 'F', m, n, work( ivt ), ldwkvt, a, lda )
1929  ELSE
1930 *
1931 * Generate P**H in A
1932 * (Cworkspace: need 2*M, prefer M+M*NB)
1933 * (Rworkspace: need 0)
1934 *
1935  CALL cungbr( 'P', m, n, m, a, lda, work( itaup ),
1936  $ work( nwork ), lwork-nwork+1, ierr )
1937 *
1938 * Multiply Q in A by real matrix RWORK(IRU), storing the
1939 * result in WORK(IU), copying to A
1940 * (CWorkspace: need M*M, prefer M*N)
1941 * (Rworkspace: need 3*M*M, prefer M*M+2*M*N)
1942 *
1943  nrwork = iru
1944  DO 60 i = 1, n, chunk
1945  blk = min( n-i+1, chunk )
1946  CALL clarcm( m, blk, rwork( irvt ), m, a( 1, i ),
1947  $ lda, work( ivt ), ldwkvt,
1948  $ rwork( nrwork ) )
1949  CALL clacpy( 'F', m, blk, work( ivt ), ldwkvt,
1950  $ a( 1, i ), lda )
1951  60 CONTINUE
1952  END IF
1953  ELSE IF( wntqs ) THEN
1954 *
1955 * Perform bidiagonal SVD, computing left singular vectors
1956 * of bidiagonal matrix in RWORK(IRU) and computing right
1957 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1958 * (CWorkspace: need 0)
1959 * (RWorkspace: need BDSPAC)
1960 *
1961  irvt = nrwork
1962  iru = irvt + m*m
1963  nrwork = iru + m*m
1964  CALL sbdsdc( 'L', 'I', m, s, rwork( ie ), rwork( iru ),
1965  $ m, rwork( irvt ), m, dum, idum,
1966  $ rwork( nrwork ), iwork, info )
1967 *
1968 * Copy real matrix RWORK(IRU) to complex matrix U
1969 * Overwrite U by left singular vectors of A
1970 * (CWorkspace: need 3*M, prefer 2*M+M*NB)
1971 * (RWorkspace: M*M)
1972 *
1973  CALL clacp2( 'F', m, m, rwork( iru ), m, u, ldu )
1974  CALL cunmbr( 'Q', 'L', 'N', m, m, n, a, lda,
1975  $ work( itauq ), u, ldu, work( nwork ),
1976  $ lwork-nwork+1, ierr )
1977 *
1978 * Copy real matrix RWORK(IRVT) to complex matrix VT
1979 * Overwrite VT by right singular vectors of A
1980 * (CWorkspace: need 3*M, prefer 2*M+M*NB)
1981 * (RWorkspace: M*M)
1982 *
1983  CALL claset( 'F', m, n, czero, czero, vt, ldvt )
1984  CALL clacp2( 'F', m, m, rwork( irvt ), m, vt, ldvt )
1985  CALL cunmbr( 'P', 'R', 'C', m, n, m, a, lda,
1986  $ work( itaup ), vt, ldvt, work( nwork ),
1987  $ lwork-nwork+1, ierr )
1988  ELSE
1989 *
1990 * Perform bidiagonal SVD, computing left singular vectors
1991 * of bidiagonal matrix in RWORK(IRU) and computing right
1992 * singular vectors of bidiagonal matrix in RWORK(IRVT)
1993 * (CWorkspace: need 0)
1994 * (RWorkspace: need BDSPAC)
1995 *
1996  irvt = nrwork
1997  iru = irvt + m*m
1998  nrwork = iru + m*m
1999 *
2000  CALL sbdsdc( 'L', 'I', m, s, rwork( ie ), rwork( iru ),
2001  $ m, rwork( irvt ), m, dum, idum,
2002  $ rwork( nrwork ), iwork, info )
2003 *
2004 * Copy real matrix RWORK(IRU) to complex matrix U
2005 * Overwrite U by left singular vectors of A
2006 * (CWorkspace: need 3*M, prefer 2*M+M*NB)
2007 * (RWorkspace: M*M)
2008 *
2009  CALL clacp2( 'F', m, m, rwork( iru ), m, u, ldu )
2010  CALL cunmbr( 'Q', 'L', 'N', m, m, n, a, lda,
2011  $ work( itauq ), u, ldu, work( nwork ),
2012  $ lwork-nwork+1, ierr )
2013 *
2014 * Set all of VT to identity matrix
2015 *
2016  CALL claset( 'F', n, n, czero, cone, vt, ldvt )
2017 *
2018 * Copy real matrix RWORK(IRVT) to complex matrix VT
2019 * Overwrite VT by right singular vectors of A
2020 * (CWorkspace: need 2*M+N, prefer 2*M+N*NB)
2021 * (RWorkspace: M*M)
2022 *
2023  CALL clacp2( 'F', m, m, rwork( irvt ), m, vt, ldvt )
2024  CALL cunmbr( 'P', 'R', 'C', n, n, m, a, lda,
2025  $ work( itaup ), vt, ldvt, work( nwork ),
2026  $ lwork-nwork+1, ierr )
2027  END IF
2028 *
2029  END IF
2030 *
2031  END IF
2032 *
2033 * Undo scaling if necessary
2034 *
2035  IF( iscl.EQ.1 ) THEN
2036  IF( anrm.GT.bignum )
2037  $ CALL slascl( 'G', 0, 0, bignum, anrm, minmn, 1, s, minmn,
2038  $ ierr )
2039  IF( info.NE.0 .AND. anrm.GT.bignum )
2040  $ CALL slascl( 'G', 0, 0, bignum, anrm, minmn-1, 1,
2041  $ rwork( ie ), minmn, ierr )
2042  IF( anrm.LT.smlnum )
2043  $ CALL slascl( 'G', 0, 0, smlnum, anrm, minmn, 1, s, minmn,
2044  $ ierr )
2045  IF( info.NE.0 .AND. anrm.LT.smlnum )
2046  $ CALL slascl( 'G', 0, 0, smlnum, anrm, minmn-1, 1,
2047  $ rwork( ie ), minmn, ierr )
2048  END IF
2049 *
2050 * Return optimal workspace in WORK(1)
2051 *
2052  work( 1 ) = maxwrk
2053 *
2054  RETURN
2055 *
2056 * End of CGESDD
2057 *
subroutine claset(UPLO, M, N, ALPHA, BETA, A, LDA)
CLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: claset.f:108
subroutine cunmbr(VECT, SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
CUNMBR
Definition: cunmbr.f:199
real function slamch(CMACH)
SLAMCH
Definition: slamch.f:69
subroutine clarcm(M, N, A, LDA, B, LDB, C, LDC, RWORK)
CLARCM copies all or part of a real two-dimensional array to a complex array.
Definition: clarcm.f:116
subroutine cgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
CGEQRF
Definition: cgeqrf.f:138
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine clacp2(UPLO, M, N, A, LDA, B, LDB)
CLACP2 copies all or part of a real two-dimensional array to a complex array.
Definition: clacp2.f:106
subroutine sbdsdc(UPLO, COMPQ, N, D, E, U, LDU, VT, LDVT, Q, IQ, WORK, IWORK, INFO)
SBDSDC
Definition: sbdsdc.f:207
subroutine cgemm(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
CGEMM
Definition: cgemm.f:189
subroutine cungbr(VECT, M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
CUNGBR
Definition: cungbr.f:159
subroutine cgebrd(M, N, A, LDA, D, E, TAUQ, TAUP, WORK, LWORK, INFO)
CGEBRD
Definition: cgebrd.f:208
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine clacpy(UPLO, M, N, A, LDA, B, LDB)
CLACPY copies all or part of one two-dimensional array to another.
Definition: clacpy.f:105
subroutine clacrm(M, N, A, LDA, B, LDB, C, LDC, RWORK)
CLACRM multiplies a complex matrix by a square real matrix.
Definition: clacrm.f:116
subroutine cgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
CGELQF
Definition: cgelqf.f:137
subroutine clascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
CLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: clascl.f:141
subroutine cunglq(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
CUNGLQ
Definition: cunglq.f:129
integer function ilaenv(ISPEC, NAME, OPTS, N1, N2, N3, N4)
Definition: tstiee.f:83
subroutine slascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: slascl.f:141
real function clange(NORM, M, N, A, LDA, WORK)
CLANGE returns the value of the 1-norm, Frobenius norm, infinity-norm, or the largest absolute value ...
Definition: clange.f:117
subroutine cungqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
CUNGQR
Definition: cungqr.f:130

Here is the call graph for this function:

Here is the caller graph for this function:

subroutine cgesvd ( character  JOBU,
character  JOBVT,
integer  M,
integer  N,
complex, dimension( lda, * )  A,
integer  LDA,
real, dimension( * )  S,
complex, dimension( ldu, * )  U,
integer  LDU,
complex, dimension( ldvt, * )  VT,
integer  LDVT,
complex, dimension( * )  WORK,
integer  LWORK,
real, dimension( * )  RWORK,
integer  INFO 
)

CGESVD computes the singular value decomposition (SVD) for GE matrices

Download CGESVD + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 CGESVD computes the singular value decomposition (SVD) of a complex
 M-by-N matrix A, optionally computing the left and/or right singular
 vectors. The SVD is written

      A = U * SIGMA * conjugate-transpose(V)

 where SIGMA is an M-by-N matrix which is zero except for its
 min(m,n) diagonal elements, U is an M-by-M unitary matrix, and
 V is an N-by-N unitary matrix.  The diagonal elements of SIGMA
 are the singular values of A; they are real and non-negative, and
 are returned in descending order.  The first min(m,n) columns of
 U and V are the left and right singular vectors of A.

 Note that the routine returns V**H, not V.
Parameters
[in]JOBU
          JOBU is CHARACTER*1
          Specifies options for computing all or part of the matrix U:
          = 'A':  all M columns of U are returned in array U:
          = 'S':  the first min(m,n) columns of U (the left singular
                  vectors) are returned in the array U;
          = 'O':  the first min(m,n) columns of U (the left singular
                  vectors) are overwritten on the array A;
          = 'N':  no columns of U (no left singular vectors) are
                  computed.
[in]JOBVT
          JOBVT is CHARACTER*1
          Specifies options for computing all or part of the matrix
          V**H:
          = 'A':  all N rows of V**H are returned in the array VT;
          = 'S':  the first min(m,n) rows of V**H (the right singular
                  vectors) are returned in the array VT;
          = 'O':  the first min(m,n) rows of V**H (the right singular
                  vectors) are overwritten on the array A;
          = 'N':  no rows of V**H (no right singular vectors) are
                  computed.

          JOBVT and JOBU cannot both be 'O'.
[in]M
          M is INTEGER
          The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
          The number of columns of the input matrix A.  N >= 0.
[in,out]A
          A is COMPLEX array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
          On exit,
          if JOBU = 'O',  A is overwritten with the first min(m,n)
                          columns of U (the left singular vectors,
                          stored columnwise);
          if JOBVT = 'O', A is overwritten with the first min(m,n)
                          rows of V**H (the right singular vectors,
                          stored rowwise);
          if JOBU .ne. 'O' and JOBVT .ne. 'O', the contents of A
                          are destroyed.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]S
          S is REAL array, dimension (min(M,N))
          The singular values of A, sorted so that S(i) >= S(i+1).
[out]U
          U is COMPLEX array, dimension (LDU,UCOL)
          (LDU,M) if JOBU = 'A' or (LDU,min(M,N)) if JOBU = 'S'.
          If JOBU = 'A', U contains the M-by-M unitary matrix U;
          if JOBU = 'S', U contains the first min(m,n) columns of U
          (the left singular vectors, stored columnwise);
          if JOBU = 'N' or 'O', U is not referenced.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U.  LDU >= 1; if
          JOBU = 'S' or 'A', LDU >= M.
[out]VT
          VT is COMPLEX array, dimension (LDVT,N)
          If JOBVT = 'A', VT contains the N-by-N unitary matrix
          V**H;
          if JOBVT = 'S', VT contains the first min(m,n) rows of
          V**H (the right singular vectors, stored rowwise);
          if JOBVT = 'N' or 'O', VT is not referenced.
[in]LDVT
          LDVT is INTEGER
          The leading dimension of the array VT.  LDVT >= 1; if
          JOBVT = 'A', LDVT >= N; if JOBVT = 'S', LDVT >= min(M,N).
[out]WORK
          WORK is COMPLEX array, dimension (MAX(1,LWORK))
          On exit, if INFO = 0, WORK(1) returns the optimal LWORK.
[in]LWORK
          LWORK is INTEGER
          The dimension of the array WORK.
          LWORK >=  MAX(1,2*MIN(M,N)+MAX(M,N)).
          For good performance, LWORK should generally be larger.

          If LWORK = -1, then a workspace query is assumed; the routine
          only calculates the optimal size of the WORK array, returns
          this value as the first entry of the WORK array, and no error
          message related to LWORK is issued by XERBLA.
[out]RWORK
          RWORK is REAL array, dimension (5*min(M,N))
          On exit, if INFO > 0, RWORK(1:MIN(M,N)-1) contains the
          unconverged superdiagonal elements of an upper bidiagonal
          matrix B whose diagonal is in S (not necessarily sorted).
          B satisfies A = U * B * VT, so it has the same singular
          values as A, and singular vectors related by U and VT.
[out]INFO
          INFO is INTEGER
          = 0:  successful exit.
          < 0:  if INFO = -i, the i-th argument had an illegal value.
          > 0:  if CBDSQR did not converge, INFO specifies how many
                superdiagonals of an intermediate bidiagonal form B
                did not converge to zero. See the description of RWORK
                above for details.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
April 2012

Definition at line 216 of file cgesvd.f.

216 *
217 * -- LAPACK driver routine (version 3.6.0) --
218 * -- LAPACK is a software package provided by Univ. of Tennessee, --
219 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
220 * April 2012
221 *
222 * .. Scalar Arguments ..
223  CHARACTER jobu, jobvt
224  INTEGER info, lda, ldu, ldvt, lwork, m, n
225 * ..
226 * .. Array Arguments ..
227  REAL rwork( * ), s( * )
228  COMPLEX a( lda, * ), u( ldu, * ), vt( ldvt, * ),
229  $ work( * )
230 * ..
231 *
232 * =====================================================================
233 *
234 * .. Parameters ..
235  COMPLEX czero, cone
236  parameter( czero = ( 0.0e0, 0.0e0 ),
237  $ cone = ( 1.0e0, 0.0e0 ) )
238  REAL zero, one
239  parameter( zero = 0.0e0, one = 1.0e0 )
240 * ..
241 * .. Local Scalars ..
242  LOGICAL lquery, wntua, wntuas, wntun, wntuo, wntus,
243  $ wntva, wntvas, wntvn, wntvo, wntvs
244  INTEGER blk, chunk, i, ie, ierr, ir, irwork, iscl,
245  $ itau, itaup, itauq, iu, iwork, ldwrkr, ldwrku,
246  $ maxwrk, minmn, minwrk, mnthr, ncu, ncvt, nru,
247  $ nrvt, wrkbl
248  INTEGER lwork_cgeqrf, lwork_cungqr_n, lwork_cungqr_m,
249  $ lwork_cgebrd, lwork_cungbr_p, lwork_cungbr_q,
250  $ lwork_cgelqf, lwork_cunglq_n, lwork_cunglq_m
251  REAL anrm, bignum, eps, smlnum
252 * ..
253 * .. Local Arrays ..
254  REAL dum( 1 )
255  COMPLEX cdum( 1 )
256 * ..
257 * .. External Subroutines ..
258  EXTERNAL cbdsqr, cgebrd, cgelqf, cgemm, cgeqrf, clacpy,
260  $ slascl, xerbla
261 * ..
262 * .. External Functions ..
263  LOGICAL lsame
264  INTEGER ilaenv
265  REAL clange, slamch
266  EXTERNAL lsame, ilaenv, clange, slamch
267 * ..
268 * .. Intrinsic Functions ..
269  INTRINSIC max, min, sqrt
270 * ..
271 * .. Executable Statements ..
272 *
273 * Test the input arguments
274 *
275  info = 0
276  minmn = min( m, n )
277  wntua = lsame( jobu, 'A' )
278  wntus = lsame( jobu, 'S' )
279  wntuas = wntua .OR. wntus
280  wntuo = lsame( jobu, 'O' )
281  wntun = lsame( jobu, 'N' )
282  wntva = lsame( jobvt, 'A' )
283  wntvs = lsame( jobvt, 'S' )
284  wntvas = wntva .OR. wntvs
285  wntvo = lsame( jobvt, 'O' )
286  wntvn = lsame( jobvt, 'N' )
287  lquery = ( lwork.EQ.-1 )
288 *
289  IF( .NOT.( wntua .OR. wntus .OR. wntuo .OR. wntun ) ) THEN
290  info = -1
291  ELSE IF( .NOT.( wntva .OR. wntvs .OR. wntvo .OR. wntvn ) .OR.
292  $ ( wntvo .AND. wntuo ) ) THEN
293  info = -2
294  ELSE IF( m.LT.0 ) THEN
295  info = -3
296  ELSE IF( n.LT.0 ) THEN
297  info = -4
298  ELSE IF( lda.LT.max( 1, m ) ) THEN
299  info = -6
300  ELSE IF( ldu.LT.1 .OR. ( wntuas .AND. ldu.LT.m ) ) THEN
301  info = -9
302  ELSE IF( ldvt.LT.1 .OR. ( wntva .AND. ldvt.LT.n ) .OR.
303  $ ( wntvs .AND. ldvt.LT.minmn ) ) THEN
304  info = -11
305  END IF
306 *
307 * Compute workspace
308 * (Note: Comments in the code beginning "Workspace:" describe the
309 * minimal amount of workspace needed at that point in the code,
310 * as well as the preferred amount for good performance.
311 * CWorkspace refers to complex workspace, and RWorkspace to
312 * real workspace. NB refers to the optimal block size for the
313 * immediately following subroutine, as returned by ILAENV.)
314 *
315  IF( info.EQ.0 ) THEN
316  minwrk = 1
317  maxwrk = 1
318  IF( m.GE.n .AND. minmn.GT.0 ) THEN
319 *
320 * Space needed for ZBDSQR is BDSPAC = 5*N
321 *
322  mnthr = ilaenv( 6, 'CGESVD', jobu // jobvt, m, n, 0, 0 )
323 * Compute space needed for CGEQRF
324  CALL cgeqrf( m, n, a, lda, cdum(1), cdum(1), -1, ierr )
325  lwork_cgeqrf=cdum(1)
326 * Compute space needed for CUNGQR
327  CALL cungqr( m, n, n, a, lda, cdum(1), cdum(1), -1, ierr )
328  lwork_cungqr_n=cdum(1)
329  CALL cungqr( m, m, n, a, lda, cdum(1), cdum(1), -1, ierr )
330  lwork_cungqr_m=cdum(1)
331 * Compute space needed for CGEBRD
332  CALL cgebrd( n, n, a, lda, s, dum(1), cdum(1),
333  $ cdum(1), cdum(1), -1, ierr )
334  lwork_cgebrd=cdum(1)
335 * Compute space needed for CUNGBR
336  CALL cungbr( 'P', n, n, n, a, lda, cdum(1),
337  $ cdum(1), -1, ierr )
338  lwork_cungbr_p=cdum(1)
339  CALL cungbr( 'Q', n, n, n, a, lda, cdum(1),
340  $ cdum(1), -1, ierr )
341  lwork_cungbr_q=cdum(1)
342 *
343  mnthr = ilaenv( 6, 'CGESVD', jobu // jobvt, m, n, 0, 0 )
344  IF( m.GE.mnthr ) THEN
345  IF( wntun ) THEN
346 *
347 * Path 1 (M much larger than N, JOBU='N')
348 *
349  maxwrk = n + lwork_cgeqrf
350  maxwrk = max( maxwrk, 2*n+lwork_cgebrd )
351  IF( wntvo .OR. wntvas )
352  $ maxwrk = max( maxwrk, 2*n+lwork_cungbr_p )
353  minwrk = 3*n
354  ELSE IF( wntuo .AND. wntvn ) THEN
355 *
356 * Path 2 (M much larger than N, JOBU='O', JOBVT='N')
357 *
358  wrkbl = n + lwork_cgeqrf
359  wrkbl = max( wrkbl, n+lwork_cungqr_n )
360  wrkbl = max( wrkbl, 2*n+lwork_cgebrd )
361  wrkbl = max( wrkbl, 2*n+lwork_cungbr_q )
362  maxwrk = max( n*n+wrkbl, n*n+m*n )
363  minwrk = 2*n + m
364  ELSE IF( wntuo .AND. wntvas ) THEN
365 *
366 * Path 3 (M much larger than N, JOBU='O', JOBVT='S' or
367 * 'A')
368 *
369  wrkbl = n + lwork_cgeqrf
370  wrkbl = max( wrkbl, n+lwork_cungqr_n )
371  wrkbl = max( wrkbl, 2*n+lwork_cgebrd )
372  wrkbl = max( wrkbl, 2*n+lwork_cungbr_q )
373  wrkbl = max( wrkbl, 2*n+lwork_cungbr_p )
374  maxwrk = max( n*n+wrkbl, n*n+m*n )
375  minwrk = 2*n + m
376  ELSE IF( wntus .AND. wntvn ) THEN
377 *
378 * Path 4 (M much larger than N, JOBU='S', JOBVT='N')
379 *
380  wrkbl = n + lwork_cgeqrf
381  wrkbl = max( wrkbl, n+lwork_cungqr_n )
382  wrkbl = max( wrkbl, 2*n+lwork_cgebrd )
383  wrkbl = max( wrkbl, 2*n+lwork_cungbr_q )
384  maxwrk = n*n + wrkbl
385  minwrk = 2*n + m
386  ELSE IF( wntus .AND. wntvo ) THEN
387 *
388 * Path 5 (M much larger than N, JOBU='S', JOBVT='O')
389 *
390  wrkbl = n + lwork_cgeqrf
391  wrkbl = max( wrkbl, n+lwork_cungqr_n )
392  wrkbl = max( wrkbl, 2*n+lwork_cgebrd )
393  wrkbl = max( wrkbl, 2*n+lwork_cungbr_q )
394  wrkbl = max( wrkbl, 2*n+lwork_cungbr_p )
395  maxwrk = 2*n*n + wrkbl
396  minwrk = 2*n + m
397  ELSE IF( wntus .AND. wntvas ) THEN
398 *
399 * Path 6 (M much larger than N, JOBU='S', JOBVT='S' or
400 * 'A')
401 *
402  wrkbl = n + lwork_cgeqrf
403  wrkbl = max( wrkbl, n+lwork_cungqr_n )
404  wrkbl = max( wrkbl, 2*n+lwork_cgebrd )
405  wrkbl = max( wrkbl, 2*n+lwork_cungbr_q )
406  wrkbl = max( wrkbl, 2*n+lwork_cungbr_p )
407  maxwrk = n*n + wrkbl
408  minwrk = 2*n + m
409  ELSE IF( wntua .AND. wntvn ) THEN
410 *
411 * Path 7 (M much larger than N, JOBU='A', JOBVT='N')
412 *
413  wrkbl = n + lwork_cgeqrf
414  wrkbl = max( wrkbl, n+lwork_cungqr_m )
415  wrkbl = max( wrkbl, 2*n+lwork_cgebrd )
416  wrkbl = max( wrkbl, 2*n+lwork_cungbr_q )
417  maxwrk = n*n + wrkbl
418  minwrk = 2*n + m
419  ELSE IF( wntua .AND. wntvo ) THEN
420 *
421 * Path 8 (M much larger than N, JOBU='A', JOBVT='O')
422 *
423  wrkbl = n + lwork_cgeqrf
424  wrkbl = max( wrkbl, n+lwork_cungqr_m )
425  wrkbl = max( wrkbl, 2*n+lwork_cgebrd )
426  wrkbl = max( wrkbl, 2*n+lwork_cungbr_q )
427  wrkbl = max( wrkbl, 2*n+lwork_cungbr_p )
428  maxwrk = 2*n*n + wrkbl
429  minwrk = 2*n + m
430  ELSE IF( wntua .AND. wntvas ) THEN
431 *
432 * Path 9 (M much larger than N, JOBU='A', JOBVT='S' or
433 * 'A')
434 *
435  wrkbl = n + lwork_cgeqrf
436  wrkbl = max( wrkbl, n+lwork_cungqr_m )
437  wrkbl = max( wrkbl, 2*n+lwork_cgebrd )
438  wrkbl = max( wrkbl, 2*n+lwork_cungbr_q )
439  wrkbl = max( wrkbl, 2*n+lwork_cungbr_p )
440  maxwrk = n*n + wrkbl
441  minwrk = 2*n + m
442  END IF
443  ELSE
444 *
445 * Path 10 (M at least N, but not much larger)
446 *
447  CALL cgebrd( m, n, a, lda, s, dum(1), cdum(1),
448  $ cdum(1), cdum(1), -1, ierr )
449  lwork_cgebrd=cdum(1)
450  maxwrk = 2*n + lwork_cgebrd
451  IF( wntus .OR. wntuo ) THEN
452  CALL cungbr( 'Q', m, n, n, a, lda, cdum(1),
453  $ cdum(1), -1, ierr )
454  lwork_cungbr_q=cdum(1)
455  maxwrk = max( maxwrk, 2*n+lwork_cungbr_q )
456  END IF
457  IF( wntua ) THEN
458  CALL cungbr( 'Q', m, m, n, a, lda, cdum(1),
459  $ cdum(1), -1, ierr )
460  lwork_cungbr_q=cdum(1)
461  maxwrk = max( maxwrk, 2*n+lwork_cungbr_q )
462  END IF
463  IF( .NOT.wntvn ) THEN
464  maxwrk = max( maxwrk, 2*n+lwork_cungbr_p )
465  minwrk = 2*n + m
466  END IF
467  END IF
468  ELSE IF( minmn.GT.0 ) THEN
469 *
470 * Space needed for CBDSQR is BDSPAC = 5*M
471 *
472  mnthr = ilaenv( 6, 'CGESVD', jobu // jobvt, m, n, 0, 0 )
473 * Compute space needed for CGELQF
474  CALL cgelqf( m, n, a, lda, cdum(1), cdum(1), -1, ierr )
475  lwork_cgelqf=cdum(1)
476 * Compute space needed for CUNGLQ
477  CALL cunglq( n, n, m, cdum(1), n, cdum(1), cdum(1), -1,
478  $ ierr )
479  lwork_cunglq_n=cdum(1)
480  CALL cunglq( m, n, m, a, lda, cdum(1), cdum(1), -1, ierr )
481  lwork_cunglq_m=cdum(1)
482 * Compute space needed for CGEBRD
483  CALL cgebrd( m, m, a, lda, s, dum(1), cdum(1),
484  $ cdum(1), cdum(1), -1, ierr )
485  lwork_cgebrd=cdum(1)
486 * Compute space needed for CUNGBR P
487  CALL cungbr( 'P', m, m, m, a, n, cdum(1),
488  $ cdum(1), -1, ierr )
489  lwork_cungbr_p=cdum(1)
490 * Compute space needed for CUNGBR Q
491  CALL cungbr( 'Q', m, m, m, a, n, cdum(1),
492  $ cdum(1), -1, ierr )
493  lwork_cungbr_q=cdum(1)
494  IF( n.GE.mnthr ) THEN
495  IF( wntvn ) THEN
496 *
497 * Path 1t(N much larger than M, JOBVT='N')
498 *
499  maxwrk = m + lwork_cgelqf
500  maxwrk = max( maxwrk, 2*m+lwork_cgebrd )
501  IF( wntuo .OR. wntuas )
502  $ maxwrk = max( maxwrk, 2*m+lwork_cungbr_q )
503  minwrk = 3*m
504  ELSE IF( wntvo .AND. wntun ) THEN
505 *
506 * Path 2t(N much larger than M, JOBU='N', JOBVT='O')
507 *
508  wrkbl = m + lwork_cgelqf
509  wrkbl = max( wrkbl, m+lwork_cunglq_m )
510  wrkbl = max( wrkbl, 2*m+lwork_cgebrd )
511  wrkbl = max( wrkbl, 2*m+lwork_cungbr_p )
512  maxwrk = max( m*m+wrkbl, m*m+m*n )
513  minwrk = 2*m + n
514  ELSE IF( wntvo .AND. wntuas ) THEN
515 *
516 * Path 3t(N much larger than M, JOBU='S' or 'A',
517 * JOBVT='O')
518 *
519  wrkbl = m + lwork_cgelqf
520  wrkbl = max( wrkbl, m+lwork_cunglq_m )
521  wrkbl = max( wrkbl, 2*m+lwork_cgebrd )
522  wrkbl = max( wrkbl, 2*m+lwork_cungbr_p )
523  wrkbl = max( wrkbl, 2*m+lwork_cungbr_q )
524  maxwrk = max( m*m+wrkbl, m*m+m*n )
525  minwrk = 2*m + n
526  ELSE IF( wntvs .AND. wntun ) THEN
527 *
528 * Path 4t(N much larger than M, JOBU='N', JOBVT='S')
529 *
530  wrkbl = m + lwork_cgelqf
531  wrkbl = max( wrkbl, m+lwork_cunglq_m )
532  wrkbl = max( wrkbl, 2*m+lwork_cgebrd )
533  wrkbl = max( wrkbl, 2*m+lwork_cungbr_p )
534  maxwrk = m*m + wrkbl
535  minwrk = 2*m + n
536  ELSE IF( wntvs .AND. wntuo ) THEN
537 *
538 * Path 5t(N much larger than M, JOBU='O', JOBVT='S')
539 *
540  wrkbl = m + lwork_cgelqf
541  wrkbl = max( wrkbl, m+lwork_cunglq_m )
542  wrkbl = max( wrkbl, 2*m+lwork_cgebrd )
543  wrkbl = max( wrkbl, 2*m+lwork_cungbr_p )
544  wrkbl = max( wrkbl, 2*m+lwork_cungbr_q )
545  maxwrk = 2*m*m + wrkbl
546  minwrk = 2*m + n
547  ELSE IF( wntvs .AND. wntuas ) THEN
548 *
549 * Path 6t(N much larger than M, JOBU='S' or 'A',
550 * JOBVT='S')
551 *
552  wrkbl = m + lwork_cgelqf
553  wrkbl = max( wrkbl, m+lwork_cunglq_m )
554  wrkbl = max( wrkbl, 2*m+lwork_cgebrd )
555  wrkbl = max( wrkbl, 2*m+lwork_cungbr_p )
556  wrkbl = max( wrkbl, 2*m+lwork_cungbr_q )
557  maxwrk = m*m + wrkbl
558  minwrk = 2*m + n
559  ELSE IF( wntva .AND. wntun ) THEN
560 *
561 * Path 7t(N much larger than M, JOBU='N', JOBVT='A')
562 *
563  wrkbl = m + lwork_cgelqf
564  wrkbl = max( wrkbl, m+lwork_cunglq_n )
565  wrkbl = max( wrkbl, 2*m+lwork_cgebrd )
566  wrkbl = max( wrkbl, 2*m+lwork_cungbr_p )
567  maxwrk = m*m + wrkbl
568  minwrk = 2*m + n
569  ELSE IF( wntva .AND. wntuo ) THEN
570 *
571 * Path 8t(N much larger than M, JOBU='O', JOBVT='A')
572 *
573  wrkbl = m + lwork_cgelqf
574  wrkbl = max( wrkbl, m+lwork_cunglq_n )
575  wrkbl = max( wrkbl, 2*m+lwork_cgebrd )
576  wrkbl = max( wrkbl, 2*m+lwork_cungbr_p )
577  wrkbl = max( wrkbl, 2*m+lwork_cungbr_q )
578  maxwrk = 2*m*m + wrkbl
579  minwrk = 2*m + n
580  ELSE IF( wntva .AND. wntuas ) THEN
581 *
582 * Path 9t(N much larger than M, JOBU='S' or 'A',
583 * JOBVT='A')
584 *
585  wrkbl = m + lwork_cgelqf
586  wrkbl = max( wrkbl, m+lwork_cunglq_n )
587  wrkbl = max( wrkbl, 2*m+lwork_cgebrd )
588  wrkbl = max( wrkbl, 2*m+lwork_cungbr_p )
589  wrkbl = max( wrkbl, 2*m+lwork_cungbr_q )
590  maxwrk = m*m + wrkbl
591  minwrk = 2*m + n
592  END IF
593  ELSE
594 *
595 * Path 10t(N greater than M, but not much larger)
596 *
597  CALL cgebrd( m, n, a, lda, s, dum(1), cdum(1),
598  $ cdum(1), cdum(1), -1, ierr )
599  lwork_cgebrd=cdum(1)
600  maxwrk = 2*m + lwork_cgebrd
601  IF( wntvs .OR. wntvo ) THEN
602 * Compute space needed for CUNGBR P
603  CALL cungbr( 'P', m, n, m, a, n, cdum(1),
604  $ cdum(1), -1, ierr )
605  lwork_cungbr_p=cdum(1)
606  maxwrk = max( maxwrk, 2*m+lwork_cungbr_p )
607  END IF
608  IF( wntva ) THEN
609  CALL cungbr( 'P', n, n, m, a, n, cdum(1),
610  $ cdum(1), -1, ierr )
611  lwork_cungbr_p=cdum(1)
612  maxwrk = max( maxwrk, 2*m+lwork_cungbr_p )
613  END IF
614  IF( .NOT.wntun ) THEN
615  maxwrk = max( maxwrk, 2*m+lwork_cungbr_q )
616  minwrk = 2*m + n
617  END IF
618  END IF
619  END IF
620  maxwrk = max( minwrk, maxwrk )
621  work( 1 ) = maxwrk
622 *
623  IF( lwork.LT.minwrk .AND. .NOT.lquery ) THEN
624  info = -13
625  END IF
626  END IF
627 *
628  IF( info.NE.0 ) THEN
629  CALL xerbla( 'CGESVD', -info )
630  RETURN
631  ELSE IF( lquery ) THEN
632  RETURN
633  END IF
634 *
635 * Quick return if possible
636 *
637  IF( m.EQ.0 .OR. n.EQ.0 ) THEN
638  RETURN
639  END IF
640 *
641 * Get machine constants
642 *
643  eps = slamch( 'P' )
644  smlnum = sqrt( slamch( 'S' ) ) / eps
645  bignum = one / smlnum
646 *
647 * Scale A if max element outside range [SMLNUM,BIGNUM]
648 *
649  anrm = clange( 'M', m, n, a, lda, dum )
650  iscl = 0
651  IF( anrm.GT.zero .AND. anrm.LT.smlnum ) THEN
652  iscl = 1
653  CALL clascl( 'G', 0, 0, anrm, smlnum, m, n, a, lda, ierr )
654  ELSE IF( anrm.GT.bignum ) THEN
655  iscl = 1
656  CALL clascl( 'G', 0, 0, anrm, bignum, m, n, a, lda, ierr )
657  END IF
658 *
659  IF( m.GE.n ) THEN
660 *
661 * A has at least as many rows as columns. If A has sufficiently
662 * more rows than columns, first reduce using the QR
663 * decomposition (if sufficient workspace available)
664 *
665  IF( m.GE.mnthr ) THEN
666 *
667  IF( wntun ) THEN
668 *
669 * Path 1 (M much larger than N, JOBU='N')
670 * No left singular vectors to be computed
671 *
672  itau = 1
673  iwork = itau + n
674 *
675 * Compute A=Q*R
676 * (CWorkspace: need 2*N, prefer N+N*NB)
677 * (RWorkspace: need 0)
678 *
679  CALL cgeqrf( m, n, a, lda, work( itau ), work( iwork ),
680  $ lwork-iwork+1, ierr )
681 *
682 * Zero out below R
683 *
684  CALL claset( 'L', n-1, n-1, czero, czero, a( 2, 1 ),
685  $ lda )
686  ie = 1
687  itauq = 1
688  itaup = itauq + n
689  iwork = itaup + n
690 *
691 * Bidiagonalize R in A
692 * (CWorkspace: need 3*N, prefer 2*N+2*N*NB)
693 * (RWorkspace: need N)
694 *
695  CALL cgebrd( n, n, a, lda, s, rwork( ie ), work( itauq ),
696  $ work( itaup ), work( iwork ), lwork-iwork+1,
697  $ ierr )
698  ncvt = 0
699  IF( wntvo .OR. wntvas ) THEN
700 *
701 * If right singular vectors desired, generate P'.
702 * (CWorkspace: need 3*N-1, prefer 2*N+(N-1)*NB)
703 * (RWorkspace: 0)
704 *
705  CALL cungbr( 'P', n, n, n, a, lda, work( itaup ),
706  $ work( iwork ), lwork-iwork+1, ierr )
707  ncvt = n
708  END IF
709  irwork = ie + n
710 *
711 * Perform bidiagonal QR iteration, computing right
712 * singular vectors of A in A if desired
713 * (CWorkspace: 0)
714 * (RWorkspace: need BDSPAC)
715 *
716  CALL cbdsqr( 'U', n, ncvt, 0, 0, s, rwork( ie ), a, lda,
717  $ cdum, 1, cdum, 1, rwork( irwork ), info )
718 *
719 * If right singular vectors desired in VT, copy them there
720 *
721  IF( wntvas )
722  $ CALL clacpy( 'F', n, n, a, lda, vt, ldvt )
723 *
724  ELSE IF( wntuo .AND. wntvn ) THEN
725 *
726 * Path 2 (M much larger than N, JOBU='O', JOBVT='N')
727 * N left singular vectors to be overwritten on A and
728 * no right singular vectors to be computed
729 *
730  IF( lwork.GE.n*n+3*n ) THEN
731 *
732 * Sufficient workspace for a fast algorithm
733 *
734  ir = 1
735  IF( lwork.GE.max( wrkbl, lda*n )+lda*n ) THEN
736 *
737 * WORK(IU) is LDA by N, WORK(IR) is LDA by N
738 *
739  ldwrku = lda
740  ldwrkr = lda
741  ELSE IF( lwork.GE.max( wrkbl, lda*n )+n*n ) THEN
742 *
743 * WORK(IU) is LDA by N, WORK(IR) is N by N
744 *
745  ldwrku = lda
746  ldwrkr = n
747  ELSE
748 *
749 * WORK(IU) is LDWRKU by N, WORK(IR) is N by N
750 *
751  ldwrku = ( lwork-n*n ) / n
752  ldwrkr = n
753  END IF
754  itau = ir + ldwrkr*n
755  iwork = itau + n
756 *
757 * Compute A=Q*R
758 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
759 * (RWorkspace: 0)
760 *
761  CALL cgeqrf( m, n, a, lda, work( itau ),
762  $ work( iwork ), lwork-iwork+1, ierr )
763 *
764 * Copy R to WORK(IR) and zero out below it
765 *
766  CALL clacpy( 'U', n, n, a, lda, work( ir ), ldwrkr )
767  CALL claset( 'L', n-1, n-1, czero, czero,
768  $ work( ir+1 ), ldwrkr )
769 *
770 * Generate Q in A
771 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
772 * (RWorkspace: 0)
773 *
774  CALL cungqr( m, n, n, a, lda, work( itau ),
775  $ work( iwork ), lwork-iwork+1, ierr )
776  ie = 1
777  itauq = itau
778  itaup = itauq + n
779  iwork = itaup + n
780 *
781 * Bidiagonalize R in WORK(IR)
782 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+2*N*NB)
783 * (RWorkspace: need N)
784 *
785  CALL cgebrd( n, n, work( ir ), ldwrkr, s, rwork( ie ),
786  $ work( itauq ), work( itaup ),
787  $ work( iwork ), lwork-iwork+1, ierr )
788 *
789 * Generate left vectors bidiagonalizing R
790 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+N*NB)
791 * (RWorkspace: need 0)
792 *
793  CALL cungbr( 'Q', n, n, n, work( ir ), ldwrkr,
794  $ work( itauq ), work( iwork ),
795  $ lwork-iwork+1, ierr )
796  irwork = ie + n
797 *
798 * Perform bidiagonal QR iteration, computing left
799 * singular vectors of R in WORK(IR)
800 * (CWorkspace: need N*N)
801 * (RWorkspace: need BDSPAC)
802 *
803  CALL cbdsqr( 'U', n, 0, n, 0, s, rwork( ie ), cdum, 1,
804  $ work( ir ), ldwrkr, cdum, 1,
805  $ rwork( irwork ), info )
806  iu = itauq
807 *
808 * Multiply Q in A by left singular vectors of R in
809 * WORK(IR), storing result in WORK(IU) and copying to A
810 * (CWorkspace: need N*N+N, prefer N*N+M*N)
811 * (RWorkspace: 0)
812 *
813  DO 10 i = 1, m, ldwrku
814  chunk = min( m-i+1, ldwrku )
815  CALL cgemm( 'N', 'N', chunk, n, n, cone, a( i, 1 ),
816  $ lda, work( ir ), ldwrkr, czero,
817  $ work( iu ), ldwrku )
818  CALL clacpy( 'F', chunk, n, work( iu ), ldwrku,
819  $ a( i, 1 ), lda )
820  10 CONTINUE
821 *
822  ELSE
823 *
824 * Insufficient workspace for a fast algorithm
825 *
826  ie = 1
827  itauq = 1
828  itaup = itauq + n
829  iwork = itaup + n
830 *
831 * Bidiagonalize A
832 * (CWorkspace: need 2*N+M, prefer 2*N+(M+N)*NB)
833 * (RWorkspace: N)
834 *
835  CALL cgebrd( m, n, a, lda, s, rwork( ie ),
836  $ work( itauq ), work( itaup ),
837  $ work( iwork ), lwork-iwork+1, ierr )
838 *
839 * Generate left vectors bidiagonalizing A
840 * (CWorkspace: need 3*N, prefer 2*N+N*NB)
841 * (RWorkspace: 0)
842 *
843  CALL cungbr( 'Q', m, n, n, a, lda, work( itauq ),
844  $ work( iwork ), lwork-iwork+1, ierr )
845  irwork = ie + n
846 *
847 * Perform bidiagonal QR iteration, computing left
848 * singular vectors of A in A
849 * (CWorkspace: need 0)
850 * (RWorkspace: need BDSPAC)
851 *
852  CALL cbdsqr( 'U', n, 0, m, 0, s, rwork( ie ), cdum, 1,
853  $ a, lda, cdum, 1, rwork( irwork ), info )
854 *
855  END IF
856 *
857  ELSE IF( wntuo .AND. wntvas ) THEN
858 *
859 * Path 3 (M much larger than N, JOBU='O', JOBVT='S' or 'A')
860 * N left singular vectors to be overwritten on A and
861 * N right singular vectors to be computed in VT
862 *
863  IF( lwork.GE.n*n+3*n ) THEN
864 *
865 * Sufficient workspace for a fast algorithm
866 *
867  ir = 1
868  IF( lwork.GE.max( wrkbl, lda*n )+lda*n ) THEN
869 *
870 * WORK(IU) is LDA by N and WORK(IR) is LDA by N
871 *
872  ldwrku = lda
873  ldwrkr = lda
874  ELSE IF( lwork.GE.max( wrkbl, lda*n )+n*n ) THEN
875 *
876 * WORK(IU) is LDA by N and WORK(IR) is N by N
877 *
878  ldwrku = lda
879  ldwrkr = n
880  ELSE
881 *
882 * WORK(IU) is LDWRKU by N and WORK(IR) is N by N
883 *
884  ldwrku = ( lwork-n*n ) / n
885  ldwrkr = n
886  END IF
887  itau = ir + ldwrkr*n
888  iwork = itau + n
889 *
890 * Compute A=Q*R
891 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
892 * (RWorkspace: 0)
893 *
894  CALL cgeqrf( m, n, a, lda, work( itau ),
895  $ work( iwork ), lwork-iwork+1, ierr )
896 *
897 * Copy R to VT, zeroing out below it
898 *
899  CALL clacpy( 'U', n, n, a, lda, vt, ldvt )
900  IF( n.GT.1 )
901  $ CALL claset( 'L', n-1, n-1, czero, czero,
902  $ vt( 2, 1 ), ldvt )
903 *
904 * Generate Q in A
905 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
906 * (RWorkspace: 0)
907 *
908  CALL cungqr( m, n, n, a, lda, work( itau ),
909  $ work( iwork ), lwork-iwork+1, ierr )
910  ie = 1
911  itauq = itau
912  itaup = itauq + n
913  iwork = itaup + n
914 *
915 * Bidiagonalize R in VT, copying result to WORK(IR)
916 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+2*N*NB)
917 * (RWorkspace: need N)
918 *
919  CALL cgebrd( n, n, vt, ldvt, s, rwork( ie ),
920  $ work( itauq ), work( itaup ),
921  $ work( iwork ), lwork-iwork+1, ierr )
922  CALL clacpy( 'L', n, n, vt, ldvt, work( ir ), ldwrkr )
923 *
924 * Generate left vectors bidiagonalizing R in WORK(IR)
925 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+N*NB)
926 * (RWorkspace: 0)
927 *
928  CALL cungbr( 'Q', n, n, n, work( ir ), ldwrkr,
929  $ work( itauq ), work( iwork ),
930  $ lwork-iwork+1, ierr )
931 *
932 * Generate right vectors bidiagonalizing R in VT
933 * (CWorkspace: need N*N+3*N-1, prefer N*N+2*N+(N-1)*NB)
934 * (RWorkspace: 0)
935 *
936  CALL cungbr( 'P', n, n, n, vt, ldvt, work( itaup ),
937  $ work( iwork ), lwork-iwork+1, ierr )
938  irwork = ie + n
939 *
940 * Perform bidiagonal QR iteration, computing left
941 * singular vectors of R in WORK(IR) and computing right
942 * singular vectors of R in VT
943 * (CWorkspace: need N*N)
944 * (RWorkspace: need BDSPAC)
945 *
946  CALL cbdsqr( 'U', n, n, n, 0, s, rwork( ie ), vt,
947  $ ldvt, work( ir ), ldwrkr, cdum, 1,
948  $ rwork( irwork ), info )
949  iu = itauq
950 *
951 * Multiply Q in A by left singular vectors of R in
952 * WORK(IR), storing result in WORK(IU) and copying to A
953 * (CWorkspace: need N*N+N, prefer N*N+M*N)
954 * (RWorkspace: 0)
955 *
956  DO 20 i = 1, m, ldwrku
957  chunk = min( m-i+1, ldwrku )
958  CALL cgemm( 'N', 'N', chunk, n, n, cone, a( i, 1 ),
959  $ lda, work( ir ), ldwrkr, czero,
960  $ work( iu ), ldwrku )
961  CALL clacpy( 'F', chunk, n, work( iu ), ldwrku,
962  $ a( i, 1 ), lda )
963  20 CONTINUE
964 *
965  ELSE
966 *
967 * Insufficient workspace for a fast algorithm
968 *
969  itau = 1
970  iwork = itau + n
971 *
972 * Compute A=Q*R
973 * (CWorkspace: need 2*N, prefer N+N*NB)
974 * (RWorkspace: 0)
975 *
976  CALL cgeqrf( m, n, a, lda, work( itau ),
977  $ work( iwork ), lwork-iwork+1, ierr )
978 *
979 * Copy R to VT, zeroing out below it
980 *
981  CALL clacpy( 'U', n, n, a, lda, vt, ldvt )
982  IF( n.GT.1 )
983  $ CALL claset( 'L', n-1, n-1, czero, czero,
984  $ vt( 2, 1 ), ldvt )
985 *
986 * Generate Q in A
987 * (CWorkspace: need 2*N, prefer N+N*NB)
988 * (RWorkspace: 0)
989 *
990  CALL cungqr( m, n, n, a, lda, work( itau ),
991  $ work( iwork ), lwork-iwork+1, ierr )
992  ie = 1
993  itauq = itau
994  itaup = itauq + n
995  iwork = itaup + n
996 *
997 * Bidiagonalize R in VT
998 * (CWorkspace: need 3*N, prefer 2*N+2*N*NB)
999 * (RWorkspace: N)
1000 *
1001  CALL cgebrd( n, n, vt, ldvt, s, rwork( ie ),
1002  $ work( itauq ), work( itaup ),
1003  $ work( iwork ), lwork-iwork+1, ierr )
1004 *
1005 * Multiply Q in A by left vectors bidiagonalizing R
1006 * (CWorkspace: need 2*N+M, prefer 2*N+M*NB)
1007 * (RWorkspace: 0)
1008 *
1009  CALL cunmbr( 'Q', 'R', 'N', m, n, n, vt, ldvt,
1010  $ work( itauq ), a, lda, work( iwork ),
1011  $ lwork-iwork+1, ierr )
1012 *
1013 * Generate right vectors bidiagonalizing R in VT
1014 * (CWorkspace: need 3*N-1, prefer 2*N+(N-1)*NB)
1015 * (RWorkspace: 0)
1016 *
1017  CALL cungbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1018  $ work( iwork ), lwork-iwork+1, ierr )
1019  irwork = ie + n
1020 *
1021 * Perform bidiagonal QR iteration, computing left
1022 * singular vectors of A in A and computing right
1023 * singular vectors of A in VT
1024 * (CWorkspace: 0)
1025 * (RWorkspace: need BDSPAC)
1026 *
1027  CALL cbdsqr( 'U', n, n, m, 0, s, rwork( ie ), vt,
1028  $ ldvt, a, lda, cdum, 1, rwork( irwork ),
1029  $ info )
1030 *
1031  END IF
1032 *
1033  ELSE IF( wntus ) THEN
1034 *
1035  IF( wntvn ) THEN
1036 *
1037 * Path 4 (M much larger than N, JOBU='S', JOBVT='N')
1038 * N left singular vectors to be computed in U and
1039 * no right singular vectors to be computed
1040 *
1041  IF( lwork.GE.n*n+3*n ) THEN
1042 *
1043 * Sufficient workspace for a fast algorithm
1044 *
1045  ir = 1
1046  IF( lwork.GE.wrkbl+lda*n ) THEN
1047 *
1048 * WORK(IR) is LDA by N
1049 *
1050  ldwrkr = lda
1051  ELSE
1052 *
1053 * WORK(IR) is N by N
1054 *
1055  ldwrkr = n
1056  END IF
1057  itau = ir + ldwrkr*n
1058  iwork = itau + n
1059 *
1060 * Compute A=Q*R
1061 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
1062 * (RWorkspace: 0)
1063 *
1064  CALL cgeqrf( m, n, a, lda, work( itau ),
1065  $ work( iwork ), lwork-iwork+1, ierr )
1066 *
1067 * Copy R to WORK(IR), zeroing out below it
1068 *
1069  CALL clacpy( 'U', n, n, a, lda, work( ir ),
1070  $ ldwrkr )
1071  CALL claset( 'L', n-1, n-1, czero, czero,
1072  $ work( ir+1 ), ldwrkr )
1073 *
1074 * Generate Q in A
1075 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
1076 * (RWorkspace: 0)
1077 *
1078  CALL cungqr( m, n, n, a, lda, work( itau ),
1079  $ work( iwork ), lwork-iwork+1, ierr )
1080  ie = 1
1081  itauq = itau
1082  itaup = itauq + n
1083  iwork = itaup + n
1084 *
1085 * Bidiagonalize R in WORK(IR)
1086 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+2*N*NB)
1087 * (RWorkspace: need N)
1088 *
1089  CALL cgebrd( n, n, work( ir ), ldwrkr, s,
1090  $ rwork( ie ), work( itauq ),
1091  $ work( itaup ), work( iwork ),
1092  $ lwork-iwork+1, ierr )
1093 *
1094 * Generate left vectors bidiagonalizing R in WORK(IR)
1095 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+N*NB)
1096 * (RWorkspace: 0)
1097 *
1098  CALL cungbr( 'Q', n, n, n, work( ir ), ldwrkr,
1099  $ work( itauq ), work( iwork ),
1100  $ lwork-iwork+1, ierr )
1101  irwork = ie + n
1102 *
1103 * Perform bidiagonal QR iteration, computing left
1104 * singular vectors of R in WORK(IR)
1105 * (CWorkspace: need N*N)
1106 * (RWorkspace: need BDSPAC)
1107 *
1108  CALL cbdsqr( 'U', n, 0, n, 0, s, rwork( ie ), cdum,
1109  $ 1, work( ir ), ldwrkr, cdum, 1,
1110  $ rwork( irwork ), info )
1111 *
1112 * Multiply Q in A by left singular vectors of R in
1113 * WORK(IR), storing result in U
1114 * (CWorkspace: need N*N)
1115 * (RWorkspace: 0)
1116 *
1117  CALL cgemm( 'N', 'N', m, n, n, cone, a, lda,
1118  $ work( ir ), ldwrkr, czero, u, ldu )
1119 *
1120  ELSE
1121 *
1122 * Insufficient workspace for a fast algorithm
1123 *
1124  itau = 1
1125  iwork = itau + n
1126 *
1127 * Compute A=Q*R, copying result to U
1128 * (CWorkspace: need 2*N, prefer N+N*NB)
1129 * (RWorkspace: 0)
1130 *
1131  CALL cgeqrf( m, n, a, lda, work( itau ),
1132  $ work( iwork ), lwork-iwork+1, ierr )
1133  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1134 *
1135 * Generate Q in U
1136 * (CWorkspace: need 2*N, prefer N+N*NB)
1137 * (RWorkspace: 0)
1138 *
1139  CALL cungqr( m, n, n, u, ldu, work( itau ),
1140  $ work( iwork ), lwork-iwork+1, ierr )
1141  ie = 1
1142  itauq = itau
1143  itaup = itauq + n
1144  iwork = itaup + n
1145 *
1146 * Zero out below R in A
1147 *
1148  CALL claset( 'L', n-1, n-1, czero, czero,
1149  $ a( 2, 1 ), lda )
1150 *
1151 * Bidiagonalize R in A
1152 * (CWorkspace: need 3*N, prefer 2*N+2*N*NB)
1153 * (RWorkspace: need N)
1154 *
1155  CALL cgebrd( n, n, a, lda, s, rwork( ie ),
1156  $ work( itauq ), work( itaup ),
1157  $ work( iwork ), lwork-iwork+1, ierr )
1158 *
1159 * Multiply Q in U by left vectors bidiagonalizing R
1160 * (CWorkspace: need 2*N+M, prefer 2*N+M*NB)
1161 * (RWorkspace: 0)
1162 *
1163  CALL cunmbr( 'Q', 'R', 'N', m, n, n, a, lda,
1164  $ work( itauq ), u, ldu, work( iwork ),
1165  $ lwork-iwork+1, ierr )
1166  irwork = ie + n
1167 *
1168 * Perform bidiagonal QR iteration, computing left
1169 * singular vectors of A in U
1170 * (CWorkspace: 0)
1171 * (RWorkspace: need BDSPAC)
1172 *
1173  CALL cbdsqr( 'U', n, 0, m, 0, s, rwork( ie ), cdum,
1174  $ 1, u, ldu, cdum, 1, rwork( irwork ),
1175  $ info )
1176 *
1177  END IF
1178 *
1179  ELSE IF( wntvo ) THEN
1180 *
1181 * Path 5 (M much larger than N, JOBU='S', JOBVT='O')
1182 * N left singular vectors to be computed in U and
1183 * N right singular vectors to be overwritten on A
1184 *
1185  IF( lwork.GE.2*n*n+3*n ) THEN
1186 *
1187 * Sufficient workspace for a fast algorithm
1188 *
1189  iu = 1
1190  IF( lwork.GE.wrkbl+2*lda*n ) THEN
1191 *
1192 * WORK(IU) is LDA by N and WORK(IR) is LDA by N
1193 *
1194  ldwrku = lda
1195  ir = iu + ldwrku*n
1196  ldwrkr = lda
1197  ELSE IF( lwork.GE.wrkbl+( lda+n )*n ) THEN
1198 *
1199 * WORK(IU) is LDA by N and WORK(IR) is N by N
1200 *
1201  ldwrku = lda
1202  ir = iu + ldwrku*n
1203  ldwrkr = n
1204  ELSE
1205 *
1206 * WORK(IU) is N by N and WORK(IR) is N by N
1207 *
1208  ldwrku = n
1209  ir = iu + ldwrku*n
1210  ldwrkr = n
1211  END IF
1212  itau = ir + ldwrkr*n
1213  iwork = itau + n
1214 *
1215 * Compute A=Q*R
1216 * (CWorkspace: need 2*N*N+2*N, prefer 2*N*N+N+N*NB)
1217 * (RWorkspace: 0)
1218 *
1219  CALL cgeqrf( m, n, a, lda, work( itau ),
1220  $ work( iwork ), lwork-iwork+1, ierr )
1221 *
1222 * Copy R to WORK(IU), zeroing out below it
1223 *
1224  CALL clacpy( 'U', n, n, a, lda, work( iu ),
1225  $ ldwrku )
1226  CALL claset( 'L', n-1, n-1, czero, czero,
1227  $ work( iu+1 ), ldwrku )
1228 *
1229 * Generate Q in A
1230 * (CWorkspace: need 2*N*N+2*N, prefer 2*N*N+N+N*NB)
1231 * (RWorkspace: 0)
1232 *
1233  CALL cungqr( m, n, n, a, lda, work( itau ),
1234  $ work( iwork ), lwork-iwork+1, ierr )
1235  ie = 1
1236  itauq = itau
1237  itaup = itauq + n
1238  iwork = itaup + n
1239 *
1240 * Bidiagonalize R in WORK(IU), copying result to
1241 * WORK(IR)
1242 * (CWorkspace: need 2*N*N+3*N,
1243 * prefer 2*N*N+2*N+2*N*NB)
1244 * (RWorkspace: need N)
1245 *
1246  CALL cgebrd( n, n, work( iu ), ldwrku, s,
1247  $ rwork( ie ), work( itauq ),
1248  $ work( itaup ), work( iwork ),
1249  $ lwork-iwork+1, ierr )
1250  CALL clacpy( 'U', n, n, work( iu ), ldwrku,
1251  $ work( ir ), ldwrkr )
1252 *
1253 * Generate left bidiagonalizing vectors in WORK(IU)
1254 * (CWorkspace: need 2*N*N+3*N, prefer 2*N*N+2*N+N*NB)
1255 * (RWorkspace: 0)
1256 *
1257  CALL cungbr( 'Q', n, n, n, work( iu ), ldwrku,
1258  $ work( itauq ), work( iwork ),
1259  $ lwork-iwork+1, ierr )
1260 *
1261 * Generate right bidiagonalizing vectors in WORK(IR)
1262 * (CWorkspace: need 2*N*N+3*N-1,
1263 * prefer 2*N*N+2*N+(N-1)*NB)
1264 * (RWorkspace: 0)
1265 *
1266  CALL cungbr( 'P', n, n, n, work( ir ), ldwrkr,
1267  $ work( itaup ), work( iwork ),
1268  $ lwork-iwork+1, ierr )
1269  irwork = ie + n
1270 *
1271 * Perform bidiagonal QR iteration, computing left
1272 * singular vectors of R in WORK(IU) and computing
1273 * right singular vectors of R in WORK(IR)
1274 * (CWorkspace: need 2*N*N)
1275 * (RWorkspace: need BDSPAC)
1276 *
1277  CALL cbdsqr( 'U', n, n, n, 0, s, rwork( ie ),
1278  $ work( ir ), ldwrkr, work( iu ),
1279  $ ldwrku, cdum, 1, rwork( irwork ),
1280  $ info )
1281 *
1282 * Multiply Q in A by left singular vectors of R in
1283 * WORK(IU), storing result in U
1284 * (CWorkspace: need N*N)
1285 * (RWorkspace: 0)
1286 *
1287  CALL cgemm( 'N', 'N', m, n, n, cone, a, lda,
1288  $ work( iu ), ldwrku, czero, u, ldu )
1289 *
1290 * Copy right singular vectors of R to A
1291 * (CWorkspace: need N*N)
1292 * (RWorkspace: 0)
1293 *
1294  CALL clacpy( 'F', n, n, work( ir ), ldwrkr, a,
1295  $ lda )
1296 *
1297  ELSE
1298 *
1299 * Insufficient workspace for a fast algorithm
1300 *
1301  itau = 1
1302  iwork = itau + n
1303 *
1304 * Compute A=Q*R, copying result to U
1305 * (CWorkspace: need 2*N, prefer N+N*NB)
1306 * (RWorkspace: 0)
1307 *
1308  CALL cgeqrf( m, n, a, lda, work( itau ),
1309  $ work( iwork ), lwork-iwork+1, ierr )
1310  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1311 *
1312 * Generate Q in U
1313 * (CWorkspace: need 2*N, prefer N+N*NB)
1314 * (RWorkspace: 0)
1315 *
1316  CALL cungqr( m, n, n, u, ldu, work( itau ),
1317  $ work( iwork ), lwork-iwork+1, ierr )
1318  ie = 1
1319  itauq = itau
1320  itaup = itauq + n
1321  iwork = itaup + n
1322 *
1323 * Zero out below R in A
1324 *
1325  CALL claset( 'L', n-1, n-1, czero, czero,
1326  $ a( 2, 1 ), lda )
1327 *
1328 * Bidiagonalize R in A
1329 * (CWorkspace: need 3*N, prefer 2*N+2*N*NB)
1330 * (RWorkspace: need N)
1331 *
1332  CALL cgebrd( n, n, a, lda, s, rwork( ie ),
1333  $ work( itauq ), work( itaup ),
1334  $ work( iwork ), lwork-iwork+1, ierr )
1335 *
1336 * Multiply Q in U by left vectors bidiagonalizing R
1337 * (CWorkspace: need 2*N+M, prefer 2*N+M*NB)
1338 * (RWorkspace: 0)
1339 *
1340  CALL cunmbr( 'Q', 'R', 'N', m, n, n, a, lda,
1341  $ work( itauq ), u, ldu, work( iwork ),
1342  $ lwork-iwork+1, ierr )
1343 *
1344 * Generate right vectors bidiagonalizing R in A
1345 * (CWorkspace: need 3*N-1, prefer 2*N+(N-1)*NB)
1346 * (RWorkspace: 0)
1347 *
1348  CALL cungbr( 'P', n, n, n, a, lda, work( itaup ),
1349  $ work( iwork ), lwork-iwork+1, ierr )
1350  irwork = ie + n
1351 *
1352 * Perform bidiagonal QR iteration, computing left
1353 * singular vectors of A in U and computing right
1354 * singular vectors of A in A
1355 * (CWorkspace: 0)
1356 * (RWorkspace: need BDSPAC)
1357 *
1358  CALL cbdsqr( 'U', n, n, m, 0, s, rwork( ie ), a,
1359  $ lda, u, ldu, cdum, 1, rwork( irwork ),
1360  $ info )
1361 *
1362  END IF
1363 *
1364  ELSE IF( wntvas ) THEN
1365 *
1366 * Path 6 (M much larger than N, JOBU='S', JOBVT='S'
1367 * or 'A')
1368 * N left singular vectors to be computed in U and
1369 * N right singular vectors to be computed in VT
1370 *
1371  IF( lwork.GE.n*n+3*n ) THEN
1372 *
1373 * Sufficient workspace for a fast algorithm
1374 *
1375  iu = 1
1376  IF( lwork.GE.wrkbl+lda*n ) THEN
1377 *
1378 * WORK(IU) is LDA by N
1379 *
1380  ldwrku = lda
1381  ELSE
1382 *
1383 * WORK(IU) is N by N
1384 *
1385  ldwrku = n
1386  END IF
1387  itau = iu + ldwrku*n
1388  iwork = itau + n
1389 *
1390 * Compute A=Q*R
1391 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
1392 * (RWorkspace: 0)
1393 *
1394  CALL cgeqrf( m, n, a, lda, work( itau ),
1395  $ work( iwork ), lwork-iwork+1, ierr )
1396 *
1397 * Copy R to WORK(IU), zeroing out below it
1398 *
1399  CALL clacpy( 'U', n, n, a, lda, work( iu ),
1400  $ ldwrku )
1401  CALL claset( 'L', n-1, n-1, czero, czero,
1402  $ work( iu+1 ), ldwrku )
1403 *
1404 * Generate Q in A
1405 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
1406 * (RWorkspace: 0)
1407 *
1408  CALL cungqr( m, n, n, a, lda, work( itau ),
1409  $ work( iwork ), lwork-iwork+1, ierr )
1410  ie = 1
1411  itauq = itau
1412  itaup = itauq + n
1413  iwork = itaup + n
1414 *
1415 * Bidiagonalize R in WORK(IU), copying result to VT
1416 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+2*N*NB)
1417 * (RWorkspace: need N)
1418 *
1419  CALL cgebrd( n, n, work( iu ), ldwrku, s,
1420  $ rwork( ie ), work( itauq ),
1421  $ work( itaup ), work( iwork ),
1422  $ lwork-iwork+1, ierr )
1423  CALL clacpy( 'U', n, n, work( iu ), ldwrku, vt,
1424  $ ldvt )
1425 *
1426 * Generate left bidiagonalizing vectors in WORK(IU)
1427 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+N*NB)
1428 * (RWorkspace: 0)
1429 *
1430  CALL cungbr( 'Q', n, n, n, work( iu ), ldwrku,
1431  $ work( itauq ), work( iwork ),
1432  $ lwork-iwork+1, ierr )
1433 *
1434 * Generate right bidiagonalizing vectors in VT
1435 * (CWorkspace: need N*N+3*N-1,
1436 * prefer N*N+2*N+(N-1)*NB)
1437 * (RWorkspace: 0)
1438 *
1439  CALL cungbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1440  $ work( iwork ), lwork-iwork+1, ierr )
1441  irwork = ie + n
1442 *
1443 * Perform bidiagonal QR iteration, computing left
1444 * singular vectors of R in WORK(IU) and computing
1445 * right singular vectors of R in VT
1446 * (CWorkspace: need N*N)
1447 * (RWorkspace: need BDSPAC)
1448 *
1449  CALL cbdsqr( 'U', n, n, n, 0, s, rwork( ie ), vt,
1450  $ ldvt, work( iu ), ldwrku, cdum, 1,
1451  $ rwork( irwork ), info )
1452 *
1453 * Multiply Q in A by left singular vectors of R in
1454 * WORK(IU), storing result in U
1455 * (CWorkspace: need N*N)
1456 * (RWorkspace: 0)
1457 *
1458  CALL cgemm( 'N', 'N', m, n, n, cone, a, lda,
1459  $ work( iu ), ldwrku, czero, u, ldu )
1460 *
1461  ELSE
1462 *
1463 * Insufficient workspace for a fast algorithm
1464 *
1465  itau = 1
1466  iwork = itau + n
1467 *
1468 * Compute A=Q*R, copying result to U
1469 * (CWorkspace: need 2*N, prefer N+N*NB)
1470 * (RWorkspace: 0)
1471 *
1472  CALL cgeqrf( m, n, a, lda, work( itau ),
1473  $ work( iwork ), lwork-iwork+1, ierr )
1474  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1475 *
1476 * Generate Q in U
1477 * (CWorkspace: need 2*N, prefer N+N*NB)
1478 * (RWorkspace: 0)
1479 *
1480  CALL cungqr( m, n, n, u, ldu, work( itau ),
1481  $ work( iwork ), lwork-iwork+1, ierr )
1482 *
1483 * Copy R to VT, zeroing out below it
1484 *
1485  CALL clacpy( 'U', n, n, a, lda, vt, ldvt )
1486  IF( n.GT.1 )
1487  $ CALL claset( 'L', n-1, n-1, czero, czero,
1488  $ vt( 2, 1 ), ldvt )
1489  ie = 1
1490  itauq = itau
1491  itaup = itauq + n
1492  iwork = itaup + n
1493 *
1494 * Bidiagonalize R in VT
1495 * (CWorkspace: need 3*N, prefer 2*N+2*N*NB)
1496 * (RWorkspace: need N)
1497 *
1498  CALL cgebrd( n, n, vt, ldvt, s, rwork( ie ),
1499  $ work( itauq ), work( itaup ),
1500  $ work( iwork ), lwork-iwork+1, ierr )
1501 *
1502 * Multiply Q in U by left bidiagonalizing vectors
1503 * in VT
1504 * (CWorkspace: need 2*N+M, prefer 2*N+M*NB)
1505 * (RWorkspace: 0)
1506 *
1507  CALL cunmbr( 'Q', 'R', 'N', m, n, n, vt, ldvt,
1508  $ work( itauq ), u, ldu, work( iwork ),
1509  $ lwork-iwork+1, ierr )
1510 *
1511 * Generate right bidiagonalizing vectors in VT
1512 * (CWorkspace: need 3*N-1, prefer 2*N+(N-1)*NB)
1513 * (RWorkspace: 0)
1514 *
1515  CALL cungbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1516  $ work( iwork ), lwork-iwork+1, ierr )
1517  irwork = ie + n
1518 *
1519 * Perform bidiagonal QR iteration, computing left
1520 * singular vectors of A in U and computing right
1521 * singular vectors of A in VT
1522 * (CWorkspace: 0)
1523 * (RWorkspace: need BDSPAC)
1524 *
1525  CALL cbdsqr( 'U', n, n, m, 0, s, rwork( ie ), vt,
1526  $ ldvt, u, ldu, cdum, 1,
1527  $ rwork( irwork ), info )
1528 *
1529  END IF
1530 *
1531  END IF
1532 *
1533  ELSE IF( wntua ) THEN
1534 *
1535  IF( wntvn ) THEN
1536 *
1537 * Path 7 (M much larger than N, JOBU='A', JOBVT='N')
1538 * M left singular vectors to be computed in U and
1539 * no right singular vectors to be computed
1540 *
1541  IF( lwork.GE.n*n+max( n+m, 3*n ) ) THEN
1542 *
1543 * Sufficient workspace for a fast algorithm
1544 *
1545  ir = 1
1546  IF( lwork.GE.wrkbl+lda*n ) THEN
1547 *
1548 * WORK(IR) is LDA by N
1549 *
1550  ldwrkr = lda
1551  ELSE
1552 *
1553 * WORK(IR) is N by N
1554 *
1555  ldwrkr = n
1556  END IF
1557  itau = ir + ldwrkr*n
1558  iwork = itau + n
1559 *
1560 * Compute A=Q*R, copying result to U
1561 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
1562 * (RWorkspace: 0)
1563 *
1564  CALL cgeqrf( m, n, a, lda, work( itau ),
1565  $ work( iwork ), lwork-iwork+1, ierr )
1566  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1567 *
1568 * Copy R to WORK(IR), zeroing out below it
1569 *
1570  CALL clacpy( 'U', n, n, a, lda, work( ir ),
1571  $ ldwrkr )
1572  CALL claset( 'L', n-1, n-1, czero, czero,
1573  $ work( ir+1 ), ldwrkr )
1574 *
1575 * Generate Q in U
1576 * (CWorkspace: need N*N+N+M, prefer N*N+N+M*NB)
1577 * (RWorkspace: 0)
1578 *
1579  CALL cungqr( m, m, n, u, ldu, work( itau ),
1580  $ work( iwork ), lwork-iwork+1, ierr )
1581  ie = 1
1582  itauq = itau
1583  itaup = itauq + n
1584  iwork = itaup + n
1585 *
1586 * Bidiagonalize R in WORK(IR)
1587 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+2*N*NB)
1588 * (RWorkspace: need N)
1589 *
1590  CALL cgebrd( n, n, work( ir ), ldwrkr, s,
1591  $ rwork( ie ), work( itauq ),
1592  $ work( itaup ), work( iwork ),
1593  $ lwork-iwork+1, ierr )
1594 *
1595 * Generate left bidiagonalizing vectors in WORK(IR)
1596 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+N*NB)
1597 * (RWorkspace: 0)
1598 *
1599  CALL cungbr( 'Q', n, n, n, work( ir ), ldwrkr,
1600  $ work( itauq ), work( iwork ),
1601  $ lwork-iwork+1, ierr )
1602  irwork = ie + n
1603 *
1604 * Perform bidiagonal QR iteration, computing left
1605 * singular vectors of R in WORK(IR)
1606 * (CWorkspace: need N*N)
1607 * (RWorkspace: need BDSPAC)
1608 *
1609  CALL cbdsqr( 'U', n, 0, n, 0, s, rwork( ie ), cdum,
1610  $ 1, work( ir ), ldwrkr, cdum, 1,
1611  $ rwork( irwork ), info )
1612 *
1613 * Multiply Q in U by left singular vectors of R in
1614 * WORK(IR), storing result in A
1615 * (CWorkspace: need N*N)
1616 * (RWorkspace: 0)
1617 *
1618  CALL cgemm( 'N', 'N', m, n, n, cone, u, ldu,
1619  $ work( ir ), ldwrkr, czero, a, lda )
1620 *
1621 * Copy left singular vectors of A from A to U
1622 *
1623  CALL clacpy( 'F', m, n, a, lda, u, ldu )
1624 *
1625  ELSE
1626 *
1627 * Insufficient workspace for a fast algorithm
1628 *
1629  itau = 1
1630  iwork = itau + n
1631 *
1632 * Compute A=Q*R, copying result to U
1633 * (CWorkspace: need 2*N, prefer N+N*NB)
1634 * (RWorkspace: 0)
1635 *
1636  CALL cgeqrf( m, n, a, lda, work( itau ),
1637  $ work( iwork ), lwork-iwork+1, ierr )
1638  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1639 *
1640 * Generate Q in U
1641 * (CWorkspace: need N+M, prefer N+M*NB)
1642 * (RWorkspace: 0)
1643 *
1644  CALL cungqr( m, m, n, u, ldu, work( itau ),
1645  $ work( iwork ), lwork-iwork+1, ierr )
1646  ie = 1
1647  itauq = itau
1648  itaup = itauq + n
1649  iwork = itaup + n
1650 *
1651 * Zero out below R in A
1652 *
1653  CALL claset( 'L', n-1, n-1, czero, czero,
1654  $ a( 2, 1 ), lda )
1655 *
1656 * Bidiagonalize R in A
1657 * (CWorkspace: need 3*N, prefer 2*N+2*N*NB)
1658 * (RWorkspace: need N)
1659 *
1660  CALL cgebrd( n, n, a, lda, s, rwork( ie ),
1661  $ work( itauq ), work( itaup ),
1662  $ work( iwork ), lwork-iwork+1, ierr )
1663 *
1664 * Multiply Q in U by left bidiagonalizing vectors
1665 * in A
1666 * (CWorkspace: need 2*N+M, prefer 2*N+M*NB)
1667 * (RWorkspace: 0)
1668 *
1669  CALL cunmbr( 'Q', 'R', 'N', m, n, n, a, lda,
1670  $ work( itauq ), u, ldu, work( iwork ),
1671  $ lwork-iwork+1, ierr )
1672  irwork = ie + n
1673 *
1674 * Perform bidiagonal QR iteration, computing left
1675 * singular vectors of A in U
1676 * (CWorkspace: 0)
1677 * (RWorkspace: need BDSPAC)
1678 *
1679  CALL cbdsqr( 'U', n, 0, m, 0, s, rwork( ie ), cdum,
1680  $ 1, u, ldu, cdum, 1, rwork( irwork ),
1681  $ info )
1682 *
1683  END IF
1684 *
1685  ELSE IF( wntvo ) THEN
1686 *
1687 * Path 8 (M much larger than N, JOBU='A', JOBVT='O')
1688 * M left singular vectors to be computed in U and
1689 * N right singular vectors to be overwritten on A
1690 *
1691  IF( lwork.GE.2*n*n+max( n+m, 3*n ) ) THEN
1692 *
1693 * Sufficient workspace for a fast algorithm
1694 *
1695  iu = 1
1696  IF( lwork.GE.wrkbl+2*lda*n ) THEN
1697 *
1698 * WORK(IU) is LDA by N and WORK(IR) is LDA by N
1699 *
1700  ldwrku = lda
1701  ir = iu + ldwrku*n
1702  ldwrkr = lda
1703  ELSE IF( lwork.GE.wrkbl+( lda+n )*n ) THEN
1704 *
1705 * WORK(IU) is LDA by N and WORK(IR) is N by N
1706 *
1707  ldwrku = lda
1708  ir = iu + ldwrku*n
1709  ldwrkr = n
1710  ELSE
1711 *
1712 * WORK(IU) is N by N and WORK(IR) is N by N
1713 *
1714  ldwrku = n
1715  ir = iu + ldwrku*n
1716  ldwrkr = n
1717  END IF
1718  itau = ir + ldwrkr*n
1719  iwork = itau + n
1720 *
1721 * Compute A=Q*R, copying result to U
1722 * (CWorkspace: need 2*N*N+2*N, prefer 2*N*N+N+N*NB)
1723 * (RWorkspace: 0)
1724 *
1725  CALL cgeqrf( m, n, a, lda, work( itau ),
1726  $ work( iwork ), lwork-iwork+1, ierr )
1727  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1728 *
1729 * Generate Q in U
1730 * (CWorkspace: need 2*N*N+N+M, prefer 2*N*N+N+M*NB)
1731 * (RWorkspace: 0)
1732 *
1733  CALL cungqr( m, m, n, u, ldu, work( itau ),
1734  $ work( iwork ), lwork-iwork+1, ierr )
1735 *
1736 * Copy R to WORK(IU), zeroing out below it
1737 *
1738  CALL clacpy( 'U', n, n, a, lda, work( iu ),
1739  $ ldwrku )
1740  CALL claset( 'L', n-1, n-1, czero, czero,
1741  $ work( iu+1 ), ldwrku )
1742  ie = 1
1743  itauq = itau
1744  itaup = itauq + n
1745  iwork = itaup + n
1746 *
1747 * Bidiagonalize R in WORK(IU), copying result to
1748 * WORK(IR)
1749 * (CWorkspace: need 2*N*N+3*N,
1750 * prefer 2*N*N+2*N+2*N*NB)
1751 * (RWorkspace: need N)
1752 *
1753  CALL cgebrd( n, n, work( iu ), ldwrku, s,
1754  $ rwork( ie ), work( itauq ),
1755  $ work( itaup ), work( iwork ),
1756  $ lwork-iwork+1, ierr )
1757  CALL clacpy( 'U', n, n, work( iu ), ldwrku,
1758  $ work( ir ), ldwrkr )
1759 *
1760 * Generate left bidiagonalizing vectors in WORK(IU)
1761 * (CWorkspace: need 2*N*N+3*N, prefer 2*N*N+2*N+N*NB)
1762 * (RWorkspace: 0)
1763 *
1764  CALL cungbr( 'Q', n, n, n, work( iu ), ldwrku,
1765  $ work( itauq ), work( iwork ),
1766  $ lwork-iwork+1, ierr )
1767 *
1768 * Generate right bidiagonalizing vectors in WORK(IR)
1769 * (CWorkspace: need 2*N*N+3*N-1,
1770 * prefer 2*N*N+2*N+(N-1)*NB)
1771 * (RWorkspace: 0)
1772 *
1773  CALL cungbr( 'P', n, n, n, work( ir ), ldwrkr,
1774  $ work( itaup ), work( iwork ),
1775  $ lwork-iwork+1, ierr )
1776  irwork = ie + n
1777 *
1778 * Perform bidiagonal QR iteration, computing left
1779 * singular vectors of R in WORK(IU) and computing
1780 * right singular vectors of R in WORK(IR)
1781 * (CWorkspace: need 2*N*N)
1782 * (RWorkspace: need BDSPAC)
1783 *
1784  CALL cbdsqr( 'U', n, n, n, 0, s, rwork( ie ),
1785  $ work( ir ), ldwrkr, work( iu ),
1786  $ ldwrku, cdum, 1, rwork( irwork ),
1787  $ info )
1788 *
1789 * Multiply Q in U by left singular vectors of R in
1790 * WORK(IU), storing result in A
1791 * (CWorkspace: need N*N)
1792 * (RWorkspace: 0)
1793 *
1794  CALL cgemm( 'N', 'N', m, n, n, cone, u, ldu,
1795  $ work( iu ), ldwrku, czero, a, lda )
1796 *
1797 * Copy left singular vectors of A from A to U
1798 *
1799  CALL clacpy( 'F', m, n, a, lda, u, ldu )
1800 *
1801 * Copy right singular vectors of R from WORK(IR) to A
1802 *
1803  CALL clacpy( 'F', n, n, work( ir ), ldwrkr, a,
1804  $ lda )
1805 *
1806  ELSE
1807 *
1808 * Insufficient workspace for a fast algorithm
1809 *
1810  itau = 1
1811  iwork = itau + n
1812 *
1813 * Compute A=Q*R, copying result to U
1814 * (CWorkspace: need 2*N, prefer N+N*NB)
1815 * (RWorkspace: 0)
1816 *
1817  CALL cgeqrf( m, n, a, lda, work( itau ),
1818  $ work( iwork ), lwork-iwork+1, ierr )
1819  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1820 *
1821 * Generate Q in U
1822 * (CWorkspace: need N+M, prefer N+M*NB)
1823 * (RWorkspace: 0)
1824 *
1825  CALL cungqr( m, m, n, u, ldu, work( itau ),
1826  $ work( iwork ), lwork-iwork+1, ierr )
1827  ie = 1
1828  itauq = itau
1829  itaup = itauq + n
1830  iwork = itaup + n
1831 *
1832 * Zero out below R in A
1833 *
1834  CALL claset( 'L', n-1, n-1, czero, czero,
1835  $ a( 2, 1 ), lda )
1836 *
1837 * Bidiagonalize R in A
1838 * (CWorkspace: need 3*N, prefer 2*N+2*N*NB)
1839 * (RWorkspace: need N)
1840 *
1841  CALL cgebrd( n, n, a, lda, s, rwork( ie ),
1842  $ work( itauq ), work( itaup ),
1843  $ work( iwork ), lwork-iwork+1, ierr )
1844 *
1845 * Multiply Q in U by left bidiagonalizing vectors
1846 * in A
1847 * (CWorkspace: need 2*N+M, prefer 2*N+M*NB)
1848 * (RWorkspace: 0)
1849 *
1850  CALL cunmbr( 'Q', 'R', 'N', m, n, n, a, lda,
1851  $ work( itauq ), u, ldu, work( iwork ),
1852  $ lwork-iwork+1, ierr )
1853 *
1854 * Generate right bidiagonalizing vectors in A
1855 * (CWorkspace: need 3*N-1, prefer 2*N+(N-1)*NB)
1856 * (RWorkspace: 0)
1857 *
1858  CALL cungbr( 'P', n, n, n, a, lda, work( itaup ),
1859  $ work( iwork ), lwork-iwork+1, ierr )
1860  irwork = ie + n
1861 *
1862 * Perform bidiagonal QR iteration, computing left
1863 * singular vectors of A in U and computing right
1864 * singular vectors of A in A
1865 * (CWorkspace: 0)
1866 * (RWorkspace: need BDSPAC)
1867 *
1868  CALL cbdsqr( 'U', n, n, m, 0, s, rwork( ie ), a,
1869  $ lda, u, ldu, cdum, 1, rwork( irwork ),
1870  $ info )
1871 *
1872  END IF
1873 *
1874  ELSE IF( wntvas ) THEN
1875 *
1876 * Path 9 (M much larger than N, JOBU='A', JOBVT='S'
1877 * or 'A')
1878 * M left singular vectors to be computed in U and
1879 * N right singular vectors to be computed in VT
1880 *
1881  IF( lwork.GE.n*n+max( n+m, 3*n ) ) THEN
1882 *
1883 * Sufficient workspace for a fast algorithm
1884 *
1885  iu = 1
1886  IF( lwork.GE.wrkbl+lda*n ) THEN
1887 *
1888 * WORK(IU) is LDA by N
1889 *
1890  ldwrku = lda
1891  ELSE
1892 *
1893 * WORK(IU) is N by N
1894 *
1895  ldwrku = n
1896  END IF
1897  itau = iu + ldwrku*n
1898  iwork = itau + n
1899 *
1900 * Compute A=Q*R, copying result to U
1901 * (CWorkspace: need N*N+2*N, prefer N*N+N+N*NB)
1902 * (RWorkspace: 0)
1903 *
1904  CALL cgeqrf( m, n, a, lda, work( itau ),
1905  $ work( iwork ), lwork-iwork+1, ierr )
1906  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1907 *
1908 * Generate Q in U
1909 * (CWorkspace: need N*N+N+M, prefer N*N+N+M*NB)
1910 * (RWorkspace: 0)
1911 *
1912  CALL cungqr( m, m, n, u, ldu, work( itau ),
1913  $ work( iwork ), lwork-iwork+1, ierr )
1914 *
1915 * Copy R to WORK(IU), zeroing out below it
1916 *
1917  CALL clacpy( 'U', n, n, a, lda, work( iu ),
1918  $ ldwrku )
1919  CALL claset( 'L', n-1, n-1, czero, czero,
1920  $ work( iu+1 ), ldwrku )
1921  ie = 1
1922  itauq = itau
1923  itaup = itauq + n
1924  iwork = itaup + n
1925 *
1926 * Bidiagonalize R in WORK(IU), copying result to VT
1927 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+2*N*NB)
1928 * (RWorkspace: need N)
1929 *
1930  CALL cgebrd( n, n, work( iu ), ldwrku, s,
1931  $ rwork( ie ), work( itauq ),
1932  $ work( itaup ), work( iwork ),
1933  $ lwork-iwork+1, ierr )
1934  CALL clacpy( 'U', n, n, work( iu ), ldwrku, vt,
1935  $ ldvt )
1936 *
1937 * Generate left bidiagonalizing vectors in WORK(IU)
1938 * (CWorkspace: need N*N+3*N, prefer N*N+2*N+N*NB)
1939 * (RWorkspace: 0)
1940 *
1941  CALL cungbr( 'Q', n, n, n, work( iu ), ldwrku,
1942  $ work( itauq ), work( iwork ),
1943  $ lwork-iwork+1, ierr )
1944 *
1945 * Generate right bidiagonalizing vectors in VT
1946 * (CWorkspace: need N*N+3*N-1,
1947 * prefer N*N+2*N+(N-1)*NB)
1948 * (RWorkspace: need 0)
1949 *
1950  CALL cungbr( 'P', n, n, n, vt, ldvt, work( itaup ),
1951  $ work( iwork ), lwork-iwork+1, ierr )
1952  irwork = ie + n
1953 *
1954 * Perform bidiagonal QR iteration, computing left
1955 * singular vectors of R in WORK(IU) and computing
1956 * right singular vectors of R in VT
1957 * (CWorkspace: need N*N)
1958 * (RWorkspace: need BDSPAC)
1959 *
1960  CALL cbdsqr( 'U', n, n, n, 0, s, rwork( ie ), vt,
1961  $ ldvt, work( iu ), ldwrku, cdum, 1,
1962  $ rwork( irwork ), info )
1963 *
1964 * Multiply Q in U by left singular vectors of R in
1965 * WORK(IU), storing result in A
1966 * (CWorkspace: need N*N)
1967 * (RWorkspace: 0)
1968 *
1969  CALL cgemm( 'N', 'N', m, n, n, cone, u, ldu,
1970  $ work( iu ), ldwrku, czero, a, lda )
1971 *
1972 * Copy left singular vectors of A from A to U
1973 *
1974  CALL clacpy( 'F', m, n, a, lda, u, ldu )
1975 *
1976  ELSE
1977 *
1978 * Insufficient workspace for a fast algorithm
1979 *
1980  itau = 1
1981  iwork = itau + n
1982 *
1983 * Compute A=Q*R, copying result to U
1984 * (CWorkspace: need 2*N, prefer N+N*NB)
1985 * (RWorkspace: 0)
1986 *
1987  CALL cgeqrf( m, n, a, lda, work( itau ),
1988  $ work( iwork ), lwork-iwork+1, ierr )
1989  CALL clacpy( 'L', m, n, a, lda, u, ldu )
1990 *
1991 * Generate Q in U
1992 * (CWorkspace: need N+M, prefer N+M*NB)
1993 * (RWorkspace: 0)
1994 *
1995  CALL cungqr( m, m, n, u, ldu, work( itau ),
1996  $ work( iwork ), lwork-iwork+1, ierr )
1997 *
1998 * Copy R from A to VT, zeroing out below it
1999 *
2000  CALL clacpy( 'U', n, n, a, lda, vt, ldvt )
2001  IF( n.GT.1 )
2002  $ CALL claset( 'L', n-1, n-1, czero, czero,
2003  $ vt( 2, 1 ), ldvt )
2004  ie = 1
2005  itauq = itau
2006  itaup = itauq + n
2007  iwork = itaup + n
2008 *
2009 * Bidiagonalize R in VT
2010 * (CWorkspace: need 3*N, prefer 2*N+2*N*NB)
2011 * (RWorkspace: need N)
2012 *
2013  CALL cgebrd( n, n, vt, ldvt, s, rwork( ie ),
2014  $ work( itauq ), work( itaup ),
2015  $ work( iwork ), lwork-iwork+1, ierr )
2016 *
2017 * Multiply Q in U by left bidiagonalizing vectors
2018 * in VT
2019 * (CWorkspace: need 2*N+M, prefer 2*N+M*NB)
2020 * (RWorkspace: 0)
2021 *
2022  CALL cunmbr( 'Q', 'R', 'N', m, n, n, vt, ldvt,
2023  $ work( itauq ), u, ldu, work( iwork ),
2024  $ lwork-iwork+1, ierr )
2025 *
2026 * Generate right bidiagonalizing vectors in VT
2027 * (CWorkspace: need 3*N-1, prefer 2*N+(N-1)*NB)
2028 * (RWorkspace: 0)
2029 *
2030  CALL cungbr( 'P', n, n, n, vt, ldvt, work( itaup ),
2031  $ work( iwork ), lwork-iwork+1, ierr )
2032  irwork = ie + n
2033 *
2034 * Perform bidiagonal QR iteration, computing left
2035 * singular vectors of A in U and computing right
2036 * singular vectors of A in VT
2037 * (CWorkspace: 0)
2038 * (RWorkspace: need BDSPAC)
2039 *
2040  CALL cbdsqr( 'U', n, n, m, 0, s, rwork( ie ), vt,
2041  $ ldvt, u, ldu, cdum, 1,
2042  $ rwork( irwork ), info )
2043 *
2044  END IF
2045 *
2046  END IF
2047 *
2048  END IF
2049 *
2050  ELSE
2051 *
2052 * M .LT. MNTHR
2053 *
2054 * Path 10 (M at least N, but not much larger)
2055 * Reduce to bidiagonal form without QR decomposition
2056 *
2057  ie = 1
2058  itauq = 1
2059  itaup = itauq + n
2060  iwork = itaup + n
2061 *
2062 * Bidiagonalize A
2063 * (CWorkspace: need 2*N+M, prefer 2*N+(M+N)*NB)
2064 * (RWorkspace: need N)
2065 *
2066  CALL cgebrd( m, n, a, lda, s, rwork( ie ), work( itauq ),
2067  $ work( itaup ), work( iwork ), lwork-iwork+1,
2068  $ ierr )
2069  IF( wntuas ) THEN
2070 *
2071 * If left singular vectors desired in U, copy result to U
2072 * and generate left bidiagonalizing vectors in U
2073 * (CWorkspace: need 2*N+NCU, prefer 2*N+NCU*NB)
2074 * (RWorkspace: 0)
2075 *
2076  CALL clacpy( 'L', m, n, a, lda, u, ldu )
2077  IF( wntus )
2078  $ ncu = n
2079  IF( wntua )
2080  $ ncu = m
2081  CALL cungbr( 'Q', m, ncu, n, u, ldu, work( itauq ),
2082  $ work( iwork ), lwork-iwork+1, ierr )
2083  END IF
2084  IF( wntvas ) THEN
2085 *
2086 * If right singular vectors desired in VT, copy result to
2087 * VT and generate right bidiagonalizing vectors in VT
2088 * (CWorkspace: need 3*N-1, prefer 2*N+(N-1)*NB)
2089 * (RWorkspace: 0)
2090 *
2091  CALL clacpy( 'U', n, n, a, lda, vt, ldvt )
2092  CALL cungbr( 'P', n, n, n, vt, ldvt, work( itaup ),
2093  $ work( iwork ), lwork-iwork+1, ierr )
2094  END IF
2095  IF( wntuo ) THEN
2096 *
2097 * If left singular vectors desired in A, generate left
2098 * bidiagonalizing vectors in A
2099 * (CWorkspace: need 3*N, prefer 2*N+N*NB)
2100 * (RWorkspace: 0)
2101 *
2102  CALL cungbr( 'Q', m, n, n, a, lda, work( itauq ),
2103  $ work( iwork ), lwork-iwork+1, ierr )
2104  END IF
2105  IF( wntvo ) THEN
2106 *
2107 * If right singular vectors desired in A, generate right
2108 * bidiagonalizing vectors in A
2109 * (CWorkspace: need 3*N-1, prefer 2*N+(N-1)*NB)
2110 * (RWorkspace: 0)
2111 *
2112  CALL cungbr( 'P', n, n, n, a, lda, work( itaup ),
2113  $ work( iwork ), lwork-iwork+1, ierr )
2114  END IF
2115  irwork = ie + n
2116  IF( wntuas .OR. wntuo )
2117  $ nru = m
2118  IF( wntun )
2119  $ nru = 0
2120  IF( wntvas .OR. wntvo )
2121  $ ncvt = n
2122  IF( wntvn )
2123  $ ncvt = 0
2124  IF( ( .NOT.wntuo ) .AND. ( .NOT.wntvo ) ) THEN
2125 *
2126 * Perform bidiagonal QR iteration, if desired, computing
2127 * left singular vectors in U and computing right singular
2128 * vectors in VT
2129 * (CWorkspace: 0)
2130 * (RWorkspace: need BDSPAC)
2131 *
2132  CALL cbdsqr( 'U', n, ncvt, nru, 0, s, rwork( ie ), vt,
2133  $ ldvt, u, ldu, cdum, 1, rwork( irwork ),
2134  $ info )
2135  ELSE IF( ( .NOT.wntuo ) .AND. wntvo ) THEN
2136 *
2137 * Perform bidiagonal QR iteration, if desired, computing
2138 * left singular vectors in U and computing right singular
2139 * vectors in A
2140 * (CWorkspace: 0)
2141 * (RWorkspace: need BDSPAC)
2142 *
2143  CALL cbdsqr( 'U', n, ncvt, nru, 0, s, rwork( ie ), a,
2144  $ lda, u, ldu, cdum, 1, rwork( irwork ),
2145  $ info )
2146  ELSE
2147 *
2148 * Perform bidiagonal QR iteration, if desired, computing
2149 * left singular vectors in A and computing right singular
2150 * vectors in VT
2151 * (CWorkspace: 0)
2152 * (RWorkspace: need BDSPAC)
2153 *
2154  CALL cbdsqr( 'U', n, ncvt, nru, 0, s, rwork( ie ), vt,
2155  $ ldvt, a, lda, cdum, 1, rwork( irwork ),
2156  $ info )
2157  END IF
2158 *
2159  END IF
2160 *
2161  ELSE
2162 *
2163 * A has more columns than rows. If A has sufficiently more
2164 * columns than rows, first reduce using the LQ decomposition (if
2165 * sufficient workspace available)
2166 *
2167  IF( n.GE.mnthr ) THEN
2168 *
2169  IF( wntvn ) THEN
2170 *
2171 * Path 1t(N much larger than M, JOBVT='N')
2172 * No right singular vectors to be computed
2173 *
2174  itau = 1
2175  iwork = itau + m
2176 *
2177 * Compute A=L*Q
2178 * (CWorkspace: need 2*M, prefer M+M*NB)
2179 * (RWorkspace: 0)
2180 *
2181  CALL cgelqf( m, n, a, lda, work( itau ), work( iwork ),
2182  $ lwork-iwork+1, ierr )
2183 *
2184 * Zero out above L
2185 *
2186  CALL claset( 'U', m-1, m-1, czero, czero, a( 1, 2 ),
2187  $ lda )
2188  ie = 1
2189  itauq = 1
2190  itaup = itauq + m
2191  iwork = itaup + m
2192 *
2193 * Bidiagonalize L in A
2194 * (CWorkspace: need 3*M, prefer 2*M+2*M*NB)
2195 * (RWorkspace: need M)
2196 *
2197  CALL cgebrd( m, m, a, lda, s, rwork( ie ), work( itauq ),
2198  $ work( itaup ), work( iwork ), lwork-iwork+1,
2199  $ ierr )
2200  IF( wntuo .OR. wntuas ) THEN
2201 *
2202 * If left singular vectors desired, generate Q
2203 * (CWorkspace: need 3*M, prefer 2*M+M*NB)
2204 * (RWorkspace: 0)
2205 *
2206  CALL cungbr( 'Q', m, m, m, a, lda, work( itauq ),
2207  $ work( iwork ), lwork-iwork+1, ierr )
2208  END IF
2209  irwork = ie + m
2210  nru = 0
2211  IF( wntuo .OR. wntuas )
2212  $ nru = m
2213 *
2214 * Perform bidiagonal QR iteration, computing left singular
2215 * vectors of A in A if desired
2216 * (CWorkspace: 0)
2217 * (RWorkspace: need BDSPAC)
2218 *
2219  CALL cbdsqr( 'U', m, 0, nru, 0, s, rwork( ie ), cdum, 1,
2220  $ a, lda, cdum, 1, rwork( irwork ), info )
2221 *
2222 * If left singular vectors desired in U, copy them there
2223 *
2224  IF( wntuas )
2225  $ CALL clacpy( 'F', m, m, a, lda, u, ldu )
2226 *
2227  ELSE IF( wntvo .AND. wntun ) THEN
2228 *
2229 * Path 2t(N much larger than M, JOBU='N', JOBVT='O')
2230 * M right singular vectors to be overwritten on A and
2231 * no left singular vectors to be computed
2232 *
2233  IF( lwork.GE.m*m+3*m ) THEN
2234 *
2235 * Sufficient workspace for a fast algorithm
2236 *
2237  ir = 1
2238  IF( lwork.GE.max( wrkbl, lda*n )+lda*m ) THEN
2239 *
2240 * WORK(IU) is LDA by N and WORK(IR) is LDA by M
2241 *
2242  ldwrku = lda
2243  chunk = n
2244  ldwrkr = lda
2245  ELSE IF( lwork.GE.max( wrkbl, lda*n )+m*m ) THEN
2246 *
2247 * WORK(IU) is LDA by N and WORK(IR) is M by M
2248 *
2249  ldwrku = lda
2250  chunk = n
2251  ldwrkr = m
2252  ELSE
2253 *
2254 * WORK(IU) is M by CHUNK and WORK(IR) is M by M
2255 *
2256  ldwrku = m
2257  chunk = ( lwork-m*m ) / m
2258  ldwrkr = m
2259  END IF
2260  itau = ir + ldwrkr*m
2261  iwork = itau + m
2262 *
2263 * Compute A=L*Q
2264 * (CWorkspace: need M*M+2*M, prefer M*M+M+M*NB)
2265 * (RWorkspace: 0)
2266 *
2267  CALL cgelqf( m, n, a, lda, work( itau ),
2268  $ work( iwork ), lwork-iwork+1, ierr )
2269 *
2270 * Copy L to WORK(IR) and zero out above it
2271 *
2272  CALL clacpy( 'L', m, m, a, lda, work( ir ), ldwrkr )
2273  CALL claset( 'U', m-1, m-1, czero, czero,
2274  $ work( ir+ldwrkr ), ldwrkr )
2275 *
2276 * Generate Q in A
2277 * (CWorkspace: need M*M+2*M, prefer M*M+M+M*NB)
2278 * (RWorkspace: 0)
2279 *
2280  CALL cunglq( m, n, m, a, lda, work( itau ),
2281  $ work( iwork ), lwork-iwork+1, ierr )
2282  ie = 1
2283  itauq = itau
2284  itaup = itauq + m
2285  iwork = itaup + m
2286 *
2287 * Bidiagonalize L in WORK(IR)
2288 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+2*M*NB)
2289 * (RWorkspace: need M)
2290 *
2291  CALL cgebrd( m, m, work( ir ), ldwrkr, s, rwork( ie ),
2292  $ work( itauq ), work( itaup ),
2293  $ work( iwork ), lwork-iwork+1, ierr )
2294 *
2295 * Generate right vectors bidiagonalizing L
2296 * (CWorkspace: need M*M+3*M-1, prefer M*M+2*M+(M-1)*NB)
2297 * (RWorkspace: 0)
2298 *
2299  CALL cungbr( 'P', m, m, m, work( ir ), ldwrkr,
2300  $ work( itaup ), work( iwork ),
2301  $ lwork-iwork+1, ierr )
2302  irwork = ie + m
2303 *
2304 * Perform bidiagonal QR iteration, computing right
2305 * singular vectors of L in WORK(IR)
2306 * (CWorkspace: need M*M)
2307 * (RWorkspace: need BDSPAC)
2308 *
2309  CALL cbdsqr( 'U', m, m, 0, 0, s, rwork( ie ),
2310  $ work( ir ), ldwrkr, cdum, 1, cdum, 1,
2311  $ rwork( irwork ), info )
2312  iu = itauq
2313 *
2314 * Multiply right singular vectors of L in WORK(IR) by Q
2315 * in A, storing result in WORK(IU) and copying to A
2316 * (CWorkspace: need M*M+M, prefer M*M+M*N)
2317 * (RWorkspace: 0)
2318 *
2319  DO 30 i = 1, n, chunk
2320  blk = min( n-i+1, chunk )
2321  CALL cgemm( 'N', 'N', m, blk, m, cone, work( ir ),
2322  $ ldwrkr, a( 1, i ), lda, czero,
2323  $ work( iu ), ldwrku )
2324  CALL clacpy( 'F', m, blk, work( iu ), ldwrku,
2325  $ a( 1, i ), lda )
2326  30 CONTINUE
2327 *
2328  ELSE
2329 *
2330 * Insufficient workspace for a fast algorithm
2331 *
2332  ie = 1
2333  itauq = 1
2334  itaup = itauq + m
2335  iwork = itaup + m
2336 *
2337 * Bidiagonalize A
2338 * (CWorkspace: need 2*M+N, prefer 2*M+(M+N)*NB)
2339 * (RWorkspace: need M)
2340 *
2341  CALL cgebrd( m, n, a, lda, s, rwork( ie ),
2342  $ work( itauq ), work( itaup ),
2343  $ work( iwork ), lwork-iwork+1, ierr )
2344 *
2345 * Generate right vectors bidiagonalizing A
2346 * (CWorkspace: need 3*M, prefer 2*M+M*NB)
2347 * (RWorkspace: 0)
2348 *
2349  CALL cungbr( 'P', m, n, m, a, lda, work( itaup ),
2350  $ work( iwork ), lwork-iwork+1, ierr )
2351  irwork = ie + m
2352 *
2353 * Perform bidiagonal QR iteration, computing right
2354 * singular vectors of A in A
2355 * (CWorkspace: 0)
2356 * (RWorkspace: need BDSPAC)
2357 *
2358  CALL cbdsqr( 'L', m, n, 0, 0, s, rwork( ie ), a, lda,
2359  $ cdum, 1, cdum, 1, rwork( irwork ), info )
2360 *
2361  END IF
2362 *
2363  ELSE IF( wntvo .AND. wntuas ) THEN
2364 *
2365 * Path 3t(N much larger than M, JOBU='S' or 'A', JOBVT='O')
2366 * M right singular vectors to be overwritten on A and
2367 * M left singular vectors to be computed in U
2368 *
2369  IF( lwork.GE.m*m+3*m ) THEN
2370 *
2371 * Sufficient workspace for a fast algorithm
2372 *
2373  ir = 1
2374  IF( lwork.GE.max( wrkbl, lda*n )+lda*m ) THEN
2375 *
2376 * WORK(IU) is LDA by N and WORK(IR) is LDA by M
2377 *
2378  ldwrku = lda
2379  chunk = n
2380  ldwrkr = lda
2381  ELSE IF( lwork.GE.max( wrkbl, lda*n )+m*m ) THEN
2382 *
2383 * WORK(IU) is LDA by N and WORK(IR) is M by M
2384 *
2385  ldwrku = lda
2386  chunk = n
2387  ldwrkr = m
2388  ELSE
2389 *
2390 * WORK(IU) is M by CHUNK and WORK(IR) is M by M
2391 *
2392  ldwrku = m
2393  chunk = ( lwork-m*m ) / m
2394  ldwrkr = m
2395  END IF
2396  itau = ir + ldwrkr*m
2397  iwork = itau + m
2398 *
2399 * Compute A=L*Q
2400 * (CWorkspace: need M*M+2*M, prefer M*M+M+M*NB)
2401 * (RWorkspace: 0)
2402 *
2403  CALL cgelqf( m, n, a, lda, work( itau ),
2404  $ work( iwork ), lwork-iwork+1, ierr )
2405 *
2406 * Copy L to U, zeroing about above it
2407 *
2408  CALL clacpy( 'L', m, m, a, lda, u, ldu )
2409  CALL claset( 'U', m-1, m-1, czero, czero, u( 1, 2 ),
2410  $ ldu )
2411 *
2412 * Generate Q in A
2413 * (CWorkspace: need M*M+2*M, prefer M*M+M+M*NB)
2414 * (RWorkspace: 0)
2415 *
2416  CALL cunglq( m, n, m, a, lda, work( itau ),
2417  $ work( iwork ), lwork-iwork+1, ierr )
2418  ie = 1
2419  itauq = itau
2420  itaup = itauq + m
2421  iwork = itaup + m
2422 *
2423 * Bidiagonalize L in U, copying result to WORK(IR)
2424 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+2*M*NB)
2425 * (RWorkspace: need M)
2426 *
2427  CALL cgebrd( m, m, u, ldu, s, rwork( ie ),
2428  $ work( itauq ), work( itaup ),
2429  $ work( iwork ), lwork-iwork+1, ierr )
2430  CALL clacpy( 'U', m, m, u, ldu, work( ir ), ldwrkr )
2431 *
2432 * Generate right vectors bidiagonalizing L in WORK(IR)
2433 * (CWorkspace: need M*M+3*M-1, prefer M*M+2*M+(M-1)*NB)
2434 * (RWorkspace: 0)
2435 *
2436  CALL cungbr( 'P', m, m, m, work( ir ), ldwrkr,
2437  $ work( itaup ), work( iwork ),
2438  $ lwork-iwork+1, ierr )
2439 *
2440 * Generate left vectors bidiagonalizing L in U
2441 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+M*NB)
2442 * (RWorkspace: 0)
2443 *
2444  CALL cungbr( 'Q', m, m, m, u, ldu, work( itauq ),
2445  $ work( iwork ), lwork-iwork+1, ierr )
2446  irwork = ie + m
2447 *
2448 * Perform bidiagonal QR iteration, computing left
2449 * singular vectors of L in U, and computing right
2450 * singular vectors of L in WORK(IR)
2451 * (CWorkspace: need M*M)
2452 * (RWorkspace: need BDSPAC)
2453 *
2454  CALL cbdsqr( 'U', m, m, m, 0, s, rwork( ie ),
2455  $ work( ir ), ldwrkr, u, ldu, cdum, 1,
2456  $ rwork( irwork ), info )
2457  iu = itauq
2458 *
2459 * Multiply right singular vectors of L in WORK(IR) by Q
2460 * in A, storing result in WORK(IU) and copying to A
2461 * (CWorkspace: need M*M+M, prefer M*M+M*N))
2462 * (RWorkspace: 0)
2463 *
2464  DO 40 i = 1, n, chunk
2465  blk = min( n-i+1, chunk )
2466  CALL cgemm( 'N', 'N', m, blk, m, cone, work( ir ),
2467  $ ldwrkr, a( 1, i ), lda, czero,
2468  $ work( iu ), ldwrku )
2469  CALL clacpy( 'F', m, blk, work( iu ), ldwrku,
2470  $ a( 1, i ), lda )
2471  40 CONTINUE
2472 *
2473  ELSE
2474 *
2475 * Insufficient workspace for a fast algorithm
2476 *
2477  itau = 1
2478  iwork = itau + m
2479 *
2480 * Compute A=L*Q
2481 * (CWorkspace: need 2*M, prefer M+M*NB)
2482 * (RWorkspace: 0)
2483 *
2484  CALL cgelqf( m, n, a, lda, work( itau ),
2485  $ work( iwork ), lwork-iwork+1, ierr )
2486 *
2487 * Copy L to U, zeroing out above it
2488 *
2489  CALL clacpy( 'L', m, m, a, lda, u, ldu )
2490  CALL claset( 'U', m-1, m-1, czero, czero, u( 1, 2 ),
2491  $ ldu )
2492 *
2493 * Generate Q in A
2494 * (CWorkspace: need 2*M, prefer M+M*NB)
2495 * (RWorkspace: 0)
2496 *
2497  CALL cunglq( m, n, m, a, lda, work( itau ),
2498  $ work( iwork ), lwork-iwork+1, ierr )
2499  ie = 1
2500  itauq = itau
2501  itaup = itauq + m
2502  iwork = itaup + m
2503 *
2504 * Bidiagonalize L in U
2505 * (CWorkspace: need 3*M, prefer 2*M+2*M*NB)
2506 * (RWorkspace: need M)
2507 *
2508  CALL cgebrd( m, m, u, ldu, s, rwork( ie ),
2509  $ work( itauq ), work( itaup ),
2510  $ work( iwork ), lwork-iwork+1, ierr )
2511 *
2512 * Multiply right vectors bidiagonalizing L by Q in A
2513 * (CWorkspace: need 2*M+N, prefer 2*M+N*NB)
2514 * (RWorkspace: 0)
2515 *
2516  CALL cunmbr( 'P', 'L', 'C', m, n, m, u, ldu,
2517  $ work( itaup ), a, lda, work( iwork ),
2518  $ lwork-iwork+1, ierr )
2519 *
2520 * Generate left vectors bidiagonalizing L in U
2521 * (CWorkspace: need 3*M, prefer 2*M+M*NB)
2522 * (RWorkspace: 0)
2523 *
2524  CALL cungbr( 'Q', m, m, m, u, ldu, work( itauq ),
2525  $ work( iwork ), lwork-iwork+1, ierr )
2526  irwork = ie + m
2527 *
2528 * Perform bidiagonal QR iteration, computing left
2529 * singular vectors of A in U and computing right
2530 * singular vectors of A in A
2531 * (CWorkspace: 0)
2532 * (RWorkspace: need BDSPAC)
2533 *
2534  CALL cbdsqr( 'U', m, n, m, 0, s, rwork( ie ), a, lda,
2535  $ u, ldu, cdum, 1, rwork( irwork ), info )
2536 *
2537  END IF
2538 *
2539  ELSE IF( wntvs ) THEN
2540 *
2541  IF( wntun ) THEN
2542 *
2543 * Path 4t(N much larger than M, JOBU='N', JOBVT='S')
2544 * M right singular vectors to be computed in VT and
2545 * no left singular vectors to be computed
2546 *
2547  IF( lwork.GE.m*m+3*m ) THEN
2548 *
2549 * Sufficient workspace for a fast algorithm
2550 *
2551  ir = 1
2552  IF( lwork.GE.wrkbl+lda*m ) THEN
2553 *
2554 * WORK(IR) is LDA by M
2555 *
2556  ldwrkr = lda
2557  ELSE
2558 *
2559 * WORK(IR) is M by M
2560 *
2561  ldwrkr = m
2562  END IF
2563  itau = ir + ldwrkr*m
2564  iwork = itau + m
2565 *
2566 * Compute A=L*Q
2567 * (CWorkspace: need M*M+2*M, prefer M*M+M+M*NB)
2568 * (RWorkspace: 0)
2569 *
2570  CALL cgelqf( m, n, a, lda, work( itau ),
2571  $ work( iwork ), lwork-iwork+1, ierr )
2572 *
2573 * Copy L to WORK(IR), zeroing out above it
2574 *
2575  CALL clacpy( 'L', m, m, a, lda, work( ir ),
2576  $ ldwrkr )
2577  CALL claset( 'U', m-1, m-1, czero, czero,
2578  $ work( ir+ldwrkr ), ldwrkr )
2579 *
2580 * Generate Q in A
2581 * (CWorkspace: need M*M+2*M, prefer M*M+M+M*NB)
2582 * (RWorkspace: 0)
2583 *
2584  CALL cunglq( m, n, m, a, lda, work( itau ),
2585  $ work( iwork ), lwork-iwork+1, ierr )
2586  ie = 1
2587  itauq = itau
2588  itaup = itauq + m
2589  iwork = itaup + m
2590 *
2591 * Bidiagonalize L in WORK(IR)
2592 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+2*M*NB)
2593 * (RWorkspace: need M)
2594 *
2595  CALL cgebrd( m, m, work( ir ), ldwrkr, s,
2596  $ rwork( ie ), work( itauq ),
2597  $ work( itaup ), work( iwork ),
2598  $ lwork-iwork+1, ierr )
2599 *
2600 * Generate right vectors bidiagonalizing L in
2601 * WORK(IR)
2602 * (CWorkspace: need M*M+3*M, prefer M*M+2*M+(M-1)*NB)
2603 * (RWorkspace: 0)
2604 *
2605  CALL cungbr( 'P', m, m, m, work( ir ), ldwrkr,
2606  $ work( itaup ), work( iwork ),
2607  $ lwork-iwork+1, ierr )
2608  irwork = ie + m
2609 *
2610 * Perform bidiagonal QR iteration, computing right
2611 * singular vectors of L in WORK(IR)
2612 * (CWorkspace: need M*M)
2613 * (RWorkspace: need BDSPAC)
2614 *
2615  CALL cbdsqr( 'U', m, m, 0, 0, s, rwork( ie ),
2616  $ work( ir ), ldwrkr, cdum, 1, cdum, 1,
2617  $ rwork( irwork ), info )
2618 *
2619 * Multiply right singular vectors of L in WORK(IR) by
2620 * Q in A, storing result in VT
2621 * (CWorkspace: need M*M)
2622 * (RWorkspace: 0)
2623 *
2624  CALL cgemm( 'N', 'N', m, n, m, cone, work( ir ),
2625  $ ldwrkr, a, lda, czero, vt, ldvt )
2626 *
2627  ELSE
2628 *
2629 * Insufficient workspace for a fast algorithm
2630 *
2631  itau = 1
2632  iwork = itau + m
2633 *
2634 * Compute A=L*Q
2635 * (CWorkspace: need 2*M, prefer M+M*NB)
2636 * (RWorkspace: 0)
2637 *
2638  CALL cgelqf( m, n, a, lda, work( itau ),
2639  $ work( iwork ), lwork-iwork+1, ierr )
2640 *
2641 * Copy result to VT
2642 *
2643  CALL clacpy( 'U', m, n, a, lda, vt, ldvt )
2644 *
2645 * Generate Q in VT
2646 * (CWorkspace: need 2*M, prefer M+M*NB)
2647 * (RWorkspace: 0)
2648 *
2649  CALL cunglq( m, n, m, vt, ldvt, work( itau ),
2650  $ work( iwork ), lwork-iwork+1, ierr )
2651  ie = 1
2652  itauq = itau
2653  itaup = itauq + m
2654  iwork = itaup + m
2655 *
2656 * Zero out above L in A
2657 *
2658  CALL claset( 'U', m-1, m-1, czero, czero,
2659  $ a( 1, 2 ), lda )
2660 *
2661 * Bidiagonalize L in A
2662 * (CWorkspace: need 3*M, prefer 2*M+2*M*NB)
2663 * (RWorkspace: need M)
2664 *
2665  CALL cgebrd( m, m, a, lda, s, rwork( ie ),
2666  $ work( itauq ), work( itaup ),
2667  $ work( iwork ), lwork-iwork+1, ierr )
2668 *
2669 * Multiply right vectors bidiagonalizing L by Q in VT
2670 * (CWorkspace: need 2*M+N, prefer 2*M+N*NB)
2671 * (RWorkspace: 0)
2672 *
2673  CALL cunmbr( 'P', 'L', 'C', m, n, m, a, lda,
2674  $ work( itaup ), vt, ldvt,
2675  $ work( iwork ), lwork-iwork+1, ierr )
2676  irwork = ie + m
2677 *
2678 * Perform bidiagonal QR iteration, computing right
2679 * singular vectors of A in VT
2680 * (CWorkspace: 0)
2681 * (RWorkspace: need BDSPAC)
2682 *
2683  CALL cbdsqr( 'U', m, n, 0, 0, s, rwork( ie ), vt,
2684  $ ldvt, cdum, 1, cdum, 1,
2685  $ rwork( irwork ), info )
2686 *
2687  END IF
2688 *
2689  ELSE IF( wntuo ) THEN
2690 *
2691 * Path 5t(N much larger than M, JOBU='O', JOBVT='S')
2692 * M right singular vectors to be computed in VT and
2693 * M left singular vectors to be overwritten on A
2694 *
2695  IF( lwork.GE.2*m*m+3*m ) THEN
2696 *
2697 * Sufficient workspace for a fast algorithm
2698 *
2699  iu = 1
2700  IF( lwork.GE.wrkbl+2*lda*m ) THEN
2701 *
2702 * WORK(IU) is LDA by M and WORK(IR) is LDA by M
2703 *
2704  ldwrku = lda
2705  ir = iu + ldwrku*m
2706  ldwrkr = lda
2707  ELSE IF( lwork.GE.wrkbl+( lda+m )*m ) THEN
2708 *
2709 * WORK(IU) is LDA by M and WORK(IR) is M by M
2710 *
2711  ldwrku = lda
2712  ir = iu + ldwrku*m
2713  ldwrkr = m
2714  ELSE
2715 *
2716 * WORK(IU) is M by M and WORK(IR) is M by M
2717 *
2718  ldwrku = m
2719  ir = iu + ldwrku*m
2720  ldwrkr = m
2721  END IF
2722  itau = ir + ldwrkr*m
2723  iwork = itau + m
2724 *
2725 * Compute A=L*Q
2726 * (CWorkspace: need 2*M*M+2*M, prefer 2*M*M+M+M*NB)
2727 * (RWorkspace: 0)
2728 *
2729  CALL cgelqf( m, n, a, lda, work( itau ),
2730  $ work( iwork ), lwork-iwork+1, ierr )
2731 *
2732 * Copy L to WORK(IU), zeroing out below it
2733 *
2734  CALL clacpy( 'L', m, m, a, lda, work( iu ),
2735  $ ldwrku )
2736  CALL claset( 'U', m-1, m-1, czero, czero,
2737  $ work( iu+ldwrku ), ldwrku )
2738 *
2739 * Generate Q in A
2740 * (CWorkspace: need 2*M*M+2*M, prefer 2*M*M+M+M*NB)
2741 * (RWorkspace: 0)
2742 *
2743  CALL cunglq( m, n, m, a, lda, work( itau ),
2744  $ work( iwork ), lwork-iwork+1, ierr )
2745  ie = 1
2746  itauq = itau
2747  itaup = itauq + m
2748  iwork = itaup + m
2749 *
2750 * Bidiagonalize L in WORK(IU), copying result to
2751 * WORK(IR)
2752 * (CWorkspace: need 2*M*M+3*M,
2753 * prefer 2*M*M+2*M+2*M*NB)
2754 * (RWorkspace: need M)
2755 *
2756  CALL cgebrd( m, m, work( iu ), ldwrku, s,
2757  $ rwork( ie ), work( itauq ),
2758  $ work( itaup ), work( iwork ),
2759  $ lwork-iwork+1, ierr )
2760  CALL clacpy( 'L', m, m, work( iu ), ldwrku,
2761  $ work( ir ), ldwrkr )
2762 *
2763 * Generate right bidiagonalizing vectors in WORK(IU)
2764 * (CWorkspace: need 2*M*M+3*M-1,
2765 * prefer 2*M*M+2*M+(M-1)*NB)
2766 * (RWorkspace: 0)
2767 *
2768  CALL cungbr( 'P', m, m, m, work( iu ), ldwrku,
2769  $ work( itaup ), work( iwork ),
2770  $ lwork-iwork+1, ierr )
2771 *
2772 * Generate left bidiagonalizing vectors in WORK(IR)
2773 * (CWorkspace: need 2*M*M+3*M, prefer 2*M*M+2*M+M*NB)
2774 * (RWorkspace: 0)
2775 *
2776  CALL cungbr( 'Q', m, m, m, work( ir ), ldwrkr,
2777  $ work( itauq ), work( iwork ),
2778  $ lwork-iwork+1, ierr )
2779  irwork = ie + m
2780 *
2781 * Perform bidiagonal QR iteration, computing left
2782 * singular vectors of L in WORK(IR) and computing
2783 * right singular vectors of L in WORK(IU)
2784 * (CWorkspace: need 2*M*M)
2785 * (RWorkspace: need BDSPAC)
2786 *
2787  CALL cbdsqr( 'U', m, m, m, 0, s, rwork( ie ),
2788  $ work( iu ), ldwrku, work( ir ),
2789  $ ldwrkr, cdum, 1, rwork( irwork ),
2790  $ info )
2791 *
2792 * Multiply right singular vectors of L in WORK(IU) by
2793 * Q in A, storing result in VT
2794 * (CWorkspace: need M*M)
2795 * (RWorkspace: 0)
2796 *
2797  CALL cgemm( 'N', 'N', m, n, m, cone, work( iu ),
2798  $ ldwrku, a, lda, czero, vt, ldvt )
2799 *
2800 * Copy left singular vectors of L to A
2801 * (CWorkspace: need M*M)
2802 * (RWorkspace: 0)
2803 *
2804  CALL clacpy( 'F', m, m, work( ir ), ldwrkr, a,
2805  $ lda )
2806 *
2807  ELSE
2808 *
2809 * Insufficient workspace for a fast algorithm
2810 *
2811  itau = 1
2812  iwork = itau + m
2813 *
2814 * Compute A=L*Q, copying result to VT
2815 * (CWorkspace: need 2*M, prefer M+M*NB)
2816 * (RWorkspace: 0)
2817 *
2818  CALL cgelqf( m, n, a, lda, work( itau ),
2819  $ work( iwork ), lwork-iwork+1, ierr )
2820  CALL clacpy( 'U', m, n, a, lda, vt, ldvt )
2821 *
2822 * Generate Q in VT
2823 * (CWorkspace: need 2*M, prefer M+M*NB)
2824 * (RWorkspace: 0)
2825 *
2826  CALL cunglq( m, n, m, vt, ldvt, work( itau ),
2827  $ work( iwork ), lwork-iwork+1, ierr )
2828  ie = 1
2829  itauq = itau
2830  itaup = itauq + m
2831  iwork = itaup + m
2832 *
2833 * Zero out above L in A
2834 *
2835  CALL claset( 'U', m-1, m-1, czero, czero,
2836  $ a( 1, 2 ), lda )
2837 *
2838 * Bidiagonalize L in A
2839 * (CWorkspace: need 3*M, prefer 2*M+2*M*NB)
2840 * (RWorkspace: need M)
2841 *
2842  CALL cgebrd( m, m, a, lda, s, rwork( ie ),
2843  $ work( itauq ), work( itaup ),
2844  $ work( iwork ), lwork-iwork+1, ierr )
2845 *
2846 * Multiply right vectors bidiagonalizing L by Q in VT
2847 * (CWorkspace: need 2*M+N, prefer 2*M+N*NB)
2848 * (RWorkspace: 0)
2849 *
2850  CALL cunmbr( 'P', 'L', 'C', m, n, m, a, lda,
2851  $ work( itaup ), vt, ldvt,
2852  $ work( iwork ), lwork-iwork+1, ierr )
2853 *
2854 * Generate left bidiagonalizing vectors of L in A
2855 * (CWorkspace: need 3*M, prefer 2*M+M*NB)
2856 * (RWorkspace: 0)
2857 *
2858  CALL cungbr( 'Q', m, m, m, a, lda, work( itauq ),
2859  $ work( iwork ), lwork-iwork+1, ierr )
2860  irwork = ie + m
2861 *
2862 * Perform bidiagonal QR iteration, computing left
2863 * singular vectors of A in A and computing right
2864 * singular vectors of A in VT
2865 * (CWorkspace: 0)
2866 * (RWorkspace: need BDSPAC)
2867 *
2868  CALL cbdsqr( 'U', m, n, m, 0, s, rwork( ie ), vt,
2869  $ ldvt, a, lda, cdum, 1,
2870  $ rwork( irwork ), info )
2871 *
2872  END IF
2873 *
2874  ELSE IF( wntuas ) THEN
2875 *
2876 * Path 6t(N much larger than M, JOBU='S' or 'A',
2877 * JOBVT='S')
2878 * M right singular vectors to be computed in VT and
2879 * M left singular vectors to be computed in U
2880 *
2881  IF( lwork.GE.m*m+3*m ) THEN
2882 *
2883 * Sufficient workspace for a fast algorithm
2884 *
2885  iu = 1
2886  IF( lwork.GE.wrkbl+lda*m ) THEN
2887 *
2888 * WORK(IU) is LDA by N
2889 *
2890  ldwrku = lda
2891  ELSE
2892 *
2893 * WORK(IU) is LDA by M
2894 *