LAPACK 3.12.0
LAPACK: Linear Algebra PACKage

◆ cgejsv()

subroutine cgejsv ( character*1  joba,
character*1  jobu,
character*1  jobv,
character*1  jobr,
character*1  jobt,
character*1  jobp,
integer  m,
integer  n,
complex, dimension( lda, * )  a,
integer  lda,
real, dimension( n )  sva,
complex, dimension( ldu, * )  u,
integer  ldu,
complex, dimension( ldv, * )  v,
integer  ldv,
complex, dimension( lwork )  cwork,
integer  lwork,
real, dimension( lrwork )  rwork,
integer  lrwork,
integer, dimension( * )  iwork,
integer  info 
)

CGEJSV


Purpose:
 CGEJSV computes the singular value decomposition (SVD) of a complex M-by-N
 matrix [A], where M >= N. The SVD of [A] is written as

              [A] = [U] * [SIGMA] * [V]^*,

 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
 diagonal elements, [U] is an M-by-N (or M-by-M) unitary matrix, and
 [V] is an N-by-N unitary matrix. The diagonal elements of [SIGMA] are
 the singular values of [A]. The columns of [U] and [V] are the left and
 the right singular vectors of [A], respectively. The matrices [U] and [V]
 are computed and stored in the arrays U and V, respectively. The diagonal
 of [SIGMA] is computed and stored in the array SVA.
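 The sketch below is an informal usage illustration, not part of the reference
 text: a free-form Fortran program (the matrix data, its dimensions and the
 generous workspace sizes are illustrative assumptions) that queries the
 workspace, computes the full SVD, and unscales the singular values following
 the convention given in the descriptions of SVA and RWORK below.

   program cgejsv_example
      implicit none
      integer, parameter :: m = 4, n = 3
      integer, parameter :: lrwork = max( 7, 2*m ), liwork = m + 3*n
      complex :: a(m,n), u(m,n), v(n,n), cquery(2)
      real    :: sva(n), rwork(lrwork)
      integer :: iwork(liwork), lwork, info, i
      complex, allocatable :: cwork(:)

      ! Illustrative input: any complex M-by-N matrix with M >= N will do.
      a = (0.0, 0.0)
      do i = 1, n
         a(i,i) = cmplx( real(i), -1.0 )
      end do
      a(m,1) = (1.0, 2.0)

      ! Workspace query (LWORK = -1): CGEJSV only reports the optimal
      ! length of CWORK in CWORK(1) and returns.
      call cgejsv( 'C', 'U', 'V', 'R', 'N', 'N', m, n, a, m, sva,    &
                   u, m, v, n, cquery, -1, rwork, lrwork, iwork, info )
      lwork = int( real( cquery(1) ) )
      allocate( cwork(lwork) )

      ! Full SVD: on exit U, SVA and V hold the factors of A = U*SIGMA*V^*.
      call cgejsv( 'C', 'U', 'V', 'R', 'N', 'N', m, n, a, m, sva,    &
                   u, m, v, n, cwork, lwork, rwork, lrwork, iwork, info )

      if ( info == 0 ) then
         ! Singular values, unscaled as described for SVA and RWORK(1:2).
         print *, 'sigma(1:n) =', ( rwork(1) / rwork(2) ) * sva
      end if
   end program cgejsv_example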
Parameters
[in]JOBA
          JOBA is CHARACTER*1
         Specifies the level of accuracy:
       = 'C': This option works well (high relative accuracy) if A = B * D,
              with well-conditioned B and arbitrary diagonal matrix D.
              The accuracy cannot be spoiled by COLUMN scaling. The
              accuracy of the computed output depends on the condition of
              B, and the procedure aims at the best theoretical accuracy.
              The relative error max_{i=1:N}|d sigma_i| / sigma_i is
              bounded by f(M,N)*epsilon* cond(B), independent of D.
              The input matrix is preprocessed with the QRF with column
              pivoting. This initial preprocessing and preconditioning by
              a rank revealing QR factorization is common for all values of
              JOBA. Additional actions are specified as follows:
       = 'E': Computation as with 'C' with an additional estimate of the
              condition number of B. It provides a realistic error bound.
       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
              D1, D2, and well-conditioned matrix C, this option gives
              higher accuracy than the 'C' option. If the structure of the
              input matrix is not known, and relative accuracy is
              desirable, then this option is advisable. The input matrix A
              is preprocessed with QR factorization with FULL (row and
              column) pivoting.
       = 'G': Computation as with 'F' with an additional estimate of the
              condition number of B, where A=B*D. If A has heavily weighted
               rows, then using this condition number gives an overly
               pessimistic error bound.
       = 'A': Small singular values are not well determined by the data 
              and are considered as noisy; the matrix is treated as
              numerically rank deficient. The error in the computed
              singular values is bounded by f(m,n)*epsilon*||A||.
              The computed SVD A = U * S * V^* restores A up to
              f(m,n)*epsilon*||A||.
              This gives the procedure the licence to discard (set to zero)
              all singular values below N*epsilon*||A||.
        = 'R': Similar to 'A'. The rank revealing property of the initial
               QR factorization is used to reveal (using the triangular factor)
              a gap sigma_{r+1} < epsilon * sigma_r in which case the
              numerical RANK is declared to be r. The SVD is computed with
              absolute error bounds, but more accurately than with 'A'.
[in]JOBU
          JOBU is CHARACTER*1
         Specifies whether to compute the columns of U:
       = 'U': N columns of U are returned in the array U.
       = 'F': full set of M left sing. vectors is returned in the array U.
       = 'W': U may be used as workspace of length M*N. See the description
              of U.
       = 'N': U is not computed.
[in]JOBV
          JOBV is CHARACTER*1
         Specifies whether to compute the matrix V:
       = 'V': N columns of V are returned in the array V; Jacobi rotations
              are not explicitly accumulated.
       = 'J': N columns of V are returned in the array V, but they are
              computed as the product of Jacobi rotations, if JOBT = 'N'.
       = 'W': V may be used as workspace of length N*N. See the description
              of V.
       = 'N': V is not computed.
[in]JOBR
          JOBR is CHARACTER*1
         Specifies the RANGE for the singular values. Issues the licence to
         set to zero small positive singular values if they are outside
          the specified range. If A .NE. 0 is scaled so that the largest singular
         value of c*A is around SQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
         the licence to kill columns of A whose norm in c*A is less than
         SQRT(SFMIN) (for JOBR = 'R'), or less than SMALL=SFMIN/EPSLN,
         where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
       = 'N': Do not kill small columns of c*A. This option assumes that
              BLAS and QR factorizations and triangular solvers are
              implemented to work in that range. If the condition of A
              is greater than BIG, use CGESVJ.
       = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)]
              (roughly, as described above). This option is recommended.
                                             ===========================
         For computing the singular values in the FULL range [SFMIN,BIG]
         use CGESVJ.
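
          As a small side illustration (an informal sketch, not part of the
          reference text; program and variable names are arbitrary), the
          machine constants named above can be queried with SLAMCH:

            program jobr_range
               implicit none
               real :: big, sfmin, epsln, small
               real, external :: slamch     ! LAPACK machine-parameter query
               big   = slamch('O')          ! overflow threshold, BIG
               sfmin = slamch('S')          ! safe minimum, SFMIN
               epsln = slamch('E')          ! relative machine precision, EPSLN
               small = sfmin / epsln        ! SMALL = SFMIN/EPSLN
               ! With JOBR = 'R', singular values of the scaled c*A are
               ! computed in the restricted range [SQRT(SFMIN), SQRT(BIG)]
               ! (roughly), and columns of c*A with norm below SQRT(SFMIN)
               ! may be set to zero.
               print *, 'restricted range: [', sqrt(sfmin), ',', sqrt(big), ']'
            end program jobr_range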
[in]JOBT
          JOBT is CHARACTER*1
         If the matrix is square then the procedure may determine to use
         transposed A if A^* seems to be better with respect to convergence.
         If the matrix is not square, JOBT is ignored.
         The decision is based on two values of entropy over the adjoint
         orbit of A^* * A. See the descriptions of RWORK(6) and RWORK(7).
       = 'T': transpose if entropy test indicates possibly faster
         convergence of Jacobi process if A^* is taken as input. If A is
         replaced with A^*, then the row pivoting is included automatically.
       = 'N': do not speculate.
         The option 'T' can be used to compute only the singular values, or
         the full SVD (U, SIGMA and V). For only one set of singular vectors
         (U or V), the caller should provide both U and V, as one of the
         matrices is used as workspace if the matrix A is transposed.
         The implementer can easily remove this constraint and make the
         code more complicated. See the descriptions of U and V.
          In general, this option is considered experimental, and 'N' should
          be preferred. This is subject to change in the future.
[in]JOBP
          JOBP is CHARACTER*1
         Issues the licence to introduce structured perturbations to drown
         denormalized numbers. This licence should be active if the
         denormals are poorly implemented, causing slow computation,
         especially in cases of fast convergence (!). For details see [1,2].
          For the sake of simplicity, these perturbations are included only
         when the full SVD or only the singular values are requested. The
         implementer/user can easily add the perturbation for the cases of
         computing one set of singular vectors.
       = 'P': introduce perturbation
       = 'N': do not perturb
[in]M
          M is INTEGER
         The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
         The number of columns of the input matrix A. M >= N >= 0.
[in,out]A
          A is COMPLEX array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is REAL array, dimension (N)
          On exit,
          - For RWORK(1)/RWORK(2) = ONE: The singular values of A. During
            the computation SVA contains Euclidean column norms of the
            iterated matrices in the array A.
          - For RWORK(1) .NE. RWORK(2): The singular values of A are
            (RWORK(1)/RWORK(2)) * SVA(1:N). This factored form is used if
            sigma_max(A) overflows or if small singular values have been
            saved from underflow by scaling the input matrix A.
          - If JOBR='R' then some of the singular values may be returned
            as exact zeros obtained by "set to zero" because they are
            below the numerical rank threshold or are denormalized numbers.
[out]U
          U is COMPLEX array, dimension ( LDU, N ) or ( LDU, M )
          If JOBU = 'U', then U contains on exit the M-by-N matrix of
                         the left singular vectors.
          If JOBU = 'F', then U contains on exit the M-by-M matrix of
                         the left singular vectors, including an ONB
                         of the orthogonal complement of the Range(A).
          If JOBU = 'W'  .AND. (JOBV = 'V' .AND. JOBT = 'T' .AND. M = N),
                         then U is used as workspace if the procedure
                         replaces A with A^*. In that case, [V] is computed
                         in U as left singular vectors of A^* and then
                         copied back to the V array. This 'W' option is just
                         a reminder to the caller that in this case U is
                         reserved as workspace of length N*N.
          If JOBU = 'N'  U is not referenced, unless JOBT='T'.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U,  LDU >= 1.
          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
[out]V
          V is COMPLEX array, dimension ( LDV, N )
          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'W', AND (JOBU = 'U' AND JOBT = 'T' AND M = N),
                         then V is used as workspace if the procedure
                         replaces A with A^*. In that case, [U] is computed
                         in V as right singular vectors of A^* and then
                         copied back to the U array. This 'W' option is just
                         a reminder to the caller that in this case V is
                         reserved as workspace of length N*N.
          If JOBV = 'N'  V is not referenced, unless JOBT='T'.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V,  LDV >= 1.
          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
[out]CWORK
          CWORK is COMPLEX array, dimension (MAX(2,LWORK))
          If the call to CGEJSV is a workspace query (indicated by LWORK=-1 or
          LRWORK=-1), then on exit CWORK(1) contains the required length of 
          CWORK for the job parameters used in the call.
[in]LWORK
          LWORK is INTEGER
          Length of CWORK to confirm proper allocation of workspace.
          LWORK depends on the job:

          1. If only SIGMA is needed ( JOBU = 'N', JOBV = 'N' ) and
            1.1 .. no scaled condition estimate required (JOBA.NE.'E'.AND.JOBA.NE.'G'):
               LWORK >= 2*N+1. This is the minimal requirement.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= N + (N+1)*NB. Here NB is the optimal
               block size for CGEQP3 and CGEQRF.
               In general, optimal LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CGEQRF), LWORK(CGESVJ)).        
            1.2. .. an estimate of the scaled condition number of A is
                required (JOBA='E', or 'G'). In this case, the minimal
               requirement is LWORK >= N*N + 2*N.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= max(N+(N+1)*NB, N*N+2*N)=N**2+2*N.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CGEQRF), LWORK(CGESVJ),
                            N*N+LWORK(CPOCON)).
          2. If SIGMA and the right singular vectors are needed (JOBV = 'V'),
             (JOBU = 'N')
             2.1   .. no scaled condition estimate requested (JOBA.NE.'E' .AND. JOBA.NE.'G'):
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance, 
               LWORK >= max(N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for CGEQP3, CGEQRF, CGELQF,
               CUNMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3), N+LWORK(CGESVJ),
                       N+LWORK(CGELQF), 2*N+LWORK(CGEQRF), N+LWORK(CUNMLQ)).
            2.2 .. an estimate of the scaled condition number of A is
               required (JOBA='E', or 'G').
            -> the minimal requirement is LWORK >= 3*N.      
            -> For optimal performance, 
               LWORK >= max(N+(N+1)*NB, 2*N,2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for CGEQP3, CGEQRF, CGELQF,
               CUNMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3), LWORK(CPOCON), N+LWORK(CGESVJ),
                       N+LWORK(CGELQF), 2*N+LWORK(CGEQRF), N+LWORK(CUNMLQ)).   
          3. If SIGMA and the left singular vectors are needed
             3.1  .. no scaled condition estimate requested (JOBA.NE.'E' .AND. JOBA.NE.'G'):
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance:
               if JOBU = 'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for CGEQP3, CGEQRF, CUNMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3), 2*N+LWORK(CGEQRF), N+LWORK(CUNMQR)). 
            3.2  .. an estimate of the scaled condition number of A is
               required (JOBA='E', or 'G').
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance:
               if JOBU = 'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for CGEQP3, CGEQRF, CUNMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CPOCON),
                        2*N+LWORK(CGEQRF), N+LWORK(CUNMQR)).

          4. If the full SVD is needed: (JOBU = 'U' or JOBU = 'F') and
            4.1. if JOBV = 'V'
               the minimal requirement is LWORK >= 5*N+2*N*N.
            4.2. if JOBV = 'J' the minimal requirement is
               LWORK >= 4*N+N*N.
            In both cases, the allocated CWORK can accommodate blocked runs
            of CGEQP3, CGEQRF, CGELQF, CUNMQR, CUNMLQ.
 
          If the call to CGEJSV is a workspace query (indicated by LWORK=-1 or
          LRWORK=-1), then on exit CWORK(1) contains the optimal and CWORK(2) contains the
          minimal length of CWORK for the job parameters used in the call.        
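
          As an informal aside (a hypothetical helper, not part of LAPACK),
          the minimal lengths listed for the full-SVD cases 4.1 and 4.2 can
          be written as a small function; a workspace query (LWORK = -1)
          remains the recommended way to size CWORK in practice:

            integer function min_lwork_full_svd( n, jobv )
               implicit none
               integer,          intent(in) :: n
               character(len=1), intent(in) :: jobv
               if ( jobv == 'J' .or. jobv == 'j' ) then
                  min_lwork_full_svd = 4*n + n*n      ! case 4.2: JOBV = 'J'
               else
                  min_lwork_full_svd = 5*n + 2*n*n    ! case 4.1: JOBV = 'V'
               end if
            end function min_lwork_full_svd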
[out]RWORK
          RWORK is REAL array, dimension (MAX(7,LRWORK))
          On exit,
          RWORK(1) = Determines the scaling factor SCALE = RWORK(1) / RWORK(2)
                    such that SCALE*SVA(1:N) are the computed singular values
                    of A. (See the description of SVA().)
          RWORK(2) = See the description of RWORK(1).
          RWORK(3) = SCONDA is an estimate for the condition number of
                    column equilibrated A. (If JOBA = 'E' or 'G')
                    SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
                    It is computed using CPOCON. It holds
                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
                    where R is the triangular factor from the QRF of A.
                    However, if R is truncated and the numerical rank is
                    determined to be strictly smaller than N, SCONDA is
                    returned as -1, thus indicating that the smallest
                    singular values might be lost.

          If full SVD is needed, the following two condition numbers are
          useful for the analysis of the algorithm. They are provided for
          a developer/implementer who is familiar with the details of
          the method.

          RWORK(4) = an estimate of the scaled condition number of the
                    triangular factor in the first QR factorization.
          RWORK(5) = an estimate of the scaled condition number of the
                    triangular factor in the second QR factorization.
          The following two parameters are computed if JOBT = 'T'.
          They are provided for a developer/implementer who is familiar
          with the details of the method.
          RWORK(6) = the entropy of A^* * A :: this is the Shannon entropy
                    of diag(A^* * A) / Trace(A^* * A) taken as point in the
                    probability simplex.
          RWORK(7) = the entropy of A * A^*. (See the description of RWORK(6).)
          If the call to CGEJSV is a workspace query (indicated by LWORK=-1 or
          LRWORK=-1), then on exit RWORK(1) contains the required length of
          RWORK for the job parameters used in the call.
[in]LRWORK
          LRWORK is INTEGER
          Length of RWORK to confirm proper allocation of workspace.
          LRWORK depends on the job:

       1. If only the singular values are requested i.e. if
          LSAME(JOBU,'N') .AND. LSAME(JOBV,'N')
          then:
          1.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
               then: LRWORK = max( 7, 2 * M ).
          1.2. Otherwise, LRWORK  = max( 7,  N ).
       2. If singular values with the right singular vectors are requested
          i.e. if
          (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND.
          .NOT.(LSAME(JOBU,'U').OR.LSAME(JOBU,'F'))
          then:
          2.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, 2 * M ).
          2.2. Otherwise, LRWORK  = max( 7,  N ).
       3. If singular values with the left singular vectors are requested, i.e. if
          (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
          .NOT.(LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
          then:
          3.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, 2 * M ).
          3.2. Otherwise, LRWORK  = max( 7,  N ).
       4. If singular values with both the left and the right singular vectors
          are requested, i.e. if
          (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
          (LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
          then:
          4.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, 2 * M ).
          4.2. Otherwise, LRWORK  = max( 7, N ).
 
          If, on entry, LRWORK = -1 or LWORK=-1, a workspace query is assumed and 
          the length of RWORK is returned in RWORK(1). 
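
          As an informal aside (a hypothetical helper, not part of LAPACK),
          the four cases above reduce to a single rule that depends only on
          JOBT and JOBA:

            integer function min_lrwork_cgejsv( m, n, jobt, joba )
               implicit none
               integer,          intent(in) :: m, n
               character(len=1), intent(in) :: jobt, joba
               logical :: wide
               ! 2*M reals are needed whenever row norms are formed, i.e.
               ! for JOBT = 'T' or row pivoting (JOBA = 'F' or 'G').
               wide = ( jobt == 'T' .or. jobt == 't' ) .or.              &
                      ( joba == 'F' .or. joba == 'f' ) .or.              &
                      ( joba == 'G' .or. joba == 'g' )
               if ( wide ) then
                  min_lrwork_cgejsv = max( 7, 2*m )
               else
                  min_lrwork_cgejsv = max( 7, n )
               end if
            end function min_lrwork_cgejsv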
[out]IWORK
          IWORK is INTEGER array, of dimension at least 4, that further depends
          on the job:
 
          1. If only the singular values are requested then:
             If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
             then the length of IWORK is N+M; otherwise the length of IWORK is N.
          2. If the singular values and the right singular vectors are requested then:
             If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
             then the length of IWORK is N+M; otherwise the length of IWORK is N. 
          3. If the singular values and the left singular vectors are requested then:
             If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
             then the length of IWORK is N+M; otherwise the length of IWORK is N. 
          4. If the singular values with both the left and the right singular vectors
             are requested, then:      
             4.1. If LSAME(JOBV,'J') the length of IWORK is determined as follows:
                  If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
                  then the length of IWORK is N+M; otherwise the length of IWORK is N. 
             4.2. If LSAME(JOBV,'V') the length of IWORK is determined as follows:
                  If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
                  then the length of IWORK is 2*N+M; otherwise the length of IWORK is 2*N.
        
          On exit,
          IWORK(1) = the numerical rank determined after the initial
                     QR factorization with pivoting. See the descriptions
                     of JOBA and JOBR.
          IWORK(2) = the number of the computed nonzero singular values
          IWORK(3) = if nonzero, a warning message:
                     If IWORK(3) = 1 then some of the column norms of A
                     were denormalized floats. The requested high accuracy
                     is not warranted by the data.
          IWORK(4) = 1 or -1. If IWORK(4) = 1, then the procedure used A^* to
                     do the job as specified by the JOB parameters.
          If the call to CGEJSV is a workspace query (indicated by LWORK = -1 and 
          LRWORK = -1), then on exit IWORK(1) contains the required length of 
          IWORK for the job parameters used in the call.
[out]INFO
          INFO is INTEGER
           < 0:  if INFO = -i, then the i-th argument had an illegal value.
           = 0:  successful exit;
           > 0:  CGEJSV  did not converge in the maximal allowed number
                 of sweeps. The computed values may be inaccurate.
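
           As an informal illustration (a hypothetical checking routine, not
           part of LAPACK), the diagnostic outputs described above might be
           read after a call as follows:

             subroutine report_cgejsv( info, iwork, rwork, errest )
                implicit none
                integer, intent(in) :: info, iwork(*)
                real,    intent(in) :: rwork(*)
                logical, intent(in) :: errest    ! .true. if JOBA was 'E' or 'G'
                if ( info > 0 ) then
                   print *, 'CGEJSV did not converge; values may be inaccurate'
                else if ( info < 0 ) then
                   print *, 'argument', -info, 'had an illegal value'
                else
                   print *, 'numerical rank after pivoted QRF :', iwork(1)
                   print *, 'number of nonzero singular values:', iwork(2)
                   if ( iwork(3) == 1 ) print *, 'warning: denormal column norms'
                   if ( iwork(4) == 1 ) print *, 'note: A^* was used internally'
                   if ( errest ) print *, 'SCONDA estimate (-1 if lost):', rwork(3)
                end if
             end subroutine report_cgejsv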
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
  CGEJSV implements a preconditioned Jacobi SVD algorithm. It uses CGEQP3,
  CGEQRF, and CGELQF as preprocessors and preconditioners. Optionally, an
  additional row pivoting can be used as a preprocessor, which in some
  cases results in much higher accuracy. An example is matrix A with the
  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
   diagonal matrices and C is a well-conditioned matrix. In that case, complete
   pivoting in the first QR factorization provides accuracy dependent on the
  condition number of C, and independent of D1, D2. Such higher accuracy is
  not completely understood theoretically, but it works well in practice.
  Further, if A can be written as A = B*D, with well-conditioned B and some
  diagonal D, then the high accuracy is guaranteed, both theoretically and
  in software, independent of D. For more details see [1], [2].
     The computational range for the singular values can be the full range
  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
  & LAPACK routines called by CGEJSV are implemented to work in that range.
  If that is not the case, then the restriction for safe computation with
  the singular values in the range of normalized IEEE numbers is that the
  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
  overflow. This code (CGEJSV) is best used in this restricted range,
  meaning that singular values of magnitude below ||A||_2 / SLAMCH('O') are
  returned as zeros. See JOBR for details on this.
     Further, this implementation is somewhat slower than the one described
  in [1,2] due to replacement of some non-LAPACK components, and because
  the choice of some tuning parameters in the iterative part (CGESVJ) is
  left to the implementer on a particular machine.
     The rank revealing QR factorization (in this code: CGEQP3) should be
  implemented as in [3]. We have a new version of CGEQP3 under development
  that is more robust than the current one in LAPACK, with a cleaner cut in
  rank deficient cases. It will be available in the SIGMA library [4].
  If M is much larger than N, it is obvious that the initial QRF with
  column pivoting can be preprocessed by the QRF without pivoting. That
  well known trick is not used in CGEJSV because in some cases heavy row
  weighting can be treated with complete pivoting. The overhead in cases
  M much larger than N is then only due to pivoting, but the benefits in
  terms of accuracy have prevailed. The implementer/user can incorporate
  this extra QRF step easily. The implementer can also improve data movement
  (matrix transpose, matrix copy, matrix transposed copy) - this
  implementation of CGEJSV uses only the simplest, naive data movement.
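
   As a rough outline of that extra step (an informal sketch with simplified
   error handling and crude workspace bounds, not the routine's actual code
   path), one could factor A = Q*R with CGEQRF, run CGEJSV on the N-by-N
   triangular factor R, and map the left singular vectors back with CUNMQR:

     subroutine tall_thin_jsv( m, n, a, lda, sva, u, ldu, v, ldv, info )
        implicit none
        integer, intent(in)    :: m, n, lda, ldu, ldv
        complex, intent(inout) :: a(lda,*)
        complex, intent(out)   :: u(ldu,*), v(ldv,*)
        real,    intent(out)   :: sva(*)
        integer, intent(out)   :: info
        complex, allocatable :: r(:,:), tau(:), cwork(:)
        real,    allocatable :: rwork(:)
        integer, allocatable :: iwork(:)
        integer :: lwork, lrwork, j

        lwork  = 5*n + 2*n*n + m*n           ! generous upper bound
        lrwork = max( 7, 2*n )
        allocate( r(n,n), tau(n), cwork(lwork), rwork(lrwork), iwork(m+3*n) )

        call cgeqrf( m, n, a, lda, tau, cwork, lwork, info )     ! A = Q*R
        r = (0.0, 0.0)
        do j = 1, n
           r(1:j, j) = a(1:j, j)                 ! copy the triangular factor
        end do
        ! SVD of R; singular values land in SVA (see the SVA/RWORK scaling).
        call cgejsv( 'C', 'U', 'V', 'R', 'N', 'N', n, n, r, n, sva,        &
                     u, ldu, v, ldv, cwork, lwork, rwork, lrwork, iwork, info )
        ! Expand U_R (N-by-N, in U(1:n,1:n)) to the M-by-N factor U = Q*U_R.
        u(n+1:m, 1:n) = (0.0, 0.0)
        call cunmqr( 'L', 'N', m, n, n, a, lda, tau, u, ldu, cwork, lwork, info )
     end subroutine tall_thin_jsv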
Contributor:
Zlatko Drmac (Zagreb, Croatia)
References:
 [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
     factorization software - a case study.
     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
     LAPACK Working note 176.
 [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008, 2016.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac@math.hr. Thank you.

Definition at line 565 of file cgejsv.f.

568*
569* -- LAPACK computational routine --
570* -- LAPACK is a software package provided by Univ. of Tennessee, --
571* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
572*
573* .. Scalar Arguments ..
574 IMPLICIT NONE
575 INTEGER INFO, LDA, LDU, LDV, LWORK, LRWORK, M, N
576* ..
577* .. Array Arguments ..
578 COMPLEX A( LDA, * ), U( LDU, * ), V( LDV, * ), CWORK( LWORK )
579 REAL SVA( N ), RWORK( LRWORK )
580 INTEGER IWORK( * )
581 CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
582* ..
583*
584* ===========================================================================
585*
586* .. Local Parameters ..
587 REAL ZERO, ONE
588 parameter( zero = 0.0e0, one = 1.0e0 )
589 COMPLEX CZERO, CONE
590 parameter( czero = ( 0.0e0, 0.0e0 ), cone = ( 1.0e0, 0.0e0 ) )
591* ..
592* .. Local Scalars ..
593 COMPLEX CTEMP
594 REAL AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1, COND_OK,
595 $ CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN, MAXPRJ, SCALEM,
596 $ SCONDA, SFMIN, SMALL, TEMP1, USCAL1, USCAL2, XSC
597 INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
598 LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LQUERY,
599 $ LSVEC, L2ABER, L2KILL, L2PERT, L2RANK, L2TRAN, NOSCAL,
600 $ ROWPIV, RSVEC, TRANSP
601*
602 INTEGER OPTWRK, MINWRK, MINRWRK, MINIWRK
603 INTEGER LWCON, LWLQF, LWQP3, LWQRF, LWUNMLQ, LWUNMQR, LWUNMQRM,
604 $ LWSVDJ, LWSVDJV, LRWQP3, LRWCON, LRWSVDJ, IWOFF
605 INTEGER LWRK_CGELQF, LWRK_CGEQP3, LWRK_CGEQP3N, LWRK_CGEQRF,
606 $ LWRK_CGESVJ, LWRK_CGESVJV, LWRK_CGESVJU, LWRK_CUNMLQ,
607 $ LWRK_CUNMQR, LWRK_CUNMQRM
608* ..
609* .. Local Arrays
610 COMPLEX CDUMMY(1)
611 REAL RDUMMY(1)
612*
613* .. Intrinsic Functions ..
614 INTRINSIC abs, cmplx, conjg, alog, max, min, real, nint, sqrt
615* ..
616* .. External Functions ..
617 REAL SLAMCH, SCNRM2
618 INTEGER ISAMAX, ICAMAX
619 LOGICAL LSAME
620 EXTERNAL isamax, icamax, lsame, slamch, scnrm2
621* ..
622* .. External Subroutines ..
626 $ xerbla
627*
628 EXTERNAL cgesvj
629* ..
630*
631* Test the input arguments
632*
633 lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
634 jracc = lsame( jobv, 'J' )
635 rsvec = lsame( jobv, 'V' ) .OR. jracc
636 rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
637 l2rank = lsame( joba, 'R' )
638 l2aber = lsame( joba, 'A' )
639 errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
640 l2tran = lsame( jobt, 'T' ) .AND. ( m .EQ. n )
641 l2kill = lsame( jobr, 'R' )
642 defr = lsame( jobr, 'N' )
643 l2pert = lsame( jobp, 'P' )
644*
645 lquery = ( lwork .EQ. -1 ) .OR. ( lrwork .EQ. -1 )
646*
647 IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
648 $ errest .OR. lsame( joba, 'C' ) )) THEN
649 info = - 1
650 ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
651 $ ( lsame( jobu, 'W' ) .AND. rsvec .AND. l2tran ) ) ) THEN
652 info = - 2
653 ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
654 $ ( lsame( jobv, 'W' ) .AND. lsvec .AND. l2tran ) ) ) THEN
655 info = - 3
656 ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
657 info = - 4
658 ELSE IF ( .NOT. ( lsame(jobt,'T') .OR. lsame(jobt,'N') ) ) THEN
659 info = - 5
660 ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
661 info = - 6
662 ELSE IF ( m .LT. 0 ) THEN
663 info = - 7
664 ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
665 info = - 8
666 ELSE IF ( lda .LT. m ) THEN
667 info = - 10
668 ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
669 info = - 13
670 ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
671 info = - 15
672 ELSE
673* #:)
674 info = 0
675 END IF
676*
677 IF ( info .EQ. 0 ) THEN
678* .. compute the minimal and the optimal workspace lengths
679* [[The expressions for computing the minimal and the optimal
680* values of LCWORK, LRWORK are written with a lot of redundancy and
681* can be simplified. However, this verbose form is useful for
682* maintenance and modifications of the code.]]
683*
684* .. minimal workspace length for CGEQP3 of an M x N matrix,
685* CGEQRF of an N x N matrix, CGELQF of an N x N matrix,
686* CUNMLQ for computing N x N matrix, CUNMQR for computing N x N
687* matrix, CUNMQR for computing M x N matrix, respectively.
688 lwqp3 = n+1
689 lwqrf = max( 1, n )
690 lwlqf = max( 1, n )
691 lwunmlq = max( 1, n )
692 lwunmqr = max( 1, n )
693 lwunmqrm = max( 1, m )
694* .. minimal workspace length for CPOCON of an N x N matrix
695 lwcon = 2 * n
696* .. minimal workspace length for CGESVJ of an N x N matrix,
697* without and with explicit accumulation of Jacobi rotations
698 lwsvdj = max( 2 * n, 1 )
699 lwsvdjv = max( 2 * n, 1 )
700* .. minimal REAL workspace length for CGEQP3, CPOCON, CGESVJ
701 lrwqp3 = 2 * n
702 lrwcon = n
703 lrwsvdj = n
704 IF ( lquery ) THEN
705 CALL cgeqp3( m, n, a, lda, iwork, cdummy, cdummy, -1,
706 $ rdummy, ierr )
707 lwrk_cgeqp3 = int( cdummy(1) )
708 CALL cgeqrf( n, n, a, lda, cdummy, cdummy,-1, ierr )
709 lwrk_cgeqrf = int( cdummy(1) )
710 CALL cgelqf( n, n, a, lda, cdummy, cdummy,-1, ierr )
711 lwrk_cgelqf = int( cdummy(1) )
712 END IF
713 minwrk = 2
714 optwrk = 2
715 miniwrk = n
716 IF ( .NOT. (lsvec .OR. rsvec ) ) THEN
717* .. minimal and optimal sizes of the complex workspace if
718* only the singular values are requested
719 IF ( errest ) THEN
720 minwrk = max( n+lwqp3, n**2+lwcon, n+lwqrf, lwsvdj )
721 ELSE
722 minwrk = max( n+lwqp3, n+lwqrf, lwsvdj )
723 END IF
724 IF ( lquery ) THEN
725 CALL cgesvj( 'L', 'N', 'N', n, n, a, lda, sva, n, v,
726 $ ldv, cdummy, -1, rdummy, -1, ierr )
727 lwrk_cgesvj = int( cdummy(1) )
728 IF ( errest ) THEN
729 optwrk = max( n+lwrk_cgeqp3, n**2+lwcon,
730 $ n+lwrk_cgeqrf, lwrk_cgesvj )
731 ELSE
732 optwrk = max( n+lwrk_cgeqp3, n+lwrk_cgeqrf,
733 $ lwrk_cgesvj )
734 END IF
735 END IF
736 IF ( l2tran .OR. rowpiv ) THEN
737 IF ( errest ) THEN
738 minrwrk = max( 7, 2*m, lrwqp3, lrwcon, lrwsvdj )
739 ELSE
740 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
741 END IF
742 ELSE
743 IF ( errest ) THEN
744 minrwrk = max( 7, lrwqp3, lrwcon, lrwsvdj )
745 ELSE
746 minrwrk = max( 7, lrwqp3, lrwsvdj )
747 END IF
748 END IF
749 IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
750 ELSE IF ( rsvec .AND. (.NOT.lsvec) ) THEN
751* .. minimal and optimal sizes of the complex workspace if the
752* singular values and the right singular vectors are requested
753 IF ( errest ) THEN
754 minwrk = max( n+lwqp3, lwcon, lwsvdj, n+lwlqf,
755 $ 2*n+lwqrf, n+lwsvdj, n+lwunmlq )
756 ELSE
757 minwrk = max( n+lwqp3, lwsvdj, n+lwlqf, 2*n+lwqrf,
758 $ n+lwsvdj, n+lwunmlq )
759 END IF
760 IF ( lquery ) THEN
761 CALL cgesvj( 'L', 'U', 'N', n,n, u, ldu, sva, n, a,
762 $ lda, cdummy, -1, rdummy, -1, ierr )
763 lwrk_cgesvj = int( cdummy(1) )
764 CALL cunmlq( 'L', 'C', n, n, n, a, lda, cdummy,
765 $ v, ldv, cdummy, -1, ierr )
766 lwrk_cunmlq = int( cdummy(1) )
767 IF ( errest ) THEN
768 optwrk = max( n+lwrk_cgeqp3, lwcon, lwrk_cgesvj,
769 $ n+lwrk_cgelqf, 2*n+lwrk_cgeqrf,
770 $ n+lwrk_cgesvj, n+lwrk_cunmlq )
771 ELSE
772 optwrk = max( n+lwrk_cgeqp3, lwrk_cgesvj,n+lwrk_cgelqf,
773 $ 2*n+lwrk_cgeqrf, n+lwrk_cgesvj,
774 $ n+lwrk_cunmlq )
775 END IF
776 END IF
777 IF ( l2tran .OR. rowpiv ) THEN
778 IF ( errest ) THEN
779 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
780 ELSE
781 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
782 END IF
783 ELSE
784 IF ( errest ) THEN
785 minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
786 ELSE
787 minrwrk = max( 7, lrwqp3, lrwsvdj )
788 END IF
789 END IF
790 IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
791 ELSE IF ( lsvec .AND. (.NOT.rsvec) ) THEN
792* .. minimal and optimal sizes of the complex workspace if the
793* singular values and the left singular vectors are requested
794 IF ( errest ) THEN
795 minwrk = n + max( lwqp3,lwcon,n+lwqrf,lwsvdj,lwunmqrm )
796 ELSE
797 minwrk = n + max( lwqp3, n+lwqrf, lwsvdj, lwunmqrm )
798 END IF
799 IF ( lquery ) THEN
800 CALL cgesvj( 'L', 'U', 'N', n,n, u, ldu, sva, n, a,
801 $ lda, cdummy, -1, rdummy, -1, ierr )
802 lwrk_cgesvj = int( cdummy(1) )
803 CALL cunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
804 $ ldu, cdummy, -1, ierr )
805 lwrk_cunmqrm = int( cdummy(1) )
806 IF ( errest ) THEN
807 optwrk = n + max( lwrk_cgeqp3, lwcon, n+lwrk_cgeqrf,
808 $ lwrk_cgesvj, lwrk_cunmqrm )
809 ELSE
810 optwrk = n + max( lwrk_cgeqp3, n+lwrk_cgeqrf,
811 $ lwrk_cgesvj, lwrk_cunmqrm )
812 END IF
813 END IF
814 IF ( l2tran .OR. rowpiv ) THEN
815 IF ( errest ) THEN
816 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
817 ELSE
818 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
819 END IF
820 ELSE
821 IF ( errest ) THEN
822 minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
823 ELSE
824 minrwrk = max( 7, lrwqp3, lrwsvdj )
825 END IF
826 END IF
827 IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
828 ELSE
829* .. minimal and optimal sizes of the complex workspace if the
830* full SVD is requested
831 IF ( .NOT. jracc ) THEN
832 IF ( errest ) THEN
833 minwrk = max( n+lwqp3, n+lwcon, 2*n+n**2+lwcon,
834 $ 2*n+lwqrf, 2*n+lwqp3,
835 $ 2*n+n**2+n+lwlqf, 2*n+n**2+n+n**2+lwcon,
836 $ 2*n+n**2+n+lwsvdj, 2*n+n**2+n+lwsvdjv,
837 $ 2*n+n**2+n+lwunmqr,2*n+n**2+n+lwunmlq,
838 $ n+n**2+lwsvdj, n+lwunmqrm )
839 ELSE
840 minwrk = max( n+lwqp3, 2*n+n**2+lwcon,
841 $ 2*n+lwqrf, 2*n+lwqp3,
842 $ 2*n+n**2+n+lwlqf, 2*n+n**2+n+n**2+lwcon,
843 $ 2*n+n**2+n+lwsvdj, 2*n+n**2+n+lwsvdjv,
844 $ 2*n+n**2+n+lwunmqr,2*n+n**2+n+lwunmlq,
845 $ n+n**2+lwsvdj, n+lwunmqrm )
846 END IF
847 miniwrk = miniwrk + n
848 IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
849 ELSE
850 IF ( errest ) THEN
851 minwrk = max( n+lwqp3, n+lwcon, 2*n+lwqrf,
852 $ 2*n+n**2+lwsvdjv, 2*n+n**2+n+lwunmqr,
853 $ n+lwunmqrm )
854 ELSE
855 minwrk = max( n+lwqp3, 2*n+lwqrf,
856 $ 2*n+n**2+lwsvdjv, 2*n+n**2+n+lwunmqr,
857 $ n+lwunmqrm )
858 END IF
859 IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
860 END IF
861 IF ( lquery ) THEN
862 CALL cunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
863 $ ldu, cdummy, -1, ierr )
864 lwrk_cunmqrm = int( cdummy(1) )
865 CALL cunmqr( 'L', 'N', n, n, n, a, lda, cdummy, u,
866 $ ldu, cdummy, -1, ierr )
867 lwrk_cunmqr = int( cdummy(1) )
868 IF ( .NOT. jracc ) THEN
869 CALL cgeqp3( n,n, a, lda, iwork, cdummy,cdummy, -1,
870 $ rdummy, ierr )
871 lwrk_cgeqp3n = int( cdummy(1) )
872 CALL cgesvj( 'L', 'U', 'N', n, n, u, ldu, sva,
873 $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
874 lwrk_cgesvj = int( cdummy(1) )
875 CALL cgesvj( 'U', 'U', 'N', n, n, u, ldu, sva,
876 $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
877 lwrk_cgesvju = int( cdummy(1) )
878 CALL cgesvj( 'L', 'U', 'V', n, n, u, ldu, sva,
879 $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
880 lwrk_cgesvjv = int( cdummy(1) )
881 CALL cunmlq( 'L', 'C', n, n, n, a, lda, cdummy,
882 $ v, ldv, cdummy, -1, ierr )
883 lwrk_cunmlq = int( cdummy(1) )
884 IF ( errest ) THEN
885 optwrk = max( n+lwrk_cgeqp3, n+lwcon,
886 $ 2*n+n**2+lwcon, 2*n+lwrk_cgeqrf,
887 $ 2*n+lwrk_cgeqp3n,
888 $ 2*n+n**2+n+lwrk_cgelqf,
889 $ 2*n+n**2+n+n**2+lwcon,
890 $ 2*n+n**2+n+lwrk_cgesvj,
891 $ 2*n+n**2+n+lwrk_cgesvjv,
892 $ 2*n+n**2+n+lwrk_cunmqr,
893 $ 2*n+n**2+n+lwrk_cunmlq,
894 $ n+n**2+lwrk_cgesvju,
895 $ n+lwrk_cunmqrm )
896 ELSE
897 optwrk = max( n+lwrk_cgeqp3,
898 $ 2*n+n**2+lwcon, 2*n+lwrk_cgeqrf,
899 $ 2*n+lwrk_cgeqp3n,
900 $ 2*n+n**2+n+lwrk_cgelqf,
901 $ 2*n+n**2+n+n**2+lwcon,
902 $ 2*n+n**2+n+lwrk_cgesvj,
903 $ 2*n+n**2+n+lwrk_cgesvjv,
904 $ 2*n+n**2+n+lwrk_cunmqr,
905 $ 2*n+n**2+n+lwrk_cunmlq,
906 $ n+n**2+lwrk_cgesvju,
907 $ n+lwrk_cunmqrm )
908 END IF
909 ELSE
910 CALL cgesvj( 'L', 'U', 'V', n, n, u, ldu, sva,
911 $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
912 lwrk_cgesvjv = int( cdummy(1) )
913 CALL cunmqr( 'L', 'N', n, n, n, cdummy, n, cdummy,
914 $ v, ldv, cdummy, -1, ierr )
915 lwrk_cunmqr = int( cdummy(1) )
916 CALL cunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
917 $ ldu, cdummy, -1, ierr )
918 lwrk_cunmqrm = int( cdummy(1) )
919 IF ( errest ) THEN
920 optwrk = max( n+lwrk_cgeqp3, n+lwcon,
921 $ 2*n+lwrk_cgeqrf, 2*n+n**2,
922 $ 2*n+n**2+lwrk_cgesvjv,
923 $ 2*n+n**2+n+lwrk_cunmqr,n+lwrk_cunmqrm )
924 ELSE
925 optwrk = max( n+lwrk_cgeqp3, 2*n+lwrk_cgeqrf,
926 $ 2*n+n**2, 2*n+n**2+lwrk_cgesvjv,
927 $ 2*n+n**2+n+lwrk_cunmqr,
928 $ n+lwrk_cunmqrm )
929 END IF
930 END IF
931 END IF
932 IF ( l2tran .OR. rowpiv ) THEN
933 minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
934 ELSE
935 minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
936 END IF
937 END IF
938 minwrk = max( 2, minwrk )
939 optwrk = max( optwrk, minwrk )
940 IF ( lwork .LT. minwrk .AND. (.NOT.lquery) ) info = - 17
941 IF ( lrwork .LT. minrwrk .AND. (.NOT.lquery) ) info = - 19
942 END IF
943*
944 IF ( info .NE. 0 ) THEN
945* #:(
946 CALL xerbla( 'CGEJSV', - info )
947 RETURN
948 ELSE IF ( lquery ) THEN
949 cwork(1) = optwrk
950 cwork(2) = minwrk
951 rwork(1) = minrwrk
952 iwork(1) = max( 4, miniwrk )
953 RETURN
954 END IF
955*
956* Quick return for void matrix (Y3K safe)
957* #:)
958 IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
959 iwork(1:4) = 0
960 rwork(1:7) = 0
961 RETURN
962 ENDIF
963*
964* Determine whether the matrix U should be M x N or M x M
965*
966 IF ( lsvec ) THEN
967 n1 = n
968 IF ( lsame( jobu, 'F' ) ) n1 = m
969 END IF
970*
971* Set numerical parameters
972*
973*! NOTE: Make sure SLAMCH() does not fail on the target architecture.
974*
975 epsln = slamch('Epsilon')
976 sfmin = slamch('SafeMinimum')
977 small = sfmin / epsln
978 big = slamch('O')
979* BIG = ONE / SFMIN
980*
981* Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
982*
983*(!) If necessary, scale SVA() to protect the largest norm from
984* overflow. It is possible that this scaling pushes the smallest
985* column norm left from the underflow threshold (extreme case).
986*
987 scalem = one / sqrt(real(m)*real(n))
988 noscal = .true.
989 goscal = .true.
990 DO 1874 p = 1, n
991 aapp = zero
992 aaqq = one
993 CALL classq( m, a(1,p), 1, aapp, aaqq )
994 IF ( aapp .GT. big ) THEN
995 info = - 9
996 CALL xerbla( 'CGEJSV', -info )
997 RETURN
998 END IF
999 aaqq = sqrt(aaqq)
1000 IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
1001 sva(p) = aapp * aaqq
1002 ELSE
1003 noscal = .false.
1004 sva(p) = aapp * ( aaqq * scalem )
1005 IF ( goscal ) THEN
1006 goscal = .false.
1007 CALL sscal( p-1, scalem, sva, 1 )
1008 END IF
1009 END IF
1010 1874 CONTINUE
1011*
1012 IF ( noscal ) scalem = one
1013*
1014 aapp = zero
1015 aaqq = big
1016 DO 4781 p = 1, n
1017 aapp = max( aapp, sva(p) )
1018 IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
1019 4781 CONTINUE
1020*
1021* Quick return for zero M x N matrix
1022* #:)
1023 IF ( aapp .EQ. zero ) THEN
1024 IF ( lsvec ) CALL claset( 'G', m, n1, czero, cone, u, ldu )
1025 IF ( rsvec ) CALL claset( 'G', n, n, czero, cone, v, ldv )
1026 rwork(1) = one
1027 rwork(2) = one
1028 IF ( errest ) rwork(3) = one
1029 IF ( lsvec .AND. rsvec ) THEN
1030 rwork(4) = one
1031 rwork(5) = one
1032 END IF
1033 IF ( l2tran ) THEN
1034 rwork(6) = zero
1035 rwork(7) = zero
1036 END IF
1037 iwork(1) = 0
1038 iwork(2) = 0
1039 iwork(3) = 0
1040 iwork(4) = -1
1041 RETURN
1042 END IF
1043*
1044* Issue warning if denormalized column norms detected. Override the
1045* high relative accuracy request. Issue licence to kill nonzero columns
1046* (set them to zero) whose norm is less than sigma_max / BIG (roughly).
1047* #:(
1048 warning = 0
1049 IF ( aaqq .LE. sfmin ) THEN
1050 l2rank = .true.
1051 l2kill = .true.
1052 warning = 1
1053 END IF
1054*
1055* Quick return for one-column matrix
1056* #:)
1057 IF ( n .EQ. 1 ) THEN
1058*
1059 IF ( lsvec ) THEN
1060 CALL clascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
1061 CALL clacpy( 'A', m, 1, a, lda, u, ldu )
1062* computing all M left singular vectors of the M x 1 matrix
1063 IF ( n1 .NE. n ) THEN
1064 CALL cgeqrf( m, n, u,ldu, cwork, cwork(n+1),lwork-n,ierr )
1065 CALL cungqr( m,n1,1, u,ldu,cwork,cwork(n+1),lwork-n,ierr )
1066 CALL ccopy( m, a(1,1), 1, u(1,1), 1 )
1067 END IF
1068 END IF
1069 IF ( rsvec ) THEN
1070 v(1,1) = cone
1071 END IF
1072 IF ( sva(1) .LT. (big*scalem) ) THEN
1073 sva(1) = sva(1) / scalem
1074 scalem = one
1075 END IF
1076 rwork(1) = one / scalem
1077 rwork(2) = one
1078 IF ( sva(1) .NE. zero ) THEN
1079 iwork(1) = 1
1080 IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
1081 iwork(2) = 1
1082 ELSE
1083 iwork(2) = 0
1084 END IF
1085 ELSE
1086 iwork(1) = 0
1087 iwork(2) = 0
1088 END IF
1089 iwork(3) = 0
1090 iwork(4) = -1
1091 IF ( errest ) rwork(3) = one
1092 IF ( lsvec .AND. rsvec ) THEN
1093 rwork(4) = one
1094 rwork(5) = one
1095 END IF
1096 IF ( l2tran ) THEN
1097 rwork(6) = zero
1098 rwork(7) = zero
1099 END IF
1100 RETURN
1101*
1102 END IF
1103*
1104 transp = .false.
1105*
1106 aatmax = -one
1107 aatmin = big
1108 IF ( rowpiv .OR. l2tran ) THEN
1109*
1110* Compute the row norms, needed to determine row pivoting sequence
1111* (in the case of heavily row weighted A, row pivoting is strongly
1112* advised) and to collect information needed to compare the
1113* structures of A * A^* and A^* * A (in the case L2TRAN.EQ..TRUE.).
1114*
1115 IF ( l2tran ) THEN
1116 DO 1950 p = 1, m
1117 xsc = zero
1118 temp1 = one
1119 CALL classq( n, a(p,1), lda, xsc, temp1 )
1120* CLASSQ gets both the ell_2 and the ell_infinity norm
1121* in one pass through the vector
1122 rwork(m+p) = xsc * scalem
1123 rwork(p) = xsc * (scalem*sqrt(temp1))
1124 aatmax = max( aatmax, rwork(p) )
1125 IF (rwork(p) .NE. zero)
1126 $ aatmin = min(aatmin,rwork(p))
1127 1950 CONTINUE
1128 ELSE
1129 DO 1904 p = 1, m
1130 rwork(m+p) = scalem*abs( a(p,icamax(n,a(p,1),lda)) )
1131 aatmax = max( aatmax, rwork(m+p) )
1132 aatmin = min( aatmin, rwork(m+p) )
1133 1904 CONTINUE
1134 END IF
1135*
1136 END IF
1137*
1138* For square matrix A try to determine whether A^* would be better
1139* input for the preconditioned Jacobi SVD, with faster convergence.
1140* The decision is based on an O(N) function of the vector of column
1141* and row norms of A, based on the Shannon entropy. This should give
1142* the right choice in most cases when the difference actually matters.
1143* It may fail and pick the slower converging side.
1144*
1145 entra = zero
1146 entrat = zero
1147 IF ( l2tran ) THEN
1148*
1149 xsc = zero
1150 temp1 = one
1151 CALL slassq( n, sva, 1, xsc, temp1 )
1152 temp1 = one / temp1
1153*
1154 entra = zero
1155 DO 1113 p = 1, n
1156 big1 = ( ( sva(p) / xsc )**2 ) * temp1
1157 IF ( big1 .NE. zero ) entra = entra + big1 * alog(big1)
1158 1113 CONTINUE
1159 entra = - entra / alog(real(n))
1160*
1161* Now, SVA().^2/Trace(A^* * A) is a point in the probability simplex.
1162* It is derived from the diagonal of A^* * A. Do the same with the
1163* diagonal of A * A^*, compute the entropy of the corresponding
1164* probability distribution. Note that A * A^* and A^* * A have the
1165* same trace.
1166*
1167 entrat = zero
1168 DO 1114 p = 1, m
1169 big1 = ( ( rwork(p) / xsc )**2 ) * temp1
1170 IF ( big1 .NE. zero ) entrat = entrat + big1 * alog(big1)
1171 1114 CONTINUE
1172 entrat = - entrat / alog(real(m))
1173*
1174* Analyze the entropies and decide A or A^*. Smaller entropy
1175* usually means better input for the algorithm.
1176*
1177 transp = ( entrat .LT. entra )
1178*
1179* If A^* is better than A, take the adjoint of A. This is allowed
1180* only for square matrices, M=N.
1181 IF ( transp ) THEN
1182* In an optimal implementation, this trivial transpose
1183* should be replaced with faster transpose.
1184 DO 1115 p = 1, n - 1
1185 a(p,p) = conjg(a(p,p))
1186 DO 1116 q = p + 1, n
1187 ctemp = conjg(a(q,p))
1188 a(q,p) = conjg(a(p,q))
1189 a(p,q) = ctemp
1190 1116 CONTINUE
1191 1115 CONTINUE
1192 a(n,n) = conjg(a(n,n))
1193 DO 1117 p = 1, n
1194 rwork(m+p) = sva(p)
1195 sva(p) = rwork(p)
1196* previously computed row 2-norms are now column 2-norms
1197* of the transposed matrix
1198 1117 CONTINUE
1199 temp1 = aapp
1200 aapp = aatmax
1201 aatmax = temp1
1202 temp1 = aaqq
1203 aaqq = aatmin
1204 aatmin = temp1
1205 kill = lsvec
1206 lsvec = rsvec
1207 rsvec = kill
1208 IF ( lsvec ) n1 = n
1209*
1210 rowpiv = .true.
1211 END IF
1212*
1213 END IF
1214* END IF L2TRAN
1215*
1216* Scale the matrix so that its maximal singular value remains less
1217* than SQRT(BIG) -- the matrix is scaled so that its maximal column
1218* has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
1219* SQRT(BIG) instead of BIG is the fact that CGEJSV uses LAPACK and
1220* BLAS routines that, in some implementations, are not capable of
1221* working in the full interval [SFMIN,BIG] and that they may provoke
1222* overflows in the intermediate results. If the singular values spread
1223* from SFMIN to BIG, then CGESVJ will compute them. So, in that case,
1224* one should use CGESVJ instead of CGEJSV.
1225 big1 = sqrt( big )
1226 temp1 = sqrt( big / real(n) )
1227* >> for future updates: allow bigger range, i.e. the largest column
1228* will be allowed up to BIG/N and CGESVJ will do the rest. However, for
1229* this all other (LAPACK) components must allow such a range.
1230* TEMP1 = BIG/REAL(N)
1231* TEMP1 = BIG * EPSLN this should 'almost' work with current LAPACK components
1232 CALL slascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
1233 IF ( aaqq .GT. (aapp * sfmin) ) THEN
1234 aaqq = ( aaqq / aapp ) * temp1
1235 ELSE
1236 aaqq = ( aaqq * temp1 ) / aapp
1237 END IF
1238 temp1 = temp1 * scalem
1239 CALL clascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
1240*
1241* To undo scaling at the end of this procedure, multiply the
1242* computed singular values with USCAL2 / USCAL1.
1243*
1244 uscal1 = temp1
1245 uscal2 = aapp
1246*
1247 IF ( l2kill ) THEN
1248* L2KILL enforces computation of nonzero singular values in
1249* the restricted range of condition number of the initial A,
1250* sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
1251 xsc = sqrt( sfmin )
1252 ELSE
1253 xsc = small
1254*
1255* Now, if the condition number of A is too big,
1256* sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
1257* as a precaution measure, the full SVD is computed using CGESVJ
1258* with accumulated Jacobi rotations. This provides numerically
1259* more robust computation, at the cost of slightly increased run
1260* time. Depending on the concrete implementation of BLAS and LAPACK
1261* (i.e. how they behave in presence of extreme ill-conditioning) the
1262* implementor may decide to remove this switch.
1263 IF ( ( aaqq.LT.sqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
1264 jracc = .true.
1265 END IF
1266*
1267 END IF
1268 IF ( aaqq .LT. xsc ) THEN
1269 DO 700 p = 1, n
1270 IF ( sva(p) .LT. xsc ) THEN
1271 CALL claset( 'A', m, 1, czero, czero, a(1,p), lda )
1272 sva(p) = zero
1273 END IF
1274 700 CONTINUE
1275 END IF
1276*
1277* Preconditioning using QR factorization with pivoting
1278*
1279 IF ( rowpiv ) THEN
1280* Optional row permutation (Bjoerck row pivoting):
1281* A result by Cox and Higham shows that the Bjoerck's
1282* row pivoting combined with standard column pivoting
1283* has similar effect as Powell-Reid complete pivoting.
1284* The ell-infinity norms of A are made nonincreasing.
1285 IF ( ( lsvec .AND. rsvec ) .AND. .NOT.( jracc ) ) THEN
1286 iwoff = 2*n
1287 ELSE
1288 iwoff = n
1289 END IF
1290 DO 1952 p = 1, m - 1
1291 q = isamax( m-p+1, rwork(m+p), 1 ) + p - 1
1292 iwork(iwoff+p) = q
1293 IF ( p .NE. q ) THEN
1294 temp1 = rwork(m+p)
1295 rwork(m+p) = rwork(m+q)
1296 rwork(m+q) = temp1
1297 END IF
1298 1952 CONTINUE
1299 CALL claswp( n, a, lda, 1, m-1, iwork(iwoff+1), 1 )
1300 END IF
1301*
1302* End of the preparation phase (scaling, optional sorting and
1303* transposing, optional flushing of small columns).
1304*
1305* Preconditioning
1306*
1307* If the full SVD is needed, the right singular vectors are computed
1308* from a matrix equation, and for that we need theoretical analysis
1309* of the Businger-Golub pivoting. So we use CGEQP3 as the first RR QRF.
1310* In all other cases the first RR QRF can be chosen by other criteria
1311* (eg speed by replacing global with restricted window pivoting, such
1312* as in xGEQPX from TOMS # 782). Good results will be obtained using
1313* xGEQPX with properly (!) chosen numerical parameters.
1314* Any improvement of CGEQP3 improves overall performance of CGEJSV.
1315*
1316* A * P1 = Q1 * [ R1^* 0]^*:
1317 DO 1963 p = 1, n
1318* .. all columns are free columns
1319 iwork(p) = 0
1320 1963 CONTINUE
1321 CALL cgeqp3( m, n, a, lda, iwork, cwork, cwork(n+1), lwork-n,
1322 $ rwork, ierr )
1323*
1324* The upper triangular matrix R1 from the first QRF is inspected for
1325* rank deficiency and possibilities for deflation, or possible
1326* ill-conditioning. Depending on the user specified flag L2RANK,
1327* the procedure explores possibilities to reduce the numerical
1328* rank by inspecting the computed upper triangular factor. If
1329* L2RANK or L2ABER are up, then CGEJSV will compute the SVD of
1330* A + dA, where ||dA|| <= f(M,N)*EPSLN.
1331*
1332 nr = 1
1333 IF ( l2aber ) THEN
1334* Standard absolute error bound suffices. All sigma_i with
1335* sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
1336* aggressive enforcement of lower numerical rank by introducing a
1337* backward error of the order of N*EPSLN*||A||.
1338 temp1 = sqrt(real(n))*epsln
1339 DO 3001 p = 2, n
1340 IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) THEN
1341 nr = nr + 1
1342 ELSE
1343 GO TO 3002
1344 END IF
1345 3001 CONTINUE
1346 3002 CONTINUE
1347 ELSE IF ( l2rank ) THEN
1348* .. similarly as above, only slightly more gentle (less aggressive).
1349* Sudden drop on the diagonal of R1 is used as the criterion for
1350* close-to-rank-deficient.
1351 temp1 = sqrt(sfmin)
1352 DO 3401 p = 2, n
1353 IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
1354 $ ( abs(a(p,p)) .LT. small ) .OR.
1355 $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
1356 nr = nr + 1
1357 3401 CONTINUE
1358 3402 CONTINUE
1359*
1360 ELSE
1361* The goal is high relative accuracy. However, if the matrix
1362* has high scaled condition number the relative accuracy is in
1363* general not feasible. Later on, a condition number estimator
1364* will be deployed to estimate the scaled condition number.
1365* Here we just remove the underflowed part of the triangular
1366* factor. This prevents the situation in which the code is
1367* working hard to get the accuracy not warranted by the data.
1368 temp1 = sqrt(sfmin)
1369 DO 3301 p = 2, n
1370 IF ( ( abs(a(p,p)) .LT. small ) .OR.
1371 $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
1372 nr = nr + 1
1373 3301 CONTINUE
1374 3302 CONTINUE
1375*
1376 END IF
1377*
1378 almort = .false.
1379 IF ( nr .EQ. n ) THEN
1380 maxprj = one
1381 DO 3051 p = 2, n
1382 temp1 = abs(a(p,p)) / sva(iwork(p))
1383 maxprj = min( maxprj, temp1 )
1384 3051 CONTINUE
1385 IF ( maxprj**2 .GE. one - real(n)*epsln ) almort = .true.
1386 END IF
1387*
1388*
1389 sconda = - one
1390 condr1 = - one
1391 condr2 = - one
1392*
1393 IF ( errest ) THEN
1394 IF ( n .EQ. nr ) THEN
1395 IF ( rsvec ) THEN
1396* .. V is available as workspace
1397 CALL clacpy( 'U', n, n, a, lda, v, ldv )
1398 DO 3053 p = 1, n
1399 temp1 = sva(iwork(p))
1400 CALL csscal( p, one/temp1, v(1,p), 1 )
1401 3053 CONTINUE
1402 IF ( lsvec )THEN
1403 CALL cpocon( 'U', n, v, ldv, one, temp1,
1404 $ cwork(n+1), rwork, ierr )
1405 ELSE
1406 CALL cpocon( 'U', n, v, ldv, one, temp1,
1407 $ cwork, rwork, ierr )
1408 END IF
1409*
1410 ELSE IF ( lsvec ) THEN
1411* .. U is available as workspace
1412 CALL clacpy( 'U', n, n, a, lda, u, ldu )
1413 DO 3054 p = 1, n
1414 temp1 = sva(iwork(p))
1415 CALL csscal( p, one/temp1, u(1,p), 1 )
1416 3054 CONTINUE
1417 CALL cpocon( 'U', n, u, ldu, one, temp1,
1418 $ cwork(n+1), rwork, ierr )
1419 ELSE
1420 CALL clacpy( 'U', n, n, a, lda, cwork, n )
1421*[] CALL CLACPY( 'U', N, N, A, LDA, CWORK(N+1), N )
1422* Change: here index shifted by N to the left, CWORK(1:N)
1423* not needed for SIGMA only computation
1424 DO 3052 p = 1, n
1425 temp1 = sva(iwork(p))
1426*[] CALL CSSCAL( p, ONE/TEMP1, CWORK(N+(p-1)*N+1), 1 )
1427 CALL csscal( p, one/temp1, cwork((p-1)*n+1), 1 )
1428 3052 CONTINUE
1429* .. the columns of R are scaled to have unit Euclidean lengths.
1430*[] CALL CPOCON( 'U', N, CWORK(N+1), N, ONE, TEMP1,
1431*[] $ CWORK(N+N*N+1), RWORK, IERR )
1432 CALL cpocon( 'U', n, cwork, n, one, temp1,
1433 $ cwork(n*n+1), rwork, ierr )
1434*
1435 END IF
1436 IF ( temp1 .NE. zero ) THEN
1437 sconda = one / sqrt(temp1)
1438 ELSE
1439 sconda = - one
1440 END IF
1441* SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
1442* N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1443 ELSE
1444 sconda = - one
1445 END IF
1446 END IF
1447*
1448 l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. sqrt(big1) )
1449* If there is no violent scaling, artificial perturbation is not needed.
1450*
1451* Phase 3:
1452*
1453 IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1454*
1455* Singular Values only
1456*
1457* .. transpose A(1:NR,1:N)
1458 DO 1946 p = 1, min( n-1, nr )
1459 CALL ccopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1460 CALL clacgv( n-p+1, a(p,p), 1 )
1461 1946 CONTINUE
1462 IF ( nr .EQ. n ) a(n,n) = conjg(a(n,n))
1463*
1464* The following two DO-loops introduce small relative perturbation
1465* into the strict upper triangle of the lower triangular matrix.
1466* Small entries below the main diagonal are also changed.
1467* This modification is useful if the computing environment does not
1468* provide/allow FLUSH TO ZERO underflow, for it prevents many
1469* annoying denormalized numbers in case of strongly scaled matrices.
1470* The perturbation is structured so that it does not introduce any
1471* new perturbation of the singular values, and it does not destroy
1472* the job done by the preconditioner.
1473* The licence for this perturbation is in the variable L2PERT, which
1474* should be .FALSE. if FLUSH TO ZERO underflow is active.
1475*
1476 IF ( .NOT. almort ) THEN
1477*
1478 IF ( l2pert ) THEN
1479* XSC = SQRT(SMALL)
1480 xsc = epsln / real(n)
1481 DO 4947 q = 1, nr
1482 ctemp = cmplx(xsc*abs(a(q,q)),zero)
1483 DO 4949 p = 1, n
1484 IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1485 $ .OR. ( p .LT. q ) )
1486* $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1487 $ a(p,q) = ctemp
1488 4949 CONTINUE
1489 4947 CONTINUE
1490 ELSE
1491 CALL claset( 'U', nr-1,nr-1, czero,czero, a(1,2),lda )
1492 END IF
1493*
1494* .. second preconditioning using the QR factorization
1495*
1496 CALL cgeqrf( n,nr, a,lda, cwork, cwork(n+1),lwork-n, ierr )
1497*
1498* .. and transpose upper to lower triangular
1499 DO 1948 p = 1, nr - 1
1500 CALL ccopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1501 CALL clacgv( nr-p+1, a(p,p), 1 )
1502 1948 CONTINUE
1503*
1504 END IF
1505*
1506* Row-cyclic Jacobi SVD algorithm with column pivoting
1507*
1508* .. again some perturbation (a "background noise") is added
1509* to drown denormals
1510 IF ( l2pert ) THEN
1511* XSC = SQRT(SMALL)
1512 xsc = epsln / real(n)
1513 DO 1947 q = 1, nr
1514 ctemp = cmplx(xsc*abs(a(q,q)),zero)
1515 DO 1949 p = 1, nr
1516 IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1517 $ .OR. ( p .LT. q ) )
1518* $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1519 $ a(p,q) = ctemp
1520 1949 CONTINUE
1521 1947 CONTINUE
1522 ELSE
1523 CALL claset( 'U', nr-1, nr-1, czero, czero, a(1,2), lda )
1524 END IF
1525*
1526* .. and one-sided Jacobi rotations are started on a lower
1527* triangular matrix (plus perturbation which is ignored in
1528* the part which destroys triangular form (confusing?!))
1529*
1530 CALL cgesvj( 'L', 'N', 'N', nr, nr, a, lda, sva,
1531 $ n, v, ldv, cwork, lwork, rwork, lrwork, info )
1532*
1533 scalem = rwork(1)
1534 numrank = nint(rwork(2))
1535*
1536*
1537 ELSE IF ( ( rsvec .AND. ( .NOT. lsvec ) .AND. ( .NOT. jracc ) )
1538 $ .OR.
1539 $ ( jracc .AND. ( .NOT. lsvec ) .AND. ( nr .NE. n ) ) ) THEN
1540*
1541* -> Singular Values and Right Singular Vectors <-
1542*
1543 IF ( almort ) THEN
1544*
1545* .. in this case NR equals N
1546 DO 1998 p = 1, nr
1547 CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1548 CALL clacgv( n-p+1, v(p,p), 1 )
1549 1998 CONTINUE
1550 CALL claset( 'U', nr-1,nr-1, czero, czero, v(1,2), ldv )
1551*
1552 CALL cgesvj( 'L','U','N', n, nr, v, ldv, sva, nr, a, lda,
1553 $ cwork, lwork, rwork, lrwork, info )
1554 scalem = rwork(1)
1555 numrank = nint(rwork(2))
1556
1557 ELSE
1558*
1559* .. two more QR factorizations ( one QRF is not enough, two require
1560* accumulated product of Jacobi rotations, three are perfect )
1561*
1562 CALL claset( 'L', nr-1,nr-1, czero, czero, a(2,1), lda )
1563 CALL cgelqf( nr,n, a, lda, cwork, cwork(n+1), lwork-n, ierr)
1564 CALL clacpy( 'L', nr, nr, a, lda, v, ldv )
1565 CALL claset( 'U', nr-1,nr-1, czero, czero, v(1,2), ldv )
1566 CALL cgeqrf( nr, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1567 $ lwork-2*n, ierr )
1568 DO 8998 p = 1, nr
1569 CALL ccopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1570 CALL clacgv( nr-p+1, v(p,p), 1 )
1571 8998 CONTINUE
1572 CALL claset('U', nr-1, nr-1, czero, czero, v(1,2), ldv)
1573*
1574 CALL cgesvj( 'L', 'U','N', nr, nr, v,ldv, sva, nr, u,
1575 $ ldu, cwork(n+1), lwork-n, rwork, lrwork, info )
1576 scalem = rwork(1)
1577 numrank = nint(rwork(2))
1578 IF ( nr .LT. n ) THEN
1579 CALL claset( 'A',n-nr, nr, czero,czero, v(nr+1,1), ldv )
1580 CALL claset( 'A',nr, n-nr, czero,czero, v(1,nr+1), ldv )
1581 CALL claset( 'A',n-nr,n-nr,czero,cone, v(nr+1,nr+1),ldv )
1582 END IF
1583*
1584 CALL cunmlq( 'L', 'C', n, n, nr, a, lda, cwork,
1585 $ v, ldv, cwork(n+1), lwork-n, ierr )
1586*
1587 END IF
1588* .. permute the rows of V
1589* DO 8991 p = 1, N
1590* CALL CCOPY( N, V(p,1), LDV, A(IWORK(p),1), LDA )
1591* 8991 CONTINUE
1592* CALL CLACPY( 'All', N, N, A, LDA, V, LDV )
1593 CALL clapmr( .false., n, n, v, ldv, iwork )
1594*
1595 IF ( transp ) THEN
1596 CALL clacpy( 'A', n, n, v, ldv, u, ldu )
1597 END IF
1598*
1599 ELSE IF ( jracc .AND. (.NOT. lsvec) .AND. ( nr.EQ. n ) ) THEN
1600*
1601 CALL claset( 'L', n-1,n-1, czero, czero, a(2,1), lda )
1602*
1603 CALL cgesvj( 'U','N','V', n, n, a, lda, sva, n, v, ldv,
1604 $ cwork, lwork, rwork, lrwork, info )
1605 scalem = rwork(1)
1606 numrank = nint(rwork(2))
1607 CALL clapmr( .false., n, n, v, ldv, iwork )
1608*
1609 ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1610*
1611* .. Singular Values and Left Singular Vectors ..
1612*
1613* .. second preconditioning step to avoid need to accumulate
1614* Jacobi rotations in the Jacobi iterations.
1615 DO 1965 p = 1, nr
1616 CALL ccopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1617 CALL clacgv( n-p+1, u(p,p), 1 )
1618 1965 CONTINUE
1619 CALL claset( 'U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1620*
1621 CALL cgeqrf( n, nr, u, ldu, cwork(n+1), cwork(2*n+1),
1622 $ lwork-2*n, ierr )
1623*
1624 DO 1967 p = 1, nr - 1
1625 CALL ccopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1626 CALL clacgv( n-p+1, u(p,p), 1 )
1627 1967 CONTINUE
1628 CALL claset( 'U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1629*
1630 CALL cgesvj( 'L', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1631 $ lda, cwork(n+1), lwork-n, rwork, lrwork, info )
1632 scalem = rwork(1)
1633 numrank = nint(rwork(2))
1634*
1635 IF ( nr .LT. m ) THEN
1636 CALL claset( 'A', m-nr, nr,czero, czero, u(nr+1,1), ldu )
1637 IF ( nr .LT. n1 ) THEN
1638 CALL claset( 'A',nr, n1-nr, czero, czero, u(1,nr+1),ldu )
1639 CALL claset( 'A',m-nr,n1-nr,czero,cone,u(nr+1,nr+1),ldu )
1640 END IF
1641 END IF
1642*
1643 CALL cunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
1644 $ ldu, cwork(n+1), lwork-n, ierr )
1645*
1646 IF ( rowpiv )
1647 $ CALL claswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
1648*
1649 DO 1974 p = 1, n1
1650 xsc = one / scnrm2( m, u(1,p), 1 )
1651 CALL csscal( m, xsc, u(1,p), 1 )
1652 1974 CONTINUE
1653*
1654 IF ( transp ) THEN
1655 CALL clacpy( 'A', n, n, u, ldu, v, ldv )
1656 END IF
1657*
1658 ELSE
1659*
1660* .. Full SVD ..
1661*
1662 IF ( .NOT. jracc ) THEN
1663*
1664 IF ( .NOT. almort ) THEN
1665*
1666* Second Preconditioning Step (QRF [with pivoting])
1667* Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1668* equivalent to an LQF CALL. Since in many libraries the QRF
1669* seems to be better optimized than the LQF, we do explicit
1670* transpose and use the QRF. This is subject to changes in an
1671* optimized implementation of CGEJSV.
1672*
1673 DO 1968 p = 1, nr
1674 CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1675 CALL clacgv( n-p+1, v(p,p), 1 )
1676 1968 CONTINUE
1677*
1678* .. the following two loops perturb small entries to avoid
1679* denormals in the second QR factorization, where they are
1680* as good as zeros. This is done to avoid painfully slow
1681* computation with denormals. The relative size of the perturbation
1682* is a parameter that can be changed by the implementer.
1683* This perturbation device will be obsolete on machines with
1684* properly implemented arithmetic.
1685* To switch it off, set L2PERT=.FALSE. To remove it from the
1686* code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1687* The following two loops should be blocked and fused with the
1688* transposed copy above.
1689*
1690 IF ( l2pert ) THEN
1691 xsc = sqrt(small)
1692 DO 2969 q = 1, nr
1693 ctemp = cmplx(xsc*abs( v(q,q) ),zero)
1694 DO 2968 p = 1, n
1695 IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1696 $ .OR. ( p .LT. q ) )
1697* $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1698 $ v(p,q) = ctemp
1699 IF ( p .LT. q ) v(p,q) = - v(p,q)
1700 2968 CONTINUE
1701 2969 CONTINUE
1702 ELSE
1703 CALL claset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1704 END IF
1705*
1706* Estimate the row scaled condition number of R1
1707* (If R1 is rectangular, N > NR, then the condition number
1708* of the leading NR x NR submatrix is estimated.)
1709*
1710 CALL clacpy( 'L', nr, nr, v, ldv, cwork(2*n+1), nr )
1711 DO 3950 p = 1, nr
1712 temp1 = scnrm2(nr-p+1,cwork(2*n+(p-1)*nr+p),1)
1713 CALL csscal(nr-p+1,one/temp1,cwork(2*n+(p-1)*nr+p),1)
1714 3950 CONTINUE
1715 CALL cpocon('L',nr,cwork(2*n+1),nr,one,temp1,
1716 $ cwork(2*n+nr*nr+1),rwork,ierr)
1717 condr1 = one / sqrt(temp1)
1718* .. here need a second opinion on the condition number
1719* .. then assume worst case scenario
1720* R1 is OK for inverse <=> CONDR1 .LT. REAL(N)
1721* more conservative <=> CONDR1 .LT. SQRT(REAL(N))
1722*
1723 cond_ok = sqrt(sqrt(real(nr)))
1724*[TP] COND_OK is a tuning parameter.
1725*
1726 IF ( condr1 .LT. cond_ok ) THEN
1727* .. the second QRF without pivoting. Note: in an optimized
1728* implementation, this QRF should be implemented as the QRF
1729* of a lower triangular matrix.
1730* R1^* = Q2 * R2
1731 CALL cgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1732 $ lwork-2*n, ierr )
1733*
1734 IF ( l2pert ) THEN
1735 xsc = sqrt(small)/epsln
1736 DO 3959 p = 2, nr
1737 DO 3958 q = 1, p - 1
1738 ctemp=cmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1739 $ zero)
1740 IF ( abs(v(q,p)) .LE. temp1 )
1741* $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1742 $ v(q,p) = ctemp
1743 3958 CONTINUE
1744 3959 CONTINUE
1745 END IF
1746*
1747 IF ( nr .NE. n )
1748 $ CALL clacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1749* .. save ...
1750*
1751* .. this transposed copy should be better than naive
1752 DO 1969 p = 1, nr - 1
1753 CALL ccopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1754 CALL clacgv(nr-p+1, v(p,p), 1 )
1755 1969 CONTINUE
1756 v(nr,nr)=conjg(v(nr,nr))
1757*
1758 condr2 = condr1
1759*
1760 ELSE
1761*
1762* .. ill-conditioned case: second QRF with pivoting
1763* Note that windowed pivoting would be equally good
1764* numerically, and more run-time efficient. So, in
1765* an optimal implementation, the next call to CGEQP3
1766* should be replaced with, e.g., CALL CGEQPX (ACM TOMS #782)
1767* with properly (carefully) chosen parameters.
1768*
1769* R1^* * P2 = Q2 * R2
1770 DO 3003 p = 1, nr
1771 iwork(n+p) = 0
1772 3003 CONTINUE
1773 CALL cgeqp3( n, nr, v, ldv, iwork(n+1), cwork(n+1),
1774 $ cwork(2*n+1), lwork-2*n, rwork, ierr )
1775** CALL CGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1776** $ LWORK-2*N, IERR )
1777 IF ( l2pert ) THEN
1778 xsc = sqrt(small)
1779 DO 3969 p = 2, nr
1780 DO 3968 q = 1, p - 1
1781 ctemp=cmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1782 $ zero)
1783 IF ( abs(v(q,p)) .LE. temp1 )
1784* $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1785 $ v(q,p) = ctemp
1786 3968 CONTINUE
1787 3969 CONTINUE
1788 END IF
1789*
1790 CALL clacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1791*
1792 IF ( l2pert ) THEN
1793 xsc = sqrt(small)
1794 DO 8970 p = 2, nr
1795 DO 8971 q = 1, p - 1
1796 ctemp=cmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1797 $ zero)
1798* V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) )
1799 v(p,q) = - ctemp
1800 8971 CONTINUE
1801 8970 CONTINUE
1802 ELSE
1803 CALL claset( 'L',nr-1,nr-1,czero,czero,v(2,1),ldv )
1804 END IF
1805* Now, compute R2 = L3 * Q3, the LQ factorization.
1806 CALL cgelqf( nr, nr, v, ldv, cwork(2*n+n*nr+1),
1807 $ cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1808* .. and estimate the condition number
1809 CALL clacpy( 'L',nr,nr,v,ldv,cwork(2*n+n*nr+nr+1),nr )
1810 DO 4950 p = 1, nr
1811 temp1 = scnrm2( p, cwork(2*n+n*nr+nr+p), nr )
1812 CALL csscal( p, one/temp1, cwork(2*n+n*nr+nr+p), nr )
1813 4950 CONTINUE
1814 CALL cpocon( 'L',nr,cwork(2*n+n*nr+nr+1),nr,one,temp1,
1815 $ cwork(2*n+n*nr+nr+nr*nr+1),rwork,ierr )
1816 condr2 = one / sqrt(temp1)
1817*
1818*
1819 IF ( condr2 .GE. cond_ok ) THEN
1820* .. save the Householder vectors used for Q3
1821* (this overwrites the copy of R2, as it will not be
1822* needed in this branch, but it does not overwrite the
1823* Householder vectors of Q2).
1824 CALL clacpy( 'U', nr, nr, v, ldv, cwork(2*n+1), n )
1825* .. and the rest of the information on Q3 is in
1826* WORK(2*N+N*NR+1:2*N+N*NR+N)
1827 END IF
1828*
1829 END IF
1830*
1831 IF ( l2pert ) THEN
1832 xsc = sqrt(small)
1833 DO 4968 q = 2, nr
1834 ctemp = xsc * v(q,q)
1835 DO 4969 p = 1, q - 1
1836* V(p,q) = - TEMP1*( V(p,q) / ABS(V(p,q)) )
1837 v(p,q) = - ctemp
1838 4969 CONTINUE
1839 4968 CONTINUE
1840 ELSE
1841 CALL claset( 'U', nr-1,nr-1, czero,czero, v(1,2), ldv )
1842 END IF
1843*
1844* Second preconditioning finished; continue with Jacobi SVD
1845* The input matrix is lower triangular.
1846*
1847* Recover the right singular vectors as solution of a well
1848* conditioned triangular matrix equation.
1849*
1850 IF ( condr1 .LT. cond_ok ) THEN
1851*
1852 CALL cgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u, ldu,
1853 $ cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,rwork,
1854 $ lrwork, info )
1855 scalem = rwork(1)
1856 numrank = nint(rwork(2))
1857 DO 3970 p = 1, nr
1858 CALL ccopy( nr, v(1,p), 1, u(1,p), 1 )
1859 CALL csscal( nr, sva(p), v(1,p), 1 )
1860 3970 CONTINUE
1861
1862* .. pick the right matrix equation and solve it
1863*
1864 IF ( nr .EQ. n ) THEN
1865* :)) .. best case, R1 is inverted. The solution of this matrix
1866* equation is Q2*V2 = the product of the Jacobi rotations
1867* used in CGESVJ, premultiplied with the orthogonal matrix
1868* from the second QR factorization.
1869 CALL ctrsm('L','U','N','N', nr,nr,cone, a,lda, v,ldv)
1870 ELSE
1871* .. R1 is well conditioned, but non-square. Adjoint of R2
1872* is inverted to get the product of the Jacobi rotations
1873* used in CGESVJ. The Q-factor from the second QR
1874* factorization is then built in explicitly.
1875 CALL ctrsm('L','U','C','N',nr,nr,cone,cwork(2*n+1),
1876 $ n,v,ldv)
1877 IF ( nr .LT. n ) THEN
1878 CALL claset('A',n-nr,nr,czero,czero,v(nr+1,1),ldv)
1879 CALL claset('A',nr,n-nr,czero,czero,v(1,nr+1),ldv)
1880 CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1881 END IF
1882 CALL cunmqr('L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1883 $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1884 END IF
1885*
1886 ELSE IF ( condr2 .LT. cond_ok ) THEN
1887*
1888* The matrix R2 is inverted. The solution of the matrix equation
1889* is Q3^* * V3 = the product of the Jacobi rotations (applied to
1890* the lower triangular L3 from the LQ factorization of
1891* R2=L3*Q3), pre-multiplied with the transposed Q3.
1892 CALL cgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1893 $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1894 $ rwork, lrwork, info )
1895 scalem = rwork(1)
1896 numrank = nint(rwork(2))
1897 DO 3870 p = 1, nr
1898 CALL ccopy( nr, v(1,p), 1, u(1,p), 1 )
1899 CALL csscal( nr, sva(p), u(1,p), 1 )
1900 3870 CONTINUE
1901 CALL ctrsm('L','U','N','N',nr,nr,cone,cwork(2*n+1),n,
1902 $ u,ldu)
1903* .. apply the permutation from the second QR factorization
1904 DO 873 q = 1, nr
1905 DO 872 p = 1, nr
1906 cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1907 872 CONTINUE
1908 DO 874 p = 1, nr
1909 u(p,q) = cwork(2*n+n*nr+nr+p)
1910 874 CONTINUE
1911 873 CONTINUE
1912 IF ( nr .LT. n ) THEN
1913 CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1914 CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1915 CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1916 END IF
1917 CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1918 $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1919 ELSE
1920* Last line of defense.
1921* #:( This is a rather pathological case: no scaled condition
1922* improvement after two pivoted QR factorizations. Other
1923* possibility is that the rank revealing QR factorization
1924* or the condition estimator has failed, or the COND_OK
1925* is set very close to ONE (which is unnecessary). Normally,
1926* this branch should never be executed, but in rare cases of
1927* failure of the RRQR or condition estimator, the last line of
1928* defense ensures that CGEJSV completes the task.
1929* Compute the full SVD of L3 using CGESVJ with explicit
1930* accumulation of Jacobi rotations.
1931 CALL cgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1932 $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1933 $ rwork, lrwork, info )
1934 scalem = rwork(1)
1935 numrank = nint(rwork(2))
1936 IF ( nr .LT. n ) THEN
1937 CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1938 CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1939 CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1940 END IF
1941 CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1942 $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1943*
1944 CALL cunmlq( 'L', 'C', nr, nr, nr, cwork(2*n+1), n,
1945 $ cwork(2*n+n*nr+1), u, ldu, cwork(2*n+n*nr+nr+1),
1946 $ lwork-2*n-n*nr-nr, ierr )
1947 DO 773 q = 1, nr
1948 DO 772 p = 1, nr
1949 cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1950 772 CONTINUE
1951 DO 774 p = 1, nr
1952 u(p,q) = cwork(2*n+n*nr+nr+p)
1953 774 CONTINUE
1954 773 CONTINUE
1955*
1956 END IF
1957*
1958* Permute the rows of V using the (column) permutation from the
1959* first QRF. Also, scale the columns to make them unit in
1960* Euclidean norm. This applies to all cases.
1961*
1962 temp1 = sqrt(real(n)) * epsln
1963 DO 1972 q = 1, n
1964 DO 972 p = 1, n
1965 cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1966 972 CONTINUE
1967 DO 973 p = 1, n
1968 v(p,q) = cwork(2*n+n*nr+nr+p)
1969 973 CONTINUE
1970 xsc = one / scnrm2( n, v(1,q), 1 )
1971 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1972 $ CALL csscal( n, xsc, v(1,q), 1 )
1973 1972 CONTINUE
1974* At this moment, V contains the right singular vectors of A.
1975* Next, assemble the left singular vector matrix U (M x N).
1976 IF ( nr .LT. m ) THEN
1977 CALL claset('A', m-nr, nr, czero, czero, u(nr+1,1), ldu)
1978 IF ( nr .LT. n1 ) THEN
1979 CALL claset('A',nr,n1-nr,czero,czero,u(1,nr+1),ldu)
1980 CALL claset('A',m-nr,n1-nr,czero,cone,
1981 $ u(nr+1,nr+1),ldu)
1982 END IF
1983 END IF
1984*
1985* The Q matrix from the first QRF is built into the left singular
1986* matrix U. This applies to all cases.
1987*
1988 CALL cunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
1989 $ ldu, cwork(n+1), lwork-n, ierr )
1990
1991* The columns of U are normalized. The cost is O(M*N) flops.
1992 temp1 = sqrt(real(m)) * epsln
1993 DO 1973 p = 1, nr
1994 xsc = one / scnrm2( m, u(1,p), 1 )
1995 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1996 $ CALL csscal( m, xsc, u(1,p), 1 )
1997 1973 CONTINUE
1998*
1999* If the initial QRF is computed with row pivoting, the left
2000* singular vectors must be adjusted.
2001*
2002 IF ( rowpiv )
2003 $ CALL claswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2004*
2005 ELSE
2006*
2007* .. the initial matrix A has almost orthogonal columns and
2008* the second QRF is not needed
2009*
2010 CALL clacpy( 'U', n, n, a, lda, cwork(n+1), n )
2011 IF ( l2pert ) THEN
2012 xsc = sqrt(small)
2013 DO 5970 p = 2, n
2014 ctemp = xsc * cwork( n + (p-1)*n + p )
2015 DO 5971 q = 1, p - 1
2016* CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) /
2017* $ ABS(CWORK(N+(p-1)*N+q)) )
2018 cwork(n+(q-1)*n+p)=-ctemp
2019 5971 CONTINUE
2020 5970 CONTINUE
2021 ELSE
2022 CALL claset( 'L',n-1,n-1,czero,czero,cwork(n+2),n )
2023 END IF
2024*
2025 CALL cgesvj( 'U', 'U', 'N', n, n, cwork(n+1), n, sva,
2026 $ n, u, ldu, cwork(n+n*n+1), lwork-n-n*n, rwork, lrwork,
2027 $ info )
2028*
2029 scalem = rwork(1)
2030 numrank = nint(rwork(2))
2031 DO 6970 p = 1, n
2032 CALL ccopy( n, cwork(n+(p-1)*n+1), 1, u(1,p), 1 )
2033 CALL csscal( n, sva(p), cwork(n+(p-1)*n+1), 1 )
2034 6970 CONTINUE
2035*
2036 CALL ctrsm( 'L', 'U', 'N', 'N', n, n,
2037 $ cone, a, lda, cwork(n+1), n )
2038 DO 6972 p = 1, n
2039 CALL ccopy( n, cwork(n+p), n, v(iwork(p),1), ldv )
2040 6972 CONTINUE
2041 temp1 = sqrt(real(n))*epsln
2042 DO 6971 p = 1, n
2043 xsc = one / scnrm2( n, v(1,p), 1 )
2044 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2045 $ CALL csscal( n, xsc, v(1,p), 1 )
2046 6971 CONTINUE
2047*
2048* Assemble the left singular vector matrix U (M x N).
2049*
2050 IF ( n .LT. m ) THEN
2051 CALL claset( 'A', m-n, n, czero, czero, u(n+1,1), ldu )
2052 IF ( n .LT. n1 ) THEN
2053 CALL claset('A',n, n1-n, czero, czero, u(1,n+1),ldu)
2054 CALL claset( 'A',m-n,n1-n, czero, cone,u(n+1,n+1),ldu)
2055 END IF
2056 END IF
2057 CALL cunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
2058 $ ldu, cwork(n+1), lwork-n, ierr )
2059 temp1 = sqrt(real(m))*epsln
2060 DO 6973 p = 1, n1
2061 xsc = one / scnrm2( m, u(1,p), 1 )
2062 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2063 $ CALL csscal( m, xsc, u(1,p), 1 )
2064 6973 CONTINUE
2065*
2066 IF ( rowpiv )
2067 $ CALL claswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2068*
2069 END IF
2070*
2071* end of the >> almost orthogonal case << in the full SVD
2072*
2073 ELSE
2074*
2075* This branch deploys a preconditioned Jacobi SVD with explicitly
2076* accumulated rotations. It is included as optional, mainly for
2077* experimental purposes. It performs well and can also be used.
2078* In this implementation, this branch will be automatically activated
2079* if the condition number sigma_max(A) / sigma_min(A) is predicted
2080* to be greater than the overflow threshold. This is because the
2081* a posteriori computation of the singular vectors assumes robust
2082* implementation of BLAS and some LAPACK procedures, capable of working
2083* in the presence of extreme values, e.g. when the singular values spread from
2084* the underflow to the overflow threshold.
2085*
2086 DO 7968 p = 1, nr
2087 CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
2088 CALL clacgv( n-p+1, v(p,p), 1 )
2089 7968 CONTINUE
2090*
2091 IF ( l2pert ) THEN
2092 xsc = sqrt(small/epsln)
2093 DO 5969 q = 1, nr
2094 ctemp = cmplx(xsc*abs( v(q,q) ),zero)
2095 DO 5968 p = 1, n
2096 IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
2097 $ .OR. ( p .LT. q ) )
2098* $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
2099 $ v(p,q) = ctemp
2100 IF ( p .LT. q ) v(p,q) = - v(p,q)
2101 5968 CONTINUE
2102 5969 CONTINUE
2103 ELSE
2104 CALL claset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
2105 END IF
2106
2107 CALL cgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
2108 $ lwork-2*n, ierr )
2109 CALL clacpy( 'L', n, nr, v, ldv, cwork(2*n+1), n )
2110*
2111 DO 7969 p = 1, nr
2112 CALL ccopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
2113 CALL clacgv( nr-p+1, u(p,p), 1 )
2114 7969 CONTINUE
2115
2116 IF ( l2pert ) THEN
2117 xsc = sqrt(small/epsln)
2118 DO 9970 q = 2, nr
2119 DO 9971 p = 1, q - 1
2120 ctemp = cmplx(xsc * min(abs(u(p,p)),abs(u(q,q))),
2121 $ zero)
2122* U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) )
2123 u(p,q) = - ctemp
2124 9971 CONTINUE
2125 9970 CONTINUE
2126 ELSE
2127 CALL claset('U', nr-1, nr-1, czero, czero, u(1,2), ldu )
2128 END IF
2129
2130 CALL cgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
2131 $ n, v, ldv, cwork(2*n+n*nr+1), lwork-2*n-n*nr,
2132 $ rwork, lrwork, info )
2133 scalem = rwork(1)
2134 numrank = nint(rwork(2))
2135
2136 IF ( nr .LT. n ) THEN
2137 CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
2138 CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
2139 CALL claset( 'A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv )
2140 END IF
2141
2142 CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
2143 $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
2144*
2145* Permute the rows of V using the (column) permutation from the
2146* first QRF. Also, scale the columns to make them unit in
2147* Euclidean norm. This applies to all cases.
2148*
2149 temp1 = sqrt(real(n)) * epsln
2150 DO 7972 q = 1, n
2151 DO 8972 p = 1, n
2152 cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
2153 8972 CONTINUE
2154 DO 8973 p = 1, n
2155 v(p,q) = cwork(2*n+n*nr+nr+p)
2156 8973 CONTINUE
2157 xsc = one / scnrm2( n, v(1,q), 1 )
2158 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2159 $ CALL csscal( n, xsc, v(1,q), 1 )
2160 7972 CONTINUE
2161*
2162* At this moment, V contains the right singular vectors of A.
2163* Next, assemble the left singular vector matrix U (M x N).
2164*
2165 IF ( nr .LT. m ) THEN
2166 CALL claset( 'A', m-nr, nr, czero, czero, u(nr+1,1), ldu )
2167 IF ( nr .LT. n1 ) THEN
2168 CALL claset('A',nr, n1-nr, czero, czero, u(1,nr+1),ldu)
2169 CALL claset('A',m-nr,n1-nr, czero, cone,u(nr+1,nr+1),ldu)
2170 END IF
2171 END IF
2172*
2173 CALL cunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
2174 $ ldu, cwork(n+1), lwork-n, ierr )
2175*
2176 IF ( rowpiv )
2177 $ CALL claswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2178*
2179*
2180 END IF
2181 IF ( transp ) THEN
2182* .. swap U and V because the procedure worked on A^*
2183 DO 6974 p = 1, n
2184 CALL cswap( n, u(1,p), 1, v(1,p), 1 )
2185 6974 CONTINUE
2186 END IF
2187*
2188 END IF
2189* end of the full SVD
2190*
2191* Undo scaling, if necessary (and possible)
2192*
2193 IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
2194 CALL slascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
2195 uscal1 = one
2196 uscal2 = one
2197 END IF
2198*
2199 IF ( nr .LT. n ) THEN
2200 DO 3004 p = nr+1, n
2201 sva(p) = zero
2202 3004 CONTINUE
2203 END IF
2204*
2205 rwork(1) = uscal2 * scalem
2206 rwork(2) = uscal1
2207 IF ( errest ) rwork(3) = sconda
2208 IF ( lsvec .AND. rsvec ) THEN
2209 rwork(4) = condr1
2210 rwork(5) = condr2
2211 END IF
2212 IF ( l2tran ) THEN
2213 rwork(6) = entra
2214 rwork(7) = entrat
2215 END IF
2216*
2217 iwork(1) = nr
2218 iwork(2) = numrank
2219 iwork(3) = warning
2220 IF ( transp ) THEN
2221 iwork(4) = 1
2222 ELSE
2223 iwork(4) = -1
2224 END IF
2225
2226*
2227 RETURN
2228* ..
2229* .. END OF CGEJSV
2230* ..
2231 END
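The fragment below is an editorial calling sketch, not part of cgejsv.f or of the LAPACK distribution. It uses a small fixed problem size, sizes the work arrays generously instead of using the exact LWORK/LRWORK minima documented for each combination of JOB* options (an assumption noted in the comments), and reads back the diagnostics that the tail of the routine stores in RWORK and IWORK, as seen in the listing above.

program cgejsv_sketch
   implicit none
   integer, parameter :: m = 4, n = 3
   ! Workspace sizes below are generous guesses for this tiny problem
   ! (assumption); consult the LWORK/LRWORK documentation for the tight
   ! bounds that apply to a given choice of the JOB* options.
   integer, parameter :: lwork = 200, lrwork = 200, liwork = 100
   complex  :: a(m,n), u(m,n), v(n,n), cwork(lwork)
   real     :: sva(n), rwork(lrwork)
   integer  :: iwork(liwork), info, i, j
   external :: cgejsv

   ! A simple, well-scaled complex test matrix.
   do j = 1, n
      do i = 1, m
         a(i,j) = cmplx( 1.0/real(i+j), real(i-j) )
      end do
   end do

   ! JOBA='C', JOBU='U' (M-by-N left vectors), JOBV='V', JOBR='R',
   ! no transposition test (JOBT='N'), no extra perturbation (JOBP='N').
   call cgejsv( 'C', 'U', 'V', 'R', 'N', 'N', m, n, a, m, sva,        &
                u, m, v, n, cwork, lwork, rwork, lrwork, iwork, info )
   if ( info /= 0 ) stop 'cgejsv: abnormal return'

   ! Per the SVA/RWORK documentation the singular values may be returned
   ! in factored form: sigma(j) = ( RWORK(1)/RWORK(2) ) * SVA(j).
   do j = 1, n
      print *, 'sigma(', j, ') = ', ( rwork(1)/rwork(2) ) * sva(j)
   end do

   ! Diagnostics written at the end of the routine (see the listing).
   print *, 'numerical rank from the initial RRQR :', iwork(1)
   print *, 'numerical rank from the Jacobi sweep :', iwork(2)
   print *, 'worked on A**H (U and V swapped)     :', iwork(4) == 1
end program cgejsv_sketch

Compiled with a Fortran compiler and linked against LAPACK and BLAS (for example, gfortran cgejsv_sketch.f90 -llapack -lblas), the sketch prints the three singular values of the test matrix; when RWORK(1) = RWORK(2) = 1 no rescaling was applied and SVA already holds them directly.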
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine ccopy(n, cx, incx, cy, incy)
CCOPY
Definition ccopy.f:81
subroutine cgelqf(m, n, a, lda, tau, work, lwork, info)
CGELQF
Definition cgelqf.f:143
subroutine cgeqp3(m, n, a, lda, jpvt, tau, work, lwork, rwork, info)
CGEQP3
Definition cgeqp3.f:159
subroutine cgeqrf(m, n, a, lda, tau, work, lwork, info)
CGEQRF
Definition cgeqrf.f:146
subroutine cgesvj(joba, jobu, jobv, m, n, a, lda, sva, mv, v, ldv, cwork, lwork, rwork, lrwork, info)
CGESVJ
Definition cgesvj.f:351
integer function isamax(n, sx, incx)
ISAMAX
Definition isamax.f:71
integer function icamax(n, cx, incx)
ICAMAX
Definition icamax.f:71
subroutine clacgv(n, x, incx)
CLACGV conjugates a complex vector.
Definition clacgv.f:74
subroutine clacpy(uplo, m, n, a, lda, b, ldb)
CLACPY copies all or part of one two-dimensional array to another.
Definition clacpy.f:103
real function slamch(cmach)
SLAMCH
Definition slamch.f:68
subroutine clapmr(forwrd, m, n, x, ldx, k)
CLAPMR rearranges rows of a matrix as specified by a permutation vector.
Definition clapmr.f:104
subroutine clascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
CLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition clascl.f:143
subroutine slascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition slascl.f:143
subroutine claset(uplo, m, n, alpha, beta, a, lda)
CLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition claset.f:106
subroutine slassq(n, x, incx, scale, sumsq)
SLASSQ updates a sum of squares represented in scaled form.
Definition slassq.f90:124
subroutine classq(n, x, incx, scale, sumsq)
CLASSQ updates a sum of squares represented in scaled form.
Definition classq.f90:124
subroutine claswp(n, a, lda, k1, k2, ipiv, incx)
CLASWP performs a series of row interchanges on a general rectangular matrix.
Definition claswp.f:115
logical function lsame(ca, cb)
LSAME
Definition lsame.f:48
real(wp) function scnrm2(n, x, incx)
SCNRM2
Definition scnrm2.f90:90
subroutine cpocon(uplo, n, a, lda, anorm, rcond, work, rwork, info)
CPOCON
Definition cpocon.f:121
subroutine csscal(n, sa, cx, incx)
CSSCAL
Definition csscal.f:78
subroutine sscal(n, sa, sx, incx)
SSCAL
Definition sscal.f:79
subroutine cswap(n, cx, incx, cy, incy)
CSWAP
Definition cswap.f:81
subroutine ctrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
CTRSM
Definition ctrsm.f:180
subroutine cungqr(m, n, k, a, lda, tau, work, lwork, info)
CUNGQR
Definition cungqr.f:128
subroutine cunmlq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
CUNMLQ
Definition cunmlq.f:168
subroutine cunmqr(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
CUNMQR
Definition cunmqr.f:168