 LAPACK  3.10.1 LAPACK: Linear Algebra PACKage

## ◆ zgejsv()

 subroutine zgejsv ( character*1 JOBA, character*1 JOBU, character*1 JOBV, character*1 JOBR, character*1 JOBT, character*1 JOBP, integer M, integer N, complex*16, dimension( lda, * ) A, integer LDA, double precision, dimension( n ) SVA, complex*16, dimension( ldu, * ) U, integer LDU, complex*16, dimension( ldv, * ) V, integer LDV, complex*16, dimension( lwork ) CWORK, integer LWORK, double precision, dimension( lrwork ) RWORK, integer LRWORK, integer, dimension( * ) IWORK, integer INFO )

ZGEJSV

Purpose:
``` ZGEJSV computes the singular value decomposition (SVD) of a complex M-by-N
matrix [A], where M >= N. The SVD of [A] is written as

[A] = [U] * [SIGMA] * [V]^*,

where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
diagonal elements, [U] is an M-by-N (or M-by-M) unitary matrix, and
[V] is an N-by-N unitary matrix. The diagonal elements of [SIGMA] are
the singular values of [A]. The columns of [U] and [V] are the left and
the right singular vectors of [A], respectively. The matrices [U] and [V]
are computed and stored in the arrays U and V, respectively. The diagonal
of [SIGMA] is computed and stored in the array SVA.```

# Arguments:

Parameters
 [in] JOBA ``` JOBA is CHARACTER*1 Specifies the level of accuracy: = 'C': This option works well (high relative accuracy) if A = B * D, with well-conditioned B and arbitrary diagonal matrix D. The accuracy cannot be spoiled by COLUMN scaling. The accuracy of the computed output depends on the condition of B, and the procedure aims at the best theoretical accuracy. The relative error max_{i=1:N}|d sigma_i| / sigma_i is bounded by f(M,N)*epsilon* cond(B), independent of D. The input matrix is preprocessed with the QRF with column pivoting. This initial preprocessing and preconditioning by a rank revealing QR factorization is common for all values of JOBA. Additional actions are specified as follows: = 'E': Computation as with 'C' with an additional estimate of the condition number of B. It provides a realistic error bound. = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings D1, D2, and well-conditioned matrix C, this option gives higher accuracy than the 'C' option. If the structure of the input matrix is not known, and relative accuracy is desirable, then this option is advisable. The input matrix A is preprocessed with QR factorization with FULL (row and column) pivoting. = 'G': Computation as with 'F' with an additional estimate of the condition number of B, where A=B*D. If A has heavily weighted rows, then using this condition number gives too pessimistic error bound. = 'A': Small singular values are not well determined by the data and are considered as noisy; the matrix is treated as numerically rank deficient. The error in the computed singular values is bounded by f(m,n)*epsilon*||A||. The computed SVD A = U * S * V^* restores A up to f(m,n)*epsilon*||A||. This gives the procedure the licence to discard (set to zero) all singular values below N*epsilon*||A||. = 'R': Similar as in 'A'. Rank revealing property of the initial QR factorization is used do reveal (using triangular factor) a gap sigma_{r+1} < epsilon * sigma_r in which case the numerical RANK is declared to be r. The SVD is computed with absolute error bounds, but more accurately than with 'A'.``` [in] JOBU ``` JOBU is CHARACTER*1 Specifies whether to compute the columns of U: = 'U': N columns of U are returned in the array U. = 'F': full set of M left sing. vectors is returned in the array U. = 'W': U may be used as workspace of length M*N. See the description of U. = 'N': U is not computed.``` [in] JOBV ``` JOBV is CHARACTER*1 Specifies whether to compute the matrix V: = 'V': N columns of V are returned in the array V; Jacobi rotations are not explicitly accumulated. = 'J': N columns of V are returned in the array V, but they are computed as the product of Jacobi rotations, if JOBT = 'N'. = 'W': V may be used as workspace of length N*N. See the description of V. = 'N': V is not computed.``` [in] JOBR ``` JOBR is CHARACTER*1 Specifies the RANGE for the singular values. Issues the licence to set to zero small positive singular values if they are outside specified range. If A .NE. 0 is scaled so that the largest singular value of c*A is around SQRT(BIG), BIG=DLAMCH('O'), then JOBR issues the licence to kill columns of A whose norm in c*A is less than SQRT(SFMIN) (for JOBR = 'R'), or less than SMALL=SFMIN/EPSLN, where SFMIN=DLAMCH('S'), EPSLN=DLAMCH('E'). = 'N': Do not kill small columns of c*A. This option assumes that BLAS and QR factorizations and triangular solvers are implemented to work in that range. If the condition of A is greater than BIG, use ZGESVJ. = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)] (roughly, as described above). This option is recommended. =========================== For computing the singular values in the FULL range [SFMIN,BIG] use ZGESVJ.``` [in] JOBT ``` JOBT is CHARACTER*1 If the matrix is square then the procedure may determine to use transposed A if A^* seems to be better with respect to convergence. If the matrix is not square, JOBT is ignored. The decision is based on two values of entropy over the adjoint orbit of A^* * A. See the descriptions of WORK(6) and WORK(7). = 'T': transpose if entropy test indicates possibly faster convergence of Jacobi process if A^* is taken as input. If A is replaced with A^*, then the row pivoting is included automatically. = 'N': do not speculate. The option 'T' can be used to compute only the singular values, or the full SVD (U, SIGMA and V). For only one set of singular vectors (U or V), the caller should provide both U and V, as one of the matrices is used as workspace if the matrix A is transposed. The implementer can easily remove this constraint and make the code more complicated. See the descriptions of U and V. In general, this option is considered experimental, and 'N'; should be preferred. This is subject to changes in the future.``` [in] JOBP ``` JOBP is CHARACTER*1 Issues the licence to introduce structured perturbations to drown denormalized numbers. This licence should be active if the denormals are poorly implemented, causing slow computation, especially in cases of fast convergence (!). For details see [1,2]. For the sake of simplicity, this perturbations are included only when the full SVD or only the singular values are requested. The implementer/user can easily add the perturbation for the cases of computing one set of singular vectors. = 'P': introduce perturbation = 'N': do not perturb``` [in] M ``` M is INTEGER The number of rows of the input matrix A. M >= 0.``` [in] N ``` N is INTEGER The number of columns of the input matrix A. M >= N >= 0.``` [in,out] A ``` A is COMPLEX*16 array, dimension (LDA,N) On entry, the M-by-N matrix A.``` [in] LDA ``` LDA is INTEGER The leading dimension of the array A. LDA >= max(1,M).``` [out] SVA ``` SVA is DOUBLE PRECISION array, dimension (N) On exit, - For WORK(1)/WORK(2) = ONE: The singular values of A. During the computation SVA contains Euclidean column norms of the iterated matrices in the array A. - For WORK(1) .NE. WORK(2): The singular values of A are (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if sigma_max(A) overflows or if small singular values have been saved from underflow by scaling the input matrix A. - If JOBR='R' then some of the singular values may be returned as exact zeros obtained by "set to zero" because they are below the numerical rank threshold or are denormalized numbers.``` [out] U ``` U is COMPLEX*16 array, dimension ( LDU, N ) If JOBU = 'U', then U contains on exit the M-by-N matrix of the left singular vectors. If JOBU = 'F', then U contains on exit the M-by-M matrix of the left singular vectors, including an ONB of the orthogonal complement of the Range(A). If JOBU = 'W' .AND. (JOBV = 'V' .AND. JOBT = 'T' .AND. M = N), then U is used as workspace if the procedure replaces A with A^*. In that case, [V] is computed in U as left singular vectors of A^* and then copied back to the V array. This 'W' option is just a reminder to the caller that in this case U is reserved as workspace of length N*N. If JOBU = 'N' U is not referenced, unless JOBT='T'.``` [in] LDU ``` LDU is INTEGER The leading dimension of the array U, LDU >= 1. IF JOBU = 'U' or 'F' or 'W', then LDU >= M.``` [out] V ``` V is COMPLEX*16 array, dimension ( LDV, N ) If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of the right singular vectors; If JOBV = 'W', AND (JOBU = 'U' AND JOBT = 'T' AND M = N), then V is used as workspace if the pprocedure replaces A with A^*. In that case, [U] is computed in V as right singular vectors of A^* and then copied back to the U array. This 'W' option is just a reminder to the caller that in this case V is reserved as workspace of length N*N. If JOBV = 'N' V is not referenced, unless JOBT='T'.``` [in] LDV ``` LDV is INTEGER The leading dimension of the array V, LDV >= 1. If JOBV = 'V' or 'J' or 'W', then LDV >= N.``` [out] CWORK ``` CWORK is COMPLEX*16 array, dimension (MAX(2,LWORK)) If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or LRWORK=-1), then on exit CWORK(1) contains the required length of CWORK for the job parameters used in the call.``` [in] LWORK ``` LWORK is INTEGER Length of CWORK to confirm proper allocation of workspace. LWORK depends on the job: 1. If only SIGMA is needed ( JOBU = 'N', JOBV = 'N' ) and 1.1 .. no scaled condition estimate required (JOBA.NE.'E'.AND.JOBA.NE.'G'): LWORK >= 2*N+1. This is the minimal requirement. ->> For optimal performance (blocked code) the optimal value is LWORK >= N + (N+1)*NB. Here NB is the optimal block size for ZGEQP3 and ZGEQRF. In general, optimal LWORK is computed as LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF), LWORK(ZGESVJ)). 1.2. .. an estimate of the scaled condition number of A is required (JOBA='E', or 'G'). In this case, LWORK the minimal requirement is LWORK >= N*N + 2*N. ->> For optimal performance (blocked code) the optimal value is LWORK >= max(N+(N+1)*NB, N*N+2*N)=N**2+2*N. In general, the optimal length LWORK is computed as LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF), LWORK(ZGESVJ), N*N+LWORK(ZPOCON)). 2. If SIGMA and the right singular vectors are needed (JOBV = 'V'), (JOBU = 'N') 2.1 .. no scaled condition estimate requested (JOBE = 'N'): -> the minimal requirement is LWORK >= 3*N. -> For optimal performance, LWORK >= max(N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB, where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQ, ZUNMLQ. In general, the optimal length LWORK is computed as LWORK >= max(N+LWORK(ZGEQP3), N+LWORK(ZGESVJ), N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)). 2.2 .. an estimate of the scaled condition number of A is required (JOBA='E', or 'G'). -> the minimal requirement is LWORK >= 3*N. -> For optimal performance, LWORK >= max(N+(N+1)*NB, 2*N,2*N+N*NB)=2*N+N*NB, where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQ, ZUNMLQ. In general, the optimal length LWORK is computed as LWORK >= max(N+LWORK(ZGEQP3), LWORK(ZPOCON), N+LWORK(ZGESVJ), N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)). 3. If SIGMA and the left singular vectors are needed 3.1 .. no scaled condition estimate requested (JOBE = 'N'): -> the minimal requirement is LWORK >= 3*N. -> For optimal performance: if JOBU = 'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB, where NB is the optimal block size for ZGEQP3, ZGEQRF, ZUNMQR. In general, the optimal length LWORK is computed as LWORK >= max(N+LWORK(ZGEQP3), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)). 3.2 .. an estimate of the scaled condition number of A is required (JOBA='E', or 'G'). -> the minimal requirement is LWORK >= 3*N. -> For optimal performance: if JOBU = 'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB, where NB is the optimal block size for ZGEQP3, ZGEQRF, ZUNMQR. In general, the optimal length LWORK is computed as LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZPOCON), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)). 4. If the full SVD is needed: (JOBU = 'U' or JOBU = 'F') and 4.1. if JOBV = 'V' the minimal requirement is LWORK >= 5*N+2*N*N. 4.2. if JOBV = 'J' the minimal requirement is LWORK >= 4*N+N*N. In both cases, the allocated CWORK can accommodate blocked runs of ZGEQP3, ZGEQRF, ZGELQF, SUNMQR, ZUNMLQ. If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or LRWORK=-1), then on exit CWORK(1) contains the optimal and CWORK(2) contains the minimal length of CWORK for the job parameters used in the call.``` [out] RWORK ``` RWORK is DOUBLE PRECISION array, dimension (MAX(7,LWORK)) On exit, RWORK(1) = Determines the scaling factor SCALE = RWORK(2) / RWORK(1) such that SCALE*SVA(1:N) are the computed singular values of A. (See the description of SVA().) RWORK(2) = See the description of RWORK(1). RWORK(3) = SCONDA is an estimate for the condition number of column equilibrated A. (If JOBA = 'E' or 'G') SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1). It is computed using ZPOCON. It holds N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA where R is the triangular factor from the QRF of A. However, if R is truncated and the numerical rank is determined to be strictly smaller than N, SCONDA is returned as -1, thus indicating that the smallest singular values might be lost. If full SVD is needed, the following two condition numbers are useful for the analysis of the algorithm. They are provided for a developer/implementer who is familiar with the details of the method. RWORK(4) = an estimate of the scaled condition number of the triangular factor in the first QR factorization. RWORK(5) = an estimate of the scaled condition number of the triangular factor in the second QR factorization. The following two parameters are computed if JOBT = 'T'. They are provided for a developer/implementer who is familiar with the details of the method. RWORK(6) = the entropy of A^* * A :: this is the Shannon entropy of diag(A^* * A) / Trace(A^* * A) taken as point in the probability simplex. RWORK(7) = the entropy of A * A^*. (See the description of RWORK(6).) If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or LRWORK=-1), then on exit RWORK(1) contains the required length of RWORK for the job parameters used in the call.``` [in] LRWORK ``` LRWORK is INTEGER Length of RWORK to confirm proper allocation of workspace. LRWORK depends on the job: 1. If only the singular values are requested i.e. if LSAME(JOBU,'N') .AND. LSAME(JOBV,'N') then: 1.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), then: LRWORK = max( 7, 2 * M ). 1.2. Otherwise, LRWORK = max( 7, N ). 2. If singular values with the right singular vectors are requested i.e. if (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND. .NOT.(LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) then: 2.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), then LRWORK = max( 7, 2 * M ). 2.2. Otherwise, LRWORK = max( 7, N ). 3. If singular values with the left singular vectors are requested, i.e. if (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND. .NOT.(LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) then: 3.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), then LRWORK = max( 7, 2 * M ). 3.2. Otherwise, LRWORK = max( 7, N ). 4. If singular values with both the left and the right singular vectors are requested, i.e. if (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND. (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) then: 4.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), then LRWORK = max( 7, 2 * M ). 4.2. Otherwise, LRWORK = max( 7, N ). If, on entry, LRWORK = -1 or LWORK=-1, a workspace query is assumed and the length of RWORK is returned in RWORK(1). ``` [out] IWORK ``` IWORK is INTEGER array, of dimension at least 4, that further depends on the job: 1. If only the singular values are requested then: If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) then the length of IWORK is N+M; otherwise the length of IWORK is N. 2. If the singular values and the right singular vectors are requested then: If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) then the length of IWORK is N+M; otherwise the length of IWORK is N. 3. If the singular values and the left singular vectors are requested then: If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) then the length of IWORK is N+M; otherwise the length of IWORK is N. 4. If the singular values with both the left and the right singular vectors are requested, then: 4.1. If LSAME(JOBV,'J') the length of IWORK is determined as follows: If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) then the length of IWORK is N+M; otherwise the length of IWORK is N. 4.2. If LSAME(JOBV,'V') the length of IWORK is determined as follows: If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) then the length of IWORK is 2*N+M; otherwise the length of IWORK is 2*N. On exit, IWORK(1) = the numerical rank determined after the initial QR factorization with pivoting. See the descriptions of JOBA and JOBR. IWORK(2) = the number of the computed nonzero singular values IWORK(3) = if nonzero, a warning message: If IWORK(3) = 1 then some of the column norms of A were denormalized floats. The requested high accuracy is not warranted by the data. IWORK(4) = 1 or -1. If IWORK(4) = 1, then the procedure used A^* to do the job as specified by the JOB parameters. If the call to ZGEJSV is a workspace query (indicated by LWORK = -1 or LRWORK = -1), then on exit IWORK(1) contains the required length of IWORK for the job parameters used in the call.``` [out] INFO ``` INFO is INTEGER < 0: if INFO = -i, then the i-th argument had an illegal value. = 0: successful exit; > 0: ZGEJSV did not converge in the maximal allowed number of sweeps. The computed values may be inaccurate.```
Further Details:
```  ZGEJSV implements a preconditioned Jacobi SVD algorithm. It uses ZGEQP3,
ZGEQRF, and ZGELQF as preprocessors and preconditioners. Optionally, an
additional row pivoting can be used as a preprocessor, which in some
cases results in much higher accuracy. An example is matrix A with the
structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
diagonal matrices and C is well-conditioned matrix. In that case, complete
pivoting in the first QR factorizations provides accuracy dependent on the
condition number of C, and independent of D1, D2. Such higher accuracy is
not completely understood theoretically, but it works well in practice.
Further, if A can be written as A = B*D, with well-conditioned B and some
diagonal D, then the high accuracy is guaranteed, both theoretically and
in software, independent of D. For more details see , .
The computational range for the singular values can be the full range
( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
& LAPACK routines called by ZGEJSV are implemented to work in that range.
If that is not the case, then the restriction for safe computation with
the singular values in the range of normalized IEEE numbers is that the
spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
overflow. This code (ZGEJSV) is best used in this restricted range,
meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
returned as zeros. See JOBR for details on this.
Further, this implementation is somewhat slower than the one described
in [1,2] due to replacement of some non-LAPACK components, and because
the choice of some tuning parameters in the iterative part (ZGESVJ) is
left to the implementer on a particular machine.
The rank revealing QR factorization (in this code: ZGEQP3) should be
implemented as in . We have a new version of ZGEQP3 under development
that is more robust than the current one in LAPACK, with a cleaner cut in
rank deficient cases. It will be available in the SIGMA library .
If M is much larger than N, it is obvious that the initial QRF with
column pivoting can be preprocessed by the QRF without pivoting. That
well known trick is not used in ZGEJSV because in some cases heavy row
weighting can be treated with complete pivoting. The overhead in cases
M much larger than N is then only due to pivoting, but the benefits in
terms of accuracy have prevailed. The implementer/user can incorporate
this extra QRF step easily. The implementer can also improve data movement
(matrix transpose, matrix copy, matrix transposed copy) - this
implementation of ZGEJSV uses only the simplest, naive data movement.```
Contributor:
Zlatko Drmac, Department of Mathematics, Faculty of Science, University of Zagreb (Zagreb, Croatia); drmac.nosp@m.@mat.nosp@m.h.hr
References:
```  Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
LAPACK Working note 169.
 Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
LAPACK Working note 170.
 Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
factorization software - a case study.
ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
LAPACK Working note 176.
 Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
QSVD, (H,K)-SVD computations.
Department of Mathematics, University of Zagreb, 2008, 2016.```
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 566 of file zgejsv.f.

569 *
570 * -- LAPACK computational routine --
571 * -- LAPACK is a software package provided by Univ. of Tennessee, --
572 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
573 *
574 * .. Scalar Arguments ..
575  IMPLICIT NONE
576  INTEGER INFO, LDA, LDU, LDV, LWORK, LRWORK, M, N
577 * ..
578 * .. Array Arguments ..
579  COMPLEX*16 A( LDA, * ), U( LDU, * ), V( LDV, * ),
580  \$ CWORK( LWORK )
581  DOUBLE PRECISION SVA( N ), RWORK( LRWORK )
582  INTEGER IWORK( * )
583  CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
584 * ..
585 *
586 * ===========================================================================
587 *
588 * .. Local Parameters ..
589  DOUBLE PRECISION ZERO, ONE
590  parameter( zero = 0.0d0, one = 1.0d0 )
591  COMPLEX*16 CZERO, CONE
592  parameter( czero = ( 0.0d0, 0.0d0 ), cone = ( 1.0d0, 0.0d0 ) )
593 * ..
594 * .. Local Scalars ..
595  COMPLEX*16 CTEMP
596  DOUBLE PRECISION AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1,
597  \$ COND_OK, CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN,
598  \$ MAXPRJ, SCALEM, SCONDA, SFMIN, SMALL, TEMP1,
599  \$ USCAL1, USCAL2, XSC
600  INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
601  LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LQUERY,
602  \$ LSVEC, L2ABER, L2KILL, L2PERT, L2RANK, L2TRAN, NOSCAL,
603  \$ ROWPIV, RSVEC, TRANSP
604 *
605  INTEGER OPTWRK, MINWRK, MINRWRK, MINIWRK
606  INTEGER LWCON, LWLQF, LWQP3, LWQRF, LWUNMLQ, LWUNMQR, LWUNMQRM,
607  \$ LWSVDJ, LWSVDJV, LRWQP3, LRWCON, LRWSVDJ, IWOFF
608  INTEGER LWRK_ZGELQF, LWRK_ZGEQP3, LWRK_ZGEQP3N, LWRK_ZGEQRF,
609  \$ LWRK_ZGESVJ, LWRK_ZGESVJV, LWRK_ZGESVJU, LWRK_ZUNMLQ,
610  \$ LWRK_ZUNMQR, LWRK_ZUNMQRM
611 * ..
612 * .. Local Arrays
613  COMPLEX*16 CDUMMY(1)
614  DOUBLE PRECISION RDUMMY(1)
615 *
616 * .. Intrinsic Functions ..
617  INTRINSIC abs, dcmplx, conjg, dlog, max, min, dble, nint, sqrt
618 * ..
619 * .. External Functions ..
620  DOUBLE PRECISION DLAMCH, DZNRM2
621  INTEGER IDAMAX, IZAMAX
622  LOGICAL LSAME
623  EXTERNAL idamax, izamax, lsame, dlamch, dznrm2
624 * ..
625 * .. External Subroutines ..
626  EXTERNAL dlassq, zcopy, zgelqf, zgeqp3, zgeqrf, zlacpy, zlapmr,
629  \$ xerbla
630 *
631  EXTERNAL zgesvj
632 * ..
633 *
634 * Test the input arguments
635 *
636  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
637  jracc = lsame( jobv, 'J' )
638  rsvec = lsame( jobv, 'V' ) .OR. jracc
639  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
640  l2rank = lsame( joba, 'R' )
641  l2aber = lsame( joba, 'A' )
642  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
643  l2tran = lsame( jobt, 'T' ) .AND. ( m .EQ. n )
644  l2kill = lsame( jobr, 'R' )
645  defr = lsame( jobr, 'N' )
646  l2pert = lsame( jobp, 'P' )
647 *
648  lquery = ( lwork .EQ. -1 ) .OR. ( lrwork .EQ. -1 )
649 *
650  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
651  \$ errest .OR. lsame( joba, 'C' ) )) THEN
652  info = - 1
653  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
654  \$ ( lsame( jobu, 'W' ) .AND. rsvec .AND. l2tran ) ) ) THEN
655  info = - 2
656  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
657  \$ ( lsame( jobv, 'W' ) .AND. lsvec .AND. l2tran ) ) ) THEN
658  info = - 3
659  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
660  info = - 4
661  ELSE IF ( .NOT. ( lsame(jobt,'T') .OR. lsame(jobt,'N') ) ) THEN
662  info = - 5
663  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
664  info = - 6
665  ELSE IF ( m .LT. 0 ) THEN
666  info = - 7
667  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
668  info = - 8
669  ELSE IF ( lda .LT. m ) THEN
670  info = - 10
671  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
672  info = - 13
673  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
674  info = - 15
675  ELSE
676 * #:)
677  info = 0
678  END IF
679 *
680  IF ( info .EQ. 0 ) THEN
681 * .. compute the minimal and the optimal workspace lengths
682 * [[The expressions for computing the minimal and the optimal
683 * values of LCWORK, LRWORK are written with a lot of redundancy and
684 * can be simplified. However, this verbose form is useful for
685 * maintenance and modifications of the code.]]
686 *
687 * .. minimal workspace length for ZGEQP3 of an M x N matrix,
688 * ZGEQRF of an N x N matrix, ZGELQF of an N x N matrix,
689 * ZUNMLQ for computing N x N matrix, ZUNMQR for computing N x N
690 * matrix, ZUNMQR for computing M x N matrix, respectively.
691  lwqp3 = n+1
692  lwqrf = max( 1, n )
693  lwlqf = max( 1, n )
694  lwunmlq = max( 1, n )
695  lwunmqr = max( 1, n )
696  lwunmqrm = max( 1, m )
697 * .. minimal workspace length for ZPOCON of an N x N matrix
698  lwcon = 2 * n
699 * .. minimal workspace length for ZGESVJ of an N x N matrix,
700 * without and with explicit accumulation of Jacobi rotations
701  lwsvdj = max( 2 * n, 1 )
702  lwsvdjv = max( 2 * n, 1 )
703 * .. minimal REAL workspace length for ZGEQP3, ZPOCON, ZGESVJ
704  lrwqp3 = 2 * n
705  lrwcon = n
706  lrwsvdj = n
707  IF ( lquery ) THEN
708  CALL zgeqp3( m, n, a, lda, iwork, cdummy, cdummy, -1,
709  \$ rdummy, ierr )
710  lwrk_zgeqp3 = dble( cdummy(1) )
711  CALL zgeqrf( n, n, a, lda, cdummy, cdummy,-1, ierr )
712  lwrk_zgeqrf = dble( cdummy(1) )
713  CALL zgelqf( n, n, a, lda, cdummy, cdummy,-1, ierr )
714  lwrk_zgelqf = dble( cdummy(1) )
715  END IF
716  minwrk = 2
717  optwrk = 2
718  miniwrk = n
719  IF ( .NOT. (lsvec .OR. rsvec ) ) THEN
720 * .. minimal and optimal sizes of the complex workspace if
721 * only the singular values are requested
722  IF ( errest ) THEN
723  minwrk = max( n+lwqp3, n**2+lwcon, n+lwqrf, lwsvdj )
724  ELSE
725  minwrk = max( n+lwqp3, n+lwqrf, lwsvdj )
726  END IF
727  IF ( lquery ) THEN
728  CALL zgesvj( 'L', 'N', 'N', n, n, a, lda, sva, n, v,
729  \$ ldv, cdummy, -1, rdummy, -1, ierr )
730  lwrk_zgesvj = dble( cdummy(1) )
731  IF ( errest ) THEN
732  optwrk = max( n+lwrk_zgeqp3, n**2+lwcon,
733  \$ n+lwrk_zgeqrf, lwrk_zgesvj )
734  ELSE
735  optwrk = max( n+lwrk_zgeqp3, n+lwrk_zgeqrf,
736  \$ lwrk_zgesvj )
737  END IF
738  END IF
739  IF ( l2tran .OR. rowpiv ) THEN
740  IF ( errest ) THEN
741  minrwrk = max( 7, 2*m, lrwqp3, lrwcon, lrwsvdj )
742  ELSE
743  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
744  END IF
745  ELSE
746  IF ( errest ) THEN
747  minrwrk = max( 7, lrwqp3, lrwcon, lrwsvdj )
748  ELSE
749  minrwrk = max( 7, lrwqp3, lrwsvdj )
750  END IF
751  END IF
752  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
753  ELSE IF ( rsvec .AND. (.NOT.lsvec) ) THEN
754 * .. minimal and optimal sizes of the complex workspace if the
755 * singular values and the right singular vectors are requested
756  IF ( errest ) THEN
757  minwrk = max( n+lwqp3, lwcon, lwsvdj, n+lwlqf,
758  \$ 2*n+lwqrf, n+lwsvdj, n+lwunmlq )
759  ELSE
760  minwrk = max( n+lwqp3, lwsvdj, n+lwlqf, 2*n+lwqrf,
761  \$ n+lwsvdj, n+lwunmlq )
762  END IF
763  IF ( lquery ) THEN
764  CALL zgesvj( 'L', 'U', 'N', n,n, u, ldu, sva, n, a,
765  \$ lda, cdummy, -1, rdummy, -1, ierr )
766  lwrk_zgesvj = dble( cdummy(1) )
767  CALL zunmlq( 'L', 'C', n, n, n, a, lda, cdummy,
768  \$ v, ldv, cdummy, -1, ierr )
769  lwrk_zunmlq = dble( cdummy(1) )
770  IF ( errest ) THEN
771  optwrk = max( n+lwrk_zgeqp3, lwcon, lwrk_zgesvj,
772  \$ n+lwrk_zgelqf, 2*n+lwrk_zgeqrf,
773  \$ n+lwrk_zgesvj, n+lwrk_zunmlq )
774  ELSE
775  optwrk = max( n+lwrk_zgeqp3, lwrk_zgesvj,n+lwrk_zgelqf,
776  \$ 2*n+lwrk_zgeqrf, n+lwrk_zgesvj,
777  \$ n+lwrk_zunmlq )
778  END IF
779  END IF
780  IF ( l2tran .OR. rowpiv ) THEN
781  IF ( errest ) THEN
782  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
783  ELSE
784  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
785  END IF
786  ELSE
787  IF ( errest ) THEN
788  minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
789  ELSE
790  minrwrk = max( 7, lrwqp3, lrwsvdj )
791  END IF
792  END IF
793  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
794  ELSE IF ( lsvec .AND. (.NOT.rsvec) ) THEN
795 * .. minimal and optimal sizes of the complex workspace if the
796 * singular values and the left singular vectors are requested
797  IF ( errest ) THEN
798  minwrk = n + max( lwqp3,lwcon,n+lwqrf,lwsvdj,lwunmqrm )
799  ELSE
800  minwrk = n + max( lwqp3, n+lwqrf, lwsvdj, lwunmqrm )
801  END IF
802  IF ( lquery ) THEN
803  CALL zgesvj( 'L', 'U', 'N', n,n, u, ldu, sva, n, a,
804  \$ lda, cdummy, -1, rdummy, -1, ierr )
805  lwrk_zgesvj = dble( cdummy(1) )
806  CALL zunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
807  \$ ldu, cdummy, -1, ierr )
808  lwrk_zunmqrm = dble( cdummy(1) )
809  IF ( errest ) THEN
810  optwrk = n + max( lwrk_zgeqp3, lwcon, n+lwrk_zgeqrf,
811  \$ lwrk_zgesvj, lwrk_zunmqrm )
812  ELSE
813  optwrk = n + max( lwrk_zgeqp3, n+lwrk_zgeqrf,
814  \$ lwrk_zgesvj, lwrk_zunmqrm )
815  END IF
816  END IF
817  IF ( l2tran .OR. rowpiv ) THEN
818  IF ( errest ) THEN
819  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
820  ELSE
821  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
822  END IF
823  ELSE
824  IF ( errest ) THEN
825  minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
826  ELSE
827  minrwrk = max( 7, lrwqp3, lrwsvdj )
828  END IF
829  END IF
830  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
831  ELSE
832 * .. minimal and optimal sizes of the complex workspace if the
833 * full SVD is requested
834  IF ( .NOT. jracc ) THEN
835  IF ( errest ) THEN
836  minwrk = max( n+lwqp3, n+lwcon, 2*n+n**2+lwcon,
837  \$ 2*n+lwqrf, 2*n+lwqp3,
838  \$ 2*n+n**2+n+lwlqf, 2*n+n**2+n+n**2+lwcon,
839  \$ 2*n+n**2+n+lwsvdj, 2*n+n**2+n+lwsvdjv,
840  \$ 2*n+n**2+n+lwunmqr,2*n+n**2+n+lwunmlq,
841  \$ n+n**2+lwsvdj, n+lwunmqrm )
842  ELSE
843  minwrk = max( n+lwqp3, 2*n+n**2+lwcon,
844  \$ 2*n+lwqrf, 2*n+lwqp3,
845  \$ 2*n+n**2+n+lwlqf, 2*n+n**2+n+n**2+lwcon,
846  \$ 2*n+n**2+n+lwsvdj, 2*n+n**2+n+lwsvdjv,
847  \$ 2*n+n**2+n+lwunmqr,2*n+n**2+n+lwunmlq,
848  \$ n+n**2+lwsvdj, n+lwunmqrm )
849  END IF
850  miniwrk = miniwrk + n
851  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
852  ELSE
853  IF ( errest ) THEN
854  minwrk = max( n+lwqp3, n+lwcon, 2*n+lwqrf,
855  \$ 2*n+n**2+lwsvdjv, 2*n+n**2+n+lwunmqr,
856  \$ n+lwunmqrm )
857  ELSE
858  minwrk = max( n+lwqp3, 2*n+lwqrf,
859  \$ 2*n+n**2+lwsvdjv, 2*n+n**2+n+lwunmqr,
860  \$ n+lwunmqrm )
861  END IF
862  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
863  END IF
864  IF ( lquery ) THEN
865  CALL zunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
866  \$ ldu, cdummy, -1, ierr )
867  lwrk_zunmqrm = dble( cdummy(1) )
868  CALL zunmqr( 'L', 'N', n, n, n, a, lda, cdummy, u,
869  \$ ldu, cdummy, -1, ierr )
870  lwrk_zunmqr = dble( cdummy(1) )
871  IF ( .NOT. jracc ) THEN
872  CALL zgeqp3( n,n, a, lda, iwork, cdummy,cdummy, -1,
873  \$ rdummy, ierr )
874  lwrk_zgeqp3n = dble( cdummy(1) )
875  CALL zgesvj( 'L', 'U', 'N', n, n, u, ldu, sva,
876  \$ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
877  lwrk_zgesvj = dble( cdummy(1) )
878  CALL zgesvj( 'U', 'U', 'N', n, n, u, ldu, sva,
879  \$ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
880  lwrk_zgesvju = dble( cdummy(1) )
881  CALL zgesvj( 'L', 'U', 'V', n, n, u, ldu, sva,
882  \$ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
883  lwrk_zgesvjv = dble( cdummy(1) )
884  CALL zunmlq( 'L', 'C', n, n, n, a, lda, cdummy,
885  \$ v, ldv, cdummy, -1, ierr )
886  lwrk_zunmlq = dble( cdummy(1) )
887  IF ( errest ) THEN
888  optwrk = max( n+lwrk_zgeqp3, n+lwcon,
889  \$ 2*n+n**2+lwcon, 2*n+lwrk_zgeqrf,
890  \$ 2*n+lwrk_zgeqp3n,
891  \$ 2*n+n**2+n+lwrk_zgelqf,
892  \$ 2*n+n**2+n+n**2+lwcon,
893  \$ 2*n+n**2+n+lwrk_zgesvj,
894  \$ 2*n+n**2+n+lwrk_zgesvjv,
895  \$ 2*n+n**2+n+lwrk_zunmqr,
896  \$ 2*n+n**2+n+lwrk_zunmlq,
897  \$ n+n**2+lwrk_zgesvju,
898  \$ n+lwrk_zunmqrm )
899  ELSE
900  optwrk = max( n+lwrk_zgeqp3,
901  \$ 2*n+n**2+lwcon, 2*n+lwrk_zgeqrf,
902  \$ 2*n+lwrk_zgeqp3n,
903  \$ 2*n+n**2+n+lwrk_zgelqf,
904  \$ 2*n+n**2+n+n**2+lwcon,
905  \$ 2*n+n**2+n+lwrk_zgesvj,
906  \$ 2*n+n**2+n+lwrk_zgesvjv,
907  \$ 2*n+n**2+n+lwrk_zunmqr,
908  \$ 2*n+n**2+n+lwrk_zunmlq,
909  \$ n+n**2+lwrk_zgesvju,
910  \$ n+lwrk_zunmqrm )
911  END IF
912  ELSE
913  CALL zgesvj( 'L', 'U', 'V', n, n, u, ldu, sva,
914  \$ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
915  lwrk_zgesvjv = dble( cdummy(1) )
916  CALL zunmqr( 'L', 'N', n, n, n, cdummy, n, cdummy,
917  \$ v, ldv, cdummy, -1, ierr )
918  lwrk_zunmqr = dble( cdummy(1) )
919  CALL zunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
920  \$ ldu, cdummy, -1, ierr )
921  lwrk_zunmqrm = dble( cdummy(1) )
922  IF ( errest ) THEN
923  optwrk = max( n+lwrk_zgeqp3, n+lwcon,
924  \$ 2*n+lwrk_zgeqrf, 2*n+n**2,
925  \$ 2*n+n**2+lwrk_zgesvjv,
926  \$ 2*n+n**2+n+lwrk_zunmqr,n+lwrk_zunmqrm )
927  ELSE
928  optwrk = max( n+lwrk_zgeqp3, 2*n+lwrk_zgeqrf,
929  \$ 2*n+n**2, 2*n+n**2+lwrk_zgesvjv,
930  \$ 2*n+n**2+n+lwrk_zunmqr,
931  \$ n+lwrk_zunmqrm )
932  END IF
933  END IF
934  END IF
935  IF ( l2tran .OR. rowpiv ) THEN
936  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
937  ELSE
938  minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
939  END IF
940  END IF
941  minwrk = max( 2, minwrk )
942  optwrk = max( minwrk, optwrk )
943  IF ( lwork .LT. minwrk .AND. (.NOT.lquery) ) info = - 17
944  IF ( lrwork .LT. minrwrk .AND. (.NOT.lquery) ) info = - 19
945  END IF
946 *
947  IF ( info .NE. 0 ) THEN
948 * #:(
949  CALL xerbla( 'ZGEJSV', - info )
950  RETURN
951  ELSE IF ( lquery ) THEN
952  cwork(1) = optwrk
953  cwork(2) = minwrk
954  rwork(1) = minrwrk
955  iwork(1) = max( 4, miniwrk )
956  RETURN
957  END IF
958 *
959 * Quick return for void matrix (Y3K safe)
960 * #:)
961  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
962  iwork(1:4) = 0
963  rwork(1:7) = 0
964  RETURN
965  ENDIF
966 *
967 * Determine whether the matrix U should be M x N or M x M
968 *
969  IF ( lsvec ) THEN
970  n1 = n
971  IF ( lsame( jobu, 'F' ) ) n1 = m
972  END IF
973 *
974 * Set numerical parameters
975 *
976 *! NOTE: Make sure DLAMCH() does not fail on the target architecture.
977 *
978  epsln = dlamch('Epsilon')
979  sfmin = dlamch('SafeMinimum')
980  small = sfmin / epsln
981  big = dlamch('O')
982 * BIG = ONE / SFMIN
983 *
984 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
985 *
986 *(!) If necessary, scale SVA() to protect the largest norm from
987 * overflow. It is possible that this scaling pushes the smallest
988 * column norm left from the underflow threshold (extreme case).
989 *
990  scalem = one / sqrt(dble(m)*dble(n))
991  noscal = .true.
992  goscal = .true.
993  DO 1874 p = 1, n
994  aapp = zero
995  aaqq = one
996  CALL zlassq( m, a(1,p), 1, aapp, aaqq )
997  IF ( aapp .GT. big ) THEN
998  info = - 9
999  CALL xerbla( 'ZGEJSV', -info )
1000  RETURN
1001  END IF
1002  aaqq = sqrt(aaqq)
1003  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
1004  sva(p) = aapp * aaqq
1005  ELSE
1006  noscal = .false.
1007  sva(p) = aapp * ( aaqq * scalem )
1008  IF ( goscal ) THEN
1009  goscal = .false.
1010  CALL dscal( p-1, scalem, sva, 1 )
1011  END IF
1012  END IF
1013  1874 CONTINUE
1014 *
1015  IF ( noscal ) scalem = one
1016 *
1017  aapp = zero
1018  aaqq = big
1019  DO 4781 p = 1, n
1020  aapp = max( aapp, sva(p) )
1021  IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
1022  4781 CONTINUE
1023 *
1024 * Quick return for zero M x N matrix
1025 * #:)
1026  IF ( aapp .EQ. zero ) THEN
1027  IF ( lsvec ) CALL zlaset( 'G', m, n1, czero, cone, u, ldu )
1028  IF ( rsvec ) CALL zlaset( 'G', n, n, czero, cone, v, ldv )
1029  rwork(1) = one
1030  rwork(2) = one
1031  IF ( errest ) rwork(3) = one
1032  IF ( lsvec .AND. rsvec ) THEN
1033  rwork(4) = one
1034  rwork(5) = one
1035  END IF
1036  IF ( l2tran ) THEN
1037  rwork(6) = zero
1038  rwork(7) = zero
1039  END IF
1040  iwork(1) = 0
1041  iwork(2) = 0
1042  iwork(3) = 0
1043  iwork(4) = -1
1044  RETURN
1045  END IF
1046 *
1047 * Issue warning if denormalized column norms detected. Override the
1048 * high relative accuracy request. Issue licence to kill nonzero columns
1049 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
1050 * #:(
1051  warning = 0
1052  IF ( aaqq .LE. sfmin ) THEN
1053  l2rank = .true.
1054  l2kill = .true.
1055  warning = 1
1056  END IF
1057 *
1058 * Quick return for one-column matrix
1059 * #:)
1060  IF ( n .EQ. 1 ) THEN
1061 *
1062  IF ( lsvec ) THEN
1063  CALL zlascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
1064  CALL zlacpy( 'A', m, 1, a, lda, u, ldu )
1065 * computing all M left singular vectors of the M x 1 matrix
1066  IF ( n1 .NE. n ) THEN
1067  CALL zgeqrf( m, n, u,ldu, cwork, cwork(n+1),lwork-n,ierr )
1068  CALL zungqr( m,n1,1, u,ldu,cwork,cwork(n+1),lwork-n,ierr )
1069  CALL zcopy( m, a(1,1), 1, u(1,1), 1 )
1070  END IF
1071  END IF
1072  IF ( rsvec ) THEN
1073  v(1,1) = cone
1074  END IF
1075  IF ( sva(1) .LT. (big*scalem) ) THEN
1076  sva(1) = sva(1) / scalem
1077  scalem = one
1078  END IF
1079  rwork(1) = one / scalem
1080  rwork(2) = one
1081  IF ( sva(1) .NE. zero ) THEN
1082  iwork(1) = 1
1083  IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
1084  iwork(2) = 1
1085  ELSE
1086  iwork(2) = 0
1087  END IF
1088  ELSE
1089  iwork(1) = 0
1090  iwork(2) = 0
1091  END IF
1092  iwork(3) = 0
1093  iwork(4) = -1
1094  IF ( errest ) rwork(3) = one
1095  IF ( lsvec .AND. rsvec ) THEN
1096  rwork(4) = one
1097  rwork(5) = one
1098  END IF
1099  IF ( l2tran ) THEN
1100  rwork(6) = zero
1101  rwork(7) = zero
1102  END IF
1103  RETURN
1104 *
1105  END IF
1106 *
1107  transp = .false.
1108 *
1109  aatmax = -one
1110  aatmin = big
1111  IF ( rowpiv .OR. l2tran ) THEN
1112 *
1113 * Compute the row norms, needed to determine row pivoting sequence
1114 * (in the case of heavily row weighted A, row pivoting is strongly
1115 * advised) and to collect information needed to compare the
1116 * structures of A * A^* and A^* * A (in the case L2TRAN.EQ..TRUE.).
1117 *
1118  IF ( l2tran ) THEN
1119  DO 1950 p = 1, m
1120  xsc = zero
1121  temp1 = one
1122  CALL zlassq( n, a(p,1), lda, xsc, temp1 )
1123 * ZLASSQ gets both the ell_2 and the ell_infinity norm
1124 * in one pass through the vector
1125  rwork(m+p) = xsc * scalem
1126  rwork(p) = xsc * (scalem*sqrt(temp1))
1127  aatmax = max( aatmax, rwork(p) )
1128  IF (rwork(p) .NE. zero)
1129  \$ aatmin = min(aatmin,rwork(p))
1130  1950 CONTINUE
1131  ELSE
1132  DO 1904 p = 1, m
1133  rwork(m+p) = scalem*abs( a(p,izamax(n,a(p,1),lda)) )
1134  aatmax = max( aatmax, rwork(m+p) )
1135  aatmin = min( aatmin, rwork(m+p) )
1136  1904 CONTINUE
1137  END IF
1138 *
1139  END IF
1140 *
1141 * For square matrix A try to determine whether A^* would be better
1142 * input for the preconditioned Jacobi SVD, with faster convergence.
1143 * The decision is based on an O(N) function of the vector of column
1144 * and row norms of A, based on the Shannon entropy. This should give
1145 * the right choice in most cases when the difference actually matters.
1146 * It may fail and pick the slower converging side.
1147 *
1148  entra = zero
1149  entrat = zero
1150  IF ( l2tran ) THEN
1151 *
1152  xsc = zero
1153  temp1 = one
1154  CALL dlassq( n, sva, 1, xsc, temp1 )
1155  temp1 = one / temp1
1156 *
1157  entra = zero
1158  DO 1113 p = 1, n
1159  big1 = ( ( sva(p) / xsc )**2 ) * temp1
1160  IF ( big1 .NE. zero ) entra = entra + big1 * dlog(big1)
1161  1113 CONTINUE
1162  entra = - entra / dlog(dble(n))
1163 *
1164 * Now, SVA().^2/Trace(A^* * A) is a point in the probability simplex.
1165 * It is derived from the diagonal of A^* * A. Do the same with the
1166 * diagonal of A * A^*, compute the entropy of the corresponding
1167 * probability distribution. Note that A * A^* and A^* * A have the
1168 * same trace.
1169 *
1170  entrat = zero
1171  DO 1114 p = 1, m
1172  big1 = ( ( rwork(p) / xsc )**2 ) * temp1
1173  IF ( big1 .NE. zero ) entrat = entrat + big1 * dlog(big1)
1174  1114 CONTINUE
1175  entrat = - entrat / dlog(dble(m))
1176 *
1177 * Analyze the entropies and decide A or A^*. Smaller entropy
1178 * usually means better input for the algorithm.
1179 *
1180  transp = ( entrat .LT. entra )
1181 *
1182 * If A^* is better than A, take the adjoint of A. This is allowed
1183 * only for square matrices, M=N.
1184  IF ( transp ) THEN
1185 * In an optimal implementation, this trivial transpose
1186 * should be replaced with faster transpose.
1187  DO 1115 p = 1, n - 1
1188  a(p,p) = conjg(a(p,p))
1189  DO 1116 q = p + 1, n
1190  ctemp = conjg(a(q,p))
1191  a(q,p) = conjg(a(p,q))
1192  a(p,q) = ctemp
1193  1116 CONTINUE
1194  1115 CONTINUE
1195  a(n,n) = conjg(a(n,n))
1196  DO 1117 p = 1, n
1197  rwork(m+p) = sva(p)
1198  sva(p) = rwork(p)
1199 * previously computed row 2-norms are now column 2-norms
1200 * of the transposed matrix
1201  1117 CONTINUE
1202  temp1 = aapp
1203  aapp = aatmax
1204  aatmax = temp1
1205  temp1 = aaqq
1206  aaqq = aatmin
1207  aatmin = temp1
1208  kill = lsvec
1209  lsvec = rsvec
1210  rsvec = kill
1211  IF ( lsvec ) n1 = n
1212 *
1213  rowpiv = .true.
1214  END IF
1215 *
1216  END IF
1217 * END IF L2TRAN
1218 *
1219 * Scale the matrix so that its maximal singular value remains less
1220 * than SQRT(BIG) -- the matrix is scaled so that its maximal column
1221 * has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
1222 * SQRT(BIG) instead of BIG is the fact that ZGEJSV uses LAPACK and
1223 * BLAS routines that, in some implementations, are not capable of
1224 * working in the full interval [SFMIN,BIG] and that they may provoke
1225 * overflows in the intermediate results. If the singular values spread
1226 * from SFMIN to BIG, then ZGESVJ will compute them. So, in that case,
1227 * one should use ZGESVJ instead of ZGEJSV.
1228 * >> change in the April 2016 update: allow bigger range, i.e. the
1229 * largest column is allowed up to BIG/N and ZGESVJ will do the rest.
1230  big1 = sqrt( big )
1231  temp1 = sqrt( big / dble(n) )
1232 * TEMP1 = BIG/DBLE(N)
1233 *
1234  CALL dlascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
1235  IF ( aaqq .GT. (aapp * sfmin) ) THEN
1236  aaqq = ( aaqq / aapp ) * temp1
1237  ELSE
1238  aaqq = ( aaqq * temp1 ) / aapp
1239  END IF
1240  temp1 = temp1 * scalem
1241  CALL zlascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
1242 *
1243 * To undo scaling at the end of this procedure, multiply the
1244 * computed singular values with USCAL2 / USCAL1.
1245 *
1246  uscal1 = temp1
1247  uscal2 = aapp
1248 *
1249  IF ( l2kill ) THEN
1250 * L2KILL enforces computation of nonzero singular values in
1251 * the restricted range of condition number of the initial A,
1252 * sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
1253  xsc = sqrt( sfmin )
1254  ELSE
1255  xsc = small
1256 *
1257 * Now, if the condition number of A is too big,
1258 * sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
1259 * as a precaution measure, the full SVD is computed using ZGESVJ
1260 * with accumulated Jacobi rotations. This provides numerically
1261 * more robust computation, at the cost of slightly increased run
1262 * time. Depending on the concrete implementation of BLAS and LAPACK
1263 * (i.e. how they behave in presence of extreme ill-conditioning) the
1264 * implementor may decide to remove this switch.
1265  IF ( ( aaqq.LT.sqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
1266  jracc = .true.
1267  END IF
1268 *
1269  END IF
1270  IF ( aaqq .LT. xsc ) THEN
1271  DO 700 p = 1, n
1272  IF ( sva(p) .LT. xsc ) THEN
1273  CALL zlaset( 'A', m, 1, czero, czero, a(1,p), lda )
1274  sva(p) = zero
1275  END IF
1276  700 CONTINUE
1277  END IF
1278 *
1279 * Preconditioning using QR factorization with pivoting
1280 *
1281  IF ( rowpiv ) THEN
1282 * Optional row permutation (Bjoerck row pivoting):
1283 * A result by Cox and Higham shows that the Bjoerck's
1284 * row pivoting combined with standard column pivoting
1285 * has similar effect as Powell-Reid complete pivoting.
1286 * The ell-infinity norms of A are made nonincreasing.
1287  IF ( ( lsvec .AND. rsvec ) .AND. .NOT.( jracc ) ) THEN
1288  iwoff = 2*n
1289  ELSE
1290  iwoff = n
1291  END IF
1292  DO 1952 p = 1, m - 1
1293  q = idamax( m-p+1, rwork(m+p), 1 ) + p - 1
1294  iwork(iwoff+p) = q
1295  IF ( p .NE. q ) THEN
1296  temp1 = rwork(m+p)
1297  rwork(m+p) = rwork(m+q)
1298  rwork(m+q) = temp1
1299  END IF
1300  1952 CONTINUE
1301  CALL zlaswp( n, a, lda, 1, m-1, iwork(iwoff+1), 1 )
1302  END IF
1303 *
1304 * End of the preparation phase (scaling, optional sorting and
1305 * transposing, optional flushing of small columns).
1306 *
1307 * Preconditioning
1308 *
1309 * If the full SVD is needed, the right singular vectors are computed
1310 * from a matrix equation, and for that we need theoretical analysis
1311 * of the Businger-Golub pivoting. So we use ZGEQP3 as the first RR QRF.
1312 * In all other cases the first RR QRF can be chosen by other criteria
1313 * (eg speed by replacing global with restricted window pivoting, such
1314 * as in xGEQPX from TOMS # 782). Good results will be obtained using
1315 * xGEQPX with properly (!) chosen numerical parameters.
1316 * Any improvement of ZGEQP3 improves overall performance of ZGEJSV.
1317 *
1318 * A * P1 = Q1 * [ R1^* 0]^*:
1319  DO 1963 p = 1, n
1320 * .. all columns are free columns
1321  iwork(p) = 0
1322  1963 CONTINUE
1323  CALL zgeqp3( m, n, a, lda, iwork, cwork, cwork(n+1), lwork-n,
1324  \$ rwork, ierr )
1325 *
1326 * The upper triangular matrix R1 from the first QRF is inspected for
1327 * rank deficiency and possibilities for deflation, or possible
1328 * ill-conditioning. Depending on the user specified flag L2RANK,
1329 * the procedure explores possibilities to reduce the numerical
1330 * rank by inspecting the computed upper triangular factor. If
1331 * L2RANK or L2ABER are up, then ZGEJSV will compute the SVD of
1332 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
1333 *
1334  nr = 1
1335  IF ( l2aber ) THEN
1336 * Standard absolute error bound suffices. All sigma_i with
1337 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
1338 * aggressive enforcement of lower numerical rank by introducing a
1339 * backward error of the order of N*EPSLN*||A||.
1340  temp1 = sqrt(dble(n))*epsln
1341  DO 3001 p = 2, n
1342  IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) THEN
1343  nr = nr + 1
1344  ELSE
1345  GO TO 3002
1346  END IF
1347  3001 CONTINUE
1348  3002 CONTINUE
1349  ELSE IF ( l2rank ) THEN
1350 * .. similarly as above, only slightly more gentle (less aggressive).
1351 * Sudden drop on the diagonal of R1 is used as the criterion for
1352 * close-to-rank-deficient.
1353  temp1 = sqrt(sfmin)
1354  DO 3401 p = 2, n
1355  IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
1356  \$ ( abs(a(p,p)) .LT. small ) .OR.
1357  \$ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
1358  nr = nr + 1
1359  3401 CONTINUE
1360  3402 CONTINUE
1361 *
1362  ELSE
1363 * The goal is high relative accuracy. However, if the matrix
1364 * has high scaled condition number the relative accuracy is in
1365 * general not feasible. Later on, a condition number estimator
1366 * will be deployed to estimate the scaled condition number.
1367 * Here we just remove the underflowed part of the triangular
1368 * factor. This prevents the situation in which the code is
1369 * working hard to get the accuracy not warranted by the data.
1370  temp1 = sqrt(sfmin)
1371  DO 3301 p = 2, n
1372  IF ( ( abs(a(p,p)) .LT. small ) .OR.
1373  \$ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
1374  nr = nr + 1
1375  3301 CONTINUE
1376  3302 CONTINUE
1377 *
1378  END IF
1379 *
1380  almort = .false.
1381  IF ( nr .EQ. n ) THEN
1382  maxprj = one
1383  DO 3051 p = 2, n
1384  temp1 = abs(a(p,p)) / sva(iwork(p))
1385  maxprj = min( maxprj, temp1 )
1386  3051 CONTINUE
1387  IF ( maxprj**2 .GE. one - dble(n)*epsln ) almort = .true.
1388  END IF
1389 *
1390 *
1391  sconda = - one
1392  condr1 = - one
1393  condr2 = - one
1394 *
1395  IF ( errest ) THEN
1396  IF ( n .EQ. nr ) THEN
1397  IF ( rsvec ) THEN
1398 * .. V is available as workspace
1399  CALL zlacpy( 'U', n, n, a, lda, v, ldv )
1400  DO 3053 p = 1, n
1401  temp1 = sva(iwork(p))
1402  CALL zdscal( p, one/temp1, v(1,p), 1 )
1403  3053 CONTINUE
1404  IF ( lsvec )THEN
1405  CALL zpocon( 'U', n, v, ldv, one, temp1,
1406  \$ cwork(n+1), rwork, ierr )
1407  ELSE
1408  CALL zpocon( 'U', n, v, ldv, one, temp1,
1409  \$ cwork, rwork, ierr )
1410  END IF
1411 *
1412  ELSE IF ( lsvec ) THEN
1413 * .. U is available as workspace
1414  CALL zlacpy( 'U', n, n, a, lda, u, ldu )
1415  DO 3054 p = 1, n
1416  temp1 = sva(iwork(p))
1417  CALL zdscal( p, one/temp1, u(1,p), 1 )
1418  3054 CONTINUE
1419  CALL zpocon( 'U', n, u, ldu, one, temp1,
1420  \$ cwork(n+1), rwork, ierr )
1421  ELSE
1422  CALL zlacpy( 'U', n, n, a, lda, cwork, n )
1423 *[] CALL ZLACPY( 'U', N, N, A, LDA, CWORK(N+1), N )
1424 * Change: here index shifted by N to the left, CWORK(1:N)
1425 * not needed for SIGMA only computation
1426  DO 3052 p = 1, n
1427  temp1 = sva(iwork(p))
1428 *[] CALL ZDSCAL( p, ONE/TEMP1, CWORK(N+(p-1)*N+1), 1 )
1429  CALL zdscal( p, one/temp1, cwork((p-1)*n+1), 1 )
1430  3052 CONTINUE
1431 * .. the columns of R are scaled to have unit Euclidean lengths.
1432 *[] CALL ZPOCON( 'U', N, CWORK(N+1), N, ONE, TEMP1,
1433 *[] \$ CWORK(N+N*N+1), RWORK, IERR )
1434  CALL zpocon( 'U', n, cwork, n, one, temp1,
1435  \$ cwork(n*n+1), rwork, ierr )
1436 *
1437  END IF
1438  IF ( temp1 .NE. zero ) THEN
1439  sconda = one / sqrt(temp1)
1440  ELSE
1441  sconda = - one
1442  END IF
1443 * SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
1444 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1445  ELSE
1446  sconda = - one
1447  END IF
1448  END IF
1449 *
1450  l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. sqrt(big1) )
1451 * If there is no violent scaling, artificial perturbation is not needed.
1452 *
1453 * Phase 3:
1454 *
1455  IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1456 *
1457 * Singular Values only
1458 *
1459 * .. transpose A(1:NR,1:N)
1460  DO 1946 p = 1, min( n-1, nr )
1461  CALL zcopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1462  CALL zlacgv( n-p+1, a(p,p), 1 )
1463  1946 CONTINUE
1464  IF ( nr .EQ. n ) a(n,n) = conjg(a(n,n))
1465 *
1466 * The following two DO-loops introduce small relative perturbation
1467 * into the strict upper triangle of the lower triangular matrix.
1468 * Small entries below the main diagonal are also changed.
1469 * This modification is useful if the computing environment does not
1470 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1471 * annoying denormalized numbers in case of strongly scaled matrices.
1472 * The perturbation is structured so that it does not introduce any
1473 * new perturbation of the singular values, and it does not destroy
1474 * the job done by the preconditioner.
1475 * The licence for this perturbation is in the variable L2PERT, which
1476 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1477 *
1478  IF ( .NOT. almort ) THEN
1479 *
1480  IF ( l2pert ) THEN
1481 * XSC = SQRT(SMALL)
1482  xsc = epsln / dble(n)
1483  DO 4947 q = 1, nr
1484  ctemp = dcmplx(xsc*abs(a(q,q)),zero)
1485  DO 4949 p = 1, n
1486  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1487  \$ .OR. ( p .LT. q ) )
1488 * \$ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1489  \$ a(p,q) = ctemp
1490  4949 CONTINUE
1491  4947 CONTINUE
1492  ELSE
1493  CALL zlaset( 'U', nr-1,nr-1, czero,czero, a(1,2),lda )
1494  END IF
1495 *
1496 * .. second preconditioning using the QR factorization
1497 *
1498  CALL zgeqrf( n,nr, a,lda, cwork, cwork(n+1),lwork-n, ierr )
1499 *
1500 * .. and transpose upper to lower triangular
1501  DO 1948 p = 1, nr - 1
1502  CALL zcopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1503  CALL zlacgv( nr-p+1, a(p,p), 1 )
1504  1948 CONTINUE
1505 *
1506  END IF
1507 *
1508 * Row-cyclic Jacobi SVD algorithm with column pivoting
1509 *
1510 * .. again some perturbation (a "background noise") is added
1511 * to drown denormals
1512  IF ( l2pert ) THEN
1513 * XSC = SQRT(SMALL)
1514  xsc = epsln / dble(n)
1515  DO 1947 q = 1, nr
1516  ctemp = dcmplx(xsc*abs(a(q,q)),zero)
1517  DO 1949 p = 1, nr
1518  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1519  \$ .OR. ( p .LT. q ) )
1520 * \$ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1521  \$ a(p,q) = ctemp
1522  1949 CONTINUE
1523  1947 CONTINUE
1524  ELSE
1525  CALL zlaset( 'U', nr-1, nr-1, czero, czero, a(1,2), lda )
1526  END IF
1527 *
1528 * .. and one-sided Jacobi rotations are started on a lower
1529 * triangular matrix (plus perturbation which is ignored in
1530 * the part which destroys triangular form (confusing?!))
1531 *
1532  CALL zgesvj( 'L', 'N', 'N', nr, nr, a, lda, sva,
1533  \$ n, v, ldv, cwork, lwork, rwork, lrwork, info )
1534 *
1535  scalem = rwork(1)
1536  numrank = nint(rwork(2))
1537 *
1538 *
1539  ELSE IF ( ( rsvec .AND. ( .NOT. lsvec ) .AND. ( .NOT. jracc ) )
1540  \$ .OR.
1541  \$ ( jracc .AND. ( .NOT. lsvec ) .AND. ( nr .NE. n ) ) ) THEN
1542 *
1543 * -> Singular Values and Right Singular Vectors <-
1544 *
1545  IF ( almort ) THEN
1546 *
1547 * .. in this case NR equals N
1548  DO 1998 p = 1, nr
1549  CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1550  CALL zlacgv( n-p+1, v(p,p), 1 )
1551  1998 CONTINUE
1552  CALL zlaset( 'U', nr-1,nr-1, czero, czero, v(1,2), ldv )
1553 *
1554  CALL zgesvj( 'L','U','N', n, nr, v, ldv, sva, nr, a, lda,
1555  \$ cwork, lwork, rwork, lrwork, info )
1556  scalem = rwork(1)
1557  numrank = nint(rwork(2))
1558
1559  ELSE
1560 *
1561 * .. two more QR factorizations ( one QRF is not enough, two require
1562 * accumulated product of Jacobi rotations, three are perfect )
1563 *
1564  CALL zlaset( 'L', nr-1,nr-1, czero, czero, a(2,1), lda )
1565  CALL zgelqf( nr,n, a, lda, cwork, cwork(n+1), lwork-n, ierr)
1566  CALL zlacpy( 'L', nr, nr, a, lda, v, ldv )
1567  CALL zlaset( 'U', nr-1,nr-1, czero, czero, v(1,2), ldv )
1568  CALL zgeqrf( nr, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1569  \$ lwork-2*n, ierr )
1570  DO 8998 p = 1, nr
1571  CALL zcopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1572  CALL zlacgv( nr-p+1, v(p,p), 1 )
1573  8998 CONTINUE
1574  CALL zlaset('U', nr-1, nr-1, czero, czero, v(1,2), ldv)
1575 *
1576  CALL zgesvj( 'L', 'U','N', nr, nr, v,ldv, sva, nr, u,
1577  \$ ldu, cwork(n+1), lwork-n, rwork, lrwork, info )
1578  scalem = rwork(1)
1579  numrank = nint(rwork(2))
1580  IF ( nr .LT. n ) THEN
1581  CALL zlaset( 'A',n-nr, nr, czero,czero, v(nr+1,1), ldv )
1582  CALL zlaset( 'A',nr, n-nr, czero,czero, v(1,nr+1), ldv )
1583  CALL zlaset( 'A',n-nr,n-nr,czero,cone, v(nr+1,nr+1),ldv )
1584  END IF
1585 *
1586  CALL zunmlq( 'L', 'C', n, n, nr, a, lda, cwork,
1587  \$ v, ldv, cwork(n+1), lwork-n, ierr )
1588 *
1589  END IF
1590 * .. permute the rows of V
1591 * DO 8991 p = 1, N
1592 * CALL ZCOPY( N, V(p,1), LDV, A(IWORK(p),1), LDA )
1593 * 8991 CONTINUE
1594 * CALL ZLACPY( 'All', N, N, A, LDA, V, LDV )
1595  CALL zlapmr( .false., n, n, v, ldv, iwork )
1596 *
1597  IF ( transp ) THEN
1598  CALL zlacpy( 'A', n, n, v, ldv, u, ldu )
1599  END IF
1600 *
1601  ELSE IF ( jracc .AND. (.NOT. lsvec) .AND. ( nr.EQ. n ) ) THEN
1602 *
1603  CALL zlaset( 'L', n-1,n-1, czero, czero, a(2,1), lda )
1604 *
1605  CALL zgesvj( 'U','N','V', n, n, a, lda, sva, n, v, ldv,
1606  \$ cwork, lwork, rwork, lrwork, info )
1607  scalem = rwork(1)
1608  numrank = nint(rwork(2))
1609  CALL zlapmr( .false., n, n, v, ldv, iwork )
1610 *
1611  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1612 *
1613 * .. Singular Values and Left Singular Vectors ..
1614 *
1615 * .. second preconditioning step to avoid need to accumulate
1616 * Jacobi rotations in the Jacobi iterations.
1617  DO 1965 p = 1, nr
1618  CALL zcopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1619  CALL zlacgv( n-p+1, u(p,p), 1 )
1620  1965 CONTINUE
1621  CALL zlaset( 'U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1622 *
1623  CALL zgeqrf( n, nr, u, ldu, cwork(n+1), cwork(2*n+1),
1624  \$ lwork-2*n, ierr )
1625 *
1626  DO 1967 p = 1, nr - 1
1627  CALL zcopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1628  CALL zlacgv( n-p+1, u(p,p), 1 )
1629  1967 CONTINUE
1630  CALL zlaset( 'U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1631 *
1632  CALL zgesvj( 'L', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1633  \$ lda, cwork(n+1), lwork-n, rwork, lrwork, info )
1634  scalem = rwork(1)
1635  numrank = nint(rwork(2))
1636 *
1637  IF ( nr .LT. m ) THEN
1638  CALL zlaset( 'A', m-nr, nr,czero, czero, u(nr+1,1), ldu )
1639  IF ( nr .LT. n1 ) THEN
1640  CALL zlaset( 'A',nr, n1-nr, czero, czero, u(1,nr+1),ldu )
1641  CALL zlaset( 'A',m-nr,n1-nr,czero,cone,u(nr+1,nr+1),ldu )
1642  END IF
1643  END IF
1644 *
1645  CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
1646  \$ ldu, cwork(n+1), lwork-n, ierr )
1647 *
1648  IF ( rowpiv )
1649  \$ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
1650 *
1651  DO 1974 p = 1, n1
1652  xsc = one / dznrm2( m, u(1,p), 1 )
1653  CALL zdscal( m, xsc, u(1,p), 1 )
1654  1974 CONTINUE
1655 *
1656  IF ( transp ) THEN
1657  CALL zlacpy( 'A', n, n, u, ldu, v, ldv )
1658  END IF
1659 *
1660  ELSE
1661 *
1662 * .. Full SVD ..
1663 *
1664  IF ( .NOT. jracc ) THEN
1665 *
1666  IF ( .NOT. almort ) THEN
1667 *
1668 * Second Preconditioning Step (QRF [with pivoting])
1669 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1670 * equivalent to an LQF CALL. Since in many libraries the QRF
1671 * seems to be better optimized than the LQF, we do explicit
1672 * transpose and use the QRF. This is subject to changes in an
1673 * optimized implementation of ZGEJSV.
1674 *
1675  DO 1968 p = 1, nr
1676  CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1677  CALL zlacgv( n-p+1, v(p,p), 1 )
1678  1968 CONTINUE
1679 *
1680 * .. the following two loops perturb small entries to avoid
1681 * denormals in the second QR factorization, where they are
1682 * as good as zeros. This is done to avoid painfully slow
1683 * computation with denormals. The relative size of the perturbation
1684 * is a parameter that can be changed by the implementer.
1685 * This perturbation device will be obsolete on machines with
1686 * properly implemented arithmetic.
1687 * To switch it off, set L2PERT=.FALSE. To remove it from the
1688 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1689 * The following two loops should be blocked and fused with the
1690 * transposed copy above.
1691 *
1692  IF ( l2pert ) THEN
1693  xsc = sqrt(small)
1694  DO 2969 q = 1, nr
1695  ctemp = dcmplx(xsc*abs( v(q,q) ),zero)
1696  DO 2968 p = 1, n
1697  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1698  \$ .OR. ( p .LT. q ) )
1699 * \$ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1700  \$ v(p,q) = ctemp
1701  IF ( p .LT. q ) v(p,q) = - v(p,q)
1702  2968 CONTINUE
1703  2969 CONTINUE
1704  ELSE
1705  CALL zlaset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1706  END IF
1707 *
1708 * Estimate the row scaled condition number of R1
1709 * (If R1 is rectangular, N > NR, then the condition number
1710 * of the leading NR x NR submatrix is estimated.)
1711 *
1712  CALL zlacpy( 'L', nr, nr, v, ldv, cwork(2*n+1), nr )
1713  DO 3950 p = 1, nr
1714  temp1 = dznrm2(nr-p+1,cwork(2*n+(p-1)*nr+p),1)
1715  CALL zdscal(nr-p+1,one/temp1,cwork(2*n+(p-1)*nr+p),1)
1716  3950 CONTINUE
1717  CALL zpocon('L',nr,cwork(2*n+1),nr,one,temp1,
1718  \$ cwork(2*n+nr*nr+1),rwork,ierr)
1719  condr1 = one / sqrt(temp1)
1720 * .. here need a second opinion on the condition number
1721 * .. then assume worst case scenario
1722 * R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1723 * more conservative <=> CONDR1 .LT. SQRT(DBLE(N))
1724 *
1725  cond_ok = sqrt(sqrt(dble(nr)))
1726 *[TP] COND_OK is a tuning parameter.
1727 *
1728  IF ( condr1 .LT. cond_ok ) THEN
1729 * .. the second QRF without pivoting. Note: in an optimized
1730 * implementation, this QRF should be implemented as the QRF
1731 * of a lower triangular matrix.
1732 * R1^* = Q2 * R2
1733  CALL zgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1734  \$ lwork-2*n, ierr )
1735 *
1736  IF ( l2pert ) THEN
1737  xsc = sqrt(small)/epsln
1738  DO 3959 p = 2, nr
1739  DO 3958 q = 1, p - 1
1740  ctemp=dcmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1741  \$ zero)
1742  IF ( abs(v(q,p)) .LE. temp1 )
1743 * \$ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1744  \$ v(q,p) = ctemp
1745  3958 CONTINUE
1746  3959 CONTINUE
1747  END IF
1748 *
1749  IF ( nr .NE. n )
1750  \$ CALL zlacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1751 * .. save ...
1752 *
1753 * .. this transposed copy should be better than naive
1754  DO 1969 p = 1, nr - 1
1755  CALL zcopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1756  CALL zlacgv(nr-p+1, v(p,p), 1 )
1757  1969 CONTINUE
1758  v(nr,nr)=conjg(v(nr,nr))
1759 *
1760  condr2 = condr1
1761 *
1762  ELSE
1763 *
1764 * .. ill-conditioned case: second QRF with pivoting
1765 * Note that windowed pivoting would be equally good
1766 * numerically, and more run-time efficient. So, in
1767 * an optimal implementation, the next call to ZGEQP3
1768 * should be replaced with eg. CALL ZGEQPX (ACM TOMS #782)
1769 * with properly (carefully) chosen parameters.
1770 *
1771 * R1^* * P2 = Q2 * R2
1772  DO 3003 p = 1, nr
1773  iwork(n+p) = 0
1774  3003 CONTINUE
1775  CALL zgeqp3( n, nr, v, ldv, iwork(n+1), cwork(n+1),
1776  \$ cwork(2*n+1), lwork-2*n, rwork, ierr )
1777 ** CALL ZGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1778 ** \$ LWORK-2*N, IERR )
1779  IF ( l2pert ) THEN
1780  xsc = sqrt(small)
1781  DO 3969 p = 2, nr
1782  DO 3968 q = 1, p - 1
1783  ctemp=dcmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1784  \$ zero)
1785  IF ( abs(v(q,p)) .LE. temp1 )
1786 * \$ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1787  \$ v(q,p) = ctemp
1788  3968 CONTINUE
1789  3969 CONTINUE
1790  END IF
1791 *
1792  CALL zlacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1793 *
1794  IF ( l2pert ) THEN
1795  xsc = sqrt(small)
1796  DO 8970 p = 2, nr
1797  DO 8971 q = 1, p - 1
1798  ctemp=dcmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1799  \$ zero)
1800 * V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) )
1801  v(p,q) = - ctemp
1802  8971 CONTINUE
1803  8970 CONTINUE
1804  ELSE
1805  CALL zlaset( 'L',nr-1,nr-1,czero,czero,v(2,1),ldv )
1806  END IF
1807 * Now, compute R2 = L3 * Q3, the LQ factorization.
1808  CALL zgelqf( nr, nr, v, ldv, cwork(2*n+n*nr+1),
1809  \$ cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1810 * .. and estimate the condition number
1811  CALL zlacpy( 'L',nr,nr,v,ldv,cwork(2*n+n*nr+nr+1),nr )
1812  DO 4950 p = 1, nr
1813  temp1 = dznrm2( p, cwork(2*n+n*nr+nr+p), nr )
1814  CALL zdscal( p, one/temp1, cwork(2*n+n*nr+nr+p), nr )
1815  4950 CONTINUE
1816  CALL zpocon( 'L',nr,cwork(2*n+n*nr+nr+1),nr,one,temp1,
1817  \$ cwork(2*n+n*nr+nr+nr*nr+1),rwork,ierr )
1818  condr2 = one / sqrt(temp1)
1819 *
1820 *
1821  IF ( condr2 .GE. cond_ok ) THEN
1822 * .. save the Householder vectors used for Q3
1823 * (this overwrites the copy of R2, as it will not be
1824 * needed in this branch, but it does not overwritte the
1825 * Huseholder vectors of Q2.).
1826  CALL zlacpy( 'U', nr, nr, v, ldv, cwork(2*n+1), n )
1827 * .. and the rest of the information on Q3 is in
1828 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1829  END IF
1830 *
1831  END IF
1832 *
1833  IF ( l2pert ) THEN
1834  xsc = sqrt(small)
1835  DO 4968 q = 2, nr
1836  ctemp = xsc * v(q,q)
1837  DO 4969 p = 1, q - 1
1838 * V(p,q) = - TEMP1*( V(p,q) / ABS(V(p,q)) )
1839  v(p,q) = - ctemp
1840  4969 CONTINUE
1841  4968 CONTINUE
1842  ELSE
1843  CALL zlaset( 'U', nr-1,nr-1, czero,czero, v(1,2), ldv )
1844  END IF
1845 *
1846 * Second preconditioning finished; continue with Jacobi SVD
1847 * The input matrix is lower trinagular.
1848 *
1849 * Recover the right singular vectors as solution of a well
1850 * conditioned triangular matrix equation.
1851 *
1852  IF ( condr1 .LT. cond_ok ) THEN
1853 *
1854  CALL zgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u, ldu,
1855  \$ cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,rwork,
1856  \$ lrwork, info )
1857  scalem = rwork(1)
1858  numrank = nint(rwork(2))
1859  DO 3970 p = 1, nr
1860  CALL zcopy( nr, v(1,p), 1, u(1,p), 1 )
1861  CALL zdscal( nr, sva(p), v(1,p), 1 )
1862  3970 CONTINUE
1863
1864 * .. pick the right matrix equation and solve it
1865 *
1866  IF ( nr .EQ. n ) THEN
1867 * :)) .. best case, R1 is inverted. The solution of this matrix
1868 * equation is Q2*V2 = the product of the Jacobi rotations
1869 * used in ZGESVJ, premultiplied with the orthogonal matrix
1870 * from the second QR factorization.
1871  CALL ztrsm('L','U','N','N', nr,nr,cone, a,lda, v,ldv)
1872  ELSE
1873 * .. R1 is well conditioned, but non-square. Adjoint of R2
1874 * is inverted to get the product of the Jacobi rotations
1875 * used in ZGESVJ. The Q-factor from the second QR
1876 * factorization is then built in explicitly.
1877  CALL ztrsm('L','U','C','N',nr,nr,cone,cwork(2*n+1),
1878  \$ n,v,ldv)
1879  IF ( nr .LT. n ) THEN
1880  CALL zlaset('A',n-nr,nr,czero,czero,v(nr+1,1),ldv)
1881  CALL zlaset('A',nr,n-nr,czero,czero,v(1,nr+1),ldv)
1882  CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1883  END IF
1884  CALL zunmqr('L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1885  \$ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1886  END IF
1887 *
1888  ELSE IF ( condr2 .LT. cond_ok ) THEN
1889 *
1890 * The matrix R2 is inverted. The solution of the matrix equation
1891 * is Q3^* * V3 = the product of the Jacobi rotations (appplied to
1892 * the lower triangular L3 from the LQ factorization of
1893 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1894  CALL zgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1895  \$ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1896  \$ rwork, lrwork, info )
1897  scalem = rwork(1)
1898  numrank = nint(rwork(2))
1899  DO 3870 p = 1, nr
1900  CALL zcopy( nr, v(1,p), 1, u(1,p), 1 )
1901  CALL zdscal( nr, sva(p), u(1,p), 1 )
1902  3870 CONTINUE
1903  CALL ztrsm('L','U','N','N',nr,nr,cone,cwork(2*n+1),n,
1904  \$ u,ldu)
1905 * .. apply the permutation from the second QR factorization
1906  DO 873 q = 1, nr
1907  DO 872 p = 1, nr
1908  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1909  872 CONTINUE
1910  DO 874 p = 1, nr
1911  u(p,q) = cwork(2*n+n*nr+nr+p)
1912  874 CONTINUE
1913  873 CONTINUE
1914  IF ( nr .LT. n ) THEN
1915  CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1916  CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1917  CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1918  END IF
1919  CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1920  \$ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1921  ELSE
1922 * Last line of defense.
1923 * #:( This is a rather pathological case: no scaled condition
1924 * improvement after two pivoted QR factorizations. Other
1925 * possibility is that the rank revealing QR factorization
1926 * or the condition estimator has failed, or the COND_OK
1927 * is set very close to ONE (which is unnecessary). Normally,
1928 * this branch should never be executed, but in rare cases of
1929 * failure of the RRQR or condition estimator, the last line of
1930 * defense ensures that ZGEJSV completes the task.
1931 * Compute the full SVD of L3 using ZGESVJ with explicit
1932 * accumulation of Jacobi rotations.
1933  CALL zgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1934  \$ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1935  \$ rwork, lrwork, info )
1936  scalem = rwork(1)
1937  numrank = nint(rwork(2))
1938  IF ( nr .LT. n ) THEN
1939  CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1940  CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1941  CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1942  END IF
1943  CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1944  \$ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1945 *
1946  CALL zunmlq( 'L', 'C', nr, nr, nr, cwork(2*n+1), n,
1947  \$ cwork(2*n+n*nr+1), u, ldu, cwork(2*n+n*nr+nr+1),
1948  \$ lwork-2*n-n*nr-nr, ierr )
1949  DO 773 q = 1, nr
1950  DO 772 p = 1, nr
1951  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1952  772 CONTINUE
1953  DO 774 p = 1, nr
1954  u(p,q) = cwork(2*n+n*nr+nr+p)
1955  774 CONTINUE
1956  773 CONTINUE
1957 *
1958  END IF
1959 *
1960 * Permute the rows of V using the (column) permutation from the
1961 * first QRF. Also, scale the columns to make them unit in
1962 * Euclidean norm. This applies to all cases.
1963 *
1964  temp1 = sqrt(dble(n)) * epsln
1965  DO 1972 q = 1, n
1966  DO 972 p = 1, n
1967  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1968  972 CONTINUE
1969  DO 973 p = 1, n
1970  v(p,q) = cwork(2*n+n*nr+nr+p)
1971  973 CONTINUE
1972  xsc = one / dznrm2( n, v(1,q), 1 )
1973  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1974  \$ CALL zdscal( n, xsc, v(1,q), 1 )
1975  1972 CONTINUE
1976 * At this moment, V contains the right singular vectors of A.
1977 * Next, assemble the left singular vector matrix U (M x N).
1978  IF ( nr .LT. m ) THEN
1979  CALL zlaset('A', m-nr, nr, czero, czero, u(nr+1,1), ldu)
1980  IF ( nr .LT. n1 ) THEN
1981  CALL zlaset('A',nr,n1-nr,czero,czero,u(1,nr+1),ldu)
1982  CALL zlaset('A',m-nr,n1-nr,czero,cone,
1983  \$ u(nr+1,nr+1),ldu)
1984  END IF
1985  END IF
1986 *
1987 * The Q matrix from the first QRF is built into the left singular
1988 * matrix U. This applies to all cases.
1989 *
1990  CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
1991  \$ ldu, cwork(n+1), lwork-n, ierr )
1992
1993 * The columns of U are normalized. The cost is O(M*N) flops.
1994  temp1 = sqrt(dble(m)) * epsln
1995  DO 1973 p = 1, nr
1996  xsc = one / dznrm2( m, u(1,p), 1 )
1997  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1998  \$ CALL zdscal( m, xsc, u(1,p), 1 )
1999  1973 CONTINUE
2000 *
2001 * If the initial QRF is computed with row pivoting, the left
2002 * singular vectors must be adjusted.
2003 *
2004  IF ( rowpiv )
2005  \$ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2006 *
2007  ELSE
2008 *
2009 * .. the initial matrix A has almost orthogonal columns and
2010 * the second QRF is not needed
2011 *
2012  CALL zlacpy( 'U', n, n, a, lda, cwork(n+1), n )
2013  IF ( l2pert ) THEN
2014  xsc = sqrt(small)
2015  DO 5970 p = 2, n
2016  ctemp = xsc * cwork( n + (p-1)*n + p )
2017  DO 5971 q = 1, p - 1
2018 * CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) /
2019 * \$ ABS(CWORK(N+(p-1)*N+q)) )
2020  cwork(n+(q-1)*n+p)=-ctemp
2021  5971 CONTINUE
2022  5970 CONTINUE
2023  ELSE
2024  CALL zlaset( 'L',n-1,n-1,czero,czero,cwork(n+2),n )
2025  END IF
2026 *
2027  CALL zgesvj( 'U', 'U', 'N', n, n, cwork(n+1), n, sva,
2028  \$ n, u, ldu, cwork(n+n*n+1), lwork-n-n*n, rwork, lrwork,
2029  \$ info )
2030 *
2031  scalem = rwork(1)
2032  numrank = nint(rwork(2))
2033  DO 6970 p = 1, n
2034  CALL zcopy( n, cwork(n+(p-1)*n+1), 1, u(1,p), 1 )
2035  CALL zdscal( n, sva(p), cwork(n+(p-1)*n+1), 1 )
2036  6970 CONTINUE
2037 *
2038  CALL ztrsm( 'L', 'U', 'N', 'N', n, n,
2039  \$ cone, a, lda, cwork(n+1), n )
2040  DO 6972 p = 1, n
2041  CALL zcopy( n, cwork(n+p), n, v(iwork(p),1), ldv )
2042  6972 CONTINUE
2043  temp1 = sqrt(dble(n))*epsln
2044  DO 6971 p = 1, n
2045  xsc = one / dznrm2( n, v(1,p), 1 )
2046  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2047  \$ CALL zdscal( n, xsc, v(1,p), 1 )
2048  6971 CONTINUE
2049 *
2050 * Assemble the left singular vector matrix U (M x N).
2051 *
2052  IF ( n .LT. m ) THEN
2053  CALL zlaset( 'A', m-n, n, czero, czero, u(n+1,1), ldu )
2054  IF ( n .LT. n1 ) THEN
2055  CALL zlaset('A',n, n1-n, czero, czero, u(1,n+1),ldu)
2056  CALL zlaset( 'A',m-n,n1-n, czero, cone,u(n+1,n+1),ldu)
2057  END IF
2058  END IF
2059  CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
2060  \$ ldu, cwork(n+1), lwork-n, ierr )
2061  temp1 = sqrt(dble(m))*epsln
2062  DO 6973 p = 1, n1
2063  xsc = one / dznrm2( m, u(1,p), 1 )
2064  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2065  \$ CALL zdscal( m, xsc, u(1,p), 1 )
2066  6973 CONTINUE
2067 *
2068  IF ( rowpiv )
2069  \$ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2070 *
2071  END IF
2072 *
2073 * end of the >> almost orthogonal case << in the full SVD
2074 *
2075  ELSE
2076 *
2077 * This branch deploys a preconditioned Jacobi SVD with explicitly
2078 * accumulated rotations. It is included as optional, mainly for
2079 * experimental purposes. It does perform well, and can also be used.
2080 * In this implementation, this branch will be automatically activated
2081 * if the condition number sigma_max(A) / sigma_min(A) is predicted
2082 * to be greater than the overflow threshold. This is because the
2083 * a posteriori computation of the singular vectors assumes robust
2084 * implementation of BLAS and some LAPACK procedures, capable of working
2085 * in presence of extreme values, e.g. when the singular values spread from
2086 * the underflow to the overflow threshold.
2087 *
2088  DO 7968 p = 1, nr
2089  CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
2090  CALL zlacgv( n-p+1, v(p,p), 1 )
2091  7968 CONTINUE
2092 *
2093  IF ( l2pert ) THEN
2094  xsc = sqrt(small/epsln)
2095  DO 5969 q = 1, nr
2096  ctemp = dcmplx(xsc*abs( v(q,q) ),zero)
2097  DO 5968 p = 1, n
2098  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
2099  \$ .OR. ( p .LT. q ) )
2100 * \$ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
2101  \$ v(p,q) = ctemp
2102  IF ( p .LT. q ) v(p,q) = - v(p,q)
2103  5968 CONTINUE
2104  5969 CONTINUE
2105  ELSE
2106  CALL zlaset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
2107  END IF
2108
2109  CALL zgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
2110  \$ lwork-2*n, ierr )
2111  CALL zlacpy( 'L', n, nr, v, ldv, cwork(2*n+1), n )
2112 *
2113  DO 7969 p = 1, nr
2114  CALL zcopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
2115  CALL zlacgv( nr-p+1, u(p,p), 1 )
2116  7969 CONTINUE
2117
2118  IF ( l2pert ) THEN
2119  xsc = sqrt(small/epsln)
2120  DO 9970 q = 2, nr
2121  DO 9971 p = 1, q - 1
2122  ctemp = dcmplx(xsc * min(abs(u(p,p)),abs(u(q,q))),
2123  \$ zero)
2124 * U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) )
2125  u(p,q) = - ctemp
2126  9971 CONTINUE
2127  9970 CONTINUE
2128  ELSE
2129  CALL zlaset('U', nr-1, nr-1, czero, czero, u(1,2), ldu )
2130  END IF
2131
2132  CALL zgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
2133  \$ n, v, ldv, cwork(2*n+n*nr+1), lwork-2*n-n*nr,
2134  \$ rwork, lrwork, info )
2135  scalem = rwork(1)
2136  numrank = nint(rwork(2))
2137
2138  IF ( nr .LT. n ) THEN
2139  CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
2140  CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
2141  CALL zlaset( 'A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv )
2142  END IF
2143
2144  CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
2145  \$ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
2146 *
2147 * Permute the rows of V using the (column) permutation from the
2148 * first QRF. Also, scale the columns to make them unit in
2149 * Euclidean norm. This applies to all cases.
2150 *
2151  temp1 = sqrt(dble(n)) * epsln
2152  DO 7972 q = 1, n
2153  DO 8972 p = 1, n
2154  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
2155  8972 CONTINUE
2156  DO 8973 p = 1, n
2157  v(p,q) = cwork(2*n+n*nr+nr+p)
2158  8973 CONTINUE
2159  xsc = one / dznrm2( n, v(1,q), 1 )
2160  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2161  \$ CALL zdscal( n, xsc, v(1,q), 1 )
2162  7972 CONTINUE
2163 *
2164 * At this moment, V contains the right singular vectors of A.
2165 * Next, assemble the left singular vector matrix U (M x N).
2166 *
2167  IF ( nr .LT. m ) THEN
2168  CALL zlaset( 'A', m-nr, nr, czero, czero, u(nr+1,1), ldu )
2169  IF ( nr .LT. n1 ) THEN
2170  CALL zlaset('A',nr, n1-nr, czero, czero, u(1,nr+1),ldu)
2171  CALL zlaset('A',m-nr,n1-nr, czero, cone,u(nr+1,nr+1),ldu)
2172  END IF
2173  END IF
2174 *
2175  CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
2176  \$ ldu, cwork(n+1), lwork-n, ierr )
2177 *
2178  IF ( rowpiv )
2179  \$ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2180 *
2181 *
2182  END IF
2183  IF ( transp ) THEN
2184 * .. swap U and V because the procedure worked on A^*
2185  DO 6974 p = 1, n
2186  CALL zswap( n, u(1,p), 1, v(1,p), 1 )
2187  6974 CONTINUE
2188  END IF
2189 *
2190  END IF
2191 * end of the full SVD
2192 *
2193 * Undo scaling, if necessary (and possible)
2194 *
2195  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
2196  CALL dlascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
2197  uscal1 = one
2198  uscal2 = one
2199  END IF
2200 *
2201  IF ( nr .LT. n ) THEN
2202  DO 3004 p = nr+1, n
2203  sva(p) = zero
2204  3004 CONTINUE
2205  END IF
2206 *
2207  rwork(1) = uscal2 * scalem
2208  rwork(2) = uscal1
2209  IF ( errest ) rwork(3) = sconda
2210  IF ( lsvec .AND. rsvec ) THEN
2211  rwork(4) = condr1
2212  rwork(5) = condr2
2213  END IF
2214  IF ( l2tran ) THEN
2215  rwork(6) = entra
2216  rwork(7) = entrat
2217  END IF
2218 *
2219  iwork(1) = nr
2220  iwork(2) = numrank
2221  iwork(3) = warning
2222  IF ( transp ) THEN
2223  iwork(4) = 1
2224  ELSE
2225  iwork(4) = -1
2226  END IF
2227
2228 *
2229  RETURN
2230 * ..
2231 * .. END OF ZGEJSV
2232 * ..
double precision function dlamch(CMACH)
DLAMCH
Definition: dlamch.f:69
subroutine dlassq(n, x, incx, scl, sumsq)
DLASSQ updates a sum of squares represented in scaled form.
Definition: dlassq.f90:137
subroutine zlassq(n, x, incx, scl, sumsq)
ZLASSQ updates a sum of squares represented in scaled form.
Definition: zlassq.f90:137
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:143
integer function izamax(N, ZX, INCX)
IZAMAX
Definition: izamax.f:71
integer function idamax(N, DX, INCX)
IDAMAX
Definition: idamax.f:71
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:60
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:53
subroutine zswap(N, ZX, INCX, ZY, INCY)
ZSWAP
Definition: zswap.f:81
subroutine zdscal(N, DA, ZX, INCX)
ZDSCAL
Definition: zdscal.f:78
subroutine zcopy(N, ZX, INCX, ZY, INCY)
ZCOPY
Definition: zcopy.f:81
subroutine ztrsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
ZTRSM
Definition: ztrsm.f:180
subroutine zgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
ZGELQF
Definition: zgelqf.f:143
subroutine zgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, RWORK, INFO)
ZGEQP3
Definition: zgeqp3.f:159
subroutine zgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, CWORK, LWORK, RWORK, LRWORK, INFO)
ZGESVJ
Definition: zgesvj.f:351
subroutine zlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
ZLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: zlascl.f:143
subroutine zlapmr(FORWRD, M, N, X, LDX, K)
ZLAPMR rearranges rows of a matrix as specified by a permutation vector.
Definition: zlapmr.f:104
subroutine zlaswp(N, A, LDA, K1, K2, IPIV, INCX)
ZLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: zlaswp.f:115
subroutine zlacpy(UPLO, M, N, A, LDA, B, LDB)
ZLACPY copies all or part of one two-dimensional array to another.
Definition: zlacpy.f:103
subroutine zlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
ZLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition: zlaset.f:106
subroutine zlacgv(N, X, INCX)
ZLACGV conjugates a complex vector.
Definition: zlacgv.f:74
subroutine zunmlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
ZUNMLQ
Definition: zunmlq.f:167
subroutine zungqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
ZUNGQR
Definition: zungqr.f:128
subroutine zunmqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
ZUNMQR
Definition: zunmqr.f:167
subroutine zpocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, RWORK, INFO)
ZPOCON
Definition: zpocon.f:121
subroutine dscal(N, DA, DX, INCX)
DSCAL
Definition: dscal.f:79
real(wp) function dznrm2(n, x, incx)
DZNRM2
Definition: dznrm2.f90:90
subroutine zgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
ZGEQRF VARIANT: left-looking Level 3 BLAS of the algorithm.
Definition: zgeqrf.f:151
Here is the call graph for this function:
Here is the caller graph for this function: