LAPACK 3.12.0
LAPACK: Linear Algebra PACKage
Loading...
Searching...
No Matches
dgejsv.f
Go to the documentation of this file.
1*> \brief \b DGEJSV
2*
3* =========== DOCUMENTATION ===========
4*
5* Online html documentation available at
6* http://www.netlib.org/lapack/explore-html/
7*
8*> \htmlonly
9*> Download DGEJSV + dependencies
10*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/dgejsv.f">
11*> [TGZ]</a>
12*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/dgejsv.f">
13*> [ZIP]</a>
14*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/dgejsv.f">
15*> [TXT]</a>
16*> \endhtmlonly
17*
18* Definition:
19* ===========
20*
21* SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
22* M, N, A, LDA, SVA, U, LDU, V, LDV,
23* WORK, LWORK, IWORK, INFO )
24*
25* .. Scalar Arguments ..
26* IMPLICIT NONE
27* INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
28* ..
29* .. Array Arguments ..
30* DOUBLE PRECISION A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ),
31* $ WORK( LWORK )
32* INTEGER IWORK( * )
33* CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
34* ..
35*
36*
37*> \par Purpose:
38* =============
39*>
40*> \verbatim
41*>
42*> DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
43*> matrix [A], where M >= N. The SVD of [A] is written as
44*>
45*> [A] = [U] * [SIGMA] * [V]^t,
46*>
47*> where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
48*> diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
49*> [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
50*> the singular values of [A]. The columns of [U] and [V] are the left and
51*> the right singular vectors of [A], respectively. The matrices [U] and [V]
52*> are computed and stored in the arrays U and V, respectively. The diagonal
53*> of [SIGMA] is computed and stored in the array SVA.
54*> DGEJSV can sometimes compute tiny singular values and their singular vectors much
55*> more accurately than other SVD routines, see below under Further Details.
56*> \endverbatim
57*
58* Arguments:
59* ==========
60*
61*> \param[in] JOBA
62*> \verbatim
63*> JOBA is CHARACTER*1
64*> Specifies the level of accuracy:
65*> = 'C': This option works well (high relative accuracy) if A = B * D,
66*> with well-conditioned B and arbitrary diagonal matrix D.
67*> The accuracy cannot be spoiled by COLUMN scaling. The
68*> accuracy of the computed output depends on the condition of
69*> B, and the procedure aims at the best theoretical accuracy.
70*> The relative error max_{i=1:N}|d sigma_i| / sigma_i is
71*> bounded by f(M,N)*epsilon* cond(B), independent of D.
72*> The input matrix is preprocessed with the QRF with column
73*> pivoting. This initial preprocessing and preconditioning by
74*> a rank revealing QR factorization is common for all values of
75*> JOBA. Additional actions are specified as follows:
76*> = 'E': Computation as with 'C' with an additional estimate of the
77*> condition number of B. It provides a realistic error bound.
78*> = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
79*> D1, D2, and well-conditioned matrix C, this option gives
80*> higher accuracy than the 'C' option. If the structure of the
81*> input matrix is not known, and relative accuracy is
82*> desirable, then this option is advisable. The input matrix A
83*> is preprocessed with QR factorization with FULL (row and
84*> column) pivoting.
85*> = 'G': Computation as with 'F' with an additional estimate of the
86*> condition number of B, where A=D*B. If A has heavily weighted
87*> rows, then using this condition number gives too pessimistic
88*> error bound.
89*> = 'A': Small singular values are the noise and the matrix is treated
90*> as numerically rank deficient. The error in the computed
91*> singular values is bounded by f(m,n)*epsilon*||A||.
92*> The computed SVD A = U * S * V^t restores A up to
93*> f(m,n)*epsilon*||A||.
94*> This gives the procedure the licence to discard (set to zero)
95*> all singular values below N*epsilon*||A||.
96*> = 'R': Similar as in 'A'. Rank revealing property of the initial
97*> QR factorization is used do reveal (using triangular factor)
98*> a gap sigma_{r+1} < epsilon * sigma_r in which case the
99*> numerical RANK is declared to be r. The SVD is computed with
100*> absolute error bounds, but more accurately than with 'A'.
101*> \endverbatim
102*>
103*> \param[in] JOBU
104*> \verbatim
105*> JOBU is CHARACTER*1
106*> Specifies whether to compute the columns of U:
107*> = 'U': N columns of U are returned in the array U.
108*> = 'F': full set of M left sing. vectors is returned in the array U.
109*> = 'W': U may be used as workspace of length M*N. See the description
110*> of U.
111*> = 'N': U is not computed.
112*> \endverbatim
113*>
114*> \param[in] JOBV
115*> \verbatim
116*> JOBV is CHARACTER*1
117*> Specifies whether to compute the matrix V:
118*> = 'V': N columns of V are returned in the array V; Jacobi rotations
119*> are not explicitly accumulated.
120*> = 'J': N columns of V are returned in the array V, but they are
121*> computed as the product of Jacobi rotations. This option is
122*> allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
123*> = 'W': V may be used as workspace of length N*N. See the description
124*> of V.
125*> = 'N': V is not computed.
126*> \endverbatim
127*>
128*> \param[in] JOBR
129*> \verbatim
130*> JOBR is CHARACTER*1
131*> Specifies the RANGE for the singular values. Issues the licence to
132*> set to zero small positive singular values if they are outside
133*> specified range. If A .NE. 0 is scaled so that the largest singular
134*> value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
135*> the licence to kill columns of A whose norm in c*A is less than
136*> DSQRT(SFMIN) (for JOBR = 'R'), or less than SMALL=SFMIN/EPSLN,
137*> where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
138*> = 'N': Do not kill small columns of c*A. This option assumes that
139*> BLAS and QR factorizations and triangular solvers are
140*> implemented to work in that range. If the condition of A
141*> is greater than BIG, use DGESVJ.
142*> = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
143*> (roughly, as described above). This option is recommended.
144*> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
145*> For computing the singular values in the FULL range [SFMIN,BIG]
146*> use DGESVJ.
147*> \endverbatim
148*>
149*> \param[in] JOBT
150*> \verbatim
151*> JOBT is CHARACTER*1
152*> If the matrix is square then the procedure may determine to use
153*> transposed A if A^t seems to be better with respect to convergence.
154*> If the matrix is not square, JOBT is ignored. This is subject to
155*> changes in the future.
156*> The decision is based on two values of entropy over the adjoint
157*> orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
158*> = 'T': transpose if entropy test indicates possibly faster
159*> convergence of Jacobi process if A^t is taken as input. If A is
160*> replaced with A^t, then the row pivoting is included automatically.
161*> = 'N': do not speculate.
162*> This option can be used to compute only the singular values, or the
163*> full SVD (U, SIGMA and V). For only one set of singular vectors
164*> (U or V), the caller should provide both U and V, as one of the
165*> matrices is used as workspace if the matrix A is transposed.
166*> The implementer can easily remove this constraint and make the
167*> code more complicated. See the descriptions of U and V.
168*> \endverbatim
169*>
170*> \param[in] JOBP
171*> \verbatim
172*> JOBP is CHARACTER*1
173*> Issues the licence to introduce structured perturbations to drown
174*> denormalized numbers. This licence should be active if the
175*> denormals are poorly implemented, causing slow computation,
176*> especially in cases of fast convergence (!). For details see [1,2].
177*> For the sake of simplicity, this perturbations are included only
178*> when the full SVD or only the singular values are requested. The
179*> implementer/user can easily add the perturbation for the cases of
180*> computing one set of singular vectors.
181*> = 'P': introduce perturbation
182*> = 'N': do not perturb
183*> \endverbatim
184*>
185*> \param[in] M
186*> \verbatim
187*> M is INTEGER
188*> The number of rows of the input matrix A. M >= 0.
189*> \endverbatim
190*>
191*> \param[in] N
192*> \verbatim
193*> N is INTEGER
194*> The number of columns of the input matrix A. M >= N >= 0.
195*> \endverbatim
196*>
197*> \param[in,out] A
198*> \verbatim
199*> A is DOUBLE PRECISION array, dimension (LDA,N)
200*> On entry, the M-by-N matrix A.
201*> \endverbatim
202*>
203*> \param[in] LDA
204*> \verbatim
205*> LDA is INTEGER
206*> The leading dimension of the array A. LDA >= max(1,M).
207*> \endverbatim
208*>
209*> \param[out] SVA
210*> \verbatim
211*> SVA is DOUBLE PRECISION array, dimension (N)
212*> On exit,
213*> - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
214*> computation SVA contains Euclidean column norms of the
215*> iterated matrices in the array A.
216*> - For WORK(1) .NE. WORK(2): The singular values of A are
217*> (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
218*> sigma_max(A) overflows or if small singular values have been
219*> saved from underflow by scaling the input matrix A.
220*> - If JOBR='R' then some of the singular values may be returned
221*> as exact zeros obtained by "set to zero" because they are
222*> below the numerical rank threshold or are denormalized numbers.
223*> \endverbatim
224*>
225*> \param[out] U
226*> \verbatim
227*> U is DOUBLE PRECISION array, dimension ( LDU, N ) or ( LDU, M )
228*> If JOBU = 'U', then U contains on exit the M-by-N matrix of
229*> the left singular vectors.
230*> If JOBU = 'F', then U contains on exit the M-by-M matrix of
231*> the left singular vectors, including an ONB
232*> of the orthogonal complement of the Range(A).
233*> If JOBU = 'W' .AND. (JOBV = 'V' .AND. JOBT = 'T' .AND. M = N),
234*> then U is used as workspace if the procedure
235*> replaces A with A^t. In that case, [V] is computed
236*> in U as left singular vectors of A^t and then
237*> copied back to the V array. This 'W' option is just
238*> a reminder to the caller that in this case U is
239*> reserved as workspace of length N*N.
240*> If JOBU = 'N' U is not referenced, unless JOBT='T'.
241*> \endverbatim
242*>
243*> \param[in] LDU
244*> \verbatim
245*> LDU is INTEGER
246*> The leading dimension of the array U, LDU >= 1.
247*> IF JOBU = 'U' or 'F' or 'W', then LDU >= M.
248*> \endverbatim
249*>
250*> \param[out] V
251*> \verbatim
252*> V is DOUBLE PRECISION array, dimension ( LDV, N )
253*> If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
254*> the right singular vectors;
255*> If JOBV = 'W', AND (JOBU = 'U' AND JOBT = 'T' AND M = N),
256*> then V is used as workspace if the procedure
257*> replaces A with A^t. In that case, [U] is computed
258*> in V as right singular vectors of A^t and then
259*> copied back to the U array. This 'W' option is just
260*> a reminder to the caller that in this case V is
261*> reserved as workspace of length N*N.
262*> If JOBV = 'N' V is not referenced, unless JOBT='T'.
263*> \endverbatim
264*>
265*> \param[in] LDV
266*> \verbatim
267*> LDV is INTEGER
268*> The leading dimension of the array V, LDV >= 1.
269*> If JOBV = 'V' or 'J' or 'W', then LDV >= N.
270*> \endverbatim
271*>
272*> \param[out] WORK
273*> \verbatim
274*> WORK is DOUBLE PRECISION array, dimension (MAX(7,LWORK))
275*> On exit, if N > 0 .AND. M > 0 (else not referenced),
276*> WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
277*> that SCALE*SVA(1:N) are the computed singular values
278*> of A. (See the description of SVA().)
279*> WORK(2) = See the description of WORK(1).
280*> WORK(3) = SCONDA is an estimate for the condition number of
281*> column equilibrated A. (If JOBA = 'E' or 'G')
282*> SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
283*> It is computed using DPOCON. It holds
284*> N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
285*> where R is the triangular factor from the QRF of A.
286*> However, if R is truncated and the numerical rank is
287*> determined to be strictly smaller than N, SCONDA is
288*> returned as -1, thus indicating that the smallest
289*> singular values might be lost.
290*>
291*> If full SVD is needed, the following two condition numbers are
292*> useful for the analysis of the algorithm. They are provided for
293*> a developer/implementer who is familiar with the details of
294*> the method.
295*>
296*> WORK(4) = an estimate of the scaled condition number of the
297*> triangular factor in the first QR factorization.
298*> WORK(5) = an estimate of the scaled condition number of the
299*> triangular factor in the second QR factorization.
300*> The following two parameters are computed if JOBT = 'T'.
301*> They are provided for a developer/implementer who is familiar
302*> with the details of the method.
303*>
304*> WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
305*> of diag(A^t*A) / Trace(A^t*A) taken as point in the
306*> probability simplex.
307*> WORK(7) = the entropy of A*A^t.
308*> \endverbatim
309*>
310*> \param[in] LWORK
311*> \verbatim
312*> LWORK is INTEGER
313*> Length of WORK to confirm proper allocation of work space.
314*> LWORK depends on the job:
315*>
316*> If only SIGMA is needed (JOBU = 'N', JOBV = 'N') and
317*> -> .. no scaled condition estimate required (JOBE = 'N'):
318*> LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
319*> ->> For optimal performance (blocked code) the optimal value
320*> is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
321*> block size for DGEQP3 and DGEQRF.
322*> In general, optimal LWORK is computed as
323*> LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).
324*> -> .. an estimate of the scaled condition number of A is
325*> required (JOBA='E', 'G'). In this case, LWORK is the maximum
326*> of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
327*> ->> For optimal performance (blocked code) the optimal value
328*> is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
329*> In general, the optimal length LWORK is computed as
330*> LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF),
331*> N+N*N+LWORK(DPOCON),7).
332*>
333*> If SIGMA and the right singular vectors are needed (JOBV = 'V'),
334*> -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
335*> -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
336*> where NB is the optimal block size for DGEQP3, DGEQRF, DGELQF,
337*> DORMLQ. In general, the optimal length LWORK is computed as
338*> LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON),
339*> N+LWORK(DGELQF), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).
340*>
341*> If SIGMA and the left singular vectors are needed
342*> -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
343*> -> For optimal performance:
344*> if JOBU = 'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
345*> if JOBU = 'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
346*> where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
347*> In general, the optimal length LWORK is computed as
348*> LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
349*> 2*N+LWORK(DGEQRF), N+LWORK(DORMQR)).
350*> Here LWORK(DORMQR) equals N*NB (for JOBU = 'U') or
351*> M*NB (for JOBU = 'F').
352*>
353*> If the full SVD is needed: (JOBU = 'U' or JOBU = 'F') and
354*> -> if JOBV = 'V'
355*> the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).
356*> -> if JOBV = 'J' the minimal requirement is
357*> LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
358*> -> For optimal performance, LWORK should be additionally
359*> larger than N+M*NB, where NB is the optimal block size
360*> for DORMQR.
361*> \endverbatim
362*>
363*> \param[out] IWORK
364*> \verbatim
365*> IWORK is INTEGER array, dimension (MAX(3,M+3*N)).
366*> On exit,
367*> IWORK(1) = the numerical rank determined after the initial
368*> QR factorization with pivoting. See the descriptions
369*> of JOBA and JOBR.
370*> IWORK(2) = the number of the computed nonzero singular values
371*> IWORK(3) = if nonzero, a warning message:
372*> If IWORK(3) = 1 then some of the column norms of A
373*> were denormalized floats. The requested high accuracy
374*> is not warranted by the data.
375*> \endverbatim
376*>
377*> \param[out] INFO
378*> \verbatim
379*> INFO is INTEGER
380*> < 0: if INFO = -i, then the i-th argument had an illegal value.
381*> = 0: successful exit;
382*> > 0: DGEJSV did not converge in the maximal allowed number
383*> of sweeps. The computed values may be inaccurate.
384*> \endverbatim
385*
386* Authors:
387* ========
388*
389*> \author Univ. of Tennessee
390*> \author Univ. of California Berkeley
391*> \author Univ. of Colorado Denver
392*> \author NAG Ltd.
393*
394*> \ingroup gejsv
395*
396*> \par Further Details:
397* =====================
398*>
399*> \verbatim
400*>
401*> DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,
402*> DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an
403*> additional row pivoting can be used as a preprocessor, which in some
404*> cases results in much higher accuracy. An example is matrix A with the
405*> structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
406*> diagonal matrices and C is well-conditioned matrix. In that case, complete
407*> pivoting in the first QR factorizations provides accuracy dependent on the
408*> condition number of C, and independent of D1, D2. Such higher accuracy is
409*> not completely understood theoretically, but it works well in practice.
410*> Further, if A can be written as A = B*D, with well-conditioned B and some
411*> diagonal D, then the high accuracy is guaranteed, both theoretically and
412*> in software, independent of D. For more details see [1], [2].
413*> The computational range for the singular values can be the full range
414*> ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
415*> & LAPACK routines called by DGEJSV are implemented to work in that range.
416*> If that is not the case, then the restriction for safe computation with
417*> the singular values in the range of normalized IEEE numbers is that the
418*> spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
419*> overflow. This code (DGEJSV) is best used in this restricted range,
420*> meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
421*> returned as zeros. See JOBR for details on this.
422*> Further, this implementation is somewhat slower than the one described
423*> in [1,2] due to replacement of some non-LAPACK components, and because
424*> the choice of some tuning parameters in the iterative part (DGESVJ) is
425*> left to the implementer on a particular machine.
426*> The rank revealing QR factorization (in this code: DGEQP3) should be
427*> implemented as in [3]. We have a new version of DGEQP3 under development
428*> that is more robust than the current one in LAPACK, with a cleaner cut in
429*> rank deficient cases. It will be available in the SIGMA library [4].
430*> If M is much larger than N, it is obvious that the initial QRF with
431*> column pivoting can be preprocessed by the QRF without pivoting. That
432*> well known trick is not used in DGEJSV because in some cases heavy row
433*> weighting can be treated with complete pivoting. The overhead in cases
434*> M much larger than N is then only due to pivoting, but the benefits in
435*> terms of accuracy have prevailed. The implementer/user can incorporate
436*> this extra QRF step easily. The implementer can also improve data movement
437*> (matrix transpose, matrix copy, matrix transposed copy) - this
438*> implementation of DGEJSV uses only the simplest, naive data movement.
439*> \endverbatim
440*
441*> \par Contributors:
442* ==================
443*>
444*> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
445*
446*> \par References:
447* ================
448*>
449*> \verbatim
450*>
451*> [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
452*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
453*> LAPACK Working note 169.
454*> [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
455*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
456*> LAPACK Working note 170.
457*> [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
458*> factorization software - a case study.
459*> ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
460*> LAPACK Working note 176.
461*> [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
462*> QSVD, (H,K)-SVD computations.
463*> Department of Mathematics, University of Zagreb, 2008.
464*> \endverbatim
465*
466*> \par Bugs, examples and comments:
467* =================================
468*>
469*> Please report all bugs and send interesting examples and/or comments to
470*> drmac@math.hr. Thank you.
471*>
472* =====================================================================
473 SUBROUTINE dgejsv( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
474 $ M, N, A, LDA, SVA, U, LDU, V, LDV,
475 $ WORK, LWORK, IWORK, INFO )
476*
477* -- LAPACK computational routine --
478* -- LAPACK is a software package provided by Univ. of Tennessee, --
479* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
480*
481* .. Scalar Arguments ..
482 IMPLICIT NONE
483 INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
484* ..
485* .. Array Arguments ..
486 DOUBLE PRECISION A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ),
487 $ WORK( LWORK )
488 INTEGER IWORK( * )
489 CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
490* ..
491*
492* ===========================================================================
493*
494* .. Local Parameters ..
495 DOUBLE PRECISION ZERO, ONE
496 PARAMETER ( ZERO = 0.0d0, one = 1.0d0 )
497* ..
498* .. Local Scalars ..
499 DOUBLE PRECISION AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1, COND_OK,
500 $ CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN, MAXPRJ, SCALEM,
501 $ sconda, sfmin, small, temp1, uscal1, uscal2, xsc
502 INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
503 LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LSVEC,
504 $ l2aber, l2kill, l2pert, l2rank, l2tran,
505 $ noscal, rowpiv, rsvec, transp
506* ..
507* .. Intrinsic Functions ..
508 INTRINSIC dabs, dlog, max, min, dble, idnint, dsign, dsqrt
509* ..
510* .. External Functions ..
511 DOUBLE PRECISION DLAMCH, DNRM2
512 INTEGER IDAMAX
513 LOGICAL LSAME
514 EXTERNAL idamax, lsame, dlamch, dnrm2
515* ..
516* .. External Subroutines ..
517 EXTERNAL dcopy, dgelqf, dgeqp3, dgeqrf, dlacpy, dlascl,
520*
521 EXTERNAL dgesvj
522* ..
523*
524* Test the input arguments
525*
526 lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
527 jracc = lsame( jobv, 'J' )
528 rsvec = lsame( jobv, 'V' ) .OR. jracc
529 rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
530 l2rank = lsame( joba, 'R' )
531 l2aber = lsame( joba, 'A' )
532 errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
533 l2tran = lsame( jobt, 'T' )
534 l2kill = lsame( jobr, 'R' )
535 defr = lsame( jobr, 'N' )
536 l2pert = lsame( jobp, 'P' )
537*
538 IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
539 $ errest .OR. lsame( joba, 'C' ) )) THEN
540 info = - 1
541 ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
542 $ lsame( jobu, 'W' )) ) THEN
543 info = - 2
544 ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
545 $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
546 info = - 3
547 ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
548 info = - 4
549 ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
550 info = - 5
551 ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
552 info = - 6
553 ELSE IF ( m .LT. 0 ) THEN
554 info = - 7
555 ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
556 info = - 8
557 ELSE IF ( lda .LT. m ) THEN
558 info = - 10
559 ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
560 info = - 13
561 ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
562 info = - 15
563 ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
564 & (lwork .LT. max(7,4*n+1,2*m+n))) .OR.
565 & (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
566 & (lwork .LT. max(7,4*n+n*n,2*m+n))) .OR.
567 & (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
568 & .OR.
569 & (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. max(7,2*m+n,4*n+1)))
570 & .OR.
571 & (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
572 & (lwork.LT.max(2*m+n,6*n+2*n*n)))
573 & .OR. (lsvec .AND. rsvec .AND. jracc .AND.
574 & lwork.LT.max(2*m+n,4*n+n*n,2*n+n*n+6)))
575 & THEN
576 info = - 17
577 ELSE
578* #:)
579 info = 0
580 END IF
581*
582 IF ( info .NE. 0 ) THEN
583* #:(
584 CALL xerbla( 'DGEJSV', - info )
585 RETURN
586 END IF
587*
588* Quick return for void matrix (Y3K safe)
589* #:)
590 IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) THEN
591 iwork(1:3) = 0
592 work(1:7) = 0
593 RETURN
594 ENDIF
595*
596* Determine whether the matrix U should be M x N or M x M
597*
598 IF ( lsvec ) THEN
599 n1 = n
600 IF ( lsame( jobu, 'F' ) ) n1 = m
601 END IF
602*
603* Set numerical parameters
604*
605*! NOTE: Make sure DLAMCH() does not fail on the target architecture.
606*
607 epsln = dlamch('Epsilon')
608 sfmin = dlamch('SafeMinimum')
609 small = sfmin / epsln
610 big = dlamch('O')
611* BIG = ONE / SFMIN
612*
613* Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
614*
615*(!) If necessary, scale SVA() to protect the largest norm from
616* overflow. It is possible that this scaling pushes the smallest
617* column norm left from the underflow threshold (extreme case).
618*
619 scalem = one / dsqrt(dble(m)*dble(n))
620 noscal = .true.
621 goscal = .true.
622 DO 1874 p = 1, n
623 aapp = zero
624 aaqq = one
625 CALL dlassq( m, a(1,p), 1, aapp, aaqq )
626 IF ( aapp .GT. big ) THEN
627 info = - 9
628 CALL xerbla( 'DGEJSV', -info )
629 RETURN
630 END IF
631 aaqq = dsqrt(aaqq)
632 IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
633 sva(p) = aapp * aaqq
634 ELSE
635 noscal = .false.
636 sva(p) = aapp * ( aaqq * scalem )
637 IF ( goscal ) THEN
638 goscal = .false.
639 CALL dscal( p-1, scalem, sva, 1 )
640 END IF
641 END IF
642 1874 CONTINUE
643*
644 IF ( noscal ) scalem = one
645*
646 aapp = zero
647 aaqq = big
648 DO 4781 p = 1, n
649 aapp = max( aapp, sva(p) )
650 IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
651 4781 CONTINUE
652*
653* Quick return for zero M x N matrix
654* #:)
655 IF ( aapp .EQ. zero ) THEN
656 IF ( lsvec ) CALL dlaset( 'G', m, n1, zero, one, u, ldu )
657 IF ( rsvec ) CALL dlaset( 'G', n, n, zero, one, v, ldv )
658 work(1) = one
659 work(2) = one
660 IF ( errest ) work(3) = one
661 IF ( lsvec .AND. rsvec ) THEN
662 work(4) = one
663 work(5) = one
664 END IF
665 IF ( l2tran ) THEN
666 work(6) = zero
667 work(7) = zero
668 END IF
669 iwork(1) = 0
670 iwork(2) = 0
671 iwork(3) = 0
672 RETURN
673 END IF
674*
675* Issue warning if denormalized column norms detected. Override the
676* high relative accuracy request. Issue licence to kill columns
677* (set them to zero) whose norm is less than sigma_max / BIG (roughly).
678* #:(
679 warning = 0
680 IF ( aaqq .LE. sfmin ) THEN
681 l2rank = .true.
682 l2kill = .true.
683 warning = 1
684 END IF
685*
686* Quick return for one-column matrix
687* #:)
688 IF ( n .EQ. 1 ) THEN
689*
690 IF ( lsvec ) THEN
691 CALL dlascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
692 CALL dlacpy( 'A', m, 1, a, lda, u, ldu )
693* computing all M left singular vectors of the M x 1 matrix
694 IF ( n1 .NE. n ) THEN
695 CALL dgeqrf( m, n, u,ldu, work, work(n+1),lwork-n,ierr )
696 CALL dorgqr( m,n1,1, u,ldu,work,work(n+1),lwork-n,ierr )
697 CALL dcopy( m, a(1,1), 1, u(1,1), 1 )
698 END IF
699 END IF
700 IF ( rsvec ) THEN
701 v(1,1) = one
702 END IF
703 IF ( sva(1) .LT. (big*scalem) ) THEN
704 sva(1) = sva(1) / scalem
705 scalem = one
706 END IF
707 work(1) = one / scalem
708 work(2) = one
709 IF ( sva(1) .NE. zero ) THEN
710 iwork(1) = 1
711 IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
712 iwork(2) = 1
713 ELSE
714 iwork(2) = 0
715 END IF
716 ELSE
717 iwork(1) = 0
718 iwork(2) = 0
719 END IF
720 iwork(3) = 0
721 IF ( errest ) work(3) = one
722 IF ( lsvec .AND. rsvec ) THEN
723 work(4) = one
724 work(5) = one
725 END IF
726 IF ( l2tran ) THEN
727 work(6) = zero
728 work(7) = zero
729 END IF
730 RETURN
731*
732 END IF
733*
734 transp = .false.
735 l2tran = l2tran .AND. ( m .EQ. n )
736*
737 aatmax = -one
738 aatmin = big
739 IF ( rowpiv .OR. l2tran ) THEN
740*
741* Compute the row norms, needed to determine row pivoting sequence
742* (in the case of heavily row weighted A, row pivoting is strongly
743* advised) and to collect information needed to compare the
744* structures of A * A^t and A^t * A (in the case L2TRAN.EQ..TRUE.).
745*
746 IF ( l2tran ) THEN
747 DO 1950 p = 1, m
748 xsc = zero
749 temp1 = one
750 CALL dlassq( n, a(p,1), lda, xsc, temp1 )
751* DLASSQ gets both the ell_2 and the ell_infinity norm
752* in one pass through the vector
753 work(m+n+p) = xsc * scalem
754 work(n+p) = xsc * (scalem*dsqrt(temp1))
755 aatmax = max( aatmax, work(n+p) )
756 IF (work(n+p) .NE. zero) aatmin = min(aatmin,work(n+p))
757 1950 CONTINUE
758 ELSE
759 DO 1904 p = 1, m
760 work(m+n+p) = scalem*dabs( a(p,idamax(n,a(p,1),lda)) )
761 aatmax = max( aatmax, work(m+n+p) )
762 aatmin = min( aatmin, work(m+n+p) )
763 1904 CONTINUE
764 END IF
765*
766 END IF
767*
768* For square matrix A try to determine whether A^t would be better
769* input for the preconditioned Jacobi SVD, with faster convergence.
770* The decision is based on an O(N) function of the vector of column
771* and row norms of A, based on the Shannon entropy. This should give
772* the right choice in most cases when the difference actually matters.
773* It may fail and pick the slower converging side.
774*
775 entra = zero
776 entrat = zero
777 IF ( l2tran ) THEN
778*
779 xsc = zero
780 temp1 = one
781 CALL dlassq( n, sva, 1, xsc, temp1 )
782 temp1 = one / temp1
783*
784 entra = zero
785 DO 1113 p = 1, n
786 big1 = ( ( sva(p) / xsc )**2 ) * temp1
787 IF ( big1 .NE. zero ) entra = entra + big1 * dlog(big1)
788 1113 CONTINUE
789 entra = - entra / dlog(dble(n))
790*
791* Now, SVA().^2/Trace(A^t * A) is a point in the probability simplex.
792* It is derived from the diagonal of A^t * A. Do the same with the
793* diagonal of A * A^t, compute the entropy of the corresponding
794* probability distribution. Note that A * A^t and A^t * A have the
795* same trace.
796*
797 entrat = zero
798 DO 1114 p = n+1, n+m
799 big1 = ( ( work(p) / xsc )**2 ) * temp1
800 IF ( big1 .NE. zero ) entrat = entrat + big1 * dlog(big1)
801 1114 CONTINUE
802 entrat = - entrat / dlog(dble(m))
803*
804* Analyze the entropies and decide A or A^t. Smaller entropy
805* usually means better input for the algorithm.
806*
807 transp = ( entrat .LT. entra )
808*
809* If A^t is better than A, transpose A.
810*
811 IF ( transp ) THEN
812* In an optimal implementation, this trivial transpose
813* should be replaced with faster transpose.
814 DO 1115 p = 1, n - 1
815 DO 1116 q = p + 1, n
816 temp1 = a(q,p)
817 a(q,p) = a(p,q)
818 a(p,q) = temp1
819 1116 CONTINUE
820 1115 CONTINUE
821 DO 1117 p = 1, n
822 work(m+n+p) = sva(p)
823 sva(p) = work(n+p)
824 1117 CONTINUE
825 temp1 = aapp
826 aapp = aatmax
827 aatmax = temp1
828 temp1 = aaqq
829 aaqq = aatmin
830 aatmin = temp1
831 kill = lsvec
832 lsvec = rsvec
833 rsvec = kill
834 IF ( lsvec ) n1 = n
835*
836 rowpiv = .true.
837 END IF
838*
839 END IF
840* END IF L2TRAN
841*
842* Scale the matrix so that its maximal singular value remains less
843* than DSQRT(BIG) -- the matrix is scaled so that its maximal column
844* has Euclidean norm equal to DSQRT(BIG/N). The only reason to keep
845* DSQRT(BIG) instead of BIG is the fact that DGEJSV uses LAPACK and
846* BLAS routines that, in some implementations, are not capable of
847* working in the full interval [SFMIN,BIG] and that they may provoke
848* overflows in the intermediate results. If the singular values spread
849* from SFMIN to BIG, then DGESVJ will compute them. So, in that case,
850* one should use DGESVJ instead of DGEJSV.
851*
852 big1 = dsqrt( big )
853 temp1 = dsqrt( big / dble(n) )
854*
855 CALL dlascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
856 IF ( aaqq .GT. (aapp * sfmin) ) THEN
857 aaqq = ( aaqq / aapp ) * temp1
858 ELSE
859 aaqq = ( aaqq * temp1 ) / aapp
860 END IF
861 temp1 = temp1 * scalem
862 CALL dlascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
863*
864* To undo scaling at the end of this procedure, multiply the
865* computed singular values with USCAL2 / USCAL1.
866*
867 uscal1 = temp1
868 uscal2 = aapp
869*
870 IF ( l2kill ) THEN
871* L2KILL enforces computation of nonzero singular values in
872* the restricted range of condition number of the initial A,
873* sigma_max(A) / sigma_min(A) approx. DSQRT(BIG)/DSQRT(SFMIN).
874 xsc = dsqrt( sfmin )
875 ELSE
876 xsc = small
877*
878* Now, if the condition number of A is too big,
879* sigma_max(A) / sigma_min(A) .GT. DSQRT(BIG/N) * EPSLN / SFMIN,
880* as a precaution measure, the full SVD is computed using DGESVJ
881* with accumulated Jacobi rotations. This provides numerically
882* more robust computation, at the cost of slightly increased run
883* time. Depending on the concrete implementation of BLAS and LAPACK
884* (i.e. how they behave in presence of extreme ill-conditioning) the
885* implementor may decide to remove this switch.
886 IF ( ( aaqq.LT.dsqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
887 jracc = .true.
888 END IF
889*
890 END IF
891 IF ( aaqq .LT. xsc ) THEN
892 DO 700 p = 1, n
893 IF ( sva(p) .LT. xsc ) THEN
894 CALL dlaset( 'A', m, 1, zero, zero, a(1,p), lda )
895 sva(p) = zero
896 END IF
897 700 CONTINUE
898 END IF
899*
900* Preconditioning using QR factorization with pivoting
901*
902 IF ( rowpiv ) THEN
903* Optional row permutation (Bjoerck row pivoting):
904* A result by Cox and Higham shows that the Bjoerck's
905* row pivoting combined with standard column pivoting
906* has similar effect as Powell-Reid complete pivoting.
907* The ell-infinity norms of A are made nonincreasing.
908 DO 1952 p = 1, m - 1
909 q = idamax( m-p+1, work(m+n+p), 1 ) + p - 1
910 iwork(2*n+p) = q
911 IF ( p .NE. q ) THEN
912 temp1 = work(m+n+p)
913 work(m+n+p) = work(m+n+q)
914 work(m+n+q) = temp1
915 END IF
916 1952 CONTINUE
917 CALL dlaswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
918 END IF
919*
920* End of the preparation phase (scaling, optional sorting and
921* transposing, optional flushing of small columns).
922*
923* Preconditioning
924*
925* If the full SVD is needed, the right singular vectors are computed
926* from a matrix equation, and for that we need theoretical analysis
927* of the Businger-Golub pivoting. So we use DGEQP3 as the first RR QRF.
928* In all other cases the first RR QRF can be chosen by other criteria
929* (eg speed by replacing global with restricted window pivoting, such
930* as in SGEQPX from TOMS # 782). Good results will be obtained using
931* SGEQPX with properly (!) chosen numerical parameters.
932* Any improvement of DGEQP3 improves overall performance of DGEJSV.
933*
934* A * P1 = Q1 * [ R1^t 0]^t:
935 DO 1963 p = 1, n
936* .. all columns are free columns
937 iwork(p) = 0
938 1963 CONTINUE
939 CALL dgeqp3( m,n,a,lda, iwork,work, work(n+1),lwork-n, ierr )
940*
941* The upper triangular matrix R1 from the first QRF is inspected for
942* rank deficiency and possibilities for deflation, or possible
943* ill-conditioning. Depending on the user specified flag L2RANK,
944* the procedure explores possibilities to reduce the numerical
945* rank by inspecting the computed upper triangular factor. If
946* L2RANK or L2ABER are up, then DGEJSV will compute the SVD of
947* A + dA, where ||dA|| <= f(M,N)*EPSLN.
948*
949 nr = 1
950 IF ( l2aber ) THEN
951* Standard absolute error bound suffices. All sigma_i with
952* sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
953* aggressive enforcement of lower numerical rank by introducing a
954* backward error of the order of N*EPSLN*||A||.
955 temp1 = dsqrt(dble(n))*epsln
956 DO 3001 p = 2, n
957 IF ( dabs(a(p,p)) .GE. (temp1*dabs(a(1,1))) ) THEN
958 nr = nr + 1
959 ELSE
960 GO TO 3002
961 END IF
962 3001 CONTINUE
963 3002 CONTINUE
964 ELSE IF ( l2rank ) THEN
965* .. similarly as above, only slightly more gentle (less aggressive).
966* Sudden drop on the diagonal of R1 is used as the criterion for
967* close-to-rank-deficient.
968 temp1 = dsqrt(sfmin)
969 DO 3401 p = 2, n
970 IF ( ( dabs(a(p,p)) .LT. (epsln*dabs(a(p-1,p-1))) ) .OR.
971 $ ( dabs(a(p,p)) .LT. small ) .OR.
972 $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3402
973 nr = nr + 1
974 3401 CONTINUE
975 3402 CONTINUE
976*
977 ELSE
978* The goal is high relative accuracy. However, if the matrix
979* has high scaled condition number the relative accuracy is in
980* general not feasible. Later on, a condition number estimator
981* will be deployed to estimate the scaled condition number.
982* Here we just remove the underflowed part of the triangular
983* factor. This prevents the situation in which the code is
984* working hard to get the accuracy not warranted by the data.
985 temp1 = dsqrt(sfmin)
986 DO 3301 p = 2, n
987 IF ( ( dabs(a(p,p)) .LT. small ) .OR.
988 $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3302
989 nr = nr + 1
990 3301 CONTINUE
991 3302 CONTINUE
992*
993 END IF
994*
995 almort = .false.
996 IF ( nr .EQ. n ) THEN
997 maxprj = one
998 DO 3051 p = 2, n
999 temp1 = dabs(a(p,p)) / sva(iwork(p))
1000 maxprj = min( maxprj, temp1 )
1001 3051 CONTINUE
1002 IF ( maxprj**2 .GE. one - dble(n)*epsln ) almort = .true.
1003 END IF
1004*
1005*
1006 sconda = - one
1007 condr1 = - one
1008 condr2 = - one
1009*
1010 IF ( errest ) THEN
1011 IF ( n .EQ. nr ) THEN
1012 IF ( rsvec ) THEN
1013* .. V is available as workspace
1014 CALL dlacpy( 'U', n, n, a, lda, v, ldv )
1015 DO 3053 p = 1, n
1016 temp1 = sva(iwork(p))
1017 CALL dscal( p, one/temp1, v(1,p), 1 )
1018 3053 CONTINUE
1019 CALL dpocon( 'U', n, v, ldv, one, temp1,
1020 $ work(n+1), iwork(2*n+m+1), ierr )
1021 ELSE IF ( lsvec ) THEN
1022* .. U is available as workspace
1023 CALL dlacpy( 'U', n, n, a, lda, u, ldu )
1024 DO 3054 p = 1, n
1025 temp1 = sva(iwork(p))
1026 CALL dscal( p, one/temp1, u(1,p), 1 )
1027 3054 CONTINUE
1028 CALL dpocon( 'U', n, u, ldu, one, temp1,
1029 $ work(n+1), iwork(2*n+m+1), ierr )
1030 ELSE
1031 CALL dlacpy( 'U', n, n, a, lda, work(n+1), n )
1032 DO 3052 p = 1, n
1033 temp1 = sva(iwork(p))
1034 CALL dscal( p, one/temp1, work(n+(p-1)*n+1), 1 )
1035 3052 CONTINUE
1036* .. the columns of R are scaled to have unit Euclidean lengths.
1037 CALL dpocon( 'U', n, work(n+1), n, one, temp1,
1038 $ work(n+n*n+1), iwork(2*n+m+1), ierr )
1039 END IF
1040 sconda = one / dsqrt(temp1)
1041* SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
1042* N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1043 ELSE
1044 sconda = - one
1045 END IF
1046 END IF
1047*
1048 l2pert = l2pert .AND. ( dabs( a(1,1)/a(nr,nr) ) .GT. dsqrt(big1) )
1049* If there is no violent scaling, artificial perturbation is not needed.
1050*
1051* Phase 3:
1052*
1053 IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1054*
1055* Singular Values only
1056*
1057* .. transpose A(1:NR,1:N)
1058 DO 1946 p = 1, min( n-1, nr )
1059 CALL dcopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1060 1946 CONTINUE
1061*
1062* The following two DO-loops introduce small relative perturbation
1063* into the strict upper triangle of the lower triangular matrix.
1064* Small entries below the main diagonal are also changed.
1065* This modification is useful if the computing environment does not
1066* provide/allow FLUSH TO ZERO underflow, for it prevents many
1067* annoying denormalized numbers in case of strongly scaled matrices.
1068* The perturbation is structured so that it does not introduce any
1069* new perturbation of the singular values, and it does not destroy
1070* the job done by the preconditioner.
1071* The licence for this perturbation is in the variable L2PERT, which
1072* should be .FALSE. if FLUSH TO ZERO underflow is active.
1073*
1074 IF ( .NOT. almort ) THEN
1075*
1076 IF ( l2pert ) THEN
1077* XSC = DSQRT(SMALL)
1078 xsc = epsln / dble(n)
1079 DO 4947 q = 1, nr
1080 temp1 = xsc*dabs(a(q,q))
1081 DO 4949 p = 1, n
1082 IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1083 $ .OR. ( p .LT. q ) )
1084 $ a(p,q) = dsign( temp1, a(p,q) )
1085 4949 CONTINUE
1086 4947 CONTINUE
1087 ELSE
1088 CALL dlaset( 'U', nr-1,nr-1, zero,zero, a(1,2),lda )
1089 END IF
1090*
1091* .. second preconditioning using the QR factorization
1092*
1093 CALL dgeqrf( n,nr, a,lda, work, work(n+1),lwork-n, ierr )
1094*
1095* .. and transpose upper to lower triangular
1096 DO 1948 p = 1, nr - 1
1097 CALL dcopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1098 1948 CONTINUE
1099*
1100 END IF
1101*
1102* Row-cyclic Jacobi SVD algorithm with column pivoting
1103*
1104* .. again some perturbation (a "background noise") is added
1105* to drown denormals
1106 IF ( l2pert ) THEN
1107* XSC = DSQRT(SMALL)
1108 xsc = epsln / dble(n)
1109 DO 1947 q = 1, nr
1110 temp1 = xsc*dabs(a(q,q))
1111 DO 1949 p = 1, nr
1112 IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1113 $ .OR. ( p .LT. q ) )
1114 $ a(p,q) = dsign( temp1, a(p,q) )
1115 1949 CONTINUE
1116 1947 CONTINUE
1117 ELSE
1118 CALL dlaset( 'U', nr-1, nr-1, zero, zero, a(1,2), lda )
1119 END IF
1120*
1121* .. and one-sided Jacobi rotations are started on a lower
1122* triangular matrix (plus perturbation which is ignored in
1123* the part which destroys triangular form (confusing?!))
1124*
1125 CALL dgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1126 $ n, v, ldv, work, lwork, info )
1127*
1128 scalem = work(1)
1129 numrank = idnint(work(2))
1130*
1131*
1132 ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1133*
1134* -> Singular Values and Right Singular Vectors <-
1135*
1136 IF ( almort ) THEN
1137*
1138* .. in this case NR equals N
1139 DO 1998 p = 1, nr
1140 CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1141 1998 CONTINUE
1142 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1143*
1144 CALL dgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1145 $ work, lwork, info )
1146 scalem = work(1)
1147 numrank = idnint(work(2))
1148
1149 ELSE
1150*
1151* .. two more QR factorizations ( one QRF is not enough, two require
1152* accumulated product of Jacobi rotations, three are perfect )
1153*
1154 CALL dlaset( 'Lower', nr-1, nr-1, zero, zero, a(2,1), lda )
1155 CALL dgelqf( nr, n, a, lda, work, work(n+1), lwork-n, ierr)
1156 CALL dlacpy( 'Lower', nr, nr, a, lda, v, ldv )
1157 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1158 CALL dgeqrf( nr, nr, v, ldv, work(n+1), work(2*n+1),
1159 $ lwork-2*n, ierr )
1160 DO 8998 p = 1, nr
1161 CALL dcopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1162 8998 CONTINUE
1163 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1164*
1165 CALL dgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1166 $ ldu, work(n+1), lwork, info )
1167 scalem = work(n+1)
1168 numrank = idnint(work(n+2))
1169 IF ( nr .LT. n ) THEN
1170 CALL dlaset( 'A',n-nr, nr, zero,zero, v(nr+1,1), ldv )
1171 CALL dlaset( 'A',nr, n-nr, zero,zero, v(1,nr+1), ldv )
1172 CALL dlaset( 'A',n-nr,n-nr,zero,one, v(nr+1,nr+1), ldv )
1173 END IF
1174*
1175 CALL dormlq( 'Left', 'Transpose', n, n, nr, a, lda, work,
1176 $ v, ldv, work(n+1), lwork-n, ierr )
1177*
1178 END IF
1179*
1180 DO 8991 p = 1, n
1181 CALL dcopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1182 8991 CONTINUE
1183 CALL dlacpy( 'All', n, n, a, lda, v, ldv )
1184*
1185 IF ( transp ) THEN
1186 CALL dlacpy( 'All', n, n, v, ldv, u, ldu )
1187 END IF
1188*
1189 ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1190*
1191* .. Singular Values and Left Singular Vectors ..
1192*
1193* .. second preconditioning step to avoid need to accumulate
1194* Jacobi rotations in the Jacobi iterations.
1195 DO 1965 p = 1, nr
1196 CALL dcopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1197 1965 CONTINUE
1198 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1199*
1200 CALL dgeqrf( n, nr, u, ldu, work(n+1), work(2*n+1),
1201 $ lwork-2*n, ierr )
1202*
1203 DO 1967 p = 1, nr - 1
1204 CALL dcopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1205 1967 CONTINUE
1206 CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1207*
1208 CALL dgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1209 $ lda, work(n+1), lwork-n, info )
1210 scalem = work(n+1)
1211 numrank = idnint(work(n+2))
1212*
1213 IF ( nr .LT. m ) THEN
1214 CALL dlaset( 'A', m-nr, nr,zero, zero, u(nr+1,1), ldu )
1215 IF ( nr .LT. n1 ) THEN
1216 CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1), ldu )
1217 CALL dlaset( 'A',m-nr,n1-nr,zero,one,u(nr+1,nr+1), ldu )
1218 END IF
1219 END IF
1220*
1221 CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1222 $ ldu, work(n+1), lwork-n, ierr )
1223*
1224 IF ( rowpiv )
1225 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1226*
1227 DO 1974 p = 1, n1
1228 xsc = one / dnrm2( m, u(1,p), 1 )
1229 CALL dscal( m, xsc, u(1,p), 1 )
1230 1974 CONTINUE
1231*
1232 IF ( transp ) THEN
1233 CALL dlacpy( 'All', n, n, u, ldu, v, ldv )
1234 END IF
1235*
1236 ELSE
1237*
1238* .. Full SVD ..
1239*
1240 IF ( .NOT. jracc ) THEN
1241*
1242 IF ( .NOT. almort ) THEN
1243*
1244* Second Preconditioning Step (QRF [with pivoting])
1245* Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1246* equivalent to an LQF CALL. Since in many libraries the QRF
1247* seems to be better optimized than the LQF, we do explicit
1248* transpose and use the QRF. This is subject to changes in an
1249* optimized implementation of DGEJSV.
1250*
1251 DO 1968 p = 1, nr
1252 CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1253 1968 CONTINUE
1254*
1255* .. the following two loops perturb small entries to avoid
1256* denormals in the second QR factorization, where they are
1257* as good as zeros. This is done to avoid painfully slow
1258* computation with denormals. The relative size of the perturbation
1259* is a parameter that can be changed by the implementer.
1260* This perturbation device will be obsolete on machines with
1261* properly implemented arithmetic.
1262* To switch it off, set L2PERT=.FALSE. To remove it from the
1263* code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1264* The following two loops should be blocked and fused with the
1265* transposed copy above.
1266*
1267 IF ( l2pert ) THEN
1268 xsc = dsqrt(small)
1269 DO 2969 q = 1, nr
1270 temp1 = xsc*dabs( v(q,q) )
1271 DO 2968 p = 1, n
1272 IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1273 $ .OR. ( p .LT. q ) )
1274 $ v(p,q) = dsign( temp1, v(p,q) )
1275 IF ( p .LT. q ) v(p,q) = - v(p,q)
1276 2968 CONTINUE
1277 2969 CONTINUE
1278 ELSE
1279 CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1280 END IF
1281*
1282* Estimate the row scaled condition number of R1
1283* (If R1 is rectangular, N > NR, then the condition number
1284* of the leading NR x NR submatrix is estimated.)
1285*
1286 CALL dlacpy( 'L', nr, nr, v, ldv, work(2*n+1), nr )
1287 DO 3950 p = 1, nr
1288 temp1 = dnrm2(nr-p+1,work(2*n+(p-1)*nr+p),1)
1289 CALL dscal(nr-p+1,one/temp1,work(2*n+(p-1)*nr+p),1)
1290 3950 CONTINUE
1291 CALL dpocon('Lower',nr,work(2*n+1),nr,one,temp1,
1292 $ work(2*n+nr*nr+1),iwork(m+2*n+1),ierr)
1293 condr1 = one / dsqrt(temp1)
1294* .. here need a second opinion on the condition number
1295* .. then assume worst case scenario
1296* R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1297* more conservative <=> CONDR1 .LT. DSQRT(DBLE(N))
1298*
1299 cond_ok = dsqrt(dble(nr))
1300*[TP] COND_OK is a tuning parameter.
1301
1302 IF ( condr1 .LT. cond_ok ) THEN
1303* .. the second QRF without pivoting. Note: in an optimized
1304* implementation, this QRF should be implemented as the QRF
1305* of a lower triangular matrix.
1306* R1^t = Q2 * R2
1307 CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1308 $ lwork-2*n, ierr )
1309*
1310 IF ( l2pert ) THEN
1311 xsc = dsqrt(small)/epsln
1312 DO 3959 p = 2, nr
1313 DO 3958 q = 1, p - 1
1314 temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1315 IF ( dabs(v(q,p)) .LE. temp1 )
1316 $ v(q,p) = dsign( temp1, v(q,p) )
1317 3958 CONTINUE
1318 3959 CONTINUE
1319 END IF
1320*
1321 IF ( nr .NE. n )
1322 $ CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1323* .. save ...
1324*
1325* .. this transposed copy should be better than naive
1326 DO 1969 p = 1, nr - 1
1327 CALL dcopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1328 1969 CONTINUE
1329*
1330 condr2 = condr1
1331*
1332 ELSE
1333*
1334* .. ill-conditioned case: second QRF with pivoting
1335* Note that windowed pivoting would be equally good
1336* numerically, and more run-time efficient. So, in
1337* an optimal implementation, the next call to DGEQP3
1338* should be replaced with eg. CALL SGEQPX (ACM TOMS #782)
1339* with properly (carefully) chosen parameters.
1340*
1341* R1^t * P2 = Q2 * R2
1342 DO 3003 p = 1, nr
1343 iwork(n+p) = 0
1344 3003 CONTINUE
1345 CALL dgeqp3( n, nr, v, ldv, iwork(n+1), work(n+1),
1346 $ work(2*n+1), lwork-2*n, ierr )
1347** CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1348** $ LWORK-2*N, IERR )
1349 IF ( l2pert ) THEN
1350 xsc = dsqrt(small)
1351 DO 3969 p = 2, nr
1352 DO 3968 q = 1, p - 1
1353 temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1354 IF ( dabs(v(q,p)) .LE. temp1 )
1355 $ v(q,p) = dsign( temp1, v(q,p) )
1356 3968 CONTINUE
1357 3969 CONTINUE
1358 END IF
1359*
1360 CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1361*
1362 IF ( l2pert ) THEN
1363 xsc = dsqrt(small)
1364 DO 8970 p = 2, nr
1365 DO 8971 q = 1, p - 1
1366 temp1 = xsc * min(dabs(v(p,p)),dabs(v(q,q)))
1367 v(p,q) = - dsign( temp1, v(q,p) )
1368 8971 CONTINUE
1369 8970 CONTINUE
1370 ELSE
1371 CALL dlaset( 'L',nr-1,nr-1,zero,zero,v(2,1),ldv )
1372 END IF
1373* Now, compute R2 = L3 * Q3, the LQ factorization.
1374 CALL dgelqf( nr, nr, v, ldv, work(2*n+n*nr+1),
1375 $ work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1376* .. and estimate the condition number
1377 CALL dlacpy( 'L',nr,nr,v,ldv,work(2*n+n*nr+nr+1),nr )
1378 DO 4950 p = 1, nr
1379 temp1 = dnrm2( p, work(2*n+n*nr+nr+p), nr )
1380 CALL dscal( p, one/temp1, work(2*n+n*nr+nr+p), nr )
1381 4950 CONTINUE
1382 CALL dpocon( 'L',nr,work(2*n+n*nr+nr+1),nr,one,temp1,
1383 $ work(2*n+n*nr+nr+nr*nr+1),iwork(m+2*n+1),ierr )
1384 condr2 = one / dsqrt(temp1)
1385*
1386 IF ( condr2 .GE. cond_ok ) THEN
1387* .. save the Householder vectors used for Q3
1388* (this overwrites the copy of R2, as it will not be
1389* needed in this branch, but it does not overwrite the
1390* Huseholder vectors of Q2.).
1391 CALL dlacpy( 'U', nr, nr, v, ldv, work(2*n+1), n )
1392* .. and the rest of the information on Q3 is in
1393* WORK(2*N+N*NR+1:2*N+N*NR+N)
1394 END IF
1395*
1396 END IF
1397*
1398 IF ( l2pert ) THEN
1399 xsc = dsqrt(small)
1400 DO 4968 q = 2, nr
1401 temp1 = xsc * v(q,q)
1402 DO 4969 p = 1, q - 1
1403* V(p,q) = - DSIGN( TEMP1, V(q,p) )
1404 v(p,q) = - dsign( temp1, v(p,q) )
1405 4969 CONTINUE
1406 4968 CONTINUE
1407 ELSE
1408 CALL dlaset( 'U', nr-1,nr-1, zero,zero, v(1,2), ldv )
1409 END IF
1410*
1411* Second preconditioning finished; continue with Jacobi SVD
1412* The input matrix is lower triangular.
1413*
1414* Recover the right singular vectors as solution of a well
1415* conditioned triangular matrix equation.
1416*
1417 IF ( condr1 .LT. cond_ok ) THEN
1418*
1419 CALL dgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u,
1420 $ ldu,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,info )
1421 scalem = work(2*n+n*nr+nr+1)
1422 numrank = idnint(work(2*n+n*nr+nr+2))
1423 DO 3970 p = 1, nr
1424 CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1425 CALL dscal( nr, sva(p), v(1,p), 1 )
1426 3970 CONTINUE
1427
1428* .. pick the right matrix equation and solve it
1429*
1430 IF ( nr .EQ. n ) THEN
1431* :)) .. best case, R1 is inverted. The solution of this matrix
1432* equation is Q2*V2 = the product of the Jacobi rotations
1433* used in DGESVJ, premultiplied with the orthogonal matrix
1434* from the second QR factorization.
1435 CALL dtrsm( 'L','U','N','N', nr,nr,one, a,lda, v,ldv )
1436 ELSE
1437* .. R1 is well conditioned, but non-square. Transpose(R2)
1438* is inverted to get the product of the Jacobi rotations
1439* used in DGESVJ. The Q-factor from the second QR
1440* factorization is then built in explicitly.
1441 CALL dtrsm('L','U','T','N',nr,nr,one,work(2*n+1),
1442 $ n,v,ldv)
1443 IF ( nr .LT. n ) THEN
1444 CALL dlaset('A',n-nr,nr,zero,zero,v(nr+1,1),ldv)
1445 CALL dlaset('A',nr,n-nr,zero,zero,v(1,nr+1),ldv)
1446 CALL dlaset('A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv)
1447 END IF
1448 CALL dormqr('L','N',n,n,nr,work(2*n+1),n,work(n+1),
1449 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1450 END IF
1451*
1452 ELSE IF ( condr2 .LT. cond_ok ) THEN
1453*
1454* :) .. the input matrix A is very likely a relative of
1455* the Kahan matrix :)
1456* The matrix R2 is inverted. The solution of the matrix equation
1457* is Q3^T*V3 = the product of the Jacobi rotations (applied to
1458* the lower triangular L3 from the LQ factorization of
1459* R2=L3*Q3), pre-multiplied with the transposed Q3.
1460 CALL dgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1461 $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1462 scalem = work(2*n+n*nr+nr+1)
1463 numrank = idnint(work(2*n+n*nr+nr+2))
1464 DO 3870 p = 1, nr
1465 CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1466 CALL dscal( nr, sva(p), u(1,p), 1 )
1467 3870 CONTINUE
1468 CALL dtrsm('L','U','N','N',nr,nr,one,work(2*n+1),n,u,ldu)
1469* .. apply the permutation from the second QR factorization
1470 DO 873 q = 1, nr
1471 DO 872 p = 1, nr
1472 work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1473 872 CONTINUE
1474 DO 874 p = 1, nr
1475 u(p,q) = work(2*n+n*nr+nr+p)
1476 874 CONTINUE
1477 873 CONTINUE
1478 IF ( nr .LT. n ) THEN
1479 CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1480 CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1481 CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1482 END IF
1483 CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1484 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1485 ELSE
1486* Last line of defense.
1487* #:( This is a rather pathological case: no scaled condition
1488* improvement after two pivoted QR factorizations. Other
1489* possibility is that the rank revealing QR factorization
1490* or the condition estimator has failed, or the COND_OK
1491* is set very close to ONE (which is unnecessary). Normally,
1492* this branch should never be executed, but in rare cases of
1493* failure of the RRQR or condition estimator, the last line of
1494* defense ensures that DGEJSV completes the task.
1495* Compute the full SVD of L3 using DGESVJ with explicit
1496* accumulation of Jacobi rotations.
1497 CALL dgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1498 $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1499 scalem = work(2*n+n*nr+nr+1)
1500 numrank = idnint(work(2*n+n*nr+nr+2))
1501 IF ( nr .LT. n ) THEN
1502 CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1503 CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1504 CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1505 END IF
1506 CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1507 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1508*
1509 CALL dormlq( 'L', 'T', nr, nr, nr, work(2*n+1), n,
1510 $ work(2*n+n*nr+1), u, ldu, work(2*n+n*nr+nr+1),
1511 $ lwork-2*n-n*nr-nr, ierr )
1512 DO 773 q = 1, nr
1513 DO 772 p = 1, nr
1514 work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1515 772 CONTINUE
1516 DO 774 p = 1, nr
1517 u(p,q) = work(2*n+n*nr+nr+p)
1518 774 CONTINUE
1519 773 CONTINUE
1520*
1521 END IF
1522*
1523* Permute the rows of V using the (column) permutation from the
1524* first QRF. Also, scale the columns to make them unit in
1525* Euclidean norm. This applies to all cases.
1526*
1527 temp1 = dsqrt(dble(n)) * epsln
1528 DO 1972 q = 1, n
1529 DO 972 p = 1, n
1530 work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1531 972 CONTINUE
1532 DO 973 p = 1, n
1533 v(p,q) = work(2*n+n*nr+nr+p)
1534 973 CONTINUE
1535 xsc = one / dnrm2( n, v(1,q), 1 )
1536 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1537 $ CALL dscal( n, xsc, v(1,q), 1 )
1538 1972 CONTINUE
1539* At this moment, V contains the right singular vectors of A.
1540* Next, assemble the left singular vector matrix U (M x N).
1541 IF ( nr .LT. m ) THEN
1542 CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1543 IF ( nr .LT. n1 ) THEN
1544 CALL dlaset('A',nr,n1-nr,zero,zero,u(1,nr+1),ldu)
1545 CALL dlaset('A',m-nr,n1-nr,zero,one,u(nr+1,nr+1),ldu)
1546 END IF
1547 END IF
1548*
1549* The Q matrix from the first QRF is built into the left singular
1550* matrix U. This applies to all cases.
1551*
1552 CALL dormqr( 'Left', 'No_Tr', m, n1, n, a, lda, work, u,
1553 $ ldu, work(n+1), lwork-n, ierr )
1554
1555* The columns of U are normalized. The cost is O(M*N) flops.
1556 temp1 = dsqrt(dble(m)) * epsln
1557 DO 1973 p = 1, nr
1558 xsc = one / dnrm2( m, u(1,p), 1 )
1559 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1560 $ CALL dscal( m, xsc, u(1,p), 1 )
1561 1973 CONTINUE
1562*
1563* If the initial QRF is computed with row pivoting, the left
1564* singular vectors must be adjusted.
1565*
1566 IF ( rowpiv )
1567 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1568*
1569 ELSE
1570*
1571* .. the initial matrix A has almost orthogonal columns and
1572* the second QRF is not needed
1573*
1574 CALL dlacpy( 'Upper', n, n, a, lda, work(n+1), n )
1575 IF ( l2pert ) THEN
1576 xsc = dsqrt(small)
1577 DO 5970 p = 2, n
1578 temp1 = xsc * work( n + (p-1)*n + p )
1579 DO 5971 q = 1, p - 1
1580 work(n+(q-1)*n+p)=-dsign(temp1,work(n+(p-1)*n+q))
1581 5971 CONTINUE
1582 5970 CONTINUE
1583 ELSE
1584 CALL dlaset( 'Lower',n-1,n-1,zero,zero,work(n+2),n )
1585 END IF
1586*
1587 CALL dgesvj( 'Upper', 'U', 'N', n, n, work(n+1), n, sva,
1588 $ n, u, ldu, work(n+n*n+1), lwork-n-n*n, info )
1589*
1590 scalem = work(n+n*n+1)
1591 numrank = idnint(work(n+n*n+2))
1592 DO 6970 p = 1, n
1593 CALL dcopy( n, work(n+(p-1)*n+1), 1, u(1,p), 1 )
1594 CALL dscal( n, sva(p), work(n+(p-1)*n+1), 1 )
1595 6970 CONTINUE
1596*
1597 CALL dtrsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1598 $ one, a, lda, work(n+1), n )
1599 DO 6972 p = 1, n
1600 CALL dcopy( n, work(n+p), n, v(iwork(p),1), ldv )
1601 6972 CONTINUE
1602 temp1 = dsqrt(dble(n))*epsln
1603 DO 6971 p = 1, n
1604 xsc = one / dnrm2( n, v(1,p), 1 )
1605 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1606 $ CALL dscal( n, xsc, v(1,p), 1 )
1607 6971 CONTINUE
1608*
1609* Assemble the left singular vector matrix U (M x N).
1610*
1611 IF ( n .LT. m ) THEN
1612 CALL dlaset( 'A', m-n, n, zero, zero, u(n+1,1), ldu )
1613 IF ( n .LT. n1 ) THEN
1614 CALL dlaset( 'A',n, n1-n, zero, zero, u(1,n+1),ldu )
1615 CALL dlaset( 'A',m-n,n1-n, zero, one,u(n+1,n+1),ldu )
1616 END IF
1617 END IF
1618 CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1619 $ ldu, work(n+1), lwork-n, ierr )
1620 temp1 = dsqrt(dble(m))*epsln
1621 DO 6973 p = 1, n1
1622 xsc = one / dnrm2( m, u(1,p), 1 )
1623 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1624 $ CALL dscal( m, xsc, u(1,p), 1 )
1625 6973 CONTINUE
1626*
1627 IF ( rowpiv )
1628 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1629*
1630 END IF
1631*
1632* end of the >> almost orthogonal case << in the full SVD
1633*
1634 ELSE
1635*
1636* This branch deploys a preconditioned Jacobi SVD with explicitly
1637* accumulated rotations. It is included as optional, mainly for
1638* experimental purposes. It does perform well, and can also be used.
1639* In this implementation, this branch will be automatically activated
1640* if the condition number sigma_max(A) / sigma_min(A) is predicted
1641* to be greater than the overflow threshold. This is because the
1642* a posteriori computation of the singular vectors assumes robust
1643* implementation of BLAS and some LAPACK procedures, capable of working
1644* in presence of extreme values. Since that is not always the case, ...
1645*
1646 DO 7968 p = 1, nr
1647 CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1648 7968 CONTINUE
1649*
1650 IF ( l2pert ) THEN
1651 xsc = dsqrt(small/epsln)
1652 DO 5969 q = 1, nr
1653 temp1 = xsc*dabs( v(q,q) )
1654 DO 5968 p = 1, n
1655 IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1656 $ .OR. ( p .LT. q ) )
1657 $ v(p,q) = dsign( temp1, v(p,q) )
1658 IF ( p .LT. q ) v(p,q) = - v(p,q)
1659 5968 CONTINUE
1660 5969 CONTINUE
1661 ELSE
1662 CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1663 END IF
1664
1665 CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1666 $ lwork-2*n, ierr )
1667 CALL dlacpy( 'L', n, nr, v, ldv, work(2*n+1), n )
1668*
1669 DO 7969 p = 1, nr
1670 CALL dcopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1671 7969 CONTINUE
1672
1673 IF ( l2pert ) THEN
1674 xsc = dsqrt(small/epsln)
1675 DO 9970 q = 2, nr
1676 DO 9971 p = 1, q - 1
1677 temp1 = xsc * min(dabs(u(p,p)),dabs(u(q,q)))
1678 u(p,q) = - dsign( temp1, u(q,p) )
1679 9971 CONTINUE
1680 9970 CONTINUE
1681 ELSE
1682 CALL dlaset('U', nr-1, nr-1, zero, zero, u(1,2), ldu )
1683 END IF
1684
1685 CALL dgesvj( 'G', 'U', 'V', nr, nr, u, ldu, sva,
1686 $ n, v, ldv, work(2*n+n*nr+1), lwork-2*n-n*nr, info )
1687 scalem = work(2*n+n*nr+1)
1688 numrank = idnint(work(2*n+n*nr+2))
1689
1690 IF ( nr .LT. n ) THEN
1691 CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1692 CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1693 CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1694 END IF
1695
1696 CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1697 $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1698*
1699* Permute the rows of V using the (column) permutation from the
1700* first QRF. Also, scale the columns to make them unit in
1701* Euclidean norm. This applies to all cases.
1702*
1703 temp1 = dsqrt(dble(n)) * epsln
1704 DO 7972 q = 1, n
1705 DO 8972 p = 1, n
1706 work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1707 8972 CONTINUE
1708 DO 8973 p = 1, n
1709 v(p,q) = work(2*n+n*nr+nr+p)
1710 8973 CONTINUE
1711 xsc = one / dnrm2( n, v(1,q), 1 )
1712 IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1713 $ CALL dscal( n, xsc, v(1,q), 1 )
1714 7972 CONTINUE
1715*
1716* At this moment, V contains the right singular vectors of A.
1717* Next, assemble the left singular vector matrix U (M x N).
1718*
1719 IF ( nr .LT. m ) THEN
1720 CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1721 IF ( nr .LT. n1 ) THEN
1722 CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1),ldu )
1723 CALL dlaset( 'A',m-nr,n1-nr, zero, one,u(nr+1,nr+1),ldu )
1724 END IF
1725 END IF
1726*
1727 CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1728 $ ldu, work(n+1), lwork-n, ierr )
1729*
1730 IF ( rowpiv )
1731 $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1732*
1733*
1734 END IF
1735 IF ( transp ) THEN
1736* .. swap U and V because the procedure worked on A^t
1737 DO 6974 p = 1, n
1738 CALL dswap( n, u(1,p), 1, v(1,p), 1 )
1739 6974 CONTINUE
1740 END IF
1741*
1742 END IF
1743* end of the full SVD
1744*
1745* Undo scaling, if necessary (and possible)
1746*
1747 IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1748 CALL dlascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
1749 uscal1 = one
1750 uscal2 = one
1751 END IF
1752*
1753 IF ( nr .LT. n ) THEN
1754 DO 3004 p = nr+1, n
1755 sva(p) = zero
1756 3004 CONTINUE
1757 END IF
1758*
1759 work(1) = uscal2 * scalem
1760 work(2) = uscal1
1761 IF ( errest ) work(3) = sconda
1762 IF ( lsvec .AND. rsvec ) THEN
1763 work(4) = condr1
1764 work(5) = condr2
1765 END IF
1766 IF ( l2tran ) THEN
1767 work(6) = entra
1768 work(7) = entrat
1769 END IF
1770*
1771 iwork(1) = nr
1772 iwork(2) = numrank
1773 iwork(3) = warning
1774*
1775 RETURN
1776* ..
1777* .. END OF DGEJSV
1778* ..
1779 END
1780*
subroutine xerbla(srname, info)
Definition cblat2.f:3285
subroutine dcopy(n, dx, incx, dy, incy)
DCOPY
Definition dcopy.f:82
subroutine dgejsv(joba, jobu, jobv, jobr, jobt, jobp, m, n, a, lda, sva, u, ldu, v, ldv, work, lwork, iwork, info)
DGEJSV
Definition dgejsv.f:476
subroutine dgelqf(m, n, a, lda, tau, work, lwork, info)
DGELQF
Definition dgelqf.f:143
subroutine dgeqp3(m, n, a, lda, jpvt, tau, work, lwork, info)
DGEQP3
Definition dgeqp3.f:151
subroutine dgeqrf(m, n, a, lda, tau, work, lwork, info)
DGEQRF
Definition dgeqrf.f:146
subroutine dgesvj(joba, jobu, jobv, m, n, a, lda, sva, mv, v, ldv, work, lwork, info)
DGESVJ
Definition dgesvj.f:337
subroutine dlacpy(uplo, m, n, a, lda, b, ldb)
DLACPY copies all or part of one two-dimensional array to another.
Definition dlacpy.f:103
subroutine dlascl(type, kl, ku, cfrom, cto, m, n, a, lda, info)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition dlascl.f:143
subroutine dlaset(uplo, m, n, alpha, beta, a, lda)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition dlaset.f:110
subroutine dlassq(n, x, incx, scale, sumsq)
DLASSQ updates a sum of squares represented in scaled form.
Definition dlassq.f90:124
subroutine dlaswp(n, a, lda, k1, k2, ipiv, incx)
DLASWP performs a series of row interchanges on a general rectangular matrix.
Definition dlaswp.f:115
subroutine dpocon(uplo, n, a, lda, anorm, rcond, work, iwork, info)
DPOCON
Definition dpocon.f:121
subroutine dscal(n, da, dx, incx)
DSCAL
Definition dscal.f:79
subroutine dswap(n, dx, incx, dy, incy)
DSWAP
Definition dswap.f:82
subroutine dtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
DTRSM
Definition dtrsm.f:181
subroutine dorgqr(m, n, k, a, lda, tau, work, lwork, info)
DORGQR
Definition dorgqr.f:128
subroutine dormlq(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
DORMLQ
Definition dormlq.f:167
subroutine dormqr(side, trans, m, n, k, a, lda, tau, c, ldc, work, lwork, info)
DORMQR
Definition dormqr.f:167