LAPACK  3.10.1 LAPACK: Linear Algebra PACKage

## ◆ sgesvj()

 subroutine sgesvj ( character*1 JOBA, character*1 JOBU, character*1 JOBV, integer M, integer N, real, dimension( lda, * ) A, integer LDA, real, dimension( n ) SVA, integer MV, real, dimension( ldv, * ) V, integer LDV, real, dimension( lwork ) WORK, integer LWORK, integer INFO )

SGESVJ

Purpose:
``` SGESVJ computes the singular value decomposition (SVD) of a real
M-by-N matrix A, where M >= N. The SVD of A is written as
[++]   [xx]   [x0]   [xx]
A = U * SIGMA * V^t,  [++] = [xx] * [ox] * [xx]
[++]   [xx]
where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
of SIGMA are the singular values of A. The columns of U and V are the
left and the right singular vectors of A, respectively.
SGESVJ can sometimes compute tiny singular values and their singular vectors much
more accurately than other SVD routines, see below under Further Details.```
Parameters
 [in] JOBA ``` JOBA is CHARACTER*1 Specifies the structure of A. = 'L': The input matrix A is lower triangular; = 'U': The input matrix A is upper triangular; = 'G': The input matrix A is general M-by-N matrix, M >= N.``` [in] JOBU ``` JOBU is CHARACTER*1 Specifies whether to compute the left singular vectors (columns of U): = 'U': The left singular vectors corresponding to the nonzero singular values are computed and returned in the leading columns of A. See more details in the description of A. The default numerical orthogonality threshold is set to approximately TOL=CTOL*EPS, CTOL=SQRT(M), EPS=SLAMCH('E'). = 'C': Analogous to JOBU='U', except that user can control the level of numerical orthogonality of the computed left singular vectors. TOL can be set to TOL = CTOL*EPS, where CTOL is given on input in the array WORK. No CTOL smaller than ONE is allowed. CTOL greater than 1 / EPS is meaningless. The option 'C' can be used if M*EPS is satisfactory orthogonality of the computed left singular vectors, so CTOL=M could save few sweeps of Jacobi rotations. See the descriptions of A and WORK(1). = 'N': The matrix U is not computed. However, see the description of A.``` [in] JOBV ``` JOBV is CHARACTER*1 Specifies whether to compute the right singular vectors, that is, the matrix V: = 'V': the matrix V is computed and returned in the array V = 'A': the Jacobi rotations are applied to the MV-by-N array V. In other words, the right singular vector matrix V is not computed explicitly; instead it is applied to an MV-by-N matrix initially stored in the first MV rows of V. = 'N': the matrix V is not computed and the array V is not referenced``` [in] M ``` M is INTEGER The number of rows of the input matrix A. 1/SLAMCH('E') > M >= 0.``` [in] N ``` N is INTEGER The number of columns of the input matrix A. M >= N >= 0.``` [in,out] A ``` A is REAL array, dimension (LDA,N) On entry, the M-by-N matrix A. On exit, If JOBU = 'U' .OR. JOBU = 'C': If INFO = 0: RANKA orthonormal columns of U are returned in the leading RANKA columns of the array A. Here RANKA <= N is the number of computed singular values of A that are above the underflow threshold SLAMCH('S'). The singular vectors corresponding to underflowed or zero singular values are not computed. The value of RANKA is returned in the array WORK as RANKA=NINT(WORK(2)). Also see the descriptions of SVA and WORK. The computed columns of U are mutually numerically orthogonal up to approximately TOL=SQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU = 'C'), see the description of JOBU. If INFO > 0, the procedure SGESVJ did not converge in the given number of iterations (sweeps). In that case, the computed columns of U may not be orthogonal up to TOL. The output U (stored in A), SIGMA (given by the computed singular values in SVA(1:N)) and V is still a decomposition of the input matrix A in the sense that the residual ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small. If JOBU = 'N': If INFO = 0: Note that the left singular vectors are 'for free' in the one-sided Jacobi SVD algorithm. However, if only the singular values are needed, the level of numerical orthogonality of U is not an issue and iterations are stopped when the columns of the iterated matrix are numerically orthogonal up to approximately M*EPS. Thus, on exit, A contains the columns of U scaled with the corresponding singular values. If INFO > 0: the procedure SGESVJ did not converge in the given number of iterations (sweeps).``` [in] LDA ``` LDA is INTEGER The leading dimension of the array A. LDA >= max(1,M).``` [out] SVA ``` SVA is REAL array, dimension (N) On exit, If INFO = 0 : depending on the value SCALE = WORK(1), we have: If SCALE = ONE: SVA(1:N) contains the computed singular values of A. During the computation SVA contains the Euclidean column norms of the iterated matrices in the array A. If SCALE .NE. ONE: The singular values of A are SCALE*SVA(1:N), and this factored representation is due to the fact that some of the singular values of A might underflow or overflow. If INFO > 0 : the procedure SGESVJ did not converge in the given number of iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.``` [in] MV ``` MV is INTEGER If JOBV = 'A', then the product of Jacobi rotations in SGESVJ is applied to the first MV rows of V. See the description of JOBV.``` [in,out] V ``` V is REAL array, dimension (LDV,N) If JOBV = 'V', then V contains on exit the N-by-N matrix of the right singular vectors; If JOBV = 'A', then V contains the product of the computed right singular vector matrix and the initial matrix in the array V. If JOBV = 'N', then V is not referenced.``` [in] LDV ``` LDV is INTEGER The leading dimension of the array V, LDV >= 1. If JOBV = 'V', then LDV >= max(1,N). If JOBV = 'A', then LDV >= max(1,MV) .``` [in,out] WORK ``` WORK is REAL array, dimension (LWORK) On entry, If JOBU = 'C' : WORK(1) = CTOL, where CTOL defines the threshold for convergence. The process stops if all columns of A are mutually orthogonal up to CTOL*EPS, EPS=SLAMCH('E'). It is required that CTOL >= ONE, i.e. it is not allowed to force the routine to obtain orthogonality below EPSILON. On exit, WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N) are the computed singular vcalues of A. (See description of SVA().) WORK(2) = NINT(WORK(2)) is the number of the computed nonzero singular values. WORK(3) = NINT(WORK(3)) is the number of the computed singular values that are larger than the underflow threshold. WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi rotations needed for numerical convergence. WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep. This is useful information in cases when SGESVJ did not converge, as it can be used to estimate whether the output is still useful and for post festum analysis. WORK(6) = the largest absolute value over all sines of the Jacobi rotation angles in the last sweep. It can be useful for a post festum analysis.``` [in] LWORK ``` LWORK is INTEGER length of WORK, WORK >= MAX(6,M+N)``` [out] INFO ``` INFO is INTEGER = 0: successful exit. < 0: if INFO = -i, then the i-th argument had an illegal value > 0: SGESVJ did not converge in the maximal allowed number (30) of sweeps. The output may still be useful. See the description of WORK.```
Further Details:
The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane rotations. The rotations are implemented as fast scaled rotations of Anda and Park [1]. In the case of underflow of the Jacobi angle, a modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses column interchanges of de Rijk [2]. The relative accuracy of the computed singular values and the accuracy of the computed singular vectors (in angle metric) is as guaranteed by the theory of Demmel and Veselic [3]. The condition number that determines the accuracy in the full rank case is essentially min_{D=diag} kappa(A*D), where kappa(.) is the spectral condition number. The best performance of this Jacobi SVD procedure is achieved if used in an accelerated version of Drmac and Veselic [5,6], and it is the kernel routine in the SIGMA library [7]. Some tuning parameters (marked with [TP]) are available for the implementer.
The computational range for the nonzero singular values is the machine number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even denormalized singular values can be computed with the corresponding gradual loss of accurate digits.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
[1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.

[2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the singular value decomposition on a vector computer.
SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.

[3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
[4] Z. Drmac: Implementation of Jacobi rotations for accurate singular value computation in floating point arithmetic.
SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.

[5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
LAPACK Working note 169.

[6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
LAPACK Working note 170.

[7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, QSVD, (H,K)-SVD computations.
Department of Mathematics, University of Zagreb, 2008.
Please report all bugs and send interesting test examples and comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 321 of file sgesvj.f.

323 *
324 * -- LAPACK computational routine --
325 * -- LAPACK is a software package provided by Univ. of Tennessee, --
326 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
327 *
328 * .. Scalar Arguments ..
329  INTEGER INFO, LDA, LDV, LWORK, M, MV, N
330  CHARACTER*1 JOBA, JOBU, JOBV
331 * ..
332 * .. Array Arguments ..
333  REAL A( LDA, * ), SVA( N ), V( LDV, * ),
334  \$ WORK( LWORK )
335 * ..
336 *
337 * =====================================================================
338 *
339 * .. Local Parameters ..
340  REAL ZERO, HALF, ONE
341  parameter( zero = 0.0e0, half = 0.5e0, one = 1.0e0)
342  INTEGER NSWEEP
343  parameter( nsweep = 30 )
344 * ..
345 * .. Local Scalars ..
346  REAL AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
347  \$ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ,
348  \$ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL,
349  \$ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA,
350  \$ THSIGN, TOL
351  INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
352  \$ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34,
353  \$ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP,
354  \$ SWBAND
355  LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
356  \$ RSVEC, UCTOL, UPPER
357 * ..
358 * .. Local Arrays ..
359  REAL FASTR( 5 )
360 * ..
361 * .. Intrinsic Functions ..
362  INTRINSIC abs, max, min, float, sign, sqrt
363 * ..
364 * .. External Functions ..
365 * ..
366 * from BLAS
367  REAL SDOT, SNRM2
368  EXTERNAL sdot, snrm2
369  INTEGER ISAMAX
370  EXTERNAL isamax
371 * from LAPACK
372  REAL SLAMCH
373  EXTERNAL slamch
374  LOGICAL LSAME
375  EXTERNAL lsame
376 * ..
377 * .. External Subroutines ..
378 * ..
379 * from BLAS
380  EXTERNAL saxpy, scopy, srotm, sscal, sswap
381 * from LAPACK
382  EXTERNAL slascl, slaset, slassq, xerbla
383 *
384  EXTERNAL sgsvj0, sgsvj1
385 * ..
386 * .. Executable Statements ..
387 *
388 * Test the input arguments
389 *
390  lsvec = lsame( jobu, 'U' )
391  uctol = lsame( jobu, 'C' )
392  rsvec = lsame( jobv, 'V' )
393  applv = lsame( jobv, 'A' )
394  upper = lsame( joba, 'U' )
395  lower = lsame( joba, 'L' )
396 *
397  IF( .NOT.( upper .OR. lower .OR. lsame( joba, 'G' ) ) ) THEN
398  info = -1
399  ELSE IF( .NOT.( lsvec .OR. uctol .OR. lsame( jobu, 'N' ) ) ) THEN
400  info = -2
401  ELSE IF( .NOT.( rsvec .OR. applv .OR. lsame( jobv, 'N' ) ) ) THEN
402  info = -3
403  ELSE IF( m.LT.0 ) THEN
404  info = -4
405  ELSE IF( ( n.LT.0 ) .OR. ( n.GT.m ) ) THEN
406  info = -5
407  ELSE IF( lda.LT.m ) THEN
408  info = -7
409  ELSE IF( mv.LT.0 ) THEN
410  info = -9
411  ELSE IF( ( rsvec .AND. ( ldv.LT.n ) ) .OR.
412  \$ ( applv .AND. ( ldv.LT.mv ) ) ) THEN
413  info = -11
414  ELSE IF( uctol .AND. ( work( 1 ).LE.one ) ) THEN
415  info = -12
416  ELSE IF( lwork.LT.max( m+n, 6 ) ) THEN
417  info = -13
418  ELSE
419  info = 0
420  END IF
421 *
422 * #:(
423  IF( info.NE.0 ) THEN
424  CALL xerbla( 'SGESVJ', -info )
425  RETURN
426  END IF
427 *
428 * #:) Quick return for void matrix
429 *
430  IF( ( m.EQ.0 ) .OR. ( n.EQ.0 ) )RETURN
431 *
432 * Set numerical parameters
433 * The stopping criterion for Jacobi rotations is
434 *
435 * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
436 *
437 * where EPS is the round-off and CTOL is defined as follows:
438 *
439  IF( uctol ) THEN
440 * ... user controlled
441  ctol = work( 1 )
442  ELSE
443 * ... default
444  IF( lsvec .OR. rsvec .OR. applv ) THEN
445  ctol = sqrt( float( m ) )
446  ELSE
447  ctol = float( m )
448  END IF
449  END IF
450 * ... and the machine dependent parameters are
451 *[!] (Make sure that SLAMCH() works properly on the target machine.)
452 *
453  epsln = slamch( 'Epsilon' )
454  rooteps = sqrt( epsln )
455  sfmin = slamch( 'SafeMinimum' )
456  rootsfmin = sqrt( sfmin )
457  small = sfmin / epsln
458  big = slamch( 'Overflow' )
459 * BIG = ONE / SFMIN
460  rootbig = one / rootsfmin
461  large = big / sqrt( float( m*n ) )
462  bigtheta = one / rooteps
463 *
464  tol = ctol*epsln
465  roottol = sqrt( tol )
466 *
467  IF( float( m )*epsln.GE.one ) THEN
468  info = -4
469  CALL xerbla( 'SGESVJ', -info )
470  RETURN
471  END IF
472 *
473 * Initialize the right singular vector matrix.
474 *
475  IF( rsvec ) THEN
476  mvl = n
477  CALL slaset( 'A', mvl, n, zero, one, v, ldv )
478  ELSE IF( applv ) THEN
479  mvl = mv
480  END IF
481  rsvec = rsvec .OR. applv
482 *
483 * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
484 *(!) If necessary, scale A to protect the largest singular value
485 * from overflow. It is possible that saving the largest singular
486 * value destroys the information about the small ones.
487 * This initial scaling is almost minimal in the sense that the
488 * goal is to make sure that no column norm overflows, and that
489 * SQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
490 * in A are detected, the procedure returns with INFO=-6.
491 *
492  skl = one / sqrt( float( m )*float( n ) )
493  noscale = .true.
494  goscale = .true.
495 *
496  IF( lower ) THEN
497 * the input matrix is M-by-N lower triangular (trapezoidal)
498  DO 1874 p = 1, n
499  aapp = zero
500  aaqq = one
501  CALL slassq( m-p+1, a( p, p ), 1, aapp, aaqq )
502  IF( aapp.GT.big ) THEN
503  info = -6
504  CALL xerbla( 'SGESVJ', -info )
505  RETURN
506  END IF
507  aaqq = sqrt( aaqq )
508  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
509  sva( p ) = aapp*aaqq
510  ELSE
511  noscale = .false.
512  sva( p ) = aapp*( aaqq*skl )
513  IF( goscale ) THEN
514  goscale = .false.
515  DO 1873 q = 1, p - 1
516  sva( q ) = sva( q )*skl
517  1873 CONTINUE
518  END IF
519  END IF
520  1874 CONTINUE
521  ELSE IF( upper ) THEN
522 * the input matrix is M-by-N upper triangular (trapezoidal)
523  DO 2874 p = 1, n
524  aapp = zero
525  aaqq = one
526  CALL slassq( p, a( 1, p ), 1, aapp, aaqq )
527  IF( aapp.GT.big ) THEN
528  info = -6
529  CALL xerbla( 'SGESVJ', -info )
530  RETURN
531  END IF
532  aaqq = sqrt( aaqq )
533  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
534  sva( p ) = aapp*aaqq
535  ELSE
536  noscale = .false.
537  sva( p ) = aapp*( aaqq*skl )
538  IF( goscale ) THEN
539  goscale = .false.
540  DO 2873 q = 1, p - 1
541  sva( q ) = sva( q )*skl
542  2873 CONTINUE
543  END IF
544  END IF
545  2874 CONTINUE
546  ELSE
547 * the input matrix is M-by-N general dense
548  DO 3874 p = 1, n
549  aapp = zero
550  aaqq = one
551  CALL slassq( m, a( 1, p ), 1, aapp, aaqq )
552  IF( aapp.GT.big ) THEN
553  info = -6
554  CALL xerbla( 'SGESVJ', -info )
555  RETURN
556  END IF
557  aaqq = sqrt( aaqq )
558  IF( ( aapp.LT.( big / aaqq ) ) .AND. noscale ) THEN
559  sva( p ) = aapp*aaqq
560  ELSE
561  noscale = .false.
562  sva( p ) = aapp*( aaqq*skl )
563  IF( goscale ) THEN
564  goscale = .false.
565  DO 3873 q = 1, p - 1
566  sva( q ) = sva( q )*skl
567  3873 CONTINUE
568  END IF
569  END IF
570  3874 CONTINUE
571  END IF
572 *
573  IF( noscale )skl = one
574 *
575 * Move the smaller part of the spectrum from the underflow threshold
576 *(!) Start by determining the position of the nonzero entries of the
577 * array SVA() relative to ( SFMIN, BIG ).
578 *
579  aapp = zero
580  aaqq = big
581  DO 4781 p = 1, n
582  IF( sva( p ).NE.zero )aaqq = min( aaqq, sva( p ) )
583  aapp = max( aapp, sva( p ) )
584  4781 CONTINUE
585 *
586 * #:) Quick return for zero matrix
587 *
588  IF( aapp.EQ.zero ) THEN
589  IF( lsvec )CALL slaset( 'G', m, n, zero, one, a, lda )
590  work( 1 ) = one
591  work( 2 ) = zero
592  work( 3 ) = zero
593  work( 4 ) = zero
594  work( 5 ) = zero
595  work( 6 ) = zero
596  RETURN
597  END IF
598 *
599 * #:) Quick return for one-column matrix
600 *
601  IF( n.EQ.1 ) THEN
602  IF( lsvec )CALL slascl( 'G', 0, 0, sva( 1 ), skl, m, 1,
603  \$ a( 1, 1 ), lda, ierr )
604  work( 1 ) = one / skl
605  IF( sva( 1 ).GE.sfmin ) THEN
606  work( 2 ) = one
607  ELSE
608  work( 2 ) = zero
609  END IF
610  work( 3 ) = zero
611  work( 4 ) = zero
612  work( 5 ) = zero
613  work( 6 ) = zero
614  RETURN
615  END IF
616 *
617 * Protect small singular values from underflow, and try to
618 * avoid underflows/overflows in computing Jacobi rotations.
619 *
620  sn = sqrt( sfmin / epsln )
621  temp1 = sqrt( big / float( n ) )
622  IF( ( aapp.LE.sn ) .OR. ( aaqq.GE.temp1 ) .OR.
623  \$ ( ( sn.LE.aaqq ) .AND. ( aapp.LE.temp1 ) ) ) THEN
624  temp1 = min( big, temp1 / aapp )
625 * AAQQ = AAQQ*TEMP1
626 * AAPP = AAPP*TEMP1
627  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.LE.temp1 ) ) THEN
628  temp1 = min( sn / aaqq, big / ( aapp*sqrt( float( n ) ) ) )
629 * AAQQ = AAQQ*TEMP1
630 * AAPP = AAPP*TEMP1
631  ELSE IF( ( aaqq.GE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
632  temp1 = max( sn / aaqq, temp1 / aapp )
633 * AAQQ = AAQQ*TEMP1
634 * AAPP = AAPP*TEMP1
635  ELSE IF( ( aaqq.LE.sn ) .AND. ( aapp.GE.temp1 ) ) THEN
636  temp1 = min( sn / aaqq, big / ( sqrt( float( n ) )*aapp ) )
637 * AAQQ = AAQQ*TEMP1
638 * AAPP = AAPP*TEMP1
639  ELSE
640  temp1 = one
641  END IF
642 *
643 * Scale, if necessary
644 *
645  IF( temp1.NE.one ) THEN
646  CALL slascl( 'G', 0, 0, one, temp1, n, 1, sva, n, ierr )
647  END IF
648  skl = temp1*skl
649  IF( skl.NE.one ) THEN
650  CALL slascl( joba, 0, 0, one, skl, m, n, a, lda, ierr )
651  skl = one / skl
652  END IF
653 *
654 * Row-cyclic Jacobi SVD algorithm with column pivoting
655 *
656  emptsw = ( n*( n-1 ) ) / 2
657  notrot = 0
658  fastr( 1 ) = zero
659 *
660 * A is represented in factored form A = A * diag(WORK), where diag(WORK)
661 * is initialized to identity. WORK is updated during fast scaled
662 * rotations.
663 *
664  DO 1868 q = 1, n
665  work( q ) = one
666  1868 CONTINUE
667 *
668 *
669  swband = 3
670 *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
671 * if SGESVJ is used as a computational routine in the preconditioned
672 * Jacobi SVD algorithm SGESVJ. For sweeps i=1:SWBAND the procedure
673 * works on pivots inside a band-like region around the diagonal.
674 * The boundaries are determined dynamically, based on the number of
675 * pivots above a threshold.
676 *
677  kbl = min( 8, n )
678 *[TP] KBL is a tuning parameter that defines the tile size in the
679 * tiling of the p-q loops of pivot pairs. In general, an optimal
680 * value of KBL depends on the matrix dimensions and on the
681 * parameters of the computer's memory.
682 *
683  nbl = n / kbl
684  IF( ( nbl*kbl ).NE.n )nbl = nbl + 1
685 *
686  blskip = kbl**2
687 *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
688 *
689  rowskip = min( 5, kbl )
690 *[TP] ROWSKIP is a tuning parameter.
691 *
693 *[TP] LKAHEAD is a tuning parameter.
694 *
695 * Quasi block transformations, using the lower (upper) triangular
696 * structure of the input matrix. The quasi-block-cycling usually
697 * invokes cubic convergence. Big part of this cycle is done inside
698 * canonical subspaces of dimensions less than M.
699 *
700  IF( ( lower .OR. upper ) .AND. ( n.GT.max( 64, 4*kbl ) ) ) THEN
701 *[TP] The number of partition levels and the actual partition are
702 * tuning parameters.
703  n4 = n / 4
704  n2 = n / 2
705  n34 = 3*n4
706  IF( applv ) THEN
707  q = 0
708  ELSE
709  q = 1
710  END IF
711 *
712  IF( lower ) THEN
713 *
714 * This works very well on lower triangular matrices, in particular
715 * in the framework of the preconditioned Jacobi SVD (xGEJSV).
716 * The idea is simple:
717 * [+ 0 0 0] Note that Jacobi transformations of [0 0]
718 * [+ + 0 0] [0 0]
719 * [+ + x 0] actually work on [x 0] [x 0]
720 * [+ + x x] [x x]. [x x]
721 *
722  CALL sgsvj0( jobv, m-n34, n-n34, a( n34+1, n34+1 ), lda,
723  \$ work( n34+1 ), sva( n34+1 ), mvl,
724  \$ v( n34*q+1, n34+1 ), ldv, epsln, sfmin, tol,
725  \$ 2, work( n+1 ), lwork-n, ierr )
726 *
727  CALL sgsvj0( jobv, m-n2, n34-n2, a( n2+1, n2+1 ), lda,
728  \$ work( n2+1 ), sva( n2+1 ), mvl,
729  \$ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 2,
730  \$ work( n+1 ), lwork-n, ierr )
731 *
732  CALL sgsvj1( jobv, m-n2, n-n2, n4, a( n2+1, n2+1 ), lda,
733  \$ work( n2+1 ), sva( n2+1 ), mvl,
734  \$ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
735  \$ work( n+1 ), lwork-n, ierr )
736 *
737  CALL sgsvj0( jobv, m-n4, n2-n4, a( n4+1, n4+1 ), lda,
738  \$ work( n4+1 ), sva( n4+1 ), mvl,
739  \$ v( n4*q+1, n4+1 ), ldv, epsln, sfmin, tol, 1,
740  \$ work( n+1 ), lwork-n, ierr )
741 *
742  CALL sgsvj0( jobv, m, n4, a, lda, work, sva, mvl, v, ldv,
743  \$ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
744  \$ ierr )
745 *
746  CALL sgsvj1( jobv, m, n2, n4, a, lda, work, sva, mvl, v,
747  \$ ldv, epsln, sfmin, tol, 1, work( n+1 ),
748  \$ lwork-n, ierr )
749 *
750 *
751  ELSE IF( upper ) THEN
752 *
753 *
754  CALL sgsvj0( jobv, n4, n4, a, lda, work, sva, mvl, v, ldv,
755  \$ epsln, sfmin, tol, 2, work( n+1 ), lwork-n,
756  \$ ierr )
757 *
758  CALL sgsvj0( jobv, n2, n4, a( 1, n4+1 ), lda, work( n4+1 ),
759  \$ sva( n4+1 ), mvl, v( n4*q+1, n4+1 ), ldv,
760  \$ epsln, sfmin, tol, 1, work( n+1 ), lwork-n,
761  \$ ierr )
762 *
763  CALL sgsvj1( jobv, n2, n2, n4, a, lda, work, sva, mvl, v,
764  \$ ldv, epsln, sfmin, tol, 1, work( n+1 ),
765  \$ lwork-n, ierr )
766 *
767  CALL sgsvj0( jobv, n2+n4, n4, a( 1, n2+1 ), lda,
768  \$ work( n2+1 ), sva( n2+1 ), mvl,
769  \$ v( n2*q+1, n2+1 ), ldv, epsln, sfmin, tol, 1,
770  \$ work( n+1 ), lwork-n, ierr )
771
772  END IF
773 *
774  END IF
775 *
776 * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
777 *
778  DO 1993 i = 1, nsweep
779 *
780 * .. go go go ...
781 *
782  mxaapq = zero
783  mxsinj = zero
784  iswrot = 0
785 *
786  notrot = 0
787  pskipped = 0
788 *
789 * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
790 * 1 <= p < q <= N. This is the first step toward a blocked implementation
791 * of the rotations. New implementation, based on block transformations,
792 * is under development.
793 *
794  DO 2000 ibr = 1, nbl
795 *
796  igl = ( ibr-1 )*kbl + 1
797 *
798  DO 1002 ir1 = 0, min( lkahead, nbl-ibr )
799 *
800  igl = igl + ir1*kbl
801 *
802  DO 2001 p = igl, min( igl+kbl-1, n-1 )
803 *
804 * .. de Rijk's pivoting
805 *
806  q = isamax( n-p+1, sva( p ), 1 ) + p - 1
807  IF( p.NE.q ) THEN
808  CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
809  IF( rsvec )CALL sswap( mvl, v( 1, p ), 1,
810  \$ v( 1, q ), 1 )
811  temp1 = sva( p )
812  sva( p ) = sva( q )
813  sva( q ) = temp1
814  temp1 = work( p )
815  work( p ) = work( q )
816  work( q ) = temp1
817  END IF
818 *
819  IF( ir1.EQ.0 ) THEN
820 *
821 * Column norms are periodically updated by explicit
822 * norm computation.
823 * Caveat:
824 * Unfortunately, some BLAS implementations compute SNRM2(M,A(1,p),1)
825 * as SQRT(SDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
826 * overflow for ||A(:,p)||_2 > SQRT(overflow_threshold), and to
827 * underflow for ||A(:,p)||_2 < SQRT(underflow_threshold).
828 * Hence, SNRM2 cannot be trusted, not even in the case when
829 * the true norm is far from the under(over)flow boundaries.
830 * If properly implemented SNRM2 is available, the IF-THEN-ELSE
831 * below should read "AAPP = SNRM2( M, A(1,p), 1 ) * WORK(p)".
832 *
833  IF( ( sva( p ).LT.rootbig ) .AND.
834  \$ ( sva( p ).GT.rootsfmin ) ) THEN
835  sva( p ) = snrm2( m, a( 1, p ), 1 )*work( p )
836  ELSE
837  temp1 = zero
838  aapp = one
839  CALL slassq( m, a( 1, p ), 1, temp1, aapp )
840  sva( p ) = temp1*sqrt( aapp )*work( p )
841  END IF
842  aapp = sva( p )
843  ELSE
844  aapp = sva( p )
845  END IF
846 *
847  IF( aapp.GT.zero ) THEN
848 *
849  pskipped = 0
850 *
851  DO 2002 q = p + 1, min( igl+kbl-1, n )
852 *
853  aaqq = sva( q )
854 *
855  IF( aaqq.GT.zero ) THEN
856 *
857  aapp0 = aapp
858  IF( aaqq.GE.one ) THEN
859  rotok = ( small*aapp ).LE.aaqq
860  IF( aapp.LT.( big / aaqq ) ) THEN
861  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
862  \$ q ), 1 )*work( p )*work( q ) /
863  \$ aaqq ) / aapp
864  ELSE
865  CALL scopy( m, a( 1, p ), 1,
866  \$ work( n+1 ), 1 )
867  CALL slascl( 'G', 0, 0, aapp,
868  \$ work( p ), m, 1,
869  \$ work( n+1 ), lda, ierr )
870  aapq = sdot( m, work( n+1 ), 1,
871  \$ a( 1, q ), 1 )*work( q ) / aaqq
872  END IF
873  ELSE
874  rotok = aapp.LE.( aaqq / small )
875  IF( aapp.GT.( small / aaqq ) ) THEN
876  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
877  \$ q ), 1 )*work( p )*work( q ) /
878  \$ aaqq ) / aapp
879  ELSE
880  CALL scopy( m, a( 1, q ), 1,
881  \$ work( n+1 ), 1 )
882  CALL slascl( 'G', 0, 0, aaqq,
883  \$ work( q ), m, 1,
884  \$ work( n+1 ), lda, ierr )
885  aapq = sdot( m, work( n+1 ), 1,
886  \$ a( 1, p ), 1 )*work( p ) / aapp
887  END IF
888  END IF
889 *
890  mxaapq = max( mxaapq, abs( aapq ) )
891 *
892 * TO rotate or NOT to rotate, THAT is the question ...
893 *
894  IF( abs( aapq ).GT.tol ) THEN
895 *
896 * .. rotate
897 *[RTD] ROTATED = ROTATED + ONE
898 *
899  IF( ir1.EQ.0 ) THEN
900  notrot = 0
901  pskipped = 0
902  iswrot = iswrot + 1
903  END IF
904 *
905  IF( rotok ) THEN
906 *
907  aqoap = aaqq / aapp
908  apoaq = aapp / aaqq
909  theta = -half*abs( aqoap-apoaq ) / aapq
910 *
911  IF( abs( theta ).GT.bigtheta ) THEN
912 *
913  t = half / theta
914  fastr( 3 ) = t*work( p ) / work( q )
915  fastr( 4 ) = -t*work( q ) /
916  \$ work( p )
917  CALL srotm( m, a( 1, p ), 1,
918  \$ a( 1, q ), 1, fastr )
919  IF( rsvec )CALL srotm( mvl,
920  \$ v( 1, p ), 1,
921  \$ v( 1, q ), 1,
922  \$ fastr )
923  sva( q ) = aaqq*sqrt( max( zero,
924  \$ one+t*apoaq*aapq ) )
925  aapp = aapp*sqrt( max( zero,
926  \$ one-t*aqoap*aapq ) )
927  mxsinj = max( mxsinj, abs( t ) )
928 *
929  ELSE
930 *
931 * .. choose correct signum for THETA and rotate
932 *
933  thsign = -sign( one, aapq )
934  t = one / ( theta+thsign*
935  \$ sqrt( one+theta*theta ) )
936  cs = sqrt( one / ( one+t*t ) )
937  sn = t*cs
938 *
939  mxsinj = max( mxsinj, abs( sn ) )
940  sva( q ) = aaqq*sqrt( max( zero,
941  \$ one+t*apoaq*aapq ) )
942  aapp = aapp*sqrt( max( zero,
943  \$ one-t*aqoap*aapq ) )
944 *
945  apoaq = work( p ) / work( q )
946  aqoap = work( q ) / work( p )
947  IF( work( p ).GE.one ) THEN
948  IF( work( q ).GE.one ) THEN
949  fastr( 3 ) = t*apoaq
950  fastr( 4 ) = -t*aqoap
951  work( p ) = work( p )*cs
952  work( q ) = work( q )*cs
953  CALL srotm( m, a( 1, p ), 1,
954  \$ a( 1, q ), 1,
955  \$ fastr )
956  IF( rsvec )CALL srotm( mvl,
957  \$ v( 1, p ), 1, v( 1, q ),
958  \$ 1, fastr )
959  ELSE
960  CALL saxpy( m, -t*aqoap,
961  \$ a( 1, q ), 1,
962  \$ a( 1, p ), 1 )
963  CALL saxpy( m, cs*sn*apoaq,
964  \$ a( 1, p ), 1,
965  \$ a( 1, q ), 1 )
966  work( p ) = work( p )*cs
967  work( q ) = work( q ) / cs
968  IF( rsvec ) THEN
969  CALL saxpy( mvl, -t*aqoap,
970  \$ v( 1, q ), 1,
971  \$ v( 1, p ), 1 )
972  CALL saxpy( mvl,
973  \$ cs*sn*apoaq,
974  \$ v( 1, p ), 1,
975  \$ v( 1, q ), 1 )
976  END IF
977  END IF
978  ELSE
979  IF( work( q ).GE.one ) THEN
980  CALL saxpy( m, t*apoaq,
981  \$ a( 1, p ), 1,
982  \$ a( 1, q ), 1 )
983  CALL saxpy( m, -cs*sn*aqoap,
984  \$ a( 1, q ), 1,
985  \$ a( 1, p ), 1 )
986  work( p ) = work( p ) / cs
987  work( q ) = work( q )*cs
988  IF( rsvec ) THEN
989  CALL saxpy( mvl, t*apoaq,
990  \$ v( 1, p ), 1,
991  \$ v( 1, q ), 1 )
992  CALL saxpy( mvl,
993  \$ -cs*sn*aqoap,
994  \$ v( 1, q ), 1,
995  \$ v( 1, p ), 1 )
996  END IF
997  ELSE
998  IF( work( p ).GE.work( q ) )
999  \$ THEN
1000  CALL saxpy( m, -t*aqoap,
1001  \$ a( 1, q ), 1,
1002  \$ a( 1, p ), 1 )
1003  CALL saxpy( m, cs*sn*apoaq,
1004  \$ a( 1, p ), 1,
1005  \$ a( 1, q ), 1 )
1006  work( p ) = work( p )*cs
1007  work( q ) = work( q ) / cs
1008  IF( rsvec ) THEN
1009  CALL saxpy( mvl,
1010  \$ -t*aqoap,
1011  \$ v( 1, q ), 1,
1012  \$ v( 1, p ), 1 )
1013  CALL saxpy( mvl,
1014  \$ cs*sn*apoaq,
1015  \$ v( 1, p ), 1,
1016  \$ v( 1, q ), 1 )
1017  END IF
1018  ELSE
1019  CALL saxpy( m, t*apoaq,
1020  \$ a( 1, p ), 1,
1021  \$ a( 1, q ), 1 )
1022  CALL saxpy( m,
1023  \$ -cs*sn*aqoap,
1024  \$ a( 1, q ), 1,
1025  \$ a( 1, p ), 1 )
1026  work( p ) = work( p ) / cs
1027  work( q ) = work( q )*cs
1028  IF( rsvec ) THEN
1029  CALL saxpy( mvl,
1030  \$ t*apoaq, v( 1, p ),
1031  \$ 1, v( 1, q ), 1 )
1032  CALL saxpy( mvl,
1033  \$ -cs*sn*aqoap,
1034  \$ v( 1, q ), 1,
1035  \$ v( 1, p ), 1 )
1036  END IF
1037  END IF
1038  END IF
1039  END IF
1040  END IF
1041 *
1042  ELSE
1043 * .. have to use modified Gram-Schmidt like transformation
1044  CALL scopy( m, a( 1, p ), 1,
1045  \$ work( n+1 ), 1 )
1046  CALL slascl( 'G', 0, 0, aapp, one, m,
1047  \$ 1, work( n+1 ), lda,
1048  \$ ierr )
1049  CALL slascl( 'G', 0, 0, aaqq, one, m,
1050  \$ 1, a( 1, q ), lda, ierr )
1051  temp1 = -aapq*work( p ) / work( q )
1052  CALL saxpy( m, temp1, work( n+1 ), 1,
1053  \$ a( 1, q ), 1 )
1054  CALL slascl( 'G', 0, 0, one, aaqq, m,
1055  \$ 1, a( 1, q ), lda, ierr )
1056  sva( q ) = aaqq*sqrt( max( zero,
1057  \$ one-aapq*aapq ) )
1058  mxsinj = max( mxsinj, sfmin )
1059  END IF
1060 * END IF ROTOK THEN ... ELSE
1061 *
1062 * In the case of cancellation in updating SVA(q), SVA(p)
1063 * recompute SVA(q), SVA(p).
1064 *
1065  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1066  \$ THEN
1067  IF( ( aaqq.LT.rootbig ) .AND.
1068  \$ ( aaqq.GT.rootsfmin ) ) THEN
1069  sva( q ) = snrm2( m, a( 1, q ), 1 )*
1070  \$ work( q )
1071  ELSE
1072  t = zero
1073  aaqq = one
1074  CALL slassq( m, a( 1, q ), 1, t,
1075  \$ aaqq )
1076  sva( q ) = t*sqrt( aaqq )*work( q )
1077  END IF
1078  END IF
1079  IF( ( aapp / aapp0 ).LE.rooteps ) THEN
1080  IF( ( aapp.LT.rootbig ) .AND.
1081  \$ ( aapp.GT.rootsfmin ) ) THEN
1082  aapp = snrm2( m, a( 1, p ), 1 )*
1083  \$ work( p )
1084  ELSE
1085  t = zero
1086  aapp = one
1087  CALL slassq( m, a( 1, p ), 1, t,
1088  \$ aapp )
1089  aapp = t*sqrt( aapp )*work( p )
1090  END IF
1091  sva( p ) = aapp
1092  END IF
1093 *
1094  ELSE
1095 * A(:,p) and A(:,q) already numerically orthogonal
1096  IF( ir1.EQ.0 )notrot = notrot + 1
1097 *[RTD] SKIPPED = SKIPPED + 1
1098  pskipped = pskipped + 1
1099  END IF
1100  ELSE
1101 * A(:,q) is zero column
1102  IF( ir1.EQ.0 )notrot = notrot + 1
1103  pskipped = pskipped + 1
1104  END IF
1105 *
1106  IF( ( i.LE.swband ) .AND.
1107  \$ ( pskipped.GT.rowskip ) ) THEN
1108  IF( ir1.EQ.0 )aapp = -aapp
1109  notrot = 0
1110  GO TO 2103
1111  END IF
1112 *
1113  2002 CONTINUE
1114 * END q-LOOP
1115 *
1116  2103 CONTINUE
1117 * bailed out of q-loop
1118 *
1119  sva( p ) = aapp
1120 *
1121  ELSE
1122  sva( p ) = aapp
1123  IF( ( ir1.EQ.0 ) .AND. ( aapp.EQ.zero ) )
1124  \$ notrot = notrot + min( igl+kbl-1, n ) - p
1125  END IF
1126 *
1127  2001 CONTINUE
1128 * end of the p-loop
1129 * end of doing the block ( ibr, ibr )
1130  1002 CONTINUE
1131 * end of ir1-loop
1132 *
1133 * ... go to the off diagonal blocks
1134 *
1135  igl = ( ibr-1 )*kbl + 1
1136 *
1137  DO 2010 jbc = ibr + 1, nbl
1138 *
1139  jgl = ( jbc-1 )*kbl + 1
1140 *
1141 * doing the block at ( ibr, jbc )
1142 *
1143  ijblsk = 0
1144  DO 2100 p = igl, min( igl+kbl-1, n )
1145 *
1146  aapp = sva( p )
1147  IF( aapp.GT.zero ) THEN
1148 *
1149  pskipped = 0
1150 *
1151  DO 2200 q = jgl, min( jgl+kbl-1, n )
1152 *
1153  aaqq = sva( q )
1154  IF( aaqq.GT.zero ) THEN
1155  aapp0 = aapp
1156 *
1157 * .. M x 2 Jacobi SVD ..
1158 *
1159 * Safe Gram matrix computation
1160 *
1161  IF( aaqq.GE.one ) THEN
1162  IF( aapp.GE.aaqq ) THEN
1163  rotok = ( small*aapp ).LE.aaqq
1164  ELSE
1165  rotok = ( small*aaqq ).LE.aapp
1166  END IF
1167  IF( aapp.LT.( big / aaqq ) ) THEN
1168  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
1169  \$ q ), 1 )*work( p )*work( q ) /
1170  \$ aaqq ) / aapp
1171  ELSE
1172  CALL scopy( m, a( 1, p ), 1,
1173  \$ work( n+1 ), 1 )
1174  CALL slascl( 'G', 0, 0, aapp,
1175  \$ work( p ), m, 1,
1176  \$ work( n+1 ), lda, ierr )
1177  aapq = sdot( m, work( n+1 ), 1,
1178  \$ a( 1, q ), 1 )*work( q ) / aaqq
1179  END IF
1180  ELSE
1181  IF( aapp.GE.aaqq ) THEN
1182  rotok = aapp.LE.( aaqq / small )
1183  ELSE
1184  rotok = aaqq.LE.( aapp / small )
1185  END IF
1186  IF( aapp.GT.( small / aaqq ) ) THEN
1187  aapq = ( sdot( m, a( 1, p ), 1, a( 1,
1188  \$ q ), 1 )*work( p )*work( q ) /
1189  \$ aaqq ) / aapp
1190  ELSE
1191  CALL scopy( m, a( 1, q ), 1,
1192  \$ work( n+1 ), 1 )
1193  CALL slascl( 'G', 0, 0, aaqq,
1194  \$ work( q ), m, 1,
1195  \$ work( n+1 ), lda, ierr )
1196  aapq = sdot( m, work( n+1 ), 1,
1197  \$ a( 1, p ), 1 )*work( p ) / aapp
1198  END IF
1199  END IF
1200 *
1201  mxaapq = max( mxaapq, abs( aapq ) )
1202 *
1203 * TO rotate or NOT to rotate, THAT is the question ...
1204 *
1205  IF( abs( aapq ).GT.tol ) THEN
1206  notrot = 0
1207 *[RTD] ROTATED = ROTATED + 1
1208  pskipped = 0
1209  iswrot = iswrot + 1
1210 *
1211  IF( rotok ) THEN
1212 *
1213  aqoap = aaqq / aapp
1214  apoaq = aapp / aaqq
1215  theta = -half*abs( aqoap-apoaq ) / aapq
1216  IF( aaqq.GT.aapp0 )theta = -theta
1217 *
1218  IF( abs( theta ).GT.bigtheta ) THEN
1219  t = half / theta
1220  fastr( 3 ) = t*work( p ) / work( q )
1221  fastr( 4 ) = -t*work( q ) /
1222  \$ work( p )
1223  CALL srotm( m, a( 1, p ), 1,
1224  \$ a( 1, q ), 1, fastr )
1225  IF( rsvec )CALL srotm( mvl,
1226  \$ v( 1, p ), 1,
1227  \$ v( 1, q ), 1,
1228  \$ fastr )
1229  sva( q ) = aaqq*sqrt( max( zero,
1230  \$ one+t*apoaq*aapq ) )
1231  aapp = aapp*sqrt( max( zero,
1232  \$ one-t*aqoap*aapq ) )
1233  mxsinj = max( mxsinj, abs( t ) )
1234  ELSE
1235 *
1236 * .. choose correct signum for THETA and rotate
1237 *
1238  thsign = -sign( one, aapq )
1239  IF( aaqq.GT.aapp0 )thsign = -thsign
1240  t = one / ( theta+thsign*
1241  \$ sqrt( one+theta*theta ) )
1242  cs = sqrt( one / ( one+t*t ) )
1243  sn = t*cs
1244  mxsinj = max( mxsinj, abs( sn ) )
1245  sva( q ) = aaqq*sqrt( max( zero,
1246  \$ one+t*apoaq*aapq ) )
1247  aapp = aapp*sqrt( max( zero,
1248  \$ one-t*aqoap*aapq ) )
1249 *
1250  apoaq = work( p ) / work( q )
1251  aqoap = work( q ) / work( p )
1252  IF( work( p ).GE.one ) THEN
1253 *
1254  IF( work( q ).GE.one ) THEN
1255  fastr( 3 ) = t*apoaq
1256  fastr( 4 ) = -t*aqoap
1257  work( p ) = work( p )*cs
1258  work( q ) = work( q )*cs
1259  CALL srotm( m, a( 1, p ), 1,
1260  \$ a( 1, q ), 1,
1261  \$ fastr )
1262  IF( rsvec )CALL srotm( mvl,
1263  \$ v( 1, p ), 1, v( 1, q ),
1264  \$ 1, fastr )
1265  ELSE
1266  CALL saxpy( m, -t*aqoap,
1267  \$ a( 1, q ), 1,
1268  \$ a( 1, p ), 1 )
1269  CALL saxpy( m, cs*sn*apoaq,
1270  \$ a( 1, p ), 1,
1271  \$ a( 1, q ), 1 )
1272  IF( rsvec ) THEN
1273  CALL saxpy( mvl, -t*aqoap,
1274  \$ v( 1, q ), 1,
1275  \$ v( 1, p ), 1 )
1276  CALL saxpy( mvl,
1277  \$ cs*sn*apoaq,
1278  \$ v( 1, p ), 1,
1279  \$ v( 1, q ), 1 )
1280  END IF
1281  work( p ) = work( p )*cs
1282  work( q ) = work( q ) / cs
1283  END IF
1284  ELSE
1285  IF( work( q ).GE.one ) THEN
1286  CALL saxpy( m, t*apoaq,
1287  \$ a( 1, p ), 1,
1288  \$ a( 1, q ), 1 )
1289  CALL saxpy( m, -cs*sn*aqoap,
1290  \$ a( 1, q ), 1,
1291  \$ a( 1, p ), 1 )
1292  IF( rsvec ) THEN
1293  CALL saxpy( mvl, t*apoaq,
1294  \$ v( 1, p ), 1,
1295  \$ v( 1, q ), 1 )
1296  CALL saxpy( mvl,
1297  \$ -cs*sn*aqoap,
1298  \$ v( 1, q ), 1,
1299  \$ v( 1, p ), 1 )
1300  END IF
1301  work( p ) = work( p ) / cs
1302  work( q ) = work( q )*cs
1303  ELSE
1304  IF( work( p ).GE.work( q ) )
1305  \$ THEN
1306  CALL saxpy( m, -t*aqoap,
1307  \$ a( 1, q ), 1,
1308  \$ a( 1, p ), 1 )
1309  CALL saxpy( m, cs*sn*apoaq,
1310  \$ a( 1, p ), 1,
1311  \$ a( 1, q ), 1 )
1312  work( p ) = work( p )*cs
1313  work( q ) = work( q ) / cs
1314  IF( rsvec ) THEN
1315  CALL saxpy( mvl,
1316  \$ -t*aqoap,
1317  \$ v( 1, q ), 1,
1318  \$ v( 1, p ), 1 )
1319  CALL saxpy( mvl,
1320  \$ cs*sn*apoaq,
1321  \$ v( 1, p ), 1,
1322  \$ v( 1, q ), 1 )
1323  END IF
1324  ELSE
1325  CALL saxpy( m, t*apoaq,
1326  \$ a( 1, p ), 1,
1327  \$ a( 1, q ), 1 )
1328  CALL saxpy( m,
1329  \$ -cs*sn*aqoap,
1330  \$ a( 1, q ), 1,
1331  \$ a( 1, p ), 1 )
1332  work( p ) = work( p ) / cs
1333  work( q ) = work( q )*cs
1334  IF( rsvec ) THEN
1335  CALL saxpy( mvl,
1336  \$ t*apoaq, v( 1, p ),
1337  \$ 1, v( 1, q ), 1 )
1338  CALL saxpy( mvl,
1339  \$ -cs*sn*aqoap,
1340  \$ v( 1, q ), 1,
1341  \$ v( 1, p ), 1 )
1342  END IF
1343  END IF
1344  END IF
1345  END IF
1346  END IF
1347 *
1348  ELSE
1349  IF( aapp.GT.aaqq ) THEN
1350  CALL scopy( m, a( 1, p ), 1,
1351  \$ work( n+1 ), 1 )
1352  CALL slascl( 'G', 0, 0, aapp, one,
1353  \$ m, 1, work( n+1 ), lda,
1354  \$ ierr )
1355  CALL slascl( 'G', 0, 0, aaqq, one,
1356  \$ m, 1, a( 1, q ), lda,
1357  \$ ierr )
1358  temp1 = -aapq*work( p ) / work( q )
1359  CALL saxpy( m, temp1, work( n+1 ),
1360  \$ 1, a( 1, q ), 1 )
1361  CALL slascl( 'G', 0, 0, one, aaqq,
1362  \$ m, 1, a( 1, q ), lda,
1363  \$ ierr )
1364  sva( q ) = aaqq*sqrt( max( zero,
1365  \$ one-aapq*aapq ) )
1366  mxsinj = max( mxsinj, sfmin )
1367  ELSE
1368  CALL scopy( m, a( 1, q ), 1,
1369  \$ work( n+1 ), 1 )
1370  CALL slascl( 'G', 0, 0, aaqq, one,
1371  \$ m, 1, work( n+1 ), lda,
1372  \$ ierr )
1373  CALL slascl( 'G', 0, 0, aapp, one,
1374  \$ m, 1, a( 1, p ), lda,
1375  \$ ierr )
1376  temp1 = -aapq*work( q ) / work( p )
1377  CALL saxpy( m, temp1, work( n+1 ),
1378  \$ 1, a( 1, p ), 1 )
1379  CALL slascl( 'G', 0, 0, one, aapp,
1380  \$ m, 1, a( 1, p ), lda,
1381  \$ ierr )
1382  sva( p ) = aapp*sqrt( max( zero,
1383  \$ one-aapq*aapq ) )
1384  mxsinj = max( mxsinj, sfmin )
1385  END IF
1386  END IF
1387 * END IF ROTOK THEN ... ELSE
1388 *
1389 * In the case of cancellation in updating SVA(q)
1390 * .. recompute SVA(q)
1391  IF( ( sva( q ) / aaqq )**2.LE.rooteps )
1392  \$ THEN
1393  IF( ( aaqq.LT.rootbig ) .AND.
1394  \$ ( aaqq.GT.rootsfmin ) ) THEN
1395  sva( q ) = snrm2( m, a( 1, q ), 1 )*
1396  \$ work( q )
1397  ELSE
1398  t = zero
1399  aaqq = one
1400  CALL slassq( m, a( 1, q ), 1, t,
1401  \$ aaqq )
1402  sva( q ) = t*sqrt( aaqq )*work( q )
1403  END IF
1404  END IF
1405  IF( ( aapp / aapp0 )**2.LE.rooteps ) THEN
1406  IF( ( aapp.LT.rootbig ) .AND.
1407  \$ ( aapp.GT.rootsfmin ) ) THEN
1408  aapp = snrm2( m, a( 1, p ), 1 )*
1409  \$ work( p )
1410  ELSE
1411  t = zero
1412  aapp = one
1413  CALL slassq( m, a( 1, p ), 1, t,
1414  \$ aapp )
1415  aapp = t*sqrt( aapp )*work( p )
1416  END IF
1417  sva( p ) = aapp
1418  END IF
1419 * end of OK rotation
1420  ELSE
1421  notrot = notrot + 1
1422 *[RTD] SKIPPED = SKIPPED + 1
1423  pskipped = pskipped + 1
1424  ijblsk = ijblsk + 1
1425  END IF
1426  ELSE
1427  notrot = notrot + 1
1428  pskipped = pskipped + 1
1429  ijblsk = ijblsk + 1
1430  END IF
1431 *
1432  IF( ( i.LE.swband ) .AND. ( ijblsk.GE.blskip ) )
1433  \$ THEN
1434  sva( p ) = aapp
1435  notrot = 0
1436  GO TO 2011
1437  END IF
1438  IF( ( i.LE.swband ) .AND.
1439  \$ ( pskipped.GT.rowskip ) ) THEN
1440  aapp = -aapp
1441  notrot = 0
1442  GO TO 2203
1443  END IF
1444 *
1445  2200 CONTINUE
1446 * end of the q-loop
1447  2203 CONTINUE
1448 *
1449  sva( p ) = aapp
1450 *
1451  ELSE
1452 *
1453  IF( aapp.EQ.zero )notrot = notrot +
1454  \$ min( jgl+kbl-1, n ) - jgl + 1
1455  IF( aapp.LT.zero )notrot = 0
1456 *
1457  END IF
1458 *
1459  2100 CONTINUE
1460 * end of the p-loop
1461  2010 CONTINUE
1462 * end of the jbc-loop
1463  2011 CONTINUE
1464 *2011 bailed out of the jbc-loop
1465  DO 2012 p = igl, min( igl+kbl-1, n )
1466  sva( p ) = abs( sva( p ) )
1467  2012 CONTINUE
1468 ***
1469  2000 CONTINUE
1470 *2000 :: end of the ibr-loop
1471 *
1472 * .. update SVA(N)
1473  IF( ( sva( n ).LT.rootbig ) .AND. ( sva( n ).GT.rootsfmin ) )
1474  \$ THEN
1475  sva( n ) = snrm2( m, a( 1, n ), 1 )*work( n )
1476  ELSE
1477  t = zero
1478  aapp = one
1479  CALL slassq( m, a( 1, n ), 1, t, aapp )
1480  sva( n ) = t*sqrt( aapp )*work( n )
1481  END IF
1482 *
1484 *
1485  IF( ( i.LT.swband ) .AND. ( ( mxaapq.LE.roottol ) .OR.
1486  \$ ( iswrot.LE.n ) ) )swband = i
1487 *
1488  IF( ( i.GT.swband+1 ) .AND. ( mxaapq.LT.sqrt( float( n ) )*
1489  \$ tol ) .AND. ( float( n )*mxaapq*mxsinj.LT.tol ) ) THEN
1490  GO TO 1994
1491  END IF
1492 *
1493  IF( notrot.GE.emptsw )GO TO 1994
1494 *
1495  1993 CONTINUE
1496 * end i=1:NSWEEP loop
1497 *
1498 * #:( Reaching this point means that the procedure has not converged.
1499  info = nsweep - 1
1500  GO TO 1995
1501 *
1502  1994 CONTINUE
1503 * #:) Reaching this point means numerical convergence after the i-th
1504 * sweep.
1505 *
1506  info = 0
1507 * #:) INFO = 0 confirms successful iterations.
1508  1995 CONTINUE
1509 *
1510 * Sort the singular values and find how many are above
1511 * the underflow threshold.
1512 *
1513  n2 = 0
1514  n4 = 0
1515  DO 5991 p = 1, n - 1
1516  q = isamax( n-p+1, sva( p ), 1 ) + p - 1
1517  IF( p.NE.q ) THEN
1518  temp1 = sva( p )
1519  sva( p ) = sva( q )
1520  sva( q ) = temp1
1521  temp1 = work( p )
1522  work( p ) = work( q )
1523  work( q ) = temp1
1524  CALL sswap( m, a( 1, p ), 1, a( 1, q ), 1 )
1525  IF( rsvec )CALL sswap( mvl, v( 1, p ), 1, v( 1, q ), 1 )
1526  END IF
1527  IF( sva( p ).NE.zero ) THEN
1528  n4 = n4 + 1
1529  IF( sva( p )*skl.GT.sfmin )n2 = n2 + 1
1530  END IF
1531  5991 CONTINUE
1532  IF( sva( n ).NE.zero ) THEN
1533  n4 = n4 + 1
1534  IF( sva( n )*skl.GT.sfmin )n2 = n2 + 1
1535  END IF
1536 *
1537 * Normalize the left singular vectors.
1538 *
1539  IF( lsvec .OR. uctol ) THEN
1540  DO 1998 p = 1, n2
1541  CALL sscal( m, work( p ) / sva( p ), a( 1, p ), 1 )
1542  1998 CONTINUE
1543  END IF
1544 *
1545 * Scale the product of Jacobi rotations (assemble the fast rotations).
1546 *
1547  IF( rsvec ) THEN
1548  IF( applv ) THEN
1549  DO 2398 p = 1, n
1550  CALL sscal( mvl, work( p ), v( 1, p ), 1 )
1551  2398 CONTINUE
1552  ELSE
1553  DO 2399 p = 1, n
1554  temp1 = one / snrm2( mvl, v( 1, p ), 1 )
1555  CALL sscal( mvl, temp1, v( 1, p ), 1 )
1556  2399 CONTINUE
1557  END IF
1558  END IF
1559 *
1560 * Undo scaling, if necessary (and possible).
1561  IF( ( ( skl.GT.one ) .AND. ( sva( 1 ).LT.( big / skl ) ) )
1562  \$ .OR. ( ( skl.LT.one ) .AND. ( sva( max( n2, 1 ) ) .GT.
1563  \$ ( sfmin / skl ) ) ) ) THEN
1564  DO 2400 p = 1, n
1565  sva( p ) = skl*sva( p )
1566  2400 CONTINUE
1567  skl = one
1568  END IF
1569 *
1570  work( 1 ) = skl
1571 * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1572 * then some of the singular values may overflow or underflow and
1573 * the spectrum is given in this factored representation.
1574 *
1575  work( 2 ) = float( n4 )
1576 * N4 is the number of computed nonzero singular values of A.
1577 *
1578  work( 3 ) = float( n2 )
1579 * N2 is the number of singular values of A greater than SFMIN.
1580 * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1581 * that may carry some information.
1582 *
1583  work( 4 ) = float( i )
1584 * i is the index of the last sweep before declaring convergence.
1585 *
1586  work( 5 ) = mxaapq
1587 * MXAAPQ is the largest absolute value of scaled pivots in the
1588 * last sweep
1589 *
1590  work( 6 ) = mxsinj
1591 * MXSINJ is the largest absolute value of the sines of Jacobi angles
1592 * in the last sweep
1593 *
1594  RETURN
1595 * ..
1596 * .. END OF SGESVJ
1597 * ..
subroutine slassq(n, x, incx, scl, sumsq)
SLASSQ updates a sum of squares represented in scaled form.
Definition: slassq.f90:137
subroutine slascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: slascl.f:143
subroutine slaset(UPLO, M, N, ALPHA, BETA, A, LDA)
SLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values.
Definition: slaset.f:110
integer function isamax(N, SX, INCX)
ISAMAX
Definition: isamax.f:71
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:60
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:53
subroutine sgsvj1(JOBV, M, N, N1, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
SGSVJ1 pre-processor for the routine sgesvj, applies Jacobi rotations targeting only particular pivot...
Definition: sgsvj1.f:236
subroutine sgsvj0(JOBV, M, N, A, LDA, D, SVA, MV, V, LDV, EPS, SFMIN, TOL, NSWEEP, WORK, LWORK, INFO)
SGSVJ0 pre-processor for the routine sgesvj.
Definition: sgsvj0.f:218
subroutine sswap(N, SX, INCX, SY, INCY)
SSWAP
Definition: sswap.f:82
subroutine scopy(N, SX, INCX, SY, INCY)
SCOPY
Definition: scopy.f:82
subroutine sscal(N, SA, SX, INCX)
SSCAL
Definition: sscal.f:79
real function sdot(N, SX, INCX, SY, INCY)
SDOT
Definition: sdot.f:82
subroutine srotm(N, SX, INCX, SY, INCY, SPARAM)
SROTM
Definition: srotm.f:97
real(wp) function snrm2(n, x, incx)
SNRM2
Definition: snrm2.f90:89
subroutine saxpy(N, SA, SX, INCX, SY, INCY)
SAXPY
Definition: saxpy.f:89
real function slamch(CMACH)
SLAMCH
Definition: slamch.f:68
Here is the call graph for this function:
Here is the caller graph for this function: