LASO2 Documentation
David S. Scott
Computer Science Department
University of Texas at Austin
Table of Contents
1. Introduction
2. New Eigenvalue Solver
2.1. Changed Workspace
2.2. Extra Space in S
2.3. Random Numbers
2.4. Storage of the Band Matrix
3. Other Changes
3.1. Local Reorthogonalization
3.2. Changes to Subroutine SMVPC
3.3. Subroutine SVZERO
3.4. Change in Subroutine SORTQR
3.5. Subroutine SVSORT
3.6. Formatting Changes
4. Subroutine Glossary
5. Variable Glossary
Appendix A: User Guide for SNLASO
Appendix B: User Guide for SILASO
1. Introduction
The LASO package is a set of FORTRAN IV subroutines for computing a few
eigenvalues of a large (sparse) symmetric matrix. The LASO package is
documented in a technical report [3]. This article documents the changes
to the LASO package which were incorporated into LASO2. The primary
difference between LASO and LASO2 is that LASO2 does not use EISPACK to
compute the eigenvalues of the band matrix generated by the Lanczos
algorithm. LASO2 also monitors all the eigenvalues of interest at every
Lanczos step rather than only selected ones. Other changes were incorporated
at the same time and are described in this article. The user guides for
the two entry points for LASO2 (subroutines SNLASO and SILASO) are given
at the end. These guides are virtually identical to the user guides for
the old LASO package. In particular, the calling sequences for SNLASO
and SILASO are the same for both versions.
2. New Eigenvalue Solver
The subroutines available in EISPACK [4] and [1] are not very efficient
for computing a few eigenvalues of a symmetric band matrix. LASO2 uses
SLAEIG, a specialized version of the subroutine SBPRQS [2]. The new
solver is better for two reasons.
1. The work required is proportional to the dimension of the band
matrix instead of being proportional to the square of the
dimension.
2. SBPRQS can take advantage of known approximations to
eigenvectors.
In the LASO context certain eigenvalues are computed at each step and it
is easy to augment the eigenvector from the previous step with zeroes to
obtain an approximate eigenvector. In later steps of the Lanczos
algorithm these approximations are really quite good and significant
savings over the EISPACK codes are obtained. Several additional changes
were needed to merge SBPRQS and LASO. These are described in the
following subsections.
2.1. Changed Workspace
In LASO2 the array WORK is broken into different pieces. The following
table gives the work spaces and their dimensions in each of the entry
points.
ARRAY SNLASO SILASO
P1 N*NBLOCK same
P0 N*NBLOCK same
RES NVAL MAXVEC
TAU NVAL MAXVEC
OTAU NVAL MAXVEC
T (NBLOCK+1)*MAXJ same
ALP NBLOCK*NBLOCK same
BET NBLOCK*NBLOCK same
S MAXJ*(NVAL+1) MAXJ*(MAXVEC+2)
P2 N*NBLOCK same
BOUND 2*NVAL + 6 2*MAXVEC + 8
ATEMP (2*NBLOCK+1)*MAXVEC same
VTEMP MAXJ same
D (NBLOCK+1)*(2*NBLOCK+1) same
The four arrays BOUND, ATEMP, VTEMP, and D are used by SLAEIG. The
space as described above is slightly less than that needed by LASO.
However a further savings was achieved by overlapping the space for P2 and
the space for the last four arrays. Since P2 is occasionally used to hold
an eigenvector in a call to SLAEIG it is necessary to offset the start of
the array BOUND from the start of P2. In particular
BOUND(1) = P2(NBLOCK+1).
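The overlap can be expressed as a simple index mapping. The sketch below (in Python rather than FORTRAN, with a function name of our own invention) merely restates the relation BOUND(1) = P2(NBLOCK+1):

```python
def bound_index(i, nblock):
    """Map BOUND(i) (1-based, as in Fortran) to the corresponding
    1-based index into P2, per BOUND(1) = P2(NBLOCK+1).  This is an
    illustration of the overlap, not code from the package."""
    return nblock + i

# BOUND(1) aliases P2(NBLOCK+1); BOUND(2) aliases P2(NBLOCK+2); etc.
```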
2.2. Extra Space in S
To take full advantage of the new band eigensolver it is necessary to
remember the eigenvectors of interest from one Lanczos step to the next.
This required that SNLASO have one extra column in S and SILASO two, to
hold the eigenvectors associated with the nearest undesired
eigenvalue(s), which are needed to estimate the accuracy of the desired
eigenvalues.
2.3. Random Numbers
The LASO package used the function URAND to generate uniform
pseudo-random numbers whenever they were needed. In fact they were needed
only to generate random vectors. Since SLAEIG also needs to generate
random vectors, this task was isolated in a new subroutine SLARAN.
SLARAN still uses the function URAND, but it is now the only place where
URAND is accessed, which makes it easier for a user to substitute some
other random number generator.
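The isolation gives a single seam at which a different generator can be plugged in. The sketch below is only a plausible shape for such a module (the name, signature, and the shift to [-0.5, 0.5) are illustrative assumptions, not the actual LASO2 interface):

```python
import random

def slaran_like(n, urand=random.random):
    """Hypothetical sketch of the role SLARAN plays: fill a vector of
    length n from a single scalar uniform generator (URAND in LASO2).
    Replacing the generator touches only this one routine."""
    return [urand() - 0.5 for _ in range(n)]

# Swapping in a different generator is a one-argument change:
v = slaran_like(3, urand=lambda: 0.25)
```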
2.4. Storage of the Band Matrix
In LASO the band matrix is stored in the MAXJ x (NBLOCK+1) array T. For
efficiency SLAEIG was written to accept a (NBLOCK+1) x MAXJ array. This
required that the storage of the blocks ALP and BET in T be changed. This
change is more than just a transposition of T for two reasons.
1. In EISPACK the main diagonal of the matrix appears in the
rightmost column of T while in SLAEIG the main diagonal is the
first row.
2. In EISPACK the rows of the matrix appear as rows of T so that
the empty triangle appears in the upper left corner. In SLAEIG
the columns of the matrix appear as columns of T so that the
empty triangle appears at the bottom right corner.
In fact in LASO2 the 'empty' triangle already contains the next off
diagonal block BET.
It was also necessary to provide a new variable NBAND = NBLOCK + 1 in
order to dimension T properly.
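The two storage conventions can be illustrated with a small sketch (in Python rather than FORTRAN; the packing routines below are our own illustrations, not part of LASO2):

```python
def to_slaeig_band(a, nband):
    """Pack the lower band of a symmetric matrix a (list of lists)
    the way SLAEIG expects: row 0 of the result holds the main
    diagonal, and column j of the result holds column j of a, so the
    empty triangle falls in the bottom right corner."""
    n = len(a)
    t = [[0.0] * n for _ in range(nband)]
    for j in range(n):
        for i in range(min(nband, n - j)):
            t[i][j] = a[j + i][j]
    return t

def to_eispack_band(a, nband):
    """The EISPACK convention: row i of the result holds row i of a,
    the main diagonal sits in the rightmost column, and the empty
    triangle falls in the upper left corner."""
    n = len(a)
    t = [[0.0] * nband for _ in range(n)]
    for i in range(n):
        for k in range(min(nband, i + 1)):
            t[i][nband - 1 - k] = a[i][i - k]
    return t
```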
3. Other Changes
Several other changes to LASO, unrelated to the new eigensolver, were
made at the same time. These changes are documented in the following
subsections.
3.1. Local Reorthogonalization
LASO provided for the local reorthogonalization of the next Lanczos
block against the two previous ones (which are still in core). Several
authors had suggested that this technique delayed the loss of
orthogonality. When selective orthogonalization is used, local
reorthogonalization slightly delays the required orthogonalizations but
has no apparent effect on the number of Lanczos steps required or the
accuracy obtained. Since the cost of local reorthogonalization is
significant it has been dropped from LASO2.
However to guard against significant cancellation due to good starting
vectors the second Lanczos block is still reorthogonalized against the
first.
3.2. Changes to subroutine SMVPC
As noted in the documentation for LASO, SMVPC modified the computed
orthogonality coefficients to try to reflect the effect of local
reorthogonalization. Since the reorthogonalization has been dropped from
LASO2, this modification was eliminated as well.
3.3. Subroutine SVZERO
LASO had a subroutine SVZERO for zeroing arrays. The same effect can be
obtained using the subroutine SCOPY with a zero increment. Thus the two
calls
CALL SVZERO(N, V)
CALL SCOPY(N, 0.0, 0, V, 1)
have exactly the same effect. Since SCOPY is needed anyway for other
purposes, all calls to SVZERO were replaced by calls to SCOPY and SVZERO
was eliminated.
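The equivalence can be seen from a minimal model of SCOPY (sketched in Python; the real SCOPY is the standard BLAS routine):

```python
def scopy(n, sx, incx, sy, incy):
    """Minimal model of the BLAS SCOPY routine: copy n elements from
    sx (stride incx) into sy (stride incy).  With incx = 0 the single
    source element is broadcast into sy, which is how LASO2 now
    zeroes a vector."""
    ix = iy = 0
    for _ in range(n):
        sy[iy] = sx[ix]
        ix += incx
        iy += incy

v = [1.0, 2.0, 3.0, 4.0]
scopy(4, [0.0], 0, v, 1)   # the effect of CALL SCOPY(N, 0.0, 0, V, 1)
```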
3.4. Change in Subroutine SORTQR
The subroutine SORTQR was changed to allow it to be used to
orthonormalize Ritz vectors (approximate eigenvectors) as well as the
Lanczos blocks. The required change was to include an extra integer
parameter so that the dimension of the array holding the vectors to be
orthonormalized could be different than the length of the vectors.
3.5. Subroutine SVSORT
At several points in the code it is necessary to sort eigenvalues and
their associated eigenvectors. This task has now been isolated in a
separate module, SVSORT.
3.6. Formatting Changes
Three changes were made in the formatting of the code (how it appears in
a listing). These changes are
1. Blank comment lines are no longer required before and after each
DO loop.
2. DO loops are indented 3 spaces instead of 5.
3. Sequence numbers have been eliminated.
4. Subroutine Glossary
This section gives a list of all the subroutines and functions provided
in LASO2 with a brief description of their purpose.
SAXPY Basic linear algebra subroutine for adding a scalar
multiple of one vector to another.
SCOPY Basic linear algebra subroutine for copying one vector to
another.
SDOT Basic linear algebra function for computing dot products.
SILASO The entry point to LASO2 for interval type problems.
SIPPLA The post processing module for interval type problems.
SIWLA The main computational module for interval type problems.
SLABAX A subroutine to compute the product of a band matrix and a
vector. Used by SLAEIG.
SLABCM The control module for SLAEIG. SLABCM decides which shift
to use next in the Rayleigh quotient type iteration used
by SLAEIG.
SLABFC The computational module for SLAEIG. SLABFC factors the
shifted band matrix and solves linear equations.
SLAEIG The entry point for the band eigensolver.
SLAGER A subroutine for updating the Gerschgorin circle bounds on
the spectrum of the band matrix.
SLARAN A subroutine for generating random vectors.
SMVPC A subroutine to compute the residual bounds and
orthogonality coefficients.
SNLASO The entry point to LASO2 for number type problems.
SNPPLA The post processing module for number type problems.
SNWLA The main computational module for number type problems.
SNRM2 Basic linear algebra function for computing the 2-norm of
a vector.
SORTQR A subroutine to orthonormalize a set of vectors using
Householder transformations.
SSCAL Basic linear algebra subroutine for scaling a vector.
SSWAP Basic linear algebra subroutine for swapping two vectors.
SVSORT A subroutine to sort eigenvalues and their corresponding
eigenvectors.
URAND A portable random number function.
5. Variable Glossary
This section gives a description of most of the variables used in the
six main subroutines in LASO2. Immediately following the variable name is
a string of two characters. The first character indicates whether the
variable is a scalar (s) or an array (a). The second character indicates
the variable type: logical (l), integer (i), or real (r). The string xx
indicates that the variable is an external subroutine. After these two
characters there follows a string of six characters. These characters
indicate the status of the variable with respect to the six subroutines
1. SNLASO
2. SNWLA
3. SNPPLA
4. SILASO
5. SIWLA
6. SIPPLA
The possible characters and their meanings are:
. the variable does not appear in the subroutine.
i the variable is a formal parameter which must have a value
on entry to the subroutine.
o the variable is a formal parameter which will always be
assigned a value before the subroutine is finished.
b the variable is a formal parameter which is used for both
input and output.
w the variable is a formal parameter which holds internally
defined and used quantities or which is passed to
subsidiary subroutines to provide workspace.
s the variable is strictly internal to the subroutine and is
not passed to any of the other major subroutines.
Thus an entry in the table
XXXXXX sr .iobws
would mean that the variable XXXXXX is a scalar real variable which does
not appear in SNLASO, is an input parameter to SNWLA, is an output
parameter from SNPPLA, is both input and output in SILASO, is passed to
other subroutines for workspace in SIWLA, and is used only internally in
SIPPLA. Such a combination is impossible.
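As an illustration, the legend can be applied mechanically (a Python sketch of our own, not part of the package):

```python
SUBS = ["SNLASO", "SNWLA", "SNPPLA", "SILASO", "SIWLA", "SIPPLA"]
STATUS = {
    ".": "does not appear",
    "i": "input parameter",
    "o": "output parameter",
    "b": "input and output",
    "w": "workspace",
    "s": "strictly internal",
}

def decode_status(code):
    """Expand a six-character status string from the glossary into
    per-subroutine descriptions, following the legend above."""
    return {sub: STATUS[c] for sub, c in zip(SUBS, code)}

# The (impossible) example ".iobws" from the text decodes as:
roles = decode_status(".iobws")
```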
ALP ar .w..w. ALP is the current diagonal block of T.
ALPMAX sr .s..s. ALPMAX is the largest eigenvalue of ALP.
ALPMIN sr .s..s. ALPMIN is the smallest eigenvalue of ALP.
ANORM sr .s..s. ANORM is the best estimate of the norm of the matrix
whose eigenvalues are being computed.
ATEMP ar .w..w. ATEMP provides workspace for SLAEIG (it holds the upper
triangle of the factored matrix). It is also used in
calls to SMVPC to hold the residual norms of the
eigenvectors (when more than one residual norm is being
calculated).
AXL sr ....s. AXL is a small perturbation of XL.
AXR sr ....s. AXR is a small perturbation of XR.
BET ar .w..w. BET is the current off diagonal block of T.
BETMAX sr .s..s. BETMAX is the largest singular value of BET.
BETMIN sr .s..s. BETMIN is the smallest singular value of BET.
BOUND ar .ww.ww BOUND provides workspace to SLAEIG (it holds the current
upper and lower bounds on the eigenvalues).
D ar .ww.ww D provides workspace to SLAEIG (it holds the active part
of the matrix during the factorization).
DELTA sr soi... DELTA is the (NVAL+1)th eigenvalue of A. It is needed to
assess the accuracy of the desired eigenvalues.
DELTAL sr ...soi DELTAL is the smallest eigenvalue of A which is bigger
than XL. It is needed to assess the accuracy of the
desired eigenvalues.
DELTAR sr ...soi DELTAR is the largest eigenvalue of A which is smaller
than XR. It is needed to assess the accuracy of the
desired eigenvalues.
DONE sl ....s. DONE indicates whether the largest and smallest
eigenvalues of T in the interval [AXL,AXR] have settled
down sufficiently so that the corresponding eigenvalues of
A are known to lie in the same interval.
ENOUGH sl .s.... ENOUGH indicates whether the number of eigenvalues of T
computed is all that are required.
EPS sr siisii EPS is an approximation to the relative machine
precision.
EPSRT sr .s..s. EPSRT is the square root of EPS.
H ar ..w..w H is the reduced matrix in the final Rayleigh-Ritz
procedure (when needed).
HMAX sr ..s..s HMAX is set by a call to SLAGER to an estimate of the
largest eigenvalue of H.
HMIN sr ..s..s HMIN is set by a call to SLAGER to an estimate of the
smallest (most negative) eigenvalue of H.
HV ar ..w..w HV contains the eigenvectors of H after a call to
SLAEIG.
IER si .ss.ss IER is the error return flag from SLAEIG.
IERR si oooooo IERR is the error return flag for LASO2.
IND ai ow.ow. IND is an integer array of workspace used primarily for
sorting purposes. On exit from either SNLASO or SILASO
IND(1) is set to the number of calls to OP.
IOVECT xx iiiiii IOVECT is the user supplied subroutine for storing and
recalling Lanczos vectors.
J si .s..s. J is the current dimension of T in the Lanczos
algorithm.
MAXJ si ii.ii. MAXJ is the maximum allowed value of J.
MAXOP si ii.ii. MAXOP is the upper limit on calls to OP.
MAXVEC si ...ii. MAXVEC is the maximum number of eigenvalues to be found.
N si iiiiii N is the dimension of the matrix A.
NBAND si si.si. NBAND (= NBLOCK + 1) is the row dimension of T.
NBLOCK si iiiiii NBLOCK is the number of vectors in each Lanczos block.
NFIG si ii.ii. NFIG is the number of decimal digits of accuracy
desired.
NGOOD si .s..s. NGOOD is the number of eigenvectors currently being used
for orthogonalization.
NL si ....s. NL is the number of computed eigenvalues which belong
to the left half of the spectrum.
NLEFT si .s.... NLEFT is the number of desired eigenvalues which have
not converged.
NMVAL si i.ii.i NMVAL is the row dimension of VAL (except in SNWLA and
SIWLA).
NMVEC si iiiiii NMVEC is the row dimension of VEC.
NOP si sobsob NOP is the number of calls to OP.
NPERM si bbibbb NPERM is the number of accepted eigenvectors of A.
NSTART si .s..s. NSTART is the number of starting vectors used in
restarting the Lanczos algorithm.
NTHETA si .s..s. NTHETA is the number of eigenvalues of T computed at a
potential pause.
NUMBER si .s..s. NUMBER (= NPERM + NGOOD) is the number of eigenvectors
currently used for orthogonalization.
NUML si ....s. NUML-1 is the number of desired eigenvalues smaller than
AXL.
NUMR si ....s. NUMR-1 is the number of desired eigenvalues larger than
AXR.
NVAL si ii.... abs(NVAL) is the number of desired eigenvalues. It may
be negative in SNLASO.
OP xx iiiiii OP is the user supplied subroutine for computing matrix
vector products.
OTAU ar .w..w. OTAU is used to monitor the return of deflated
eigenvectors.
P ar ..w..w P holds a block of eigenvectors during the post
processing.
P0 ar .w..w. P0 is the previous block of Lanczos vectors.
P1 ar .w..w. P1 is the current block of Lanczos vectors.
P2 ar .w..w. P2 is used in the Lanczos step and used elsewhere for
temporary workspace.
PNORM sr .s.... PNORM is the 'norm' of the desired eigenvalues. That is
PNORM = the largest absolute value of a desired
eigenvalue. It is used in the termination criterion.
Q ar ..w..w Q holds a block of eigenvectors during the post
processing.
RARITZ sl soisoi RARITZ is true if a final Rayleigh-Ritz procedure is
needed due to an eigenvalue of multiplicity greater than
NBLOCK.
RES ar .w..w. RES holds the residual norms of the eigenvectors saved
from a previous Lanczos sequence.
RNORM sr .s..s. RNORM is the 'norm' of the eigenvalues computed by
previous Lanczos sequences. (See PNORM.)
S ar .w..w. S holds eigenvectors of T.
SMALL sl sii... SMALL is true if the smallest eigenvalues of A are to be
computed.
T ar .w..w. T is the band matrix computed by the Lanczos algorithm.
TAU ar .w..w. TAU is used to monitor the return of deflated
eigenvectors.
TEST sl .s..s. TEST is true if a test run is needed due to an
eigenvalue of multiplicity greater than NBLOCK - 1.
TMAX sr .s..s. TMAX is the Gerschgorin circle bound on the largest
eigenvalue of T.
TMIN sr .s..s. TMIN is the Gerschgorin circle bound on the smallest
eigenvalue of T.
TOLA sr .s..s. TOLA is the relative error tolerance for computed
eigenvalues.
TOLG sr .s..s. TOLG is the threshold for the orthogonalization
coefficients.
TXL sr .s..s. TXL is a dummy parameter in a number type call to
SLAEIG.
TXR sr .s..s. TXR is a dummy parameter in a number type call to
SLAEIG.
UTOL sr .s..s. UTOL is the current absolute error tolerance for
computed eigenvalues.
VAL ar ooboob VAL holds the computed eigenvalues and in the post
processors the error estimates as well.
VEC ar ooboob VEC holds the computed eigenvectors.
VTEMP ar .w..w. VTEMP provides workspace to SLAEIG. It is also used in
calls to SMVPC to hold the orthogonality coefficients
of the eigenvectors (when more than one is being
calculated).
XL sr ...iii XL is the left endpoint of the excluded interval.
XR sr ...iii XR is the right endpoint of the excluded interval.
Appendix A. User Guide for SNLASO
1. Purpose
The subroutine SNLASO computes a few eigenvalues and eigenvectors at one
end of the spectrum of a large sparse symmetric matrix, hereafter called
A. SNLASO uses the block Lanczos algorithm with selective
orthogonalization to compute Rayleigh-Ritz approximations to eigenpairs of
A.
2. Usage
2.1. Calling Sequence
The subroutine statement is
SUBROUTINE SNLASO(OP, IOVECT, N, NVAL, NFIG, NPERM, NMVAL, VAL, NMVEC,
VEC, NBLOCK, MAXOP, MAXJ, WORK, IND, IERR)
On input:
OP specifies a user supplied subroutine for entering
information about the matrix A with calling sequence
OP(N,M,P,Q). See section 2.2 for further information.
IOVECT specifies a user supplied subroutine for storing and
recalling vectors with calling sequence IOVECT(N,M,Q,J,K).
See section 2.2 for further information.
N specifies the dimension of A.
NVAL specifies which eigenvalues are desired. abs(NVAL) is the
number of eigenvalues to be computed. If NVAL < 0 the
algebraically smallest (leftmost) eigenvalues are computed
while if NVAL > 0 the largest (rightmost) eigenvalues are
computed. NVAL must not be zero.
NFIG specifies the number of decimal digits of accuracy desired
in the eigenvalues.
NPERM is an integer variable specifying the number of eigenpairs
presupplied by the user. In most cases NPERM will be
zero. See section 2.8 about using positive values of
NPERM. NPERM must not be negative.
NMVAL specifies the row dimension of VAL. NMVAL must not be
smaller than abs(NVAL).
VAL a two dimensional real array with NMVAL rows and at least
four columns. If NPERM > 0 on input then VAL must contain
certain information.
NMVEC specifies the row dimension of VEC. NMVEC must not be
smaller than N.
VEC a two dimensional real array with NMVEC rows and at least
abs(NVAL) columns.
NBLOCK specifies the number of vectors in each Lanczos block.
See section 2.6 for guidelines in choosing NBLOCK. NBLOCK
must be positive and must not exceed MAXJ/6.
MAXOP specifies an upper bound on the number of calls to the
subroutine OP. SNLASO terminates when this maximum is
reached. See section 2.7 for guidelines in choosing
MAXOP.
MAXJ indicates the amount of storage available (see WORK below
in this section and IOVECT in section 2.2). MAXJ should
be as large as possible subject to the available storage.
However there is no advantage in having
MAXJ > MAXOP*NBLOCK. MAXJ must not be smaller than
6*NBLOCK.
WORK is a one dimensional real array at least as large as
2*N*NBLOCK + MAXJ*(NBLOCK+NV+2) + 2*NBLOCK*NBLOCK + 3*NV
plus the maximum of
N*NBLOCK
and
MAXJ*(2*NBLOCK+3) + 2*NV + 6 + (2*NBLOCK+2)*(NBLOCK+1),
where NV = abs(NVAL). The first N*NBLOCK elements must
contain the starting vectors for the algorithm (see
section 2.5).
IND is an integer array of dimension at least abs(NVAL) used
for workspace.
IERR is an integer variable.
On output:
NPERM is the number of eigenpairs now known.
VAL contains information about the eigenvalues. The first
column of VAL contains the eigenvalues, ordered from the
most extreme inward. The second, third, and fourth
columns of VAL contain accuracy information explained in
section 2.4.
VEC contains the corresponding eigenvectors.
WORK if IERR is negative then the first N*NBLOCK elements of
WORK contain the best vectors for restarting the algorithm
(see section 2.5).
IND(1) contains the actual number of calls to OP. In some
circumstances this might be slightly larger than MAXOP.
IERR is an error completion code. The normal completion code
is zero. See section 2.3 for an explanation of nonzero
codes.
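The storage requirement for WORK can be transcribed into a short function (a Python convenience of our own, not part of the package):

```python
def snlaso_work_size(n, nblock, maxj, nval):
    """Minimum length of the WORK array for SNLASO, transcribed
    directly from the formula in the calling-sequence description
    above (the function name is ours, not part of LASO2)."""
    nv = abs(nval)
    base = 2*n*nblock + maxj*(nblock + nv + 2) + 2*nblock*nblock + 3*nv
    extra = max(n*nblock,
                maxj*(2*nblock + 3) + 2*nv + 6
                + (2*nblock + 2)*(nblock + 1))
    return base + extra
```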
2.2. User Supplied Subroutines
The two user supplied subroutines must be declared EXTERNAL in the
calling program and must conform as follows.
OP(N,M,P,Q). P and Q are NxM real arrays. Q should be returned
as AP where A is the matrix whose eigenvalues are to be
determined. M will never be larger than NBLOCK but it may be
smaller. This subroutine is the only way that the matrix enters
the calculation. The user should adequately test the subroutine
OP because SNLASO has no way of detecting errors in OP.
IOVECT(N,M,Q,J,K). Q is an NxM real array. M will never be
larger than NBLOCK but it may be smaller. IOVECT is used to store
Lanczos vectors as they are computed and to periodically recall
all the currently stored Lanczos vectors. If K = 0 then the M
columns of Q should be stored as the (J - M + 1)th through the Jth
Lanczos vectors. If K = 1 then the columns of Q should be
returned as the (J - M +1)th through the Jth Lanczos vectors which
were previously stored.
Lanczos vectors are computed sequentially. They are stored by
calls to IOVECT with K = 0 and increasing values of J up to some
internally derived value J = I which signals a pause. These
vectors are then recalled by calls to IOVECT with K = 1 and the
same sequence of J values. The first J value in any sequence is
equal to M. After the pause more Lanczos vectors are computed and
these are stored by calls to IOVECT with K = 0 and J values
greater than I until the next pause at which time all the Lanczos
vectors currently stored are recalled with calls to IOVECT with
K = 1 and J = M, ... .
After any pause the algorithm may discard the current Lanczos
vectors and start a new sequence of Lanczos vectors by a call to
IOVECT with K = 0 and J = M. At subsequent pauses only the
current sequence of Lanczos vectors is recalled. In solving a
problem SNLASO may pause many times and discard the previous
Lanczos vectors several times before computing all the desired
eigenvalues.
The largest value of J which can appear in a call to IOVECT is
J = MAXJ.
We give two examples for IOVECT. The first example requires
that logical unit 20 be assigned to a secondary storage medium.
SUBROUTINE IOVECT(N,M,Q,J,K)
INTEGER N,M,J,K,I,L
DIMENSION Q(N,M)
IF(J .EQ. M) REWIND 20
IF(K .EQ. 0) WRITE(20) ((Q(I,L), I=1,N), L=1,M)
IF(K .EQ. 1) READ(20) ((Q(I,L), I=1,N), L=1,M)
RETURN
END
This example will fail on systems that do not allow a WRITE
after a READ. On such a system it would be necessary to copy back
and forth between two unit numbers.
The Lanczos vectors can also be kept in fast store. In this
example we assume that N <= 100 and MAXJ <= 50.
SUBROUTINE IOVECT(N,M,Q,J,K)
INTEGER N,M,J,K,I,L,L1
DIMENSION Q(N,M)
COMMON QVEC(100,50)
IF(K .EQ. 1) GO TO 30
DO 20 L = 1,M
L1 = J - M + L
DO 10 I = 1,N
QVEC(I,L1) = Q(I,L)
10 CONTINUE
20 CONTINUE
RETURN
C
30 DO 50 L = 1,M
L1 = J - M + L
DO 40 I = 1,N
Q(I,L) = QVEC(I,L1)
40 CONTINUE
50 CONTINUE
RETURN
END
2.3. Error Completion Codes
IERR = 0 indicates a normal completion. abs(NVAL) eigenpairs have been
computed. See section 2.4 for a description of the information returned.
IERR > 0 and IERR < 1024 indicates that some calling parameters were
inconsistent.
1-bit is set if N < 6*NBLOCK
2-bit is set if NFIG < 0
4-bit is set if NMVEC < N
8-bit is set if NPERM < 0
16-bit is set if MAXJ < 6*NBLOCK
32-bit is set if abs(NVAL) < max(1, NPERM)
64-bit is set if abs(NVAL) > NMVAL
128-bit is set if abs(NVAL) > MAXOP
256-bit is set if abs(NVAL) > MAXJ/2
512-bit is set if NBLOCK < 1
Thus IERR can be decoded to determine the errors. For example IERR = 68
indicates both NMVEC < N and abs(NVAL) > NMVAL.
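The decoding described above amounts to a table of bit tests (a Python sketch of our own, not part of the package):

```python
# The parameter-check bits reported in a positive IERR, from the
# list above.
IERR_BITS = {
    1:   "N < 6*NBLOCK",
    2:   "NFIG < 0",
    4:   "NMVEC < N",
    8:   "NPERM < 0",
    16:  "MAXJ < 6*NBLOCK",
    32:  "abs(NVAL) < max(1, NPERM)",
    64:  "abs(NVAL) > NMVAL",
    128: "abs(NVAL) > MAXOP",
    256: "abs(NVAL) > MAXJ/2",
    512: "NBLOCK < 1",
}

def decode_ierr(ierr):
    """List the parameter errors packed into a positive IERR
    (0 < IERR < 1024) by testing each bit in turn."""
    return [msg for bit, msg in IERR_BITS.items() if ierr & bit]
```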
IERR = -1 can occur only if NPERM > 0 on input. It indicates that
either some user supplied eigenvector was zero or that significant
cancellation occurred when the user supplied vectors were orthogonalized.
Some modification of the user supplied vectors may have occurred but no
other computation will have been done.
IERR = -2 indicates that MAXOP calls to the subroutine OP occurred
without finding all the desired eigenvalues. NPERM eigenvalues are known
and information on them is returned as described in section 2.4.
Furthermore the first N*NBLOCK elements of WORK contain the best vectors
for restarting the algorithm. Thus SNLASO may be immediately recalled to
continue working on the problem if progress up to this point is deemed
satisfactory.
IERR = -8 indicates that disastrous loss of orthogonality has occurred.
This is usually due to errors in the user supplied subroutine OP or
possibly IOVECT.
2.4. Information Returned When IERR = 0
IERR = 0 indicates that the desired eigenpairs have been found. The
eigenvalues are in the first column of VAL. If NVAL < 0 the eigenvalues
are in ascending order (smallest at the top) while if NVAL > 0 the
eigenvalues are in descending order. The corresponding eigenvectors are
in the first abs(NVAL) columns of VEC. The second column of VAL contains
the residual norms (2-norm of Ay - ye, for the eigenvector y and the
eigenvalue e) which are bounds on the accuracy of the eigenvalues.
In most cases the residual norms seriously overestimate the error in the
eigenvalues. To obtain a more realistic estimate of the error the
program approximates the gap between the desired eigenvalues and the rest
of the spectrum and computes (residual squared)/gap, which would be a
bound on the error in the eigenvalue if the gap were accurate. This
estimate of the accuracy of the eigenvalue is returned in the third column
of VAL. The fourth column of VAL contains residual/gap, which is an
estimate of the error in the eigenvector.
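The three accuracy columns can be summarized in a sketch (Python, our own illustration; it assumes the product A*y and the gap estimate are already available):

```python
def val_columns(ay, y, e, gap):
    """Sketch of the accuracy data described above for one eigenpair
    (y, e): the residual norm ||A*y - e*y|| (column two of VAL),
    residual**2/gap (column three, eigenvalue error estimate), and
    residual/gap (column four, eigenvector error estimate).  ay is
    the precomputed product A*y; gap is the estimated distance from
    e to the rest of the spectrum."""
    res = sum((ay[i] - e * y[i]) ** 2 for i in range(len(y))) ** 0.5
    return res, res ** 2 / gap, res / gap
```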
If the user has supplied some eigenpairs of the matrix, it is possible
that some of these eigenpairs have been discarded in favor of eigenpairs
computed by the algorithm. (See section 2.8 for additional information).
2.5. Choosing the Starting Vectors
SNLASO requires NBLOCK starting vectors to be stored in the first
N*NBLOCK elements of the array WORK. Zero vectors are replaced by random
vectors so that a set of random starting vectors may be selected by merely
initializing the first N*NBLOCK elements of WORK to zero. However
convergence is enhanced if the starting vectors have large components in
the directions of the desired eigenvectors. Therefore if the user knows
approximations to the desired eigenvectors he should choose his starting
vectors as linear combinations of these approximations.
If some of the desired eigenpairs are already known to sufficient
accuracy it is possible to avoid having SNLASO recompute these eigenpairs,
see section 2.8 for details.
2.6. Choosing a Value for NBLOCK
NBLOCK specifies the number of vectors in each Lanczos block. Two
factors may favor a large block size. The convergence of the algorithm is
faster if NBLOCK exceeds the highest multiplicity of any of the desired
eigenvalues. For example if a desired eigenvalue is double then a value
of NBLOCK at least 3 is best. Also NBLOCK vectors are multiplied by A on
every call to OP. Large values of NBLOCK tend to reduce the number of
calls to OP so that if the matrix must be recomputed or recalled from disk
then a large value of NBLOCK may be preferable (perhaps as large as 8).
On the other hand the amount of overhead in the algorithm itself grows
quadratically with NBLOCK. Furthermore the convergence of the algorithm
is degraded if NBLOCK > sqrt(MAXJ).
In conclusion if the matrix multiply is inexpensive then a small value
of NBLOCK (2 or 3) is recommended while if the matrix multiply is
expensive then larger values of NBLOCK (up to 8 say) are better.
NBLOCK = 1 is recommended if and only if abs(NVAL) = 1 as well.
2.7. Choosing a Value for MAXOP
SNLASO is an iterative procedure. The user must limit the effort by
SNLASO on a given problem by choosing a value for MAXOP. If more than
MAXOP calls to the subroutine OP are needed to solve the given problem
then SNLASO terminates at that point with IERR set to -2.
MAXOP must not be less than abs(NVAL), and setting MAXOP less than
MAXJ/NBLOCK will waste the storage indicated by MAXJ. If cost is not a
factor and the subroutine OP is known to be correct then there is no harm
in setting MAXOP to a large value (say N). Otherwise MAXOP should be set
to an upper bound on the acceptable cost of running the code.
2.8. Setting NPERM > 0
SNLASO allows known eigenpairs to be input directly so that they need
not be recomputed. The first column of VAL must contain the eigenvalues
(in any order) and the second column of VAL must contain the residual
norms; the correct order of magnitude is sufficient. Columns three and
four of VAL are arbitrary. The first NPERM columns of VEC must contain
the corresponding eigenvectors. SNLASO will sort the eigenvalues (and
vectors) and then orthonormalize the eigenvectors.
The user supplied eigenpairs are counted toward the number of desired
eigenpairs, so NPERM must not be greater than abs(NVAL). If in the course
of the computation it appears that a user supplied eigenpair is not one of
the desired ones it will be discarded. In particular, if NPERM = abs(NVAL)
then the algorithm will either confirm that the user supplied eigenpairs
are the desired ones or it will discard one or more of them and go on to
compute their replacements.
3. Applicability and Restrictions
SNLASO is designed to find a few extreme eigenpairs of a large sparse
symmetric matrix. For small dense matrices the subroutines provided by
EISPACK are to be preferred. It is not necessary for the matrix to be
explicitly represented. It is only necessary to provide a subroutine OP to
compute matrix vector products. Any compact storage scheme that is
sufficient to generate the matrix vector products will work. Consider the
generalized eigenvalue problem (A - (lambda)M)x = 0, where M is positive
definite and can be factored as L*L^T. The matrix L^(-1)*A*L^(-T) can be
coded in OP as a triangular solve, a multiply by A, and another
triangular solve. Thus the generalized eigenvalue problem can be reduced
to a standard eigenvalue problem without explicitly forming the matrix
L^(-1)*A*L^(-T). More complex operators can also be handled.
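The reduction can be sketched as follows. This is a toy dense version in Python, our own illustration; a real OP would exploit the sparsity or band structure of A and L:

```python
def op_generalized(a_mul, l, p):
    """Sketch of an OP for the generalized problem (A - lambda*M)x = 0
    with M = L*L^T: solve L^T y = p (back substitution), form z = A*y,
    then solve L*q = z (forward substitution).  l is a dense lower
    triangular matrix as a list of lists; a_mul applies A to a vector."""
    n = len(l)
    y = [0.0] * n
    for i in range(n - 1, -1, -1):          # back substitution on L^T
        s = sum(l[j][i] * y[j] for j in range(i + 1, n))
        y[i] = (p[i] - s) / l[i][i]
    z = a_mul(y)                            # the multiply by A
    q = [0.0] * n
    for i in range(n):                      # forward substitution on L
        s = sum(l[i][j] * q[j] for j in range(i))
        q[i] = (z[i] - s) / l[i][i]
    return q
```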
SNLASO calls a number of subsidiary functions and subroutines. The
following names must not be used in the driver program.
SNWLA,SNPPLA,SMVPC,SORTQR,SVSORT,SAXPY,SCOPY,SDOT,SNRM2,SSCAL,SSWAP,SLAEIG,
SLAGER,SLARAN,SLABAX,SLABCM,SLABFC,URAND.
4. Discussion of Method
SNLASO uses the block Lanczos algorithm with selective
orthogonalization.
The (scalar) Lanczos algorithm is an efficient scheme for computing an
orthonormal basis q, q ,...,q for the Krylov subspace
1 2 j
j-1
span(q , Aq , ..., A q ) and the corresponding reduced matrix
1 1 1
T
T = Q AQ , which happens to be tridiagonal. At each step the Krylov
j j j
subspace grows larger, one more Lanczos vector is added to the list, and T
gets longer. With T it is easy to compute the Rayleigh-Ritz
approximations to eigenpairs of A from the Krylov subspace and their
corresponding error bounds. The algorithm continues until the error
bounds are sufficiently small.
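A minimal model of this recurrence can be written in Python with NumPy.
This is a sketch of the textbook algorithm, not LASO2's implementation,
and for simplicity it uses full rather than selective reorthogonalization.

```python
import numpy as np

def lanczos(A, q1, j):
    """Run j steps of the scalar Lanczos recurrence, returning the
    orthonormal basis Q of the Krylov subspace and the tridiagonal
    reduced matrix T = Q^T A Q."""
    n = len(q1)
    Q = np.zeros((n, j))
    alpha = np.zeros(j)
    beta = np.zeros(max(j - 1, 0))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for k in range(j):
        w = A @ Q[:, k]
        alpha[k] = Q[:, k] @ w
        w -= alpha[k] * Q[:, k]
        if k > 0:
            w -= beta[k - 1] * Q[:, k - 1]
        # Full reorthogonalization (LASO2 instead orthogonalizes
        # selectively against a few chosen Ritz vectors).
        w -= Q[:, :k + 1] @ (Q[:, :k + 1].T @ w)
        if k < j - 1:
            beta[k] = np.linalg.norm(w)
            Q[:, k + 1] = w / beta[k]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50)); A = (A + A.T) / 2
Q, T = lanczos(A, rng.standard_normal(50), 20)
ritz = np.linalg.eigvalsh(T)     # Rayleigh-Ritz approximations
```

The extreme Ritz values in `ritz` converge rapidly toward the extreme
eigenvalues of A as j grows, which is what the error bounds monitor.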
The block Lanczos algorithm replaces each vector in the simple Lanczos
algorithm by an orthonormal block of vectors. Block Lanczos has
theoretical advantages over simple Lanczos with respect to finding
multiple eigenvalues and has advantages in efficiency if the cost of
forming the matrix vector products is high.
Unfortunately finite precision arithmetic causes the vectors computed by
the algorithm to lose orthogonality and approach linear dependence. To
maintain robust independence among the Lanczos vectors SNLASO uses
selective orthogonalization in which a few Lanczos vectors are
orthogonalized against a few selected Ritz vectors.
The algorithm is terminated when the desired Ritz values are
sufficiently accurate. If necessary, SNLASO then makes another Lanczos
run to test for undisclosed multiplicities. Finally if such
multiplicities are discovered SNLASO performs a final Rayleigh-Ritz
procedure to resolve such clusters.
Appendix B. User Guide for SILASO
1. Purpose
The subroutine SILASO computes all the eigenvalues and eigenvectors of a
large sparse symmetric matrix, hereafter called A, outside a user defined
excluded interval. SILASO uses the block Lanczos algorithm with selective
orthogonalization to compute Rayleigh-Ritz approximations to eigenpairs of
A.
2. Usage
2.1. Calling Sequence
The subroutine statement is
SUBROUTINE SILASO(OP, IOVECT, N, XL, XR, NFIG, NPERM, NMVAL, VAL, NMVEC,
MAXVEC, VEC, NBLOCK, MAXOP, MAXJ, WORK, IND, IERR)
On input:
OP specifies a user supplied subroutine for entering
information about the matrix A with calling sequence
OP(N,M,P,Q). See section 2.2 for further information
IOVECT specifies a user supplied subroutine for storing and
recalling vectors with calling sequence IOVECT(N,M,Q,J,K).
See section 2.2 for further information.
N specifies the dimension of A.
XL specifies the left endpoint of the excluded interval.
XR specifies the right endpoint of the excluded interval.
NFIG specifies the number of decimal digits of accuracy desired
in the eigenvalues.
NPERM is an integer variable specifying the number of eigenpairs
presupplied by the user. In most cases NPERM will be
zero. See section 2.8 about using positive values of
NPERM. NPERM must not be negative.
NMVAL specifies the row dimension of VAL. NMVAL must not be
smaller than MAXVEC.
VAL a two dimensional real array with NMVAL rows and at least
four columns. If NPERM > 0 on input then VAL must contain
certain information.
NMVEC specifies the row dimension of VEC. NMVEC must not be
smaller than N.
MAXVEC specifies the maximum number of eigenpairs which can be
computed.
VEC a two dimensional real array with NMVEC rows and at least
MAXVEC columns.
NBLOCK specifies the number of vectors in each Lanczos block.
See section 2.6 for guidelines in choosing NBLOCK. NBLOCK
must be positive and must not exceed MAXJ/6.
MAXOP specifies an upper bound on the number of calls to the
subroutine OP. SILASO terminates when this maximum is
reached. See section 2.7 for guidelines in choosing
MAXOP.
MAXJ indicates the amount of storage available (see WORK below
in this section and IOVECT in section 2.2). MAXJ should
be as large as possible subject to the available storage.
However there is no advantage in having
MAXJ > MAXOP*NBLOCK. MAXJ must not be smaller than
6*NBLOCK.
WORK is a one dimensional real array at least as large as
2*N*NBLOCK + MAXJ*(NBLOCK+1+MAXVEC+2) + 2*NBLOCK*NBLOCK + 3*MAXVEC
plus the maximum of
N*NBLOCK
and
MAXJ*(2*NBLOCK+2) + 2*MAXVEC + 8 + (2*NBLOCK+2)*(NBLOCK+1)
The first N*NBLOCK elements must contain the starting
vectors for the algorithm (see section 2.5).
IND is an integer array of dimension at least MAXVEC used for
workspace.
IERR is an integer variable.
On output:
NPERM is the number of eigenpairs now known.
VAL contains information about the eigenvalues. The first
column of VAL contains the eigenvalues, ordered from the
most extreme inward. The second, third, and fourth
columns of VAL contain accuracy information explained in
section 2.4.
VEC contains the corresponding eigenvectors.
WORK if IERR is negative then the first N*NBLOCK elements of
WORK contain the best vectors for restarting the algorithm
(see section 2.5).
IND(1) contains the actual number of calls to OP. In some
circumstances this might be slightly larger than MAXOP.
IERR is an error completion code. The normal completion code
is zero. See section 2.3 for an explanation of nonzero
codes.
2.2. User Supplied Subroutines
The two user supplied subroutines must be declared EXTERNAL in the
calling program and must conform as follows.
OP(N,M,P,Q). P and Q are NxM real arrays. Q should be returned
as AP where A is the matrix whose eigenvalues are to be
determined. M will never be larger than NBLOCK but it may be
smaller. This subroutine is the only way that the matrix enters
the calculation. The user should adequately test the subroutine
OP because SILASO has no way of detecting errors in OP.
IOVECT(N,M,Q,J,K). Q is an NxM real array. M will never be
larger than NBLOCK but it may be smaller. IOVECT is used to store
Lanczos vectors as they are computed and to periodically recall
all the currently stored Lanczos vectors. If K = 0 then the M
columns of Q should be stored as the (J - M + 1)th through the Jth
Lanczos vectors. If K = 1 then the columns of Q should be
returned as the (J - M + 1)th through the Jth Lanczos vectors which
were previously stored.
Lanczos vectors are computed sequentially. They are stored by
calls to IOVECT with K = 0 and increasing values of J up to some
internally derived value J = I which signals a pause. These
vectors are then recalled by calls to IOVECT with K = 1 and the
same sequence of J values. The first J value in any sequence is
equal to M. After the pause more Lanczos vectors are computed and
these are stored by calls to IOVECT with K = 0 and J values
greater than I until the next pause at which time all the Lanczos
vectors currently stored are recalled with calls to IOVECT with
K = 1 and J = M, ... .
After any pause the algorithm may discard the current Lanczos
vectors and start a new sequence of Lanczos vectors by a call to
IOVECT with K = 0 and J = M. At subsequent pauses only the
current sequence of Lanczos vectors is recalled. In solving a
problem SILASO may pause many times and discard the previous
Lanczos vectors several times before computing all the desired
eigenvalues.
The largest value of J which can appear in a call to IOVECT is
J = MAXJ.
We give two examples for IOVECT. The first example requires
that logical unit 20 be assigned to a secondary storage medium.
SUBROUTINE IOVECT(N,M,Q,J,K)
INTEGER N,M,J,K,I,L
DIMENSION Q(N,M)
IF(J .EQ. M) REWIND 20
IF(K .EQ. 0) WRITE(20) ((Q(I,L), I=1,N), L=1,M)
IF(K .EQ. 1) READ(20) ((Q(I,L), I=1,N), L=1,M)
RETURN
END
This example will fail on systems that do not allow a WRITE
after a READ. On such a system it would be necessary to copy back
and forth between two unit numbers.
The Lanczos vectors can also be kept in fast store. In this
example we assume that N < 100 and MAXJ < 50.
SUBROUTINE IOVECT(N,M,Q,J,K)
INTEGER N,M,J,K,I,L,L1
DIMENSION Q(N,M)
COMMON QVEC(100,50)
IF(K .EQ. 1) GO TO 30
DO 20 L = 1,M
L1 = J - M + L
DO 10 I = 1,N
QVEC(I,L1) = Q(I,L)
10 CONTINUE
20 CONTINUE
RETURN
C
30 DO 50 L = 1,M
L1 = J - M + L
DO 40 I = 1,N
Q(I,L) = QVEC(I,L1)
40 CONTINUE
50 CONTINUE
RETURN
END
2.3. Error Completion Codes
IERR = 0 indicates a normal completion. NPERM eigenpairs have been
computed. See section 2.4 for a description of the information returned.
IERR > 0 and IERR < 1024 indicates that some calling parameters were
inconsistent.
1-bit is set if N < 6*NBLOCK
2-bit is set if NFIG < 0
4-bit is set if NMVEC < N
8-bit is set if NPERM < 0
16-bit is set if MAXJ < 6*NBLOCK
32-bit is set if MAXVEC < NPERM
64-bit is set if MAXVEC > NMVAL
128-bit is set if MAXVEC > MAXOP
256-bit is set if XL > XR
512-bit is set if NBLOCK < 1
Thus IERR can be decoded to determine the errors. For example IERR = 68
indicates both NMVEC < N and MAXVEC > NMVAL.
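Since the codes are bit flags, a caller can decode a positive IERR
mechanically. A sketch in Python (the flag table transcribes the list
above; the function name is illustrative, not part of LASO2):

```python
# Bit flags for 0 < IERR < 1024, transcribed from the table above.
FLAGS = [
    (1,   "N < 6*NBLOCK"),
    (2,   "NFIG < 0"),
    (4,   "NMVEC < N"),
    (8,   "NPERM < 0"),
    (16,  "MAXJ < 6*NBLOCK"),
    (32,  "MAXVEC < NPERM"),
    (64,  "MAXVEC > NMVAL"),
    (128, "MAXVEC > MAXOP"),
    (256, "XL > XR"),
    (512, "NBLOCK < 1"),
]

def decode_ierr(ierr):
    """Return the parameter inconsistencies encoded in a positive IERR."""
    return [msg for bit, msg in FLAGS if ierr & bit]
```

For example, decode_ierr(68) yields the two conditions NMVEC < N and
MAXVEC > NMVAL, matching the worked example above.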
IERR = -1 can occur only if NPERM > 0 on input. It indicates that
either some user supplied eigenvector was zero or that significant
cancellation occurred when the user supplied vectors were orthogonalized.
Some modification of the user supplied vectors may have occurred but no
other computation will have been done.
IERR = -2 indicates that MAXOP calls to the subroutine OP occurred
without finding all the desired eigenvalues. NPERM eigenvalues are known
and information on them is returned as described in section 2.4.
Furthermore the first N*NBLOCK elements of WORK contain the best vectors
for restarting the algorithm. Thus SILASO may be immediately recalled to
continue working on the problem if progress up to this point is deemed
satisfactory.
IERR = -4 can occur only if NPERM > 0 on input. It indicates that a
user supplied eigenvalue lies inside the excluded interval. Some
modification of the user supplied eigenpairs may have occurred but no
other computation will have been done.
IERR = -5 and IERR = -6 indicate that the program has terminated without
full assurance that all the desired eigenvalues have been computed: some
approximate eigenvalue lies inside the excluded interval but near its
boundary, and has not converged sufficiently to guarantee that the true
eigenvalue lies outside the excluded interval. IERR = -5 occurs if J =
MAXJ was reached and IERR = -6 if MAXOP was exceeded. All the eigenpairs
returned by the code are correct but there may be more. It is necessary
to recall SILASO to continue working on the problem to obtain full
assurance.
IERR = -8 indicates that disastrous loss of orthogonality has occurred.
This is usually due to errors in the user supplied subroutine OP or
possibly IOVECT.
IERR < -10 indicates that some of the eigenvalues found by the program
lie inside the excluded interval but have error bounds which overlap the
boundary. Such eigenvalues are explicitly set equal to the boundary and
marked as described in section 2.4. The tens digit indicates the number
of such eigenvalues while the units digit should be interpreted as above.
IERR < -100 indicates that more than MAXVEC eigenvalues lie outside the
excluded interval. The hundreds digit gives the number of extra
eigenvalues now known to exist. Any eigenpairs returned by the code are
correct and after raising the value of MAXVEC it is possible to recall
SILASO to keep working on the problem. Of course the extra storage
indicated by the larger value of MAXVEC must be available and it is
possible that the code may find even more eigenvalues and again terminate
with IERR < -100. The tens and units digits are as described above.
2.4. Information Returned When IERR = 0
IERR = 0 indicates that all the desired eigenpairs have been found. The
NPERM eigenvalues are in the first column of VAL. The eigenvalues are in
ascending order. The corresponding eigenvectors are in the first NPERM
columns of VEC. The second column of VAL contains the residual norms
(2-norm of Ay - ye, for the eigenvector y and the eigenvalue e) which are
bounds on the accuracy of the eigenvalues.
In most cases the residual norms seriously underestimate the accuracy of
the eigenvalues. To obtain a more realistic estimate of the error the
program approximates the gaps between the desired eigenvalues and the rest
of the spectrum and computes (residual squared)/gap, which would be a
bound on the accuracy of the eigenvalue if the gap were accurate. This
estimate of the accuracy of the eigenvalue is returned in the third column
of VAL. The fourth column of VAL contains residual/gap which is an
estimate of the error in the eigenvector.
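The three quantities can be reproduced directly from an eigenpair and an
estimate of the surrounding spectrum. A hypothetical sketch in Python
(not LASO2's internal code; here the gap is computed from exactly known
eigenvalues, which the code can only approximate):

```python
import numpy as np

def accuracy_estimates(A, e, y, rest_of_spectrum):
    """Return the quantities stored in columns 2-4 of VAL for the
    eigenpair (e, y): the residual norm, the gap-based eigenvalue
    error estimate, and the gap-based eigenvector error estimate."""
    r = np.linalg.norm(A @ y - e * y)                      # column 2
    gap = np.min(np.abs(np.asarray(rest_of_spectrum) - e))
    return r, r**2 / gap, r / gap                          # columns 3, 4
```

For an exact eigenpair all three quantities vanish; for an approximate
one the gap-based estimates are typically far sharper than the raw
residual norm.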
2.5. Choosing the Starting Vectors
SILASO requires NBLOCK starting vectors to be stored in the first
N*NBLOCK elements of the array WORK. Zero vectors are replaced by random
vectors so that a set of random starting vectors may be selected by merely
initializing the first N*NBLOCK elements of WORK to zero. However
convergence is enhanced if the starting vectors have large components in
the directions of the desired eigenvectors. Therefore if the user knows
approximations to the desired eigenvectors he should choose his starting
vectors as linear combinations of these approximations.
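A sketch of packing the starting block into WORK (Python for
illustration; the column-major, Fortran-style layout is an assumption,
and `approx` stands for hypothetical user supplied eigenvector
approximations):

```python
import numpy as np

N, NBLOCK = 100, 3

# All zeros: SILASO substitutes random starting vectors.
work = np.zeros(N * NBLOCK)

# Seeding the block with known approximations instead (hypothetical):
approx = np.ones((N, 2))               # two approximate eigenvectors
block = np.zeros((N, NBLOCK))
block[:, :approx.shape[1]] = approx    # remaining columns stay zero
work[:N * NBLOCK] = block.reshape(-1, order="F")  # Fortran column order
```

Columns left zero are again replaced by random vectors, so it is safe to
supply fewer than NBLOCK approximations.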
If some of the desired eigenpairs are already known to sufficient
accuracy it is possible to avoid having SILASO recompute these eigenpairs,
see section 2.8 for details.
2.6. Choosing a Value for NBLOCK
NBLOCK specifies the number of vectors in each Lanczos block. Two
factors may favor a large block size. The convergence of the algorithm is
faster if NBLOCK exceeds the highest multiplicity of any of the desired
eigenvalues. For example if a desired eigenvalue is double then a value
of NBLOCK at least 3 is best. Also NBLOCK vectors are multiplied by A on
every call to OP. Large values of NBLOCK tend to reduce the number of
calls to OP so that if the matrix must be recomputed or recalled from disk
then a large value of NBLOCK may be preferable (perhaps as large as 8).
On the other hand the amount of overhead involved in the algorithm
itself grows quadratically with NBLOCK. Furthermore the convergence of the
algorithm is degraded if NBLOCK > sqrt(MAXJ).
In conclusion if the matrix multiply is inexpensive then a small value
of NBLOCK (2 or 3) is recommended while if the matrix multiply is
expensive then larger values of NBLOCK (up to 8 say) are better.
NBLOCK = 1 is recommended if and only if abs(NVAL) = 1 as well.
2.7. Choosing a Value for MAXOP
SILASO is an iterative procedure. The user must limit the effort by
SILASO on a given problem by choosing a value for MAXOP. If more than
MAXOP calls to the subroutine OP are needed to solve the given problem
then SILASO terminates at that point with IERR set to -2.
MAXOP must not be less than abs(NVAL) and setting MAXOP less than
MAXJ/NBLOCK will waste the storage indicated by MAXJ. If cost is not a
factor and the subroutine OP is known to be correct then there is no harm
in setting MAXOP to a large value (say N). Otherwise MAXOP should be set to
an upper bound on the acceptable cost of running the code.
2.8. Setting NPERM > 0
SILASO allows known eigenpairs to be input directly so that they need
not be recomputed. The first column of VAL must contain the eigenvalues
(in any order) and the second column of VAL must contain the residual
norms. The correct order of magnitude is sufficient. Columns three and
four of VAL are arbitrary. The first NPERM columns of VEC must contain
the corresponding eigenvectors. SILASO will sort the eigenvalues (and
vectors) and then orthonormalize the eigenvectors.
3. Applicability and Restrictions
SILASO is designed to find a few extreme eigenpairs of a large sparse
symmetric matrix. For small dense matrices the subroutines provided by
EISPACK are to be preferred. It is not necessary for the matrix to be
explicitly represented. It is only necessary to provide a subroutine OP to
compute matrix vector products. Any compact storage scheme that is
sufficient to generate the matrix vector products will work. Consider the
generalized eigenvalue problem (A - (lambda)M)x = 0, where M is positive
definite and can be factored as LL^T. The matrix L^(-1)AL^(-T) can be
coded in OP as a triangular solve, a multiply by A, and another triangular
solve. Thus the generalized eigenvalue problem can be reduced to a
standard eigenvalue problem without explicitly forming the matrix
L^(-1)AL^(-T). More
complex operators can also be handled.
SILASO can be used to find all the eigenvalues of A in some given
interval [a,b] provided that it is possible to factor (A - (sigma)I) for
some value of (sigma) in [a,b]. By coding OP to solve linear equations in
(A - (sigma)I) it is possible to implicitly use the operator
(A - (sigma)I)^(-1), whose eigenvalues are 1/((lambda) - (sigma)). Thus all
the eigenvalues of A inside [a,b] become all the eigenvalues of the inverted
operator outside [1/(a - (sigma)), 1/(b - (sigma))] and SILASO can be
applied. The only problem is that the computed eigenvalues have to be
back transformed.
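The transformation and the back transformation can be sketched as follows
(Python for illustration; a real OP would factor A - (sigma)I once and
reuse the factors, and none of these names belong to LASO2):

```python
import numpy as np

# Shift-and-invert sketch: OP applies (A - sigma*I)^(-1), whose
# eigenvalues are mu = 1/(lambda - sigma); each computed mu is
# back-transformed via lambda = sigma + 1/mu.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
sigma = 0.1                      # shift chosen inside [a, b]

def op(p):
    """Model of OP: solve (A - sigma*I) q = p (illustrative only)."""
    return np.linalg.solve(A - sigma * np.eye(len(A)), p)

mu = np.linalg.eigvalsh(op(np.eye(len(A))))   # eigenvalues of the operator
lam = np.sort(sigma + 1.0 / mu)               # back-transformed eigenvalues
```

The back-transformed values in `lam` are exactly the eigenvalues of A,
recovered from the inverted operator.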
SILASO calls a number of subsidiary functions and subroutines. The
following names must not be used in the driver program.
SIWLA,SIPPLA,SMVPC,SORTQR,SVSORT,SAXPY,SCOPY,SDOT,SNRM2,SSCAL,SSWAP,SLAEIG,
SLAGER,SLARAN,SLABAX,SLABCM,SLABFC,URAND.
4. Discussion of Method
SILASO uses the block Lanczos algorithm with selective
orthogonalization.
The (scalar) Lanczos algorithm is an efficient scheme for computing an
orthonormal basis q_1, q_2, ..., q_j for the Krylov subspace
span(q_1, Aq_1, ..., A^(j-1)q_1) and the corresponding reduced matrix
T_j = Q_j^T A Q_j, which happens to be tridiagonal. At each step the Krylov
subspace grows larger, one more Lanczos vector is added to the list, and T
gets longer. With T it is easy to compute the Rayleigh-Ritz
approximations to eigenpairs of A from the Krylov subspace and their
corresponding error bounds. The algorithm continues until the error
bounds are sufficiently small.
The block Lanczos algorithm replaces each vector in the simple Lanczos
algorithm by an orthonormal block of vectors. Block Lanczos has
theoretical advantages over simple Lanczos with respect to finding
multiple eigenvalues and has advantages in efficiency if the cost of
forming the matrix vector products is high.
Unfortunately finite precision arithmetic causes the vectors computed by
the algorithm to lose orthogonality and approach linear dependence. To
maintain robust independence among the Lanczos vectors SILASO uses
selective orthogonalization in which a few Lanczos vectors are
orthogonalized against a few selected Ritz vectors.
The algorithm is terminated when the desired Ritz values are
sufficiently accurate. If necessary, SILASO then makes another Lanczos
run to test for undisclosed multiplicities. Finally if such
multiplicities are discovered SILASO performs a final Rayleigh-Ritz
procedure to resolve such clusters.