
    This file contains an assortment of notes about the QMR codes.  It
should answer some questions about the package and its structure, how to
compile the test examples, and so on.  The assumption is that you are on a
fairly standard Unix system.  All the Unix commands listed are valid on the
Sun running SunOS v4.1.1 on which this distribution was tested; they may
differ on your machine.


    General information (v1.3)
    --------------------------

    The distribution consists mainly of FORTRAN routines implementing both the
three-term recurrence and the coupled two-term recurrence variants of the
look-ahead Lanczos algorithm, with applications to the computation of
eigenvalue approximations, and the solution of linear systems with the
quasi-minimal residual method.  However, we also include some algorithms
without look-ahead, some example drivers, and support for an example data
format and some preconditioners for the linear systems solver.  The algorithms
without look-ahead are less robust than their look-ahead counterparts, but are
included here for completeness.  We recommend using the look-ahead versions of
the coupled two-term recurrence QMR algorithm for solving linear systems and of
the three-term Lanczos algorithm for computing eigenvalue approximations.


    Structure of the distribution
    -----------------------------

    The unpacking procedure creates an entire tree of subdirectories.  The
top-level directory contains installation notes, this file, and the copyright
notice that covers the codes.  Also in this directory are the `algs', `csr',
and `libs' directories (called level-0 directories), which contain all the
codes for the algorithms, the compressed sparse row data format, and the
various libraries, respectively.  The directories `algs' and `libs' are further
divided into level-1 directories, each one with the sources for only one
algorithm or one library.  The main routines of interest are the algorithms in
`algs'; the level-1 subdirectories here are complete algorithms, missing only
library routines (provided in `libs') and user-specified driver routines that
handle the matrix operations and preconditioning; we provide example drivers
supporting the compressed sparse row format in the subdirectory `csr'.

    The algorithm directories are of the form MALG, and the algorithm source
file names are of the form PMALG, where:

        P denotes the precision and data type:
            D = double precision real
            S = single precision real
            Z = double precision complex
            C = single precision complex
        M denotes the matrix type:
            S = symmetric
            U = unsymmetric
        ALG denotes the algorithm:
            CPL = QMR based on coupled two-term Lanczos with look-ahead
            CPX = QMR based on coupled two-term Lanczos without look-ahead
            LAL = eigenvalue solver based on three-term Lanczos with look-ahead
            QBG = QMR-from-BCG
            QMR = QMR based on three-term Lanczos with look-ahead
            QMX = QMR based on three-term Lanczos without look-ahead
            TFX = transpose-free QMR without look-ahead

    For example, ZUCPL denotes the double precision complex codes for the QMR
method based on coupled two-term Lanczos with look-ahead, for unsymmetric
matrices.  It will be found in the UCPL level-1 subdirectory of the `algs'
directory.
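For quick reference, the naming scheme can be decoded mechanically.  The
following Python helper is purely illustrative and not part of the
distribution:

```python
# Hypothetical helper (not part of the distribution) decoding the PMALG
# naming scheme described above.
PREC = {"D": "double precision complex"[:0] or "double precision real",
        "S": "single precision real",
        "Z": "double precision complex",
        "C": "single precision complex"}
MAT = {"S": "symmetric", "U": "unsymmetric"}
ALG = {"CPL": "QMR based on coupled two-term Lanczos with look-ahead",
       "CPX": "QMR based on coupled two-term Lanczos without look-ahead",
       "LAL": "eigenvalue solver based on three-term Lanczos with look-ahead",
       "QBG": "QMR-from-BCG",
       "QMR": "QMR based on three-term Lanczos with look-ahead",
       "QMX": "QMR based on three-term Lanczos without look-ahead",
       "TFX": "transpose-free QMR without look-ahead"}

def decode(name):
    """Split a PMALG routine name into (precision, matrix type, algorithm)."""
    return PREC[name[0]], MAT[name[1]], ALG[name[2:]]

print(decode("ZUCPL"))
```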

    Every directory contains a file called `filelist', which has a listing of
all the files in that directory, together with a short description of each
file.  Here is an overview of the distribution tree:


  level-0     level-1                          Description

   algs +  ................................... Algorithms
        |
        +----- scpl .......................... SCPL algorithm
        |
        +----- scpx .......................... SCPX algorithm
        |
        +----- slal .......................... SLAL algorithm
        |
        +----- sqbg .......................... SQBG algorithm
        |
        +----- sqmr .......................... SQMR algorithm
        |
        +----- sqmx .......................... SQMX algorithm
        |
        +----- ucpl .......................... UCPL algorithm
        |
        +----- ucpx .......................... UCPX algorithm
        |
        +----- ulal .......................... ULAL algorithm
        |
        +----- uqbg .......................... UQBG algorithm
        |
        +----- uqmr .......................... UQMR algorithm
        |
        +----- uqmx .......................... UQMX algorithm
        |
        +----- utfx .......................... UTFX algorithm

   csr  +  ................................... Compressed Sparse Row format
        | 
        +----- data .......................... Example CSR data files

   incl ...................................... Miscellaneous include files

   libs +  ................................... Libraries
        |
        +---- blas ........................... BLAS routines
        |
        +---- deis ........................... EISPACK routines (double)
        |
        +---- lapk ........................... LAPACK routines
        |
        +---- linp ........................... LINPACK routines
        |
        +---- misc ........................... Miscellaneous routines
        |
        +---- seis ........................... EISPACK routines (single)

    In general, the source code is in files with the extension `.src'.  These
are "almost FORTRAN", with the following exceptions:

    - they contain C preprocessor instructions, such as the macro definition
      "#define" and the include directive "#include", used to make the code
      more readable and to define constants in the code;
    - they contain lines that are longer than 72 characters.

    One needs to run these files through the C preprocessor `cpp' (which
produces FORTRAN code, possibly with lines longer than 72 characters, and
also emits C preprocessor line-number markers), then through the Unix program
`sed' (which strips out the C preprocessor line-number markers and any blank
lines), and finally through the Unix program `awk' (which truncates lines to
72 characters).  The default makefiles in the distribution
automatically do all this.  Some FORTRAN compilers can actually handle the
`.src' files directly; for example, the FORTRAN compiler on the Sun will
automatically invoke the C preprocessor first if the extension of a source
file is `.F', rather than `.f'; this compiler can also handle lines of up to
132 characters.  We provide an explicit mechanism to convert the `.src' files
into legal FORTRAN `.f' files, but it requires the C preprocessor `cpp' and the
Unix `awk' and `sed' programs.  If it turns out that you cannot generate the
FORTRAN sources, let us know and we'll provide them.
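As an illustration of the cleanup steps, here is a Python sketch that mimics
what the `sed' and `awk' passes do to the `cpp' output.  The distribution
itself uses the actual `sed' and `awk' scripts, not this sketch:

```python
def postprocess(cpp_output):
    """Mimic the sed/awk cleanup applied to cpp output: strip the cpp
    line-number markers (lines starting with '#') and blank lines, then
    truncate every line to the FORTRAN 77 limit of 72 characters."""
    lines = []
    for line in cpp_output.splitlines():
        if line.startswith("#"):   # cpp line-number marker
            continue
        if not line.strip():       # blank line
            continue
        lines.append(line[:72])    # truncate to column 72
    return "\n".join(lines) + "\n"

# A made-up fragment of cpp output for illustration:
example = '# 1 "example.src"\n      CALL DAXPY (NDIM, ALPHA, X, 1, Y, 1)\n\n'
print(postprocess(example))
```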

    Since the object files are created from the `.f' files, you will find that
if you use a debugger on the codes, then you will be staring at some rather
"dense" FORTRAN codes.  Nonetheless, it is fairly easy to establish the
correspondence between the lines in the `.f' file and the matching `.src' file,
and you will find that the `.src' file is far more readable by human eyes than
the FORTRAN source would ever be.  You'll definitely want to work with the
`.src' files, and not the `.f' files.  Trust us...

    The single precision codes were generally obtained by running the double
precision codes through the Unix `sed' program, with suitable scripts.  This
has at least two implications.  First, the single precision codes were not
tested nearly as extensively as their double precision counterparts.  Second,
the single precision library routines are *NOT* identical to the ones you
would find in the distribution of the corresponding library, though they are
functionally equivalent.  We do not recommend using these library codes for any
purpose other than as support routines for this package.  In general, while we
provide all the library routines called by our codes, this is only done in the
interest of completeness.  If you have these libraries already compiled on your
system, then we would recommend using the local libraries, which might also be
optimized for your system.

    The type of the files in the distribution can be identified from the
filename extension, as follows:

	.awk	Scripts for the `awk' program
	.dat	Example data files
	.doc	Documentation files
	.f	FORTRAN source
	.inc	Include bits for the .src files
	.mak	Include bits for makefiles
	.out	Output files produced by the example drivers
	.sed	Scripts for the `sed' program
	.src	"Almost FORTRAN" source, needs some preprocessing


    Solving linear systems
    ----------------------

    As mentioned above, the distribution is logically grouped into algorithms
and data formats.  In the level-1 directories from the `algs' subdirectory,
you will find the source code for several algorithms.  In this section, we
describe the logical structure of the linear solvers.  Most of the discussion
below deals with how the solver routines are set up; you will need this
information if you want to incorporate the solvers in your own codes.  We will
only briefly discuss the example drivers provided.

    Two of the issues that come up in designing a linear solver code have to do
with the way that the matrix and the preconditioner are handled.  In our codes,
we opted to make both the preconditioner and the matrix external to the codes.
At an abstract level, this means that the solvers we provide do not know
anything about preconditioning; they only know how to solve a linear system
A x = b.  In fact, the solvers also assume that the starting guess is x_0 = 0.
This system might already be the preconditioned version of the system you
really want to solve, but our codes do not know the difference: we compute
quantities that belong to the system A x = b.  The other option would be to
rewrite the algorithms so that they incorporate explicitly the preconditioner,
and they compute quantities belonging to the unpreconditioned system.  This
could be done for all the algorithms we coded; we just chose not to do it.  So,
for example, if you precondition your system on the left, then the quantities
we compute are left-preconditioned, and in particular, the residuals computed
by our solvers (and their norms used in the convergence checks) would be the
preconditioned residuals, rather than the residuals of the unpreconditioned
system.

    To perform the matrix-vector multiplication, we use reverse communication.
This means that the algorithm routines will return to the caller each time a
matrix-vector multiplication is required, signal to the caller by means of a
flag whether they need a multiplication by A or by A^T (where appropriate), and
then expect the caller to call them back after the multiplication is done.  An
example of the calling sequence used is provided both in the documentation for
each solver, as well as in the various example drivers.  The major advantage of
using reverse communication is that everything related to the matrix (storage
format, preconditioners, etc) is external to our codes, hence there are no
limitations imposed by our choice of routine names, argument lists, and so on.
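The calling sequence can be sketched in a few lines.  In this Python sketch a
toy Richardson iteration stands in for the actual QMR routines, and the flag
names are made up; the real routines document their own flag conventions,
including requests for products with A^T:

```python
# Toy solver driven by reverse communication: it returns control to the
# caller whenever it needs a matrix-vector product.  Flag values and names
# here are hypothetical; the real QMR routines define their own.
NEED_AX, DONE = 1, 0

def make_solver(b, tol=1e-10, maxit=100):
    """Toy Richardson iteration x <- x + (b - A x), expressed as a
    generator that yields (flag, vector) requests to the caller."""
    x = [0.0] * len(b)
    for _ in range(maxit):
        ax = yield (NEED_AX, x)           # ask the caller for A times x
        r = [bi - axi for bi, axi in zip(b, ax)]
        if max(abs(ri) for ri in r) < tol:
            break
        x = [xi + ri for xi, ri in zip(x, r)]
    yield (DONE, x)

# The caller owns the matrix: storage format and preconditioning are
# entirely external to the solver.  Here A = 0.5 * I as a trivial example.
def matvec(v):
    return [0.5 * vi for vi in v]

solver = make_solver([1.0, 2.0])
flag, vec = next(solver)
while flag == NEED_AX:                    # perform the requested product,
    flag, vec = solver.send(matvec(vec))  # then call the solver back
print(vec)                                # approximate solution of A x = b
```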

    Of course, since we may be solving the preconditioned system and returning
the preconditioned solution, you may need to recover the solution of your
original system.  The example drivers illustrate the steps
needed to combine our codes with preconditioners; here is the setup used.
The code is set up to solve the system A x = b with initial guess x_0 = 0.
Here A x = b denotes the preconditioned system, and it is connected with the
original system as follows.  Let B y = c be the original unpreconditioned
system to be solved, and let y_0 be an arbitrary initial guess for its
solution.  Then:

          A x = b, where  A = M_1^{-1} B M_2^{-1},
                          x = M_2 (y - y_0), b = M_1^{-1} (c - B y_0).

Here M = M_1 M_2 is the preconditioner.  To recover the final iterate y_n for
the original system B y = c from the final iterate x_n for the preconditioned
system A x = b, set

               y_n = y_0 + M_2^{-1} x_n.
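These identities are easy to check numerically.  Here is a small Python
sketch on a 2 x 2 system with diagonal preconditioner factors M_1 and M_2
(all numbers made up for illustration):

```python
# Numerical check of the preconditioning identities above, on a 2 x 2
# system with diagonal preconditioner factors M_1 and M_2.
B  = [[4.0, 1.0], [2.0, 3.0]]             # original matrix
c  = [1.0, 2.0]                           # original right-hand side
y0 = [0.3, -0.7]                          # arbitrary initial guess
M1 = [2.0, 5.0]                           # diagonal of M_1
M2 = [3.0, 4.0]                           # diagonal of M_2

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1],
            M[1][0]*v[0] + M[1][1]*v[1]]

# A = M_1^{-1} B M_2^{-1}  and  b = M_1^{-1} (c - B y_0)
A = [[B[i][j] / (M1[i] * M2[j]) for j in range(2)] for i in range(2)]
By0 = matvec(B, y0)
b = [(c[i] - By0[i]) / M1[i] for i in range(2)]

# Solve A x = b with x_0 = 0 (Cramer's rule stands in for the QMR solver,
# which only ever sees this preconditioned system).
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
x = [(b[0]*A[1][1] - b[1]*A[0][1]) / det,
     (A[0][0]*b[1] - A[1][0]*b[0]) / det]

# Recover y_n = y_0 + M_2^{-1} x_n and verify that it solves B y = c.
y = [y0[i] + x[i] / M2[i] for i in range(2)]
By = matvec(B, y)
residual = [c[i] - By[i] for i in range(2)]
print(max(abs(r) for r in residual))      # ~ 0, up to round-off
```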

    Another issue that comes up when using an iterative solver is the
convergence criterion.  We use a simple relative residual criterion, aiming to
reduce the norm of the starting residual by a certain factor.  We do not claim
that our choice is the best one, and it certainly is not the only one.  Indeed,
we are aware that there are other criteria used in practice or recommended in
the literature.  It was impractical (impossible?) for us to design a criterion
that would suit everyone; if you want to change the criterion to your favorite
one, just hack the solvers.

    As noted by Freund and Hochbruck, the QMR and TFQMR algorithms can also be
used to solve consistent singular systems, as long as the coefficient matrix
has a one-dimensional null space.  For singular systems, we recommend
starting QMR and TFQMR with a non-trivial initial guess for the solution; for
example, a random vector will do.

    Finally, we would like to mention a situation we've encountered.  In some
applications, the right-hand side vector is quite sparse, having only a few
non-zero elements.  In these cases, if one starts with a zero guess for the
solution, then the starting residual (which is used to generate the Krylov
basis) is just the right-hand side vector, and thus it is also sparse.  It is
easy to see that the Krylov vectors may fill in quite slowly, especially if the
matrix is also quite sparse, and it may take a few steps before enough vectors
are generated so that a combination of them even covers the sparsity pattern
of the solution vector.  In such cases, we recommend starting with a random
guess for the solution, which should give a starting residual that is
``dense''.
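The fill-in effect is easy to demonstrate.  In this Python sketch (an
illustration only, not part of the distribution), A is a tridiagonal matrix
and the starting residual is a single unit vector, so each application of A
adds only one non-zero component to the Krylov vectors:

```python
# Slow Krylov fill-in with a sparse right-hand side: for a tridiagonal A
# and r_0 = e_1, the vector A^k r_0 has at most k+1 nonzeros, so many steps
# pass before the Krylov vectors span all components of the solution.
N = 10

def tridiag_matvec(v):
    """y = A v for the tridiagonal A with stencil (-1, 2, -1)."""
    y = [0.0] * N
    for i in range(N):
        y[i] = 2.0 * v[i]
        if i > 0:
            y[i] -= v[i - 1]
        if i < N - 1:
            y[i] -= v[i + 1]
    return y

r = [0.0] * N
r[0] = 1.0                                # sparse starting residual r_0 = e_1
nnz = []
for k in range(5):
    nnz.append(sum(1 for vi in r if vi != 0.0))
    r = tridiag_matvec(r)                 # next Krylov vector
print(nnz)                                # nonzeros grow by one per step
```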


    Computing eigenvalue approximations
    ------------------------------------

    After n steps, the three-term recurrence variant of the look-ahead
Lanczos algorithm has generated an n x n block-tridiagonal matrix H_n whose
eigenvalues (the so-called Ritz values) can be used as approximations to the
eigenvalues of A.  We also provide a driver for computing such eigenvalue
approximations.  This is done by first computing all eigenvalues of H_n by
means of the unsymmetric QR algorithm.  It is well known that, due to round-off
in the Lanczos process, some of the eigenvalues of H_n can be "spurious" Ritz
values.  We use the heuristic proposed by Cullum and Willoughby to delete
spurious Ritz values from the set of eigenvalue approximations.  We stress that
the QR algorithm requires O(n^3) operations, and therefore, the driver provided
for computing eigenvalue approximations is only practical if n, the number of
Lanczos steps, is moderate.

    The driver allows us to compute eigenvalue approximations of the original
(unpreconditioned) matrix A, as well as eigenvalue approximations of the
preconditioned matrices that result from combining the original matrix with any
of the preconditioners provided with the linear systems solver.  We decided to
include this latter option so that the codes can be used to study the effect of
the various preconditioners on the spectrum of the coefficient matrices of
linear systems.

    Finally, we remark that the matrix H_n is also generated, at least
implicitly, by the coupled two-term recurrence variant of the look-ahead
Lanczos algorithm.  However, we recommend using the three-term recurrence
variant for computing eigenvalue approximations.


    The Compressed Sparse Row (CSR) example format
    ----------------------------------------------

    We provide an example data format, together with some preconditioners for
the linear systems solvers.  The data format provided is the compressed sparse
row (CSR) format, which is the transpose of the Harwell-Boeing sparse format,
also known as the compressed sparse column (CSC) format.  We provide a utility
in the `csr' subdirectory for converting back and forth between the two formats
(the utility just transposes a matrix in either format).  To compile the
utility program, change to the `csr' subdirectory, do a `make transp', and you
will get an executable named `transp'.  We use the CSR format throughout since
we initially relied on Youcef Saad's SPARSKIT format for the low-level matrix
handling, and we did not want to have to always transpose the matrix once it
was read in (which needed to be done for CSC matrices).  If you usually work
with the CSC format, then keep this in mind (indeed, we ourselves forgot more
than once that the format required is CSR rather than CSC, and solved the
transpose system instead of the one we really wanted).
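Since the CSC arrays of a matrix are exactly the CSR arrays of its transpose,
the conversion performed by `transp' amounts to a single counting-sort pass
over the entries.  Here is a Python sketch of the idea (the actual utility is
a FORTRAN program; indices here are 0-based):

```python
# Sketch of the CSR <-> CSC conversion: the CSC arrays of A are the CSR
# arrays of A^T, so one counting-sort pass over the entries suffices.
def csr_to_csc(nrow, ncol, ia, ja, a):
    """Convert CSR (row pointers ia, column indices ja, values a)
    to CSC (column pointers, row indices, values)."""
    nnz = ia[-1]
    colptr = [0] * (ncol + 1)
    for j in ja:                          # count entries per column
        colptr[j + 1] += 1
    for j in range(ncol):                 # prefix sums -> column pointers
        colptr[j + 1] += colptr[j]
    rowind = [0] * nnz
    vals = [0.0] * nnz
    nxt = colptr[:]                       # next free slot in each column
    for i in range(nrow):
        for k in range(ia[i], ia[i + 1]):
            p = nxt[ja[k]]
            rowind[p] = i
            vals[p] = a[k]
            nxt[ja[k]] += 1
    return colptr, rowind, vals

# A = [[1, 0, 2],
#      [0, 3, 0]] in CSR:
ia, ja, a = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(csr_to_csc(2, 3, ia, ja, a))
```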

    We also provide in the `csr' subdirectory utilities to convert CSR
matrices to and from a raw binary format.  This format will typically take
less space on disk and less time to load; however, it is machine dependent and
binary data files created on one machine may not be compatible with another
machine's architecture.  There is a check in the codes to attempt to determine
whether the binary data is compatible, and if the check fails, you will
receive the message 'Unrecognized binary data file.'

    There are several parameters controlling the maximum sizes of the matrices
that can be used with the example drivers.  For example, the maximum matrix
size is determined by the parameter NDIM in the main driver files (zslal,
zssys, zulal, and zusys).  There is also a parameter NZMAX, which determines
the maximum number of non-zero elements in a sparse matrix.  The ILUT
preconditioner for this same format has two additional parameters, NZLMAX and
NZUMAX, which determine the maximum number of non-zero elements in the ILUT
preconditioner, while the ILLT preconditioner has a parameter NZLMAX.  You
may adjust any of these to fit your test matrices.

    We emphasize that the compressed sparse row format is provided only as an
example, to make the point that all matrix operations are external to the
solvers and to demonstrate how you would incorporate the solvers in your own
code.  The solvers do not require this particular format.  We also stress that
the preconditioners provided are primarily to indicate how you would combine
our solvers with a preconditioner.  We make no claims as to the effectiveness
or efficiency of these preconditioners.  In particular, we did not attempt to
optimize the ILUT preconditioner provided, with regard to either work or
storage, and you will discover if you use our implementation that it is fairly
expensive.  Neither ILUT nor SSOR are guaranteed to work on any given problem,
so do not be surprised if they do not work on yours.


    Compiling the test examples for linear systems
    ----------------------------------------------

    In order to compile the example drivers provided, you will need to generate
a header file `header.mak'.  In the top-level directory, there is a Bourne
shell script called `Setup', which is used to generate `header.mak' and to
make all the basic object files and libraries.  To run the script, change to
its directory and type

	./Setup

    The script is self-explanatory.  The header file it generates,
`header.mak', contains various paths and flags specific to your installation,
used by `make' when building the codes.  It is included by all the makefiles.
If you want
to change some programs and/or compilation flags, this is one place to do it.
However, be advised that if you make changes in `header.mak' and then later run
the `Setup' script again, your changes will be lost.  You could also edit the
`Setup' script itself to make your changes permanent.  While this is not
exceedingly complicated, it is a bit involved and thus, as they say, ``beyond
the scope'' of these instructions.

    We have been advised that on DEC 5000's running Ultrix, the regular shell
`/bin/sh' might not run the `Setup' script, instead returning with the error
message:

./Setup: syntax error at line 21: `(' unexpected

    In this case, edit the `Setup' script and replace on the top line the
shell `/bin/sh' with `/usr/bin/sh5'.

    After you run the `Setup' script, you can just type `make' for a list of
available make options.

    Let us actually make an example driver.  This will also serve as a test of
the distribution.  We will make the double precision general linear systems
driver for the compressed sparse row format.  The drivers are set up to produce
verbose output, and to compute the residual norms at every step; this would
not normally be done in a batch run.  The complete steps are as follows:

    1. Unpack the codes.

    2. Run the `Setup' script:

	./Setup

       If this is the first time you have run the `Setup' script, this step
will take some time while it generates all the source files and compiles all
the
libraries.  Some compilers may complain about unused variables or data
statements out of place when compiling some of the BLAS and LINPACK routines.
The routines we provide are taken straight from the library sources, and we did
not modify them to remove the warnings.  You are welcome to hack your copy so
as to avoid the complaints.

    3. Change to the CSR data format subdirectory:

	cd csr

    4. Just for fun, let's check the `make' options:

	make

       You should see:

usage:
   make src     - make all source files
   make nosrc   - remove all source files
   make obj     - make all object files
   make noobj   - remove all object files
   make exe     - make all executables
   make noexe   - remove all executables
   make lib     - make all libraries
   make nolib   - remove all libraries
   make clean   - make noexe nolib noobj nosrc
 
   make Pslal   - LAL driver, symmetric (P = c,d,s,z)
   make Pssys   - common driver, symmetric (P = c,z)
   make Pulal   - LAL driver (P = c,d,s,z)
   make Pusys   - common driver, unsymmetric (P = c,d,s,z)
   make Pascraw - ASCII-to-binary converter (P = c,d,s,z)
   make Pcoocsr - COO-to-CSR converter (P = c,d,s,z)
   make Prawasc - binary-to-ASCII converter (P = c,d,s,z)
   make Ptransp - CSR-to-CSC converter (P = c,d,s,z)

    5. Make the double precision general driver:

	make dusys

       On at least one Cray Y-MP, the loader complained about duplicate entry
points.  On this system, the loader automatically links in the
system-wide scientific library, which also contains the BLAS routines, and then
the loader finds overlap between this library and the ones we provide.  If you
also get such warnings, they can be ignored.

    6. Run the driver:

	./dusys

       The following is the input for the test example; the output --- obtained
on several machines --- is listed in the next section.  We'll test the coupled
QMR algorithm with look-ahead, with the two-sided SSOR preconditioner with
parameter 1.0.  Your inputs are marked with "<==":

Enter CSR data file name: data/ducsr.dat		<==
 A)SCII or B)inary data ? A				<==
TITLE :  7-POINT TEST MATRIX FROM SPARSKIT                                      
KEY   :   7-P     
TYPE  :       RUA
NDIM  :     10000
NROW  :       225
NCOL  :       225
NZMAX :    300000
NNZ   :      1065
NRHS  :         0
Enter rhs data file name     : data/ducsrb.dat		<==
Enter starting guess file    : data/ducsrx.dat		<==
Choices of algorithm         : 1 = CPL
                             : 2 = CPX
                             : 3 = QBG
                             : 4 = QMR
                             : 5 = QMX
                             : 6 = TFX
Select an algorithm          : 1			<==
Enter convergence tolerance  : 1.0e-10			<==
Maximum number of steps NLIM : 30			<==
Enter estimated norm for P&Q : 1.0			<==
Enter estimated norm for V&W : 1.0			<==
Choices of preconditioner    : 0 = no prec
                               1 = Left ILUT
                               2 = Right ILUT
                               3 = Two-sided ILUT
                               4 = Left SSOR
                               5 = Right SSOR
                               6 = Two-sided SSOR
Select a preconditioner      : 6			<==
Enter SSOR parameter OMEGA   : 1.0			<==

[ example outputs listed below ]

The residual norm has converged.
Run again (Y/N) ? N					<==


    Test output
    -----------

    We ran the test example above on several machines, and obtained the outputs
listed below.  You may want to check your output against these; hopefully your
output is at least close to one of the ones below.  The solution for the matrix
given (ducsr.dat with the given ducsrb.dat) is the vector of all ones, so even
if the output does not match any of the ones below, you can always check the
solution.

Cray Y-MP M92/256, UNICOS 7.0.4, cf77, no optimizer:
       0 0.1000E+01 0.1000E+01
       1 0.1357E+01 0.7958E+00
       2 0.1416E+01 0.3251E+00
       3 0.1519E+01 0.4227E+00
       4 0.1596E+01 0.5553E+00
       5 0.1732E+01 0.5023E+00
       6 0.1855E+01 0.4560E+00
       7 0.1961E+01 0.3959E+00
       8 0.1982E+01 0.3133E+00
       9 0.2021E+01 0.3444E+00
Vector    11 (VW) is inner
Vector    12 (VW) is inner
Vector    13 (VW) is inner
VW block did not close:
... updated norms, restarting from step:   10 0.1209E+03 0.1209E+03
Rerunning vector    11 (PQ) as regular
Rebuilding V&W:      12
      10 0.1355E+01 0.2804E+00
      11 0.9054E+00 0.3056E+00
      12 0.4708E+00 0.1855E+00
      13 0.1305E+00 0.4414E-01
      14 0.2550E-01 0.7989E-02
      15 0.2346E-01 0.3688E-02
      16 0.4508E-02 0.9492E-03
      17 0.1606E-02 0.4685E-03
      18 0.2347E-03 0.4397E-04
      19 0.4188E-04 0.1055E-04
      20 0.3224E-04 0.1314E-05
      21 0.5856E-06 0.1244E-06
      22 0.1639E-06 0.4203E-07
      23 0.5842E-08 0.1151E-08
      24 0.4598E-08 0.4918E-09
      25 0.3388E-08 0.5688E-09
      26 0.2016E-09 0.4024E-10

IBM RS6000, AIX 3.2, xlf, no optimizer:
       0  .1000E+01  .1000E+01
       1  .8217E+00  .6945E+00
       2  .8919E+00  .3904E+00
       3  .9041E+00  .4687E+00
       4  .9196E+00  .4926E+00
       5  .8413E+00  .2578E+00
       6  .9046E+00  .2507E+00
       7  .8815E+00  .2702E+00
       8  .8935E+00  .2649E+00
       9  .7116E+00  .2472E+00
      10  .5951E+00  .2535E+00
      11  .8058E-01  .2684E-01
      12  .8347E-01  .2446E-01
Vector    13 (PQ) is inner
      13  .1770E-01  .5519E-02
      14  .1555E-02  .3952E-03
      15  .1117E-02  .2780E-03
      16  .5722E-03  .1803E-03
      17  .1154E-03  .3314E-04
      18  .2057E-04  .3812E-05
      19  .4047E-05  .9992E-06
      20  .3913E-05  .6110E-06
      21  .2762E-07  .5874E-08
      22  .1599E-07  .4629E-08
      23  .1274E-08  .2605E-09
      24  .1294E-08  .2335E-09
      25  .6871E-09  .1425E-09
      26  .2880E-09  .7320E-10

SGI, IRIX 4.0.5, f77, no optimizer:
       0 0.1000E+01 0.1000E+01
       1 0.8217E+00 0.6945E+00
       2 0.8919E+00 0.3904E+00
       3 0.9041E+00 0.4687E+00
       4 0.9196E+00 0.4926E+00
       5 0.8413E+00 0.2578E+00
       6 0.9046E+00 0.2507E+00
       7 0.8815E+00 0.2702E+00
       8 0.8935E+00 0.2649E+00
       9 0.7116E+00 0.2472E+00
      10 0.5951E+00 0.2535E+00
      11 0.8058E-01 0.2684E-01
      12 0.8347E-01 0.2446E-01
Vector    13 (PQ) is inner
      13 0.1770E-01 0.5519E-02
      14 0.1555E-02 0.3952E-03
      15 0.1117E-02 0.2780E-03
      16 0.5722E-03 0.1803E-03
      17 0.1154E-03 0.3314E-04
      18 0.2057E-04 0.3812E-05
      19 0.4047E-05 0.9992E-06
      20 0.3913E-05 0.6110E-06
      21 0.2758E-07 0.5866E-08
      22 0.1560E-07 0.4505E-08
      23 0.1374E-08 0.2973E-09
      24 0.9364E-10 0.1921E-10

Sparc 10/42, SunOS 4.1.3, f77, no optimizer:
Stardent 3000/1500, TitanOS 4.2, f77, no optimizer:
       0 0.1000E+01 0.1000E+01
       1 0.8217E+00 0.6945E+00
       2 0.8919E+00 0.3904E+00
       3 0.9041E+00 0.4687E+00
       4 0.9196E+00 0.4926E+00
       5 0.8413E+00 0.2578E+00
       6 0.9046E+00 0.2507E+00
       7 0.8815E+00 0.2702E+00
       8 0.8935E+00 0.2649E+00
       9 0.7116E+00 0.2472E+00
      10 0.5951E+00 0.2535E+00
      11 0.8058E-01 0.2684E-01
      12 0.8347E-01 0.2446E-01
Vector    13 (PQ) is inner
      13 0.1770E-01 0.5519E-02
      14 0.1555E-02 0.3952E-03
      15 0.1117E-02 0.2780E-03
      16 0.5722E-03 0.1803E-03
      17 0.1154E-03 0.3314E-04
      18 0.2057E-04 0.3812E-05
      19 0.4047E-05 0.9992E-06
      20 0.3913E-05 0.6110E-06
      21 0.2758E-07 0.5866E-08
      22 0.1560E-07 0.4505E-08
      23 0.1373E-08 0.2972E-09
      24 0.9233E-10 0.1893E-10

    The results for the Cray are markedly different from the others because
the auxiliary starting vector is different (the Cray integers are longer, so
the random number generator produces a different sequence of numbers).

    We also provide test examples for the complex codes, the files zscsr.dat
and zucsr.dat in the data subdirectory.  The right-hand sides are zscsrb.dat
and zucsrb.dat, respectively, and their exact solutions are in zsexact.dat and
zuexact.dat, so you can compare the output of your runs.  A starting guess for
either is provided in zcsrx.dat.


    Compiling the test examples for eigenvalue approximations
    ---------------------------------------------------------

    Let us now test the example eigenvalue driver.  To make it, use:

	make dulal

then run it:

	./dulal

    The following is the input for the test example; the output --- obtained
on several machines --- is listed at the end.  Your inputs are marked with
"<==":

Enter CSR data file name: data/ducsr.dat		<==
 A)SCII or B)inary data ? A				<==
TITLE :  7-POINT TEST MATRIX FROM SPARSKIT                                      
KEY   :   7-P     
TYPE  :       RUA
NDIM  :      1000
NROW  :       225
NCOL  :       225
NZMAX :    300000
NNZ   :      1065
NRHS  :         0
Maximum number of steps NLIM : 50			<==
Enter estimated norm for V&W : 1.0			<==
Choices of preconditioner    : 0 = no prec
                               1 = Left ILUT
                               2 = Right ILUT
                               3 = Two-sided ILUT
                               4 = Left SSOR
                               5 = Right SSOR
                               6 = Two-sided SSOR
Select a preconditioner      : 6			<==
Enter SSOR parameter OMEGA   : 1.0			<==
Running look-ahead Lanczos...

[ the look-ahead steps vary among machines ]

Number of Lanczos steps completed        :   50
Number of eigenvalues to compute (0=done): 16		<==
Computing eigenvalues...
Computing check eigenvalues...
Number of Lanczos eigenvalues found      :   16

Number of Lanczos steps completed        :   50
Number of eigenvalues to compute (0=done): 32		<==
Computing eigenvalues...
Computing check eigenvalues...
Number of Lanczos eigenvalues found      :   30

Find the common eigenvalues (1=Yes/0=No) ? 1		<==
Enter separation tolerance               : 1.0e-3	<==
Number of common eigenvalues found       :    4

Find the common eigenvalues (1=Yes/0=No) ? 1		<==
Enter separation tolerance               : 1.0e-2	<==
Number of common eigenvalues found       :    7

Find the common eigenvalues (1=Yes/0=No) ? 0		<==

Number of Lanczos steps completed        :   50
Number of eigenvalues to compute (0=done): 0		<==
The algorithm terminated normally.
Run again (Y/N) ? N					<==

    In the example, we ran the look-ahead Lanczos algorithm for 50 steps, then
computed the eigenvalue approximations at steps 16 and 32.  With a separation
tolerance of 1.0E-3, there were 4 common eigenvalue approximations, while with
a separation tolerance of 1.0E-2, there were 7 common eigenvalues (this on all
machines tested other than the Cray).  The approximations found are listed in
the file `eig.out', created by the solver.  The common eigenvalues found at
tolerance 1.0E-2 are:

Cray Y-MP M92/256, UNICOS 7.0.4, cf77, no optimizer:
Common eigenvalues, tolerance:.1000000E-01
  0.99999472402621734E+00  0.00000000000000000E+00
  0.99796583307602046E+00  0.00000000000000000E+00
  0.98396790041903365E+00  0.00000000000000000E+00
  0.96923303090778475E+00  0.00000000000000000E+00
  0.94364929964067172E+00  0.00000000000000000E+00
  0.90383738388304345E+00  0.00000000000000000E+00
  0.85673784003938915E+00  0.00000000000000000E+00
  0.64184130684845695E+00  0.00000000000000000E+00
  0.41095918929404316E+00  0.00000000000000000E+00
  0.28150025250651100E+00  0.00000000000000000E+00
  0.79612698325885702E-01  0.00000000000000000E+00
  0.75510886263847297E-01  0.00000000000000000E+00
 -0.18086438441356728E+00  0.00000000000000000E+00

IBM RS6000, AIX 3.2, xlf, no optimizer:
Common eigenvalues, tolerance:.1000000E-01
   .99995715561022502E+00   .00000000000000000E+00
   .99528047978514966E+00   .00000000000000000E+00
   .96681646229203844E+00   .00000000000000000E+00
   .39081261115160149E+00   .00000000000000000E+00
   .25744154934419206E+00   .00000000000000000E+00
   .75510886263718949E-01   .00000000000000000E+00
  -.18086438441355990E+00   .00000000000000000E+00

SGI, IRIX 4.0.5, f77, no optimizer:
Common eigenvalues, tolerance:.1000000E-01
  0.99995715558101299E+00  0.00000000000000000E+00
  0.99528047882318464E+00  0.00000000000000000E+00
  0.96681645898494750E+00  0.00000000000000000E+00
  0.39081261115288800E+00  0.00000000000000000E+00
  0.25744154934388667E+00  0.00000000000000000E+00
  0.75510886263770586E-01  0.00000000000000000E+00
 -0.18086438441356611E+00  0.00000000000000000E+00

DEC 5000, Ultrix 4.4, f77, no optimizer:
SGI, IRIX 5.3, f77, no optimizer:
Sparc 10/42, SunOS 4.1.3, f77, no optimizer:
Stardent 3000/1500, TitanOS 4.2, f77, no optimizer:
Common eigenvalues, tolerance:.1000000E-01
  0.99995715557553089E+00  0.00000000000000000E+00
  0.99528047864269165E+00  0.00000000000000000E+00
  0.96681645836559993E+00  0.00000000000000000E+00
  0.39081261115453225E+00  0.00000000000000000E+00
  0.25744154934335983E+00  0.00000000000000000E+00
  0.75510886263851829E-01  0.00000000000000000E+00
 -0.18086438441356778E+00  0.00000000000000000E+00
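
    The "common eigenvalues" check above can be mimicked in a few lines: an
approximation from one run is declared common if it lies within the separation
tolerance of some approximation from the other run.  The sketch below (with
made-up values, not the `eig.out' data) only illustrates the idea; it is not
the package's routine.

```python
# Sketch of matching eigenvalue approximations from two Lanczos runs:
# a value from the first list is "common" if some value in the second
# list lies within the separation tolerance.  The complex values below
# are made up for illustration; they are not the eig.out data.

def common_eigenvalues(eigs_a, eigs_b, tol):
    """Approximations in eigs_a within tol of some approximation in eigs_b."""
    return [z for z in eigs_a if any(abs(z - w) <= tol for w in eigs_b)]

eigs_16 = [0.9999 + 0.0j, 0.9953 + 0.0j, 0.3908 + 0.0j, 0.1234 + 0.0j]
eigs_32 = [0.9999 + 0.0j, 0.9952 + 0.0j, 0.3907 + 0.0j, -0.1809 + 0.0j]

print(len(common_eigenvalues(eigs_16, eigs_32, 1.0e-3)))  # prints 3
```

Note that tightening the tolerance can only shrink the common set, which is
consistent with the run above (4 common approximations at 1.0E-3 versus 7 at
1.0E-2).
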


    Copyrights
    ----------

    Note that most of the codes in the distribution are copyrighted.  The only
routines that are not copyrighted are those from the support libraries.  The
full text of our copyright notice can be found in the file `cpyrit.doc', in the
top-level directory.  The copyright is there mainly for two reasons: first, we
want to make sure you understand that we do not warrant these codes to do
anything at all.  We are distributing them for free, and as far as any
warranty is concerned, they do nothing whatsoever.  For all intents and
purposes, any description of what the codes do should be construed as a note
of what we thought the codes did on our machine on a particular Tuesday of
last year.  If you're
really lucky, they might do the same for you someday.  Then again, do you
really feel *that* lucky?   Having said that, the second purpose for the
copyright is to protect our effort.  We have put time and effort in these
codes, and while we do not mind you using them for research, we do not want
you to sell them without our knowledge.  So, if you want to make any profit
from these codes, you have to have our permission.  You are allowed to use the
codes for your own research.  You are also allowed to distribute the codes, as
long as you do not charge more for this than the cost of the media and a
reasonable handling fee; an example of what we mean by ``reasonable'' is
something not more than twice the U.S. minimum wage (around $5/hr in 1992).
But you are not allowed to sell any part of these codes, either alone or
incorporated in some product you might design, without our permission.  You
are, of course, welcome to code up your own versions of these algorithms.

    Note that you may not remove or alter the copyright notice, even in your
own copies of the codes.  We have had instances where people removed the
copyright notice and all traces of our names from the codes.  We do not
appreciate such behavior; it is also a violation of the copyright notice.  If
you disagree with any of the terms of the copyright notice, or indeed, with
anything else in our approach, then do not use our codes.

    Finally, lest you think us anti-social, we have been known to give our
permission to incorporate our codes in commercial products, but we like to
know about it and to have some legal papers that spell things out.


    Closing notes
    -------------

    We would be interested in hearing about your experience with the codes,
especially bugs (what bugs?), smashing success stories, and stunning failures.
We can be contacted at freund@research.att.com (Roland Freund) and
santa@msr.epm.ornl.gov (Noel Nachtigal).  Also note that we make no claims
about the efficiency of the support codes and preconditioners.  Finally, if
you use these codes in your research, please reference, as appropriate, one
or more of the following:

(For eigenvalue computations)
Roland W. Freund, Martin H. Gutknecht, and Noel M. Nachtigal.
An Implementation of the Look-Ahead Lanczos Algorithm for Non-Hermitian
Matrices.
SIAM Journal on Scientific Computing, vol 14, 1993, pp. 137--158.

(For linear systems -- QMR, based on three-term recurrences)
Roland W. Freund and Noel M. Nachtigal.
QMR: a Quasi-Minimal Residual Method for Non-Hermitian Linear Systems.
Numerische Mathematik, vol 60, 1991, pp. 315--339.

(For linear systems -- QMR, based on coupled two-term recurrences)
Roland W. Freund and Noel M. Nachtigal.
An Implementation of the QMR Method Based on Coupled Two-Term Recurrences.
SIAM Journal on Scientific Computing, vol 15, 1994, pp. 313--337.

(For linear systems -- TFQMR)
Roland W. Freund.
A Transpose-Free Quasi-Minimal Residual Algorithm for Non-Hermitian Linear
Systems.
SIAM Journal on Scientific Computing, vol 14, 1993, pp. 470--482.

(For QMR and TFQMR for singular systems)
Roland W. Freund and Marlis Hochbruck.
On the Use of Two QMR Algorithms for Solving Singular Systems
and Applications in Markov Chain Modeling.
Technical Report 91.25, RIACS, NASA Ames Research Center, December 1991.

In case you wish to find out more about how the codes work, the following
references describe them in some detail:

(For the heuristics used in the eigenvalue computations)
Jane Cullum and Ralph A. Willoughby.
A Practical Procedure for Computing Eigenvalues of Large Sparse Nonsymmetric
Matrices.
In: Large Scale Eigenvalue Problems (J. Cullum and R.A. Willoughby, eds),
North-Holland, 1986, pp. 193--240.

(For a description of the Harwell-Boeing format)
I.S. Duff, R.G. Grimes, and J.G. Lewis.
Sparse Matrix Test Problems.
ACM Transactions on Mathematical Software, vol 15, 1989, pp. 1--14.

(For a description of the implementation of two-sided SSOR)
Stanley C. Eisenstat.
Efficient Implementation of a Class of Preconditioned
Conjugate Gradient Methods.
SIAM Journal on Scientific and Statistical Computing, vol 2, 1981, pp. 1--4.

(For the basic implementation of the look-ahead Lanczos algorithm)
Roland W. Freund, Martin H. Gutknecht, and Noel M. Nachtigal.
An Implementation of the Look-Ahead Lanczos Algorithm for Non-Hermitian
Matrices, Part I.
Technical Report 90.45, RIACS, NASA Ames Research Center, November 1990.

(For the QMR algorithm, based on three-term recurrences)
Roland W. Freund and Noel M. Nachtigal.
An Implementation of the Look-Ahead Lanczos Algorithm for Non-Hermitian
Matrices, Part II.
Technical Report 90.46, RIACS, NASA Ames Research Center, November 1990.

(For linear systems -- QMR, based on coupled two-term recurrences)
Roland W. Freund and Noel M. Nachtigal.
Implementation Details of the Coupled QMR Algorithm.
In: Numerical Linear Algebra (L. Reichel, A. Ruttan, and R.S. Varga, eds.),
W. de Gruyter, Berlin, 1993, pp. 123--140.

(For a description of the SPARSKIT package)
Youcef Saad.
SPARSKIT: a Basic Tool Kit for Sparse Matrix Computations.
Technical Report 90.20, RIACS, NASA Ames Research Center, May 1990.

    -------------------
						Noel M. Nachtigal
						santa@msr.epm.ornl.gov

						Roland W. Freund
						freund@research.att.com

						08/15/93
						10/15/93 (revised, v1.1)
						12/15/93 (revised, v1.11)
						01/07/94 (revised, v1.2)
						06/10/94 (revised, v1.3)

