Footnotes

...
If we tried to compute the trivial eigenvalues in the same way as the nontrivial ones, that is by taking ratios of the leading diagonal entries of and , we would get 0/0. For a detailed mathematical discussion of this decomposition, see the discussion of the Kronecker Canonical Form in [43].

...
If , we may add some zero rows to to make it upper triangular.

...
This is the case on Cybers, Cray X-MP, Cray Y-MP, Cray 2 and Cray C90.

...
See subsection 2.1.3 for explanation of the naming convention used for LAPACK routines.

...
Important machines that do not implement the IEEE standard include the Cray X-MP, Cray Y-MP, Cray 2, Cray C90, IBM 370 and DEC Vax. Some architectures have two (or more) modes, one that implements IEEE arithmetic, and another that is less accurate but faster.

...
Machines implementing IEEE arithmetic can continue to compute past overflows, and even division by zero, square roots of negative numbers, etc., by producing infinity and NaN (``Not a Number'') symbols. These are special floating-point numbers subject to special rules of arithmetic. The default on many systems is to continue computing with these symbols rather than to stop with an error message, even though stopping would often be more convenient for debugging. It is also possible to stop with an error message. The user should consult the system manual to see how to turn error messages on or off.
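The behavior described above can be observed from Python, whose float type is an IEEE double on mainstream platforms. This is an illustration only and assumes CPython: its float multiplication follows the IEEE default of continuing past overflow, while its float division deliberately raises an exception on division by zero, so a single system can mix both policies.

```python
import math


def ieee_special_values():
    """Produce IEEE special values with CPython floats (illustration only)."""
    overflowed = 1e308 * 10.0                # overflow: result is +infinity, no error
    not_a_number = overflowed - overflowed   # infinity - infinity: result is NaN
    try:
        1.0 / 0.0                            # CPython traps this instead of returning inf
        division_raised = False
    except ZeroDivisionError:
        division_raised = True
    return overflowed, not_a_number, division_raised


if __name__ == "__main__":
    inf_val, nan_val, raised = ieee_special_values()
    print(inf_val, nan_val, raised)          # inf nan True
    print(nan_val == nan_val)                # False: NaN is unequal even to itself
```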

...
Sometimes our algorithms satisfy only where both and are small. This does not significantly change the following analysis.

...
More generally, we only need Lipschitz continuity of , and may use the Lipschitz constant in place of in deriving error bounds.

...
This is a different use of the term ill-posed than used in other contexts. For example, to be well-posed (not ill-posed) in the sense of Hadamard, it is sufficient for to be continuous, whereas we require Lipschitz continuity.

...
There are some caveats to this statement. When computing the inverse of a matrix, the backward error is small taking the columns of the computed inverse one at a time, with a different for each column [38]. The same is true when computing the eigenvectors of a nonsymmetric matrix. When computing the eigenvalues and eigenvectors of , or , with symmetric and symmetric and positive definite (using xSYGV or xHEGV) then the method may not be backward normwise stable if         has a large condition number , although it has useful error bounds in this case too (see section 4.10). Solving the Sylvester equation   for the matrix may not be backward stable, although there are again useful error bounds for [54].

...
For other algorithms, the answers (and computed error bounds) are as accurate as though the algorithms were componentwise relatively backward stable, even though they are not. These algorithms are called componentwise relatively forward stable.

...
As discussed in section 4.2, this approximate error bound may underestimate the true error by a factor which is a modestly growing function of the problem dimension . Often .

...
This and other numerical examples were computed in IEEE single precision arithmetic [4] on a DEC 5000/120 workstation.

...
These bounds are special cases of those in section 4.8.

...
Although such a one-to-one correspondence between computed and true eigenvalues exists, it is not as simple to describe as in the symmetric case. In the symmetric case the eigenvalues are real and simply sorting provides the one-to-one correspondence, so that $\lambda_1 \le \cdots \le \lambda_n$ and $\hat\lambda_1 \le \cdots \le \hat\lambda_n$. With nonsymmetric matrices $\hat\lambda_i$ is usually just the computed eigenvalue closest to $\lambda_i$, but in very ill-conditioned problems this is not always true. In the most general case, the one-to-one correspondence may be described in the following nonconstructive way: let $\lambda_i$ be the eigenvalues of $A$ and $\hat\lambda_i$ be the eigenvalues of $A + E$. Let $\lambda_i(t)$ be the eigenvalues of $A + tE$, where $t$ is a parameter, which is initially zero, so that we may set $\lambda_i(0) = \lambda_i$. As $t$ increases from 0 to 1, $\lambda_i(t)$ traces out a curve from $\lambda_i$ to $\hat\lambda_i$, providing the correspondence. Care must be taken when the curves intersect, and the correspondence may not be unique.

...
These bounds are special cases of those in sections 4.7 and 4.8, since the singular values and vectors of $A$ are simply related to the eigenvalues and eigenvectors of the Hermitian matrix $\begin{pmatrix} 0 & A \\ A^H & 0 \end{pmatrix}$ [GVL2, p. 427].

...
This bound is guaranteed only if the Level 3 BLAS are implemented in a conventional way, not in a fast way as described in section 4.13.

...
Another interpretation of chordal distance is as half the usual Euclidean distance between the projections of the two numbers being compared on the Riemann sphere, i.e., half the length of the chord connecting the projections.
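The footnote does not give a formula. The sketch below assumes the usual chordal metric for complex numbers, |a - b| / (sqrt(1 + |a|^2) sqrt(1 + |b|^2)), and takes the unit sphere as the Riemann sphere; it checks numerically that the metric equals half the chord length between the projected points. The helper names are hypothetical.

```python
import math


def chordal_distance(a: complex, b: complex) -> float:
    """Chordal distance |a - b| / (sqrt(1 + |a|^2) * sqrt(1 + |b|^2))."""
    return abs(a - b) / (math.sqrt(1 + abs(a) ** 2) * math.sqrt(1 + abs(b) ** 2))


def to_riemann_sphere(z: complex) -> tuple:
    """Inverse stereographic projection of z onto the unit sphere x^2 + y^2 + w^2 = 1."""
    z = complex(z)
    r2 = abs(z) ** 2
    return (2 * z.real / (1 + r2), 2 * z.imag / (1 + r2), (r2 - 1) / (1 + r2))


def half_chord(a: complex, b: complex) -> float:
    """Half the Euclidean length of the chord joining the projections of a and b."""
    return 0.5 * math.dist(to_riemann_sphere(a), to_riemann_sphere(b))


if __name__ == "__main__":
    for a, b in [(1 + 2j, -3 + 0.5j), (0j, 1j), (1e10, 2e10)]:
        print(a, b, chordal_distance(a, b), half_chord(a, b))
```

Because both points are projected onto a sphere of radius 1, the distance never exceeds 1, and it stays small for two large numbers with a small relative difference even when |a - b| itself is large.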

...
(Input or output) means that the argument may be either an input argument or an output argument, depending on the values of other arguments; for example, in the xyySVX driver routines, some arguments are used either as output arguments to return details of a factorization, or as input arguments to supply details of a previously computed factorization.

...
(Workspace/output) means that the argument is used principally as a work array, but may also return some useful information (in its first element).

...
Changing DBLE to DREAL must be selective, because instances of DBLE with an integer argument must not be changed. The compiler should flag any instances of DBLE with a COMPLEX*16 argument if it does not accept them.

...
The requirement is stated ``LDA >= max(1,N)'' rather than simply ``LDA >= N'' because LDA must always be at least 1, even if N = 0, to satisfy the requirements of standard Fortran; on some systems, a zero or negative value of LDA would cause a run-time fault.


Tue Nov 29 14:03:33 EST 1994


LAPACK Users' Guide - Release 2.0

  • E. Anderson,
  • Z. Bai,
  • C. Bischof,
  • J. Demmel,
  • J. Dongarra,
  • J. Du Croz,
  • A. Greenbaum,
  • S. Hammarling,
  • A. McKenney,
  • S. Ostrouchov,
  • D. Sorensen

    30 September 1994

    This work is dedicated to Jim Wilkinson whose ideas and spirit have given us inspiration and influenced the project at every turn.


    Copyright © 1994 by the Society for Industrial and Applied Mathematics. Certain derivative work portions have been copyrighted by the Numerical Algorithms Group Ltd.

    The printed version of LAPACK Users' Guide, Second Edition will be available from SIAM in February 1995. The list price is $28.50 and the SIAM Member Price is $22.80. Contact SIAM for additional information.

  • e-mail: service@siam.org
  • fax: 215-386-7999
  • phone: (USA) 800-447-SIAM
  • (outside USA) 215-386-7999
  • mail: SIAM, 3600 University City Science Center, Philadelphia, PA 19104-2688.

    The royalties from the sales of this book are being placed in a fund to help students attend SIAM meetings and other SIAM related activities. This fund is administered by SIAM and qualified individuals are encouraged to write directly to SIAM for guidelines.




    LAPACK Compared with LINPACK and EISPACK

    LAPACK has been designed to supersede LINPACK [26] and EISPACK [44] [70], principally by restructuring the software to achieve much greater efficiency, where possible, on modern high-performance computers; also by adding extra functionality, by using some new or improved algorithms, and by integrating the two sets of algorithms into a unified package.

    Appendix D lists the LAPACK counterparts of LINPACK and EISPACK routines. Not all the facilities of LINPACK and EISPACK are covered by Release 2.0 of LAPACK.





    Design and Documentation of Argument Lists

     

    The argument lists of all LAPACK routines conform to a single set of conventions for their design and documentation.

    Specifications of all LAPACK driver and computational routines are given in Part 2. These are derived from the specifications given in the leading comments in the code, but in Part 2 the specifications for real and complex versions of each routine have been merged, in order to save space.







    Structure of the Documentation

     

    The documentation of each LAPACK routine includes:





    Order of Arguments

     

    Arguments of an LAPACK routine appear in the following order:





    Argument Descriptions

     

    The style of the argument descriptions is illustrated by the following example:

    The description of each argument gives:





    Option Arguments

     

    Arguments specifying options are usually of type CHARACTER*1. The meaning of each valid value is given, as in this example:

    The corresponding lower-case characters may be supplied (with the same meaning), but any other value is illegal (see subsection 5.1.8).

    A longer character string can be passed as the actual argument, making the calling program more readable, but only the first character is significant; this is a standard feature of Fortran 77. For example:

           CALL SPOTRS('upper', . . . )
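    The rule that only the first character counts, and that case does not matter, can be sketched as follows. This is Python rather than Fortran, and option_flag is a hypothetical helper, not an LAPACK routine; an LAPACK routine would report an illegal value through INFO rather than raise an exception.

```python
def option_flag(arg: str, valid: str) -> str:
    """Return the significant (first) character of an option argument.

    Lower-case input is accepted with the same meaning as upper case; any
    character outside `valid` is rejected as an illegal value.
    """
    if not arg:
        raise ValueError("empty option argument")
    flag = arg[0].upper()          # only the first character is significant
    if flag not in valid.upper():
        raise ValueError(f"illegal option value {arg!r}")
    return flag


if __name__ == "__main__":
    print(option_flag("upper", "UL"))   # U, as in CALL SPOTRS('upper', ...)
    print(option_flag("l", "UL"))       # L
```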





    Problem Dimensions

     

    It is permissible for the problem dimensions to be passed as zero, in which case the computation (or part of it) is skipped. Negative dimensions are regarded as erroneous.





    Array Arguments

     

    Each two-dimensional array argument is immediately followed in the argument list by its leading dimension, whose name has the form LD<array-name>. For example:

    It should be assumed, unless stated otherwise, that vectors and matrices are stored in one- and two-dimensional arrays in the conventional manner. That is, if an array X of dimension (N) holds a vector $x$, then X(i) holds $x_i$ for i = 1,..., n. If a two-dimensional array A of dimension (LDA,N) holds an m-by-n matrix A, then A(i,j) holds $a_{ij}$ for i = 1,..., m and j = 1,..., n (LDA must be at least m). See Section 5.3 for more about storage of matrices.
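    In this conventional (column-major) layout, A(i,j) with 1-based indices sits (i - 1) + (j - 1) * LDA elements from the start of the array. The Python sketch below, with hypothetical helper names, converts to 0-based offsets and fills a flat buffer; rows m + 1 to LDA of each column are padding that holds no matrix data.

```python
def column_major_offset(i: int, j: int, lda: int) -> int:
    """0-based offset of A(i,j), with 1-based i and j, in a column-major array."""
    if not 1 <= i <= lda or j < 1:
        raise IndexError("need 1 <= i <= LDA and j >= 1")
    return (i - 1) + (j - 1) * lda


def store_column_major(rows, lda):
    """Copy an m-by-n matrix (list of rows) into a flat buffer of length LDA*n.

    Padding entries (rows m+1..LDA of each column) are left as None.
    """
    m = len(rows)
    n = len(rows[0]) if m else 0
    if lda < max(1, m):
        raise ValueError("LDA must be at least max(1, M)")
    buf = [None] * (lda * n)
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            buf[column_major_offset(i, j, lda)] = rows[i - 1][j - 1]
    return buf


if __name__ == "__main__":
    a = [[1, 2, 3],
         [4, 5, 6]]                       # 2-by-3 matrix
    print(store_column_major(a, lda=4))   # [1, 4, None, None, 2, 5, None, None, 3, 6, None, None]
```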

    Note that array arguments are usually declared in the software as assumed-size arrays (last dimension *), for example:

          REAL A( LDA, * )
    although the documentation gives the dimensions as (LDA,N). The latter form is more informative since it specifies the required minimum value of the last dimension. However, an assumed-size array declaration has been used in the software, in order to overcome some limitations in the Fortran 77 standard. In particular, it allows the routine to be called when the relevant dimension (N, in this case) is zero. However, actual array dimensions in the calling program must be at least 1 (LDA in this example).





    Work Arrays

     

    Many LAPACK routines require one or more work arrays to be passed as arguments. The name of a work array is usually WORK; sometimes IWORK, RWORK or BWORK is used to distinguish work arrays of integer, real or logical (Boolean) type.

    Occasionally the first element of a work array is used to return some useful information: in such cases, the argument is described as (workspace/output) instead of simply (workspace).

    A number of routines implementing block algorithms require workspace sufficient to hold one block of rows or columns of the matrix, for example, workspace of size n-by-nb, where nb is the block size. In such cases, the actual declared length of the work array must be passed as a separate argument LWORK, which immediately follows WORK in the argument list.

    See Section 5.2 for further explanation.





    Error Handling and the Diagnostic Argument INFO

     

    All documented routines have a diagnostic argument INFO that indicates the success or failure of the computation, as follows:

    All driver and auxiliary routines check that input arguments such as N or LDA or option arguments of type character have permitted values. If an illegal value of the i-th argument is detected, the routine sets INFO = -i, and then calls an error-handling routine XERBLA.    

    The standard version of XERBLA issues an error message and halts execution, so that no LAPACK routine would ever return to the calling program with INFO < 0. However, this might occur if a non-standard version of XERBLA is used.
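    The INFO = -i convention can be sketched for a routine whose argument list is (UPLO, N, A, LDA, INFO), the form used by the Cholesky factorization routine xPOTRF. The checks below are an illustrative Python sketch of the convention described above, not a transcription of LAPACK source, and check_args is a hypothetical name.

```python
def check_args(uplo: str, n: int, lda: int) -> int:
    """Return INFO for arguments (UPLO, N, A, LDA, INFO).

    INFO = -i means the i-th argument had an illegal value (the first one
    found is reported); INFO = 0 means every checked argument is legal.
    """
    if not uplo or uplo[0].upper() not in ("U", "L"):
        return -1        # argument 1, UPLO
    if n < 0:
        return -2        # argument 2, N; N = 0 is legal
    if lda < max(1, n):
        return -4        # argument 4, LDA; argument 3 is the array A itself
    return 0


if __name__ == "__main__":
    info = check_args("U", 3, 2)
    if info < 0:
        # A standard XERBLA would report the error and stop; here we only print.
        print(f"parameter number {-info} had an illegal value")
```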





    Determining the Block Size for Block Algorithms

     

    LAPACK routines that implement block algorithms need to determine what block size   to use. The intention behind the design of LAPACK is that the choice of block size should be hidden from users as much as possible, but at the same time easily accessible to installers of the package when tuning LAPACK for a particular machine.

    LAPACK routines call an auxiliary enquiry function ILAENV, which returns the optimal block size to be used, as well as other parameters. The version of ILAENV supplied with the package contains default values that led to good behavior over a reasonable number of our test machines, but to achieve optimal performance, it may be beneficial to tune ILAENV for your particular machine environment. Ideally, a distinct implementation of ILAENV is needed for each machine environment (see also Chapter 6). The optimal block size may also depend on the routine, the combination of option arguments (if any), and the problem dimensions.

    If ILAENV returns a block size of 1, then the routine performs the unblocked algorithm, calling Level 2 BLAS, and makes no calls to Level 3 BLAS.

    Some LAPACK routines require a work array whose size is proportional to the block size (see subsection 5.1.7). The actual length of the work array is supplied as an argument LWORK. The description of the arguments WORK and LWORK typically goes as follows:

    The routine determines the block size to be used by the following steps:

    1. the optimal block size is determined by calling ILAENV;

    2. if the value of LWORK indicates that enough workspace has been supplied, the routine uses the optimal block size;

    3. otherwise, the routine determines the largest block size that can be used with the supplied amount of workspace;

    4. if this new block size does not fall below a threshold value (also returned by ILAENV), the routine uses the new value;

    5. otherwise, the routine uses the unblocked algorithm.

    The minimum value of LWORK that would be needed to use the optimal block size is returned in WORK(1).

    Thus, the routine uses the largest block size allowed by the amount of workspace supplied, as long as this is likely to give better performance than the unblocked algorithm. The value returned in WORK(1) is not always given by a simple formula in terms of N and NB.

    The specification of LWORK gives the minimum value for the routine to return correct results. If the supplied value is less than this minimum (indicating that there is insufficient workspace to perform the unblocked algorithm), the value of LWORK is regarded as illegal and is treated like any other illegal argument value (see subsection 5.1.8).

    If in doubt about how much workspace to supply, users should supply a generous amount (assume a block size of 64, say), and then examine the value of WORK(1) on exit.
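    The five steps above can be sketched as follows. The workspace model is an assumption made for illustration: the hypothetical routine needs N*NB elements of WORK for block size NB (one n-by-nb block, as in the example of subsection 5.1.7) and N elements for the unblocked algorithm. The parameters nb_opt and nb_min stand for the values ILAENV would return; this is Python, not LAPACK code.

```python
def choose_block_size(n: int, lwork: int, nb_opt: int, nb_min: int):
    """Return (nb, work1): the block size to use (1 = unblocked) and the
    LWORK needed for the optimal block size, i.e. what WORK(1) reports.

    Assumed workspace model: N*NB for block size NB, N for the unblocked code.
    """
    min_lwork = max(1, n)
    if lwork < min_lwork:
        # Not enough even for the unblocked algorithm: an illegal argument value.
        raise ValueError("illegal value of LWORK")
    work1 = max(1, n * nb_opt)
    if nb_opt <= 1 or lwork >= work1:
        nb = max(1, nb_opt)                 # step 2: the optimal block size fits
    else:
        nb = lwork // max(1, n)             # step 3: largest NB the workspace allows
        if nb < nb_min:
            nb = 1                          # step 5: fall back to the unblocked code
        # step 4: otherwise keep the reduced NB
    return nb, work1


if __name__ == "__main__":
    print(choose_block_size(n=100, lwork=3200, nb_opt=64, nb_min=2))   # (32, 6400)
```

Supplying a generous LWORK and reading back the second value mirrors the advice above about examining WORK(1) on exit.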




    LAPACK and the BLAS

    LAPACK routines are written so that as much as possible of the computation is performed by calls to the Basic Linear Algebra Subprograms (BLAS) [28] [30] [58]. Highly efficient machine-specific implementations of the BLAS are available for many modern high-performance computers. The BLAS enable LAPACK routines to achieve high performance with portable code. The methodology for constructing LAPACK routines in terms of calls to the BLAS is described in Chapter 3.

    The BLAS are not strictly speaking part of LAPACK, but Fortran 77 code for the BLAS is distributed with LAPACK, or can be obtained separately from netlib (see below). This code constitutes the ``model implementation'' [27] [29].

    The model implementation is not expected to perform as well as a specially tuned implementation on most high-performance computers (on some machines it may give much worse performance), but it allows users to run LAPACK codes on machines that do not offer any other implementation of the BLAS.





    Matrix Storage Schemes

     

    LAPACK allows the following different storage schemes for matrices:

    These storage schemes are compatible with those used in LINPACK and the BLAS, but EISPACK uses incompatible schemes for band and tridiagonal matrices.

    In the examples below, * indicates an array element that need not be set and is not referenced by LAPACK routines. Elements that ``need not be set'' are never read, written to, or otherwise accessed by the LAPACK routines. The examples illustrate only the relevant part of the arrays; array arguments may of course have additional rows or columns, according to the usual rules for passing array arguments in Fortran 77.







    Conventional Storage

     

    The default scheme for storing matrices is the obvious one described in subsection 5.1.6: a matrix A is stored in a two-dimensional array A, with matrix element $a_{ij}$ stored in array element A(i,j).

    If a matrix is triangular (upper or lower, as specified by the argument UPLO), only the elements of the relevant triangle are accessed. The remaining elements of the array need not be set. Such elements are indicated by * in the examples below. For example, when n = 4:

    Similarly, if the matrix is upper Hessenberg, elements below the first subdiagonal need not be set.

    Routines that handle symmetric or Hermitian matrices allow for either the upper or lower triangle of the matrix (as specified by UPLO) to be stored in the corresponding elements of the array; the remaining elements of the array need not be set. For example, when n = 4:





    Packed Storage

     

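    Symmetric, Hermitian or triangular matrices may be stored more compactly, if the relevant triangle (again as specified by UPLO) is packed by columns in a one-dimensional array. In LAPACK, arrays that hold matrices in packed storage have names ending in `P'. So:

      • if UPLO = 'U', $a_{ij}$ is stored in AP(i + (j - 1)j/2) for i <= j;

      • if UPLO = 'L', $a_{ij}$ is stored in AP(i + (j - 1)(2n - j)/2) for j <= i.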

    For example:

    Note that for real or complex symmetric matrices, packing the upper triangle by columns is equivalent to packing the lower triangle by rows; packing the lower triangle by columns is equivalent to packing the upper triangle by rows. For complex Hermitian matrices, packing the upper triangle by columns is equivalent to packing the conjugate of the lower triangle by rows; packing the lower triangle by columns is equivalent to packing the conjugate of the upper triangle by rows.
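    A sketch of the packed-storage index mapping, assuming the usual LAPACK convention (restated so the example stands on its own): with 1-based indices, a_ij is stored in AP(i + (j - 1)j/2) for i <= j when UPLO = 'U', and in AP(i + (j - 1)(2n - j)/2) for j <= i when UPLO = 'L'. The function names are hypothetical.

```python
def packed_index(i: int, j: int, n: int, uplo: str) -> int:
    """1-based position in AP of element a_ij of an n-by-n packed triangle."""
    if uplo == "U":
        if not 1 <= i <= j <= n:
            raise IndexError("UPLO = 'U' requires 1 <= i <= j <= n")
        return i + (j - 1) * j // 2
    if uplo == "L":
        if not 1 <= j <= i <= n:
            raise IndexError("UPLO = 'L' requires 1 <= j <= i <= n")
        return i + (j - 1) * (2 * n - j) // 2
    raise ValueError("UPLO must be 'U' or 'L'")


def pack(rows, uplo: str):
    """Pack the UPLO triangle of an n-by-n matrix (list of rows) column by column."""
    n = len(rows)
    ap = [None] * (n * (n + 1) // 2)
    for j in range(1, n + 1):
        rows_in_column = range(1, j + 1) if uplo == "U" else range(j, n + 1)
        for i in rows_in_column:
            ap[packed_index(i, j, n, uplo) - 1] = rows[i - 1][j - 1]
    return ap


if __name__ == "__main__":
    a = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
    print(pack(a, "U"))   # [11, 12, 22, 13, 23, 33]
    print(pack(a, "L"))   # [11, 21, 31, 22, 32, 33]
```

For a symmetric matrix, pack(s, "U") gives the same sequence as reading the lower triangle row by row, which is the equivalence noted in the paragraph above.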





    Band Storage

     

    An m-by-n band matrix with kl subdiagonals and ku superdiagonals may be stored compactly in a two-dimensional array with kl + ku + 1 rows and n columns. Columns of the matrix are stored in corresponding columns of the array, and diagonals of the matrix are stored in rows of the array. This storage scheme should be used in practice only if kl, ku << min(m, n), although LAPACK routines work correctly for all values of kl and ku. In LAPACK, arrays that hold matrices in band storage have names ending in `B'.

    To be precise, $a_{ij}$ is stored in AB(ku + 1 + i - j, j) for max(1, j - ku) <= i <= min(m, j + kl). For example, when m = n = 5, kl = 2 and ku = 1:

    The elements marked * in the upper left and lower right corners of the array AB need not be set, and are not referenced by LAPACK routines.
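    The band mapping AB(ku + 1 + i - j, j) can be sketched in Python (hypothetical helper name; the 1-based formula is shifted to 0-based lists). Entries of AB that correspond to no matrix element stay None, playing the role of the * entries. Run on the m = n = 5, kl = 2, ku = 1 example from the text, with a_ij encoded as the integer 10i + j, it prints the band array row by row.

```python
def band_store(rows, kl: int, ku: int):
    """Return AB (kl + ku + 1 rows, n columns) for an m-by-n band matrix.

    a_ij goes to AB(ku + 1 + i - j, j) for max(1, j - ku) <= i <= min(m, j + kl);
    every other entry of AB is left as None ("need not be set").
    """
    m = len(rows)
    n = len(rows[0]) if m else 0
    ab = [[None] * n for _ in range(kl + ku + 1)]
    for j in range(1, n + 1):
        for i in range(max(1, j - ku), min(m, j + kl) + 1):
            ab[ku + i - j][j - 1] = rows[i - 1][j - 1]   # 0-based form of AB(ku+1+i-j, j)
    return ab


if __name__ == "__main__":
    kl, ku = 2, 1
    a = [[10 * i + j if -ku <= i - j <= kl else 0 for j in range(1, 6)]
         for i in range(1, 6)]
    for row in band_store(a, kl, ku):
        print(row)
    # [None, 12, 23, 34, 45]
    # [11, 22, 33, 44, 55]
    # [21, 32, 43, 54, None]
    # [31, 42, 53, None, None]
```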

    Note: when a band matrix is supplied for LU factorization, space must be allowed to store an additional kl superdiagonals, generated by fill-in as a result of row interchanges. This means that the matrix is stored according to the above scheme, but with kl + ku superdiagonals, so the array must have 2kl + ku + 1 rows.

    Triangular band matrices are stored in the same format, with either kl = 0 if upper triangular, or ku = 0 if lower triangular.

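    For symmetric or Hermitian band matrices with kd subdiagonals or superdiagonals, only the upper or lower triangle (as specified by UPLO) need be stored:

      • if UPLO = 'U', $a_{ij}$ is stored in AB(kd + 1 + i - j, j) for max(1, j - kd) <= i <= j;

      • if UPLO = 'L', $a_{ij}$ is stored in AB(1 + i - j, j) for j <= i <= min(n, j + kd).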

    For example, when n = 5 and kd = 2:

    EISPACK routines use a different storage scheme for band matrices, in which rows of the matrix are stored in corresponding rows of the array, and diagonals of the matrix are stored in columns of the array (see Appendix D).





    Tridiagonal and Bidiagonal Matrices

     

    An unsymmetric tridiagonal matrix of order n is stored in three one-dimensional arrays, one of length n containing the diagonal elements, and two of length n - 1 containing the subdiagonal and superdiagonal elements in elements 1 : n - 1.

    A symmetric tridiagonal or bidiagonal matrix is stored in two one-dimensional arrays, one of length n containing the diagonal elements, and one of length n containing the off-diagonal elements. (EISPACK routines store the off-diagonal elements in elements 2 : n of a vector of length n.)
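    As an illustration of the three-array layout for an unsymmetric tridiagonal matrix, the Python sketch below forms y = Ax directly from the arrays without building A. The names dl, d and du echo the DL, D and DU arguments of the xGTSV driver, but the function itself is hypothetical. A symmetric tridiagonal matrix with off-diagonal array e can be handled the same way by passing its first n - 1 entries as both dl and du.

```python
def tridiag_matvec(dl, d, du, x):
    """Return y = A x for tridiagonal A held as dl (n-1), d (n), du (n-1).

    With 0-based k, dl[k] is the entry below the diagonal in column k and
    du[k] is the entry above the diagonal in column k + 1.
    """
    n = len(d)
    if len(dl) != max(0, n - 1) or len(du) != max(0, n - 1) or len(x) != n:
        raise ValueError("expected len(dl) = len(du) = n - 1 and len(x) = n")
    y = [d[k] * x[k] for k in range(n)]
    for k in range(n - 1):
        y[k] += du[k] * x[k + 1]       # superdiagonal term of row k
        y[k + 1] += dl[k] * x[k]       # subdiagonal term of row k + 1
    return y


if __name__ == "__main__":
    print(tridiag_matvec([-1, -1], [2, 2, 2], [-1, -1], [1, 2, 3]))   # [0, 0, 4]
```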





    Unit Triangular Matrices

     

    Some LAPACK routines have an option to handle unit triangular matrices (that is, triangular matrices with diagonal elements = 1). This option is specified by an argument DIAG. If DIAG = 'U' (Unit triangular), the diagonal elements of the matrix need not be stored, and the corresponding array elements are not referenced by the LAPACK routines. The storage scheme for the rest of the matrix (whether conventional, packed or band) remains unchanged, as described in subsections 5.3.1, 5.3.2 and 5.3.3.
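    A minimal sketch of what DIAG = 'U' means in practice: a forward-substitution solve with a lower triangular matrix that, for DIAG = 'U', never reads the diagonal (and in either case never reads the upper triangle), so those array entries may hold anything. This is hypothetical Python, not an LAPACK routine.

```python
def lower_solve(a, b, diag: str = "N"):
    """Solve L x = b by forward substitution, reading only the lower triangle of a.

    diag = 'N': use the stored diagonal a[i][i].
    diag = 'U': assume a unit diagonal and never read a[i][i].
    """
    flag = diag[:1].upper()
    if flag not in ("N", "U"):
        raise ValueError("DIAG must be 'N' or 'U'")
    x = list(b)
    for i in range(len(b)):
        for j in range(i):                 # strictly lower part of row i
            x[i] -= a[i][j] * x[j]
        if flag == "N":
            x[i] /= a[i][i]
    return x


if __name__ == "__main__":
    unset = None                           # stands for elements that need not be set
    a = [[unset, unset, unset],
         [2, unset, unset],
         [1, 3, unset]]
    print(lower_solve(a, [1, 4, 10], diag="U"))   # [1, 2, 3]
```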





    Real Diagonal Elements of Complex Matrices

     

    Complex Hermitian matrices have diagonal elements that are by definition purely real. In addition, some complex triangular matrices computed by LAPACK routines are defined by the algorithm to have real diagonal elements, in Cholesky or QR factorization, for example.

    If such matrices are supplied as input to LAPACK routines, the imaginary parts of the diagonal elements are not referenced, but are assumed to be zero. If such matrices are returned as output by LAPACK routines, the computed imaginary parts are explicitly set to zero.





    Representation of Orthogonal or Unitary Matrices

     

    A real orthogonal or complex unitary matrix (usually denoted Q) is often represented in LAPACK as a product of elementary reflectors, also referred to as elementary Householder matrices (usually denoted $H_i$). For example,

    $$Q = H_1 H_2 \cdots H_k.$$

    Most users need not be aware of the details, because LAPACK routines are provided to work with this representation:

    The following further details may occasionally be useful.

    An elementary reflector (or elementary Householder matrix) H of order n is a unitary matrix of the form

    $$H = I - \tau v v^H \qquad (5.1)$$

    where $\tau$ is a scalar, and v is an n-vector, with $|\tau|^2 \|v\|_2^2 = 2\,\mathrm{Re}(\tau)$; v is often referred to as the Householder vector. Often v has several leading or trailing zero elements, but for the purpose of this discussion assume that H has no such special structure.

    There is some redundancy in the representation (5.1), which can be removed in various ways. The representation used in LAPACK (which differs from those used in LINPACK or EISPACK) sets $v_1 = 1$; hence $v_1$ need not be stored. In real arithmetic, $1 \le \tau \le 2$, except that $\tau = 0$ implies H = I.

    In complex arithmetic, $\tau$ may be complex, and satisfies $1 \le \mathrm{Re}(\tau) \le 2$ and $|\tau - 1| \le 1$. Thus a complex H is not Hermitian (as it is in other representations), but it is unitary, which is the important property. The advantage of allowing $\tau$ to be complex is that, given an arbitrary complex vector x, H can be computed so that

    $$H^H x = \beta (1, 0, \ldots, 0)^T$$

    with real $\beta$. This is useful, for example, when reducing a complex Hermitian matrix to real symmetric tridiagonal form, or a complex rectangular matrix to real bidiagonal form.
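    For the real case, a reflector in this normalized form can be generated and applied as sketched below (Python, hypothetical helper names). The sketch follows the representation described above (v_1 = 1, H = I - tau v v^T, H x = beta e_1 with 1 <= tau <= 2, and tau = 0 when no reflection is needed), but it omits the scaling safeguards against overflow and underflow that a production routine would include, and it does not cover the complex case.

```python
import math


def householder_real(x):
    """Return (tau, v, beta) with v[0] = 1 such that (I - tau v v^T) x = beta e_1.

    Real case only, with no overflow/underflow scaling.
    """
    alpha, rest = x[0], list(x[1:])
    xnorm = math.sqrt(sum(t * t for t in rest))
    if xnorm == 0.0:
        # x is already a multiple of e_1: take tau = 0, i.e. H = I.
        return 0.0, [1.0] + [0.0] * len(rest), alpha
    beta = -math.copysign(math.hypot(alpha, xnorm), alpha)   # sign opposite to alpha
    tau = (beta - alpha) / beta                              # lies in [1, 2]
    scale = 1.0 / (alpha - beta)
    v = [1.0] + [t * scale for t in rest]                    # v_1 = 1 need not be stored
    return tau, v, beta


def apply_reflector(tau, v, y):
    """Return H y = y - tau * v * (v^T y) without forming H explicitly."""
    s = tau * sum(vi * yi for vi, yi in zip(v, y))
    return [yi - s * vi for vi, yi in zip(v, y)]


if __name__ == "__main__":
    tau, v, beta = householder_real([3.0, 4.0])
    print(tau, v, beta)                           # 1.6 [1.0, 0.5] -5.0
    print(apply_reflector(tau, v, [3.0, 4.0]))    # [-5.0, 0.0]
```

Applying H through v and tau, rather than forming the n-by-n matrix, is the reason the compact representation is useful: it costs O(n) storage and O(n) work per vector.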

    For further details, see Lehoucq [59].




    Installing LAPACK Routines

     







    Points to Note

     

    For anyone who obtains the complete LAPACK package from netlib or NAG (see Chapter 1), a comprehensive installation guide is provided. We recommend installation of the complete package as the most convenient and reliable way to make LAPACK available.

    People who obtain copies of a few LAPACK routines from netlib need to be aware of the following points:

    1. Double precision complex routines (names beginning Z-) use a COMPLEX*16 data type. This is an extension to the Fortran 77 standard, but is provided by many Fortran compilers on machines where double precision computation is usual. The following related extensions are also used:

      • the intrinsic function DCONJG, with argument and result of type COMPLEX*16;

      • the intrinsic functions DBLE and DIMAG, with COMPLEX*16 argument and DOUBLE PRECISION result, returning the real and imaginary parts, respectively;

      • the intrinsic function DCMPLX, with DOUBLE PRECISION argument(s) and COMPLEX*16 result;

      • COMPLEX*16 constants, formed from a pair of double precision constants in parentheses.

      Some compilers provide DOUBLE COMPLEX as an alternative to COMPLEX*16, and an intrinsic function DREAL instead of DBLE to return the real part of a COMPLEX*16 argument. If the compiler does not accept the constructs used in LAPACK, the installer will have to modify the code: for example, globally change COMPLEX*16 to DOUBLE COMPLEX, or selectively change DBLE to DREAL.

    2. For optimal performance, a small set of tuning parameters must be set for each machine, or even for each configuration of a given machine (for example, different parameters may be optimal for different numbers of processors). These values, such as the block size, minimum block size, crossover point below which an unblocked routine should be used, and others, are set by calls to an inquiry function ILAENV. The default version of ILAENV provided with LAPACK uses generic values which often give satisfactory performance, but users who are particularly interested in performance may wish to modify this subprogram or substitute their own version. Further details on setting ILAENV for a particular environment are provided in section 6.2.

    3. SLAMCH/DLAMCH determines properties of the floating-point arithmetic at run-time, such as the machine epsilon, underflow threshold, overflow threshold, and related parameters. It works satisfactorily on all commercially important machines of which we are aware, but will necessarily be updated from time to time as new machines and compilers are produced.




    Documentation for LAPACK

     

    This Users' Guide gives an informal introduction to the design of the package, and a detailed description of its contents. Chapter 5 explains the conventions used in the software and documentation. Part 2 contains complete specifications of all the driver routines and computational routines. These specifications have been derived from the leading comments in the source text.

    On-line manpages (troff files) for LAPACK routines, as well as for most of the BLAS routines, are available on netlib. These files are automatically generated at the time of each release. For more information, see the manpages.tar.z entry on the lapack index on netlib.





    Installing ILAENV

     

    Machine-dependent parameters such as the block size are set by calls to an inquiry function, which may return different values on each machine. The declaration of the environment inquiry function is

    INTEGER FUNCTION ILAENV( ISPEC, NAME, OPTS, N1, N2, N3, N4 )
    where ISPEC, N1, N2, N3, and N4 are integer variables and NAME and OPTS are CHARACTER*(*). NAME specifies the subroutine name; OPTS is a character string of options to the subroutine; and N1-N4 are the problem dimensions. ISPEC specifies the parameter to be returned; the following values are currently used in LAPACK:

    ISPEC = 1:  NB, optimal block size
          = 2:  NBMIN, minimum block size for the block routine
                to be used
          = 3:  NX, crossover point (in a block routine, for
                N < NX, an unblocked routine should be used)
          = 4:  NS, number of shifts
          = 6:  NXSVD is the threshold point for which the QR
                factorization is performed prior to reduction to
                bidiagonal form.  If M > NXSVD * N, then a
                QR factorization is performed.
          = 8:  MAXB, crossover point for block multishift QR
    

    The three block size parameters, NB, NBMIN, and NX, are used in many different subroutines (see Table 6.1). NS and MAXB are used in the block multishift QR algorithm, xHSEQR. NXSVD is used in the driver routines xGELSS and xGESVD.

       
    Table 6.1: Use of the block parameters NB, NBMIN, and NX in LAPACK

    The LAPACK testing and timing programs use a special version of ILAENV where the parameters are set via a COMMON block interface. This is convenient for experimenting with different values of, say, the block size in order to exercise different parts of the code and to compare the relative performance of different parameter values.
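    As an illustration of the kind of site-specific substitute for ILAENV mentioned earlier, here is a minimal sketch of an inquiry function with the same argument list. The names TUNING and TUNENV are hypothetical. The sketch answers only ISPEC = 1, 2 and 3: it returns the CRAY-2 values quoted for SGEQRF later in this section (NB = 32, NX = 128), NBMIN = 2 as in the prototype ILAENV, placeholder values (NB = 1, NX = 0) for every other routine, and -1 for any other ISPEC. It assumes NAME is passed in upper case, and it is not a drop-in replacement for the ILAENV supplied with LAPACK.

```fortran
      MODULE TUNING
      IMPLICIT NONE
      CONTAINS
      INTEGER FUNCTION TUNENV( ISPEC, NAME, OPTS, N1, N2, N3, N4 )
      ! Hypothetical site-tuned inquiry function (sketch only).
      INTEGER, INTENT(IN) :: ISPEC, N1, N2, N3, N4
      CHARACTER(LEN=*), INTENT(IN) :: NAME, OPTS
      INTEGER NB, NX
      ! Placeholder defaults for routines not tuned here.
      NB = 1
      NX = 0
      ! CRAY-2 values quoted in the text for SGEQRF.
      IF ( NAME .EQ. 'SGEQRF' ) THEN
         NB = 32
         NX = 128
      END IF
      SELECT CASE ( ISPEC )
      CASE ( 1 )
         TUNENV = NB
      CASE ( 2 )
         ! NBMIN = 2, as in the prototype ILAENV.
         TUNENV = 2
      CASE ( 3 )
         TUNENV = NX
      CASE DEFAULT
         ! Other ISPEC values are not handled by this sketch.
         TUNENV = -1
      END SELECT
      END FUNCTION TUNENV
      END MODULE TUNING
```

    Keeping all site-specific numbers in one function mirrors the design of LAPACK itself: the computational routines never contain tuning constants, so retuning for a new machine touches a single subprogram.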

    The LAPACK timing programs were designed to collect data for all the routines in Table 6.1. The range of problem sizes needed to determine the optimal block size or crossover point is machine-dependent, but the input files provided with the LAPACK test and timing package can be used as a starting point. For subroutines that require a crossover point, it is best to start by finding the best block size with the crossover point set to 0, and then to locate the point at which the performance of the unblocked algorithm is beaten by the block algorithm. The best crossover point will be somewhat smaller than the point where the curves for the unblocked and blocked methods cross.
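    The search for the crossing point can be sketched as follows, assuming the timing runs have produced, for a common list of increasing problem sizes, one performance rate for the unblocked code and one for the blocked code. FIRSTX is a hypothetical helper, not part of LAPACK or its timing package; it returns the first size at which the blocked rate is higher, or -1 if there is none. As noted above, the best NX will usually be somewhat smaller than the value it returns.

```fortran
      MODULE XOVER
      IMPLICIT NONE
      CONTAINS
      INTEGER FUNCTION FIRSTX( NS, SIZES, RUNB, RBLK )
      ! First size at which the blocked rate RBLK exceeds the
      ! unblocked rate RUNB; -1 if it never does.  SIZES must be
      ! increasing; rates may be in any common unit (e.g. Mflops).
      INTEGER, INTENT(IN) :: NS
      INTEGER, INTENT(IN) :: SIZES( NS )
      REAL, INTENT(IN) :: RUNB( NS ), RBLK( NS )
      INTEGER I
      FIRSTX = -1
      DO I = 1, NS
         IF ( RBLK( I ) .GT. RUNB( I ) ) THEN
            FIRSTX = SIZES( I )
            RETURN
         END IF
      END DO
      END FUNCTION FIRSTX
      END MODULE XOVER
```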

    For example, for SGEQRF on a single processor of a CRAY-2, NB = 32 was observed to be a good block size, and the performance of the block algorithm with this block size surpasses the unblocked algorithm for square matrices between N = 176 and N = 192. Experiments with crossover points from 64 to 192 found that NX = 128 was a good choice, although the results for NX from 3*NB to 5*NB are broadly similar. This means that matrices with N <= 128 should use the unblocked algorithm, and for N > 128 block updates should be used until the remaining submatrix has order less than 128. The performance of the unblocked (NB = 1) and blocked (NB = 32) algorithms for SGEQRF and of the blocked algorithm with a crossover point of 128 is compared in Figure 6.1.
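    The effect of NB and NX on a single factorization can be sketched by counting panels. The hypothetical subroutine BLKSPL assumes that the blocked code takes panels of width NB while more than NX columns remain, and hands the remaining columns to the unblocked code. The exact boundary convention, and the simple test NB > 1 standing in for the NBMIN check, are assumptions of the sketch. With N = 512, NB = 32 and NX = 128 it reports 12 blocked steps and a trailing block of order 128; with N = 128 everything is done by the unblocked code.

```fortran
      MODULE BLKMOD
      IMPLICIT NONE
      CONTAINS
      SUBROUTINE BLKSPL( N, NB, NX, NSTEPS, NREM )
      INTEGER, INTENT(IN) :: N, NB, NX
      INTEGER, INTENT(OUT) :: NSTEPS, NREM
      INTEGER I
      ! I is the first column of the next panel to be factorized.
      NSTEPS = 0
      I = 1
      ! Block only if blocking is enabled and N exceeds NX.
      IF ( NB .GT. 1 .AND. NB .LT. N .AND. NX .LT. N ) THEN
         ! Take panels of width NB while more than NX columns remain.
         DO WHILE ( I .LE. N - NX )
            NSTEPS = NSTEPS + 1
            I = I + NB
         END DO
      END IF
      ! Columns I..N are left to the unblocked code.
      NREM = N - I + 1
      END SUBROUTINE BLKSPL
      END MODULE BLKMOD
```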

       
    Figure 6.1: QR factorization on CRAY-2 (1 processor)

    By experimenting with small values of the block size, it should be straightforward to choose NBMIN, the smallest block size that gives a performance improvement over the unblocked algorithm. Note that on some machines, the optimal block size may be 1 (the unblocked algorithm gives the best performance); in this case, the choice of NBMIN is arbitrary. The prototype version of ILAENV sets NBMIN to 2, so that blocking is always done, even though this could lead to poor performance from a block routine if insufficient workspace is supplied (see chapter 7).

    Complicating the determination of optimal parameters is the fact that the orthogonal factorization routines and SGEBRD accept non-square matrices as input. The LAPACK timing program allows M and N to be varied independently. We have found the optimal block size to be generally insensitive to the shape of the matrix, but the crossover point is more dependent on the matrix shape. For example, if M >> N in the QR factorization, block updates may always be faster than unblocked updates on the remaining submatrix, so one might set NX = NB if M >= 2N.
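    The shape-dependent rule of thumb above can be written as a small function. PICKNX is a hypothetical name; NXSQ stands for whatever crossover point was found for square matrices, and the factor 2 is the threshold suggested in the text rather than a measured value.

```fortran
      MODULE NXSHAP
      IMPLICIT NONE
      CONTAINS
      INTEGER FUNCTION PICKNX( M, N, NB, NXSQ )
      ! Crossover point for an M-by-N QR factorization.  NXSQ is the
      ! value found for square matrices.
      INTEGER, INTENT(IN) :: M, N, NB, NXSQ
      IF ( M .GE. 2*N ) THEN
         ! Tall matrix: keep blocking until about one block is left.
         PICKNX = NB
      ELSE
         PICKNX = NXSQ
      END IF
      END FUNCTION PICKNX
      END MODULE NXSHAP
```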

    Parameter values for the number of shifts, etc., used to tune the block multishift QR algorithm can be varied through the input files to the eigenvalue timing program. The performance of xHSEQR is particularly sensitive to the correct choice of block parameters. Setting NS = 2 will give essentially the same performance as EISPACK. Interested users should consult [3] for a description of the timing program input files.




    Troubleshooting

       







    Common Errors in Calling LAPACK Routines

     

    For the benefit of less experienced programmers, we give here a list of common programming errors in calling an LAPACK routine. These errors may cause the LAPACK routine to report a failure, as described in Section 7.2; they may cause an error to be reported by the system; or they may lead to wrong results (see also Section 7.3).

    Some modern compilation systems, as well as software tools such as the portability checker in Toolpack [66], can check that arguments agree in number and type; and many compilation systems offer run-time detection of errors such as an array element out-of-bounds or use of an unassigned variable.





    Failures Detected by LAPACK Routines

       

    There are two ways in which an LAPACK routine may report a failure to complete a computation successfully.







    Invalid Arguments and XERBLA

        If an illegal value is supplied for one of the input arguments to an LAPACK routine, it will call the error handler XERBLA to write a message to the standard output unit of the form:

     ** On entry to SGESV  parameter number  4 had an illegal value
    This particular message would be caused by passing to SGESV a value of LDA which was less than the value of the argument N. The documentation for SGESV in Part 2 states the set of acceptable input values: ``LDA >= max(1,N).'' This is required in order that the array A with leading dimension LDA can store an n-by-n matrix. The arguments are checked in order, beginning with the first. In the above example, it may, from the user's point of view, be the value of N which is in fact wrong. Invalid arguments are often caused by the kind of error listed in Section 7.1.
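    The order of the checks can be made concrete. The hypothetical function CHKGSV reproduces, for the integer arguments only, the sequence of tests that the reference SGESV applies to N, NRHS, LDA and LDB. It returns -k for the first illegal argument k, which is the value SGESV passes (negated) to XERBLA and leaves in INFO if XERBLA returns. With N = 10 and LDA = 5 it returns -4, matching the message above.

```fortran
      MODULE ARGCHK
      IMPLICIT NONE
      CONTAINS
      INTEGER FUNCTION CHKGSV( N, NRHS, LDA, LDB )
      ! Same order of tests as the reference SGESV applies to its
      ! integer arguments.  Returns 0, or -k if argument k of
      ! SGESV( N, NRHS, A, LDA, IPIV, B, LDB, INFO ) is illegal.
      INTEGER, INTENT(IN) :: N, NRHS, LDA, LDB
      CHKGSV = 0
      IF ( N .LT. 0 ) THEN
         CHKGSV = -1
      ELSE IF ( NRHS .LT. 0 ) THEN
         CHKGSV = -2
      ELSE IF ( LDA .LT. MAX( 1, N ) ) THEN
         CHKGSV = -4
      ELSE IF ( LDB .LT. MAX( 1, N ) ) THEN
         CHKGSV = -7
      END IF
      END FUNCTION CHKGSV
      END MODULE ARGCHK
```

    Because the first failing test wins, a negative N is reported as argument 1 even if LDA would also have been rejected.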

    In the model implementation of XERBLA which is supplied with LAPACK, execution stops after the message; but the call to XERBLA is followed by a RETURN statement in the LAPACK routine, so that if the installer removes the STOP statement in XERBLA, the result will be an immediate exit from the LAPACK routine with a negative value of INFO. It is good practice always to check for a non-zero value of INFO on return from an LAPACK routine. (We recommend, however, that XERBLA should not be modified to return control to the calling routine, unless absolutely necessary, since this would remove one of the built-in safety features of LAPACK.)
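    A minimal sketch of the recommended check, assuming the caller wants a diagnostic message rather than an immediate stop. INFOOK is a hypothetical helper that only distinguishes the cases described in this chapter: INFO < 0 can only be observed if XERBLA has been modified to return, and the meaning of a positive value must still be looked up in Part 2 for the routine concerned.

```fortran
      MODULE INFCHK
      IMPLICIT NONE
      CONTAINS
      LOGICAL FUNCTION INFOOK( SRNAME, INFO )
      ! .TRUE. only if INFO = 0; otherwise print a short diagnostic.
      CHARACTER(LEN=*), INTENT(IN) :: SRNAME
      INTEGER, INTENT(IN) :: INFO
      INFOOK = ( INFO .EQ. 0 )
      IF ( INFO .LT. 0 ) THEN
         ! Seen only if XERBLA has been modified to return.
         WRITE( *, * ) SRNAME, ': argument', -INFO, 'was illegal'
      ELSE IF ( INFO .GT. 0 ) THEN
         ! Computational failure; see Part 2 for its meaning.
         WRITE( *, * ) SRNAME, ': failure, INFO =', INFO
      END IF
      END FUNCTION INFOOK
      END MODULE INFCHK
```

    For example, after CALL SGESV( N, NRHS, A, LDA, IPIV, B, LDB, INFO ), a program might continue with IF ( .NOT. INFOOK( 'SGESV', INFO ) ) STOP.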





    Computational Failures and INFO > 0

      A positive value of INFO on return from an LAPACK routine indicates a failure in the course of the algorithm. Common causes are:

    For example, if SGESVX is called to solve a system of equations with a coefficient matrix that is approximately singular, it may detect exact singularity at the i-th stage of the LU factorization, in which case it returns INFO = i; or (more probably) it may compute an estimate of the reciprocal condition number that is less than machine precision, in which case it returns INFO = n + 1. Again, the documentation in Part 2 should be consulted for a description of the error.

    When a failure with INFO > 0 occurs, control is always returned to the calling program; XERBLA is not called, and no error message is written. It is worth repeating that it is good practice always to check for a non-zero value of INFO on return from an LAPACK routine.
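    For SGESVX, the two positive cases described above can be told apart from INFO and the order n alone. The sketch below (SVXCLS is a hypothetical name) encodes only those two cases and treats any other value as unexpected; it is not a substitute for the description of INFO in Part 2.

```fortran
      MODULE SVXINF
      IMPLICIT NONE
      CONTAINS
      INTEGER FUNCTION SVXCLS( N, INFO )
      ! Returns 0 if INFO = 0; 1 if 1 <= INFO <= N (exact
      ! singularity detected at stage INFO of the LU factorization);
      ! 2 if INFO = N+1 (reciprocal condition number estimate below
      ! machine precision); 3 for any other value.
      INTEGER, INTENT(IN) :: N, INFO
      IF ( INFO .EQ. 0 ) THEN
         SVXCLS = 0
      ELSE IF ( INFO .GE. 1 .AND. INFO .LE. N ) THEN
         SVXCLS = 1
      ELSE IF ( INFO .EQ. N + 1 ) THEN
         SVXCLS = 2
      ELSE
         SVXCLS = 3
      END IF
      END FUNCTION SVXCLS
      END MODULE SVXINF
```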

    A failure with INFO > 0 may indicate any of the following:




    Wrong Results

       

    Wrong results from LAPACK routines are most often caused by incorrect usage.

    It is also possible that wrong results are caused by a bug outside of LAPACK, in the compiler or in one of the library routines, such as the BLAS, that are linked with LAPACK. Test procedures are available for both LAPACK and the BLAS, and the LAPACK installation guide [3] should be consulted for descriptions of the tests and for advice on resolving problems.

    A list of known problems, compiler errors, and bugs in LAPACK routines is maintained on netlib; see Chapter 1.

    Users who suspect they have found a new bug in an LAPACK routine are encouraged to report it promptly to the developers as directed in Chapter 1. The bug report should include a test case, a description of the problem and expected results, and the actions, if any, that the user has already taken to fix the bug.





    Poor Performance

    We have tried to make the performance of LAPACK ``transportable'' by performing most of the computation within the Level 1, 2, and 3 BLAS, and by isolating all of the machine-dependent tuning parameters in a single integer function ILAENV. To avoid poor performance from LAPACK routines, note the following recommendations:

    BLAS:
    One should use BLAS that have been optimized for the machine being used if they are available. Many manufacturers and research institutions have developed, or are developing, efficient versions of the BLAS for particular machines. A portable set of Fortran BLAS is supplied with LAPACK and can always be used if no other BLAS are available or if there is a suspected problem in the local BLAS library, but no attempt has been made to structure the Fortran BLAS for high performance.

    ILAENV:
    For best performance, the LAPACK routine ILAENV should be set with optimal tuning parameters for the machine being used. The version of ILAENV provided with LAPACK supplies default values for these parameters that give good, but not optimal, average case performance on a range of existing machines. The performance of xHSEQR is particularly sensitive to the correct choice of block parameters; the same applies to the driver routines which call xHSEQR, namely xGEES, xGEESX, xGEEV and xGEEVX. Further details on setting parameters in ILAENV are found in section 6.

    LWORK >= WORK(1):
    The performance of some routines depends on the amount of workspace supplied. In such cases, an argument, usually called WORK, is provided, accompanied by an integer argument LWORK specifying its length as a linear array. On exit, WORK(1) returns the amount of workspace required to use the optimal tuning parameters. If LWORK < WORK(1), then insufficient workspace was provided to use the optimal parameters, and the performance may be less than possible. One should check that LWORK >= WORK(1) on return from an LAPACK routine requiring user-supplied workspace to see if enough workspace has been provided. Note that the computation is performed correctly, even if the amount of workspace is less than optimal, unless LWORK is reported as an invalid value by a call to XERBLA as described in Section 7.2.

    xLAMCH:
    Users should beware of the high cost of the first call to the LAPACK auxiliary routine xLAMCH, which computes machine characteristics such as epsilon and the smallest invertible number. The first call dynamically determines a set of parameters defining the machine's arithmetic, but these values are saved and subsequent calls incur only a trivial cost. For performance testing, the initial cost can be hidden by including a call to xLAMCH in the main program, before any calls to LAPACK routines that will be timed. A sample use of SLAMCH is
          XXXXXX = SLAMCH( 'P' )
    or in double precision:
          XXXXXX = DLAMCH( 'P' )
    A cleaner but less portable solution is for the installer to save the values computed by xLAMCH for a specific machine and create a new version of xLAMCH with these constants set in DATA statements, taking care that no accuracy is lost in the translation.
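    Only the first call to xLAMCH is expensive because the computed values are kept in SAVEd variables. The sketch below shows that pattern for a single quantity; it is not SLAMCH, which determines many more parameters with a more careful algorithm. The hypothetical function MACHEP finds, on its first call only, the spacing of single precision numbers near 1 by repeated halving (the value also returned by the Fortran 90 intrinsic EPSILON), and returns the saved value thereafter. It assumes each sum is rounded to single precision, as on typical IEEE machines without extended-precision registers; the module counter NCOMP exists only to show that the computation runs once.

```fortran
      MODULE EPSCAC
      IMPLICIT NONE
      ! Number of times the expensive computation has run.
      INTEGER :: NCOMP = 0
      CONTAINS
      REAL FUNCTION MACHEP()
      ! Spacing of REAL numbers near 1, computed on the first call
      ! only and SAVEd for later calls.
      LOGICAL, SAVE :: FIRST = .TRUE.
      REAL, SAVE :: EPS = 1.0
      REAL T
      IF ( FIRST ) THEN
         ! Halve EPS until 1 + EPS/2 rounds to 1.  Storing the sum
         ! in T assumes it is rounded to REAL (no extended registers).
         EPS = 1.0
         T = 1.0 + EPS / 2.0
         DO WHILE ( T .GT. 1.0 )
            EPS = EPS / 2.0
            T = 1.0 + EPS / 2.0
         END DO
         NCOMP = NCOMP + 1
         FIRST = .FALSE.
      END IF
      MACHEP = EPS
      END FUNCTION MACHEP
      END MODULE EPSCAC
```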




    Index of Driver and Computational Routines

     







    Notes

    1. This index lists related pairs of real and complex routines together, for example, SBDSQR and CBDSQR.

    2. Driver routines are listed in bold type, for example SGBSV and CGBSV.

    3. Routines are listed in alphanumeric order of the real (single precision) routine name (which always begins with S-). (See subsection 2.1.3 for details of the LAPACK naming scheme.)

    4. Double precision routines are not listed here; they have names beginning with D- instead of S-, or Z- instead of C-.

    5. This index gives only a brief description of the purpose of each routine. For a precise description, consult the specifications in Part 2, where the routines appear in the same order as here.

    6. The text of the descriptions applies to both real and complex routines, except where alternative words or phrases are indicated, for example ``symmetric/Hermitian'', ``orthogonal/unitary'' or ``quasi-triangular/triangular''. For the real routines A^H is equivalent to A^T. (The same convention is used in Part 2.)

    7. In a few cases, three routines are listed together, one for real symmetric, one for complex symmetric, and one for complex Hermitian matrices (for example SSPCON, CSPCON and CHPCON).

    8. A few routines for real matrices have no complex equivalent (for example SSTEBZ).





    Availability of LAPACK

    The complete LAPACK package or individual routines from LAPACK are most easily obtained through netlib [32]. At the time of this writing, the e-mail addresses for netlib are

    netlib@ornl.gov
    netlib@research.att.com
    Both repositories provide electronic mail and anonymous ftp service (the netlib@ornl.gov site is available via anonymous ftp to netlib2.cs.utk.edu), and the netlib@ornl.gov site additionally provides xnetlib. Xnetlib uses an X Windows graphical user interface and a socket-based connection between the user's machine and the xnetlib server machine to process software requests. For more information on xnetlib, echo ``send index from xnetlib'' | mail netlib@ornl.gov.

    General information about LAPACK can be obtained by sending mail to one of the above addresses with the message

    send index from lapack

    The package is also available on the World Wide Web. It can be accessed through the URL address:

    http://www.netlib.org/lapack/index.html

    The complete package, including test code and timing programs in four different Fortran data types, constitutes some 735,000 lines of Fortran source and comments.

    Alternatively, if a user does not have internet access, the complete package can be obtained on magnetic media from NAG for a cost-covering handling charge.

    For further details contact NAG at one of the following addresses:

    NAG Inc.                       NAG Ltd.
    1400 Opus Place, Suite 200     Wilkinson House
    Downers Grove, IL  60515-5702  Jordan Hill Road
    USA                            Oxford OX2 8DR
    Tel: +1 708 971 2337           England
    Fax: +1 708 971 2706           Tel: +44 865 511245
                                   Fax: +44 865 310139
    NAG GmbH
    Schleissheimerstrasse 5
    W-8046 Garching bei Munchen
    Germany
    Tel: +49 89 3207395
    Fax: +49 89 3207396
    




    Index of Auxiliary Routines

     







    Notes

    1. This index lists related pairs of real and complex routines together, in the same style as in Appendix A.

    2. Routines are listed in alphanumeric order of the real (single precision) routine name (which always begins with S-). (See subsection 2.1.3 for details of the LAPACK naming scheme.)

    3. A few complex routines have no real equivalents, and they are listed first; routines listed in italics (for example, CROT), have real equivalents in the Level 1 or Level 2 BLAS.

    4. Double precision routines are not listed here; they have names beginning with D- instead of S-, or Z- instead of C-. The only exceptions to this simple rule are that the double precision versions of ICMAX1, SCSUM1 and CSRSCL are named IZMAX1, DZSUM1 and ZDRSCL.

    5. A few routines in the list have names that are independent of data type: ILAENV, LSAME, LSAMEN and XERBLA.

    6. This index gives only a brief description of the purpose of each routine. For a precise description consult the leading comments in the code, which have been written in the same style as for the driver and computational routines.





    Quick Reference Guide to the BLAS

     

    Level 1 BLAS

                       dim scalar vector   vector   scalars              5-element prefixes
                                                                         array
    SUBROUTINE _ROTG (                                      A, B, C, S )          S, D
    SUBROUTINE _ROTMG(                              D1, D2, A, B,        PARAM )  S, D
    SUBROUTINE _ROT  ( N,         X, INCX, Y, INCY,               C, S )          S, D
    SUBROUTINE _ROTM ( N,         X, INCX, Y, INCY,                      PARAM )  S, D
    SUBROUTINE _SWAP ( N,         X, INCX, Y, INCY )                              S, D, C, Z
    SUBROUTINE _SCAL ( N,  ALPHA, X, INCX )                                       S, D, C, Z, CS, ZD
    SUBROUTINE _COPY ( N,         X, INCX, Y, INCY )                              S, D, C, Z
    SUBROUTINE _AXPY ( N,  ALPHA, X, INCX, Y, INCY )                              S, D, C, Z
    FUNCTION   _DOT  ( N,         X, INCX, Y, INCY )                              S, D, DS
    FUNCTION   _DOTU ( N,         X, INCX, Y, INCY )                              C, Z
    FUNCTION   _DOTC ( N,         X, INCX, Y, INCY )                              C, Z
    FUNCTION   __DOT ( N,  ALPHA, X, INCX, Y, INCY )                              SDS
    FUNCTION   _NRM2 ( N,         X, INCX )                                       S, D, SC, DZ
    FUNCTION   _ASUM ( N,         X, INCX )                                       S, D, SC, DZ
    FUNCTION   I_AMAX( N,         X, INCX )                                       S, D, C, Z
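    The meaning of the increment arguments INCX and INCY can be illustrated with a plain loop equivalent of _AXPY (y := alpha*x + y). This is a sketch, not an optimized BLAS: the name MYAXPY is hypothetical, and it follows the reference BLAS convention that a negative increment traverses the vector backwards, starting at element 1 + (N-1)*|INC|.

```fortran
      MODULE AXPYSK
      IMPLICIT NONE
      CONTAINS
      SUBROUTINE MYAXPY( N, ALPHA, X, INCX, Y, INCY )
      ! y := alpha*x + y over N elements with strides INCX, INCY.
      INTEGER, INTENT(IN) :: N, INCX, INCY
      REAL, INTENT(IN) :: ALPHA, X( * )
      REAL, INTENT(INOUT) :: Y( * )
      INTEGER I, IX, IY
      IF ( N .LE. 0 .OR. ALPHA .EQ. 0.0 ) RETURN
      ! A negative stride starts at the far end of the vector.
      IX = 1
      IY = 1
      IF ( INCX .LT. 0 ) IX = 1 - ( N - 1 )*INCX
      IF ( INCY .LT. 0 ) IY = 1 - ( N - 1 )*INCY
      DO I = 1, N
         Y( IY ) = Y( IY ) + ALPHA*X( IX )
         IX = IX + INCX
         IY = IY + INCY
      END DO
      END SUBROUTINE MYAXPY
      END MODULE AXPYSK
```

    For example, with X = (1, 2, 3), Y = (10, 20, 30), ALPHA = 1, INCX = -1 and INCY = 1, the result is Y = (13, 22, 31).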

    Level 2 BLAS

            options            dim   b-width scalar matrix  vector   scalar vector   prefixes
    _GEMV (        TRANS,      M, N,         ALPHA, A, LDA, X, INCX, BETA,  Y, INCY ) S, D, C, Z
    _GBMV (        TRANS,      M, N, KL, KU, ALPHA, A, LDA, X, INCX, BETA,  Y, INCY ) S, D, C, Z
    _HEMV ( UPLO,                 N,         ALPHA, A, LDA, X, INCX, BETA,  Y, INCY ) C, Z
    _HBMV ( UPLO,                 N, K,      ALPHA, A, LDA, X, INCX, BETA,  Y, INCY ) C, Z
    _HPMV ( UPLO,                 N,         ALPHA, AP,     X, INCX, BETA,  Y, INCY ) C, Z
    _SYMV ( UPLO,                 N,         ALPHA, A, LDA, X, INCX, BETA,  Y, INCY ) S, D
    _SBMV ( UPLO,                 N, K,      ALPHA, A, LDA, X, INCX, BETA,  Y, INCY ) S, D
    _SPMV ( UPLO,                 N,         ALPHA, AP,     X, INCX, BETA,  Y, INCY ) S, D
    _TRMV ( UPLO, TRANS, DIAG,    N,                A, LDA, X, INCX )                 S, D, C, Z
    _TBMV ( UPLO, TRANS, DIAG,    N, K,             A, LDA, X, INCX )                 S, D, C, Z
    _TPMV ( UPLO, TRANS, DIAG,    N,                AP,     X, INCX )                 S, D, C, Z
    _TRSV ( UPLO, TRANS, DIAG,    N,                A, LDA, X, INCX )                 S, D, C, Z
    _TBSV ( UPLO, TRANS, DIAG,    N, K,             A, LDA, X, INCX )                 S, D, C, Z
    _TPSV ( UPLO, TRANS, DIAG,    N,                AP,     X, INCX )                 S, D, C, Z
            options            dim   scalar vector   vector   matrix  prefixes
    _GER  (                    M, N, ALPHA, X, INCX, Y, INCY, A, LDA ) S, D
    _GERU (                    M, N, ALPHA, X, INCX, Y, INCY, A, LDA ) C, Z
    _GERC (                    M, N, ALPHA, X, INCX, Y, INCY, A, LDA ) C, Z
    _HER  ( UPLO,                 N, ALPHA, X, INCX,          A, LDA ) C, Z
    _HPR  ( UPLO,                 N, ALPHA, X, INCX,          AP )     C, Z
    _HER2 ( UPLO,                 N, ALPHA, X, INCX, Y, INCY, A, LDA ) C, Z
    _HPR2 ( UPLO,                 N, ALPHA, X, INCX, Y, INCY, AP )     C, Z
    _SYR  ( UPLO,                 N, ALPHA, X, INCX,          A, LDA ) S, D
    _SPR  ( UPLO,                 N, ALPHA, X, INCX,          AP )     S, D
    _SYR2 ( UPLO,                 N, ALPHA, X, INCX, Y, INCY, A, LDA ) S, D
    _SPR2 ( UPLO,                 N, ALPHA, X, INCX, Y, INCY, AP )     S, D

    Level 3 BLAS

            options                          dim      scalar matrix  matrix  scalar matrix  prefixes
    _GEMM (             TRANSA, TRANSB,      M, N, K, ALPHA, A, LDA, B, LDB, BETA,  C, LDC ) S, D, C, Z
    _SYMM ( SIDE, UPLO,                      M, N,    ALPHA, A, LDA, B, LDB, BETA,  C, LDC ) S, D, C, Z
    _HEMM ( SIDE, UPLO,                      M, N,    ALPHA, A, LDA, B, LDB, BETA,  C, LDC ) C, Z
    _SYRK (       UPLO, TRANS,                  N, K, ALPHA, A, LDA,         BETA,  C, LDC ) S, D, C, Z
    _HERK (       UPLO, TRANS,                  N, K, ALPHA, A, LDA,         BETA,  C, LDC ) C, Z
    _SYR2K(       UPLO, TRANS,                  N, K, ALPHA, A, LDA, B, LDB, BETA,  C, LDC ) S, D, C, Z
    _HER2K(       UPLO, TRANS,                  N, K, ALPHA, A, LDA, B, LDB, BETA,  C, LDC ) C, Z
    _TRMM ( SIDE, UPLO, TRANSA,        DIAG, M, N,    ALPHA, A, LDA, B, LDB )                S, D, C, Z
    _TRSM ( SIDE, UPLO, TRANSA,        DIAG, M, N,    ALPHA, A, LDA, B, LDB )                S, D, C, Z

    Notes

    Meaning of prefixes

    S - REAL                C - COMPLEX
    D - DOUBLE PRECISION    Z - COMPLEX*16   (this may not be
                                             supported by all
                                             machines)
    

    For the Level 2 BLAS a set of extended-precision routines with the prefixes ES, ED, EC, EZ may also be available.

    Level 1 BLAS

    In addition to the listed routines there are two further extended-precision dot product routines DQDOTI and DQDOTA.

    Level 2 and Level 3 BLAS

    Matrix types

    GE - GEneral     GB - General Band
    SY - SYmmetric   SB - Symmetric Band   SP - Symmetric Packed
    HE - HErmitian   HB - Hermitian Band   HP - Hermitian Packed
    TR - TRiangular  TB - Triangular Band  TP - Triangular Packed
    

    Options

    Arguments describing options are declared as CHARACTER*1 and may be passed as character strings.

    TRANS   = 'No transpose', 'Transpose', 'Conjugate transpose' (X, X^T, X^C) 
    UPLO    = 'Upper triangular', 'Lower triangular'
    DIAG    = 'Non-unit triangular', 'Unit triangular'
    SIDE    = 'Left', 'Right' (A or op(A) on the left, or A or op(A) on the right)
    

    For real matrices, TRANS = 'T' and TRANS = 'C' have the same meaning.
    For Hermitian matrices, TRANS = 'T' is not allowed.
    For complex symmetric matrices, TRANS = 'C' is not allowed.




    Converting from LINPACK or EISPACK

     

    This appendix is designed to assist users in converting programs that currently call LINPACK or EISPACK routines so that they call LAPACK routines instead.







    Notes

    1. The appendix consists mainly of indexes giving the nearest LAPACK equivalents of LINPACK and EISPACK routines. These indexes should not be followed blindly or rigidly, especially when two or more LINPACK or EISPACK routines are being used together: in many such cases one of the LAPACK driver routines may be a suitable replacement.

    2. When two or more LAPACK routines are given in a single entry, these routines must be combined to achieve the equivalent function.

    3. For LINPACK, an index is given for equivalents of the real LINPACK routines; these equivalences apply also to the corresponding complex routines. A separate table is included for equivalences of complex Hermitian routines. For EISPACK, an index is given for all real and complex routines, since there is no direct 1-to-1 correspondence between real and complex routines in EISPACK.

    4. A few of the less commonly used routines in LINPACK and EISPACK have no equivalents in Release 1.0 of LAPACK; equivalents for some of these (but not all) are planned for a future release.

    5. For some EISPACK routines, there are LAPACK routines providing similar functionality, but using a significantly different method, or LAPACK routines which provide only part of the functionality; such routines are marked by a . For example, the EISPACK routine ELMHES uses non-orthogonal transformations, whereas the nearest equivalent LAPACK routine, SGEHRD, uses orthogonal transformations.

    6. In some cases the LAPACK equivalents require matrices to be stored in a different storage scheme. For example:

      • EISPACK routines BANDR, BANDV, BQR and the driver routine RSB require the lower triangle of a symmetric band matrix to be stored in a different storage scheme from that used in LAPACK, which is illustrated in subsection 5.3.3. The corresponding storage scheme used by the EISPACK routines is:

      • EISPACK routines TRED1, TRED2, TRED3, HTRID3, HTRIDI, TQL1, TQL2, IMTQL1, IMTQL2, RATQR, TQLRAT and the driver routine RST store the off-diagonal elements of a symmetric tridiagonal matrix in elements 2 : n of the array E, whereas LAPACK routines use elements 1 : n - 1.

    7. The EISPACK and LINPACK routines for the singular value decomposition return the matrix of right singular vectors, V, whereas the corresponding LAPACK routines return the transposed matrix V^T.

    8. In general, the argument lists of the LAPACK routines are different from those of the corresponding EISPACK and LINPACK routines, and the workspace requirements are often different.
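    Notes 6 and 7 can be handled mechanically when converting code. The sketch below (module and subroutine names are hypothetical) shifts an EISPACK off-diagonal array E(2:n) into the LAPACK position E(1:n-1), setting the no longer used E(n) to zero, and forms V from the n-by-n matrix V^T returned by the LAPACK SVD routines. It covers real data only; the complex LAPACK SVD routines return the conjugate transpose V^H, so a conjugation would also be needed there. In Fortran 90 the second step could equally be written with the intrinsic TRANSPOSE.

```fortran
      MODULE CONVRT
      IMPLICIT NONE
      CONTAINS
      SUBROUTINE E2LAPK( N, E )
      ! Note 6: tridiagonal off-diagonal from EISPACK layout
      ! E(2:N) to LAPACK layout E(1:N-1), in place.
      INTEGER, INTENT(IN) :: N
      REAL, INTENT(INOUT) :: E( * )
      INTEGER I
      DO I = 1, N - 1
         E( I ) = E( I + 1 )
      END DO
      ! E(N) is no longer part of the matrix; clear it.
      IF ( N .GE. 1 ) E( N ) = 0.0
      END SUBROUTINE E2LAPK
      SUBROUTINE VT2V( N, VT, LDVT, V, LDV )
      ! Note 7: form V from V**T (both N-by-N, real data).
      INTEGER, INTENT(IN) :: N, LDVT, LDV
      REAL, INTENT(IN) :: VT( LDVT, * )
      REAL, INTENT(OUT) :: V( LDV, * )
      INTEGER I, J
      DO J = 1, N
         DO I = 1, N
            V( I, J ) = VT( J, I )
         END DO
      END DO
      END SUBROUTINE VT2V
      END MODULE CONVRT
```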

       LAPACK equivalents of LINPACK routines for real matrices
    ----------------------------------------------------------------
    LINPACK  LAPACK   Function of LINPACK routine
    ----------------------------------------------------------------
    SCHDC             Cholesky factorization with diagonal pivoting
                      option
    ----------------------------------------------------------------
    SCHDD             rank-1 downdate of a Cholesky factorization
                      or the triangular factor of a QR factorization
    ----------------------------------------------------------------
    SCHEX             modifies a Cholesky factorization under
                      permutations of the original matrix
    ----------------------------------------------------------------
    SCHUD             rank-1 update of a Cholesky factorization
                      or the triangular factor of a QR factorization
    ----------------------------------------------------------------
    SGBCO    SLANGB    LU factorization and condition estimation
             SGBTRF    of a general band matrix
             SGBCON
    ----------------------------------------------------------------
    SGBDI              determinant of a general band matrix,
                       after factorization by SGBCO or SGBFA
    ----------------------------------------------------------------
    SGBFA    SGBTRF    LU factorization of a general band matrix
    ----------------------------------------------------------------
    SGBSL    SGBTRS    solves a general band system of linear
                       equations, after factorization by SGBCO
                       or SGBFA
    ----------------------------------------------------------------
    SGECO    SLANGE    LU factorization and condition
             SGETRF    estimation of a general matrix
             SGECON
    ----------------------------------------------------------------
    SGEDI    SGETRI    determinant and inverse of a general
                       matrix, after factorization by SGECO
                       or SGEFA
    ----------------------------------------------------------------
    SGEFA    SGETRF    LU factorization of a general matrix
    ----------------------------------------------------------------
    SGESL    SGETRS    solves a general system of linear
                       equations, after factorization by
                       SGECO or SGEFA
    ----------------------------------------------------------------
    SGTSL    SGTSV     solves a general tridiagonal system
                       of linear equations
    ----------------------------------------------------------------
    SPBCO    SLANSB    Cholesky factorization and condition
             SPBTRF    estimation of a symmetric positive definite
             SPBCON    band matrix
    ----------------------------------------------------------------
    SPBDI              determinant of a symmetric positive
                       definite band matrix, after factorization
                       by SPBCO or SPBFA
    ----------------------------------------------------------------
    SPBFA    SPBTRF    Cholesky factorization of a symmetric
                       positive definite band matrix
    ----------------------------------------------------------------
    SPBSL    SPBTRS    solves a symmetric positive definite band
                       system of linear equations, after
                       factorization by SPBCO or SPBFA
    ----------------------------------------------------------------
    SPOCO    SLANSY    Cholesky factorization and condition
             SPOTRF    estimation of a symmetric positive definite
             SPOCON    matrix
    ----------------------------------------------------------------
    SPODI    SPOTRI    determinant and inverse of a symmetric
                       positive definite matrix, after factorization
                       by SPOCO or SPOFA
    ----------------------------------------------------------------
    SPOFA    SPOTRF    Cholesky factorization of a symmetric
                       positive definite matrix
    ----------------------------------------------------------------
    SPOSL    SPOTRS    solves a symmetric positive definite system
                       of linear equations, after factorization by
                       SPOCO or SPOFA
    ----------------------------------------------------------------
    SPPCO    SLANSP    Cholesky factorization and condition
             SPPTRF    estimation of a symmetric positive definite
             SPPCON    matrix (packed storage)
    ----------------------------------------------------------------
    SPPDI    SPPTRI    determinant and inverse of a symmetric
                       positive definite matrix, after factorization
                       by SPPCO or SPPFA (packed storage)
    ----------------------------------------------------------------
    SPPFA    SPPTRF    Cholesky factorization of a symmetric
                       positive definite matrix (packed storage)
    ----------------------------------------------------------------
    SPPSL    SPPTRS    solves a symmetric positive definite system
                       of linear equations, after factorization by
                       SPPCO or SPPFA (packed storage)
    ----------------------------------------------------------------
    SPTSL    SPTSV     solves a symmetric positive definite
                       tridiagonal system of linear equations
    ----------------------------------------------------------------
    SQRDC    SGEQPF    QR factorization with optional column
             or        pivoting
             SGEQRF
    ----------------------------------------------------------------
    SQRSL    SORMQR    solves linear least squares problems after
             STRSV     factorization by SQRDC
    ----------------------------------------------------------------
    SSICO    SLANSY    symmetric indefinite factorization and
             SSYTRF    condition estimation of a symmetric
             SSYCON    indefinite matrix
    ----------------------------------------------------------------
    SSIDI    SSYTRI    determinant, inertia and inverse of a
                       symmetric indefinite matrix, after
                       factorization by SSICO or SSIFA
    ----------------------------------------------------------------
    SSIFA    SSYTRF    symmetric indefinite factorization of a
                       symmetric indefinite matrix
    ----------------------------------------------------------------
    SSISL    SSYTRS    solves a symmetric indefinite system of
                       linear equations, after factorization by
                       SSICO or SSIFA
    ----------------------------------------------------------------
    SSPCO    SLANSP    symmetric indefinite factorization and
             SSPTRF    condition estimation of a symmetric
             SSPCON    indefinite matrix (packed storage)
    ----------------------------------------------------------------
    SSPDI    SSPTRI    determinant, inertia and inverse of a
                       symmetric indefinite matrix, after
                       factorization by SSPCO or SSPFA (packed
                       storage)
    ----------------------------------------------------------------
    SSPFA    SSPTRF    symmetric indefinite factorization of a
                       symmetric indefinite matrix (packed storage)
    ----------------------------------------------------------------
    SSPSL    SSPTRS    solves a symmetric indefinite system of
                       linear equations, after factorization by
                       SSPCO or SSPFA (packed storage)
    ----------------------------------------------------------------
    SSVDC    SGESVD    all or part of the singular value
                       decomposition of a general matrix
    ----------------------------------------------------------------
    STRCO    STRCON    condition estimation of a triangular matrix
    ----------------------------------------------------------------
    STRDI    STRTRI    determinant and inverse of a triangular
                       matrix
    ----------------------------------------------------------------
    STRSL    STRTRS    solves a triangular system of linear
                       equations
    ----------------------------------------------------------------
    







    Tue Nov 29 14:03:33 EST 1994


    LAPACK Working Notes

     

    Most of these working notes are available from netlib, where they can be obtained only in PostScript form. To receive a list of the available reports, send email to netlib@ornl.gov containing the message: send index from lapack/lawns

    1.
    J. W. DEMMEL, J. J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, AND D. SORENSEN, Prospectus for the Development of a Linear Algebra Library for High-Performance Computers, ANL, MCS-TM-97, September 1987.

    2.
    J. J. DONGARRA, S. HAMMARLING, AND D. SORENSEN, Block Reduction of Matrices to Condensed Forms for Eigenvalue Computations, ANL, MCS-TM-99, September 1987.

    3.
    J. W. DEMMEL AND W. KAHAN, Computing Small Singular Values of Bidiagonal Matrices with Guaranteed High Relative Accuracy, ANL, MCS-TM-110, February 1988.

    4.
    J. W. DEMMEL, J. DU CROZ, S. HAMMARLING, AND D. SORENSEN, Guidelines for the Design of Symmetric Eigenroutines, SVD, and Iterative Refinement and Condition Estimation for Linear Systems, ANL, MCS-TM-111, March 1988.

    5.
    C. BISCHOF, J. W. DEMMEL, J. J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, AND D. SORENSEN, Provisional Contents, ANL, MCS-TM-38, September 1988.

    6.
    O. BREWER, J. J. DONGARRA, AND D. SORENSEN, Tools to Aid in the Analysis of Memory Access Patterns for FORTRAN Programs, ANL, MCS-TM-120, June 1988.

    7.
    J. BARLOW AND J. W. DEMMEL, Computing Accurate Eigensystems of Scaled Diagonally Dominant Matrices, ANL, MCS-TM-126, December 1988.

    8.
    Z. BAI AND J. W. DEMMEL, On a Block Implementation of Hessenberg Multishift QR Iteration, ANL, MCS-TM-127, January 1989.

    9.
    J. W. DEMMEL AND A. MCKENNEY, A Test Matrix Generation Suite, ANL, MCS-P69-0389, March 1989.

    10.
    E. ANDERSON AND J. J. DONGARRA, Installing and Testing the Initial Release of LAPACK - Unix and Non-Unix Versions, ANL, MCS-TM-130, May 1989.

    11.
    P. DEIFT, J. W. DEMMEL, L.-C. LI, AND C. TOMEI, The Bidiagonal Singular Value Decomposition and Hamiltonian Mechanics, ANL, MCS-TM-133, August 1989.

    12.
    P. MAYES AND G. RADICATI, Banded Cholesky Factorization Using Level 3 BLAS, ANL, MCS-TM-134, August 1989.

    13.
    Z. BAI, J. W. DEMMEL, AND A. MCKENNEY, On the Conditioning of the Nonsymmetric Eigenproblem: Theory and Software, UT, CS-89-86, October 1989.

    14.
    J. W. DEMMEL, On Floating-Point Errors in Cholesky, UT, CS-89-87, October 1989.

    15.
    J. W. DEMMEL AND K. VESELIC, Jacobi's Method is More Accurate than QR, UT, CS-89-88, October 1989.

    16.
    E. ANDERSON AND J. J. DONGARRA, Results from the Initial Release of LAPACK, UT, CS-89-89, November 1989.

    17.
    A. GREENBAUM AND J. J. DONGARRA, Experiments with QR/QL Methods for the Symmetric Tridiagonal Eigenproblem, UT, CS-89-92, November 1989.

    18.
    E. ANDERSON AND J. J. DONGARRA, Implementation Guide for LAPACK, UT, CS-90-101, April 1990.

    19.
    E. ANDERSON AND J. J. DONGARRA, Evaluating Block Algorithm Variants in LAPACK, UT, CS-90-103, April 1990.

    20.
    E. ANDERSON, Z. BAI, C. BISCHOF, J. W. DEMMEL, J. J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK: A Portable Linear Algebra Library for High-Performance Computers, UT, CS-90-105, May 1990.

    21.
    J. DU CROZ, P. MAYES, AND G. RADICATI, Factorizations of Band Matrices Using Level 3 BLAS, UT, CS-90-109, July 1990.

    22.
    J. W. DEMMEL AND N. J. HIGHAM, Stability of Block Algorithms with Fast Level 3 BLAS, UT, CS-90-110, July 1990.

    23.
    J. W. DEMMEL AND N. J. HIGHAM, Improved Error Bounds for Underdetermined System Solvers, UT, CS-90-113, August 1990.

    24.
    J. J. DONGARRA AND S. OSTROUCHOV, LAPACK Block Factorization Algorithms on the Intel iPSC/860, UT, CS-90-115, October 1990.

    25.
    J. J. DONGARRA, S. HAMMARLING, AND J. H. WILKINSON, Numerical Considerations in Computing Invariant Subspaces, UT, CS-90-117, October 1990.

    26.
    E. ANDERSON, C. BISCHOF, J. W. DEMMEL, J. J. DONGARRA, J. DU CROZ, S. HAMMARLING, AND W. KAHAN, Prospectus for an Extension to LAPACK: A Portable Linear Algebra Library for High-Performance Computers, UT, CS-90-118, November 1990.

    27.
    J. DU CROZ AND N. J. HIGHAM, Stability of Methods for Matrix Inversion, UT, CS-90-119, October 1990.

    28.
    J. J. DONGARRA, P. MAYES, AND G. RADICATI, The IBM RISC System/6000 and Linear Algebra Operations, UT, CS-90-122, December 1990.

    29.
    R. VAN DE GEIJN, On Global Combine Operations, UT, CS-91-129, April 1991.

    30.
    J. J. DONGARRA AND R. VAN DE GEIJN, Reduction to Condensed Form for the Eigenvalue Problem on Distributed Memory Architectures, UT, CS-91-130, April 1991.

    31.
    E. ANDERSON, Z. BAI, AND J. J. DONGARRA, Generalized QR Factorization and its Applications, UT, CS-91-131, April 1991.

    32.
    C. BISCHOF AND P. TANG, Generalized Incremental Condition Estimation, UT, CS-91-132, May 1991.

    33.
    C. BISCHOF AND P. TANG, Robust Incremental Condition Estimation, UT, CS-91-133, May 1991.

    34.
    J. J. DONGARRA, Workshop on the BLACS, UT, CS-91-134, May 1991.

    35.
    E. ANDERSON, J. J. DONGARRA, AND S. OSTROUCHOV, Implementation guide for LAPACK, UT, CS-91-138, August 1991. (replaced by Working Note 41)

    36.
    E. ANDERSON, Robust Triangular Solves for Use in Condition Estimation, UT, CS-91-142, August 1991.

    37.
    J. J. DONGARRA AND R. VAN DE GEIJN, Two Dimensional Basic Linear Algebra Communication Subprograms, UT, CS-91-138, October 1991.

    38.
    Z. BAI AND J. W. DEMMEL, On a Direct Algorithm for Computing Invariant Subspaces with Specified Eigenvalues, UT, CS-91-139, November 1991.

    39.
    J. W. DEMMEL, J. J. DONGARRA, AND W. KAHAN, On Designing Portable High Performance Numerical Libraries, UT, CS-91-141, July 1991.

    40.
    J. W. DEMMEL, N. J. HIGHAM, AND R. SCHREIBER, Block LU Factorization, UT, CS-92-149, February 1992.

    41.
    E. ANDERSON, J. J. DONGARRA, AND S. OSTROUCHOV, Installation Guide for LAPACK, UT, CS-92-151, February 1992.

    42.
    N. J. HIGHAM, Perturbation Theory and Backward Error for AX-XB=C, UT, CS-92-153, April 1992.

    43.
    J. J. DONGARRA, R. VAN DE GEIJN, AND D. W. WALKER, A Look at Scalable Dense Linear Algebra Libraries, UT, CS-92-155, April 1992.

    44.
    E. ANDERSON AND J. J. DONGARRA, Performance of LAPACK: A Portable Library of Numerical Linear Algebra Routines, UT, CS-92-156, May 1992.

    45.
    J. W. DEMMEL, The Inherent Inaccuracy of Implicit Tridiagonal QR, UT, CS-92-162, May 1992.

    46.
    Z. BAI AND J. W. DEMMEL, Computing the Generalized Singular Value Decomposition, UT, CS-92-163, May 1992.

    47.
    J. W. DEMMEL, Open Problems in Numerical Linear Algebra, UT, CS-92-164, May 1992.

    48.
    J. W. DEMMEL AND W. GRAGG, On Computing Accurate Singular Values and Eigenvalues of Matrices with Acyclic Graphs, UT, CS-92-166, May 1992.

    49.
    J. W. DEMMEL, A Specification for Floating Point Parallel Prefix, UT, CS-92-167, May 1992.

    50.
    V. EIJKHOUT, Distributed Sparse Data Structures for Linear Algebra Operations, UT, CS-92-169, May 1992.

    51.
    V. EIJKHOUT, Qualitative Properties of the Conjugate Gradient and Lanczos Methods in a Matrix Framework, UT, CS-92-170, May 1992.

    52.
    M. T. HEATH AND P. RAGHAVAN, A Cartesian Parallel Nested Dissection Algorithm, UT, CS-92-178, June 1992.

    53.
    J. W. DEMMEL, Trading Off Parallelism and Numerical Stability, UT, CS-92-179, June 1992.

    54.
    Z. BAI AND J. W. DEMMEL, On Swapping Diagonal Blocks in Real Schur Form, UT, CS-92-182, October 1992.

    55.
    J. CHOI, J. J. DONGARRA, R. POZO, AND D. W. WALKER, ScaLAPACK: A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers, UT, CS-92-181, November 1992.

    56.
    E. F. D'AZEVEDO, V. L. EIJKHOUT, AND C. H. ROMINE, Reducing Communication Costs in the Conjugate Gradient Algorithm on Distributed Memory Multiprocessors, UT, CS-93-185, January 1993.

    57.
    J. CHOI, J. J. DONGARRA, AND D. W. WALKER, PUMMA: Parallel Universal Matrix Multiplication Algorithms on Distributed Memory Concurrent Computers, UT, CS-93-187, May 1993.

    58.
    J. J. DONGARRA AND D. W. WALKER, The Design of Linear Algebra Libraries for High Performance Computers, UT, CS-93-188, June 1993.

    59.
    J. W. DEMMEL AND X. LI, Faster Numerical Algorithms via Exception Handling, UT, CS-93-192, March 1993.

    60.
    J. W. DEMMEL, M. T. HEATH, AND H. A. VAN DER VORST, Parallel Numerical Linear Algebra, UT, CS-93-192, March 1993.

    61.
    J. J. DONGARRA, R. POZO, AND D. W. WALKER, An Object Oriented Design for High Performance Linear Algebra on Distributed Memory Architectures, UT, CS-93-200, August 1993.

    62.
    M. T. HEATH AND P. RAGHAVAN, Distributed Solution of Sparse Linear Systems, UT, CS-93-201, August 1993.

    63.
    M. T. HEATH AND P. RAGHAVAN, Line and Plane Separators, UT, CS-93-202, August 1993.

    64.
    P. RAGHAVAN, Distributed Sparse Gaussian Elimination and Orthogonal Factorization, UT, CS-93-203, August 1993.

    65.
    J. CHOI, J. J. DONGARRA, AND D. W. WALKER, Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers, UT, CS-93-215, November 1993.

    66.
    V. L. EIJKHOUT, A Characterization of Polynomial Iterative Methods, UT, CS-93-216, November 1993.

    67.
    F. DESPREZ, J. DONGARRA, AND B. TOURANCHEAU, Performance Complexity of Factorization with Efficient Pipelining and Overlap on a Multiprocessor, UT, CS-93-218, December 1993.

    68.
    M. W. BERRY, J. J. DONGARRA, AND Y. KIM, A Highly Parallel Algorithm for the Reduction of a Nonsymmetric Matrix to Block Upper-Hessenberg Form, UT, CS-94-221, January 1994.

    69.
    J. RUTTER, A Serial Implementation of Cuppen's Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem, UT, CS-94-225, March 1994.

    70.
    J. W. DEMMEL, I. DHILLON, AND H. REN, On the Correctness of Parallel Bisection in Floating Point, UT, CS-94-228, April 1994.

    71.
    J. DONGARRA AND M. KOLATIS, IBM RS/6000-550 & -590 Performance for Selected Routines in ESSL, UT, CS-94-231, April 1994.

    72.
    R. LEHOUCQ, The Computation of Elementary Unitary Matrices, UT, CS-94-233, May 1994.

    73.
    R. C. WHALEY, Basic Linear Algebra Communication Subprograms: Analysis and Implementation Across Multiple Parallel Architectures, UT, CS-94-234, May 1994.

    74.
    J. DONGARRA, A. LUMSDAINE, X. NIU, R. POZO, AND K. REMINGTON, A Sparse Matrix Library in C++ for High Performance Architectures, UT, CS-94-236, July 1994.

    75.
    B. KÅGSTRÖM AND P. POROMAA, Computing Eigenspaces with Specified Eigenvalues of a Regular Matrix Pair (A,B) and Condition Estimation: Theory, Algorithms and Software, UT, CS-94-237, July 1994.

    76.
    R. BARRETT, M. BERRY, J. DONGARRA, V. EIJKHOUT, AND C. ROMINE, Algorithmic Bombardment for the Iterative Solution of Linear Systems: A Poly-Iterative Approach, UT, CS-94-239, August 1994.

    77.
    V. EIJKHOUT AND R. POZO, Basic Concepts for Distributed Sparse Linear Algebra Operations, UT, CS-94-240, August 1994.

    78.
    V. EIJKHOUT, Computational variants of the CGS and BiCGstab methods, UT, CS-94-241, August 1994.

    79.
    G. HENRY AND R. VAN DE GEIJN, Parallelizing the QR Algorithm for the Unsymmetric Algebraic Eigenvalue Problem: Myths and Reality, UT, CS-94-244, August 1994.

    80.
    J. CHOI, J. J. DONGARRA, S. OSTROUCHOV, A. P. PETITET, D. W. WALKER, AND R. C. WHALEY, The Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, UT, CS-94-246, September 1994.

    81.
    J. J. DONGARRA AND S. OSTROUCHOV, Quick Installation Guide for LAPACK on Unix Systems, UT, CS-94-249, September 1994.

    82.
    J. J. DONGARRA AND M. KOLATIS, Call Conversion Interface (CCI) for LAPACK/ESSL, UT, CS-94-250, August 1994.

    83.
    R. C. LI, Relative Perturbation Bounds for the Unitary Polar Factor, UT, CS-94-251, September 1994.

    84.
    R. C. LI, Relative Perturbation Theory: (I) Eigenvalue Variations, UT, CS-94-252, September 1994.

    85.
    R. C. LI, Relative Perturbation Theory: (II) Eigenspace Variations, UT, CS-94-253, September 1994.

    86.
    J. DEMMEL AND K. STANLEY, The Performance of Finding Eigenvalues and Eigenvectors of Dense Symmetric Matrices on Distributed Memory Computers, UT, CS-94-254, September 1994.

    87.
    B. KÅGSTRÖM AND P. POROMAA, Computing Eigenspaces with Specified Eigenvalues of a Regular Matrix Pair (A,B) and Condition Estimation: Theory, Algorithms and Software, UT, CS-94-255, September 1994.




    Specifications of Routines

     







    Notes

    1. The specifications that follow give the calling sequence, purpose, and descriptions of the arguments of each LAPACK driver and computational routine (but not of auxiliary routines).

    2. Specifications of pairs of real and complex routines have been merged (for example SBDSQR/CBDSQR). In a few cases, specifications of three routines have been merged, one for real symmetric, one for complex symmetric, and one for complex Hermitian matrices (for example SSYTRF/CSYTRF/CHETRF). A few routines for real matrices have no complex equivalent (for example SSTEBZ).

    3. Specifications are given only for single precision routines. To adapt them for the double precision version of the software, simply interpret REAL as DOUBLE PRECISION, COMPLEX as COMPLEX*16 (or DOUBLE COMPLEX), and the initial letters S- and C- of LAPACK routine names as D- and Z-.
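The renaming rule in note 3 is mechanical, so it can be expressed in a few lines. The following is a minimal sketch that assumes only the convention stated above: the initial S- becomes D-, the initial C- becomes Z-, REAL becomes DOUBLE PRECISION, and COMPLEX becomes COMPLEX*16. The helper names `to_double_precision` and `to_double_type` are hypothetical illustrations and are not part of LAPACK.

```python
# Hypothetical helpers (not part of LAPACK) illustrating note 3:
# a single-precision routine name maps to its double-precision
# counterpart by replacing only the initial letter.
PRECISION_PREFIX = {"S": "D", "C": "Z"}

# Fortran types to reinterpret for the double-precision version.
PRECISION_TYPE = {"REAL": "DOUBLE PRECISION", "COMPLEX": "COMPLEX*16"}


def to_double_precision(name: str) -> str:
    """Return the double-precision name of a single-precision routine."""
    upper = name.upper()
    if not upper or upper[0] not in PRECISION_PREFIX:
        raise ValueError(f"not a single-precision routine name: {name!r}")
    # Only the first letter changes; the rest of the name is kept.
    return PRECISION_PREFIX[upper[0]] + upper[1:]


def to_double_type(fortran_type: str) -> str:
    """Return the double-precision Fortran type for REAL or COMPLEX."""
    return PRECISION_TYPE[fortran_type.upper()]


if __name__ == "__main__":
    for routine in ["SGETRF", "CHETRF", "SSTEBZ"]:
        print(routine, "->", to_double_precision(routine))
```

Run as a script, the sketch prints SGETRF -> DGETRF, CHETRF -> ZHETRF and SSTEBZ -> DSTEBZ. Names that already carry a D- or Z- prefix are rejected rather than silently passed through.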

    4. Specifications are arranged in alphabetical order of the real routine name.

    5. The text of the specifications has been derived from the leading comments in the source text of the routines. It makes only limited use of mathematical typesetting facilities. To eliminate redundancy, A^H (the conjugate transpose) has been used throughout the specifications. Thus, the reader should note that A^H is equivalent to A^T in the real case.

    6. If there is a discrepancy between the specifications listed in this section and the actual source code, the source code should be regarded as the most up-to-date.






    References

    1
    E. ANDERSON, Z. BAI, C. BISCHOF, J. W. DEMMEL, J. J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, AND D. SORENSEN, LAPACK: A portable linear algebra library for high-performance computers, Computer Science Dept. Technical Report CS-90-105, University of Tennessee, Knoxville, 1990. (LAPACK Working Note 20).

    2
    E. ANDERSON, Z. BAI, AND J. J. DONGARRA, Generalized QR Factorization and its Applications, Computer Science Dept. Technical Report CS-91-131, University of Tennessee, Knoxville, 1991. (LAPACK Working Note 31).

    3
    E. ANDERSON, J. J. DONGARRA, AND S. OSTROUCHOV, Installation guide for LAPACK, Computer Science Dept. Technical Report CS-92-151, University of Tennessee, Knoxville, 1992. (LAPACK Working Note 41).

    4
    ANSI/IEEE, IEEE Standard for Binary Floating-Point Arithmetic, New York, Std 754-1985 ed., 1985.

    5
    ANSI/IEEE, IEEE Standard for Radix Independent Floating Point Arithmetic, New York, Std 854-1987 ed., 1987.

    6
    M. ARIOLI, J. W. DEMMEL, AND I. S. DUFF, Solving sparse linear systems with sparse backward error, SIAM J. Matrix Anal. Appl., 10 (1989), pp. 165-190.

    7
    M. ARIOLI, I. S. DUFF, AND P. P. M. DE RIJK, On the augmented system approach to sparse least squares problems, Numer. Math., 55 (1989), pp. 667-684.

    8
    Z. BAI AND J. W. DEMMEL, On a block implementation of Hessenberg multishift QR iteration, Int. J. of High Speed Comput., 1 (1989), pp. 97-112. (LAPACK Working Note 8).

    9
    Z. BAI AND J. W. DEMMEL, Design of a parallel nonsymmetric eigenroutine toolbox, Part I, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, SIAM (1993), pp. 391-398.

    10
    Z. BAI AND J. W. DEMMEL, Computing the generalized singular value decomposition, SIAM J. Sci. Comp., 14 (1993), pp. 1464-1486. (LAPACK Working Note 46).

    11
    Z. BAI, J. W. DEMMEL, AND A. MCKENNEY, On computing condition numbers for the nonsymmetric eigenproblem, ACM Trans. Math. Soft., 19 (1993), pp. 202-223. (LAPACK Working Note 13).

    12
    Z. BAI AND H. ZHA, A new preprocessing algorithm for the computation of the generalized singular value decomposition, SIAM J. Sci. Comp., 14 (1993), pp. 1007-1012.

    13
    J. BARLOW AND J. DEMMEL, Computing accurate eigensystems of scaled diagonally dominant matrices, SIAM J. Num. Anal., 27 (1990), pp. 762-791. (LAPACK Working Note 7).

    14
    C. R. CRAWFORD, Reduction of a band-symmetric generalized eigenvalue problem, Comm. ACM, 16 (1973), pp. 41-44.

    15
    J. J. M. CUPPEN, A divide and conquer method for the symmetric tridiagonal eigenproblem, Numer. Math., 36 (1981), pp. 177-195.

    16
    P. DEIFT, J. W. DEMMEL, L.-C. LI, AND C. TOMEI, The bidiagonal singular value decomposition and Hamiltonian mechanics, SIAM J. Num. Anal., 28 (1991), pp. 1463-1516. (LAPACK Working Note 11).

    17
    J. W. DEMMEL, The Condition Number of Equivalence Transformations that Block Diagonalize Matrix Pencils, SIAM J. Num. Anal., 20 (1983), pp. 599-610.

    18
    J. W. DEMMEL, Underflow and the Reliability of Numerical Software, SIAM J. Sci. Stat. Comput., 5 (1984), pp. 887-919.

    19
    J. W. DEMMEL AND N. J. HIGHAM, Improved error bounds for underdetermined systems solvers, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 1-14.

    20
    J. W. DEMMEL AND N. J. HIGHAM, Stability of block algorithms with fast level 3 BLAS, ACM Trans. Math. Soft., 18 (1992), pp. 274-291. (LAPACK Working Note 22).

    21
    J. W. DEMMEL AND B. KÅGSTRÖM, Computing Stable Eigendecompositions of Matrix Pencils, Lin. Alg. Appl., 88/89 (1987), pp. 139-186.

    22
    J. W. DEMMEL AND W. KAHAN, Accurate singular values of bidiagonal matrices, SIAM J. Sci. Stat. Comput., 11 (1990), pp. 873-912. (LAPACK Working Note 3).

    23
    J. W. DEMMEL AND X. LI, Faster Numerical Algorithms via Exception Handling, IEEE Trans. Comp., 43 (1994), pp. 983-992. (LAPACK Working Note 59).

    24
    J. W. DEMMEL AND K. VESELIC, Jacobi's method is more accurate than QR, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 1204-1246. (LAPACK Working Note 15).

    25
    B. DE MOOR AND P. VAN DOOREN, Generalization of the singular value and QR decompositions, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 993-1014.

    26
    J. J. DONGARRA, J. R. BUNCH, C. B. MOLER, AND G. W. STEWART, LINPACK Users' Guide, SIAM, Philadelphia, PA, 1979.

    27
    J. J. DONGARRA, J. DU CROZ, I. S. DUFF, AND S. HAMMARLING, Algorithm 679: A set of Level 3 Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 16 (1990), pp. 18-28.

    28
    J. J. DONGARRA, J. DU CROZ, I. S. DUFF, AND S. HAMMARLING, A set of Level 3 Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 16 (1990), pp. 1-17.

    29
    J. J. DONGARRA, J. DU CROZ, S. HAMMARLING, AND R. J. HANSON, Algorithm 656: An extended set of FORTRAN Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 14 (1988), pp. 18-32.

    30
    J. J. DONGARRA, J. DU CROZ, S. HAMMARLING, AND R. J. HANSON, An extended set of FORTRAN Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 14 (1988), pp. 1-17.

    31
    J. J. DONGARRA, I. S. DUFF, D. C. SORENSEN, AND H. A. VAN DER VORST, Solving Linear Systems on Vector and Shared Memory Computers, SIAM Publications, 1991.

    32
    J. J. DONGARRA AND E. GROSSE, Distribution of mathematical software via electronic mail, Communications of the ACM, 30 (1987), pp. 403-407.

    33
    J. J. DONGARRA, F. G. GUSTAFSON, AND A. KARP, Implementing linear algebra algorithms for dense matrices on a vector pipeline machine, SIAM Review, 26 (1984), pp. 91-112.

    34
    J. J. DONGARRA, S. HAMMARLING, AND D. C. SORENSEN, Block reduction of matrices to condensed forms for eigenvalue computations, JCAM, 27 (1989), pp. 215-227. (LAPACK Working Note 2).

    35
    J. J. DONGARRA AND S. OSTROUCHOV, Quick installation guide for LAPACK on unix systems, Computer Science Dept. Technical Report CS-94-249, University of Tennessee, Knoxville, 1994. (LAPACK Working Note 81).

    36
    J. J. DONGARRA, R. POZO, AND D. WALKER, An object oriented design for high performance linear algebra on distributed memory architectures, Computer Science Dept. Technical Report CS-93-200, University of Tennessee, Knoxville, 1993. (LAPACK Working Note 61).

    37
    A. DUBRULLE, The multishift QR algorithm: is it worth the trouble?, Palo Alto Scientific Center Report G320-3558x, IBM Corp., Palo Alto, 1991.

    38
    J. DU CROZ AND N. J. HIGHAM, Stability of methods for matrix inversion, IMA J. Num. Anal., 12 (1992), pp. 1-19. (LAPACK Working Note 27).

    39
    J. DU CROZ, P. J. D. MAYES, AND G. RADICATI DI BROZOLO, Factorizations of band matrices using Level 3 BLAS, Computer Science Dept. Technical Report CS-90-109, University of Tennessee, Knoxville, 1990. (LAPACK Working Note 21).

    40
    S. I. FELDMAN, D. M. GAY, M. W. MAIMONE, AND N. L. SCHRYER, A Fortran-to-C Converter, Computing Science Technical Report No. 149, AT & T Bell Laboratories, Murray Hill, NJ, 1990.

    41
    V. FERNANDO AND B. PARLETT, Accurate singular values and differential qd algorithms, Numer. Math., 67 (1994), pp. 191-229.

    42
    K. A. GALLIVAN, R. J. PLEMMONS, AND A. H. SAMEH, Parallel algorithms for dense linear algebra computations, SIAM Review, 32 (1990), pp. 54-135.

    43
    F. GANTMACHER, The Theory of Matrices, vol. II (transl.), Chelsea Publishers, New York, 1959.

    44
    B. S. GARBOW, J. M. BOYLE, J. J. DONGARRA, AND C. B. MOLER, Matrix Eigensystem Routines - EISPACK Guide Extension, vol. 51 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, 1977.

    45
    G. GOLUB AND C. F. VAN LOAN, Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 2nd ed., 1989.

    46
    A. GREENBAUM AND J. J. DONGARRA, Experiments with QL/QR methods for the symmetric tridiagonal eigenproblem, Computer Science Dept. Technical Report CS-89-92, University of Tennessee, Knoxville, 1989. (LAPACK Working Note 17).

    47
    M. GU AND S. EISENSTAT, A stable algorithm for the rank-1 modification of the symmetric eigenproblem, Yale University, Computer Science Department Report YALEU/DCS/RR-916, New Haven, CT (1992).

    48
    W. W. HAGER, Condition estimators, SIAM J. Sci. Stat. Comput., 5 (1984), pp. 311-316.

    49
    S. HAMMARLING, The numerical solution of the general Gauss-Markov linear model, in Mathematics in Signal Processing, eds. T. S. Durrani et al., Clarendon Press, Oxford (1986).

    50
    N. J. HIGHAM, Efficient algorithms for computing the condition number of a tridiagonal matrix, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 150-165.

    51
    N. J. HIGHAM, A survey of condition number estimation for triangular matrices, SIAM Review, 29 (1987), pp. 575-596.

    52
    N. J. HIGHAM, FORTRAN codes for estimating the one-norm of a real or complex matrix, with applications to condition estimation, ACM Trans. Math. Soft., 14 (1988), pp. 381-396.

    53
    N. J. HIGHAM, Experience with a matrix norm estimator, SIAM J. Sci. Stat. Comput., 11 (1990), pp. 804-809.

    54
    N. J. HIGHAM, Perturbation Theory and Backward Error for AX-XB=C, BIT, 33 (1993), pp. 124-136.

    55
    S. HUSS-LEDERMAN, A. TSAO, AND G. ZHANG, A parallel implementation of the invariant subspace decomposition algorithm for dense symmetric matrices, in Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, SIAM (1993), pp. 367-374.

    56
    T. KATO, Perturbation Theory for Linear Operators, Springer-Verlag, Berlin, 2nd ed., 1980.

    57
    L. KAUFMAN, Banded eigenvalue solvers on vector machines, ACM Trans. Math. Soft., 10 (1984), pp. 73-86.

    58
    C. L. LAWSON, R. J. HANSON, D. KINCAID, AND F. T. KROGH, Basic Linear Algebra Subprograms for FORTRAN usage, ACM Trans. Math. Soft., 5 (1979), pp. 308-323.

    59
    R. LEHOUCQ, The computation of elementary unitary matrices, Computer Science Dept. Technical Report CS-94-233, University of Tennessee, Knoxville, 1994. (LAPACK Working Note 72).

    60
    C. PAIGE, Fast numerically stable computations for generalized linear least squares problems, SIAM J. Num. Anal., 16 (1979), pp. 165-179.

    61
    C. PAIGE, A note on a result of Sun Ji-guang: sensitivity of the CS and GSV decomposition, SIAM J. Num. Anal., 21 (1984), pp. 186-191.

    62
    C. PAIGE, Computing the generalized singular value decomposition, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 1126-1146.

    63
    C. PAIGE, Some aspects of generalized QR factorization, in Reliable Numerical Computations, eds. M. Cox and S. Hammarling, Clarendon Press, Oxford (1990).

    64
    B. PARLETT, The Symmetric Eigenvalue Problem, Prentice Hall, Englewood Cliffs, NJ, 1980.

    65
    M. PAYNE AND B. WICHMANN, Language Independent Arithmetic (LIA) - Part 1: Integer and floating point arithmetic, International Standards Organization, ISO/IEC 10967-1:1994, 1994.

    66
    A. A. POLLICINI, ed., Using Toolpack Software Tools, Kluwer Academic, 1989.

    67
    J. RUTTER, A serial implementation of Cuppen's Divide and Conquer Algorithm for the Symmetric Tridiagonal Eigenproblem, University of California, Computer Science Division Report UCB/CSD 94/799, Berkeley CA (1994). (LAPACK Working Note 69).

    68
    R. SCHREIBER AND C. F. VAN LOAN, A storage efficient WY representation for products of Householder transformations, SIAM J. Sci. Stat. Comput., 10 (1989), pp. 53-57.

    69
    I. SLAPNICAR, Accurate symmetric eigenreduction by a Jacobi method, PhD dissertation, Fernuniversität - Hagen, Hagen, Germany, 1992.

    70
    B. T. SMITH, J. M. BOYLE, J. J. DONGARRA, B. S. GARBOW, Y. IKEBE, V. C. KLEMA, AND C. B. MOLER, Matrix Eigensystem Routines - EISPACK Guide, vol. 6 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, 2 ed., 1976.

    71
    G. W. STEWART, On the sensitivity of the eigenvalue problem $Ax = \lambda Bx$, SIAM J. Num. Anal., 9 (1972), pp. 669-686.

    72
    G. W. STEWART, Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Review, 15 (1973), pp. 727-764.

    73
    G. W. STEWART AND J.-G. SUN, Matrix Perturbation Theory, Academic Press, New York, 1990.

    74
    J.-G. SUN, Perturbation analysis for the generalized singular value problem, SIAM J. Num. Anal., 20 (1983), pp. 611-625.

    75
    J. VARAH, On the separation of two matrices, SIAM J. Num. Anal., 16 (1979), pp. 216-222.

    76
    K. VESELIC AND I. SLAPNICAR, Floating point perturbations of Hermitian matrices, Linear Algebra Appl., to appear.

    77
    D. S. WATKINS AND L. ELSNER, Convergence of algorithms of decomposition type for the eigenvalue problem, Linear Algebra Appl., 143 (1991), pp. 19-47.

    78
    J. H. WILKINSON, The Algebraic Eigenvalue Problem, Oxford University Press, Oxford, 1965.

    79
    J. H. WILKINSON, Some recent advances in numerical linear algebra, in D. A. H. JACOBS, ed., The State of the Art in Numerical Analysis, Academic Press, New York, 1977.

    80
    J. H. WILKINSON, Kronecker's canonical form and the QZ algorithm, Linear Algebra Appl., 28 (1979), pp. 285-303.

    81
    J. H. WILKINSON AND C. REINSCH, eds., Handbook for Automatic Computation, vol. 2: Linear Algebra, Springer-Verlag, Heidelberg, 1971.




    Tue Nov 29 14:03:33 EST 1994

    Index



    absolute error
    How to Measure , How to Measure , How to Measure
    absolute gap
    Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error
    accuracy
    Accuracy and Stability, Accuracy and Stability, Further Details: Error , Further Details: Error
    accuracy!high
    Accuracy and Stability, Improved Error Bounds, Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error
    angle between vectors and subspaces
    How to Measure , How to Measure , How to Measure , Further Details: How , Further Details: How , Further Details: Error , Error Bounds for , Error Bounds for , Error Bounds for , Error Bounds for , Further Details: Error , Error Bounds for
    arguments!ABNRM
    Balancing and Conditioning
    arguments!arrays
    Array Arguments, Array Arguments
    arguments!BALANC
    Balancing and Conditioning
    arguments!BERR
    Further Details: Error
    arguments!description conventions
    Argument Descriptions
    arguments!DIAG
    Unit Triangular Matrices
    arguments!dimensions
    Problem Dimensions
    arguments!FERR
    Error Bounds for , Further Details: Error
    arguments!ILO and IHI
    Balancing, Balancing and Conditioning
    arguments!INFO
    Error Handling and , Invalid Arguments and , Computational Failures and
    arguments!LDA
    Array Arguments, Invalid Arguments and
    arguments!LWORK
    Work Arrays
    arguments!options
    Option Arguments
    arguments!order of
    Order of Arguments
    arguments!RANK
    Further Details: Error
    arguments!RCOND
    How to Measure
    arguments!RCONDE
    Overview, Computing and
    arguments!RCONDV
    Overview, Computing and
    arguments!SCALE
    Balancing and Conditioning
    arguments!UPLO
    Option Arguments
    arguments!work space
    Work Arrays
    ARPACK
    Preface to the
    auxiliary routines
    Levels of Routines
    Auxiliary Routines, index of: see Appendix B
    Notes
    avoiding poor performance
    Poor Performance
    backward error
    Standard Error Analysis, Standard Error Analysis, Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error
    backward stability
    Standard Error Analysis, Overview, Further Details: Error , Further Details: Error , Further Details: Error
    backward stability!componentwise
    Improved Error Bounds, Further Details: Error
    backward stability!normwise
    Standard Error Analysis
    balancing and conditioning, eigenproblems
    Generalized Nonsymmetric Eigenproblems
    balancing of eigenproblems
    Balancing and Conditioning
    BANDR (EISPACK)
    Notes
    BANDV (EISPACK)
    Notes
    basis, orthonormal
    Nonsymmetric Eigenproblems (NEP)
    bidiagonal form
    Eigenvalue Problems, Representation of Orthogonal , Singular Value Decomposition
    BLAS
    LAPACK and the , The BLAS as
    BLAS!Level 1
    The BLAS as
    BLAS!Level 2
    The BLAS as , Block Algorithms and , Block Algorithms and
    BLAS!Level 3
    The BLAS as , Block Algorithms and
    BLAS!Level 3, fast
    Accuracy and Stability, Error Bounds for
    BLAS!quick reference guide: see Appendix C
    Quick Reference
    block algorithm
    Block Algorithms and
    block size
    Installing ILAENV
    block size!determination of
    Determining the Block
    block size!from ILAENV
    Determining the Block
    block size!tuning ILAENV
    Determining the Block
    block width
    Block Algorithms and
    blocked algorithms, performance
    Examples of Block
    BQR (EISPACK)
    Notes
    bug reports
    Support for LAPACK
    cache
    Performance of LAPACK, Data Movement
    CAPSS
    Preface to the
    CBDSQR
    Singular Value Decomposition, Singular Value Decomposition, Eigenvalue Problems, Further Details: Error
    CGBBRD
    Singular Value Decomposition
    CGBCON
    Linear Equations
    CGBEQU
    Linear Equations
    CGBRFS
    Linear Equations
    CGBSVX
    Further Details: How
    CGBTRF
    Linear Equations
    CGBTRS
    Linear Equations
    CGEBAK
    Balancing
    CGEBAL
    Balancing, Balancing, Balancing and Conditioning
    CGEBRD
    Singular Value Decomposition
    CGECON
    Linear Equations
    CGEEQU
    Linear Equations
    CGEES
    Nonsymmetric Eigenproblems (NEP), Poor Performance
    CGEESX
    Nonsymmetric Eigenproblems (NEP), How to Measure , Overview, Poor Performance
    CGEEV
    Nonsymmetric Eigenproblems (NEP), Poor Performance
    CGEEVX
    Nonsymmetric Eigenproblems (NEP), Error Bounds for , Overview, Poor Performance
    CGEGS
    Generalized Nonsymmetric Eigenproblems
    CGEGV
    Generalized Nonsymmetric Eigenproblems
    CGEHRD
    Eigenvalues, Eigenvectors and , Balancing, Generalized Nonsymmetric Eigenproblems
    CGELQF
    Factorization, Singular Value Decomposition
    CGELS
    Linear Least Squares , Error Bounds for , Further Details: Error
    CGELSS
    Linear Least Squares , Linear Least Squares , Error Bounds for , Further Details: Error , Installing ILAENV
    CGELSX
    Linear Least Squares , Linear Least Squares , Error Bounds for , Further Details: Error
    CGEQLF
    Other Factorizations
    CGEQPF
    Factorization with Column
    CGEQRF
    Factorization, Factorization with Column , Singular Value Decomposition
    CGERFS
    Linear Equations
    CGERQF
    Other Factorizations
    CGESVD
    Singular Value Decomposition , Singular Value Decomposition, Further Details: Error , Error Bounds for , Further Details: Error , Installing ILAENV
    CGESVX
    Further Details: How
    CGETRF
    Linear Equations
    CGETRI
    Linear Equations
    CGETRS
    Linear Equations
    CGGBAK
    Generalized Nonsymmetric Eigenproblems, Generalized Nonsymmetric Eigenproblems
    CGGBAL
    Generalized Nonsymmetric Eigenproblems, Generalized Nonsymmetric Eigenproblems
    CGGGLM
    Generalized Linear Least
    CGGHRD
    Generalized Nonsymmetric Eigenproblems
    CGGQRF
    Generalized Factorization
    CGGRQF
    Generalized factorization
    CGGSVD
    Generalized Singular Value , Error Bounds for
    CGGSVP
    Generalized (or Quotient)
    CGTCON
    Linear Equations
    CGTRFS
    Linear Equations
    CGTTRF
    Linear Equations
    CGTTRS
    Linear Equations
    CHBEV
    Error Bounds for
    CHBEVD
    Error Bounds for
    CHBEVX
    Error Bounds for
    CHBGST
    Generalized Symmetric Definite
    CHBGV
    Error Bounds for
    CHBTRD
    Symmetric Eigenproblems, Symmetric Eigenproblems
    CHECON
    Linear Equations
    CHEEV
    Error Bounds for
    CHEEVD
    Error Bounds for
    CHEEVX
    Error Bounds for
    CHEGV
    Standard Error Analysis, Error Bounds for , Further Details: Error
    CHERFS
    Linear Equations
    CHETRD
    Symmetric Eigenproblems
    CHETRF
    Linear Equations
    CHETRI
    Linear Equations
    CHETRS
    Linear Equations
    CHGEQZ
    Generalized Nonsymmetric Eigenproblems
    Cholesky factorization!blocked form
    Block Algorithms and
    Cholesky factorization!split
    Generalized Symmetric Definite
    chordal distance
    Further Details: Error
    CHPCON
    Linear Equations
    CHPEV
    Error Bounds for
    CHPEVD
    Error Bounds for
    CHPEVX
    Error Bounds for
    CHPGV
    Error Bounds for
    CHPRFS
    Linear Equations
    CHPTRD
    Symmetric Eigenproblems
    CHPTRF
    Linear Equations
    CHPTRI
    Linear Equations
    CHPTRS
    Linear Equations
    CHSEIN
    Eigenvalues, Eigenvectors and
    CHSEQR
    Eigenvalues, Eigenvectors and , Balancing, Installing ILAENV, Installing ILAENV, Poor Performance
    cluster!eigenvalues
    Further Details: Error
    cluster!eigenvalues!error bound
    Overview
    cluster!singular values
    Further Details: Error
    complete orthogonal factorization
    Complete Orthogonal Factorization
    computational routines
    Levels of Routines, Computational Routines
    Computational Routines, index of: see Appendix A
    Notes
    COMQR (EISPACK)
    Eigenvalue Problems
    condensed form!reduction to
    Eigenvalue Problems
    condition number
    Invariant Subspaces and , Invariant Subspaces and , How to Measure , Standard Error Analysis, Standard Error Analysis, Error Bounds for , Error Bounds for , Further Details: Error , Further Details: Error , Error Bounds for , Error Bounds for , Error Bounds for , Overview, Overview, Balancing and Conditioning, Balancing and Conditioning, Balancing and Conditioning, Error Bounds for , Error Bounds for , Further Details: Error , Error Bounds for
    condition number!estimate
    Standard Error Analysis, Further Details: Error , Computing and
    Cosine-Sine decomposition
    Generalized Singular Value
    CPBCON
    Linear Equations
    CPBEQU
    Linear Equations
    CPBRFS
    Linear Equations
    CPBSTF
    Generalized Symmetric Definite
    CPBTRF
    Linear Equations
    CPBTRS
    Linear Equations
    CPOCON
    Linear Equations
    CPOEQU
    Linear Equations
    CPORFS
    Linear Equations
    CPOTRF
    Linear Equations
    CPOTRI
    Linear Equations
    CPOTRS
    Linear Equations
    CPPCON
    Linear Equations
    CPPEQU
    Linear Equations
    CPPRFS
    Linear Equations
    CPPTRF
    Linear Equations
    CPPTRI
    Linear Equations
    CPPTRS
    Linear Equations
    CPTCON
    Linear Equations, Further Details: Error
    CPTEQR
    Symmetric Eigenproblems, Further Details: Error
    CPTRFS
    Linear Equations
    CPTTRF
    Linear Equations
    CPTTRS
    Linear Equations
    Crawford's algorithm
    Generalized Symmetric Definite
    crossover point
    Installing ILAENV
    CSPCON
    Linear Equations
    CSPRFS
    Linear Equations
    CSPTRF
    Linear Equations
    CSPTRI
    Linear Equations
    CSPTRS
    Linear Equations
    CSTEDC
    Symmetric Eigenproblems, Eigenvalue Problems
    CSTEIN
    Symmetric Eigenproblems, Symmetric Eigenproblems
    CSTEQR
    Symmetric Eigenproblems, Symmetric Eigenproblems, Eigenvalue Problems
    CSYCON
    Linear Equations
    CSYRFS
    Linear Equations
    CSYTRF
    Linear Equations
    CSYTRI
    Linear Equations
    CSYTRS
    Linear Equations
    CTBCON
    Linear Equations
    CTBRFS
    Linear Equations
    CTBTRS
    Linear Equations
    CTGEVC
    Generalized Nonsymmetric Eigenproblems
    CTGSJA
    Generalized (or Quotient) , Further Details: Error
    CTPCON
    Linear Equations
    CTPRFS
    Linear Equations
    CTPTRI
    Linear Equations
    CTPTRS
    Linear Equations
    CTRCON
    Linear Equations, Further Details: Error
    CTREVC
    Eigenvalues, Eigenvectors and
    CTREXC
    Invariant Subspaces and , Computing and
    CTRRFS
    Linear Equations
    CTRSEN
    Invariant Subspaces and , Overview
    CTRSNA
    Invariant Subspaces and , Overview
    CTRSYL
    Invariant Subspaces and , Computing and
    CTRTRI
    Linear Equations
    CTRTRS
    Linear Equations, Factorization, Factorization
    CTZRQF
    Complete Orthogonal Factorization
    CUNGBR
    Singular Value Decomposition
    CUNGHR
    Eigenvalues, Eigenvectors and
    CUNGLQ
    Factorization
    CUNGQR
    Factorization, Factorization with Column
    CUNMBR
    Singular Value Decomposition
    CUNMHR
    Eigenvalues, Eigenvectors and
    CUNMLQ
    Factorization
    CUNMQR
    Factorization, Factorization, Factorization, Factorization with Column , Generalized Factorization, Generalized factorization
    CUNMRQ
    Generalized Factorization, Generalized factorization
    CUNMTR
    Symmetric Eigenproblems, Symmetric Eigenproblems
    CUPGTR
    Symmetric Eigenproblems
    Cuppen's divide and conquer algorithm
    Symmetric Eigenproblems
    cycle time
    Performance of LAPACK
    data movement
    Data Movement
    DBDSQR
    Singular Value Decomposition, Singular Value Decomposition, Eigenvalue Problems, Further Details: Error
    DDISNA
    Further Details: Error , Further Details: Error
    deflating subspaces
    Generalized Nonsymmetric Eigenproblems , Generalized Nonsymmetric Eigenproblems
    deflating subspaces!error bound
    Error Bounds for
    DGBBRD
    Singular Value Decomposition
    DGBCON
    Linear Equations
    DGBEQU
    Linear Equations
    DGBRFS
    Linear Equations
    DGBSVX
    Further Details: How
    DGBTRF
    Linear Equations
    DGBTRS
    Linear Equations
    DGEBAK
    Balancing
    DGEBAL
    Balancing, Balancing, Balancing and Conditioning
    DGEBRD
    Singular Value Decomposition
    DGECON
    Linear Equations
    DGEEQU
    Linear Equations
    DGEES
    Nonsymmetric Eigenproblems (NEP), Poor Performance
    DGEESX
    Nonsymmetric Eigenproblems (NEP), How to Measure , Overview, Poor Performance
    DGEEV
    Nonsymmetric Eigenproblems (NEP), Poor Performance
    DGEEVX
    Nonsymmetric Eigenproblems (NEP), Error Bounds for , Overview, Poor Performance
    DGEGS
    Generalized Nonsymmetric Eigenproblems
    DGEGV
    Generalized Nonsymmetric Eigenproblems
    DGEHRD
    Eigenvalues, Eigenvectors and , Balancing, Generalized Nonsymmetric Eigenproblems
    DGELQF
    Factorization, Singular Value Decomposition
    DGELS
    Linear Least Squares , Error Bounds for , Further Details: Error
    DGELSS
    Linear Least Squares , Linear Least Squares , Error Bounds for , Further Details: Error , Installing ILAENV
    DGELSX
    Linear Least Squares , Linear Least Squares , Error Bounds for , Further Details: Error
    DGEQLF
    Other Factorizations
    DGEQPF
    Factorization with Column
    DGEQRF
    Factorization, Factorization with Column , Singular Value Decomposition, Factorization
    DGERFS
    Linear Equations
    DGERQF
    Other Factorizations
    DGESVD
    Singular Value Decomposition , Singular Value Decomposition, Further Details: Error , Error Bounds for , Further Details: Error , Installing ILAENV
    DGESVX
    Further Details: How
    DGETRF
    Linear Equations, Factorizations for Solving
    DGETRI
    Linear Equations
    DGETRS
    Linear Equations
    DGGBAK
    Generalized Nonsymmetric Eigenproblems, Generalized Nonsymmetric Eigenproblems
    DGGBAL
    Generalized Nonsymmetric Eigenproblems, Generalized Nonsymmetric Eigenproblems
    DGGGLM
    Generalized Linear Least
    DGGHRD
    Generalized Nonsymmetric Eigenproblems
    DGGQRF
    Generalized Factorization
    DGGRQF
    Generalized factorization
    DGGSVD
    Generalized Singular Value , Error Bounds for
    DGGSVP
    Generalized (or Quotient)
    DGTCON
    Linear Equations
    DGTRFS
    Linear Equations
    DGTTRF
    Linear Equations
    DGTTRS
    Linear Equations
    DHGEQZ
    Generalized Nonsymmetric Eigenproblems
    DHSEIN
    Eigenvalues, Eigenvectors and
    DHSEQR
    Eigenvalues, Eigenvectors and , Balancing, Installing ILAENV, Installing ILAENV, Poor Performance
    distributed memory
    Preface to the
    divide and conquer
    Symmetric Eigenproblems (SEP)
    DLAMCH
    Sources of Error , Further Details: Floating , Points to Note
    documentation, structure
    Structure of the
    DOPGTR
    Symmetric Eigenproblems
    DOPMTR
    Symmetric Eigenproblems
    DORGBR
    Singular Value Decomposition
    DORGHR
    Eigenvalues, Eigenvectors and
    DORGLQ
    Factorization
    DORGQR
    Factorization, Factorization with Column
    DORGTR
    Symmetric Eigenproblems
    DORMBR
    Singular Value Decomposition, Singular Value Decomposition
    DORMHR
    Eigenvalues, Eigenvectors and , Eigenvalues, Eigenvectors and
    DORMLQ
    Factorization, Factorization
    DORMQR
    Factorization, Factorization, Factorization, Factorization with Column , Generalized Factorization, Generalized factorization
    DORMRQ
    Generalized Factorization, Generalized factorization
    DORMTR
    Symmetric Eigenproblems
    DPBCON
    Linear Equations
    DPBEQU
    Linear Equations
    DPBRFS
    Linear Equations
    DPBSTF
    Generalized Symmetric Definite
    DPBTRF
    Linear Equations
    DPBTRS
    Linear Equations
    DPOCON
    Linear Equations
    DPOEQU
    Linear Equations
    DPORFS
    Linear Equations
    DPOTRF
    Linear Equations, Factorizations for Solving
    DPOTRI
    Linear Equations
    DPOTRS
    Linear Equations
    DPPCON
    Linear Equations
    DPPEQU
    Linear Equations
    DPPRFS
    Linear Equations
    DPPTRF
    Linear Equations
    DPPTRI
    Linear Equations
    DPPTRS
    Linear Equations
    DPTCON
    Linear Equations, Further Details: Error
    DPTEQR
    Symmetric Eigenproblems, Further Details: Error
    DPTRFS
    Linear Equations
    DPTTRF
    Linear Equations
    DPTTRS
    Linear Equations
    driver routine!generalized least squares
    Generalized Linear Least
    driver routine!generalized nonsymmetric eigenvalue problem
    Generalized Nonsymmetric Eigenproblems , Generalized Nonsymmetric Eigenproblems
    driver routine!generalized SVD
    Generalized Singular Value
    driver routine!generalized symmetric definite eigenvalue problem
    Generalized Symmetric Definite
    driver routine!linear equations
    Linear Equations
    driver routine!linear least squares
    Linear Least Squares
    driver routine!nonsymmetric eigenvalue problem
    Nonsymmetric Eigenproblems (NEP)
    driver routine!SVD
    Singular Value Decomposition
    driver routine!symmetric eigenvalue problem
    Symmetric Eigenproblems (SEP)
    driver routine!symmetric tridiagonal eigenvalue problem
    Data Types and , Singular Value Decomposition
    driver routines
    Levels of Routines, Driver Routines
    driver routines!divide and conquer
    Symmetric Eigenproblems (SEP)
    driver routines!expert
    Linear Equations, Symmetric Eigenproblems (SEP)
    driver routines!simple
    Linear Equations, Symmetric Eigenproblems (SEP)
    Driver Routines, index of: see Appendix A
    Notes
    DSBEV
    Error Bounds for
    DSBEVD
    Error Bounds for
    DSBEVX
    Error Bounds for
    DSBGST
    Generalized Symmetric Definite
    DSBGV
    Error Bounds for
    DSBTRD
    Symmetric Eigenproblems, Symmetric Eigenproblems
    DSPCON
    Linear Equations
    DSPEV
    Error Bounds for
    DSPEVD
    Error Bounds for
    DSPEVX
    Error Bounds for
    DSPGV
    Error Bounds for
    DSPRFS
    Linear Equations
    DSPTRD
    Symmetric Eigenproblems
    DSPTRF
    Linear Equations
    DSPTRI
    Linear Equations
    DSPTRS
    Linear Equations
    DSTEBZ
    Symmetric Eigenproblems, Further Details: Error
    DSTEDC
    Symmetric Eigenproblems, Eigenvalue Problems
    DSTEIN
    Symmetric Eigenproblems, Symmetric Eigenproblems
    DSTEQR
    Symmetric Eigenproblems, Symmetric Eigenproblems, Eigenvalue Problems
    DSTERF
    Symmetric Eigenproblems, Eigenvalue Problems
    DSTEV
    Error Bounds for
    DSTEVD
    Error Bounds for
    DSTEVX
    Error Bounds for , Further Details: Error
    DSYCON
    Linear Equations
    DSYEV
    Error Bounds for
    DSYEVD
    Error Bounds for
    DSYEVX
    Error Bounds for
    DSYGV
    Standard Error Analysis, Error Bounds for , Further Details: Error
    DSYRFS
    Linear Equations
    DSYTRD
    Symmetric Eigenproblems, Eigenvalue Problems
    DSYTRF
    Linear Equations, Factorizations for Solving
    DSYTRI
    Linear Equations
    DSYTRS
    Linear Equations
    DTBCON
    Linear Equations
    DTBRFS
    Linear Equations
    DTBTRS
    Linear Equations
    DTGEVC
    Generalized Nonsymmetric Eigenproblems
    DTGSJA
    Generalized (or Quotient) , Further Details: Error
    DTPCON
    Linear Equations
    DTPRFS
    Linear Equations
    DTPTRI
    Linear Equations
    DTPTRS
    Linear Equations
    DTRCON
    Linear Equations, Further Details: Error
    DTREVC
    Eigenvalues, Eigenvectors and
    DTREXC
    Invariant Subspaces and , Computing and
    DTRRFS
    Linear Equations
    DTRSEN
    Invariant Subspaces and , Overview
    DTRSNA
    Invariant Subspaces and , Overview
    DTRSYL
    Invariant Subspaces and , Computing and
    DTRTRI
    Linear Equations
    DTRTRS
    Linear Equations, Factorization, Factorization
    DTZRQF
    Complete Orthogonal Factorization
    effective rank
    Factorization with Column
    efficiency
    The BLAS as
    eigendecomposition!blocked form
    Eigenvalue Problems
    eigendecomposition!multishift QR iteration
    Eigenvalue Problems
    eigendecomposition!symmetric
    Error Bounds for
    eigenvalue
    Symmetric Eigenproblems (SEP), Symmetric Eigenproblems (SEP), Symmetric Eigenproblems, Eigenvalues, Eigenvectors and , Eigenvalues, Eigenvectors and , Error Bounds for
    eigenvalue problem!ill-conditioned
    Generalized Nonsymmetric Eigenproblems
    eigenvalue problem!singular
    Generalized Nonsymmetric Eigenproblems
    eigenvalue!error bound
    Error Bounds for , Further Details: Error , Error Bounds for , Overview, Balancing and Conditioning, Error Bounds for , Error Bounds for , Further Details: Error , Further Details: Error , Error Bounds for
    eigenvalue!generalized
    Generalized Nonsymmetric Eigenproblems
    eigenvalue!GNEP
    Generalized Nonsymmetric Eigenproblems
    eigenvalue!GSEP
    Generalized Symmetric Definite
    eigenvalue!infinite
    Generalized Nonsymmetric Eigenproblems
    eigenvalue!NEP
    Nonsymmetric Eigenproblems (NEP)
    eigenvalue!nontrivial
    Generalized Singular Value
    eigenvalue!ordering of
    Nonsymmetric Eigenproblems (NEP), Invariant Subspaces and
    eigenvalue!sensitivity of
    Invariant Subspaces and , Invariant Subspaces and
    eigenvalue!SEP
    Symmetric Eigenproblems
    eigenvalue!trivial
    Generalized Singular Value
    eigenvector
    Symmetric Eigenproblems (SEP), Symmetric Eigenproblems, Error Bounds for
    eigenvector!error bound
    Error Bounds for , Further Details: Error , Error Bounds for , Overview, Balancing and Conditioning, Error Bounds for , Error Bounds for , Further Details: Error , Further Details: Error , Error Bounds for
    eigenvector!GNEP
    Generalized Nonsymmetric Eigenproblems
    eigenvector!GNEP!left
    Generalized Nonsymmetric Eigenproblems
    eigenvector!GNEP!right
    Generalized Nonsymmetric Eigenproblems
    eigenvector!GSEP
    Generalized Symmetric Definite
    eigenvector!left
    Nonsymmetric Eigenproblems (NEP), Eigenvalues, Eigenvectors and , Generalized Nonsymmetric Eigenproblems
    eigenvector!left!generalized
    Generalized Nonsymmetric Eigenproblems
    eigenvector!NEP
    Nonsymmetric Eigenproblems (NEP)
    eigenvector!right
    Nonsymmetric Eigenproblems (NEP), Eigenvalues, Eigenvectors and , Generalized Nonsymmetric Eigenproblems
    eigenvector!right!generalized
    Generalized Nonsymmetric Eigenproblems
    eigenvector!SEP
    Symmetric Eigenproblems
    EISPACK
    LAPACK Compared with , Matrix Storage Schemes, Band Storage, Installing ILAENV
    EISPACK!converting from: see Appendix D
    Converting from LINPACK
    elementary Householder matrix, see Householder matrix
    Factorization, Factorization, Singular Value Decomposition, Representation of Orthogonal , Representation of Orthogonal
    elementary reflector, see Householder matrix
    Factorization, Singular Value Decomposition, Representation of Orthogonal , Representation of Orthogonal
    equality-constrained least squares
    Generalized Linear Least
    equilibration
    Linear Equations, Linear Equations
    errata
    Known Problems in
    error bounds
    Accuracy and Stability
    error bounds!clustered eigenvalues
    Overview
    error bounds!generalized least squares
    Error Bounds for
    error bounds!generalized nonsymmetric eigenproblem
    Error Bounds for
    error bounds!generalized singular value decomposition
    Error Bounds for , Error Bounds for , Further Details: Error
    error bounds!generalized symmetric definite eigenproblem
    Error Bounds for , Further Details: Error
    error bounds!linear equations
    Error Bounds for , Further Details: Error
    error bounds!linear least squares
    Error Bounds for , Further Details: Error
    error bounds!nonsymmetric eigenproblem
    Error Bounds for , Overview
    error bounds!required for fast Level 3 BLAS
    Error Bounds for
    error bounds!singular value decomposition
    Error Bounds for , Further Details: Error
    error bounds!symmetric eigenproblem
    Error Bounds for , Further Details: Error
    error handler, XERBLA
    Error Handling and , Error Handling and , Invalid Arguments and
    error!absolute
    How to Measure , How to Measure , How to Measure
    error!analysis
    Standard Error Analysis
    error!backward
    Standard Error Analysis, Standard Error Analysis, Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error
    error!measurement of
    How to Measure
    error!measurement of!matrix
    How to Measure , How to Measure
    error!measurement of!scalar
    How to Measure , How to Measure
    error!measurement of!subspace
    How to Measure , How to Measure
    error!measurement of!vector
    How to Measure , How to Measure
    error!relative
    Sources of Error , Further Details: Floating , How to Measure , How to Measure , How to Measure , Further Details: How , Standard Error Analysis, Further Details: Error
    failures
    Failures Detected by
    failures!common causes
    Common Errors in
    failures!error handling
    Error Handling and
    failures!INFO
    Error Handling and
    floating-point arithmetic
    Sources of Error
    floating-point arithmetic!guard digit
    Further Details: Floating
    floating-point arithmetic!IEEE standard
    Further Details: Floating
    floating-point arithmetic!infinity
    Further Details: Floating
    floating-point arithmetic!machine precision
    Sources of Error , Further Details: Floating
    floating-point arithmetic!NaN
    Further Details: Floating
    floating-point arithmetic!Not-a-Number
    Further Details: Floating
    floating-point arithmetic!overflow
    Sources of Error , Further Details: Floating , Further Details: Floating , Further Details: Floating , How to Measure , Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error
    floating-point arithmetic!roundoff error
    Further Details: Floating
    floating-point arithmetic!underflow
    Sources of Error , Further Details: Floating , Further Details: Floating , Further Details: Floating
    forward error
    Linear Equations
    forward stability
    Improved Error Bounds
    forward stability!componentwise relative
    Improved Error Bounds
    gap
    Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error
    general linear model problem
    Generalized Factorization
    generalized eigenproblem!nonsymmetric
    Generalized Nonsymmetric Eigenproblems
    generalized eigenproblem!symmetric definite
    Generalized Symmetric Definite
    generalized eigenproblem!symmetric definite banded
    Generalized Symmetric Definite
    generalized Hessenberg form!reduction to
    Generalized Nonsymmetric Eigenproblems
    generalized least squares
    Generalized Linear Least
    generalized orthogonal factorization
    Generalized Orthogonal Factorizations
    generalized Schur vectors
    Generalized Nonsymmetric Eigenproblems
    generalized singular value
    Generalized Singular Value
    generalized singular value decomposition
    Generalized Singular Value , Generalized (or Quotient)
    generalized singular value decomposition!special cases
    Generalized Singular Value
    Givens rotation
    Symmetric Eigenproblems, Singular Value Decomposition, Generalized Symmetric Definite
    GLM
    Generalized Linear Least , Generalized Factorization
    GNEP
    Generalized Nonsymmetric Eigenproblems
    GQR
    Generalized Linear Least , Other Factorizations, Generalized Factorization, Generalized Factorization, Generalized Factorization
    GRQ
    Generalized Linear Least , Generalized factorization, Generalized factorization
    GSEP
    Generalized Symmetric Definite
    GSVD
    Generalized Singular Value , Generalized Singular Value , Generalized (or Quotient)
    guard digit
    Further Details: Floating
    Hessenberg form
    Eigenvalue Problems, Eigenvalue Problems, Balancing
    Hessenberg form!reduction to
    Eigenvalue Problems
    Hessenberg form!upper
    Eigenvalues, Eigenvectors and
    Householder matrix
    Factorization, Representation of Orthogonal
    Householder matrix!complex
    Representation of Orthogonal
    Householder transformation - blocked form
    Factorization
    Householder vector
    Representation of Orthogonal
    HQR (EISPACK)
    Eigenvalue Problems
    HTRID3 (EISPACK)
    Notes
    HTRIDI (EISPACK)
    Notes
    ILAENV
    Determining the Block , Determining the Block , Points to Note, Installing ILAENV, Installing ILAENV, Installing ILAENV, Poor Performance
    ill-conditioned
    Standard Error Analysis
    ill-posed
    Standard Error Analysis
    IMTQL1 (EISPACK)
    Notes
    IMTQL2 (EISPACK)
    Notes
    infinity
    Further Details: Floating
    input error
    Sources of Error
    installation
    Installation of LAPACK
    installation guide
    Points to Note
    installation!ILAENV
    Installing ILAENV
    installation!LAPACK
    Points to Note
    installation!xLAMCH
    Further Details: Floating , Points to Note
    installation!xLAMCH!cost of
    Poor Performance
    invariant subspaces
    Generalized Nonsymmetric Eigenproblems , Invariant Subspaces and , Overview
    invariant subspaces!error bound
    Further Details: Error , Overview
    inverse iteration
    Eigenvalues, Eigenvectors and
    inverse iteration
    Symmetric Eigenproblems
    iterative refinement
    Linear Equations, Error Bounds for
    LAPACK++
    Preface to the
    LDL factorization
    Linear Equations
    linear equations
    Linear Equations
    linear least squares problem
    Factorization with Column
    linear least squares problem
    Linear Least Squares , Orthogonal Factorizations and
    linear least squares problem
    Factorization, Factorization with Column
    linear least squares problem!generalized
    Generalized Linear Least
    linear least squares problem!generalized!equality-constrained (LSE)
    Generalized factorization
    linear least squares problem!generalized!equality-constrained (LSE)
    Generalized Linear Least
    linear least squares problem!generalized!regression model (GLM)
    Generalized Linear Least
    linear least squares problem!overdetermined
    Error Bounds for
    linear least squares problem!overdetermined system
    Further Details: Error
    linear least squares problem!rank-deficient
    Factorization with Column , Complete Orthogonal Factorization, Singular Value Decomposition
    linear least squares problem!regularization
    Further Details: Error
    linear least squares problem!underdetermined
    Further Details: Error
    linear least squares problem!weighted
    Generalized Linear Least
    linear systems, solution of
    Linear Equations, Linear Equations
    LINPACK
    LAPACK Compared with , Block Algorithms and , Matrix Storage Schemes
    LINPACK!converting from: see Appendix D
    Converting from LINPACK
    LLS
    Linear Least Squares , Linear Least Squares
    local memory
    Data Movement
    LQ factorization
    Factorization
    LSE
    Generalized Linear Least , Generalized Linear Least , Generalized factorization
    LU factorization
    Linear Equations
    LU factorization!blocked form
    Factorizations for Solving
    LU factorization!matrix types
    Linear Equations
    machine parameters
    Points to Note, Installing ILAENV
    machine precision
    Sources of Error , Further Details: Floating
    matrix inversion
    Linear Equations
    minimum norm solution
    Linear Least Squares
    minimum norm least squares solution
    Linear Least Squares
    minimum norm solution
    Singular Value Decomposition
    minimum norm solution
    Factorization, Complete Orthogonal Factorization
    multishift QR algorithm, tuning
    Installing ILAENV
    naming scheme
    Naming Scheme
    naming scheme!auxiliary
    Naming Scheme
    naming scheme!driver and computational
    Naming Scheme
    NaN
    Further Details: Floating
    NEP
    Nonsymmetric Eigenproblems (NEP)
    netlib
    Availability of LAPACK
    nonsymmetric eigenproblem
    Eigenvalues, Eigenvectors and
    nonsymmetric eigenproblem!generalized
    Generalized Nonsymmetric Eigenproblems , Generalized Nonsymmetric Eigenproblems
    norm!Frobenius norm
    Further Details: How
    norm!matrix
    How to Measure
    norm!two norm
    Further Details: How
    norm!vector
    How to Measure
    normalization
    Generalized Nonsymmetric Eigenproblems, Generalized Nonsymmetric Eigenproblems
    Not-a-Number
    Further Details: Floating
    Numerical Algorithms Group
    Availability of LAPACK
    numerical error, sources of
    Sources of Error
    numerical error, sources of!input error
    Sources of Error
    numerical error, sources of!roundoff error
    Sources of Error , Further Details: Floating
    orthogonal (unitary) transformation
    Complete Orthogonal Factorization
    orthogonal (unitary) factorizations
    Orthogonal Factorizations and
    orthogonal (unitary) transformation
    Generalized Nonsymmetric Eigenproblems, Eigenvalue Problems
    orthogonal factorization!generalized
    Generalized Orthogonal Factorizations
    overdetermined system
    Linear Least Squares , Linear Least Squares , Further Details: Error , Further Details: Error
    overflow
    Sources of Error , Further Details: Floating , Further Details: Floating , Further Details: Floating , How to Measure , Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error
    parallelism!compiler directives
    Parallelism
    parallelism!loop-based
    Parallelism
    PBLAS
    ScaLAPACK
    pencils
    Generalized Nonsymmetric Eigenproblems
    performance
    Computers for which , Performance of LAPACK
    performance!block size
    Installing ILAENV
    performance!crossover point
    Installing ILAENV
    performance!LWORK
    Poor Performance
    performance!recommendations
    Poor Performance
    performance!sensitivity
    Poor Performance
    permutation
    Generalized Nonsymmetric Eigenproblems
    portability
    Factors that Affect , The BLAS as , Further Details: Floating
    QL factorization
    Other Factorizations
    QL factorization!implicit
    Symmetric Eigenproblems
    QR decomposition!with pivoting
    Generalized (or Quotient)
    QR factorization
    Factorization
    QR factorization!blocked form
    Factorization
    QR factorization!generalized (GQR)
    Generalized Linear Least , Other Factorizations, Generalized Factorization
    QR factorization!implicit
    Symmetric Eigenproblems
    QR factorization!with pivoting
    Factorization with Column
    quotient singular value decomposition
    Generalized Singular Value , Generalized (or Quotient)
    QZ method
    Generalized Nonsymmetric Eigenproblems
    rank!numerical determination of
    Factorization with Column , Generalized (or Quotient) , Error Bounds for
    RATQR (EISPACK)
    Notes
    reduction!bidiagonal
    Singular Value Decomposition
    reduction!tridiagonal
    Symmetric Eigenproblems, Symmetric Eigenproblems
    reduction!upper Hessenberg
    Eigenvalues, Eigenvectors and
    regression, generalized linear
    Generalized Linear Least
    regularization
    Further Details: Error
    relative error
    Sources of Error , Further Details: Floating , How to Measure , How to Measure , How to Measure , Further Details: How , Standard Error Analysis, Further Details: Error
    relative gap
    Further Details: Error , Further Details: Error
    roundoff error
    Sources of Error , Further Details: Floating
    RQ factorization
    Other Factorizations
    RQ factorization!generalized (GRQ)
    Generalized Linear Least , Generalized factorization, Generalized factorization
    RSB (EISPACK)
    Notes
    RST (EISPACK)
    Notes
    s and sep
    Computing and
    SBDSQR
    Singular Value Decomposition, Singular Value Decomposition, Eigenvalue Problems, Further Details: Error
    ScaLAPACK
    ScaLAPACK
    scaling
    Linear Equations, Generalized Nonsymmetric Eigenproblems, Balancing and Conditioning, Balancing and Conditioning
    Schur factorization!generalized
    Generalized Nonsymmetric Eigenproblems
    Schur decomposition!generalized
    Generalized Nonsymmetric Eigenproblems
    Schur factorization
    Nonsymmetric Eigenproblems (NEP), Eigenvalues, Eigenvectors and
    Schur factorization!generalized
    Generalized Nonsymmetric Eigenproblems
    Schur form
    Eigenvalues, Eigenvectors and , Balancing, Invariant Subspaces and , Invariant Subspaces and
    Schur form!generalized
    Generalized Nonsymmetric Eigenproblems, Generalized Nonsymmetric Eigenproblems
    Schur vectors
    Nonsymmetric Eigenproblems (NEP), Eigenvalues, Eigenvectors and
    Schur vectors!generalized
    Generalized Nonsymmetric Eigenproblems , Generalized Nonsymmetric Eigenproblems
    SDISNA
    Further Details: Error , Further Details: Error
    SEP
    Symmetric Eigenproblems (SEP), Symmetric Eigenproblems
    separation of matrices
    Computing and
    SGBBRD
    Singular Value Decomposition
    SGBCON
    Linear Equations
    SGBEQU
    Linear Equations
    SGBRFS
    Linear Equations
    SGBSVX
    Further Details: How
    SGBTRF
    Linear Equations
    SGBTRS
    Linear Equations
    SGEBAK
    Balancing
    SGEBAL
    Balancing, Balancing, Balancing and Conditioning
    SGEBRD
    Singular Value Decomposition, Eigenvalue Problems, Installing ILAENV
    SGECON
    Linear Equations
    SGEEQU
    Linear Equations
    SGEES
    Nonsymmetric Eigenproblems (NEP), Poor Performance
    SGEESX
    Nonsymmetric Eigenproblems (NEP), How to Measure , Overview, Poor Performance
    SGEEV
    Nonsymmetric Eigenproblems (NEP), Poor Performance
    SGEEVX
    Nonsymmetric Eigenproblems (NEP), Error Bounds for , Overview, Poor Performance
    SGEGS
    Generalized Nonsymmetric Eigenproblems
    SGEGV
    Generalized Nonsymmetric Eigenproblems
    SGEHRD
    Eigenvalues, Eigenvectors and , Balancing, Generalized Nonsymmetric Eigenproblems, Eigenvalue Problems
    SGELQF
    Factorization, Singular Value Decomposition
    SGELS
    Linear Least Squares , Error Bounds for , Further Details: Error
    SGELSS
    Linear Least Squares , Linear Least Squares , Error Bounds for , Error Bounds for , Further Details: Error , Installing ILAENV
    SGELSX
    Linear Least Squares , Linear Least Squares , Error Bounds for , Error Bounds for , Further Details: Error
    SGEQLF
    Other Factorizations
    SGEQPF
    Factorization with Column
    SGEQRF
    Factorization, Factorization with Column , Singular Value Decomposition, Factorization, Installing ILAENV, Installing ILAENV
    SGERFS
    Linear Equations
    SGERQF
    Other Factorizations
    SGESV
    Error Bounds for , Invalid Arguments and
    SGESVD
    Singular Value Decomposition , Singular Value Decomposition, Further Details: Error , Error Bounds for , Further Details: Error , Installing ILAENV
    SGESVX
    Further Details: How , Error Bounds for , Computational Failures and , Computational Failures and
    SGETRF
    Linear Equations, Factorizations for Solving
    SGETRI
    Linear Equations
    SGETRS
    Linear Equations
    SGGBAK
    Generalized Nonsymmetric Eigenproblems, Generalized Nonsymmetric Eigenproblems
    SGGBAL
    Generalized Nonsymmetric Eigenproblems, Generalized Nonsymmetric Eigenproblems
    SGGGLM
    Generalized Linear Least
    SGGHRD
    Generalized Nonsymmetric Eigenproblems
    SGGQRF
    Generalized Factorization
    SGGRQF
    Generalized factorization
    SGGSVD
    Generalized Singular Value , Error Bounds for
    SGGSVP
    Generalized (or Quotient)
    SGTCON
    Linear Equations
    SGTRFS
    Linear Equations
    SGTTRF
    Linear Equations
    SGTTRS
    Linear Equations
    shared memory
    Factors that Affect
    SHGEQZ
    Generalized Nonsymmetric Eigenproblems
    SHSEIN
    Eigenvalues, Eigenvectors and
    SHSEQR
    Eigenvalues, Eigenvectors and , Balancing, Installing ILAENV, Installing ILAENV, Poor Performance
    similarity transformation
    Balancing
    singular value
    Singular Value Decomposition
    singular value decomposition (SVD)
    Singular Value Decomposition
    singular value decomposition!generalized
    Generalized Singular Value
    singular value decomposition (SVD)
    Singular Value Decomposition
    singular value decomposition!generalized
    Generalized Singular Value , Generalized (or Quotient)
    singular value decomposition!generalized!special cases
    Generalized Singular Value
    singular value!error bound
    Error Bounds for , Further Details: Error , Further Details: Error , Error Bounds for , Further Details: Error
    singular value!generalized
    Generalized Singular Value
    singular vector!error bound
    Error Bounds for , Further Details: Error , Further Details: Error
    singular vectors!left
    Singular Value Decomposition , Singular Value Decomposition
    singular vectors!right
    Singular Value Decomposition , Singular Value Decomposition
    SLAMCH
    Sources of Error , Further Details: Floating , Points to Note, Poor Performance
    SOPGTR
    Symmetric Eigenproblems
    SOPMTR
    Symmetric Eigenproblems
    SORGBR
    Singular Value Decomposition
    SORGHR
    Eigenvalues, Eigenvectors and
    SORGLQ
    Factorization
    SORGQR
    Factorization, Factorization with Column
    SORGTR
    Symmetric Eigenproblems
    SORMBR
    Singular Value Decomposition, Singular Value Decomposition
    SORMHR
    Eigenvalues, Eigenvectors and , Eigenvalues, Eigenvectors and
    SORMLQ
    Factorization, Factorization
    SORMQR
    Factorization, Factorization, Factorization, Factorization with Column , Generalized Factorization, Generalized factorization
    SORMRQ
    Generalized Factorization, Generalized factorization
    SORMTR
    Symmetric Eigenproblems
    source code
    Availability of LAPACK
    SPBCON
    Linear Equations
    SPBEQU
    Linear Equations
    SPBRFS
    Linear Equations
    SPBSTF
    Generalized Symmetric Definite
    SPBTRF
    Linear Equations
    SPBTRS
    Linear Equations
    spectral factorization
    Symmetric Eigenproblems (SEP)
    spectral projector
    Computing and
    split Cholesky factorization
    Generalized Symmetric Definite
    SPOCON
    Linear Equations
    SPOEQU
    Linear Equations
    SPOFA (LINPACK)
    Block Algorithms and , Block Algorithms and
    SPORFS
    Linear Equations
    SPOTRF
    Linear Equations, Block Algorithms and , Factorizations for Solving
    SPOTRI
    Linear Equations
    SPOTRS
    Linear Equations
    SPPCON
    Linear Equations
    SPPEQU
    Linear Equations
    SPPRFS
    Linear Equations
    SPPTRF
    Linear Equations
    SPPTRI
    Linear Equations
    SPPTRS
    Linear Equations
    SPTCON
    Linear Equations, Further Details: Error
    SPTEQR
    Symmetric Eigenproblems, Further Details: Error
    SPTRFS
    Linear Equations
    SPTTRF
    Linear Equations
    SPTTRS
    Linear Equations
    SSBEV
    Error Bounds for
    SSBEVD
    Error Bounds for
    SSBEVX
    Error Bounds for
    SSBGST
    Generalized Symmetric Definite
    SSBGV
    Error Bounds for
    SSBTRD
    Symmetric Eigenproblems, Symmetric Eigenproblems
    SSPCON
    Linear Equations
    SSPEV
    Error Bounds for
    SSPEVD
    Error Bounds for
    SSPEVX
    Error Bounds for
    SSPGV
    Error Bounds for
    SSPRFS
    Linear Equations
    SSPTRD
    Symmetric Eigenproblems
    SSPTRF
    Linear Equations
    SSPTRI
    Linear Equations
    SSPTRS
    Linear Equations
    SSTEBZ
    Symmetric Eigenproblems, Further Details: Error
    SSTEDC
    Symmetric Eigenproblems, Eigenvalue Problems
    SSTEIN
    Symmetric Eigenproblems, Symmetric Eigenproblems
    SSTEQR
    Symmetric Eigenproblems, Symmetric Eigenproblems, Eigenvalue Problems
    SSTERF
    Symmetric Eigenproblems, Eigenvalue Problems
    SSTEV
    Error Bounds for
    SSTEVD
    Error Bounds for
    SSTEVX
    Error Bounds for , Further Details: Error
    SSYCON
    Linear Equations
    SSYEV
    Error Bounds for
    SSYEVD
    Error Bounds for
    SSYEVX
    Error Bounds for
    SSYGV
    Standard Error Analysis, Error Bounds for , Further Details: Error
    SSYRFS
    Linear Equations
    SSYTRD
    Symmetric Eigenproblems, Eigenvalue Problems, Eigenvalue Problems
    SSYTRF
    Linear Equations, Factorizations for Solving , Factorizations for Solving
    SSYTRI
    Linear Equations
    SSYTRS
    Linear Equations
    stability
    Orthogonal Factorizations and , Accuracy and Stability
    stability!backward
    Standard Error Analysis, Standard Error Analysis, Improved Error Bounds, Overview, Further Details: Error , Further Details: Error , Further Details: Error , Further Details: Error
    stability!forward
    Improved Error Bounds
    STBCON
    Linear Equations
    STBRFS
    Linear Equations
    STBTRS
    Linear Equations
    STGEVC
    Generalized Nonsymmetric Eigenproblems
    STGSJA
    Generalized (or Quotient) , Further Details: Error
    storage scheme
    Matrix Storage Schemes
    storage scheme!band
    Band Storage
    storage scheme!band LU
    Band Storage
    storage scheme!bidiagonal
    Tridiagonal and Bidiagonal
    storage scheme!conventional
    Conventional Storage
    storage scheme!diagonal of Hermitian matrix
    Real Diagonal Elements
    storage scheme!Hermitian
    Conventional Storage
    storage scheme!orthogonal or unitary matrices
    Representation of Orthogonal
    storage scheme!packed
    Packed Storage
    storage scheme!symmetric
    Conventional Storage
    storage scheme!symmetric tridiagonal
    Tridiagonal and Bidiagonal
    storage scheme!triangular
    Conventional Storage
    storage scheme!unsymmetric tridiagonal
    Tridiagonal and Bidiagonal
    STPCON
    Linear Equations
    STPRFS
    Linear Equations
    STPTRI
    Linear Equations
    STPTRS
    Linear Equations
    Strassen's method
    Error Bounds for , Error Bounds for
    STRCON
    Linear Equations, Further Details: Error
    STREVC
    Eigenvalues, Eigenvectors and
    STREXC
    Invariant Subspaces and , Computing and
    STRRFS
    Linear Equations
    STRSEN
    Invariant Subspaces and , Overview
    STRSNA
    Invariant Subspaces and , Overview
    STRSYL
    Invariant Subspaces and , Computing and
    STRTRI
    Linear Equations
    STRTRS
    Linear Equations, Factorization, Factorization
    STZRQF
    Complete Orthogonal Factorization
    subspaces
    Further Details: How
    subspaces!angle between
    How to Measure , How to Measure , How to Measure , Further Details: How , Further Details: How , Further Details: Error , Error Bounds for , Error Bounds for , Error Bounds for , Error Bounds for , Further Details: Error , Error Bounds for
    subspaces!deflating
    Generalized Nonsymmetric Eigenproblems , Generalized Nonsymmetric Eigenproblems
    subspaces!invariant
    Generalized Nonsymmetric Eigenproblems , Invariant Subspaces and
    support
    Support for LAPACK
    SVD
    Singular Value Decomposition
    Sylvester equation
    Standard Error Analysis, Computing and , Computing and
    Sylvester equation
    Invariant Subspaces and
    symmetric eigenproblem
    Symmetric Eigenproblems
    symmetric indefinite factorization
    Linear Equations
    symmetric indefinite factorization!blocked form
    Factorizations for Solving
    TQL1 (EISPACK)
    Notes
    TQL2 (EISPACK)
    Notes
    TQLRAT (EISPACK)
    Notes
    TRED1 (EISPACK)
    Notes
    TRED2 (EISPACK)
    Notes
    TRED3 (EISPACK)
    Notes
    tridiagonal form
    Eigenvalue Problems
    tridiagonal form
    Symmetric Eigenproblems, Symmetric Eigenproblems, Symmetric Eigenproblems, Further Details: Error , Further Details: Error , Representation of Orthogonal
    troubleshooting
    Troubleshooting
    tuning!block multishift QR: NS, MAXB
    Installing ILAENV
    tuning!block size: NB, NBMIN, and NX
    Installing ILAENV
    tuning!SVD: NXSVD
    Installing ILAENV
    underdetermined system
    Linear Least Squares , Linear Least Squares , Factorization, Further Details: Error
    underflow
    Sources of Error , Further Details: Floating , Further Details: Floating , Further Details: Floating
    upper Hessenberg form
    Eigenvalues, Eigenvectors and
    vector registers
    Data Movement
    vectorization
    Vectorization
    workstation, super-scalar
    Factors that Affect
    wrong results
    Wrong Results
    XERBLA
    Error Handling and , Invalid Arguments and , Invalid Arguments and
    xnetlib
    Availability of LAPACK
    ZBDSQR
    Singular Value Decomposition, Singular Value Decomposition, Eigenvalue Problems, Further Details: Error
    ZGBBRD
    Singular Value Decomposition
    ZGBCON
    Linear Equations
    ZGBEQU
    Linear Equations
    ZGBRFS
    Linear Equations
    ZGBSVX
    Further Details: How
    ZGBTRF
    Linear Equations
    ZGBTRS
    Linear Equations
    ZGEBAK
    Balancing
    ZGEBAL
    Balancing, Balancing, Balancing and Conditioning
    ZGEBRD
    Singular Value Decomposition
    ZGECON
    Linear Equations
    ZGEEQU
    Linear Equations
    ZGEES
    Nonsymmetric Eigenproblems (NEP), Poor Performance
    ZGEESX
    Nonsymmetric Eigenproblems (NEP), How to Measure , Overview, Poor Performance
    ZGEEV
    Nonsymmetric Eigenproblems (NEP), Poor Performance
    ZGEEVX
    Nonsymmetric Eigenproblems (NEP), Error Bounds for , Overview, Poor Performance
    ZGEGS
    Generalized Nonsymmetric Eigenproblems
    ZGEGV
    Generalized Nonsymmetric Eigenproblems
    ZGEHRD
    Eigenvalues, Eigenvectors and , Balancing, Generalized Nonsymmetric Eigenproblems
    ZGELQF
    Factorization, Singular Value Decomposition
    ZGELS
    Linear Least Squares , Error Bounds for , Further Details: Error
    ZGELSS
    Linear Least Squares , Linear Least Squares , Error Bounds for , Further Details: Error , Installing ILAENV
    ZGELSX
    Linear Least Squares , Linear Least Squares , Error Bounds for , Further Details: Error
    ZGEQLF
    Other Factorizations
    ZGEQPF
    Factorization with Column
    ZGEQRF
    Factorization, Factorization with Column , Singular Value Decomposition
    ZGERFS
    Linear Equations
    ZGERQF
    Other Factorizations
    ZGESVD
    Singular Value Decomposition , Singular Value Decomposition, Further Details: Error , Error Bounds for , Further Details: Error , Installing ILAENV
    ZGESVX
    Further Details: How
    ZGETRF
    Linear Equations
    ZGETRI
    Linear Equations
    ZGETRS
    Linear Equations
    ZGGBAK
    Generalized Nonsymmetric Eigenproblems, Generalized Nonsymmetric Eigenproblems
    ZGGBAL
    Generalized Nonsymmetric Eigenproblems, Generalized Nonsymmetric Eigenproblems
    ZGGGLM
    Generalized Linear Least
    ZGGHRD
    Generalized Nonsymmetric Eigenproblems
    ZGGQRF
    Generalized Factorization
    ZGGRQF
    Generalized factorization
    ZGGSVD
    Generalized Singular Value , Error Bounds for
    ZGGSVP
    Generalized (or Quotient)
    ZGTCON
    Linear Equations
    ZGTRFS
    Linear Equations
    ZGTTRF
    Linear Equations
    ZGTTRS
    Linear Equations
    ZHBEV
    Error Bounds for
    ZHBEVD
    Error Bounds for
    ZHBEVX
    Error Bounds for
    ZHBGST
    Generalized Symmetric Definite
    ZHBGV
    Error Bounds for
    ZHBTRD
    Symmetric Eigenproblems, Symmetric Eigenproblems
    ZHECON
    Linear Equations
    ZHEEV
    Error Bounds for
    ZHEEVD
    Error Bounds for
    ZHEEVX
    Error Bounds for
    ZHEGV
    Standard Error Analysis, Error Bounds for , Further Details: Error
    ZHERFS
    Linear Equations
    ZHETRD
    Symmetric Eigenproblems
    ZHETRF
    Linear Equations
    ZHETRI
    Linear Equations
    ZHETRS
    Linear Equations
    ZHGEQZ
    Generalized Nonsymmetric Eigenproblems
    ZHPCON
    Linear Equations
    ZHPEV
    Error Bounds for
    ZHPEVD
    Error Bounds for
    ZHPEVX
    Error Bounds for
    ZHPGV
    Error Bounds for
    ZHPRFS
    Linear Equations
    ZHPTRD
    Symmetric Eigenproblems
    ZHPTRF
    Linear Equations
    ZHPTRI
    Linear Equations
    ZHPTRS
    Linear Equations
    ZHSEIN
    Eigenvalues, Eigenvectors and
    ZHSEQR
    Eigenvalues, Eigenvectors and , Balancing, Installing ILAENV, Installing ILAENV, Poor Performance
    ZPBCON
    Linear Equations
    ZPBEQU
    Linear Equations
    ZPBRFS
    Linear Equations
    ZPBSTF
    Generalized Symmetric Definite
    ZPBTRF
    Linear Equations
    ZPBTRS
    Linear Equations
    ZPOCON
    Linear Equations
    ZPOEQU
    Linear Equations
    ZPORFS
    Linear Equations
    ZPOTRF
    Linear Equations
    ZPOTRI
    Linear Equations
    ZPOTRS
    Linear Equations
    ZPPCON
    Linear Equations
    ZPPEQU
    Linear Equations
    ZPPRFS
    Linear Equations
    ZPPTRF
    Linear Equations
    ZPPTRI
    Linear Equations
    ZPPTRS
    Linear Equations
    ZPTCON
    Linear Equations, Further Details: Error
    ZPTEQR
    Symmetric Eigenproblems, Further Details: Error
    ZPTRFS
    Linear Equations
    ZPTTRF
    Linear Equations
    ZPTTRS
    Linear Equations
    ZSPCON
    Linear Equations
    ZSPRFS
    Linear Equations
    ZSPTRF
    Linear Equations
    ZSPTRI
    Linear Equations
    ZSPTRS
    Linear Equations
    ZSTEDC
    Symmetric Eigenproblems, Eigenvalue Problems
    ZSTEIN
    Symmetric Eigenproblems, Symmetric Eigenproblems
    ZSTEQR
    Symmetric Eigenproblems, Symmetric Eigenproblems, Eigenvalue Problems
    ZSYCON
    Linear Equations
    ZSYRFS
    Linear Equations
    ZSYTRF
    Linear Equations
    ZSYTRI
    Linear Equations
    ZSYTRS
    Linear Equations
    ZTBCON
    Linear Equations
    ZTBRFS
    Linear Equations
    ZTBTRS
    Linear Equations
    ZTGEVC
    Generalized Nonsymmetric Eigenproblems
    ZTGSJA
    Generalized (or Quotient) , Further Details: Error
    ZTPCON
    Linear Equations
    ZTPRFS
    Linear Equations
    ZTPTRI
    Linear Equations
    ZTPTRS
    Linear Equations
    ZTRCON
    Linear Equations, Further Details: Error
    ZTREVC
    Eigenvalues, Eigenvectors and
    ZTREXC
    Invariant Subspaces and , Computing and
    ZTRRFS
    Linear Equations
    ZTRSEN
    Invariant Subspaces and , Overview
    ZTRSNA
    Invariant Subspaces and , Overview
    ZTRSYL
    Invariant Subspaces and , Computing and
    ZTRTRI
    Linear Equations
    ZTRTRS
    Linear Equations, Factorization, Factorization
    ZTZRQF
    Complete Orthogonal Factorization
    ZUNGBR
    Singular Value Decomposition
    ZUNGHR
    Eigenvalues, Eigenvectors and
    ZUNGLQ
    Factorization
    ZUNGQR
    Factorization, Factorization with Column
    ZUNMBR
    Singular Value Decomposition
    ZUNMHR
    Eigenvalues, Eigenvectors and
    ZUNMLQ
    Factorization
    ZUNMQR
    Factorization, Factorization, Factorization, Factorization with Column , Generalized Factorization, Generalized factorization
    ZUNMRQ
    Generalized Factorization, Generalized factorization
    ZUNMTR
    Symmetric Eigenproblems, Symmetric Eigenproblems
    ZUPGTR
    Symmetric Eigenproblems



    Tue Nov 29 14:03:33 EST 1994

    Installation of LAPACK

     

    A Quick Installation Guide (LAPACK Working Note 81) [35] is distributed with the complete package; it provides installation instructions for Unix systems. A comprehensive Installation Guide [3] (LAPACK Working Note 41), which describes the testing and timing programs and gives detailed installation instructions for non-Unix systems, is also available. See also Chapter 6.




    About this document ...

    LAPACK Users' Guide
    Release 2.0

    This document was generated using the LaTeX2HTML translator Version 0.6.4 (Tues Aug 30 1994) Copyright © 1993, 1994, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

    The command line arguments were:
    latex2html lapack_lug.tex.

    The translation was initiated by on Tue Nov 29 14:03:33 EST 1994



    Support for LAPACK

     

    LAPACK was thoroughly tested on many different types of computers before release. The LAPACK project supports the package in the sense that reports of errors or poor performance will receive immediate attention from the developers. Such reports, as well as descriptions of interesting applications and other comments, should be sent to:

    LAPACK Project
    c/o J. J. Dongarra
    Computer Science Department
    University of Tennessee
    Knoxville, TN 37996-1301
    USA
    Email: lapack@cs.utk.edu




    Known Problems in LAPACK

    A list of known problems, bugs, and compiler errors for LAPACK, as well as an errata list for this guide, is maintained on netlib. To obtain a copy of this report, send netlib an email message of the form:

    send release_notes from lapack




    Other Related Software

     

    As mentioned in the Preface, many LAPACK-related software projects are available on netlib. Several of these require further discussion in the context of this users' guide: LAPACK++, CLAPACK, ScaLAPACK, and LAPACK routines exploiting IEEE arithmetic.






    LAPACK++

    LAPACK++ is an object-oriented C++ extension to the LAPACK library. Traditionally, linear algebra libraries have been available only in Fortran. However, with an increasing number of programmers using C and C++ for scientific software development, there is a need for high-quality numerical libraries on these platforms as well. LAPACK++ offers speed and efficiency competitive with native Fortran codes, while allowing programmers to capitalize on the software engineering benefits of object-oriented programming.

    LAPACK++ supports various matrix classes for vectors, nonsymmetric matrices, symmetric positive definite matrices, symmetric matrices, and banded, triangular, and tridiagonal matrices; however, the current version does not include all of the capabilities of the original Fortran 77 LAPACK. Emphasis is given to routines for solving nonsymmetric and symmetric positive definite linear systems and linear least squares problems. Future versions of LAPACK++ will support eigenvalue problems and singular value decompositions, as well as distributed matrix classes for parallel computer architectures. For a more detailed description of the design of LAPACK++, see [36]. This paper, as well as an installation manual and users' guide, is available on netlib. To obtain this software or documentation, send a message of the following form to netlib@ornl.gov:

    send index from c++/lapack++

    Questions and comments about LAPACK++ can be directed to lapackpp@cs.utk.edu.




    CLAPACK

    The CLAPACK library was built using a Fortran-to-C conversion utility called f2c [40]. The entire Fortran 77 LAPACK library was run through f2c to obtain C code, which was then modified to improve readability. The goal of CLAPACK is to provide LAPACK to users who do not have access to a Fortran compiler.

    However, f2c is designed to create C code that is still callable from Fortran, so all arguments must be passed using Fortran calling conventions and data structures. This requirement has several repercussions. First, because many compilers require distinct name spaces for Fortran and C routines, an underscore (_) is appended to the names of C routines that will be called from Fortran. Accordingly, f2c has added this underscore to all the names in CLAPACK, so a call that in Fortran would look like:

       call dgetrf(...)
    becomes in C:
       dgetrf_(...);
    Second, the user must pass ALL arguments by reference, that is, as pointers, since this is how Fortran passes arguments. This includes all scalar arguments such as M and N, so you cannot pass numeric constants directly in the argument list. For example, consider the LU factorization of a 5-by-5 matrix. If the matrix to be factored is called A, the Fortran call
       call dgetrf(5, 5, A, 5, ipiv, info)
    becomes in C:
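       integer M, N, LDA, info, ipiv[5];  /* integer is the f2c type from f2c.h */
       double A[5*5];                      /* 5-by-5 matrix, column-major */
       M = N = LDA = 5;
       dgetrf_(&M, &N, A, &LDA, ipiv, &info);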

    Some LAPACK routines take character string arguments. In all but the testing and timing code, only the first character of the string is significant. Therefore, the CLAPACK driver, computational, and auxiliary routines only expect single character arguments. For example, the Fortran call

       call dpotrf( 'Upper', n, a, lda, info )
    becomes in C:
       char s = 'U';
       dpotrf_(&s, &n, a, &lda, &info);

    In a future release we hope to provide ``wrapper'' routines that will remove the need for these unnecessary pointers, and automatically allocate (``malloc'') any workspace that is required.

    As a final point, we must stress that there is a difference in the definition of a two-dimensional array in Fortran and C. A two-dimensional Fortran array declared as

       DOUBLE PRECISION A(LDA, N)
    is a contiguous piece of LDA × N double-words of memory, stored in column-major order: elements in a column are contiguous, and elements within a row are separated by a stride of LDA double-words.

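    In C, however, a two-dimensional array is stored in row-major order. The array

       double A[LDA][N];
    occupies LDA × N contiguous double-words, but it is laid out row by row: elements within a row are contiguous, and elements within a column are separated by a stride of N, so a CLAPACK routine expecting column-major storage would read the entries in the wrong order. Worse, a two-dimensional array assembled from separately allocated rows (for example, a double ** array of row pointers) need not be contiguous at all; its rows can in principle be anywhere in memory. Passing either kind of two-dimensional C array to a CLAPACK routine will almost surely give erroneous results.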

    Instead, you must use a one-dimensional C array of size LDA × N double-words (or else malloc the same amount of space). We recommend using the following code to get the array CLAPACK will be expecting:

       double *A;
       A = malloc( LDA*N*sizeof(double) );
    Note that for best memory utilization, you would set LDA = M, the actual number of rows of A. If you now wish to operate on the matrix A, remember that A is in column-major order. As an example of accessing Fortran-style arrays in C, the following code fragments show how to initialize the array A declared above so that every entry of column j (counting from 0, as in C) has the value j:
       double *ptr;
       ptr = A;
       for(j=0; j < N; j++)
       {
          for (i=0; i < M; i++) *ptr++ = j;
          ptr += (LDA - M);
       }
    or, you can use:
       for(j=0; j < N; j++)
       {
          for (i=0; i < M; i++) A[j*LDA+i] = j;
       }
    Note that the loop over the row index i is the inner loop, since column entries are contiguous.
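    The same column-major convention can be packaged in a couple of helpers. The C sketch below is illustrative only: COLMAJ, rowmajor_to_colmajor and fortran_elem are hypothetical names, not part of CLAPACK. It copies an ordinary row-major C matrix into the one-dimensional column-major buffer that CLAPACK routines expect (with any leading dimension LDA ≥ M) and reads entries back using Fortran's 1-based A(i,j) indexing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: 0-based offset of entry (i, j) in a column-major
   array with leading dimension lda. Moving down a column advances by 1,
   moving along a row advances by lda. */
#define COLMAJ(i, j, lda) ((size_t)(j) * (size_t)(lda) + (size_t)(i))

/* Copy an m-by-n row-major C matrix rm (row i starts at rm + i*n) into a
   newly allocated column-major buffer with leading dimension lda >= m.
   Rows m..lda-1 of each column are padding and are zero-filled by calloc.
   Returns NULL if allocation fails (calloc may also return NULL when n is 0);
   the caller frees the result. */
double *rowmajor_to_colmajor(const double *rm, int m, int n, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= m && lda >= 1);
    double *cm = calloc((size_t)lda * (size_t)n, sizeof *cm);
    if (cm == NULL)
        return NULL;
    for (int j = 0; j < n; j++)        /* outer loop over columns     */
        for (int i = 0; i < m; i++)    /* inner loop runs down column */
            cm[COLMAJ(i, j, lda)] = rm[(size_t)i * (size_t)n + (size_t)j];
    return cm;
}

/* Fortran-style access: fortran_elem(a, lda, i, j) is the entry that a
   Fortran routine would call A(i,j), with 1 <= i <= lda and 1 <= j. */
double fortran_elem(const double *a, int lda, int i, int j)
{
    assert(i >= 1 && i <= lda && j >= 1);
    return a[COLMAJ(i - 1, j - 1, lda)];
}
```

    With LDA = 4 and M = 2, for example, each column occupies four consecutive doubles, of which the last two are padding; setting LDA = M removes the padding, as recommended above.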





    ScaLAPACK




    The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines redesigned for distributed memory parallel computers. It is currently written in a Single-Program-Multiple-Data style using explicit message passing for interprocessor communication. It assumes matrices are laid out in a two-dimensional block cyclic decomposition. The goal is to have ScaLAPACK routines resemble their LAPACK equivalents as much as possible. Just as LAPACK is built on top of the BLAS, ScaLAPACK relies on the PBLAS (Parallel Basic Linear Algebra Subprograms) and the BLACS (Basic Linear Algebra Communication Subprograms). The PBLAS perform computations analogous to the BLAS but on matrices distributed across multiple processors. The PBLAS rely on the communication protocols of the BLACS. The BLACS are designed for linear algebra applications and provide portable communication across a wide variety of distributed-memory architectures. At the present time, they are available for the Intel Gamma, Delta, and Paragon, Thinking Machines CM-5, IBM SPs, and PVM. They will soon be available for the CRAY T3D. For more information:
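    The two-dimensional block cyclic layout can be understood one dimension at a time. The C code below is a sketch under stated assumptions (0-based global and local indices, a block size nb, and the first block owned by process 0); the function names are hypothetical and are not part of the ScaLAPACK, PBLAS, or BLACS interfaces. In a P-by-Q process grid, applying block_cyclic_map to a row index (with the row block size and P) and to a column index (with the column block size and Q) gives the grid coordinates of the process that owns a matrix entry and the entry's position in that process's local array.

```c
#include <assert.h>

/* Location of one global index under a 1-D block cyclic distribution. */
typedef struct {
    int proc;   /* process coordinate (0-based) that owns the index */
    int local;  /* 0-based index within that process's local array  */
} bc_loc;

/* Blocks of nb consecutive indices are dealt out to nprocs processes in
   round-robin order: block b goes to process b % nprocs, where it is the
   (b / nprocs)-th local block. Assumes block 0 lives on process 0. */
bc_loc block_cyclic_map(int g, int nb, int nprocs)
{
    assert(g >= 0 && nb > 0 && nprocs > 0);
    int block = g / nb;                 /* global block containing g */
    bc_loc loc;
    loc.proc  = block % nprocs;
    loc.local = (block / nprocs) * nb + g % nb;
    return loc;
}

/* How many of the n global indices process p owns (e.g. to size its
   local array): full blocks of nb, plus any trailing partial block. */
int block_cyclic_count(int n, int nb, int p, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0 && p >= 0 && p < nprocs);
    int count = 0;
    for (int start = p * nb; start < n; start += nprocs * nb)
        count += (n - start < nb) ? (n - start) : nb;
    return count;
}
```

    For example, with nb = 2, three processes and n = 7, global indices {0, 1, 6} land on process 0, {2, 3} on process 1 and {4, 5} on process 2. A full implementation would also need to allow the first block to start on a process other than 0.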

    echo "send index from scalapack" | mail netlib@ornl.gov
    All questions/comments can be directed to scalapack@cs.utk.edu.





    LAPACK routines exploiting IEEE arithmetic




    We have also explored the advantages of IEEE arithmetic in implementing linear algebra routines. First, the accurate rounding properties of IEEE arithmetic permit high precision arithmetic to be simulated economically in short stretches of code, thereby replacing possibly much more complicated low precision algorithms. Second, the ``friendly'' exception handling capabilities of IEEE arithmetic, such as being able to continue computing past an overflow and to ask later whether an overflow occurred, permit us to use simple, fast algorithms which work almost all the time, and revert to slower, safer algorithms only if the fast algorithm fails. See [23] for more details.
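    Both ideas can be sketched in a few lines of C. two_sum below is Knuth's error-free addition: under IEEE round-to-nearest double precision it returns s = fl(a + b) together with the exact rounding error e, so that a + b = s + e exactly, which is the basic building block for simulating extra precision cheaply. sum_squares illustrates the "fast path first" pattern: it accumulates the squares naively and, only if the result overflowed to infinity or fell below the normalized range, redoes the work with a slower scaled recurrence (similar in spirit to the auxiliary routine xLASSQ, which represents the sum as scale^2 × ssq). These are illustrations that assume strict IEEE double arithmetic (no fast-math reassociation, no extended-precision registers); they are not routines from this release of LAPACK.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Knuth's TwoSum: on return *s = fl(a + b) and *e is the exact rounding
   error, so a + b == *s + *e holds exactly. Assumes IEEE double precision
   with round-to-nearest and no reassociation by the compiler. */
void two_sum(double a, double b, double *s, double *e)
{
    double sum = a + b;
    double bv = sum - a;          /* portion of b that reached sum */
    double av = sum - bv;         /* portion of a that reached sum */
    *s = sum;
    *e = (a - av) + (b - bv);     /* what the rounding discarded   */
}

/* Compensated summation: carry the rounding errors from two_sum in a
   separate accumulator and add them back at the end. */
double compensated_sum(const double *x, size_t n)
{
    double s = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double e;
        two_sum(s, x[i], &s, &e);
        c += e;
    }
    return s + c;
}

/* Sum of squares of x, returned as (*scale)^2 * (*ssq).
   Fast path: accumulate x[i]^2 directly and keep the result if it is a
   finite, normalized number (any underflowed terms are then negligible).
   Otherwise (overflow to inf, NaN, or a result below DBL_MIN, including 0)
   redo the sum with a scaled recurrence that only squares ratios <= 1. */
void sum_squares(const double *x, size_t n, double *scale, double *ssq)
{
    assert(x != NULL || n == 0);
    double fast = 0.0;
    for (size_t i = 0; i < n; i++)
        fast += x[i] * x[i];
    if (fast <= DBL_MAX && fast >= DBL_MIN) {
        *scale = 1.0;
        *ssq = fast;
        return;
    }
    /* Slow, safe path: sc tracks the largest |x[i]| seen so far. */
    double sc = 0.0, sq = 1.0;
    for (size_t i = 0; i < n; i++) {
        double a = x[i] < 0.0 ? -x[i] : x[i];
        if (a == 0.0)
            continue;
        if (sc < a) {
            double r = sc / a;
            sq = 1.0 + sq * r * r;
            sc = a;
        } else {
            double r = a / sc;
            sq += r * r;
        }
    }
    *scale = sc;
    *ssq = sq;
}
```

    For instance, naive left-to-right summation of {1e16, 1.0, -1e16} returns 0, because 1e16 + 1 rounds back to 1e16, while compensated_sum recovers the exact answer 1. Likewise sum_squares on {3e200, 4e200} overflows on the fast path and falls back, returning scale = 4e200 and ssq close to 1.5625.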

    However, the continuing importance of machines implementing Cray arithmetic, the existence of some machines that only implement full IEEE exception handling by slowing down all floating point operations significantly, and the lack of portable ways to refer to exceptions in Fortran or C have led us not to include these improved algorithms in this release of LAPACK. Since Cray has announced plans to convert to IEEE arithmetic, and some progress is being made on standardizing exception handling [65], we do expect to make these routines available in a future release.





    Contents of LAPACK




    Structure of LAPACK




    Levels of Routines




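    The subroutines in LAPACK are classified as follows:

     driver routines, each of which solves a complete problem, for example
         solving a system of linear equations, or computing the eigenvalues
         of a real symmetric matrix;
     computational routines, each of which performs a distinct computational
         task, for example an LU factorization, or the reduction of a real
         symmetric matrix to tridiagonal form;
     auxiliary routines, which perform subtasks of block algorithms or
         commonly required low-level computations, together with a few
         extensions to the BLAS.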

    Both driver routines and computational routines are fully described in this Users' Guide, but not the auxiliary routines. A list of the auxiliary routines, with brief descriptions of their functions, is given in Appendix B.





    Data Types and Precision




    LAPACK provides the same range of functionality for real and complex data.

    For most computations there are matching routines, one for real and one for complex data, but there are a few exceptions. For example, corresponding to the routines for real symmetric indefinite systems of linear equations, there are routines for complex Hermitian and complex symmetric systems, because both types of complex systems occur in practical applications. However, there is no complex analogue of the routine for finding selected eigenvalues of a real symmetric tridiagonal matrix, because a complex Hermitian matrix can always be reduced to a real symmetric tridiagonal matrix.  

    Matching routines for real and complex data have been coded to maintain a close correspondence between the two, wherever possible. However, in some areas (especially the nonsymmetric eigenproblem) the correspondence is necessarily weaker.

    All routines in LAPACK are provided in both single and double precision versions. The double precision versions have been generated automatically, using Toolpack/1 [66].

    Double precision routines for complex matrices require the non-standard Fortran data type COMPLEX*16, which is available on most machines where double precision computation is usual.





    Naming Scheme




    The name of each LAPACK routine is a coded specification of its function (within the very tight limits of standard Fortran 77 6-character names).

    All driver and computational routines have names of the form XYYZZZ, where for some driver routines the 6th character is blank.

    The first letter, X, indicates the data type as follows:

     S     REAL
     D     DOUBLE PRECISION
     C     COMPLEX
     Z     COMPLEX*16 or DOUBLE COMPLEX
    

    When we wish to refer to an LAPACK routine generically, regardless of data type, we replace the first letter by ``x''. Thus xGESV refers to any or all of the routines SGESV, CGESV, DGESV and ZGESV.

    The next two letters, YY, indicate the type of matrix (or of the most significant matrix). Most of these two-letter codes apply to both real and complex matrices; a few apply specifically to one or the other, as indicated in Table 2.1.

     BD      bidiagonal
     GB      general band
     GE      general (i.e., unsymmetric, in some cases rectangular)
     GG      general matrices, generalized problem (i.e., a pair of general
             matrices) (not used in Release 1.0)
     GT      general tridiagonal
     HB      (complex) Hermitian band
     HE      (complex) Hermitian
     HG      upper Hessenberg matrix, generalized problem (i.e., a Hessenberg and
             a triangular matrix) (not used in Release 1.0)
     HP      (complex) Hermitian, packed storage
     HS      upper Hessenberg
     OP      (real) orthogonal, packed storage
     OR      (real) orthogonal
     PB      symmetric or Hermitian positive definite band
     PO      symmetric or Hermitian positive definite
     PP      symmetric or Hermitian positive definite, packed storage
     PT      symmetric or Hermitian positive definite tridiagonal
     SB      (real) symmetric band
     SP      symmetric, packed storage
     ST      (real) symmetric tridiagonal
     SY      symmetric
     TB      triangular band
     TG      triangular matrices, generalized problem (i.e., a pair of triangular
             matrices) (not used in Release 1.0)
     TP      triangular, packed storage
     TR      triangular (or in some cases quasi-triangular)
     TZ      trapezoidal
     UN      (complex) unitary
     UP      (complex) unitary, packed storage
    

    Table 2.1: Matrix types in the LAPACK naming scheme

    When we wish to refer to a class of routines that performs the same function on different types of matrices, we replace the first three letters by ``xyy''. Thus xyySVX refers to all the expert driver routines for systems of linear equations that are listed in Table 2.2.

    The last three letters ZZZ indicate the computation performed. Their meanings will be explained in Section 2.3. For example, SGEBRD is a single precision routine that performs a bidiagonal reduction (BRD) of a real general matrix.

    The names of auxiliary routines follow a similar scheme except that the 2nd and 3rd characters YY are usually LA (for example, SLASCL or CLARFG). There are two kinds of exception. Auxiliary routines that implement an unblocked version of a block algorithm have similar names to the routines that perform the block algorithm, with the sixth character being ``2'' (for example, SGETF2 is the unblocked version of SGETRF). A few routines that may be regarded as extensions to the BLAS are named according to the BLAS naming schemes (for example, CROT, CSYR).
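    As a small illustration of the naming scheme, the C helpers below split a driver or computational routine name into its X, YY and ZZZ fields and translate the first letter. The function names are hypothetical, and lapack_matrix_type knows only a handful of the codes from Table 2.1; it is a sketch of the convention, not a complete decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three fields of a routine name of the form XYYZZZ. */
typedef struct {
    char type;       /* X: S, D, C or Z                              */
    char matrix[3];  /* YY, e.g. "GE"; NUL-terminated                */
    char op[4];      /* ZZZ, e.g. "TRF", or "SV" when 6th char blank */
} lapack_name;

/* Split a 5- or 6-character name; returns 0 on success, -1 otherwise. */
int split_lapack_name(const char *name, lapack_name *out)
{
    assert(name != NULL && out != NULL);
    size_t len = strlen(name);
    if (len < 5 || len > 6)
        return -1;
    out->type = name[0];
    memcpy(out->matrix, name + 1, 2);
    out->matrix[2] = '\0';
    memcpy(out->op, name + 3, len - 3);
    out->op[len - 3] = '\0';
    return 0;
}

/* Data type encoded by the first letter, or NULL if unrecognized. */
const char *lapack_data_type(char x)
{
    switch (x) {
    case 'S': return "REAL";
    case 'D': return "DOUBLE PRECISION";
    case 'C': return "COMPLEX";
    case 'Z': return "COMPLEX*16";
    default:  return NULL;
    }
}

/* A few of the matrix-type codes from Table 2.1 (not the full list). */
const char *lapack_matrix_type(const char *yy)
{
    static const char *const codes[][2] = {
        {"GE", "general"},             {"GB", "general band"},
        {"GT", "general tridiagonal"}, {"PO", "symmetric or Hermitian positive definite"},
        {"SY", "symmetric"},           {"HE", "(complex) Hermitian"},
        {"TR", "triangular"},          {"BD", "bidiagonal"},
    };
    for (size_t i = 0; i < sizeof codes / sizeof codes[0]; i++)
        if (strcmp(codes[i][0], yy) == 0)
            return codes[i][1];
    return NULL;
}
```

    Splitting DGESVX this way yields D (DOUBLE PRECISION), GE (general) and SVX (expert driver). For an auxiliary routine such as SLASCL the YY field is simply LA, so lapack_matrix_type returns NULL for it.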




    Driver Routines




    This section describes the driver routines in LAPACK. Further details on the terminology and the numerical operations they perform are given in Section 2.3, which describes the computational routines.







    Linear Equations




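    Two types of driver routines are provided for solving systems of linear equations:

     a simple driver (name ending -SV), which solves the system AX = B by
         factorizing A and overwriting B with the solution X;
     an expert driver (name ending -SVX), which in addition estimates the
         condition number of A, checks for near-singularity, refines the
         solution and computes forward and backward error bounds, and, where
         applicable, solves A^T X = B or A^H X = B and equilibrates the system.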

    Both types of driver routines can handle multiple right hand sides (the columns of B).

    Different driver routines are provided to take advantage of special properties or storage schemes of the matrix A, as shown in Table 2.2.

    These driver routines cover all the functionality of the computational routines for linear systems, except matrix inversion. It is seldom necessary to compute the inverse of a matrix explicitly, and it is certainly not recommended as a means of solving linear systems.

    --------------------------------------------------------------------------
    Type of matrix                        Single precision    Double precision
    and storage scheme    Operation       real     complex    real     complex  
    --------------------------------------------------------------------------
    general               simple driver   SGESV    CGESV      DGESV    ZGESV
                          expert driver   SGESVX   CGESVX     DGESVX   ZGESVX
    --------------------------------------------------------------------------
    general band          simple driver   SGBSV    CGBSV      DGBSV    ZGBSV
                          expert driver   SGBSVX   CGBSVX     DGBSVX   ZGBSVX
    --------------------------------------------------------------------------
    general tridiagonal   simple driver   SGTSV    CGTSV      DGTSV    ZGTSV
                          expert driver   SGTSVX   CGTSVX     DGTSVX   ZGTSVX
    --------------------------------------------------------------------------
    symmetric/Hermitian   simple driver   SPOSV    CPOSV      DPOSV    ZPOSV
     positive definite    expert driver   SPOSVX   CPOSVX     DPOSVX   ZPOSVX
    --------------------------------------------------------------------------
    symmetric/Hermitian   simple driver   SPPSV    CPPSV      DPPSV    ZPPSV
     positive definite    expert driver   SPPSVX   CPPSVX     DPPSVX   ZPPSVX
     (packed storage)
    --------------------------------------------------------------------------
    symmetric/Hermitian   simple driver   SPBSV    CPBSV      DPBSV    ZPBSV
     positive definite    expert driver   SPBSVX   CPBSVX     DPBSVX   ZPBSVX
     band
    --------------------------------------------------------------------------
    symmetric/Hermitian   simple driver   SPTSV    CPTSV      DPTSV    ZPTSV
     positive definite    expert driver   SPTSVX   CPTSVX     DPTSVX   ZPTSVX
     tridiagonal
    --------------------------------------------------------------------------
    symmetric/Hermitian   simple driver   SSYSV    CHESV      DSYSV    ZHESV
     indefinite           expert driver   SSYSVX   CHESVX     DSYSVX   ZHESVX
    --------------------------------------------------------------------------
    complex symmetric     simple driver            CSYSV               ZSYSV
                          expert driver            CSYSVX              ZSYSVX
    --------------------------------------------------------------------------
    symmetric/Hermitian   simple driver   SSPSV    CHPSV      DSPSV    ZHPSV
     indefinite (packed   expert driver   SSPSVX   CHPSVX     DSPSVX   ZHPSVX
     storage)
    --------------------------------------------------------------------------
    complex symmetric     simple driver            CSPSV               ZSPSV
     (packed storage)     expert driver            CSPSVX              ZSPSVX
    --------------------------------------------------------------------------
    

    Table 2.2: Driver routines for linear equations
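    For orientation, the simple driver xGESV solves AX = B by computing an LU factorization of A with partial pivoting (row interchanges) and then substituting with the triangular factors. The C sketch below does the same for a single right hand side on a column-major array, using the textbook unblocked algorithm. It is an illustration only (lu_solve is a hypothetical name): LAPACK itself uses blocked algorithms built on the BLAS, handles many right hand sides at once, and returns the factors and pivots in its own documented format.

```c
#include <assert.h>
#include <stddef.h>

/* Solve A x = b for an n-by-n matrix stored column-major with leading
   dimension lda (entry (i,j) at a[i + j*lda], 0-based). On return a holds
   the L (unit lower, below the diagonal) and U factors, ipiv[k] the row
   swapped with row k, and b the solution. Returns 0 on success, or k+1 if
   no nonzero pivot exists in column k (U(k,k) would be exactly zero); this
   loosely mirrors the INFO > 0 convention but stops immediately. */
int lu_solve(int n, double *a, int lda, int *ipiv, double *b)
{
    assert(n >= 0 && lda >= n);
    for (int k = 0; k < n; k++) {
        /* Partial pivoting: largest |a(i,k)| over rows i >= k. */
        int p = k;
        double maxabs = 0.0;
        for (int i = k; i < n; i++) {
            double v = a[i + (size_t)k * lda];
            if (v < 0.0) v = -v;
            if (v > maxabs) { maxabs = v; p = i; }
        }
        ipiv[k] = p;
        if (maxabs == 0.0)
            return k + 1;
        if (p != k) {                   /* interchange rows k and p */
            for (int j = 0; j < n; j++) {
                double t = a[k + (size_t)j * lda];
                a[k + (size_t)j * lda] = a[p + (size_t)j * lda];
                a[p + (size_t)j * lda] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        /* Eliminate below the pivot; store multipliers in place (L) and
           apply the same operations to b (forward substitution). */
        for (int i = k + 1; i < n; i++) {
            double l = a[i + (size_t)k * lda] / a[k + (size_t)k * lda];
            a[i + (size_t)k * lda] = l;
            for (int j = k + 1; j < n; j++)
                a[i + (size_t)j * lda] -= l * a[k + (size_t)j * lda];
            b[i] -= l * b[k];
        }
    }
    /* Back substitution with the upper triangular factor U. */
    for (int i = n - 1; i >= 0; i--) {
        double s = b[i];
        for (int j = i + 1; j < n; j++)
            s -= a[i + (size_t)j * lda] * b[j];
        b[i] = s / a[i + (size_t)i * lda];
    }
    return 0;
}
```

    Note that the solution is obtained without ever forming the inverse of A, in line with the recommendation above.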





    Linear Least Squares (LLS) Problems




    The linear least squares problem is:

    $$\min_x \|b - Ax\|_2 \qquad (2.1)$$

    where A is an m-by-n matrix, b is a given m element vector and x is the n element solution vector.

    In the most usual case, $m \ge n$ and $\mathrm{rank}(A) = n$; in this case the solution to problem (2.1) is unique, and the problem is also referred to as finding a least squares solution to an overdetermined system of linear equations.

    When $m < n$ and $\mathrm{rank}(A) = m$, there are infinitely many solutions x which exactly satisfy $b - Ax = 0$. In this case it is often useful to find the unique solution x which minimizes $\|x\|_2$, and the problem is referred to as finding a minimum norm solution to an underdetermined system of linear equations.

    The driver routine xGELS solves problem (2.1) on the assumption that $\mathrm{rank}(A) = \min(m, n)$ (in other words, A has full rank), finding a least squares solution of an overdetermined system when $m > n$, and a minimum norm solution of an underdetermined system when $m < n$. xGELS uses a QR or LQ factorization of A, and also allows A to be replaced by $A^T$ in the statement of the problem (or by $A^H$ if A is complex).
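    A short derivation shows why an orthogonal factorization suffices in the full-rank case. It assumes A = QR with Q an m-by-m orthogonal matrix and R_1 the nonsingular n-by-n upper triangular block of R when m ≥ n, and A = (L_1 0)Q with L_1 nonsingular m-by-m lower triangular when m < n; for complex A, replace Q^T by Q^H. It illustrates the mathematics only and is not a description of how xGELS organizes its computation internally.

```latex
% Full rank, m >= n:  A = QR,  R = [R_1; 0],  Q^T b = [c_1; c_2]
\|b - Ax\|_2^2 = \|Q^T b - R x\|_2^2
              = \|c_1 - R_1 x\|_2^2 + \|c_2\|_2^2
\;\Longrightarrow\;
x = R_1^{-1} c_1, \qquad \min_x \|b - Ax\|_2 = \|c_2\|_2 .

% Full rank, m < n:  A = (L_1 \; 0)\,Q,  y = Qx = [y_1; y_2]
Ax = L_1 y_1 = b, \qquad \|x\|_2^2 = \|y_1\|_2^2 + \|y_2\|_2^2
\;\Longrightarrow\;
x = Q^T \begin{pmatrix} L_1^{-1} b \\ 0 \end{pmatrix}.
```

    In the first case the residual norm comes for free as the norm of the trailing part c_2 of Q^T b; in the second, setting y_2 = 0 is what makes the solution the one of minimum norm.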

    In the general case, when we may have $\mathrm{rank}(A) < \min(m, n)$ (in other words, A may be rank-deficient), we seek the minimum norm least squares solution x which minimizes both $\|x\|_2$ and $\|b - Ax\|_2$.

    The driver routines xGELSX and xGELSS solve this general formulation of problem (2.1), allowing for the possibility that A is rank-deficient; xGELSX uses a complete orthogonal factorization of A, while xGELSS uses the singular value decomposition of A.

    The LLS driver routines are listed in Table 2.3.

    All three routines allow several right hand side vectors b and corresponding solutions x to be handled in a single call, storing these vectors as columns of matrices B and X, respectively. Note, however, that problem (2.1) is solved for each right hand side vector independently; this is not the same as finding a matrix X which minimizes $\|B - AX\|_2$.

    -------------------------------------------------------------------
                                   Single precision    Double precision
    Operation                      real     complex    real     complex
    -------------------------------------------------------------------
    solve LLS using QR or          SGELS    CGELS      DGELS    ZGELS
     LQ factorization          
    solve LLS using complete       SGELSX   CGELSX     DGELSX   ZGELSX
     orthogonal factorization  
    solve LLS using SVD            SGELSS   CGELSS     DGELSS   ZGELSS
    -------------------------------------------------------------------
    
    Table 2.3: Driver routines for linear least squares problems




    Preface to the Second Edition




    Since its initial public release in February 1992, LAPACK has expanded in both depth and breadth. LAPACK is now available in both Fortran and C. The publication of this second edition of the Users' Guide coincides with the release of version 2.0 of the LAPACK software.

    This release of LAPACK introduces new routines and extends the functionality of existing routines. Prominent among the new routines are driver and computational routines for the generalized nonsymmetric eigenproblem, generalized linear least squares problems, the generalized singular value decomposition, a generalized banded symmetric-definite eigenproblem, and divide-and-conquer methods for symmetric eigenproblems. Additional computational routines include the generalized QR and RQ factorizations and reduction of a band matrix to bidiagonal form.

    Added functionality has been incorporated into the expert driver routines that involve equilibration (xGESVX, xGBSVX, xPOSVX, xPPSVX, and xPBSVX). The option FACT = 'F' now permits the user to input a prefactored, pre-equilibrated matrix. The expert drivers xGESVX and xGBSVX now return the reciprocal of the pivot growth from Gaussian elimination. xBDSQR has been modified to compute singular values of bidiagonal matrices much more quickly than before, provided singular vectors are not also wanted. The least squares driver routines xGELS, xGELSS, and xGELSX now make available the residual root-sum-squares for each right hand side.

    All LAPACK routines carry the current version number, and the date on each routine indicates when it was last modified. For more information on revisions to the LAPACK software or this Users' Guide please refer to the LAPACK release_notes file on netlib. Instructions for obtaining this file can be found in Chapter 1.

    On-line manpages (troff files) for LAPACK routines, as well as for most of the BLAS routines, are available on netlib. Refer to Section 1.6 for further details.

    We hope that future releases of LAPACK will include routines for reordering eigenvalues in the generalized Schur factorization; solving the generalized Sylvester equation; computing condition numbers for the generalized eigenproblem (for eigenvalues, eigenvectors, clusters of eigenvalues, and deflating subspaces); fast algorithms for the singular value decomposition based on divide and conquer; high accuracy methods for symmetric eigenproblems and the SVD based on Jacobi's algorithm; updating and/or downdating for linear least squares problems; computing singular values by bidiagonal bisection; and computing singular vectors by bidiagonal inverse iteration.

    The following additions/modifications have been made to this second edition of the Users' Guide:

    Chapter 1 (Essentials) now includes information on accessing LAPACK via the World Wide Web.

    Chapter 2 (Contents of LAPACK) has been expanded to discuss new routines.

    Chapter 3 (Performance of LAPACK) has been updated with performance results from version 2.0 of LAPACK. In addition, a new section entitled ``LAPACK Benchmark'' has been introduced to present timings for several driver routines.

    Chapter 4 (Accuracy and Stability) has been simplified and rewritten. Much of the theory and other details have been separated into ``Further Details'' sections. Example Fortran code segments are included to demonstrate the calculation of error bounds using LAPACK.

    Appendices A, B, and D have been expanded to cover the new routines.

    Appendix E (LAPACK Working Notes) lists a number of new Working Notes, written during the LAPACK 2 and ScaLAPACK projects (see below) and published by the University of Tennessee. The Bibliography has been updated to give the most recent published references.

    The Specifications of Routines have been extended and updated to cover the new routines and revisions to existing routines.

    The Bibliography and Index have been moved to the end of the book. The Index has been expanded into two indexes: Index by Keyword and Index by Routine Name. Occurrences of LAPACK, LINPACK, and EISPACK routine names have been cited in the latter index.

    The original LAPACK project was funded by the NSF. Since its completion, two follow-up projects, LAPACK 2 and ScaLAPACK, have been funded in the U.S. by the NSF and ARPA in 1990-1994 and 1991-1995, respectively. In addition to making possible the additions and extensions in this release, these grants have supported the following closely related activities.

    A major effort is underway to implement LAPACK-type algorithms for distributed memory machines. As a result of these efforts, several new software items are now available on netlib. The new items that have been introduced are distributed memory versions of the core routines from LAPACK; a fully parallel package to solve a symmetric positive definite sparse linear system on a message passing multiprocessor using Cholesky factorization; a package based on Arnoldi's method for solving large-scale nonsymmetric, symmetric, and generalized algebraic eigenvalue problems; and templates for sparse iterative methods for solving Ax = b. For more information on the availability of each of these packages, consult the scalapack and linalg indexes on netlib via netlib@ornl.gov.

    We have also explored the advantages of IEEE floating point arithmetic [4] in implementing linear algebra routines. The accurate rounding properties and ``friendly'' exception handling capabilities of IEEE arithmetic permit us to write faster, more robust versions of several algorithms in LAPACK. Since not all machines yet implement IEEE arithmetic, these algorithms are not currently part of the library [23], although we expect them to be in the future. For more information, please refer to Section 1.11.

    LAPACK has been translated from Fortran into C and, in addition, a subset of the LAPACK routines has been implemented in C++. For more information on obtaining the C or C++ versions of LAPACK, consult Section 1.11 or the clapack or c++ indexes on netlib via netlib@ornl.gov.

    We deeply appreciate the careful scrutiny of those individuals who reported mistakes, typographical errors, or shortcomings in the first edition.

    We acknowledge with gratitude the support which we have received from the following organizations and the help of individual members of their staff: Cray Research Inc.; NAG Ltd.

    We would additionally like to thank the following people, who were not acknowledged in the first edition, for their contributions:

    Françoise Chatelin, Inderjit Dhillon, Stan Eisenstat, Vince Fernando, Ming Gu, Rencang Li, Xiaoye Li, George Ostrouchov, Antoine Petitet, Chris Puscasiu, Huan Ren, Jeff Rutter, Ken Stanley, Steve Timson, and Clint Whaley.

    * The royalties from the sales of this book are being placed in a
      fund to help students attend SIAM meetings and other SIAM related
      activities.  This fund is administered by SIAM and qualified
      individuals are encouraged to write directly to SIAM for guidelines.
    




    Generalized Linear Least Squares (LSE and GLM) Problems




    Driver routines are provided for two types of generalized linear least squares problems.

    The first is

    $$\min_x \|c - Ax\|_2 \quad \text{subject to} \quad Bx = d$$

    where A is an m-by-n matrix and B is a p-by-n matrix, c is a given m-vector, and d is a given p-vector, with $p \le n \le m + p$. This is called a linear equality-constrained least squares problem (LSE). The routine xGGLSE solves this problem using the generalized RQ (GRQ) factorization, on the assumptions that B has full row rank p and the matrix $\begin{pmatrix} A \\ B \end{pmatrix}$ has full column rank n. Under these assumptions, the problem LSE has a unique solution.

    The second generalized linear least squares problem is

    $$\min_{x,y} \|y\|_2 \quad \text{subject to} \quad d = Ax + By$$

    where A is an n-by-m matrix, B is an n-by-p matrix, and d is a given n-vector, with $m \le n \le m + p$. This is sometimes called a general (Gauss-Markov) linear model problem (GLM). When B = I, the problem reduces to an ordinary linear least squares problem. When B is square and nonsingular, the GLM problem is equivalent to the weighted linear least squares problem:

    $$\min_x \|B^{-1}(d - Ax)\|_2$$
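    The equivalence for square nonsingular B follows by eliminating y from the constraint of the GLM problem; the one-line derivation below is added for illustration, and also shows the reduction for B = I.

```latex
d = Ax + By \iff y = B^{-1}(d - Ax)
\;\Longrightarrow\;
\min_{x,\,y}\{\, \|y\|_2 : d = Ax + By \,\} = \min_x \|B^{-1}(d - Ax)\|_2 ,
\qquad B = I:\; \min_x \|d - Ax\|_2 .
```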

    The routine xGGGLM solves this problem using the generalized QR (GQR) factorization, on the assumptions that A has full column rank m, and the matrix (A, B) has full row rank n. Under these assumptions, the problem is always consistent, and there are unique solutions x and y. The driver routines for generalized linear least squares problems are listed in Table 2.4.

    ------------------------------------------------------------------
                                   Single precision   Double precision
    Operation                      real     complex   real     complex
    ------------------------------------------------------------------
    solve LSE problem using GRQ    SGGLSE   CGGLSE    DGGLSE   ZGGLSE
    solve GLM problem using GQR    SGGGLM   CGGGLM    DGGGLM   ZGGGLM
    ------------------------------------------------------------------
    
    Table 2.4: Driver routines for generalized linear least squares problems





    Standard Eigenvalue and Singular Value Problems




    Symmetric Eigenproblems (SEP)




    The symmetric eigenvalue problem is to find the eigenvalues $\lambda$, and corresponding eigenvectors $z \ne 0$, such that

    $$Az = \lambda z, \qquad A = A^T, \text{ where } A \text{ is real.}$$

    For the Hermitian eigenvalue problem we have

    $$Az = \lambda z, \qquad A = A^H.$$

    For both problems the eigenvalues are real.

    When all eigenvalues and eigenvectors have been computed, we write:

    $$A = Z \Lambda Z^T \qquad (A = Z \Lambda Z^H \text{ in the Hermitian case})$$

    where $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues, and Z is an orthogonal (or unitary) matrix whose columns are the eigenvectors. This is the classical spectral factorization of A.
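    As a small worked example (chosen for illustration), the real symmetric matrix below has eigenvalues 1 and 3 with orthonormal eigenvectors (1, -1)^T/√2 and (1, 1)^T/√2; multiplying out Z Λ Z^T recovers A.

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
  = \underbrace{\tfrac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}}_{Z}
    \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}}_{\Lambda}
    \underbrace{\tfrac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}}_{Z^T}
```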

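    Three types of driver routines are provided for symmetric or Hermitian eigenproblems:

     a simple driver (name ending -EV), which computes all the eigenvalues
         and (optionally) eigenvectors;
     an expert driver (name ending -EVX), which computes all or a selected
         subset of the eigenvalues and (optionally) the corresponding
         eigenvectors;
     a divide-and-conquer driver (name ending -EVD), which solves the same
         problem as the simple driver; it is much faster for large matrices,
         but uses more workspace.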

    Different driver routines are provided to take advantage of special structure or storage of the matrix A, as shown in Table 2.5.

    In the future LAPACK will include routines based on the Jacobi algorithm [76] [69] [24], which are slower than the above routines but can be significantly more accurate.





    Nonsymmetric Eigenproblems (NEP)




    The nonsymmetric eigenvalue problem is to find the eigenvalues $\lambda$, and corresponding eigenvectors $v \ne 0$, such that

    $$Av = \lambda v.$$

    A real matrix A may have complex eigenvalues, occurring as complex conjugate pairs. More precisely, the vector v is called a right eigenvector of A, and a vector $u \ne 0$ satisfying

    $$u^H A = \lambda u^H$$

    is called a left eigenvector of A.

    This problem can be solved via the Schur factorization of A, defined in the real case as

    $$A = Z T Z^T,$$

    where Z is an orthogonal matrix and T is an upper quasi-triangular matrix with 1-by-1 and 2-by-2 diagonal blocks, the 2-by-2 blocks corresponding to complex conjugate pairs of eigenvalues of A. In the complex case the Schur factorization is

    $$A = Z T Z^H,$$

    where Z is unitary and T is a complex upper triangular matrix.
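    A small illustrative example: the 3-by-3 real matrix below is already in real Schur form (so Z = I and T = A). Its leading 2-by-2 diagonal block has characteristic polynomial λ^2 + 1 and so carries the complex conjugate pair ±i, while the trailing 1-by-1 block is the real eigenvalue 2. In the complex Schur form that 2-by-2 block is instead reduced to upper triangular form by a unitary Z (here the triangular factor even comes out diagonal, because the block is a normal matrix).

```latex
A = T = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix},
\qquad
\det\begin{pmatrix} -\lambda & -1 \\ 1 & -\lambda \end{pmatrix} = \lambda^2 + 1
\;\Longrightarrow\; \lambda_{1,2} = \pm i, \quad \lambda_3 = 2 .

% Complex Schur form of the leading block B:  B = Z T_B Z^H  with
B = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},\quad
Z = \tfrac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix},\quad
T_B = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}.
```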

    The columns of Z are called the Schur vectors. For each k ($1 \le k \le n$), the first k columns of Z form an orthonormal basis for the invariant subspace corresponding to the first k eigenvalues on the diagonal of T. Because this basis is orthonormal, it is preferable in many applications to compute Schur vectors rather than eigenvectors. It is possible to order the Schur factorization so that any desired set of k eigenvalues occupy the k leading positions on the diagonal of T.

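    Two pairs of drivers are provided, one pair focusing on the Schur factorization, and the other pair on the eigenvalues and eigenvectors, as shown in Table 2.5:

     xGEES, a simple driver that computes all or part of the Schur
         factorization of A, with optional ordering of the eigenvalues;
     xGEESX, an expert version of xGEES that can also compute condition
         numbers for the average of a selected subset of the eigenvalues and
         for the corresponding right invariant subspace;
     xGEEV, a simple driver that computes all the eigenvalues of A and
         (optionally) the right or left eigenvectors (or both);
     xGEEVX, an expert version of xGEEV that can also balance the matrix and
         compute condition numbers for the eigenvalues or right eigenvectors
         (or both).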



    Singular Value Decomposition (SVD)

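    The singular value decomposition of an m-by-n matrix A is given by

    $$A = U \Sigma V^T \qquad (A = U \Sigma V^H \text{ in the complex case}),$$

    where U and V are orthogonal (unitary) and $\Sigma$ is an m-by-n diagonal matrix with real diagonal elements, $\sigma_i$, such that

    $$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} \ge 0.$$

    The $\sigma_i$ are the singular values of A and the first min(m, n) columns of U and V are the left and right singular vectors of A.

    The singular values and singular vectors satisfy:

    $$A v_i = \sigma_i u_i \quad\text{and}\quad A^T u_i = \sigma_i v_i \qquad (A^H u_i = \sigma_i v_i \text{ in the complex case}),$$

    where $u_i$ and $v_i$ are the i-th columns of U and V respectively.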

    A single driver routine xGESVD computes all or part of the singular value decomposition of a general nonsymmetric matrix (see Table 2.5). A future version of LAPACK will include a driver based on divide and conquer, as in section 2.2.4.1.
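    xGESVD first reduces A to bidiagonal form. The Jacobi approach mentioned for the symmetric eigenproblem takes a different route, and it gives a compact illustration of the relations $A v_i = \sigma_i u_i$. The Python sketch below uses one-sided (Hestenes) Jacobi. It rotates pairs of columns of A until all columns are numerically orthogonal. Each final column is then $\sigma_i u_i$, so the column norms are the singular values.

    The function name and tolerances are assumptions. The sketch requires m ≥ n; apply it to $A^T$ otherwise. It is unrelated to the code inside xGESVD.

```python
import math


def jacobi_singular_values(a, tol=1e-12, max_sweeps=50):
    """Singular values of an m-by-n matrix A (m >= n) by one-sided Jacobi.

    Rotations are applied to pairs of columns until every pair is
    numerically orthogonal; the final column norms are the singular values.
    """
    m, n = len(a), len(a[0])
    u = [[a[i][j] for i in range(m)] for j in range(n)]   # u[j] = column j of A
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(e * e for e in u[p])
                beta = sum(e * e for e in u[q])
                gamma = sum(x * y for x, y in zip(u[p], u[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue                                # already orthogonal enough
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):                          # rotate columns p and q
                    up, uq = u[p][i], u[q][i]
                    u[p][i] = c * up - s * uq
                    u[q][i] = s * up + c * uq
        if not rotated:
            break
    return sorted((math.sqrt(sum(e * e for e in col)) for col in u), reverse=True)


print(jacobi_singular_values([[3.0, 0.0], [4.0, 5.0]]))  # sqrt(45) and sqrt(5)
```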

    --------------------------------------------------------------------------
    Type of                                 Single precision  Double precision
    problem  Function and storage scheme    real     complex  real     complex
    --------------------------------------------------------------------------
    SEP      simple driver                  SSYEV    CHEEV    DSYEV    ZHEEV
             expert driver                  SSYEVX   CHEEVX   DSYEVX   ZHEEVX
    --------------------------------------------------------------------------
             simple driver (packed storage) SSPEV    CHPEV    DSPEV    ZHPEV
             expert driver (packed storage) SSPEVX   CHPEVX   DSPEVX   ZHPEVX
    --------------------------------------------------------------------------
             simple driver (band matrix)    SSBEV    CHBEV    DSBEV    ZHBEV
             expert driver (band matrix)    SSBEVX   CHBEVX   DSBEVX   ZHBEVX
    --------------------------------------------------------------------------
             simple driver (tridiagonal     SSTEV             DSTEV   
              matrix)
             expert driver (tridiagonal     SSTEVX            DSTEVX  
              matrix)
    --------------------------------------------------------------------------
    NEP      simple driver for              SGEES    CGEES    DGEES    ZGEES
              Schur factorization
             expert driver for              SGEESX   CGEESX   DGEESX   ZGEESX
              Schur factorization
             simple driver for              SGEEV    CGEEV    DGEEV    ZGEEV
              eigenvalues/vectors
             expert driver for              SGEEVX   CGEEVX   DGEEVX   ZGEEVX
              eigenvalues/vectors
    --------------------------------------------------------------------------
    SVD      singular values/vectors        SGESVD   CGESVD   DGESVD   ZGESVD
    --------------------------------------------------------------------------
    

    Table 2.5: Driver routines for standard eigenvalue and singular value problems




    Generalized Eigenvalue and Singular Value Problems

     






    Generalized Symmetric Definite Eigenproblems (GSEP)

     

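    Simple drivers are provided to compute all the eigenvalues and (optionally) the eigenvectors of the following types of problems:

    1. $Az = \lambda Bz$
    2. $ABz = \lambda z$
    3. $BAz = \lambda z$

    where A and B are symmetric or Hermitian and B is positive definite. For all these problems the eigenvalues $\lambda$ are real. The matrices Z of computed eigenvectors satisfy $Z^T A Z = \Lambda$ (problem types 1 and 3) or $Z^{-1} A Z^{-T} = \Lambda$ (problem type 2), where $\Lambda$ is a diagonal matrix with the eigenvalues on the diagonal. Z also satisfies $Z^T B Z = I$ (problem types 1 and 2) or $Z^T B^{-1} Z = I$ (problem type 3). In the complex case, $Z^T$ and $Z^{-T}$ are replaced by $Z^H$ and $Z^{-H}$.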

    The routines are listed in Table 2.6.
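    For problem type 1, factor $B = LL^T$ by Cholesky. Then solve the equivalent standard problem $Cy = \lambda y$ with $C = L^{-1} A L^{-T}$ and $z = L^{-T} y$; xSYGV follows this reduction.

    The Python sketch below does the same in the real case. For brevity, it finds the eigenvalues of C by the cyclic Jacobi method rather than by LAPACK's tridiagonal approach. All function names are hypothetical, and no eigenvectors are returned.

```python
import math


def cholesky_lower(b):
    """Lower triangular L with B = L L^T (B symmetric positive definite)."""
    n = len(b)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = b[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("B is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            l[i][j] = (b[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l


def forward_solve(l, m):
    """Return L^{-1} M by forward substitution, one column of M at a time."""
    n, cols = len(l), len(m[0])
    x = [[0.0] * cols for _ in range(n)]
    for c in range(cols):
        for i in range(n):
            x[i][c] = (m[i][c] - sum(l[i][k] * x[k][c] for k in range(i))) / l[i][i]
    return x


def jacobi_eigenvalues(a, tol=1e-14, max_sweeps=50):
    """Eigenvalues (ascending) of a real symmetric matrix by cyclic Jacobi."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        total = sum(x * x for row in a for x in row)
        if off <= tol * tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def gsep_type1_eigenvalues(a, b):
    """Eigenvalues of A z = lambda B z via C = L^{-1} A L^{-T}."""
    l = cholesky_lower(b)
    y = forward_solve(l, a)                            # Y = L^{-1} A
    c = forward_solve(l, [list(r) for r in zip(*y)])   # L^{-1} Y^T = L^{-1} A L^{-T}
    return jacobi_eigenvalues(c)


print(gsep_type1_eigenvalues([[4.0, 2.0], [2.0, 5.0]], [[4.0, 0.0], [0.0, 1.0]]))
# roots of det(A - lam*B) = 4*lam**2 - 24*lam + 16, i.e. 3 - sqrt(5) and 3 + sqrt(5)
```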




    Generalized Nonsymmetric Eigenproblems (GNEP)

     

    Given two square matrices A and B, the generalized nonsymmetric eigenvalue problem is to find the eigenvalues $\lambda$ and corresponding eigenvectors $x \ne 0$ such that

    $$Ax = \lambda Bx,$$

    or find the eigenvalues $\mu$ and corresponding eigenvectors $y \ne 0$ such that

    $$\mu Ay = By.$$

    Note that these problems are equivalent with $\mu = 1/\lambda$ and $x = y$ if neither $\lambda$ nor $\mu$ is zero. In order to deal with the case that $\lambda$ or $\mu$ is zero, or nearly so, the LAPACK routines return two values, $\alpha$ and $\beta$, for each eigenvalue, such that $\lambda = \alpha/\beta$ and $\mu = \beta/\alpha$.

    More precisely, x and y are called right eigenvectors. Vectors $u \ne 0$ or $v \ne 0$ satisfying

    $$u^H A = \lambda u^H B \quad\text{or}\quad \mu v^H A = v^H B$$

    are called left eigenvectors.

    If the determinant of $A - \lambda B$ is zero for all values of $\lambda$, the eigenvalue problem is called singular, and is signaled by some $\alpha = \beta = 0$ (in the presence of roundoff, $\alpha$ and $\beta$ may be very small). In this case the eigenvalue problem is very ill-conditioned, and in fact some of the other nonzero values of $\alpha$ and $\beta$ may be indeterminate [21] [80] [71].

    The generalized nonsymmetric eigenvalue problem can be solved via the generalized Schur factorization of the pair A, B, defined in the real case as

    $$A = Q S Z^T, \qquad B = Q P Z^T,$$

    where Q and Z are orthogonal matrices, P is upper triangular, and S is an upper quasi-triangular matrix with 1-by-1 and 2-by-2 diagonal blocks, the 2-by-2 blocks corresponding to complex conjugate pairs of eigenvalues of A, B. In the complex case the Schur factorization is

    $$A = Q S Z^H, \qquad B = Q P Z^H,$$

    where Q and Z are unitary and S and P are both upper triangular.
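    Once S and P are upper triangular, $\det(S - \lambda P)$ is the product of the terms $s_{ii} - \lambda p_{ii}$. This holds in the complex case, and in the real case when S has no 2-by-2 blocks. The pair $(\alpha_i, \beta_i)$ for the i-th eigenvalue is then just the diagonal pair $(s_{ii}, p_{ii})$.

    The Python sketch below converts such pairs into $\lambda = \alpha/\beta$ and $\mu = \beta/\alpha$. It uses `math.inf` for an infinite value and `None` when $\alpha = \beta = 0$ signals a singular pencil. The function names and the threshold `tiny` are assumptions made for illustration.

```python
import math


def pencil_pairs(s, p):
    """(alpha_i, beta_i) = (s_ii, p_ii) for an upper triangular pencil (S, P)."""
    return [(s[i][i], p[i][i]) for i in range(len(s))]


def lambda_mu(pairs, tiny=0.0):
    """Map each (alpha, beta) to (lambda, mu) = (alpha/beta, beta/alpha).

    math.inf marks an infinite value; (None, None) marks alpha = beta = 0
    (to within `tiny`), which signals a singular pencil.
    """
    out = []
    for alpha, beta in pairs:
        if abs(alpha) <= tiny and abs(beta) <= tiny:
            out.append((None, None))
            continue
        lam = math.inf if abs(beta) <= tiny else alpha / beta
        mu = math.inf if abs(alpha) <= tiny else beta / alpha
        out.append((lam, mu))
    return out


S = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 0.0]]
P = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
print(lambda_mu(pencil_pairs(S, P)))
# [(2.0, 0.5), (inf, 0.0), (None, None)]; det(S - lam*P) = 0 for every lam
```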

    The columns of Q and Z are called generalized Schur vectors and span pairs of deflating subspaces of A and B [72]. Deflating subspaces are a generalization of invariant subspaces: for each k ($1 \le k \le n$), the first k columns of Z span a right deflating subspace mapped by both A and B into a left deflating subspace spanned by the first k columns of Q.

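    Two simple drivers are provided for the nonsymmetric problem, as shown in Table 2.6:

    xGEGS:
    simple driver for the generalized Schur factorization;

    xGEGV:
    simple driver for the generalized eigenvalues and eigenvectors.

    In later versions of LAPACK we plan to provide expert drivers analogous to xGEESX and xGEEVX.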



    Generalized Singular Value Decomposition (GSVD)

     

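    The generalized (or quotient) singular value decomposition of an m-by-n matrix A and a p-by-n matrix B is given by the pair of factorizations

    $$U^T A Q = \Sigma_1 \begin{bmatrix} 0 & R \end{bmatrix}, \qquad V^T B Q = \Sigma_2 \begin{bmatrix} 0 & R \end{bmatrix}.$$

    The matrices in these factorizations have the following properties:

    U is m-by-m, V is p-by-p, Q is n-by-n, and all three matrices are orthogonal. If A and B are complex, these matrices are unitary instead of orthogonal, and each transpose ${}^T$ is replaced by the conjugate transpose ${}^H$.

    R is r-by-r, upper triangular and nonsingular. $\begin{bmatrix} 0 & R \end{bmatrix}$ is r-by-n, the 0 being an r-by-(n − r) zero matrix. The integer r is the rank of $\begin{bmatrix} A \\ B \end{bmatrix}$ and satisfies $r \le n$.

    $\Sigma_1$ is m-by-r, $\Sigma_2$ is p-by-r, both are real, nonnegative and diagonal, and $\Sigma_1^T \Sigma_1 + \Sigma_2^T \Sigma_2 = I$. Write $\Sigma_1^T \Sigma_1 = \mathrm{diag}(\alpha_1^2, \ldots, \alpha_r^2)$ and $\Sigma_2^T \Sigma_2 = \mathrm{diag}(\beta_1^2, \ldots, \beta_r^2)$, where $\alpha_i$ and $\beta_i$ lie in the interval from 0 to 1. The ratios $\alpha_1/\beta_1, \ldots, \alpha_r/\beta_r$ are called the generalized singular values of the pair A, B. If $\beta_i = 0$, then the generalized singular value $\alpha_i/\beta_i$ is infinite.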

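    $\Sigma_1$ and $\Sigma_2$ have the following detailed structures, depending on whether m − r ≥ 0 or m − r < 0. In the first case, m − r ≥ 0, then

    $$\Sigma_1 = \begin{bmatrix} I & 0 \\ 0 & C \\ 0 & 0 \end{bmatrix}, \qquad \Sigma_2 = \begin{bmatrix} 0 & S \\ 0 & 0 \end{bmatrix},$$

    where the block columns of both matrices have widths k and l, the block rows of $\Sigma_1$ have heights k, l and m − k − l, and the block rows of $\Sigma_2$ have heights l and p − l.

    Here l is the rank of B, k = r − l, C and S are diagonal matrices satisfying $C^2 + S^2 = I$, and S is nonsingular. We may also identify $\alpha_1 = \cdots = \alpha_k = 1$, $\alpha_{k+i} = C_{ii}$ for $i = 1, \ldots, l$, $\beta_1 = \cdots = \beta_k = 0$, and $\beta_{k+i} = S_{ii}$ for $i = 1, \ldots, l$. Thus, the first k generalized singular values $\alpha_1/\beta_1, \ldots, \alpha_k/\beta_k$ are infinite, and the remaining l generalized singular values are finite.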

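    In the second case, when m − r < 0,

    $$\Sigma_1 = \begin{bmatrix} I & 0 & 0 \\ 0 & C & 0 \end{bmatrix}$$

    and

    $$\Sigma_2 = \begin{bmatrix} 0 & S & 0 \\ 0 & 0 & I \\ 0 & 0 & 0 \end{bmatrix},$$

    where the block columns of both matrices have widths k, m − k and k + l − m, the block rows of $\Sigma_1$ have heights k and m − k, and the block rows of $\Sigma_2$ have heights m − k, k + l − m and p − l.

    Again, l is the rank of B, k = r − l, C and S are diagonal matrices satisfying $C^2 + S^2 = I$, S is nonsingular, and we may identify $\alpha_1 = \cdots = \alpha_k = 1$, $\alpha_{k+i} = C_{ii}$ for $i = 1, \ldots, m - k$, $\alpha_{m+1} = \cdots = \alpha_r = 0$, $\beta_1 = \cdots = \beta_k = 0$, $\beta_{k+i} = S_{ii}$ for $i = 1, \ldots, m - k$, and $\beta_{m+1} = \cdots = \beta_r = 1$. Thus, the first k generalized singular values are infinite, and the remaining l generalized singular values are finite.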

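    Here are some important special cases of the generalized singular value decomposition. First, if B is square and nonsingular, then r = n and the generalized singular value decomposition of A and B is equivalent to the singular value decomposition of $AB^{-1}$, where the singular values of $AB^{-1}$ are equal to the generalized singular values of the pair A, B:

    $$AB^{-1} = (U \Sigma_1 R Q^T)(V \Sigma_2 R Q^T)^{-1} = U (\Sigma_1 \Sigma_2^{-1}) V^T.$$

    Second, if the columns of $\begin{bmatrix} A \\ B \end{bmatrix}$ are orthonormal, then r = n, R = I and the generalized singular value decomposition of A and B is equivalent to the CS (Cosine-Sine) decomposition of $\begin{bmatrix} A \\ B \end{bmatrix}$ [45]:

    $$\begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} U & 0 \\ 0 & V \end{bmatrix} \begin{bmatrix} \Sigma_1 \\ \Sigma_2 \end{bmatrix} Q^T.$$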

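    Third, the generalized eigenvalues and eigenvectors of $A^T A - \lambda B^T B$ can be expressed in terms of the generalized singular value decomposition: Let

    $$X = Q \begin{bmatrix} I & 0 \\ 0 & R^{-1} \end{bmatrix}.$$

    Then

    $$X^T A^T A X = \begin{bmatrix} 0 & 0 \\ 0 & \Sigma_1^T \Sigma_1 \end{bmatrix} \quad\text{and}\quad X^T B^T B X = \begin{bmatrix} 0 & 0 \\ 0 & \Sigma_2^T \Sigma_2 \end{bmatrix}.$$

    Therefore, the columns of X are the eigenvectors of $A^T A - \lambda B^T B$, and the ``nontrivial'' eigenvalues are the squares of the generalized singular values (see also section 2.2.5.1). ``Trivial'' eigenvalues are those corresponding to the leading n − r columns of X, which span the common null space of $A^T A$ and $B^T B$. The ``trivial eigenvalues'' are not well defined.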
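    The first and third special cases can be checked numerically on a small example. The Python sketch below is restricted to 2-by-2 matrices with B nonsingular. It computes the generalized singular values twice:

    - as square roots of the eigenvalues of $A^T A - \lambda B^T B$;
    - as the singular values of $AB^{-1}$.

    The sketch forms $A^T A$ and $B^T B$ explicitly, which is acceptable only as an illustration; a numerically sound GSVD routine works with A and B directly. All names are hypothetical.

```python
import math


def gram(m):
    """M^T M for a 2-by-2 matrix M."""
    return [[sum(m[k][i] * m[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def sym2_pencil_eigs(x, y):
    """Roots (ascending) of det(X - lam*Y) = 0 for symmetric 2-by-2 X and positive definite Y."""
    qa = y[0][0] * y[1][1] - y[0][1] ** 2
    qb = -(x[0][0] * y[1][1] + x[1][1] * y[0][0] - 2.0 * x[0][1] * y[0][1])
    qc = x[0][0] * x[1][1] - x[0][1] ** 2
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    return sorted([(-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)])


def gsv_from_pencil(a, b):
    """Square roots of the eigenvalues of A^T A - lam B^T B."""
    return [math.sqrt(max(v, 0.0)) for v in sym2_pencil_eigs(gram(a), gram(b))]


def sv_of_ab_inverse(a, b):
    """Singular values of A B^{-1}, from the eigenvalues of its Gram matrix."""
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    binv = [[b[1][1] / det, -b[0][1] / det], [-b[1][0] / det, b[0][0] / det]]
    m = [[sum(a[i][k] * binv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    return [math.sqrt(max(v, 0.0)) for v in sym2_pencil_eigs(gram(m), identity)]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[2.0, 0.0], [0.0, 1.0]]
print(gsv_from_pencil(A, B))
print(sv_of_ab_inverse(A, B))  # should agree up to rounding
```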

    A single driver routine xGGSVD computes the generalized singular value decomposition of A and B (see Table 2.6). The method is based on that described in [12] [10] [62].

       

    ----------------------------------------------------------------
    Type of  Function and         Single precision  Double precision
    problem  storage scheme       real     complex  real     complex
    ----------------------------------------------------------------
    GSEP     simple driver        SSYGV    CHEGV    DSYGV    ZHEGV
             simple driver        SSPGV    CHPGV    DSPGV    ZHPGV
              (packed storage)
             simple driver        SSBGV    CHBGV    DSBGV    ZHBGV
              (band matrices) 
    ----------------------------------------------------------------
    GNEP     simple driver for    SGEGS    CGEGS    DGEGS    ZGEGS
              Schur factorization 
             simple driver for    SGEGV    CGEGV    DGEGV    ZGEGV
              eigenvalues/vectors 
    ----------------------------------------------------------------
    GSVD     singular values/     SGGSVD   CGGSVD   DGGSVD   ZGGSVD
             vectors
    ----------------------------------------------------------------
    
    Table 2.6: Driver routines for generalized eigenvalue and singular value problems



    Computational Routines

       






    Preface to the First Edition

    The development of LAPACK was a natural step after specifications of the Level 2 and 3 BLAS were drawn up in 1984-86 and 1987-88. Research on block algorithms had been ongoing for several years, but agreement on the BLAS made it possible to construct a new software package to take the place of LINPACK and EISPACK, which would achieve much greater efficiency on modern high-performance computers. This also seemed to be a good time to implement a number of algorithmic advances that had been made since LINPACK and EISPACK were written in the 1970's. The proposal for LAPACK was submitted while the Level 3 BLAS were still being developed and funding was obtained from the National Science Foundation (NSF) beginning in 1987.

    LAPACK is more than just a more efficient update of its popular predecessors. It extends the functionality of LINPACK and EISPACK by including: driver routines for linear systems; equilibration, iterative refinement and error bounds for linear systems; routines for computing and re-ordering the Schur factorization; and condition estimation routines for eigenvalue problems. LAPACK improves on the accuracy of the standard algorithms in EISPACK by including high accuracy algorithms for finding singular values and eigenvalues of bidiagonal and tridiagonal matrices, respectively, that arise in SVD and symmetric eigenvalue problems.

    We have tried to be consistent with our documentation and coding style throughout LAPACK in the hope that LAPACK will serve as a model for other software development efforts. In particular, we hope that LAPACK and this guide will be of value in the classroom. But above all, LAPACK has been designed to be used for serious computation, especially as a source of building blocks for larger applications.

    The LAPACK project has been a research project on achieving good performance in a portable way over a large class of modern computers. This goal has been achieved, subject to the following qualifications. For optimal performance, it is necessary, first, that the BLAS are implemented efficiently on the target machine, and second, that a small number of tuning parameters (such as the block size) have been set to suitable values (reasonable default values are provided). Most of the LAPACK code is written in standard Fortran 77, but the double precision complex data type is not part of the standard, so we have had to make some assumptions about the names of intrinsic functions that do not hold on all machines (see section 6.1). Finally, our rigorous testing suite included test problems scaled at the extremes of the arithmetic range, which can vary greatly from machine to machine. On some machines, we have had to restrict the range more than on others.

    Since most of the performance improvements in LAPACK come from restructuring the algorithms to use the Level 2 and 3 BLAS, we benefited greatly by having access from the early stages of the project to a complete set of BLAS developed for the CRAY machines by Cray Research. Later, the BLAS library developed by IBM for the IBM RISC/6000 was very helpful in proving the worth of block algorithms and LAPACK on ``super-scalar'' workstations. Many of our test sites, both computer vendors and research institutions, also worked on optimizing the BLAS and thus helped to get good performance from LAPACK. We are very pleased at the extent to which the user community has embraced the BLAS, not only for performance reasons, but also because we feel developing software around a core set of common routines like the BLAS is good software engineering practice.

    A number of technical reports were written during the development of LAPACK and published as LAPACK Working Notes, initially by Argonne National Laboratory and later by the University of Tennessee. Many of these reports later appeared as journal articles. Appendix E lists the LAPACK Working Notes, and the Bibliography gives the most recent published reference.

    A follow-on project, LAPACK 2, has been funded in the U.S. by the NSF and DARPA. One of its aims will be to add a modest amount of additional functionality to the current LAPACK package - for example, routines for the generalized SVD and additional routines for generalized eigenproblems. These routines will be included in a future release of LAPACK when they are available. LAPACK 2 will also produce routines which implement LAPACK-type algorithms for distributed memory machines, routines which take special advantage of IEEE arithmetic, and versions of parts of LAPACK in C and Fortran 90. The precise form of these other software packages which will result from LAPACK 2 has not yet been decided.

    As the successor to LINPACK and EISPACK, LAPACK has drawn heavily on both the software and documentation from those collections. The test and timing software for the Level 2 and 3 BLAS was used as a model for the LAPACK test and timing software, and in fact the LAPACK timing software includes the BLAS timing software as a subset. Formatting of the software and conversion from single to double precision was done using Toolpack/1 [66], which was indispensable to the project. We owe a great debt to our colleagues who have helped create the infrastructure of scientific computing on which LAPACK has been built.

    The development of LAPACK was primarily supported by NSF grant ASC-8715728. Zhaojun Bai had partial support from DARPA grant F49620-87-C0065; Christian Bischof was supported by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under contract W-31-109-Eng-38; James Demmel had partial support from NSF grant DCR-8552474; and Jack Dongarra had partial support from the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract DE-AC05-84OR21400.

    The cover was designed by Alan Edelman at UC Berkeley, who discovered the matrix by performing Gaussian elimination on a certain 20-by-20 Hadamard matrix.

    We acknowledge with gratitude the support which we have received from the following organizations, and the help of individual members of their staff: Cornell Theory Center; Cray Research Inc.; IBM ECSEC Rome; IBM Scientific Center, Bergen; NAG Ltd.

    We also thank many, many people who have contributed code, criticism, ideas and encouragement. We wish especially to acknowledge the contributions of: Mario Arioli, Mir Assadullah, Jesse Barlow, Mel Ciment, Percy Deift, Augustin Dubrulle, Iain Duff, Alan Edelman, Victor Eijkhout, Sam Figueroa, Pat Gaffney, Nick Higham, Liz Jessup, Bo Kågström, Velvel Kahan, Linda Kaufman, L.-C. Li, Bob Manchek, Peter Mayes, Cleve Moler, Beresford Parlett, Mick Pont, Giuseppe Radicati, Tom Rowan, Pete Stewart, Peter Tang, Carlos Tomei, Charlie Van Loan, Kresimir Veselic, Phuong Vu, and Reed Wade.

    Finally we thank all the test sites who received three preliminary distributions of LAPACK software and who ran an extensive series of test programs and timing programs for us; their efforts have influenced the final version of the package in numerous ways.

    * The royalties from the sales of this book are being placed in a
      fund to help students attend SIAM meetings and other SIAM related
      activities.  This fund is administered by SIAM and qualified
      individuals are encouraged to write directly to SIAM for guidelines.
    



    Linear Equations

     

    We use the standard notation for a system of simultaneous linear equations:

    $$Ax = b \tag{2.4}$$

    where A is the coefficient matrix, b is the right hand side, and x is the solution. In (2.4) A is assumed to be a square matrix of order n, but some of the individual routines allow A to be rectangular. If there are several right hand sides, we write

    $$AX = B \tag{2.5}$$

    where the columns of B are the individual right hand sides, and the columns of X are the corresponding solutions. The basic task is to compute X, given A and B.

    If A is upper or lower triangular, (2.4) can be solved by a straightforward process of backward or forward substitution. Otherwise, the solution is obtained after first factorizing A as a product of triangular matrices (and possibly also a diagonal matrix or permutation matrix).
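    The factor-then-substitute pattern can be sketched in Python for a general square matrix. The sketch performs Gaussian elimination with partial pivoting, PA = LU, and then solves by forward and backward substitution. It plays the roles of xGETRF and xGETRS but does not reproduce their interfaces.

    In particular, the sketch records the permutation as a list of row indices, whereas LAPACK records the sequence of row interchanges. It is also unblocked. The names `lu_factor` and `lu_solve` are hypothetical.

```python
def lu_factor(a):
    """Gaussian elimination with partial pivoting: P A = L U.

    Returns (lu, perm): lu holds U on and above the diagonal and the
    multipliers of the unit lower triangular L below it; row i of P A is
    row perm[i] of A.
    """
    lu = [row[:] for row in a]
    n = len(lu)
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))  # pivot row
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("matrix is exactly singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]                 # multiplier l_ik
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b given the output of lu_factor."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]           # y = P b
    for i in range(n):                           # forward substitution: L y = P b
        y[i] -= sum(lu[i][k] * y[k] for k in range(i))
    for i in reversed(range(n)):                 # backward substitution: U x = y
        y[i] = (y[i] - sum(lu[i][k] * y[k] for k in range(i + 1, n))) / lu[i][i]
    return y


A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
lu, perm = lu_factor(A)
print(lu_solve(lu, perm, [5.0, -2.0, 9.0]))  # approximately [1.0, 1.0, 2.0]
```

    Because the factorization is kept, further right hand sides cost only the two substitutions.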

    The form of the factorization depends on the properties of the matrix A. LAPACK provides routines for the following types of matrices, based on the stated factorizations:

    The factorization for a general tridiagonal matrix is like that for a general band matrix with kl = 1 and ku = 1. The factorization for a symmetric positive definite band matrix with k superdiagonals (or subdiagonals) has the same form as for a symmetric positive definite matrix, but the factor U (or L) is a band matrix with k superdiagonals (subdiagonals). Band matrices use a compact band storage scheme described in section 5.3.3. LAPACK routines are also provided for symmetric matrices (whether positive definite or indefinite) using packed storage, as described in section 5.3.2.

    While the primary use of a matrix factorization is to solve a system of equations, other related tasks are provided as well. Wherever possible, LAPACK provides routines to perform each of these tasks for each type of matrix and storage scheme (see Tables 2.7 and 2.8). The following list relates the tasks to the last 3 characters of the name of the corresponding computational routine:

    xyyTRF:
    factorize (obviously not needed for triangular matrices);

    xyyTRS:
    use the factorization (or the matrix A itself if it is triangular) to solve (2.5) by forward or backward substitution;

    xyyCON:
    estimate the reciprocal of the condition number $\kappa(A) = \|A\| \, \|A^{-1}\|$; Higham's modification [52] of Hager's method [48] is used to estimate $\|A^{-1}\|$, except for symmetric positive definite tridiagonal matrices for which it is computed directly with comparable efficiency [50];

    xyyRFS:
    compute bounds on the error in the computed solution (returned by the xyyTRS routine), and refine the solution to reduce the backward error (see below);

    xyyTRI:
    use the factorization (or the matrix A itself if it is triangular) to compute $A^{-1}$ (not provided for band matrices, because the inverse does not in general preserve bandedness);

    xyyEQU:
    compute scaling factors to equilibrate A (not provided for tridiagonal, symmetric indefinite, or triangular matrices). These routines do not actually scale the matrices: auxiliary routines xLAQyy may be used for that purpose; see the code of the driver routines xyySVX for sample usage.
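    The xyyCON entry estimates $\|A^{-1}\|_1$ from a few solves with A and $A^T$, never from $A^{-1}$ itself. The Python sketch below implements the basic form of Hager's method [48]; LAPACK uses Higham's modification [52], which adds safeguards not shown here. A dense Gaussian-elimination solve stands in for reusing the xyyTRF factorization, and all names are hypothetical.

```python
def solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A is copied)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def hager_inv_norm1(a, max_iter=5):
    """Estimate ||A^{-1}||_1 by Hager's method (basic version)."""
    n = len(a)
    at = [list(col) for col in zip(*a)]           # A^T
    x = [1.0 / n] * n
    est = 0.0
    for _ in range(max_iter):
        y = solve(a, x)                           # y = A^{-1} x
        est = sum(abs(v) for v in y)              # current estimate ||y||_1
        xi = [1.0 if v >= 0.0 else -1.0 for v in y]
        z = solve(at, xi)                         # z = A^{-T} sign(y)
        j = max(range(n), key=lambda i: abs(z[i]))
        if abs(z[j]) <= sum(zi * xv for zi, xv in zip(z, x)):
            break                                 # no unit vector improves the estimate
        x = [0.0] * n
        x[j] = 1.0                                # next try: x = e_j
    return est


def rcond1_estimate(a):
    """Estimated reciprocal condition number 1 / (||A||_1 * ||A^{-1}||_1)."""
    norm_a = max(sum(abs(row[j]) for row in a) for j in range(len(a)))
    return 1.0 / (norm_a * hager_inv_norm1(a))


print(hager_inv_norm1([[1.0, 2.0], [3.0, 4.0]]))  # exact ||A^{-1}||_1 is 3.5
```

    Every trial vector x has unit 1-norm, so the estimate never exceeds $\|A^{-1}\|_1$. In practice it is usually of the right order of magnitude; for the 2-by-2 example it is exact.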

    Note that some of the above routines depend on the output of others:

    xyyTRF:
    may work on an equilibrated matrix produced by xyyEQU and xLAQyy, if yy is one of {GE, GB, PO, PP, PB};

    xyyTRS:
    requires the factorization returned by xyyTRF;

    xyyCON:
    requires the norm of the original matrix A, and the factorization returned by xyyTRF;

    xyyRFS:
    requires the original matrices A and B, the factorization returned by xyyTRF, and the solution X returned by xyyTRS;

    xyyTRI:
    requires the factorization returned by xyyTRF.

    The RFS (``refine solution'') routines perform iterative refinement and compute backward and forward error bounds for the solution. Iterative refinement is done in the same precision as the input data. In particular, the residual is not computed with extra precision, as has been traditionally done. The benefit of this procedure is discussed in section 4.4.
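    The following is a hedged Python sketch of working-precision iterative refinement. The residual $r = b - Ax$ is computed in the same (double) precision as the data. A correction is then obtained by reusing the LU factors. The componentwise relative backward error $\max_i |r_i| / (|A|\,|x| + |b|)_i$ measures the result.

    The xyyRFS routines also compute forward error bounds and use a stopping test on the backward error. This sketch simply takes a fixed number of steps (an assumption), and all names are hypothetical.

```python
def lu_factor(a):
    """P A = L U by partial pivoting; row i of P A is row perm[i] of A."""
    lu = [row[:] for row in a]
    n = len(lu)
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Forward and backward substitution with the factors from lu_factor."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        y[i] -= sum(lu[i][k] * y[k] for k in range(i))
    for i in reversed(range(n)):
        y[i] = (y[i] - sum(lu[i][k] * y[k] for k in range(i + 1, n))) / lu[i][i]
    return y


def backward_error(a, b, x):
    """Componentwise relative backward error: max_i |r_i| / (|A| |x| + |b|)_i."""
    worst = 0.0
    for i, row in enumerate(a):
        r = b[i] - sum(aij * xj for aij, xj in zip(row, x))
        denom = sum(abs(aij * xj) for aij, xj in zip(row, x)) + abs(b[i])
        if denom > 0.0:
            worst = max(worst, abs(r) / denom)
    return worst


def refine(a, b, x, lu, perm, steps=3):
    """Iterative refinement in working precision, reusing the factorization."""
    for _ in range(steps):
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]
        d = lu_solve(lu, perm, r)               # correction: A d = r
        x = [xi + di for xi, di in zip(x, d)]
    return x


A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
b = [5.0, -2.0, 9.0]
lu, perm = lu_factor(A)
x_rough = [1.1, 0.9, 2.05]              # a deliberately inaccurate starting solution
x = refine(A, b, x_rough, lu, perm)
print(x, backward_error(A, b, x_rough), backward_error(A, b, x))
```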

    --------------------------------------------------------------------------------
    Type of matrix     Operation                  Single precision  Double precision
    and storage scheme                            real     complex  real     complex
    --------------------------------------------------------------------------------
    general            factorize                  SGETRF   CGETRF   DGETRF   ZGETRF
                       solve using factorization  SGETRS   CGETRS   DGETRS   ZGETRS
                       estimate condition number  SGECON   CGECON   DGECON   ZGECON
                       error bounds for solution  SGERFS   CGERFS   DGERFS   ZGERFS
                       invert using factorization SGETRI   CGETRI   DGETRI   ZGETRI
                       equilibrate                SGEEQU   CGEEQU   DGEEQU   ZGEEQU
    --------------------------------------------------------------------------------
    general            factorize                  SGBTRF   CGBTRF   DGBTRF   ZGBTRF
     band              solve using factorization  SGBTRS   CGBTRS   DGBTRS   ZGBTRS
                       estimate condition number  SGBCON   CGBCON   DGBCON   ZGBCON
                       error bounds for solution  SGBRFS   CGBRFS   DGBRFS   ZGBRFS
                       equilibrate                SGBEQU   CGBEQU   DGBEQU   ZGBEQU
    --------------------------------------------------------------------------------
    general            factorize                  SGTTRF   CGTTRF   DGTTRF   ZGTTRF
     tridiagonal       solve using factorization  SGTTRS   CGTTRS   DGTTRS   ZGTTRS
                       estimate condition number  SGTCON   CGTCON   DGTCON   ZGTCON
                       error bounds for solution  SGTRFS   CGTRFS   DGTRFS   ZGTRFS
    --------------------------------------------------------------------------------
    symmetric/         factorize                  SPOTRF   CPOTRF   DPOTRF   ZPOTRF
     Hermitian         solve using factorization  SPOTRS   CPOTRS   DPOTRS   ZPOTRS
     positive definite estimate condition number  SPOCON   CPOCON   DPOCON   ZPOCON
                       error bounds for solution  SPORFS   CPORFS   DPORFS   ZPORFS
                       invert using factorization SPOTRI   CPOTRI   DPOTRI   ZPOTRI
                       equilibrate                SPOEQU   CPOEQU   DPOEQU   ZPOEQU
    --------------------------------------------------------------------------------
    symmetric/         factorize                  SPPTRF   CPPTRF   DPPTRF   ZPPTRF
     Hermitian         solve using factorization  SPPTRS   CPPTRS   DPPTRS   ZPPTRS
     positive definite estimate condition number  SPPCON   CPPCON   DPPCON   ZPPCON
     (packed storage)  error bounds for solution  SPPRFS   CPPRFS   DPPRFS   ZPPRFS
                       invert using factorization SPPTRI   CPPTRI   DPPTRI   ZPPTRI
                       equilibrate                SPPEQU   CPPEQU   DPPEQU   ZPPEQU
    --------------------------------------------------------------------------------
    symmetric/         factorize                  SPBTRF   CPBTRF   DPBTRF   ZPBTRF
     Hermitian         solve using factorization  SPBTRS   CPBTRS   DPBTRS   ZPBTRS
     positive definite estimate condition number  SPBCON   CPBCON   DPBCON   ZPBCON
     band              error bounds for solution  SPBRFS   CPBRFS   DPBRFS   ZPBRFS
                       equilibrate                SPBEQU   CPBEQU   DPBEQU   ZPBEQU
    --------------------------------------------------------------------------------
    symmetric/         factorize                  SPTTRF   CPTTRF   DPTTRF   ZPTTRF
     Hermitian         solve using factorization  SPTTRS   CPTTRS   DPTTRS   ZPTTRS
     positive definite estimate condition number  SPTCON   CPTCON   DPTCON   ZPTCON
     tridiagonal       error bounds for solution  SPTRFS   CPTRFS   DPTRFS   ZPTRFS
    --------------------------------------------------------------------------------
    
    Table 2.7: Computational routines for linear equations

       
    Table 2.8: Computational routines for linear equations (continued)



    Orthogonal Factorizations and Linear Least Squares Problems

     

    LAPACK provides a number of routines for factorizing a general rectangular m-by-n matrix A, as the product of an orthogonal matrix (unitary if complex) and a triangular (or possibly trapezoidal) matrix.

    A real matrix Q is orthogonal if $Q^T Q = I$; a complex matrix Q is unitary if $Q^H Q = I$. Orthogonal or unitary matrices have the important property that they leave the two-norm of a vector invariant:

    $$\|x\|_2 = \|Qx\|_2, \quad \text{if Q is orthogonal or unitary.}$$

    As a result, they help to maintain numerical stability because they do not amplify rounding errors.
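    A plane (Givens) rotation is the simplest orthogonal matrix, and it shows the norm invariance at work. The Python sketch below builds a rotation that zeroes the second component of a pair and applies it to a vector; the 2-norm is unchanged up to rounding. The function names are hypothetical.

```python
import math


def givens(a, b):
    """Return (c, s) with c*a + s*b = hypot(a, b) and -s*a + c*b = 0."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def apply_rotation(c, s, x, i, j):
    """Apply [[c, s], [-s, c]] to components i and j of x (returns a new list)."""
    y = list(x)
    y[i] = c * x[i] + s * x[j]
    y[j] = -s * x[i] + c * x[j]
    return y


def norm2(x):
    return math.sqrt(sum(t * t for t in x))


x = [3.0, 4.0, 12.0]
c, s = givens(x[0], x[1])
y = apply_rotation(c, s, x, 0, 1)
print(y, norm2(x), norm2(y))  # second entry is zero and both norms equal 13, up to rounding
```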

    Orthogonal factorizations are used in the solution of linear least squares problems. They may also be used to perform preliminary steps in the solution of eigenvalue or singular value problems.






    QR Factorization

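    The most common, and best known, of the factorizations is the QR factorization given by

    $$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}, \quad \text{if } m \ge n,$$

    where R is an n-by-n upper triangular matrix and Q is an m-by-m orthogonal (or unitary) matrix. If A is of full rank n, then R is non-singular. It is sometimes convenient to write the factorization as

    $$A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix},$$

    which reduces to

    $$A = Q_1 R,$$

    where $Q_1$ consists of the first n columns of Q, and $Q_2$ the remaining m − n columns.

    If m < n, R is trapezoidal, and the factorization can be written

    $$A = Q \begin{bmatrix} R_1 & R_2 \end{bmatrix}, \quad \text{if } m < n,$$

    where $R_1$ is upper triangular and $R_2$ is rectangular.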

    The routine xGEQRF computes the QR factorization. The matrix Q is not formed explicitly, but is represented as a product of elementary reflectors, as described in section 5.4. Users need not be aware of the details of this representation, because associated routines are provided to work with Q: xORGQR (or xUNGQR in the complex case) can generate all or part of Q, while xORMQR (or xUNMQR) can pre- or post-multiply a given matrix by Q or $Q^T$ ($Q^H$ if complex).

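    The QR factorization can be used to solve the linear least squares problem (2.1) when m ≥ n and A is of full rank, since

    $$\|b - Ax\|_2 = \|Q^T b - Q^T A x\|_2 = \left\| \begin{bmatrix} c_1 - Rx \\ c_2 \end{bmatrix} \right\|_2, \quad \text{where } c \equiv \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = Q^T b.$$

    c can be computed by xORMQR (or xUNMQR), and $c_1$ consists of its first n elements. Then x is the solution of the upper triangular system

    $$Rx = c_1,$$

    which can be computed by xTRTRS. The residual vector r is given by

    $$r = b - Ax = Q \begin{bmatrix} 0 \\ c_2 \end{bmatrix},$$

    and may be computed using xORMQR (or xUNMQR). The residual sum of squares $\|r\|_2^2$ may be computed without forming r explicitly, since

    $$\|r\|_2 = \|b - Ax\|_2 = \|c_2\|_2.$$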
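    The least squares steps above can be sketched in Python:

    1. compute $c = Q^T b$;
    2. solve $Rx = c_1$;
    3. read off $\|r\|_2 = \|c_2\|_2$.

    Like xGEQRF, the sketch represents Q implicitly by Householder vectors. Its storage scheme is an assumption: a plain list of vectors, instead of LAPACK's vectors below the diagonal with scalar factors, and there is no blocking. All names are hypothetical, and A is assumed to have full column rank with m ≥ n.

```python
import math


def householder_qr(a):
    """Householder QR of an m-by-n matrix A (m >= n), computed on a copy.

    Returns (r, vs): r has R in its upper triangle, and vs[k] is the
    Householder vector v_k (acting on rows k..m-1), with
    H_k = I - 2 v_k v_k^T / (v_k^T v_k) and Q = H_0 H_1 ... H_{n-1}.
    """
    r = [row[:] for row in a]
    m, n = len(r), len(r[0])
    vs = []
    for k in range(n):
        x = [r[i][k] for i in range(k, m)]
        alpha = -math.copysign(math.sqrt(sum(t * t for t in x)), x[0])
        v = x[:]
        v[0] -= alpha                      # v = x - alpha e_1, so H_k x = alpha e_1
        vs.append(v)
        vv = sum(t * t for t in v)
        if vv == 0.0:
            continue                       # column already zero below the diagonal
        for j in range(k, n):              # apply H_k to the trailing columns
            f = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vv
            for i in range(k, m):
                r[i][j] -= f * v[i - k]
    return r, vs


def apply_qt(vs, b):
    """c = Q^T b = H_{n-1} ... H_1 H_0 b."""
    c = b[:]
    m = len(c)
    for k, v in enumerate(vs):
        vv = sum(t * t for t in v)
        if vv == 0.0:
            continue
        f = 2.0 * sum(v[i - k] * c[i] for i in range(k, m)) / vv
        for i in range(k, m):
            c[i] -= f * v[i - k]
    return c


def least_squares_qr(a, b):
    """Minimize ||b - A x||_2 for full-rank A (m >= n); also return ||r||_2."""
    n = len(a[0])
    r, vs = householder_qr(a)
    c = apply_qt(vs, b)
    x = [0.0] * n
    for i in reversed(range(n)):           # back substitution for R x = c_1
        x[i] = (c[i] - sum(r[i][j] * x[j] for j in range(i + 1, n))) / r[i][i]
    return x, math.sqrt(sum(t * t for t in c[n:]))   # ||r||_2 = ||c_2||_2


# Fit y = x0 + x1*t to the points (0, 1), (1, 2), (2, 4).
x, res = least_squares_qr([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 2.0, 4.0])
print(x, res)  # approximately [0.8333, 1.5] and 0.4082 (= sqrt(1/6))
```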




    LQ Factorization

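    The LQ factorization is given by

    $$A = \begin{bmatrix} L & 0 \end{bmatrix} Q = \begin{bmatrix} L & 0 \end{bmatrix} \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} = L Q_1, \quad \text{if } m \le n,$$

    where L is m-by-m lower triangular, Q is n-by-n orthogonal (or unitary), $Q_1$ consists of the first m rows of Q, and $Q_2$ the remaining n − m rows.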

    This factorization is computed by the routine xGELQF, and again Q is represented as a product of elementary reflectors; xORGLQ (or xUNGLQ in the complex case) can generate all or part of Q, and xORMLQ (or xUNMLQ) can pre- or post-multiply a given matrix by Q or $Q^T$ ($Q^H$ if Q is complex).

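    The LQ factorization of A is essentially the same as the QR factorization of $A^T$ ($A^H$ if A is complex), since

    $$A = \begin{bmatrix} L & 0 \end{bmatrix} Q \iff A^T = Q^T \begin{bmatrix} L^T \\ 0 \end{bmatrix}.$$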

    The LQ factorization may be used to find a minimum norm solution of an underdetermined system of linear equations Ax = b where A is m-by-n with m < n and has rank m. The solution is given by

    $$x = Q^T \begin{bmatrix} L^{-1} b \\ 0 \end{bmatrix}$$

    and may be computed by calls to xTRTRS and xORMLQ.
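    The LQ factorization of A is, up to transposition, the QR factorization of $A^T$. So the minimum norm solution can be sketched in Python by orthogonalizing the columns of $A^T$. For brevity, the sketch uses modified Gram-Schmidt rather than the elementary reflectors of xGELQF, so it forms the orthonormal vectors explicitly. Its names are hypothetical, and A is assumed to be m-by-n with m < n and rank m.

```python
import math


def mgs_qr_columns(a):
    """Modified Gram-Schmidt QR of a k-by-m matrix with full column rank.

    Returns (w, r): w is a list of m orthonormal column vectors and
    r is the m-by-m upper triangular factor, so that A = W R.
    """
    rows, cols = len(a), len(a[0])
    w = [[a[i][j] for i in range(rows)] for j in range(cols)]   # columns of A
    r = [[0.0] * cols for _ in range(cols)]
    for j in range(cols):
        r[j][j] = math.sqrt(sum(t * t for t in w[j]))
        w[j] = [t / r[j][j] for t in w[j]]
        for k in range(j + 1, cols):      # remove the w_j component from later columns
            r[j][k] = sum(x * y for x, y in zip(w[j], w[k]))
            w[k] = [x - r[j][k] * y for x, y in zip(w[k], w[j])]
    return w, r


def min_norm_solution(a, b):
    """Minimum 2-norm solution of A x = b, A m-by-n with m < n and rank m.

    From A^T = W R we get A = L W^T with L = R^T lower triangular; W^T plays
    the role of Q_1 in the text, so x = W L^{-1} b.
    """
    m, n = len(a), len(a[0])
    at = [[a[i][j] for i in range(m)] for j in range(n)]   # A^T, n-by-m
    w, r = mgs_qr_columns(at)
    y = [0.0] * m
    for i in range(m):                    # forward substitution: L y = b, L = R^T
        y[i] = (b[i] - sum(r[k][i] * y[k] for k in range(i))) / r[i][i]
    return [sum(w[k][j] * y[k] for k in range(m)) for j in range(n)]


x = min_norm_solution([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]], [2.0, 2.0])
print(x)  # approximately [2/3, 2/3, 4/3]
```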




    QR Factorization with Column Pivoting

    To solve a linear least squares problem (2.1) when A is not of full rank, or the rank of A is in doubt, we can perform either a QR factorization with column pivoting or a singular value decomposition (see subsection 2.3.6).

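    The QR factorization with column pivoting is given by

    $$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} P^T, \quad m \ge n,$$

    where Q and R are as before and P is a permutation matrix, chosen (in general) so that

    $$|r_{11}| \ge |r_{22}| \ge \cdots \ge |r_{nn}|$$

    and moreover, for each k,

    $$|r_{kk}| \ge \|R_{k:j,\,j}\|_2 \quad \text{for } j = k+1, \ldots, n.$$

    In exact arithmetic, if rank(A) = k, then the whole of the submatrix $R_{22}$ in rows and columns k + 1 to n would be zero. In numerical computation, the aim must be to determine an index k, such that the leading submatrix $R_{11}$ in the first k rows and columns is well-conditioned, and $R_{22}$ is negligible:

    $$R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix} \approx \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}.$$

    Then k is the effective rank of A. See Golub and Van Loan [45] for a further discussion of numerical rank determination.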

    The so-called basic solution to the linear least squares problem (2.1) can be obtained from this factorization as

    $$x = P \begin{pmatrix} R_{11}^{-1} \hat{c}_1 \\ 0 \end{pmatrix},$$

    where $\hat{c}_1$ consists of just the first k elements of $c = Q^T b$.

    The routine xGEQPF computes the QR factorization with column pivoting, but does not attempt to determine the rank of A. The matrix Q is represented in exactly the same way as after a call of xGEQRF, and so the routines xORGQR and xORMQR can be used to work with Q (xUNGQR and xUNMQR if Q is complex).
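A minimal NumPy sketch of QR with column pivoting, rank determination and the basic solution follows. It is a plain Householder implementation, not LAPACK's xGEQPF code. The rank threshold `rtol` is an assumed choice, and the function names are hypothetical.

```python
import numpy as np


def qr_column_pivoting(A):
    """Householder QR with column pivoting: A[:, perm] = Q @ R.

    At each step the remaining column of largest norm is moved to the front,
    which makes |r_11| >= |r_22| >= ... .
    """
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    perm = np.arange(n)
    for k in range(min(m, n)):
        j = k + int(np.argmax(np.linalg.norm(R[k:, k:], axis=0)))
        R[:, [k, j]] = R[:, [j, k]]
        perm[[k, j]] = perm[[j, k]]
        x = R[k:, k]
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue
        v = x.copy()
        v[0] += np.copysign(normx, x[0])
        v /= np.linalg.norm(v)
        R[k:, :] -= 2.0 * np.outer(v, v @ R[k:, :])   # reflector on rows k:
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)   # accumulate Q = Q H
    return Q, np.triu(R), perm


def basic_solution(A, b, rtol=1e-10):
    """Basic solution x = P (R11^{-1} c1; 0) and the effective rank k.

    k counts the |r_kk| exceeding rtol * |r_11| (an assumed threshold rule).
    """
    Q, R, perm = qr_column_pivoting(A)
    d = np.abs(np.diag(R))
    k = int(np.sum(d > rtol * d[0]))
    c = Q.T @ b
    z = np.zeros(A.shape[1])
    z[:k] = np.linalg.solve(R[:k, :k], c[:k])   # R11^{-1} c1
    x = np.zeros(A.shape[1])
    x[perm] = z                                 # x = P z
    return x, k


rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank 2
b = rng.standard_normal(6)
x, k = basic_solution(A, b)
```

The basic solution has at most k nonzero entries. It solves the least squares problem, but in general it is not the minimum norm solution.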





    Complete Orthogonal Factorization

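    The QR factorization with column pivoting does not enable us to compute a minimum norm solution to a rank-deficient linear least squares problem unless $R_{12} = 0$. However, $R_{12}$ can be eliminated by applying further orthogonal (or unitary) transformations from the right to the upper trapezoidal matrix $\begin{pmatrix} R_{11} & R_{12} \end{pmatrix}$, using the routine xTZRQF:

    $$\begin{pmatrix} R_{11} & R_{12} \end{pmatrix} Z = \begin{pmatrix} T_{11} & 0 \end{pmatrix}.$$

    This gives the complete orthogonal factorization

    $$A P = Q \begin{pmatrix} T_{11} & 0 \\ 0 & 0 \end{pmatrix} Z^T,$$

    from which the minimum norm solution can be obtained as

    $$x = P Z \begin{pmatrix} T_{11}^{-1} \hat{c}_1 \\ 0 \end{pmatrix}.$$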
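The following NumPy sketch computes the minimum norm solution through a complete orthogonal factorization. It rests on two assumptions that keep it short. First, the leading k columns of A are assumed linearly independent, so no pivoting is needed (P = I). Second, $R_{12}$ is eliminated with a QR factorization of $(R_{11}\ R_{12})^T$, which yields a lower triangular $T_{11}$ instead of the upper triangular one xTZRQF produces. This is a substitute for xTZRQF, not its algorithm. The function name is hypothetical.

```python
import numpy as np


def min_norm_lstsq_cof(A, b, rtol=1e-10):
    """Minimum norm least squares solution via a complete orthogonal factorization.

    Assumes P = I (leading k columns of A independent). T11 comes out lower
    triangular here; the minimum norm solution is the same.
    """
    m, n = A.shape
    Q, R = np.linalg.qr(A)                      # A = Q R, no pivoting
    d = np.abs(np.diag(R))
    k = int(np.sum(d > rtol * d[0]))            # effective rank (assumed rule)
    W = R[:k, :]                                # (R11 R12), k-by-n
    Z, S = np.linalg.qr(W.T, mode="complete")   # W^T = Z S, so W Z = (S[:k]^T 0)
    T11 = S[:k, :].T                            # k-by-k lower triangular
    c1 = (Q.T @ b)[:k]                          # first k elements of Q^T b
    y = np.zeros(n)
    y[:k] = np.linalg.solve(T11, c1)
    return Z @ y, k                             # x = Z (T11^{-1} c1; 0)


rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # rank 3
b = rng.standard_normal(6)
x, k = min_norm_lstsq_cof(A, b)
```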





    Other Factorizations

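    The QL and RQ factorizations are given by

    $$A = Q \begin{pmatrix} 0 \\ L \end{pmatrix}, \quad \text{if } m \ge n,$$

    and

    $$A = \begin{pmatrix} 0 & R \end{pmatrix} Q, \quad \text{if } m \le n.$$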

    These factorizations are computed by xGEQLF and xGERQF, respectively. They are less commonly used than either the QR or LQ factorizations described above, but have applications in, for example, the computation of generalized QR factorizations [2].
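The QL and RQ forms can be illustrated with a standard reversal trick. Reversing the row or column order of a matrix turns a QR factorization into an RQ or QL factorization. The NumPy sketch below relies on that trick, which is not how xGERQF or xGEQLF are implemented. `rq` and `ql` are hypothetical helpers.

```python
import numpy as np


def rq(A):
    """RQ factorization A = R @ Q via QR of the row-reversed transpose.

    For m <= n, R is m-by-n of the form (0 R2) with R2 upper triangular,
    and Q is n-by-n orthogonal.
    """
    Qt, Rt = np.linalg.qr(A[::-1, :].T, mode="complete")   # (J A)^T = Qt Rt
    return Rt.T[::-1, ::-1], Qt.T[::-1, :]                  # R = J Rt^T J, Q = J Qt^T


def ql(A):
    """QL factorization A = Q @ L of an m-by-n A with m >= n.

    Obtained from A^T = R Q, so that A = Q^T R^T; L has the form (0; L2)
    with L2 lower triangular.
    """
    R, Q = rq(A.T)
    return Q.T, R.T


rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))
R, Q = rq(A)
B = rng.standard_normal((5, 3))
QB, L = ql(B)
```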

    All the factorization routines discussed here (except xTZRQF) allow arbitrary m and n, so that in some cases the matrices R or L are trapezoidal rather than triangular. A routine that performs pivoting is provided only for the QR factorization.

       

    ---------------------------------------------------------------------------
    Type of
    factorization                            Single precision  Double precision
    and matrix      Operation                real     complex  real     complex
    ---------------------------------------------------------------------------
    QR, general     factorize with pivoting  SGEQPF   CGEQPF   DGEQPF   ZGEQPF
                    factorize, no pivoting   SGEQRF   CGEQRF   DGEQRF   ZGEQRF
                    generate Q               SORGQR   CUNGQR   DORGQR   ZUNGQR
                    multiply matrix by Q     SORMQR   CUNMQR   DORMQR   ZUNMQR
    ---------------------------------------------------------------------------
    LQ, general     factorize, no pivoting   SGELQF   CGELQF   DGELQF   ZGELQF
                    generate Q               SORGLQ   CUNGLQ   DORGLQ   ZUNGLQ
                    multiply matrix by Q     SORMLQ   CUNMLQ   DORMLQ   ZUNMLQ
    ---------------------------------------------------------------------------
    QL, general     factorize, no pivoting   SGEQLF   CGEQLF   DGEQLF   ZGEQLF
                    generate Q               SORGQL   CUNGQL   DORGQL   ZUNGQL
                    multiply matrix by Q     SORMQL   CUNMQL   DORMQL   ZUNMQL
    ---------------------------------------------------------------------------
    RQ, general     factorize, no pivoting   SGERQF   CGERQF   DGERQF   ZGERQF
                    generate Q               SORGRQ   CUNGRQ   DORGRQ   ZUNGRQ
                    multiply matrix by Q     SORMRQ   CUNMRQ   DORMRQ   ZUNMRQ
    ---------------------------------------------------------------------------
    RQ, trapezoidal factorize, no pivoting   STZRQF   CTZRQF   DTZRQF   ZTZRQF
    ---------------------------------------------------------------------------
    
    Table 2.9: Computational routines for orthogonal factorizations





    Generalized Orthogonal Factorizations and Linear Least Squares Problems

         







    Generalized QR Factorization

    The generalized QR (GQR) factorization of an n-by-m matrix A and an n-by-p matrix B is given by the pair of factorizations

    A = QR and B = QTZ

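    where Q and Z are respectively n-by-n and p-by-p orthogonal matrices (or unitary matrices if A and B are complex). R has the form:

    $$R = \begin{pmatrix} R_{11} \\ 0 \end{pmatrix}, \quad \text{if } n \ge m,$$

    or

    $$R = \begin{pmatrix} R_{11} & R_{12} \end{pmatrix}, \quad \text{if } n < m,$$

    where $R_{11}$ is upper triangular (m-by-m if n >= m, n-by-n if n < m). T has the form

    $$T = \begin{pmatrix} 0 & T_{12} \end{pmatrix}, \quad \text{if } n \le p,$$

    or

    $$T = \begin{pmatrix} T_{11} \\ T_{21} \end{pmatrix}, \quad \text{if } n > p,$$

    where $T_{12}$ (n-by-n) or $T_{21}$ (p-by-p) is upper triangular.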

    Note that if B is square and nonsingular, the GQR factorization of A and B implicitly gives the QR factorization of the matrix $B^{-1}A$:

    $$B^{-1} A = Z^T (T^{-1} R)$$

    without explicitly computing the matrix inverse $B^{-1}$ or the product $B^{-1}A$.

    The routine xGGQRF computes the GQR factorization by first computing the QR factorization of A and then the RQ factorization of $Q^T B$. The orthogonal (or unitary) matrices Q and Z can either be formed explicitly or just used to multiply another given matrix in the same way as the orthogonal (or unitary) matrix in the QR factorization (see section 2.3.2).
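A NumPy sketch of this two-step computation follows. It also checks the identity $B^{-1}A = Z^T(T^{-1}R)$ for a square nonsingular B. The RQ step is obtained from NumPy's QR by reversing row and column order. That is an assumption of this sketch, not xGGQRF's implementation. The names `rq` and `gqr` are hypothetical.

```python
import numpy as np


def rq(M):
    """RQ factorization M = R @ Q via QR of the row-reversed transpose (reversal trick)."""
    Qt, Rt = np.linalg.qr(M[::-1, :].T, mode="complete")
    return Rt.T[::-1, ::-1], Qt.T[::-1, :]


def gqr(A, B):
    """Generalized QR factorization A = Q R, B = Q T Z."""
    Q, R = np.linalg.qr(A, mode="complete")   # QR factorization of A
    T, Z = rq(Q.T @ B)                        # RQ factorization of Q^T B
    return Q, R, T, Z


rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 4))
Q, R, T, Z = gqr(A, B)
```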

    The GQR factorization was introduced in [63] [49]. The implementation of the GQR factorization here follows [2]. Further generalizations of the GQR   factorization can be found in [25].

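    The GQR factorization can be used to solve the general (Gauss-Markov) linear model problem (GLM) (see (2.3), [60] and [45, page 252]). Using the GQR factorization of A and B, we rewrite the equation d = Ax + By from (2.3) as

    $$Q^T d = Q^T A x + Q^T B y = R x + T Z y.$$

    We partition this as

    $$\begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = \begin{pmatrix} R_{11} \\ 0 \end{pmatrix} x + \begin{pmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},$$

    where

    $$\begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = Q^T d, \qquad \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = Z y.$$

    Here $d_1$ has m elements, $d_2$ and $y_2$ have n - m elements, and $T_{22}$ is upper triangular (assuming m <= n <= m + p). $Q^T d$ can be computed by xORMQR (or xUNMQR).

    Any value of the first block row can be absorbed by x, and $\|y\|_2 = \|Zy\|_2$. The GLM problem is therefore solved by setting

    $$y_1 = 0 \quad \text{and} \quad y_2 = T_{22}^{-1} d_2,$$

    from which we obtain the desired solutions

    $$x = R_{11}^{-1} (d_1 - T_{12} y_2) \quad \text{and} \quad y = Z^T \begin{pmatrix} 0 \\ y_2 \end{pmatrix},$$

    which can be computed by xTRSV, xGEMV and xORMRQ (or xUNMRQ).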
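The GLM solution via the GQR factorization can be sketched in NumPy. The sketch assumes m <= n <= m + p, A of full column rank, and the trailing (n-m)-by-(n-m) block of T nonsingular. The RQ step comes from NumPy's QR with reversed row and column order, which is an assumption of this sketch. Dense solves stand in for xTRSV, and the function names are hypothetical.

```python
import numpy as np


def rq(M):
    """RQ factorization M = R @ Q via QR of the row-reversed transpose (reversal trick)."""
    Qt, Rt = np.linalg.qr(M[::-1, :].T, mode="complete")
    return Rt.T[::-1, ::-1], Qt.T[::-1, :]


def glm(A, B, d):
    """Solve min ||y||_2 subject to d = A x + B y (A is n-by-m, B is n-by-p)."""
    n, m = A.shape
    p = B.shape[1]
    Q, R = np.linalg.qr(A, mode="complete")   # A = Q R
    T, Z = rq(Q.T @ B)                        # Q^T B = T Z
    dt = Q.T @ d
    d1, d2 = dt[:m], dt[m:]
    j = p - (n - m)                           # first column of the trailing block
    T12, T22 = T[:m, j:], T[m:, j:]
    w = np.zeros(p)                           # w = Z y = (y1; y2) with y1 = 0
    w[j:] = np.linalg.solve(T22, d2)          # y2 = T22^{-1} d2
    x = np.linalg.solve(R[:m, :m], d1 - T12 @ w[j:])
    return x, Z.T @ w                         # y = Z^T (0; y2)


rng = np.random.default_rng(5)
A = rng.standard_normal((4, 2))
B = rng.standard_normal((4, 3))
d = rng.standard_normal(4)
x, y = glm(A, B, d)
```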





    Generalized RQ factorization

    The generalized RQ (GRQ) factorization of an m-by-n matrix A and a p-by-n matrix B is given by the pair of factorizations

    A = RQ and B = ZTQ

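    where Q and Z are respectively n-by-n and p-by-p orthogonal matrices (or unitary matrices if A and B are complex). R has the form

    $$R = \begin{pmatrix} 0 & R_{12} \end{pmatrix}, \quad \text{if } m \le n,$$

    or

    $$R = \begin{pmatrix} R_{11} \\ R_{21} \end{pmatrix}, \quad \text{if } m > n,$$

    where $R_{12}$ (m-by-m) or $R_{21}$ (n-by-n) is upper triangular. T has the form

    $$T = \begin{pmatrix} T_{11} \\ 0 \end{pmatrix}, \quad \text{if } p \ge n,$$

    or

    $$T = \begin{pmatrix} T_{11} & T_{12} \end{pmatrix}, \quad \text{if } p < n,$$

    where $T_{11}$ is upper triangular.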

    Note that if B is square and nonsingular, the GRQ factorization of A and B implicitly gives the RQ factorization of the matrix $AB^{-1}$:

    $$A B^{-1} = (R T^{-1}) Z^T$$

    without explicitly computing the matrix inverse $B^{-1}$ or the product $AB^{-1}$.

    The routine xGGRQF computes the GRQ factorization by first computing the RQ factorization of A and then the QR factorization of $B Q^T$. The orthogonal (or unitary) matrices Q and Z can either be formed explicitly or just used to multiply another given matrix in the same way as the orthogonal (or unitary) matrix in the RQ factorization (see section 2.3.2).

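    The GRQ factorization can be used to solve the linear equality-constrained least squares problem (LSE) (see (2.2) and [45, page 567]). We use the GRQ factorization of B and A (note that B and A have swapped roles), written as

    B = TQ and A = ZRQ

    We write the linear equality constraints Bx = d as:

    TQx = d

    which we partition as:

    $$\begin{pmatrix} 0 & T_{12} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = d, \quad \text{where} \quad \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = Q x,$$

    with $x_1$ of length n - p, $x_2$ of length p, and $T_{12}$ p-by-p upper triangular (assuming p <= n <= m + p). Therefore $x_2$ is the solution of the upper triangular system

    $$T_{12} x_2 = d.$$

    Furthermore,

    $$\|Ax - c\|_2 = \|Z^T A x - Z^T c\|_2 = \|R Q x - Z^T c\|_2.$$

    We partition this expression as:

    $$\begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} c_1 \\ c_2 \end{pmatrix},$$

    where $\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = Z^T c$, which can be computed by xORMQR (or xUNMQR).

    Since $x_2$ is fixed by the constraints, the second block row $R_{22} x_2 - c_2$ cannot be reduced further. To solve the LSE problem, we therefore set

    $$R_{11} x_1 + R_{12} x_2 - c_1 = 0,$$

    which gives $x_1$ as the solution of the upper triangular system

    $$R_{11} x_1 = c_1 - R_{12} x_2.$$

    Finally, the desired solution is given by

    $$x = Q^T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$

    which can be computed by xORMRQ (or xUNMRQ).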
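A NumPy sketch of the LSE solution via the GRQ factorization of (B, A) follows. It assumes p <= n <= m + p, B of full row rank, and the leading (n-p)-by-(n-p) block of R nonsingular. The RQ factorization of B comes from NumPy's QR with reversed row and column order. That is an assumption of the sketch, not xGGRQF's method, and the function names are hypothetical.

```python
import numpy as np


def rq(M):
    """RQ factorization M = R @ Q via QR of the row-reversed transpose (reversal trick)."""
    Qt, Rt = np.linalg.qr(M[::-1, :].T, mode="complete")
    return Rt.T[::-1, ::-1], Qt.T[::-1, :]


def lse(A, c, B, d):
    """Solve min ||c - A x||_2 subject to B x = d (A is m-by-n, B is p-by-n)."""
    m, n = A.shape
    p = B.shape[0]
    T, Q = rq(B)                                   # B = T Q with T = (0 T12)
    Z, R = np.linalg.qr(A @ Q.T, mode="complete")  # A Q^T = Z R, so A = Z R Q
    x2 = np.linalg.solve(T[:, n - p:], d)          # T12 x2 = d
    ct = Z.T @ c                                   # (c1; c2) = Z^T c
    R11, R12 = R[:n - p, :n - p], R[:n - p, n - p:]
    x1 = np.linalg.solve(R11, ct[:n - p] - R12 @ x2)
    return Q.T @ np.concatenate([x1, x2])          # x = Q^T (x1; x2)


rng = np.random.default_rng(6)
m, n, p = 5, 4, 2
A = rng.standard_normal((m, n))
c = rng.standard_normal(m)
B = rng.standard_normal((p, n))
d = rng.standard_normal(p)
x = lse(A, c, B, d)
```

The result can be cross-checked against the KKT system of the constrained problem, $\begin{pmatrix} A^TA & B^T \\ B & 0 \end{pmatrix}\begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} A^Tc \\ d \end{pmatrix}$. That check squares the condition number, which is one reason for working with the GRQ factorization instead.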




    Guide







    Symmetric Eigenproblems

     

    Let A be a real symmetric or complex Hermitian n-by-n matrix. A scalar $\lambda$ is called an eigenvalue and a nonzero column vector z the corresponding eigenvector if $Az = \lambda z$. $\lambda$ is always real when A is real symmetric or complex Hermitian.

    The basic task of the symmetric eigenproblem routines is to compute values of $\lambda$ and, optionally, corresponding vectors z for a given matrix A.

    This computation proceeds in the following stages:

    1. The real symmetric or complex Hermitian matrix A is reduced to real tridiagonal form T. If A is real symmetric, this decomposition is $A = QTQ^T$ with Q orthogonal and T symmetric tridiagonal. If A is complex Hermitian, the decomposition is $A = QTQ^H$ with Q unitary and T, as before, real symmetric tridiagonal.

    2. Eigenvalues and eigenvectors of the real symmetric tridiagonal matrix T are computed. If all eigenvalues and eigenvectors are computed, this is equivalent to factorizing T as $T = S \Lambda S^T$, where S is orthogonal and $\Lambda$ is diagonal. The diagonal entries of $\Lambda$ are the eigenvalues of T, which are also the eigenvalues of A, and the columns of S are the eigenvectors of T. The eigenvectors of A are the columns of Z = QS, so that $A = Z \Lambda Z^T$ ($A = Z \Lambda Z^H$ when A is complex Hermitian).
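The two stages can be sketched in NumPy. Stage 1 is an unblocked Householder tridiagonalization, a simplified stand-in for xSYTRD with Q formed explicitly as xORGTR would. Stage 2 uses `np.linalg.eigh` on T in place of the tridiagonal eigensolvers, followed by the back-transformation Z = QS. The function name is hypothetical.

```python
import numpy as np


def tridiagonalize(A):
    """Householder reduction of a real symmetric A to tridiagonal T with A = Q T Q^T."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = T[k + 1:, k]
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue
        v = x.copy()
        v[0] += np.copysign(normx, x[0])
        v /= np.linalg.norm(v)
        H = np.eye(n)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)   # reflector acting on k+1:
        T = H @ T @ H                               # similarity keeps symmetry
        Q = Q @ H
    return Q, T


rng = np.random.default_rng(7)
M = rng.standard_normal((5, 5))
A = M + M.T
Q, T = tridiagonalize(A)          # stage 1: A = Q T Q^T
lam, S = np.linalg.eigh(T)        # stage 2: T = S diag(lam) S^T
Z = Q @ S                         # eigenvectors of A
```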

    In the real case, the decomposition $A = QTQ^T$ is computed by one of the routines xSYTRD, xSPTRD, or xSBTRD, depending on how the matrix is stored (see Table 2.10). The complex analogues of these routines are called xHETRD, xHPTRD, and xHBTRD. The routine xSYTRD (or xHETRD) represents the matrix Q as a product of elementary reflectors, as described in section 5.4. The routine xORGTR (or in the complex case xUNGTR) is provided to form Q explicitly; this is needed in particular before calling xSTEQR to compute all the eigenvectors of A by the QR algorithm. The routine xORMTR (or in the complex case xUNMTR) is provided to multiply another matrix by Q without forming Q explicitly; this can be used to transform eigenvectors of T computed by xSTEIN back to eigenvectors of A.

    When packed storage is used, the corresponding routines for forming Q or multiplying another matrix by Q are xOPGTR and xOPMTR (in the complex case, xUPGTR and xUPMTR).

    When A is banded and xSBTRD (or xHBTRD) is used to reduce it to tridiagonal form, Q is determined as a product of Givens rotations, not as a product of elementary reflectors; if Q is required, it must be formed explicitly by the reduction routine. xSBTRD is based on the vectorizable algorithm due to Kaufman [57].

    There are several routines for computing eigenvalues   and eigenvectors   of T, to cover the cases of computing some or all of the eigenvalues, and some or all of the eigenvectors. In addition, some routines run faster in some computing environments or for some matrices than for others. Also, some routines are more accurate than other routines.

    xSTEQR
            This routine uses the implicitly shifted QR algorithm. It switches between the QR and QL variants in order to handle graded matrices more effectively than the simple QL variant that is provided by the EISPACK routines IMTQL1 and IMTQL2. See [46] for details.
    xSTERF
            This routine uses a square-root free version of the QR algorithm, also switching between QR and QL variants, and can only compute all the eigenvalues. See [46] for details.
    xSTEDC
            This routine uses Cuppen's divide and conquer algorithm to find the eigenvalues and the eigenvectors (if only eigenvalues are desired, xSTEDC calls xSTERF). xSTEDC can be many times faster than xSTEQR for large matrices but needs more workspace ($2n^2$ or $3n^2$). See [67] [47] [15] for details.
    xPTEQR
            This routine applies to symmetric positive definite tridiagonal matrices only. It uses a combination of Cholesky factorization and bidiagonal QR iteration (see xBDSQR) and may be significantly more accurate than the other routines. See [41] [16] [22] [13] for details.
    xSTEBZ
            This routine uses bisection to compute some or all of the eigenvalues. Options provide for computing all the eigenvalues in a real interval, or the i-th through j-th eigenvalues counted in ascending order. It can be highly accurate, but may be adjusted to run faster if lower accuracy is acceptable.
    xSTEIN
            Given accurate eigenvalues, this routine uses inverse iteration to compute some or all of the eigenvectors.
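The ideas behind xSTEBZ (bisection) and xSTEIN (inverse iteration) can be sketched as below. This is a simplified illustration, not the LAPACK algorithms. It uses a Sturm count with a crude zero-pivot safeguard, a fixed relative tolerance, and dense solves; all of these are assumptions. The function names are hypothetical.

```python
import numpy as np


def count_below(a, b, x):
    """Number of eigenvalues less than x of the symmetric tridiagonal matrix
    with diagonal a and off-diagonal b (Sturm count).

    It equals the number of negative pivots in the LDL^T factorization of T - x I.
    """
    count, q = 0, 1.0
    for i in range(len(a)):
        q = a[i] - x - (b[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = -1e-300        # crude safeguard against division by zero (assumption)
        if q < 0.0:
            count += 1
    return count


def kth_eigenvalue(a, b, k, tol=1e-12):
    """k-th smallest eigenvalue (k = 0, 1, ...) by bisection on a Gershgorin interval."""
    r = np.abs(np.append(b, 0.0)) + np.abs(np.insert(b, 0, 0.0))
    lo, hi = np.min(a - r), np.max(a + r)
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(a, b, mid) > k:
            hi = mid           # more than k eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def inverse_iteration(a, b, lam, iters=3, seed=0):
    """Eigenvector for a computed eigenvalue lam by inverse iteration.

    A dense solve is used for brevity; a tridiagonal solver would be used in practice.
    """
    n = len(a)
    T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
    shift = lam + 1e-10 * max(1.0, abs(lam))   # keeps T - shift*I nonsingular
    v = np.random.default_rng(seed).standard_normal(n)
    for _ in range(iters):
        v = np.linalg.solve(T - shift * np.eye(n), v)
        v /= np.linalg.norm(v)
    return v


a = np.array([2.0, -1.0, 3.0, 0.5, 1.0])
b = np.array([1.0, 0.5, -2.0, 0.3])
lams = [kth_eigenvalue(a, b, k) for k in range(len(a))]
v = inverse_iteration(a, b, lams[2])
```

Bisection can target any subset of eigenvalues independently, which is why it suits the "some of the eigenvalues" case. Inverse iteration then needs only a few steps per eigenvector when the shift is accurate.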

    See Table 2.10.

       

    ------------------------------------------------------------------------------
    Type of matrix                             Single precision   Double precision
    and storage scheme  Operation              real     complex   real     complex
    ------------------------------------------------------------------------------
    dense symmetric     tridiagonal reduction  SSYTRD   CHETRD   DSYTRD   ZHETRD
    (or Hermitian)
    ------------------------------------------------------------------------------
    packed symmetric    tridiagonal reduction  SSPTRD   CHPTRD   DSPTRD   ZHPTRD
    (or Hermitian)
    ------------------------------------------------------------------------------
    band symmetric      tridiagonal reduction  SSBTRD   CHBTRD   DSBTRD   ZHBTRD
    (or Hermitian)
    orthogonal/unitary  generate matrix after  SORGTR   CUNGTR   DORGTR   ZUNGTR
                        reduction by xSYTRD
                        multiply matrix after  SORMTR   CUNMTR   DORMTR   ZUNMTR
                        reduction by xSYTRD
    ------------------------------------------------------------------------------
    orthogonal/unitary  generate matrix after  SOPGTR   CUPGTR   DOPGTR   ZUPGTR
    (packed storage)    reduction by xSPTRD
                        multiply matrix after  SOPMTR   CUPMTR   DOPMTR   ZUPMTR
                        reduction by xSPTRD
    ------------------------------------------------------------------------------
    symmetric            eigenvalues/          SSTEQR   CSTEQR   DSTEQR   ZSTEQR
    tridiagonal          eigenvectors via QR
                         eigenvalues only      SSTERF            DSTERF
                         via root-free QR
                         eigenvalues only      SSTEBZ            DSTEBZ
                         via bisection
                         eigenvectors by       SSTEIN   CSTEIN   DSTEIN   ZSTEIN
                         inverse iteration
    ------------------------------------------------------------------------------
    symmetric            eigenvalues/          SPTEQR   CPTEQR   DPTEQR   ZPTEQR
    tridiagonal          eigenvectors
    positive definite 
    ------------------------------------------------------------------------------
    
    Table 2.10: Computational routines for the symmetric eigenproblem




    Nonsymmetric Eigenproblems

     







    Eigenvalues, Eigenvectors and Schur Factorization

    Let A be a square n-by-n matrix. A scalar $\lambda$ is called an eigenvalue and a non-zero column vector v the corresponding right eigenvector if $Av = \lambda v$. A nonzero column vector u satisfying $u^H A = \lambda u^H$ is called the left eigenvector. The first basic task of the routines described in this section is to compute, for a given matrix A, all n values of $\lambda$ and, if desired, their associated right eigenvectors v and/or left eigenvectors u.

    A second basic task is to compute the Schur factorization of a matrix A. If A is complex, then its Schur factorization is $A = ZTZ^H$, where Z is unitary and T is upper triangular. If A is real, its Schur factorization is $A = ZTZ^T$, where Z is orthogonal and T is upper quasi-triangular (1-by-1 and 2-by-2 blocks on its diagonal). The columns of Z are called the Schur vectors of A. The eigenvalues of A appear on the diagonal of T; complex conjugate eigenvalues of a real A correspond to 2-by-2 blocks on the diagonal of T.

    These two basic tasks can be performed in the following stages:

    1. A general matrix A is reduced to upper Hessenberg form H, which is zero below the first subdiagonal. The reduction may be written $A = QHQ^T$ with Q orthogonal if A is real, or $A = QHQ^H$ with Q unitary if A is complex. The reduction is performed by subroutine xGEHRD, which represents Q in a factored form, as described in section 5.4. The routine xORGHR (or in the complex case xUNGHR) is provided to form Q explicitly. The routine xORMHR (or in the complex case xUNMHR) is provided to multiply another matrix by Q without forming Q explicitly.

    2. The upper Hessenberg matrix H is reduced to Schur form T, giving the Schur factorization $H = STS^T$ (for H real) or $H = STS^H$ (for H complex). The matrix S (the Schur vectors of H) may optionally be computed as well. Alternatively, S may be postmultiplied into the matrix Q determined in stage 1 to give the matrix Z = QS, the Schur vectors of A. The eigenvalues are obtained from the diagonal of T. All this is done by subroutine xHSEQR.

    3. Given the eigenvalues, the eigenvectors may be computed in two different ways. xHSEIN performs inverse iteration on H to compute the eigenvectors of H; xORMHR can then be used to multiply the eigenvectors by the matrix Q in order to transform them to eigenvectors of A. xTREVC computes the eigenvectors of T, and optionally transforms them to those of H or A if the matrix S or Z is supplied. Both xHSEIN and xTREVC allow selected left and/or right eigenvectors to be computed.
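Stage 1 can be sketched with an unblocked Householder reduction in NumPy. It is a simplified stand-in for xGEHRD, with Q formed explicitly (the role of xORGHR). The function name is hypothetical.

```python
import numpy as np


def hessenberg(A):
    """Householder reduction of a real square A to upper Hessenberg H with A = Q H Q^T."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = H[k + 1:, k]
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue
        v = x.copy()
        v[0] += np.copysign(normx, x[0])
        v /= np.linalg.norm(v)
        # Apply P = I - 2 v v^T from the left (rows k+1:) and the right (columns k+1:).
        H[k + 1:, :] -= 2.0 * np.outer(v, v @ H[k + 1:, :])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v)
    return Q, H


rng = np.random.default_rng(8)
A = rng.standard_normal((6, 6))
Q, H = hessenberg(A)
```

Because the reduction is a similarity transformation, H has the same eigenvalues as A. The later QR iterations on H also cost much less per step than they would on the full matrix.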

    Other subsidiary tasks may be performed before or after those just described.




    Balancing

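    The routine xGEBAL may be used to balance the matrix A prior to reduction to Hessenberg form. Balancing involves two steps, either of which is optional:

    1. First, xGEBAL attempts to permute A by a similarity transformation to block upper triangular form. This isolates eigenvalues that need no further computation, so that the work of xGEHRD and xHSEQR can be confined to a central diagonal block in rows and columns ilo to ihi.

    2. Secondly, xGEBAL applies a diagonal similarity transformation to make the rows and columns of that central block as close in norm as possible. This can improve the accuracy of later processing.

    If A was balanced by xGEBAL, then eigenvectors computed by subsequent operations are eigenvectors of the balanced matrix; xGEBAK must then be called to transform them back to eigenvectors of the original matrix A.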
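The diagonal scaling step can be sketched in NumPy. This is a simplified sketch in the style of the Parlett-Reinsch balancing algorithm. It omits the permutation step and is not LAPACK's xGEBAL code; the function name and the 0.95 acceptance factor are assumptions.

```python
import numpy as np


def balance(A, radix=2.0, max_sweeps=100):
    """Diagonal balancing B = D^{-1} A D; returns B and the diagonal d of D.

    Scale factors are powers of the radix, so the scaling introduces no
    rounding errors in binary floating point.
    """
    B = np.array(A, dtype=float)
    n = B.shape[0]
    d = np.ones(n)
    for _ in range(max_sweeps):
        converged = True
        for i in range(n):
            c = np.sum(np.abs(np.delete(B[:, i], i)))   # off-diagonal column norm
            r = np.sum(np.abs(np.delete(B[i, :], i)))   # off-diagonal row norm
            if c == 0.0 or r == 0.0:
                continue
            s, f = c + r, 1.0
            while c < r / radix:        # column too small relative to row
                c *= radix * radix
                f *= radix
            while c > r * radix:        # column too large relative to row
                c /= radix * radix
                f /= radix
            if (c + r) / f < 0.95 * s:  # rescale only if the norms drop enough
                converged = False
                d[i] *= f
                B[:, i] *= f
                B[i, :] /= f
        if converged:
            break
    return B, d


M = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
s = np.array([1.0, 1e4, 1e8])
A = M * s[None, :] / s[:, None]   # badly scaled matrix similar to M
B, d = balance(A)
```

Eigenvectors of B map back to eigenvectors of A through multiplication by D, which is the back-transformation role xGEBAK plays.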





    Invariant Subspaces and Condition Numbers

    The Schur form depends on the order of the eigenvalues on the diagonal of T, and this may optionally be chosen by the user. Suppose the user chooses that $\lambda_1, \ldots, \lambda_j$, $1 \le j \le n$, appear in the upper left corner of T. Then the first j columns of Z span the right invariant subspace of A corresponding to $\lambda_1, \ldots, \lambda_j$.

    The following routines perform this re-ordering and also compute condition numbers for eigenvalues, eigenvectors, and invariant subspaces:

    1. xTREXC will move an eigenvalue (or 2-by-2 block) on the diagonal of the Schur form from its original position to any other position. It may be used to choose the order in which eigenvalues appear in the Schur form.
    2. xTRSYL solves the Sylvester matrix equation $AX \pm XB = C$ for X, given matrices A, B and C, with A and B (quasi) triangular. It is used in the routines xTRSNA and xTRSEN, but it is also of independent interest.
    3. xTRSNA computes the condition numbers of the eigenvalues and/or right eigenvectors of a matrix T in Schur form. These are the same as the condition numbers of the eigenvalues and right eigenvectors of the original matrix A from which T is derived. The user may compute these condition numbers for all eigenvalue/eigenvector pairs, or for any selected subset. For more details, see section 4.8 and [11].

    4. xTRSEN moves a selected subset of the eigenvalues of a matrix T in Schur form to the upper left corner of T, and optionally computes the condition numbers of their average value and of their right invariant subspace. These are the same as the condition numbers of the average eigenvalue and right invariant subspace of the original matrix A from which T is derived. For more details, see section 4.8 and [11].
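The core of a triangular Sylvester solver like xTRSYL is a column-by-column substitution, sketched below in NumPy. The sketch assumes real, truly triangular A and B (no 2-by-2 blocks) and handles only the "+" sign. It also omits the overflow-avoiding scale factor. The function name is hypothetical.

```python
import numpy as np


def trsyl_upper(A, B, C):
    """Solve A X + X B = C for X with A (m-by-m) and B (n-by-n) upper triangular.

    A unique solution requires a_ii + b_jj != 0 for all i, j.
    """
    m, n = C.shape
    X = np.zeros((m, n))
    for k in range(n):
        # Column k: (A + b_kk I) x_k = c_k - sum_{i<k} b_ik x_i
        rhs = C[:, k] - X[:, :k] @ B[:k, k]
        X[:, k] = np.linalg.solve(A + B[k, k] * np.eye(m), rhs)   # triangular in practice
    return X


rng = np.random.default_rng(9)
A = np.triu(rng.standard_normal((4, 4)), 1) + np.diag(1.0 + rng.random(4))
B = np.triu(rng.standard_normal((3, 3)), 1) + np.diag(1.0 + rng.random(3))
C = rng.standard_normal((4, 3))
X = trsyl_upper(A, B, C)
```

Because B is upper triangular, column k of X depends only on columns 0 through k-1. Each step therefore needs a single shifted triangular solve with A.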

    See Table 2.11 for a complete list of the routines.

       

    -----------------------------------------------------------------------------
    Type of matrix                             Single precision  Double precision
    and storage scheme  Operation              real     complex  real     complex
    -----------------------------------------------------------------------------
    general             Hessenberg reduction   SGEHRD   CGEHRD   DGEHRD   ZGEHRD
                        balancing              SGEBAL   CGEBAL   DGEBAL   ZGEBAL
                        backtransforming       SGEBAK   CGEBAK   DGEBAK   ZGEBAK
    -----------------------------------------------------------------------------
    orthogonal/unitary  generate matrix after  SORGHR   CUNGHR   DORGHR   ZUNGHR
                        Hessenberg reduction
                        multiply matrix after  SORMHR   CUNMHR   DORMHR   ZUNMHR
                        Hessenberg reduction
    -----------------------------------------------------------------------------
    Hessenberg          Schur factorization    SHSEQR   CHSEQR   DHSEQR   ZHSEQR
                        eigenvectors by        SHSEIN   CHSEIN   DHSEIN   ZHSEIN
                        inverse iteration
    -----------------------------------------------------------------------------
    (quasi)triangular   eigenvectors           STREVC   CTREVC   DTREVC   ZTREVC
                        reordering Schur       STREXC   CTREXC   DTREXC   ZTREXC
                        factorization
                        Sylvester equation     STRSYL   CTRSYL   DTRSYL   ZTRSYL
                        condition numbers of   STRSNA   CTRSNA   DTRSNA   ZTRSNA
                        eigenvalues/vectors
                        condition numbers of   STRSEN   CTRSEN   DTRSEN   ZTRSEN
                        eigenvalue cluster/
                        invariant subspace
    -----------------------------------------------------------------------------
    
    Table 2.11: Computational routines for the nonsymmetric eigenproblem




    Singular Value Decomposition

     

    Let A be a general real m-by-n matrix. The singular value decomposition (SVD) of A is the factorization $A = U \Sigma V^T$, where U and V are orthogonal, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$, r = min(m, n), with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge 0$. If A is complex, then its SVD is $A = U \Sigma V^H$, where U and V are unitary, and $\Sigma$ is as before with real diagonal elements. The $\sigma_i$ are called the singular values, the first r columns of V the right singular vectors and the first r columns of U the left singular vectors.

    The routines described in this section, and listed in Table 2.12, are used to compute this decomposition. The computation proceeds in the following stages:

    1. The matrix A is reduced to bidiagonal form: $A = U_1 B V_1^T$ if A is real ($A = U_1 B V_1^H$ if A is complex), where $U_1$ and $V_1$ are orthogonal (unitary if A is complex), and B is real and upper bidiagonal when $m \ge n$ and lower bidiagonal when $m < n$. Thus B is nonzero only on the main diagonal and either on the first superdiagonal (if $m \ge n$) or the first subdiagonal (if $m < n$).

    2. The SVD of the bidiagonal matrix B is computed: $B = U_2 \Sigma V_2^T$, where $U_2$ and $V_2$ are orthogonal and $\Sigma$ is diagonal as described above. The singular vectors of A are then $U = U_1 U_2$ and $V = V_1 V_2$.
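Stage 1 (Golub-Kahan bidiagonalization) can be sketched in NumPy for real A with $m \ge n$. It alternates left reflectors, which zero a column below the diagonal, with right reflectors, which zero a row beyond the superdiagonal. This is an unblocked stand-in for xGEBRD with $U_1$ and $V_1$ formed explicitly, and the function names are hypothetical.

```python
import numpy as np


def householder(x):
    """Unit vector v such that (I - 2 v v^T) x is a multiple of e1 (v = 0 if x = 0)."""
    v = np.array(x, dtype=float)
    normx = np.linalg.norm(v)
    if normx == 0.0:
        return v
    v[0] += np.copysign(normx, v[0])
    return v / np.linalg.norm(v)


def bidiagonalize(A):
    """Reduce a real m-by-n A (m >= n) to upper bidiagonal B with A = U1 B V1^T."""
    B = np.array(A, dtype=float)
    m, n = B.shape
    U1, V1 = np.eye(m), np.eye(n)
    for k in range(n):
        v = householder(B[k:, k])                       # zero B[k+1:, k]
        B[k:, :] -= 2.0 * np.outer(v, v @ B[k:, :])
        U1[:, k:] -= 2.0 * np.outer(U1[:, k:] @ v, v)
        if k < n - 2:
            w = householder(B[k, k + 1:])               # zero B[k, k+2:]
            B[:, k + 1:] -= 2.0 * np.outer(B[:, k + 1:] @ w, w)
            V1[:, k + 1:] -= 2.0 * np.outer(V1[:, k + 1:] @ w, w)
    return U1, B, V1


rng = np.random.default_rng(10)
A = rng.standard_normal((6, 4))
U1, B, V1 = bidiagonalize(A)
```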

    The reduction to bidiagonal form is performed by the subroutine xGEBRD, or by xGBBRD for a band matrix.

    The routine xGEBRD represents $U_1$ and $V_1$ in factored form as products of elementary reflectors, as described in section 5.4. If A is real, the matrices $U_1$ and $V_1$ may be computed explicitly using routine xORGBR, or multiplied by other matrices without forming $U_1$ and $V_1$ using routine xORMBR. If A is complex, one instead uses xUNGBR and xUNMBR, respectively.

    If A is banded and xGBBRD is used to reduce it to bidiagonal form, $U_1$ and $V_1$ are determined as products of Givens rotations, rather than as products of elementary reflectors. If $U_1$ or $V_1$ is required, it must be formed explicitly by xGBBRD. xGBBRD uses a vectorizable algorithm, similar to that used by xSBTRD (see Kaufman [57]). xGBBRD may be much faster than xGEBRD when the bandwidth is narrow.

    The SVD of the bidiagonal matrix is computed by the subroutine xBDSQR. xBDSQR is more accurate than its counterparts in LINPACK and EISPACK: barring underflow and overflow, it computes all the singular values of A to nearly full relative precision, independent of their magnitudes. It also computes the singular vectors much more accurately. See section 4.9 and [41] [16] [22] for details.

    If $m \gg n$, it may be more efficient to first perform a QR factorization of A, using the routine xGEQRF, and then to compute the SVD of the n-by-n matrix R. This works because if A = QR and $R = U \Sigma V^T$, then the SVD of A is given by $A = (QU) \Sigma V^T$. Similarly, if $m \ll n$, it may be more efficient to first perform an LQ factorization of A, using xGELQF. These preliminary QR and LQ factorizations are performed by the driver xGESVD.
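The QR-first strategy for a tall matrix can be sketched in NumPy. Here `np.linalg.svd` on the small n-by-n factor R stands in for the bidiagonal reduction and xBDSQR stages. The function name is hypothetical.

```python
import numpy as np


def svd_via_qr(A):
    """Thin SVD of a tall A (m >> n): A = Q R and R = U S V^T give A = (Q U) S V^T."""
    Q, R = np.linalg.qr(A)          # Q is m-by-n, R is n-by-n
    U, s, Vt = np.linalg.svd(R)     # SVD of the small n-by-n matrix
    return Q @ U, s, Vt


rng = np.random.default_rng(11)
A = rng.standard_normal((200, 5))
U, s, Vt = svd_via_qr(A)
```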

    The SVD may be used to find a minimum norm solution to a (possibly) rank-deficient linear least squares problem (2.1). The effective rank, k, of A can be determined as the number of singular values which exceed a suitable threshold. Let $\Sigma_k$ be the leading k-by-k submatrix of $\Sigma$, and $V_k$ be the matrix consisting of the first k columns of V. Then the solution is given by:

    $$x = V_k \Sigma_k^{-1} \hat{c}_1,$$

    where $\hat{c}_1$ consists of the first k elements of $c = U^T b = U_2^T U_1^T b$. $U_1^T b$ can be computed using xORMBR, and xBDSQR has an option to multiply a vector by $U_2^T$.
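A short NumPy sketch of the SVD-based minimum norm solution follows. The threshold rule $\sigma_i > \text{rtol} \cdot \sigma_1$ is one assumed choice of "suitable threshold", and the function name is hypothetical.

```python
import numpy as np


def svd_lstsq(A, b, rtol=1e-10):
    """Minimum norm least squares solution x = V_k Sigma_k^{-1} c1 and the effective rank k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = int(np.sum(s > rtol * s[0]))     # effective rank (assumed threshold rule)
    c1 = (U.T @ b)[:k]                   # first k elements of c = U^T b
    return Vt[:k].T @ (c1 / s[:k]), k    # V_k Sigma_k^{-1} c1


rng = np.random.default_rng(12)
A = rng.standard_normal((7, 3)) @ rng.standard_normal((3, 5))   # rank 3
b = rng.standard_normal(7)
x, k = svd_lstsq(A, b)
```

Raising `rtol` discards more small singular values. This trades a larger residual for a solution that is less sensitive to noise in b.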

       

    -----------------------------------------------------------------------------
    Type of matrix                             Single precision  Double precision
    and storage scheme  Operation              real     complex  real     complex
    -----------------------------------------------------------------------------
    general             bidiagonal reduction   SGEBRD   CGEBRD   DGEBRD   ZGEBRD
    -----------------------------------------------------------------------------
    general band        bidiagonal reduction   SGBBRD   CGBBRD   DGBBRD   ZGBBRD
    -----------------------------------------------------------------------------
    orthogonal/unitary  generate matrix after  SORGBR   CUNGBR   DORGBR   ZUNGBR
                        bidiagonal reduction
                        multiply matrix after  SORMBR   CUNMBR   DORMBR   ZUNMBR
                        bidiagonal reduction
    -----------------------------------------------------------------------------
    bidiagonal          singular values/       SBDSQR   CBDSQR   DBDSQR   ZBDSQR
                        singular vectors
    -----------------------------------------------------------------------------
    
    Table 2.12: Computational routines for the singular value decomposition




    Generalized Symmetric Definite Eigenproblems

     

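    This section is concerned with the solution of the generalized eigenvalue problems $Az = \lambda Bz$, $ABz = \lambda z$, and $BAz = \lambda z$, where A and B are real symmetric or complex Hermitian and B is positive definite. Each of these problems can be reduced to a standard symmetric eigenvalue problem, using a Cholesky factorization of B as either $B = LL^T$ or $B = U^TU$ ($LL^H$ or $U^HU$ in the Hermitian case). In the case $Az = \lambda Bz$, if A and B are banded, this may also be exploited to get a faster algorithm.

    With $B = LL^T$, we have

    $$Az = \lambda Bz \;\Rightarrow\; (L^{-1} A L^{-T})(L^T z) = \lambda (L^T z).$$

    Hence the eigenvalues of $Az = \lambda Bz$ are those of $Cy = \lambda y$, where C is the symmetric matrix $C = L^{-1} A L^{-T}$ and $y = L^T z$. In the complex case C is Hermitian with $C = L^{-1} A L^{-H}$ and $y = L^H z$.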
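The reduction for $Az = \lambda Bz$ can be sketched in NumPy. The sketch forms $C = L^{-1} A L^{-T}$ with solves rather than explicit inverses, calls `np.linalg.eigh` for the standard problem, and recovers $z = L^{-T} y$. NumPy and the function name are assumptions; the general solve stands in for triangular solves.

```python
import numpy as np


def sygv_type1(A, B):
    """Eigenpairs of A z = lambda B z (A symmetric, B symmetric positive definite)."""
    L = np.linalg.cholesky(B)          # B = L L^T, L lower triangular
    W = np.linalg.solve(L, A)          # L^{-1} A
    C = np.linalg.solve(L, W.T).T      # L^{-1} A L^{-T}, symmetric up to rounding
    lam, Y = np.linalg.eigh(C)         # standard problem C y = lambda y
    Z = np.linalg.solve(L.T, Y)        # z = L^{-T} y
    return lam, Z


rng = np.random.default_rng(13)
M = rng.standard_normal((4, 4))
A = M + M.T
N = rng.standard_normal((4, 4))
B = N @ N.T + 4.0 * np.eye(4)
lam, Z = sygv_type1(A, B)
```

The recovered eigenvectors are B-orthonormal, $Z^T B Z = I$, because $Z^T B Z = Y^T Y$.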

    Table 2.13 summarizes how each of the three types of problem may be reduced to standard form $Cy = \lambda y$, and how the eigenvectors z of the original problem may be recovered from the eigenvectors y of the reduced problem. The table applies to real problems; for complex problems, transposed matrices must be replaced by conjugate-transposes.

       
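    ---------------------------------------------------------------------------
    Type of   Problem          Factorization  Reduction          Recovery of
    problem                    of B                              eigenvectors
    ---------------------------------------------------------------------------
    1.        Az = λBz         B = LL^T       C = L^-1 A L^-T    z = L^-T y
                               B = U^T U      C = U^-T A U^-1    z = U^-1 y
    ---------------------------------------------------------------------------
    2.        ABz = λz         B = LL^T       C = L^T A L        z = L^-T y
                               B = U^T U      C = U A U^T        z = U^-1 y
    ---------------------------------------------------------------------------
    3.        BAz = λz         B = LL^T       C = L^T A L        z = L y
                               B = U^T U      C = U A U^T        z = U^T y
    ---------------------------------------------------------------------------

    Table 2.13: Reduction of generalized symmetric definite eigenproblems to standard problems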

    Given A and a Cholesky factorization of B, the routines xyyGST overwrite A with the matrix C of the corresponding standard problem (see Table 2.14). This may then be solved using the routines described in subsection 2.3.4. No special routines are needed to recover the eigenvectors z of the generalized problem from the eigenvectors y of the standard problem, because these computations are simple applications of Level 2 or Level 3 BLAS.
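The second and third problem types, $ABz = \lambda z$ and $BAz = \lambda z$, can be handled in a similar way with $B = LL^T$. Both reduce to $C = L^T A L$. The eigenvectors are recovered as $z = L^{-T} y$ for the second type and $z = L y$ for the third. A NumPy sketch follows; the function name is hypothetical.

```python
import numpy as np


def reduce_type2_3(A, B):
    """Standard-form reduction for A B z = lambda z (type 2) and B A z = lambda z (type 3)."""
    L = np.linalg.cholesky(B)        # B = L L^T
    C = L.T @ A @ L                  # same C for both types
    lam, Y = np.linalg.eigh(C)
    Z2 = np.linalg.solve(L.T, Y)     # type 2: z = L^{-T} y
    Z3 = L @ Y                       # type 3: z = L y
    return lam, Z2, Z3


rng = np.random.default_rng(14)
M = rng.standard_normal((4, 4))
A = M + M.T
N = rng.standard_normal((4, 4))
B = N @ N.T + 4.0 * np.eye(4)
lam, Z2, Z3 = reduce_type2_3(A, B)
```

As the text notes, the recovery steps are only triangular solves or triangular matrix products, which is why no dedicated routines are needed for them.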

    If the problem is $Az = \lambda Bz$ and the matrices A and B are banded, the matrix C as defined above is, in general, full. We can reduce the problem to a banded standard problem by modifying the definition of C thus:

    $$C = X^T A X, \quad \text{where } X = U^{-1} Q \text{ or } L^{-T} Q,$$

    where Q is an orthogonal matrix chosen to ensure that C has bandwidth no greater than that of A. Q is determined as a product of Givens rotations. This is known as Crawford's algorithm (see Crawford [14]). If X is required, it must be formed explicitly by the reduction routine.

    A further refinement is possible when A and B are banded, which halves the amount of work required to form C (see Wilkinson [79]). Instead of the standard Cholesky factorization of B as $U^TU$ or $LL^T$, we use a ``split Cholesky'' factorization $B = S^TS$ ($S^HS$ if B is complex), where:

    $$S = \begin{pmatrix} U_{11} & 0 \\ M_{21} & L_{22} \end{pmatrix}$$

    with $U_{11}$ upper triangular and $L_{22}$ lower triangular of order approximately n / 2; S has the same bandwidth as B. After B has been factorized in this way by the routine xPBSTF, the reduction of the banded generalized problem to a banded standard problem is performed by the routine xSBGST (or xHBGST for complex matrices). This routine implements a vectorizable form of the algorithm, suggested by Kaufman [57].

       

    --------------------------------------------------------------------
    Type of matrix                    Single precision  Double precision
    and storage scheme   Operation    real     complex  real     complex
    --------------------------------------------------------------------
    symmetric/Hermitian  reduction    SSYGST   CHEGST   DSYGST   ZHEGST
    --------------------------------------------------------------------
    symmetric/Hermitian  reduction    SSPGST   CHPGST   DSPGST   ZHPGST
    (packed storage)
    --------------------------------------------------------------------
    symmetric/Hermitian  split        SPBSTF   CPBSTF   DPBSTF   ZPBSTF
    banded               Cholesky
                         factorization
    --------------------------------------------------------------------
                         reduction    SSBGST   CHBGST   DSBGST   ZHBGST
    --------------------------------------------------------------------
    

    Table 2.14: Computational routines for the generalized symmetric definite eigenproblem







    Tue Nov 29 14:03:33 EST 1994

    Generalized Nonsymmetric Eigenproblems




    Let A and B be n-by-n matrices. A scalar $\lambda$ is called a generalized eigenvalue and a non-zero column vector x the corresponding right generalized eigenvector if $Ax = \lambda Bx$. A non-zero column vector y satisfying $y^H A = \lambda y^H B$ (where the superscript H denotes conjugate-transpose) is called the left generalized eigenvector corresponding to $\lambda$. (For simplicity, we will usually omit the word ``generalized'' when no confusion is likely to arise.) If B is singular, we can have the infinite eigenvalue $\lambda = \infty$, by which we mean Bx = 0. Note that if A is non-singular, then the equivalent problem $\mu Ax = Bx$ is perfectly well-behaved, and the infinite eigenvalue corresponds to $\mu = 0$. To deal with infinite eigenvalues, the LAPACK routines return two values, $\alpha$ and $\beta$, for each eigenvalue $\lambda = \alpha / \beta$. The first basic task of these routines is to compute all n pairs $(\alpha, \beta)$ and x and/or y for a given pair of matrices A, B.

    If the determinant of $A - \lambda B$ is zero for all values of $\lambda$, the eigenvalue problem is called singular, and is signaled by some $\alpha = \beta = 0$ (in the presence of roundoff, $\alpha$ and $\beta$ may be very small). In this case the eigenvalue problem is very ill-conditioned, and in fact some of the other nonzero values of $\alpha$ and $\beta$ may be indeterminate [43] [21] [80] [71].
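    How a caller turns the returned pairs into eigenvalues is left to the application. A minimal sketch follows, assuming a hypothetical absolute tolerance `tol` for deciding when a value counts as zero (a tolerance scaled by the norms of A and B would be more appropriate in practice); the function name is also hypothetical.

```python
import math

def interpret_pairs(alphas, betas, tol=0.0):
    """Describe each (alpha, beta) pair as an eigenvalue lambda = alpha / beta.

    tol is an assumed threshold for treating a value as zero; LAPACK itself
    only returns the pairs and leaves this judgment to the caller.
    """
    out = []
    for a, b in zip(alphas, betas):
        if abs(a) <= tol and abs(b) <= tol:
            out.append(("indeterminate", None))  # suggests a singular pencil
        elif abs(b) <= tol:
            out.append(("infinite", math.inf))   # B x = 0 for the eigenvector
        else:
            out.append(("finite", a / b))
    return out
```

    For an upper triangular pair the pairs are just the diagonal entries, so S = [[2, 1], [0, 3]] with P = [[1, 5], [0, 0]] gives the pairs (2, 1) and (3, 0), that is, $\lambda = 2$ and $\lambda = \infty$. The real routines (for example xHGEQZ) return the real and imaginary parts of $\alpha$ separately as ALPHAR and ALPHAI; in this sketch they could be combined as complex(ALPHAR(i), ALPHAI(i)) before the call.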

    The other basic task is to compute the generalized Schur decomposition of the pair A,B. If A and B are complex, then the pair's generalized Schur decomposition is $A = QSZ^H$, $B = QPZ^H$, where Q and Z are unitary and S and P are upper triangular. The LAPACK routines normalize P to have non-negative diagonal entries. Note that in this form, the eigenvalues can be easily computed from the diagonals: $\lambda_i = s_{ii} / p_{ii}$, and so the LAPACK routines return $\alpha_i = s_{ii}$ and $\beta_i = p_{ii}$. The generalized Schur form depends on the order in which the eigenvalues appear on the diagonal. In a future version of LAPACK, we will supply routines to allow the user to choose this order.

    If A and B are real, then the pair's generalized Schur decomposition is $A = QSZ^T$, $B = QPZ^T$, where Q and Z are orthogonal, P is upper triangular, and S is quasi-upper triangular with 1-by-1 and 2-by-2 blocks on the diagonal. The 1-by-1 blocks correspond to real generalized eigenvalues, while the 2-by-2 blocks correspond to complex conjugate pairs of generalized eigenvalues. In this case, P is normalized so that diagonal entries of P corresponding to 1-by-1 blocks of S are non-negative, while the (upper triangular) diagonal blocks of P corresponding to 2-by-2 blocks of S are made diagonal. For real eigenvalues, as for all eigenvalues in the complex case, the $\alpha$ and $\beta$ values may be easily computed from the diagonals of S and P. The $\alpha$ and $\beta$ values corresponding to complex eigenvalues are computed by first computing the eigenvalues $\lambda$ of the 2-by-2 diagonal block of S,P, then computing the values that would result if that block were upper triangularized using unitary transformations, and finally multiplying to get $\alpha$ and $\beta$.

    The columns of Q and Z are called generalized Schur vectors and span pairs of deflating subspaces of A and B [72]. Deflating subspaces are a generalization of invariant subspaces: The first k columns of Z span a right deflating subspace mapped by both A and B into a left deflating subspace spanned by the first k columns of Q. This pair of deflating subspaces corresponds to the first k eigenvalues appearing at the top of S and P.

    The computations proceed in the following stages:

    1. The pair A,B is reduced to generalized upper Hessenberg form. If A and B are real this decomposition is $A = UHV^T$, $B = UTV^T$, where H is upper Hessenberg (zero below the first subdiagonal), T is upper triangular and U and V are orthogonal. If A and B are complex, the decomposition is $A = UHV^H$, $B = UTV^H$, with U and V unitary and H and T as before. This decomposition is performed by the subroutine xGGHRD, which computes H and T and optionally U and/or V. Note that in contrast to xGEHRD (for the standard nonsymmetric eigenvalue problem), xGGHRD does not compute U and V in a factored form.

    2. The pair H,T is reduced to generalized Schur form $H = QSZ^T$, $T = QPZ^T$ (for H and T real) or $H = QSZ^H$, $T = QPZ^H$ (for H and T complex) by subroutine xHGEQZ. The values $\alpha$ and $\beta$ are also computed. The matrices Z and Q are optionally computed.

    3. The left and/or right eigenvectors of the pair S,P are computed by xTGEVC. One may optionally transform the right eigenvectors of S,P to the right eigenvectors of A,B (or of H,T) by passing UQ,VZ (or Q,Z) to xTGEVC.
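    Stage 1 can be illustrated with Givens rotations. The pure-Python sketch below first triangularizes B; in LAPACK that QR step is done separately, before xGGHRD is called, since xGGHRD expects B to be upper triangular on entry. It then zeroes A below its first subdiagonal column by column, and each row rotation that disturbs the triangularity of B is followed by a column rotation that restores it. It returns H, T and the accumulated factors with $H = Q_t A Z$ and $T = Q_t B Z$, so $Q_t$ plays the role of $U^T$ and Z the role of V. All function names are hypothetical, and the sketch makes no claim to match xGGHRD's operation order.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def rotate_rows(M, i, k, c, s):
    """Rows (i, k) of M <- [[c, s], [-s, c]] applied to rows (i, k), in place."""
    for j in range(len(M[0])):
        x, y = M[i][j], M[k][j]
        M[i][j], M[k][j] = c * x + s * y, -s * x + c * y

def rotate_cols(M, i, k, c, s):
    """Columns (i, k) of M <- columns (i, k) times [[c, -s], [s, c]], in place."""
    for row in M:
        x, y = row[i], row[k]
        row[i], row[k] = c * x + s * y, -s * x + c * y

def hessenberg_triangular(A, B):
    """Return (H, T, Qt, Z): H = Qt A Z upper Hessenberg, T = Qt B Z upper triangular."""
    n = len(A)
    H, T = [r[:] for r in A], [r[:] for r in B]
    Qt, Z = identity(n), identity(n)
    # Preliminary step: triangularize B by row rotations (a Givens QR of B).
    for j in range(n - 1):
        for i in range(n - 1, j, -1):
            if T[i][j] != 0.0:
                r = math.hypot(T[i - 1][j], T[i][j])
                c, s = T[i - 1][j] / r, T[i][j] / r
                for M in (H, T, Qt):
                    rotate_rows(M, i - 1, i, c, s)
                T[i][j] = 0.0
    # Zero H below its first subdiagonal, one column at a time, bottom up.
    for j in range(n - 2):
        for i in range(n - 1, j + 1, -1):
            if H[i][j] != 0.0:
                r = math.hypot(H[i - 1][j], H[i][j])
                c, s = H[i - 1][j] / r, H[i][j] / r
                for M in (H, T, Qt):
                    rotate_rows(M, i - 1, i, c, s)
                H[i][j] = 0.0
            # The row rotation may create a nonzero T[i][i-1]; remove it with a
            # column rotation on columns i-1 and i, which leaves column j of H alone.
            if T[i][i - 1] != 0.0:
                r = math.hypot(T[i][i - 1], T[i][i])
                c, s = T[i][i] / r, -T[i][i - 1] / r
                for M in (H, T, Z):
                    rotate_cols(M, i - 1, i, c, s)
                T[i][i - 1] = 0.0
    return H, T, Qt, Z
```

    Only rotations of two rows or two columns are used, which keeps every step orthogonal and easy to check; Householder reflections could replace the preliminary QR step without changing the idea.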

    In addition, the routines xGGBAL and xGGBAK may be used to balance the pair A,B prior to reduction to generalized Hessenberg form. Balancing involves premultiplying A and B by one permutation and postmultiplying them by another, to try to make A,B as nearly triangular as possible, and then ``scaling'' the matrices by premultiplying A and B by one diagonal matrix and postmultiplying by another in order to make the rows and columns of A and B as close in norm to 1 as possible. These transformations can improve the speed and accuracy of later processing in some cases; however, the scaling step can sometimes make things worse. Moreover, the scaling step will significantly change the generalized Schur form that results. xGGBAL performs the balancing, and xGGBAK back transforms the eigenvectors of the balanced matrix pair.

       

    --------------------------------------------------------------------------
    Type of matrix                          Single precision  Double precision
    and storage scheme Operation            real     complex  real     complex
    --------------------------------------------------------------------------
    general            Hessenberg reduction SGGHRD   CGGHRD   DGGHRD   ZGGHRD
                       balancing            SGGBAL   CGGBAL   DGGBAL   ZGGBAL
                       back transforming    SGGBAK   CGGBAK   DGGBAK   ZGGBAK
    --------------------------------------------------------------------------
    Hessenberg         Schur factorization  SHGEQZ   CHGEQZ   DHGEQZ   ZHGEQZ
    --------------------------------------------------------------------------
    (quasi)triangular  eigenvectors         STGEVC   CTGEVC   DTGEVC   ZTGEVC
    --------------------------------------------------------------------------
    
    Table 2.15: Computational routines for the generalized nonsymmetric eigenproblem

    A future release of LAPACK will include the routines xTGEXC, xTGSYL, xTGSNA and xTGSEN, which are analogous to the routines xTREXC, xTRSYL, xTRSNA and xTRSEN. They will reorder eigenvalues in generalized Schur form, solve the generalized Sylvester equation, compute condition numbers of generalized eigenvalues and eigenvectors, and compute condition numbers of average eigenvalues and deflating subspaces.




    Generalized (or Quotient) Singular Value Decomposition




     

    The generalized (or quotient) singular value decomposition of an m-by-n matrix A and a p-by-n matrix B is described in section 2.2.5. The routines described in this section are used to compute the decomposition. The computation proceeds in the following two stages:

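    1. xGGSVP is used to reduce the matrices A and B to triangular form:

      $$U_1^T A Q_1 = \begin{pmatrix} 0 & A_{12} & A_{13} \\ 0 & 0 & A_{23} \\ 0 & 0 & 0 \end{pmatrix}, \qquad V_1^T B Q_1 = \begin{pmatrix} 0 & 0 & B_{13} \\ 0 & 0 & 0 \end{pmatrix},$$

      where the block rows of $U_1^T A Q_1$ have k, l and m - k - l rows, the block rows of $V_1^T B Q_1$ have l and p - l rows, and in both matrices the block columns have n - k - l, k and l columns. $A_{12}$ and $B_{13}$ are nonsingular upper triangular, and $A_{23}$ is upper triangular. If m - k - l < 0, the bottom zero block of $U_1^T A Q_1$ does not appear, and $A_{23}$ is upper trapezoidal. $U_1$, $V_1$ and $Q_1$ are orthogonal matrices (or unitary matrices if A and B are complex). l is the rank of B, and k + l is the rank of $\begin{pmatrix} A \\ B \end{pmatrix}$.

    2. The generalized singular value decomposition of two l-by-l upper triangular matrices $A_{23}$ and $B_{13}$ is computed using xTGSJA:

      $$A_{23} = U_2 C R Q_2^T, \qquad B_{13} = V_2 S R Q_2^T.$$

      Here $U_2$, $V_2$ and $Q_2$ are orthogonal (or unitary) matrices, C and S are both real nonnegative diagonal matrices satisfying $C^2 + S^2 = I$, S is nonsingular, and R is upper triangular and nonsingular.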
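    A short derivation shows what the diagonal entries of C and S mean. Assume the real case and write the pair computed by xTGSJA as $A_{23} = U_2 C R Q_2^T$, $B_{13} = V_2 S R Q_2^T$, with $B_{13}$ and S nonsingular. Then

    $$A_{23} B_{13}^{-1} = U_2 C R Q_2^T \, Q_2 R^{-1} S^{-1} V_2^T = U_2 \, (C S^{-1}) \, V_2^T,$$

    which is a singular value decomposition of $A_{23} B_{13}^{-1}$. Hence the ratios $\sigma_i = c_i / s_i$ are its singular values, and $C^2 + S^2 = I$ gives $c_i = \sigma_i / \sqrt{1 + \sigma_i^2}$ and $s_i = 1 / \sqrt{1 + \sigma_i^2}$. This only explains the quantities; xTGSJA does not form $B_{13}^{-1}$ but uses the Jacobi-like method referred to below.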

       

    --------------------------------------------------------
                          Single precision  Double precision
    Operation             real     complex  real     complex
    --------------------------------------------------------
    triangular reduction  SGGSVP   CGGSVP   DGGSVP   ZGGSVP
    of A and B
    --------------------------------------------------------
    GSVD of a pair of     STGSJA   CTGSJA   DTGSJA   ZTGSJA
    triangular matrices
    --------------------------------------------------------
    
    Table 2.16: Computational routines for the generalized singular value decomposition

    The reduction to triangular form, performed by xGGSVP, uses QR decomposition with column pivoting for numerical rank determination. See [12] for details.

    The generalized singular value decomposition of two triangular matrices, performed by xTGSJA, is done using a Jacobi-like method as described in [10] [62].





    Performance of LAPACK




       

    Note: this chapter presents some performance figures for LAPACK routines. The figures are provided for illustration only, and should not be regarded as a definitive up-to-date statement of performance. They have been selected from performance figures obtained in 1994 during the development of version 2.0 of LAPACK. All reported timings were obtained using the optimized version of the BLAS available on each machine. For the IBM computers, the ESSL BLAS were used. Performance is affected by many factors that may change from time to time, such as details of hardware (cycle time, cache size), compiler, and BLAS. To obtain up-to-date performance figures, use the timing programs provided with LAPACK.







    Factors that Affect Performance




     

    Can we provide portable software for computations in dense linear algebra that is efficient on a wide range of modern high-performance computers? If so, how? Answering these questions - and providing the desired software - has been the goal of the LAPACK project.

    LINPACK [26] and EISPACK [44] [70] have for many years provided high-quality portable software for linear algebra; but on modern high-performance computers they often achieve only a small fraction of the peak performance of the machines. Therefore, LAPACK has been designed to supersede LINPACK and EISPACK, principally by achieving much greater efficiency - but at the same time also adding extra functionality, using some new or improved algorithms, and integrating the two sets of algorithms into a single package.

    LAPACK was originally targeted to achieve good performance on single-processor vector machines and on shared memory multiprocessor machines with a modest number of powerful processors. Since the start of the project, another class of machines has emerged for which LAPACK software is equally well suited: the high-performance ``super-scalar'' workstations. (LAPACK is intended to be used across the whole spectrum of modern computers, but when considering performance, the emphasis is on machines at the more powerful end of the spectrum.)

    Here we discuss the main factors that affect the performance of linear algebra software on these classes of machines.







    Vectorization




       

    Designing vectorizable algorithms in linear algebra is usually straightforward. Indeed, for many computations there are several variants, all vectorizable, but with different characteristics in performance (see, for example, [33]). Linear algebra algorithms can come close to the peak performance of many machines - principally because peak performance depends on some form of chaining of vector addition and multiplication operations, and this is just what the algorithms require.

    However, when the algorithms are realized in straightforward Fortran 77 code, the performance may fall well short of the expected level, usually because vectorizing Fortran compilers fail to minimize the number of memory references - that is, the number of vector load and store operations. This brings us to the next factor.





    Data Movement




       

    What often limits the actual performance of a vector or scalar floating-point unit is the rate of transfer of data between different levels of memory in the machine. Examples include: the transfer of vector operands in and out of vector registers, the transfer of scalar operands in and out of a high-speed scalar processor, the movement of data between main memory and a high-speed cache or local memory, and paging between actual memory and disk storage in a virtual memory system.

    It is desirable to maximize the ratio of floating-point operations to memory references, and to re-use data as much as possible while it is stored in the higher levels of the memory hierarchy (for example, vector registers or high-speed cache).

    A Fortran programmer has no explicit control over these types of data movement, although one can often influence them by imposing a suitable structure on an algorithm.





    Parallelism




       

    The nested loop structure of most linear algebra algorithms offers considerable scope for loop-based parallelism on shared memory machines. This is the principal type of parallelism that LAPACK at present aims to exploit. It can sometimes be generated automatically by a compiler, but often requires the insertion of compiler directives.





    The BLAS as the Key to Portability




       

    How then can we hope to be able to achieve sufficient control over vectorization, data movement, and parallelism in portable Fortran code, to obtain the levels of performance that machines can offer?

    The LAPACK strategy for combining efficiency with portability is to construct the software as much as possible out of calls to the BLAS (Basic Linear Algebra Subprograms); the BLAS are used as building blocks.

    The efficiency of LAPACK software depends on efficient implementations of the BLAS being provided by computer vendors (or others) for their machines. Thus the BLAS form a low-level interface between LAPACK software and different machine architectures. Above this level, almost all of the LAPACK software is truly portable.

    There are now three levels of BLAS:

    Level 1 BLAS [58]:
    for vector operations, such as $y \leftarrow \alpha x + y$

    Level 2 BLAS [30]:
    for matrix-vector operations, such as $y \leftarrow \alpha A x + \beta y$

    Level 3 BLAS [28]:
    for matrix-matrix operations, such as $C \leftarrow \alpha A B + \beta C$

    Here, A, B and C are matrices, x and y are vectors, and $\alpha$ and $\beta$ are scalars.
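    To make the three levels concrete, here is a naive pure-Python rendering of the three example operations. The names echo the BLAS routines SAXPY, SGEMV and SGEMM, but the argument lists are simplified assumptions (no leading dimensions, increments or transpose options), and the functions return new lists instead of overwriting y or C.

```python
def axpy(alpha, x, y):
    """Level 1: return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(alpha, A, x, beta, y):
    """Level 2: return alpha*A*x + beta*y (A given as a list of rows)."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(A, y)]

def gemm(alpha, A, B, beta, C):
    """Level 3: return alpha*A*B + beta*C (all matrices given as lists of rows)."""
    cols = list(zip(*B))  # columns of B
    return [[alpha * sum(p * q for p, q in zip(row, col)) + beta * cij
             for col, cij in zip(cols, crow)]
            for row, crow in zip(A, C)]
```

    The point of the levels is visible in the loop structure: axpy touches each datum once, gemv touches each entry of A once, while gemm reuses every entry of A and B n times.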

    The Level 1 BLAS are used in LAPACK, but for convenience rather than for performance: they perform an insignificant fraction of the computation, and they cannot achieve high efficiency on most modern supercomputers.

    The Level 2 BLAS can achieve near-peak performance on many vector processors, such as a single processor of a CRAY Y-MP, CRAY C90, or CONVEX C4 machine. However, on other machines, such as a CRAY 2 or a RISC workstation, their performance is limited by the rate of data movement between different levels of memory.

    This limitation is overcome by the Level 3 BLAS, which perform $O(n^3)$ floating-point operations on $O(n^2)$ data, whereas the Level 2 BLAS perform only $O(n^2)$ operations on $O(n^2)$ data.
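    The difference can be made concrete with leading-order counts, assuming each operand is read or written exactly once (ideal data re-use, which real memory hierarchies only approximate). The hypothetical helper below tabulates flops and words of data moved for the three example operations on n-vectors and n-by-n matrices.

```python
def flop_and_data_counts(n):
    """Leading-order (flops, words moved) for one call, assuming ideal re-use.

    axpy  y <- alpha*x + y         : 2n flops;   read x, y, write y    -> 3n words
    gemv  y <- alpha*A*x + beta*y  : 2n^2 flops; read A, x, y, write y -> n^2 + 3n
    gemm  C <- alpha*A*B + beta*C  : 2n^3 flops; read A, B, C, write C -> 4n^2
    """
    return {
        "axpy": (2 * n, 3 * n),
        "gemv": (2 * n * n, n * n + 3 * n),
        "gemm": (2 * n ** 3, 4 * n * n),
    }

for n in (100, 1000):
    for name, (flops, words) in flop_and_data_counts(n).items():
        print(f"n={n:5d} {name}: {flops / words:8.2f} flops per word")
```

    For n = 100 the ratio is about 0.67 for the vector update and about 1.9 for the matrix-vector product, but 50 for the matrix-matrix product; for the latter it grows like n/2, which is what gives the Level 3 BLAS room to hide memory traffic.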

    The BLAS also allow us to exploit parallelism in a way that is transparent to the software that calls them. Even the Level 2 BLAS offer some scope for exploiting parallelism, but greater scope is provided by the Level 3 BLAS, as Table 3.1 illustrates.

       
    Table 3.1: Speed in megaflops of Level 2 and Level 3 BLAS operations on a CRAY C90




    Block Algorithms and their Derivation




       

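    It is comparatively straightforward to recode many of the algorithms in LINPACK and EISPACK so that they call Level 2 BLAS. Indeed, in the simplest cases the same floating-point operations are performed, possibly even in the same order: it is just a matter of reorganizing the software. To illustrate this point we derive the Cholesky factorization algorithm that is used in the LINPACK routine SPOFA, which factorizes a symmetric positive definite matrix as $A = U^T U$. Writing these equations as:

    $$\begin{pmatrix} A_{11} & a_j & A_{13} \\ \cdot & a_{jj} & \alpha_j^T \\ \cdot & \cdot & A_{33} \end{pmatrix} = \begin{pmatrix} U_{11}^T & 0 & 0 \\ u_j^T & u_{jj} & 0 \\ U_{13}^T & \mu_j & U_{33}^T \end{pmatrix} \begin{pmatrix} U_{11} & u_j & U_{13} \\ 0 & u_{jj} & \mu_j^T \\ 0 & 0 & U_{33} \end{pmatrix}$$

    and equating coefficients of the j-th column, we obtain:

    $$a_j = U_{11}^T u_j, \qquad a_{jj} = u_j^T u_j + u_{jj}^2.$$

    Hence, if $U_{11}$ has already been computed, we can compute $u_j$ and $u_{jj}$ from the equations:

    $$U_{11}^T u_j = a_j, \qquad u_{jj}^2 = a_{jj} - u_j^T u_j.$$

    Here is the body of the code of the LINPACK routine SPOFA, which implements the above method: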

             DO 30 J = 1, N
                INFO = J
                S = 0.0E0
                JM1 = J - 1
                IF (JM1 .LT. 1) GO TO 20
                DO 10 K = 1, JM1
                   T = A(K,J) - SDOT(K-1,A(1,K),1,A(1,J),1)
                   T = T/A(K,K)
                   A(K,J) = T
                   S = S + T*T
       10       CONTINUE
       20       CONTINUE
                S = A(J,J) - S
    C     ......EXIT
                IF (S .LE. 0.0E0) GO TO 40
                A(J,J) = SQRT(S)
       30    CONTINUE

    And here is the same computation recoded in ``LAPACK-style'' to use the Level 2 BLAS routine STRSV (which solves a triangular system of equations). The call to STRSV has replaced the loop over K which made several calls to the Level 1 BLAS routine SDOT. (For reasons given below, this is not the actual code used in LAPACK - hence the term ``LAPACK-style''.)

          DO 10 J = 1, N
             CALL STRSV( 'Upper', 'Transpose', 'Non-unit', J-1, A, LDA,
         $               A(1,J), 1 )
             S = A(J,J) - SDOT( J-1, A(1,J), 1, A(1,J), 1 )
             IF( S.LE.ZERO ) GO TO 20
             A(J,J) = SQRT( S )
       10 CONTINUE

    This change by itself is sufficient to make big gains in performance on a number of machines.

    On other machines, however, the gain is small. For example, on an IBM RISC Sys/6000-550 (using double precision) there is virtually no difference in performance between the LINPACK-style and the LAPACK-style code. Both styles run at a megaflop rate far below the machine's peak performance for matrix-matrix multiplication. To exploit the faster speed of Level 3 BLAS, the algorithms must undergo a deeper level of restructuring, and be re-cast as a block algorithm - that is, an algorithm that operates on blocks or submatrices of the original matrix.
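    Before turning to the block form, the column-by-column method shared by the two Fortran fragments above can be written as a small, self-contained Python sketch (Python so that it runs without a BLAS library). The inner loop plays the role of the STRSV call, solving $U_{11}^T u_j = a_j$ by forward substitution, and the last step computes $u_{jj}$ from $a_{jj} - u_j^T u_j$. The function name is hypothetical; unlike SPOFA it returns a new matrix instead of overwriting A, and it raises an exception where SPOFA would set INFO = J.

```python
import math

def cholesky_unblocked(A):
    """Return upper triangular U with A = U^T U, computed column by column."""
    n = len(A)
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Forward substitution for U11^T u_j = a_j (the STRSV call's role).
        for k in range(j):
            t = A[k][j] - sum(U[i][k] * U[i][j] for i in range(k))
            U[k][j] = t / U[k][k]
        # u_jj^2 = a_jj - u_j^T u_j (the SDOT call's role).
        s = A[j][j] - sum(U[k][j] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError(f"leading minor of order {j + 1} is not positive definite")
        U[j][j] = math.sqrt(s)
    return U
```

    For A = [[4, 2, 2], [2, 5, 3], [2, 3, 6]] this gives U = [[2, 1, 1], [0, 2, 1], [0, 0, 2]], which is easy to check by hand.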

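    To derive a block form of Cholesky factorization, we write the defining equation in partitioned form thus:

    $$\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ \cdot & A_{22} & A_{23} \\ \cdot & \cdot & A_{33} \end{pmatrix} = \begin{pmatrix} U_{11}^T & 0 & 0 \\ U_{12}^T & U_{22}^T & 0 \\ U_{13}^T & U_{23}^T & U_{33}^T \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} & U_{13} \\ 0 & U_{22} & U_{23} \\ 0 & 0 & U_{33} \end{pmatrix}$$

    Equating submatrices in the second block of columns, we obtain:

    $$A_{12} = U_{11}^T U_{12}, \qquad A_{22} = U_{12}^T U_{12} + U_{22}^T U_{22}.$$

    Hence, if $U_{11}$ has already been computed, we can compute $U_{12}$ as the solution to the equation

    $$U_{11}^T U_{12} = A_{12}$$

    by a call to the Level 3 BLAS routine STRSM; and then we can compute $U_{22}$ from

    $$U_{22}^T U_{22} = A_{22} - U_{12}^T U_{12}.$$

    This involves first updating the symmetric submatrix $A_{22}$ by a call to the Level 3 BLAS routine SSYRK, and then computing its Cholesky factorization. Since Fortran 77 does not allow recursion, a separate routine must be called (using Level 2 BLAS rather than Level 3), named SPOTF2 in the code below. In this way successive blocks of columns of U are computed. Here is LAPACK-style code for the block algorithm. In this code-fragment NB denotes the width of the blocks.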

          DO 10 J = 1, N, NB
             JB = MIN( NB, N-J+1 )
             CALL STRSM( 'Left', 'Upper', 'Transpose', 'Non-unit', J-1, JB,
         $               ONE, A, LDA, A( 1, J ), LDA )
             CALL SSYRK( 'Upper', 'Transpose', JB, J-1, -ONE, A( 1, J ), LDA,
         $               ONE, A( J, J ), LDA )
             CALL SPOTF2( 'Upper', JB, A( J, J ), LDA, INFO )
             IF( INFO.NE.0 ) GO TO 20
       10 CONTINUE

    But that is not the end of the story, and the code given above is not the code that is actually used in the LAPACK routine SPOTRF. We mentioned in subsection 3.1.1 that for many linear algebra computations there are several vectorizable variants, often referred to as i-, j- and k-variants, according to a convention introduced in [33] and used in [45]. The same is true of the corresponding block algorithms.

    It turns out that the j-variant that was chosen for LINPACK, and used in the above examples, is not the fastest on many machines, because it is based on solving triangular systems of equations, which can be significantly slower than matrix-matrix multiplication. The variant actually used in LAPACK is the i-variant, which does rely on matrix-matrix multiplication.
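    A blocked variant that, like the one described above, leans on matrix-matrix multiplication can be sketched in Python as follows. Each step computes one block row of U: it updates the diagonal block with a symmetric rank-k update, factors it with an unblocked routine, updates the rest of the block row with a matrix-matrix product, and finishes with a triangular solve. Comments name the routine whose role each loop nest plays, but the loops are plain Python, not calls to those routines. The ordering mirrors the upper-triangular case of SPOTRF as we understand it; treat it as an illustration rather than a transcription. The function names are hypothetical.

```python
import math

def potf2_block(W, j0, j1):
    """Unblocked upper Cholesky of the diagonal block W[j0:j1][j0:j1], in place."""
    for j in range(j0, j1):
        for k in range(j0, j):
            t = W[k][j] - sum(W[i][k] * W[i][j] for i in range(j0, k))
            W[k][j] = t / W[k][k]
        s = W[j][j] - sum(W[k][j] ** 2 for k in range(j0, j))
        if s <= 0.0:
            raise ValueError(f"leading minor of order {j + 1} is not positive definite")
        W[j][j] = math.sqrt(s)

def cholesky_blocked(A, nb):
    """Return upper triangular U with A = U^T U, computed one block row at a time."""
    n = len(A)
    W = [row[:] for row in A]          # only the upper triangle of W is referenced
    for j in range(0, n, nb):
        jb = min(nb, n - j)
        diag = range(j, j + jb)        # rows/columns of the current diagonal block
        rest = range(j + jb, n)        # columns to the right of that block
        # SSYRK role: A22 <- A22 - U12^T U12 (U12 = rows 0..j-1 of these columns).
        for a in diag:
            for b in range(a, j + jb):
                W[a][b] -= sum(W[k][a] * W[k][b] for k in range(j))
        # SPOTF2 role: factor the updated diagonal block, giving U22.
        potf2_block(W, j, j + jb)
        # SGEMM role: A23 <- A23 - U12^T U13.
        for a in diag:
            for b in rest:
                W[a][b] -= sum(W[k][a] * W[k][b] for k in range(j))
        # STRSM role: solve U22^T U23 = A23 by forward substitution.
        for b in rest:
            for a in diag:
                t = W[a][b] - sum(W[k][a] * W[k][b] for k in range(j, a))
                W[a][b] = t / W[a][a]
    return [[W[i][k] if k >= i else 0.0 for k in range(n)] for i in range(n)]
```

    Any block size gives the same U up to rounding; NB only changes how much of the work falls into the matrix-matrix update rather than the unblocked factorization.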




    Examples of Block Algorithms in LAPACK




     

    Having discussed in detail the derivation of one particular block algorithm, we now describe examples of the performance that has been achieved with a variety of block algorithms. The clock speeds for the computers involved in the timings are listed in Table 3.2.

       

    -------------------------------------------
                               Clock Speed
    -------------------------------------------
    CONVEX C-4640           135 MHz    7.41  ns
    CRAY C90                240 MHz    4.167 ns
    DEC 3000-500X Alpha     200 MHz    5.0   ns
    IBM POWER2 model 590     66 MHz   15.15  ns
    IBM RISC Sys/6000-550    42 MHz   23.81  ns
    SGI POWER CHALLENGE      75 MHz   13.33  ns
    -------------------------------------------
    

    Table 3.2: Clock Speeds of Computers in Timing Results

    See Gallivan et al. [42] and Dongarra et al. [31] for an alternative survey of algorithms for dense linear algebra on high-performance computers.







    Factorizations for Solving Linear Equations




     

    The well-known LU and Cholesky factorizations are the simplest block algorithms to derive. Neither extra floating-point operations nor extra working storage is required.

    Table 3.3 illustrates the speed of the LAPACK routine for LU factorization of a real matrix, SGETRF in single precision on CRAY machines, and DGETRF in double precision on all other machines. This corresponds to 64-bit floating-point arithmetic on all machines tested. A block size of 1 means that the unblocked algorithm is used, since it is faster than - or at least as fast as - a blocked algorithm.

       

    ---------------------------------------------------
                        No. of    Block    Values of n
                      processors   size    100     1000
    ---------------------------------------------------
    CONVEX C-4640          1        64     274      711
    CONVEX C-4640          4        64     379     2588
    CRAY C90               1       128     375      863
    CRAY C90              16       128     386     7412
    DEC 3000-500X Alpha    1        32      53       91
    IBM POWER2 model 590   1        32     110      168
    IBM RISC Sys/6000-550  1        32      33       56
    SGI POWER CHALLENGE    1        64      81      201
    SGI POWER CHALLENGE    4        64      79      353
    ---------------------------------------------------
    
    Table 3.3: Speed in megaflops of SGETRF/DGETRF for square matrices of order n

    Table 3.4 gives similar results for Cholesky factorization.

       

    ---------------------------------------------------
                        No. of    Block    Values of n
                      processors   size    100     1000
    ---------------------------------------------------
    CONVEX C-4640          1        64     120      546
    CONVEX C-4640          4        64     150     1521
    CRAY C90               1       128     324      859
    CRAY C90              16       128     453     9902
    DEC 3000-500X Alpha    1        32      37       83
    IBM POWER2 model 590   1        32     102      247
    IBM RISC Sys/6000-550  1        32      40       72
    SGI POWER CHALLENGE    1        64      74      199
    SGI POWER CHALLENGE    4        64      69      424
    ---------------------------------------------------
    
    Table 3.4: Speed in megaflops of SPOTRF/DPOTRF for matrices of order n with UPLO = `U'

    LAPACK, like LINPACK, provides a factorization for symmetric indefinite matrices, so that A is factorized as $P U D U^T P^T$, where P is a permutation matrix, U is unit upper triangular, and D is block diagonal with blocks of order 1 or 2. A block form of this algorithm has been derived, and is implemented in the LAPACK routine SSYTRF/DSYTRF. It has to duplicate a little of the computation in order to ``look ahead'' to determine the necessary row and column interchanges, but the extra work can be more than compensated for by the greater speed of updating the matrix by blocks, as is illustrated in Table 3.5.

       

    -------------------
    Block   Values of n
    size    100    1000
    -------------------
     1       62      86
    64       68     165
    -------------------
    
    Table 3.5: Speed in megaflops of DSYTRF for matrices of order n with UPLO = `U' on an IBM POWER2 model 590

    LAPACK, like LINPACK, provides LU and Cholesky factorizations of band matrices. The LINPACK algorithms can easily be restructured to use Level 2 BLAS, though that has little effect on performance for matrices of very narrow bandwidth. It is also possible to use Level 3 BLAS, at the price of doing some extra work with zero elements outside the band [39]. This becomes worthwhile for matrices of large order and semi-bandwidth greater than 100 or so.




    QR Factorization

     

    The traditional algorithm for QR factorization is based on the use of elementary Householder matrices of the general form

    $$H = I - \tau v v^T$$

    where v is a column vector and $\tau$ is a scalar. This leads to an algorithm with very good vector performance, especially if coded to use Level 2 BLAS.

    The key to developing a block form of this algorithm is to represent a product of b elementary Householder matrices of order n as a block form of a Householder matrix. This can be done in various ways. LAPACK uses the following form [68]:

    $$H_1 H_2 \cdots H_b = I - V T V^T$$

    where V is an n-by-b matrix whose columns are the individual vectors $v_1, v_2, \ldots, v_b$ associated with the Householder matrices $H_1, H_2, \ldots, H_b$, and T is an upper triangular matrix of order b. Extra work is required to compute the elements of T, but once again this is compensated for by the greater speed of applying the block form. Table 3.6 summarizes results obtained with the LAPACK routine SGEQRF/DGEQRF.
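    The triangular factor T of the block form $H_1 H_2 \cdots H_b = I - V T V^T$ can be built one column at a time. Multiplying out $(I - V_{j-1} T_{j-1} V_{j-1}^T)(I - \tau_j v_j v_j^T)$ shows that $T_j = \begin{pmatrix} T_{j-1} & -\tau_j T_{j-1} V_{j-1}^T v_j \\ 0 & \tau_j \end{pmatrix}$, where $V_{j-1}$ holds the first j-1 vectors. Below is a pure-Python sketch of that recurrence with a brute-force check against the explicit product. This is the quantity that LAPACK's auxiliary routine xLARFT computes for forward, columnwise storage, but the code does not follow xLARFT, and its names are hypothetical.

```python
def build_T(V, taus):
    """Upper triangular T with H_1 H_2 ... H_b = I - V T V^T.

    V is a list of b vectors of length n (the columns of the n-by-b matrix V),
    and H_j = I - taus[j] * v_j v_j^T.
    """
    b = len(V)
    T = [[0.0] * b for _ in range(b)]
    for j in range(b):
        w = [sum(p * q for p, q in zip(V[i], V[j])) for i in range(j)]  # V_{j-1}^T v_j
        for i in range(j):  # new column above the diagonal: -tau_j * T_{j-1} w
            T[i][j] = -taus[j] * sum(T[i][k] * w[k] for k in range(i, j))
        T[j][j] = taus[j]
    return T

def reflector(v, tau):
    """Explicit n-by-n matrix I - tau v v^T."""
    n = len(v)
    return [[(1.0 if r == c else 0.0) - tau * v[r] * v[c] for c in range(n)] for r in range(n)]

def block_reflector(V, T):
    """Explicitly form I - V T V^T (for checking only; in practice it is applied, not formed)."""
    n, b = len(V[0]), len(V)
    return [[(1.0 if r == c else 0.0)
             - sum(V[i][r] * T[i][k] * V[k][c] for i in range(b) for k in range(b))
             for c in range(n)] for r in range(n)]
```

    The vectors in the checks below follow the usual convention of a unit entry at position j with zeros above it, and take $\tau_j = 2 / v_j^T v_j$ so that each $H_j$ is orthogonal; the identity itself holds for any $\tau_j$.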

       

    -------------------------------------------------
                        No. of    Block   Values of n
                      processors   size   100    1000
    -------------------------------------------------
    CONVEX C-4640          1        64     81     521
    CONVEX C-4640          4        64     94    1204
    CRAY C90               1       128    384     859
    CRAY C90              16       128    390    7641
    DEC 3000-500X Alpha    1        32     50      86
    IBM POWER2 model 590   1        32    108     208
    IBM RISC Sys/6000-550  1        32     30      61
    SGI POWER CHALLENGE    1        64     61     190
    SGI POWER CHALLENGE    4        64     39     342
    -------------------------------------------------
    

    Table 3.6: Speed in megaflops of SGEQRF/DGEQRF for square matrices of order n





    Eigenvalue Problems




     

    Eigenvalue problems have until recently provided a less fertile ground for the development of block algorithms than the factorizations so far described. Version 2.0 of LAPACK includes new block algorithms for the symmetric eigenvalue problem, and future releases will include analogous algorithms for the singular value decomposition.

    The first step in solving many types of eigenvalue problems is to reduce the original matrix to a ``condensed form'' by orthogonal transformations.

    In the reduction to condensed forms, the unblocked algorithms all use elementary Householder matrices and have good vector performance. Block forms of these algorithms have been developed [34], but all require additional operations, and a significant proportion of the work must still be performed by Level 2 BLAS, so there is less possibility of compensating for the extra operations.

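    The algorithms concerned are:

    reduction of a symmetric matrix to tridiagonal form to solve a symmetric eigenvalue problem (LAPACK routine SSYTRD);

    reduction of a rectangular matrix to bidiagonal form to compute a singular value decomposition (LAPACK routine SGEBRD);

    reduction of a nonsymmetric matrix to Hessenberg form to solve a nonsymmetric eigenvalue problem (LAPACK routine SGEHRD).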

    Note that only in the reduction to Hessenberg form   is it possible to use the block Householder representation described in subsection 3.4.2. Extra work must be performed to compute the n-by-b matrices X and Y that are required for the block updates (b is the block size) - and extra workspace is needed to store them.

    Nevertheless, the performance gains can be worthwhile on some machines, for example, on an IBM POWER2 model 590, as shown in Table 3.7.

       

                 (all matrices are square of order n)
                    ----------------------------
                              Block  Values of n
                               size  100    1000
                    ----------------------------
                    DSYTRD       1   137     159
                                16    82     169
                    ----------------------------
                    DGEBRD       1    90     110
                                16    90     136
                    ----------------------------
                    DGEHRD       1   111     113
                                16   125     187
                    ----------------------------
    
    Table 3.7: Speed in megaflops of reductions to condensed forms on an IBM POWER2 model 590

    Following the reduction of a dense (or band) symmetric matrix to tridiagonal form T, we must compute the eigenvalues and (optionally) eigenvectors of T. Computing the eigenvalues of T alone (using LAPACK routine SSTERF) requires $O(n^2)$ flops, whereas the reduction routine SSYTRD does $\frac{4}{3}n^3 + O(n^2)$ flops. So eventually the cost of finding eigenvalues alone becomes small compared to the cost of reduction. However, SSTERF does only scalar floating-point operations, without scope for the BLAS, so n may have to be large before SSYTRD is slower than SSTERF.

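    Version 2.0 of LAPACK includes a new algorithm, SSTEDC, for finding all eigenvalues and eigenvectors of T. The new algorithm can exploit Level 2 and 3 BLAS, whereas the previous algorithm, SSTEQR, could not. Furthermore, SSTEDC usually does many fewer flops than SSTEQR, so the speedup is compounded. Briefly, SSTEDC works as follows (for details, see [67] [47]). The tridiagonal matrix T is written as

    $$T = \begin{pmatrix} T_1 & 0 \\ 0 & T_2 \end{pmatrix} + H$$

    where $T_1$ and $T_2$ are tridiagonal, and H is a very simple rank-one matrix. Then the eigenvalues and eigenvectors of $T_1$ and $T_2$ are found by applying the algorithm recursively; this yields $T_1 = Q_1 \Lambda_1 Q_1^T$ and $T_2 = Q_2 \Lambda_2 Q_2^T$, where $\Lambda_i$ is a diagonal matrix of eigenvalues, and the columns of $Q_i$ are orthonormal eigenvectors. Thus

    $$T = \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix} \left( \begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{pmatrix} + \hat{H} \right) \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix}^T$$

    where $\hat{H}$ is again a simple rank-one matrix. The eigenvalues and eigenvectors of $\begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{pmatrix} + \hat{H}$ may be found using $O(n^2)$ scalar operations, yielding $\begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{pmatrix} + \hat{H} = \hat{Q} \Lambda \hat{Q}^T$. Substituting this into the last displayed expression yields

    $$T = \left( \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix} \hat{Q} \right) \Lambda \left( \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix} \hat{Q} \right)^T = Q \Lambda Q^T$$

    where the diagonals of $\Lambda$ are the desired eigenvalues of T, and the columns of $Q$ are the eigenvectors. Almost all the work is done in the two matrix multiplies of $Q_1$ and $Q_2$ times $\hat{Q}$, which is done using the Level 3 BLAS.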
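    The $O(n^2)$ scalar step can be illustrated with a minimal Fortran sketch. It assumes a diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ plus a rank-one term $\rho z z^T$, with $\rho > 0$, strictly increasing $d_i$ and nonzero $z_i$. Under these assumptions the eigenvalues are the roots of the secular equation $1 + \rho \sum_i z_i^2/(d_i - \lambda) = 0$. There is one root in each interval $(d_i, d_{i+1})$, and the last lies in $(d_n, d_n + \rho\|z\|_2^2)$. Each evaluation of the secular function costs O(n), so a fixed number of bisection steps per root gives $O(n^2)$ work in total. The names SECF and SECEIG are hypothetical. Plain bisection is used only for clarity; this is not the root finder or the deflation strategy used inside SSTEDC.

```fortran
      DOUBLE PRECISION FUNCTION SECF(N, D, Z, RHO, LAM)
      ! Secular function f(lam) = 1 + rho*sum z(i)**2/(d(i) - lam)
      IMPLICIT NONE
      INTEGER N, I
      DOUBLE PRECISION D(N), Z(N), RHO, LAM
      SECF = 1D0
      DO I = 1, N
         SECF = SECF + RHO*Z(I)**2/(D(I) - LAM)
      END DO
      END

      SUBROUTINE SECEIG(N, D, Z, RHO, W)
      ! Eigenvalues W of diag(D) + RHO*Z*Z**T by bisection on the
      ! secular equation.  Assumes RHO > 0, D strictly increasing
      ! and every Z(I) nonzero, so W(I) lies above D(I).
      IMPLICIT NONE
      INTEGER N, I, IT
      DOUBLE PRECISION D(N), Z(N), RHO, W(N)
      DOUBLE PRECISION LO, HI, MID, ZZ, SECF
      EXTERNAL SECF
      ZZ = 0D0
      DO I = 1, N
         ZZ = ZZ + Z(I)**2
      END DO
      DO I = 1, N
         LO = D(I)
         IF (I .LT. N) THEN
            HI = D(I+1)
         ELSE
            HI = D(N) + RHO*ZZ
         END IF
         ! f increases on (LO, HI) from -infinity to a value >= 0
         DO IT = 1, 200
            MID = 0.5D0*(LO + HI)
            ! Stop once the bracket cannot be split any further
            IF (MID .LE. LO .OR. MID .GE. HI) EXIT
            IF (SECF(N, D, Z, RHO, MID) .LT. 0D0) THEN
               LO = MID
            ELSE
               HI = MID
            END IF
         END DO
         W(I) = 0.5D0*(LO + HI)
      END DO
      END
```

    Take, for example, D = diag(1, 2, 3), $z = (1, 1, 1)^T$ and $\rho = 1$. The three computed values interlace with 1, 2, 3. They sum to the trace, 9, and their product is the determinant, 17.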

    The same recursive algorithm can be developed for the singular value decomposition of the bidiagonal matrix resulting from reducing a dense matrix with SGEBRD. This software will be completed for a future release of LAPACK. The current LAPACK algorithm for the bidiagonal singular value decomposition, SBDSQR, does not use the Level 2 or Level 3 BLAS.

    For computing the eigenvalues and eigenvectors of a Hessenberg matrix (or rather, for computing its Schur factorization), yet another flavour of block algorithm has been developed: a multishift QR iteration [8]. Whereas the traditional EISPACK routine HQR uses a double shift (and the corresponding complex routine COMQR uses a single shift), the multishift algorithm uses block shifts of higher order. It has been found that the total number of operations often decreases as the order of shift is increased, until a minimum is reached, typically between 4 and 8; for higher orders the number of operations increases quite rapidly. On many machines the speed of applying the shift increases steadily with the order, and the optimum order of shift is typically in the range 8 to 16. Note, however, that the performance can be very sensitive to the choice of the order of shift; it also depends on the numerical properties of the matrix. Dubrulle [37] has studied the practical performance of the algorithm, while Watkins and Elsner [77] discuss its theoretical asymptotic convergence rate.

    Finally, we note that research into block algorithms for symmetric and nonsymmetric eigenproblems continues [55] [9], and future versions of LAPACK will be updated to contain the best algorithms available.




    LAPACK




    LAPACK is a library of Fortran 77 subroutines for solving the most commonly occurring problems in numerical linear algebra. It has been designed to be efficient on a wide range of modern high-performance computers. The name LAPACK is an acronym for Linear Algebra PACKage.





    LAPACK Benchmark




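    This section contains performance numbers for selected LAPACK driver routines. These routines provide complete solutions for the most common problems of numerical linear algebra, and are the routines users are most likely to call:

    - solve an n-by-n system of linear equations with 1 right-hand side using SGESV/DGESV;
    - solve a linear least squares problem using SGELS/DGELS;
    - find the eigenvalues and (optionally) the right eigenvectors of a nonsymmetric matrix using SGEEV/DGEEV;
    - find the eigenvalues and (optionally) the eigenvectors of a symmetric matrix using SSYEVD/DSYEVD;
    - compute the singular values and (optionally) the left and right singular vectors of a matrix using SGESVD/DGESVD.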

    Data is provided for a variety of vector computers, shared memory parallel computers, and high performance workstations. All timings were obtained by using the machine-specific optimized BLAS available on each machine. For the IBM RISC Sys/6000-550 and IBM POWER2 model 590, the ESSL BLAS were used. In all cases the data consisted of 64-bit floating point numbers (single precision on the CRAY C90 and double precision on the other machines). For each machine and each driver, a small problem (N = 100 with LDA = 101) and a large problem (N = 1000 with LDA = 1001) were run. Block sizes NB = 1, 16, 32 and 64 were tried, with data only for the fastest run reported in the tables below. Similarly, UPLO = 'L' and UPLO = 'U' were timed for SSYEVD/DSYEVD, but only times for UPLO = 'U' were reported. For SGEEV/DGEEV, ILO = 1 and IHI = N. The test matrices were generated with randomly distributed entries. All run times are reported in seconds, and block size is denoted by nb. The value of nb was chosen to make N = 1000 optimal. It is not necessarily the best choice for N = 100. See Section 6.2 for details.

    The performance data is reported using three or four statistics. First, the run-time in seconds is given. The second statistic measures how well our performance compares to the speed of the BLAS, specifically SGEMM/DGEMM. This ``equivalent matrix multiplies'' statistic is calculated as

    $$\frac{\mbox{run-time of LAPACK driver on an n-by-n matrix}}{\mbox{run-time of SGEMM/DGEMM on an n-by-n matrix}}$$

    and is labeled accordingly in the tables. The performance information for the BLAS routines SGEMV/DGEMV (TRANS='N') and SGEMM/DGEMM (TRANSA='N', TRANSB='N') is provided in Table 3.8, along with the clock speed for each machine in Table 3.2. The third statistic is the true megaflop rating. For the eigenvalue and singular value drivers, a fourth ``synthetic megaflop'' statistic is also presented. We provide this statistic because the number of floating-point operations needed to find eigenvalues and singular values depends on the input data, unlike linear equation solving or linear least squares solving with SGELS/DGELS. The synthetic megaflop rating is defined to be the ``standard'' number of flops required to solve the problem, divided by the run-time in microseconds. This ``standard'' number of flops is taken to be the average for a standard algorithm over a variety of problems, as given in Table 3.9 (we ignore terms of order $n^2$) [45].
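    The rate statistics are simple ratios, and the sketch below computes them. The function names EQVMM and MFLOPS are hypothetical. For the true megaflop rating, FLOPS is the operation count actually performed. For the synthetic rating, FLOPS is the ``standard'' count from Table 3.9.

```fortran
      DOUBLE PRECISION FUNCTION EQVMM(T, TMM)
      ! Equivalent matrix multiplies: driver run-time divided by
      ! the SGEMM/DGEMM run-time for the same n
      IMPLICIT NONE
      DOUBLE PRECISION T, TMM
      EQVMM = T/TMM
      END

      DOUBLE PRECISION FUNCTION MFLOPS(FLOPS, T)
      ! Megaflop rate: flop count divided by run-time in microseconds
      ! (T is in seconds).  Pass the actual count for the true rating
      ! or the standard count of Table 3.9 for the synthetic one.
      IMPLICIT NONE
      DOUBLE PRECISION FLOPS, T
      MFLOPS = FLOPS/(T*1D6)
      END
```

    As a hypothetical example, an SGEMM/DGEMM run with n = 1000 performs about $2n^3 = 2 \times 10^9$ flops. If it takes 1 second, MFLOPS returns 2000.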

       
    Table 3.8: Execution time and Megaflop rates for SGEMV/DGEMV and SGEMM/DGEMM

    Note that the synthetic megaflop rating is much higher than the true megaflop rating for
    SSYEVD/DSYEVD in Table 3.15; this is because SSYEVD/DSYEVD performs many fewer floating point operations than the standard algorithm, SSYEV/DSYEV.

       
    Table 3.9: ``Standard'' floating point operation counts for LAPACK drivers for n-by-n matrices

       
    Table 3.10: Performance of SGESV/DGESV for n-by-n matrices

       
    Table 3.11: Performance of SGELS/DGELS for n-by-n matrices

       
    Table 3.12: Performance of SGEEV/DGEEV, eigenvalues only

       
    Table 3.13: Performance of SGEEV/DGEEV, eigenvalues and right eigenvectors

       
    Table 3.14: Performance of SSYEVD/DSYEVD, eigenvalues only, UPLO='U'

       
    Table 3.15: Performance of SSYEVD/DSYEVD, eigenvalues and eigenvectors, UPLO='U'

       
    Table 3.16: Performance of SGESVD/DGESVD, singular values only

       
    Table 3.17: Performance of SGESVD/DGESVD, singular values and left and right singular vectors




    Accuracy and Stability




     

    In addition to providing faster routines than previously available, LAPACK provides more comprehensive and better error bounds. Our ultimate goal is to provide error bounds for all quantities computed by LAPACK.

    In this chapter we explain our overall approach to obtaining error bounds, and provide enough information to use the software. The comments at the beginning of the individual routines should be consulted for more details. It is beyond the scope of this chapter to justify all the bounds we present. Instead, we give references to the literature. For example, standard material on error analysis can be found in [45].

    In order to make this chapter easy to read, we have labeled sections not essential for a first reading as Further Details. The sections not labeled as Further Details should provide all the information needed to understand and use the main error bounds computed by LAPACK. The Further Details sections provide mathematical background, references, and tighter but more expensive error bounds, and may be read later.

    In section 4.1 we discuss the sources of numerical error, in particular roundoff error. Section 4.2 discusses how to measure errors, as well as some standard notation. Section 4.3 discusses further details of how error bounds are derived. Sections 4.4 through 4.12 present error bounds for linear equations, linear least squares problems, generalized linear least squares problems, the symmetric eigenproblem, the nonsymmetric eigenproblem, the singular value decomposition, the generalized symmetric definite eigenproblem, the generalized nonsymmetric eigenproblem and the generalized (or quotient) singular value decomposition respectively. Section 4.13 discusses the impact of fast Level 3 BLAS on the accuracy of LAPACK routines.

    The sections on generalized linear least squares problems and the generalized nonsymmetric eigenproblem are ``placeholders'' to be completed in the next versions of the library and manual. The next versions will also include error bounds for new high accuracy routines for the symmetric eigenvalue problem and singular value decomposition.






    Sources of Error in Numerical Calculations




     

    There are two sources of error whose effects can be measured by the bounds in this chapter: roundoff error and input error. Roundoff error arises from rounding results of floating-point operations during the algorithm. Input error is error in the input to the algorithm from prior calculations or measurements. We describe roundoff error first, and then input error.

    Almost all the error bounds LAPACK provides are multiples of machine epsilon, which we abbreviate by $\epsilon$. Machine epsilon bounds the roundoff in individual floating-point operations. It may be loosely defined as the largest relative error in any floating-point operation that neither overflows nor underflows. (Overflow means the result is too large to represent accurately, and underflow means the result is too small to represent accurately.) Machine epsilon is available either by the function call SLAMCH('Epsilon') (or simply SLAMCH('E')) in single precision, or by the function call DLAMCH('Epsilon') (or DLAMCH('E')) in double precision. See section 4.1.1 and Table 4.1 for a discussion of common values of machine epsilon.

    Since overflow generally causes an error message, and underflow is almost always less significant than roundoff, we will not consider overflow and underflow further (see section 4.1.1).

    Bounds on input errors, or errors in the input parameters inherited from prior computations or measurements, may be easily incorporated into most LAPACK error bounds. Suppose the input data is accurate to, say, 5 decimal digits (we discuss exactly what this means in section 4.2). Then one simply replaces $\epsilon$ by $\max(\epsilon, 10^{-5})$ in the error bounds.







    Further Details: Floating point arithmetic




     

    Roundoff error is bounded in terms of the machine precision $\epsilon$, which is the smallest value satisfying

    $$|fl(a \odot b) - (a \odot b)| \le \epsilon \cdot |a \odot b|,$$

    where a and b are floating-point numbers, $\odot$ is any one of the four operations +, $-$, $\times$ and $\div$, and $fl(a \odot b)$ is the floating-point result of $a \odot b$. Machine epsilon, $\epsilon$, is the smallest value for which this inequality is true for all $\odot$, and for all a and b such that $a \odot b$ is neither too large (magnitude exceeds the overflow threshold) nor too small (is nonzero with magnitude less than the underflow threshold) to be represented accurately in the machine. We also assume $\epsilon$ bounds the relative error in unary operations like square root:

    $$|fl(\sqrt{a}) - \sqrt{a}| \le \epsilon \cdot |\sqrt{a}|.$$

    A precise characterization of $\epsilon$ depends on the details of the machine arithmetic and sometimes even of the compiler. For example, if addition and subtraction are implemented without a guard digit, we must redefine $\epsilon$ to be the smallest number such that

    $$|fl(a \pm b) - (a \pm b)| \le \epsilon \cdot (|a| + |b|).$$

    To ensure portability, machine parameters such as machine epsilon, the overflow threshold and the underflow threshold are computed at runtime by the auxiliary routine xLAMCH. The alternative, keeping a fixed table of machine parameter values, would degrade portability because the table would have to be changed when moving from one machine, or even one compiler, to another.

    Actually, most machines, but not yet all, do have the same machine parameters because they implement IEEE Standard Floating Point Arithmetic [5] [4], which exactly specifies floating-point number representations and operations. For these machines, including all modern workstations and PCs, the values of these parameters are given in Table 4.1.

       
    Table 4.1: Values of Machine Parameters in IEEE Floating Point Arithmetic
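    The definition of $\epsilon$ can be checked empirically on a machine with IEEE double precision arithmetic and round-to-nearest. The idea is to halve a candidate until 1 + $\epsilon$ rounds back to 1. The sketch below uses a hypothetical function name, EPSRND. It assumes intermediate results are not kept in wider registers, which some x87-based systems do. Portable code should simply call DLAMCH('E'), which also handles non-IEEE machines.

```fortran
      DOUBLE PRECISION FUNCTION EPSRND()
      ! Halve EPS until 1 + EPS rounds to 1.  With IEEE double
      ! precision and round-to-nearest the loop stops at 2**(-53),
      ! the bound on the relative error of one rounded operation.
      IMPLICIT NONE
      DOUBLE PRECISION EPS, ONEP
      EPS = 1D0
      ONEP = 1D0 + EPS
      DO WHILE (ONEP .GT. 1D0)
         EPS = 0.5D0*EPS
         ONEP = 1D0 + EPS
      END DO
      EPSRND = EPS
      END
```

    On such a machine the result is $2^{-53} \approx 1.1 \times 10^{-16}$. That is exactly half of the Fortran 90 intrinsic EPSILON(1D0), which measures the spacing of floating-point numbers just above 1.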

    As stated above, we will ignore overflow and underflow in discussing error bounds. Reference [18] discusses extending error bounds to include underflow, and shows that for many common computations, when underflow occurs it is less significant than roundoff. Overflow generally causes an error message and stops execution, so the error bounds do not apply.

    Therefore, most of our error bounds will simply be proportional to machine epsilon. This means, for example, that if the same problem is solved in double precision and single precision, the error bound in double precision will be smaller than the error bound in single precision by a factor of $\epsilon_{double}/\epsilon_{single}$. In IEEE arithmetic, this ratio is $2^{-53}/2^{-24} = 2^{-29} \approx 10^{-9}$, meaning that one expects the double precision answer to have approximately nine more decimal digits correct than the single precision answer.
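    The factor of about $10^{-9}$ can be reproduced with the Fortran 90 intrinsic EPSILON. EPSILON returns the spacing $b^{1-p}$ of floating-point numbers just above 1; under round-to-nearest, half of that spacing is the $\epsilon$ of this chapter. This sketch assumes that default REAL is IEEE single precision (for example, it is not compiled with an option that promotes REAL to 8 bytes). The subroutine name EPSRAT is hypothetical.

```fortran
      SUBROUTINE EPSRAT(RATIO, NDIGIT)
      ! EPSILON(X) is the spacing b**(1-p) of numbers just above 1;
      ! with round-to-nearest, half of it is eps in this chapter.
      IMPLICIT NONE
      DOUBLE PRECISION RATIO, NDIGIT, EPSS, EPSD
      EPSS = 0.5D0*DBLE(EPSILON(1.0))
      EPSD = 0.5D0*EPSILON(1D0)
      ! Factor by which double precision bounds are smaller
      RATIO = EPSD/EPSS
      ! Extra correct decimal digits expected in double precision
      NDIGIT = -LOG10(RATIO)
      END
```

    On IEEE machines, RATIO comes out as exactly $2^{-29}$ and NDIGIT as about 8.73. This is the ``approximately nine'' extra decimal digits.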

    LAPACK routines are generally insensitive to the details of rounding, like their counterparts in LINPACK and EISPACK. One newer algorithm (xLASV2) can return significantly more accurate results if addition and subtraction have a guard digit (see the end of section 4.9).




    How to Measure Errors




     

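    LAPACK routines return four types of floating-point output arguments:

    - Scalar, such as an eigenvalue of a matrix,
    - Vector, such as the solution x of a linear system Ax = b,
    - Matrix, such as a matrix inverse $A^{-1}$, and
    - Subspace, such as the space spanned by one or more eigenvectors of a matrix.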

    This section provides measures for errors in these quantities, which we need in order to express error bounds.

    First consider scalars. Let the scalar $\hat{\alpha}$ be an approximation of the true answer $\alpha$. We can measure the difference between $\alpha$ and $\hat{\alpha}$ either by the absolute error $|\hat{\alpha} - \alpha|$, or, if $\alpha$ is nonzero, by the relative error $|\hat{\alpha} - \alpha|/|\alpha|$. Alternatively, it is sometimes more convenient to use $|\hat{\alpha} - \alpha|/|\hat{\alpha}|$ instead of the standard expression for relative error (see section 4.2.1). If the relative error of $\hat{\alpha}$ is, say, $10^{-5}$, then we say that $\hat{\alpha}$ is accurate to 5 decimal digits.
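    The scalar error measures translate directly into code. In this minimal sketch, SCLERR is a hypothetical name, and both the true value ALPHA and the approximation AHAT are assumed nonzero. NDIGIT is $-\log_{10}$ of the relative error, so a relative error of $10^{-5}$ gives 5 decimal digits.

```fortran
      SUBROUTINE SCLERR(AHAT, ALPHA, ABSE, RELE, RELH, NDIGIT)
      ! Error measures for an approximation AHAT to the true scalar
      ! ALPHA; both are assumed nonzero.
      IMPLICIT NONE
      DOUBLE PRECISION AHAT, ALPHA, ABSE, RELE, RELH, NDIGIT
      ! Absolute error
      ABSE = ABS(AHAT - ALPHA)
      ! Standard relative error, measured against the true value
      RELE = ABSE/ABS(ALPHA)
      ! Alternative relative error, measured against AHAT
      RELH = ABSE/ABS(AHAT)
      ! Relative error 10**(-k) means k correct decimal digits
      NDIGIT = -LOG10(RELE)
      END
```

    For example, AHAT = 1.00001 approximating ALPHA = 1 gives NDIGIT $\approx$ 5. The two relative errors, RELE and RELH, differ by only about $10^{-10}$.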

    In order to measure the error in vectors, we need to measure the size or norm of a vector x. A popular norm is the magnitude of the largest component, $\max_{1 \le i \le n} |x_i|$, which we denote $\|x\|_\infty$. This is read the infinity norm of x. See Table 4.2 for a summary of norms.

       
    Table 4.2: Vector and matrix norms

    If $\hat{x}$ is an approximation to the exact vector x, we will refer to $\|\hat{x} - x\|_p$ as the absolute error in $\hat{x}$ (where p is one of the values in Table 4.2), and refer to $\|\hat{x} - x\|_p/\|x\|_p$ as the relative error in $\hat{x}$ (assuming $\|x\|_p \ne 0$). As with scalars, we will sometimes use $\|\hat{x} - x\|_p/\|\hat{x}\|_p$ for the relative error. As above, if the relative error of $\hat{x}$ is, say, $10^{-5}$, then we say that $\hat{x}$ is accurate to 5 decimal digits. The following example illustrates these ideas:

    Thus, we would say that $\hat{x}$ approximates x to 2 decimal digits.
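    The infinity-norm error measures can be computed as in the sketch below. It uses illustrative vectors chosen here for demonstration, $x = (1, 100, 9)^T$ and $\hat{x} = (1.1, 99, 11)^T$. VNRMI and VRELI are hypothetical names.

```fortran
      DOUBLE PRECISION FUNCTION VNRMI(N, X)
      ! Infinity norm: magnitude of the largest component of X
      IMPLICIT NONE
      INTEGER N, I
      DOUBLE PRECISION X(N)
      VNRMI = 0D0
      DO I = 1, N
         VNRMI = MAX(VNRMI, ABS(X(I)))
      END DO
      END

      DOUBLE PRECISION FUNCTION VRELI(N, XHAT, X)
      ! Relative error ||XHAT - X||_inf / ||X||_inf  (X nonzero)
      IMPLICIT NONE
      INTEGER N, I
      DOUBLE PRECISION XHAT(N), X(N), DIFF, VNRMI
      EXTERNAL VNRMI
      DIFF = 0D0
      DO I = 1, N
         DIFF = MAX(DIFF, ABS(XHAT(I) - X(I)))
      END DO
      VRELI = DIFF/VNRMI(N, X)
      END
```

    Here $\|\hat{x} - x\|_\infty = 2$ and $\|x\|_\infty = 100$. The relative error is therefore $0.02 = 2 \times 10^{-2}$, or about 2 correct decimal digits.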

    Errors in matrices may also be measured with norms. The most obvious generalization of $\|x\|_\infty$ to matrices would appear to be $\|A\| = \max_{i,j} |a_{ij}|$, but this does not have certain important mathematical properties that make deriving error bounds convenient (see section 4.2.1). Instead, we will use $\|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}|$, where A is an m-by-n matrix, or $\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}|$; see Table 4.2 for other matrix norms. As before, $\|\hat{A} - A\|_p$ is the absolute error in $\hat{A}$, $\|\hat{A} - A\|_p/\|A\|_p$ is the relative error in $\hat{A}$, and a relative error in $\hat{A}$ of $10^{-5}$ means $\hat{A}$ is accurate to 5 decimal digits. The following example illustrates these ideas:

    so $\hat{A}$ is accurate to 1 decimal digit.

    Here is some related notation we will use in our error bounds. The condition number of a matrix A is defined as $\kappa_p(A) \equiv \|A\|_p \cdot \|A^{-1}\|_p$, where A is square and invertible, and p is $\infty$ or one of the other possibilities in Table 4.2. The condition number measures how sensitive $A^{-1}$ is to changes in A; the larger the condition number, the more sensitive is $A^{-1}$. For example, for the same A as in the last example,

    LAPACK error estimation routines typically compute a variable called RCOND, which is the reciprocal of the condition number (or an approximation of the reciprocal). The reciprocal of the condition number is used instead of the condition number itself in order to avoid the possibility of overflow when the condition number is very large. Also, some of our error bounds will use the vector of absolute values of x, $|x|$ ($|x|_i = |x_i|$), or similarly $|A|$ ($|A|_{ij} = |a_{ij}|$).
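    For a 2-by-2 matrix, the condition number can be evaluated exactly from these definitions. The sketch below uses the hypothetical names MNRMI and RCOND2. It forms the explicit inverse purely for illustration; LAPACK instead estimates RCOND without forming $A^{-1}$, for example with SGECON.

```fortran
      DOUBLE PRECISION FUNCTION MNRMI(M, N, A, LDA)
      ! Infinity norm of an M-by-N matrix: largest row sum of |A|
      IMPLICIT NONE
      INTEGER M, N, LDA, I, J
      DOUBLE PRECISION A(LDA, N), S
      MNRMI = 0D0
      DO I = 1, M
         S = 0D0
         DO J = 1, N
            S = S + ABS(A(I, J))
         END DO
         MNRMI = MAX(MNRMI, S)
      END DO
      END

      DOUBLE PRECISION FUNCTION RCOND2(A)
      ! Exact reciprocal condition number 1/kappa_inf(A) of an
      ! invertible 2-by-2 matrix, using the explicit inverse
      IMPLICIT NONE
      DOUBLE PRECISION A(2, 2), AINV(2, 2), DET, MNRMI
      EXTERNAL MNRMI
      DET = A(1,1)*A(2,2) - A(1,2)*A(2,1)
      AINV(1,1) =  A(2,2)/DET
      AINV(1,2) = -A(1,2)/DET
      AINV(2,1) = -A(2,1)/DET
      AINV(2,2) =  A(1,1)/DET
      RCOND2 = 1D0/(MNRMI(2, 2, A, 2)*MNRMI(2, 2, AINV, 2))
      END
```

    Consider the hypothetical $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. It has $\|A\|_\infty = 7$ and $\|A^{-1}\|_\infty = 3$, so $\kappa_\infty(A) = 21$ and RCOND = 1/21.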

    Now we consider errors in subspaces. Subspaces are the outputs of routines that compute eigenvectors and invariant subspaces of matrices. We need a careful definition of error in these cases for the following reason. The nonzero vector x is called a (right) eigenvector of the matrix A with eigenvalue $\lambda$ if $Ax = \lambda x$. From this definition, we see that $-x$, $2x$, or any other nonzero multiple $\beta x$ of x is also an eigenvector. In other words, eigenvectors are not unique. This means we cannot measure the difference between two supposed eigenvectors $\hat{x}$ and x by computing $\|\hat{x} - x\|_2$, because this may be large while $\|\hat{x} - \beta x\|_2$ is small or even zero for some $\beta \ne 1$. This is true even if we normalize x so that $\|x\|_2 = 1$, since both x and $-x$ can be normalized simultaneously. So in order to define error in a useful way, we need to instead consider the set S of all scalar multiples $\{\beta x, \ \beta \mbox{ a scalar}\}$ of x. The set S is called the subspace spanned by x, and is uniquely determined by any nonzero member of S. We will measure the difference between two such sets by the acute angle between them. Suppose $\hat{S}$ is spanned by $\{\hat{x}\}$ and S is spanned by {x}. Then the acute angle between $\hat{S}$ and S is defined as

    $$\theta(\hat{S}, S) = \theta(\hat{x}, x) \equiv \arccos \frac{|\hat{x}^H x|}{\|\hat{x}\|_2 \cdot \|x\|_2}.$$

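    One can show that $\theta(\hat{x}, x)$ does not change when either $\hat{x}$ or x is multiplied by any nonzero scalar. For example, if

    as above, then $\theta(\gamma \hat{x}, \delta x) = \theta(\hat{x}, x)$ for any nonzero scalars $\gamma$ and $\delta$.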

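    Here is another way to interpret the angle between $\hat{S}$ and S. Suppose $\hat{x}$ is a unit vector ($\|\hat{x}\|_2 = 1$). Then there is a scalar $\beta$ such that

    $$\|\hat{x} - \beta x\|_2 = \sin\theta(\hat{x}, x) \approx \theta(\hat{x}, x).$$

    The approximation $\sin\theta \approx \theta$ holds when $\theta(\hat{x}, x)$ is much less than 1 (less than .1 will do nicely). If $\hat{x}$ is an approximate eigenvector with error bound $\theta(\hat{x}, x) \le \bar{\theta} \ll 1$, where x is a true eigenvector, there is another true eigenvector $\beta x$ satisfying $\|\hat{x} - \beta x\|_2 \lesssim \bar{\theta}$. For example, if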

    then for .
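    For real vectors, both the angle and the distance from the unit vector $\hat{x}/\|\hat{x}\|_2$ to its orthogonal projection $\beta x$ onto span{x} can be computed as in this sketch. That distance equals $\sin\theta(\hat{x}, x)$. VANGLE is a hypothetical name. The arccos formula loses accuracy for very small angles, so the sketch is meant for illustration rather than production use.

```fortran
      SUBROUTINE VANGLE(N, XHAT, X, THETA, DIST)
      ! THETA: acute angle between span{XHAT} and span{X}.
      ! DIST:  || XHAT/||XHAT|| - BETA*X ||_2 for the best scalar
      !        BETA, which equals SIN(THETA).  Real vectors assumed.
      IMPLICIT NONE
      INTEGER N, I
      DOUBLE PRECISION XHAT(N), X(N), THETA, DIST
      DOUBLE PRECISION DOTP, NXH, NX, BETA, C
      DOTP = 0D0
      NXH = 0D0
      NX = 0D0
      DO I = 1, N
         DOTP = DOTP + XHAT(I)*X(I)
         NXH = NXH + XHAT(I)**2
         NX = NX + X(I)**2
      END DO
      NXH = SQRT(NXH)
      NX = SQRT(NX)
      ! Clamp guards against a cosine slightly above 1 from roundoff
      C = MIN(1D0, ABS(DOTP)/(NXH*NX))
      THETA = ACOS(C)
      ! BETA*X is the projection of the unit vector XHAT/NXH on X
      BETA = DOTP/(NXH*NX**2)
      DIST = 0D0
      DO I = 1, N
         DIST = DIST + (XHAT(I)/NXH - BETA*X(I))**2
      END DO
      DIST = SQRT(DIST)
      END
```

    With $x = (1, 0)^T$ and $\hat{x} = (\cos 0.1, \sin 0.1)^T$, THETA $\approx$ 0.1 and DIST $\approx \sin 0.1$. Passing $-3\hat{x}$ and $2x$ instead leaves both values unchanged.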

    Some LAPACK routines also return subspaces spanned by more than one vector, such as the invariant subspaces of matrices returned by xGEESX. The notion of angle between subspaces also applies here; see section 4.2.1 for details.

    Finally, many of our error bounds will contain a factor p(n) (or p(m, n)), which grows as a function of matrix dimension n (or dimensions m and n). It represents a potentially different function for each problem. In practice, the true errors usually grow just linearly; using p(n) = 10n in the error bound formulas will often give a reasonable bound. Therefore, we will refer to p(n) as a ``modestly growing'' function of n. However, it can occasionally be much larger; see section 4.2.1. For simplicity, the error bounds computed by the code fragments in the following sections will use p(n) = 1. This means these computed error bounds may occasionally slightly underestimate the true error. For this reason we refer to these computed error bounds as ``approximate error bounds''.






    Further Details: How to Measure Errors




     

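    The relative error $\|\hat{x} - x\|/\|x\|$ in the approximation $\hat{x}$ of the true solution x has a drawback: it often cannot be computed directly, because it depends on the unknown quantity $\|x\|$. However, we can often instead estimate $\|\hat{x} - x\|/\|\hat{x}\|$, since $\hat{x}$ is known (it is the output of our algorithm). Fortunately, these two quantities are necessarily close together, provided either one is small, which is the only time they provide a useful bound anyway. For example, $\|\hat{x} - x\|/\|\hat{x}\| \le 0.1$ implies

    $$\frac{1}{1.1} \cdot \frac{\|\hat{x} - x\|}{\|\hat{x}\|} \le \frac{\|\hat{x} - x\|}{\|x\|} \le \frac{1}{0.9} \cdot \frac{\|\hat{x} - x\|}{\|\hat{x}\|},$$

    so they can be used interchangeably.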
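    The relationship between the two relative errors can be checked numerically. By the triangle inequality, if RELH = $\|\hat{x} - x\|/\|\hat{x}\| \le 0.1$, then RELH/1.1 $\le$ REL $\le$ RELH/0.9, where REL = $\|\hat{x} - x\|/\|x\|$. The sketch below computes both in the infinity norm; RELERR is a hypothetical name. In practice only RELH is computable, because x is unknown.

```fortran
      SUBROUTINE RELERR(N, XHAT, X, REL, RELH)
      ! REL  = ||XHAT - X||_inf / ||X||_inf     (needs unknown X)
      ! RELH = ||XHAT - X||_inf / ||XHAT||_inf  (uses computed XHAT)
      IMPLICIT NONE
      INTEGER N, I
      DOUBLE PRECISION XHAT(N), X(N), REL, RELH, DIFF, NX, NXH
      DIFF = 0D0
      NX = 0D0
      NXH = 0D0
      DO I = 1, N
         DIFF = MAX(DIFF, ABS(XHAT(I) - X(I)))
         NX = MAX(NX, ABS(X(I)))
         NXH = MAX(NXH, ABS(XHAT(I)))
      END DO
      REL = DIFF/NX
      RELH = DIFF/NXH
      END
```

    Take the illustrative values $x = (1, 100, 9)^T$ and $\hat{x} = (1.1, 99, 11)^T$. Then REL = 0.02 and RELH = 2/99 $\approx$ 0.0202, and REL lies between RELH/1.1 and RELH/0.9.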

    Table 4.2 contains a variety of norms we will use to measure errors. These norms have the properties that $\|Ax\|_p \le \|A\|_p \cdot \|x\|_p$, and $\|AB\|_p \le \|A\|_p \cdot \|B\|_p$, where p is one of 1, 2, $\infty$, and F. These properties are useful for deriving error bounds.

    An error bound that uses a given norm may be changed into an error bound that uses another norm. This is accomplished by multiplying the first error bound by an appropriate function of the problem dimension. Table 4.3 gives the factors $f_{pq}(n)$ such that $\|x\|_p \le f_{pq}(n) \|x\|_q$, where n is the dimension of x.

       
    Table 4.3: Bounding One Vector Norm in Terms of Another
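    The standard inequalities $\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty$ and $\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2$ are of the kind summarized in Table 4.3, and they are easy to check numerically. In this sketch (VNORMS is a hypothetical name) the two-norm is computed naively as the square root of a sum of squares, which can overflow or underflow for extreme data. The BLAS routine SNRM2 scales to avoid this.

```fortran
      SUBROUTINE VNORMS(N, X, X1, X2, XI)
      ! One-, two- and infinity-norms of a real N-vector X
      IMPLICIT NONE
      INTEGER N, I
      DOUBLE PRECISION X(N), X1, X2, XI
      X1 = 0D0
      X2 = 0D0
      XI = 0D0
      DO I = 1, N
         X1 = X1 + ABS(X(I))
         X2 = X2 + X(I)**2
         XI = MAX(XI, ABS(X(I)))
      END DO
      X2 = SQRT(X2)
      END
```

    For $x = (3, -4, 0)^T$ the three norms are 7, 5 and 4. The all-ones vector of length 4 attains the bound $\|x\|_2 = \sqrt{n}\,\|x\|_\infty$.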

    Table 4.4 gives the factors $f_{pq}(m, n)$ such that $\|A\|_p \le f_{pq}(m, n) \|A\|_q$, where A is m-by-n.

       
    Table 4.4: Bounding One Matrix Norm in Terms of Another

    The two-norm of A, $\|A\|_2$, is also called the spectral norm of A, and is equal to the largest singular value $\sigma_{\max}(A)$ of A. We shall also need to refer to the smallest singular value $\sigma_{\min}(A)$ of A; its value can be defined in a similar way to the definition of the two-norm in Table 4.2, namely as $\min_{x \ne 0} \|Ax\|_2/\|x\|_2$ when A has at least as many rows as columns, and defined as $\min_{x \ne 0} \|x^H A\|_2/\|x\|_2$ when A has more columns than rows. The two-norm, Frobenius norm, and singular values of a matrix do not change if the matrix is multiplied by a real orthogonal (or complex unitary) matrix.
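    For a real 2-by-2 matrix, the singular values have a closed form in terms of the trace $\|A\|_F^2$ and the determinant $(\det A)^2$ of $A^T A$. This makes the statements above easy to check numerically. The sketch (SV2 is a hypothetical name) computes $\sigma_{\min}$ as $|\det A|/\sigma_{\max}$ to avoid cancellation. LAPACK's own 2-by-2 kernels, such as xLASV2, use more careful formulations.

```fortran
      SUBROUTINE SV2(A, SMAX, SMIN)
      ! Singular values of a real 2-by-2 matrix A from the eigenvalues
      ! of A**T*A: s**2 = (T +- SQRT(T**2 - 4*DET**2))/2 with
      ! T = ||A||_F**2.  SMIN = |DET|/SMAX avoids cancellation.
      IMPLICIT NONE
      DOUBLE PRECISION A(2, 2), SMAX, SMIN, T, DET
      T = A(1,1)**2 + A(1,2)**2 + A(2,1)**2 + A(2,2)**2
      DET = A(1,1)*A(2,2) - A(1,2)*A(2,1)
      SMAX = SQRT(0.5D0*(T + SQRT(MAX(0D0, T**2 - 4D0*DET**2))))
      IF (SMAX .GT. 0D0) THEN
         SMIN = ABS(DET)/SMAX
      ELSE
         SMIN = 0D0
      END IF
      END
```

    For the hypothetical $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, $\sigma_{\max} \approx 5.465$ and $\sigma_{\min} = 2/\sigma_{\max} \approx 0.366$. Multiplying A on the left by the rotation with c = 0.6, s = 0.8 leaves both values unchanged.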

    Now we define subspaces spanned by more than one vector, and angles between subspaces. Given a set of k n-dimensional vectors $\{x_1, \ldots, x_k\}$, they determine a subspace S consisting of all their possible linear combinations $\{\sum_{i=1}^{k} \beta_i x_i, \ \beta_i \mbox{ scalars}\}$. We also say that $\{x_1, \ldots, x_k\}$ spans S. The difficulty in measuring the difference between subspaces is that the sets of vectors spanning them are not unique. For example, {x}, {-x} and {2x} all determine the same subspace. This means we cannot simply compare the subspaces spanned by $\{\hat{x}_1, \ldots, \hat{x}_k\}$ and $\{x_1, \ldots, x_k\}$ by comparing each $\hat{x}_i$ to $x_i$. Instead, we will measure the angle between the subspaces, which is independent of the spanning set of vectors. Suppose subspace $\hat{S}$ is spanned by $\{\hat{x}_1, \ldots, \hat{x}_k\}$ and that subspace S is spanned by $\{x_1, \ldots, x_k\}$. If k = 1, we instead write more simply $\{\hat{x}\}$ and {x}. When k = 1, we defined the angle between $\hat{S}$ and S as the acute angle between $\hat{x}$ and x. When k > 1, we define the acute angle between $\hat{S}$ and S as the largest acute angle between any vector $\hat{x}$ in $\hat{S}$, and the closest vector x in S to $\hat{x}$:

    $$\theta(\hat{S}, S) \equiv \max_{\hat{x} \in \hat{S}, \, \hat{x} \ne 0} \ \min_{x \in S, \, x \ne 0} \theta(\hat{x}, x).$$

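    LAPACK routines which compute subspaces return vectors $\{\hat{x}_1, \ldots, \hat{x}_k\}$ spanning a subspace $\hat{S}$ which are orthonormal. This means the n-by-k matrix $\hat{X} = [\hat{x}_1, \ldots, \hat{x}_k]$ satisfies $\hat{X}^H \hat{X} = I$. Suppose also that the vectors $\{x_1, \ldots, x_k\}$ spanning S are orthonormal, so $X = [x_1, \ldots, x_k]$ also satisfies $X^H X = I$. Then there is a simple expression for the angle between $\hat{S}$ and S:

    $$\theta(\hat{S}, S) = \arccos \sigma_{\min}(\hat{X}^H X).$$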

    For example, if

    then .
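    The formula $\theta(\hat{S}, S) = \arccos\sigma_{\min}(\hat{X}^H X)$ can be sketched for real subspaces of dimension k = 2. In that case $\sigma_{\min}$ of the 2-by-2 matrix $\hat{X}^T X$ has a closed form. SUBANG is a hypothetical name, and both inputs must already have orthonormal columns. As in the vector case, arccos loses accuracy for very small angles.

```fortran
      DOUBLE PRECISION FUNCTION SUBANG(N, XHAT, LDXH, X, LDX)
      ! Largest principal angle between the column spaces of the
      ! N-by-2 matrices XHAT and X, both assumed to have orthonormal
      ! columns:  THETA = ACOS( SIGMA_MIN( XHAT**T * X ) ).
      IMPLICIT NONE
      INTEGER N, LDXH, LDX, I, J, K
      DOUBLE PRECISION XHAT(LDXH, 2), X(LDX, 2), M(2, 2)
      DOUBLE PRECISION T, DET, SMAX, SMIN
      ! Form the 2-by-2 matrix M = XHAT**T * X
      DO I = 1, 2
         DO J = 1, 2
            M(I, J) = 0D0
            DO K = 1, N
               M(I, J) = M(I, J) + XHAT(K, I)*X(K, J)
            END DO
         END DO
      END DO
      ! Smallest singular value of M via the eigenvalues of M**T*M
      T = M(1,1)**2 + M(1,2)**2 + M(2,1)**2 + M(2,2)**2
      DET = M(1,1)*M(2,2) - M(1,2)*M(2,1)
      SMAX = SQRT(0.5D0*(T + SQRT(MAX(0D0, T**2 - 4D0*DET**2))))
      SMIN = 0D0
      IF (SMAX .GT. 0D0) SMIN = ABS(DET)/SMAX
      SUBANG = ACOS(MIN(1D0, SMIN))
      END
```

    As an example, let $\hat{S}$ be spanned by $e_1$ and $e_2$, and S by $e_1$ and $(0, \cos 0.3, \sin 0.3)^T$, both in three-dimensional space. The function then returns 0.3, and it returns 0 for identical subspaces.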

    As stated above, all our bounds will contain a factor p(n) (or p(m,n)), which measures how roundoff errors can grow as a function of matrix dimension n (or m and n). In practice, the true error usually grows just linearly with n, but we can generally only prove much weaker bounds of the form $p(n) = O(n^3)$. This is because we cannot rule out the extremely unlikely possibility of rounding errors all adding together instead of canceling on average. Using $p(n) = O(n^3)$ would give very pessimistic and unrealistic bounds, especially for large n, so we content ourselves with describing p(n) as a ``modestly growing'' polynomial function of n. Using p(n) = 10n in the error bound formulas will often give a reasonable bound. For detailed derivations of various p(n), see [78] [45].

    There is also one situation where p(n) can grow as large as $2^{n-1}$: Gaussian elimination. This typically occurs only on specially constructed matrices presented in numerical analysis courses [wilkinson1, p. 212]. However, the expert drivers for solving linear systems, xGESVX and xGBSVX, provide error bounds incorporating p(n), and so this rare possibility can be detected.
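    Growth as large as $2^{n-1}$ under partial pivoting can be reproduced with a classical example due to Wilkinson. The matrix is n-by-n, with ones on the diagonal and in the last column and $-1$ everywhere below the diagonal. Partial pivoting performs no row interchanges on it, and each elimination step doubles the remaining entries of the last column, so $u_{nn} = 2^{n-1}$. The sketch below (GROWTH is a hypothetical name) runs plain Gaussian elimination with partial pivoting and reports $\max|u_{ij}|/\max|a_{ij}|$. It is not LAPACK's xGETRF.

```fortran
      DOUBLE PRECISION FUNCTION GROWTH(N, A, LDA)
      ! Gaussian elimination with partial pivoting on A (overwritten
      ! by U); returns the growth factor max|u(i,j)| / max|a(i,j)|.
      IMPLICIT NONE
      INTEGER N, LDA, I, J, K, P
      DOUBLE PRECISION A(LDA, N), AMAX, UMAX, TMP, L
      AMAX = MAXVAL(ABS(A(1:N, 1:N)))
      DO K = 1, N - 1
         ! Pivot: first row with the largest |a(i,k)|, i >= k
         P = K
         DO I = K + 1, N
            IF (ABS(A(I, K)) .GT. ABS(A(P, K))) P = I
         END DO
         DO J = 1, N
            TMP = A(K, J)
            A(K, J) = A(P, J)
            A(P, J) = TMP
         END DO
         ! Eliminate below the pivot (skip an all-zero column)
         IF (A(K, K) .NE. 0D0) THEN
            DO I = K + 1, N
               L = A(I, K)/A(K, K)
               A(I, K) = 0D0
               DO J = K + 1, N
                  A(I, J) = A(I, J) - L*A(K, J)
               END DO
            END DO
         END IF
      END DO
      ! Largest entry of the upper triangle U
      UMAX = 0D0
      DO J = 1, N
         DO I = 1, J
            UMAX = MAX(UMAX, ABS(A(I, J)))
         END DO
      END DO
      GROWTH = UMAX/AMAX
      END
```

    For n = 5 the function returns 16.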




    Further Details: How Error Bounds Are Derived




     







    Standard Error Analysis




     

    We illustrate standard error analysis with the simple example of evaluating the scalar function y = f(z). Let the output of the subroutine which implements f(z) be denoted alg(z); this includes the effects of roundoff. If $alg(z) = f(z + \delta)$ where $\delta$ is small, then we say alg is a backward stable algorithm for f, or that the backward error $\delta$ is small. In other words, alg(z) is the exact value of f at a slightly perturbed input $z + \delta$.

    Suppose now that f is a smooth function, so that we may approximate it near z by a straight line: $f(z + \delta) \approx f(z) + f'(z) \cdot \delta$. Then we have the simple error estimate

    $$alg(z) - f(z) = f(z + \delta) - f(z) \approx f'(z) \cdot \delta.$$

    Thus, if $\delta$ is small, and the derivative $f'(z)$ is moderate, the error alg(z) - f(z) will be small. This is often written in the similar form

    $$\left| \frac{alg(z) - f(z)}{f(z)} \right| \lesssim \left| \frac{f'(z) \cdot z}{f(z)} \right| \cdot \left| \frac{\delta}{z} \right| \equiv \kappa(f, z) \cdot \left| \frac{\delta}{z} \right|.$$

    This approximately bounds the relative error $|alg(z) - f(z)|/|f(z)|$ by the product of the condition number of f at z, $\kappa(f, z)$, and the relative backward error $|\delta/z|$. Thus we get an error bound by multiplying a condition number and a backward error (or bounds for these quantities). We call a problem ill-conditioned if its condition number is large, and ill-posed if its condition number is infinite (or does not exist).
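    The link between condition number and backward error can be tried on a hypothetical example, f(z) = log z. Here $f'(z) \cdot z = 1$, so $\kappa(f, z) = 1/|\log z|$, which is large near z = 1. The sketch perturbs z to $z(1 + \delta)$, a relative backward error of size $\delta$, and measures the resulting relative change in log z. CNDLOG and FWDLOG are hypothetical names.

```fortran
      DOUBLE PRECISION FUNCTION CNDLOG(Z)
      ! kappa(f,z) = |f'(z)*z/f(z)| for f = LOG; since f'(z)*z = 1,
      ! this is 1/|LOG(z)|, which is large when z is close to 1.
      IMPLICIT NONE
      DOUBLE PRECISION Z
      CNDLOG = 1D0/ABS(LOG(Z))
      END

      DOUBLE PRECISION FUNCTION FWDLOG(Z, DELTA)
      ! Relative change in LOG(z) when z becomes z*(1 + DELTA), i.e.
      ! the forward error caused by a relative backward error DELTA.
      IMPLICIT NONE
      DOUBLE PRECISION Z, DELTA
      FWDLOG = ABS(LOG(Z*(1D0 + DELTA)) - LOG(Z))/ABS(LOG(Z))
      END
```

    With z = 1.001 and $\delta = 10^{-10}$, $\kappa \approx 1000$ and the relative change in log z is about $10^{-7}$, matching $\kappa \cdot \delta$. With z = 10, $\kappa \approx 0.43$, and the change is correspondingly small.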

    If f and z are vector quantities, then f'(z) is a matrix (the Jacobian). So instead of using absolute values as before, we now measure $\delta$ by a vector norm $\|\delta\|$ and f'(z) by a matrix norm $\|f'(z)\|$. The conventional (and coarsest) error analysis uses a norm such as the infinity norm. We therefore call this normwise backward stability. For example, a normwise stable method for solving a system of linear equations Ax = b will produce a solution $\hat{x}$ satisfying $(A + E)\hat{x} = b + f$ where $\|E\|_\infty/\|A\|_\infty$ and $\|f\|_\infty/\|b\|_\infty$ are both small (close to machine epsilon). In this case the condition number is $\kappa_\infty(A) = \|A\|_\infty \cdot \|A^{-1}\|_\infty$ (see section 4.4 below).

    Almost all of the algorithms in LAPACK (as well as LINPACK and EISPACK) are stable in the sense just described: when applied to a matrix A they produce the exact result for a slightly different matrix A + E, where $\|E\|_\infty/\|A\|_\infty$ is of order $\epsilon$.

    Condition numbers may be expensive to compute exactly. For example, it costs about $\frac{2}{3}n^3$ operations to solve Ax = b for a general matrix A, and computing $\kappa_\infty(A)$ exactly costs an additional $\frac{4}{3}n^3$ operations, or twice as much. But $\kappa_\infty(A)$ can be estimated in only $O(n^2)$ operations beyond those necessary for solution, a tiny extra cost. Therefore, most of LAPACK's condition numbers and error bounds are based on estimated condition numbers, using the method of [52] [51] [48]. The price one pays for using an estimated rather than an exact condition number is occasional (but very rare) underestimates of the true error; years of experience attest to the reliability of our estimators, although examples where they badly underestimate the error can be constructed [53]. Note that once a condition estimate is large enough (usually $O(1/\epsilon)$), it confirms that the computed answer may be completely inaccurate, and so the exact magnitude of the condition estimate conveys little information.
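    The idea behind estimating a norm of $A^{-1}$ cheaply can be illustrated with a minimal sketch of Hager's basic 1-norm estimator. It needs only the products $Bx$ and $B^T y$. Here B is stored explicitly for demonstration. A condition estimator would instead take $B = A^{-1}$ and compute those products with triangular solves using the LU factors already available, each costing $O(n^2)$. LAPACK's estimator adds refinements not shown here. HAGER is a hypothetical name, and the limit of five iterations is an arbitrary choice.

```fortran
      DOUBLE PRECISION FUNCTION HAGER(N, B, LDB)
      ! Hager's estimate of ||B||_1 from products B*x and B**T*y only.
      ! B is stored explicitly here for demonstration; a condition
      ! estimator would apply B = A**(-1) through triangular solves
      ! with the LU factors of A instead.
      IMPLICIT NONE
      INTEGER N, LDB, I, IT, JMAX
      DOUBLE PRECISION B(LDB, N), X(N), Y(N), XI(N), Z(N), ZTX
      HAGER = 0D0
      ! Start from the vector with all entries 1/N (||x||_1 = 1)
      X = 1D0/N
      DO IT = 1, 5
         ! Current estimate: ||B*x||_1 for a unit 1-norm vector x
         Y = MATMUL(B(1:N, 1:N), X)
         HAGER = SUM(ABS(Y))
         ! Gradient direction z = B**T * sign(y)
         DO I = 1, N
            XI(I) = SIGN(1D0, Y(I))
         END DO
         Z = MATMUL(TRANSPOSE(B(1:N, 1:N)), XI)
         ZTX = DOT_PRODUCT(Z, X)
         JMAX = MAXLOC(ABS(Z), 1)
         ! Stop when no unit vector e_j promises a larger value
         IF (ABS(Z(JMAX)) .LE. ZTX) EXIT
         X = 0D0
         X(JMAX) = 1D0
      END DO
      END
```

    On the 2-by-2 matrix $B = \begin{pmatrix} -2 & 1 \\ 1.5 & -0.5 \end{pmatrix}$, the estimate equals the true value 3.5. On the 3-by-3 matrix with columns $(1, 4, -7)^T$, $(-2, 5, 8)^T$ and $(3, -6, 9)^T$, it stops at 15, below the true $\|B\|_1 = 18$. The estimate never exceeds the true norm, and it can occasionally underestimate it.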




    Improved Error Bounds




     

    The standard error analysis just outlined has a drawback: by using the infinity norm $\|\delta\|_\infty$ to measure the backward error, entries of equal magnitude in $\delta$ contribute equally to the final error bound $\kappa(f, z) \cdot \|\delta\|/\|z\|$. This means that if z is sparse or has some very tiny entries, a normwise backward stable algorithm may make very large changes in these entries compared to their original values. If these tiny values are known accurately by the user, these errors may be unacceptable, or the error bounds may be unacceptably large.

    For example, consider solving a diagonal system of linear equations Ax = b. Each component of the solution is computed accurately by Gaussian elimination: $x_i = b_i/a_{ii}$. The usual error bound is approximately $\epsilon \cdot \kappa_\infty(A) = \epsilon \cdot \max_i |a_{ii}| / \min_i |a_{ii}|$, which can arbitrarily overestimate the true error, $\epsilon$, if at least one $a_{ii}$ is tiny and another one is large.
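    The diagonal example can be made concrete with a short sketch, DIAGSV (a hypothetical name). It solves $\mathrm{diag}(d)\,x = b$ by one division per component and also returns $\kappa_\infty = \max_i|d_i|/\min_i|d_i|$. Assuming IEEE arithmetic, each quotient is correctly rounded, so for exact data every component has relative error at most $\epsilon$, however large $\kappa_\infty$ is.

```fortran
      SUBROUTINE DIAGSV(N, D, B, X, KAPPA)
      ! Solve diag(D)*X = B componentwise and return
      ! kappa_inf(diag(D)) = max|d(i)| / min|d(i)|  (D(I) nonzero)
      IMPLICIT NONE
      INTEGER N, I
      DOUBLE PRECISION D(N), B(N), X(N), KAPPA, DMAX, DMIN
      DMAX = 0D0
      DMIN = HUGE(1D0)
      DO I = 1, N
         ! One correctly rounded division per component
         X(I) = B(I)/D(I)
         DMAX = MAX(DMAX, ABS(D(I)))
         DMIN = MIN(DMIN, ABS(D(I)))
      END DO
      KAPPA = DMAX/DMIN
      END
```

    With $d = (1, 10^{-10})$ and $b = (3, 3 \times 10^{-10})$, $\kappa_\infty = 10^{10}$, so the normwise bound $\epsilon\kappa_\infty$ is about $10^{-6}$. Yet both computed components agree with the exact value 3 to within a few units in the last place.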

    LAPACK addresses this inadequacy by providing some algorithms whose backward error $\delta$ is a tiny relative change in each component of z: $|\delta_i| = O(\epsilon)|z_i|$. This backward error retains both the sparsity structure of z as well as the information in tiny entries. These algorithms are therefore called componentwise relatively backward stable. Furthermore, computed error bounds reflect this stronger form of backward error.

    If the input data has independent uncertainty in each component, each component must have at least a small relative uncertainty, since each is a floating-point number. In this case, the extra uncertainty contributed by the algorithm is not much worse than the uncertainty in the input data, so one could say the answer provided by a componentwise relatively backward stable algorithm is as accurate as the data warrants [1].

    When solving Ax = b using expert driver xyySVX or computational routine xyyRFS, for example, we almost always compute $\hat x$ satisfying $(A+E)\hat x = b+f$, where $e_{ij}$ is a small relative change in $a_{ij}$ and $f_k$ is a small relative change in $b_k$. In particular, if A is diagonal, the corresponding error bound is always tiny, as one would expect (see the next section).

    LAPACK can achieve this accuracy for linear equation solving, the bidiagonal singular value decomposition, and the symmetric tridiagonal eigenproblem, and provides facilities for achieving this accuracy for least squares problems. Future versions of LAPACK will also achieve this accuracy for other linear algebra problems, as discussed below.




    Error Bounds for Linear Equation Solving

     

    Let Ax = b be the system to be solved, and $\hat x$ the computed solution. Let n be the dimension of A. An approximate error bound for $\hat x$ may be obtained in one of the following two ways, depending on whether the solution is computed by a simple driver or an expert driver:

    1. Suppose that Ax = b is solved using the simple driver SGESV (subsection 2.2.1). Then the approximate error bound

      $$\frac{\|\hat x - x\|_\infty}{\|x\|_\infty} \lesssim \mathrm{ERRBD}$$

      can be computed by the following code fragment.

            EPSMCH = SLAMCH( 'E' )
      *     Get infinity-norm of A
            ANORM = SLANGE( 'I', N, N, A, LDA, WORK )
      *     Solve system; the solution X overwrites B
            CALL SGESV( N, 1, A, LDA, IPIV, B, LDB, INFO )
            IF( INFO.GT.0 ) THEN
               PRINT *,'Singular Matrix'
            ELSE IF( N.GT.0 ) THEN
      *        Get reciprocal condition number RCOND of A
               CALL SGECON( 'I', N, A, LDA, ANORM, RCOND,
           $                WORK, IWORK, INFO )
               RCOND = MAX( RCOND, EPSMCH )
               ERRBD = EPSMCH / RCOND
            END IF
      
       

      For example, suppose

      ,

      Then (to 4 decimal places)

      , , the true reciprocal condition number , , and the true error .  

    2. Suppose that Ax = b is solved using the expert driver SGESVX (subsection 2.2.1). This routine provides an explicit error bound FERR, measured with the infinity-norm:

      $$\frac{\|\hat x - x\|_\infty}{\|\hat x\|_\infty} \lesssim \mathrm{FERR}$$

      For example, the following code fragment solves Ax = b and computes an approximate error bound FERR:

            CALL SGESVX( 'E', 'N', N, 1, A, LDA, AF, LDAF, IPIV,
           $             EQUED, R, C, B, LDB, X, LDX, RCOND, FERR, BERR,
           $             WORK, IWORK, INFO )
            IF( INFO.GT.0 ) PRINT *,'(Nearly) Singular Matrix'
      

      For the same A and b as above, , , and the actual error is .

    This example illustrates that the expert driver provides an error bound with less programming effort than the simple driver, and also that it may produce a significantly more accurate answer.

    Similar code fragments, with obvious adaptations, may be used with all the driver routines for linear equations listed in Table 2.2. For example, if a symmetric system is solved using the simple driver xSYSV, then xLANSY must be used to compute ANORM, and xSYCON must be used to compute RCOND.
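    To see what the first code fragment computes, here is a Python sketch of the same arithmetic on a tiny matrix. It is not a LAPACK binding. The hypothetical helper `exact_inverse` forms $A^{-1}$ exactly with rational Gauss-Jordan elimination, so RCOND here is the true reciprocal condition number $1/\kappa_\infty(A)$, whereas SGECON only estimates it. EPSMCH is taken to be $2^{-24}$, which we assume is what SLAMCH('E') returns for IEEE single precision with rounding.

```python
from fractions import Fraction

EPSMCH = 2.0 ** -24  # assumed SLAMCH('E') for IEEE single precision


def inf_norm(M):
    """Infinity-norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


def exact_inverse(A):
    """Invert A by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            raise ZeroDivisionError("matrix is exactly singular")
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]


def simple_driver_errbd(A):
    """Mirror the ERRBD computation of the SGESV fragment, with an exact RCOND."""
    kappa = Fraction(inf_norm(A)) * inf_norm(exact_inverse(A))
    rcond = max(float(1 / kappa), EPSMCH)  # keep RCOND >= EPSMCH, so ERRBD <= 1
    return rcond, EPSMCH / rcond


print(simple_driver_errbd([[2.0, 0.0], [0.0, 0.5]]))  # RCOND = 0.25, ERRBD = 4*EPSMCH
```

    For a nearly singular matrix the clamp takes over: RCOND becomes EPSMCH and ERRBD becomes exactly 1, signalling a complete loss of accuracy.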







    Problems that LAPACK can Solve

    LAPACK can solve systems of linear equations, linear least squares problems, eigenvalue problems and singular value problems. LAPACK can also handle many associated computations such as matrix factorizations or estimating condition numbers.

    LAPACK contains driver routines for solving standard types of problems, computational routines to perform a distinct computational task, and auxiliary routines to perform a certain subtask or common low-level computation. Each driver routine typically calls a sequence of computational routines. Taken as a whole, the computational routines can perform a wider range of tasks than are covered by the driver routines. Many of the auxiliary routines may be of use to numerical analysts or software developers, so we have documented the Fortran source for these routines with the same level of detail used for the LAPACK routines and driver routines.

    Dense and band matrices are provided for, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices. See Chapter 2 for a complete summary of the contents.





    Further Details: Error Bounds for Linear Equation Solving

     

    The conventional error analysis of linear equation solving goes as follows. Let Ax = b be the system to be solved. Let $\hat x$ be the solution computed by LAPACK (or LINPACK) using any of their linear equation solvers. Let r be the residual $r = b - A\hat x$. In the absence of rounding error r would be zero and $\hat x$ would equal x; with rounding error one can only say the following:

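    The normwise backward error of the computed solution $\hat x$, with respect to the infinity norm, is the pair E, f which minimizes

    $$\max\left( \frac{\|E\|_\infty}{\|A\|_\infty}, \frac{\|f\|_\infty}{\|b\|_\infty} \right)$$

    subject to the constraint $(A+E)\hat x = b+f$. The minimal value of this maximum is given by

    $$\omega_\infty = \frac{\|r\|_\infty}{\|A\|_\infty \|\hat x\|_\infty + \|b\|_\infty}.$$

    One can show that the computed solution $\hat x$ satisfies $\omega_\infty \le p(n)\,\varepsilon$, where p(n) is a modestly growing function of n. The corresponding condition number is $\kappa_\infty(A) \equiv \|A\|_\infty \|A^{-1}\|_\infty$. The error $x - \hat x$ is bounded by

    $$\frac{\|x - \hat x\|_\infty}{\|x\|_\infty} \lesssim 2\,\omega_\infty\,\kappa_\infty(A).$$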

    In the first code fragment in the last section, $2\omega_\infty$ is approximated by $\varepsilon$ = EPSMCH. Approximations of $\kappa_\infty(A)$ (or, strictly speaking, of its reciprocal RCOND) are returned by computational routines xyyCON (subsection 2.3.1) or driver routines xyySVX (subsection 2.2.1). The code fragment makes sure RCOND is at least EPSMCH to avoid overflow in computing ERRBD. This limits ERRBD to a maximum of 1, which is no loss of generality since a relative error of 1 or more indicates the same thing: a complete loss of accuracy. Note that the value of RCOND returned by xyySVX may apply to a linear system obtained from Ax = b by equilibration, i.e., scaling the rows and columns of A in order to make the condition number smaller. This is the case in the second code fragment in the last section, where SGESVX chose to scale the rows by the factors returned in R and scale the columns by the factors returned in C, resulting in the equilibrated matrix $\mathrm{diag}(R)\,A\,\mathrm{diag}(C)$.
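    The normwise backward error $\omega_\infty = \|r\|_\infty / (\|A\|_\infty \|\hat x\|_\infty + \|b\|_\infty)$ is cheap to evaluate once the residual $r = b - A\hat x$ is known. The Python sketch below is illustrative only, and its function name is hypothetical. It forms the residual exactly with rational arithmetic, so that rounding in r itself does not blur the result.

```python
from fractions import Fraction


def normwise_backward_error(A, x_hat, b):
    """omega_inf = ||r||_inf / (||A||_inf ||x_hat||_inf + ||b||_inf), r = b - A x_hat."""
    A = [[Fraction(v) for v in row] for row in A]
    x_hat = [Fraction(v) for v in x_hat]
    b = [Fraction(v) for v in b]
    r = [bi - sum(a * x for a, x in zip(row, x_hat)) for row, bi in zip(A, b)]
    a_norm = max(sum(abs(a) for a in row) for row in A)
    denom = a_norm * max(abs(x) for x in x_hat) + max(abs(v) for v in b)
    if denom == 0:
        return 0.0  # then b = 0 and A x_hat = 0, so r = 0 as well
    return float(max(abs(v) for v in r) / denom)


A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 4.0]  # the exact solution is x = (1, 1)
print(normwise_backward_error(A, [1.0, 1.0], b))    # 0.0
print(normwise_backward_error(A, [1.0, 1.001], b))  # roughly 3.7e-4
```

    Multiplying the result by 2 and by $\kappa_\infty(A)$ gives the normwise error bound.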

    As stated in section 4.3.2, this approach does not respect the presence of zero or tiny entries in A. In contrast, the LAPACK computational routines xyyRFS (subsection 2.3.1) or driver routines xyySVX (subsection 2.2.1) will (except in rare cases) compute a solution with the following properties:

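    The componentwise backward error of the computed solution $\hat x$ is the pair E, f which minimizes

    $$\max_{i,j,k} \left( \frac{|e_{ij}|}{|a_{ij}|}, \frac{|f_k|}{|b_k|} \right)$$

    (where we interpret 0 / 0 as 0) subject to the constraint $(A+E)\hat x = b+f$. The minimal value of this maximum is given by

    $$\omega_c = \max_i \frac{|r_i|}{(|A|\,|\hat x| + |b|)_i},$$

    where $|A|$ and $|\hat x|$ denote the matrix and vector of absolute values of the entries. One can show that for most problems the $\hat x$ computed by xyySVX satisfies $\omega_c \le p(n)\,\varepsilon$, where p(n) is a modestly growing function of n. In other words, $\hat x$ is the exact solution of the perturbed problem $(A+E)\hat x = b+f$ where E and f are small relative perturbations in each entry of A and b, respectively. The corresponding condition number is

    $$\kappa_c = \frac{\|\,|A^{-1}|\,(|A|\,|\hat x| + |b|)\,\|_\infty}{\|\hat x\|_\infty}.$$

    The error $x - \hat x$ is bounded by

    $$\frac{\|x - \hat x\|_\infty}{\|\hat x\|_\infty} \le \omega_c\,\kappa_c.$$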

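    The routines xyyRFS and xyySVX return $\omega_c$, which is called BERR (for Backward ERRor), and a bound on the actual error $\|x - \hat x\|_\infty / \|\hat x\|_\infty$, called FERR (for Forward ERRor), as in the second code fragment in the last section. FERR is actually calculated by the following formula (shown for a dense matrix), which can be smaller than the bound $\omega_c\,\kappa_c$ given above:

    $$\mathrm{FERR} = \frac{\|\,|A^{-1}|\,\left(|\hat r| + (n+1)\,\varepsilon\,(|A|\,|\hat x| + |b|)\right)\|_\infty}{\|\hat x\|_\infty}$$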

    Here, $\hat r$ is the computed value of the residual $b - A\hat x$, and the norm in the numerator is estimated using the same estimation subroutine used for RCOND.
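    The componentwise quantities can be sketched the same way. The Python code below is illustrative only, and its function names are hypothetical. It evaluates BERR $= \max_i |r_i| / (|A||\hat x| + |b|)_i$ and FERR $= \|\,|A^{-1}|(|r| + (n+1)\varepsilon(|A||\hat x| + |b|))\|_\infty / \|\hat x\|_\infty$ for a small dense system. Unlike LAPACK, which estimates the norm, it computes $|A^{-1}|$ exactly with rational arithmetic. It also assumes $\varepsilon = 2^{-53}$ (IEEE double).

```python
from fractions import Fraction
import sys

EPS = Fraction(sys.float_info.epsilon) / 2  # assumed unit roundoff, 2**-53


def exact_inverse(A):
    """Invert a nonsingular rational matrix A by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]


def berr_and_ferr(A, x_hat, b):
    """Return (BERR, FERR) for a computed solution x_hat (A nonsingular, x_hat != 0)."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    x_hat = [Fraction(v) for v in x_hat]
    b = [Fraction(v) for v in b]
    r = [bi - sum(a * x for a, x in zip(row, x_hat)) for row, bi in zip(A, b)]
    # scale_i = (|A| |x_hat| + |b|)_i, the denominators of omega_c
    scale = [sum(abs(a) * abs(x) for a, x in zip(row, x_hat)) + abs(bi)
             for row, bi in zip(A, b)]
    # 0/0 counts as 0; scale_i = 0 forces r_i = 0, so those rows can be skipped
    berr = max((abs(ri) / si for ri, si in zip(r, scale) if si != 0), default=Fraction(0))
    # numerator of FERR: || |inv(A)| (|r| + (n+1) eps scale) ||_inf
    w = [abs(ri) + (n + 1) * EPS * si for ri, si in zip(r, scale)]
    num = max(sum(abs(c) * wj for c, wj in zip(row, w)) for row in exact_inverse(A))
    return float(berr), float(num / max(abs(x) for x in x_hat))


print(berr_and_ferr([[2.0, 1.0], [1.0, 3.0]], [1.0, 1.0], [3.0, 4.0]))  # BERR is 0.0 here
```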

    The value of BERR for the example in the last section is .

    Even in the rare cases where xyyRFS fails to make BERR close to its minimum $\varepsilon$, the error bound FERR may remain small. See [6] for details.




    Error Bounds for Linear Least Squares Problems

     

    The linear least squares problem is to find x that minimizes $\|Ax - b\|_2$. We discuss error bounds for the most common case where A is m-by-n with m > n, and A has full rank; this is called an overdetermined least squares problem (the following code fragments deal with m = n as well).

    Let $\hat x$ be the solution computed by one of the driver routines xGELS, xGELSX or xGELSS (see section 2.2.2). An approximate error bound

    $$\frac{\|\hat x - x\|_2}{\|x\|_2} \lesssim \mathrm{ERRBD}$$

    may be computed in one of the following ways, depending on which type of driver routine is used:

    1. Suppose the simple driver SGELS is used:

            EPSMCH = SLAMCH( 'E' )
      *     Get the 2-norm of the right hand side B
            BNORM = SNRM2( M, B, 1 )
      *     Solve the least squares problem; the solution X
      *     overwrites B
            CALL SGELS( 'N', M, N, 1, A, LDA, B, LDB, WORK,
           $            LWORK, INFO )
            IF( MIN( M, N ).GT.0 ) THEN
      *        Get the 2-norm of the residual A*X-B
               RNORM = SNRM2( M-N, B( N+1 ), 1 )
      *        Get the reciprocal condition number RCOND of A
               CALL STRCON( 'I', 'U', 'N', N, A, LDA, RCOND,
           $                WORK, IWORK, INFO )
               RCOND = MAX( RCOND, EPSMCH )
               IF( BNORM.GT.0.0 ) THEN
      *           MIN guards against rounding pushing SINT above 1
                  SINT = MIN( RNORM / BNORM, 1.0E0 )
               ELSE
                  SINT = 0.0
               END IF
               COST = MAX( SQRT( (1.0E0 - SINT)*(1.0E0 + SINT) ),
           $               EPSMCH )
               TANT = SINT / COST
               ERRBD = EPSMCH*( 2.0E0/(RCOND*COST) +
           $                    TANT / RCOND**2 )
            END IF
      
       

      For example, if ,

      then, to 4 decimal places,

      , , , , and the true error is .

    2. Suppose the expert driver SGELSX is used. This routine has an input argument RCND, which is used to determine the rank of the input matrix (briefly, the matrix is considered not to have full rank if its condition number exceeds 1/RCND). The code fragment below only computes error bounds if the matrix has been determined to have full rank. When the matrix does not have full rank, computing and interpreting error bounds is more complicated, and the reader is referred to the next section.

            EPSMCH = SLAMCH( 'E' )
      *     Get the 2-norm of the right hand side B
            BNORM = SNRM2( M, B, 1 )
      *     JPVT is an input argument: zero marks every column as free
      *     to be pivoted
            DO 5 I = 1, N
               JPVT( I ) = 0
      5     CONTINUE
      *     Solve the least squares problem; the solution X
      *     overwrites B
            RCND = 0
            CALL SGELSX( M, N, 1, A, LDA, B, LDB, JPVT, RCND,
           $             RANK, WORK, INFO )
            IF( RANK.LT.N ) THEN
               PRINT *,'Matrix less than full rank'
            ELSE IF( MIN( M, N ).GT.0 ) THEN
      *        Get the 2-norm of the residual A*X-B
               RNORM = SNRM2( M-N, B( N+1 ), 1 )
      *        Get the reciprocal condition number RCOND of A
               CALL STRCON( 'I', 'U', 'N', N, A, LDA, RCOND,
           $                WORK, IWORK, INFO )
               RCOND = MAX( RCOND, EPSMCH )
               IF( BNORM.GT.0.0 ) THEN
      *           MIN guards against rounding pushing SINT above 1
                  SINT = MIN( RNORM / BNORM, 1.0E0 )
               ELSE
                  SINT = 0.0
               END IF
               COST = MAX( SQRT( (1.0E0 - SINT)*(1.0E0 + SINT) ),
           $               EPSMCH )
               TANT = SINT / COST
               ERRBD = EPSMCH*( 2.0E0/(RCOND*COST) +
           $                    TANT / RCOND**2 )
            END IF
      
      The numerical results of this code fragment on the above A and b are the same as for the first code fragment.

    3. Suppose the other type of expert driver SGELSS is used. This routine also has an input argument RCND, which is used to determine the rank of the matrix A. The same code fragment can be used to compute error bounds as for SGELSX, except that the call to SGELSX must be replaced by:

            CALL SGELSS( M, N, 1, A, LDA, B, LDB, S, RCND, RANK,
           $             WORK, LWORK, INFO )
      

      and the call to STRCON must be replaced by:

               RCOND = S( N ) / S( 1 )
       

      Applied to the same A and b as above, the computed is nearly the same, , , and the true error is .






    Further Details: Error Bounds for Linear Least Squares Problems

     

    The conventional error analysis of linear least squares problems goes as follows. As above, let $\hat x$ be the solution to minimizing $\|Ax - b\|_2$ computed by LAPACK using one of the least squares drivers xGELS, xGELSS or xGELSX (see subsection 2.2.2). We discuss the most common case, where A is overdetermined (i.e., has more rows than columns) and has full rank [45]:

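    The computed solution $\hat x$ has a small normwise backward error. In other words $\hat x$ minimizes $\|(A+E)x - (b+f)\|_2$, where E and f satisfy

    $$\max\left( \frac{\|E\|_2}{\|A\|_2}, \frac{\|f\|_2}{\|b\|_2} \right) \le p(n)\,\varepsilon$$

    and p(n) is a modestly growing function of n. We take p(n) = 1 in the code fragments above. Let $\kappa_2(A) = \sigma_{\max}(A) / \sigma_{\min}(A)$ (approximated by 1/RCOND in the above code fragments), $\rho = \|A\hat x - b\|_2$ (= RNORM above), and $\sin(\theta) = \rho / \|b\|_2$ (SINT = RNORM / BNORM above). Here, $\theta$ is the acute angle between the vectors $A\hat x$ and b. Then when $p(n)\varepsilon$ is small, the error $\hat x - x$ is bounded by

    $$\frac{\|x - \hat x\|_2}{\|x\|_2} \lesssim p(n)\,\varepsilon \left( \frac{2\,\kappa_2(A)}{\cos(\theta)} + \tan(\theta)\,\kappa_2^2(A) \right),$$

    where $\cos(\theta)$ = COST and $\tan(\theta)$ = TANT in the code fragments above.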

    We avoid overflow by making sure RCOND and COST are both at least EPSMCH, and by handling the case of a zero B matrix separately (BNORM = 0).
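    The bound can be transcribed directly. This Python function is a sketch, and its name and argument list are hypothetical. It reproduces the arithmetic at the end of the least squares code fragments, including the overflow safeguards just described. It also clamps SINT to at most 1, an assumption added here because rounding in RNORM and BNORM could otherwise push SINT slightly above 1 and make the square root fail.

```python
import math


def ls_errbd(rcond, rnorm, bnorm, epsmch):
    """Approximate bound on ||x_hat - x||_2 / ||x||_2 for full-rank least squares."""
    rcond = max(rcond, epsmch)  # 1/kappa_2(A), kept >= EPSMCH
    # sin(theta) = RNORM / BNORM; a zero right hand side is handled separately
    sint = min(rnorm / bnorm, 1.0) if bnorm > 0.0 else 0.0
    # cos(theta) via (1 - s)(1 + s), which avoids the cancellation in 1 - s*s near s = 1
    cost = max(math.sqrt((1.0 - sint) * (1.0 + sint)), epsmch)
    tant = sint / cost
    return epsmch * (2.0 / (rcond * cost) + tant / rcond ** 2)


print(ls_errbd(1e-3, 1e-2, 1.0, 2.0 ** -24))
```

    When the residual is zero, the $\tan(\theta)$ term vanishes and the bound reduces to $2\varepsilon\,\kappa_2(A)$, the same form as for a square linear system.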

    $\kappa_2(A)$ may be computed directly from the singular values of A returned by xGELSS (as in the code fragment) or by xGESVD. It may also be approximated by using xTRCON following calls to xGELS or xGELSX. xTRCON estimates $\kappa_\infty$ or $\kappa_1$ instead of $\kappa_2$, but these can differ from $\kappa_2$ by at most a factor of n.

    If A is rank-deficient, xGELSS and xGELSX can be used to regularize the problem by treating all singular values less than a user-specified threshold (determined by RCND) as exactly zero. The number of singular values treated as nonzero is returned in RANK. See [45] for error bounds in this case, and [45] [19] for the underdetermined case.

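    The solution of the overdetermined, full-rank problem may also be characterized as the solution of the linear system of equations

    $$\begin{pmatrix} I & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} r \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix},$$

    where $r = b - Ax$ is the residual.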

    By solving this linear system using xyyRFS or xyySVX (see section 4.4), componentwise error bounds can also be obtained [7].
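    As a small check of this characterization, the sketch below builds the augmented system $[I\ A;\ A^T\ 0][r;\ x] = [b;\ 0]$ for a 3-by-2 problem and solves it. It is illustrative Python with hypothetical helper names, and it uses exact rational elimination rather than xyySVX. It recovers both the least squares solution x and the residual $r = b - Ax$, which is orthogonal to the columns of A.

```python
from fractions import Fraction


def solve(M, rhs):
    """Gaussian elimination with row exchanges in exact arithmetic (M nonsingular)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(c)] for row, c in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            if f != 0:
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def lstsq_augmented(A, b):
    """Solve [[I, A], [A^T, 0]] [r; x] = [b; 0] and return (r, x)."""
    m, n = len(A), len(A[0])
    K = [[int(i == j) for j in range(m)] + list(A[i]) for i in range(m)]
    K += [[A[i][j] for i in range(m)] + [0] * n for j in range(n)]
    sol = solve(K, list(b) + [0] * n)
    return sol[:m], sol[m:]


A = [[1, 0], [0, 1], [1, 1]]
b = [1, 2, 4]
r, x = lstsq_augmented(A, b)
print(x)  # [Fraction(4, 3), Fraction(7, 3)]
print(r)  # [Fraction(-1, 3), Fraction(-1, 3), Fraction(1, 3)]
```

    The augmented matrix is symmetric but indefinite, which is why the sketch searches for a nonzero pivot instead of assuming the diagonal entries are usable.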




    Error Bounds for Generalized Least Squares Problems

     

    There are two kinds of generalized least squares problems that are discussed in section 2.2.3: the linear equality-constrained least squares problem, and the general linear model problem. Error bounds for these problems will be included in a future version of this manual.  





    Error Bounds for the Symmetric Eigenproblem

     

    The eigendecomposition of an n-by-n real symmetric matrix is the factorization $A = Z \Lambda Z^T$ ($A = Z \Lambda Z^H$ in the complex Hermitian case), where Z is orthogonal (unitary) and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is real and diagonal, with $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. The $\lambda_i$ are the eigenvalues of A and the columns $z_i$ of Z are the eigenvectors. This is also often written $A z_i = \lambda_i z_i$. The eigendecomposition of a symmetric matrix is computed by the driver routines xSYEV, xSYEVX, xSYEVD, xSBEV, xSBEVX, xSBEVD, xSPEV, xSPEVX, xSPEVD, xSTEV, xSTEVX and xSTEVD. The complex counterparts of these routines, which compute the eigendecomposition of complex Hermitian matrices, are the driver routines xHEEV, xHEEVX, xHEEVD, xHBEV, xHBEVX, xHBEVD, xHPEV, xHPEVX, and xHPEVD (see subsection 2.2.4).

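    The approximate error bounds for the computed eigenvalues $\hat\lambda_1 \le \cdots \le \hat\lambda_n$ are

    $$|\hat\lambda_i - \lambda_i| \lesssim \mathrm{EERRBD}.$$

    The approximate error bounds for the computed eigenvectors $\hat z_i$, which bound the acute angles between the computed eigenvectors and true eigenvectors $z_i$, are:

    $$\theta(\hat z_i, z_i) \lesssim \mathrm{ZERRBD}(i).$$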

    These bounds can be computed by the following code fragment:

          EPSMCH = SLAMCH( 'E' )
    *     Compute eigenvalues and eigenvectors of A
    *     The eigenvalues are returned in W
    *     The eigenvector matrix Z overwrites A
          CALL SSYEV( 'V', UPLO, N, A, LDA, W, WORK, LWORK, INFO )
          IF( INFO.GT.0 ) THEN
             PRINT *,'SSYEV did not converge'
          ELSE IF ( N.GT.0 ) THEN
    *        Compute the norm of A
             ANORM = MAX( ABS( W(1) ), ABS( W(N) ) )
             EERRBD = EPSMCH * ANORM
    *        Compute reciprocal condition numbers for eigenvectors
             CALL SDISNA( 'Eigenvectors', N, N, W, RCONDZ, INFO )
             DO 10 I = 1, N
                ZERRBD( I ) = EPSMCH * ( ANORM / RCONDZ( I ) )
    10       CONTINUE
          ENDIF

    For example, if and

    then the eigenvalues, approximate error bounds, and true errors are







    Further Details: Error Bounds for the Symmetric Eigenproblem

     

    The usual error analysis of the symmetric eigenproblem (using any LAPACK routine in subsection 2.2.4 or any EISPACK routine) is as follows [64]:

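    The computed eigendecomposition $\hat Z \hat\Lambda \hat Z^T$ is nearly the exact eigendecomposition of A + E, i.e., $A + E = (\hat Z + \delta\hat Z)\,\hat\Lambda\,(\hat Z + \delta\hat Z)^T$ is a true eigendecomposition so that $\hat Z + \delta\hat Z$ is orthogonal, where $\|E\|_2 / \|A\|_2 \le p(n)\,\varepsilon$ and $\|\delta\hat Z\|_2 \le p(n)\,\varepsilon$. Here p(n) is a modestly growing function of n. We take p(n) = 1 in the above code fragment. Each computed eigenvalue $\hat\lambda_i$ differs from a true $\lambda_i$ by at most

    $$|\hat\lambda_i - \lambda_i| \le p(n)\,\varepsilon\,\|A\|_2 = \mathrm{EERRBD}.$$

    Thus large eigenvalues (those near $\max_i |\lambda_i| = \|A\|_2$) are computed to high relative accuracy and small ones may not be.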

    The angular difference between the computed unit eigenvector $\hat z_i$ and a true unit eigenvector $z_i$ satisfies the approximate bound

    $$\theta(\hat z_i, z_i) \lesssim \frac{p(n)\,\varepsilon\,\|A\|_2}{\mathrm{gap}_i} = \mathrm{ZERRBD}(i)$$

    if $p(n)\varepsilon$ is small enough. Here $\mathrm{gap}_i = \min_{j \ne i} |\lambda_i - \lambda_j|$ is the absolute gap between $\lambda_i$ and the nearest other eigenvalue. Thus, if $\lambda_i$ is close to other eigenvalues, its corresponding eigenvector $z_i$ may be inaccurate. The gaps may be easily computed from the array of computed eigenvalues using subroutine SDISNA. The gaps computed by SDISNA are ensured not to be so small as to cause overflow when used as divisors.
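    The quantities used in the code fragment of the previous section can be mirrored in a few lines of Python. This sketch uses hypothetical function names and is not SDISNA itself. It computes the absolute gaps from an ascending list of eigenvalues, then EERRBD and ZERRBD with p(n) = 1. Unlike SDISNA, it applies no safeguard against tiny gaps; a zero gap simply yields an infinite bound. $\varepsilon$ is assumed to be $2^{-53}$.

```python
import sys

EPSMCH = sys.float_info.epsilon / 2  # assumed unit roundoff (2**-53)


def absolute_gaps(w):
    """gap_i = min_{j != i} |w_i - w_j|; for ascending w only the neighbours matter."""
    n = len(w)
    gaps = []
    for i in range(n):
        left = w[i] - w[i - 1] if i > 0 else float("inf")
        right = w[i + 1] - w[i] if i < n - 1 else float("inf")
        gaps.append(min(left, right))
    return gaps


def symmetric_error_bounds(w):
    """Return (EERRBD, [ZERRBD(i)]) for the ascending eigenvalues w of a symmetric matrix."""
    anorm = max(abs(w[0]), abs(w[-1]))  # ||A||_2 = max |lambda_i|
    eerrbd = EPSMCH * anorm
    zerrbd = [EPSMCH * anorm / g if g > 0 else float("inf") for g in absolute_gaps(w)]
    return eerrbd, zerrbd


print(symmetric_error_bounds([1.0, 2.0, 4.0]))
```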

    Let $\hat S$ be the invariant subspace spanned by a collection of eigenvectors $\{\hat z_i,\ i \in I\}$, where I is a subset of the integers from 1 to n. Let S be the corresponding true subspace. Then

    $$\theta(\hat S, S) \lesssim \frac{p(n)\,\varepsilon\,\|A\|_2}{\mathrm{gap}_I}$$

      where

    $$\mathrm{gap}_I = \min\{\, |\lambda_i - \lambda_j| : i \in I,\ j \notin I \,\}$$

    is the absolute gap between the eigenvalues in I and the nearest other eigenvalue. Thus, a cluster of close eigenvalues which is far away from any other eigenvalue may have a well determined invariant subspace $\hat S$ even if its individual eigenvectors are ill-conditioned.
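    Here is a small numerical illustration of the last remark, as a Python sketch with a hypothetical helper. Two eigenvalues that nearly coincide have tiny individual gaps. Yet the gap separating the pair from the rest of the spectrum, $\mathrm{gap}_I$, is large, so the bound for the two-dimensional invariant subspace is far smaller than either eigenvector bound.

```python
def cluster_gap(w, selected):
    """gap_I = min |w_i - w_j| over i in I (the selected indices) and j not in I."""
    inside = [w[i] for i in selected]
    outside = [w[j] for j in range(len(w)) if j not in selected]
    if not outside:
        return float("inf")  # I holds every eigenvalue: nothing to be separated from
    return min(abs(a - c) for a in inside for c in outside)


w = [1.0, 1.0 + 1e-6, 5.0]
print(cluster_gap(w, {0}))     # about 1e-6: the first eigenvector alone is ill-conditioned
print(cluster_gap(w, {0, 1}))  # about 4: the span of the first two is well determined
```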

    In the special case of a real symmetric tridiagonal matrix T, the eigenvalues and eigenvectors can be computed much more accurately. xSYEV (and the other symmetric eigenproblem drivers) computes the eigenvalues and eigenvectors of a dense symmetric matrix by first reducing it to tridiagonal form T, and then finding the eigenvalues and eigenvectors of T. Reduction of a dense matrix to tridiagonal form T can introduce additional errors, so the following bounds for the tridiagonal case do not apply to the dense case.

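    The eigenvalues of T may be computed with small componentwise relative backward error ($O(\varepsilon)$) by using subroutine xSTEBZ (subsection 2.3.4) or driver xSTEVX (subsection 2.2.4). If T is also positive definite, they may also be computed at least as accurately by xPTEQR (subsection 2.3.4). To compute error bounds for the computed eigenvalues $\hat\lambda_i$ we must make some assumptions about T. The bounds discussed here are from [13]. Suppose T is positive definite, and write T = DHD where $D = \mathrm{diag}(t_{11}^{1/2}, \ldots, t_{nn}^{1/2})$ and $h_{ii} = 1$. Then the computed eigenvalues $\hat\lambda_i$ can differ from true eigenvalues $\lambda_i$ by

    $$|\hat\lambda_i - \lambda_i| \le p(n)\,\varepsilon\,\kappa_2(H)\,\lambda_i$$

    where p(n) is a modestly growing function of n. Thus if $\kappa_2(H)$ is moderate, each eigenvalue will be computed to high relative accuracy, no matter how tiny it is. The eigenvectors $\hat z_i$ computed by xPTEQR can differ from true eigenvectors $z_i$ by at most about

    $$\theta(\hat z_i, z_i) \lesssim \frac{p(n)\,\varepsilon\,\kappa_2(H)}{\mathrm{relgap}_i}$$

    if $p(n)\varepsilon$ is small enough, where $\mathrm{relgap}_i = \min_{j \ne i} |\lambda_i - \lambda_j| / (\lambda_i + \lambda_j)$ is the relative gap between $\lambda_i$ and the nearest other eigenvalue. Since the relative gap may be much larger than the absolute gap, this error bound may be much smaller than the previous one.

    $\kappa_2(H)$ could be estimated by applying xPTCON (subsection 2.3.1) to H; xPTCON returns the reciprocal of the 1-norm condition number, which for symmetric H bounds $\kappa_2(H)$ from above. The relative gaps are easily computed from the array of computed eigenvalues.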
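    The relative gaps $\mathrm{relgap}_i = \min_{j \ne i} |\lambda_i - \lambda_j| / (\lambda_i + \lambda_j)$ are as easy to form as the absolute ones. The Python sketch below uses a hypothetical function name and assumes all eigenvalues are positive, as they are for positive definite T. It compares the two kinds of gap for a spectrum containing two tiny eigenvalues.

```python
def gaps(w):
    """Return (absolute gaps, relative gaps) for a list of positive eigenvalues w."""
    absolute, relative = [], []
    for i, wi in enumerate(w):
        others = [wj for j, wj in enumerate(w) if j != i]
        absolute.append(min(abs(wi - wj) for wj in others))
        relative.append(min(abs(wi - wj) / (wi + wj) for wj in others))
    return absolute, relative


absolute, relative = gaps([1e-8, 2e-8, 1.0])
print(absolute)  # first two entries are about 1e-8
print(relative)  # first two entries are about 1/3
```

    The two tiny eigenvalues are close in absolute terms but well separated relative to their size, so the relative-gap bound for their eigenvectors stays moderate.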

    Jacobi's method [69] [76] [24] is another algorithm for finding eigenvalues and eigenvectors of symmetric matrices. It is slower than the algorithms based on first tridiagonalizing the matrix, but is capable of computing more accurate answers in several important cases. Routines implementing Jacobi's method and corresponding error bounds will be available in a future LAPACK release.




    Error Bounds for the Nonsymmetric Eigenproblem

     

    The nonsymmetric eigenvalue problem   is more complicated than the symmetric eigenvalue problem. In this subsection, we state the simplest bounds and leave the more complicated ones to subsequent subsections.

    Let A be an n-by-n nonsymmetric matrix, with eigenvalues $\lambda_1, \ldots, \lambda_n$. Let $v_i$ be a right eigenvector corresponding to $\lambda_i$: $A v_i = \lambda_i v_i$. Let $\hat\lambda_i$ and $\hat v_i$ be the corresponding computed eigenvalues and eigenvectors, computed by expert driver routine xGEEVX (see subsection 2.2.4).

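    The approximate error bounds for the computed eigenvalues are

    $$|\hat\lambda_i - \lambda_i| \lesssim \mathrm{EERRBD}(i).$$

    The approximate error bounds for the computed eigenvectors $\hat v_i$, which bound the acute angles between the computed eigenvectors and true eigenvectors $v_i$, are

    $$\theta(\hat v_i, v_i) \lesssim \mathrm{VERRBD}(i).$$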

    These bounds can be computed by the following code fragment:

          EPSMCH = SLAMCH( 'E' )
    *     Compute the eigenvalues and eigenvectors of A
    *     WR contains the real parts of the eigenvalues
    *     WI contains the imaginary parts of the eigenvalues
    *     VL contains the left eigenvectors
    *     VR contains the right eigenvectors
          CALL SGEEVX( 'P', 'V', 'V', 'B', N, A, LDA, WR, WI,
         $             VL, LDVL, VR, LDVR, ILO, IHI, SCALE, ABNRM,
         $             RCONDE, RCONDV, WORK, LWORK, IWORK, INFO )
          IF( INFO.GT.0 ) THEN
             PRINT *,'SGEEVX did not converge'
          ELSE IF ( N.GT.0 ) THEN
             DO 10 I = 1, N
                EERRBD(I) = EPSMCH*ABNRM/RCONDE(I)
                VERRBD(I) = EPSMCH*ABNRM/RCONDV(I)
    10       CONTINUE
          ENDIF

    For example, if and

    then true eigenvalues, approximate eigenvalues, approximate error bounds, and true errors are







    Further Details: Error Bounds for the Nonsymmetric Eigenproblem







    Overview

     

    In this subsection, we will summarize all the available error bounds. Later subsections will provide further details. The reader may also refer to [11].

    Bounds for individual eigenvalues and eigenvectors are provided by driver xGEEVX (subsection 2.2.4) or computational routine xTRSNA (subsection 2.3.5). Bounds for clusters of eigenvalues and their associated invariant subspace are provided by driver xGEESX (subsection 2.2.4) or computational routine xTRSEN (subsection 2.3.5).

    We let $\hat\lambda_i$ be the i-th computed eigenvalue and $\lambda_i$ an i-th true eigenvalue. Let $\hat v_i$ be the corresponding computed right eigenvector, and $v_i$ a true right eigenvector (so $A v_i = \lambda_i v_i$). If I is a subset of the integers from 1 to n, we let $\lambda_I$ denote the average of the selected eigenvalues: $\lambda_I = \left(\sum_{i \in I} \lambda_i\right) / \left(\sum_{i \in I} 1\right)$, and similarly for $\hat\lambda_I$. We also let $S_I$ denote the subspace spanned by $\{v_i,\ i \in I\}$; it is called a right invariant subspace because if v is any vector in $S_I$ then Av is also in $S_I$. $\hat S_I$ is the corresponding computed subspace.

    The algorithms for the nonsymmetric eigenproblem are normwise backward stable: they compute the exact eigenvalues, eigenvectors and invariant subspaces of slightly perturbed matrices A + E, where $\|E\| \le p(n)\,\varepsilon\,\|A\|$. Some of the bounds are stated in terms of $\|E\|_2$ and others in terms of $\|E\|_F$; one may use $p(n)\,\varepsilon\,\|A\|_1$ to approximate either quantity. The code fragment in the previous subsection approximates $\|E\|$ by $\varepsilon \cdot \mathrm{ABNRM}$, where ABNRM $= \|A\|_1$ is returned by xGEEVX.

    xGEEVX (or xTRSNA) returns two quantities for each $\hat\lambda_i, \hat v_i$ pair: $s_i$ and $\mathrm{sep}_i$. xGEESX (or xTRSEN) returns two quantities for a selected subset I of eigenvalues: $s_I$ and $\mathrm{sep}_I$. $s_i$ (or $s_I$) is a reciprocal condition number for the computed eigenvalue $\hat\lambda_i$ (or $\hat\lambda_I$), and is referred to as RCONDE by xGEEVX (or xGEESX). $\mathrm{sep}_i$ (or $\mathrm{sep}_I$) is a reciprocal condition number for the right eigenvector $\hat v_i$ (or $\hat S_I$), and is referred to as RCONDV by xGEEVX (or xGEESX). The approximate error bounds for eigenvalues, averages of eigenvalues, eigenvectors, and invariant subspaces provided in Table 4.5 are true for sufficiently small $\|E\|$, which is why they are called asymptotic.

       
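    Table 4.5: Asymptotic error bounds for the nonsymmetric eigenproblem

      Simple eigenvalue      $|\hat\lambda_i - \lambda_i| \lesssim \|E\|_2 / s_i$
      Eigenvalue cluster     $|\hat\lambda_I - \lambda_I| \lesssim \|E\|_2 / s_I$
      Eigenvector            $\theta(\hat v_i, v_i) \lesssim \|E\|_F / \mathrm{sep}_i$
      Invariant subspace     $\theta(\hat S_I, S_I) \lesssim \|E\|_F / \mathrm{sep}_I$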

           

    If the problem is ill-conditioned, the asymptotic bounds may only hold for extremely small ||E||. Therefore, in Table 4.6 we also provide global bounds which are guaranteed to hold for all .

       
    Table 4.6: Global error bounds for the nonsymmetric eigenproblem assuming

    We also have the following bound, which is true for all E: all the $\lambda_i$ lie in the union of n disks, where the i-th disk is centered at $\hat\lambda_i$ and has radius $n\|E\|_2 / s_i$. If k of these disks overlap, so that any two points inside the k disks can be connected by a continuous curve lying entirely inside the k disks, and if no larger set of k + 1 disks has this property, then exactly k of the $\lambda_i$ lie inside the union of these k disks. Figure 4.1 illustrates this for a 10-by-10 matrix, with 4 such overlapping unions of disks, two containing 1 eigenvalue each, one containing 2 eigenvalues, and one containing 6 eigenvalues.

       
    Figure 4.1: Bounding eigenvalues inside overlapping disks
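    Deciding which disks form a connected union is a small graph problem. Two closed disks intersect exactly when the distance between their centers is at most the sum of their radii, and a union of disks is connected exactly when their intersection graph is. The Python sketch below uses hypothetical function names. It groups disks with a union-find structure, and each group of k disks must contain exactly k true eigenvalues. It assumes the radii $n\|E\|_2 / s_i$ are formed from a known or estimated $\|E\|_2$.

```python
def disk_groups(centers, radii):
    """Partition disks (complex centers, real radii) into connected unions."""
    n = len(centers)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(centers[i] - centers[j]) <= radii[i] + radii[j]:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def eigenvalue_radii(n, norm_e, s):
    """Radius n * ||E||_2 / s_i of the disk around each computed eigenvalue."""
    return [n * norm_e / si for si in s]


centers = [0.0, 0.5, 10.0, 20 + 1j]  # computed eigenvalues (complex allowed)
radii = [0.3, 0.3, 1.0, 1.0]
for g in disk_groups(centers, radii):
    print(g, "->", len(g), "true eigenvalue(s)")
```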

    Finally, the quantities s and sep tell us how we can best (block) diagonalize a matrix A by a similarity, $V^{-1} A V = \mathrm{diag}(A_{11}, \ldots, A_{bb})$, where each diagonal block $A_{ii}$ has a selected subset of the eigenvalues of A. Such a decomposition may be useful in computing functions of matrices, for example. The goal is to choose a V with a nearly minimum condition number $\kappa_2(V)$ which performs this decomposition, since this generally minimizes the error in the decomposition. This may be done as follows. Let $A_{ii}$ be $n_i$-by-$n_i$. Then columns $1 + \sum_{j=1}^{i-1} n_j$ through $\sum_{j=1}^{i} n_j$ of V span the invariant subspace of A corresponding to the eigenvalues of $A_{ii}$; these columns should be chosen to be any orthonormal basis of this space (as computed by xGEESX, for example). Let $s_{(i)}$ be the value of s corresponding to the cluster of eigenvalues of $A_{ii}$, as computed by xGEESX or xTRSEN. Then $\kappa_2(V) \le b / \min_i s_{(i)}$, and no other choice of V can make its condition number smaller than $1 / \min_i s_{(i)}$ [17]. Thus choosing orthonormal subblocks of V gets $\kappa_2(V)$ to within a factor b of its minimum value.

    In the case of a real symmetric (or complex Hermitian) matrix, s = 1 and sep is the absolute gap, as defined in subsection 4.7. The bounds in Table 4.5 then reduce to the bounds in subsection 4.7.




    Balancing and Conditioning

     

    There are two preprocessing steps one may perform on a matrix A in order to make its eigenproblem easier. The first is permutation, or reordering the rows and columns to make A more nearly upper triangular (closer to Schur form): $A' = P A P^T$, where P is a permutation matrix. If A is permutable to upper triangular form (or close to it), then no floating-point operations (or very few) are needed to reduce it to Schur form. The second is scaling by a diagonal matrix D to make the rows and columns of $A'$ more nearly equal in norm: $A'' = D A' D^{-1}$. Scaling can make the matrix norm smaller with respect to the eigenvalues, and so possibly reduce the inaccuracy contributed by roundoff [wilkinson3, Chap. II/11]. We refer to these two operations as balancing.

    Balancing is performed by driver xGEEVX, which calls computational routine xGEBAL. The user may tell xGEEVX to optionally permute, scale, do both, or do neither; this is specified by input parameter BALANC. Permuting has no effect on the condition numbers or their interpretation as described in previous subsections. Scaling, however, does change their interpretation, as we now describe.

    The output parameters of xGEEVX (SCALE, a real array of length N; ILO and IHI, integers; and ABNRM, real) describe the result of balancing a matrix A into $A''$, where N is the dimension of A. The matrix $A''$ is block upper triangular, with at most three blocks: from 1 to ILO - 1, from ILO to IHI, and from IHI + 1 to N. The first and last blocks are upper triangular, and so already in Schur form. These are not scaled; only the block from ILO to IHI is scaled. Details of the scaling and permutation are described in SCALE (see the specification of xGEEVX or xGEBAL for details). The one-norm of $A''$ is returned in ABNRM.

    The condition numbers described in earlier subsections are computed for the balanced matrix $A''$, and so some interpretation is needed to apply them to the eigenvalues and eigenvectors of the original matrix A. To use the bounds for eigenvalues in Tables 4.5 and 4.6, we must replace $\|E\|_2$ and $\|E\|_F$ by $p(n)\,\varepsilon\,\|A''\|_1 = p(n)\,\varepsilon \cdot \mathrm{ABNRM}$. To use the bounds for eigenvectors, we also need to take into account that bounds on rotations of eigenvectors are for the eigenvectors $x''$ of $A''$, which are related to the eigenvectors x of A by $D P x = x''$, or $x = P^T D^{-1} x''$. One coarse but simple way to do this is as follows: let $\theta''$ be the bound on rotations of $x''$ from Table 4.5 or Table 4.6 and let $\theta$ be the desired bound on rotation of x. Let

    $$\kappa(D) = \frac{\max_i d_{ii}}{\min_i d_{ii}}$$

    be the condition number of D. Then

    $$\sin\theta \le \kappa(D)\,\sin\theta''.$$

    The numerical example in subsection 4.8 does no scaling, just permutation.
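    The coarse bound $\sin\theta \le \kappa(D)\,\sin\theta''$, with $\kappa(D) = \max_i d_{ii} / \min_i d_{ii}$, is easy to apply to the output of xGEEVX. The Python sketch below uses a hypothetical function name. It rebuilds the diagonal of D from SCALE, ILO and IHI, relying on the xGEBAL convention that SCALE(ILO:IHI) holds the scaling factors while the other entries record permutations, so the corresponding entries of D are 1. Whether one calls the stored diagonal matrix D or $D^{-1}$ does not matter here, since $\kappa(D) = \kappa(D^{-1})$.

```python
import math


def kappa_d(scale, ilo, ihi):
    """Condition number max d_ii / min d_ii of the balancing matrix D.

    ILO and IHI are 1-based, as returned by xGEEVX. SCALE(ILO:IHI) are scaling
    factors; the other entries of SCALE are permutation indices, and the
    corresponding diagonal entries of D equal 1.
    """
    d = [1.0] * len(scale)
    d[ilo - 1:ihi] = scale[ilo - 1:ihi]
    return max(d) / min(d)


def unbalanced_rotation_bound(theta_balanced, scale, ilo, ihi):
    """Bound theta on the rotation of x from a bound theta'' on the rotation of x''."""
    s = kappa_d(scale, ilo, ihi) * math.sin(theta_balanced)
    return math.asin(s) if s < 1.0 else math.pi / 2  # s >= 1 gives no information


print(unbalanced_rotation_bound(1e-6, [1.0, 2.0, 0.5], 1, 3))
```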




    Computers for which LAPACK is Suitable

    LAPACK is designed to give high efficiency on vector processors, high-performance ``super-scalar'' workstations, and shared memory multiprocessors. LAPACK in its present form is less likely to give good performance on other types of parallel architectures (for example, massively parallel SIMD machines, or distributed memory machines), but work has begun to try to adapt LAPACK to these new architectures. LAPACK can also be used satisfactorily on all types of scalar machines (PC's, workstations, mainframes). See Chapter 3 for some examples of the performance achieved by LAPACK routines.




    Computing s and sep

     

    To explain s and sep, we need to introduce the spectral projector P [56] [72], and the separation of two matrices A and B, sep(A, B) [75] [72].

    We may assume the matrix A is in Schur form, because reducing it to this form does not change the values of s and sep. Consider a cluster of $m \ge 1$ eigenvalues, counting multiplicities. Further assume the n-by-n matrix A is

    $$A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix} \tag{4.1}$$

    where the eigenvalues of the m-by-m matrix $A_{11}$ are exactly those in which we are interested. In practice, if the eigenvalues on the diagonal of A are in the wrong order, routine xTREXC can be used to put the desired ones in the upper left corner as shown.

    We define the spectral projector, or simply projector P belonging to the eigenvalues of $A_{11}$ as

    $$P = \begin{pmatrix} I_m & R \\ 0 & 0 \end{pmatrix} \tag{4.2}$$

    where R satisfies the system of linear equations

    $$A_{11}R - RA_{22} = A_{12}. \tag{4.3}$$

    Equation (4.3) is called a Sylvester equation. Given the Schur form (4.1), we solve equation (4.3) for R using the subroutine xTRSYL.
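    When $A_{11}$ and $A_{22}$ are upper triangular, the Sylvester equation $A_{11}R - RA_{22} = A_{12}$ of (4.3) can be solved one column of R at a time by back substitution. The Python sketch below is a simplified stand-in for xTRSYL, not its actual algorithm: it assumes 1-by-1 diagonal entries only (no 2-by-2 blocks from a real quasi-triangular Schur form), does no scaling against overflow, and uses hypothetical helper names. It then assembles $P = \begin{pmatrix} I_m & R \\ 0 & 0 \end{pmatrix}$ and checks that $P^2 = P$.

```python
def solve_triangular_sylvester(a11, a22, a12):
    """Solve A11 R - R A22 = A12 for R by column-wise back substitution.

    Sketch assumptions: A11 (m-by-m) and A22 (k-by-k, k = n - m) are upper
    triangular lists of rows with no diagonal entry in common. Unlike
    xTRSYL, there are no 2-by-2 blocks and no scaling against overflow.
    """
    m, k = len(a11), len(a22)
    r = [[0.0] * k for _ in range(m)]
    for j in range(k):
        # Columns 0..j-1 of R are known; move their part of R A22 right.
        rhs = [a12[i][j] + sum(r[i][l] * a22[l][j] for l in range(j))
               for i in range(m)]
        shift = a22[j][j]
        # Solve (A11 - shift * I) r_j = rhs from the bottom row upwards.
        for i in reversed(range(m)):
            acc = rhs[i] - sum(a11[i][p] * r[p][j] for p in range(i + 1, m))
            r[i][j] = acc / (a11[i][i] - shift)
    return r


def spectral_projector(r):
    """Assemble P = [[I_m, R], [0, 0]] from the m-by-k solution R."""
    m, k = len(r), len(r[0])
    top = [[1.0 if c == i else 0.0 for c in range(m)] + list(r[i])
           for i in range(m)]
    return top + [[0.0] * (m + k) for _ in range(k)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


a11 = [[1.0, 2.0], [0.0, 3.0]]
a22 = [[5.0]]
a12 = [[1.0], [1.0]]
r = solve_triangular_sylvester(a11, a22, a12)
print(r)  # [[-0.5], [-0.5]]
p = spectral_projector(r)
print(matmul(p, p) == p)  # True: P is a projector
```

    Each column solve needs the diagonal entries of $A_{11}$ and $A_{22}$ to differ, which is the scalar form of the requirement that $A_{11}$ and $A_{22}$ have no common eigenvalue.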

    We can now define s for the eigenvalues of $A_{11}$:

    $$s = \frac{1}{\|P\|_2} = \frac{1}{(1 + \|R\|_2^2)^{1/2}}.$$

    In practice we do not use this expression since $\|R\|_2$ is hard to compute. Instead we use the more easily computed underestimate

    $$\frac{1}{(1 + \|R\|_F^2)^{1/2}},$$

    which can underestimate the true value of s by no more than a factor $\min(m, n-m)^{1/2}$. This underestimation makes our error bounds more conservative. This approximation of s is called RCONDE in xGEEVX and xGEESX.
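    The sketch below evaluates the Frobenius-norm estimate $1/(1 + \|R\|_F^2)^{1/2}$ of s for a given R and compares it with the exact $s = 1/(1 + \|R\|_2^2)^{1/2}$ in two small cases where $\|R\|_2$ is known by hand. The function names are illustrative only and do not correspond to LAPACK routines.

```python
import math


def rconde_estimate(r):
    """Underestimate 1 / (1 + ||R||_F^2)^(1/2) of s = 1 / ||P||_2."""
    fro_sq = sum(x * x for row in r for x in row)
    return 1.0 / math.sqrt(1.0 + fro_sq)


def worst_underestimation(m, n):
    """The estimate is never smaller than s by more than min(m, n-m)^(1/2)."""
    return math.sqrt(min(m, n - m))


# m = 1: R is a single row, so ||R||_F = ||R||_2 and the estimate is exact.
print(rconde_estimate([[3.0, 4.0]]))  # 1 / sqrt(26)
# m = 2, n = 4, R = I_2: ||R||_2 = 1 gives s = 1 / sqrt(2), but
# ||R||_F^2 = 2 gives the smaller estimate 1 / sqrt(3).
est = rconde_estimate([[1.0, 0.0], [0.0, 1.0]])
s_true = 1.0 / math.sqrt(2.0)
print(s_true / est, "<=", worst_underestimation(2, 4))
```

    The estimate is exact whenever R has rank one, since then $\|R\|_F = \|R\|_2$; the loss grows only when R has several comparable singular values.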

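    The separation $\mathrm{sep}(A_{11}, A_{22})$ of the matrices $A_{11}$ and $A_{22}$ is defined as the smallest singular value of the linear map in (4.3) which takes X to $A_{11}X - XA_{22}$, i.e.,

    $$\mathrm{sep}(A_{11}, A_{22}) = \min_{X \ne 0} \frac{\|A_{11}X - XA_{22}\|_F}{\|X\|_F}.$$

    This formulation lets us estimate $\mathrm{sep}(A_{11}, A_{22})$ using the condition estimator xLACON [52] [51] [48], which estimates the norm $\|T\|_1$ of a linear operator T given the ability to compute $Tx$ and $T^Tx$ quickly for arbitrary x. Here T is the inverse of the map above, so that $\mathrm{sep}(A_{11}, A_{22}) = 1/\|T\|_2$. In our case, multiplying an arbitrary vector by T means solving the Sylvester equation (4.3) with an arbitrary right hand side using xTRSYL, and multiplying by $T^T$ means solving the same equation with $A_{11}$ replaced by $A_{11}^T$ and $A_{22}$ replaced by $A_{22}^T$. Solving either equation costs at most $O(n^3)$ operations, or as few as $O(n^2)$ if $m \ll n$. Since the true value of sep is $1/\|T\|_2$ but we use $1/\|T\|_1$, our estimate of sep may differ from the true value by as much as $(m(n-m))^{1/2}$. This approximation to sep is called RCONDV by xGEEVX and xGEESX.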
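    To show how a norm can be estimated from matrix-vector products alone, here is a Python sketch of Hager's basic iteration for $\|T\|_1$, the method that xLACON builds on (xLACON adds further safeguards, so this is not a transcription of it). The callbacks `apply_t` and `apply_tt` are hypothetical stand-ins for the two Sylvester solves described above; an explicit 2-by-2 matrix is used here so the answer can be checked by hand.

```python
def hager_norm1_estimate(apply_t, apply_tt, n, max_iter=5):
    """Estimate ||T||_1 from products T x and T^T x only (Hager's method).

    `apply_t(x)` and `apply_tt(x)` are hypothetical callbacks returning
    T x and T^T x; in the setting of the text each would be a Sylvester
    solve. The result ||T x||_1 with ||x||_1 = 1 never exceeds ||T||_1.
    """
    x = [1.0 / n] * n
    est = 0.0
    for _ in range(max_iter):
        y = apply_t(x)
        est = sum(abs(v) for v in y)
        signs = [1.0 if v >= 0.0 else -1.0 for v in y]
        z = apply_tt(signs)
        j = max(range(n), key=lambda i: abs(z[i]))
        # Stop when no unit vector e_j promises a larger ||T e_j||_1.
        if abs(z[j]) <= sum(zi * xi for zi, xi in zip(z, x)):
            break
        x = [0.0] * n
        x[j] = 1.0
    return est


b = [[1.0, 2.0], [3.0, 4.0]]


def matvec(x):
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in b]


def rmatvec(x):
    return [sum(b[i][j] * x[i] for i in range(len(b))) for j in range(len(b[0]))]


print(hager_norm1_estimate(matvec, rmatvec, 2))  # 6.0, the largest column sum
```

    With T the inverse Sylvester map, the corresponding sep estimate would be the reciprocal of the returned value.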

    Another formulation which in principle permits an exact evaluation of $\mathrm{sep}(A_{11}, A_{22})$ is

    $$\mathrm{sep}(A_{11}, A_{22}) = \sigma_{\min}\left(I_{n-m} \otimes A_{11} - A_{22}^T \otimes I_m\right),$$

    where $X \otimes Y$ is the Kronecker product of X and Y. This method is generally impractical, however, because the matrix whose smallest singular value we need is m(n - m) dimensional, which can be as large as $n^2/4$. Thus we would require as much as $O(n^4)$ extra workspace and $O(n^6)$ operations, much more than the estimation method of the last paragraph.
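    The identity behind the Kronecker formulation, $(I_{n-m} \otimes A_{11} - A_{22}^T \otimes I_m)\,\mathrm{vec}(X) = \mathrm{vec}(A_{11}X - XA_{22})$ with vec stacking the columns of X, can be checked numerically on a small case. The sketch below builds the Kronecker matrix explicitly in plain Python; the helper names are hypothetical, and this explicit construction is exactly what becomes too expensive for large n.

```python
def kron(x, y):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[xa * yb for xa in xrow for yb in yrow]
            for xrow in x for yrow in y]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def vec(x):
    """Stack the columns of x (column-major order, as in Fortran)."""
    return [x[i][j] for j in range(len(x[0])) for i in range(len(x))]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def sylvester_kron_matrix(a11, a22):
    """K = I_{n-m} (x) A11 - A22^T (x) I_m, acting on vec(X)."""
    m, k = len(a11), len(a22)
    left = kron(identity(k), a11)
    right = kron(transpose(a22), identity(m))
    return [[p - q for p, q in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]


a11 = [[1.0, 2.0], [0.0, 3.0]]
a22 = [[5.0, 1.0], [0.0, 6.0]]
x = [[1.0, 2.0], [3.0, 4.0]]
k_mat = sylvester_kron_matrix(a11, a22)
lhs = [sum(kij * v for kij, v in zip(row, vec(x))) for row in k_mat]
ax, xa = matmul(a11, x), matmul(x, a22)
rhs = vec([[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(ax, xa)])
print(lhs == rhs)  # True
# K is m(n-m)-by-m(n-m): with n = 100 and m = 50 it has 2500**2 entries.
```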

    The expression $\mathrm{sep}(A_{11}, A_{22})$ measures the ``separation'' of the spectra of $A_{11}$ and $A_{22}$ in the following sense. It is zero if and only if $A_{11}$ and $A_{22}$ have a common eigenvalue, and small if there is a small perturbation of either one that makes them have a common eigenvalue. If $A_{11}$ and $A_{22}$ are both Hermitian matrices, then $\mathrm{sep}(A_{11}, A_{22})$ is just the gap, or minimum distance between an eigenvalue of $A_{11}$ and an eigenvalue of $A_{22}$. On the other hand, if $A_{11}$ and $A_{22}$ are non-Hermitian, $\mathrm{sep}(A_{11}, A_{22})$ may be much smaller than this gap.
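    A minimal case shows how far sep can fall below the eigenvalue gap. Take $A_{11} = (a)$ to be 1-by-1 and $A_{22}$ to be 2-by-2 upper triangular; then the Kronecker matrix reduces to $K = aI - A_{22}^T$, and the smallest singular value of a 2-by-2 matrix has the closed form $\sigma_{\min} = 2d/(\sqrt{t + 2d} + \sqrt{t - 2d})$ with $t = \|K\|_F^2$ and $d = |\det K|$. The sketch below, with hypothetical function names, compares sep with the gap for a diagonal (Hermitian) $A_{22}$ and for a strongly non-normal one.

```python
import math


def sep_scalar_2x2(a, a22):
    """sep(A11, A22) for A11 = (a) and a 2-by-2 A22, via K = a*I - A22^T."""
    k = [[a - a22[0][0], -a22[1][0]],
         [-a22[0][1], a - a22[1][1]]]
    t = sum(v * v for row in k for v in row)        # ||K||_F^2
    d = abs(k[0][0] * k[1][1] - k[0][1] * k[1][0])  # |det K|
    if d == 0.0:
        return 0.0  # K is singular: A11 and A22 share an eigenvalue
    # Cancellation-free form of the smaller singular value of K.
    return 2.0 * d / (math.sqrt(t + 2.0 * d) + math.sqrt(max(t - 2.0 * d, 0.0)))


def eigenvalue_gap(a, a22):
    """Distance from a to the eigenvalues of upper triangular A22."""
    return min(abs(a - a22[0][0]), abs(a - a22[1][1]))


hermitian = [[1.0, 0.0], [0.0, 3.0]]
nonnormal = [[1.0, 100.0], [0.0, 1.0]]
for a22 in (hermitian, nonnormal):
    print(eigenvalue_gap(0.0, a22), sep_scalar_2x2(0.0, a22))
```

    Both cases have gap 1, but the off-diagonal entry 100 pushes sep down to roughly 0.01, so error bounds that divide by sep are correspondingly about a hundred times weaker.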




    Error Bounds for the Singular Value Decomposition




     

    The singular value decomposition (SVD) of a real m-by-n matrix A is defined as follows. Let r = min(m, n). Then the SVD of A is $A = U\Sigma V^T$ ($A = U\Sigma V^H$ in the complex case), where U and V are orthogonal (unitary) matrices and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ is diagonal, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge 0$. The $\sigma_i$ are the singular values of A and the leading r columns $u_i$ of U and $v_i$ of V the left and right singular vectors, respectively. The SVD of a general matrix is computed by xGESVD (see subsection 2.2.4).

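    The approximate error bounds for the computed singular values $\hat\sigma_1 \ge \cdots \ge \hat\sigma_r$ are

    $$|\hat\sigma_i - \sigma_i| \lesssim \mathrm{SERRBD}.$$

    The approximate error bounds for the computed singular vectors $\hat v_i$ and $\hat u_i$, which bound the acute angles between the computed singular vectors and true singular vectors $v_i$ and $u_i$, are

    $$\theta(\hat v_i, v_i) \lesssim \mathrm{VERRBD}(i), \qquad \theta(\hat u_i, u_i) \lesssim \mathrm{UERRBD}(i).$$

    These bounds can be computed by the following code fragment.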

          EPSMCH = SLAMCH( 'E' )
    *     Compute singular value decomposition of A
    *     The singular values are returned in S
    *     The left singular vectors are returned in U
    *     The transposed right singular vectors are returned in VT
          CALL SGESVD( 'S', 'S', M, N, A, LDA, S, U, LDU, VT, LDVT,
         $             WORK, LWORK, INFO )
          IF( INFO.GT.0 ) THEN
             PRINT *,'SGESVD did not converge'
          ELSE IF ( MIN(M,N) .GT. 0 ) THEN
             SERRBD  = EPSMCH * S(1)
    *        Compute reciprocal condition numbers for singular
    *        vectors
             CALL SDISNA( 'Left', M, N, S, RCONDU, INFO )
             CALL SDISNA( 'Right', M, N, S, RCONDV, INFO )
             DO 10 I = 1, MIN(M,N)
                VERRBD( I ) = EPSMCH*( S(1)/RCONDV( I ) )
                UERRBD( I ) = EPSMCH*( S(1)/RCONDU( I ) )
    10       CONTINUE
          END IF

    For example, if and

    then the singular values, approximate error bounds, and true errors are given below.







    Further Details: Error Bounds for the Singular Value Decomposition




     

    The usual error analysis of the SVD algorithm xGESVD in LAPACK (see subsection 2.2.4) or the routines in LINPACK and EISPACK is as follows [45]:

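    The SVD algorithm is backward stable. This means that the computed SVD, $\hat U\hat\Sigma\hat V^T$, is nearly the exact SVD of A + E where $\|E\|_2/\|A\|_2 \le p(m, n)\epsilon$, and p(m, n) is a modestly growing function of m and n. This means $A + E = (\hat U + \delta\hat U)\hat\Sigma(\hat V + \delta\hat V)^T$ is the true SVD, so that $\hat U + \delta\hat U$ and $\hat V + \delta\hat V$ are both orthogonal, where $\|\delta\hat U\| \le p(m, n)\epsilon$ and $\|\delta\hat V\| \le p(m, n)\epsilon$. Each computed singular value $\hat\sigma_i$ differs from the true $\sigma_i$ by at most

    $$|\hat\sigma_i - \sigma_i| \le p(m, n) \cdot \epsilon \cdot \sigma_1 = \mathrm{SERRBD}$$

    (we take p(m, n) = 1 in the code fragment). Thus large singular values (those near $\sigma_1$) are computed to high relative accuracy and small ones may not be.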
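    The practical meaning of this bound is easiest to see numerically: dividing SERRBD by each $\sigma_i$ gives a relative error bound that grows as $\sigma_i$ shrinks. The sketch below assumes $\epsilon = 2^{-24}$ (IEEE single precision unit roundoff) and takes p(m, n) = 1 as the code fragment does; the function name and the singular values are made up for illustration.

```python
def singular_value_error_bounds(sigma, eps):
    """SERRBD = eps * sigma_1 (p(m, n) = 1) and the relative bounds
    SERRBD / sigma_i it implies for each computed singular value."""
    serrbd = eps * sigma[0]
    relative = [serrbd / s if s > 0.0 else float("inf") for s in sigma]
    return serrbd, relative


eps = 2.0 ** -24  # assumed: unit roundoff of IEEE single precision
sigma = [1.0, 1.0e-3, 1.0e-7]
serrbd, rel = singular_value_error_bounds(sigma, eps)
for s, bound in zip(sigma, rel):
    print(f"sigma = {s:.0e}: relative error bound {bound:.1e}")
```

    The largest singular value is known to nearly full precision, while the bound for the smallest one is of order one, so none of its digits can be trusted.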

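    The angular difference between the computed left singular vector $\hat u_i$ and a true $u_i$ satisfies the approximate bound

    $$\theta(\hat u_i, u_i) \lesssim \frac{p(m, n)\,\epsilon\,\|A\|_2}{\mathrm{gap}_i} = \mathrm{UERRBD}(i),$$

    where $\mathrm{gap}_i = \min_{j \ne i} |\sigma_i - \sigma_j|$ is the absolute gap between $\sigma_i$ and the nearest other singular value. We take p(m, n) = 1 in the code fragment. Thus, if $\sigma_i$ is close to other singular values, its corresponding singular vector $u_i$ may be inaccurate. When $m \ne n$, the extra dimension contributes singular vectors belonging to the singular value zero, so $\mathrm{gap}_i$ must be redefined as $\min(\min_{j \ne i} |\sigma_i - \sigma_j|, \sigma_i)$: for the left singular vectors when m > n, and for the right singular vectors when n > m. The gaps may be easily computed from the array of computed singular values using function SDISNA. The gaps computed by SDISNA are ensured not to be so small as to cause overflow when used as divisors. The same bound applies to the computed right singular vector $\hat v_i$ and a true vector $v_i$.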
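    The gap computation can be sketched in Python as follows. This is written in the spirit of SDISNA, not a transcription of it: it applies the $\sigma_i$ term for left vectors when m > n and for right vectors when n > m, and it floors every gap at $\epsilon\sigma_1$, a floor chosen here as an assumption standing in for SDISNA's own safeguard against overflow. The helper names, $\epsilon = 2^{-24}$ and the sample singular values are illustrative.

```python
def singular_vector_gaps(sigma, m, n, left=True, eps=2.0 ** -24):
    """Absolute gaps gap_i for singular vectors, in the spirit of SDISNA.

    `sigma` must be sorted in decreasing order, as xGESVD returns it.
    Assumptions of this sketch: sigma_i also limits gap_i for left vectors
    when m > n and for right vectors when n > m, and every gap is floored
    at eps * sigma_1 so that it can safely be used as a divisor.
    """
    k = len(sigma)
    gaps = []
    for i in range(k):
        others = [abs(sigma[i] - sigma[j]) for j in range(k) if j != i]
        gaps.append(min(others, default=float("inf")))
    if (left and m > n) or (not left and n > m):
        gaps = [min(g, s) for g, s in zip(gaps, sigma)]
    floor = eps * sigma[0] if k and sigma[0] > 0.0 else eps
    return [max(g, floor) for g in gaps]


def singular_vector_error_bounds(sigma, gaps, eps=2.0 ** -24):
    """UERRBD(i) or VERRBD(i) = eps * sigma_1 / gap_i, with p(m, n) = 1."""
    return [eps * sigma[0] / g for g in gaps]


sigma = [3.0, 2.0, 1.999, 0.5]
gaps = singular_vector_gaps(sigma, m=6, n=4, left=True)
bounds = singular_vector_error_bounds(sigma, gaps)
print(gaps)
print(bounds)
```

    The two singular values 2.0 and 1.999 are only 0.001 apart, so their vector bounds come out about a thousand times larger than the bound for the well separated $\sigma_1$.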

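    Let $\hat S$ be the space spanned by a collection of computed left singular vectors $\{\hat u_i, i \in \mathcal{I}\}$, where $\mathcal{I}$ is a subset of the integers from 1 to n. Let S be the corresponding true space. Then

    $$\theta(\hat S, S) \lesssim \frac{p(m, n)\,\epsilon\,\|A\|_2}{\mathrm{gap}_{\mathcal{I}}},$$

    where

    $$\mathrm{gap}_{\mathcal{I}} = \min_{i \in \mathcal{I},\ j \notin \mathcal{I}} |\sigma_i - \sigma_j|$$

    is the absolute gap between the singular values in $\mathcal{I}$ and the nearest other singular value. Thus, a cluster of close singular values which is far away from any other singular value may have a well determined space $\hat S$ even if its individual singular vectors are ill-conditioned. The same bound applies to a set of right singular vectors.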
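    The benefit of treating a cluster as a whole can be checked with made-up singular values: two of them are only 0.001 apart, yet both lie at distance at least 1 from the rest. The sketch below, with hypothetical helper names and the assumptions $\epsilon = 2^{-24}$ and p(m, n) = 1, compares the bound for one of the two singular vectors with the bound for the space they span together.

```python
def individual_gap(sigma, i):
    """gap_i = min over j != i of |sigma_i - sigma_j| (0-based i)."""
    return min((abs(sigma[i] - s) for j, s in enumerate(sigma) if j != i),
               default=float("inf"))


def cluster_gap(sigma, cluster):
    """gap_I = min |sigma_i - sigma_j| over i in I and j not in I.

    `cluster` holds 0-based indices, whereas the text numbers from 1.
    Returns infinity when I contains every index.
    """
    inside = set(cluster)
    return min((abs(sigma[i] - sigma[j]) for i in inside
                for j in range(len(sigma)) if j not in inside),
               default=float("inf"))


eps = 2.0 ** -24  # assumed unit roundoff, with p(m, n) = 1
sigma = [3.0, 2.0, 1.999, 0.5]
single = eps * sigma[0] / individual_gap(sigma, 1)
space = eps * sigma[0] / cluster_gap(sigma, [1, 2])
print(f"vector bound {single:.1e}, subspace bound {space:.1e}")
```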

    In the special case of bidiagonal matrices, the singular values and singular vectors may be computed much more accu