Poor Performance

LAPACK relies on an efficient implementation of the BLAS. We have tried to make the performance of LAPACK "transportable" by performing most of the computation within the Level 1, 2, and 3 BLAS, and by isolating all of the machine-dependent tuning parameters in a single integer function ILAENV.

To avoid poor performance from LAPACK routines, note the following recommendations:

Use machine-specific optimized BLAS if they are available. Many manufacturers and research institutions have developed, or are developing, efficient versions of the BLAS for particular machines, and it is the BLAS that enable LAPACK routines to achieve high performance with transportable software. Users are urged to determine whether such an implementation exists for their platform and, if it does, to use it for best performance. If no machine-specific implementation of the BLAS exists for a particular platform, one should consider installing a publicly available set of BLAS that requires only an efficient implementation of the matrix-matrix multiply routine xGEMM. Examples of such implementations are [21,72]. A machine-specific and efficient implementation of xGEMM can be generated automatically by publicly available software such as [102] and [15]. A reference implementation of the Fortran 77 BLAS is also available from the blas directory on netlib. These routines are not expected to perform as well as a specially tuned implementation on most high-performance computers, and on some machines they may perform much worse, but they allow users to run LAPACK software on machines that offer no other implementation of the BLAS.

For best performance, the LAPACK routine ILAENV should be set with optimal tuning parameters for the machine being used. The version of ILAENV provided with LAPACK supplies default values for these parameters that give good, but not optimal, average case performance on a range of existing machines. The performance of xHSEQR is particularly sensitive to the choice of block parameters; the same applies to the driver routines that call xHSEQR, namely xGEES, xGEESX, xGEEV and xGEEVX. Further details on setting parameters in ILAENV are found in Section 6.
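
To see which block size the installed ILAENV currently supplies for a given routine, one can call it directly. The following is a minimal sketch: the program name QRYNB, the choice of DGEQRF, and the problem dimensions are illustrative assumptions, not part of LAPACK's recommendations. ISPEC = 1 asks ILAENV for the optimal block size.

```fortran
      PROGRAM QRYNB
*     Print the block size that ILAENV selects for DGEQRF.
*     M and N are illustrative problem dimensions.
      INTEGER            M, N, NB
      PARAMETER          ( M = 1000, N = 1000 )
      INTEGER            ILAENV
      EXTERNAL           ILAENV
*     ISPEC = 1 requests the optimal block size. The last two
*     dimension arguments are not used for DGEQRF and are passed
*     as -1.
      NB = ILAENV( 1, 'DGEQRF', ' ', M, N, -1, -1 )
      WRITE( *, * ) 'DGEQRF block size from ILAENV: ', NB
      END
```

Running this before and after editing ILAENV is one simple way to confirm that a changed tuning parameter is actually the one being picked up at link time.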

LWORK $\geq$ WORK(1):
The performance of some routines depends on the amount of workspace supplied. In such cases, an argument, usually called WORK, is provided, accompanied by an integer argument LWORK specifying its length as a linear array. On exit, WORK(1) returns the amount of workspace required to use the optimal tuning parameters. If LWORK < WORK(1), then insufficient workspace was provided to use the optimal parameters, and performance may fall short of what the routine can achieve. One should check that LWORK $\geq$ WORK(1) on return from an LAPACK routine requiring user-supplied workspace to see whether enough workspace has been provided. Note that the computation is performed correctly even if the amount of workspace is less than optimal, unless LWORK is reported as an invalid value by a call to XERBLA as described in Section 7.3.
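
The check described above can be sketched as follows, using DGEQRF as an example of a routine with a WORK/LWORK pair. The program name CHKWRK, the matrix dimensions, and the matrix entries are assumptions made for illustration. LWORK is deliberately set to the minimum that DGEQRF accepts (N), so that the check has something to report.

```fortran
      PROGRAM CHKWRK
*     After a call to DGEQRF, compare LWORK with WORK(1) to see
*     whether the workspace allowed the optimal block size.
*     All sizes are illustrative only.
      INTEGER            M, N, LDA, LWORK
      PARAMETER          ( M = 500, N = 300, LDA = M, LWORK = N )
      INTEGER            I, J, INFO
      DOUBLE PRECISION   A( LDA, N ), TAU( N ), WORK( LWORK )
      EXTERNAL           DGEQRF
*     Fill A with arbitrary data.
      DO 20 J = 1, N
         DO 10 I = 1, M
            A( I, J ) = 1.0D0 / DBLE( I + J - 1 )
   10    CONTINUE
   20 CONTINUE
      CALL DGEQRF( M, N, A, LDA, TAU, WORK, LWORK, INFO )
      IF( INFO.NE.0 ) THEN
         WRITE( *, * ) 'DGEQRF failed, INFO = ', INFO
      ELSE IF( LWORK.LT.INT( WORK( 1 ) ) ) THEN
*        The result is still correct, but a smaller block size
*        or unblocked code was used.
         WRITE( *, * ) 'Workspace below optimal: LWORK = ', LWORK,
     $      ', WORK(1) = ', INT( WORK( 1 ) )
      ELSE
         WRITE( *, * ) 'Workspace is sufficient for optimal blocking'
      END IF
      END
```

If the first message appears, re-dimensioning WORK to at least the reported WORK(1) and setting LWORK accordingly should let the routine use its optimal parameters.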

Users should beware of the high cost of the first call to the LAPACK auxiliary routine xLAMCH, which computes machine characteristics such as epsilon and the smallest invertible number (the smallest positive number whose reciprocal does not overflow). The first call dynamically determines a set of parameters defining the machine's arithmetic; these values are then saved, so subsequent calls incur only a trivial cost. For performance testing, the initial cost can be hidden by including a call to xLAMCH in the main program, before any calls to LAPACK routines that will be timed. A sample use of SLAMCH is
      XXXXXX = SLAMCH( 'P' )
or in double precision:
      XXXXXX = DLAMCH( 'P' )
A cleaner but less portable solution is for the installer to save the values computed by xLAMCH for a specific machine and create a new version of xLAMCH with these constants set in DATA statements, taking care that no accuracy is lost in the translation.
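
As a rough illustration of such a hard-wired version, the sketch below replaces DLAMCH with constants held in DATA statements. It rests on a stated assumption: the target machine uses IEEE 754 double precision with round-to-nearest. The constants are the standard values for that format, written with 17 significant digits so that a correctly rounding compiler recovers each binary value exactly. They should be verified against the output of the original DLAMCH on the machine in question before this version is installed.

```fortran
      DOUBLE PRECISION FUNCTION DLAMCH( CMACH )
*     Sketch of a hard-wired DLAMCH. ASSUMPTION: IEEE 754 double
*     precision with round-to-nearest. Check every constant
*     against the original DLAMCH on the target machine.
      CHARACTER          CMACH
      LOGICAL            LSAME
      EXTERNAL           LSAME
      DOUBLE PRECISION   EPS, SFMIN, BASE, PREC, T, RND, EMIN,
     $                   RMIN, EMAX, RMAX
      DATA               EPS   / 1.1102230246251565D-16 /
      DATA               SFMIN / 2.2250738585072014D-308 /
      DATA               BASE  / 2.0D0 /
      DATA               PREC  / 2.2204460492503131D-16 /
      DATA               T     / 53.0D0 /
      DATA               RND   / 1.0D0 /
      DATA               EMIN  / -1021.0D0 /
      DATA               RMIN  / 2.2250738585072014D-308 /
      DATA               EMAX  / 1024.0D0 /
      DATA               RMAX  / 1.7976931348623157D+308 /
*     Return zero for an unrecognized option.
      DLAMCH = 0.0D0
      IF( LSAME( CMACH, 'E' ) ) THEN
         DLAMCH = EPS
      ELSE IF( LSAME( CMACH, 'S' ) ) THEN
         DLAMCH = SFMIN
      ELSE IF( LSAME( CMACH, 'B' ) ) THEN
         DLAMCH = BASE
      ELSE IF( LSAME( CMACH, 'P' ) ) THEN
         DLAMCH = PREC
      ELSE IF( LSAME( CMACH, 'N' ) ) THEN
         DLAMCH = T
      ELSE IF( LSAME( CMACH, 'R' ) ) THEN
         DLAMCH = RND
      ELSE IF( LSAME( CMACH, 'M' ) ) THEN
         DLAMCH = EMIN
      ELSE IF( LSAME( CMACH, 'U' ) ) THEN
         DLAMCH = RMIN
      ELSE IF( LSAME( CMACH, 'L' ) ) THEN
         DLAMCH = EMAX
      ELSE IF( LSAME( CMACH, 'O' ) ) THEN
         DLAMCH = RMAX
      END IF
      RETURN
      END
```

Because no arithmetic probing is done, every call is cheap, including the first. The price is that the routine is only valid on machines whose arithmetic matches the assumed constants.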

Susan Blackford