The BLAS are a set of Basic Linear Algebra Subprograms that perform vector-vector (Level 1), matrix-vector (Level 2), and matrix-matrix (Level 3) operations. LAPACK is designed around the Level 1, 2, and 3 BLAS, and nearly all of the parallelism in the LAPACK routines is contained in the BLAS. Therefore, the key to getting good performance from LAPACK lies in having an efficient version of the BLAS optimized for your particular machine. Optimized BLAS libraries are available on a variety of architectures; refer to the BLAS FAQ on netlib for further information:

http://www.netlib.org/blas/faq.html

There are also freely available BLAS generators that automatically tune a subset of the BLAS for a given architecture, e.g., ATLAS:

http://www.netlib.org/atlas/

And, if all else fails, there is the Fortran 77 reference implementation of the Level 1, 2, and 3 BLAS available on netlib (also included in the LAPACK distribution tar file):

http://www.netlib.org/blas/blas.tgz

No matter which BLAS library is used, the BLAS test programs should always be run.

Users should not expect too much from the Fortran 77 reference implementation BLAS; these versions were written to define the basic operations and do not employ the standard tricks for optimizing Fortran code.
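
To make the three levels concrete, here is a minimal sketch of one representative operation per level, written as plain loops in the spirit of a definition-first reference implementation. It is in Python rather than Fortran 77 and is not the netlib reference code. The function names borrow from the BLAS routines AXPY, GEMV, and GEMM, but the signatures are simplified and hypothetical: there are no strides, transpose options, or leading dimensions, and matrices are assumed to be lists of rows.

```python
# Illustrative only: plain-loop versions of one operation per BLAS level.
# Signatures are simplified assumptions, not the real BLAS interfaces.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(alpha, a, x, beta, y):
    """Level 2 (matrix-vector): return alpha*A*x + beta*y."""
    return [
        alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
        for row, yi in zip(a, y)
    ]


def gemm(alpha, a, b, beta, c):
    """Level 3 (matrix-matrix): return alpha*A*B + beta*C."""
    n_inner = len(b)      # shared dimension of A (columns) and B (rows)
    n_cols = len(b[0])    # number of columns in B and C
    return [
        [
            alpha * sum(a[i][k] * b[k][j] for k in range(n_inner))
            + beta * c[i][j]
            for j in range(n_cols)
        ]
        for i in range(len(a))
    ]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    y = [0.5, 0.5]
    print(axpy(2.0, x, y))           # Level 1
    print(gemv(1.0, a, x, 1.0, y))   # Level 2
    print(gemm(1.0, a, a, 0.0, a))   # Level 3
```

Unlike the actual BLAS routines, which overwrite the output vector or matrix in place, these functions return new lists. Straightforward loops like these define what each operation computes but make no attempt to exploit the memory hierarchy, which is why a tuned BLAS can be much faster than a definition-style implementation.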

The formal definitions of the Level 1, 2, and 3 BLAS are in [9], [7], and [5]. The BLAS Quick Reference card is available on netlib.