This article has presented some performance figures for ScaLAPACK routines. The figures are provided for illustration only and should not be regarded as a definitive, up-to-date statement of performance. They were selected from performance figures obtained in 1995-1996 during the development of version 1.4 of ScaLAPACK. All reported timings were obtained using the optimized version of the BLAS available on each machine; for the IBM computers, the ESSL BLAS were used. The PVM and MPI versions of the BLACS were used for timings on clusters of workstations, the version of the BLACS written on top of MPL was used for timings on the IBM SP-2, and the version written on top of NX was used for timings on the Intel Paragon.
Performance is affected by many factors that may change over time, such as details of the hardware (cycle time, cache size), communication latency and bandwidth, the compiler, and the BLAS. For up-to-date performance figures, use the timing programs provided with ScaLAPACK.
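Latency and bandwidth affect communication cost in different ways, and a simple cost model makes the difference concrete. The sketch below assumes the common latency-bandwidth ("alpha-beta", or Hockney) model, in which sending one message of n bytes costs a fixed startup latency plus n divided by the bandwidth. This model is not taken from ScaLAPACK, and the function names and parameter values are hypothetical, chosen only for illustration. It shows why latency dominates the cost of short messages while bandwidth dominates the cost of long ones.

```python
# Hypothetical latency-bandwidth ("alpha-beta") model of point-to-point
# message cost. Parameter values below are illustrative, not measured.


def message_time(n_bytes: int, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Estimated time for one message: startup latency plus transfer time."""
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    return latency_s + n_bytes / bandwidth_bytes_per_s


def half_performance_length(latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Message size at which transfer time equals latency.

    At this size the effective bandwidth is half the asymptotic bandwidth.
    """
    return latency_s * bandwidth_bytes_per_s


if __name__ == "__main__":
    latency = 50e-6    # hypothetical: 50 microseconds per message
    bandwidth = 40e6   # hypothetical: 40 MB/s
    for n in (8, 8_000, 800_000):
        t = message_time(n, latency, bandwidth)
        print(f"{n:>8} bytes: {t * 1e6:10.1f} us")
    print(f"half-performance length: {half_performance_length(latency, bandwidth):.0f} bytes")
```

With these assumed values, an 8-byte message costs almost exactly the latency, while an 800,000-byte message is dominated by transfer time; messages shorter than the half-performance length are latency-bound.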
ScaLAPACK is portable across a wide range of distributed-memory environments, such as the IBM SP series, the Intel series (Gamma, Delta, Paragon), the Cray T3 series, the Thinking Machines CM-5, clusters of workstations, and any system for which PVM or MPI is available. As with the BLAS and LAPACK, many of the goals of the ScaLAPACK project, particularly portability, are aided by developing and promoting standards, especially for low-level communication and computation routines. We have succeeded in attaining these goals by limiting machine dependencies to two standard libraries: the BLAS (Basic Linear Algebra Subprograms) and the BLACS (Basic Linear Algebra Communication Subprograms). ScaLAPACK will run on any machine where both the BLAS and the BLACS are available.
All ScaLAPACK-related software is publicly available on netlib via the URL