ScaLAPACK Frequently Asked Questions (FAQ)

scalapack@cs.utk.edu   |   ScaLAPACK User Forum   |   Subscribe to the ScaLAPACK announcement list

Many thanks to netlib_maintainers@netlib.org, on whose FAQ list this list for ScaLAPACK is patterned.

Table of Contents

ScaLAPACK
1.1) What is ScaLAPACK?
1.2) How do I reference ScaLAPACK in a scientific publication?
1.3) Are there vendor-specific versions of ScaLAPACK?
1.4) What is the difference between the vendor and Netlib version of ScaLAPACK and which should I use?
1.5) Are there legal restrictions on the use of ScaLAPACK software?
1.6) What is two-dimensional block cyclic data distribution?
1.7) Where can I find more information about ScaLAPACK?
1.8) What and where are the PBLAS?
1.9) Are example programs available?
1.10) How do I run an example program?
1.11) How do I install ScaLAPACK?
1.12) How do I install ScaLAPACK using MPICH-G and Globus?
1.13) How do I achieve high performance using ScaLAPACK?
1.14) Are prebuilt ScaLAPACK libraries available?
1.15) How do I find a particular routine?
1.16) I can't get a program to work. What should I do?
1.17) How can I unpack scalapack.tgz?
1.18) What technical support for ScaLAPACK is available?
1.19) How do I submit a bug report?
1.20) How do I gather a distributed vector back to one processor?

BLACS
2.1) What and where are the BLACS?
2.2) Is there a Quick Reference Guide to the BLACS available?
2.3) How do I install the BLACS?
2.4) Are prebuilt BLACS libraries available?
2.5) Are example BLACS programs available?

BLAS
3.1) What and where are the BLAS?
3.2) Are there legal restrictions on the use of BLAS reference implementation software?
3.3) Publications/references for the BLAS?
3.4) Is there a Quick Reference Guide to the BLAS available?
3.5) Are optimized BLAS libraries available? Where can I find vendor supplied BLAS?
3.6) Where can I find Java BLAS?
3.7) Is there a C interface to the BLAS?
3.8) Are prebuilt reference implementations of the Fortran77 BLAS available?
3.9) What about shared memory machines? Are there multithreaded versions of the BLAS available?


1) ScaLAPACK

1.1) What is ScaLAPACK?

The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines redesigned for distributed memory MIMD parallel computers. It is currently written in a Single-Program-Multiple-Data style using explicit message passing for interprocessor communication. It assumes matrices are laid out in a two-dimensional block cyclic decomposition.

Like LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms in order to minimize the frequency of data movement between different levels of the memory hierarchy. (For such machines, the memory hierarchy includes the off-processor memory of other processors, in addition to the hierarchy of registers, cache, and local memory on each processor.) The fundamental building blocks of the ScaLAPACK library are distributed memory versions (PBLAS) of the Level 1, 2 and 3 BLAS, and a set of Basic Linear Algebra Communication Subprograms (BLACS) for communication tasks that arise frequently in parallel linear algebra computations. In the ScaLAPACK routines, all interprocessor communication occurs within the PBLAS and the BLACS. One of the design goals of ScaLAPACK was to have the ScaLAPACK routines resemble their LAPACK equivalents as much as possible.

For detailed information on ScaLAPACK, please refer to the ScaLAPACK Users' Guide.


1.2) How do I reference ScaLAPACK in a scientific publication?

We ask that you cite the ScaLAPACK Users' Guide.

@BOOK{slug,
      AUTHOR = {Blackford, L. S. and Choi, J. and Cleary, A. and
                D'Azevedo, E. and Demmel, J. and Dhillon, I. and
                Dongarra, J. and Hammarling, S. and Henry, G. and
                Petitet, A. and Stanley, K. and Walker, D. and
                Whaley, R. C.},
      TITLE = {{ScaLAPACK} Users' Guide},
      PUBLISHER = {Society for Industrial and Applied Mathematics},
      YEAR = {1997},
      ADDRESS = {Philadelphia, PA},
      ISBN = {0-89871-397-8 (paperback)} }

1.3) Are there vendor-specific versions of ScaLAPACK?

Yes.

ScaLAPACK has been incorporated into several commercial packages, including the Sun Scalable Scientific Subroutine Library (Sun S3L), NAG Parallel Library, IBM Parallel ESSL, and Cray LIBSCI, and is being integrated into the VNI IMSL Numerical Library, as well as software libraries for Fujitsu, Hewlett-Packard/Convex, Hitachi, NEC, and SGI.


1.4) What is the difference between the vendor and Netlib version of ScaLAPACK and which should I use?

The publicly available version of ScaLAPACK (on netlib) is designed to be portable and efficient across a wide range of computers. It is not hand-tuned for a specific computer architecture.

The vendor-specific versions of ScaLAPACK have been optimized for a specific architecture. Therefore, for best performance, we recommend using a vendor-optimized version of ScaLAPACK if it is available.

However, because new ScaLAPACK routines are introduced with each release, a vendor-specific version of ScaLAPACK may contain only a subset of the routines in the current Netlib release.

If you suspect an error in a vendor-specific ScaLAPACK routine, we recommend downloading the ScaLAPACK Test Suite from netlib and running it against that library.


1.5) Are there legal restrictions on the use of ScaLAPACK software?

ScaLAPACK (like LINPACK, EISPACK, LAPACK, etc) is a freely-available software package. It is available from netlib via anonymous ftp and the World Wide Web. It can be, and is, included in commercial packages (e.g., Sun's S3L, IBM's Parallel ESSL, the NAG Numerical PVM Library, and Interactive Supercomputing's Star-P for MATLAB). We only ask that proper credit be given to the authors.

Like all software, it is copyrighted. It is not trademarked, but we do ask the following:

If you modify the source for these routines, we ask that you change the name of the routine and comment the changes made to the original.

We will gladly answer any questions regarding the software. If a modification is done, however, it is the responsibility of the person who modified the routine to provide support.


1.6) What is two-dimensional block cyclic data distribution?

In the two-dimensional block cyclic decomposition, a matrix is first partitioned into MB-by-NB blocks, and these blocks are then dealt out cyclically, in both the row and the column direction, over a Pr-by-Pc rectangular grid of processes. With zero-based indexing, the block with block coordinates (I, J) is stored on the process at grid coordinates (mod(I, Pr), mod(J, Pc)). This distribution balances the computational load across the grid while preserving the large local submatrices needed for Level 3 BLAS performance. It is described in detail in the ScaLAPACK Users' Guide.
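
For example, the following serial sketch (the matrix size, block size, and grid shape are hypothetical; the program only assumes the NUMROC function from the SCALAPACK/TOOLS directory) prints the local array dimensions that a block cyclic layout of a 1000 x 1000 matrix, in 64 x 64 blocks over a 2 x 3 process grid, assigns to each grid position:

      PROGRAM LOCDIM
*     Sketch: print the local dimensions owned by each position of a
*     2 x 3 process grid for a 1000 x 1000 matrix in 64 x 64 blocks.
*     NUMROC is the ScaLAPACK TOOLS routine that computes the NUMber
*     of Rows Or Columns owned by a given process.
      INTEGER            M, N, MB, NB, NPROW, NPCOL
      PARAMETER          ( M = 1000, N = 1000, MB = 64, NB = 64,
     $                     NPROW = 2, NPCOL = 3 )
      INTEGER            MYROW, MYCOL, MLOC, NLOC
      INTEGER            NUMROC
      EXTERNAL           NUMROC
      DO 20 MYROW = 0, NPROW-1
         DO 10 MYCOL = 0, NPCOL-1
            MLOC = NUMROC( M, MB, MYROW, 0, NPROW )
            NLOC = NUMROC( N, NB, MYCOL, 0, NPCOL )
            PRINT *, 'Process (', MYROW, ',', MYCOL, ') owns a ',
     $               MLOC, ' x ', NLOC, ' local piece'
   10    CONTINUE
   20 CONTINUE
      END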


1.7) Where can I find more information about ScaLAPACK?

A variety of working notes related to the ScaLAPACK library were published as LAPACK Working Notes and are available in postscript and pdf format at:

http://www.netlib.org/lapack/lawns/ and
http://www.netlib.org/lapack/lawnspdf/

To stay up to date with the ScaLAPACK project, subscribe to the ScaLAPACK announcement list. (USERS CANNOT POST TO THIS LIST)
[ Archives | Subscribe / unsubscribe ]
This is a low-volume list that is used to announce new versions of ScaLAPACK, important updates, etc. The list is only for announcements, so only the ScaLAPACK development team can post to it. Posts from outside the ScaLAPACK development team will be automatically discarded.


1.8) What and where are the PBLAS?

The Parallel Basic Linear Algebra Subprograms (PBLAS) are distributed memory versions of the Level 1, 2 and 3 BLAS. A Quick Reference Guide to the PBLAS is available. The software is available as part of the ScaLAPACK distribution tar file (scalapack.tgz).

There is also a new prototype version of the PBLAS (version 2.0), which is free of alignment restrictions and uses logical algorithmic blocking techniques. For details, please refer to scalapack/prototype/readme.pblas.
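
To give the flavor of the interface, below is a minimal sketch of a complete PBLAS program (not taken from the distribution; the matrix size, block size, and 1 x NPROCS grid shape are arbitrary choices). Each (array, leading dimension) argument pair of the serial DGEMM becomes an (array, row index, column index, descriptor) quadruple in PDGEMM. With A and B filled with ones, every entry of C = A*B should equal N:

      PROGRAM PBEX
*     Minimal PBLAS sketch: distributed C := A*B with PDGEMM.
*     A 1 x NPROCS grid is used so the program runs unchanged on
*     any number of processes.
      INTEGER            N, NB
      PARAMETER          ( N = 500, NB = 64 )
      DOUBLE PRECISION   A( N*N ), B( N*N ), C( N*N )
      INTEGER            DESCA( 9 ), DESCB( 9 ), DESCC( 9 )
      INTEGER            IAM, NPROCS, ICTXT, NPROW, NPCOL
      INTEGER            MYROW, MYCOL, MLOC, NLOC, I, INFO
      INTEGER            NUMROC
      EXTERNAL           NUMROC
*     Set up the BLACS process grid.
      CALL BLACS_PINFO( IAM, NPROCS )
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', 1, NPROCS )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*     Local dimensions of the block cyclically distributed matrices.
      MLOC = NUMROC( N, NB, MYROW, 0, NPROW )
      NLOC = NUMROC( N, NB, MYCOL, 0, NPCOL )
*     Build the array descriptors; all three matrices are N x N and
*     identically distributed.
      CALL DESCINIT( DESCA, N, N, NB, NB, 0, 0, ICTXT,
     $               MAX( 1, MLOC ), INFO )
      CALL DESCINIT( DESCB, N, N, NB, NB, 0, 0, ICTXT,
     $               MAX( 1, MLOC ), INFO )
      CALL DESCINIT( DESCC, N, N, NB, NB, 0, 0, ICTXT,
     $               MAX( 1, MLOC ), INFO )
*     Fill the local pieces: A = B = ones, C = zero.
      DO 10 I = 1, MLOC*NLOC
         A( I ) = 1.0D0
         B( I ) = 1.0D0
         C( I ) = 0.0D0
   10 CONTINUE
*     Global operation C := 1.0*A*B + 0.0*C.
      CALL PDGEMM( 'N', 'N', N, N, N, 1.0D0, A, 1, 1, DESCA,
     $             B, 1, 1, DESCB, 0.0D0, C, 1, 1, DESCC )
*     Every entry of C should now equal N (= 500).
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 )
     $   PRINT *, 'C(1,1) = ', C( 1 )
      CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )
      END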


1.9) Are example ScaLAPACK programs available?

Yes, example ScaLAPACK programs are available. Refer to

http://www.netlib.org/scalapack/examples/
for a list of available example programs.

A detailed description of how to run a ScaLAPACK example program is discussed in Chapter 2 of the ScaLAPACK Users' Guide.


1.10) How do I run an example program?

A detailed description of how to run a ScaLAPACK example program is discussed in Chapter 2 of the ScaLAPACK Users' Guide.


1.11) How do I install ScaLAPACK?

A comprehensive Installation Guide for ScaLAPACK is provided. In short, you only need to modify one file, SLmake.inc, to specify your compiler, compiler flags, and the locations of your MPI, BLACS, and BLAS libraries. Then type make lib to build the ScaLAPACK library, and make exe to build the testing/timing executables. Example SLmake.inc files for various architectures are supplied in the SCALAPACK/INSTALL subdirectory of the distribution.

The installation assumes that a low-level message-passing layer (such as MPI, PVM, or a native message-passing library), a BLACS library (MPIBLACS, PVMBLACS, etc), and a BLAS library are already available. If any of these required components is missing, you must build it before proceeding with the ScaLAPACK installation.

If a vendor-optimized BLAS library is not available, ATLAS can be used to automatically generate an optimized BLAS library for your architecture. Only as a last resort should the user use the reference implementation Fortran77 BLAS contained on the BLAS webpage.

For installing ScaLAPACK under Windows, please refer to ScaLAPACK for Windows.
We provide prebuilt libraries, a Visual Studio solution build, and an nmake build.


1.12) How do I install ScaLAPACK using MPICH-G and Globus?

A detailed explanation of how to run a ScaLAPACK program using MPICH-G and Globus can be found at: http://www.cs.utk.edu/~petitet/grads/.

See Question 1.11 for general installation instructions.


1.13) How do I achieve high performance using ScaLAPACK?

ScaLAPACK performance relies on an efficient low-level message-passing layer and a high-speed interconnection network for communication, and on an optimized BLAS library for local computation.

For a detailed description of performance-related issues, please refer to Chapter 5 of the ScaLAPACK Users' Guide.


1.14) Are prebuilt ScaLAPACK libraries available?

Yes, prebuilt ScaLAPACK libraries are available for a variety of architectures. Refer to

http://www.netlib.org/scalapack/archives/
for a complete list of available prebuilt libraries.


1.15) How do I find a particular routine?

Indexes of individual ScaLAPACK driver and computational routines are available. These indexes contain brief descriptions of each routine.

ScaLAPACK routines are provided in four data types: single precision real, double precision real, single precision complex, and double precision complex (e.g., PSGESV, PDGESV, PCGESV, and PZGESV are the four versions of the general linear system solver). At the present time, the nonsymmetric eigenproblem is only available in single and double precision real.


1.16) I can't get a program to work. What should I do?

Technical questions should be directed to the authors at the LAPACK User Forum (preferred means of communication) or at scalapack@cs.utk.edu.

Please tell us the type of machine on which the tests were run, the compiler and compiler options used, and details of the BLACS and BLAS libraries used, and include a copy of the input file if appropriate.

Be prepared to answer the following questions:

  1. Have you run the BLAS, BLACS, PBLAS and ScaLAPACK test suites?
  2. Have you checked the appropriate errata lists on netlib?
  3. Have you attempted to replicate this error using the appropriate ScaLAPACK test code and/or one of the ScaLAPACK example routines?
  4. If you are using an optimized BLAS or BLACS library, have you tried using the reference implementations from netlib?


1.17) How can I unpack scalapack.tgz?

   gunzip scalapack.tgz
   tar xvf scalapack.tar

or, in a single step with GNU tar:

   tar xzvf scalapack.tgz

The compression program gzip (and gunzip) is GNU software. If it is not already available on your machine, you can download it via anonymous ftp:

   ncftp prep.ai.mit.edu
   cd pub/gnu/
   get gzip-1.2.4.tar

See Question 1.11 for installation instructions.


1.18) What technical support for ScaLAPACK is available?

Technical questions and comments should be directed to the authors at the LAPACK User Forum (preferred means of communication) or at scalapack@cs.utk.edu.

See Question 1.16.


1.19) How do I submit a bug report?

Technical questions should be directed to the authors at the LAPACK User Forum (preferred means of communication) or at scalapack@cs.utk.edu.

Be prepared to answer the questions as outlined in Question 1.16. Those are the first questions that we will ask!


1.20) How do I gather a distributed vector back to one processor?

There are several ways to accomplish this task.

  1. You can create a local array of the global size on every process, have each process write its pieces into the appropriate locations, and then call the BLACS routine DGSUM2D to sum the arrays, leaving the result on one process or on all processes.
  2. You can modify SCALAPACK/TOOLS/pdlaprnt.f to write to an array instead of writing to a file.
  3. You can modify the routine pdlawrite.f from the example program http://www.netlib.org/scalapack/examples/scaex.tgz.
  4. You can create a second "context" containing only one process, and then call the redistribution routines in SCALAPACK/REDIST/SRC/ to redistribute the matrix to that process grid (see the sketch after this list).
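
For the fourth option, the redistribution routine PDGEMR2D does most of the work. The following hedged sketch assumes a distributed M-by-N matrix A with descriptor DESCA on a process grid whose context is ICTXT, a local buffer B of size at least M*N on process (0,0), and MYROW/MYCOL holding the calling process's grid coordinates:

*     Create a second grid containing only one process.  WHAT = 10
*     asks BLACS_GET for the system context underlying ICTXT;
*     BLACS_GRIDINIT then forms a 1 x 1 grid from its first process.
      CALL BLACS_GET( ICTXT, 10, ICTXT1 )
      CALL BLACS_GRIDINIT( ICTXT1, 'Row-major', 1, 1 )
*     Describe B as a single M x N block on the 1 x 1 grid.
*     Processes outside that grid must flag this by setting the
*     context entry (element 2) of the descriptor to -1.
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         CALL DESCINIT( DESCB, M, N, M, N, 0, 0, ICTXT1,
     $                  MAX( 1, M ), INFO )
      ELSE
         DESCB( 2 ) = -1
      END IF
*     Copy the distributed A into the local array B on process (0,0).
      CALL PDGEMR2D( M, N, A, 1, 1, DESCA, B, 1, 1, DESCB, ICTXT )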

2) BLACS

2.1) What and where are the BLACS?

The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that may be implemented efficiently and uniformly across a large range of distributed memory platforms.

The length of time required to implement efficient distributed memory algorithms makes it impractical to rewrite programs for every new parallel machine. The BLACS exist in order to make linear algebra applications both easier to program and more portable. It is for this reason that the BLACS are used as the communication layer of ScaLAPACK.

For further information on the BLACS, please refer to the blacs directory on netlib, as well as the BLACS Homepage.
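
As a small taste of the interface, here is a hedged sketch of a complete BLACS program (patterned after, but not copied from, the examples in the BLACS distribution). Every process sends its grid coordinates to process (0,0), which receives and prints them; the 1 x NPROCS grid shape is an arbitrary choice that works for any process count:

      PROGRAM BLHELO
*     Minimal BLACS sketch: each process sends its grid coordinates
*     to process (0,0), which receives and prints them.
      INTEGER            IAM, NPROCS, ICTXT, NPROW, NPCOL
      INTEGER            MYROW, MYCOL, IP, JP
      INTEGER            COORD( 2 )
*     Join the default system context and form a 1 x NPROCS grid.
      CALL BLACS_PINFO( IAM, NPROCS )
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', 1, NPROCS )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         PRINT *, 'Hello from (0,0) of a ', NPROW, ' x ', NPCOL,
     $            ' grid'
         DO 20 JP = 0, NPCOL-1
            DO 10 IP = 0, NPROW-1
               IF( IP.NE.0 .OR. JP.NE.0 ) THEN
*                 Receive a 2 x 1 integer array from process (IP,JP).
                  CALL IGERV2D( ICTXT, 2, 1, COORD, 2, IP, JP )
                  PRINT *, 'Hello from (', COORD( 1 ), ',',
     $                     COORD( 2 ), ')'
               END IF
   10       CONTINUE
   20    CONTINUE
      ELSE
*        Send this process's coordinates to process (0,0).
         COORD( 1 ) = MYROW
         COORD( 2 ) = MYCOL
         CALL IGESD2D( ICTXT, 2, 1, COORD, 2, 0, 0 )
      END IF
      CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )
      END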


2.2) Is there a Quick Reference Guide to the BLACS available?

Yes, there is a postscript version of the Quick Reference Guide to the BLACS available.


2.3) How do I install the BLACS?

First, you must choose which underlying message-passing layer the BLACS will use (MPI, PVM, NX, MPL, etc). Once this decision has been made, download the corresponding gzipped tar file.

An Installation Guide for the BLACS is provided, as well as a comprehensive BLACS Test Suite. In short, you only need to modify one file, Bmake.inc, to specify your compiler, compiler flags, and the location of your MPI library; then type, for example, make mpi to build the MPI BLACS library. Example Bmake.inc files for various architectures are supplied in the BLACS/BMAKES subdirectory of the distribution. There are also scripts in BLACS/INSTALL which can be run to help determine some of the settings in the Bmake.inc file.

It is highly recommended that you run the BLACS Tester to ensure that the installation is correct, and that no bugs have been detected in the low-level message-passing layer. If you suspect an error, please consult the errata file in the blacs directory on netlib.


2.4) Are prebuilt BLACS libraries available?

Yes, prebuilt BLACS libraries are available for a variety of architectures and message-passing interfaces. Refer to

http://www.netlib.org/blacs/archives/
for a complete list of available prebuilt libraries.

2.5) Are example BLACS programs available?

Yes, example BLACS programs are available. Refer to

http://www.netlib.org/scalapack/examples/
for a list of available example programs.

3) BLAS


3.1) What and where are the BLAS?

The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for performing basic vector and matrix operations. The Level 1 BLAS perform scalar, vector and vector-vector operations, the Level 2 BLAS perform matrix-vector operations, and the Level 3 BLAS perform matrix-matrix operations. Because the BLAS are efficient, portable, and widely available, they are commonly used in the development of high quality linear algebra software, LAPACK for example.

The publications listed in Question 3.3 define the specifications for the BLAS, and a Fortran77 reference implementation of the BLAS is located in the blas directory of Netlib, together with testing and timing software. For information on efficient versions of the BLAS, see Question 3.5.
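
As a quick illustration of the three levels (a hedged sketch, not part of the reference distribution), the program below calls one routine from each: DAXPY (Level 1), DGEMV (Level 2), and DGEMM (Level 3). It should link against any BLAS library:

      PROGRAM BLASEX
*     One call from each BLAS level, on small all-ones operands.
      INTEGER            N
      PARAMETER          ( N = 4 )
      DOUBLE PRECISION   X( N ), Y( N ), A( N, N ), B( N, N ),
     $                   C( N, N )
      INTEGER            I, J
      DO 20 J = 1, N
         X( J ) = 1.0D0
         Y( J ) = 1.0D0
         DO 10 I = 1, N
            A( I, J ) = 1.0D0
            B( I, J ) = 1.0D0
            C( I, J ) = 0.0D0
   10    CONTINUE
   20 CONTINUE
*     Level 1 (vector-vector):  y := 2*x + y       => Y(1) = 3
      CALL DAXPY( N, 2.0D0, X, 1, Y, 1 )
*     Level 2 (matrix-vector):  y := 2*A*x + y     => Y(1) = 11
      CALL DGEMV( 'N', N, N, 2.0D0, A, N, X, 1, 1.0D0, Y, 1 )
*     Level 3 (matrix-matrix):  C := 2*A*B + 0*C   => C(1,1) = 8
      CALL DGEMM( 'N', 'N', N, N, N, 2.0D0, A, N, B, N, 0.0D0,
     $            C, N )
      PRINT *, 'Y(1) =', Y( 1 ), '   C(1,1) =', C( 1, 1 )
      END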


3.2) Are there legal restrictions on the use of BLAS reference implementation software?

The reference BLAS is a freely-available software package. It is available from netlib via anonymous ftp and the World Wide Web. Thus, it can be included in commercial software packages (and has been). We only ask that proper credit be given to the authors.

Like all software, it is copyrighted. It is not trademarked, but we do ask the following:

If you modify the source for these routines, we ask that you change the name of the routine and comment the changes made to the original.

We will gladly answer any questions regarding the software. If a modification is done, however, it is the responsibility of the person who modified the routine to provide support.


3.3) Publications/references for the BLAS?

  1. C. L. Lawson, R. J. Hanson, D. Kincaid, and F. T. Krogh, Basic Linear Algebra Subprograms for FORTRAN usage, ACM Trans. Math. Soft., 5 (1979), pp. 308--323.

  2. J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, An extended set of FORTRAN Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 14 (1988), pp. 1--17.

  3. J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, Algorithm 656: An extended set of FORTRAN Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 14 (1988), pp. 18--32.

  4. J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling, A set of Level 3 Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 16 (1990), pp. 1--17.

  5. J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling, Algorithm 679: A set of Level 3 Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 16 (1990), pp. 18--28.

New BLAS
  1. L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, R. C. Whaley, An Updated Set of Basic Linear Algebra Subprograms (BLAS), ACM Trans. Math. Soft., 28(2) (2002), pp. 135--151.

  2. J. Dongarra, Basic Linear Algebra Subprograms Technical Forum Standard, International Journal of High Performance Computing Applications, 16(1) (2002), pp. 1--111, and 16(2) (2002), pp. 115--199.

3.4) Is there a Quick Reference Guide to the BLAS available?

Yes, the Quick Reference Guide to the BLAS is available in postscript and pdf.


3.5) Are optimized BLAS libraries available? Where can I find optimized BLAS libraries?

YES! Machine-specific optimized BLAS libraries are available for a variety of computer architectures. These optimized BLAS libraries are provided by the computer vendor or by an independent software vendor (ISV) (see list below). For further details, please contact your local vendor representative.

Alternatively, you can download ATLAS to automatically generate an optimized BLAS library for your architecture. Some prebuilt optimized BLAS libraries are also available from the ATLAS site. The Goto BLAS is also available for a number of machines. Efficient versions of the Level 3 BLAS, based on an efficient matrix-matrix multiplication routine, are provided by the GEMM-Based BLAS.

If all else fails, the user can download a Fortran77 reference implementation of the BLAS from netlib. However, keep in mind that this is a reference implementation and is not optimized.

BLAS vendor library list
Last updated: July 20, 2005

   Vendor    Library
   ------    -------
   AMD       ACML
   Apple     Velocity Engine
   Compaq    CXML
   Cray      libsci
   HP        MLIB
   IBM       ESSL
   Intel     MKL
   NEC       PDLIB/SX
   SGI       SCSL
   Sun       Sun Performance Library


3.6) Where can I find Java BLAS?

Java BLAS are available; refer to the following URLs: Java LAPACK and JavaNumerics. The JavaNumerics webpage provides a focal point for information on numerical computing in Java.


3.7) Is there a C interface to the BLAS?

Yes, a C interface to the BLAS was defined in the BLAS Technical Forum Standard. The source code is also available.


3.8) Are prebuilt reference implementations of the Fortran77 BLAS available?

Yes, you can download a prebuilt Fortran77 reference implementation BLAS library or compile the Fortran77 reference implementation source code of the BLAS from netlib.

Note that the reference implementation is not optimized and can be extremely slow, so we do not recommend it: use an optimized BLAS whenever possible (see Question 3.5).


3.9) What about shared memory machines? Are there multithreaded versions of the BLAS available?

ATLAS, the Goto BLAS (two threads only), and most of the vendor-supplied BLAS libraries are multithreaded.
