How can we provide portable software for dense linear algebra computations that is efficient on a wide range of modern distributed-memory concurrent computers? Answering this question, and providing the appropriate software, has been an objective of the ScaLAPACK project.
The ScaLAPACK software has been designed specifically to achieve high efficiency for a wide range of modern distributed-memory concurrent computers. Examples of such machines include the Cray T3D and T3E, the IBM Scalable POWERparallel SP series, the Intel iPSC and Paragon, the nCube-2/3, networks and clusters of workstations (NoWs and CoWs), and "piles" of PCs (PoPCs).
For clarity of discussion, we treat this diverse set of architectures under a single model, the logical distributed-memory computer. This model consists of p processors connected by a message-passing interconnection network. Each processor has its own memory, called its local memory, which only that processor can access directly. Accessing data held in another processor's (remote) memory therefore takes longer than accessing local memory, because the data must travel over the network. Such a computer is often referred to as a Non-Uniform Memory Access (NUMA) machine.
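The following Python sketch makes the logical model concrete. It is a toy illustration under stated assumptions, not part of ScaLAPACK: the class and method names (`LogicalDistributedMemoryComputer`, `store`, `load`) and the cost values are hypothetical. Each processor owns a private dictionary as its local memory, and reading another processor's memory is charged an extra, assumed message cost, which is the non-uniform access time described above.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """One node of the logical machine; it owns a private local memory."""
    rank: int
    local_memory: dict = field(default_factory=dict)


class LogicalDistributedMemoryComputer:
    """Hypothetical toy model: p processors with private memories linked by a
    message-passing network. Costs are invented, in arbitrary time units."""

    def __init__(self, p, local_cost=1.0, message_cost=100.0):
        self.processors = [Processor(rank) for rank in range(p)]
        self.local_cost = local_cost      # assumed cost of one local access
        self.message_cost = message_cost  # assumed extra cost of one message
        self.elapsed = 0.0

    def store(self, rank, key, value):
        # A processor writes only to its own local memory.
        self.processors[rank].local_memory[key] = value
        self.elapsed += self.local_cost

    def load(self, requester, owner, key):
        # The owner reads the value locally; if the requester is a different
        # processor, the value must also be sent to it as a message.
        value = self.processors[owner].local_memory[key]
        self.elapsed += self.local_cost
        if requester != owner:
            self.elapsed += self.message_cost
        return value


machine = LogicalDistributedMemoryComputer(p=4)
machine.store(0, "a", 3.5)        # processor 0 writes its own memory: 1.0
local = machine.load(0, 0, "a")   # local read: 1.0
remote = machine.load(2, 0, "a")  # remote read: 1.0 + 100.0
print(machine.elapsed)            # 103.0
```

The only thing distinguishing the local and remote reads is the added message cost; the ratio between `message_cost` and `local_cost` is a free, assumed parameter of the sketch.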
For simplicity, we also assume that all processors deliver the same local performance and that the communication rate between two processors is the same regardless of which pair is communicating. Under these assumptions, the performance achieved by the ScaLAPACK drivers depends mainly on two machine factors: the performance of the individual processors, and the performance and connectivity of the network.
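One common way to turn these assumptions into numbers, assumed here rather than stated in this section, is a linear cost model that describes the whole machine with three parameters: t_f, the time per floating-point operation; t_m, the latency per message; and t_v, the time per word transferred. Because every processor and every processor pair is treated alike, a single value of each parameter suffices. The Python sketch estimates T = C_f t_f + C_m t_m + C_v t_v from operation and communication counts; the names `MachineParameters` and `estimated_time` and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineParameters:
    """Hypothetical description of a homogeneous machine: one flop time for
    every processor and one latency/bandwidth pair for every processor pair."""
    t_f: float  # time per floating-point operation (local processor performance)
    t_m: float  # latency per message (network performance)
    t_v: float  # time per word transferred (network performance)


def estimated_time(machine: MachineParameters, c_f: float, c_m: float, c_v: float) -> float:
    """Assumed linear cost model T = c_f*t_f + c_m*t_m + c_v*t_v, where c_f,
    c_m and c_v are the flop, message and word counts on the critical path."""
    return c_f * machine.t_f + c_m * machine.t_m + c_v * machine.t_v


# Invented parameter values, in seconds, for illustration only.
machine = MachineParameters(t_f=1e-9, t_m=1e-5, t_v=1e-8)
t = estimated_time(machine, c_f=2_000_000, c_m=10, c_v=50_000)
print(f"{t:.2e}")  # 2.60e-03
```

With these made-up numbers, computation accounts for 0.002 s of the 0.0026 s total. Increasing the message or word counts shows how quickly the network factors can come to dominate.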