
Block Cyclic Data Layout as the Key to Load Balancing and Software Reuse

How data is distributed over the memory hierarchy of a computer is of fundamental importance to load balancing and software reuse. The block cyclic data layout reduces the overhead due to load imbalance and data movement, while block-partitioned algorithms maximize local processor performance.

Since the data decomposition largely determines the performance and scalability of a concurrent algorithm, a great deal of research [10, 21, 23, 25] has focused on different data decompositions [4, 6, 26]. In particular, the two-dimensional block cyclic distribution [28] has been suggested as a possible general-purpose basic decomposition for parallel dense linear algebra libraries [13, 24, 30], such as ScaLAPACK.
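To make the two-dimensional block cyclic distribution concrete, the following is a minimal sketch of the index mapping, not ScaLAPACK's own code. It assumes 0-based indices, that the first block of each dimension resides on process row or column 0, and that the distribution is applied independently to rows and columns. The function names `owner_and_local` and `owner_2d` are hypothetical and chosen only for illustration.

```python
def owner_and_local(g, nb, p):
    """Map a 0-based global index g to (process coordinate, local index)
    for a 1-D block cyclic distribution with block size nb over p processes.
    Assumption: block 0 is stored on process 0."""
    block = g // nb            # global block that contains index g
    proc = block % p           # blocks are dealt out to processes round-robin
    local_block = block // p   # number of earlier blocks held by that process
    return proc, local_block * nb + g % nb


def owner_2d(i, j, mb, nb, prow, pcol):
    """Map global entry (i, j) to its process grid coordinates and local
    indices on a prow x pcol grid with mb x nb blocks (hypothetical helper)."""
    pr, li = owner_and_local(i, mb, prow)
    pc, lj = owner_and_local(j, nb, pcol)
    return (pr, pc), (li, lj)


# Entry (5, 3) of a matrix split into 2 x 2 blocks on a 2 x 3 process grid.
print(owner_2d(5, 3, 2, 2, 2, 3))
```

Because the two-dimensional mapping is just the one-dimensional mapping applied to each dimension, a library only needs one small set of index-conversion helpers for both rows and columns.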

The block cyclic distribution is beneficial because of its scalability [17], load balance, and communication [24] properties. With this layout, the block-partitioned computation proceeds in consecutive order, just like a conventional serial algorithm. This essential property of the block cyclic data layout explains why the ScaLAPACK design has been able to reuse the numerical and software expertise of the sequential LAPACK library.
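The load balance property can be illustrated with a small sketch. It assumes a one-dimensional distribution of rows and a factorization-style computation in which only rows k through n-1 remain active at some step. The sizes, the helper `rows_per_process`, and the two owner functions are assumptions made for this illustration and are not taken from ScaLAPACK.

```python
def rows_per_process(n, k, p, owner):
    """Count how many of the still-active rows k..n-1 each of the p
    processes owns, given a function mapping a global row to its owner."""
    counts = [0] * p
    for g in range(k, n):
        counts[owner(g)] += 1
    return counts


n, p, nb = 16, 4, 2   # illustrative sizes; p divides n, nb is the block size


def block_owner(g):
    # Plain block layout: each process holds one contiguous slab of n // p rows.
    return g // (n // p)


def block_cyclic_owner(g):
    # Block cyclic layout: blocks of nb rows are dealt out round-robin.
    return (g // nb) % p


# Halfway through the computation only rows 8..15 are still active.
print(rows_per_process(n, 8, p, block_owner))
print(rows_per_process(n, 8, p, block_cyclic_owner))
```

Under these assumptions the plain block layout leaves half of the processes without any active rows, while the block cyclic layout keeps every process holding an equal share of the remaining work.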

Jack Dongarra
Sat Feb 1 08:18:10 EST 1997