The performance results in Figures 6, 8, and 10 can be used to assess the scalability of the factorization routines. In general, concurrent efficiency, $\epsilon$, is defined as the concurrent speedup per process. That is, for a given problem size, $N$, on the number of processes used, $N_p$,

$$\epsilon(N, N_p) = \frac{1}{N_p}\,\frac{T_s(N)}{T_p(N, N_p)}$$

where $T_p(N, N_p)$ is the time for a problem of size $N$ to run on $N_p$ processes, and $T_s(N)$ is the time to run on one process using the best sequential algorithm.
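As a small illustration of the definition above, the following Python sketch computes the efficiency as speedup per process. The function name and the timings are hypothetical and chosen only for the arithmetic; they are not measurements from the paper.

```python
def concurrent_efficiency(t_seq, t_par, n_procs):
    """Concurrent efficiency: speedup (t_seq / t_par) divided by process count.

    t_seq   -- time of the best sequential algorithm on one process
    t_par   -- time of the parallel run on n_procs processes
    n_procs -- number of processes used
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    if t_seq <= 0 or t_par <= 0:
        raise ValueError("timings must be positive")
    speedup = t_seq / t_par
    return speedup / n_procs


# Hypothetical timings (seconds), for illustration only.
eff = concurrent_efficiency(t_seq=120.0, t_par=10.0, n_procs=16)
print(eff)  # 0.75: a speedup of 12 spread over 16 processes
```

An efficiency of 1 would mean perfect speedup; values below 1 reflect communication, load imbalance, and other parallel overheads.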

Another approach to investigating the efficiency is to see how the performance per process degrades as the number of processes increases for a fixed grain size, i.e., by plotting isogranularity curves in the $(N_p, G)$ plane, where $G$ is the performance. Since, in terms of the sequential time $T_s(N)$, the parallel time $T_p(N, N_p)$, and the concurrent efficiency $\epsilon(N, N_p)$,

$$G \propto \frac{T_s(N)}{T_p(N, N_p)} = N_p\,\epsilon(N, N_p),$$

the scalability for memory-constrained problems can readily be assessed by the extent to which the isogranularity curves differ from linearity. Isogranularity was first defined in [24], and later explored in [20][21].
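One way to quantify how far an isogranularity curve departs from linearity is to compare the performance per process at each process count with that of the smallest run. The sketch below does this in Python; the function name and the data points are made up for illustration and are not read from any figure in this paper.

```python
def relative_per_process_performance(points):
    """Per-process performance relative to the smallest configuration.

    points -- iterable of (n_procs, performance) pairs measured at a
              fixed grain size (fixed matrix size per process).
    Returns a list of (n_procs, ratio) sorted by n_procs. A perfectly
    linear isogranularity curve through the origin gives ratio 1.0
    everywhere.
    """
    pts = sorted(points)
    if not pts:
        raise ValueError("at least one measurement is required")
    base_n, base_perf = pts[0]
    base = base_perf / base_n  # per-process performance of smallest run
    return [(n, (perf / n) / base) for n, perf in pts]


# Hypothetical (n_procs, Gflop/s) measurements, for illustration only.
measurements = [(4, 2.0), (16, 7.0), (64, 24.0)]
for n, ratio in relative_per_process_performance(measurements):
    print(n, ratio)
# prints:
# 4 1.0
# 16 0.875
# 64 0.75
```

Ratios that stay close to 1.0 as the process count grows correspond to a nearly linear curve, that is, to good scalability at that grain size.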

Figure 11 shows the isogranularity plots for the ScaLAPACK factorization routines on the Paragon, with the matrix size per process fixed at 5 and 20 Mbytes. Refer to Figures 6, 8, and 10 for the block size and process grid size characteristics. The near-linearity of these plots shows that the ScaLAPACK routines scale well on this system.

**Figure 11:** Scalability of factorization routines on the Intel Paragon (5, 20 Mbytes/node).
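To make the fixed grain size concrete, the following Python sketch estimates the order of the largest square matrix that fits when each process holds a fixed number of Mbytes. It rests on assumptions that the text does not state: 8-byte (double precision) elements, 1 Mbyte taken as 2^20 bytes, a dense matrix distributed evenly over the processes, and no allowance for workspace. The function name is hypothetical.

```python
import math


def matrix_order(mbytes_per_proc, n_procs, bytes_per_elem=8):
    """Largest N such that an N x N matrix fits in the aggregate memory.

    Assumes (not stated in the text) 1 Mbyte = 2**20 bytes, a dense
    matrix spread evenly over n_procs processes, and no extra workspace.
    """
    total_elems = (mbytes_per_proc * 2**20 * n_procs) // bytes_per_elem
    return math.isqrt(total_elems)  # floor of the exact square root


# Matrix order at the two grain sizes for a few process counts.
for n_procs in (1, 4, 16, 64):
    print(n_procs, matrix_order(5, n_procs), matrix_order(20, n_procs))
```

Because the total memory grows linearly with the number of processes, the matrix order grows only as its square root, which is why isogranularity curves are the natural tool for memory-constrained scaling.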

Fri Apr 28 09:37:26 EDT 1995