This section describes how to estimate the execution time of a ScaLAPACK routine on a given platform, using Equation 5.1 and the values provided in Table 5.5 and Table 5.8. By comparing this estimate with experimental data, the user can determine whether reasonable performance has been achieved and, if it has not, may be able to identify the performance bottlenecks.
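As a rough illustration of how such an estimate might be computed, the following sketch assumes that Equation 5.1 has the three-term form T(N,P) = C_f N^3/P t_f + C_v N^2/sqrt(P) t_v + C_m N/NB t_m, where t_f is the time per floating-point operation, t_v the time per data item communicated, and t_m the time per message. The function name `estimate_time` and every numeric value in the usage line are hypothetical placeholders; they are not entries from Table 5.5 or Table 5.8.

```python
import math


def estimate_time(n, p, nb, c_f, c_v, c_m, t_f, t_v, t_m):
    """Estimate execution time in seconds from an assumed three-term model.

    n, p, nb      : matrix order, number of nodes, block size
    c_f, c_v, c_m : routine-dependent constants (flops, data volume, messages)
    t_f, t_v, t_m : machine-dependent times per flop, per data item, per message
    """
    flop_term = c_f * n**3 / p * t_f               # computation
    volume_term = c_v * n**2 / math.sqrt(p) * t_v  # communication volume
    message_term = c_m * n / nb * t_m              # message latency
    return flop_term + volume_term + message_term


# Placeholder inputs only; substitute the constants and machine
# parameters for the routine and platform of interest.
seconds = estimate_time(n=4000, p=16, nb=50,
                        c_f=2.0 / 3.0, c_v=4.0, c_m=500.0,
                        t_f=1.0e-8, t_v=1.0e-7, t_m=1.0e-4)
print(f"estimated time: {seconds:.2f} s")
```

Keeping the three terms separate makes it easy to see which of computation, communication volume, or message latency dominates for a given N and P, which is one way to look for a bottleneck.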
For linear system solvers, the estimate is typically accurate to within 50% for moderate-sized problems (i.e., 160,000 or more matrix elements per node). For eigensolvers, the estimate may be low by a factor of 2 for moderate-sized problems and by more than that for smaller problems. The eigensolvers take longer than estimated because they involve matrix-vector flops as well as matrix-matrix flops, and because they involve substantial numbers of lower-order, o(N^3), flops that are not included in the approximation. The accuracy of the performance estimates increases with the problem size. Unfortunately, because ScaLAPACK eigensolvers require more memory than the other ScaLAPACK drivers, large problems cannot be solved; hence, execution times for small and medium-sized problems (rather than medium-sized and large problems) are reported.
Table 5.16: Estimated (Est) versus obtained (Obt) Mflop/s rates of PDGESV and PDPOSV on P nodes of the IBM SP2 computer for matrices of order N and a block size (NB) equal to 50
Table 5.16 shows the estimated versus obtained Mflop/s rates for two ScaLAPACK driver routines that solve linear systems of equations on the IBM Scalable POWERparallel 2 (SP2) computer. The results show that, for these drivers, the estimated execution times are within approximately 35% of the experimental data on the SP2. (The estimated times for the symmetric eigensolvers and SVD codes would not be as accurate.)
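The comparison between estimated and obtained rates can be sketched as follows. This is a minimal illustration, assuming that the deviation is measured relative to the obtained (experimental) value; the helper names and the sample numbers are hypothetical and are not taken from Table 5.16.

```python
def mflops_rate(flop_count, seconds):
    """Convert a flop count and an execution time in seconds to Mflop/s."""
    return flop_count / seconds / 1.0e6


def relative_deviation(estimated, obtained):
    """Deviation of the estimate, as a fraction of the obtained value."""
    return abs(estimated - obtained) / obtained


def within_tolerance(estimated, obtained, tol=0.35):
    """True if the estimate is within `tol` (default 35%) of the obtained value."""
    return relative_deviation(estimated, obtained) <= tol


# Made-up example: an LU-based solve of order n, counting only the
# leading (2/3) n^3 term of the flop count.
n = 2000
flops = (2.0 / 3.0) * n**3
est_rate = mflops_rate(flops, seconds=10.0)   # hypothetical estimated time
obt_rate = mflops_rate(flops, seconds=12.0)   # hypothetical measured time
print(within_tolerance(est_rate, obt_rate))
```

Because the Mflop/s rate is inversely proportional to the execution time, a given relative deviation in the times does not translate into the same relative deviation in the rates, so it is worth stating which quantity a tolerance refers to.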