
Performance of Selected PBLAS routines


The performance of the Level 2 PBLAS routines depends on the performance of the Level 2 BLAS routines, which in turn depends on the bulk transfer rate from main memory.

Table 5.6: Speed in Mflop/s for the PBLAS matrix-vector multiply routine PSGEMV/PDGEMV

Table 5.6 shows execution rates for the 64-bit matrix-vector multiply PBLAS routine PSGEMV/PDGEMV. The rates listed are for a matrix-vector product of the form $y \leftarrow \alpha A x + \beta y$, where A is a square matrix of order N and x and y are vectors that are both distributed over a process column.
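As a rough illustration of how an execution rate in Mflop/s can be derived from a timing, the sketch below converts a measured wall-clock time into a rate. It assumes the conventional leading-order operation count of 2N^2 flops for a matrix-vector product with a square matrix of order N. The function name `gemv_mflops` and the sample timing are hypothetical and are not taken from the measurements reported here.

```python
def gemv_mflops(n: int, seconds: float) -> float:
    """Return the execution rate in Mflop/s for a matrix-vector
    product with a square matrix of order n.

    Assumption: 2*n*n flops (one multiply and one add per matrix
    element); lower-order terms are ignored.
    """
    if seconds <= 0.0:
        raise ValueError("elapsed time must be positive")
    flops = 2.0 * n * n
    return flops / seconds / 1.0e6


if __name__ == "__main__":
    # Hypothetical timing, for illustration only.
    print(gemv_mflops(2000, 0.05))
```

The same conversion applies to any routine once its operation count is known; only the flop formula changes.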

The Level 3 PBLAS are not necessarily limited by memory bandwidth because they perform many flops for each word of data involved. Their flop rates are correspondingly higher.

Table 5.7: Speed in Mflop/s for the PBLAS matrix-matrix multiply routine PSGEMM/PDGEMM

Table 5.7 shows the performance results obtained by the general matrix-matrix multiply PBLAS routine PSGEMM/PDGEMM. These results were obtained for a matrix-matrix multiply operation of the form $C \leftarrow \alpha A B + \beta C$, where A, B, and C are square matrices of order N.
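The contrast between the memory-bound Level 2 routines and the Level 3 routines can be made concrete by counting flops per word of operand data. The sketch below is an idealized estimate only: it assumes each operand is touched exactly once, uses the conventional leading-order counts (2N^2 flops for the matrix-vector product, 2N^3 for the matrix-matrix product), and ignores data distribution and communication. The function name `flops_per_word` is hypothetical.

```python
def flops_per_word(n: int) -> dict:
    """Idealized flops per word of operand data for square problems
    of order n. Assumes every operand is read once; lower-order
    terms in the flop counts are ignored."""
    # Matrix-vector product: operands are A (n*n words), x and y (n each).
    gemv_flops = 2 * n * n
    gemv_words = n * n + 2 * n
    # Matrix-matrix product: operands are A, B, and C (n*n words each).
    gemm_flops = 2 * n ** 3
    gemm_words = 3 * n * n
    return {
        "gemv": gemv_flops / gemv_words,
        "gemm": gemm_flops / gemm_words,
    }


if __name__ == "__main__":
    for order in (100, 1000, 4000):
        print(order, flops_per_word(order))
```

Under these assumptions the matrix-vector ratio stays just below 2 flops per word regardless of N, while the matrix-matrix ratio grows as 2N/3, which is why the Level 3 routines can reuse data held in faster levels of memory.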

Susan Blackford
Tue May 13 09:21:01 EDT 1997