We started with a Fortran 77 implementation of Algorithm 1. The code is built on the BLAS and LAPACK, which supply the basic matrix operations such as LU decomposition, triangular inversion, and QR decomposition. We first tested the software on SUN and IBM RS6000 workstations, and then on the CRAY. Some preliminary performance data for the algorithm based on the matrix sign function have been reported in . In this report, we focus on the implementation and performance evaluation of the algorithms on distributed-memory parallel machines, namely the Intel Delta and the CM-5.
We have implemented Algorithm 1 and collected a large set of performance data for the primitive matrix operation subroutines on our target machines. Further performance evaluation and comparison of these two algorithms and their applications are in progress.