
Speed in megaflops of Level 2 and Level 3 BLAS operations on a CRAY
Y-MP
(all matrices are of order 500; U is upper triangular)

Here, A, B and C are matrices, x and y are vectors, and
and
are scalars.


and equating coefficients of the
column, we obtain:

Hence, if
has already been computed, we can compute
and
from the equations:

Here is the body of the code of the LINPACK routine SPOFA, which implements the above method:
DO 30 J = 1, N
INFO = J
S = 0.0E0
JM1 = J - 1
IF (JM1 .LT. 1) GO TO 20
DO 10 K = 1, JM1
T = A(K,J) - SDOT(K-1,A(1,K),1,A(1,J),1)
T = T/A(K,K)
A(K,J) = T
S = S + T*T
10 CONTINUE
20 CONTINUE
S = A(J,J) - S
C ......EXIT
IF (S .LE. 0.0E0) GO TO 40
A(J,J) = SQRT(S)
30 CONTINUE
DO 10 J = 1, N
CALL STRSV( 'Upper', 'Transpose', 'Non-unit', J-1, A, LDA,
$ A(1,J), 1 )
S = A(J,J) - SDOT( J-1, A(1,J), 1, A(1,J), 1 )
IF( S.LE.ZERO ) GO TO 20
A(J,J) = SQRT( S )
10 CONTINUE

To derive a block form of Cholesky factorization, we write the defining equation in partitioned form thus:

Equating submatrices in the second block of columns, we obtain:

Hence, if
has already been computed, we can compute
as
the solution to the equation

by a call to the Level 3 BLAS routine STRSM; and then we can compute
from

Speed in megaflops of Cholesky factorization
for n =
500

DO 10 J = 1, N, NB
JB = MIN( NB, N-J+1 )
CALL STRSM( 'Left', 'Upper', 'Transpose', 'Non-unit', J-1, JB,
$ ONE, A, LDA, A(1,J), LDA )
CALL SSYRK( 'Upper', 'Transpose', JB, J-1, -ONE, A(1,J), LDA,
$ ONE, A(J,J), LDA )
CALL SPOTF2( 'Upper', JB, A(J,J), LDA, INFO )
IF( INFO.NE.0 ) GO TO 20
10 CONTINUE
This code runs at 49 megaflops on a 3090, more than double the speed of the LINPACK code. On a CRAY Y-MP, the use of Level 3 BLAS squeezes a little more performance out of one processor, but makes a large improvement when using all 8 processors.

Speed in megaflops of SGETRF/DGETRF for square matrices of order n

Speed in megaflops of SPOTRF/DPOTRF for matrices of order n with UPLO = `U'

Speed in megaflops of SSYTRF for matrices of order n with UPLO = `U' on a CRAY-2


where v is a column vector and r is a scalar. This leads to an algorithm with very good vector performance, especially if coded to use Level 2 BLAS.

Speed in megaflops of SGEQRF/DGEQRF for square matrices of order n
