**CS 594 - Applications of Parallel Computing**

**Assignment 3**

**Due February 23rd, 2001**

For this assignment, we will code an optimized matrix-matrix multiply routine that takes the memory hierarchy into account. For simplicity, we require only square matrices (handling the non-square case, essential for good library code, can be a bit time-consuming). The goals of the assignment are to:

· Write a matrix routine C = C + A*B for square matrices, and

· Get as close to peak performance as possible, while still getting the correct results (I would like you to verify that you are getting the correct results).
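As an illustration of the two goals above, here is a minimal sketch of a blocked (tiled) update C = C + A*B, checked against a plain triple loop. It is written in Python only to show the loop structure, so it says nothing about performance. The function names and the block size `bs` are hypothetical choices, not part of the assignment; a real implementation would be in a compiled language, with the block size tuned to the cache of the target machine.

```python
import random

def naive_matmul_add(A, B, C):
    """Reference triple loop: C = C + A*B for square n x n lists of lists."""
    n = len(A)
    for i in range(n):
        for j in range(n):
            s = C[i][j]
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def blocked_matmul_add(A, B, C, bs=32):
    """Tiled update C = C + A*B that works on bs x bs blocks.

    bs is a hypothetical tuning parameter: the idea is that the blocks of
    A, B, and C being touched fit together in a fast level of memory.
    """
    n = len(A)
    for ii in range(0, n, bs):
        i_end = min(ii + bs, n)          # handle n not divisible by bs
        for kk in range(0, n, bs):
            k_end = min(kk + bs, n)
            for jj in range(0, n, bs):
                j_end = min(jj + bs, n)
                # Multiply block A[ii:i_end, kk:k_end] by B[kk:k_end, jj:j_end]
                # and accumulate into C[ii:i_end, jj:j_end].
                for i in range(ii, i_end):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, k_end):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, j_end):
                            Ci[j] += a * Bk[j]
    return C

def max_abs_diff(X, Y):
    """Largest elementwise difference, used to verify the result."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def random_matrix(n, rng):
    """n x n matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
```

One way to verify the result is to run both routines on copies of the same random inputs and check that `max_abs_diff` is at the level of rounding error.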

Rewrite your matrix multiply using Strassen's method as discussed in class. Use the manufacturer's version of DGEMM to perform the matrix multiplies you will need. Also compare the performance of your version of Strassen's matrix multiply with the ATLAS version. Be sure to include verification that you have the correct result.
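For orientation, the following is a minimal sketch of the standard seven-product formulation of Strassen's method, which may differ in detail from the variant presented in class. It assumes square matrices stored as lists of lists. The helper `base_multiply` is a hypothetical stand-in for the call to DGEMM, and `cutoff` is an assumed tuning parameter below which the recursion stops. The routine returns the product A*B; adding it to C is left to the caller.

```python
def add(X, Y):
    """Elementwise X + Y."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    """Elementwise X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def base_multiply(A, B):
    """Plain product A*B; a hypothetical stand-in for a DGEMM call."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a = A[i][k]
            for j in range(n):
                C[i][j] += a * B[k][j]
    return C

def split(M):
    """Split an even-sized square matrix into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B, cutoff=64):
    """Return A*B using seven recursive products per level.

    Falls back to base_multiply when the matrix is small (n <= cutoff)
    or has odd size; cutoff is an assumed tuning parameter.
    """
    n = len(A)
    if n <= cutoff or n % 2 != 0:
        return base_multiply(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)   # M1 + M4 - M5 + M7
    C12 = add(M3, M5)                     # M3 + M5
    C21 = add(M2, M4)                     # M2 + M4
    C22 = add(add(sub(M1, M2), M3), M6)   # M1 - M2 + M3 + M6
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

Because Strassen's method reorders the arithmetic, its result generally differs from the conventional product by rounding error, so verification should compare against a reference within a tolerance rather than test for exact equality.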

Reading:

1. J. Dongarra, P. Mayes, and G. Radicati, The IBM RISC System/6000 and Linear Algebra Operations, UT, CS-90-122, December 1990. http://www.netlib.org/lapack/lawns/lawn28.ps

2. R. Whaley, A. Petitet, and J. Dongarra, Automated Empirical Optimization of Software and the ATLAS Project. http://www.netlib.org/utk/people/JackDongarra/PAPERS/atlas_pub.pdf