CS 594 - Applications of Parallel Computing

Assignment 3

Due February 23rd, 2001



For this assignment, we will be coding an optimized matrix-matrix multiply routine that takes the memory hierarchy into account. For simplicity, we will only require square matrices (handling the non-square case is essential for good library code, but it can be time-consuming). The goals of the assignment are to:


        Write a matrix multiply routine that computes C = C + A*B for square matrices, and

        Get as close to peak performance as possible while still getting the correct results (I would like you to verify that your results are correct).
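
As a starting point, the two goals above can be sketched in C as follows. This is only a minimal sketch under stated assumptions, not a tuned routine: it assumes row-major storage, and the names matmul_naive, matmul_blocked, max_abs_diff, and mflops, along with the block size parameter nb, are hypothetical choices made for illustration. The blocked loop works on nb x nb tiles so that the data being reused has a chance to stay in cache; the naive loop serves as the reference for checking correctness.

```c
#include <stddef.h>

/* Reference triple loop: C = C + A*B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Blocked C = C + A*B. nb is a tuning parameter (assumed to be chosen
   by experiment); edge tiles are clipped when nb does not divide n. */
void matmul_blocked(size_t n, size_t nb,
                    const double *A, const double *B, double *C)
{
    if (nb == 0)
        nb = 1; /* guard against a zero loop step */
    for (size_t ii = 0; ii < n; ii += nb)
        for (size_t kk = 0; kk < n; kk += nb)
            for (size_t jj = 0; jj < n; jj += nb)
                for (size_t i = ii; i < min_sz(ii + nb, n); i++)
                    for (size_t k = kk; k < min_sz(kk + nb, n); k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < min_sz(jj + nb, n); j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

/* Largest absolute elementwise difference between two n x n matrices. */
double max_abs_diff(size_t n, const double *X, const double *Y)
{
    double worst = 0.0;
    for (size_t i = 0; i < n * n; i++) {
        double d = X[i] - Y[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Mflop/s for one n x n multiply-add, counted as 2*n^3 flops. */
double mflops(size_t n, double seconds)
{
    return 2.0 * (double)n * (double)n * (double)n / (seconds * 1.0e6);
}
```

To verify, start both routines from the same A, B, and C and check that max_abs_diff is small. The blocked loop adds terms in a different order than the reference, so for general floating-point data expect agreement to within rounding error rather than bit-for-bit equality.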


Rewrite your matrix multiply using Strassen's method as discussed in class. Use the manufacturer's version of DGEMM to perform the matrix multiplies that Strassen's method needs. Also compare the performance of your version of Strassen's matrix multiply with the ATLAS version. Be sure to include verification that your result is correct.
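
The structure of Strassen's method can be sketched in C as follows. This is a rough sketch under stated assumptions rather than the version discussed in class: it uses the standard seven-product formulation, assumes row-major storage with leading dimensions, and recurses only while n is even and larger than a cutoff. The helper base_gemm is a hypothetical stand-in for the DGEMM call; in your code that is where the manufacturer's DGEMM would go. The names combine, accumulate, product, and cutoff are likewise illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for DGEMM: C = C + A*B on n x n row-major
   views with leading dimensions lda, ldb, ldc. */
void base_gemm(size_t n, const double *A, size_t lda,
               const double *B, size_t ldb, double *C, size_t ldc)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * ldc + j] += A[i * lda + k] * B[k * ldb + j];
}

/* T = X + sign*Y, where X and Y are h x h views and T is packed. */
static void combine(size_t h, const double *X, size_t ldx,
                    const double *Y, size_t ldy, double sign, double *T)
{
    for (size_t i = 0; i < h; i++)
        for (size_t j = 0; j < h; j++)
            T[i * h + j] = X[i * ldx + j] + sign * Y[i * ldy + j];
}

/* Q = Q + sign*M, where Q is an h x h view of C and M is packed. */
static void accumulate(size_t h, double *Q, size_t ldq,
                       const double *M, double sign)
{
    for (size_t i = 0; i < h; i++)
        for (size_t j = 0; j < h; j++)
            Q[i * ldq + j] += sign * M[i * h + j];
}

void strassen(size_t n, const double *A, size_t lda, const double *B,
              size_t ldb, double *C, size_t ldc, size_t cutoff);

/* M = X*Y; M is cleared first because strassen accumulates into it. */
static void product(size_t h, const double *X, size_t ldx,
                    const double *Y, size_t ldy, double *M, size_t cutoff)
{
    memset(M, 0, h * h * sizeof *M);
    strassen(h, X, ldx, Y, ldy, M, h, cutoff);
}

/* C = C + A*B using seven half-size products per level. */
void strassen(size_t n, const double *A, size_t lda, const double *B,
              size_t ldb, double *C, size_t ldc, size_t cutoff)
{
    size_t h = n / 2;
    double *S = NULL;
    if (n > cutoff && n % 2 == 0)
        S = malloc(3 * h * h * sizeof *S);
    if (S == NULL) { /* small, odd, or out of memory: plain multiply */
        base_gemm(n, A, lda, B, ldb, C, ldc);
        return;
    }
    double *T = S + h * h, *M = T + h * h;
    const double *A11 = A, *A12 = A + h, *A21 = A + h * lda, *A22 = A21 + h;
    const double *B11 = B, *B12 = B + h, *B21 = B + h * ldb, *B22 = B21 + h;
    double *C11 = C, *C12 = C + h, *C21 = C + h * ldc, *C22 = C21 + h;

    combine(h, A11, lda, A22, lda, 1.0, S);   /* M1 = (A11+A22)(B11+B22) */
    combine(h, B11, ldb, B22, ldb, 1.0, T);
    product(h, S, h, T, h, M, cutoff);
    accumulate(h, C11, ldc, M, 1.0);  accumulate(h, C22, ldc, M, 1.0);
    combine(h, A21, lda, A22, lda, 1.0, S);   /* M2 = (A21+A22) B11 */
    product(h, S, h, B11, ldb, M, cutoff);
    accumulate(h, C21, ldc, M, 1.0);  accumulate(h, C22, ldc, M, -1.0);
    combine(h, B12, ldb, B22, ldb, -1.0, T);  /* M3 = A11 (B12-B22) */
    product(h, A11, lda, T, h, M, cutoff);
    accumulate(h, C12, ldc, M, 1.0);  accumulate(h, C22, ldc, M, 1.0);
    combine(h, B21, ldb, B11, ldb, -1.0, T);  /* M4 = A22 (B21-B11) */
    product(h, A22, lda, T, h, M, cutoff);
    accumulate(h, C11, ldc, M, 1.0);  accumulate(h, C21, ldc, M, 1.0);
    combine(h, A11, lda, A12, lda, 1.0, S);   /* M5 = (A11+A12) B22 */
    product(h, S, h, B22, ldb, M, cutoff);
    accumulate(h, C11, ldc, M, -1.0); accumulate(h, C12, ldc, M, 1.0);
    combine(h, A21, lda, A11, lda, -1.0, S);  /* M6 = (A21-A11)(B11+B12) */
    combine(h, B11, ldb, B12, ldb, 1.0, T);
    product(h, S, h, T, h, M, cutoff);
    accumulate(h, C22, ldc, M, 1.0);
    combine(h, A12, lda, A22, lda, -1.0, S);  /* M7 = (A12-A22)(B21+B22) */
    combine(h, B21, ldb, B22, ldb, 1.0, T);
    product(h, S, h, T, h, M, cutoff);
    accumulate(h, C11, ldc, M, 1.0);
    free(S);
}
```

For verification, run strassen and base_gemm from identical inputs and compare the two results elementwise. Strassen's method reorders the arithmetic, so on general floating-point data the results should agree to within a rounding-error tolerance rather than exactly. The cutoff is a tuning knob: below some size the extra additions cost more than the multiply they save, so the best value is something to find by timing.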



1.     J. Dongarra, P. Mayes, G. Radicati, The IBM RISC System/6000 and Linear Algebra Operations, UT, CS-90-122, December 1990. http://www.netlib.org/lapack/lawns/lawn28.ps


2.     R. Whaley, A. Petitet, J. Dongarra, Automated Empirical Optimization of Software and the ATLAS Project, http://www.netlib.org/utk/people/JackDongarra/PAPERS/atlas_pub.pdf