## This implementation of SUMMA was first suggested by:

- R. Agarwal, F. Gustavson, and M. Zubair, A High Performance Matrix Multiplication Algorithm on a Distributed-Memory Parallel Computer, Using Overlapped Communication, IBM Journal of Research and Development, Vol. 38, No. 6, pp. 673–681, 1994.

## For a scalability analysis of this algorithm, see:

- R. van de Geijn and J. Watts, SUMMA: Scalable Universal Matrix Multiplication Algorithm, UT Tech Report CS-95-286, LAPACK Working Note #96, 1995.
