next up previous
Next: About this document Up: The Spectral Decomposition of Previous: Acknowledgements


E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide, Release 1.0. SIAM, Philadelphia, 1992. 235 pages.

E. Anderson, Z. Bai, and J. Dongarra. Generalized QR factorization and its applictions. Lin. Alg. Appl., 162-164:243-271, 1992.

L. Auslander and A. Tsao. On parallelizable eigensolvers. Advances in Applied Mathematics, 13:253-261, 1992.

Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part II. in preparation.

Z. Bai and J. Demmel. On a block implementation of Hessenberg multishift QR iteration. International Journal of High Speed Computing, 1(1):97-112, 1989. (also LAPACK Working Note #8).

Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox, Part I. In Proceedings of the Sixth SIAM Conference on Parallel Proceesing for Scientific Computing. SIAM, 1993. Long version available as UC Berkeley Computer Science report via anonymous ftp from, directory pub/tech-reports/csd/csd-92-718.

Z. Bai, J. Demmel, and M. Gu. Inverse free parallel spectral divide and conquer algorithms for nonsymmetric eigenproblems. Computer Science Division Report UCB//CSD-94-793, UC Berkeley, February 1994. available by anonymous ftp to in directory pub/tech-reports/csd/csd-94-79.

A. N. Beavers and E. D. Denman. A computational method for eigenvalue and eigenvectors of a matrix with real eigenvalues. Numer. Math., 21:389-396, 1973.

C. Bischof and X. Sun. A divide and conquer method for tridiagonalizing symmetric matrices with repeated eigenvalues. MCS Report P286-0192, Argonne National Lab, 1992.

A. Ya. Bulgakov and S. K. Godunov. Circular dichotomy of the spectrum of a matrix. Siberian Math. J., 29(5):734-744, 1988.

R. Byers. Numerical stability and instability in matrix sign function based algorithms. In C. Byrnes and A. Lindquist, editors, Computational and Combinatorial Methods in Systems Theory, pages 185-200. North-Holland, 1986.

R. Byers. Solving the algebraic Riccati equation with the matrix sign function. Lin. Alg. Appl., 85:267-279, 1987.

S. Chakrabarti, J. Demmel, and K. Yelick. On the benefit of mixed parallelism. Computer science division, University of California, 1994. submitted to IPPS.

T. Chan. Rank revealing QR factorizations. Lin. Alg. Appl., 88/89:67-82, 1987.

J. Choi, J. Dongarra, and D. Walker. PB-BLAS: A set of Parallel Block Basic Linear Algebra Subprograms. University of Tennessee, Knoxville, TN, 1993. available in postscript from netlib/scalapack.

J. Choi, J. Dongarra, and D. Walker. PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers. Computer Science Dept. Technical Report CS-93-187, University of Tennessee, Knoxville, 1993. (LAPACK Working Note #57).

M. Chu. A note on the homotopy method for linear algebraic eigenvalue problems. Lin. Alg. Appl, 105:225-236, 1988.

T. Coleman and P. Plassman. A parallel nonlinear least-squares solver: Theoretical analysis and numerical results. SIAM J. Sci. Comput., 13(3):771-793, 1992.

J. Demmel. Trading off parallelism and numerical stability. In G. Golub M. Moonen and B. de Moor, editors, Linear Algebra for Large Scale and Real-Time Applications, pages 49-68. Kluwer Academic Publishers, 1993. NATO-ASI Series E: Applied Sciences, Vol. 232; Available as via anonymous ftp from, in directory pub/tech-reports/csd/csd-92-702.

J. Demmel, M. Heath, and H. van der Vorst. Parallel numerical linear algebra. In A. Iserles, editor, Acta Numerica, volume 2. Cambridge University Press, 1993.

J. Dongarra. Performance of various computers using standard linear equations software. Computer science dept. technical report, University of Tennessee, Knoxville, TN, January 1994. up-to-date version available in netlib/benchmark.

J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling. A set of Level 3 Basic Linear Algebra Subprograms. ACM Trans. Math. Soft., 16(1):1-17, March 1990.

J. Dongarra, J. Du Croz, S. Hammarling, and Richard J. Hanson. An Extended Set of FORTRAN Basic Linear Algebra Subroutines. ACM Trans. Math. Soft., 14(1):1-17, March 1988.

J. Dongarra and M. Sidani. A parallel algorithm for the non-symmetric eigenvalue problem. SIAM J. Sci. Comp., 14(3):542-569, May 1993.

J. Dongarra, R. van de Geijn, and C. Whaley. A Users' Guide to the BLACS. University of Tennessee, Knoxville, TN, 1993. available in postscript from netlib/scalapack.

J. Dongarra and D. Walker. The design of linear algebra libraries for high performance computers. Computer Science Dept. Technical Report CS-93-188, University of Tennessee, Knoxville, June 1993. (LAPACK Working Note #58).

A. Dubrulle. The multishift QR algorithm: is it worth the trouble? Palo Alto Scientific Center Report G320-3558x, IBM Corp., 1530 Page Mill Road, Palo Alto, CA 94304, 1991.

P. Eberlein. On the Schur decomposition of a matrix for parallel computation. IEEE Trans. Comput., 36:167-174, 1987.

G. A. Geist and G. J. Davis. Finding eigenvalues and eigenvectors of unsymmetric matrices using a distributed memory multiprocessor. Parallel Computing, 13(2):199-209, 1990.

S. K. Godunov. Problem of the dichotomy of the spectrum of a matrix. Siberian Math. J., 27(5):649-660, 1986.

M. Gu and S. Eisenstat. An efficient algorithm for computing a rank-revealing QR decomposition. Computer Science Dept. Report YALEU/DCS/RR-967, Yale University, June 1993.

G. Henry and R. van de Geijn. Parallelizing the QR algorithm for the unsymmetric algebraic eigenvalue problem: myths and reality. Computer Science Dept. Technical Report CS-94-244, University of Tennessee, Knoxville, August 1994. (LAPACK Working Note #79).

N. J. Higham. Computing the polar decomposition - with applications. SIAM J. Sci. Stat. Comput., 7:1160-1174, 1986.

P. Hong and C. T. Pan. The rank revealing QR and SVD. Math. Comp., 58:575-232, 1992.

J. Howland. The sign matrix and the separation of matrix eigenvalues. Lin. Alg. Appl., 49:221-232, 1983.

S. Huss-Lederman, A. Tsao, and G. Zhang. A parallel implementation of the invariant subspace decomposition algorithm for dense symmetric matrices. In Proceedings of the Sixth SIAM Conference on Parallel Proceesing for Scientific Computing. SIAM, 1993.

C. Kenney and A. Laub. Rational iteration methods for the matrix sign function. SIAM J. Mat. Anal. Appl., 21:487-494, 1991.

C. Lawson, R. Hanson, D. Kincaid, and F. Krogh. Basic Linear Algebra Subprograms for Fortran usage. ACM Trans. Math. Soft., 5:308-323, 1979.

T.-Y. Li and Z. Zeng. Homotopy-determinant algorithm for solving nonsymmetric eigenvalue problems. Math. Comp., 59(200):483-502, October 1992.

T.-Y. Li, Z. Zeng, and L. Cong. Solving eigenvalue problems of nonsymmetric matrices with real homotopies. SIAM J. Num. Anal., 29(1):229-248, 1992.

S. Lillevik. The Touchstone 30 Gigaflop DELTA prototype. In Sixth Distributed Memory Computing Conference Proceedings. IEEE Computer Society Press, 1991.

C-C. Lin and E. Zmijewski. A parallel algorithm for computing the eigenvalues of an unsymmetric matrix on an SIMD mesh of processors. Department of Computer Science TRCS 91-15, University of California, Santa Barbara, CA, July 1991.

A. Littlefield. Characterizing and tuning communication performance on the Touchstone DELTA and iPSC/860. In Proceedings of the 1992 Intel User's Group Meeting, Dallas, TX, October 4-7 1992. Intel Corp.

A. N. Malyshev. Parallel algorithm for solving some spectral problems of linear algebra. Lin. Alg. Appl., 188,189:489-520, 1993.

C. Paige. Some aspects of generalized QR factorization. In M. Cox and S. Hammarling, editors, Reliable Numerical Computations. Clarendon Press, Oxford, 1990.

J. Roberts. Linear model reduction and solution of the algebraic Riccati equation. Inter. J. Control, 32:677-687, 1980.

A. Sameh. On Jacobi and Jacobi-like algorithms for a parallel computer. Math. Comp., 25:579-590, 1971.

G. Shroff. A parallel algorithm for the eigenvalues and eigenvectors of a general complex matrix. Num. Math., 58:779-805, 1991.

G. W. Stewart. A Jacobi-like algorithm for computing the Schur decomposition of a non-Hermitian matrix. SIAM J. Sci. Stat. Comput., 6:853-864, 1985.

G. W. Stewart. A parallel implementation of the QR algorithm. Parallel Computing, 5:187-196, 1987.

G. W. Stewart. Updating a rank-revealing ULV decomposition. SIAM J. Mat. Anal. Appl., 14(2):494-499, April 1993.

Thinking Machines Corporation. CM Fortran Reference Manual, December 1992.

Thinking Machines Corporation. The Connection Machine CM-5 Technical Summary, 1992.

Thinking Machines Corporation. CMSSL for CM Fortran: CM-5 Edition, version 3.1, 1993.

R. van de Geijn. Implementing the QR Algorithm on an Array of Processors. PhD thesis, University of Maryland, College Park, August 1987. Computer Science Department Report TR-1897.

R. van de Geijn and D. Hudson. Efficient parallel implementation of the nonsymmetric QR algorithm. In J. Gustafson, editor, Hypercube Concurrent Computers and Applications. ACM, 1989.

D. Watkins. Shifting strategies for the parallel QR algorithm. Dept. of pure and applied math. report, Washington State Univ., Pullman, WA, 1992.

R. Clint Whaley. Basic linear algebra communication subroutines: Analysis and implementation across multiple parallel architectures. Technical report, University of Tennessee, Knoxville, June 1994. (LAPACK Working Note #73).

Appendix: Performance data

In this appendix, we record performance data of PUMMA (version 1.0) for matrix multiplication and ScaLAPACK 1.0 (beta version) matrix inversion (LU factorization plus triangular matrix inversion), QR decomposition with and without column pivoting on Intel Touchstone Delta system and performance data of CMSSL 3.2 subroutines for matrix multiplication, matrix inversion, QR decomposition with and without column pivoting on 32 PEs VUs CM-5. Parts of data are used to draw Figures 2 and 4.

Table 10: Performance of matrix inversion (LU + Triangular inversion), blocksize=30. 

Table 9: Performance of PUMMA (version 1.0) matrix multiplication, use BLACS, blocksize=10  

Table 12: Performance of QR decomposition with column pivoting.  

Table 11: Performance of QR decomposition method for solving the least squares problem (QR decomposition + Triangular solver), blocksize=6. 

Table 14: Performance of matrix multiplication and matrix inversion, and QRP, CMSSL version 3.2  

Table 13: Backward accuracy, timing in seconds and megaflops of the SDC algorithm with Newton iteration on a 32 PEs with VUs CM-5.  

Jack Dongarra
Mon Jan 9 11:07:50 EST 1995