Let *A* be a square *n*-by-*n* matrix. A scalar is called
an **eigenvalue** and a non-zero column vector *v* the corresponding
**right eigenvector** if . A nonzero column vector *u*
satisfying
is called the **left eigenvector**.
The first basic task
of the routines described in this section
is to compute, for a given matrix *A*, all *n* values of and,
if desired, their associated right eigenvectors *v* and/or
left eigenvectors *u*.

A second basic task is to compute the **Schur factorization** of a matrix *A*.
If *A* is complex, then its Schur factorization is , where
*Z* is unitary and *T* is upper triangular. If *A* is real, its
Schur factorization is , where *Z* is orthogonal.
and *T* is upper quasi-triangular (1-by-1 and 2-by-2 blocks on
its diagonal).
The columns of *Z* are called the **Schur vectors ** of *A*.
The eigenvalues of *A* appear on the diagonal of *T*; complex conjugate
eigenvalues of a real *A* correspond to 2-by-2 blocks on the diagonal
of *T*.

We are implementing two parallel algorithms for computing the Schur factorization. The first is a parallelization of the conventional sequential algorithm, and the second is a novel divide-and-conquer algorithm. We discuss each briefly in turn.

The ScaLAPACK solution has elements in common with LAPACK. For
example, both reduce to upper Hessenberg form initially. Both use an implicit
*QR* algorithm based approach. The *QR* algorithm involves bulge chasing,
and LAPACK uses a single bulge. Parallelism is obtained in ScaLAPACK
by chasing multiple bulges that are staggered far enough apart so that all
nodes have computation to perform (except during pipeline start-up and
wind-down.)
The broadcast of the Householder reflections which is critical to any
*QR* algorithm is blocked independently of the distributed block size.
The application of the Householder reflections are also done in a block
fashion (independent of the above broadcast blocking) to increase data re-use.

Our second algorithm is based on the *sign function* of a
matrix [3, 4, 5]. This algorithm
offers the flexibility of computing just part of the Schur form
(corresponding, say, to the eigenvalues with positive real parts),
uses more highly parallelized building blocks from ScaLAPACK than
the HQR algorithm in the last paragraph (matrix inversion, matrix
multiply and *QR* factorization), but also requires more floating
point operations; it remains to be seen for which problems and on
which machines which algorithm is faster. The sign function
also entails a dynamic load balancing scheme [7] to
implement its divide-and-conquer approach most efficiently.

Thu Jul 25 15:38:00 EDT 1996