
Introduction

In this chapter we discuss the non-Hermitian eigenvalue problem (NHEP)
\begin{displaymath}
Ax = \lambda x, \qquad\qquad (117)
\end{displaymath}

where the square matrix $A \neq A^{\ast}$. A vector $x \neq 0$ satisfying (117) is called a right eigenvector. A vector $y \neq 0$ satisfying
\begin{displaymath}
y^{\ast} A = \lambda y^{\ast} \qquad\qquad (118)
\end{displaymath}

is called a left eigenvector of $A$.

For a presentation of relevant theory for NHEPs, as well as pointers to the literature, we refer to §2.5. In particular, we mention that the perturbation theory for these problems is delicate. One should keep in mind that a small norm of the residual $A\widehat{x}-\widehat{\lambda}\widehat{x}$ for a computed eigenpair $\widehat{\lambda}, \widehat{x}$ does not necessarily imply small errors in the computed values. For more information, see §2.5 and §7.13. Here we give a brief introduction to those aspects that play a role in the selection of the algorithms. This is followed by short characterizations of the different classes of methods covered in this chapter.
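
The following small illustration (not taken from the text; the $2\times 2$ matrix and the values are chosen only for demonstration) shows how a small residual can coexist with a much larger eigenvalue error for a non-normal matrix:

\begin{verbatim}
import numpy as np

# Illustrative 2x2 non-normal matrix: for A = [[0, 1], [eps, 0]] the candidate
# pair (lambda_hat, x_hat) = (0, e_1) has residual norm eps, while the true
# eigenvalues are +/- sqrt(eps); the eigenvalue error sqrt(eps) is then much
# larger than the residual for small eps.
eps = 1e-10
A = np.array([[0.0, 1.0],
              [eps, 0.0]])

x_hat = np.array([1.0, 0.0])   # candidate (approximate) eigenvector
lam_hat = 0.0                  # candidate (approximate) eigenvalue

residual = np.linalg.norm(A @ x_hat - lam_hat * x_hat)   # = 1e-10
error = np.min(np.abs(np.linalg.eigvals(A) - lam_hat))   # approx 1e-5

print(f"residual = {residual:.1e}, eigenvalue error = {error:.1e}")
\end{verbatim}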

In contrast to a Hermitian matrix, a non-Hermitian matrix does not in general have an orthogonal set of eigenvectors; in other words, a non-Hermitian matrix $A$ cannot in general be transformed by an orthogonal matrix $Q$ to diagonal form $D=Q^\ast A Q$. Most non-Hermitian matrices can be transformed by a nonorthogonal $X$ to diagonal form $D=X^{-1}AX$, but there exist matrices for which even this is not possible. Such matrices can be viewed as limiting cases in which $X$ converges to a singular operator, and these matrices do not have a complete set of eigenvectors; they are called defective. This reveals a source of numerical instability: if $X$ is in some sense close to a singular operator, then the transformation may be expected to be sensitive to perturbations in $A$. Such perturbations are introduced by the process used to compute $X$ and $D$. Therefore, it is of great importance to work with orthogonal, or nearly orthogonal, transformations as often as possible. It is well known that any non-Hermitian matrix $A$ can be transformed by a unitary matrix $U$ to upper triangular form

\begin{displaymath}
T=U^\ast A U. \qquad\qquad (119)
\end{displaymath}

The matrix $T$ is called the Schur form of $A$. The eigenvalues of $A$ appear along the diagonal of $T$. Furthermore, $U$ can be further transformed by a unitary matrix so that the eigenvalues of $A$ occur in any prescribed order along the diagonal of $T$. If $z$ is an eigenvector of $T$ corresponding to the eigenvalue $\lambda$, $Tz=\lambda z$, it follows that

\begin{displaymath}
A(Uz)=\lambda (Uz),
\end{displaymath}

so that $Uz$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$.

Reduction to Schur form can be done, but it has its price. Let $n$ be the order of a non-Hermitian matrix $A$. For $n>4$, there is in general no finite algorithm that reduces $A$ to Schur form (if we exclude trivial cases, such as where $A$ is already upper triangular). The usual approach for matrices of modest order, say $n\le 1000$, is to reduce $A$ first to upper Hessenberg form, which can be done in a finite number of steps. Subsequently, this upper Hessenberg matrix is reduced in an iterative manner (QR iterations) to Schur form. See §7.3 for more details and for pointers to software.
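
As a minimal sketch of this dense-matrix route (using SciPy's hessenberg and schur routines on a small random test matrix; the matrix and its size are only illustrative), one can compute the complex Schur form and recover an eigenvector of $A$ as $Uz$:

\begin{verbatim}
import numpy as np
from scipy.linalg import hessenberg, schur

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))      # small dense non-Hermitian test matrix

# Finite first step: reduction to upper Hessenberg form, A = Q H Q^*.
H, Q = hessenberg(A, calc_q=True)

# Iterative second step (QR iterations, performed inside LAPACK): complex
# Schur form A = U T U^*, with the eigenvalues of A on the diagonal of T.
T, U = schur(A, output='complex')

# If T z = lambda z, then A (U z) = lambda (U z).
w, Z = np.linalg.eig(T)              # eigenpairs of the triangular matrix T
lam, z = w[0], Z[:, 0]
x = U @ z                            # eigenvector of A for eigenvalue lam
print(np.linalg.norm(A @ x - lam * x))   # small residual
\end{verbatim}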

The main problem with this standard approach is that the computational cost is proportional to $n^3$ (hence the limitation to matrices of order a few thousand at most) and it requires storage proportional to $n^2$ elements. In general it is not possible to exploit sparsity of the matrix $A$ in order to reduce the storage requirements. For these reasons alternatives have been proposed. These alternatives are all completely iterative; that is, there is no finite direct reduction step. They are usually only suitable for the computation of a handful of interesting eigenpairs. We now briefly summarize the iterative methods presented in this chapter.

Single- and multiple-vector iterations,
§7.4. The largest eigenvalue in absolute value, and the corresponding eigenvector, of a large matrix can be computed iteratively by the power iteration. This is only advisable if this largest eigenvalue is significantly larger, in a relative sense, than the absolute values of the other eigenvalues. In order to create a matrix with a well-separated largest eigenvalue, one usually works formally with the operator $(A - \sigma I)^{-1}$ for a user-defined $\sigma$ close to the desired eigenvalue. This is referred to as inverse iteration. The method has been generalized to a block method for a set of eigenvalues that is well separated from the remaining part of the spectrum. These methods, members of the family of single- and multiple-vector iterations, work well when the desired eigenvalues are well separated from the unwanted ones. Even then, however, the methods can hardly compete with more modern subspace methods, and the only niche for their usage seems to be when one cannot afford storage for more than two iteration vectors.
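
A minimal sketch of both variants is given below (the function names are ours; the shifted matrix is factored once with a dense LU factorization, and a real, well-separated dominant eigenvalue is assumed so that the simple Rayleigh quotient estimate applies):

\begin{verbatim}
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def power_iteration(A, num_iters=200, seed=0):
    # Converges to the eigenvalue of largest absolute value, provided it is
    # well separated (in a relative sense) from the other eigenvalues and
    # real; only a couple of vectors of storage are needed.
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    for _ in range(num_iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ (A @ x), x          # Rayleigh quotient and eigenvector estimate

def inverse_iteration(A, sigma, num_iters=50, seed=0):
    # Power iteration applied to (A - sigma I)^{-1}: converges to the
    # eigenvalue of A closest to the shift sigma.  Factor once, solve often.
    n = A.shape[0]
    lu, piv = lu_factor(A - sigma * np.eye(n))
    x = np.random.default_rng(seed).standard_normal(n)
    for _ in range(num_iters):
        x = lu_solve((lu, piv), x)
        x /= np.linalg.norm(x)
    return x @ (A @ x), x          # estimate of the eigenvalue nearest sigma
\end{verbatim}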

Arnoldi methods,
§7.5, §7.6, and §7.7. Modern subspace iteration methods attempt to build a subspace that is rich in the eigenvectors corresponding to the wanted eigenvalues. In fact, they can be interpreted as methods that retain a number of the vectors generated in the power method. With respect to this basis for a subspace of restricted dimension, the matrix is usually reduced to a more manageable form. In the Arnoldi method one starts with an initial guess $v$ and generates an orthonormal basis for a Krylov subspace of dimension $m$:

\begin{displaymath}
{\cal K}^m(A;v) \equiv \mbox{span}\{v, Av, A^2v, \ldots , A^{m-1}v \}.
\end{displaymath}

(As for the power method, one may also work with the shifted and inverted operator $(A - \sigma I)^{-1}$.)

The orthonormalization process leads to an $m$ by $m$ upper Hessenberg matrix $H_m$ (the approach is only practical for $m\ll n$), and this matrix can be interpreted as a suitably reduced form of the matrix $A$ restricted to the Krylov subspace. The eigenvalues of $H_m$ are approximations to some of the eigenvalues of $A$, and they can be computed by reducing $H_m$ to Schur form with the same standard method as discussed above for small dense matrices. An introduction to the Arnoldi method is given in §7.5. Clever restart techniques have been designed to keep memory requirements and computational overhead as low as possible; these techniques are known as implicitly restarted Arnoldi methods (IRAMs) and are discussed in §7.6. The computational overhead in the Arnoldi method does not lend itself well to parallelism, in particular not on some distributed-memory computers. In order to create a more favorable ratio between computation and memory references, block variants have been suggested. These block Arnoldi variants are discussed in §7.7.
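
A bare-bones Arnoldi sketch (no restarting, no deflation; the function name, the breakdown tolerance, and the use of modified Gram-Schmidt without reorthogonalization are our simplifications) looks as follows:

\begin{verbatim}
import numpy as np

def arnoldi(A, v, m):
    # Build an orthonormal basis V of the Krylov subspace K^m(A; v) together
    # with the (m+1)-by-m upper Hessenberg matrix H satisfying A V[:, :m] = V H.
    n = A.shape[0]
    V = np.zeros((n, m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                  # modified Gram-Schmidt
            H[i, j] = np.vdot(V[:, i], w)
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if abs(H[j + 1, j]) < 1e-12:            # invariant subspace found
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

# The eigenvalues of the leading m-by-m block of H (the Ritz values)
# approximate some, typically exterior, eigenvalues of A:
#   V, H = arnoldi(A, v0, m)
#   ritz_values = np.linalg.eigvals(H[:m, :m])
\end{verbatim}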

Lanczos methods,
§7.8, §7.9, §7.10, and §7.11. The Arnoldi methods work with subspaces, and one has to store the vectors that represent a basis for the current subspace. Moreover, adding a new vector to the basis involves the orthogonalization of this new vector with respect to all the vectors of the basis. For symmetric $A$ there is a simple three-term recurrence that generates a basis for the subspace, and if one is interested in the eigenvalues only, then only the last three basis vectors have to be stored. A variant of this Lanczos process is also available for unsymmetric matrices, but then one has to give up the orthogonality of the basis vectors. Hence economy in memory and computational overhead comes at the risk of instability. The original unsymmetric Lanczos method, or simply two-sided Lanczos, also suffers from (near) breakdowns. Despite these drawbacks, the method has received much attention, and variants have been suggested that have proven to be efficient and useful for relevant problems. The method, and several enhancements to it, are described in §7.8, along with appropriate software.
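
For reference, one common formulation of the two coupled three-term recurrences of the two-sided Lanczos process is (scaling and normalization conventions differ between implementations; §7.8 gives the precise variant used there)

\begin{eqnarray*}
\gamma_{j+1}\, v_{j+1} & = & A v_j - \alpha_j v_j - \beta_j v_{j-1}, \\
\overline{\beta}_{j+1}\, w_{j+1} & = & A^\ast w_j - \overline{\alpha}_j w_j - \overline{\gamma}_j w_{j-1},
\end{eqnarray*}

with $\alpha_j = w_j^\ast A v_j$ and the two sequences kept biorthogonal, $w_i^\ast v_j = \delta_{ij}$. A (near) breakdown occurs when the unnormalized new vectors satisfy $\widetilde{w}_{j+1}^\ast \widetilde{v}_{j+1} \approx 0$ while neither vector is (nearly) zero, so that the biorthogonality normalization cannot be carried out.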

A block version of the Lanczos method, and an implementation of it called ABLE, are presented in §7.9. ABLE is an adaptively blocked method designed to cure breakdown, and it attempts to eliminate the causes of slow convergence (such as a loss of biorthogonality and clusters of nearby eigenvalues). Deflation within Krylov subspaces is nicely incorporated in yet another variant, the band Lanczos method, presented in §7.10. This variant is specially tailored for the reduced-order modeling of multiple-input, multiple-output linear dynamical systems of high dimension.

A variant of the two-sided Lanczos method for complex symmetric systems is discussed in §7.11. Compared with the general unsymmetric case, this variant reduces CPU time and computer storage by a factor of 2.

Jacobi-Davidson method,
§7.12. The Arnoldi and Lanczos methods are most effective when applied with shift-and-invert; that is, systems of the form $(A-\sigma I)x=b$ have to be solved efficiently and accurately. If this is not practical, but some reasonable approximation for $x=(A-\sigma I)^{-1}b$ is cheaply available, then the Jacobi-Davidson methods can prove useful. These methods work with orthogonal transformations, just as the Arnoldi methods do, but in the construction of an orthonormal basis for the subspace involved there is no attempt to create a special structure for the reduced matrix. Therefore, the computational overhead is larger than for Arnoldi. All the possible gain has to come from an effective and cheap approximation for the shift-and-invert step.
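
The central ingredient of Jacobi-Davidson, stated here only in its basic form (see §7.12 for the precise algorithms), is the correction equation for an approximate eigenpair $(\theta, u)$ with $\|u\|_2=1$ and residual $r = Au - \theta u$:

\begin{displaymath}
(I - u u^\ast)(A - \theta I)(I - u u^\ast)\, t = -r, \qquad t \perp u .
\end{displaymath}

The expansion vector $t$ need only be computed approximately, for instance with a few steps of a preconditioned iterative solver, and this is precisely where a cheap approximation for the shift-and-invert step enters.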

In Table 7.1 we present a summary of the various classes of methods. The table is not intended to replace further reading; it serves only as an indication of where to start and it may help the reader to follow a certain path in discovering which method serves which need best. The actual performance of each method depends, often critically, on many parameters, and one method that is best for one class of problems may encounter convergence problems for some particular matrices.



Table 7.1: Summary of algorithms for NHEPs

Algorithm    Mode                    FL-$\lambda$   $\lambda$-ext   $\lambda$-int   Memory/overhead
Power        $A$                     Slow           No              No              Low
             $(A - \sigma I)^{-1}$   Yes            No              No              Low
Subspace     $A$                     Yes            Yes             No              Modest
             $(A - \sigma I)^{-1}$   Yes            Yes             Yes             Modest
Arnoldi      $A$                     Yes            Yes             No              Modest
(IRAM)       $(A - \sigma I)^{-1}$   Yes            Yes             Yes             Modest
Lanczos      $A$                     Yes            Yes             No              Low
             $(A - \sigma I)^{-1}$   Yes            Yes             Yes             Low
Block Lan.   $A$                     Yes            Yes             No              Modest
             $(A - \sigma I)^{-1}$   Yes            Yes             Yes             Modest
Jac.-Dav.    $A$                     Yes            Yes             Slow            Modest
             $(A - \sigma I)^{-1}$   Yes            Yes             Yes             Modest
             Precond                 Yes            Yes             Yes             Modest

Explanation of Table 7.1

FL-$\lambda$:
A few well-isolated largest (in absolute value) eigenvalues.
$\lambda$-ext:
(A group of) eigenvalues at the exterior of the spectrum.
$\lambda$-int:
(A group of) eigenvalues in the interior of the spectrum, near a specified target.
$A$:
Operations with (shifted) $A$.
$(A - \sigma I)^{-1}$:
With shift-and-invert operations; that is, one needs an efficient solution mechanism for determining $x$ from $(A-\sigma I)x=b$ ($\sigma$ and $b$ depend on the method); see the sketch following this list.
Precond:
Operations with a preconditioner for $A-\sigma I$; that is, inexact solution of $(A-\sigma I)x=b$ is permitted, which gives possibilities for more efficiency (the success depends of course on the quality of the preconditioner).
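
As indicated in the explanation of the $(A - \sigma I)^{-1}$ mode above, one common way to provide shift-and-invert for a large sparse matrix is to compute a single sparse LU factorization of $A-\sigma I$ and reuse it for every solve. The sketch below uses SciPy's sparse machinery on a random test matrix (the matrix, its size, and the shift are only illustrative):

\begin{verbatim}
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, sigma = 1000, 2.5                              # illustrative size and shift
A = sp.random(n, n, density=5.0 / n, format='csc',
              random_state=0) + sp.eye(n, format='csc')

# Factor A - sigma*I once (sparse LU); every request for
# x = (A - sigma I)^{-1} b then reduces to two triangular solves.
lu = spla.splu((A - sigma * sp.eye(n, format='csc')).tocsc())

shift_invert = spla.LinearOperator((n, n), matvec=lu.solve)

# shift_invert can now be passed, instead of A, to the power method or to a
# Krylov eigensolver; eigenvalues mu of (A - sigma I)^{-1} map back to
# eigenvalues of A via lambda = sigma + 1/mu.
\end{verbatim}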


Table: Comparison of algorithms for non-Hermitian eigenvalue problems

Algorithm     Mode   WIL         E-ext       E-int       Memory
Power it.     Dir    $\square$   $--$        $--$        $++$
              SI     $\square$   $\square$   $\square$   $++$
Mult. vect.   Dir    $+$         $-$         $--$        $+$
              SI     $+$         $\square$   $\square$   $+$
Arnoldi       Dir    $+$         $+$         $\square$   $\square$
(IRAM)        SI     $++$        $++$        $++$        $\square$
Lanczos       Dir    $+$         $+$         $--$        $++$
              SI     $++$        $++$        $++$        $++$
Block Lan.    Dir    $+$         $+$         $--$        $+$
              SI     $+$         $+$         $+$         $+$
Jac.-Dav.     Dir    $+$         $+$         $\square$   $\square/-$
              SI     $++$        $++$        $++$        $\square/-$
              Prec   $++$        $++$        $+$         $\square/-$

