

Preconditioned Simultaneous Iterations

The well-known idea of using simultaneous, or block, iterations provides an important improvement over single-vector methods and permits us to compute an $m$-dimensional ($m > 1$) invariant subspace, rather than one eigenvector at a time. It can also serve as an acceleration technique over single-vector methods, as the convergence rate for extreme eigenvalues usually improves with the size of the block, and it is well suited to parallel computing, as every step can be naturally implemented on a wide variety of multiprocessor computers.

A block algorithm is a straightforward generalization of the single-vector method and is typically combined with the Rayleigh-Ritz procedure. For every preconditioned eigensolver discussed earlier, several variants of simultaneous iterations can easily be suggested; see, e.g., block versions of the preconditioned power method with shift in [266,149,63,130]. For brevity, we describe here only one method, a simultaneous version of the locally optimal PCG method suggested in [268], and only for the pencil $B - \mu A$.


\begin{algorithm}
{Simultaneous PCG Method for GHEP}
{On each iteration $i$: compute the Rayleigh quotients $\mu_j^{(i)}$ and the preconditioned residuals $w_j^{(i)} = T(B x_j^{(i)} - \mu_j^{(i)} A x_j^{(i)})$; apply the Rayleigh-Ritz procedure for the pencil $B - \mu A$ to the trial subspace spanned by $x_j^{(i)}$, $w_j^{(i)}$, and $x_j^{(i-1)}$, $j = 1, \ldots, m$; take as $x_j^{(i+1)}$ and $\mu_j^{(i+1)}$ the Ritz vectors and Ritz values counted from the top Ritz value for $j=1, \ldots, m$.}
\end{algorithm}

As in the single-vector algorithm, special measures need to be taken in Algorithm 11.11 to overcome the problem of an almost linearly dependent basis in the trial subspace.
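To make the structure of one iteration concrete, here is a minimal dense-matrix sketch, in Python, of a simultaneous locally optimal step in the spirit of Algorithm 11.11 for the pencil $B - \mu A$ with $A$ symmetric positive definite. The function name and the simple QR safeguard against near-linear dependence are our own choices; this is not the exact listing of Algorithm 11.11.

\begin{verbatim}
import numpy as np
from scipy.linalg import eigh

def block_locally_optimal_pcg(A, B, T, X, maxiter=100, tol=1e-8):
    """Sketch of a simultaneous locally optimal PCG iteration for the
    pencil B - mu*A (A symmetric positive definite, T an SPD
    preconditioner).  X is an n-by-m block of initial guesses; the m
    largest eigenpairs of B x = mu A x are approximated.
    (Hypothetical helper; all names are ours.)"""
    m = X.shape[1]
    X_prev = None
    for _ in range(maxiter):
        mu = np.diag(X.T @ B @ X) / np.diag(X.T @ A @ X)  # Rayleigh quotients
        R = B @ X - (A @ X) * mu                          # residual block
        if np.linalg.norm(R) < tol:
            break
        W = T @ R                                         # preconditioned residuals
        # Trial subspace: current block, preconditioned residuals, previous block.
        blocks = [X, W] if X_prev is None else [X, W, X_prev]
        S, _ = np.linalg.qr(np.hstack(blocks))            # guard against near dependence
        # Rayleigh-Ritz for the pencil B - mu*A on the trial subspace.
        vals, Y = eigh(S.T @ B @ S, S.T @ A @ S)          # ascending Ritz values
        X_prev, X = X, S @ Y[:, -m:]                      # keep the m largest Ritz vectors
    mu = np.diag(X.T @ B @ X) / np.diag(X.T @ A @ X)
    return mu, X
\end{verbatim}

A mature implementation of the same locally optimal block idea is available, for example, as scipy.sparse.linalg.lobpcg, although its naming convention for the matrices of the pencil differs from the one used here.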

There is no theory available yet to predict accurately the speed of convergence of Algorithm 11.11. However, by analogy with the known convergence theory of PCG system solvers, we expect the convergence of $\mu^{(i)}_j$ to $\mu_j$ to be linear with the ratio

\begin{displaymath}
\left( \frac{1 - \sqrt{\xi_j}}{1 + \sqrt{\xi_j}} \right)^2, \qquad
\xi_j = \frac{\delta_0}{\delta_1} \,
\frac{\mu_j - \mu_{m+1}}{\mu_j - \mu_{\min}}.
\end{displaymath} (11.18)
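For example, the predicted factor (11.18) can be evaluated directly from the quantities entering it; the helper below and the sample numbers in it are purely illustrative and are not taken from the experiments reported next.

\begin{verbatim}
def predicted_factor(mu_j, mu_m_plus_1, mu_min, delta0_over_delta1):
    """Linear convergence factor (11.18) for the j-th Ritz value."""
    xi = delta0_over_delta1 * (mu_j - mu_m_plus_1) / (mu_j - mu_min)
    return ((1.0 - xi ** 0.5) / (1.0 + xi ** 0.5)) ** 2

# Purely illustrative numbers, not taken from the experiments below:
print(predicted_factor(1.0, 0.25, 1e-10, 0.1))  # ~0.32 error reduction per step
\end{verbatim}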

We compared numerically (see [268]) a block variant of the steepest ascent Algorithm 11.6 with Algorithm 11.11, using the block size $m=3$. We plot, however, the errors only for the two top eigenvalues, leaving the third one out of the picture. The two shades of red represent the block SA method, and the two shades of blue correspond to Algorithm 11.11. It is easy to separate the methods on a black-and-white print, too, as Algorithm 11.11 always converges much faster. The two straight lines correspond to the linear convergence predicted by (11.18).

In all tests, $B=I$, $A$ is a diagonal matrix with minimal entries 1, 2, 3 and maximal entry $10^{10},$ and we measure the eigenvalue error as

\begin{displaymath}
\lambda_j^{(i)} - \lambda_j, \qquad j=1,2.
\end{displaymath}

The size of the problem varies between 100 and 2000. The initial guess is random for every new run. The preconditioner $T$ is a random symmetric positive definite matrix with a fixed value of $\kappa (TA) = \delta_1 / \delta_0$ of (11.9).
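A test problem of this kind can be generated, for example, as sketched below. The construction of $T$ is only one possibility, not necessarily the one used in [268]: we form $T = A^{-1/2} Q D Q^T A^{-1/2}$ with a random orthogonal $Q$ and a diagonal $D$ whose condition number equals the prescribed $\delta_1/\delta_0$, so that $TA$ is similar to $Q D Q^T$ and $\kappa(TA)$ has exactly the prescribed value.

\begin{verbatim}
import numpy as np

def test_problem(n, cond_TA, seed=0):
    """Generate a test problem in the spirit of the experiments above:
    B = I, A diagonal with smallest entries 1, 2, 3 and largest entry
    1e10, and a random SPD preconditioner T with a prescribed
    kappa(T A) = cond_TA.  (Our construction, not necessarily the one
    used in [268].)"""
    rng = np.random.default_rng(seed)
    diag = np.sort(np.concatenate(([1.0, 2.0, 3.0],
                                   10.0 ** rng.uniform(1.0, 10.0, n - 4),
                                   [1e10])))
    A, B = np.diag(diag), np.eye(n)
    # T = A^{-1/2} Q D Q^T A^{-1/2}: then T A is similar to Q D Q^T,
    # so its spectral condition number equals cond(D) = cond_TA.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    d = np.concatenate(([1.0], rng.uniform(1.0, cond_TA, n - 2), [cond_TA]))
    A_inv_half = np.diag(1.0 / np.sqrt(diag))
    T = A_inv_half @ Q @ np.diag(d) @ Q.T @ A_inv_half
    return A, B, T
\end{verbatim}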

Our random initial guess leads to very large initial errors, as the matrix $A$ is poorly conditioned, $\kappa_2(A)=10^{10}$. We observe many initial errors at the level of $10^{10}$ in all the figures below, but both tested methods successfully decrease the errors to single digits after just a few iterations.

We can see that the huge condition number of $A$, the size of the problem, the distribution of eigenvalues in the unwanted part of the spectrum, and the particular choice of the preconditioner $T$ do not noticeably affect the convergence of Algorithm 11.11, as the convergence history lines within each bundle are quite close to each other in all the figures. Moreover, our theoretical prediction (11.18) of the rate of convergence of Algorithm 11.11 is relatively accurate, though pessimistic. A 10-fold increase of $\delta_1 / \delta_0$ increases the number of iterations roughly 10-fold for the block SA method, but only about 3-fold for Algorithm 11.11, exactly as we would expect. We observe that convergence for the first eigenvalue, in dark colors, is often faster than that for the second one, in lighter colors and dashed. Finally, we also observe a superlinear acceleration when the number of iterations becomes comparable to half the size of the problem being solved. However, such a large ratio of the number of iterations to the size of the problem is atypical in practice, so we try to avoid it simply by increasing the size of the problem; e.g., in Figure 11.3 the first two runs are performed for a problem of size 200; after noticing superlinear convergence, we immediately increase the size to 1000.

Figure 11.2: Conjugate gradient versus steepest ascent, $\delta _1 / \delta _0=10$
\begin{figure}\begin{center}
\epsfxsize =27pc \leavevmode \centering\epsfbox{myfig10.eps}\end{center}\end{figure}

Figure 11.3: Conjugate gradient versus steepest ascent, $\delta _1 / \delta _0=100$
\begin{figure}\begin{center}
\epsfxsize =27pc \leavevmode \centering\epsfbox{myfig100.eps}\end{center}\end{figure}

Figure 11.4: Conjugate gradient versus steepest ascent, $\delta _1 / \delta _0=1000$
\begin{figure}\begin{center}
\epsfxsize =27pc \leavevmode \centering\epsfbox{myfig1000.eps}\end{center}\end{figure}

Figures 11.2 to 11.4 clearly illustrate two statements made in §11.1. First, the performance of preconditioned solvers depends heavily on the quality of the preconditioner used. A good preconditioner leads to fast convergence in Figure 11.2; a bad preconditioner significantly slows the convergence down in Figure 11.4. We note that the size of the problem does not really affect the convergence speed. Second, the particular implementation of each method can make a big difference even with the same preconditioner. This is especially noticeable for poor-quality preconditioners, e.g., in Figure 11.4. Algorithm 11.11, the locally optimal PCG method, converges about a hundred times faster than Algorithm 11.6, though both algorithms use the same preconditioner and have similar costs per iteration.

To conclude, our numerical results suggest that Algorithm 11.11 is a genuine conjugate gradient method. Our most recent numerical results [269] demonstrate that Algorithm 11.11 is practically optimal within the whole class of preconditioned solvers for symmetric definite eigenvalue problems.

There are known [91] block-preconditioned methods for simultaneous computation of both ends of the spectrum, but they do not offer much improvement over the standard approach of computing minimal and maximal eigenvalues separately.

Another known idea for constructing block eigensolvers is to use methods of nonlinear optimization to minimize, or maximize, the trace of the projection matrix in the Rayleigh-Ritz method (see, e.g., [91,16,366] and §9.4), where $T=I$. This leads to methods somewhat similar to Algorithm 11.11 if the conjugate gradient method is used for the optimization, but Algorithm 11.11 has the advantage of using the Rayleigh-Ritz method.

The use of locking, a form of deflation that exploits the unequal convergence rates of the different eigenvectors, enhances the performance of the preconditioned simultaneous iteration methods described above, quite similarly to the case of classical methods without preconditioning; see §4.3. Because the approximate eigenvectors computed by the simultaneous iteration converge at different rates, it is common practice to extract those that have already converged and perform a deflation. We freeze the extracted approximations and remove them from subsequent iterations, so that there is no need to continue multiplying them by any matrices. However, we still need either to include them in the basis of the trial subspace of the Rayleigh-Ritz method or to perform the subsequent orthogonalizations with respect to the frozen vectors whenever such orthogonalizations are needed.

The former possibility seems more natural and is easier to program than the latter. It is also quite simple to analyze its influence on the iterative method using known results on the accuracy of the Rayleigh-Ritz method; see, e.g., [387,267]. The cost of the orthogonalization-based approach can be somewhat lower, however, as it reduces the dimension of the trial subspace.
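As an illustration of the former possibility, the sketch below performs one Rayleigh-Ritz step with the frozen vectors simply appended to the trial basis; the function and its signature are our own, and the products of $A$ and $B$ with the frozen block are assumed to be computed once, at the moment of locking, and cached.

\begin{verbatim}
import numpy as np
from scipy.linalg import eigh

def rayleigh_ritz_with_locking(A, B, S_active, X_locked, AX_locked, BX_locked, m):
    """One Rayleigh-Ritz step for the pencil B - mu*A in which the frozen
    (locked) vectors are appended to the trial basis.  A @ X_locked and
    B @ X_locked are computed once, outside the iteration loop, and
    passed in, so the locked vectors are never multiplied by A, B, or T
    again.  (Hypothetical helper; names are ours.)"""
    S  = np.hstack([S_active, X_locked])
    AS = np.hstack([A @ S_active, AX_locked])
    BS = np.hstack([B @ S_active, BX_locked])
    vals, Y = eigh(S.T @ BS, S.T @ AS)   # Ritz values in ascending order
    return vals[-m:], S @ Y[:, -m:]      # the m largest Ritz pairs
\end{verbatim}

As in Algorithm 11.11 itself, the combined basis may need to be orthonormalized if it becomes nearly linearly dependent.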

We note that locking should also be used in single-vector iterative methods when we try to compute a group of eigenvectors one by one.

Using orthogonalization for locking in preconditioned eigensolvers may not, unfortunately, be simple if we want to be able to investigate the propagation of the resulting error in the process of further iterations.

As an example, let us consider our simplest preconditioned eigensolver, Algorithm 11.5, with an additional orthogonalization defined by an orthogonal projector $P^\perp$ onto an orthogonal complement of the subspace spanned by the already computed eigenvectors, which can be written as

\begin{displaymath}
x^{(i+1)} = P^\perp w^{(i)} + \tau^{(i)} x^{(i)}, \qquad
w^{(i)} = T (B x^{(i)} - \mu^{(i)} A x^{(i)}).
\end{displaymath} (11.19)

We need to figure out what scalar products we want to use in our orthogonal projector $P^\perp$.

First, we need to choose a scalar product that defines the orthogonal complement of the subspace spanned by the already computed eigenvectors. The $A$-based scalar product seems to be a natural choice here. When $B$ is positive definite, it is common to use the $B$-based scalar product as well.

Second, we need to define a scalar product in which the projector $P^\perp$ is orthogonal. A traditional approach is to use the same scalar product as in the first step. It is also trivial to implement in a code.

Unfortunately, with such a choice, the iteration operator of method (11.19) loses the property of being symmetric with respect to the $T^{-1}$-based scalar product. This makes a theoretical investigation of the influence of the orthogonalization to approximately computed eigenvectors quite complicated; see [147,149], where a direct analysis of perturbations is carried out.

To preserve the symmetry, we must use a $T^{-1}$-orthogonal projector $P^\perp$, even though we use a different, say $A$-based, scalar product in the first step to define the orthogonal complement. With this choice, we can use the standard and simple backward error analysis [264,265] instead of the direct analysis [147,149], but the actual computation of $P^\perp w$ for a given $w$ requires special attention.

Following [264,265], we take the original subspace spanned by the frozen approximate eigenvectors (let us call it $\tilde X$) and find a $T^{-1}$-orthogonal basis of the subspace $TA \tilde X$. Mathematically, the $T^{-1}$-orthogonal complement of the latter subspace coincides with the $A$-orthogonal complement of the original subspace $\tilde X$. Now, we can use a standard $T^{-1}$-orthogonal projector onto the $T^{-1}$-orthogonal complement of $TA \tilde X$. Clearly, to use this trick, the $T^{-1}$-based scalar product must be computable.
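Assuming, as noted above, that the action of $T^{-1}$ on a vector can be computed, the construction of this projector might be coded along the following lines; this is a dense-matrix sketch with our own naming, in which the routine applying $T^{-1}$ is passed in as an argument.

\begin{verbatim}
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def t_inv_orthogonal_projector(A, T, apply_Tinv, X_frozen):
    """Return a function w -> P_perp w, where P_perp is the
    T^{-1}-orthogonal projector onto the T^{-1}-orthogonal complement of
    span(T A X_frozen), which coincides with the A-orthogonal complement
    of span(X_frozen).  apply_Tinv(v) must return T^{-1} v.
    (Sketch only; names are ours.)"""
    V = T @ (A @ X_frozen)                            # basis of T A X~
    TinvV = np.column_stack([apply_Tinv(v) for v in V.T])
    G = V.T @ TinvV                                   # T^{-1}-Gram matrix of V
    L = cholesky(G, lower=True)                       # G = L L^T

    def P_perp(w):
        # Solve G c = V^T T^{-1} w and subtract the T^{-1}-projection of w.
        c = solve_triangular(L, V.T @ apply_Tinv(w), lower=True)
        c = solve_triangular(L.T, c, lower=False)
        return w - V @ c

    return P_perp
\end{verbatim}

The resulting projector is then applied to the preconditioned residual $w^{(i)}$ exactly as in (11.19).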

We also note that the use of a $T^{-1}$-based scalar product, as well as any scalar product based on an ill-conditioned matrix, may lead to unstable computations.

