Preconditioned Simultaneous Iterations

A well-known idea of using *simultaneous* or *block iterations*
provides an important improvement over single-vector methods
and permits us to
compute an -dimensional invariant subspace, rather than one
eigenvector at a time. It can also serve as an acceleration
technique over single-vector methods on parallel computers,
as convergence for extreme eigenvalues usually increases
with the size of the block and every step can be naturally
implemented on wide varieties of multiprocessor computers.

A block algorithm is a straightforward generalization of the single-vector method and is typically combined with the Rayleigh-Ritz procedure. For every preconditioned eigensolver discussed earlier, several variants of simultaneous iterations can be easily suggested; e.g., see block versions of the preconditioned power method with shift in [266,149,63,130]. For shortness, we describe here only one method--a simultaneous version of the locally optimal PCG method, suggested in [268]--and only for the pencil .

As in the single-vector algorithm, special measures need to be taken to overcome the problem of almost-linear dependent basis in the trial subspace in Algorithm 11.11.

There is no theory available yet to predict accurately
the speed of convergence of Algorithm 11.11.
However, by analogy with known convergence theory
of PCG system solvers,
we expect convergence of to
to be linear with the ratio

We compared numerically (see [268]) a block variant of the steepest ascent Algorithm 11.6 and Algorithm 11.11, with the block size . We plot, however, errors for only two top eigenvalues, leaving the third one out of the picture. The two shades of red represent the block SA method, and the two shades of blue correspond to Algorithm 11.11. It is easy to separate the methods on a black-and-white print, too, as Algorithm 11.11 always converges much faster. Two straight lines correspond to linear convergence predicted by (11.18).

In all tests, , is a diagonal matrix with
minimal entries 1, 2, 3 and the maximal entry
and we measure the eigenvalue error as

The size of the problem is changing within 100 and 2000. The initial guess is random for every new run. The preconditioner is a random symmetric positive definite matrix with the fixed value of of (11.9).

Our random initial guess leads to very big initial errors as the matrix is poorly conditioned, . We observe many initial errors at the level of on all figures below, but both tested methods successfully decrease errors to single digits just after a very few iterations.

We can see that the huge condition number of , the size of the problem, the distribution of eigenvalues in the unwanted part of the spectrum, and the particular choice of the preconditioner do not noticeably affect the convergence of Algorithm 11.11 as the elements of a bundle, of convergence history lines, are quite close to each other on all figures. Moreover, our theoretical prediction (11.18) of the rate of convergence of Algorithm 11.11 is relatively accurate, though pessimistic. A 10-fold increase of leads to the increase of number of iterations--10-fold for the block SA method, but only about 3-fold for Algorithm 11.11--exactly as we would expect. We observe that convergence for the first eigenvalue, in dark colors, is often faster than that for the second one, in lighter colors and dashed. Finally, we also observe a superlinear acceleration when the number of iterations is getting comparable with a half of the size of the problem solved. However, such a large ratio of the number of iterations and the size of the problem is atypical in practice, so we try to avoid it simply by increasing the size of the problem; e.g., on Figure 11.3 the first two runs are performed for the problem of the size 200; after noticing a superlinear convergence we immediately increase the size to 1000.

Figures 11.2 to 11.4
clearly illustrate two statements made in §11.1.
First, the *performance of preconditioned solvers depends heavily on
the quality of the preconditioner used*. A good preconditioner
leads to a fast convergence on Figure 11.2.
A bad preconditioner significantly
slows the convergence down on Figure 11.4. We note that
the size of the problem does not really affect the convergence speed.
Second, the *implementation particular to each method can make a
big difference even with the same preconditioner*. This is especially
noticeable for poor-quality preconditioners, e.g., on Figure 11.4.
Algorithm 11.11,
the locally optimal PCG method,
converges about *a hundred times faster* than
Algorithm 11.6, though both
algorithms use
the same preconditioner and have similar costs of every iteration.

To conclude, our numerical results suggest that Algorithm 11.11
is a genuine conjugate gradient method.
Our most recent numerical results [269]
demonstrate that Algorithm 11.11 is practically the
*optimal method on the whole class of the preconditioned solvers
for symmetric definite eigenvalue problems*.

There are known [91]
block-preconditioned methods for simultaneous computation
of *both ends of the spectrum*, but they
do not offer much improvement over the standard
approach of computing minimal and maximal eigenvalues
separately.

Another known idea of constructing block eigensolvers
is to use *methods of nonlinear optimization
to minimize, or maximize, the trace of
the projection matrix* in the Rayleigh-Ritz method (see, e.g.,
[91,16,366] and §9.4), here .
It leads to methods somewhat similar to Algorithm 11.11
if the conjugate gradient method used for optimization, but Algorithm 11.11 has
an advantage of using the Rayleigh-Ritz method.

The use of *locking*, a form of *deflation*, which
exploits the unequal convergence rates of the different
eigenvectors, enhances the
performance of the preconditioned
simultaneous iteration methods described above, quite similar
to the case of classical methods, without preconditioning;
see §4.3.
Because of the different rates of convergence of each of the
approximate eigenvectors computed by the simultaneous iteration,
it is a common practice to extract those already converged
and perform a deflation.
We freeze extracted approximations and remove them
from subsequent iterations such that there
is no need to continue to multiply them by any matrices.
However, we will still need either to include them
in the basis of the trial subspace of the Rayleigh-Ritz
method or to perform
the subsequent orthogonalizations with respect to the frozen vectors
whenever such orthogonalizations are needed.

The former possibility seems more natural and is easier to program than the latter one. It is also quite simple to analyze its influence on the iterative method using known results on accuracy of the Rayleigh-Ritz method; see, e.g., [387,267]. The cost of orthogonalization can be somewhat lower, however, as it reduces the dimension of the trial subspace.

We note that locking should also be used in single-vector iterative methods when we try to compute a group of eigenvectors one by one.

Using orthogonalization for locking in preconditioned eigensolvers may not, unfortunately, be simple if we want to be able to investigate the propagation of the resulting error in the process of further iterations.

As an example, let us consider our simplest
preconditioned eigensolver, Algorithm 11.5,
with additional orthogonalization, defined by an orthogonal
projector onto an orthogonal complement of the subspace,
spanned by already computed eigenvectors, which can be written as

First, we need to choose a scalar product for the orthogonal complement of the subspace, spanned by already computed eigenvectors. The -based scalar seems to be a natural choice here. When is positive definite, it is common to use the -based scalar product as well.

Second, we need to define a scalar product, in which projector is orthogonal. A traditional approach is to use the same scalar product as on the first step. It is also trivial to implement in a code.

Unfortunately, with such a choice, the iteration operator in method (11.19) is loosing the property of being symmetric, with respect to the -based scalar product. It makes theoretical investigation of the influence of orthogonalization to approximately computed eigenvectors quite complicated (see [147,149]), where direct analysis of perturbations is done.

To preserve the symmetry, we must use the -orthogonal projector in spite of the fact that we use a different, say -based, scalar product on the first step to define the orthogonal complement. With this choice, we can use the standard and simple backward error analysis [264,265] instead of direct analysis [147,149], but the actual computation of for a given requires special attention.

Following [264,265], we take the original subspace, spanned by the frozen approximate eigenvectors (let us call it ) and find a -orthogonal basis of the subspace . Mathematically, a -orthogonal complement to the latter subspace coincides with an -orthogonal complement of the original subspace, . Now, we can use a standard -orthogonal projector onto a -orthogonal complement of . Clearly, the -based scalar product must be possible to compute to use this trick.

We also note that the use of a -based scalar product, as well as any scalar product based on an ill-conditioned matrix, may lead to unstable computations.