 
 
 
 
 
 
 
 
 
 
The analysis is given for the Cayley transform only, though shift-and-invert could be used as well. The reasons are threefold. First, the shift-and-invert Arnoldi method and the Cayley transform Arnoldi method produce the same Ritz vectors. Second, it makes a link with the (Jacobi) Davidson methods easier to establish. Third, the Cayley transform leads to a more natural approach to the problem.
In the Arnoldi method applied to the Cayley transform, 
 , we must
solve a
sequence of linear systems
, we must
solve a
sequence of linear systems
 
 , where
, where  is the residual and
 is the residual and  the last
Krylov
basis vector.
The result
 the last
Krylov
basis vector.
The result  is orthonormalized against
 is orthonormalized against 
 into
 into
 .
One could also apply the Cayley transform to another vector in the Krylov
space rather than to
.
One could also apply the Cayley transform to another vector in the Krylov
space rather than to  .
In this case, we solve the linear system
.
In this case, we solve the linear system
 is chosen as follows: it may be a Ritz vector
(Jacobi-Davidson) or the last basis vector (Arnoldi) or the continuation
vector (rational Krylov).
The residual norm
 is chosen as follows: it may be a Ritz vector
(Jacobi-Davidson) or the last basis vector (Arnoldi) or the continuation
vector (rational Krylov).
The residual norm  is a measure of the backward error since we
can
rewrite (11.2) as
 is a measure of the backward error since we
can
rewrite (11.2) as
 
 is a modest multiple of machine precision.
We say that the direct method computes a backward stable solution of the
linear system.
Since
 is a modest multiple of machine precision.
We say that the direct method computes a backward stable solution of the
linear system.
Since  is very small, we talk about an exact eigenvalue solver.
Obtaining a small residual norm by an iterative method is also possible
but is
usually very expensive.
In this section, we study the situation where
 is very small, we talk about an exact eigenvalue solver.
Obtaining a small residual norm by an iterative method is also possible
but is
usually very expensive.
In this section, we study the situation where  is large.
In order to give an indication of what we mean by large,
a few words about iterative linear system solvers are needed.
A linear system
 is large.
In order to give an indication of what we mean by large,
a few words about iterative linear system solvers are needed.
A linear system  is said to be solved with a
relative residual tolerance
 is said to be solved with a
relative residual tolerance
 when the solution,
 when the solution,  , satisfies
, satisfies 
 Krylov methods [388] are typically used.
GMRES [389], BiCGSTAB(
Krylov methods [388] are typically used.
GMRES [389], BiCGSTAB( ) [410], and
QMR [179] are among those most widely used.
See [41] for templates for all these solvers.
Their performance substantially improves when a
suitable preconditioner is employed.
Hence what we mean by a large error is that
) [410], and
QMR [179] are among those most widely used.
See [41] for templates for all these solvers.
Their performance substantially improves when a
suitable preconditioner is employed.
Hence what we mean by a large error is that
 
By putting all the  for
 for 
 together in
 together in
![$S_{k-1}=[s_{1},\ldots,s_{k-1}]$](img3711.png) ,
,
 in
 in  and
 and  in
 in  , we obtain
, we obtain
 be the generalized Moore-Penrose inverse of
 be the generalized Moore-Penrose inverse of
 , i.e.,
, i.e., 
 , and define
, and define
 .
We can rewrite the relation as
.
We can rewrite the relation as
 steps with the
``inexact'' Cayley transform
 steps with the
``inexact'' Cayley transform
 
Note that for a fixed  and
 and  ,
, 
 depends on
 depends on  and the way each of the linear systems (11.2) are solved.
When (11.2) is
solved by a preconditioner
 
and the way each of the linear systems (11.2) are solved.
When (11.2) is
solved by a preconditioner  or a stationary linear system solver, i.e.,
 or a stationary linear system solver, i.e.,
 for
 for 
 , then
, then
 is independent of
 is independent of  and
 and
 and so is
 and so is 
 .
.
Eigenpairs are computed from 
 .
In Arnoldi's method, this happens by the Hessenberg matrix that arises
from the
orthogonalization of
.
In Arnoldi's method, this happens by the Hessenberg matrix that arises
from the
orthogonalization of  ; in the Davidson method, one uses the
projection
with
; in the Davidson method, one uses the
projection
with  and
 and  , e.g.,
, e.g.,
 .
.
 contains the eigenvectors associated
with the
well-separated extreme eigenvalues of
 contains the eigenvectors associated
with the
well-separated extreme eigenvalues of 
 .
This is a perturbed
.
This is a perturbed 
 , so the relation with
, so the relation with  is
partially
lost, and accurately computing eigenpairs of
 is
partially
lost, and accurately computing eigenpairs of  may be
difficult.
To see which parameters play a role in this perturbation, 
from Theorems 4.3 and 4.4 in [323], it can be shown that
if
 may be
difficult.
To see which parameters play a role in this perturbation, 
from Theorems 4.3 and 4.4 in [323], it can be shown that
if 
 has distinct eigenvalues, 
then for each eigenpair
 has distinct eigenvalues, 
then for each eigenpair  of
 of 
 there is an eigenpair
 there is an eigenpair  of
of 
 such that
 such that
 
 .
These inequalities show that
if
.
These inequalities show that
if  and/or
 and/or  are small,
 are small,
 and
 and  correspond very well.
In this case,
 correspond very well.
In this case,  can be computed via eigenpairs of
 can be computed via eigenpairs of 
 .
Since
.
Since 
 ,
, 
 , is
reduced to
, is
reduced to 
 .
.
This section can be concluded as follows.
 are still targeted.
 are still targeted.
 .
For any
.
For any  , the transformation is able to compute eigenvalues near
, the transformation is able to compute eigenvalues near
 accurately.
In practice, this means that only one (simple) eigenvalue can be computed
very accurately.
It also implies that
accurately.
In practice, this means that only one (simple) eigenvalue can be computed
very accurately.
It also implies that  needs to be updated from time to time, since
 needs to be updated from time to time, since
 is unknown.
The other eigenvalues are only computed approximately depending on the
error level
 is unknown.
The other eigenvalues are only computed approximately depending on the
error level  .
.
 defines a
Krylov subspace
 defines a
Krylov subspace 
 .
.
The remaining question is now how to compute the eigenpairs of 
 or
how to exploit them for computing eigenpairs of
 or
how to exploit them for computing eigenpairs of 
 .
In §11.2.3 and §11.2.4,
we discuss the Rayleigh-Ritz technique; i.e.,
eigenpairs are computed from the orthogonal projection of
.
In §11.2.3 and §11.2.4,
we discuss the Rayleigh-Ritz technique; i.e.,
eigenpairs are computed from the orthogonal projection of
 on the
 on the 
 .
In §11.2.7, the eigenpairs are computed from the
eigenpairs of
.
In §11.2.7, the eigenpairs are computed from the
eigenpairs of
 directly by use of the rational Krylov recurrence relation.
§11.2.6 presents a Lanczos algorithm that uses the
recurrence relation for the eigenvectors and the Rayleigh-Ritz projection
for the eigenvalues.
 directly by use of the rational Krylov recurrence relation.
§11.2.6 presents a Lanczos algorithm that uses the
recurrence relation for the eigenvectors and the Rayleigh-Ritz projection
for the eigenvalues.
Note that  and
 and  cannot be chosen too far away from each other.
Suppose the eigenvalue
 cannot be chosen too far away from each other.
Suppose the eigenvalue  is wanted.  From the conclusion given
above, the convergence is faster when
 is wanted.  From the conclusion given
above, the convergence is faster when  is
chosen close to
 is
chosen close to  , and the computed eigenvalue can only be
accurate when
, and the computed eigenvalue can only be
accurate when
 is close to
 is close to  .
In theory, one could select
.
In theory, one could select  , which is usually used in
the Davidson method.
Since
, which is usually used in
the Davidson method.
Since 
 in that case, there is a high risk of stagnation in the
latter method.
A robust selection is used in the Jacobi-Davidson method, which we
discuss
in §11.2.5.
 in that case, there is a high risk of stagnation in the
latter method.
A robust selection is used in the Jacobi-Davidson method, which we
discuss
in §11.2.5.
 
 
 
 
 
 
 
 
