Direct Methods

In this section, we briefly discuss methods for the computation
of singular values and vectors of dense matrices.
These methods are sometimes called transformation methods.
Some or all of them are implemented in LAPACK,
ScaLAPACK, and in the `svd` command in
MATLAB.

While it is possible to apply the transformation methods of §4.2 to one of the Hermitian matrices A^*A, AA^*, or [0, A; A^*, 0], the methods used in practice are specialized for the SVD and so are more efficient and accurate.
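The accuracy advantage is easy to see numerically: forming A^*A squares the singular values, so a tiny singular value whose square falls below machine epsilon times the largest eigenvalue is destroyed by roundoff before a Hermitian eigensolver even starts. A minimal sketch of this effect (the test matrix and tolerances are our own illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Test matrix with known singular values spanning 10 orders of
# magnitude: A = U diag(s_true) V^T with random orthogonal U, V.
s_true = np.logspace(0, -10, n)          # 1, ..., 1e-10
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(s_true) @ V.T

# Dedicated SVD: absolute error in each singular value is around
# machine epsilon times the largest singular value.
s_svd = np.linalg.svd(A, compute_uv=False)

# Detour through the Hermitian matrix A^T A: the smallest eigenvalue
# 1e-20 is far below eps * ||A^T A|| ~ 1e-16, so it is lost to rounding.
eigvals = np.linalg.eigvalsh(A.T @ A)                # ascending
s_eig = np.sqrt(np.maximum(eigvals, 0.0))[::-1]      # descending

err_svd = abs(s_svd[-1] - s_true[-1]) / s_true[-1]
err_eig = abs(s_eig[-1] - s_true[-1]) / s_true[-1]
print(f"relative error in smallest sigma: svd {err_svd:.1e}, eig {err_eig:.1e}")
```

The SVD recovers the smallest singular value to several digits, while the eigenvalue detour typically loses it entirely.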

The most common transformation methods compute the thin SVD of an m by n matrix A (with m ≥ n) in three phases, shown below. (They can be easily modified to compute the full SVD, or just selected singular values and/or singular vectors, but we present just the thin SVD for simplicity.)

- Find an n by n unitary matrix V_1 and an m by n matrix U_1 with
  orthonormal columns such that U_1^* A V_1 = B is an n by n
  *bidiagonal matrix*, i.e., B is nonzero on the main diagonal and
  first superdiagonal only.
- Compute the SVD of B: B = U_2 Σ V_2^*.
- Multiply U = U_1 U_2 and V = V_1 V_2 to get the thin SVD
  A = U Σ V^*.
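In MATLAB the thin SVD is `svd(A,'econ')`; in NumPy terms (an illustrative cross-check, not part of the original text), the factors returned by a thin-SVD driver satisfy exactly the identities above:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
A = rng.standard_normal((m, n))

# Thin SVD: U is m by n with orthonormal columns, Vt is n by n unitary,
# sigma holds the n singular values in descending order.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

assert U.shape == (m, n) and Vt.shape == (n, n)
assert np.allclose(U.T @ U, np.eye(n))      # orthonormal columns
assert np.allclose(A, (U * sigma) @ Vt)     # A = U Sigma V^*
assert np.all(sigma[:-1] >= sigma[1:])      # descending order
```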

Phase 1, reduction to bidiagonal form, is performed by
LAPACK routine `xGEBRD` and
ScaLAPACK routine `PxGEBRD`
using a sequence of n unitary Householder
reflections on the left and n − 2 unitary Householder
reflections on the right, for a cost of
O(mn^2) floating point operations.
If singular values only are desired,
phase 2 costs just O(n^2),
which is much less than the cost of phase 1, and phase 3 is
omitted.
If all left and right singular vectors are desired,
the cost depends strongly on which algorithm is selected,
as described below.
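Phase 1 can be sketched directly from its definition: alternate left Householder reflections (zeroing a column below the diagonal) with right reflections (zeroing a row beyond the superdiagonal). The toy implementation below is our own illustration of this Golub-Kahan reduction; `bidiagonalize` and `householder` are hypothetical names, and LAPACK's blocked `xGEBRD` is far more efficient:

```python
import numpy as np

def householder(x):
    """Vector v and scalar beta with (I - beta v v^T) x = alpha e1."""
    v = x.copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])  # sign avoids cancellation
    vv = v @ v
    beta = 2.0 / vv if vv > 0 else 0.0
    return v, beta

def bidiagonalize(A):
    """Golub-Kahan reduction: A = U1 @ B @ V1.T, B upper bidiagonal (m >= n)."""
    B = np.array(A, dtype=float)
    m, n = B.shape
    U1, V1 = np.eye(m), np.eye(n)
    for k in range(n):
        # Left reflection zeros B[k+1:, k].
        v, beta = householder(B[k:, k])
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        U1[:, k:] -= beta * np.outer(U1[:, k:] @ v, v)
        if k < n - 2:
            # Right reflection zeros B[k, k+2:].
            v, beta = householder(B[k, k + 1:])
            B[k:, k + 1:] -= beta * np.outer(B[k:, k + 1:] @ v, v)
            V1[:, k + 1:] -= beta * np.outer(V1[:, k + 1:] @ v, v)
    # Thin form: rows n..m-1 of B are (numerically) zero, so drop them.
    return U1[:, :n], B[:n, :n], V1

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U1, B, V1 = bidiagonalize(A)
```

Since the reflections are unitary, B has the same singular values as A, which is what makes phases 2 and 3 possible.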

The algorithms for phase 2 are analogous to those available for the symmetric tridiagonal eigenproblem as discussed in §4.2, with some differences outlined below.

Currently, the fastest available sequential or shared memory parallel
dense SVD routine is LAPACK's `xGESDD`,
but a faster one is on the way.
The fastest available distributed memory parallel dense SVD
routine is ScaLAPACK's `PxGESVD`, but again a better
one is on the way.
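These LAPACK drivers are reachable from higher-level libraries. For instance, SciPy (an illustration not mentioned in the text) lets you choose between the `xGESDD` and `xGESVD` drivers, and skip the singular vectors entirely:

```python
import numpy as np
from scipy.linalg import svd

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 20))

# xGESDD (divide-and-conquer): SciPy's default driver, fastest when
# singular vectors are wanted.
U, s_dc, Vt = svd(A, full_matrices=False, lapack_driver='gesdd')

# xGESVD (QR-based): slower but the traditional driver.
_, s_qr, _ = svd(A, full_matrices=False, lapack_driver='gesvd')

# Singular values only: phase 2 is cheap and phase 3 is skipped.
s_only = svd(A, compute_uv=False)

print(np.allclose(s_dc, s_qr), np.allclose(s_dc, s_only))
```

All three calls agree on the singular values; they differ only in cost and in which factors are formed.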

- *QR algorithm:*
  This algorithm finds all the singular values, and optionally
  all the left and/or right singular vectors, of a bidiagonal matrix.
  It costs O(n^2) for the singular values only and O(n^3) for the
  singular vectors as well.
  It can be implemented so that all singular values are computed
  with nearly all correct significant figures, even the tiniest
  ones (as long as underflow does not occur) [123].
  QR is used to compute singular vectors in LAPACK's computational
  routine `xBDSQR`, which is used by driver routine `xGESVD` to
  compute the SVD of dense matrices but has been superseded by the
  faster methods described below. `xBDSQR` is also used by ScaLAPACK
  driver routine `PxGESVD`.

- *DQDS algorithm:*
  This algorithm finds just the singular values of a bidiagonal
  matrix in O(n^2) time, but as accurately and rather more
  efficiently than QR [168,355,360].
  It is the sequential algorithm of choice for singular values,
  is implemented in LAPACK as `xLASQ1`, and is used by all LAPACK
  driver routines.

- *Divide-and-conquer method:*
  This is currently the fastest method available in LAPACK to
  find all the singular values and vectors of a bidiagonal matrix
  larger than about 25 by 25 [208,12].
  It divides the matrix into two halves, computes the SVD of each
  half, and glues the solutions together by solving a special
  rational equation. For a large bidiagonal matrix this is repeated
  recursively until the parts are smaller than 25 by 25, at which
  point the QR algorithm is called.
  It can be slightly less accurate than DQDS or QR, because the
  error in the tiniest singular values may be as large as machine
  epsilon times the largest singular value, but this is almost
  always good enough.
  Divide-and-conquer is implemented by LAPACK computational routine
  `xBDSDC`, which is used by LAPACK driver routine `xGESDD` to
  compute the SVD of a dense matrix. `xGESDD` is currently the
  LAPACK algorithm of choice for the SVD of dense matrices.

- *Bisection method and inverse iteration:*
  Bisection and inverse iteration can be used to find
  just those singular values and vectors of interest.
  Current algorithms work in O(n) time per singular triplet,
  unless the singular value is in a cluster of k nearby
  singular values, in which case the cost can rise to O(nk^2)
  if orthogonalization is used to try to guarantee
  orthogonal singular vectors. This could be as high
  as O(n^3) if many singular values are tightly clustered.
  Still, this can sometimes fail to guarantee orthogonality
  [129].
  Neither LAPACK, ScaLAPACK, nor MATLAB currently contains an SVD
  routine based on bisection and inverse iteration.
  Recent advances in [168,128,358,356] promise an algorithm that
  costs O(n) work per singular triplet while guaranteeing
  orthogonality, but it is not yet complete. When done, both LAPACK
  and ScaLAPACK will provide this algorithm, which should be the
  fastest in all cases.

If A is banded, phase 1 can be performed
more cheaply by LAPACK computational
routine `xGBBRD`, but no LAPACK driver routines are available.

Finally, we mention *Jacobi's method* [114,198]
for the SVD.
This transformation method repeatedly multiplies A
on the right by elementary orthogonal matrices (Jacobi rotations)
until A converges to UΣ; the product of the Jacobi rotations
is V. Jacobi is slower than any of the above transformation methods
(it can be made to run within about twice the time of QR [136])
but has the useful property that for certain matrices A
it can deliver the tiny singular values, and their singular vectors,
much more accurately than any of the above methods,
provided that it is properly implemented [118,115].
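The one-sided variant of this idea is short enough to sketch. In the toy code below (our own illustration; `jacobi_svd` is a hypothetical name, not the carefully implemented routine of [118,115]), rotations applied on the right make the columns of W = A·V mutually orthogonal; at convergence the column norms of W are the singular values and its normalized columns are U:

```python
import numpy as np

def jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided Jacobi: orthogonalize column pairs of W until W = U Sigma."""
    W = np.array(A, dtype=float)
    m, n = W.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        done = True
        for i in range(n - 1):
            for j in range(i + 1, n):
                a, b = W[:, i] @ W[:, i], W[:, j] @ W[:, j]
                c = W[:, i] @ W[:, j]
                if abs(c) <= tol * np.sqrt(a * b):
                    continue                 # columns i, j already orthogonal
                done = False
                # Rotation angle that zeros the (i, j) inner product.
                zeta = (b - a) / (2.0 * c)
                if zeta == 0.0:
                    t = 1.0
                else:
                    t = np.sign(zeta) / (abs(zeta) + np.hypot(1.0, zeta))
                cs = 1.0 / np.sqrt(1.0 + t * t)
                sn = cs * t
                for M in (W, V):             # same rotation updates W and V
                    mi, mj = M[:, i].copy(), M[:, j].copy()
                    M[:, i] = cs * mi - sn * mj
                    M[:, j] = sn * mi + cs * mj
        if done:
            break
    sigma = np.linalg.norm(W, axis=0)
    order = np.argsort(sigma)[::-1]          # descending singular values
    sigma = sigma[order]
    U = W[:, order] / np.where(sigma > 0, sigma, 1.0)
    return U, sigma, V[:, order]

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))
U, sigma, V = jacobi_svd(A)
```

Because each rotation acts on A's columns directly, never forming A^*A, well-implemented versions of this scheme preserve the relative accuracy of the tiny singular values that the text describes.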