next up previous contents index
Next: A Brief Survey of Up: A Brief Survey of Previous: Direct Solvers for Sparse   Contents   Index

Direct Solvers for Structured Matrices

The linear system of equations involving the structured matrices described in §10.2.3 have the attractive property of being solvable in $O(n^2)$ time instead of $O(n^3)$.

The common property that makes it possible to solve such systems fast (in $O(n^2)$ time) is the fact that they have a low displacement rank [257].

Displacement rank may be defined as follows: For given matrices $A$ and $F$ and a matrix $T$ we define the operator

\Delta_{A,F}(T)=A\cdot T-T\cdot F

and denote $\alpha={\rm rank}(\Delta_{A,F}(T))$. We say that $\alpha$ is the ``displacement rank'' of $T$ with respect to the operator $\Delta_{A,F}$. If $\alpha$ is small (usually $O(1)$) we say that the matrix $T$ has a ``low displacement rank.'' Since ${\rm rank}(\Delta_{A,F}(T))=\alpha$ there exist $n\times \alpha$ matrices $G$ and $B$ such that $\Delta_{A,F}(T)=G\cdot B^T$. The matrices $G$ and $B$ are called generators of $T$. We will refer to matrices with low displacement rank as having displacement structure.

For an example for a Cauchy matrix $C = (C_{ij})=(1/(x_i-y_j))$ we have

\Delta_{\diag(x),\diag(y)}(C)=\diag(x) \cdot C-C \cdot \diag(y)
=(1,1,\ldots,1)^T\cdot (1,1,\ldots,1).

Therefore, a Cauchy matrix has displacement rank 1 and generators $G=B=(1,1,\ldots,1)^T$.

A matrix with low displacement rank $\alpha$ (not necessarily equal to $1$) with respect to the operator $\Delta_{{\rm diag}(x),{\rm diag}(y)}$ is called Cauchy-like. In a similar way one defines Vandermonde-like matrices, Toeplitz-like matrices, etc. These higher displacement rank systems are solvable in $O(\alpha n^2)$ time.

Fast algorithms exploit the displacement equation in order to produce an LU factorization of the matrix $F$. The algorithms work on the generators of a matrix instead of the matrix itself, which permits them to go much faster.

The block-Toeplitz matrices and matrices of type $T^TT$, where $T$ is Toeplitz, also have a low displacement rank [257].

If $A$ and $B$ have displacement structure, then systems of type $(A+\sigma B)x=b$ can be solved using the fast low displacement rank methods only if $A+\sigma B$ has displacement structure as well (with respect to a possibly different displacement operator). For example, if $A$ is Hankel and $B=I$ then $A+\sigma B$ is Toeplitz-plus-Hankel, which also has displacement structure. If $A+\sigma B$ does not have displacement structure then one would be restricted to using zero-shifts $\sigma=0$ or fast iterative methods for which only a fast matrix-vector product is needed [257].

Some of the methods impose additional restrictions on the matrices to be symmetric (Toeplitz) or positive definite. This may result in additional restrictions on the choice of shifts available.

Special attention should be exercised when using fast algorithms for structured matrices since the speed often comes at the price of accuracy. Some displacement-rank algorithms simulate Gaussian elimination (with partial, complete, or no pivoting) by working on the generators instead of on the matrices themselves [193]. Even so, these algorithms need not have the same numerical properties as Gaussian Elimination because of the possibility of ``generator growth'' [257], which is uncommon, but can appear even for very well conditioned matrices.

There are many methods to stabilize the fast algorithms including the problem of generator growth [257, p. 111], and despite the occasional instabilities these methods are very attractive.

The fast algorithms are very well described in the literature, but reliable software libraries are not available. One should also be aware of the constants hidden in the $O(n^2)$ notation and that those constants make the fast methods faster than the traditional $O(n^3)$ methods only for $n$ sufficiently large (usually at least a few hundred, depending on the structure). The traditional $O(n^3)$ algorithms are often optimized to use BLAS 3 and use the computer's memory hierarchy efficiently in order for the code to run near the full speed of the processor (see §10.2.1). The fast algorithms can usually only use BLAS 2 and it may be difficult or impossible to optimize the fast algorithms to make efficient use of the computer's cache and fast memory. This means that even if a fast $O(n^2)$ algorithm and a slow $O(n^3)$ algorithm perform the same number of floating point operations, the ``slow'' algorithm is very likely to be faster because of these optimization issues.

We should also mention the Bjorck-Pereyra-type algorithms for solving Vandermonde and three-term Vandermonde systems [51,230]. These $O(n^2)$ methods have remarkable numerical properties. Under some additional restrictions on the order of the nodes in the Vandermonde matrix and the sign pattern of the right-hand side, it is possible to solve those systems to the full machine precision.

next up previous contents index
Next: A Brief Survey of Up: A Brief Survey of Previous: Direct Solvers for Sparse   Contents   Index
Susan Blackford 2000-11-20