
Storage Requirements and Floating Point Operations.

The following table summarizes the storage requirements for the different levels of ABLE. For clarity, we assume that the block size is a constant $p$. To obtain an upper bound on the storage requirements in the variable block case, replace $p$ by the maximum allowed block size, $p_{\max}$.
Level   Function                        Storage requirement
  1     basic three-term recurrences    $matrix + 6pn + 3p^2 j^2 + 4pj$
  2     full biorthogonality            additional $2jpn$
  3     semi-biorthogonality            additional $2jpn + 4p^2 j$
  4     treat breakdown or cluster      no additional requirement
Here, $matrix$ denotes the storage required to compute the products $AZ$ and $A^{\ast} Z$.
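As a concrete illustration, the sketch below (in Python; not part of ABLE, and the function name able_storage is ours) evaluates the bounds in the table, counting storage in floating point words. Levels 2 and 3 are alternatives, each adding its own term to the level 1 base, while level 4 adds nothing.

def able_storage(level, n, p, j, matrix_storage=0):
    """Storage bound, in floating point words, for ABLE at the given
    level, assuming a constant block size p (use p = p_max for an upper
    bound in the variable block case).  matrix_storage is whatever the
    user's routines for the products A*Z and A^H*Z require."""
    words = matrix_storage + 6*p*n + 3*p**2*j**2 + 4*p*j  # level 1 base
    if level == 2:
        words += 2*j*p*n                  # full biorthogonality
    elif level == 3:
        words += 2*j*p*n + 4*p**2*j       # semi-biorthogonality
    return words                          # level 4 adds no storage

The $2jpn$ terms at levels 2 and 3 correspond to the two $n$ by $m$ arrays of Lanczos vectors (with $m = jp$) discussed below; multiply the word count by the size of one floating point number (8 bytes in double precision, 16 if complex) to convert to bytes.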

To maintain full or semi-biorthogonality, it is necessary to store all computed Lanczos vectors $\{ Q_i \}$ and $\{ P_i \}$. The user must allocate memory for two rectangular arrays of dimension $n$ by $m$ for this purpose. If memory is limited, the best remedy is to restart as described in [207,110], but this is not available in the current version of ABLE.

Next, we summarize the cost in floating point operations per Lanczos step; lower-order terms are ignored. The following table lists the different types of operations required in the implementation of ABLE.

Function                       Matrix-vector   m-inner product    m-saxpy    m-scaling   QRD
                               products        ($X^{\ast} Y$)     ($Y+Xa$)   ($Xa$)
basic three-term recurrences   2               6                  8          2           2
full biorthogonality                           $2j$               $2j$
semi-biorthogonality                           2
Here the m-inner product, m-saxpy, and m-scaling denote the generalizations of the corresponding Level 1 BLAS inner product, saxpy, and scaling operations to multiple-vector operations. As with the storage requirements, we assume for simplicity that the block size is constant. The multiple vectors are of dimension $n \times p$. The m-inner product, m-saxpy, and m-scaling involve $2p^2 n$, $2p(p+1)n$, and $2p^2 n$ floating point operations, respectively. The QR decomposition (QRD) of an $n \times p$ ($p \ll n$) matrix costs $2p^2 n$ floating point operations. Note that for particular problems and data structures, certain operations can be performed more efficiently.
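For illustration only, the following sketch shows these multiple-vector kernels in NumPy, with their leading-order flop counts from the table in comments; it is not how ABLE is implemented, and the array names are arbitrary.

import numpy as np

n, p = 1000, 4                     # block vectors are n-by-p, p << n
rng = np.random.default_rng(0)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, p))
a = rng.standard_normal((p, p))    # p-by-p coefficient block

G = X.conj().T @ Y        # m-inner product X^* Y: about 2 p^2 n flops
S = Y + X @ a             # m-saxpy Y + X a:       about 2 p (p+1) n flops
W = X @ a                 # m-scaling X a:         about 2 p^2 n flops
Q, R = np.linalg.qr(X)    # QRD of an n-by-p block: about 2 p^2 n flops

Because each kernel works on $p$ vectors at once, these operations are rich in Level 3 BLAS, which is the usual motivation for a blocked Lanczos iteration.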

Additional floating point operations are required for the following occasional events:

