The out-of-core prototype code consists of roughly three components:

(1) An I/O component that handles I/O and file management.
(2) The left-looking variants of the LU, QR, and Cholesky factorization
    algorithms.
(3) Support routines for operations on out-of-core matrices.


(1) I/O Component.
==================

A high-level interface is provided for reading and writing sections of a
ScaLAPACK array. For example,

    call ZLAREAD( iodev, m,n, ia,ja, B,ib,jb,descB, info )

will read an m by n submatrix of an out-of-core matrix,
A( ia:(ia+m-1), ja:(ja+n-1) ), into a ScaLAPACK matrix B,
B( ib:(ib+m-1), jb:(jb+n-1) ).

There are no alignment constraints on (ia,ja) or (ib,jb); however,
performance is best when A and B are processor and block aligned.

Similar to Fortran I/O, each out-of-core matrix is associated with a
device unit number (between 1 and 99).

At the lowest level, disk input and output is record oriented. Each
record is an mmb x nnb ScaLAPACK matrix, where mmb is a multiple of
mb*nprow and nnb is a multiple of nb*npcol (i.e.,
mod(mmb, mb*nprow) == 0 and mod(nnb, nb*npcol) == 0).

Each out-of-core matrix, like a 2-D block-cyclically distributed
ScaLAPACK matrix, is associated with a descriptor. The descriptor is
constructed by

    call PFDESCINIT( descA, m,n, mb,nb, rsrc,csrc, ictxt,
                     iodev, filetype, mmb,nnb, Asize, filename, info )

where

iodev     integer (1 <= iodev <= 99)
          Device unit number associated with the out-of-core matrix.

filetype  character*1
          'D'  Data is distributed over many files. This option is
               best in an environment where each processor has
               access to a fast local disk.
          'S'  Data is shared in one file. This option is best in an
               environment where a parallel/concurrent file system,
               such as the Paragon Parallel file system, supports
               concurrent read/write requests. Note that some NFS
               implementations may not support concurrent
               read/write.
          'I'  Similar to 'S', but the data is shared and interleaved
               in one shared file.

filename  character*(*)
          File to be associated with the out-of-core matrix.
          If filename starts with '/' (filename(1:1).eq.'/'), it is
          assumed to be a full absolute path name. Otherwise, the
          file is assumed to be on a fast disk partition such as
          '/tmp' or '/pfs'.

Asize     integer
          The size of the temporary work space/buffer to be used in
          accessing the out-of-core matrix.

The I/O unit can be closed by

    call LACLOSE( iodev, 'NoKeep', myid, nproc, info )

to remove the file, or by

    call LACLOSE( iodev, 'Keep', myid, nproc, info )

to keep the file. Note that the data layout on disk is tied to the
processor grid (nprow,npcol) and block size (mb,nb).


(2) 'Left-looking' variant of LU, QR, and Cholesky factorizations.
==================================================================

LAPACK and ScaLAPACK implement a 'right-looking' variant of the LU, QR,
and Cholesky factorizations. A 'left-looking' variant can reduce the
volume of I/O for out-of-core algorithms. A column-oriented
implementation is chosen so that most of the ScaLAPACK routines for
pivoting or applying Householder elementary transformations can be
reused.

For best performance, the factorization routines need a minimum of two
(m by nnb) ScaLAPACK array panels. The algorithms attempt to use a
variable-width panel (e.g., in Cholesky factorization) to make full use
of the available in-core memory.

    PFxGEQRF  -- QR factorization
    PFxGEQRS  -- solve with QR factorization

    PFxTRF    -- LU factorization
    PFxTRS    -- solve with LU factorization

    PFxPOTRF  -- Cholesky factorization
    PFxPOTRS  -- solve with Cholesky factorization


(3) Support routines.
=====================

A number of support routines are written to operate on out-of-core
matrices.

    PFxGEMM   -- matrix-matrix operation where at most one descriptor
                 is associated with an out-of-core matrix.
    PFxTRSM   -- perform a triangular solve where the triangular
                 factor is out-of-core.
    PFxORMQR (PFxUNMQR)
              -- apply Householder elementary transformations.
    PFxMATGEN -- generate a random out-of-core matrix.
    PFxLAPRNT -- print the contents of an out-of-core matrix.
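

Illustration: left-looking ordering.

The left-looking ordering used by the factorization routines in section
(2) can be illustrated with a small standalone sketch. It is not code
from this package: it is written in Python rather than Fortran, uses a
panel width of one column, and keeps the matrix in ordinary in-memory
lists that stand in for the out-of-core file. The function name
left_looking_cholesky is hypothetical.

```python
import math


def left_looking_cholesky(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite).

    Illustrative sketch only: panel width is one column and the lists
    play the role of the out-of-core matrix.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # "Read" column j of A (rows j..n-1) into the in-core work column.
        col = [a[i][j] for i in range(j, n)]
        # Look left: apply every previously factored column k < j.
        # Those columns are only read here, never modified again.
        for k in range(j):
            ljk = L[j][k]
            for i in range(j, n):
                col[i - j] -= L[i][k] * ljk
        # Factor the fully updated column and "write" it once.
        d = math.sqrt(col[0])
        L[j][j] = d
        for i in range(j + 1, n):
            L[i][j] = col[i - j] / d
    return L


if __name__ == "__main__":
    A = [[4, 12, -16],
         [12, 37, -43],
         [-16, -43, 98]]
    print(left_looking_cholesky(A))
    # prints [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
```

Each column is written exactly once, after all of its updates have been
applied. A right-looking ordering would instead modify every trailing
column at each step, which for a matrix held on disk means rewriting
the trailing part of the file repeatedly.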