The out-of-core prototype code consists of roughly three components:
(1) An I/O component that handles disk I/O and file management.
(2) Left-looking variants of the LU, QR, and Cholesky factorization algorithms.
(3) Support routines for operations on out-of-core matrices.
(1) I/O Component.
=================
A high-level interface is provided for reading and writing
sections of a ScaLAPACK array. For example,
call ZLAREAD( iodev, m,n, ia,ja,
B,ib,jb,descB, info )
will read an m by n submatrix from an out-of-core matrix
A( ia:(ia+m-1), ja:(ja+n-1) )
into a ScaLAPACK matrix B,
B(ib:(ib+m-1), jb:(jb+n-1) )
There are no alignment constraints on (ia,ja) or (ib,jb); however,
best performance occurs when A and B are processor and block aligned.
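The alignment condition can be illustrated with a small sketch. It is not part of the library: the helper names are hypothetical, and it assumes that A and B share the same block sizes (mb,nb), process grid, and source process (rsrc,csrc), and that "aligned" means both submatrices start on a block boundary owned by the same process. The owner formula is the usual 2-D block-cyclic mapping for 1-based global indices.

```python
def owner(i, blk, src, nprocs):
    """Process coordinate that owns 1-based global index i (block-cyclic)."""
    return (src + (i - 1) // blk) % nprocs


def starts_block(i, blk):
    """True when 1-based global index i is the first entry of a block."""
    return (i - 1) % blk == 0


def is_aligned(ia, ja, ib, jb, mb, nb, nprow, npcol, rsrc=0, csrc=0):
    """Check the assumed alignment condition for A(ia,ja) and B(ib,jb)."""
    block_ok = (starts_block(ia, mb) and starts_block(ib, mb)
                and starts_block(ja, nb) and starts_block(jb, nb))
    proc_ok = (owner(ia, mb, rsrc, nprow) == owner(ib, mb, rsrc, nprow)
               and owner(ja, nb, csrc, npcol) == owner(jb, nb, csrc, npcol))
    return block_ok and proc_ok
```

With mb = nb = 2 on a 2 x 2 grid, is_aligned(1, 1, 5, 1, 2, 2, 2, 2) holds because rows 1 and 5 both start a block on process row 0, whereas ib = 3 starts a block on process row 1 and ib = 2 does not start a block at all.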
Similar to Fortran I/O, each out-of-core matrix is associated with a
device unit number (between 1 and 99). At the lowest level, disk input
and output is record oriented. Each record is an mmb x nnb ScaLAPACK
matrix, where mmb is a multiple of mb*nprow and nnb is a multiple of
nb*npcol (i.e., mod(mmb, mb*nprow) == 0 and mod(nnb, nb*npcol) == 0).
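As an illustration of the record-shape constraint, the following sketch checks a proposed (mmb,nnb) and rounds a size hint up to the nearest valid record shape. The function names are hypothetical and the rounding policy is only one reasonable choice; the library itself may select record sizes differently.

```python
def record_shape_ok(mmb, nnb, mb, nb, nprow, npcol):
    """True when mmb is a multiple of mb*nprow and nnb of nb*npcol."""
    return mmb % (mb * nprow) == 0 and nnb % (nb * npcol) == 0


def round_up_record_shape(m_hint, n_hint, mb, nb, nprow, npcol):
    """Smallest valid (mmb, nnb) with mmb >= m_hint and nnb >= n_hint.

    Assumes m_hint and n_hint are positive integers.
    """
    row_unit = mb * nprow
    col_unit = nb * npcol
    mmb = -(-m_hint // row_unit) * row_unit   # ceiling division, then scale
    nnb = -(-n_hint // col_unit) * col_unit
    return mmb, nnb
```

For example, with mb = 32, nb = 16 on a 2 x 3 grid the row unit is 64 and the column unit is 48, so a hint of (100, 50) rounds up to (128, 96).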
Each out-of-core matrix, like a 2-D block-cyclically distributed
ScaLAPACK matrix, is associated with a descriptor. The
descriptor is constructed by
call PFDESCINIT( descA, m,n, mb,nb, rsrc,csrc, ictxt,
iodev, filetype, mmb,nnb, Asize, filename, info )
where
iodev integer (1 <= iodev <= 99)
I/O unit number associated with the out-of-core matrix.
filetype character*1
'D' data is distributed across many files.
This option is best in an environment where
each processor has access to a fast local disk.
'S' data is shared in one file.
This option is best in an environment where
a parallel/concurrent file system, such as the Paragon
parallel file system, supports concurrent read/write requests.
Note that some NFS implementations may not support
concurrent read/write.
'I' similar to 'S', but the data is interleaved
in one shared file.
filename character*(*)
File to be associated with the out-of-core matrix.
If filename starts with '/' (filename(1:1).eq.'/'),
it is assumed to be a full absolute path name.
Otherwise, the file is assumed to be on a fast disk partition
such as '/tmp' or '/pfs'.
Asize integer
The size of temporary work space/buffer to be used
in accessing the out-of-core matrix.
The I/O unit can be closed by
call LACLOSE( iodev, 'NoKeep', myid, nproc, info )
to remove the file or
call LACLOSE( iodev, 'Keep', myid, nproc, info )
to keep the file.
Note that the data layout on disk is tied to the processor grid
(nprow,npcol) and block size (mb,nb).
(2) 'Left-looking' variants of the LU, QR, and Cholesky factorizations.
=======================================================================
LAPACK and ScaLAPACK implement 'right-looking' variants of the
LU, QR, and Cholesky factorizations. A 'left-looking' variant
can reduce the volume of I/O for out-of-core algorithms.
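The I/O pattern of a left-looking factorization can be sketched with a small serial example. This is not the PFxPOTRF implementation: it is a plain-Python blocked Cholesky in which a list of column panels stands in for the disk, and the function name and read/write counters are illustrative assumptions. Each panel is read, updated with all previously factored panels to its left, factored in core, and written back exactly once. A right-looking variant would instead rewrite the trailing matrix after every panel.

```python
import math


def left_looking_cholesky(a, width):
    """Factor SPD matrix a (list of rows) as L*L^T by column panels.

    Returns (L, reads, writes), where reads/writes count panel transfers
    to and from the simulated disk.
    """
    n = len(a)
    starts = list(range(0, n, width))
    # Simulated disk: one n-by-w column panel per entry, initially A.
    disk = [[row[s:s + width] for row in a] for s in starts]
    reads = writes = 0

    for k, s in enumerate(starts):
        panel = [[float(x) for x in r] for r in disk[k]]
        reads += 1
        w = len(panel[0])

        # Left-looking update: apply every previously factored panel.
        for j in range(k):
            left = disk[j]
            reads += 1
            for i in range(s, n):
                for c in range(w):
                    panel[i][c] -= sum(left[i][p] * left[s + c][p]
                                       for p in range(len(left[0])))

        # Factor the updated panel in core, column by column.
        for c in range(w):
            d = s + c                      # global column index
            panel[d][c] = math.sqrt(panel[d][c])
            for i in range(d + 1, n):
                panel[i][c] /= panel[d][c]
            for c2 in range(c + 1, w):
                for i in range(s + c2, n):
                    panel[i][c2] -= panel[i][c] * panel[s + c2][c]

        # Clear the strictly upper part so the panel holds only L.
        for c in range(w):
            for i in range(s + c):
                panel[i][c] = 0.0

        disk[k] = panel                    # each panel is written once
        writes += 1

    lower = [[x for panel in disk for x in panel[i]] for i in range(n)]
    return lower, reads, writes
```

In this sketch a matrix split into p panels incurs p writes and p(p+1)/2 panel reads, so the write volume stays at one pass over the matrix.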
A column-oriented implementation was chosen so that most of the
ScaLAPACK routines for pivoting and for applying Householder
elementary transformations can be reused.
For best performance, the factorization routines need
a minimum of two m by nnb ScaLAPACK array panels. The algorithms
attempt to use a variable-width panel (e.g., in Cholesky factorization)
to make full use of in-core memory.
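One way to picture the variable-width panel is the sizing rule sketched below. The policy is an assumption made for illustration, not the rule used by the factorization routines: it takes the total in-core capacity in matrix elements and returns the widest panel, as a multiple of nnb, such that two panels of m rows still fit. The name max_panel_width is hypothetical.

```python
def max_panel_width(m, nnb, incore_elements):
    """Widest panel (a multiple of nnb) such that two m-row panels fit.

    Raises ValueError if even two m-by-nnb panels do not fit.
    """
    if incore_elements < 2 * m * nnb:
        raise ValueError("need room for at least two m by nnb panels")
    columns = incore_elements // (2 * m)   # columns available per panel
    return (columns // nnb) * nnb          # round down to a multiple of nnb
```

With 400000 elements in core and nnb = 64, a panel of 1000 rows can be 192 columns wide, while a panel of 500 rows can be 384 columns wide. This is why a factorization whose active panel shrinks, as in Cholesky, can widen the panel as it proceeds.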
PFxGEQRF --- QR factorization
PFxGEQRS --- solve with QR factorization
PFxGETRF --- LU factorization
PFxGETRS --- solve with LU factorization
PFxPOTRF --- Cholesky factorization
PFxPOTRS --- solve with Cholesky factorization
(3) Support routines.
=====================
A number of support routines are written to operate
on out-of-core matrices.
PFxGEMM --- matrix-matrix operation where at most one
descriptor is associated with an out-of-core matrix.
PFxTRSM --- performs a triangular solve where the triangular factor
is out-of-core.
PFxORMQR
(PFxUNMQR) --- applies Householder elementary transformations.
PFxMATGEN --- generates a random out-of-core matrix.
PFxLAPRNT --- prints the contents of an out-of-core matrix.