Each time step of the IFS consists of computations in three different spaces (grid point space, Fourier space and spectral space) and has the following order:

- computations in grid point space,
- transition to Fourier space by Fast Fourier Transforms,
- computations in Fourier space,
- transition to spectral space by Legendre transforms,
- computations in spectral space (phase I),
- computations in spectral space (phase II),
- transition to Fourier space by inverse Legendre transforms,
- computations in Fourier space and
- transition to grid point space by inverse Fourier transforms.

So, in each computational step a partitioning of the data and the computations to processes is possible with respect to two dimensions, providing a high degree of parallelism. The data transposition strategy [1, 4] allows to utilise this approach efficiently on parallel systems. Its idea is to re-distribute the complete data to the processes (data transposition) at some stages of the algorithm such that the arithmetic computations between two consecutive transpositions can be performed without any interprocess communication: during the corresponding computational step the direction with data dependencies is not partitioned among processes.

Figure 1 shows the structure of the parallel algorithm: here, the data are partitioned among 3 X 4 processes. Transpositions are performed in each data space in order to allow a fully parallelised computational phase without any interprocess communication. The dashed cells in Fourier and spectral space illustrate the data partitioning before and after the Legendre transforms; in Fourier space each process is responsible for two of the indicated process columns (which are in general not adjacent) in order to provide load balanced Legendre transforms.

The usual expression for the size of the IFS data structure is T*M* L*z*, where
*M* is the *wave number* and *z* is the number of *levels*. We
investigate T*M* L31 with *M*=63, 106, 213. The
standard relation between *M* and the number of grid points in direction of
longitudes or latitudes is and
(cf. Figure 1). For the present paper we consider the
RAPS version 2.0 with full grid and Eulerian treatment of advection. The
finest grid size corresponds to a 60 km network on the equator.

For the mapping of the process structure onto the computing nodes we used a
simple mapping function called *column-mapping*. Here
the columns of the process structure are mapped onto the columns of the cluster
structure one after the other. Depending on the size of these columns, more
than one process column might be mapped onto one cluster column or one process
column might be split to several cluster columns. Let and denote the elements of the -process
structure or the -cluster structure respectively. The
-cluster structure represents a cluster of
C90 systems each of which consists of computing nodes. Then the
column-mapping is defined by . We refer to
Figure 2 as an example of this mapping.

Tue May 28 14:38:25 PST 1996