SPMD (Single Program Multiple Data)
Collective Communication Module
for Fortran 90

Purpose:

This document describes an SPMD (Single Program Multiple Data) Collective Communication Module for Fortran 90. The module contains routines that provide a subset of the functionality of the MPI collective communications routines, but with a simplified interface that is compatible with Fortran 90 calling semantics.
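
As a purely illustrative sketch of the intended style (the module and routine names below are hypothetical placeholders, not the names actually defined by the API later in this document), a global sum might be expressed as a single call in which the array size, data type, KIND, reduction operation, and communicator are all implied:

    program sketch
      use spmd_collectives             ! hypothetical module name
      implicit none
      real :: a(100), asum(100)
      a = 1.0
      ! One overloaded call replaces an MPI_ALLREDUCE call, which would
      ! also require the count, MPI datatype, reduction operation,
      ! communicator, and error-code arguments to be supplied explicitly.
      call global_sum(a, asum)         ! hypothetical routine name
    end program sketch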

The API and a reference implementation are being produced under contract for the High Performance Computing Modernization Program by Coherent Cognition.

Task Order Number: N62306-01-D-7110/0009
High Performance Computing Modernization Program
Task Number: CE 019
Title: SPMD Collective Communication Module


Background and contract information

Problem description:

Many of DoD's HPC applications use the SPMD (Single Program Multiple Data) parallel programming style. MPI is the most common SPMD API. It provides a rich set of collective communications routines, but these are typically too general (and therefore more complicated than necessary), not very compatible with Fortran 90 calling conventions, or not usable for non-commutative operations such as sums by programs that require bit-for-bit reproducibility.

There are other SPMD APIs, which are not as portable as MPI but are often faster and easier to use when available. The most widely used of these is Cray's SHMEM put/get library, which is available on the Cray T3E, SGI Origin, IBM SP, and some Compaq systems. SHMEM has a rudimentary set of collective operations, but like other alternatives to MPI it is held back by the lack of a robust and easy-to-use set of collective communications routines. Other promising SPMD APIs have similar deficiencies in this area. As an example of the problem for alternative APIs, the conversion of the NAS MG benchmark from MPI to Co-Array Fortran extended the length of the program by about 15% (992 to 1150 active lines of code), in large part because collective operations had to be implemented from scratch. The difference in code size would have been very small (and the conversion effort much reduced in time and complexity) if a standard module of collective operations had been available for Co-Array Fortran. The Co-Array Fortran version of NAS MG is significantly faster than the original MPI version on the Cray T3E, illustrating that alternative SPMD APIs can have performance advantages.
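
The bit-for-bit reproducibility issue mentioned above arises because floating-point addition is not associative: the order in which partial sums are combined changes the low-order bits of the result. The following small, self-contained example (with values chosen only to make the effect obvious in single precision) prints two different answers for the same three numbers:

    program order_matters
      implicit none
      real :: a, b, c
      a = 1.0e8
      b = -1.0e8
      c = 1.0e-4
      ! (a + b) + c gives 1.0e-4, but a + (b + c) gives 0.0, because
      ! b + c rounds back to b in single precision.  A parallel sum whose
      ! combining order depends on process count or message arrival order
      ! therefore cannot guarantee bit-for-bit identical results.
      print *, (a + b) + c
      print *, a + (b + c)
    end program order_matters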

Project solution:

We propose developing and releasing as Open Source a compact Fortran 90 module for the most commonly used SPMD collective communications operations. The first version would be for MPI and would provide a thin layer on top of a subset of MPI's existing capability (e.g. only MPI_COMM_WORLD, only contiguous INTEGER or REAL arrays, but any legal KINDs). It would also provide an option guaranteeing the same answer from the same data on the same number of CPUs. Since MPI does not support such a guarantee via its existing collective operations, point-to-point MPI operations will be used in this mode (one possible approach is sketched at the end of this section). The initial release of the MPI-based module will include a test suite and will place emphasis on correctness rather than performance.

Once such a module exists and is available to all, it can be used as a collective communication API for other SPMD APIs. We propose developing a version of the module for SHMEM to illustrate this capability and to provide an improved collective operation API for the most widely used and widely available alternative to MPI. We anticipate that the supporters of other alternative SPMD APIs will provide their own versions of the module without further action via PET. This is greatly simplified by using the SHMEM version as a reference design, since most alternative APIs are closer to SHMEM than to MPI.

Potential future PET tasks, not covered by this one-year proposal, might concentrate on performance enhancements. These are much easier to implement for the module than for the much larger and more general MPI collective API. For example, an "SMP cluster" aware version would target a very common generic machine architecture. Any future PET tasks of this kind would be attempted only if there was evidence that the SPMD module was being heavily used by the DoD HPC community.
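
The following is only a sketch of one way the reproducibility mode could be built from point-to-point messages, shown here to make the idea concrete; the routine and variable names are illustrative and the reference implementation may be organized differently. Each process sends its contribution to process 0, which adds the values in rank order and then returns the total:

    subroutine reproducible_sum(x, total)
      implicit none
      include 'mpif.h'
      double precision, intent(in)  :: x
      double precision, intent(out) :: total
      double precision :: part
      integer :: rank, nprocs, i, ierr
      integer :: status(mpi_status_size)
      call mpi_comm_rank(mpi_comm_world, rank, ierr)
      call mpi_comm_size(mpi_comm_world, nprocs, ierr)
      if (rank == 0) then
        total = x
        do i = 1, nprocs - 1               ! fixed, rank-ordered summation
          call mpi_recv(part, 1, mpi_double_precision, i, 0, &
                        mpi_comm_world, status, ierr)
          total = total + part
        end do
        do i = 1, nprocs - 1               ! return the result to every process
          call mpi_send(total, 1, mpi_double_precision, i, 1, &
                        mpi_comm_world, ierr)
        end do
      else
        call mpi_send(x, 1, mpi_double_precision, 0, 0, &
                      mpi_comm_world, ierr)
        call mpi_recv(total, 1, mpi_double_precision, 0, 1, &
                      mpi_comm_world, status, ierr)
      end if
    end subroutine reproducible_sum

Because process 0 posts its receives in rank order, the summation order is fixed for a given number of processes, which is the property the reproducibility option is meant to guarantee.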

User impact/advocacy/collaboration:

The primary benefit to users is the ability to express SPMD collective operations portably and independently of the underlying communication API (MPI or SHMEM). This is particularly important when using alternatives to MPI, which can be faster than MPI but typically require significantly more programming effort to express collective operations in the absence of the module provided by this project. Using a Fortran 90 module as an API does not exclude Fortran 77 users (since all HPCMP systems have F90 compilers, which also support all of F77), and it provides a much more natural and compile-time-checkable interface for the increasing number of Fortran 90 users.

Deliverables:

The project is for a single year, with a technical report delivered at the end of the year covering the entire project.

  1. An API for SPMD Collective Communications that is independent of the underlying communications API and based on a Fortran 90 module. The API will include a user-selectable option (mandatory in every implementation) for guaranteeing the same answer from the same data on the same number of CPUs.
  2. An Open Source reference implementation of the module for MPI.
  3. An Open Source reference implementation of the module for SHMEM.
  4. A test and timing suite that contains no MPI or SHMEM code and can be compiled against either the MPI or the SHMEM version of the module to confirm correctness and measure performance (see the sketch following this list). The suite will be run on at least two HPCMP machine types for MPI and for SHMEM.
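
As a purely illustrative sketch of what "contains no MPI or SHMEM code" means in practice (every module and routine name below is hypothetical), a test in the suite would call only routines from the module, so the same source compiles unchanged against either implementation:

    program test_broadcast
      use spmd_collectives                ! hypothetical module name
      implicit none
      integer :: buffer(4)
      call start_collectives()            ! hypothetical start-up routine
      buffer = 0
      if (process_id() == 0) buffer = (/ 1, 2, 3, 4 /)   ! hypothetical rank inquiry
      call broadcast(buffer, root=0)      ! hypothetical broadcast routine
      if (any(buffer /= (/ 1, 2, 3, 4 /))) print *, 'broadcast FAILED'
      call end_collectives()              ! hypothetical shut-down routine
    end program test_broadcast
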
The rest of this document is organized into the following parts.

Basic API and users guide

Describes the minimum requirements for using the various routines defined within the API.

Advanced API features and users guide

All of the above plus information about optional arguments for the routines.

Formal API specification

Specification for the API, similar in style to the MPI standard document.

Reference implementation in MPI

Source code for an implementation of the API based on MPI. Compile and link instructions for the Cray T3E, IBM SP, and Apple OS X are also included.

Reference implementation in SHMEM

Source code for an implementation of the API based on SHMEM. Compile and link instructions for the Cray T3E and IBM SP are also included.

Timing and test suite

Source code for a timing and test suite. Example runs.

Fortran 90 semantics and MPI

This section describes Fortran 90 calling semantics that are important to the definition of the API. In particular, modules, interfaces, optional parameters, and procedure (subroutine) overloading are discussed. A thorough understanding of the information in this section is NOT required to use the API. This section provides information that will help the user and potential implementors understand why the routines in the API have the calling conventions they do.
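
As a brief, self-contained preview of those features (this example is deliberately unrelated to the API itself), the module below binds two specific subroutines to one generic name and gives both an optional argument:

    module demo_generic
      implicit none
      ! One generic name, "describe", resolved at compile time to the
      ! specific routine whose argument types match the call.
      interface describe
        module procedure describe_int, describe_real
      end interface
    contains
      subroutine describe_int(x, label)
        integer, intent(in) :: x
        character(len=*), intent(in), optional :: label
        if (present(label)) then
          print *, label, x
        else
          print *, 'integer value', x
        end if
      end subroutine describe_int
      subroutine describe_real(x, label)
        real, intent(in) :: x
        character(len=*), intent(in), optional :: label
        if (present(label)) then
          print *, label, x
        else
          print *, 'real value', x
        end if
      end subroutine describe_real
    end module demo_generic

    program demo
      use demo_generic
      implicit none
      call describe(3)                      ! resolves to describe_int
      call describe(2.5, label='a real:')   ! resolves to describe_real;
                                            ! the optional argument is given
    end program demo

Because the interfaces are explicit, a call with a mismatched argument type is rejected at compile time rather than failing at run time.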

References

Pointers to various web pages and references to printed documents and books.