Cluster computing, the use of a tightly coupled cluster of (shared-memory) high-performance computers instead of a single supercomputer, is a topic of central interest. However, very little practical experience has been gained so far with large-scale production codes on clusters of supercomputers.
In a joint project with CRAY Research and in close co-operation with the ECMWF, we investigated the potential of cluster computing for weather (and climate) prediction. Weather prediction is one of the classical application fields for high-performance computers and one of the most important, so we chose this challenging application as our starting point.
For our investigation, we consider the IFS (Integrated Forecasting System) code of the ECMWF. This code is of central significance for European medium-range weather prediction and for the atmospheric calculations in climate research. SCAI has very detailed knowledge of this code, since it provided the first parallel version of the code, based on the portable message-passing interface PARMACS.
Meanwhile, several parallel versions of the code are available, covering 1D and 2D partitionings and a large set of mappings. An additional aspect is the resolution of the basic 3D grid: the current fine resolution leads to a calculation/communication ratio that favours efficient parallelisation. Good results have been achieved on a variety of parallel machines [4, 5]. Our study shows that the parallelisation strategy must be carefully tailored to the configuration of the supercomputer cluster in order to obtain optimal cluster computing efficiency.
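To make the difference between 1D and 2D partitionings concrete, the following is a minimal sketch, not the IFS decomposition itself. It assumes a regular latitude-longitude grid of `nlat` x `nlon` points (a simplifying assumption; the IFS grid need not be regular) and balanced contiguous blocks. A 1D partitioning hands each processor a band of whole latitude rows; a 2D partitioning arranges the processors as a `pa` x `pb` grid and splits both directions. The row-major rank numbering in `partition_2d` stands in for one possible "mapping" and is an illustrative choice; all function names are hypothetical.

```python
def block_ranges(n, p):
    """Split range(n) into p contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)  # first `extra` blocks get one more point
        ranges.append(range(start, start + size))
        start += size
    return ranges


def partition_1d(nlat, nlon, p):
    """1D partitioning: each of p processors gets a band of whole latitude rows."""
    return {rank: (rows, range(nlon)) for rank, rows in enumerate(block_ranges(nlat, p))}


def partition_2d(nlat, nlon, pa, pb):
    """2D partitioning on a pa x pb processor grid.

    Processor (a, b) is mapped to rank a * pb + b (a hypothetical row-major mapping).
    """
    parts = {}
    for a, rows in enumerate(block_ranges(nlat, pa)):
        for b, cols in enumerate(block_ranges(nlon, pb)):
            parts[a * pb + b] = (rows, cols)
    return parts


def points(part):
    """Number of grid points in one processor's (rows, cols) block."""
    rows, cols = part
    return len(rows) * len(cols)


# Hypothetical 160 x 320 grid on 8 processors.
bands = partition_1d(160, 320, 8)      # 8 bands of 20 full rows
blocks = partition_2d(160, 320, 4, 2)  # 4 x 2 blocks of 40 x 160 points
print([points(p) for p in bands.values()])
print([points(p) for p in blocks.values()])
```

Both partitionings give every processor 6400 of the 51200 points here; they differ in the shape of each subdomain and hence in which processors must exchange data, which is one reason the choice of partitioning and mapping matters for a given cluster configuration.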
As the model cluster computing configuration, CRAY proposed a cluster of C90 computers connected by one or several Hippi channels. For the daily forecast model, which contains about 6.3 million grid points, we obtained very good efficiency: compared with a single C90 system, model evaluation delivered a speed-up of 1.7 on two C90 systems connected by one Hippi channel and an (even better) speed-up of 3.5 on four C90 systems connected by six Hippi channels.
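Why the four-system result counts as "even better" can be checked with simple arithmetic. The sketch below assumes the conventional parallel efficiency E = S / p (speed-up divided by the number of systems); this is an assumption for illustration and may differ from the cluster efficiency defined in section 2. The speed-ups are the ones reported above; the function name is hypothetical.

```python
# Speed-ups reported in the text, relative to a single C90 system.
measurements = {2: 1.7, 4: 3.5}  # number of C90 systems -> speed-up


def parallel_efficiency(speedup, systems):
    """Conventional parallel efficiency E = S / p (assumed definition)."""
    return speedup / systems


for systems, speedup in measurements.items():
    print(f"{systems} systems: S = {speedup}, E = {parallel_efficiency(speedup, systems):.3f}")
```

Under this definition the efficiency rises from 0.850 on two systems to 0.875 on four. Note that the four-system configuration also used six Hippi channels instead of one, so the gain is not purely an effect of adding systems.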
In section 2 we introduce the cluster efficiency, a special metric for comparing clusters of systems with single systems. The two subsequent sections define the machine configuration considered and the structure of the parallel application. We briefly describe our main result in section 5. Section 6 outlines how our results were obtained. Finally, section 7 discusses several results of our investigation in greater detail.