Introduction

There are special challenges associated with writing reliable numerical software on networks containing heterogeneous processors, that is, processors that may perform floating point arithmetic differently. This includes not just machines with completely different floating point formats and semantics, such as Cray vector computers running Cray arithmetic versus workstations running IEEE standard floating point arithmetic, but also supposedly identical machines running with different compilers, or even just different compiler options or runtime environments.

The basic problem occurs when data-dependent branches are taken on different processors. The flow of an algorithm is usually data dependent, so slight variations in the data may lead different processors to execute completely different sections of code.
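
As an illustration only, the following minimal sketch shows how the same data-dependent test can take different branches on two processors. It assumes one processor works with IEEE double precision and the other with IEEE single precision; the function `has_converged` and the sample values are hypothetical and are not taken from ScaLAPACK or the NAG library.

```python
import sys

# Machine epsilon as each (simulated) processor would report it locally.
EPS_DOUBLE = sys.float_info.epsilon  # 2**-52, IEEE double precision
EPS_SINGLE = 2.0 ** -23              # IEEE single precision


def has_converged(update, x, eps):
    """Hypothetical stopping test: is the update negligible relative to x?"""
    return abs(update) <= eps * abs(x)


# Both processors hold the same data...
update, x = 1.0e-10, 1.0

# ...but each applies the test with its own local machine epsilon.
branch_a = has_converged(update, x, EPS_DOUBLE)  # processor A keeps iterating
branch_b = has_converged(update, x, EPS_SINGLE)  # processor B stops

print("processor A converged:", branch_a)
print("processor B converged:", branch_b)
```

In a message-passing program, a disagreement of this kind could leave one processor waiting for a message that the other never sends.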

This paper reflects the experience of the ScaLAPACK and NAG teams in developing numerical software for distributed memory message-passing systems, and the awareness that the software being developed may not be as robust on heterogeneous systems as on homogeneous systems. We briefly describe the work of these teams in Section 2. Section 3 defines our use of the terms homogeneous and heterogeneous computing, and discusses the considerations leading to the definitions.

In Sections 4, 5 and 8 we look at three areas that require attention in developing software for heterogeneous networks: machine parameters, where we discuss what the values of machine parameters, such as machine precision, should be; checking global arguments and communicating floating point values; and algorithmic integrity, that is, how we can ensure that algorithms perform correctly in a heterogeneous setting. The particular case of communicating floating point values on IEEE machines is briefly discussed in Section 6. Some additional considerations arising from what we regard as poor arithmetic, ranging from lack of full IEEE arithmetic support to unnecessary overflow in complex arithmetic, are discussed in Section 7.

This report is an updated version of [5], which takes into account problems encountered during the preparation of Version 1.2 of ScaLAPACK [2].

Jack Dongarra
Fri Aug 30 15:13:52 EDT 1996