Introduction and Overview


The basic communication mechanism of MPI is the transmittal of data between a pair of processes, one side sending, the other, receiving. We call this ``point to point communication.'' Almost all the constructs of MPI are built around the point to point operations and so this chapter is fundamental. It is also quite a long chapter since: there are many variants to the point to point operations; there is much to say in terms of the semantics of the operations; and related topics, such as probing for messages, are explained here because they are used in conjunction with the point to point operations.

MPI provides a set of send and receive functions that allow the communication of typed data with an associated tag. Typing of the message contents is necessary for heterogeneous support: the type information is needed so that correct data representation conversions can be performed as data is sent from one architecture to another. The tag allows selectivity of messages at the receiving end: one can receive on a particular tag, or one can wild-card this quantity, allowing reception of messages with any tag. Message selectivity on the source process of the message is also provided.
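
The selection rule just described can be sketched as a small predicate. This is a toy model in plain C, not MPI's implementation: the envelope struct, the envelope_matches function, and the ANY_SOURCE and ANY_TAG constants are hypothetical names introduced here for illustration (MPI's own wildcards are MPI_ANY_SOURCE and MPI_ANY_TAG), and the model ignores the communicator, which also takes part in matching.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI's wildcard constants. */
#define ANY_SOURCE (-1)
#define ANY_TAG    (-1)

/* Toy message envelope: who sent the message and with which tag. */
struct envelope {
    int source;  /* rank of the sending process */
    int tag;     /* tag attached by the sender */
};

/* Return 1 if a receive asking for (want_source, want_tag) would select
   the message described by env, and 0 otherwise. */
int envelope_matches(const struct envelope *env, int want_source, int want_tag)
{
    assert(env != NULL);
    int source_ok = (want_source == ANY_SOURCE) || (want_source == env->source);
    int tag_ok    = (want_tag == ANY_TAG) || (want_tag == env->tag);
    return source_ok && tag_ok;
}
```

For a message from rank 0 with tag 99, a request for (0, 99) or for (ANY_SOURCE, ANY_TAG) matches, while a request for tag 7 or for source 1 does not.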

Example 2.1 shows a fragment of C code in which process 0 sends a message to process 1. The code executes on both process 0 and process 1. Process 0 sends a character string using MPI_Send(). The first three parameters of the send call specify the data to be sent: the outgoing data is to be taken from msg; it consists of strlen(msg)+1 entries, each of type MPI_CHAR (the string "Hello there" contains strlen(msg)=11 significant characters; in addition, we also send the '\0' string terminator character). The fourth parameter specifies the message destination, which is process 1. The fifth parameter specifies the message tag. Finally, the last parameter is a communicator that specifies a communication domain for this communication. Among other things, a communicator serves to define a set of processes that can be contacted. Each such process is labeled by a process rank. Process ranks are integers and are discovered by inquiry to a communicator (see the call to MPI_Comm_rank()). MPI_COMM_WORLD is a default communicator provided upon start-up that defines an initial communication domain for all the processes that participate in the computation. Much more will be said about communicators in a later chapter.

The receiving process specifies that the incoming data is to be placed in msg and that it has a maximum size of 20 entries, of type MPI_CHAR. The variable status, set by MPI_Recv(), gives information on the source and tag of the message and how many elements were actually received. For example, the receiver can examine this variable to find out the actual length of the character string received. Datatype matching (between sender and receiver) and data conversion on heterogeneous systems are discussed in more detail in a later section.

Example 2.1 C code. Process 0 sends a message to process 1.
char msg[20];
int myrank, tag = 99;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank); /* find my rank */
if (myrank == 0) {
   /* sender: 11 characters plus the '\0' terminator */
   strcpy(msg, "Hello there");
   MPI_Send(msg, strlen(msg)+1, MPI_CHAR, 1, tag, MPI_COMM_WORLD);
} else if (myrank == 1) {
   /* receiver: accept up to 20 characters from process 0 */
   MPI_Recv(msg, 20, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
}

The Fortran version of this code is shown in Example 2.2. In order to make our Fortran examples more readable, we use Fortran 90 syntax, here and in many other places in this book. The examples can be easily rewritten in standard Fortran 77. The Fortran code is essentially identical to the C code. All MPI calls are procedures, and an additional parameter (ierr in the example) returns the value that the corresponding C function returns as its result. Note that Fortran strings have fixed size and are not null-terminated. The receive operation stores "Hello there" in the first 11 positions of msg.

Example 2.2 Fortran code.
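CHARACTER(LEN=20) :: msg
INTEGER :: myrank, ierr, status(MPI_STATUS_SIZE)
INTEGER :: tag = 99
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)   ! find my rank
IF (myrank .EQ. 0) THEN
   msg = "Hello there"
   CALL MPI_SEND(msg, 11, MPI_CHARACTER, 1, &
                 tag, MPI_COMM_WORLD, ierr)
ELSE IF (myrank .EQ. 1) THEN
   CALL MPI_RECV(msg, 20, MPI_CHARACTER, 0, &
                 tag, MPI_COMM_WORLD, status, ierr)
END IF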

These examples employed blocking send and receive functions. The send call blocks until the send buffer can be reclaimed (i.e., after the send, process 0 can safely overwrite the contents of msg). Similarly, the receive function blocks until the receive buffer actually contains the contents of the message. MPI also provides nonblocking send and receive functions that allow the possible overlap of message transmittal with computation, or the overlap of multiple message transmittals with one another. Nonblocking functions always come in two parts: the posting functions, which begin the requested operation; and the test-for-completion functions, which allow the application program to discover whether the requested operation has completed. Our chapter begins by explaining blocking functions in detail, while nonblocking functions are covered in later sections.

We have already said rather a lot about a simple transmittal of data from one process to another, but there is even more. To understand why, we examine two aspects of the communication: the semantics of the communication primitives, and the underlying protocols that implement them. Consider the previous example, on process 0, after the blocking send has completed. The question arises: if the send has completed, does this tell us anything about the receiving process? Can we know that the receive has finished, or even, that it has begun?

Such questions of semantics are related to the nature of the underlying protocol implementing the operations. If one wishes to implement a protocol minimizing the copying and buffering of data, the most natural semantics might be the ``rendezvous'' version, where completion of the send implies the receive has been initiated (at least). On the other hand, a protocol that attempts to block processes for the minimal amount of time will necessarily end up doing more buffering and copying of data and will have ``buffering'' semantics.
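
The difference between the two semantics can be sketched as a condition on when a blocking send is allowed to complete. The following plain C fragment is a deliberately simplified toy model, not a description of any MPI implementation: the enum semantics type, its RENDEZVOUS and BUFFERING labels, and the send_may_complete function with its parameters are all hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the two semantics discussed above. */
enum semantics { RENDEZVOUS, BUFFERING };

/* Toy model: return 1 if a blocking send of msg_bytes may complete now.
   receive_posted : nonzero once the matching receive has been initiated
   free_buffer    : bytes of buffer space currently available for a copy */
int send_may_complete(enum semantics s, int receive_posted,
                      size_t free_buffer, size_t msg_bytes)
{
    assert(s == RENDEZVOUS || s == BUFFERING);
    if (s == RENDEZVOUS) {
        /* Completion implies the receive has at least been initiated. */
        return receive_posted != 0;
    }
    /* Buffering: the send completes once the data can be copied aside
       (or handed to a receive that is already waiting), whether or not
       the receiver has done anything yet. */
    return receive_posted != 0 || msg_bytes <= free_buffer;
}
```

In this model a 12-byte message sent with rendezvous semantics cannot complete until the receive is posted, however much buffer space exists, whereas with buffering semantics it completes immediately as long as 12 bytes of buffer are free. The price of the second choice is the extra copy.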

The trouble is, one choice of semantics is not best for all applications, nor is it best for all architectures. Because the primary goal of MPI is to standardize the operations, yet not sacrifice performance, the decision was made to include all the major choices for point to point semantics in the standard.

The above complexities are manifested in MPI by the existence of modes for point to point communication. Both blocking and nonblocking communications have modes. The mode allows one to choose the semantics of the send operation and, in effect, to influence the underlying protocol of the transfer of data.

In standard mode the completion of the send does not necessarily mean that the matching receive has started, and no assumption should be made in the application program about whether the outgoing data is buffered by MPI. In buffered mode the user can guarantee that a certain amount of buffering space is available. The catch is that the space must be explicitly provided by the application program. In synchronous mode a rendezvous semantics between sender and receiver is used. Finally, there is ready mode. This allows the user to exploit extra knowledge to simplify the protocol and potentially achieve higher performance. In a ready-mode send, the user asserts that the matching receive has already been posted. Modes are covered in a later section.

Jack Dongarra
Fri Sep 1 06:16:55 EDT 1995