Next: 16.2 Low-Level Primitives Up: 16 The Zipcode Message-Passing Previous: 16 The Zipcode Message-Passing

16.1 Overview of Zipcode

Zipcode is a message-passing system developed originally by Skjellum, beginning at Caltech in the summer of 1988 [Skjellum:90a], [Skjellum:91c], [Skjellum:92c], [Skjellum:91a]. Zipcode was created to provide features and address issues absent from then-existing message-passing systems such as CrOS/Express, described in Section 5.2. In particular, Zipcode was based on an underlying reactive, asynchronous low-level message-passing system, whereas CrOS was built on top of loosely synchronous low-level message-passing systems, which reflected C3P's initial hardware and applications. Interestingly, Zipcode and Express have evolved from these different starting points to quite similar high-level functionality. Currently, Zipcode continues to serve as a vehicle to demonstrate high-level message-passing research concepts and, more importantly, to provide the basis for supporting vendor-independent scalable concurrent libraries, notably the Multicomputer Toolbox [Falgout:92a], [Skjellum:91b;92a;92d]. The basic assertion of Zipcode is that carefully managed, expressive message passing is an effective way to program multicomputers and distributed computers, while low-level message passing is admittedly both error-prone and difficult.

The purpose of Zipcode is to manage the message-passing process within parallel codes in an open-ended way. This is done so that large-scale software can be constructed in a multicomputer application with reduced likelihood that the pieces will conflict in their dynamic resource use, thereby avoiding potentially hard-to-resolve, source-level conflicts. Furthermore, the message-passing notations provided are meant to reflect the algorithms and data organizations of the concurrent algorithms, rather than predefined tagging strategies. Tagging, while generic and easy to understand, proves insufficient to support manageable application development. Notational abstractions provide a means for the user to help Zipcode make runtime optimizations when a code runs on systems with specific hardware features. Abstraction is therefore seen as a means to higher performance, and notation is seen as a means towards more understandable concurrent software that is easier to develop and maintain. Context allocation (see below) provides a "social contract" within which multiple libraries and codes can coexist reasonably. Contexts are like system-managed "hypertags"; in this system, contexts are called "Zipcodes".

Safety in communication is achieved by context control; the main process data structure is the process list (a collective of processes that are to communicate). These constructs are handled dynamically by the system. Contexts are needed so that diverse codes can be brought together and made to work without the possibility of message-passing conflicts, and without the need to globalize the semantic and syntactic issues of message passing contained in each separate piece of code. For instance, the use and support of independently conceived concurrent libraries requires separate communication space, which contexts support. As applications mature, more contexts are likely to be needed, especially if diverse libraries are linked into the system, or a number of (possibly overlapping) process structures are needed to represent various phases of a calculation. In purely message-passing instances within Zipcode, contexts control the flow of messages through a global messaging resource. In more complex hierarchies, contexts will manage channel and/or shared-memory blocks in the user program, while the notation remains message-passing-like to the user. This evolution is transparent to the user.
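
The role of contexts as system-managed "hypertags" can be illustrated with a minimal sketch. This is not Zipcode's interface: the class `Mailbox` and its methods `allocate_context`, `send`, and `receive` are hypothetical names chosen for illustration, and a single in-memory queue per context stands in for the global messaging resource. The point shown is only that two independently written libraries may reuse the same user-level tag without intercepting each other's messages, because the system, not the user, hands out the contexts.

```python
from collections import deque
from itertools import count


class Mailbox:
    """Toy stand-in for a global messaging resource (hypothetical, not Zipcode)."""

    def __init__(self):
        self._next_context = count(1)  # contexts are handed out by the system
        self._queues = {}              # context id -> pending (tag, payload) pairs

    def allocate_context(self):
        """Return a fresh context id that no other caller holds."""
        ctx = next(self._next_context)
        self._queues[ctx] = deque()
        return ctx

    def send(self, ctx, tag, payload):
        """Queue a message inside the given context."""
        self._queues[ctx].append((tag, payload))

    def receive(self, ctx, tag):
        """Return the oldest payload with this tag in this context, or None."""
        queue = self._queues[ctx]
        for i, (t, payload) in enumerate(queue):
            if t == tag:
                del queue[i]
                return payload
        return None
```

With a tag-only scheme, two libraries that both chose tag 7 could receive each other's messages; here each library would call `allocate_context()` once and then match only within its own context.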

Concurrent mathematical libraries are well supported by the definition of multidimensional, logical-process-grid primitives, as provided by Zipcode; one-, two- and three-dimensional grids are currently supported (grid mail classes, also known as virtual topologies). Grids are used to assign machine-independent naming to the processes participating in a calculation, with a shape chosen by the user. Such grids form the basis for higher level data structures that describe how matrices and vectors are shared across a set of processes, but these descriptors are external to Zipcode. New grids may be aligned to existing grids to provide nesting, partitioning, and other desired subsetting of process grids, all done in the machine-independent notation of the parent grid. The routine whoami and associated routines, described in Section 5.2, provide this capability in CrOS/Express.
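
As a rough illustration of machine-independent grid naming, the following sketch maps positions in a process list onto a two-dimensional logical grid and back. The row-major ordering and the function names `grid_coords`, `grid_rank`, and `row_subgrid` are assumptions made for this example; the text does not specify how Zipcode orders or names grid members.

```python
def grid_coords(rank, shape):
    """Map a process-list position to (row, col) on a P x Q grid (row-major, assumed)."""
    rows, cols = shape
    if not 0 <= rank < rows * cols:
        raise ValueError("rank lies outside the grid")
    return divmod(rank, cols)


def grid_rank(coords, shape):
    """Inverse of grid_coords: map (row, col) back to a process-list position."""
    rows, cols = shape
    row, col = coords
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("coordinates lie outside the grid")
    return row * cols + col


def row_subgrid(row, shape):
    """List the members of one grid row, named in the parent grid's numbering."""
    _, cols = shape
    return [grid_rank((row, col), shape) for col in range(cols)]
```

A one-dimensional child grid built from `row_subgrid` is expressed entirely in the parent's naming, which is the sense in which nested or partitioned grids stay machine independent.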

Mail classes (such as new grid structures) may be added statically to the system; because code cannot move with data in extant multicomputers, mail classes have to be enumerated at compile-time. Because we at present retain a C implementation, rather than C++, the library must currently be modified explicitly to add new classes of mail, rather than by inheritance. Fortunately, the predefined classes (grids, tagged messages) address a number of the situations we have encountered thus far in practical applications. Non-mathematically oriented users may conceive of mail classes that we have not as yet imagined, and which might be application-specific.

Recently, we have evolved the Zipcode system to provide higher level application interfaces to the basic message-passing contexts and classes of mail. These interfaces allow us to unify the notions of heterogeneity and non-uniform memory access hierarchy in a single framework, on a context-by-context basis. For instance, we view a homogeneous collection of multicomputer nodes as a particular type of memory hierarchy. We see this unification of heterogeneity and memory hierarchy in our notation as an important conceptual advance, both for distributed- and concurrent-computing applications of Zipcode. Heterogeneity mainly affects transmission bandwidth; it should not have to be treated as a separate feature in data transmission, nor should it be explicitly visible in user-defined application code or algorithms, except perhaps in highly restricted method definitions, for the sake of performance.

For instance, the notations currently provided by Zipcode support writing application programs so that the same message-passing code can map reasonably well to heterogeneous architectures, to those with shared memory between subsets of nodes, and to those which support active-message strategies. Furthermore, it should be possible to cache limited internode channel resources within the library, transparent to the user. This is possible because the gather-send and scatter-receive notations remove message formatting from the user's control. We provide general gather/scatter specifications through persistent invoice data types. This notation is available both to C and Fortran programmers. As a side effect, we provide a clean interface for message passing in the Fortran environment. If compilers support code inlining and other optimizations, we are convinced that overheads can be drastically reduced for systems with lighter communication overheads than heretofore developed. Cheap dynamic allocation mechanisms also help in this regard, and are easily attainable. In all cases, the user will have to map the process lists to processors to take advantage of the hierarchies, but this can be done systematically using Zipcode.
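
The idea of a persistent gather/scatter description can be sketched as follows. This is a hedged illustration, not Zipcode's invoice interface: the class `Invoice`, its methods `gather` and `scatter`, and the `(name, type code, count)` item layout are all hypothetical. Python's standard `struct` module, with its `!` (network byte order) prefix, stands in here for whatever formatting or translation a real implementation would choose; only numeric type codes such as `i` and `d` are intended.

```python
import struct


class Invoice:
    """Reusable description of which variables make up a message (hypothetical)."""

    def __init__(self, items):
        # items: sequence of (name, struct type code, element count)
        self.items = tuple(items)
        # "!" selects a fixed byte order with no padding, so sender and
        # receiver agree on the layout regardless of their native formats.
        layout = "".join(f"{n}{code}" for _, code, n in self.items)
        self._layout = struct.Struct("!" + layout)

    def gather(self, variables):
        """Collect the described variables into one contiguous message."""
        flat = []
        for name, _, n in self.items:
            values = variables[name]
            if len(values) != n:
                raise ValueError(f"{name}: expected {n} elements")
            flat.extend(values)
        return self._layout.pack(*flat)

    def scatter(self, message, variables):
        """Unpack a message back into named variables on the receiving side."""
        flat = self._layout.unpack(message)
        pos = 0
        for name, _, n in self.items:
            variables[name] = list(flat[pos:pos + n])
            pos += n
```

Because the description is built once and reused, the user never formats a message by hand; the library is free to change how `gather` and `scatter` are carried out without any change to the calling code.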

We define message-passing operations on a context-by-context basis (methods), so that the methods implementing send, receive, combine, broadcast, and so on, are potentially different for each context, reflecting optimizations appropriate to given parts of a hierarchy (homogeneity, power-of-two, flat shared memory, and so on). We have to rely on the user to map the problem to take advantage of such special contexts, but we provide a straightforward mechanism to take advantage of hierarchy through the gather/scatter notation. When compilers provide inlining, we will see significant improvements in performance for lower latency realizations of the system. Higher level notation and context-by-context method definitions are key to optimizing for memory hierarchy and heterogeneity. Because the user provides us with information on the desired operations, rather than instructions on how to do them, we are able to discover optimizations. Low-level notations cannot hope to achieve this type of optimization (except with extensive compile-time analysis), because they do not expose the semantic information in their instructions, nor do they work over process lists, for which special properties may be asserted.
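
A small sketch may make the idea of per-context methods more concrete. It assumes a hypothetical `Context` class that selects its broadcast method when it is created: a context whose process list has a power-of-two length gets a binomial-tree schedule, and any other context falls back to a linear one. Both schedules are standard textbook techniques used here only for illustration; they are not taken from the Zipcode implementation.

```python
def linear_broadcast(nprocs, root=0):
    """Root sends to every other process in turn: nprocs - 1 sequential rounds."""
    return [[(root, dst)] for dst in range(nprocs) if dst != root]


def tree_broadcast(nprocs, root=0):
    """Binomial-tree schedule: log2(nprocs) rounds, power-of-two sizes only."""
    if nprocs < 1 or nprocs & (nprocs - 1):
        raise ValueError("tree_broadcast needs a power-of-two process count")
    rounds = []
    have = 1  # number of processes that already hold the data
    while have < nprocs:
        # every holder (relative rank r) forwards to relative rank r + have
        rounds.append([((root + r) % nprocs, (root + r + have) % nprocs)
                       for r in range(have)])
        have *= 2
    return rounds


class Context:
    """Hypothetical context that carries its own method table."""

    def __init__(self, nprocs):
        self.nprocs = nprocs
        power_of_two = nprocs >= 1 and nprocs & (nprocs - 1) == 0
        self.methods = {
            "broadcast": tree_broadcast if power_of_two else linear_broadcast,
        }

    def broadcast_schedule(self, root=0):
        """The caller states what is wanted; the context decides how."""
        return self.methods["broadcast"](self.nprocs, root)
```

The calling code is identical for every context; only the method table differs, which is how a special property asserted about a process list can be exploited without changing user code.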

This evolutionary process implies that Zipcode has surpassed its original Reactive Kernel/Cosmic Environment platform; it is now planned that a given Zipcode implementation will be based on one or more of the following:

Hardware-based shared memory (with and/or without an intervening CE/RK layer),
Active-message strategies (cf. [Eiken:92a]),
Pure message passing (with and/or without an intervening CE/RK layer),
Control-network operations definable on process lists (subsets of processes or processors).

Heterogeneous translation can be performed by one of several mechanisms: for instance, XDR [XDR:87a], ELROS [Branstetter:91a;92a], or other strategies that appropriately balance the work of the sender and recipient in the translation process as a function of their computational bandwidth for such translations. Because invoices are persistent objects, nodal vectorization of translated objects is possible, using ELROS or other machine-specific strategies (perhaps user-defined); XDR is not currently amenable to vectorization. Such translation strategies will also be kept transparent to the user, except when the user chooses to intervene by providing a submethod that implements part of an invoice translation. With this approach, we can take advantage of the architectures presented at run time, on a context-by-context basis.

Importantly, when a code is moved to a system that does not have special features (e.g., a purely message-passing system), the user code's calls to Zipcode will compile down to pure message passing, whereas the same calls compile down to faster schemes within special hierarchies. This multifaceted approach to implementing Zipcode follows its original design philosophy; originally, the CE/RK primitives upon which Zipcode is based were the cheapest available primitives for system-level message passing, and hence the most attractive foundation on which to build higher level services like Zipcode. Today, vendor operating systems are likely to provide additional services in the other categories mentioned above which, if used directly in applications, would prove unportable, unmanageable, or too low level (like direct use of CE/RK primitives). A user who needs to optimize a code for a specific system works in terms of process lists and contexts to get desirable mappings from which Zipcode can effect runtime optimizations.

The CE/RK primitives (originally central to Zipcode) manage memory as well as message-passing operations. This is an important feature, carried into the Zipcode system, which is helpful in reducing the number of copies needed to pass a message from sender to recipient in basic message passing (hence, less wasted bandwidth). In CE/RK the system provides message space, which is freed upon transmission and allocated upon receipt. This approach removes the need for complicated strategies involving asynchronous sends, in which the user has to poll to see when his or her buffer is once again usable. Since the majority of transmissions in realistic applications involve a gather before send (and scatter on receive), rather than block-data transmission, these semantics provide, on the whole, good notational and performance benefits, while retaining simplicity. Zipcode extends the concept of the CE/RK-managed messages to include buffered messages (for global operations) and synchronizations. These three varieties of primitives make different assumptions about how memory is allocated (and by whom), and are implemented with the most efficient available system calls in a given Zipcode implementation. In all cases, actual memory allocation can be effected using lightweight allocation procedures in efficient implementations, rather than heavyweight mallocs. Therefore, the dynamic nature of the allocations need not imply significant performance penalties.
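
The ownership semantics described above (message space supplied by the system, given up on transmission, and handed over on receipt) can be modeled with a toy sketch. It is not the CE/RK interface: the class `MessageSpace` and its methods `allocate`, `buffer`, `send`, `receive`, and `release` are hypothetical names, and a simple in-memory list stands in for the network.

```python
class MessageSpace:
    """Toy model of system-owned message buffers (hypothetical, not CE/RK)."""

    def __init__(self):
        self._owned = {}      # handle -> bytearray currently owned by user code
        self._in_flight = []  # buffers handed to the system by send()
        self._next_handle = 0

    def _new_handle(self):
        handle = self._next_handle
        self._next_handle += 1
        return handle

    def allocate(self, size):
        """The system, not the user, provides the message space."""
        handle = self._new_handle()
        self._owned[handle] = bytearray(size)
        return handle

    def buffer(self, handle):
        """Access a buffer; raises KeyError once it has been sent or released."""
        return self._owned[handle]

    def send(self, handle):
        """Transmit: ownership moves to the system, so nothing is copied."""
        self._in_flight.append(self._owned.pop(handle))

    def receive(self):
        """Deliver the oldest message as a buffer now owned by the receiver."""
        if not self._in_flight:
            return None
        handle = self._new_handle()
        self._owned[handle] = self._in_flight.pop(0)
        return handle

    def release(self, handle):
        """Return a received buffer to the system when the user is done."""
        del self._owned[handle]
```

Because the sender's handle stops being valid as soon as `send` returns, there is never a user buffer whose reuse must be polled for, which is the simplification the paragraph above describes.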

When moving Zipcode to a new system, the CE/RK layer will normally be the first interface provided, with additional interfaces provided if the hardware's special properties so warrant. In this way, user codes and libraries will come up to speed quickly, yet attain better performance as the Zipcode port is optimized for the new system. We see this as a desirable mode of operation, with the highest initial return on investment.


Guy Robinson
Wed Mar 1 10:19:35 EST 1995