Minutes of the Message Passing Interface Standard Meeting
Dallas, Texas, February 17-19, 1993

The MPI Standards Committee met in Dallas on February 17-19, 1993, at the Bristol Suites Hotel in North Dallas. This was the fourth meeting of the MPI committee and the second of the now regular meetings in Dallas. There were both general meetings of the committee as a whole and meetings of several of the subcommittees. Because interest in point-to-point communications and collective communications was so general, these subcommittees met as committees of the whole. No formal decisions were taken at this meeting, but a number of straw votes were taken in the subcommittees. These are included as part of the reports on the work of the subcommittees.

These minutes were taken by Rusty Lusk (lusk@mcs.anl.gov) and Bob Knighten (knighten@ssd.intel.com). These minutes are quite long. If you want to see the important topics, you can search for --- and this will quickly lead you to each topic (and a few other things).

Wednesday, February 17
--------- -----------

-------------------------------------------------------------------------------
General Meeting
-------------------------------------------------------------------------------

Jack Dongarra called the meeting to order at 1:30. There was a discussion of the agenda. Other topics included the possibility of some DARPA funding and a tutorial for Supercomputing '93. The next meeting will be March 31-April 2 at the same place (the Bristol Suites in Dallas). The following meetings are tentatively scheduled for May 12-14 and June 23-25. Bob Knighten proposed that we set a definite schedule, particularly if we are going to follow the example of the HPF committee. This was discussed more fully on Friday. (Search for "Schedule" below.)

Attendees:
---------

Joe Baron                IBM Austin                 jbaron@vnet.ibm.com
Harry Scott Berryman     Yale Univ.                 berryman@cs.yale.edu
Lyndon Clarke            EPCC, U. Edinburgh         lyndon@epcc.ed.ac.uk
James Cownie             Meiko                      jim@meiko.co.uk
Jack Dongarra            UT/ORNL                    dongarra@cs.utk.edu
Vince Fernando           NAG & UC Berkeley          fernando@jaguar.berkeley.com
Jon Flower               ParaSoft                   jwf@parasoft.com
Daniel Frye              IBM-Kingston               danielf@kgnvma.vnet.ibm.com
Al Geist                 ORNL                       gst@ornl.gov
Adam Greenberg           TMC                        moose@think.com
Bill Gropp               ANL                        gropp@mcs.anl.gov
Leslie Hart              NOAA/FSL                   hart@fsl.noaa.gov
Tom Haupt                Syracuse U.                haupt@npac.syr.edu
Don Heller               Shell Development          heller@shell.com
Rolf Hempel              GMD                        hempel@gmd.de
Tom Henderson            NOAA/FSL                   hender@fsl.noaa.gov
Steven Huss-Lederman     SRC                        lederman@super.org
John Kapenga             Western Michigan U.        john@cs.wmich.edu
Bob Knighten             Intel SSD                  knighten@ssd.intel.com
Rik Littlefield          PNL                        rj_littlefield@pnl.gov
Rusty Lusk               ANL                        lusk@mcs.anl.gov
Peter Madams             nCube                      pmadams@ncube.com
Alan Mainwaring          TMC                        amm@think.com
Oliver McBryan           U. Colorado                mcbryan@cs.colorado.edu
Barney Maccabe           Sandia                     abmacca@cs.sandia.gov
Dan Nessett              LLNL                       nessett@llnl.gov
Steve Otto               Oregon Graduate Institute  otto@cse.ogi.edu
Peter Pacheco            U. of San Francisco        peter@sun.math.usfca.edu
Howard Palmer            nCube                      hep@ncube.com
Paul Pierce              Intel                      prp@ssd.intel.com
Sanjay Ranka             Syracuse U.                ranka@top.cis.syr.edu
Peter Rigsbee            Cray Research              par@cray.com
Mark Sears               Sandia                     mpsears@cs.sandia.gov
Anthony Skjellum         Mississippi State U.       tony@cs.msstate.edu
Marc Snir                IBM, T.J. Watson           snir@watson.ibm.com
Alan Sussman             U. of Maryland             als@cs.umd.edu
Bob Tomlinson            LANL                       bob@lanl.gov
Dennis Weeks             Convex                     weeks@convex.com
Stephen Wheat            Sandia NL                  srwheat@cs.sandia.gov
Stephen Ericsson Zenith  Kuck & Associates          zenith@kai.com

The group then became a committee of the whole to meet as the Point-to-Point Communications Subcommittee.
-------------------------------------------------------------------------------
Point-to-Point Subcommittee
-------------------------------------------------------------------------------

Marc Snir opened the point-to-point subcommittee meeting and asked for discussion of his draft ("Point-to-Point Communication" by Marc Snir, Feb 8, 1993; this is also included in the overall MPI draft dated February 16, 1993). He asked about additions to his draft. Cancel was mentioned and was discussed later.

Alignment of "sequence of bytes" buffers:
--------- -- --------- -- ------ -------

Marc began discussion of the draft by asking about alignment. There followed a discussion of whether messages of type "sequence of bytes" should be restricted to be of length a multiple of 4 or 8, or should be aligned. Jim Cownie proposed that we also vote on requiring that all data types start on their natural boundaries. We decided that this was too restrictive, given that some Fortran compilers do not deliver this.

Straw vote: A string of bytes can start on any byte address and can be of any
----------  length (including 0).
            Yes: 34  No: 0

After a question from Bob Knighten, it was agreed that bytes have 8 bits. (Bob Knighten reminded us that few other standards require that.)

Named Constants for Options:
----- --------- --- -------

Discussion of whether we should use named constants or specific values for various options. Fortran 77 does not specify an "include" facility.

Straw vote: Use named constants?
----------  Yes: 27  No: 2

Scott Berryman spoke in favor of using fixed constants, because of existing Fortran practice. Joe Baron also spoke in favor. This was deferred to the language binding committee.

Structure of Buffer:
--------- -- ------

Note that the message is a sequence of bytes (at this point). There is no requirement that the structures on the send and receive sides match. (One can move a contiguous area into a scattered area, etc.)
Paul Pierce raised the issue that we need to be clear about whether we want to use *real* iovecs or not. General discussion of how general the datavec should be. Tony Skjellum spoke in favor of generality (he mentioned the BLAS, which don't have the most general striding, so people invent their own). Jim Cownie suggested that we include the data type in the descriptor vectors (that is, there would always be a data type, which might be "byte"). Bill Gropp suggested that the data descriptor vector be an opaque data type, both to help the Fortran binding, to allow taking advantage of real iovecs where they apply, and to allow extensibility. Joe Baron mentioned the favorite Fortran way of specifying these vectors, with a[b[i]]. This was postponed to the language binding committee. (Note: a concrete proposal by Gropp and Lusk on how data description vectors might be handled was made later, and is described below.)

Receive Criteria:
------- --------

(See proposal.) Selection is by tag, by source, and by context. Tony Skjellum proposed that an AND mask be available to deal with ranges of bits in the types and sources. Scott Berryman objected to "bit-twiddling" in standards.

Truncation of buffer:
---------- -- ------

Rolf Hempel spoke in favor of allowing the buffer to be longer than the message received, although some thought this should be an error. There was general agreement that "too short messages" should not be erroneous. Al Geist said that truncation (messages too long) should be an error. Paul Pierce said that experience was on this side. Jim Cownie said that we should be clear about whether the first part of a truncated message appears in the buffer. Marc Snir observed that standards seldom specify the behavior of an erroneous program.

Straw vote: Matched messages must fit in the buffer; otherwise it is an error.
----------  Yes: 26  No: 0

Send and Receive:
---- --- -------

Marc Snir noted that all the point-to-point operations can be defined in terms of the four low-level operations: INIT, START, COMPLETE, FREE.
See Section 1.5 (Communication Handles) of Snir's document. Discussion of how restrictive the init routines should be (a difference between the Snir and Gropp-Lusk proposals). Much discussion of the efficiency of handle creation and modification, whether the handle would be in system space or user space, and whether there should be default values. Some discussion of whether we should have this Level 1 (of the Gropp-Lusk multilevel proposal) at all. (General agreement that we should.)

Handles: Do we have handles? Do handles have default values? Are
-------  handles malleable?

Adam Greenberg argued that modifiable handles make channels much harder. They are hard to start with. [NOTE: If so, then there must be a single "create handle with attributes" operation.]

Straw vote: Unmodifiable handles?
----------  Yes: 1  No: the rest

Bill Gropp suggested that handles could be created, then repeatedly modified, and finally "committed to", after which time they cannot be modified without recommitment before use. This allows creation to proceed by modifying defaults, and some sort of compilation to take place on the commit operation. Jim Cownie proposed a "dup" function for handles.

Straw vote: This section should be rewritten to allow modification, followed by
----------  a commit operation. (One can even modify and reuse repeatedly.)
            Yes: 28  No: 0

What about defaults? One argument against defaults is that one wants to be able to catch unset fields. Gropp and Lusk went with defaults because extensibility and required setting of handle attributes conflict, since new attributes break old programs. They also did not want it to be possible to create handles that cannot be used.

Straw vote: Have defaults?
----------  Yes: 20  No: 5

Parameters of a handle: buffer, start, mode.

Some more discussion of ready-receive semantics. Marc says he added ready-send semantics more for symmetry, but also to allow for the "pull" model of communication, to go along with the currently prevalent "push" model.
General discussion followed about whether it provides a way to write erroneous programs. It was pointed out that the call always works; on machines which don't support a special protocol it merely provides no performance improvement. But on some machines the ready-receiver semantics of send provides a performance win. Ready-receiver (send a message with foreknowledge that the receiver has posted the receive) passed narrowly last time, so another vote was taken.

Straw vote: Have ready-receive?
----------  Yes: 20  No: 12

When does an operation complete? (1.5.4 of Snir proposal)
---- ---- -- --------- --------

Adam Greenberg began by asking whether we need synchronous mode (send does not complete until the receive has completed at the other end) if we are assuming reliable communication. General discussion of whether we need synchronous mode. Lusk retracted his previous arguments in favor of synchronous mode. Greenberg asked why. The only argument in favor is that it can use no buffering. Jim Cownie pointed out that it is a way of forcing the effect of "no system buffering". Paul Pierce argued that there should be a global method of specifying that no system buffering is to be used.

Straw vote: Have synchronous mode?
----------  Yes: 10  No: 15

As a consequence, section 1.5.4 goes away.

Extracting information from handle on completion: (Section 1.5.6)
---------- ----------- ---- ------ -- ----------

Separate COMPLETE-RECEIVE and COMPLETE-SEND? Al Geist: separate functions improve clarity. Jim Cownie: a single function allows waiting on a set of completions of mixed sends and receives. Marc Snir: the completes have different parameters. Paul Pierce: we could have a query between complete and free, so that complete could have the same parameters. Jim Cownie spoke in favor of this: it is similar to modifying the attributes of a handle and then committing it. But he proposed that the user pass to the complete routine an area (in user space) where all the parameters are stashed.
Marc said that the parameters then become different again, since these structures are not the same. Jim: it still could be. Rehash of commit and handles in general. Should the commit return a system handle?

Straw vote: Have a separate query function?
----------  Yes: 21  No: 3

Straw vote: Have a single complete function?
----------  Yes: 23  No: 0

Rik Littlefield noted that the phrase at the top of page 12, "Note that it is correct, but inefficient, to implement MPI_CHECK via a call to MPI_COMPLETE, in which case, MPI_CHECK always returns true," is wrong. This will change to remove the comment at the top of p. 12 that says MPI_CHECK might block.

Higher-Level Operations: (Section 1.6)
------------ ----------

Now that synchronous operations have been discarded, there are only 6 operations for each of send and receive, as opposed to 12. Jim Cownie objected to in/out arguments. Discussion: Jim Cownie reminded us of "solution number 5" (for more information see the minutes of the last meeting):

  1: handle + inquiry.
  2: pass in arguments where unwanted information is
  5: pass in a structure where things are stashed (in user space);
     out arguments are replaced by one pointer to an opaque structure.
     This is also thread-safe.

Three proposals: in/out arguments, multiple parameters, opaque structures.

Straw vote: Do not use in/out arguments?
----------  Yes: 32  No: 2

Straw vote: Package the arguments in a structure rather than using a list of
----------  arguments?
            Yes: 24  No: 8

The Point-to-Point Communication Subcommittee meeting ended at this point. It continued on Thursday.

------------------------------------------------------------------------------
General Meeting
------------------------------------------------------------------------------

There was a brief general meeting with Marc Snir presiding. Short discussion of the next meeting dates: the next meeting is March 31 - April 2 as scheduled. Six weeks later is May 12-14; this was approved, and June 23-25 was also approved.
The subcommittees meeting Wednesday night were: Collective, Topology, Context, Introduction, Environment, and Profiling.

______________________________________________________________________
Some subcommittee meetings took place Wednesday evening. Reports on those meetings are part of the General Meeting minutes for Friday.
______________________________________________________________________

Thursday, February 18:
-------- -------- --

Marc reminded us that we need to move quickly toward readings and approval.

---------------------------------------------------------------------------
Collective Communication Subcommittee
---------------------------------------------------------------------------

Al Geist opened the collective communication discussion at 9:15. He reviewed where we were at the end of the last meeting and urged people to send in written proposals.

Broadcast:
---------

Discussion of the syntax of broadcast: do both the sender and receivers of a broadcast call mpi_bcast, or do the receivers call mpi_receive? If the receive must handle broadcasts, it puts an extra burden on it. Suggestion that there are applications, such as discrete-event simulation, where it would be convenient if broadcasts were received by normal receives. Marc Snir: Do we have the same data types as for point-to-point messages? Al Geist: Yes. Discussion of whether the broadcast should be synchronous or not.

Straw vote: Broadcast not required to be synchronous?
----------  Yes: 11  No: 5  (There were 37 people in the room.)

Straw vote: Have a non-blocking broadcast?
----------  Yes: 21  No: 7

A non-blocking broadcast returns as soon as possible, but buffers are invalid until the operation is complete (as verified by some inquiry routine). Buffer options will be like point-to-point.

Straw vote: Broadcast is received by broadcast, not receive.
----------  Yes: unanimous

Barrier:
-------

Discussion of a tag on barrier. Jim Cownie suggested a non-blocking barrier, so that one could initiate a test for whether processes have reached a certain point, and test later. Non-blocking barrier: each process entering the barrier posts that fact. There is an inquiry function to check that everyone in the group has entered the barrier. Non-blocking barriers would then *require* tags, since one could be participating in multiple barriers. Scott Berryman said that he has had to implement this. Adam Greenberg noted that the CM-5 implements this in hardware (with limitations); users would like it with fewer limitations.

Straw vote: Should we have a non-blocking barrier?
----------  Yes:   No: 1

Straw vote: Should we have a tag for both blocking and non-blocking barriers?
----------  Yes: 31  No: 1

gather/concat:
-------------

There is a need for a gather-then-broadcast. Discussion of whether there should be in and out buffers on the gather. Observation that buf is an IN/OUT parameter and for the root is actually used both IN and OUT. Proposal to separate the IN and OUT buffers. General discussion: how much buffer space is needed, and where.

GENERAL_GATHER(in_buf, out_buf, bytes, out_bytes, tag, context, gid)
    out_bytes = 0 indicating if the result is to be delivered in out_buf

Dennis Weeks: Note that bytes must be the same in every call. If non-blocking, one call may return without any error indication even though the call is erroneous. Weeks: Proposal to specify out_bytes, with it being 0 if the result buffer fills in out_buf. In any case we should have in_buf and out_buf, because the root is using buf for two very different purposes. Greenberg and Flower: Agree, but for different reasons.

Alternative proposal:

GENERAL_GATHER(in_buf, out_buf, bytes, flag, tag, context, gid)
    flag indicates if the result is to be delivered in out_buf

Oliver McBryan: Question of whether gen_gather should replace concat.
Straw vote: Use two buffers (in_buf, out_buf), not a single in/out buf, on
----------  gather?
            Yes: 33  No: 0

There is a problem of when it is known where the flag is set. A straw vote was proposed, but Adam Greenberg argued that we need further discussion before voting, and the vote was postponed.

Straw vote: Have a gather function to a single "root node"?
----------  Yes: 32  No: 0

Straw vote: Have an all-to-all gather?
----------  Yes: 26  No: 0

Straw vote: Should we have any further discussion of including flag?
----------  Yes: 9  No: 15
[So absent a new proposal, flag is out.]

Straw vote: Have non-blocking versions of gather and all-to-all?
----------  Yes: 18  No: 5

Straw vote: Have a gen_gather and all-to-all, with different buffer sizes on
----------  each process?
            Yes: 29  No: 0

Creating and Freeing Groups:
-------- --- ------- ------

Al Geist opened discussion on creating and freeing groups. He pointed out that this is still tangled up with process ids and contexts.

mkgroup (list of processes, specified somehow) returns a group id; called by all processes.

Cownie and McBryan proposed that all processes in a group call this, each passing a flag to say whether it wants to join the new group or not. Rolf said that we don't want to have to have topologies in order to have groups. Marc pointed out that the "flag" version subsumes the list version, given that everyone calls it, and pointed out that some sort of global synchronization is desirable in order to have system-global gids. Discussion of whether gids are globally known (i.e., known to all processes in a group) and valid or not. gids could be valid:

  only in one process
  only among members of a group
  system-wide

Discussion of only making subgroups vs. creating unions of groups. Discussion of whether we want to deal with dynamic process creation or not. In a dynamic situation, there is no common ancestor for group creation. Tony suggested that people are going to want dynamic groups. MPI must be competitive with PVM.
Paul Pierce called for a concrete proposal, since dynamic processes go considerably beyond what we have seriously considered so far. Oliver McBryan proposed a special join operation for forming unions of groups. Marc proposed that we vote at least on an operation for partitioning an existing group by key.

Straw vote: Should MPI have an operation to partition an existing group?
----------  Yes: 32  No: 0

Should we throw out the list version?

Straw vote: Should MPI only provide partitions of an existing group?
----------  Yes: 10  No: 17
[The alternative is to keep the list form of mkgroup.]

This was the end of the Collective Communications Subcommittee meeting.

-------------------------------------------------------------------------------
Point-to-Point Subcommittee
-------------------------------------------------------------------------------

Marc Snir called the Point-to-Point Subcommittee to order at 1:30 p.m. He started with a review of the previous day and reminded us that we were now discussing Level II rather than Level I. (Page numbers from here on in the minutes refer to the "Draft Document for a Standard Message-Passing Interface" of February 16, 1993 [prepared and distributed by Steve Otto], whereas previous page numbers referred to the draft "Point-to-Point Communication" by Marc Snir, Feb 8, 1993.)

Discussion of how to get information about the completion of calls: opaque structures are used for both blocking and non-blocking operations. Discussion of the requirement that if there is a posted send on one process and a matching posted receive on another node, then the operations will eventually complete. Jim Cownie proposed that there be only one type of handle, and then only one kind of wait, wait(handle), together with query functions. Discussion of having a uniform wait routine. We don't have concrete proposals for what the query functions would look like:

  wait(handle, opaque_return)
  query(opaque_return, ...)

or we could have layered higher-level specialized query functions.
Tony Skjellum pointed out that you will first have to query the opaque_return to find out what the type of the handle is, in order to determine which query function to use on it. (Applicable to wait_any.)

Straw vote: wait_send, wait_recv with different parameters?
----------  Yes: 2  No: 29
(The preferred alternative is to have a uniform wait, handles, opaque_return, and query function(s).)

Steven Zenith asked that the word "alternation" be replaced by "choice"; this was accepted without further discussion.

Waiting on Set of Events (wait_any): (pages 20-21)
------- -- --- -- ------ --------

The two functions discussed are wait_any and wait_all. wait_any completes a single operation, and the handle is freed. There was a discussion of which handle is selected and the issue of fairness. It was observed that it would be the responsibility of the programmer to always pass a list of valid handles. Another possibility would be to have wait_any modify the list of handles on return (it would change the matched handle to a magic "null" handle that would match nothing but always be accepted). What would happen if only these null values were passed? Rik Littlefield suggested that the in/out argument problem doesn't apply here, so it would be better to have the handle list modified: the specific freed handle is replaced by NULL, say.

Straw vote: wait_any(list, index, opaque_return) returning index and
----------  opaque_return, with index identifying the handle returned?
            Yes: 28  No: 1

Straw vote: "null" handles (for deletion from the list)?
----------  Yes: 19  No: 3

Straw vote: wait_any to set the handle matched to null?
----------  Yes: 21  No: 6
(Bob Knighten pointed out that this simplifies handling of a shared list when there are multiple threads.)

Is all-null an error or is it a no-op? Postponed.

Straw vote: Should we have a wait_all?
----------  Yes: 15  No: 9

wait_all(list-of-handles, list-of-opaque-returns) was thus approved.
Bill Gropp suggested that we should worry about an error during this operation. Should we have a wait_all for *all* operations? Jim Cownie suggested that if we know what contexts are, we might want to have a wait_all for all events in a context. Other questions: What happens with multiple wait_alls with overlapping lists? What happens in a multithreaded environment?

Probe:
-----

We want to receive a message without knowing its length, for example. (probe returns the envelope of the message and locks the message.) Then you can receive it. We should also have an unlock operation. This is a different sort of handle than at Level 1, since at Level 1 the buffer has been associated with the handle, but here you learn about the buffer. Therefore this section (middle of p. 21) should have "handle" replaced by, say, "lock". This requires a separate receive in order to receive on a lock. Also, the in/out parameters should become input parameters plus an opaque return object. The revised functions are:

  mpi_probe(source, tag, context, opaque_return, lock)
  mpi_precvx(lock, ...)
  mpi_unlock(lock)

Peter Rigsbee suggested that there be a wait version and a status version (the wait version doesn't return until it returns with a lock). Discussion of blocking and non-blocking versions of precvx. Jim Cownie argued against unlock, since probing is sort of a contract to receive the message. There was a counter-argument that it is often desirable to decide NOT to read a message but rather to toss it back into the pool. What to do about message order after unlocking? Several possibilities: head of queue, original location, tail of queue, unspecified. PROBE clearly perturbs the order of receipt of messages. Skjellum and Lusk argued that the only reason for this is other problems; we ought to fix those problems instead. (The problems here are the specification of fixed-size buffers all over the place, rather than just providing the needed buffers to the extent possible.)
Steve Wheat proposed that we do away with the lock and the new receive: have PROBE(source, tag, context, info) and then use a blocking receive to get the particular message the probe located. Thread safety can be dealt with by using critical regions. Note that you cannot use this to look at the entire message queue. Paul Pierce proposed that "you get the buffer and give me the address" be one of the buffer types.

To summarize, the alternatives are:
  1. lock, unlock, precv.
  2. probe gives you info, then you receive from that tag and source.
  3. get rid of probes, and MPI gets the buffer for you.

Buffer Descriptors:
------ -----------

Bill Gropp described a buffer-descriptor proposal, in which there is a function create_bd(...) returning a buffer descriptor. It is then possible to append different kinds of descriptors. The idea is to be able to build arbitrary structures for mixed data types and gather-scatter operations. The append operations might look like:

  bd_contig(bd, address, datatype, numitems)
      (so one can build mixed-type heterogeneous messages)
  bd_stride(bd, address, datatype, stride, numitems {, itemlen?})
      for strided data
  bd_abi(bd, address, index_array, datatype, numitems)
      for indirect address vectors {abi stands for A[B[I]]}
  free_bd(bd)
      or have send free it. Note that one probably wants to reuse
      descriptors, so free should be explicit.

Al Geist noted that this is for sophisticated users, so most can likely get by with just bd_stride. Data types? Gropp's current view is that these are only primitive data types, not derived data types, e.g., no structures. One can build a buffer descriptor for a structure by multiple calls. It would be nice to have a program/function that would do that for the programmer; that would not be hard. Note that this is very different from existing practice. There should certainly be a level close to existing practice. After a discussion of this, there was a

Straw vote: Get a fleshed-out proposal for this?
----------  Yes: 29  No: 2

Straw vote: Should there also be a simple version for contiguous messages of
----------  fixed types (to better conform to existing practice)?
            Yes: 28  No: 3

This was the end of the data-type discussion, and Marc Snir took the floor again.

Cancel:
------

Someone brought up that all the operations for which we can wait perhaps require a cancel. Jim Cownie argued (again) that the worrisome case is the wait for an outstanding non-blocking receive. MPI_cancel(handle) guarantees that the buffer will not be written on. Here is a sample program that illustrates the usefulness of cancel:

    IRECV(...)
    REPEAT(...)
      WAIT(HANDLE)
      . . .
      IRECV(...)
    UNTIL(converge)
    CANCEL

Berryman: this could be handled with an appropriate use of tags. Snir: Cancel of a send could be done with free_handle. Pierce: That should be an error. Marc withdrew the suggestion. This led us into a discussion of whether MPI will require a fixed number of processes. Note that the introduction does not discuss the fixed-number-of-processes requirement.

Correctness: (p. 21)
-----------

What is a correct MPI program? What is done with erroneous MPI programs? Review of message-order preservation. In the case of threads, there may not be an order to the messages. So *if* there is an order on messages, it is preserved. Note that this is for *matching* receives. What about a receive-any with two messages from the same source? They should be received in order. There is also a fairness question. Paul Pierce said that the important order is the order in which the receives are posted, not the order of the receives themselves, so that messages land in the correct buffers. General agreement. Jim Cownie brought up the fairness issue with respect to receive-any.

Progress and Fairness:
-------- --- --------

This brings up the resources issue. Discussion of minimum resource requirements: number of handles, etc. Bob Knighten proposed that the bounds be implicit in the test suite.
It was proposed that there be an appendix to describe implementation profiles, which will be an agreement on what an implementation will try to support. Marc Snir's current document attempts to specify the weakest possible requirement: no system buffering is explicitly required. Rather, there is just the requirement that if a matching send and receive have been posted, then the operation will complete. Marc noted that this can be extended to collective communications. Oliver McBryan suggested that the user could supply some space to the system that it could use for MPI, even on machines that supply no buffering. Rik Littlefield suggested that the user could provide the buffer space to the system, and declare its requirements. This discussion was deferred until there is a concrete proposal from the Environment Subcommittee (which has renamed itself from the Environmental Subcommittee to the Environmental Management Subcommittee).

Error Handling:
----- --------

Snir: There are two communities: application programmers and system programmers. System programmers are not likely to write in Fortran. A single mechanism may not be suitable across languages. One approach, especially appropriate for Fortran application programmers: an error should bring the system down and maybe help you debug. The other, for system programmers writing in C: test return codes for errors. Marc suggested that the default be to blow up. Jim Cownie suggested the solution of having an alternate set of routines, and Rik Littlefield pointed out that this is the only possible thread-safe mechanism. Paul Pierce suggested having the syntactic alternatives, as in NX. It was suggested that the error-handling mode should be attached to the levels. It was proposed that C routines return negative values on error, while Fortran routines are used to having an extra out parameter. Marc summarized the alternatives:

  both F77 and C always return an error code
  only C returns an error code
  2 different libraries
  can select to signal when an error occurs
      (At what granularity? Per job? Per context?)

Straw vote: F77 code should {always/never} return an error code?
----------  Always: 22  Never: 3

Straw vote: Should there be alternate libraries to select between these
----------  alternatives?
            Yes: 6  No: 18
Without debate it was assumed that the same result would prevail in a vote for C.

Straw vote: Can one select (in some way) what happens when an error occurs?
----------  In F77: Yes: 21  No: 6
Without debate it was assumed that the same vote would prevail in C.

-------------------------------------------------------------------------------
General Meeting
-------------------------------------------------------------------------------

Friday, February 19:
------ -------- --

The meeting started at 9:00 with:

Report of the Context Subcommittee:
------ -- --- ------ ------------

Tony Skjellum reported on the meeting of the previous night (see notes above). There is a Contexts Draft 1.0 that is available. Contexts are a partition of the tag space for matching; there are no wild cards on context. Contexts chosen by users, tags by users. MPI_NEW_CONTEXT is executed by one process to obtain a context. It is then broadcast to those who need it. If there is a group ALL, there is an associated context, to deal with the bootstrap problem. Pairwise message ordering is preserved within a context. The context subcommittee will try to come next time with proposals that are consistent and resolve the circular interaction among groups, contexts, and process identifiers. Rik Littlefield noted that there is no mechanism for statically created contexts. He prefers a static name-server model. Discussion of whether groups can be used to replace contexts. Marc Snir pointed out that the real question is that of another parameter one sends, and whether we call it gid or context is irrelevant.
Rolf Hempel pointed out that the intended use is quite different, so we
should have both concepts, just as we have selectivity on source even
though it could be encoded in the tag. Stephen Wheat said that his
users understand contexts quite clearly as types you can't wildcard,
while groups are more confusing. Jim Cownie suggested that there is a
definite bootstrap problem with getting the context to the processes
that need it. Tony Skjellum said that in Zipcode contexts are
associated with groups, so obtaining a context is a synchronized group
operation. John Kapenga spoke in favor of a way for a group to obtain
an associated context. Marc Snir spoke in favor of the name-server
approach for libraries. Rik Littlefield said that it would be nice to
have guidelines on how to write an MPI-safe library.

Marc Snir summarized that the extra match field can be:
    separate
    in the tag (tag range registration)
    in the pid (send to ports instead of processes)

Paul Pierce summarized issues that need to be addressed in concrete
proposals:
    are groups local?
    are contexts global or local?
    how to implement the service?

There was general agreement that the context subcommittee will produce
a new white paper clarifying these issues.

Report of the Process Topologies Subcommittee:
------ -- --- ------- ---------- ------------

Rolf Hempel reported on the work of the Process Topologies
Subcommittee. (See the proposal in the "Draft Document for a Standard
Message-Passing Interface" of February 16, 1993.) The Process
Topologies and Collective Communication Subcommittees met jointly, as
both deal with groups.
Division of Responsibilities:
-------- -- ----------------

    Process Topologies                Collective Communications
    topology group creation           basic group creation
    group partitioning along          group partitioning by key
      coordinate lines

Topology Functions:
-------- ---------

A topology definition function always creates a new group.
Advantage: ranks in the parent group do not change; ranks in the new
group are aligned with the topology.

Supported topologies: agreed on
    cartesian structures (grids, tori)
    arbitrary graphs
This is all we should aim at for MPI-1. We can do trees later.
{McBryan: What is the relation of order to topology? There should be
translation functions. What is the order of a tree?}

Standard case: MPI decides which process in the group gets which
position in the topology. {Geist: Does this mean the topology must be
encoded in the gid? Hempel: No.} Additional option: the user assigns a
topology position to each process explicitly. (Marc Snir will write a
proposal.)

Mapping:
-------

The MPI implementation may try to place processes efficiently.
Option: the user can explicitly ask for a random mapping. This would be
more efficient. It could also be used to explicitly request a random
pattern of communication. This might be redundant, since it could be a
user-requested mapping. {Random or arbitrary? Random. Why random?
Because it is sometimes the correct answer. It may reduce contention
relative to any systematic placement.}

Indexing in MPI:
-------- -- ---

There is a general problem in MPI: how are n objects numbered?
    0, 1, ..., n-1    C style
    1, 2, ..., n      Fortran style
For inter-language compatibility we should pick one, but this issue was
deferred to the Language-Binding Subcommittee.

Applications:
------------
    rank in group
    node numbers in graph structures
    MPI_WAITANY

There was a possibility of making this selectable as an alternative,
but Tony pointed out that this breaks libraries.

Straw vote: MPI to number objects using the C convention (0, ..., n-1)?
----------
    Yes: 32    No: 1

(So (0,0) is the first element of a two-dimensional array.)

Another issue: row-major vs. column-major in arrays. This is visible in
ranks in groups and in order in buffering. Alignment of groups and
subgroups. General discussion: is (0,1) the second element, or is (1,0)
the second element? There was discussion of the usefulness of elaborate
mappings, and whether vendors will offer support for this. Otherwise,
topologies are lightweight (a function mesh(i,j) returns a process id).

Snir: General comments on the proposal. One purpose is to renumber
processes so that they can be placed more efficiently. But will this
actually be used by any vendor? Hempel: This is not part of the
standard, and creating a new group solves other problems. Snir/McBryan:
An advantage of topology is the ability to express communication in
terms of the topology. A minor advantage. Geist: What is the relation
of collective communication to topology? Do we want SHIFT to be
relative to the topology? But the argument is a gid, so how is the
topology encoded? Topologies are implemented on top of (augmented)
groups, i.e., the group associated with a topology can "know" the
process order for "shift left". Marc Snir suggested that topology
functions be local (who is my left neighbor, my right neighbor).
McBryan: Sounds like this should be in a library, not in the language.
Hempel: MPI is not a language. Snir: We need an interesting example of
a system where this will actually be used; otherwise this is only a
convenience feature. Cownie: This is based on experience with Parmacs
on machines where this placement is important. Have vendors moved on to
machines where this is no longer relevant? Hempel: It could be a
serious mistake to believe that the situation with the newest machines
shows that placement is not relevant to future very large machines.
Skjellum: With very large machines, the entire model will have to
change. Topology is not the right way to provide information. It is not
relevant to a program running several distinct kernels. Snir: Global
communication vs.
local communication. Berryman: I do not accept the idea that topology
is not relevant. The user needs to be aware of machine topology and
able to use this in a program. Jim Cownie pointed out that on machines
where the processes are on the nodes of the switch network, there is an
important performance benefit to mapping correctly. But new machines
are getting away from this, and process placement may become less of an
issue. Machines are becoming "flatter". End of Rolf's discussion of
topologies.

MPI Tutorial at Supercomputing '93:
--- -------- -- ------------------

Rusty Lusk asked those interested in participating in a tutorial on MPI
at Supercomputing '93 to contact Jack Dongarra or himself.

Validation of MPI:
---------- -- ---

Oliver McBryan suggested that we have an effort to write, port,
provide, and share application programs for MPI. Rusty Lusk noted the
ongoing implementation by Bill Gropp and himself that will allow people
to test applications soon. There will be an effort to port some HPF
programs to MPI.

Schedule:
--------

Bob Knighten asked about the schedule. Marc said that we would have a
reading of the language-independent material, with the C and Fortran
bindings read separately. The point-to-point proposals will be ready
for the next meeting. Profiling should be ready. The introduction
should be ready. The others should be ready for a first reading at the
following meeting.

Report on Environmental Management Subcommittee (Bill Gropp):
------ -- ------------- ---------- ------------

There are three classes of routines:
    MPI
    parallel-related
    non-MPI: useful, but not specifically MPI, like high-resolution
      timers

Three routines were agreed on (no syntax):
    mytid
    numtid
    validtags

Management Hints:
---------- -----

The user provides a requested value; the implementation returns the
actual value, for implementation limits and characteristics. The exact
choice of items that can be managed has not been determined yet. We
aren't doing error handling.
Report of the Language Binding Subcommittee:
------ -- --- -------- ------- ------------

Scott Berryman reported on the Language Binding Subcommittee. This
subcommittee has been biding its time. There will be a "Thou shalt not"
list on the network soon. A proposal for standards used and exceptions
allowed will be presented at the next meeting. For F77 the basic
proposal will be F77 plus long names plus underscores plus include. We
will vote on these at the next meeting.

John Kapenga asked whether we want to say anything about I/O. Jon
Flower pointed out that we need some minimal requirement, driven by the
need to write a test suite. Bill Gropp pointed out that minimally we
should be able to run this program: if (master) printf("hello"). Jim
Cownie and Rusty Lusk said that all one really needs is a requirement
that at least one node be able to do stdio. Marc Snir pointed out that
there needs to be an enquiry function to find out which node can do
I/O.

------------------------------------------------------------------------------
MPI -1.1
------------------------------------------------------------------------------

Jim Cownie presented MPI -1.1. A revision from MPI -1.0 was required
since many of the concepts humorously presented there have now been
adopted into MPI (e.g., the non-blocking barrier).

Design approach: macho (the opposite of Occam!)
    "Entia sunt multiplicanda."
Objective: To be as complex as possible, with no coherent subsets.

Developments since MPI -1.0:
    Non-blocking barrier removed to MPI-1
    Handles added - needed for opening doors
    NPROCS redefined - now guaranteed never to return the same answer
      twice
    Number of collective routines increased - need more to keep all
      procs busy
    Another two versions of all functions: "probably erroneous" &
      "guaranteed erroneous"
    All errors are opaque (following industry practice)
    Non-blocking exit added {But can it be canceled?}

Preserved from MPI -1:
    All groups are contexts
    All contexts are groups

Environmental management:
    I require the hardware to be ...
    I require the vendor to be ...

------------------------------------------------------------------------------
------------------------------------------------------------------------------

Thursday night there were meetings of several subcommittees. Notes from
the meeting of the Communication Contexts Subcommittee are included
below.

------------------------------------------------------------------------------

Communication Contexts
February 18, 1993

Anthony Skjellum opened the discussion. [Started with 12; another 10 or
so came in about 9:20.]

Contexts are to make it feasible to build scalable software that can be
mixed.

TAGS - unstructured bits (at least 32)
    1. definition of tags
    2. matching of tags
CONTEXT - unstructured integer, system assigned
    NEW_CONTEXTS(number_of_contexts, array_of_contexts)
        number specified, array returned
    FREE_CONTEXTS()
    CONTEXTS_AVAILABLE()
    minimum in system >= 16K
    contexts are gotten by one process and then distributed to others

Contexts
--------
1) Avoid crossing messages between libraries and user code
   Select: by source, by tag (and mask), by context - "safe"
Relation to groups and global operations.
Use of contexts in a 3-D grid model: a context for each full
two-dimensional array section => big numbers.
Alternative needs for multiple name spaces apart from groups:
    Wheat: A server with multiple contributors to a package of data.
    Also, context as a means of separating stages in a software
    pipeline without needing a barrier.
Cownie: Proposes using many fewer contexts and a very fast
implementation, using an array of queues with indexing via the context.
An alternative to contexts is using copies of group ids, but then
point-to-point routines need to be aware of groups, i.e., able to
select on gids. The purpose of contexts is to keep separate parts of a
program from interfering, without preventing users from doing anything
they want with tags. Context as an "endpoint of communication" (Cownie:
as a "queue index"). Connection with groups, i.e., collective
communications.

Lusk: What are the costs/complications of instead using tag
registration, i.e., the user requests a range of tags? A change in
"don't care". Rik Littlefield agreed with the need for a large number
of contexts - there are situations where the intersections of groups
are complicated. Paul Pierce argued that the semantics of using part of
the tag as a context are such that there is likely only a syntactic
distinction. At Sandia, the users' group had no problem with the idea
of contexts; groups caused them a great deal of confusion and
disinterest. Some situations, e.g. separating software libraries, need
only a very small number of contexts, but using contexts to separate
intersecting groups can lead to situations where many contexts are
needed. Skjellum asked Cownie about the performance impact. What about
using an index for small numbers and switching for large numbers?
Well . . . Paul Pierce: An "IF" in performance-critical code is always
a problem.

There followed a discussion of the relationship between contexts and
the problem of implementing collective operations using point-to-point
operations. Paul Pierce pointed out that some implementations will want
to do this! [Intel will do this; Meiko will not.] Rik Littlefield
proposed to have a context associated with a "code package" in some
fashion; then structured use of tags within the code is sufficient. For
this to work we need some essentially static assignment of contexts,
but only a modest number of contexts is needed. Groups vs. contexts
again. What is a pid? Etc. Zipcode bases everything on groups.
Proposal: Make groups and contexts the same.
[Of course this is part of MPI -1, rejected last time] ------------------------------------------------------------------------------ ==============================================================================