15.2.3 What We Learned

Aside from finding some new reasons not to use old hardware, we were able to pinpoint issues worthy of further study, concerning parallel programming in general and load balancing in particular.

In the MOOSE programming style, it is messy and expensive for user tasks to find out when other groups of tasks have terminated. In our defense, the same unsolved problem exists in Ada.
There are serious ambiguities in the meaning of parallel file I/O for asynchronous systems like MOOSE, and they are exacerbated when tasks can move from node to node. When writing, does each task have a block in the file, and if so, how do we correlate tasks with blocks? There are hints that a hypertext-like system is more suitable than a linear file for parallel I/O, but we were unable to obtain completely satisfactory semantics.
It is not clear that there is a general solution to the dynamic load-balancing problem. The issue may be as resistant to classification as, say, the general nonlinear PDE problem. A better solution may be to provide language support so that the programmer can control the load distribution as part of the program. Existing applications that load-balance successfully (e.g., chess [Felten:87a] in Section 14.3 or the N-body solver [Salmon:89a] in Section 12.4) compute and use load information on the fly in ways that a general-purpose method could not.
As the number of tasks grows, the amount of load information grows enormously, and we have to be selective about what we record. The best choice seems to be application- and machine-dependent.
In irregular problems, having a large number of tasks does not automatically solve the load-balancing problem. Unforeseen correlations between tasks tend to appear that confound naive balancing schemes. The load-balancing problem must be tackled on many scales simultaneously.
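
The last two points can be illustrated with a small sketch. It is not MOOSE code: the function names, the cost model (the first quarter of the tasks cost ten times as much as the rest), and the two assignment schemes are all assumptions made for illustration. It compares a naive scheme that hands each node a contiguous block of tasks with a simple greedy scheme that uses per-task cost information.

```python
import heapq


def clustered_costs(n_tasks):
    """Hypothetical workload: the first quarter of the tasks are 10x heavier.

    This stands in for a correlation between task position and task cost.
    """
    return [10.0 if i < n_tasks // 4 else 1.0 for i in range(n_tasks)]


def block_assign(costs, n_nodes):
    """Naive scheme: contiguous blocks of tasks per node, ignoring cost."""
    loads = [0.0] * n_nodes
    block = -(-len(costs) // n_nodes)  # ceiling division
    for i, cost in enumerate(costs):
        loads[i // block] += cost
    return loads


def greedy_assign(costs, n_nodes):
    """Cost-aware scheme: heaviest task first, onto the lightest node so far."""
    heap = [(0.0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    for cost in sorted(costs, reverse=True):
        load, node = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, node))
    loads = [0.0] * n_nodes
    for load, node in heap:
        loads[node] = load
    return loads


def imbalance(loads):
    """Ratio of the heaviest node load to the mean load (1.0 is perfect)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    for n_tasks in (64, 1024):
        costs = clustered_costs(n_tasks)
        naive = imbalance(block_assign(costs, 4))
        aware = imbalance(greedy_assign(costs, 4))
        print(f"{n_tasks} tasks: naive {naive:.2f}, cost-aware {aware:.2f}")
```

Under these assumptions, raising the task count from 64 to 1024 leaves the naive imbalance unchanged, because the correlation between position and cost survives the finer decomposition. The cost-aware scheme balances this particular workload, but only because it is given exact per-task costs, which is the kind of load information a general-purpose system would have to obtain from the user or measure at run time.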

Future work will therefore have to focus less on the mechanism of moving tasks around and more on how to communicate load information between user and system.

MOOSE was written by John Salmon, Sean Callahan, Jon Flower and Adam Kolawa. MOOS II was written by Jeff Koller.

The C3P references are: [Salmon:88a], [Koller:88a;88b;88d;89a], [Fox:86h].

Guy Robinson
Wed Mar 1 10:19:35 EST 1995