We briefly describe the implementation of these new features. For a more in-depth description, see .
A new class of system messages (SM_XXX) was defined, to be sent between pvmds, resource managers, hosters and taskers, as well as client tasks of a resource manager.
A new entry point in the pvmd, schentry(), serves messages of the SM class for all three new interfaces. The pvmd was modified so that it can receive messages from arbitrary tasks, including tasks managed by other pvmds. Ordinarily a pvmd does not communicate with foreign tasks (those on other hosts): it keeps message reassembly buffers only for each foreign pvmd and for each task it manages. Adding reassembly buffers for foreign tasks would complicate the design. To free the buffer of a foreign task that dies, for example, the pvmd would have to request notification from that task's pvmd, causing extra communication. For the sake of simplicity, the pvmd local to the sending task instead serves as a message repeater. It reassembles the message as if it were the receiver, then forwards it all at once to the destination pvmd, which reassembles the message again. The source address is preserved, so the sender can still be identified. Libpvm maintains dynamic reassembly buffers, so messages in the other direction, from a pvmd to a task, do not cause a problem.
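The repeater scheme can be illustrated with a small, self-contained sketch. It is not taken from the pvmd source: the structure reasm_buf, the functions reasm_add() and repeat_message(), and the limit SKETCH_MAXMSG are hypothetical names chosen for illustration, and real pvmd fragments carry headers that are omitted here. The sketch only shows the idea that the sender's local pvmd reassembles the whole message first, then forwards it to the destination pvmd with the original source address intact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAXMSG 256   /* hypothetical limit, for the sketch only */

/* Hypothetical reassembly buffer; a pvmd keeps one per peer it tracks. */
struct reasm_buf {
    int src;                /* source task id, preserved end to end */
    size_t len;             /* bytes collected so far */
    int complete;           /* 1 once the last fragment has arrived */
    unsigned char data[SKETCH_MAXMSG];
};

/* Append one fragment; 'last' marks the final fragment of the message.
 * Returns 0 on success, -1 on overflow, mixed sources, or a full buffer. */
int reasm_add(struct reasm_buf *rb, int src, const void *frag,
              size_t n, int last)
{
    assert(rb != NULL && frag != NULL);
    if (rb->complete || n > SKETCH_MAXMSG - rb->len)
        return -1;
    if (rb->len == 0)
        rb->src = src;              /* first fragment fixes the sender */
    else if (rb->src != src)
        return -1;
    memcpy(rb->data + rb->len, frag, n);
    rb->len += n;
    rb->complete = (last != 0);
    return 0;
}

/* Repeater step: once the local pvmd holds the complete message, send it
 * on to the destination pvmd in fragments of at most 'fragsz' bytes.
 * The destination reassembles it again under the original source id. */
int repeat_message(struct reasm_buf *local, struct reasm_buf *dest,
                   size_t fragsz)
{
    size_t off = 0;

    assert(local != NULL && dest != NULL);
    if (!local->complete || fragsz == 0)
        return -1;                  /* nothing to forward yet */
    do {
        size_t n = local->len - off;
        if (n > fragsz)
            n = fragsz;
        int last = (off + n == local->len);
        if (reasm_add(dest, local->src, local->data + off, n, last) != 0)
            return -1;
        off += n;
    } while (off < local->len);
    local->len = 0;                 /* local buffer is free for reuse */
    local->complete = 0;
    return 0;
}
```

In this sketch the destination never holds a partial message from a foreign task for longer than one forwarding step, which is the property that lets the real pvmd avoid per-foreign-task buffers and the cleanup notifications they would require.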
The existing fault recovery mechanisms were mostly adequate to serve the new system tasks. For example, if pvm_addhosts() is called to add hosts to the virtual machine and the hoster task fails while starting the new pvmds, the master pvmd enters the normal task-exit cleanup routine, which cancels the startup operation and returns error code PvmDSysErr for each host in the result vector. Likewise, if the tasker fails, the pvmd can find and terminate the tasks for which it was responsible. Resource manager operations are not currently recovered, because it is not clear what action to take.
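The hoster case can be sketched as follows. This is an illustration under stated assumptions, not the pvmd's actual cleanup code: struct addhost_op and addhost_on_task_exit() are hypothetical, and SKETCH_PVMDSYSERR stands in for PvmDSysErr with a numeric value assumed to match pvm3.h, which should be checked against the installed header.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PvmDSysErr; the value is an assumption and should be
 * verified against pvm3.h before relying on it. */
#define SKETCH_PVMDSYSERR (-25)

/* Hypothetical record of a host-add operation in progress. */
struct addhost_op {
    int hoster_tid;   /* task id of the hoster starting the new pvmds */
    int nhosts;       /* number of hosts requested */
    int *results;     /* result vector, one slot per host */
    int pending;      /* nonzero while waiting on the hoster */
};

/* Called from the task-exit cleanup path for every exiting task.
 * If the dead task is the hoster of a pending operation, cancel the
 * operation and mark every host as failed. Returns the number of
 * result slots that were set to the error code. */
int addhost_on_task_exit(struct addhost_op *op, int dead_tid)
{
    int i;

    assert(op != NULL && op->results != NULL);
    if (!op->pending || op->hoster_tid != dead_tid)
        return 0;                        /* unrelated task exit */
    for (i = 0; i < op->nhosts; i++)
        op->results[i] = SKETCH_PVMDSYSERR;
    op->pending = 0;                     /* startup operation cancelled */
    return op->nhosts;
}
```

The point of the sketch is that no hoster-specific recovery path is needed: the ordinary task-exit hook only has to check whether the exiting task owned a pending operation.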