Starting Slave Pvmds




Getting a slave pvmd started is a messy task with no good solution. The goal is to get a process running on the new host, with enough identity to let it be fully configured and added as a peer.

Ideally, the mechanism used should be widely available, secure, and fast, while leaving the system easy to install. We'd like to avoid having to type passwords all the time, but we don't want to put them in a file from which they can be stolen. No single system meets all of these criteria. Using inetd or connecting to an already-running pvmd or pvmd server at a reserved port would allow fast, reliable startup, but would require that a system administrator install PVM on each host. Starting the pvmd via rlogin or telnet with a chat script would allow access even to IP-connected hosts behind firewall machines and would require no special privilege to install; the main drawbacks are speed and the effort needed to get the chat program working reliably.

Two widely available systems are rsh and rexec(); we use both to cover the cases where a password does and does not need to be typed. A manual startup option allows the user to take the place of a chat program, starting the pvmd by hand and typing in the configuration. rsh is a privileged program that can be used to run commands on another host without a password, provided the destination host can be made to trust the source host. This can be done either by making the hosts equivalent (which requires a system administrator) or by creating a .rhosts file on the destination host (not a great idea). The alternative, rexec(), is a function compiled into the pvmd. Unlike rsh, which doesn't take a password, rexec() requires the user to supply one at run time, either by typing it in or by placing it in a .netrc file (a really bad idea).

  
Figure: Timeline of addhost operation

The figure shows a host being added to the machine. A task calls pvm_addhosts() to send a request to its pvmd, which in turn sends a DM_ADD message to the master (possibly itself). The master pvmd creates a new host table entry for each host requested, looks up the IP addresses, and sets the options from host file entries or defaults. The host descriptors are kept in a waitc_add structure (attached to a wait context) and are not yet added to the host table. The master forks the pvmd' to do the dirty work, passing it a list of hosts and commands to execute (an SM_STHOST message). The pvmd' uses rsh, rexec(), or manual startup to start each pvmd, pass it parameters, and get a line of configuration data back. The configuration dialog between the pvmd' and a new slave is as follows:

---------------------------------------------------------------------------
pvmd' --> slave:   (exec) $PVM_ROOT/lib/pvmd -s -d8 -nhonk 1 80 a9ca95:0f5a
                                              4096 3 80a95c43:0000
slave --> pvmd':   ddpro<2312> arch ip<80a95c43:0b3f> mtu<4096>
pvmd' --> slave:   EOF
---------------------------------------------------------------------------

The addresses of the master and slave pvmds are passed on the command line. The slave writes its configuration on standard output, then waits for an EOF from the pvmd' and disconnects. It runs in probationary status (runstate = PVMDSTARTUP) until it receives the rest of its configuration from the master pvmd. If it isn't configured within five minutes (parameter DDBAILTIME), it assumes there is some problem with the master and quits. The protocol revision (DDPROTOCOL) of the slave pvmd must match that of the master. This number is incremented whenever a change in the protocol makes it incompatible with the previous version. When several hosts are added at once, startup is done in parallel. The pvmd' sends the data (or errors) in a DM_STARTACK message to the master pvmd, which completes the host descriptors held in the wait context.

If a special task called a hoster is registered with the master pvmd when it receives the DM_ADD message, the pvmd' is not used. Instead, the SM_STHOST message is sent to the hoster, which starts the remote processes as described above using any mechanism it wants, then sends an SM_STHOSTACK message (same format as DM_STARTACK) back to the master pvmd. Thus, the method of starting slave pvmds is dynamically replaceable, but the hoster does not have to understand the configuration protocol. If the hoster task fails during an add operation, the pvmd uses the wait context to recover. It assumes none of the slaves were started and sends a DM_ADDACK message indicating a system error.

After the slaves are started, the master sends each one a DM_SLCONF message to set parameters not included in the startup protocol. It then broadcasts a DM_HTUPD message to all new and existing slaves. Upon receiving this message, each slave knows the configuration of the new virtual machine. The master waits for an acknowledging DM_HTUPDACK message from every slave, then broadcasts an HT_COMMIT message, shifting all of them to the new host table. Two phases are needed so that new hosts are not advertised (e.g., by pvm_config()) until all pvmds know the new configuration. Finally, the master sends a DM_ADDACK reply to the original request, giving the new host IDs.

Note: Recent experience suggests it would be cleaner to manage the pvmd' through the task interface instead of the host interface. This approach would also allow multiple starters to run at once; currently, parallel startup is implemented explicitly within the single pvmd' process.


