Debugging and Tracing

First, the bad news. Adding printf() calls to your code is still a state-of-the-art methodology.
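When several tasks print at once, their output is hard to tell apart, so one common refinement is to tag every line with the id of the task that wrote it. The following is a minimal sketch, not part of PVM: debug_format() and debug_print() are hypothetical helper names, and the task id is passed in as a plain integer (in a real task it would come from pvm_mytid()).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Format "[t<hex tid>] <message>" into buf. Returns the number of
 * characters written (excluding the terminating NUL), or -1 if the
 * line did not fit. Hypothetical helper, not a PVM function. */
int debug_format(char *buf, size_t size, int tid, const char *fmt, ...)
{
    va_list ap;
    int head, body;

    assert(buf != NULL && size > 0);

    /* Task-id prefix, printed in hexadecimal. */
    head = snprintf(buf, size, "[t%x] ", (unsigned)tid);
    if (head < 0 || (size_t)head >= size)
        return -1;

    /* Caller's message, appended after the prefix. */
    va_start(ap, fmt);
    body = vsnprintf(buf + head, size - (size_t)head, fmt, ap);
    va_end(ap);
    if (body < 0 || (size_t)body >= size - (size_t)head)
        return -1;

    return head + body;
}

/* Print one tagged line to stderr and flush it at once, so the line
 * is not left sitting in a buffer if the task hangs or crashes. */
void debug_print(int tid, const char *msg)
{
    char line[256];

    if (debug_format(line, sizeof line, tid, "%s", msg) >= 0) {
        fputs(line, stderr);
        fputc('\n', stderr);
        fflush(stderr);
    }
}
```

A task whose id is 0x40001 that calls debug_print(tid, "entering barrier") would write the line "[t40001] entering barrier". Flushing after every line costs some speed, but it matters when the goal is to find out how far a hung task got.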

PVM tasks can be started in a debugger on systems that support X-Windows. If PvmTaskDebug is specified in pvm_spawn(), PVM runs $PVM_ROOT/lib/debugger, which opens an xterm in which it runs the task in a debugger defined in pvm3/lib/debugger2. The PvmTaskDebug flag is not inherited, so you must add it to every pvm_spawn() call whose tasks you want to debug. The DISPLAY environment variable can be exported to a remote host so the xterm will always be displayed on the local screen. Use the following command before running the application:

        setenv PVM_EXPORT DISPLAY

Make sure DISPLAY is set to the name of your host (not unix:0) and the host name is fully qualified if your virtual machine includes hosts at more than one administrative site. To spawn a task in a debugger from the console, use the command:

        spawn -? [ rest of spawn command ]
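From a C program, the counterpart of the console's spawn -? is to pass PvmTaskDebug in the flag argument of pvm_spawn(). Because the flag is not inherited, it can help to route all spawns in a program through one wrapper. The sketch below is one way to do that under stated assumptions: spawn_flag() and spawn_workers() are hypothetical names, and the fallback constants (used only when pvm3.h cannot be found, so the flag logic still compiles) are assumed to mirror the values in pvm3.h.

```c
#include <assert.h>
#include <stddef.h>

/* Pull in the real PVM header when the compiler can find it. */
#if defined(__has_include)
#  if __has_include(<pvm3.h>)
#    include <pvm3.h>
#    define HAVE_PVM3 1
#  endif
#endif

/* Without pvm3.h, fall back to stand-in values so the flag logic can
 * still be compiled and checked. ASSUMPTION: these mirror pvm3.h. */
#ifndef HAVE_PVM3
#define PvmTaskDefault 0
#define PvmTaskDebug   4
#endif

/* Hypothetical helper: the spawn flag to use, with the debugger bit
 * added on request. */
int spawn_flag(int in_debugger)
{
    return in_debugger ? (PvmTaskDefault | PvmTaskDebug) : PvmTaskDefault;
}

#ifdef HAVE_PVM3
/* Hypothetical wrapper: send every pvm_spawn() call in a program
 * through here, so debugging is switched on in one place.
 * Returns whatever pvm_spawn() returns (the number of tasks started,
 * or a negative error code). */
int spawn_workers(char *task, int ntask, int *tids, int in_debugger)
{
    assert(task != NULL && tids != NULL && ntask > 0);

    /* No argv is passed; "where" is unused because neither a host
     * nor an architecture flag is set. */
    return pvm_spawn(task, (char **)0, spawn_flag(in_debugger),
                     (char *)0, ntask, tids);
}
#endif
```

Each program that spawns further tasks would need the same treatment, since children started in a debugger do not pass the flag on to their own children.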

You may be able to use the libpvm trace facility to isolate problems, such as hung processes. A task has a trace mask, which allows each function in libpvm to be selectively traced, and a trace sink, which is another task to which trace data is sent (as messages). A task's trace mask and sink are inherited by any tasks spawned by it.

The console can spawn a task with tracing enabled (using spawn -@), collect the trace data, and print it out. In this way, a whole job (a group of tasks related by parentage) can be traced. The console has a trace command to edit the mask passed to tasks it spawns. Alternatively, XPVM can be used to collect and display trace data graphically.

It is difficult to start an application by hand and trace it, though. Tasks with no parent (anonymous tasks) have a default trace mask and sink of NULL. Not only must the first task call pvm_setopt() and pvm_settmask() to initialize the tracing parameters, but it must collect and interpret the trace data. If you must start a traced application from a TTY, we suggest spawning an xterm from the console:

        spawn -@ /usr/local/X11R5/bin/xterm -n PVMTASK

The task context held open by the xterm has tracing enabled. If you now run a PVM program in the xterm, it will reconnect to the task context and trace data will be sent back to the PVM console. Once the PVM program exits, you must spawn a new xterm to run again, since the task context will be closed.

Because the libpvm library is linked with your program, it can't be trusted when debugging. If you overwrite part of its memory (for example by overstepping the bounds of an array) it may start to behave erratically, making the fault hard to isolate. The pvmds are somewhat more robust and attempt to sanity-check messages from tasks, but can still be killed by errant programs.
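Since an out-of-bounds write in your own code can silently damage libpvm's data, it may be worth routing suspect array writes through a checked accessor while hunting a fault. This is a generic C sketch rather than anything PVM provides; checked_store() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Store value at buf[index] only if index is inside the array.
 * Returns 0 on success, or -1 if the write would have landed outside
 * the array (and possibly in memory that libpvm depends on). */
int checked_store(double *buf, size_t len, size_t index, double value)
{
    assert(buf != NULL);
    if (index >= len)
        return -1;
    buf[index] = value;
    return 0;
}
```

A failed store can then be reported at the point where it happens, instead of surfacing later as erratic behavior inside the library.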

The pvm_setopt() function can be used to set the debug mask for PVM message-passing functions, as described in §. Setting this mask to 3, for example, forces PVM to log, for every message sent or received by that task, information such as the source, destination, and length of the message. You can use this information to trace lost or stray messages.
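Message logging is verbose, so it can be useful to switch it on only around the part of the program under suspicion. The sketch below assumes that pvm_setopt() returns the previous value of the option it changes; trace_messages_on() and trace_messages_off() are hypothetical helper names, and the stand-in pvm_setopt() is compiled only when pvm3.h cannot be found, so that the sketch still builds on a machine without PVM.

```c
#include <assert.h>

/* Pull in the real PVM header when the compiler can find it. */
#if defined(__has_include)
#  if __has_include(<pvm3.h>)
#    include <pvm3.h>
#    define HAVE_PVM3 1
#  endif
#endif

#ifndef HAVE_PVM3
/* Stand-in used only when pvm3.h is not installed. ASSUMPTION: like
 * the real call, it returns the option's previous value. The value
 * of PvmDebugMask here is likewise assumed. */
#define PvmDebugMask 2
int standin_mask = 0;
int pvm_setopt(int what, int val)
{
    int old = standin_mask;
    assert(what == PvmDebugMask);
    standin_mask = val;
    return old;
}
#endif

/* Turn on message logging (mask 3, as in the text) for this task.
 * Returns the mask that was in effect before, so it can be restored. */
int trace_messages_on(void)
{
    return pvm_setopt(PvmDebugMask, 3);
}

/* Restore a mask saved by trace_messages_on().
 * Returns the mask that was in effect until now. */
int trace_messages_off(int saved_mask)
{
    return pvm_setopt(PvmDebugMask, saved_mask);
}
```

A typical use would be to call trace_messages_on() just before the exchange that loses a message, keep the returned value, and pass it to trace_messages_off() once the exchange is over.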
