Thus far, cluster software systems have used a process-based model of parallelism, as in distributed-memory multiprocessors; this lies at the opposite end of the spectrum from the loop-level parallelism common in vector supercomputers. To enhance functionality as well as performance, we are investigating a threads-based model of parallelism within PVM that strikes a compromise between the large granularity of processes and the fine granularity of loops. Threads, or lightweight processes, are essentially multiple sequences of control within a single process that share portions of a common address space. A subroutine (or collection of subroutines) is associated with each thread, and the threads are multiplexed on the basis of priority and status, which provides an effective means of context switching with minimal overhead. Several stand-alone threads packages are available, and operating systems are incorporating native threads into their repertoire; it is anticipated that threads will be a standard feature of most software environments in the near future.
Figure 8: PVM Threads
Figure 8 depicts the architecture of the PVM threads system under development. From a program development point of view, threads-based cluster computing will differ minimally from the existing process-oriented paradigm. In the PVM-threads system, programs export threads, thereby establishing a mapping between a symbolic name and a subroutine address. PVM processes are initiated as in the current scenario, but subsequently spawn multiple threads, each of which, when activated, is assigned a unique thread identifier. The run-time system spawns threads based on user-supplied options as well as the relative processing speeds of the machines in a cluster; the smaller granularity of threads, when coupled with load-based placement, allows for more control in load balancing. Once spawned, threads communicate via explicit message-passing calls. In reality, however, messages are exchanged only when the communicating threads are situated in distinct processes; local communication transparently takes place via shared memory.
From the functional viewpoint, such a threads-based model offers two main advantages. First, data decomposition based on smaller granularity can be implemented without the loss of efficiency that a process-based model would incur. This is especially important in applications such as tree-search algorithms, integer computations, and database query systems, where the amount of computation between communication phases tends to be small. Second, such a paradigm is natural for client-server computing. Services can be exported using the thread registration mechanism, and invoked via functions akin to remote procedure calls. This facility is very useful for non-numeric computing applications, especially those in the database and transaction processing domain.
In terms of performance enhancement, threads provide substantially greater potential for overlapping computation and communication. Within a processor, the typical communication-computation-communication cycle of parallel processing results in idle periods when a process-based model is used. With a threads-based model, however, one thread can make productive use of the CPU while another is communication bound or blocked waiting for data to arrive. In preliminary tests with the threads interface to PVM, performance improvements of up to 35% were attained on several standard algorithms, without any other external optimizations. For more information on the threads-based implementation of PVM see .
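The overlap of computation and communication can be demonstrated with a small sketch in which one thread blocks on an in-process queue, standing in for a pending message, while a second thread computes. The queue, the event log, and the function names are illustrative assumptions; the sketch shows only the ordering of events and makes no attempt to reproduce the measured improvement.

```python
import queue
import threading


def run_demo():
    inbox = queue.Queue()   # stands in for data arriving from another process
    log = []                # records the order in which events complete

    def communication_bound():
        data = inbox.get()              # blocks until the "message" arrives
        log.append(("received", data))

    def compute_bound():
        total = sum(i * i for i in range(100_000))
        log.append(("computed", total))

    waiter = threading.Thread(target=communication_bound)
    worker = threading.Thread(target=compute_bound)
    waiter.start()
    worker.start()
    worker.join()           # the computation finishes while waiter is blocked
    inbox.put("payload")    # only now does the awaited data arrive
    waiter.join()
    return log
```

Because the data is supplied only after the computing thread has finished, the log always records the computation first: the CPU was used while the other thread was blocked. A single sequence of control would have sat idle in the receive for the same interval.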