The low-level benchmark codes are designed to measure the basic architectural features of parallel machines. Full application codes obviously measure the performance of a parallel system on the full problem, and this is ultimately what the user wants. In many instances, however, the full application codes are complex, contain many hundreds of thousands of lines of Fortran, and are not available in a suitable parallel version. To obtain a guide to the performance of a given parallel system on a particular application, something less complex than the full application is useful. A profile of the sequential version of the application enables the compute-intensive portions of the program to be identified. It is these compute-intensive subsections of an application that we wish to model with the introduction of parallel kernel benchmarks.
The popular kernel benchmarks that have been used for traditional vector supercomputers, such as the Livermore Loops, the LINPACK benchmark and the original NAS kernels, are clearly inappropriate for the performance evaluation of parallel machines. The tuning restrictions of these benchmarks rule out many widely used parallel extensions. More importantly, the computation and memory requirements of these programs do not do justice to the vastly increased capabilities of the new parallel machines, particularly those that will be available by the mid-1990s. For these reasons we believe that a new, widely accepted set of kernel benchmarks is desirable as a step towards more sensible and scientific performance reporting for parallel systems.
The kernel codes are typically up to a few thousand lines of Fortran and are sufficiently simple that the performance of a given parallel machine on such a program can be related to the underlying architectural parameters. It must be acknowledged, however, that performance on kernels alone is insufficient to assess completely the performance potential of a parallel machine on full scientific applications. The chief difficulty is that a data structure may be very efficient on a given system for one of the isolated kernels, yet inappropriate if incorporated into a larger application. For example, the performance of a real CFD application on a parallel system depends critically on the data motion between different computational kernels. In addition, full applications typically have initialization phases, I/O and so on, and faithful reproduction of these features can be of critical importance for a realistic guide to performance.
For these reasons the PARKBENCH suite introduces a level of complexity above the kernel codes, called compact applications. These are full, though perhaps simplified, application codes that contain all the essential features of the full problem but are sufficiently simple to run and analyse. They are described in Chapter 5.