THE NAS PARALLEL BENCHMARK CFD CODES


The NAS Parallel Benchmark CFD codes (APPLU, APPSP, and APPBT) were submitted by David H. Bailey of the NASA Ames Research Center. They may be obtained in the current distribution from the netlib repository. Further details, provided by David Bailey, are given below.
-------------------------------------------------------------------------------
Name of Program         : APPLU
			  APPSP
			  APPBT
-------------------------------------------------------------------------------
Submitter's Name        : David H. Bailey
Submitter's Organization: NASA Ames Research Center
Submitter's Address     : Mail Stop T27A-1
			  Moffett Field, CA 94035-1000
Submitter's Telephone # : 415-604-4410
Submitter's Fax #       : 415-604-3957
Submitter's Email       : dbailey@nas.nasa.gov
-------------------------------------------------------------------------------
Major Application Field : Computational Fluid Dynamics
Application Subfield(s) : None
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :

These three codes constitute the "CFD application benchmarks" of the
NAS Parallel Benchmark (NPB) suite.  Those of us who developed the
suite regard them as the parts of the NPB most important and most
relevant to NASA's applications.  All three have been part of the
suite since its establishment as a "paper and pencil" benchmark in
1991.  Numerous vendors have submitted and updated performance
reports for these benchmarks.

-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

This code may be freely distributed internationally.

-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

All floating-point data and operations are 64-bit.  There is no
restriction on integer size -- either 32-bit or 64-bit integers may
be used.

-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum,
R. Fatoohi, S. Fineberg, P. Frederickson, T. Lasinski, R. Schreiber,
H. Simon, V. Venkatakrishnan and S. Weeratunga, The NAS Parallel
Benchmarks, RNR Technical Report RNR-94-007, March 1994.

-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

See previous.

-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

V. K. Naik, "Performance Issues in Implementing NAS Parallel Benchmark
Applications on IBM SP-1", Research Report, IBM T.J. Watson Research
Center, 1993 (in preparation).

E. Barszcz, R. Fatoohi, V. Venkatakrishnan, and S. Weeratunga,
"Solution of Regular Sparse Triangular Linear Systems on Vector
and Distributed Memory Multiprocessors", Tech Report RNR-93-07,
NASA Ames Research Center, Moffett Field, CA 94035, April 1993.

Rod Fatoohi and Sisira Weeratunga, "Performance Evaluation of Three
Distributed Computing Environments for Scientific Applications", to
appear, Proceedings of Supercomputing '94.

-------------------------------------------------------------------------------
Other relevant research papers:

-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :

Three versions of the code are available from us:
1) Sequential (reduced-size problems that will run on a workstation).
2) Parallel (for the Intel iPSC/860 or Paragon).
3) Parallel (for the CM-2 or CM-5).
At least one other scientist (Sundarem) has implemented the benchmarks
in PVM.

-------------------------------------------------------------------------------

                                        APPBT    APPLU    APPSP
Total number of lines in source code:    4434     3262     3493
Number of lines excluding comments  :    4258     3104     3305
Size in bytes of source code        :  148186    96370    92783
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :

APPBT:
appbt.A.inp - Class A, 21 lines, 812 bytes
appbt.B.inp - Class B, 21 lines, 812 bytes

APPLU:
applu.A.inp - Class A, 26 lines, 918 bytes
applu.B.inp - Class B, 26 lines, 919 bytes

APPSP:
appsp.A.inp - Class A, 26 lines, 814 bytes
appsp.B.inp - Class B, 26 lines, 817 bytes

-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :

standard output: formatted text

-------------------------------------------------------------------------------
Brief, high-level description of what the application does:

These programs perform fluid dynamics flow simulations.  They are
stripped of the complexities associated with real CFD application
programs, which permits a simpler description of the algorithms.
However, they reproduce the essential computation and data-motion
characteristics of large-scale, state-of-the-art CFD codes.

-------------------------------------------------------------------------------
Main algorithms used:

Each employs a different high-level solution scheme.  See below.

-------------------------------------------------------------------------------
Skeleton sketch of application:

The three problems are:

LU: A regular-sparse, block (5 x 5) lower and upper triangular system
solution.  This problem represents the computations associated with
the implicit operator of a newer class of implicit CFD algorithms,
typified at NASA Ames by the code "INS3D-LU".  This problem exhibits
a somewhat limited amount of parallelism compared to the next two.
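
As an illustration of the dependency pattern that limits parallelism here,
the sketch below (Python/NumPy, not the benchmark source; array names and
layout are assumptions) performs a block lower-triangular sweep over a
structured grid: each point's 5-vector can be updated only after its
(i-1,j,k), (i,j-1,k) and (i,j,k-1) neighbours, i.e. along a wavefront.

    import numpy as np

    def block_lower_sweep(D, A, B, C, rhs):
        # D[i,j,k]: 5x5 diagonal block; A, B, C: 5x5 blocks coupling point
        # (i,j,k) to (i-1,j,k), (i,j-1,k), (i,j,k-1); rhs[i,j,k]: 5-vector.
        nx, ny, nz = rhs.shape[:3]
        v = np.zeros_like(rhs)
        for i in range(nx):
            for j in range(ny):
                for k in range(nz):
                    r = rhs[i, j, k].copy()
                    if i > 0:
                        r -= A[i, j, k] @ v[i - 1, j, k]
                    if j > 0:
                        r -= B[i, j, k] @ v[i, j - 1, k]
                    if k > 0:
                        r -= C[i, j, k] @ v[i, j, k - 1]
                    v[i, j, k] = np.linalg.solve(D[i, j, k], r)   # 5x5 solve
        return v

The corresponding upper-triangular sweep runs the same loops in reverse
order.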

SP: Solution of multiple, independent systems of non-diagonally dominant,
scalar, pentadiagonal equations.  SP and the following problem, BT, are
representative of computations associated with the implicit operators of
CFD codes such as "ARC3D" at NASA Ames.  SP and BT are similar in many
respects, but there is a fundamental difference in their
communication-to-computation ratios.
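
To make the structure of SP concrete, the following sketch (assumed data
layout, not the benchmark source) solves a batch of independent scalar
pentadiagonal systems, one per grid line; a dense solve stands in for the
benchmark's specialized forward elimination and back substitution.

    import numpy as np

    def solve_pentadiagonal_lines(bands, rhs):
        # bands: (nlines, 5, n) -- diagonals at offsets -2..+2 for each line,
        #        with diagonal d stored in bands[m, d + 2, :n - abs(d)]
        # rhs:   (nlines, n)    -- one right-hand side per grid line
        nlines, _, n = bands.shape
        x = np.empty_like(rhs)
        for m in range(nlines):                      # lines are independent
            A = sum(np.diag(bands[m, d + 2, :n - abs(d)], k=d)
                    for d in range(-2, 3))
            x[m] = np.linalg.solve(A, rhs[m])
        return x

Since every grid line can be solved independently, the available
parallelism is large; the cost lies in solving such systems along all
three coordinate directions each time step.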

BT: Solution of multiple, independent systems of non-diagonally dominant,
block tridiagonal equations with a (5 x 5) block size.
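
A minimal sketch of the per-line kernel BT applies (names assumed, not
taken from the benchmark source): the block Thomas algorithm for one block
tridiagonal system with 5 x 5 blocks.

    import numpy as np

    def block_thomas(A, B, C, r):
        # B[i]: 5x5 diagonal block; A[i], C[i]: sub-/super-diagonal 5x5 blocks
        # (A[0] and C[-1] are unused); r[i]: 5-vector right-hand side.
        n = len(B)
        Bh, rh = [B[0].copy()], [r[0].copy()]
        for i in range(1, n):                        # forward elimination
            m = A[i] @ np.linalg.inv(Bh[i - 1])
            Bh.append(B[i] - m @ C[i - 1])
            rh.append(r[i] - m @ rh[i - 1])
        x = [None] * n
        x[n - 1] = np.linalg.solve(Bh[n - 1], rh[n - 1])
        for i in range(n - 2, -1, -1):               # back substitution
            x[i] = np.linalg.solve(Bh[i], rh[i] - C[i] @ x[i + 1])
        return np.array(x)

BT performs more arithmetic per grid point than SP (5 x 5 block operations
rather than scalar operations), which is what shifts its
communication-to-computation ratio relative to SP.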

-------------------------------------------------------------------------------
Brief description of I/O behaviour:

Only a brief formatted summary is written to standard output at completion.

-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :

This may be done in any of several ways.  On most systems, the main
data arrays are decomposed in all three dimensions.
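
As one concrete (hypothetical) way of realizing such a distribution, the
sketch below splits an n^3 grid over a px x py x pz process grid, giving
each process a contiguous index range in every dimension and spreading any
remainder as evenly as possible.

    import itertools

    def grid_decomposition(n, px, py, pz):
        # Map each process (i, j, k) to its (start, stop) index range in
        # each dimension of an n^3 grid.
        def splits(n, p):
            base, extra = divmod(n, p)
            bounds, start = [], 0
            for q in range(p):
                stop = start + base + (1 if q < extra else 0)
                bounds.append((start, stop))
                start = stop
            return bounds
        sx, sy, sz = splits(n, px), splits(n, py), splits(n, pz)
        return {(i, j, k): (sx[i], sy[j], sz[k])
                for i, j, k in itertools.product(range(px), range(py),
                                                 range(pz))}

For example, grid_decomposition(64, 4, 4, 2) assigns each of 32 processes a
16 x 16 x 32 sub-block of the Class A grid.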

-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :

See above.

-------------------------------------------------------------------------------
Brief description of load balance behavior :

Since the arrays involved are static, load balancing depends only on
how well the individual arrays can be divided among the processors.

-------------------------------------------------------------------------------
Give parameters that determine the problem size :

See table below.

-------------------------------------------------------------------------------
Give memory as function of problem size :

We do not have a formula for this.  See table below.

-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :

We do not have a formula for this.  See table below.

-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :

We do not have a formula for this.  It depends on implementation.

-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files,
memory required, and number of floating point operations) :

Class A problem size, with Cray Y-MP/1 statistics:

Benchmark code          Problem   Memory     Time      Rate
                        size      (Mwords)   (sec)   (Mflop/s)

LU (LU)                  64^3        30        344      189
Pentadiagonal (SP)       64^3         6        806      175
Block tridiagonal (BT)   64^3        24        923      192

Class B problem size, with Cray C-90/1 statistics:

Benchmark code          Problem   Memory     Time      Rate
                        size      (Mwords)   (sec)   (Mflop/s)

LU (LU)                 102^3       122       1973      162
Pentadiagonal (SP)      102^3        22       2160      207
Block tridiagonal (BT)  102^3        96       3554      203
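
The tables give elapsed time and sustained rate rather than operation
counts; an approximate total count follows from flops = rate x time, e.g.
Class A LU is roughly 189 Mflop/s x 344 s, or about 6.5 x 10^10 operations.
A small script to tabulate this for all six cases (figures copied from the
tables above):

    cases = {
        # name: (time in seconds, rate in Mflop/s)
        "LU Class A": (344, 189),
        "SP Class A": (806, 175),
        "BT Class A": (923, 192),
        "LU Class B": (1973, 162),
        "SP Class B": (2160, 207),
        "BT Class B": (3554, 203),
    }
    for name, (time, rate) in cases.items():
        print(f"{name}: ~{time * rate * 1e6:.2e} floating-point operations")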


-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :  The hardware performance monitor (HPM) on Cray systems.


-------------------------------------------------------------------------------
Other relevant information:



-------------------------------------------------------------------------------
