CS-89-85

Performance of Various Computers Using Standard Linear Equations Software

Jack J. Dongarra*

Computer Science Department University of Tennessee Knoxville, TN 37996-1301

and

Mathematical Sciences Section Oak Ridge National Laboratory Oak Ridge, TN 37831

March 6, 1993

*Electronic mail address: dongarra@cs.utk.edu. This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract DE-AC05-84OR21400, and in part by the Science Alliance, a state-supported program at the University of Tennessee.

Abstract

This report compares the performance of different computer systems in solving dense systems of linear equations. The comparison involves approximately a hundred computers, ranging from a CRAY Y-MP to scientific workstations such as the Apollo and Sun to IBM PCs.

1 Introduction and Objectives

The timing information presented here should in no way be used to judge the overall performance of a computer system. The results reflect only one problem area: solving dense systems of equations.

This report provides performance information on a wide assortment of computers ranging from the home-used PC up to the most powerful supercomputers. The information has been collected over a period of time and will undergo change as new machines are added and as hardware and software systems improve. The programs used to generate this data can easily be obtained over the Internet. While we make every attempt to verify the results obtained from users and vendors, errors are bound to exist and should be brought to our attention. We encourage users to obtain the programs and run the routines on their machines, reporting any discrepancies with the numbers listed here.

The first table reports three numbers for each machine listed (in some cases the numbers are missing because of lack of data). All performance numbers reflect arithmetic performed in full precision (usually 64-bit), unless noted. On some machines full precision may be single precision, such as the CRAY, or double precision, such as the IBM. The first number is for the LINPACK [1] benchmark program for a matrix of order 100 in a Fortran environment. The second number is for solving a system of equations of order 1000, with no restriction on the method or its implementation. The third number is the theoretical peak performance of the machine.

LINPACK programs can be characterized as having a high percentage of floating-point arithmetic operations. The routines involved in this timing study, SGEFA and SGESL, use column-oriented algorithms. That is, the programs usually reference array elements sequentially down a column, not across a row.
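To make the idea of a column-oriented algorithm concrete, the following is a minimal Python sketch of Gaussian elimination with partial pivoting in which every inner loop walks down a single column. It is patterned loosely on the structure of SGEFA and SGESL but is not the LINPACK source. The names `axpy`, `lu_factor`, and `lu_solve`, and the list-of-columns storage, are assumptions made for this illustration.

```python
def axpy(alpha, x, y, start):
    """y[i] += alpha * x[i] for i >= start (a SAXPY-style vector update)."""
    for i in range(start, len(y)):
        y[i] += alpha * x[i]


def lu_factor(cols):
    """Factor a square matrix in place and return the pivot indices.

    cols[j][i] holds element (i, j), so each inner list is one column
    (an assumed stand-in for Fortran's column-major storage).
    """
    n = len(cols)
    pivots = []
    for k in range(n - 1):
        col_k = cols[k]
        # Partial pivoting: pick the largest entry on or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(col_k[i]))
        pivots.append(p)
        if col_k[p] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        col_k[k], col_k[p] = col_k[p], col_k[k]
        # Store the negated multipliers below the diagonal of column k.
        for i in range(k + 1, n):
            col_k[i] = -col_k[i] / col_k[k]
        # Update every remaining column with one vector operation each.
        for j in range(k + 1, n):
            col_j = cols[j]
            col_j[k], col_j[p] = col_j[p], col_j[k]
            axpy(col_j[k], col_k, col_j, k + 1)
    if cols[n - 1][n - 1] == 0.0:
        raise ZeroDivisionError("matrix is singular")
    return pivots


def lu_solve(cols, pivots, b):
    """Solve A x = b using the output of lu_factor; b is left unchanged."""
    n = len(cols)
    x = list(b)
    # Forward elimination: replay the interchanges and multipliers on b.
    for k in range(n - 1):
        p = pivots[k]
        x[k], x[p] = x[p], x[k]
        axpy(x[k], cols[k], x, k + 1)
    # Back substitution, again sweeping down one column at a time.
    for k in range(n - 1, -1, -1):
        x[k] /= cols[k][k]
        for i in range(k):
            x[i] -= x[k] * cols[k][i]
    return x
```

A call such as `lu_solve(cols, lu_factor(cols), b)` solves the system. Note that the innermost work in both routines is a single vector update over contiguous elements of one column, which is the access pattern the text describes.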
Column orientation is important in increasing efficiency because of the way Fortran stores arrays. Most floating-point operations in LINPACK take place in a set of subprograms, the Basic Linear Algebra Subprograms (BLAS) [3], which are called repeatedly throughout the calculation. These BLAS, referred to now as Level 1 BLAS, reference one-dimensional arrays, rather than two-dimensional arrays.

In the first case, the problem size is relatively small (order 100), and no changes were made to the LINPACK software. Moreover, no attempt was made to use special hardware features or to exploit vector capabilities or multiple processors. (The compilers on some machines may, of course, generate optimized code that itself accesses special features.) Thus, many high-performance machines may not have reached their asymptotic execution rates.

In the second case, the problem size is larger (matrix of order 1000), and modifying or replacing the algorithm and software was permitted to achieve as high an execution rate as possible. Thus, the hardware had more opportunity for reaching near-asymptotic rates. An important constraint, however, was that all optimized programs maintain the same relative accuracy as standard techniques, such as Gaussian elimination used in LINPACK. Furthermore, the driver program (supplied with the LINPACK benchmark) had to be run to ensure that the same problem is solved. The driver program sets up the matrix, calls the routines to solve the problem, verifies that the answers are correct, and computes the total number of operations to solve the problem (independent of the method) as $2n^3/3 + 2n^2$, where $n = 1000$.

The last column is based not on an actual program run, but on a paper computation to determine the theoretical peak Mflop/s rate for the machine. This is the number manufacturers often cite; it represents an upper bound on performance. That is, the manufacturer guarantees that programs will not exceed this rate, sort of a "speed of light" for a given computer.
The theoretical peak performance is determined by counting the number of floating-point additions and multiplications (in full precision) that can be completed during a period of time, usually the cycle time of the machine. As an example, the CRAY Y-MP/8 has a cycle time of 6 ns. During a cycle the results of both an addition and a multiplication can be completed:

$$\frac{2\ \text{operations}}{1\ \text{cycle}} \times \frac{1\ \text{cycle}}{6\ \text{ns}} = 333\ \text{Mflop/s}$$

on a single processor. On the CRAY Y-MP/8 there are 8 processors; thus, the peak performance is 2667 Mflop/s.

The information in this report is presented to users to provide a range of performance for the various computers and to show the effects of typical Fortran programming and the results that can be obtained through careful programming. The maximum rate of execution is given for comparison.

The column labeled "Computer" gives the name of the computer hardware on which the program was run. In some cases we have indicated the number of processors in the configuration and, in some cases, the cycle time of the processor in nanoseconds.

The column labeled "LINPACK Benchmark" gives the operating system and compiler used. The run was based on two routines from LINPACK: SGEFA and SGESL were used for single precision, and DGEFA and DGESL were used for double precision. These routines perform standard LU decomposition with partial pivoting and backsubstitution. The timing was done on a matrix of order 100, where no changes are allowed to the Fortran programs.

The column labeled "TPP" (Toward Peak Performance) gives the results of hand optimization; the problem size was of order 1000.

The final column labeled "Theoretical Peak" gives the maximum rate of execution based on the cycle time of the hardware.

The same matrix was used to solve the system of equations.
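The paper computation behind the "Theoretical Peak" column can be written out in a few lines. The sketch below reproduces the CRAY Y-MP/8 arithmetic given in this section; the function name and parameters are hypothetical, and the default of two results per cycle (one addition, one multiplication) is an assumption that holds for the Y-MP example but not necessarily for other machines.

```python
def theoretical_peak_mflops(cycle_time_ns, ops_per_cycle=2, processors=1):
    """Peak rate in Mflop/s from cycle time and results completed per cycle.

    ops_per_cycle / cycle_time_ns is a rate in operations per nanosecond,
    i.e. Gflop/s, so multiply by 1000 to express it in Mflop/s.
    """
    return processors * ops_per_cycle / cycle_time_ns * 1000.0


# CRAY Y-MP/8: 6 ns cycle, one add and one multiply result per cycle.
single = theoretical_peak_mflops(6.0)                # about 333 Mflop/s
full = theoretical_peak_mflops(6.0, processors=8)    # about 2667 Mflop/s
```

The same function can be used to sanity-check other table entries, provided the number of results per cycle for that machine is known; that figure is not stated in this report for most systems.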
The results were checked for accuracy by calculating a residual for the problem, $\|Ax - b\| / (\|A\| \|x\|)$.

The term Mflop/s, used as a rate of execution, stands for millions of floating-point operations completed per second. For solving a system of $n$ equations, $2n^3/3 + 2n^2$ operations are performed (we count both additions and multiplications).
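The two quantities just defined, the execution rate and the accuracy check, can be sketched as follows. This is an illustration only and not the benchmark driver: the function names are hypothetical, and the choice of the maximum (infinity) norm is an assumption, since the text does not say which norm is used.

```python
def linpack_flops(n):
    """Nominal operation count 2n^3/3 + 2n^2 for a system of order n."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


def mflops(n, seconds):
    """Execution rate in Mflop/s for a solve of order n taking `seconds`."""
    return linpack_flops(n) / seconds / 1.0e6


def normalized_residual(a, x, b):
    """||Ax - b|| / (||A|| ||x||), with `a` given as a list of rows.

    Infinity norms are assumed here for all three quantities.
    """
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi
         for row, bi in zip(a, b)]
    norm_r = max(abs(v) for v in r)
    norm_a = max(sum(abs(v) for v in row) for row in a)
    norm_x = max(abs(v) for v in x)
    return norm_r / (norm_a * norm_x)
```

Because the operation count is fixed by $n$ alone, a faster algorithm that performs fewer actual operations still reports a rate based on the same nominal count, which is what keeps the numbers comparable across methods.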

The information in the tables was compiled over a period of time. Subsequent systems software and hardware changes may alter the timings to some extent.

One further note: The following tables should not be taken too seriously. In multiprogramming environments it is often difficult to reliably measure the execution time of a single program. We trust that anyone actually evaluating machines and operating systems will gather more reliable and more representative data.

2 A Look at Parallel Processing

While collecting the data presented in Table 1, we were able to experiment with parallel processing on a number of computer systems. For these experiments, we used either the standard LINPACK algorithm or an algorithm based on matrix-matrix [2] techniques. In the case of the LINPACK algorithm, the loop around the SAXPY can be performed in parallel. In the matrix-matrix implementation the matrix product can be split into submatrices and performed in parallel. In either case, the parallelism follows a simple fork-and-join model where each processor gets some number of operations to perform. For a problem of size 1000, we expect a high degree of parallelism. Thus, it is not surprising that we get such high efficiency (see Table 2). The actual percentage of parallelism, of course, depends on the algorithm and on the speed of the uniprocessor on the parallel part relative to the speed of the uniprocessor on the non-parallel part.
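As a rough illustration of how efficiency and the percentage of parallelism relate, the sketch below computes speedup and efficiency from two measured rates and then backs out a parallel fraction using Amdahl's law. Amdahl's law is brought in here as a simple model and is not named in the report; the function names are hypothetical. The sample figures are the n = 1000 rates listed in Table 1 for the CRAY Y-MP C90 with 1 and 16 processors, which may not come from identical code on both configurations.

```python
def speedup(rate_parallel, rate_serial):
    """Ratio of the multiprocessor rate to the uniprocessor rate."""
    return rate_parallel / rate_serial


def efficiency(rate_parallel, rate_serial, processors):
    """Speedup divided by the number of processors (1.0 is ideal)."""
    return speedup(rate_parallel, rate_serial) / processors


def amdahl_parallel_fraction(s, processors):
    """Parallel fraction f implied by speedup s under Amdahl's law.

    Amdahl's law: s = 1 / ((1 - f) + f / p). Solving for f gives
    f = (1 - 1/s) / (1 - 1/p). Requires processors > 1.
    """
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / processors)


# Table 1, CRAY Y-MP C90, n = 1000: 874 Mflop/s (1 proc), 9715 (16 proc).
s = speedup(9715.0, 874.0)
e = efficiency(9715.0, 874.0, 16)
f = amdahl_parallel_fraction(s, 16)
```

For these figures the speedup is a little over 11 on 16 processors, so even a parallel fraction of about 97 percent leaves a visible gap from ideal scaling.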

3 Highly Parallel Computing

With the arrival of massively parallel computers there is a need to benchmark such machines on problems that make sense. The problem size and rules for the runs reflected in Tables 1 and 2 do not permit massively parallel computers to demonstrate their potential performance. The basic flaw is that the problem size is too small. To provide a forum for comparing such machines, the following benchmark was run on a number of massively parallel machines. The benchmark involves solving a system of linear equations (as was done in Tables 1 and 2). However, in this case the problem size is allowed to increase, and the performance numbers reflect the largest problem run on the machine.

The ground rules are as follows: Solve systems of linear equations by some method, allow the size of the problem to vary, and measure the execution time for each size problem. In computing the floating-point execution rate, use $2n^3/3 + 2n^2$ operations independent of the actual method used. (If you choose to do Gaussian elimination, partial pivoting must be used.) Compute and report a residual for the accuracy of the solution as $\|Ax - b\| / (\|A\| \|x\|)$.

The columns in Table 3 are defined as follows:

$R_{\max}$: the performance in Gflop/s for the largest problem run on a machine.

$N_{\max}$: the size of the largest problem run on a machine.

$N_{1/2}$: the size where half the $R_{\max}$ execution rate is achieved.

$R_{\text{peak}}$: the theoretical peak performance in Gflop/s for the machine.

In addition, the number of processors and the cycle time are listed.
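One way to derive the first three Table 3 quantities from a set of runs is sketched below. The function name, the input format, and the use of linear interpolation to locate the half-rate size are all assumptions for illustration; the report does not say how $N_{1/2}$ was estimated. The sample data points are invented and do not correspond to any machine.

```python
def summarize_scaling(points):
    """Return Rmax, Nmax and an estimate of N_half from (n, rate) pairs.

    Rmax is taken as the rate at the largest problem size, following the
    definition in the text. N_half is found by linear interpolation
    between the two measured sizes that bracket Rmax / 2.
    """
    pts = sorted(points)
    n_max, r_max = pts[-1]
    half = r_max / 2.0
    n_half = None
    if pts[0][1] >= half:
        # Smallest run already exceeds half of Rmax: only an upper bound.
        n_half = float(pts[0][0])
    else:
        for (n0, r0), (n1, r1) in zip(pts, pts[1:]):
            if r0 < half <= r1:
                n_half = n0 + (half - r0) * (n1 - n0) / (r1 - r0)
                break
    return {"Rmax": r_max, "Nmax": n_max, "N_half": n_half}


# Hypothetical measurements: (problem size, Gflop/s).
runs = [(1000, 1.0), (2000, 3.0), (4000, 5.0), (8000, 6.0)]
summary = summarize_scaling(runs)
```

With only a handful of problem sizes the interpolated value is coarse, so runs should be concentrated around the size where the rate crosses half of $R_{\max}$ if a tighter estimate is wanted.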

4 Obtaining the Software and Running the Benchmarks

The software used to generate the data for this report can be obtained by sending electronic mail to netlib@ornl.gov.

4.1 LINPACK Benchmark

The first results listed in Table 1 involved no hand optimization of the LINPACK benchmark. To receive the single-precision software for this benchmark, in the mail message to netlib@ornl.gov type:

send linpacks from benchmark

To receive the double-precision software for the LINPACK Benchmark, type:

send linpackd from benchmark

To run the timing programs, one must supply a real function SECOND which returns the time in seconds from some fixed starting time. There is only one ground rule for running this benchmark:

• No changes are to be made to the Fortran source code, not even changes in the comments.

The compiler and operating system must be generally available. Results from a beta version of a compiler are allowed; however, the standard compiler results must also be listed.

4.2 Toward Peak Performance

The second set of results listed in Table 1 reflected user optimization of the software. To receive the single-precision software for the column labeled "Toward Peak Performance," in the mail message to netlib@ornl.gov type:

send 1000s from benchmark

To receive the double-precision software, type:

send 1000d from benchmark

The ground rules for running this benchmark are as follows:

• Replacements or modifications are allowed in the routine LU.

• The user is allowed to supply any method for the solution of the system of equations.

• The Mflop/s rate will be computed based on the operation count for LU decomposition.

• In all cases, the main driver routine, with its test matrix generator and residual check, must be used.

This report is updated from time to time. A fax copy of this report can be supplied; for details, contact the author. To obtain a Postscript copy of the report, send mail to netlib@ornl.gov and in the message type:

send performance from benchmark

To have results verified, please send the output of the runs to

Jack Dongarra Computer Science Department University of Tennessee Knoxville, TN 37996-1301

Internet: dongarra@cs.utk.edu

Fax number: 615-974-8296

Table 1: Performance in Solving a System of Linear Equations

Computer | OS/Compiler | "LINPACK Benchmark" n=100, Mflop/s | "TPP" Best Effort n=1000, Mflop/s | "Theoretical Peak" Mflop/s
CRAY Y-MP C90 (16 proc. 4.2 ns) | CF77 5.0 -Zp -Wd-e68 | 479 | 9715 | 15238
CRAY Y-MP C90 (8 proc. 4.2 ns) | CF77 5.0 -Zp -Wd-e68 | 468 | 5994 | 7619
CRAY Y-MP C90 (4 proc. 4.2 ns) | CF77 5.0 -Zp -Wd-e68 | 388 | 3272 | 3810
CRAY Y-MP C90 (2 proc. 4.2 ns) | CF77 5.0 -Zp -Wd-e68 | 387 | 1709 | 1905
CRAY Y-MP C90 (1 proc. 4.2 ns) | CF77 5.0 -Zp -Wd-e68 | 387 | 874 | 952
NEC SX-3/44R (4 proc. 2.5 ns) |  |  | 15120 | 25600
NEC SX-3/42R (4 proc. 2.5 ns) |  |  | 8950 | 12800
NEC SX-3/41R (4 proc. 2.5 ns) |  |  | 4815 | 6400
NEC SX-3/24R (2 proc. 2.5 ns) |  |  | 9454 | 12800
NEC SX-3/22R (2 proc. 2.5 ns) |  |  | 5116 | 6400
NEC SX-3/21R (2 proc. 2.5 ns) |  |  | 2627 | 3200
NEC SX-3/14R (1 proc. 2.5 ns) | f77sx 040 R2.2 -pi*:* | 368 | 5199 | 6400
NEC SX-3/12R (1 proc. 2.5 ns) | f77sx 040 R2.2 -pi*:* | 368 | 2757 | 3200
NEC SX-3/44 (4 proc. 2.9 ns) |  |  | 13420 | 22000
NEC SX-3/24 (2 proc. 2.9 ns) |  |  | 8149 | 11000
NEC SX-3/42 (4 proc. 2.9 ns) |  |  | 7752 | 11000
NEC SX-3/22 (2 proc. 2.9 ns) |  |  | 4404 | 5500
NEC SX-3/14 (1 proc. 2.9 ns) | f77sx 020 R1.13 -pi*:* | 314 | 4511 | 5500
NEC SX-3/12 (1 proc. 2.9 ns) | f77sx 020 R1.13 -pi*:* | 313 | 2283 | 2750
CRAY Y-MP/832 (8 proc. 6 ns) | CF77 4.0 -Zp -Wd-e68 | 275 | 2144 | 2667
Fujitsu VP2600/10 (3.2 ns) | FORTRAN77 EX/VP V11L10 | 249 | 4009 | 5000
CRAY Y-MP/832 (4 proc. 6 ns) | CF77 4.0 -Zp -Wd-e68 | 226 | 1159 | 1333
NEC SX-3/11R (1 proc. 2.5 ns) | f77sx 040 R2.2 -pi*:* | 202 | 1418 | 1600
NEC SX-3/1LR (1 proc. 2.5 ns) | f77sx 040 R2.2 -pi*:* | 201 | 767 | 800
CRAY Y-MP/832 (2 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 181 | 604 | 667
CRAY X-MP/416 (4 proc. 8.5 ns) | CF77 4.0 -Zp -Wd-e68 | 178 | 822 | 940
NEC SX-3/11 (1 proc. 2.9 ns) | f77sx 020 R1.13 -pi*:* | 173 | 1223 | 1370
NEC SX-3/1L (1 proc. 2.9 ns) | f77sx 020 R1.13 -pi*:* | 171 | 661 | 680
Fujitsu VP2400/10 (4 ns) | FORTRAN77 EX/VP V11L10 | 170 | 1688 | 2000
CRAY Y-MP/832 (1 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 161 | 324 | 333
CRAY Y-MP M92 (2 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 145 | 550 | 666
CRAY Y-MP M92 (1 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 145 | 332 | 333
CRAY X-MP/416 (2 proc. 8.5 ns) | CF77 5.0 -Zp -Wd-e68 | 143 | 426 | 470
CRAY 2S/4-128 (4 proc. 4.1 ns) | CF77 5.0 -Zp -Wd-e68 | 129 | 1406 | 1951
Fujitsu VP2200/10 (4 ns) | FORTRAN77 EX/VP V11L10 | 127 | 842 | 1000
CRAY X-MP/416 (1 proc. 8.5 ns) | CF77 5.0 -Zp -Wd-e68 | 121 | 218 | 235
CRAY 2S/4-128 (2 proc. 4.1 ns) | CF77 5.0 -Zp -Wd-e68 | 113 | 741 | 976
Fujitsu VP2100/10 (4 ns) | FORTRAN77 EX/VP V11L10 | 112 | 445 | 500
Hitachi S-820/80 (4 ns) | FORT77/HAP V23-0C | 107 |  | 3000
CRAY 2S/4-128 (1 proc. 4.1 ns) | CF77 5.0 -Zp -Wd-e68 | 107 | 384 | 488
CRAY 2S/8-128 (8 proc. 4.1 ns) | CF77 4.0 -Zp -Wd-e68 | 102 | 2171 | 3902
ETA 10-G (1 proc. 7 ns) | ETAV/FTN200 | 93 | 496 | 571
CONVEX C-3880 (8 proc.) (16.7 ns) | fc7.0 -tm c38 -O3 -ep 8 -ds -is . | 86 | 795 | 960
IBM ES/9000-982 VF (8 proc 7.1 ns) | VAST-2/VS Fortran V2R5 |  | 2278 | 4507

IBM ES/9000-972 VF (7 proc 7.1 ns) | VAST-2/VS Fortran V2R5 |  | 2072 | 3944
IBM ES/9000-962 VF (6 proc 7.1 ns) | VAST-2/VS Fortran V2R5 |  | 1923 | 3380
IBM ES/9000-952 VF (5 proc 7.1 ns) | VAST-2/VS Fortran V2R5 |  | 1681 | 2817
IBM ES/9000-942 VF (4 proc 7.1 ns) | VAST-2/VS Fortran V2R5 |  | 1377 | 2254
IBM ES/9000-831 VF (3 proc 7.1 ns) | VAST-2/VS Fortran V2R5 |  | 1082 | 1690
IBM ES/9000-821 VF (2 proc 7.1 ns) | VAST-2/VS Fortran V2R5 |  | 767 | 1127
IBM ES/9000-711 VF (1 proc 7.1 ns) | VAST-2/VS Fortran V2R5 | 86 | 422 | 563
CONVEX C-3840 (4 proc.) (16.7 ns) | fc7.0 -tm c38 -O3 -ep 4 -ds -is . | 75 | 425 | 480
CONVEX C-3830 (3 proc.) (16.7 ns) | fc7.0 -tm c38 -O3 -ep 3 -ds -is . | 71 | 327 | 360
CONVEX C-3820 (2 proc.) (16.7 ns) | fc7.0 -tm c38 -O3 -ep 2 -ds -is . | 62 | 222 | 240
CRAY-2/4-256 (4 proc. 4.1 ns) | cf77 3.0 | 62 | 1406 | 1951
ETA 10-E (1 proc. 10.5 ns) | ETAV/FTN200 | 62 | 334 | 381
IBM ES/9000-900 VF (6 proc. 9 ns) | VAST-2/VS Fortran V2R4 |  | 1457 | 2664
IBM ES/9000-860 VF (5 proc. 9 ns) | VAST-2/VS Fortran V2R4 |  | 1210 | 2220
IBM ES/9000-820 VF (4 proc. 9 ns) | VAST-2/VS Fortran V2R4 |  | 1003 | 1776
IBM ES/9000-740 VF (3 proc. 9 ns) | VAST-2/VS Fortran V2R4 |  | 775 | 1332
IBM ES/9000-640 VF (2 proc. 9 ns) | VAST-2/VS Fortran V2R4 |  | 539 | 888
IBM ES/9000-660 VF (2 proc. 9 ns) | VAST-2/VS Fortran V2R4 |  | 535 | 888
IBM ES/9000-520 VF (1 proc. 9 ns) | VAST-2/VS Fortran V2R4 | 60 | 338 | 444
CRAY X-MP/14se (10 ns) | cf77 3.0 | 53 | 184 | 210
CRAY-2/4-256 (2 proc. 4.1 ns) | cf77 3.0 | 48 | 709 | 976
IBM ES/9000-711 (1 proc 7.1 ns) | VAST-2/VS Fortran V2R5 | 48 |  |
CONVEX C-3810 (1 proc.) (16.7 ns) | fc7.0 -tm c38 -O2 -is . | 44 | 113 | 120
DEC 10000-610 Alpha AXP 200 MHz | 3.2 inl=daxpy,ur=4,ur2=240 | 43 | 155 | 200
NEC SX-2 | FORTRAN77/SX | 43 | 885 | 1300
HP 9000/735 (99 MHz) | +OP3 -Wl,-aarchive -WP,-nv -w,MLIB | 41 | 107 | 198
CRAY Y-MP EL (4 proc. 30 ns) | CF77 5.0 -Zp -Wd-e68 | 39 | 345 | 532
DEC 7000-610 Alpha AXP (182 MHz) | 3.2 inl=daxpy,ur=4,ur2=240 | 39 | 141 | 182
CRAY-2/4-256 (1 proc. 4.1 ns) | cf77 3.0 | 38 | 360 | 488
IBM ES/9000-520 (1 proc. 9 ns) | VAST-2/VS Fortran V2R4 | 38 |  |
IBM ES/9000-820 (1 proc. 9 ns) | VAST-2/VS Fortran V2R4 | 38 |  |
DEC 4000-610 Alpha AXP (160 MHz) | 3.2 inl=daxpy,ur=4,ur2=240 | 36 | 114 | 160
NEC SX-1 | FORTRAN77/SX | 36 | 422 | 650
CONVEX C-3440 (4 proc.) | fc7.0 fc -O3 -ep 4 -ds -is . | 34 | 172 | 200
ETA 10-Q (1 proc. 19 ns) | ETAV/FTN200 | 34 | 185 | 210
CRAY Y-MP EL (2 proc. 30 ns) | CF77 5.0 -Zp -Wd-e68 | 33 | 191 | 266
CRAY S-MP/MCP784 (84 proc. 25 ns) |  |  | 742 | 3360
CRAY S-MP/MCP756 (56 proc. 25 ns) |  |  | 678 | 2240
CRAY S-MP/MCP728 (28 proc. 25 ns) |  |  | 508 | 1120
CRAY S-MP/MCP707 (7 proc. 25 ns) | MCP Release 2.2 | 33 | 194 | 280
FPS 510S MCP784 (84 proc. 25 ns) |  |  | 548 | 3360
FPS 510S MCP756 (56 proc. 25 ns) |  |  | 513 | 2240
FPS 510S MCP728 (28 proc. 25 ns) |  |  | 414 | 1120
FPS 510S MCP707 (7 proc. 25 ns) | pgf77 -O4 -Minline | 33 | 184 | 280
CDC Cyber 2000V | Fortran V2 | 32 |  |
CONVEX C-3430 (3 proc.) | fc7.0 fc -O3 -ep 3 -ds -is . | 32 | 132 | 150

CRAY Y-MP EL (1 proc. 30 ns) | CF77 5.0 -Zp -Wd-e68 | 32 | 107 | 133
NEC SX-1E | FORTRAN 77/SX | 32 | 221 | 325
Alliant FX/2800-200 (14 proc) | fortran 1.1.27 -O -inline | 31 | 325 | 560
IBM RISC Sys/6000-970 (50 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 31 | 84 | 100
IBM RS/6000 Cluster (8 proc 62.5 MHz) |  |  | 269 | 1000
IBM RS/6000 Cluster (4 proc 62.5 MHz) |  |  | 206 | 500
IBM RS/6000 Cluster (2 proc 62.5 MHz) |  |  | 144 | 250
IBM RS/6000 580 62.5 MHz | v2.2.1 xlf -O -P -Wp,-ea478 |  | 105 | 125
IBM RS/6000 Cluster (8 proc 50 MHz) |  |  | 194 | 800
IBM RS/6000 Cluster (6 proc 50 MHz) |  |  | 174 | 600
IBM RS/6000 Cluster (4 proc 50 MHz) |  |  | 152 | 400
IBM RS/6000 Cluster (2 proc 50 MHz) |  |  | 111 | 200
IBM RISC Sys/6000-560 (50 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 31 | 84 | 100
IBM ES/9000-742 VF (4 proc 11 ns) | VAST-2/VS Fortran V2R5 |  | 441 | 752
IBM ES/9000-732 VF (3 proc 11 ns) | VAST-2/VS Fortran V2R5 |  | 352 | 545
IBM ES/9000-622 VF (2 proc 11 ns) | VAST-2/VS Fortran V2R5 |  | 244 | 364
IBM ES/9000-621 VF (2 proc 11 ns) | VAST-2/VS Fortran V2R5 |  | 244 | 364
IBM ES/9000-521 VF (2 proc 11 ns) | VAST-2/VS Fortran V2R5 |  | 185 | 364
IBM ES/9000-511 VF (1 proc 11 ns) | VAST-2/VS Fortran V2R5 | 30 | 130 | 182
DEC 3000-500 Alpha AXP (150 MHz) | 3.2 inl=daxpy,ur=4,ur2=240 | 30 | 107 | 150
Alliant FX/2800-200 (12 proc) | fortran 1.1.27 -O -inline | 29 | 290 | 480
Alliant FX/2800 210 (1 proc) | fortran 1.3.02 -Ovg -inline | 25 | 34 | 50
Alliant FX/2800-200 (10 proc) | fortran 1.1.27 -O -inline | 27 | 250 | 400
ETA 10-P (1 proc. 24 ns) | ETAV/FTN200 | 27 | 146 | 167
CONVEX C-3420 (2 proc.) | fc7.0 fc -O3 -ep 2 -ds -is . | 27 | 90 | 100
CRAY-1S (12.5 ns) | cf77 2.1 | 27 | 110 | 160
CONVEX C-3240 (4 proc.) | fc -O3 -ep 2 -uo -pp=fcpp1 -is . | 26 | 171 | 200
CONVEX C-240 (4 proc.) | 6.1 -O3 -ep2 -uo -pp=fcpp1 -is . | 26 | 166 | 200
CONVEX C-3230 (3 proc.) | fc -O3 -ep 2 -uo -pp=fcpp1 -is . | 26 | 132 | 150
CONVEX C-230 (3 proc.) | 6.1 -O3 -ep2 -uo -pp=fcpp1 -is . | 26 | 128 | 150
DEC 3000-400 Alpha AXP (133 MHz) | 3.2 inl=daxpy,ur=4,ur2=240 | 26 | 90 | 133
IBM RISC Sys/6000-950 (42 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 26 | 70 | 84
IBM RISC Sys/6000-550 (42 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 26 | 70 | 84
Alliant FX/2800-200 (8 proc) | fortran 1.1.27 -O -inline | 25 | 207 | 320
NAS AS/EX 100 VPF (4 proc) |  |  | 320 | 484
NAS AS/EX 90 VPF (3 proc) |  |  | 251 | 363
NAS AS/EX 80 VPF (2 proc) |  |  | 173 | 242
NAS AS/EX 60 VPF | VAST-2/VS 2.3.0 opt=3 | 25 | 94 | 121
HP 9000/750 (66 MHz) | +OP3 -Wl,-aarchive -WP,-nv -w | 24 | 47 | 66
HP 9000/730 (66 MHz) | +OP3 -Wl,-aarchive -WP,-nv -w | 24 | 49 | 66
IBM ES/9000 Model 480 VF | VAST-2/VS Fortran V2R4 |  | 180 | 266
IBM ES/9000-340 VF (14.5 ns) | VAST-2/VS Fortran V2R4 | 23 |  | 138
IBM ES/9000-411 VF (1 proc 11 ns) | VAST-2/VS Fortran V2R5 | 23 | 99 | 182
DEC VAX 9000 420VP (2 proc 16 ns) | HPO V1.3-163V, DXML |  | 155 | 250
DEC VAX 9000 410VP (1 proc 16 ns) | HPO V1.3-163V, DXML | 22 | 89 | 125
IBM ES/9000-610 VF (4 proc 15 ns) | VAST-2/VS Fortran V2R4 |  | 335 | 532

IBM ES/9000-570 VF (3 proc 15 ns) | VAST-2/VS Fortran V2R4 |  | 252 | 399
IBM ES/9000-490 VF (2 proc 15 ns) | VAST-2/VS Fortran V2R4 |  | 171 | 266
IBM ES/9000-320 VF (1 proc 15 ns) | VAST-2/VS Fortran V2R4 | 22 | 91 | 133
Multiflow TRACE 28/300 | Fortran 2.2.1 | 22 | 69 | 123
CONVEX C-3220 (2 proc.) | fc -O3 -ep 2 -uo -pp=fcpp1 -is . | 22 | 89 | 100
CONVEX C-220 (2 proc.) | 6.1 -O3 -ep2 -uo -pp=fcpp1 -is . | 22 | 87 | 100
Alliant FX/2800-200 (6 proc) | fortran 1.1.27 -O -inline | 21 | 148 | 240
Siemens VP400-EX (7 ns) | Fortran 77/VP V10L30 | 21 | 794 | 1714
FPS Model 522 | F77 4.2 | 20 | 105 | 133
Fujitsu VP-400 | Fortran 77 V10L30 | 20 |  | 1142
IBM RISC Sys/6000-530H (33 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 20 | 55 | 66
Siemens VP200-EX (7 ns) | Fortran 77 V10L30 | 20 | 472 | 857
Amdahl 1400 | 77/VP V10L20 | 19 | 521 | 1142
Amdahl 1200 | 77/VP V10L20 | 19 | 424 | 571
CONVEX C-3410 (1 proc.) | fc7.0 fc -O2 -is . | 19 | 47 | 50
IBM ES/9000 Model 260 VF (15 ns) | VAST-2/VS Fortran V2R4 | 19 | 78 | 133
IBM RISC Sys/6000-540 (30 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 19 | 50 | 60
IBM RISC Sys/6000-350 (42 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 19 | 61 | 84
IBM ES/9000-311 VF (1 proc 11 ns) | VAST-2/VS Fortran V2R5 | 19 | 82 | 182
Fujitsu VP-200 | Fortran 77 | 18 | 422 | 533
HP 9000/720 (50 MHz) | HP-UX 8.05 f77 +OP4 +O3 | 18 | 36 | 50
NAS AS/EX 50 VPF | VAST-2/VS 2.3.0 | 18 | 82 | 121
Siemens VP100-EX (7 ns) | Fortran 77/VP V10L30 | 18 | 254 | 428
SGI 4D/480 (8 proc) 40 MHz | f77 -O2 -mp | 18 | 71 | 128
Alliant FX/2800-200 (4 proc) | fortran 1.1.27 -O -inline | 17 | 94 | 160
Amdahl 1100 | 77/VP V10L20 | 17 | 248 | 285
CDC CYBER 205 (4-pipe) | FTN | 17 | 195 | 400
CDC CYBER 205 (2-pipe) | FTN | 17 | 113 | 200
CONVEX C-3210 (1 proc.) | fc -O2 -uo -pp=fcpp1 -is . | 17 | 44 | 50
CONVEX C-210 (1 proc.) | 6.1 -O2 -uo -pp=fcpp1 -is . | 17 | 44 | 50
Cray XMS (55 ns) | cf77 5.0 -Zp -Wd-e68 | 17 | 34 | 36
Hitachi S-810/20 | FORT77/HAP | 17 |  | 840
IBM ES/9000 Model 210 VF (15 ns) | VAST-2/VS Fortran V2R4 | 17 | 72 | 133
Siemens VP50-EX (7 ns) | Fortran 77/VP V10L30 | 17 | 238 | 285
Multiflow TRACE 14/300 | Fortran 2.2.1 | 17 | 42 | 63
Hitachi S-810/10 | HAP V21.00 | 16 |  | 315
IBM 3090/600J VF (6 proc, 14.5 ns) |  |  | 540 | 828
IBM 3090/500J VF (5 proc, 14.5 ns) |  |  | 458 | 690
IBM 3090/400J VF (4 proc, 14.5 ns) |  |  | 370 | 552
IBM 3090/380J VF (3 proc, 14.5 ns) |  |  | 282 | 414
IBM 3090/300J VF (3 proc, 14.5 ns) |  |  | 284 | 414
IBM 3090/280J VF (2 proc, 14.5 ns) |  |  | 191 | 276
IBM 3090/200J VF (2 proc, 14.5 ns) |  |  | 192 | 276
IBM 3090/180J VF (1 proc, 14.5 ns) | VS Fortran V2R3 | 16 | 97 | 138
IBM 3090/180S VF (1 proc, 15 ns) | VS Fortran 2.3.0 | 16 | 92 | 133
Fujitsu VP-100 | Fortran 77 | 16 |  | 267

Amdahl 500 | 77/VP V10L20 | 16 | 133 | 142
Hitachi M680H/vector | Fort 77 E2 V04-0I | 16 |  |
SGI Crimson (1 proc 50 MHz R4000) | -O2 -mips2 -G 8192 | 16 | 32 | 50
SGI 4D/380 (8 proc) 33 MHz | f77 -O2 -mp | 16 | 60 | 106
FPS Model 511 | F77 4.2 | 15 | 56 | 67
Hitachi M680H | Fort 77 E2 V04-0I | 15 |  |
IBM RISC Sys/6000-930 (25 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 15 | 42 | 50
IBM RISC Sys/6000-530 (25 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 15 | 42 | 50
IBM RISC Sys/6000-340 (33 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 15 | 49 | 66
IBM ES/9000-511 (1 proc 11 ns) | VAST-2/VS Fortran V2R5 | 15 |  |
Kendall Square (32 proc) |  |  | 513 | 1280
Kendall Square (16 proc) |  |  | 307 | 640
Kendall Square (8 proc) |  |  | 146 | 320
Kendall Square (4 proc) |  |  | 47 | 160
Kendall Square (1 proc) | ksrf77 -O2 -r8 -inline_auto | 15 | 31 | 40
NAS AS/EX 60 | Fortran | 15 |  | 40
SGI 4D/440 (4 proc) 40 MHz | f77 -O2 -mp | 15 | 42 | 64
Siemens H120F | Fortran 77 | 15 |  |
Cydrome CYDRA 5 | Fortran 77 Rel 2.4.1 | 14 |  | 25
Fujitsu VP-50 | Fortran 77 | 14 |  | 133
IBM ES/9000 Model 190 VF (15 ns) | VAST-2/VS Fortran V2R4 | 14 | 60 | 133
IBM 3090/180E VF | VS 2.1.1 opt=3 | 13 | 71 | 116
SGI 4D/340 (4 proc) 33 MHz | f77 -O2 -mp | 13 | 36 | 53
CDC CYBER 990E | FTN V2 VL=HIGH | 12 |  |
CRAY-1S (12.5 ns, 1983 run) | CFT 1.12 | 12 | 110 | 160
IBM 3090/180 VF | VS Fortran V2 | 12 | 65 | 108
IBM RISC Sys/6000-520H (25 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 12 | 37 | 50
IBM RISC Sys/6000-320H (25 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 12 | 37 | 50
Stardent 3040 | 3.0 -inline -nmax=300 | 12 | 77 | 128
Stardent 3030 | 3.0 -inline -nmax=300 | 12 | 63 | 96
Stardent 2040 (Stellar GS2000) | f77 -O3 -is R2.1 | 12 |  | 40
Stardent 1040 (Stellar GS1000) | f77 -O3 -is -re R2.0 | 12 |  | 40
CDC 4680 InfoServer (60 MHz) | f77 2.20 -O3 -mips2 -Wb,-r6000 | 11 |  |
CRAY S-MP/MCP101 (1 proc. 25 ns) | MCP Release 2.2 | 11 | 31 | 40
FPS 510S MCP101 (1 proc. 25 ns) | pgf77 -O4 | 11 | 30 | 40
IBM ES/9000 Model 340 | VAST-2/VS Fortran V2R4 | 11 |  |
IBM ES/9000-411 (1 proc 11 ns) | VAST-2/VS Fortran V2R5 | 11 |  |
Meiko Comp. Surface (32 proc) |  |  | 210 | 1280
Meiko Comp. Surface (16 proc) |  |  | 187 | 640
Meiko Comp. Surface (8 proc) |  |  | 147 | 320
Meiko Comp. Surface (4 proc) |  |  | 98 | 160
Meiko Comp. Surface (2 proc) | -O4 -Mvect=smallvect |  | 58 | 80
Meiko Comp. Surface (1 proc) | -Minline=daxpy | 11 | 31 | 40
Stardent 3020 | 3.0 -inline -nmax=300 | 11 | 46 | 64
Sperry 1100/90 ext w/ISP | UCS level 2 | 11 |  |
Multiflow TRACE 7/300 | Fortran 2.2.1 | 11 | 22 | 31

Alliant FX/2800-200 (2 proc) | fortran 1.1.27 -O -inline | 10 | 53 | 80
Alliant FX/80 (8 proc.) | -O -DAS -inline | 10 | 69 | 188
IBM 3090/180J | VS Fortran V2R3 | 10 |  |
MIPS RC6280 (60.0 MHz) | f77 2.20 -O | 10 | 16 | 24
MIPS RC6260 (60.0 MHz) | f77 2.20 -O | 10 | 16 | 24
Multiflow TRACE 14/200 | Fortran 1.7 | 10 |  | 31
Stardent 3010 | 3.0 -inline -nmax=300 | 10 | 25 | 32
Stardent 1540 (Ardent Titan-4) |  |  | 47 | 64
Stardent 1530 (Ardent Titan-3) |  |  | 37 | 48
Stardent 1520 (Ardent Titan-2) | f77 1.0 -O3 -inline | 10 | 25 | 32
SGI 4D/240 (4 proc) 25 MHz | f77 -O2 -mp | 9.8 | 28 | 40
Intel iPSC/Delta (512 proc) |  |  | 446 | 20480
Intel iPSC/Delta (256 proc) |  |  | 418 | 10240
Intel iPSC/Delta (128 proc) |  |  | 393 | 5120
Intel iPSC/Delta (64 proc) |  |  | 352 | 2560
Intel iPSC/Delta (32 proc) |  |  | 304 | 1280
Intel iPSC/Delta (16 proc) |  |  | 231 | 640
Intel iPSC/Delta (8 proc) |  |  | 163 | 320
Intel iPSC/Delta (4 proc) |  |  | 100 | 160
Intel iPSC/Delta (2 proc) | if77 -O3 -Mvect=smallvect |  | 58 | 80
Intel iPSC/Delta (1 proc) | -Minline=daxpy -Knoieee | 9.7 | 34 | 40
Intel iPSC/860 d7 (128 proc) |  |  | 219 | 5120
Intel iPSC/860 d6 (64 proc) |  |  | 208 | 2560
Intel iPSC/860 d5 (32 proc) |  |  | 167 | 1280
Intel iPSC/860 d4 (16 proc) |  |  | 131 | 640
Intel iPSC/860 d3 (8 proc) |  |  | 103 | 320
Intel iPSC/860 d2 (4 proc) |  |  | 75 | 160
Intel iPSC/860 d1 (2 proc) | if77 -O3 -Mvect=smallvect |  | 52 | 80
Intel iPSC/860 d0 (1 proc) | -Minline=daxpy -Knoieee | 9.7 | 34 | 40
IBM 3090/180S | VS Fortran 2.3.0 | 9.6 | 92 | 133
Alliant FX/80 (7 proc.) | -O -DAS -inline | 9.5 | 63 | 165
CDC CYBER 4680 | f77 2.11.2 o2 | 9.4 |  |
IBM Power Vis. Sys. (32 proc.) |  |  | 310 | 1280
IBM Power Vis. Sys. (1 proc.) | -O4 -Mvect=smallvect:201 -Minline=daxpy | 9.3 |  |
NAS AS/EX 50 | Fortran | 9.3 |  | 28
Sun SPARCsystem 10/30 36 MHz | f77 -O4 -cg89 -libmil -native | 9.3 |  |
SGI 4D/420 (2 proc) 40 MHz | f77 -O2 -mp | 9.3 | 23 | 32
IBM RISC Sys/6000-520 (20 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 9 | 29 | 40
IBM RISC Sys/6000-320 (20 MHz) | v2.2.1 xlf -O -P -Wp,-ea478 | 9 | 29 | 40
IBM ES/9000-180 VF (15 ns) | VAST-2/VS Fortran V2R4 | 8.9 | 48 | 133
Solbourne 6/904 (Viking sparc) | f77 -O3 -cg89 -dalign | 8.9 |  |
DEC VAXvector 6000/520 (2 proc) | Fortran HPO V1.2 | 8.8 | 51 | 90
Comparex 8/92 (Fujitsu M382) | VS/FORTRAN 2.4.0 | 8.7 |  |
DEC VAXstation 4000-90 | V 5.2 | 8.7 |  |
IBM ES/9000-311 (1 proc 11 ns) | VAST-2/VS Fortran V2R5 | 8.6 |  |
IBM ES/9000 Model 320 | VAST-2/VS Fortran V2R4 | 8.5 |  |

NAS AS/9160 | VAST/VS 1.4.1 opt=3 | 8.3 |  |
Alliant FX/80 (5 proc.) | -O -DAS -inline | 8.1 | 49 | 118
IBM ES/9000 Model 260 | VAST-2/VS Fortran V2R4 | 8.0 |  |
SCS-40 | CFT 1.13 | 8.0 | 17 | 45
SGI 4D/320 (2 proc) 33 MHz | f77 -O2 -mp | 8.0 | 20 | 26
IBM ES/9000 Model 210 | VAST-2/VS Fortran V2R4 | 7.7 |  |
IBM ES/9000 Model 320 | VS/FORTRAN V2R4 | 7.6 |  |
IBM 3090/120E VF | VS 2.1.1 opt=3 | 7.5 | 54 | 108
IBM 3090/180E | VS 2.1.1 opt=3 | 7.4 | 71 | 116
Siemens 7890F | Fortran 77 V10.3 | 7.2 |  |
CONVEX C-130 | Fortran 4.0 | 7.2 | 31 | 36
Alliant FX/80 (4 proc.) | -O -DAS -inline | 7.2 | 33 | 94
DEC VAXvector 6000/510 (1 proc) | Fortran HPO V1.2 | 7.0 | 28 | 45
Stardent 1510 (Ardent Titan-1) | f77 1.0 -O2 -inline | 6.9 | 13 | 16
IBM ES/9000 Model 190 | VAST-2/VS Fortran V2R4 | 6.6 |  | 133
IBM 3090/180 | VS opt=3 | 6.8 | 65 | 108
Alliant FX/40 (4 proc.) | -O -DAS -inline | 6.7 | 33 | 94
CONVEX C-120 | fc 5.1 | 6.5 | 17 | 20
IBM RISC Sys/6000-220 (33 MHz) | v2.2.1 xlf -O -P -Wp,ea478 | 6.5 | 14 | 66
Alliant FX/4 (4 proc.) | -O -DAS -inline | 6.4 | 21 | 47
Fujitsu M-380 | Fortran 77, opt=3 | 6.3 |  |
DEC VAX 6620 | V5.5 | 6.2 |  |
Alliant FX/2800-200 (1 proc) | fortran 1.1.27 -O -inline | 6.4 | 28 | 40
Multiflow TRACE 7/200 | Fortran 1.4 | 6.0 |  | 15
SGI 4D/420 (1 proc) 40 MHz | f77 -O2 | 6.0 | 12 | 16
Siemens 7890G | Fortran 77 V10.3 opt=4 | 5.9 |  |
IBM 3090/150E | VS 2.1.1 opt=3 | 5.9 | 64 | 112
FPS-264 (M64/60) | F02 APFTN64 OPT=4 | 5.9 | 34 | 38
Alliant FX/80 (3 proc.) | -O -DAS -inline | 5.9 | 32 | 71
SGI 4D/220 (2 proc) 25 MHz | f77 -O2 -mp | 5.9 | 15 | 20
Apollo DN10000 | f77,10.7 | 5.8 |  |
Alliant FX/40 (3 proc.) | -O -DAS -inline | 5.6 | 27 | 71
DEC 5900 RISC | Ultrix 4.1 | 5.3 |  |
DEC 5000/240 | Ultrix | 5.3 |  |
Alliant FX/4 (3 proc.) | -O -DAS -inline | 5.1 | 17 | 35
CDC 4330-300 (33 MHz) | f77 2.20 -O3 | 5.1 |  |
VAXstation 4000-90 | DEC FORTRAN V5.2 | 5.1 |  |
DEC VAX 6000/610 (1 proc) | VMS V5.2 | 5.0 |  |
Intel iPSC/2 d4/VX (16 proc) |  |  | 39 |
Intel iPSC/2 d5/VX (32 proc) |  |  | 52 |
SGI 4D/310 (1 proc) 33 MHz | f77 -O2 | 5.0 | 10 | 13
Honeywell DPS90 | ES F77V 1.0 | 5.0 |  |
Siemens 7890D | Fortran 77 V10.3 | 5.0 |  |
IBM ES/9000 Model 180 (15 ns) | VAST-2/VS Fortran V2R4 | 4.9 |  |
CDC CYBER 875 | FTN 5 opt=3 | 4.8 |  |
Number Smasher i860 40 MHz | -on -OLM -fdiv -inline | 4.7 |  | 40

Computer | OS/Compiler | "LINPACK Benchmark" n=100, Mflop/s | "TPP" Best Effort n=1000, Mflop/s | "Theoretical Peak" Mflop/s
----------------------------------------------------------------------------------------------------
CDC CYBER 176 | FTN 5.1 opt=2 | 4.6 | |
MIPS RC3360 (33.3MHz) | f77 2.20 -O | 4.5 | 11 | 13
Alliant FX/80 (2 proc.) | -O -DAS -inline | 4.4 | 22 | 47
Alliant FX/40 (2 proc.) | -O -DAS -inline | 4.3 | 19 | 47
NAS AS/EX 30 | VS 1.4.1 opt=3 | 4.3 | |
SGI 4D/35 | f77 -O3 | 4.3 | |
SUN 4/600 MP | f77 1.4 -O3 -cg89 -dalign | 4.3 | |
SUN SPARCstation IPX | f77 1.4 -O3 -cg89 -dalign | 4.1 | |
SUN 4/50 IPX | f77 1.4 -O3 -cg89 -dalign | 4.1 | |
CDC CYBER 4360 | f77 2.11.2 o2 | 4.0 | |
SUN SPARCstation 2 | f77 1.4 -O3 -cg89 -dalign | 4.0 | |
Amdahl 5860 HSFPF | H enhanced opt=3 | 3.9 | |
MIPS M/2000 (25.0MHz) | f77 2.20 -O | 3.9 | 7.9 | 10
MIPS RC3260 (25.0MHz) | f77 2.20 -O | 3.9 | 7.9 | 10
Alliant FX/4 (2 proc.) | -O -DAS -inline | 3.8 | 12 | 24
SGI 4D/210(1 proc) 25MHz | f77 -O2 | 3.9 | 7.8 | 10
Amdahl 5860 HSFPF | VS opt=3 | 3.8 | |
CDC 4320 | f77 2.20 opt=02 | 3.7 | |
DEC station 5000/200 (25 Mhz) | MIPS f77 2.0 | 3.7 | |
MIPS RS3230 (25.0MHz) | f77 2.20 -O | 3.7 | 7.8 | 10
DEC VAXvector 6000/420 (2 proc) | Fortran HPO V1.0 | | 43 | 90
DEC VAXvector 6000/410 (1 proc) | Fortran HPO V1.0 | 3.6 | 24 | 45
SUN 4/490 | 4.1.1 f77 -O3 | 3.6 | |
CDC 4330 | f77 2.20 opt=02 | 3.5 | |
NAS 8093 w/HSA | VS 1.4.0 opt=3 | 3.5 | |
CDC 7600 | FTN | 3.3 | |
CDC CYBER 960-31 | NOS/VE 1.3.1 FTN 1.6 | 3.1 | |
Gould NP1 | Fortran | 3.1 | |
IBM 3090/120E | VS 2.1.1 opt=3 | 3.1 | 54 | 108
MIPS RC3240 (25.0MHz) | f77 2.20 -O | 3.1 | 7.1 | 10
CDC CYBER 4340 | f77 2.11.2 o2 | 3.0 | |
CONVEX C-1/XP | Fortran 2.0 | 3.0 | | 20
DEC VAX 6540 | VMS 5.4-2 | 3.0 | |
FPS-264/20 (M64/50) | F02 APFTN64 OPT=4 | 3.0 | 17 |
Harris Nighthawk 4802 (88100) | f77 | 3.0 | |
CONVEX C-1/XL | Fortran 1.6 | 2.9 | | 20
IBM ES/9000 Model 150 | VS Fortran V2R4 | 2.9 | |
NAS AS/EX 25 | VS 1.4.1 opt=3 | 2.9 | |
Solbourne 5/602 | f77 (Sun) 1.2 -O3 -dalign | 2.9 | |
SUN 4/330 | f77 1.4 -O3 -dalign | 2.7 | |
SUN 4/370 | f77 1.3.1 -O3 -cg89 -dalign | 2.7 | |
CDC CYBER 760 | FTN 5, opt=3 | 2.6 | |
CyberPlus | CPFTN 1.1-07 | 2.6 | |
IBM 370/195 | H enhanced opt=3 | 2.5 | |
SUN 4/330 SparcServer | f77 1.2, -O3 -dalign | 2.5 | |
Alliant FX/80 (1 proc.) | -O -DAS -inline | 2.4 | 12 | 24

Computer | OS/Compiler | "LINPACK Benchmark" n=100, Mflop/s | "TPP" Best Effort n=1000, Mflop/s | "Theoretical Peak" Mflop/s
----------------------------------------------------------------------------------------------------
Alliant FX/40 (1 proc.) | -O -DAS -inline | 2.4 | 10 | 24
Gateway 2000 66 MHz 80486-DX2 | F77L-EM32 5.01 /4 /Z1 | 2.4 | |
HP-APOLLO 9000/425e (68040) | f77 -O4 rev 10.3.5 | 2.3 | |
NAS AS/EX 20 | VS 1.4.1 opt=3 | 2.2 | |
Fujitsu AP1000 (512 proc.) | | | 610 | 2844
Fujitsu AP1000 (256 proc.) | | | 333 | 1422
Fujitsu AP1000 (128 proc.) | | | 193 | 711
Fujitsu AP1000 (64 proc.) | | | 100 | 356
Fujitsu AP1000 (1 proc.) | SUN f77 1.3.1 -O3 -dalign | 2.2 | 1.7 | 5.6
HP-APOLLO 9000/425t (68040) | f77 -O4 rev 10.3.4 | 2.2 | |
Alliant FX/4 (1 proc.) | -O -DAS -inline | 2.1 | 6.3 | 12
CDC CYBER 175 | FTN 5 opt=2 | 2.1 | |
CDC CYBER 180-860 | NOS/VE OPT=HIGH | 2.1 | |
FPS-M64/30 | APFTN464 OPT=4 | 2.1 | 10 |
IBM ES/9000 Model 130 | VS Fortran V2R4 | 2.1 | |
IBM 3081 K (1 proc.) | H enhanced opt=3 | 2.1 | |
MIPS M120-5 | UMIPS v.3 3.0 f77 1.31 -O | 2.1 | 3.6 | 8.3
MIPS M/120 (16.7MHz) | f77 2.20 -O | 2.1 | 4.8 | 6.7
Tadpole SPARCbook (25 MHz) | f77 -O | 2.1 | |
CDC 7600 | Local | 2.0 | |
IBM 3081 K (1 proc.) | VS opt=3 | 2.0 | |
Culler PSC | CSD Fortran 3.21 | 2.0 | | 5
FPS M64/35 | APFTN464 | 2.0 | |
HP 425T (68040) | | 1.9 | |
CDC CYBER 175 | FTN 5 opt=1 | 1.8 | |
HP 9000 Series 835 | 2.1 fc -O | 1.8 | |
Sperry 1100/90 | FTN opt=ZEO | 1.8 | |
SUN SPARCstation 1+ | f77 1.4 -O3 -cg89 -dalign | 1.8 | |
ELXSI 6420 (5 proc.) | | | 6.4 |
ELXSI 6420 (3 proc.) | | | 4.0 |
ELXSI 6420 (2 proc.) | | | 2.7 |
ELXSI 6420 (1 proc.) | EMBOS 6.3 +opt+inline+vector | 1.7 | 1.4 |
FPS-164/364(M64/40) | F02 APFTN64 OPT=4 | 1.7 | 9 |
Honeywell DPS 8/88 | FR7X | 1.7 | |
IBM 3033 | H enhanced opt=3 | 1.7 | |
IBM 3033 | VS opt=3 | 1.7 | |
IBM 3081 D | VS opt=3 | 1.7 | |
MIPS RS2030 (16.7MHz) | f77 2.20 -O | 1.7 | 4.7 | 6.7
Sperry 1100/90 ext | UFTN | 1.7 | |
HP 9000 Series 850 w/fp | 2.0 fc -O | 1.6 | |
Amdahl 470 V/8 | H enhanced opt=3 | 1.6 | |
CDC CYBER 170-750 | FTN 5.1, opt=3 | 1.6 | |
CDC CYBER 180-850 | NOS/VE OPT=HIGH | 1.6 | |
DECstation 3100 | V3.0/V1.31 -O | 1.6 | |
DEC 5400 | f77 -O3 | 1.6 | |
Amdahl 470 V/8 | VS opt=3 | 1.5 | |

Computer | OS/Compiler | "LINPACK Benchmark" n=100, Mflop/s | "TPP" Best Effort n=1000, Mflop/s | "Theoretical Peak" Mflop/s
----------------------------------------------------------------------------------------------------
DEC VAXstation 4000-60 | V5.2 | 1.5 | |
MIPS M/1000 (15.0MHz) | f77 2.20 -O | 1.5 | 3.7 | 6
NAS 8093 | VS 1.4.0 opt=3 | 1.5 | |
Siemens 7570-P | For1 1.6A | 1.5 | |
AIR 486/33 m-board, 256K cache | Lahey F77L3, v5.0 /Z1 | 1.4 | |
Apple Mac Quadra 700 | Absoft -w -v -O -f -s -N40 | 1.4 | |
Compaq Deskpro 486/33l-120 w/487 | Microway NDPF487 -O -OL -on | 1.4 | |
NeXTCube | 2.0 gcc 1.36 -O | 1.4 | |
SUN SPARCstation 1 | f77 1.3.1 -O3 -cg89 -dalign | 1.4 | |
IBM 4381-23 | VS Fortran 2.1.1 opt=3 | 1.3 | |
Compaq Deskpro 486/33l-120 w/487 | Salford FTN77/ optimized | 1.3 | |
Compaq Deskpro 486/33l-120 w/487 | Watcom WFC386P /OL /OT | 1.3 | |
AIR 486/33 m-board, 256K cache | Lahey F77L3, v5.0 /nZ1 | 1.2 | |
CDC 7600 | CHAT, No opt | 1.2 | |
DEC VAX 6000/460 (6 proc) | | | 8.4 | 15
DEC VAX 6000/450 (5 proc) | | | 7.1 | 13
DEC VAX 6000/440 (4 proc) | | | 5.8 | 10
DEC VAX 6000/430 (3 proc) | | | 4.4 | 7.6
DEC VAX 6000/420 (2 proc) | | | 3.0 | 5.1
DEC VAX 6000/410 (1 proc) | VMS V5.2 | 1.2 | 1.5 | 2.6
IBM ES/9000 Model 120 | VS Fortran V2R4 | 1.2 | |
MIPS M/800 (12.5MHz) | f77 1.31 -O | 1.2 | | 5
Prime P6350 | f77 rev 20.2.b2 -opt | 1.2 | |
IBM 4381 90E | VS Fortran 2.1.1 opt=3 | 1.2 | |
CSPI MAP-6430 | Fortran 1.5.35 | 1.2 | |
IBM 4381-13 | VS 1.4.0 opt=3 | 1.2 | |
IBM 370/168 FastMult | H Ext | 1.2 | |
ELXSI 6420 | Fortran 5.14 opt=10 | 1.2 | 1.4 |
Amdahl 470 V/6 | H opt=2 | 1.1 | |
Compaq Deskpro 486/33l-120 w/487 | Lahey F77L3 /Z1 | 1.1 | |
SUN 4/260 | f77 -O sys4-beta2 | 1.1 | 1.1 | 3.3
CDC CYBER 180-840 | NOS/VE OPT=HIGH | .99 | |
DEC VAX 8800 (4 proc) | | | 4.9 |
DEC VAX 8800 (3 proc) | | | 3.7 |
DEC VAX 8800 (2 proc) | | | 2.5 |
DEC VAX 8550/8700/8800 | VMS v4.5 | .99 | 1.3 |
Solbourne | f77 -O | .98 | |
IBM 4381-22 | VS Fortran 2.1.1 opt=3 | .97 | |
IBM 4381 MG2 | VS Fortran opt=3 | .96 | |
IBM 4381-12 | VS Fortran 1.4.0 opt=3 | .95 | |
ICL 3980 w/FPU | FORTRAN77 PLUS V10.02 | .93 | |
Siemens 7860E | Fortran 77 V10.3 | .92 | |
Concurrent 3280XP | Fortran VII,Z 8.1 | .87 | |
MIPS M800 w/R2010 FP | f77 1.10 | .87 | |
Gould PN 9005 | VTX/32 2.0 Fortran 77 | .87 | |
VAXstation 3100-76 | DEC FORTRAN V5.2 | .85 | |

Computer | OS/Compiler | "LINPACK Benchmark" n=100, Mflop/s | "TPP" Best Effort n=1000, Mflop/s | "Theoretical Peak" Mflop/s
----------------------------------------------------------------------------------------------------
IBM 9370-90 | VS Fortran 1.3.0 opt=3 | .78 | |
nCUBE 2, 1024 proc | | | 258 | 2409
nCUBE 2, 512 proc | | | 204 | 1205
nCUBE 2, 256 proc | | | 165 | 602
nCUBE 2, 128 proc | | | 116 | 301
nCUBE 2, 64 proc | | | 76.9 | 151
nCUBE 2, 32 proc | | | 46.0 | 75
nCUBE 2, 16 proc | | | 26.1 | 38
nCUBE 2, 8 proc | | | 14.2 | 19
nCUBE 2, 4 proc | | | 7.50 | 9.4
nCUBE 2, 2 proc | | | 3.91 | 4.7
nCUBE 2, 1 proc | Fort77/ncc -O3 | .78 | 2.02 | 2.35
IBM 370/165 FastMult | H Ext | .77 | |
Prime P9955II | f77 rev 20.2.b2 -opt | .72 | |
DEC VAX 8530 | VMS v4.6 | .73 | |
HP 9000 Series 850 | 2.0 fc -O | .71 | |
DEC VAX 8650 | VMS v4.5 | .70 | |
DEC VAX 8500 | VMS v4.5 | .65 | |
HP/Apollo DN4500 (68030 + FPA) | | .60 | |
Mentor Graphics Computer | fortran | .60 | |
MIPS M/500 (8.3MHz) | f77 1.21 -O | .60 | | 3.3
Data General MV/20000 | f77 | .59 | |
IBM 9377-80 | VS Fortran 2.1.1 opt=3 | .58 | |
Sperry 1100/80 w/SAM | FTN opt=ZEO | .58 | |
CDC CYBER 930-31 | NOS/VE 1.2.2 | .58 | |
Russian PS-2100 | FORTRAN-PS | .57 | 1.6 |
Harris H1200 | VOS 4.1 opt g | .56 | |
HP/Apollo DN4500 (68030) | | .55 | |
HP 9000 Series 825 | 2.0 fc -O | .53 | |
HP-APOLLO 9000/400t (68030) | f77 -O4 rev 10.8(190) | .51 | |
Harris HCX-9 | hf77 -O3 | .50 | |
Pyramid 9810 | OSx 4.0 | .50 | |
HP 9000 Series 840 | 2.0 fc -O | .49 | |
DEC VAX 8600 | VMS v4.5 | .48 | |
Harris HCX-7 w/fpp | f77 1.0 | .48 | |
CDC 6600 | FTN 4.6 opt=2 | .48 | |
CDC CYBER 170-835 | FTN 5 opt=2 | .47 | |
CCI Power 6/32 w/fpa | UNIX 4.2 bsd f77 | .47 | |
IBM 4381-21 | VS Fortran 2.1.1 opt=3 | .47 | |
Sperry 7000 | 4.2 | .47 | |
Gould PN9000 | UNIX | .47 | |
SUN-3/260 + FPA | 3.2 f77 -O -ffpa | .46 | |
IBM 4381 MG1 | VS Fortran opt=3 | .46 | |
DEC VAX 6210 (1 proc.) | VMS v5.0 | .46 | |
CDC CYBER 170-835 | FTN 5 opt=1 | .44 | |
HP 9000 Series 840 | HP-UX 14.3 | .43 | |

Computer | OS/Compiler | "LINPACK Benchmark" n=100, Mflop/s | "TPP" Best Effort n=1000, Mflop/s | "Theoretical Peak" Mflop/s
----------------------------------------------------------------------------------------------------
IBM RT 135 | AIX-2.2 | .42 | |
Harris H1000 | VOS 3.3 opt g | .41 | |
microVAX 3200/3500/3600 | VMS v4.6 | .41 | |
Apple Macintosh IIfx | A/UX 2.0 f77 | .41 | |
Apollo DN5xxT FPX | DOMAIN/IX SR9.7 opt 4 | .40 | |
microVAX 3200/3500/3600 | ULTRIX 2.2/VFU | .40 | |
IBM 9370-60 | VS Fortran 1.4.0 opt=3 | .40 | |
Sun-3/160 + FPA | 3.2 f77 -O -ffpa | .40 | |
Prime P9755 | f77 rev 20.2.b2 -opt | .40 | |
Ridge 3200 Model 90 | ROS/rf | .39 | |
IBM 4381-11 | VS Fortran 1.4.0 opt=3 | .39 | |
Gould 32/9705 mult acc | fort77+ 4.3 | .39 | |
NORSK DATA ND-570/2 | Fortran-500-E | .38 | |
Sperry 1100/80 | FTN opt=ZEO | .38 | |
Apple Mac IIfx | Absoft -w -v -O -f -s | .37 | |
CDC CYBER 930-11 | NOS/VE OPT=High | .37 | |
CSA w/T800C-20 | Fortran 3L | .37 | |
Inmos T800 (20 MHz) | Fortran 3L -:o0 | .37 | |
Sequent Symmetry (386 w/fpa) | Fortran -fpa -O3 | .37 | |
CONCEPT 32/8750 | UTX/32 | .36 | |
Celerity C1230 | UNIX 4.2 bsd f77 | .36 | |
IBM RT PC 6150/115 fpa2 | f77 | .36 | |
IBM 9373-30 | VS Fortran 2.1.1 opt=3 | .36 | |
CDC 6600 | RUN | .36 | |
Gould PN9080 | UTX/32 | .35 | |
Prime 9950 | F77 19.4.2 | .34 | |
Opus Series 300pm 30 MHz | UNIX Greenhills | .33 | |
Masscomp MC5600 w/fpa | f77 v1.2 -O3 rtv v3.1 | .33 | |
Data General MV/10000 | f77 opt level 2 | .30 | |
IBM 4361 MG5 | VS Fortran opt=3 | .30 | |
DATEK 80386-33 /w 64KB Cache | MS Fortran 5.0 -Ox -AH -G2 | .27 | |
Inmos T800 (20 MHz) | Fortran 3L -:o1 | .26 | |
Apollo DN3500 | FTN -CPU 3000 -opt 4 | .25 | |
IRIS 2400 Turbo/FPA | f77 | .24 | |
CDC CYBER 180-830 | NOS/VE OPT=HIGH | .24 | |
Apple Macintosh PowerBook 170 | Absoft -w -v -O -f -s | .23 | |
Gould PN 6005 | VTX/32 2.0 Fortran 77 | .23 | |
Harris 800 | Fortran 77 | .23 | |
IBM 370/158 | H opt=3 | .23 | |
IBM 370/158 | VS Fortran opt=3 | .22 | |
NORSK DATA ND-560 | Fortran-500 | .22 | |
Celerity C1200 | UNIX 4.2 bsd f77 | .21 | |
Honeywell DPS 8/70 | FR7X | .21 | |
Denelcor HEP | f77 UPX | .21 | |
VAX 11/785 FPA | VMS v4.5 | .20 | |
CDC CYBER 170-720 | FTN 5, opt=2 | .20 | |

Computer | OS/Compiler | "LINPACK Benchmark" n=100, Mflop/s | "TPP" Best Effort n=1000, Mflop/s | "Theoretical Peak" Mflop/s
----------------------------------------------------------------------------------------------------
Apple Macintosh IIsi | Absoft -w -v -O -f -s | .19 | |
Itel AS/5 mod 3 | H | .19 | |
NORSK DATA ND-500 | Fortran-500-E | .19 | |
KONTRON KSM/386 | UNIX SVS F77 2.8 | .19 | |
Sun 386i/250 25 MHz | SunOS 4.0; Sun 1.1 -O | .19 | |
CDC CYBER 170-825 | FTN 5, opt=2 | .19 | |
IBM 4341 MG10 | VS Fortran opt=3 | .19 | |
Apollo DN2500 | | .18 | |
Pyramid 98xe | OSx 4.0 | .18 | |
IBM 9370-40 | VS Fortran 1.4.0 opt=3 | .18 | |
VAX 11/785 FPA | UNIX 4.2 bsd f77 | .18 | |
DEC VAX 8250/8350 (UP) | VMS v4.6 | .18 | |
CDC CYBER 170-825 | FTN 5, opt=1 | .18 | |
Ridge Server/RT EFP | ROS/rf | .18 | |
CDC CYBER 170-720 | FTN 5, opt=1 | .17 | |
Ridge 32/130 | OS 3.3/RISC | .17 | |
PC Craft 2400/25MHz w/80387 | PLI Fortran 2.09 | .17 | |
Concurrent 3252 | OS 6.2.4 fortran z | .17 | |
Tandy 5000 MC 20MHz | LPI Fortran 3.0 | .17 | |
Tektronix 4315 w/68882 | UTEK f77 | .17 | |
CDC CYBER 180-810 | NOS/VE OPT=HIGH | .17 | |
Prime P2755 | f77 rev 20.2.b2 -opt | .17 | |
Apple Macintosh IIx | A/UX 2.0 f77 | .16 | |
Concurrent 3242 | OS 32 v7.2 f77 | .16 | |
Compaq 386/20 w/387 | Microsoft Fortran 4.1 | .16 | |
Apple Macintosh IIcx | Absoft -w -v -O -f -s | .15 | |
Apple Macintosh IIx | Absoft -w -v -O -f -s | .15 | |
DEC VAX 8200/8300 | VMS v4.5 | .15 | |
IBM PS/2-70 (20 MHz) | AIX 1.2 | .15 | |
Apple Macintosh SE 30 | Absoft -w -v -O -f -s | .14 | |
Apollo DN4000 | DOMAIN/IX SR9.7 opt 4 | .14 | |
ICL 2988 | f77 OPT=2 | .14 | |
IBM 9370-20 | VS Fortran 1.4.0 opt=3 | .14 | |
HP Vectra RS/20C 20 MHz | LPI Fortran 3.0 | .14 | |
VAX 11/780 FPA | VMS v4.5 | .14 | |
Compaq 386/20 w/387 | RM/Fortran 2.43 | .13 | |
microVAX II | VMS v4.5 | .13 | |
Prime P2450 | f77 rev 20.2.b2 -opt | .13 | |
Apple Macintosh IIsi | Fortran | .12 | |
Apple Mac II/16 Mhz/25 Mhz 68882 | Absoft 2.4 -w -v -O -f -s | .12 | |
CDC 6500 | FUN | .12 | |
CONCEPT 32/6750 | UTX/32 | .12 | |
IBM PS/2-70 (16 MHz) | AIX 1.2 | .12 | |
IBM RT w/68881 | f77 | .12 | |
VAX 11/750 FPA | VMS v4.1 | .12 | |
microVAX II | ULTRIX 2.2/VFU | .12 | |

Computer | OS/Compiler | "LINPACK Benchmark" n=100, Mflop/s | "TPP" Best Effort n=1000, Mflop/s | "Theoretical Peak" Mflop/s
----------------------------------------------------------------------------------------------------
Prime 750 | Primos f77 v19.1 | .11 | |
Sun 3/260, 20 MHz 68881 | 3.2 f77 -O -f68881 | .11 | |
ENCORE Multimax NS32332 | f77 | .11 | |
Tektronix 4315 w/68881 | UTEK f77 | .11 | |
HP 9000 Series 350 | HP-UX, f77 5.2 | .11 | |
Definicon DSI-780 | SVS Fortran (MSDOS) | .11 | |
Concurrent 3230 | OS 6.2.2 fortran 5.2 | .11 | |
VAX 11/780 FPA | UNIX 4.3 BSD f77 -O | .11 | |
Sun 3/160, 16.7 MHz 68881 | 3.2 f77 -O -f68881 | .10 | |
NCUBE (1 proc. 8 MHz) | Fortran | .10 | |
Apple Mac SE/30 | ABSOFT 2.4 | .10 | |
Apollo DN590 | DOMAIN/IX SR9.7 opt 4 | .099 | |
Masscomp MC5600 68881 | f77 v1.2 -O3 rtv v3.1 | .099 | |
VAX 11/750 FPA | UNIX 4.2 bsd f77 | .096 | |
Prime 850 | Primos | .095 | |
Sperry 1100/60 | FTN opt=ZEO | .093 | |
Pyramid 90X FPA | UNIX 4.2 bsd f77 | .088 | |
Apple Mac II/16 Mhz/25 Mhz 68882 | Absoft 2.4 | .087 | |
SUN-3/50, 16.7 MHz 68881 | 3.2 f77 -O -f68881 | .087 | |
HP 9000 Series 330 | HP-UX, f77 5.2 | .087 | |
Apple Macintosh II | Absoft -w -v -O -f -s | .083 | |
microVAX II | f77 Ultrix 1.1 | .082 | |
Apple Mac SE + 20 MHz 68881 | ABSOFT 2.4 | .082 | |
Ridge 32/110 | ROS 3.3/RISC | .081 | |
Data General MV/8000 | f77 opt level 2 | .078 | |
Apple MAC II w/882 | | .078 | |
Prime P2350 | f77 rev 20.2.b2 -opt | .077 | |
Apple Mac/Levco Prodigy 4 | ABSOFT MacFort 020 | .076 | |
Apple Mac II w/68020 | FORTRAN | .074 | |
HP 9000 Series 320 | HP-UX, f77 5.2 | .073 | |
Apollo DN3000 | DOMAIN/IX SR9.7 opt 4 | .071 | |
Apollo DN460/660 | AEGIS 8.0 FTN | .069 | |
Masscomp MC500 w/FPP | 3.1 Fortran | .061 | |
Harris HS-20 w/FPP | Fortran 77 3.1 | .061 | |
Sequent Balance 8000 | DYNIX Fortran 2.4.4 | .059 | |
Definicon DSI-32/10 | Greenhills f77 (MSDOS) | .057 | |
VAX 11/750 | VMS v4.1 | .057 | |
Encore Multimax | f77 | .055 | |
HP 9000 Series 500 | Fortran 1.7 | .043 | |
Opus 32.32 | UNIX, f77 4.2 bsd | .043 | |
ATT 3B20 FP | UNIX V 2.0/4 | .040 | |
Acorn Cambridge | fortran | .039 | |
IBM 4331 MG2 | H opt=3 | .038 | |
Burroughs B6800 | Fortran 77 ver 34 | .037 | |
VAX 11/725 FPA | VMS v4.1 | .037 | |
Masscomp MCS-541 w/FPB | Fortran 3.1 | .037 | |

Computer | OS/Compiler | "LINPACK Benchmark" n=100, Mflop/s | "TPP" Best Effort n=1000, Mflop/s | "Theoretical Peak" Mflop/s
----------------------------------------------------------------------------------------------------
IBM RT PC Model 20 | f77 | .036 | |
VAX 11/730 FPA | VMS | .036 | |
Prime 2250 | Fortran 77 | .034 | |
IBM PC-AT/370 | VS Fortran opt=3 | .033 | |
IBM PC-XT/370 | H opt=3 | .031 | |
VAX 11/750 | UNIX 4.2 bsd f77 | .029 | |
Apollo DN320 | AEGIS 8.0 FTN | .028 | |
Sun 2/50 + SKY FFP | f77 -O -fsky 3.0 | .027 | |
Ametek S14/32 (1 node) | RM Fortran 2.11 | .026 | |
Apollo DN550 FPA | AEGIS 8.0 FTN | .025 | |
AMSTRAC 1512 8086/8087 9.54 MHz | MS-Fortran 4.0 -Ox -AH | .022 | |
microVAX I | VMS | .023 | |
Canaan | VS | .021 | |
Chas. River Data 6835+SKY | SVS Fortran 77 | .018 | |
Apollo DN 420 PEB | AEGIS 7+ FTN | .017 | |
IBM AT w/80287 | PROFORT 1.0 | .012 | |
IBM PC w/8087 | PROFORT 1.0 | .012 | |
Cadtrak DS1/8087 | Intel Fortran 77 | .011 | |
Apple Mac Classic II/16 MHz | Absoft 2.4 | .011 | |
IBM PC/AT w/80287 | Microsoft 3.2 | .0091 | |
Chas. River Data 6835 | SVS Fortran 77 | .0088 | |
Apollo DN300 | AEGIS 8.0 FTN | .0071 | |
Masscomp MC500 | 3.1 Fortran | .0070 | |
IBM PC w/8087 | Microsoft 3.2 | .0069 | |
Apple Mac II | ABSOFT 2.4 | .0064 | |
HP 9000 Series 200 | HP-UX | .0062 | |
Sun 2/50 | f77 -O -fsoft 3.0 | .0055 | |
Atari ST | ABSOFT AC/Fortran v2.2 | .0051 | |
Apple Macintosh | ABSOFT 2.0b | .0038 | |
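The Mflop/s entries in the table above are execution rates, that is, a count of floating-point operations divided by elapsed time. The following Python sketch shows one way such a rate could be derived from a measured solve time. It assumes the commonly quoted LINPACK operation count of 2n^3/3 + 2n^2 for factoring and solving a system of order n, and it assumes that the times in Table 2 are in seconds; neither is stated in this part of the report, and the function names are illustrative only.

```python
def linpack_flops(n):
    """Assumed operation count for factoring and solving an order-n system."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def mflops(n, seconds):
    """Execution rate in Mflop/s for an order-n solve taking `seconds`."""
    return linpack_flops(n) / seconds / 1.0e6


# Uniprocessor CRAY Y-MP C90 time for the 1000 x 1000 problem (Table 2),
# assumed here to be in seconds.
rate = mflops(1000, 0.765)
print(f"{rate:.0f} Mflop/s")
```

Under these assumptions the same function applies to the order-100 case by passing n = 100 and the corresponding measured time.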

Table 2: A Look at Parallel Processing

1000 x 1000 Problem with Parallel Processing
Computer | Time, uniprocessor | No. of processors | Time, multiprocessors | Speedup | Efficiency
----------------------------------------------------------------------------------------------------
CRAY Y-MP C90 | 0.765 | 16 | .0688 | 11.12 | .69
CRAY Y-MP C90 | 0.765 | 14 | .0751 | 10.19 | .73
CRAY Y-MP C90 | 0.765 | 12 | .0831 | 9.21 | .77
CRAY Y-MP C90 | 0.765 | 10 | .0951 | 8.04 | .80
CRAY Y-MP C90 | 0.765 | 8 | .112 | 6.85 | .86
CRAY Y-MP C90 | 0.765 | 6 | .145 | 5.28 | .88
CRAY Y-MP C90 | 0.765 | 4 | .204 | 3.74 | .94
CRAY Y-MP C90 | 0.765 | 2 | .391 | 1.96 | .98

NEC SX-3 | 0.149 | 2 | .0820 | 1.82 | .91

CRAY Y-MP/8 | 2.17 | 8 | .312 | 6.96 | .87
CRAY Y-MP/8 | 2.17 | 4 | .577 | 3.76 | .94
CRAY Y-MP/8 | 2.17 | 3 | .754 | 2.88 | .96
CRAY Y-MP/8 | 2.17 | 2 | 1.11 | 1.96 | .98

CRAY 2S | 1.76 | 4 | .476 | 3.66 | .91
CRAY 2S | 1.76 | 3 | .617 | 2.82 | .94
CRAY 2S | 1.76 | 2 | .902 | 1.93 | .96

CRAY X-MP/4 | 3.10 | 4 | .813 | 3.78 | .94
CRAY X-MP/4 | 3.10 | 3 | 1.07 | 2.87 | .96
CRAY X-MP/4 | 3.10 | 2 | 1.57 | 1.96 | .98

CONVEX C3880 | 5.90 | 8 | .841 | 7.02 | .88
CONVEX C3840 | 5.90 | 4 | 1.58 | 3.74 | .94
CONVEX C3830 | 5.90 | 3 | 2.05 | 2.88 | .96
CONVEX C3820 | 5.90 | 2 | 3.01 | 1.96 | .98

CRAY S-MP/MCP784 | 21.4 | 84 | 0.902 | 23.7 | .28
CRAY S-MP/MCP756 | 21.4 | 56 | 0.986 | 21.7 | .39
CRAY S-MP/MCP728 | 21.4 | 28 | 1.32 | 16.2 | .58
CRAY S-MP/MCP707 | 21.4 | 7 | 3.46 | 6.19 | .88

Fujitsu AP1000 | 160 | 512 | 1.10 | 147 | .29
Fujitsu AP1000 | 160 | 256 | 1.50 | 108 | .42
Fujitsu AP1000 | 160 | 128 | 2.42 | 66.5 | .52
Fujitsu AP1000 | 160 | 64 | 3.51 | 46.0 | .72
Fujitsu AP1000 | 160 | 32 | 6.71 | 24.0 | .75
Fujitsu AP1000 | 160 | 16 | 11.5 | 13.9 | .87
Fujitsu AP1000 | 160 | 8 | 22.6 | 7.12 | .89
Fujitsu AP1000 | 160 | 4 | 41.3 | 3.90 | .97
Fujitsu AP1000 | 160 | 2 | 81.4 | 1.96 | .98

IBM 3090/600S VF | 7.27 | 6 | 1.29 | 5.64 | .94
IBM 3090/500S VF | 7.27 | 5 | 1.52 | 4.78 | .96
IBM 3090/400S VF | 7.27 | 4 | 1.89 | 3.85 | .96
IBM 3090/300S VF | 7.27 | 3 | 2.46 | 2.96 | .99
IBM 3090/280S VF | 7.27 | 2 | 3.65 | 1.99 | .99
IBM 3090/200S VF | 7.27 | 2 | 3.64 | 1.99 | .99

Intel Delta | 22 | 512 | 1.5 | 14.7 | .03
Intel Delta | 22 | 256 | 1.6 | 13.8 | .05

1000 x 1000 Problem with Parallel Processing
Computer | Time, uniprocessor | No. of processors | Time, multiprocessors | Speedup | Efficiency
----------------------------------------------------------------------------------------------------
Intel Delta | 22 | 128 | 1.7 | 12.9 | .10
Intel Delta | 22 | 64 | 1.9 | 11.5 | .18
Intel Delta | 22 | 32 | 2.2 | 10.0 | .31
Intel Delta | 22 | 16 | 2.9 | 7.59 | .47
Intel Delta | 22 | 8 | 4.1 | 5.37 | .67
Intel Delta | 22 | 4 | 6.7 | 3.28 | .82
Intel Delta | 22 | 2 | 11.6 | 1.90 | .95

IBM 3090/600E VF | 9.36 | 6 | 1.73 | 5.41 | .90
IBM 3090/500E VF | 9.36 | 5 | 2.02 | 4.63 | .93
IBM 3090/400E VF | 9.36 | 4 | 2.48 | 3.77 | .94
IBM 3090/300E VF | 9.36 | 3 | 3.21 | 2.92 | .97
IBM 3090/200E VF | 9.36 | 2 | 4.73 | 1.98 | .99

Alliant FX/2800-200 | 22.9 | 14 | 2.06 | 11.1 | .79
Alliant FX/2800-200 | 22.9 | 12 | 2.30 | 10.0 | .83
Alliant FX/2800-200 | 22.9 | 10 | 2.68 | 8.54 | .85
Alliant FX/2800-200 | 22.9 | 8 | 3.24 | 7.07 | .88
Alliant FX/2800-200 | 22.9 | 4 | 6.07 | 3.77 | .94
Alliant FX/2800-200 | 22.9 | 2 | 11.8 | 1.94 | .97

IBM PVS | 20.4 | 32 | 2.17 | 9.35 | .29
IBM PVS | 20.4 | 16 | 2.35 | 8.64 | .54
IBM PVS | 20.4 | 8 | 3.41 | 5.95 | .74
IBM PVS | 20.4 | 4 | 5.71 | 3.56 | .89
IBM PVS | 20.4 | 2 | 10.6 | 1.92 | .96

nCUBE 2 | 331 | 1024 | 2.59 | 128 | .12
nCUBE 2 | 331 | 512 | 3.29 | 101 | .20
nCUBE 2 | 331 | 256 | 4.05 | 81.7 | .32
nCUBE 2 | 331 | 128 | 5.74 | 57.7 | .45
nCUBE 2 | 331 | 64 | 8.70 | 38.0 | .59
nCUBE 2 | 331 | 32 | 14.5 | 22.8 | .71
nCUBE 2 | 331 | 16 | 25.6 | 12.9 | .81
nCUBE 2 | 331 | 8 | 46.9 | 7.04 | .88
nCUBE 2 | 331 | 4 | 89.1 | 3.71 | .93
nCUBE 2 | 331 | 2 | 171. | 1.93 | .97

Intel iPSC/860 | 22 | 128 | 2.8 | 7.68 | .06
Intel iPSC/860 | 22 | 64 | 3.2 | 6.72 | .11
Intel iPSC/860 | 22 | 32 | 4.0 | 5.38 | .17
Intel iPSC/860 | 22 | 16 | 5.1 | 4.22 | .26
Intel iPSC/860 | 22 | 8 | 6.5 | 3.31 | .41
Intel iPSC/860 | 22 | 4 | 8.9 | 2.42 | .60
Intel iPSC/860 | 22 | 2 | 12.8 | 1.68 | .84

Meiko Computing Surface (i860) | 21.9 | 32 | 3.19 | 6.85 | .21
Meiko Computing Surface (i860) | 21.9 | 24 | 3.30 | 6.62 | .28
Meiko Computing Surface (i860) | 21.9 | 16 | 3.57 | 6.12 | .38
Meiko Computing Surface (i860) | 21.9 | 8 | 4.56 | 4.79 | .60
Meiko Computing Surface (i860) | 21.9 | 4 | 6.83 | 3.20 | .80
Meiko Computing Surface (i860) | 21.9 | 2 | 11.6 | 1.88 | .94

1000 x 1000 Problem with Parallel Processing
Computer | Time, uniprocessor | No. of processors | Time, multiprocessors | Speedup | Efficiency
----------------------------------------------------------------------------------------------------
CONVEX C3240 | 14.9 | 4 | 3.92 | 3.81 | .95
CONVEX C3230 | 14.9 | 3 | 5.06 | 2.95 | .98
CONVEX C3220 | 14.9 | 2 | 7.50 | 1.99 | .99

CONVEX C-240 | 15 | 4 | 4.03 | 3.76 | .94
CONVEX C-230 | 15 | 3 | 5.20 | 2.91 | .97
CONVEX C-220 | 15 | 2 | 7.65 | 1.98 | .99

Parsytec FT-400 | 1075 | 400 | 4.90 | 219. | .55
Parsytec FT-400 | 1075 | 256 | 6.59 | 163. | .64
Parsytec FT-400 | 1075 | 100 | 13.2 | 81.4 | .81
Parsytec FT-400 | 1075 | 64 | 19.1 | 56.3 | .88
Parsytec FT-400 | 1075 | 16 | 69.2 | 15.5 | .97

FPS Model 522 | 12 | 2 | 6.36 | 1.89 | .95

Suprenum S1C1 | 51 | 16 | 6.4 | 8.0 | .50
Suprenum S1C1 | 51 | 14 | 7.1 | 7.2 | .51
Suprenum S1C1 | 51 | 12 | 7.9 | 6.5 | .54
Suprenum S1C1 | 51 | 10 | 8.9 | 5.8 | .58
Suprenum S1C1 | 51 | 8 | 10.4 | 4.9 | .61
Suprenum S1C1 | 51 | 6 | 13.1 | 3.9 | .65
Suprenum S1C1 | 51 | 4 | 18.1 | 2.8 | .70
Suprenum S1C1 | 51 | 2 | 33.4 | 1.5 | .75

Alliant FX/800-200 | 24.2 | 4 | 7.09 | 3.41 | .85
Alliant FX/800-200 | 24.2 | 2 | 12.7 | 1.91 | .95

Alliant FX/80 | 57.7 | 8 | 9.64 | 5.99 | .75
Alliant FX/80 | 57.7 | 7 | 10.6 | 5.44 | .78
Alliant FX/80 | 57.7 | 6 | 11.8 | 4.89 | .82
Alliant FX/80 | 57.7 | 5 | 13.6 | 4.24 | .85
Alliant FX/80 | 57.7 | 4 | 16.2 | 3.56 | .89
Alliant FX/80 | 57.7 | 3 | 20.7 | 2.79 | .93
Alliant FX/80 | 57.7 | 2 | 29.8 | 1.94 | .97

Stardent 1540 (Ardent Titan-4) | 51.2 | 4 | 14.3 | 3.57 | .89
Stardent 1530 (Ardent Titan-3) | 51.2 | 3 | 18.3 | 2.80 | .93
Stardent 1520 (Ardent Titan-2) | 51.2 | 2 | 16.3 | 1.95 | .97

SGI 4D/480 40 MHz | 54.0 | 8 | 9.48 | 5.70 | .71
SGI 4D/440 40 MHz | 54.0 | 4 | 15.91 | 3.39 | .85
SGI 4D/420 40 MHz | 54.0 | 2 | 28.80 | 1.88 | .94

SGI 4D/380 33 MHz | 65.0 | 8 | 11.13 | 5.84 | .73
SGI 4D/340 33 MHz | 65.0 | 4 | 18.62 | 3.49 | .87
SGI 4D/320 33 MHz | 65.0 | 2 | 34.17 | 1.90 | .95

Alliant FX/40 | 66.1 | 4 | 20.5 | 3.22 | .81
Alliant FX/40 | 66.1 | 3 | 24.9 | 2.65 | .88
Alliant FX/40 | 66.1 | 2 | 34.8 | 1.90 | .95

SGI 4D/240 25 MHz | 85.2 | 4 | 23.89 | 3.57 | .89
SGI 4D/220 25 MHz | 85.2 | 2 | 44.89 | 1.90 | .95

Alliant FX/4 | 106 | 4 | 32.3 | 3.28 | .82
Alliant FX/4 | 106 | 3 | 38.7 | 2.74 | .91
Alliant FX/4 | 106 | 2 | 55.8 | 1.90 | .95

1000 x 1000 Problem with Parallel Processing
Computer | Time, uniprocessor | No. of processors | Time, multiprocessors | Speedup | Efficiency
----------------------------------------------------------------------------------------------------
DEC VAX 6000-460 | 439 | 6 | 80 | 5.5 | .92
DEC VAX 6000-450 | 439 | 5 | 94 | 4.7 | .94
DEC VAX 6000-440 | 439 | 4 | 114 | 3.8 | .96
DEC VAX 6000-430 | 439 | 3 | 152 | 2.9 | .96
DEC VAX 6000-420 | 439 | 2 | 222 | 1.9 | .99

ELXSI 6420 | 475 | 5 | 104 | 4.57 | .91
ELXSI 6420 | 475 | 3 | 167 | 2.84 | .95
ELXSI 6420 | 475 | 2 | 245 | 1.94 | .97

DEC VAX 6240 | 1295 | 4 | 332 | 3.90 | .98
DEC VAX 6230 | 1295 | 3 | 439 | 2.95 | .98
DEC VAX 6220 | 1295 | 2 | 654 | 1.98 | .99

Sequent Balance 21000 | 11111 | 30 | 445 | 25.0 | .83
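The Speedup and Efficiency columns of Table 2 are consistent with the two timing columns: speedup is the uniprocessor time divided by the multiprocessor time, and efficiency is that speedup divided by the number of processors. A minimal Python sketch, with function names chosen here for illustration only, reproduces one row of the table:

```python
def speedup(t_uni, t_multi):
    """Uniprocessor time divided by multiprocessor time."""
    return t_uni / t_multi


def efficiency(t_uni, t_multi, processors):
    """Speedup divided by the number of processors used."""
    return speedup(t_uni, t_multi) / processors


# CRAY Y-MP C90 row from Table 2: time 0.765 on one processor,
# .0688 on 16 processors.
s = speedup(0.765, 0.0688)
e = efficiency(0.765, 0.0688, 16)
print(f"speedup = {s:.2f}, efficiency = {e:.2f}")
# prints "speedup = 11.12, efficiency = 0.69"
```

The printed values match the tabulated 11.12 and .69 for that row.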

Table 3: Highly Parallel Computing

Computer (Full Precision) | Number of Processors | Rmax Gflop/s | Nmax order | N1/2 order | Rpeak Gflop/s
----------------------------------------------------------------------------------------------------
Thinking Machines CM-5 | 1024 | 59.7 | 52224 | 24064 | 131
Thinking Machines CM-5 | 512 | 30.4 | 36864 | 16384 | 66
NEC SX-3/44R (2.5 ns) | 4 | 23.2 | 6400 | 830 | 26
NEC SX-3/44 (2.9 ns) | 4 | 20.0 | 6144 | 832 | 22
Thinking Machines CM-5 | 256 | 15.1 | 26112 | 12032 | 33
Intel Delta (40 MHz) | 512 | 13.9 | 25000 | 7500 | 20
CRAY Y-MP C90 (238.1 MHz 4.2 ns) | 16 | 13.7 | 10000 | 650 | 15
NEC SX-3/42R (2.5 ns) | 4 | 11.6 | 4352 | 516 | 13
NEC SX-3/24R (2.5 ns) | 2 | 11.6 | 4352 | 492 | 13
Intel Delta (40 MHz) | 384 | 10.2 | 20000 | 6000 | 15
NEC SX-3/24 (2.9 ns) | 2 | 10.0 | 4352 | 500 | 11
NEC SX-3/42 (2.9 ns) | 4 | 10.0 | 4608 | 640 | 11
Thinking Machines CM-200 (10 MHz) | 2048 | 9.0 | 28672 | 11264 | 20
Thinking Machines CM-5 | 128 | 7.7 | 18432 | 8192 | 16
Intel Delta (40 MHz) | 256 | 7.0 | 18000 | 5000 | 10
NEC SX-3/41R (2.5 ns) | 4 | 5.8 | 3584 | 414 | 6.4
NEC SX-3/22R (2.5 ns) | 2 | 5.8 | 3072 | 370 | 6.4
NEC SX-3/14R (2.5 ns) | 1 | 5.8 | 2816 | 282 | 6.4
Intel Delta (40 MHz) | 192 | 5.2 | 15000 | 4500 | 7.7
Thinking Machines CM-2 (7 MHz) | 2048 | 5.2 | 26624 | 11000 | 14
NEC SX-3/22 (2.9 ns) | 2 | 5.0 | 3072 | 384 | 5.5
NEC SX-3/14 (2.9 ns) | 1 | 5.0 | 3072 | 384 | 5.5
Alliant CAMPUS/800 (40 MHz) | 192 | 4.8 | 17024 | 5768 | 7.7
Alliant CAMPUS/800 (40 MHz) | 168 | 4.1 | 16016 | 5516 | 6.7
Thinking Machines CM-5 | 64 | 3.8 | 13056 | 6016 | 8
Intel Delta (40 MHz) | 128 | 3.5 | 12500 | 3500 | 5
Alliant CAMPUS/800 (40 MHz) | 144 | 3.5 | 15484 | 4956 | 5.8
Alliant CAMPUS/800 (40 MHz) | 120 | 2.9 | 14000 | 4620 | 4.8
NEC SX-3/21R (2.5 ns) | 2 | 2.9 | 2560 | 257 | 3.2
NEC SX-3/12R (2.5 ns) | 1 | 2.9 | 2048 | 174 | 3.2
NEC SX-3/12 (2.9 ns) | 1 | 2.5 | 2048 | 256 | 2.8
Intel iPSC/860 (40 MHz) | 128 | 2.6 | 12000 | 4500 | 5.
Alliant CAMPUS/800 (40 MHz) | 96 | 2.3 | 13020 | 4396 | 3.8
Intel iPSC/860 (40 MHz) | 120 | 2.3 | 12000 | 4500 | 4.8
Fujitsu AP1000 | 512 | 2.3 | 25600 | 2500 | 2.8
Thinking Machines CM-5 | 32 | 1.9 | 9216 | 4096 | 4
Intel iPSC/860 (40 MHz) | 96 | 1.9 | 11000 | 4000 | 3.8
nCUBE 2 (20 MHz) | 1024 | 1.9 | 21376 | 3193 | 2.4
Intel Delta (40 MHz) | 64 | 1.7 | 8000 | 2500 | 2.6
Alliant CAMPUS/800 (40 MHz) | 72 | 1.6 | 12012 | 3724 | 2.9
MasPar MP-2216 (80ns) | 16384 | 1.6 | 11264 | 1920 | 2.4
NEC SX-3/11R (2.5 ns) | 1 | 1.5 | 2048 | 130 | 1.6
Intel iPSC/860 (40 MHz) | 72 | 1.4 | 9000 | 3500 | 2.9
Intel iPSC/860 (40 MHz) | 64 | 1.4 | 9000 | 3500 | 2.6
Meiko Computing Surface (40 MHz) | 62 | 1.3 | 8500 | 3500 | 2.5

Computer (Full Precision) | Number of Processors | Rmax Gflop/s | Nmax order | N1/2 order | Rpeak Gflop/s
----------------------------------------------------------------------------------------------------
NEC SX-3/11 (2.9 ns) | 1 | 1.3 | 2816 | 192 | 1.4
Fujitsu AP1000 | 256 | 1.2 | 18000 | 1600 | 1.4
Alliant CAMPUS/800 (40 MHz) | 48 | 1.1 | 10024 | 3024 | 1.9
Intel iPSC/860 (40 MHz) | 48 | .98 | 7000 | 3000 | 1.9
Thinking Machines CM-5 | 16 | .98 | 6528 | 3008 | 2
nCUBE 2 (20 MHz) | 512 | .958 | 15200 | 2240 | 1.2
IBM PVS (40MHz) | 32 | .925 | 6000 | 1560 | 1.3
Intel Delta (40 MHz) | 32 | .9 | 6000 | 2000 | 1.3
Meiko Computing Surface (40 MHz) | 32 | .825 | 7000 | 3000 | 1.3
NEC SX-3/1LR (2.5 ns) | 1 | .78 | 2304 | 112 | 0.8
IBM RS/6000 Cluster (PARC) (62.5 MHz) | 8 | .694 | 10000 | 1500 | 1.0
NEC SX-3/1L (2.9 ns) | 1 | .67 | 2048 | 128 | .68
Intel iPSC/860 (40 MHz) | 32 | .64 | 6000 | 2500 | 1.3
Fujitsu AP1000 | 128 | .566 | 12800 | 1100 | .71
IBM RS/6000 Cluster (PARC) (50 MHz) | 8 | .520 | 7500 | 1300 | .8
Alliant CAMPUS/800 (40 MHz) | 24 | .504 | 7000 | 2492 | .96
Intel iPSC/860 (40 MHz) | 24 | .49 | 5000 | 2000 | .96
nCUBE 2 (20 MHz) | 256 | .482 | 10784 | 1504 | .64
MasPar MP-1216 (80ns) | 16384 | .473 | 11264 | 1280 | .55
Intel Delta (40 MHz) | 16 | .45 | 4000 | 1000 | .64
Meiko Computing Surface (40 MHz) | 16 | .445 | 5000 | 2000 | .64
MasPar MP-1 (80 ns) | 16384 | .44 | 5504 | 1180 | .58
IBM RS/6000 Cluster (PARC) (50 MHz) | 6 | .404 | 7000 | 1200 | .6
MasPar MP-2204 (80ns) | 4096 | .374 | 5632 | 896 | .60
IBM RS/6000 Cluster (PARC) (62.5 MHz) | 4 | .37 | 5500 | 850 | .50
Intel iPSC/860 (40 MHz) | 16 | .36 | 4500 | 1500 | .64
IBM RS/6000 Cluster (PARC) (50 MHz) | 4 | .293 | 5500 | 1000 | .4
Fujitsu AP1000 | 64 | .291 | 10000 | 648 | .36
nCUBE 2 (20 MHz) | 128 | .242 | 7776 | 1050 | .32
Meiko Computing Surface (40 MHz) | 8 | .235 | 3500 | 750 | .32
Parsytec FT-400 (20 MHz) | 400 | .232 | 7999 | 814 | .6
Intel Delta (40 MHz) | 8 | .23 | 3000 | 1000 | .32
IBM RS/6000 Cluster (PARC) (62.5 MHz) | 2 | .19 | 4000 | 350 | .25
Intel iPSC/860 (40 MHz) | 8 | .19 | 3000 | 850 | .32
Meiko Computing Surface (40 MHz) | 4 | .121 | 2500 | 500 | .16
nCUBE 2 (20 MHz) | 64 | .121 | 5472 | 701 | .15
Intel Delta (40 MHz) | 4 | .12 | 2000 | 500 | .16
MasPar MP-1204 (80ns) | 4096 | .116 | 5632 | 640 | .138
Intel iPSC/860 (40 MHz) | 4 | .10 | 2250 | 550 | .16
IBM RS/6000 (62.5 MHz) | 1 | .096 | 3000 | | .125
MasPar MP-2201 (80ns) | 1024 | .092 | 2816 | 448 | .15
Meiko Computing Surface (40 MHz) | 2 | .062 | 1750 | 250 | .08
Thinking Machines CM-5 | 1 | .068 | 1632 | 672 | .128
nCUBE 2 (20 MHz) | 32 | .061 | 3888 | 486 | .075
Intel Delta (40 MHz) | 2 | .06 | 1500 | 500 | .08
Intel iPSC/860 (40 MHz) | 2 | .058 | 1500 | 400 | .08
nCUBE 2 (20 MHz) | 16 | .032 | 5580 | 342 | .038

Computer (Full Precision) | Number of Processors | Rmax Gflop/s | Nmax order | N1/2 order | Rpeak Gflop/s
----------------------------------------------------------------------------------------------------
Meiko Computing Surface (40MHz) | 1 | .031 | 1250 | | .04
MasPar MP-1201 (80ns) | 1024 | .029 | 2816 | 320 | .034
Intel iPSC/860 (40 MHz) | 1 | .024 | 750 | | .040
nCUBE 2 (20 MHz) | 8 | .0161 | 3960 | 241 | .019
nCUBE 2 (20 MHz) | 4 | .0080 | 2760 | 143 | .0094
nCUBE 2 (20 MHz) | 2 | .0040 | 1280 | 94 | .0047
nCUBE 2 (20 MHz) | 1 | .0020 | 1280 | 51 | .0024

Thinking Machines CM-200 (half precision) | 2048 | 18.5 | 39936 | 14336 | 40
Thinking Machines CM-2 (half precision) | 2048 | 10.4 | 33920 | 14000 | 28
IBM GF11* (half precision) (51.9ns) | 500 | 5.6 | 2500 | 1060 | 9.6
Fujitsu AP1000 (half precision) | 512 | 3.53 | 40000 | 2368 | 4.3

*The IBM GF11 is an experimental research computer and not a commercial product.

The columns in Table 3 are defined as follows:

Rmax   the performance in Gflop/s for the largest problem run on a machine.
Nmax   the size of the largest problem run on a machine.
N1/2   the size where half the Rmax execution rate is achieved.
Rpeak  the theoretical peak performance in Gflop/s for the machine.

In addition, the number of processors and the cycle time are listed. Full or half precision indicates whether the computation was performed using 64-bit or 32-bit floating-point arithmetic, respectively.
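One way to read the Rmax and Rpeak columns together is as a fraction of theoretical peak actually achieved on the largest problem. The short Python sketch below computes that ratio for three rows copied from Table 3. The ratio is an illustrative derived quantity, not a column of the report, and the function and variable names are hypothetical.

```python
def fraction_of_peak(rmax, rpeak):
    """Ratio of achieved rate (Rmax) to theoretical peak (Rpeak), both in Gflop/s."""
    return rmax / rpeak


# (computer, Rmax, Rpeak) copied from Table 3.
table3_rows = [
    ("Thinking Machines CM-5, 1024 processors", 59.7, 131.0),
    ("Intel Delta (40 MHz), 512 processors", 13.9, 20.0),
    ("CRAY Y-MP C90, 16 processors", 13.7, 15.0),
]

fractions = {
    name: fraction_of_peak(rmax, rpeak) for name, rmax, rpeak in table3_rows
}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of peak")
```

For these three rows the ratio ranges from a little under one half to over nine tenths, which shows how differently the listed machines approach their theoretical peak.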

4.3 Acknowledgments

I am indebted to the many people who have helped put together this collection.

References

[1] J. Dongarra, J. Bunch, C. Moler, and G. W. Stewart. LINPACK User's Guide. SIAM, Philadelphia, PA, 1979.

[2] J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. Van der Vorst. Solving Linear Systems on Vector and Shared Memory Computers. SIAM Publications, Philadelphia, PA, 1990.

[3] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw., 5:308-323, 1979.