.B .nr BT ''-%-'' .he '''' .pl 11i .de fO 'bp .. .wh -.5i fO .LP .nr LL 6.5i .ll 6.5i .nr LT 6.5i .lt 6.5i .ta 5.0i .ft 3 .bp .R .sp 1i .ce 100 .R .sp .5i . .sp 10 ARGONNE NATIONAL LABORATORY .br 9700 South Cass Avenue .br Argonne, Illinois 60439 .sp .6i .ps 12 .ft 3 Performance of Various Computers Using Standard Linear Equations Software in a Fortran Environment .ps 11 .sp 3 Jack J. Dongarra .sp 3 .ps 10 .ft 1 Mathematics and Computer Science Division .sp 2 Technical Memorandum No. 23 .sp .7i \*(DY .pn 1 .he ''-%-'' .he '''' .bp .ft 3 .ps 11 .bp .LP .EQ delim @@ .EN .nr PO .5i .nr LL 7.0i .po .5i .ll 7.0i .B .ps 14 .ce Performance of Various Computers Using Standard .sp .ce Linear Equations Software in a Fortran Environment .sp 2 .AU .ps 11 Jack J. Dongarra .AI .ps 10 Mathematics and Computer Science Division\h'.20i' Argonne National Laboratory\h'.20i' Argonne, Illinois 60439\h'.20i' .FS .ps 9 .vs 11p @size -1 {"" sup \(dg}@\|Work supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U. S. Department of Energy under Contract W-31-109-Eng-38. .FE .sp 3 .R .ps 10 .ce \*(DY .sp 2 .QS Abstract - This note compares the performance of different computer systems while solving dense systems of linear equations using the LINPACK software in a Fortran environment. About 100 computers, ranging from a CRAY X-MP to the 68000 based systems such as the Apollo and SUN Workstations to IBM PC's, are compared. .QE .sp 2 .PP The timing information presented here should in no way be used to judge the overall performance of a computer system. The results reflect only one problem area: solving dense systems of equations using the LINPACK [1] programs in a Fortran environment. .sp .PP LINPACK programs can be characterized as having a high percentage of floating point arithmetic operations. The routines involved in this timing study, SGEFA and SGESL, use column oriented algorithms. 
That is, the programs usually reference array elements sequentially down a column, not across a row.
Column orientation is important in increasing efficiency because Fortran stores arrays by columns, so that consecutive elements of a column are adjacent in memory.
Most floating point operations in LINPACK take place in a set of subprograms, the Basic Linear Algebra Subprograms (BLAS) [2], which are called repeatedly throughout the calculation.
The BLAS reference one-dimensional arrays, rather than two-dimensional arrays.
.sp
.PP
The following tables report timing results using LINPACK to solve a system of linear equations of order 100.
The execution speeds, particularly for vector computers, may not have reached their asymptotic rates.
(See the appendix for a different comparison of large scientific computers in Fortran.)
No changes were made to the LINPACK software except for replacing the BLAS.
In certain cases, as noted, the Fortran BLAS were replaced by assembly language versions, or on vector machines the ``unrolled loops'' were replaced by simple loops in Fortran.
For the LINPACK routines, no attempt was made to use special hardware features on certain machines, or to exploit vector capabilities or multiple processors.
(The compilers on some machines may, of course, generate optimized code that itself accesses special features.)
.sp
.PP
One further note: The information in the tables was compiled over a period of time.
Subsequent systems software and hardware changes may alter the timings to some extent.
.ps 14
.sp 2
.ds CF \*(DY
.ds LH " Full Precision
.ds CH - % -
.ds RH All Fortran
.bp
.B
.ce
Solving a System of Linear Equations
.sp
.ce
with LINPACK @ nothing sup a @ in Full Precision @ nothing sup b @ Using All Fortran
.sp
.R
.ps 10
.TS H
center;
l l c c c c
l l c c c c
l l c c c c.
Computer OS/Compiler@ nothing sup c @ Ratio@ nothing sup d @ MFLOPS@ nothing sup e @ Time Unit@ nothing sup f @ secs @ mu @secs _ .TH NEC SX-2 FORTRAN 77/SX(Rolled BLAS) .26 46 .015 0.043 NEC SX-1 FORTRAN 77/SX(Rolled BLAS) .31 39 .017 0.051 NEC SX-1E FORTRAN 77/SX(Rolled BLAS) .35 35 .020 0.057 CRAY X-MP-2 (1 proc.) CFT 1.13(Rolled BLAS) .50 24 .028 0.082 Amdahl 1200 Fortran 77(Rolled BLAS) .69 18 .039 0.11 CDC Cyber 205 (2-pipe) FTN(Rolled BLAS) .70 17 .039 0.11 Fujitsu VP-200 Fortran 77(Rolled BLAS) .72 17 .040 0.12 Hitachi S-810/20 FORT77/HAP(Rolled BLAS) .74 17 .042 0.12 Fujitsu VP-100 Fortran 77(Rolled BLAS) .78 16 .044 0.13 Amdahl 1100 Fortran 77(Rolled BLAS) .78 16 .044 0.13 CRAY-2 (1-proc.) CFT 2.70(Rolled BLAS) .84 15 .047 0.14 Amdahl 500 Fortran 77(Rolled BLAS) .86 14 .048 0.14 Fujitsu VP-50 Fortran 77(Rolled BLAS) .90 14 .051 0.15 CRAY-1S CFT(Rolled BLAS) 1 12 .056 0.16 IBM 3090-200/VF (1 proc.) VS Fortran V2(Rolled BLAS) 1.0 12 .057 0.17 Sperry 1100/90 ext w/ISP UCS level 2 1.1 11 .061 0.18 Alliant FX/8 (8CEs) FX Fortran v2.0.19(Rolled BLAS) 1.6 7.6 .090 0.26 SCS-40 CFT 1.13(Rolled BLAS) 1.7 7.3 .095 0.28 IBM 3090-200 (1 proc.) VS opt=3 1.8 6.8 .102 0.30 Fujitsu M-380 Fortran 77, opt=3 1.9 6.3 .109 0.32 FPS-264 F02 APFTN64 OPT=4(Rolled BLAS) 2.2 5.6 .122 0.35 NAS 9060 VS opt=2 2.3 5.3 .130 0.38 Honeywell DPS90 ES F77V 1.0(Rolled BLAS) 2.5 5.0 .138 0.40 CDC Cyber 875 FTN 5 opt=3 2.6 4.8 .143 0.42 CDC Cyber 176 FTN 5.1 opt=2 2.6 4.6 .148 0.43 Amdahl 5860 HSFPF H enhanced opt=3 3.1 3.9 .176 0.51 Amdahl 5860 HSFPF VS opt=3 3.2 3.8 .181 0.53 CDC 7600 FTN 3.8 3.3 .210 0.61 FPS-264/20 F02 APFTN64 OPT=4(Rolled BLAS) 4.1 3.0 .229 0.67 CONVEX C-1 Fortran 1.6(Rolled BLAS) 4.2 2.9 .235 0.69 CDC Cyber 760 FTN 5, opt=3 4.7 2.6 .260 0.76 IBM 370/195 H enhanced opt=3 4.9 2.5 .275 0.80 IBM 3081 K (1 proc.) H enhanced opt=3 5.7 2.1 .321 0.94 CDC Cyber 175 FTN 5 opt=2 5.8 2.1 .322 0.94 CDC Cyber 180-860 NOS/VE OPT=HIGH 5.8 2.1 .324 0.94 IBM 3081 K (1 proc.) 
VS opt=3 6.2 2.0 .346 1.01 CDC 7600 Local 6.4 2.0 .359 1.05 Sperry 1100/90 FTN opt=ZEO 6.7 1.8 .374 1.09 CDC Cyber 175 FTN 5 opt=1 6.8 1.8 .381 1.11 IBM 3033 H enhanced opt=3 7.0 1.7 .390 1.14 Honeywell DPS 8/88 FR7X 7.0 1.7 .394 1.15 IBM 3033 VS opt=3 7.1 1.7 .396 1.15 FPS-164/364 F02 APFTN64 OPT=4(Rolled BLAS) 7.2 1.7 .405 1.18 Sperry 1100/90 ext UFTN 7.3 1.7 .408 1.19 IBM 3081 D VS opt=3 7.4 1.7 .415 1.21 Alliant FX/1 (1CE) FX Fortran v2.0.19(Rolled BLAS) 7.5 1.6 .420 1.22 Amdahl 470 V/8 H enhanced opt=3 7.7 1.6 .429 1.25 CDC Cyber 180-850 NOS/VE OPT=HIGH 7.8 1.6 .436 1.27 Amdahl 470 V/8 VS opt=3 8.2 1.5 .458 1.33 CDC 7600 CHAT, No opt 9.9 1.2 .554 1.61 IBM 4381-13 VS 1.4.0 opt=3 10 1.2 .579 1.69 IBM 370/168 Fast Mult H Ext 10 1.2 .579 1.69 Amdahl 470 V/6 H opt=2 11 1.1 .631 1.84 ELXSI FTN MOD 2 12 1.1 .643 1.87 DEC VAX 8800 VMS V4.3 12 .99 .690 2.01 CDC Cyber 180-840 NOS/VE OPT=HIGH 12 .99 .694 2.02 IBM 4381 MG2 VS opt=3 13 .96 .718 2.09 IBM 4381-12 VS 1.4.0 opt=3 13 .95 .726 2.12 IBM 370/165 Fast Mult H Ext 16 .77 .890 2.59 DEC VAX 8650 VMS v4.1 17 .70 .975 2.84 DEC VAX 8500 VMS v4 19 .65 1.05 3.07 Sperry 1100/80 w/SAM FTN opt=ZEO 21 .58 1.18 3.45 Harris H1200 VOS 4.1 opt g 22 .56 1.22 3.57 DEC VAX 8600 VMS v4.1 Fortran 4.2 25 .49 1.41 4.11 Harris HCX-7 w/fpp f77 1.0 26 .48 1.43 4.17 CDC 6600 FTN 4.6 opt=2 26 .48 1.44 4.19 CDC Cyber 170-835 FTN 5 opt=2 26 .47 1.45 4.22 CCI Power 6/32 w/fpa UNIX 4.2 bsd f77 26 .47 1.45 4.22 Sperry 7000 4.2 26 .47 1.47 4.27 Gould PN9000 UNIX 26 .47 1.47 4.29 IBM 4381 MG1 VS opt=3 27 .46 1.49 4.35 CDC Cyber 170-835 FTN 5 opt=1 28 .44 1.57 4.58 Harris H1000 VOS 3.3 opt g 30 .41 1.67 4.86 SUN-3/160M + FPA f77 -O -ffpa 3.1 30 .40 1.70 4.95 IBM 4381-11 VS 1.4.0 opt=3 31 .39 1.76 5.12 DEC VAX 8600 VMS v3.6 Fortran 3.4 32 .38 1.78 5.18 NORSK DATA ND-570/2 Fortran-500-E 32 .38 1.80 5.24 Sperry 1100/80 FTN opt=ZEO 32 .38 1.80 5.24 CONCEPT 32/8750 UTX/32 34 .36 1.88 5.48 Celerity C1230 UNIX 4.2 bsd f77 34 .36 1.90 5.53 CDC 6600 RUN 34 .36 
1.93 5.62 Gould PN9080 UTX/32 35 .35 1.97 5.73 Prime 9950 F77 19.4.2 36 .34 2.00 5.83 Data General MV/10000 f77 opt level 2 40 .30 2.26 6.58 IBM 4361 MG5 VS opt=3 41 .30 2.31 6.73 IRIS 2400 Turbo/FPA f77 50 .24 2.82 8.21 CDC Cyber 180-830 NOS/VE OPT=HIGH 52 .24 2.89 8.41 Harris 800 Fortran 77 53 .23 2.99 8.70 IBM 370/158 H opt=3 53 .23 2.99 8.71 IBM 370/158 VS opt=3 56 .22 3.15 9.17 NORSK DATA ND-560 Fortran-500 57 .22 3.20 9.32 Celerity C1200 UNIX 4.2 bsd f77 58 .21 3.23 9.42 Honeywell DPS 8/70 FR7X 58 .21 3.24 9.43 Denelcor HEP f77 UPX 59 .21 3.32 9.68 CDC Cyber 170-720 FTN 5, opt=2 62 .20 3.47 10.1 VAX 11/785 FPA VMS v4.1 63 .20 3.50 10.2 Itel AS/5 mod 3 H 63 .19 3.54 10.3 NORSK DATA ND-500 Fortran-500-E 63 .19 3.54 10.3 CDC Cyber 170-825 FTN 5, opt=2 65 .19 3.63 10.6 IBM 4341 MG10 VS opt=3 66 .19 3.70 10.8 VAX 11/785 FPA UNIX 4.2 bsd f77 67 .18 3.75 10.9 CDC Cyber 170-825 FTN 5, opt=1 68 .18 3.81 11.1 CDC Cyber 170-720 FTN 5, opt=1 70 .17 3.93 11.4 Ridge 32/130 ROS 3.3/RISC 71 .17 3.96 11.5 Perkin Elmer 3252 OS 6.2.4 fortran z 71 .17 4.00 11.7 CDC Cyber 180-810 NOS/VE OPT=HIGH 74 .17 4.14 12.1 Perkin Elmer 3242 OS 32 v7.2 f77 80 .16 4.50 13.1 DEC VAX 8200 VMS V4.3 80 .15 4.49 13.8 ICL 2988 f77 OPT=2 85 .14 4.78 13.9 VAX 11/780 FPA VMS v4.1 89 .14 4.96 14.4 micro VAX II VMS v4.1 97 .13 5.45 15.9 VAX 11/750 FPA VMS v4.1 99 .12 5.52 16.1 CONCEPT 32/6750 UTX/32 99 .12 5.53 16.1 VAX 11/780 FPA UNIX 4.2 BSD f77 101 .13 5.67 16.5 CDC 6500 FUN 102 .12 5.69 16.6 Prime 750 Primos f77 v19.1 107 .11 6.00 17.4 Definicon DSI-780 SVS Fortran (MSDOS) 109 .113 6.09 17.7 Perkin Elmer 3230 OS 6.2.2 fortran 5.2 112 .11 6.28 18.3 VAX 11/750 FPA UNIX 4.2 bsd f77 128 .096 7.15 20.8 Prime 850 Primos 130 .095 7.26 21.1 Sperry 1100/60 FTN opt=ZEO 132 .093 7.38 21.5 Pyramid 90X FPA UNIX 4.2 bsd f77 137 .088 7.65 22.3 Ridge 32/110 ROS 3.3/RISC 151 .081 8.48 24.7 SUN-3/75 w/68881 f77 -O -f68881 3.0 155 .079 8.67 25.2 Data General MV/8000 f77 opt level 2 157 .078 8.80 25.6 Apollo DN460/660 
AEGIS 8.0 FTN 179 .069 10.0 29.2 HP 9000 Series 320 HP-UX, f77 5.15 195 .063 10.9 31.9 Apollo DN3000 AEGIS 8.0 FTN 197 .062 11.4 32.1 Masscomp MC500 w/FPP 3.1 Fortran 200 .061 11.2 32.6 Harris HS-20 w/FPP Fortran 77 3.1 202 .061 11.3 33.0 Sequent Balance 8000 DYNIX Fortran 2.4.4 208 .059 11.7 33.9 Definicon DSI-32/10 GreenHills f77 (MSDOS) 214 .057 12.0 34.9 VAX 11/750 VMS v4.1 215 .057 12.1 35.1 HP 9000 Series 500 Fortran 1.7 285 .043 16.0 46.6 ENCORE MULTIMAX f77 298 .041 16.7 48.7 ATT 3B20 FP UNIX V 2.0/4 310 .040 17.3 50.5 Acorn Cambridge fortran 312 .039 17.5 51.0 IBM 4331 MG2 H opt=3 326 .038 18.3 53.2 Burroughs B6800 Fortran 77 ver 34 329 .037 18.4 53.7 VAX 11/725 FPA VMS v4.1 330 .037 18.5 53.9 Masscomp MCS-541 w/FPB Fortran 3.1 332 .037 18.6 54.1 IBM RT PC Model 20 f77 341 .036 19.1 55.6 VAX 11/730 FPA VMS 348 .036 19.5 56.9 Prime 2250 Fortran 77 365 .034 20.5 59.6 IBM PC-AT/370 VS opt=3 369 .033 20.7 60.2 IBM PC-XT/370 H opt=3 391 .031 21.9 63.7 VAX 11/750 UNIX 4.2 bsd f77 422 .029 23.7 69.0 Apollo DN320 AEGIS 8.0 FTN 440 .028 24.6 71.8 SUN-2/50 + SKY FFP f77 -O -fsky 3.0 454 .027 25.4 74.0 Apollo DN550 FPA AEGIS 8.0 FTN 489 .025 27.4 79.8 micro VAX I VMS 529 .023 29.6 86.2 Canaan VS 588 .021 33.0 96.0 Chas. River Data 6835+SKY SVS Fortran 77 700 .018 39.2 114. Apollo DN 420 PEB AEGIS 7+ FTN 707 .017 39.6 115. IBM AT w/80287 PROFORT 1.0 1054 .012 59.1 172. IBM PC w/8087 PROFORT 1.0 1054 .012 59.1 172. Cadtrak DS1/8087 Intel Fortran 77 1143 .011 64.0 186. IBM PC/AT w/80287 Microsoft 3.2 1341 .0091 75.1 219. Chas. River Data 6835 SVS Fortran 77 1401 .0088 78.5 229. Apollo DN300 AEGIS 8.0 FTN 1719 .0071 96.4 281. Masscomp MC500 3.1 Fortran 1751 .0070 98.1 286. IBM PC w/8087 Microsoft 3.2 1766 .0069 98.9 288. HP 9000 Series 200 HP-UX 1982 .0062 111. 323 SUN-2/50 f77 -O -fsoft 3.0 2232 .0055 125. 363. SUN UNIX, f77 no opt 2661 .0046 149. 434. Apple Macintosh ABSOFT 2.0b 3196 .0038 179. 521. 
.TE .sp 2 .ds LH " Full Precision .ds CH - % - .ds RH Coded BLAS .bp .B .ce Solving a System of Linear Equations .sp .ce with LINPACK @ nothing sup a @ in Full Precision @ nothing sup b @ Using Coded BLAS .sp .R .ps 10 .TS H center; l l c c c c l l c c c c l l c c c c. Computer OS/Compiler@ nothing sup c @ Ratio@ nothing sup d @ MFLOPS@ nothing sup e @ Time Unit@ nothing sup f @ secs @ mu @secs _ .TH CRAY X-MP-2 (1 proc.) CFT 1.14(Coded BLAS) .22 57 .012 0.035 Amdahl 1200 Fortran 77(Coded BLAS) .45 27 .025 0.073 CDC Cyber 205 (4-pipe) FTN200 2.1.6(Coded BLAS) .50 25 .028 0.081 CDC Cyber 205 (2-pipe) FTN200 2.1.6(Coded BLAS) .53 23 .029 0.086 CRAY-1S CFT(Coded BLAS) .54 23 .030 0.088 Amdahl 1100 Fortran 77(Coded BLAS) .54 23 .030 0.087 Amdahl 500 Fortran 77(Coded BLAS) .60 20 .034 0.098 IBM 3090 Model 200/VF VS Fortran V2(Coded BLAS) .83 15 .047 0.14 NAS 9160 VS opt=3(Coded BLAS) .94 13 .053 0.15 FPS-264 F.01 F77(Coded BLAS) 1.2 10 .066 0.19 Alliant FX/8 (8CEs) FX Fortran v2.0.19(Coded BLAS) 1.3 9.8 .070 0.20 CDC Cyber 875 FTN 5 opt=3(Coded BLAS) 2.2 5.5 .124 0.36 FPS-264/20 F02 APFTN64 OPT=4(Coded BLAS) 2.6 4.6 .148 0.43 CDC 7600 FTN(Coded BLAS) 2.6 4.6 .148 0.43 CDC Cyber 760 FTN 5, opt=3(Coded BLAS) 3.3 3.7 .186 0.54 CONVEX C-1 Fortran 1.6(Coded BLAS) 3.7 3.3 .209 0.61 FPS-164/364 F.01 D, opt=3(Coded BLAS) 4.2 2.9 .232 0.68 Alliant FX/1 (1CE) FX Fortran v2.0.19(Coded BLAS) 6.2 2.0 .348 1.01 IBM 4381-13 M/A asst VS 1.4.0 opt=3(Coded BLAS) 7.7 1.6 .430 1.25 ELXSI FTN MOD 2(Coded BLAS) 8.7 1.4 .485 1.41 IBM 4381-12 M/A asst VS 1.4.0 opt=3(Coded BLAS) 9.6 1.3 .540 1.57 IBM 4381 MG2 M/A asst VS opt=3(Coded BLAS) 10 1.2 .559 1.63 DEC VAX 8800 VMS v4(Coded BLAS) 11 1.13 .606 1.76 DEC VAX 8650 VMS v4.1(Coded BLAS) 13 .96 .715 2.08 Harris H1200 VOS 4.1 opt g(Coded BLAS) 14 .85 .805 2.34 Gould PN9000 UNIX(Coded BLAS) 15 .81 .850 2.48 NORSK DATA ND-570/2 Fortran-500-E(Coded BLAS) 16 .76 .900 2.62 DEC VAX 8500 VMS v4(Coded BLAS) 16 .76 .900 2.62 IBM 4381 MG1 M/A asst VS 
opt=3(Coded BLAS) 19 .65 1.06 3.10 DEC VAX 8600 VMS v4.1(Coded BLAS) 19 .66 1.04 3.03 IBM 4381-11 M/A asst VS 1.4.0 opt=3(Coded BLAS) 21 .59 1.16 3.37 Celerity C1230 UNIX 4.2 bsd f77(Coded BLAS) 21 .59 1.17 3.40 Harris H1000 VOS 3.3 opt g(Coded BLAS) 22 .57 1.21 3.52 IRIS 2400 Turbo/FPA f77(Coded BLAS) 36 .34 2.04 5.93 Celerity C1200 UNIX 4.2 bsd f77(Coded BLAS) 38 .32 2.15 6.26 VAX 11/785 FPA VMS v4.1(Coded BLAS) 54 .23 3.01 8.77 DEC VAX 8200 VMS 4.3(Coded BLAS) 67 .18 3.75 10.9 VAX 11/780 FPA VMS v4.1(Coded BLAS) 74 .17 4.12 12.0 micro VAX II VMS v4.1(Coded BLAS) 79 .16 4.40 12.8 VAX 11/750 FPA VMS v4.1(Coded BLAS) 83 .15 4.64 13.5 Apollo DN460/660 AEGIS 8.0 FTN(Coded BLAS) 111 .11 6.20 18.1 Masscomp MC500 w/FPP 3.1 Fortran(Coded BLAS) 132 .093 7.42 21.6 Sequent Balance 8000 DYNIX Fortran 2.4.4(Coded BLAS) 185 .066 10.4 30.2 Apollo DN320 AEGIS 8.0 FTN(Coded BLAS) 235 .052 13.2 38.3 Apollo DN550 FPA AEGIS 8.0 FTN(Coded BLAS) 262 .047 14.7 42.7 VAX 11/725 FPA VMS v4.1(Coded BLAS) 283 .043 15.8 46.2 VAX 11/730 FPA VMS(Coded BLAS) 286 .043 16.0 46.6 Apollo DN300 AEGIS 8.0 FTN(Coded BLAS) 1721 .0071 96.4 281. .TE .ds LH " Full Precision .ds CH - % - .ds RH Compiler Directives .bp .sp 2 .PP With some of the parallel and vector computers it is possible to take advantage of the multiprocessors and special hardware through Fortran. For example, on the CRAY X-MP and the Alliant FX/8, a simple compiler directive, in the form of a Fortran comment, allows one to gain access to the multiprocessing capabilities of these machines. .TS center; l l c c c c. 
Computer OS/Compiler@ nothing sup c @ Ratio@ nothing sup d @ MFLOPS@ nothing sup e @ Time Unit@ nothing sup f @ secs @ mu @secs _ CRAY X-MP-4 Comp Dir(Rolled BLAS) .25 49 .014 .0408 Fujitsu VP-200 Comp Dir(Rolled BLAS) .64 19 .040 .110 Alliant FX/8 (8CEs) Comp Dir(Coded BLAS) 1.4 8.6 .08 .233 Alliant FX/8 (8CEs) Comp Dir(Rolled BLAS) 2.0 6.2 .11 .320 .TE .ds LH " Half Precision .ds CH - % - .ds RH All Fortran .bp .ps 14 .B .ce Solving a System of Linear Equations .sp .ce with LINPACK @ nothing sup a @ in Half Precision @ nothing sup b @ Using All Fortran .ps 10 .sp .R .TS H center; l l c c c c l l c c c c l l c c c c. Computer OS/Compiler@ nothing sup c @ Ratio@ nothing sup d @ MFLOPS@ nothing sup e @ Time Unit@ nothing sup f @ secs @ mu @secs _ .TH NEC SX-2 FORTRAN 77/SX(Rolled BLAS) .26 47 .015 0.043 NEC SX-1 FORTRAN 77/SX(Rolled BLAS) .31 40 .017 0.050 NEC SX-1E FORTRAN 77/SX(Rolled BLAS) .35 35 .020 0.057 Amdahl 1200 Fortran 77(Rolled BLAS) .67 18 .038 0.11 Fujitsu VP-200 Fortran 77(Rolled BLAS) .69 18 .039 0.11 Amdahl 1100 Fortran 77(Rolled BLAS) .73 17 .041 0.12 Fujitsu VP-100 Fortran 77(Rolled BLAS) .75 16 .042 0.12 Hitachi S-810/20 FORT77/HAP(Rolled BLAS) .78 16 .044 0.13 Amdahl 500 Fortran 77(Rolled BLAS) .81 15 .045 0.13 Fujitsu VP-50 Fortran 77(Rolled BLAS) .88 14 .049 0.14 Sperry 1100/90 ext w/ISP UCS level 2 .89 14 .050 0.15 IBM 3090 Model 200/VF VS Fortran V2(Rolled BLAS) .95 13 .053 0.16 Alliant FX/8 (8CEs) FX Fortran v2.0.19(Rolled BLAS) 1.6 7.6 .090 0.26 IBM 3090 Model 200 VS opt=3 1.7 7.1 .097 0.28 Fujitsu M-380 Fortran 77, opt=3 1.7 7.0 .098 0.28 Honeywell DPS90 ES F77V 1.0(Rolled BLAS) 1.8 6.7 .103 0.30 Amdahl 5860 HSFPF H enhanced opt=3 2.2 5.5 .125 0.36 NAS 9060 VS opt=2 2.4 5.2 .133 0.38 Amdahl 5860 HSFPF VS opt=3 2.4 5.1 .135 0.39 CONVEX C-1 Fortran 1.6(Rolled BLAS) 3.0 4.1 .168 0.49 Sperry 1100/90 FTN opt=ZEO 4.4 2.8 .248 0.72 Amdahl 470 V/8 H enhanced opt=3 4.4 2.8 .246 0.71 Amdahl 470 V/8 VS opt=3 4.5 2.7 .254 0.74 Sperry 1100/90 ext UFTN 
5.0 2.5 .279 0.81 IBM 3081 K H enhanced opt=3 5.1 2.4 .283 0.82 IBM 3081 K VS opt=3 5.6 2.2 .311 0.91 IBM 3033 VS Fortran 6.3 1.9 .353 1.03 Honeywell DPS 8/88 FR7X 6.6 1.8 .371 1.08 IBM 3081 D VS opt=3 6.7 1.8 .376 1.10 Alliant FX/1 (1CE) FX Fortran v2.0.19(Rolled BLAS) 7.8 1.6 .440 1.28 DEC VAX 8800 VMS 4.3 9.1 1.35 .510 1.48 DEC VAX 8650 VMS v4.1 9.7 1.3 .545 1.59 ELXSI FTN MOD 2 10 1.2 .570 1.66 IBM 4381-13 VS 1.4.0 opt=3 12 1.1 .644 1.88 Sperry 7000 4.2 13 .96 .717 2.09 CCI Power 6/32 w/fpa UNIX 4.2 bsd f77 13 .94 .733 2.14 Harris HCX-7 w/fpp f77 1.0 13 .94 .733 2.14 Numerix NMX-432 AR Fortran 14 .89 .769 2.24 DEC VAX 8600 VMS v4.1 Fortran 4.2 14 .88 .780 2.27 IBM 4381 MG2 VS opt=3 14 .86 .801 2.33 IBM 4381-12 VS 1.4.0 opt=3 14 .85 .805 2.34 Sperry 1100/80 w/SAM FTN opt=ZEO 15 .83 .830 2.41 DEC VAX 8500 VMS v4 15 .80 .859 2.50 Gould PN9000 UNIX 17 .69 1.0 2.83 Gould 9050 w MACC UNIX 4.2bsd f77 18 .70 .980 2.85 Harris H1200 VOS 4.1 opt g 18 .68 1.01 2.95 SUN-3/160M + FPA f77 -O -ffpa 3.1 20 .62 1.12 3.25 Gould PN9080 UTX/32 21 .57 1.20 3.49 DEC VAX 8600 VMS v3.6 Fortran 3.4 22 .57 1.21 3.54 Harris H1000 VOS 3.3 opt g 22 .57 1.21 3.52 CONCEPT 32/8750 UTX/32 23 .54 1.27 3.69 Sperry 1100/80 FTN opt=ZEO 24 .52 1.32 3.85 IBM 4381 MG1 VS opt=3 24 .51 1.33 3.89 Celerity C1230 UNIX 4.2 bsd f77 24 .51 1.35 3.93 NORSK DATA ND-570/2 Fortran-500-E 25 .50 1.38 4.02 IBM 4381-11 VS 1.4.0 opt=3 28 .43 1.58 4.61 IBM 4361 MG5 VS opt=3 29 .42 1.65 4.81 Data General MV/10000 f77 opt level 2 31 .39 1.75 5.09 IRIS 2400 Turbo/FPA f77 31 .40 1.72 5.02 DEC VAX 11/785 FPA VMS v4.1 31 .40 1.72 5.02 Prime 9950 F77 19.4.2 36 .34 2.00 5.82 Honeywell DPS 8/70 FR7X 40 .31 2.22 6.46 DEC VAX 11/785 FPA UNIX 4.2 bsd f77 40 .31 2.27 6.50 Celerity C1200 UNIX 4.2 bsd f77 40 .30 2.27 6.60 IBM 370/158 H opt=3 42 .29 2.35 6.86 Perkin Elmer 3252 OS 6.2.4 fortran z 45 .27 2.50 7.28 NORSK DATA ND-560 Fortran-500 45 .27 2.54 7.39 NORSK DATA ND-500 Fortran-500-E 46 .27 2.58 7.51 DEC KL-20 F20 46 .27 2.59 
7.53 IBM 370/158 VS opt=3 46 .26 2.60 7.58 Ridge 32/130 ROS 3.3/RISC 47 .26 2.64 7.70 DEC VAX 11/780 FPA VMS v4.1 49 .25 2.74 7.98 Sperry 1100/60 FTN opt=ZEO 49 .25 2.77 8.09 ICL 2988 f77 OPT=2 50 .25 2.79 8.13 Perkin Elmer 3242 OS 32 v7.2 f77 51 .24 2.88 8.37 Harris 800 Fortran 77 53 .23 2.99 8.70 DEC VAX 8200 VMS 4.3 55 .22 3.10 9.03 IBM 4341 MG10 VS opt=3 57 .22 3.18 9.25 DEC VAX 11/780 FPA UNIX 4.2 BSD f77 58 .21 3.25 9.47 Honeywell 6080 Y 62 .20 3.46 10.1 Pyramid 90X FPA UNIX 4.2 bsd f77 63 .20 3.50 10.2 CONCEPT 32/6750 UTX/32 65 .19 3.63 10.6 Ridge 32/110 ROS 3.3/RISC 67 .18 3.73 10.9 DEC VAX 11/750 FPA VMS v4.1 67 .18 3.75 10.9 Data General MV/8000 f77 opt level 2 69 .18 3.84 11.2 DEC micro VAX II VMS v4.1 70 .17 3.95 11.5 DEC VAX 11/780 VMS v4.1 74 .17 4.13 12.0 Perkin Elmer 3230 OS 6.2.2 fortran 5.2 76 .16 4.26 12.4 Prime 750 Primos f77 v19.1 89 .14 5.00 14.6 DEC VAX 11/750 FPA UNIX 4.2 bsd f77 91 .13 5.12 14.9 Definicon DSI-780 SVS Fortran (MSDOS) 94 .13 5.27 15.3 Prime 850 Primos 97 .13 5.41 15.8 Masscomp MC500 w/FPP 3.1 Fortran 111 .11 6.23 18.2 Harris HS-20 w/FPP Fortran 77 3.1 114 .11 6.38 18.6 Apollo DN460/660 AEGIS 8.0 FTN 118 .10 6.60 19.3 HP 9000 Series 500 Fortran 1.7 125 .098 7.00 20.4 Pyramid 90X UNIX 4.2 bsd f77 134 .092 7.50 21.8 DEC VAX 11/750 VMS v4.1 138 .089 7.71 22.5 IBM 4331 MG2 H opt=3 140 .088 7.84 22.8 SUN-3/75 w/68881 f77 -O -f68881 3.0 145 .084 8.15 23.7 Sequent Balance 8000 DYNIX Fortran 2.4.4 162 .075 9.10 26.5 HP 9000 Series 320 HP-UX, f77 5.15 164 .075 9.18 26.7 Apollo DN3000 AEGIS 8.0 FTN 174 .068 10.0 29.2 Definicon DSI-32/10 GreenHills f77 (MSDOS) 175 .070 9.82 28.6 DEC VAX 11/750 UNIX 4.1 bsd f77 204 .060 11.4 33.3 Masscomp MCS-541 w/FPB Fortran 3.1 227 .054 12.7 36.9 ATT 3B20 FP UNIX V 2.0/4 231 .053 12.9 37.7 ENCORE MULTIMAX f77 233 .052 13.1 38.1 Burroughs 6700 H 234 .052 13.1 38.2 DEC VAX 11/725 FPA VMS v4.1 236 .052 13.2 38.5 Acorn Cambridge fortran 250 .049 14.0 40.8 Prime 2250 Fortran 77 258 .048 14.5 42.1 DEC VAX 
11/730 FPA VMS 259 .047 14.5 42.2 SUN-2/50 + SKY FFP f77 -O -fsky 3.0 266 .046 14.9 43.4 micro VAX I VMS 272 .045 15.2 44.4 IBM PC-AT/370 VS opt=3 279 .044 15.6 45.5 Apollo DN320 AEGIS 8.0 FTN 277 .044 15.5 45.2 Chas. River Data 6835+SKY SVS Fortran 77 284 .043 15.9 46.3 IBM PC-XT/370 H opt=3 303 .040 17.0 49.5 DEC KA-10 F40 305 .040 17.1 49.8 Apollo DN550 FPA AEGIS 8.0 FTN 305 .040 17.1 49.9 Canaan VS 306 .040 17.1 49.9 IBM RT PC Model 20 f77 368 .033 20.6 60.0 Apollo DN 420 PEB AEGIS 7+ FTN 394 .031 22.1 64.3 Chas. River Data 6835 SVS Fortran 77 770 .016 43.1 126. IBM AT w/80287 PROFORT 1.0 891 .013 49.8 145. Cadtrak DS1/8087 Intel Fortran 77 893 .013 50.0 146. IBM PC w/8087 PROFORT 1.0 906 .013 50.8 148. Apollo DN300 AEGIS 8.0 FTN 944 .013 53.0 154. SUN-2/50 f77 -O -fsoft 3.0 983 .013 55.0 160. Masscomp MC500 3.1 Fortran 1037 .012 58.1 169. IBM PC w/8087 Microsoft 3.1 1071 .011 60.0 175. HP 9000 Series 200 HP-UX 1196 .010 67.0 195. SUN UNIX, f77 no opt 1298 .0094 72.7 212. Apple Macintosh ABSOFT 2.0b 1723 .0071 96.5 281. Tandy 2000 Microsoft 3.13 3763 .0033 211. 614. IBM PC Microsoft 3.1 21875 .00056 1225. 3568. Apple III Pascal 50232 .00024 2813. 8193. .TE .sp .ds LH " Half Precision .ds CH - % - .ds RH Coded BLAS .B .ce Solving a System of Linear Equations .sp .ce with LINPACK @ nothing sup a @ in Half Precision @ nothing sup b @ Using Coded BLAS .ps 10 .sp .R .TS H center; l l c c c c l l c c c c l l c c c c. 
Computer OS/Compiler@ nothing sup c @ Ratio@ nothing sup d @ MFLOPS@ nothing sup e @ Time Unit@ nothing sup f @ secs @ mu @secs _ .TH Amdahl 1200 Fortran 77(Coded BLAS) .45 28 .025 0.073 CDC Cyber 205 (4-pipe) FTN200 2.1.6(Coded BLAS) .46 27 .026 0.075 Amdahl 1100 Fortran 77(Coded BLAS) .50 24 .028 0.082 CDC Cyber 205 (2-pipe) FTN200 2.1.6(Coded BLAS) .53 23 .030 0.077 Amdahl 500 Fortran 77(Coded BLAS) .57 22 .032 0.093 NAS 9160 VS opt=3 (Coded BLAS) .70 17 .039 0.11 Alliant FX/8 (8CEs) FX Fortran v2.0.19(Coded BLAS) 1.3 9.8 .070 0.20 CONVEX C-1 Fortran 1.6(Coded BLAS) 2.5 4.9 .139 0.41 Numerix NMX-432 AR Fortran(Coded BLAS) 3.5 3.5 .199 0.58 Alliant FX/1 (1CE) FX Fortran v2.0.19(Coded BLAS) 6.1 2.0 .340 0.99 DEC VAX 8650 VMS v4.1(Coded BLAS) 6.4 1.9 .361 1.05 DEC VAX 8800 VMS v4(Coded BLAS) 7.4 1.7 .416 1.21 ELXSI FTN MOD 2(Coded BLAS) 7.5 1.6 .418 1.22 Mitsubishi MX/3000 w/sp Fortran 77(Coded BLAS) 7.9 1.6 .445 1.29 Gould PN9000 UNIX(Coded BLAS) 9.2 1.3 .520 1.50 DEC VAX 8600 VMS v4.1(Coded BLAS) 9.8 1.3 .546 1.59 NORSK DATA ND-570/2 Fortran-500-E(Coded BLAS) 12 1.0 .678 1.98 Harris H1200 VOS 4.1 opt g(Coded BLAS) 12 .99 .693 2.01 DEC VAX 8500 VMS v4(Coded BLAS) 13 .96 .717 2.09 Harris H1000 VOS 3.3 opt g(Coded BLAS) 15 .83 .825 2.40 Celerity C1230 UNIX 4.2 bsd f77(Coded BLAS) 17 .72 .950 2.77 IRIS 2400 Turbo/FPA f77(Coded BLAS) 23 .54 1.26 3.68 DEC VAX 11/785 FPA VMS v4.1(Coded BLAS) 24 .51 1.34 3.91 Celerity C1200 UNIX 4.2 bsd f77(Coded BLAS) 28 .44 1.55 4.52 DEC VAX 11/780 FPA VMS v4.1(Coded BLAS) 36 .34 2.02 5.88 DEC VAX 8200 VMS 4.3(Coded BLAS) 42 .29 2.38 6.93 DEC VAX 11/750 FPA VMS v4.1(Coded BLAS) 51 .24 2.83 8.24 DEC micro VAX II VMS v4.1(Coded BLAS) 54 .23 3.04 8.81 Masscomp MC500 w/FPP 3.1 Fortran(Coded BLAS) 72 .17 4.02 11.7 Apollo DN460/660 AEGIS 8.0 FTN(Coded BLAS) 99 .12 5.60 16.2 Sequent Balance 8000 DYNIX Fortran 2.4.4(Coded BLAS) 148 .083 8.31 24.2 Apollo DN320 AEGIS 8.0 FTN(Coded BLAS) 150 .082 8.40 24.5 Apollo DN550 FPA AEGIS 8.0 FTN(Coded 
BLAS) 162 .076 9.10 26.4 DEC VAX 11/725 FPA VMS v4.1(Coded BLAS) 186 .066 10.4 30.4 DEC VAX 11/730 FPA VMS(Coded BLAS) 205 .060 11.5 33.4 COMPAQ PC/8087 Microsoft 3.13(Coded BLAS) 591 .021 33.1 96.5 Apollo DN300 AEGIS 8.0 FTN(Coded BLAS) 924 .013 51.8 151.
.TE
.sp
.sp
.PP
@ nothing sup a @ LINPACK routines @SGEFA@ and @SGESL@ were used for single precision, and routines @DGEFA@ and @DGESL@ were used for double precision.
These routines perform standard @LU@ decomposition with partial pivoting and back substitution.
.sp
.PP
@ nothing sup b @\f2Full Precision\f1 implies the use of (approximately) 64-bit arithmetic, e.g. CDC single precision or IBM double precision.
\f2Half Precision\f1 implies the use of (approximately) 32-bit arithmetic, e.g. IBM single precision.
.sp
.PP
@ nothing sup c @\f2OS/Compiler\f1 refers to the operating system and compiler used; (Coded BLAS) refers to the use of assembly language coding of the BLAS, and (Rolled BLAS) refers to a Fortran version with simple, single-statement loops [2].
.sp
.PP
@ nothing sup d @\f2Ratio\f1 is the execution time of a particular machine configuration relative to that of the CRAY-1S using a Fortran coding for the BLAS in full precision; values below 1 indicate a faster machine, values above 1 a slower one.
.sp
.PP
@ nothing sup e @\f2MFLOPS\f1 is a rate of execution, the number of millions of floating point operations completed per second.
For solving a system of @n@ equations, approximately @ 2/3 n sup 3 ~+~ 2 n sup 2 @ operations are performed (we count both additions and multiplications).
.sp
.PP
@ nothing sup f @\f2Unit\f1 is the time in microseconds required to execute the statement @ y sub i ~ =~ y sub i ~+~ t*x sub i @.
This involves one floating point multiplication, one floating point addition, and a few one-dimensional indexing operations and storage references.
The actual statement occurs in SAXPY, which is called roughly @n sup 2 /2 @ times by SGEFA and @2n@ times by SGESL with vectors of varying lengths.
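.sp
As a rough illustration of this statement (a sketch only, not the LINPACK or BLAS source), the update can be written in Python in a rolled form and in an unrolled form. The function names saxpy_rolled and saxpy_unrolled, the unrolling depth of four, and the sample vectors are assumptions made for this example.
.DS L
```python
def saxpy_rolled(n, t, x, y):
    """Rolled form: one simple loop, y[i] = y[i] + t*x[i]."""
    for i in range(n):
        y[i] = y[i] + t * x[i]  # one multiplication and one addition
    return y


def saxpy_unrolled(n, t, x, y):
    """Unrolled form (a depth of 4 is an assumption made for this sketch)."""
    m = n % 4
    for i in range(m):  # clean-up loop for the n mod 4 leftover elements
        y[i] = y[i] + t * x[i]
    for i in range(m, n, 4):  # main loop: four updates per pass
        y[i] = y[i] + t * x[i]
        y[i + 1] = y[i + 1] + t * x[i + 1]
        y[i + 2] = y[i + 2] + t * x[i + 2]
        y[i + 3] = y[i + 3] + t * x[i + 3]
    return y


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(saxpy_rolled(6, 2.0, x, [10.0] * 6))
    print(saxpy_unrolled(6, 2.0, x, [10.0] * 6))
```
.DE
Both forms apply the same multiplication and addition to each element, so they return identical results; they differ only in loop overhead and in how easily a vectorizing compiler can recognize the loop.
.sp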
The statement is executed approximately @ n sup 3 over 3 ~ +~ n sup 2 @ times.
Thus for @n ~=~ 100@,
.EQ I
Unit ~=~ { 10 sup 6 Time } / ( { 100 sup 3 over 3 ~ +~ 100 sup 2 } ) .
.EN
.sp 2
.PP
The execution times for the LINPACK benchmark in Tables 1 and 2 were gathered in the following way:
The LINPACK code (SGEFA and SGESL in single precision and DGEFA and DGESL in double precision) was not modified in any way.
In particular, no changes to the source statements and no changes or additions to the comments were allowed.
Only the BLAS were allowed to change, in two ways.
First, in some cases, the Fortran version of the BLAS was replaced by an assembly language implementation for the specific machine; these cases are indicated by the annotation "(Coded BLAS)".
Second, where a Fortran implementation of the BLAS was used on vector machines, the loops were usually rolled; these cases are indicated by the annotation "(Rolled BLAS)".
(In the Fortran implementation of the BLAS the loops are unrolled to provide better performance on scalar machines.
On vector machines this technique usually results in poor performance since the compiler cannot recognize the vector loop.)
.PP
The same matrix was used in every run to solve the system of equations.
The results were checked for accuracy by calculating a residual for the problem, @ || ~ Ax ~-~ b ~ || / (||A|| ||x|| ) @.
The timing program is available on request.
.PP
Anyone interested in adding to or updating this table is encouraged to contact the author.
Please send suggestions and interesting results to:
.sp
.nf
Jack J. Dongarra
Mathematics and Computer Science Division
Argonne National Laboratory
Argonne, Illinois 60439
.EQ
delim ##
.EN
ARPAnet: DONGARRA@ANL-MCS.ARPA
.SH
References
.sp
.IP [1]
.R
J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart,
.I
LINPACK Users' Guide,
.R
SIAM Publications, Philadelphia, 1979.
.sp
.IP [2]
.R
C. Lawson, R. Hanson, D. Kincaid, and F.
Krogh, "Basic Linear Algebra Subprograms for Fortran Usage,"
.I
ACM Trans. Math. Software,
.R
Vol. 5, No. 3, 1979, pp. 308-371.
.sp
.IP [3]
.R
J. J. Dongarra and S. C. Eisenstat, "Squeezing the Most out of an Algorithm in CRAY Fortran,"
.I
ACM Trans. Math. Software,
.R
Vol. 10, No. 3, 1984, pp. 221-230.
.sp
.ds LH " Matrix Vector
.ds CH - % -
.ds RH
.bp
.SH
APPENDIX
.sp
.EQ
delim @@
.EN
.B
.ps 14
.ce
Performance of Large Scientific Computers
.sp
.ce
in a Fortran Environment
.sp 2
.PP
The LINPACK routines used to generate the timings in the previous tables do not reflect the true performance of "advanced scientific computers".
A different implementation of the solution of linear equations, presented in a report by Dongarra and Eisenstat [3], better describes the performance on such machines.
That algorithm is based on matrix-vector operations rather than just vector operations.
This produces a program with a higher level of modularity, or larger granularity, and the potential for better performance across a wide range of machines, especially on high-performance computers.
The number of floating point operations required and the roundoff errors produced by both algorithms are exactly the same; only the way in which the matrix elements are accessed is different.
As before, a Fortran program was run and the time to complete the solution of equations for a matrix of order 300 is reported.
.PP
Note that these numbers are for a problem of order 300 and all runs are for full precision.
.PP
The table was compiled over a period of time.
Subsequent software and hardware changes to a computer system may alter the timing to some extent.
.ps
.sp 2
.B
.ce
Solving a System of Linear Equations
.ce
Using the Vector Unrolling Technique
.sp 2
.R
.ps 10
.TS H
center;
l l c c c
l l c c c
l l c c c.
Computer OS/Compiler@ nothing sup a @ MFLOPS@ nothing sup b @ Time Unit@ nothing sup c @ secs @ mu @secs _ .TH CRAY X-MP-4 @"" sup \(sc @ CFT(Coded MV routines) 480 .038 .0042 CRAY X-MP-4 @"" sup \(sc @ CFT(Coded ISAMAX) 356 .051 .0056 NEC SX-2 FORTRAN 77/SX 309 .057 .0064 CRAY X-MP-2 @"" sup \(dd @ CFT(Coded ISAMAX) 257 .076 .0083 Amdahl 1200 Fortran 77(Coded MV routines) 230 .078 .0086 Fujitsu VP-200 Fortran 77(Comp directive) 220 .083 .0091 Amdahl 1200 Fortran 77(Comp directive) 215 .084 .0092 NEC SX-1 FORTRAN 77/SX 207 .087 .0096 Fujitsu VP-200 Fortran 77 183 .099 .011 Amdahl 1200 Fortran 77 180 .100 .011 CRAY X-MP-1 @"" sup \(dg @ CFT(Coded MV routines) 171 .106 .0117 Amdahl 1100 Fortran 77(Coded MV routines) 167 .108 .0119 CRAY X-MP-2 @"" sup \(dd @ CFT 161 .113 .012 Amdahl 1100 Fortran 77(Comp directive) 159 .113 .0124 Fujitsu VP-100 Fortran 77(Comp directive) 159 .113 .0124 Hitachi S-810/20 FORT77/HAP 158 .115 .013 Amdahl 1100 Fortran 77 142 .127 .014 Fujitsu VP-100 Fortran 77 139 .129 .014 NEC SX-1E FORTRAN 77/SX 140 .126 .014 CRAY X-MP-1 @"" sup \(dg @ CFT(Coded ISAMAX) 134 .136 .015 CRAY X-MP-1 @"" sup \(dg @ CFT 106 .172 .019 Amdahl 500 Fortran 77(Coded MV routines) 102 .176 .019 Fujitsu VP-50 Fortran 77(Comp directive) 100 .180 .02 Amdahl 500 Fortran 77(Comp directive) 99 .182 .02 CRAY-2 (1 proc.) 
CFT 2.70 93 .195 .022 Amdahl 500 Fortran 77 84 .214 .024 Fujitsu VP-50 Fortran 77 84 .214 .024 CRAY 1-M CFT(Coded ISAMAX) 83 .215 .024 CRAY 1-S CFT(Coded ISAMAX) 76 .236 .026 CRAY 1-M CFT 69 .259 .029 CRAY 1-S CFT 66 .273 .030 Sperry 1100/90 ext w/ISP UFTN(Coded MV routines) 39 .455 .051 FPS-264 F02, F77(Coded MV routines) 33 .550 .061 CDC Cyber 205 ftn 200 opt=1(Coded MV routines) 31 .59 .065 Sperry 1100/90 ext w/ISP UFTN 29 .618 .069 IBM 3090 Model 200/VF VS Fortran V2(Coded MV routines) 27 .673 .074 SCS-40 CFT 1.13 26 .683 .075 FPS-164/364 + 4 MAX E, F77 (Coded MV routines) 26 .70 .078 FPS-164/364 + 3 MAX E, F77 (Coded MV routines) 24 .77 .085 NAS 9160 VS opt=3(Coded MV routines) 20 .88 .097 FPS-164/364 + 2 MAX E, F77 (Coded MV routines) 20 .89 .098 FPS-264 F02 APFTN64 OPT=4 20 .90 .101 IBM 3090 Model 200/VF VS Fortran V2 18 1.03 .114 FPS-164/364 + 1 MAX E, F77 (Coded MV routines) 15 1.8 .130 Alliant FX/8 (8CEs) FX Fortran(Coded MV routines) 14 1.3 .140 CONVEX C-1 Fortran 1.6(Coded MV routines) 14 1.3 .140 FPS-264/20 F02 APFTN64 OPT=4 11 1.6 .180 CONVEX C-1 Fortran 1.6 8.7 2.1 .230 FPS 164/364 E, opt=3(Coded MV routines) 8.7 2.1 .231 Alliant FX/8 (8CEs) FX Fortran 7.3 2.5 .275 NAS 9060 VS opt=2 6.9 2.6 .285 FPS-164/364 F02 APFTN64 OPT=4 5.1 3.5 .391 IBM 370/195 VS opt=2 4.4 4.1 .455 IBM 3033 VS opt=2 2.5 7.1 .800 VAX 11/780 FPA UNIX xf77 .11 177. 19.5 .TE .sp .I Comments: .R .in +2. @"" sup \(sc @ These timings are for four processors with manual changes to use parallel features. .sp @"" sup \(dd @ These timings are for two processors with manual changes to use parallel features. .sp @"" sup \(dg @ These timings are for one processor of an X-MP-2. .sp The major difference between the CRAY 1-M and CRAY 1-S is in the memory speed, the CRAY 1-M having slower memory. The timings show the CRAY 1-M to be faster than the CRAY 1-S. 
After much discussion and examination of the generated assembly language code, it was determined that the CRAY 1-M was in fact faster for this program.
The code generated by the compiler causes the CRAY 1-S to miss a chain-slot.
On the CRAY 1-M, because of its slower memory, the chain-slot is not missed, which accounts for the faster execution time.
.in -2.
.sp
.PP
@ nothing sup a @\f2OS/Compiler\f1 refers to the operating system and compiler used; (Coded ISAMAX) refers to the use of assembly language coding of the BLAS ISAMAX; and \f2Comp Directive\f1 refers to the use of compiler directives in the matrix vector routines.
.sp
.sp
.PP
@ nothing sup b @\f2MFLOPS\f1 is a rate of execution: the number of millions of floating point operations completed per second.
For solving a system of @n@ equations, @ 2/3 n sup 3 ~+~ 2 n sup 2 @ operations are performed (we count both additions and multiplications).
.sp
.PP
@ nothing sup c @\f2Unit\f1 is the time in microseconds required to execute the statement @ y sub i ~ =~ y sub i ~+~ t*x sub i @.
This involves one floating point multiplication, one floating point addition, and a few one-dimensional indexing operations and storage references.
.sp
.fi
.ds LH " Peak Performance
.ds CH - % -
.ds RH
.bp
.ps 14
.ce
.B
Toward Peak Performance
.sp 2
.R
.PP
In response to many requests, we have collected the results of solving a system of equations of order 1000.
These results differ from the previous results listed in this paper in that the manufacturer is allowed to use any algorithm to solve the problem.
The only restriction is that a driver program (supplied by the author of this paper) be run to ensure that the same problem is being solved and to verify that the answer is correct.
.sp 2
.R
.ps 10
.TS H
center;
l l l
l n n.
Computer	MFLOPS	Time
_
.TH
CRAY X-MP-4 (4 processors)	713	.938
NEC SX-2	709	.947
Fujitsu VP-200	422	1.58
Amdahl 1200	397	1.67
CDC Cyber 205 (4-pipe/half precision)	308	2.16
Amdahl 1100	230	2.89
CDC Cyber 205 (2-pipe/half precision)	195	3.41
CDC Cyber 205 (4-pipe)	195	3.42
Amdahl 500	123	5.40
CDC Cyber 205 (2-pipe)	113	5.91
FPS-164 + 15 MAX	101	6.63
FPS-164 + 8 MAX	79	8.44
IBM 3090/VF (1 processor)	65	10.3
FPS-164/364 + 4 MAX	55	12.1
FPS-164/364 + 3 MAX	47	14.3
FPS-164/364 + 2 MAX	36	18.4
FPS-264	34	19.8
Alliant FX/8 (8 processors)	26	26
FPS-164/364 + 1 MAX	24	27.7
FPS-264/20	17	40
FPS-164/364	9	71.2
IBM 3081-KX (2 processors)	4	166
Sequent Balance 21000 (30 processors)	1.5	445
SUN-3/160M + FPA f77 -O -ffpa 3.1	.46	1448
.TE
.sp 3
.br
.SH
Acknowledgments
.PP
I would like to thank the people who have helped in putting together this collection.