.B
.nr BT ''-%-''
.he ''''
.pl 11i
.de fO
'bp
..
.wh -.5i fO
.LP
.nr LL 6.5i
.ll 6.5i
.nr LT 6.5i
.lt 6.5i
.ta 5.0i
.ft 3
.bp
.R
.sp 1i
.ce 100
.R
.sp .5i
 .
.sp 10
ARGONNE NATIONAL LABORATORY
.br
9700 South Cass Avenue
.br
Argonne, Illinois  60439
.sp .6i
.ps 12
.ft 3
Performance of Various Computers Using Standard 

Linear Equations Software in a Fortran Environment
.ps 11
.sp 3
Jack J. Dongarra
.sp 3
.ps 10
.ft 1
Mathematics and Computer Science Division
.sp 2
Technical Memorandum No. 23
.sp .7i
\*(DY
.pn 1
.he ''-%-''
.he ''''
.bp
.ft 3
.ps 11
.bp
.LP
.EQ
delim @@
.EN
.nr PO .5i
.nr LL 7.0i
.po .5i
.ll 7.0i
.B
.ps 14
.ce
Performance of Various Computers Using Standard 
.sp
.ce
Linear Equations Software in a Fortran Environment
.sp 2
.AU
.ps 11
Jack J. Dongarra
.AI
.ps 10
Mathematics and Computer Science Division\h'.20i'   
Argonne National Laboratory\h'.20i'   
Argonne, Illinois 60439\h'.20i'
.FS
.ps 9
.vs 11p
@size -1 {"" sup \(dg}@\|Work supported in part by the Applied Mathematical
Sciences subprogram of the Office of Energy Research,
U. S. Department of Energy under Contract W-31-109-Eng-38.
.FE
.sp 3
.R
.ps 10
.ce
\*(DY
.sp 2
.QS
Abstract - This note compares the performance of different computer systems
solving dense systems of linear equations using the LINPACK
software in a Fortran environment. About 100 computers are compared,
ranging from a CRAY X-MP, through 68000-based systems
such as the Apollo and SUN workstations, to IBM PCs.
.QE
.sp 2
.PP
The timing information presented here should in no way be used 
to judge the overall 
performance of a computer system. The results reflect only one 
problem area: solving dense systems of equations using the LINPACK [1]
programs in a Fortran environment.
.sp
.PP
LINPACK programs can be characterized as having a high percentage 
of floating point arithmetic operations. 
The routines involved in this timing study, SGEFA and SGESL,
use column oriented algorithms.
That is, the programs usually
reference array elements sequentially down a column, not across a row.
Column orientation increases efficiency because
Fortran stores arrays by columns, so the elements of a column
occupy adjacent storage locations.
Most floating point operations in LINPACK
take place in a set of subprograms,
the Basic Linear Algebra Subprograms (BLAS) [2], which
are called repeatedly throughout the calculation.
The BLAS reference one-dimensional arrays, rather than two-dimensional
arrays.
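.PP
As a rough illustration of why column orientation matters, the following
Python sketch computes the storage offsets that a column-by-column
layout assigns to the elements of a two-dimensional array.
It is not part of LINPACK; the helper name and the 4 by 3 array are hypothetical.
Walking down a column touches consecutive locations, while walking
across a row jumps by the leading dimension.
.DS L
```python
def column_major_offset(i, j, lda):
    """Zero-based storage offset of element (i, j), both zero-based,
    in an array with leading dimension lda stored column by column."""
    return i + j * lda

lda, ncols = 4, 3  # hypothetical 4 by 3 array

# Offsets visited when walking down column 1 (stride 1).
down_column = [column_major_offset(i, 1, lda) for i in range(lda)]

# Offsets visited when walking across row 1 (stride lda).
across_row = [column_major_offset(1, j, lda) for j in range(ncols)]

print(down_column)  # [4, 5, 6, 7]
print(across_row)   # [1, 5, 9]
```
.DE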
.sp
.PP
The following two tables report timing results 
using LINPACK to solve a system of linear equations of order 100.
The execution speeds,
particularly for vector computers, may not have reached their asymptotic rates.
(See the appendix for a different comparison of large scientific
computers in Fortran.)
It should be noted that no changes were made to the LINPACK software
with the exception of replacing the BLAS.
In certain cases, as noted, the Fortran BLAS were replaced by assembly
language versions, or on vector machines the ``unrolled loops''
were replaced by simple loops in Fortran.
For the LINPACK routines, no attempt was made to use special hardware features 
on certain machines, or to exploit vector capabilities or multiple processors.
(The compilers on some machines may, of course, generate optimized code
that itself accesses special features.) 
.sp
.PP
One further note: The information in the tables was compiled 
over a period of time.
Subsequent systems software and hardware changes
may alter the timings to some extent.
.ps 14
.sp 2
.ds CF \*(DY
.ds LH "    Full Precision
.ds CH - % -
.ds RH All Fortran
.bp
.B
.ce
Solving a System of Linear Equations 
.sp
.ce
with LINPACK @ nothing sup a @ in Full Precision @ nothing sup b @ Using All Fortran
.sp
.R
.ps 10
.TS H
center;
l l c c c c
l l c c c c
l l c c c c.
Computer	OS/Compiler@ nothing sup c @	Ratio@ nothing sup d @	MFLOPS@ nothing sup e @	Time	Unit@ nothing sup f @
				secs	@ mu @secs
_

.TH
NEC SX-2	FORTRAN 77/SX(Rolled BLAS)	.26	46	.015	0.043
NEC SX-1	FORTRAN 77/SX(Rolled BLAS)	.31	39	.017	0.051
NEC SX-1E	FORTRAN 77/SX(Rolled BLAS)	.35	35	.020	0.057
CRAY X-MP-2 (1 proc.)	CFT 1.13(Rolled BLAS)	.50	24	.028	0.082
Amdahl 1200	Fortran 77(Rolled BLAS)	.69	18	.039	0.11
CDC Cyber 205 (2-pipe)	FTN(Rolled BLAS)	.70	17	.039	0.11
Fujitsu VP-200	Fortran 77(Rolled BLAS)	.72	17	.040	0.12
Hitachi S-810/20	FORT77/HAP(Rolled BLAS)	.74	17	.042	0.12
Fujitsu VP-100	Fortran 77(Rolled BLAS)	.78	16	.044	0.13
Amdahl 1100	Fortran 77(Rolled BLAS)	.78	16	.044	0.13
CRAY-2 (1-proc.)	CFT 2.70(Rolled BLAS)	.84	15	.047	0.14
Amdahl 500	Fortran 77(Rolled BLAS)	.86	14	.048	0.14
Fujitsu VP-50	Fortran 77(Rolled BLAS)	.90	14	.051	0.15
CRAY-1S	CFT(Rolled BLAS)	1	12	.056	0.16
IBM 3090-200/VF (1 proc.)	VS Fortran V2(Rolled BLAS)	1.0	12	.057	0.17
Sperry 1100/90 ext w/ISP	UCS level 2	1.1	11	.061	0.18
Alliant FX/8 (8CEs)	FX Fortran v2.0.19(Rolled BLAS)	1.6	7.6	.090	0.26
SCS-40	CFT 1.13(Rolled BLAS)	1.7	7.3	.095	0.28
IBM 3090-200 (1 proc.)	VS opt=3	1.8	6.8	.102	0.30
Fujitsu M-380	Fortran 77, opt=3	1.9	6.3	.109	0.32
FPS-264	F02 APFTN64 OPT=4(Rolled BLAS)	2.2	5.6	.122	0.35
NAS 9060	VS opt=2	2.3	5.3	.130	0.38
Honeywell DPS90	ES F77V 1.0(Rolled BLAS)	2.5	5.0	.138	0.40
CDC Cyber 875	FTN 5 opt=3	2.6	4.8	.143	0.42
CDC Cyber 176	FTN 5.1 opt=2	2.6	4.6	.148	0.43
Amdahl 5860 HSFPF	H enhanced opt=3	3.1	3.9	.176	0.51
Amdahl 5860 HSFPF	VS opt=3	3.2	3.8	.181	0.53
CDC 7600	FTN	3.8	3.3	.210	0.61
FPS-264/20	F02 APFTN64 OPT=4(Rolled BLAS)	4.1	3.0	.229	0.67
CONVEX C-1	Fortran 1.6(Rolled BLAS)	4.2	2.9	.235	0.69
CDC Cyber 760	FTN 5, opt=3	4.7	2.6	.260	0.76
IBM 370/195	H enhanced opt=3	4.9	2.5	.275	0.80
IBM 3081 K (1 proc.)	H enhanced opt=3	5.7	2.1	.321	0.94
CDC Cyber 175	FTN 5 opt=2	5.8	2.1	.322	0.94
CDC Cyber 180-860	NOS/VE OPT=HIGH	5.8	2.1	.324	0.94
IBM 3081 K (1 proc.)	VS opt=3	6.2	2.0	.346	1.01
CDC 7600	Local	6.4	2.0	.359	1.05
Sperry 1100/90	FTN opt=ZEO	6.7	1.8	.374	1.09
CDC Cyber 175	FTN 5 opt=1	6.8	1.8	.381	1.11
IBM 3033	H enhanced opt=3	7.0	1.7	.390	1.14
Honeywell DPS 8/88	FR7X	7.0	1.7	.394	1.15
IBM 3033	VS opt=3	7.1	1.7	.396	1.15
FPS-164/364	F02 APFTN64 OPT=4(Rolled BLAS)	7.2	1.7	.405	1.18
Sperry 1100/90 ext 	UFTN	7.3	1.7	.408	1.19
IBM 3081 D	VS opt=3	7.4	1.7	.415	1.21
Alliant FX/1 (1CE)	FX Fortran v2.0.19(Rolled BLAS)	7.5	1.6	.420	1.22
Amdahl 470 V/8	H enhanced opt=3	7.7	1.6	.429	1.25
CDC Cyber 180-850	NOS/VE OPT=HIGH	7.8	1.6	.436	1.27
Amdahl 470 V/8	VS opt=3	8.2	1.5	.458	1.33
CDC 7600	CHAT, No opt	9.9	1.2	.554	1.61
IBM 4381-13	VS 1.4.0 opt=3	10	1.2	.579	1.69
IBM 370/168 Fast Mult	H Ext	10	1.2	.579	1.69
Amdahl 470 V/6	H opt=2	11	1.1	.631	1.84
ELXSI	FTN MOD 2	12	1.1	.643	1.87
DEC VAX 8800	VMS V4.3	12	.99	.690	2.01
CDC Cyber 180-840	NOS/VE OPT=HIGH	12	.99	.694	2.02
IBM 4381 MG2	VS opt=3	13	.96	.718	2.09
IBM 4381-12 	VS 1.4.0 opt=3	13	.95	.726	2.12
IBM 370/165 Fast Mult	H Ext	16	.77	.890	2.59
DEC VAX 8650	VMS v4.1	17	.70	.975	2.84
DEC VAX 8500	VMS v4	19	.65	1.05	3.07
Sperry 1100/80 w/SAM	FTN opt=ZEO	21	.58	1.18	3.45
Harris H1200	VOS 4.1 opt g	22	.56	1.22	3.57
DEC VAX 8600	VMS v4.1 Fortran 4.2	25	.49	1.41	4.11
Harris HCX-7 w/fpp	f77 1.0	26	.48	1.43	4.17
CDC 6600	FTN 4.6 opt=2	26	.48	1.44	4.19
CDC Cyber 170-835	FTN 5 opt=2	26	.47	1.45	4.22
CCI Power 6/32 w/fpa	UNIX 4.2 bsd f77	26	.47	1.45	4.22
Sperry 7000	4.2	26	.47	1.47	4.27
Gould PN9000	UNIX	26	.47	1.47	4.29
IBM 4381 MG1	VS opt=3	27	.46	1.49	4.35
CDC Cyber 170-835	FTN 5 opt=1	28	.44	1.57	4.58
Harris H1000	VOS 3.3 opt g	30	.41	1.67	4.86
SUN-3/160M + FPA	f77 -O -ffpa 3.1	30	.40	1.70	4.95
IBM 4381-11	VS 1.4.0 opt=3	31	.39	1.76	5.12
DEC VAX 8600	VMS v3.6 Fortran 3.4	32	.38	1.78	5.18
NORSK DATA ND-570/2	Fortran-500-E	32	.38	1.80	5.24
Sperry 1100/80	FTN opt=ZEO	32	.38	1.80	5.24
CONCEPT 32/8750	UTX/32	34	.36	1.88	5.48
Celerity C1230	UNIX 4.2 bsd f77	34	.36	1.90	5.53
CDC 6600	RUN	34	.36	1.93	5.62
Gould PN9080	UTX/32	35	.35	1.97	5.73
Prime 9950	F77 19.4.2	36	.34	2.00	5.83
Data General MV/10000	f77 opt level 2	40	.30	2.26	6.58
IBM 4361 MG5	VS opt=3	41	.30	2.31	6.73
IRIS 2400 Turbo/FPA	f77	50	.24	2.82	8.21
CDC Cyber 180-830	NOS/VE OPT=HIGH	52	.24	2.89	8.41
Harris 800	Fortran 77	53	.23	2.99	8.70
IBM 370/158	H opt=3	53	.23	2.99	8.71
IBM 370/158	VS opt=3	56	.22	3.15	9.17
NORSK DATA ND-560	Fortran-500	57	.22	3.20	9.32
Celerity C1200	UNIX 4.2 bsd f77	58	.21	3.23	9.42
Honeywell DPS 8/70	FR7X	58	.21	3.24	9.43
Denelcor HEP	f77 UPX	59	.21	3.32	9.68
CDC Cyber 170-720	FTN 5, opt=2	62	.20	3.47	10.1
VAX 11/785 FPA	VMS v4.1	63	.20	3.50	10.2
Itel AS/5 mod 3	H	63	.19	3.54	10.3
NORSK DATA ND-500	Fortran-500-E	63	.19	3.54	10.3
CDC Cyber 170-825	FTN 5, opt=2	65	.19	3.63	10.6
IBM 4341 MG10	VS opt=3	66	.19	3.70	10.8
VAX 11/785 FPA	UNIX 4.2 bsd f77	67	.18	3.75	10.9
CDC Cyber 170-825	FTN 5, opt=1	68	.18	3.81	11.1
CDC Cyber 170-720	FTN 5, opt=1	70	.17	3.93	11.4
Ridge 32/130	ROS 3.3/RISC	71	.17	3.96	11.5
Perkin Elmer 3252	OS 6.2.4 fortran z	71	.17	4.00	11.7
CDC Cyber 180-810	NOS/VE OPT=HIGH	74	.17	4.14	12.1
Perkin Elmer 3242	OS 32 v7.2 f77	80	.16	4.50	13.1
DEC VAX 8200	VMS V4.3	80	.15	4.49	13.1
ICL 2988	f77 OPT=2	85	.14	4.78	13.9
VAX 11/780 FPA	VMS v4.1	89	.14	4.96	14.4
micro VAX II	VMS v4.1	97	.13	5.45	15.9
VAX 11/750 FPA	VMS v4.1	99	.12	5.52	16.1
CONCEPT 32/6750	UTX/32	99	.12	5.53	16.1
VAX 11/780 FPA	UNIX 4.2 BSD f77	101	.12	5.67	16.5
CDC 6500	FUN	102	.12	5.69	16.6
Prime 750	Primos f77 v19.1	107	.11	6.00	17.4
Definicon DSI-780	SVS Fortran (MSDOS)	109	.113	6.09	17.7
Perkin Elmer 3230	OS 6.2.2 fortran 5.2	112	.11	6.28	18.3
VAX 11/750 FPA	UNIX 4.2 bsd f77	128	.096	7.15	20.8
Prime 850	Primos	130	.095	7.26	21.1
Sperry 1100/60	FTN opt=ZEO	132	.093	7.38	21.5
Pyramid 90X FPA	UNIX 4.2 bsd f77	137	.088	7.65	22.3
Ridge 32/110	ROS 3.3/RISC	151	.081	8.48	24.7
SUN-3/75 w/68881	f77 -O -f68881 3.0	155	.079	8.67	25.2
Data General MV/8000	f77 opt level 2	157	.078	8.80	25.6
Apollo DN460/660	AEGIS 8.0 FTN	179	.069	10.0	29.2
HP 9000 Series 320	HP-UX, f77 5.15	195	.063	10.9	31.9
Apollo DN3000	AEGIS 8.0 FTN	197	.062	11.0	32.1
Masscomp MC500 w/FPP	3.1 Fortran 	200	.061	11.2	32.6
Harris HS-20 w/FPP	Fortran 77 3.1	202	.061	11.3	33.0
Sequent Balance 8000	DYNIX Fortran 2.4.4	208	.059	11.7	33.9
Definicon DSI-32/10	GreenHills f77 (MSDOS)	214	.057	12.0	34.9
VAX 11/750	VMS v4.1	215	.057	12.1	35.1
HP 9000 Series 500	Fortran 1.7	285	.043	16.0	46.6
ENCORE MULTIMAX	f77	298	.041	16.7	48.7
ATT 3B20 FP	UNIX V 2.0/4	310	.040	17.3	50.5	
Acorn Cambridge	fortran	312	.039	17.5	51.0
IBM 4331 MG2	H opt=3	326	.038	18.3	53.2
Burroughs B6800	Fortran 77 ver 34	329	.037	18.4	53.7
VAX 11/725 FPA	VMS v4.1	330	.037	18.5	53.9
Masscomp MCS-541 w/FPB	Fortran 3.1	332	.037	18.6	54.1
IBM RT PC Model 20	f77	341	.036	19.1	55.6
VAX 11/730 FPA	VMS	348	.036	19.5	56.9
Prime 2250	Fortran 77	365	.034	20.5	59.6
IBM PC-AT/370	VS opt=3	369	.033	20.7	60.2
IBM PC-XT/370	H opt=3	391	.031	21.9	63.7
VAX 11/750	UNIX 4.2 bsd f77	422	.029	23.7	69.0
Apollo DN320	AEGIS 8.0 FTN	440	.028	24.6	71.8
SUN-2/50 + SKY FFP	f77 -O -fsky 3.0	454	.027	25.4	74.0
Apollo DN550 FPA	AEGIS 8.0 FTN	489	.025	27.4	79.8
micro VAX I	VMS	529	.023	29.6	86.2
Canaan	VS	588	.021	33.0	96.0
Chas. River Data 6835+SKY	SVS Fortran 77	700	.018	39.2	114.
Apollo DN 420 PEB	AEGIS 7+ FTN	707	.017	39.6	115.
IBM AT w/80287	PROFORT 1.0	1054	.012	59.1	172.
IBM PC w/8087	PROFORT 1.0	1054	.012	59.1	172.
Cadtrak DS1/8087	Intel Fortran 77	1143	.011	64.0	186.
IBM PC/AT w/80287	Microsoft 3.2	1341	.0091	75.1	219.
Chas. River Data 6835	SVS Fortran 77	1401	.0088	78.5	229.
Apollo DN300	AEGIS 8.0 FTN	1719	.0071	96.4	281.
Masscomp MC500	3.1 Fortran 	1751	.0070	98.1	286.
IBM PC w/8087	Microsoft 3.2	1766	.0069	98.9	288.
HP 9000 Series 200	HP-UX	1982	.0062	111.	323.
SUN-2/50	f77 -O -fsoft 3.0	2232	.0055	125.	363.
SUN	UNIX, f77 no opt	2661	.0046	149.	434.
Apple Macintosh	ABSOFT 2.0b	3196	.0038	179.	521.
.TE
.sp 2
.ds LH "    Full Precision
.ds CH - % -
.ds RH Coded BLAS
.bp
.B
.ce
Solving a System of Linear Equations 
.sp
.ce
with LINPACK @ nothing sup a @ in Full Precision @ nothing sup b @ Using Coded BLAS
.sp
.R
.ps 10
.TS H
center;
l l c c c c
l l c c c c
l l c c c c.
Computer	OS/Compiler@ nothing sup c @	Ratio@ nothing sup d @	MFLOPS@ nothing sup e @	Time	Unit@ nothing sup f @
				secs	@ mu @secs
_

.TH
CRAY X-MP-2 (1 proc.)	CFT 1.14(Coded BLAS)	.22	57	.012	0.035
Amdahl 1200	Fortran 77(Coded BLAS)	.45	27	.025	0.073
CDC Cyber 205 (4-pipe)	FTN200 2.1.6(Coded BLAS)	.50	25	.028	0.081
CDC Cyber 205 (2-pipe)	FTN200 2.1.6(Coded BLAS)	.53	23	.029	0.086
CRAY-1S	CFT(Coded BLAS)	.54	23	.030	0.088
Amdahl 1100	Fortran 77(Coded BLAS)	.54	23	.030	0.087
Amdahl 500	Fortran 77(Coded BLAS)	.60	20	.034	0.098
IBM 3090 Model 200/VF	VS Fortran V2(Coded BLAS)	.83	15	.047	0.14
NAS 9160	VS opt=3(Coded BLAS)	.94	13	.053	0.15
FPS-264	F.01 F77(Coded BLAS)	1.2	10	.066	0.19
Alliant FX/8 (8CEs)	FX Fortran v2.0.19(Coded BLAS)	1.3	9.8	.070	0.20
CDC Cyber 875	FTN 5 opt=3(Coded BLAS)	2.2	5.5	.124	0.36
FPS-264/20	F02 APFTN64 OPT=4(Coded BLAS)	2.6	4.6	.148	0.43
CDC 7600	FTN(Coded BLAS)	2.6	4.6	.148	0.43
CDC Cyber 760	FTN 5, opt=3(Coded BLAS)	3.3	3.7	.186	0.54
CONVEX C-1	Fortran 1.6(Coded BLAS)	3.7	3.3	.209	0.61
FPS-164/364	F.01 D, opt=3(Coded BLAS)	4.2	2.9	.232	0.68
Alliant FX/1 (1CE)	FX Fortran v2.0.19(Coded BLAS)	6.2	2.0	.348	1.01
IBM 4381-13 M/A asst	VS 1.4.0 opt=3(Coded BLAS)	7.7	1.6	.430	1.25
ELXSI	FTN MOD 2(Coded BLAS)	8.7	1.4	.485	1.41
IBM 4381-12 M/A asst	VS 1.4.0 opt=3(Coded BLAS)	9.6	1.3	.540	1.57
IBM 4381 MG2 M/A asst	VS opt=3(Coded BLAS)	10	1.2	.559	1.63
DEC VAX 8800	VMS v4(Coded BLAS)	11	1.13	.606	1.76
DEC VAX 8650	VMS v4.1(Coded BLAS)	13	.96	.715	2.08
Harris H1200	VOS 4.1 opt g(Coded BLAS)	14	.85	.805	2.34
Gould PN9000	UNIX(Coded BLAS)	15	.81	.850	2.48
NORSK DATA ND-570/2	Fortran-500-E(Coded BLAS)	16	.76	.900	2.62
DEC VAX 8500	VMS v4(Coded BLAS)	16	.76	.900	2.62
IBM 4381 MG1 M/A asst	VS opt=3(Coded BLAS)	19	.65	1.06	3.10
DEC VAX 8600	VMS v4.1(Coded BLAS)	19	.66	1.04	3.03
IBM 4381-11 M/A asst	VS 1.4.0 opt=3(Coded BLAS)	21	.59	1.16	3.37
Celerity C1230	UNIX 4.2 bsd f77(Coded BLAS)	21	.59	1.17	3.40
Harris H1000	VOS 3.3 opt g(Coded BLAS)	22	.57	1.21	3.52
IRIS 2400 Turbo/FPA	f77(Coded BLAS)	36	.34	2.04	5.93
Celerity C1200	UNIX 4.2 bsd f77(Coded BLAS)	38	.32	2.15	6.26
VAX 11/785 FPA	VMS v4.1(Coded BLAS)	54	.23	3.01	8.77
DEC VAX 8200	VMS 4.3(Coded BLAS)	67	.18	3.75	10.9
VAX 11/780 FPA	VMS v4.1(Coded BLAS)	74	.17	4.12	12.0
micro VAX II	VMS v4.1(Coded BLAS)	79	.16	4.40	12.8
VAX 11/750 FPA	VMS v4.1(Coded BLAS)	83	.15	4.64	13.5
Apollo DN460/660	AEGIS 8.0 FTN(Coded BLAS)	111	.11	6.20	18.1
Masscomp MC500 w/FPP	3.1 Fortran(Coded BLAS) 	132	.093	7.42	21.6
Sequent Balance 8000	DYNIX Fortran 2.4.4(Coded BLAS)	185	.066	10.4	30.2
Apollo DN320	AEGIS 8.0 FTN(Coded BLAS)	235	.052	13.2	38.3
Apollo DN550 FPA	AEGIS 8.0 FTN(Coded BLAS)	262	.047	14.7	42.7
VAX 11/725 FPA	VMS v4.1(Coded BLAS)	283	.043	15.8	46.2
VAX 11/730 FPA	VMS(Coded BLAS)	286	.043	16.0	46.6
Apollo DN300	AEGIS 8.0 FTN(Coded BLAS)	1721	.0071	96.4	281.
.TE
.ds LH "    Full Precision
.ds CH - % -
.ds RH Compiler Directives
.bp
.sp 2
.PP
With some of the parallel and vector computers it is possible to take advantage
of the multiprocessors and special hardware through Fortran.
For example, on the CRAY X-MP and the Alliant FX/8, a simple compiler
directive, in the form of a Fortran comment,
allows one to gain access to the multiprocessing capabilities of these machines.
.TS
center;
l l c c c c.
Computer	OS/Compiler@ nothing sup c @	Ratio@ nothing sup d @	MFLOPS@ nothing sup e @	Time	Unit@ nothing sup f @
				secs	@ mu @secs
_

CRAY X-MP-4	Comp Dir(Rolled BLAS)	.25	49	.014	.0408
Fujitsu VP-200	Comp Dir(Rolled BLAS)	.64	19	.040	.110
Alliant FX/8 (8CEs)	Comp Dir(Coded BLAS)	1.4	8.6	.08	.233
Alliant FX/8 (8CEs)	Comp Dir(Rolled BLAS)	2.0	6.2	.11	.320
.TE
.ds LH "    Half Precision
.ds CH - % -
.ds RH All Fortran
.bp
.ps 14
.B
.ce
Solving a System of Linear Equations 
.sp
.ce
with LINPACK @ nothing sup a @ in Half Precision @ nothing sup b @ Using All Fortran
.ps 10
.sp
.R
.TS H
center;
l l c c c c
l l c c c c
l l c c c c.
Computer	OS/Compiler@ nothing sup c @	Ratio@ nothing sup d @	MFLOPS@ nothing sup e @	Time	Unit@ nothing sup f @
				secs	@ mu @secs
_
.TH

NEC SX-2	FORTRAN 77/SX(Rolled BLAS)	.26	47	.015	0.043
NEC SX-1	FORTRAN 77/SX(Rolled BLAS)	.31	40	.017	0.050
NEC SX-1E	FORTRAN 77/SX(Rolled BLAS)	.35	35	.020	0.057
Amdahl 1200	Fortran 77(Rolled BLAS)	.67	18	.038	0.11
Fujitsu VP-200	Fortran 77(Rolled BLAS)	.69	18	.039	0.11
Amdahl 1100	Fortran 77(Rolled BLAS)	.73	17	.041	0.12
Fujitsu VP-100	Fortran 77(Rolled BLAS)	.75	16	.042	0.12
Hitachi S-810/20	FORT77/HAP(Rolled BLAS)	.78	16	.044	0.13
Amdahl 500	Fortran 77(Rolled BLAS)	.81	15	.045	0.13
Fujitsu VP-50	Fortran 77(Rolled BLAS)	.88	14	.049	0.14
Sperry 1100/90 ext w/ISP	UCS level 2	.89	14	.050	0.15
IBM 3090 Model 200/VF	VS Fortran V2(Rolled BLAS)	.95	13	.053	0.16
Alliant FX/8 (8CEs)	FX Fortran v2.0.19(Rolled BLAS)	1.6	7.6	.090	0.26
IBM 3090 Model 200	VS opt=3	1.7	7.1	.097	0.28
Fujitsu M-380	Fortran 77, opt=3	1.7	7.0	.098	0.28
Honeywell DPS90	ES F77V 1.0(Rolled BLAS)	1.8	6.7	.103	0.30
Amdahl 5860 HSFPF	H enhanced opt=3	2.2	5.5	.125	0.36
NAS 9060	VS opt=2	2.4	5.2	.133	0.38
Amdahl 5860 HSFPF	VS opt=3	2.4	5.1	.135	0.39
CONVEX C-1	Fortran 1.6(Rolled BLAS)	3.0	4.1	.168	0.49
Sperry 1100/90	FTN opt=ZEO	4.4	2.8	.248	0.72
Amdahl 470 V/8	H enhanced opt=3	4.4	2.8	.246	0.71
Amdahl 470 V/8	VS opt=3	4.5	2.7	.254	0.74
Sperry 1100/90 ext	UFTN	5.0	2.5	.279	0.81
IBM 3081 K	H enhanced opt=3	5.1	2.4	.283	0.82
IBM 3081 K	VS opt=3	5.6	2.2	.311	0.91
IBM 3033	VS Fortran	6.3	1.9	.353	1.03
Honeywell DPS 8/88	FR7X	6.6	1.8	.371	1.08
IBM 3081 D	VS opt=3	6.7	1.8	.376	1.10
Alliant FX/1 (1CE)	FX Fortran v2.0.19(Rolled BLAS)	7.8	1.6	.440	1.28
DEC VAX 8800	VMS 4.3	9.1	1.35	.510	1.48
DEC VAX 8650	VMS v4.1	9.7	1.3	.545	1.59
ELXSI	FTN MOD 2	10	1.2	.570	1.66
IBM 4381-13	VS 1.4.0 opt=3	12	1.1	.644	1.88
Sperry 7000	4.2	13	.96	.717	2.09
CCI Power 6/32 w/fpa	UNIX 4.2 bsd f77	13	.94	.733	2.14
Harris HCX-7 w/fpp	f77 1.0	13	.94	.733	2.14
Numerix NMX-432	AR Fortran	14	.89	.769	2.24
DEC VAX 8600	VMS v4.1 Fortran 4.2	14	.88	.780	2.27
IBM 4381 MG2	VS opt=3	14	.86	.801	2.33
IBM 4381-12	VS 1.4.0 opt=3	14	.85	.805	2.34
Sperry 1100/80 w/SAM	FTN opt=ZEO	15	.83	.830	2.41
DEC VAX 8500	VMS v4	15	.80	.859	2.50
Gould PN9000	UNIX	17	.69	1.0	2.83
Gould 9050 w MACC	UNIX 4.2bsd f77	18	.70	.980	2.85
Harris H1200	VOS 4.1 opt g	18	.68	1.01	2.95
SUN-3/160M + FPA	f77 -O -ffpa 3.1	20	.62	1.12	3.25
Gould PN9080	UTX/32	21	.57	1.20	3.49
DEC VAX 8600	VMS v3.6 Fortran 3.4	22	.57	1.21	3.54
Harris H1000	VOS 3.3 opt g	22	.57	1.21	3.52
CONCEPT 32/8750	UTX/32	23	.54	1.27	3.69
Sperry 1100/80	FTN opt=ZEO	24	.52	1.32	3.85
IBM 4381 MG1	VS opt=3	24	.51	1.33	3.89
Celerity C1230	UNIX 4.2 bsd f77	24	.51	1.35	3.93
NORSK DATA ND-570/2	Fortran-500-E	25	.50	1.38	4.02
IBM 4381-11	VS 1.4.0 opt=3	28	.43	1.58	4.61
IBM 4361 MG5	VS opt=3	29	.42	1.65	4.81
Data General MV/10000	f77 opt level 2	31	.39	1.75	5.09
IRIS 2400 Turbo/FPA	f77	31	.40	1.72	5.02
DEC VAX 11/785 FPA	VMS v4.1	31	.40	1.72	5.02
Prime 9950	F77 19.4.2	36	.34	2.00	5.82
Honeywell DPS 8/70	FR7X	40	.31	2.22	6.46
DEC VAX 11/785 FPA	UNIX 4.2 bsd f77	40	.31	2.23	6.50
Celerity C1200	UNIX 4.2 bsd f77	40	.30	2.27	6.60
IBM 370/158	H opt=3	42	.29	2.35	6.86
Perkin Elmer 3252	OS 6.2.4 fortran z	45	.27	2.50	7.28
NORSK DATA ND-560	Fortran-500	45	.27	2.54	7.39
NORSK DATA ND-500	Fortran-500-E	46	.27	2.58	7.51
DEC KL-20	F20	46	.27	2.59	7.53
IBM 370/158	VS opt=3	46	.26	2.60	7.58
Ridge 32/130	ROS 3.3/RISC	47	.26	2.64	7.70
DEC VAX 11/780 FPA	VMS v4.1	49	.25	2.74	7.98
Sperry 1100/60	FTN opt=ZEO	49	.25	2.77	8.09
ICL 2988	f77 OPT=2	50	.25	2.79	8.13
Perkin Elmer 3242	OS 32 v7.2 f77	51	.24	2.88	8.37
Harris 800	Fortran 77	53	.23	2.99	8.70
DEC VAX 8200	VMS 4.3	55	.22	3.10	9.03
IBM 4341 MG10	VS opt=3	57	.22	3.18	9.25
DEC VAX 11/780 FPA	UNIX 4.2 BSD f77	58	.21	3.25	9.47
Honeywell 6080	Y	62	.20	3.46	10.1
Pyramid 90X FPA	UNIX 4.2 bsd f77	63	.20	3.50	10.2
CONCEPT 32/6750	UTX/32	65	.19	3.63	10.6
Ridge 32/110	ROS 3.3/RISC	67	.18	3.73	10.9
DEC VAX 11/750 FPA	VMS v4.1	67	.18	3.75	10.9
Data General MV/8000	f77 opt level 2	69	.18	3.84	11.2
DEC micro VAX II	VMS v4.1	70	.17	3.95	11.5
DEC VAX 11/780	VMS v4.1	74	.17	4.13	12.0
Perkin Elmer 3230	OS 6.2.2 fortran 5.2	76	.16	4.26	12.4
Prime 750	Primos f77 v19.1	89	.14	5.00	14.6
DEC VAX 11/750 FPA	UNIX 4.2 bsd f77	91	.13	5.12	14.9
Definicon DSI-780	SVS Fortran (MSDOS)	94	.13	5.27	15.3
Prime 850	Primos	97	.13	5.41	15.8
Masscomp MC500 w/FPP	3.1 Fortran 	111	.11	6.23	18.2
Harris HS-20 w/FPP	Fortran 77 3.1	114	.11	6.38	18.6
Apollo DN460/660	AEGIS 8.0 FTN	118	.10	6.60	19.3
HP 9000 Series 500	Fortran 1.7	125	.098	7.00	20.4
Pyramid 90X	UNIX 4.2 bsd f77	134	.092	7.50	21.8
DEC VAX 11/750	VMS v4.1	138	.089	7.71	22.5
IBM 4331 MG2	H opt=3	140	.088	7.84	22.8
SUN-3/75 w/68881	f77 -O -f68881 3.0	145	.084	8.15	23.7
Sequent Balance 8000	DYNIX Fortran 2.4.4	162	.075	9.10	26.5
HP 9000 Series 320	HP-UX, f77 5.15	164	.075	9.18	26.7
Apollo DN3000	AEGIS 8.0 FTN	174	.068	10.0	29.2
Definicon DSI-32/10	GreenHills f77 (MSDOS)	175	.070	9.82	28.6
DEC VAX 11/750	UNIX 4.1 bsd f77	204	.060	11.4	33.3
Masscomp MCS-541 w/FPB	Fortran 3.1	227	.054	12.7	36.9
ATT 3B20 FP	UNIX V 2.0/4	231	.053	12.9	37.7	
ENCORE MULTIMAX	f77	233	.052	13.1	38.1
Burroughs 6700	H	234	.052	13.1	38.2
DEC VAX 11/725 FPA	VMS v4.1	236	.052	13.2	38.5
Acorn Cambridge	fortran	250	.049	14.0	40.8
Prime 2250	Fortran 77	258	.048	14.5	42.1
DEC VAX 11/730 FPA	VMS	259	.047	14.5	42.2
SUN-2/50 + SKY FFP	f77 -O -fsky 3.0	266	.046	14.9	43.4
micro VAX I	VMS	272	.045	15.2	44.4
IBM PC-AT/370	VS opt=3	279	.044	15.6	45.5
Apollo DN320	AEGIS 8.0 FTN	277	.044	15.5	45.2
Chas. River Data 6835+SKY	SVS Fortran 77	284	.043	15.9	46.3
IBM PC-XT/370	H opt=3	303	.040	17.0	49.5
DEC KA-10	F40	305	.040	17.1	49.8
Apollo DN550 FPA	AEGIS 8.0 FTN	305	.040	17.1	49.9
Canaan	VS	306	.040	17.1	49.9
IBM RT PC Model 20	f77	368	.033	20.6	60.0
Apollo DN 420 PEB	AEGIS 7+ FTN	394	.031	22.1	64.3
Chas. River Data 6835	SVS Fortran 77	770	.016	43.1	126.
IBM AT w/80287	PROFORT 1.0	891	.013	49.8	145.
Cadtrak DS1/8087	Intel Fortran 77	893	.013	50.0	146.
IBM PC w/8087	PROFORT 1.0	906	.013	50.8	148.
Apollo DN300	AEGIS 8.0 FTN	944	.013	53.0	154.
SUN-2/50	f77 -O -fsoft 3.0	983	.013	55.0	160.
Masscomp MC500	3.1 Fortran 	1037	.012	58.1	169.
IBM PC w/8087	Microsoft 3.1	1071	.011	60.0	175.
HP 9000 Series 200	HP-UX	1196	.010	67.0	195.
SUN	UNIX, f77 no opt	1298	.0094	72.7	212.
Apple Macintosh	ABSOFT 2.0b	1723	.0071	96.5	281.
Tandy 2000	Microsoft 3.13	3763	.0033	211.	614.
IBM PC	Microsoft 3.1	21875	.00056	1225.	3568.
Apple III	Pascal	50232	.00024	2813.	8193.
.TE
.sp
.ds LH "    Half Precision
.ds CH - % -
.ds RH Coded BLAS
.B
.ce
Solving a System of Linear Equations 
.sp
.ce
with LINPACK @ nothing sup a @ in Half Precision @ nothing sup b @ Using Coded BLAS
.ps 10
.sp
.R
.TS H
center;
l l c c c c
l l c c c c
l l c c c c.
Computer	OS/Compiler@ nothing sup c @	Ratio@ nothing sup d @	MFLOPS@ nothing sup e @	Time	Unit@ nothing sup f @
				secs	@ mu @secs
_
.TH

Amdahl 1200	Fortran 77(Coded BLAS)	.45	28	.025	0.073
CDC Cyber 205 (4-pipe)	FTN200 2.1.6(Coded BLAS)	.46	27	.026	0.075
Amdahl 1100	Fortran 77(Coded BLAS)	.50	24	.028	0.082
CDC Cyber 205 (2-pipe)	FTN200 2.1.6(Coded BLAS)	.53	23	.030	0.087
Amdahl 500	Fortran 77(Coded BLAS)	.57	22	.032	0.093
NAS 9160	VS opt=3 (Coded BLAS)	.70	17	.039	0.11
Alliant FX/8 (8CEs)	FX Fortran v2.0.19(Coded BLAS)	1.3	9.8	.070	0.20
CONVEX C-1	Fortran 1.6(Coded BLAS)	2.5	4.9	.139	0.41
Numerix NMX-432	AR Fortran(Coded BLAS)	3.5	3.5	.199	0.58
Alliant FX/1 (1CE)	FX Fortran v2.0.19(Coded BLAS)	6.1	2.0	.340	0.99
DEC VAX 8650	VMS v4.1(Coded BLAS)	6.4	1.9	.361	1.05
DEC VAX 8800	VMS v4(Coded BLAS)	7.4	1.7	.416	1.21
ELXSI	FTN MOD 2(Coded BLAS)	7.5	1.6	.418	1.22
Mitsubishi MX/3000 w/sp	Fortran 77(Coded BLAS)	7.9	1.6	.445	1.29
Gould PN9000	UNIX(Coded BLAS)	9.2	1.3	.520	1.50
DEC VAX 8600	VMS v4.1(Coded BLAS)	9.8	1.3	.546	1.59
NORSK DATA ND-570/2	Fortran-500-E(Coded BLAS)	12	1.0	.678	1.98
Harris H1200	VOS 4.1 opt g(Coded BLAS)	12	.99	.693	2.01
DEC VAX 8500	VMS v4(Coded BLAS)	13	.96	.717	2.09
Harris H1000	VOS 3.3 opt g(Coded BLAS)	15	.83	.825	2.40
Celerity C1230	UNIX 4.2 bsd f77(Coded BLAS)	17	.72	.950	2.77
IRIS 2400 Turbo/FPA	f77(Coded BLAS)	23	.54	1.26	3.68
DEC VAX 11/785 FPA	VMS v4.1(Coded BLAS)	24	.51	1.34	3.91
Celerity C1200	UNIX 4.2 bsd f77(Coded BLAS)	28	.44	1.55	4.52
DEC VAX 11/780 FPA	VMS v4.1(Coded BLAS)	36	.34	2.02	5.88
DEC VAX 8200	VMS 4.3(Coded BLAS)	42	.29	2.38	6.93
DEC VAX 11/750 FPA	VMS v4.1(Coded BLAS)	51	.24	2.83	8.24
DEC micro VAX II	VMS v4.1(Coded BLAS)	54	.23	3.04	8.81
Masscomp MC500 w/FPP	3.1 Fortran(Coded BLAS) 	72	.17	4.02	11.7
Apollo DN460/660	AEGIS 8.0 FTN(Coded BLAS)	99	.12	5.60	16.2
Sequent Balance 8000	DYNIX Fortran 2.4.4(Coded BLAS)	148	.083	8.31	24.2
Apollo DN320	AEGIS 8.0 FTN(Coded BLAS)	150	.082	8.40	24.5
Apollo DN550 FPA	AEGIS 8.0 FTN(Coded BLAS)	162	.076	9.10	26.4
DEC VAX 11/725 FPA	VMS v4.1(Coded BLAS)	186	.066	10.4	30.4
DEC VAX 11/730 FPA	VMS(Coded BLAS)	205	.060	11.5	33.4
COMPAQ PC/8087	Microsoft 3.13(Coded BLAS)	591	.021	33.1	96.5
Apollo DN300	AEGIS 8.0 FTN(Coded BLAS)	924	.013	51.8	151.
.TE
.sp
.sp
.PP
@ nothing sup a @ LINPACK routines @SGEFA@ and @SGESL@ were used for single
precision and routines @DGEFA@ and @DGESL@ were used for double precision.
These routines perform standard @LU@ decomposition with partial pivoting and
backsubstitution.
.sp
.PP
@ nothing sup b @\f2Full Precision\f1 implies the use of (approximately) 64 bit
arithmetic, e.g. CDC single precision or IBM double precision.
\f2Half Precision\f1 implies the use of (approximately) 32 bit
arithmetic, e.g. IBM single precision.
.sp
.PP
@ nothing sup c @\f2OS/Compiler\f1 refers to the operating system and
compiler used.
(Coded BLAS) refers to the use of assembly language coding of the BLAS, and
(Rolled BLAS) refers to a Fortran version with single-statement, simple
loops [2].
.sp
.PP
@ nothing sup d @\f2Ratio\f1 is the execution time of a
particular machine
configuration divided by the time for the CRAY-1S using a Fortran
coding for the BLAS in full precision (.056 seconds);
a ratio less than 1 indicates a faster machine.
.sp
.PP
@ nothing sup e @\f2MFLOPS\f1 is the rate of execution, in millions of
floating point
operations completed per second. For solving a system of @n@ equations, 
approximately
@ 2/3 n sup 3 ~+~ 2 n sup 2 @ operations are performed (we count both
additions and multiplications).
.sp
.PP
@ nothing sup f @\f2Unit\f1 is the time in microseconds required to execute
the statement @ y sub i ~ =~ y sub i ~+~ t*x sub i @.
This involves one floating point multiplication, one floating
point addition, and a few one-dimensional indexing operations and 
storage references. 
The actual statement occurs in SAXPY, which
is called roughly @n sup 2 /2 @ times by SGEFA and @2n@ times by SGESL
with vectors of varying lengths. The statement is executed 
approximately  @ n sup 3 over 3 ~ +~ n sup 2 @ times. Thus for @n ~=~ 100@,

.EQ I
Unit ~=~ { 10 sup 6 Time } / ( { 100 sup 3 over 3 ~ +~  100 sup 2 } ) .
.EN
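.PP
The definitions of MFLOPS, Unit, and Ratio given above can be checked against
the tables with a short calculation.
The following Python sketch is only an illustration; the function names are
hypothetical and are not taken from the timing program.
It uses the CRAY-1S all-Fortran full precision time of .056 seconds
as the reference for the ratio.
.DS L
```python
def linpack_ops(n):
    """Approximate operation count 2/3 n^3 + 2 n^2 for order n."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2

def mflops(time_secs, n=100):
    """Millions of floating point operations per second."""
    return linpack_ops(n) / time_secs / 1.0e6

def unit_microsecs(time_secs, n=100):
    """Microseconds per execution of y(i) = y(i) + t*x(i)."""
    return 1.0e6 * time_secs / (n**3 / 3.0 + n**2)

def ratio(time_secs, reference_secs=0.056):
    """Time relative to the CRAY-1S all-Fortran full precision run."""
    return time_secs / reference_secs

# CRAY-1S entry from the all-Fortran full precision table: Time = .056
t = 0.056
print(round(mflops(t)), round(unit_microsecs(t), 3), ratio(t))
```
.DE
For the CRAY-1S time this gives about 12 MFLOPS and a unit of about
0.16 microseconds, in agreement with the table entry.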
.sp 2
.PP
The execution times for the LINPACK benchmark in Tables 1 and 2 were gathered
in the following way: The LINPACK code (SGEFA and SGESL in single precision
and DGEFA and DGESL in double precision) was not modified in any way.
In particular, no changes to the source statements and no changes or additions
to the comments were allowed.
Only the BLAS were allowed to change, in two ways.
First, in some cases,
the Fortran
version of the BLAS was replaced and implemented in 
assembly language for the specific machine;
these are indicated by the annotation "(Coded BLAS)".
Second, where
a Fortran implementation of the BLAS
was used on vector machines, the loops were usually rolled;
these are indicated by the annotation "(Rolled BLAS)".
(In the Fortran implementation of the BLAS the loops are 
unrolled to provide better performance on scalar
machines. On vector machines this technique usually results in poor performance
because the compiler cannot recognize the vector loop.)
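.PP
The difference between a rolled and an unrolled loop can be sketched as follows.
This Python fragment only illustrates the idea; the Fortran BLAS themselves
are not reproduced here, and the function names and the unrolling depth of
four are choices made for this sketch.
.DS L
```python
def saxpy_rolled(n, t, x, y):
    """Simple single-statement loop: y(i) = y(i) + t*x(i)."""
    for i in range(n):
        y[i] = y[i] + t * x[i]

def saxpy_unrolled(n, t, x, y):
    """Same update with the loop body unrolled to depth four."""
    m = n % 4
    # Clean-up loop for the leftover elements.
    for i in range(m):
        y[i] = y[i] + t * x[i]
    # Main loop: four updates per trip.
    for i in range(m, n, 4):
        y[i] = y[i] + t * x[i]
        y[i + 1] = y[i + 1] + t * x[i + 1]
        y[i + 2] = y[i + 2] + t * x[i + 2]
        y[i + 3] = y[i + 3] + t * x[i + 3]

x = [float(i) for i in range(10)]
y1 = [1.0] * 10
y2 = [1.0] * 10
saxpy_rolled(10, 0.5, x, y1)
saxpy_unrolled(10, 0.5, x, y2)
print(y1 == y2)  # both forms compute the same result
```
.DE
The unrolled form does less loop bookkeeping per update, which can help a
scalar machine, but its longer loop body is harder for a vectorizing
compiler to recognize as a single vector operation.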
.PP
The same matrix was used on every machine to solve the system of equations.
The results
were checked for
accuracy by calculating a residual for the
problem, @ || ~ Ax ~-~ b ~ || / (||A|| ||x|| ) @.
The timing program is available on request.
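.PP
A residual check of this kind might look like the following Python sketch.
It is not the timing program; the small elimination routine, the choice of
the infinity norm, and the 3 by 3 test system are assumptions made for
illustration.
.DS L
```python
def solve(a, b):
    """Gaussian elimination with partial pivoting, then back
    substitution. Works on copies; a is a list of rows."""
    n = len(b)
    a = [row[:] for row in a]
    x = b[:]
    for k in range(n - 1):
        # Partial pivoting: pick the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            x[i] -= m * x[k]
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / a[i][i]
    return x

def norm_inf_vec(v):
    """Largest absolute entry of a vector."""
    return max(abs(t) for t in v)

def norm_inf_mat(a):
    """Largest absolute row sum of a matrix."""
    return max(sum(abs(t) for t in row) for row in a)

def relative_residual(a, x, b):
    """|| Ax - b || / ( ||A|| ||x|| ) in the infinity norm."""
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi
         for row, bi in zip(a, b)]
    return norm_inf_vec(r) / (norm_inf_mat(a) * norm_inf_vec(x))

# Hypothetical test system whose exact solution is (1, 1, 1).
a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
x = solve(a, b)
print(x, relative_residual(a, x, b))
```
.DE
A residual near the rounding level of the arithmetic indicates that the
computed solution is acceptable.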
.PP
Anyone interested in adding to or updating this table is
encouraged to contact the author.
Please send suggestions and interesting results to:
.sp
.nf
Jack J. Dongarra
Mathematics and Computer Science Division
Argonne National Laboratory
Argonne, Illinois 60439
.EQ
delim ##
.EN
ARPAnet: DONGARRA@ANL-MCS.ARPA


.SH
References
.sp
.IP [1] 
.R
J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart,
.I
LINPACK Users' Guide,
.R
SIAM Publications, Philadelphia, 1979.
.sp
.IP [2] 
.R
C. Lawson, R. Hanson, D. Kincaid, and F. Krogh,
"Basic Linear Algebra Subprograms for Fortran Usage,"
.I
ACM Trans. Math. Software,
.R
Vol. 5, No. 3, 1979, pp. 308-371.
.sp
.IP [3] 
.R
J. J. Dongarra and S. C. Eisenstat,
"Squeezing the Most out of an Algorithm in CRAY Fortran,"
.I
ACM Trans. Math. Software,
.R
Vol. 10, No. 3, 1984, pp. 221-230.
.sp
.ds LH "    Matrix Vector
.ds CH - % -
.ds RH
.bp
.SH
APPENDIX
.sp 
.EQ
delim @@
.EN
.B
.ps 14
.ce
Performance of Large Scientific Computers 
.sp
.ce
in a Fortran Environment 
.sp 2
.PP
The LINPACK routines used to generate the timings in the previous tables do
not reflect the true performance of "advanced scientific computers".
A different implementation of the solution of linear equations,
presented in 
a report by Dongarra and Eisenstat [3], better describes the performance
on such machines.
That algorithm is based on matrix-vector operations rather
than just vector operations. The resulting program is more
modular and works at a larger granularity, giving it the potential
for better performance across a wide range of machines, especially
high performance computers.
The number of floating point operations required and the 
roundoff errors produced by both algorithms are exactly the same;
only the way in which the matrix elements are accessed is different.
As before, a Fortran program was run and the time to complete the solution
of equations for a matrix of order 300 is reported.
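.PP
The statement that only the access pattern differs can be illustrated with a
small Python sketch.
It is not the algorithm of [3]; pivoting is omitted, the 3 by 3 matrix is
made up, and the function names are hypothetical.
The first routine updates the remaining columns one vector operation at a
time; the second gathers all updates to a column into a single
matrix-vector style step.
Both apply the same arithmetic to each element in the same order.
.DS L
```python
def lu_vector_form(a):
    """Elimination built from vector updates: at step k, one
    SAXPY-like update is applied to each remaining column."""
    n = len(a)
    a = [row[:] for row in a]
    for k in range(n - 1):
        for i in range(k + 1, n):
            a[i][k] = a[i][k] / a[k][k]
        for j in range(k + 1, n):
            t = a[k][j]
            for i in range(k + 1, n):
                a[i][j] = a[i][j] - a[i][k] * t
    return a

def lu_matrix_vector_form(a):
    """Column-at-a-time elimination: column j receives all of its
    updates from the earlier columns in one combined step."""
    n = len(a)
    a = [row[:] for row in a]
    for j in range(n):
        for k in range(j):
            t = a[k][j]
            for i in range(k + 1, n):
                a[i][j] = a[i][j] - a[i][k] * t
        for i in range(j + 1, n):
            a[i][j] = a[i][j] / a[j][j]
    return a

# Hypothetical diagonally dominant matrix, so no pivoting is needed.
a = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
print(lu_vector_form(a) == lu_matrix_vector_form(a))
```
.DE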
.PP
Note that these numbers are for a problem of order 300
and all runs are for full precision.
.PP
The table was compiled over a period of time. Subsequent software and 
hardware changes to a computer system may alter the timing to some extent.
.ps 
.sp 2
.B
.ce
Solving a System of Linear Equations 
.ce
Using the Vector Unrolling Technique
.sp 2
.R
.ps 10
.TS H
center;
l l c c c 
l l c c c
l l c c c.
Computer	OS/Compiler@ nothing sup a @	MFLOPS@ nothing sup b @	Time	Unit@ nothing sup c @
			secs	@ mu @secs
_
.TH

CRAY X-MP-4 @"" sup \(sc @	CFT(Coded MV routines)	480	.038	.0042
CRAY X-MP-4 @"" sup \(sc @	CFT(Coded ISAMAX)	356	.051	.0056
NEC SX-2	FORTRAN 77/SX	309	.057	.0064
CRAY X-MP-2 @"" sup \(dd @	CFT(Coded ISAMAX)	257	.076	.0083
Amdahl 1200	Fortran 77(Coded MV routines)	230	.078	.0086
Fujitsu VP-200	Fortran 77(Comp directive)	220	.083	.0091
Amdahl 1200	Fortran 77(Comp directive)	215	.084	.0092
NEC SX-1	FORTRAN 77/SX	207	.087	.0096
Fujitsu VP-200	Fortran 77	183	.099	.011
Amdahl 1200	Fortran 77	180	.100	.011
CRAY X-MP-1 @"" sup \(dg @	CFT(Coded MV routines)	171	.106	.0117
Amdahl 1100	Fortran 77(Coded MV routines)	167	.108	.0119
CRAY X-MP-2 @"" sup \(dd @	CFT	161	.113	.012
Amdahl 1100	Fortran 77(Comp directive)	159	.113	.0124
Fujitsu VP-100	Fortran 77(Comp directive)	159	.113	.0124
Hitachi S-810/20	FORT77/HAP	158	.115	.013
Amdahl 1100	Fortran 77	142	.127	.014
Fujitsu VP-100	Fortran 77	139	.129	.014
NEC SX-1E	FORTRAN 77/SX	140	.126	.014
CRAY X-MP-1 @"" sup \(dg @	CFT(Coded ISAMAX)	134	.136	.015
CRAY X-MP-1 @"" sup \(dg @	CFT	106	.172	.019
Amdahl 500	Fortran 77(Coded MV routines)	102	.176	.019
Fujitsu VP-50	Fortran 77(Comp directive)	100	.180	.02
Amdahl 500	Fortran 77(Comp directive)	99	.182	.02
CRAY-2 (1 proc.)	CFT 2.70	93	.195	.022
Amdahl 500	Fortran 77	84	.214	.024
Fujitsu VP-50	Fortran 77	84	.214	.024
CRAY 1-M	CFT(Coded ISAMAX)	83	.215	.024
CRAY 1-S	CFT(Coded ISAMAX)	76	.236	.026
CRAY 1-M	CFT	69	.259	.029
CRAY 1-S	CFT	66	.273	.030
Sperry 1100/90 ext w/ISP	UFTN(Coded MV routines)	39	.455	.051
FPS-264	F02, F77(Coded MV routines)	33	.550	.061
CDC Cyber 205	ftn 200 opt=1(Coded MV routines)	31	.59	.065
Sperry 1100/90 ext w/ISP	UFTN	29	.618	.069
IBM 3090 Model 200/VF	VS Fortran V2(Coded MV routines)	27	.673	.074
SCS-40	CFT 1.13	26	.683	.075 
FPS-164/364 + 4 MAX	E, F77 (Coded MV routines)	26	.70	.078
FPS-164/364 + 3 MAX	E, F77 (Coded MV routines)	24	.77	.085
NAS 9160	VS opt=3(Coded MV routines)	20	.88	.097
FPS-164/364 + 2 MAX	E, F77 (Coded MV routines)	20	.89	.098
FPS-264	F02 APFTN64 OPT=4	20	.90	.101
IBM 3090 Model 200/VF	VS Fortran V2	18	1.03	.114
FPS-164/364 + 1 MAX	E, F77 (Coded MV routines)	15	1.18	.130
Alliant FX/8 (8CEs)	FX Fortran(Coded MV routines)	14	1.3	.140
CONVEX C-1	Fortran 1.6(Coded MV routines)	14	1.3	.140
FPS-264/20	F02 APFTN64 OPT=4	11	1.6	.180
CONVEX C-1	Fortran 1.6	8.7	2.1	.230
FPS 164/364	E, opt=3(Coded MV routines)	8.7	2.1	.231
Alliant FX/8 (8CEs)	FX Fortran	7.3	2.5	.275
NAS 9060	VS opt=2	6.9	2.6	.285
FPS-164/364	F02 APFTN64 OPT=4	5.1	3.5	.391
IBM 370/195	VS opt=2	4.4	4.1	.455
IBM 3033	VS opt=2	2.5	7.1	.800
VAX 11/780 FPA	UNIX xf77	.11	177.	19.5
.TE
.sp
.I
Comments:
.R
.in +2.
@"" sup \(sc @ These timings are for four processors with manual changes
to use parallel features.
.sp
@"" sup \(dd @ These timings are for two processors with manual changes
to use parallel features.
.sp
@"" sup \(dg @ These timings are for one processor of an X-MP-2.
.sp
The major difference between the CRAY 1-M and the CRAY 1-S is memory
speed: the CRAY 1-M has slower memory. The timings nevertheless show the CRAY 1-M
to be faster than the CRAY 1-S. After much discussion and examination
of the generated assembly language code, it was determined that
the CRAY 1-M is in fact faster for this program. The code generated by the
compiler causes the CRAY 1-S to miss a chain-slot. On the
CRAY 1-M, because of its slower memory, the chain-slot is not missed;
hence the faster execution time.
.in -2.
.sp
.PP
@ nothing sup a @\f2OS/Compiler\f1 refers to the operating system and
compiler used.
\f2Coded ISAMAX\f1 refers to the use of an assembly language coding of the
BLAS routine ISAMAX, and \f2Comp directive\f1 refers to the use of compiler
directives in the matrix vector routines.
.sp
.sp
.PP
@ nothing sup b @\f2MFLOPS\f1 is a rate of execution, the number of million floating point
operations completed per second. For solving a system of @n@ equations,
@ 2/3 n sup 3 ~+~ 2 n sup 2 @ operations are performed (we count both
additions and multiplications).
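.PP
As a rough illustration of how the MFLOPS column relates to the Time
column, the following Python sketch evaluates the operation count above
and divides by a measured time. The function names are illustrative and
are not part of LINPACK; the time of .038 seconds is taken from the
first row of the table.
.DS L
```python
def linpack_ops(n):
    # Operation count from the text: 2/3 n^3 + 2 n^2,
    # counting both additions and multiplications.
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def mflops(n, seconds):
    # Millions of floating point operations completed per second.
    return linpack_ops(n) / seconds / 1.0e6


n = 300
print(linpack_ops(n))        # operations for a problem of order 300
print(mflops(n, 0.038))      # rate implied by a run of .038 seconds
```
.DE
For a time of .038 seconds the computed rate is close to, but not
exactly, the 480 listed in the table, since the tabulated times are rounded.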
.sp
.PP
@ nothing sup c @\f2Unit\f1 is the time in microseconds required to execute
the statement @ y sub i ~ =~ y sub i ~+~ t*x sub i @.
This involves one floating point multiplication, one floating
point addition, and a few one-dimensional indexing operations and 
storage references. 
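.PP
The statement timed by \f2Unit\f1 can be sketched in Python as below.
The conversion from total time to \f2Unit\f1 is an assumption: it
divides the time by the number of multiply-add pairs (half the
operation count), which appears to reproduce the table entries but is
not stated in the text. The function names are illustrative only.
.DS L
```python
def saxpy(t, x, y):
    # One Unit of work per element: y(i) = y(i) + t*x(i),
    # that is, one multiplication and one addition.
    for i in range(len(y)):
        y[i] = y[i] + t * x[i]
    return y


def unit_microseconds(n, seconds):
    # Assumption: Unit = total time / number of multiply-add pairs,
    # where the pairs are half of 2/3 n^3 + 2 n^2 operations.
    pairs = (2.0 * n**3 / 3.0 + 2.0 * n**2) / 2.0
    return seconds / pairs * 1.0e6


print(saxpy(2.0, [1.0, 2.0], [3.0, 4.0]))
print(unit_microseconds(300, 0.038))
```
.DE
Under this assumption, a time of .038 seconds for order 300 gives a
value that rounds to the .0042 microseconds listed in the first row.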
.sp
.fi
.ds LH "    Peak Performance
.ds CH - % -
.ds RH
.bp
.ps 14
.ce
.B
Toward Peak Performance
.sp 2
.R
.PP
In response to many requests, we have collected the results
of solving a system of equations of order 1000. These results differ
from those listed earlier in this paper in that the manufacturer is
allowed to use any algorithm to solve the problem. The only restriction is that
a driver program (supplied by the author of this paper) be run to ensure
that the same problem is being solved and to verify that the answer is correct.
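.PP
The kind of check such a driver might perform can be sketched as
follows. This is not the driver itself, which is not reproduced here:
the matrix generator, the stand-in solver (Gaussian elimination with
partial pivoting), the small problem size, and the residual test are
all assumptions made for illustration.
.DS L
```python
import random


def make_problem(n, seed=1):
    # Hypothetical test problem: a random matrix and a right-hand
    # side chosen so that the exact solution is a vector of ones.
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]
    return a, b


def solve(a, b):
    # Stand-in for "any algorithm": Gaussian elimination with
    # partial pivoting, working on copies of the inputs.
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def max_residual(a, b, x):
    # Largest component of |A x - b|, used to verify the answer.
    return max(abs(sum(aij * xj for aij, xj in zip(row, x)) - bi)
               for row, bi in zip(a, b))


a, b = make_problem(50)
x = solve(a, b)
print(max_residual(a, b, x) < 1.0e-8)
```
.DE
Because the driver fixes the problem and the check, the solver in the
middle can be replaced freely, which is the freedom given to the
manufacturers here.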
.sp 2
.R
.ps 10
.TS H
center;
l l l
l n n.

Computer	MFLOPS	Time (secs)

_
.TH

CRAY X-MP-4 (4 processors)	713	.938
NEC SX-2	709	.947
Fujitsu VP-200	422	1.58
Amdahl 1200	397	1.67
CDC Cyber 205 (4-pipe/half precision)	308	2.16
Amdahl 1100	230	2.89
CDC Cyber 205 (2-pipe/half precision)	195	3.41
CDC Cyber 205 (4-pipe)	195	3.42
Amdahl 500	123	5.40
CDC Cyber 205 (2-pipe)	113	5.91
FPS-164 + 15 MAX	101	6.63
FPS-164 + 8 MAX	79	8.44
IBM 3090/VF (1 processor)	65	10.3
FPS-164/364 + 4 MAX	55	12.1
FPS-164/364 + 3 MAX	47	14.3
FPS-164/364 + 2 MAX	36	18.4
FPS-264	34	19.8
Alliant FX/8 (8 processors)	26	26
FPS-164/364 + 1 MAX	24	27.7
FPS-264/20	17	40
FPS-164/364	9	71.2
IBM 3081-KX (2 processors)	4	166
Sequent Balance 21000 (30 processors)	1.5	445
SUN-3/160M + FPA f77 -O -ffpa 3.1	.46	1448
.TE
.sp 3
.br
.SH
Acknowledgments
.PP
I would like to thank the people who have
helped in putting together this collection.