HPL Performance Results

The performance achieved by this software package on a few machine configurations is shown below. These results are provided for illustrative purposes only. The systems have almost certainly changed since these measurements were taken; some may no longer exist, and the exact state they were in at the time cannot be reproduced. To obtain accurate figures for your own system, download the software and run it there.

4 AMD Athlon K7 500 MHz (256 MB) - (2x) 100 Mb/s switched - 2 NICs per node (channel bonding)

OS            Red Hat Linux 6.2 (kernel 2.2.14)
C compiler    gcc (egcs-2.91.66 egcs-1.1.2 release)
C flags       -fomit-frame-pointer -O3 -funroll-loops
MPI           MPICH 1.2.1
BLAS          ATLAS (version 3.0 beta)
Comments      09/00

Performance (Gflop/s) as a function of problem size N on 4 nodes.

Grid (P x Q)     N = 2000   N = 5000   N = 8000  N = 10000
1 x 4                1.28       1.73       1.89       1.95
2 x 2                1.17       1.68       1.88       1.93
4 x 1                0.81       1.43       1.70       1.80

The input file HPL.dat used for this run was:
HPLinpack benchmark input file - Athlon cluster - 09/00
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
2000 5000 8000 10000 Ns
2            # of NBs
60 60        NBs
3            # of process grids (P x Q)
1 2 4        Ps
4 2 1        Qs
16.0         threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
60           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

8 dual Intel PIII 550 MHz nodes (512 MB) - Myrinet

OS            Red Hat Linux 6.1 (kernel 2.2.15)
C compiler    gcc (egcs-2.91.66 egcs-1.1.2 release)
C flags       -fomit-frame-pointer -O3 -funroll-loops
MPI           MPI GM (version 1.2.3)
BLAS          ATLAS (version 3.0 beta)
Comments      UTK / ICL - Torc cluster - 09/00

Performance (Gflop/s) as a function of problem size N on 8- and 16-processor grids.

Grid (P x Q)     N = 2000   N = 5000   N = 8000  N = 10000  N = 15000  N = 20000
2 x 4                1.76       2.32       2.51       2.58       2.72       2.73
4 x 4                2.27       3.94       4.46       4.68       5.00       5.16

The input file HPL.dat used for this run was:
HPL Linpack benchmark input file - Torc - 09/00
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
6            # of problems sizes (N)
2000 5000 8000 10000 15000 20000 Ns
2            # of NBs
80 80        NBs
2            # of process grids (P x Q)
2 4          Ps
4 4          Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
8            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
80           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

Compaq 16-node (4 EV67 667 MHz processors per node) AlphaServer SC

OS            Tru64 version 5
C compiler    cc version 6.1
C flags       -arch host -tune host -std -O5
MPI           -lmpi -lelan
BLAS          CXML
Comments      ORNL / CCS - falcon - 09/00

Performance (Gflop/s) as a function of problem size N on 16 nodes (64 processors).

Grid (P x Q)     N = 5000  N = 10000  N = 25000  N = 53000
8 x 8               26.37      45.00      60.99      66.93

The input file HPL.dat used for this run was:
HPL Linpack benchmark input file - falcon - 09/00
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
5000 10000 25000 53000 Ns
2            # of NBs
88 88        NBs
1            # of process grids (P x Q)
8            Ps
8            Qs
16.0         threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
88           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)


The authors acknowledge the use of the Oak Ridge National Laboratory Compaq computer, funded by the Department of Energy's Office of Science and Energy Efficiency programs.

