HPL Performance Results
The performance achieved by this software package on a few machine
configurations is shown below. These results are provided for
illustrative purposes only. By the time you read this, these systems
will have changed or may no longer exist, and the exact state the
machines were in when these measurements were taken certainly cannot
be reproduced. To obtain accurate figures for your system, you must
download the software and run it on that system.

| Item | Setting |
|---|---|
| OS | RedHat Linux 6.2 (kernel 2.2.14) |
| C compiler | gcc (egcs-2.91.66 egcs-1.1.2 release) |
| C flags | -fomit-frame-pointer -O3 -funroll-loops |
| MPI | MPICH 1.2.1 |
| BLAS | ATLAS (Version 3.0 beta) |
| Comments | 09 / 00 |

| P x Q grid | N = 2000 | N = 5000 | N = 8000 | N = 10000 |
|---|---|---|---|---|
| 1 x 4 | 1.28 | 1.73 | 1.89 | 1.95 |
| 2 x 2 | 1.17 | 1.68 | 1.88 | 1.93 |
| 4 x 1 | 0.81 | 1.43 | 1.70 | 1.80 |

Performance (Gflops) with respect to problem size N on 4 nodes.
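The Gflops rates in the table are an operation count divided by wall-clock time. The sketch below assumes the count that HPL's test driver credits for a problem of order N, 2/3 N^3 + 3/2 N^2, and uses it to turn a table entry back into an approximate run time. The function names are illustrative only and are not part of HPL.

```python
def hpl_flops(n: int) -> float:
    """Operations credited for a solve of order n.
    Assumption: HPL's count of 2/3 n^3 + 3/2 n^2."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def gflops(n: int, seconds: float) -> float:
    """Rate in Gflops for a solve of order n that took `seconds`."""
    return hpl_flops(n) / seconds / 1.0e9


def implied_seconds(n: int, rate_gflops: float) -> float:
    """Wall-clock time implied by a reported Gflops rate."""
    return hpl_flops(n) / (rate_gflops * 1.0e9)


if __name__ == "__main__":
    # 1 x 4 grid, N = 10000, 1.95 Gflops (first table above)
    t = implied_seconds(10000, 1.95)
    print(f"N=10000 at 1.95 Gflops takes about {t:.0f} s")
```

For the 1 x 4 grid at N = 10000, the reported 1.95 Gflops corresponds to roughly 342 seconds of wall-clock time under this operation count.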
The input file HPL.dat used for this run was:
HPLinpack benchmark input file - Athlon cluster - 09/00
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
4 # of problems sizes (N)
2000 5000 8000 10000 Ns
2 # of NBs
60 60 NBs
3 # of process grids (P x Q)
1 2 4 Ps
4 2 1 Qs
16.0 threshold
1 # of panel fact
1 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
60 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
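HPL treats most lines of HPL.dat as lists and runs one test for every combination of problem size, block size, process grid and algorithmic variant (PFACT, NBMIN, NDIV, RFACT, BCAST, DEPTH); the grids are formed by pairing the Ps and Qs entries element-wise. A sketch that counts the tests requested by the file above (all variable and function names are hypothetical):

```python
from math import prod

# Parameter lists copied from the Athlon-cluster HPL.dat above.
ns = [2000, 5000, 8000, 10000]
nbs = [60, 60]
grids = list(zip([1, 2, 4], [4, 2, 1]))  # Ps and Qs paired element-wise
variants = {
    "PFACTs": [1],
    "NBMINs": [4],
    "NDIVs": [2],
    "RFACTs": [2],
    "BCASTs": [1],
    "DEPTHs": [1],
}


def count_runs(ns, nbs, grids, variants) -> int:
    """Tests in the sweep: product of all list lengths
    (assumes every combination is run once)."""
    variant_combos = prod(len(v) for v in variants.values())
    return len(ns) * len(nbs) * len(grids) * variant_combos


def processes_needed(grids) -> int:
    """Largest P*Q among the grids: the minimum MPI process count
    for every grid to be used."""
    return max(p * q for p, q in grids)


if __name__ == "__main__":
    print(count_runs(ns, nbs, grids, variants), "runs,",
          processes_needed(grids), "processes needed")
```

For this file the sweep is 4 sizes x 2 block sizes x 3 grids = 24 tests. Because NB = 60 is listed twice, every (N, grid) point is measured twice.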

| Item | Setting |
|---|---|
| OS | RedHat Linux 6.1 (kernel 2.2.15) |
| C compiler | gcc (egcs-2.91.66 egcs-1.1.2 release) |
| C flags | -fomit-frame-pointer -O3 -funroll-loops |
| MPI | MPI GM (Version 1.2.3) |
| BLAS | ATLAS (Version 3.0 beta) |
| Comments | UTK / ICL - Torc cluster - 09 / 00 |

| P x Q grid | N = 2000 | N = 5000 | N = 8000 | N = 10000 | N = 15000 | N = 20000 |
|---|---|---|---|---|---|---|
| 2 x 4 | 1.76 | 2.32 | 2.51 | 2.58 | 2.72 | 2.73 |
| 4 x 4 | 2.27 | 3.94 | 4.46 | 4.68 | 5.00 | 5.16 |

Performance (Gflops) with respect to problem size N on 8- and 16-processor grids.
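The two Torc rows can be compared by how much of the doubled processor count turns into speed. A minimal sketch, defining scaling efficiency as the speedup from the 2 x 4 to the 4 x 4 grid divided by the ratio of process counts (an illustrative metric, not one HPL reports):

```python
# Torc results at the smallest and largest N (Gflops), from the table above.
results = {
    (2, 4): {2000: 1.76, 20000: 2.73},
    (4, 4): {2000: 2.27, 20000: 5.16},
}


def scaling_efficiency(small, large, n) -> float:
    """Speedup from the smaller to the larger grid divided by the
    ratio of process counts (1.0 would be perfect scaling)."""
    (ps, qs), (pl, ql) = small, large
    speedup = results[large][n] / results[small][n]
    return speedup / ((pl * ql) / (ps * qs))


if __name__ == "__main__":
    for n in (2000, 20000):
        eff = scaling_efficiency((2, 4), (4, 4), n)
        print(f"N={n}: 8 -> 16 processes, efficiency {eff:.2f}")
```

This gives about 0.64 at N = 2000 and 0.95 at N = 20000: in these measurements the 16-processor grid needs a large problem before the extra processors pay off.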
The input file HPL.dat used for this run was:
HPL Linpack benchmark input file - Torc - 09/00
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
6 # of problems sizes (N)
2000 5000 8000 10000 15000 20000 Ns
2 # of NBs
80 80 NBs
2 # of process grids (P x Q)
2 4 Ps
4 4 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
8 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
80 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
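The largest problem size that can be run is limited by memory. Counting only the N x N matrix of 8-byte doubles, spread evenly over the P x Q processes, gives a quick estimate; this ignores the right-hand side and HPL's work buffers, and the 80% fill fraction in the hypothetical `largest_n` helper is an assumed rule of thumb, not a value taken from this page.

```python
def matrix_bytes_per_process(n: int, p: int, q: int) -> float:
    """Approximate bytes of the N x N double-precision matrix held by
    each of the P*Q processes (ignores right-hand side and workspace)."""
    return 8.0 * n * n / (p * q)


def largest_n(bytes_per_process: float, p: int, q: int,
              fraction: float = 0.8) -> int:
    """Largest N whose matrix fits in `fraction` of aggregate memory.
    Assumption: 0.8 is a rule-of-thumb fill fraction."""
    total = fraction * bytes_per_process * p * q
    return int((total / 8.0) ** 0.5)


if __name__ == "__main__":
    # Torc 4 x 4 grid at N = 20000 (from the table above)
    mb = matrix_bytes_per_process(20000, 4, 4) / 1e6
    print(f"about {mb:.0f} MB of matrix per process")
```

For the Torc 4 x 4 run at N = 20000 this is 8 x 20000^2 / 16 = 200,000,000 bytes, about 200 MB per process. Going the other way, a hypothetical 1 GiB per process on the same grid would allow N of roughly 41,000.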

| Item | Setting |
|---|---|
| OS | Tru64 Version 5 |
| C compiler | cc Version 6.1 |
| C flags | -arch host -tune host -std -O5 |
| MPI | -lmpi -lelan |
| BLAS | CXML |
| Comments | ORNL / CCS - falcon - 09 / 00 |

| P x Q grid | N = 5000 | N = 10000 | N = 25000 | N = 53000 |
|---|---|---|---|---|
| 8 x 8 | 26.37 | 45.00 | 60.99 | 66.93 |

Performance (Gflops) with respect to problem size N on 16 nodes (64 processors).
The input file HPL.dat used for this run was:
HPL Linpack benchmark input file - falcon - 09/00
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
4 # of problems sizes (N)
5000 10000 25000 53000 Ns
2 # of NBs
88 88 NBs
1 # of process grids (P x Q)
8 Ps
8 Qs
16.0 threshold
1 # of panel fact
1 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
88 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
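HPL distributes the matrix in NB x NB blocks dealt out cyclically over the rows and columns of the P x Q process grid. The number of rows (or, along the other grid dimension, columns) each process owns follows the same counting rule as ScaLAPACK's NUMROC. Below is a re-implementation sketch of that rule, assuming the distribution starts at process 0.

```python
def numroc(n: int, nb: int, iproc: int, nprocs: int,
           isrcproc: int = 0) -> int:
    """Entries of an n-long dimension, cut into nb-sized blocks dealt
    cyclically over nprocs processes, that land on process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                   # number of full blocks
    count = (nblocks // nprocs) * nb    # full rounds every process gets
    extrablks = nblocks % nprocs        # leftover full blocks
    if mydist < extrablks:
        count += nb                     # one extra full block
    elif mydist == extrablks:
        count += n % nb                 # the final partial block
    return count


if __name__ == "__main__":
    # falcon run: N = 53000, NB = 88, P = Q = 8
    rows = [numroc(53000, 88, p, 8) for p in range(8)]
    print(rows, sum(rows))
```

For the falcon run (N = 53000, NB = 88, P = 8), the 602 full blocks and the final 24-row partial block give local row counts of [6688, 6688, 6624, 6600, 6600, 6600, 6600, 6600], which sum to 53000. With Q = 8 as well, process (0, 0) holds a 6688 x 6688 local block, about 358 MB of doubles.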
The authors acknowledge the use of the Oak Ridge National Laboratory
Compaq computer, funded by the Department of Energy's Office
of Science and Energy Efficiency programs.