Latest Athlon test results
"diff" of 3.2.1 vs. 3.3.1 SUMMARY.LOG: < is Atlas 3.2.1, > is Atlas 3.3.1
M. Edward Borasky, Borasky Research, 3 July 2001
Atlas options: 3DNow yes, all others at defaults
Environment: 1.333 GHz Athlon Thunderbird, 512 MB DDR RAM
*Stock* Red Hat Linux 7.1, gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81)
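The diff below is in `diff`'s default ("normal") output format: a hunk header such as `5c5` or `71,72c71,72` says which line range of the old SUMMARY.LOG was changed into which range of the new one, the old lines follow prefixed with `<`, then a `---` separator, then the new lines prefixed with `>`. As a minimal sketch, assuming each log line sits on a single line (no mail wrapping), the `Performance =` figures on the two sides of each hunk can be paired to get the percent change from 3.2.1 to 3.3.1. `performance_changes` is a hypothetical helper, not part of ATLAS or diff.

```python
import re

# Hunk headers in normal diff format, e.g. "5c5" or "71,72c71,72".
HUNK = re.compile(r"^\d+(?:,\d+)?[acd]\d+(?:,\d+)?$")
# The MFLOPS figure on an ATLAS SUMMARY.LOG "Performance =" line.
PERF = re.compile(r"Performance = ([\d.]+)")


def performance_changes(diff_text):
    """Return (hunk, old, new, pct_change) for every "Performance =" pair.

    Old ("<") and new (">") values are paired in order within each hunk,
    which assumes both sides list the same routines in the same order.
    """
    hunks = []  # each entry: [header, old_values, new_values]
    for raw in diff_text.splitlines():
        line = raw.strip()
        if HUNK.match(line):
            hunks.append([line, [], []])
            continue
        m = PERF.search(line)
        if m is None or not hunks:
            continue
        value = float(m.group(1))
        if line.startswith("<"):
            hunks[-1][1].append(value)
        elif line.startswith(">"):
            hunks[-1][2].append(value)

    changes = []
    for header, old, new in hunks:
        for o, n in zip(old, new):
            changes.append((header, o, n, 100.0 * (n - o) / o))
    return changes


# Two hunks copied from the diff.
SAMPLE = """\
63c63
<               Performance = 244.32 (31.07% of copy matmul, 23.06% of peak)
---
>               Performance = 213.35 (27.19% of copy matmul, 20.09% of peak)
212,213c212,213
<             mu=16, nu=1, using 75.00% of L1 Cache
<               Performance = 225.38 ( 6.44% of copy matmul, 21.23% of peak)
---
>             mu=16, nu=1, using 50.00% of L1 Cache
>               Performance = 433.18 (12.46% of copy matmul, 40.89% of peak)
"""

for header, old, new, pct in performance_changes(SAMPLE):
    print(f"{header:>16}  {old:8.2f} -> {new:8.2f}  ({pct:+.1f}%)")
```

On this sample it reports about -12.7% for hunk `63c63` and +92.2% for hunk `212,213c212,213`.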
-------------------------------------------------------------------------------
5c5
< *       BEGAN ATLAS INSTALL OF SECTION 0-0-0 ON 07/02/2001 AT 22:00 *
---
> *       BEGAN ATLAS INSTALL OF SECTION 0-0-0 ON 07/02/2001 AT 19:29 *
19c19
<          Apparent peak=1059.34MFLOPS
---
>          Apparent peak=1061.83MFLOPS
22c22
<          Apparent peak=1061.76MFLOPS
---
>          Apparent peak=1059.31MFLOPS
33c33
<                  This gave performance of 786.35 (74.23% of apparent peak)
---
>                  This gave performance of 784.60 (73.89% of apparent peak)
35c35
<                Performance = 171.72 (21.84% of copy matmul, 16.21% of peak)
---
>                Performance = 172.81 (22.03% of copy matmul, 16.27% of peak)
37c37
<                Performance = 171.06 (21.75% of copy matmul, 16.15% of peak)
---
>                Performance = 171.82 (21.90% of copy matmul, 16.18% of peak)
39c39
<                Performance = 780.63 (99.27% of copy matmul, 73.69% of peak)
---
>                Performance = 777.79 (99.13% of copy matmul, 73.25% of peak)
41c41
<                Performance = 164.40 (20.91% of copy matmul, 15.52% of peak)
---
>                Performance = 165.47 (21.09% of copy matmul, 15.58% of peak)
63c63
<               Performance = 244.32 (31.07% of copy matmul, 23.06% of peak)
---
>               Performance = 213.35 (27.19% of copy matmul, 20.09% of peak)
66c66
<               Performance = 151.74 (19.30% of copy matmul, 14.32% of peak)
---
>               Performance = 153.38 (19.55% of copy matmul, 14.44% of peak)
71,72c71,72
<             mu=32, nu=2, using 87.00% of L1 Cache
<               Performance = 105.94 (13.47% of copy matmul, 10.00% of peak)
---
>             mu=32, nu=2, using 89.00% of L1 Cache
>               Performance = 93.74 (11.95% of copy matmul,  8.83% of peak)
79,80c79,80
<       The best matmul kernel was ATL_mm_3dnow_100.c, written by Peter Soendergaard
<       This gave performance of 3254.55MFLOPS (306.52% of apparent peak)
---
>       The best matmul kernel was ATL_smm_3dnow_100.c, written by Peter Soendergaard
>       This gave performance of 3208.61MFLOPS (302.90% of apparent peak)
82c82
<                Performance = 889.87 (27.34% of copy matmul, 83.81% of peak)
---
>                Performance = 886.31 (27.62% of copy matmul, 83.67% of peak)
84c84
<                Performance = 966.71 (29.70% of copy matmul, 91.05% of peak)
---
>                Performance = 964.42 (30.06% of copy matmul, 91.04% of peak)
86c86
<                Performance = 882.56 (27.12% of copy matmul, 83.12% of peak)
---
>                Performance = 879.05 (27.40% of copy matmul, 82.98% of peak)
88c88
<                Performance = 936.38 (28.77% of copy matmul, 88.19% of peak)
---
>                Performance = 940.73 (29.32% of copy matmul, 88.81% of peak)
110,113c110,113
<               Performance = 208.95 ( 6.42% of copy matmul, 19.68% of peak)
<       gemvT : chose routine ATL_gemvT_mm.c written by R. Clint Whaley
<               Yunroll=0, Xunroll=0, using 100.00% of L1
<               Performance = 193.37 ( 5.94% of copy matmul, 18.21% of peak)
---
>               Performance = 208.00 ( 6.48% of copy matmul, 19.64% of peak)
>       gemvT : chose routine ATL_gemvT_2x16_1.c written by R. Clint Whaley
>               Yunroll=2, Xunroll=16, using 100.00% of L1
>               Performance = 159.27 ( 4.96% of copy matmul, 15.04% of peak)
117,119c117,119
<       ger : chose routine ATL_ger1_4x4_1.c written by R. Clint Whaley
<             mu=4, nu=4, using 94.00% of L1 Cache
<               Performance = 150.00 ( 4.61% of copy matmul, 14.13% of peak)
---
>       ger : chose routine ATL_ger1_1x4_0.c written by R. Clint Whaley
>             mu=1, nu=4, using 75.00% of L1 Cache
>               Performance = 137.59 ( 4.29% of copy matmul, 12.99% of peak)
127c127
<                  This gave performance of 794.41 (74.99% of apparent peak)
---
>                  This gave performance of 790.37 (74.43% of apparent peak)
129c129
<                Performance = 185.89 (23.40% of copy matmul, 17.55% of peak)
---
>                Performance = 185.49 (23.47% of copy matmul, 17.47% of peak)
131c131
<                Performance = 185.50 (23.35% of copy matmul, 17.51% of peak)
---
>                Performance = 185.55 (23.48% of copy matmul, 17.47% of peak)
133c133
<                Performance = 180.69 (22.75% of copy matmul, 17.06% of peak)
---
>                Performance = 180.99 (22.90% of copy matmul, 17.05% of peak)
135c135
<                Performance = 179.06 (22.54% of copy matmul, 16.90% of peak)
---
>                Performance = 180.87 (22.88% of copy matmul, 17.03% of peak)
155,160c155,160
<       gemvN : chose routine ATL_cgemvN_mm.c written by R. Clint Whaley
<               Yunroll=0, Xunroll=0, using 93.00% of L1
<               Performance = 129.62 (16.32% of copy matmul, 12.24% of peak)
<       gemvT : chose routine ATL_cgemvT_mm.c written by R. Clint Whaley
<               Yunroll=0, Xunroll=0, using 93.00% of L1
<               Performance = 121.36 (15.28% of copy matmul, 11.46% of peak)
---
>       gemvN : chose routine ATL_gemvN_SSE.c written by Camm Maguire
>               Yunroll=16, Xunroll=2, using 81.00% of L1
>               Performance = 392.09 (49.61% of copy matmul, 36.93% of peak)
>       gemvT : chose routine ATL_gemvT_SSE.c written by Camm Maguire
>               Yunroll=2, Xunroll=16, using 81.00% of L1
>               Performance = 396.76 (50.20% of copy matmul, 37.37% of peak)
164c164
<       ger : chose routine ATL_cger1_axpy.c written by R. Clint Whaley
---
>       ger : chose routine ATL_ger1_SSE.c written by Camm Maguire
166c166
<               Performance = 166.29 (20.93% of copy matmul, 15.70% of peak)
---
>               Performance = 187.47 (23.72% of copy matmul, 17.66% of peak)
173,174c173,174
<       The best matmul kernel was ATL_mm_3dnow_100.c, written by Peter Soendergaard
<       This gave performance of 3498.94MFLOPS (329.54% of apparent peak)
---
>       The best matmul kernel was ATL_smm_3dnow_100.c, written by Peter Soendergaard
>       This gave performance of 3476.51MFLOPS (328.19% of apparent peak)
176c176
<                Performance = 918.73 (26.26% of copy matmul, 86.53% of peak)
---
>                Performance = 911.95 (26.23% of copy matmul, 86.09% of peak)
178c178
<                Performance = 963.17 (27.53% of copy matmul, 90.71% of peak)
---
>                Performance = 952.44 (27.40% of copy matmul, 89.91% of peak)
180c180
<                Performance = 895.74 (25.60% of copy matmul, 84.36% of peak)
---
>                Performance = 898.19 (25.84% of copy matmul, 84.79% of peak)
182c182
<                Performance = 928.23 (26.53% of copy matmul, 87.42% of peak)
---
>                Performance = 927.75 (26.69% of copy matmul, 87.58% of peak)
203,204c203,204
<               Yunroll=0, Xunroll=0, using 75.00% of L1
<               Performance = 386.43 (11.04% of copy matmul, 36.40% of peak)
---
>               Yunroll=0, Xunroll=0, using 100.00% of L1
>               Performance = 388.95 (11.19% of copy matmul, 36.72% of peak)
206,207c206,207
<               Yunroll=0, Xunroll=0, using 75.00% of L1
<               Performance = 383.12 (10.95% of copy matmul, 36.08% of peak)
---
>               Yunroll=0, Xunroll=0, using 100.00% of L1
>               Performance = 383.38 (11.03% of copy matmul, 36.19% of peak)
212,213c212,213
<             mu=16, nu=1, using 75.00% of L1 Cache
<               Performance = 225.38 ( 6.44% of copy matmul, 21.23% of peak)
---
>             mu=16, nu=1, using 50.00% of L1 Cache
>               Performance = 433.18 (12.46% of copy matmul, 40.89% of peak)
222c222
< *      FINISHED ATLAS INSTALL OF SECTION 0-0-0 ON 07/02/2001 AT 23:07 *
---
> *      FINISHED ATLAS INSTALL OF SECTION 0-0-0 ON 07/02/2001 AT 20:43 *
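Each `Performance` line expresses one MFLOPS figure as two percentages. Working backward from the 3.2.1 numbers (an inference from the figures, not documented ATLAS behavior), "% of copy matmul" appears to be relative to that block's "This gave performance of" rate (786.35 in the first block, the 3254.55 MFLOPS 3DNow kernel in the second), and "% of peak" to one of the two "Apparent peak" lines (1059.34 and 1061.76 respectively). A short Python check of that reading, which also derives each install's wall-clock time from the BEGAN/FINISHED banners; `pct` and `install_minutes` are hypothetical helpers:

```python
from datetime import datetime


def pct(part, whole):
    """Percentage of `whole` that `part` represents, rounded to 2 places like the log."""
    return round(100.0 * part / whole, 2)


def install_minutes(began, finished, fmt="%m/%d/%Y %H:%M"):
    """Minutes between a BEGAN and a FINISHED banner timestamp."""
    delta = datetime.strptime(finished, fmt) - datetime.strptime(began, fmt)
    return delta.total_seconds() / 60


# Values copied from the Atlas 3.2.1 ("<") side of the diff.
PEAK_1, PEAK_2 = 1059.34, 1061.76  # the two "Apparent peak" lines
COPY_MM_1 = 786.35                 # "This gave performance of 786.35"
COPY_MM_2 = 3254.55                # ATL_mm_3dnow_100.c kernel, MFLOPS

# (MFLOPS, assumed copy-matmul rate, assumed peak, logged % of copy matmul, logged % of peak)
CHECKS = [
    (171.72, COPY_MM_1, PEAK_1, 21.84, 16.21),
    (780.63, COPY_MM_1, PEAK_1, 99.27, 73.69),
    (244.32, COPY_MM_1, PEAK_1, 31.07, 23.06),
    (889.87, COPY_MM_2, PEAK_2, 27.34, 83.81),
    (966.71, COPY_MM_2, PEAK_2, 29.70, 91.05),
]

for mflops, copy_rate, peak, vs_copy, vs_peak in CHECKS:
    print(f"{mflops:7.2f}: {pct(mflops, copy_rate):6.2f}% (log {vs_copy:.2f}%) of copy matmul, "
          f"{pct(mflops, peak):6.2f}% (log {vs_peak:.2f}%) of peak")

# The copy-matmul rates themselves as a share of apparent peak.
print(pct(COPY_MM_1, PEAK_1), pct(COPY_MM_2, PEAK_2))  # 74.23 306.52

print(install_minutes("07/02/2001 22:00", "07/02/2001 23:07"))  # 3.2.1: 67.0
print(install_minutes("07/02/2001 19:29", "07/02/2001 20:43"))  # 3.3.1: 74.0
```

By the banner timestamps, the 3.3.1 install took 74 minutes against 67 for 3.2.1.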
--
M. Edward (Ed) Borasky, Chief Scientist, Borasky Research
http://www.borasky-research.net  http://www.aracnet.com/~znmeb
mailto:znmeb@borasky-research.com  mailto:znmeb@aracnet.com
Q: How do you get an elephant out of a theatre?
A: You can't. It's in their blood.
- References:
- RE: 3dnow
- From: R Clint Whaley <rwhaley@cs.utk.edu>