*Subject*: Latest Athlon test results
*From*: "M. Edward Borasky" <znmeb@aracnet.com>
*Date*: Tue, 3 Jul 2001 18:35:22 -0700

"diff" of 3.2.1 vs. 3.3.1 SUMMARY.LOG: < is Atlas 3.2.1, > is Atlas 3.3.1
M. Edward Borasky, Borasky Research, 3 July 2001

Atlas options: 3DNow yes, all others defaults
Environment: 1.333 GHz Athlon Thunderbird, 512 MB DDR RAM
*Stock* Red Hat Linux 7.1, gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81)
-------------------------------------------------------------------------------
5c5
< * BEGAN ATLAS INSTALL OF SECTION 0-0-0 ON 07/02/2001 AT 22:00 *
---
> * BEGAN ATLAS INSTALL OF SECTION 0-0-0 ON 07/02/2001 AT 19:29 *
19c19
< Apparent peak=1059.34MFLOPS
---
> Apparent peak=1061.83MFLOPS
22c22
< Apparent peak=1061.76MFLOPS
---
> Apparent peak=1059.31MFLOPS
33c33
< This gave performance of 786.35 (74.23% of apparent peak)
---
> This gave performance of 784.60 (73.89% of apparent peak)
35c35
< Performance = 171.72 (21.84% of copy matmul, 16.21% of peak)
---
> Performance = 172.81 (22.03% of copy matmul, 16.27% of peak)
37c37
< Performance = 171.06 (21.75% of copy matmul, 16.15% of peak)
---
> Performance = 171.82 (21.90% of copy matmul, 16.18% of peak)
39c39
< Performance = 780.63 (99.27% of copy matmul, 73.69% of peak)
---
> Performance = 777.79 (99.13% of copy matmul, 73.25% of peak)
41c41
< Performance = 164.40 (20.91% of copy matmul, 15.52% of peak)
---
> Performance = 165.47 (21.09% of copy matmul, 15.58% of peak)
63c63
< Performance = 244.32 (31.07% of copy matmul, 23.06% of peak)
---
> Performance = 213.35 (27.19% of copy matmul, 20.09% of peak)
66c66
< Performance = 151.74 (19.30% of copy matmul, 14.32% of peak)
---
> Performance = 153.38 (19.55% of copy matmul, 14.44% of peak)
71,72c71,72
< mu=32, nu=2, using 87.00% of L1 Cache
< Performance = 105.94 (13.47% of copy matmul, 10.00% of peak)
---
> mu=32, nu=2, using 89.00% of L1 Cache
> Performance = 93.74 (11.95% of copy matmul, 8.83% of peak)
79,80c79,80
< The best matmul kernel was ATL_mm_3dnow_100.c, written by Peter Soendergaard
< This gave performance of 3254.55MFLOPS (306.52% of apparent peak)
---
> The best matmul kernel was ATL_smm_3dnow_100.c, written by Peter Soendergaard
> This gave performance of 3208.61MFLOPS (302.90% of apparent peak)
82c82
< Performance = 889.87 (27.34% of copy matmul, 83.81% of peak)
---
> Performance = 886.31 (27.62% of copy matmul, 83.67% of peak)
84c84
< Performance = 966.71 (29.70% of copy matmul, 91.05% of peak)
---
> Performance = 964.42 (30.06% of copy matmul, 91.04% of peak)
86c86
< Performance = 882.56 (27.12% of copy matmul, 83.12% of peak)
---
> Performance = 879.05 (27.40% of copy matmul, 82.98% of peak)
88c88
< Performance = 936.38 (28.77% of copy matmul, 88.19% of peak)
---
> Performance = 940.73 (29.32% of copy matmul, 88.81% of peak)
110,113c110,113
< Performance = 208.95 ( 6.42% of copy matmul, 19.68% of peak)
< gemvT : chose routine ATL_gemvT_mm.c written by R. Clint Whaley
< Yunroll=0, Xunroll=0, using 100.00% of L1
< Performance = 193.37 ( 5.94% of copy matmul, 18.21% of peak)
---
> Performance = 208.00 ( 6.48% of copy matmul, 19.64% of peak)
> gemvT : chose routine ATL_gemvT_2x16_1.c written by R. Clint Whaley
> Yunroll=2, Xunroll=16, using 100.00% of L1
> Performance = 159.27 ( 4.96% of copy matmul, 15.04% of peak)
117,119c117,119
< ger : chose routine ATL_ger1_4x4_1.c written by R. Clint Whaley
< mu=4, nu=4, using 94.00% of L1 Cache
< Performance = 150.00 ( 4.61% of copy matmul, 14.13% of peak)
---
> ger : chose routine ATL_ger1_1x4_0.c written by R. Clint Whaley
> mu=1, nu=4, using 75.00% of L1 Cache
> Performance = 137.59 ( 4.29% of copy matmul, 12.99% of peak)
127c127
< This gave performance of 794.41 (74.99% of apparent peak)
---
> This gave performance of 790.37 (74.43% of apparent peak)
129c129
< Performance = 185.89 (23.40% of copy matmul, 17.55% of peak)
---
> Performance = 185.49 (23.47% of copy matmul, 17.47% of peak)
131c131
< Performance = 185.50 (23.35% of copy matmul, 17.51% of peak)
---
> Performance = 185.55 (23.48% of copy matmul, 17.47% of peak)
133c133
< Performance = 180.69 (22.75% of copy matmul, 17.06% of peak)
---
> Performance = 180.99 (22.90% of copy matmul, 17.05% of peak)
135c135
< Performance = 179.06 (22.54% of copy matmul, 16.90% of peak)
---
> Performance = 180.87 (22.88% of copy matmul, 17.03% of peak)
155,160c155,160
< gemvN : chose routine ATL_cgemvN_mm.c written by R. Clint Whaley
< Yunroll=0, Xunroll=0, using 93.00% of L1
< Performance = 129.62 (16.32% of copy matmul, 12.24% of peak)
< gemvT : chose routine ATL_cgemvT_mm.c written by R. Clint Whaley
< Yunroll=0, Xunroll=0, using 93.00% of L1
< Performance = 121.36 (15.28% of copy matmul, 11.46% of peak)
---
> gemvN : chose routine ATL_gemvN_SSE.c written by Camm Maguire
> Yunroll=16, Xunroll=2, using 81.00% of L1
> Performance = 392.09 (49.61% of copy matmul, 36.93% of peak)
> gemvT : chose routine ATL_gemvT_SSE.c written by Camm Maguire
> Yunroll=2, Xunroll=16, using 81.00% of L1
> Performance = 396.76 (50.20% of copy matmul, 37.37% of peak)
164c164
< ger : chose routine ATL_cger1_axpy.c written by R. Clint Whaley
---
> ger : chose routine ATL_ger1_SSE.c written by Camm Maguire
166c166
< Performance = 166.29 (20.93% of copy matmul, 15.70% of peak)
---
> Performance = 187.47 (23.72% of copy matmul, 17.66% of peak)
173,174c173,174
< The best matmul kernel was ATL_mm_3dnow_100.c, written by Peter Soendergaard
< This gave performance of 3498.94MFLOPS (329.54% of apparent peak)
---
> The best matmul kernel was ATL_smm_3dnow_100.c, written by Peter Soendergaard
> This gave performance of 3476.51MFLOPS (328.19% of apparent peak)
176c176
< Performance = 918.73 (26.26% of copy matmul, 86.53% of peak)
---
> Performance = 911.95 (26.23% of copy matmul, 86.09% of peak)
178c178
< Performance = 963.17 (27.53% of copy matmul, 90.71% of peak)
---
> Performance = 952.44 (27.40% of copy matmul, 89.91% of peak)
180c180
< Performance = 895.74 (25.60% of copy matmul, 84.36% of peak)
---
> Performance = 898.19 (25.84% of copy matmul, 84.79% of peak)
182c182
< Performance = 928.23 (26.53% of copy matmul, 87.42% of peak)
---
> Performance = 927.75 (26.69% of copy matmul, 87.58% of peak)
203,204c203,204
< Yunroll=0, Xunroll=0, using 75.00% of L1
< Performance = 386.43 (11.04% of copy matmul, 36.40% of peak)
---
> Yunroll=0, Xunroll=0, using 100.00% of L1
> Performance = 388.95 (11.19% of copy matmul, 36.72% of peak)
206,207c206,207
< Yunroll=0, Xunroll=0, using 75.00% of L1
< Performance = 383.12 (10.95% of copy matmul, 36.08% of peak)
---
> Yunroll=0, Xunroll=0, using 100.00% of L1
> Performance = 383.38 (11.03% of copy matmul, 36.19% of peak)
212,213c212,213
< mu=16, nu=1, using 75.00% of L1 Cache
< Performance = 225.38 ( 6.44% of copy matmul, 21.23% of peak)
---
> mu=16, nu=1, using 50.00% of L1 Cache
> Performance = 433.18 (12.46% of copy matmul, 40.89% of peak)
222c222
< * FINISHED ATLAS INSTALL OF SECTION 0-0-0 ON 07/02/2001 AT 23:07 *
---
> * FINISHED ATLAS INSTALL OF SECTION 0-0-0 ON 07/02/2001 AT 20:43 *

-- M.
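
The arithmetic behind the figures in the diff can be checked with a short sketch. It is written in Python and is not part of ATLAS; the helper names are made up for illustration, and the numbers are copied by hand from the diff. One assumption is worth flagging: the "This gave performance of" lines at 33 and 79-80 are paired here with the apparent peaks reported at lines 19 and 22 respectively, which is inferred from the ratios rather than stated in the log.

```python
# Figures are copied by hand from the SUMMARY.LOG diff (all in MFLOPS).
# Helper names are illustrative only; nothing here comes from ATLAS itself.

def pct_of_peak(perf, peak):
    """Performance expressed as a percentage of the apparent peak."""
    return 100.0 * perf / peak

def pct_change(old, new):
    """Relative change from the 3.2.1 figure to the 3.3.1 figure, in percent."""
    return 100.0 * (new - old) / old

# (diff hunk and side, performance, apparent peak)
# Assumption: hunk 33 uses the peak from line 19, hunk 79-80 the peak from line 22.
peak_rows = [
    ("33 <", 786.35, 1059.34),
    ("33 >", 784.60, 1061.83),
    ("79-80 <", 3254.55, 1061.76),
    ("79-80 >", 3208.61, 1059.31),
]

# (kernel and diff hunk, ATLAS 3.2.1 figure, ATLAS 3.3.1 figure)
changed_rows = [
    ("gemvT, hunk 110-113", 193.37, 159.27),
    ("gemvN, hunk 155-160", 129.62, 392.09),
    ("gemvT, hunk 155-160", 121.36, 396.76),
    ("ger, hunk 212-213", 225.38, 433.18),
]

for label, perf, peak in peak_rows:
    print(f"{label:10s} {pct_of_peak(perf, peak):7.2f}% of apparent peak")

for label, old, new in changed_rows:
    print(f"{label:22s} {old:7.2f} -> {new:7.2f}  ({pct_change(old, new):+.1f}%)")
```

Under that pairing, the recomputed ratios agree to two decimals with the figures printed in parentheses on those lines, and the second loop makes it easier to see which kernels moved by more than run-to-run noise between the two releases.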