
Full gemm Athlon results



Guys,

I've built a full library around the kernel Peter created by translating
Julian's kernel to GNU assembler and extending it to all precisions.
The timings are included below; they top out at almost 80% of peak for
out-of-cache operations on my 600MHz Athlon classic.  To put this in
perspective, consider that my 600MHz Athlon classic, which you can buy out of
a gum-ball dispenser for $0.25, has the same asymptotic gemm peak as a 500MHz
ev6 running Goto's assembler GEMM.
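
For anyone who wants to check the percent-of-peak figure, here is a minimal
sketch of the arithmetic.  It assumes a theoretical peak of 2 floating-point
operations per cycle (one add plus one multiply), i.e. 1200 MFLOPS at 600MHz;
that peak is an assumption for illustration and is not stated in this message.
The function name is hypothetical, and the input is the best dMM number from
the results below.

```python
def percent_of_peak(mflops, clock_mhz, flops_per_cycle=2):
    """Return measured MFLOPS as a percentage of theoretical peak.

    flops_per_cycle=2 is an assumed value (one add + one multiply per
    cycle); change it if the machine's peak differs.
    """
    peak_mflops = clock_mhz * flops_per_cycle
    return 100.0 * mflops / peak_mflops


if __name__ == "__main__":
    # Best 3.3.11 dMM entry in the results below (N=900).
    print(f"dMM: {percent_of_peak(940.6, 600):.1f}% of assumed peak")
```

Under that assumption the best dMM timing comes to roughly 78% of peak.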

The only precision for which this does not hold is double complex (ZGEMM),
where the full gemm gets much less performance than the kernel timer
indicated.  It is still possible that I screwed up the build somehow, since
this is the only precision showing such a disparity between kernel and full
performance . . .

Wow.  I'll say it backwards: wow.
Clint

Results on 600MHz Athlon classic, in MFLOPS (columns are problem size N)
3.3.10 : generated kernel
3.3.0  : generated DGEMM kernel (3.3.10 had Julian's dmm kernel)
3.3.11 : ATLAS + Peter's translation of Julian's kernel

              100    200    300    400    500    600    700    800    900   1000
           ====== ====== ====== ====== ====== ====== ====== ====== ====== ======

3.3.10 sMM  681.8  720.0  794.1  800.0  781.3  757.9  788.5  781.7  788.1  784.3
3.3.11 sMM  833.3  900.0  964.3  948.1 1000.0 1004.7  980.0 1003.9 1005.5 1000.0
3.3.10 sLU  319.5  480.7  545.1  587.4  616.4  634.5  662.1  668.7  665.2  686.8
3.3.11 sLU  333.8  513.3  598.5  655.2  693.4  707.3  748.9  775.0  770.8  793.1

3.3.0  dMM  545.5  640.0  627.9  656.4  694.4  720.0  672.5  691.9  704.3  696.9
3.3.11 dMM  714.3  800.0  900.0  914.3  862.1  919.1  927.0  922.5  940.6  925.9
3.3.0  dLU  291.6  393.3  462.5  479.8  496.8  484.8  537.5  550.0  551.8  569.4
3.3.11 dLU  303.4  426.6  535.5  608.4  628.0  625.3  643.4  668.7  683.9  693.9

3.3.10 cMM  822.2  800.0  830.8  839.3  833.3  822.9  829.0  829.1  828.4  823.0
3.3.11 cMM  870.6  914.3  981.8  930.9  952.4  960.0  952.8  950.3  968.8  958.1
3.3.10 cLU  476.5  638.8  678.4  757.8  757.0  738.0  755.5  749.8  759.1  774.9
3.3.11 cLU  476.5  651.8  691.4  710.4  757.0  767.5  801.9  807.5  809.7  838.3


3.3.10 zMM  657.8  711.1  675.0  682.7  671.1  683.0  659.6  668.2  661.2  663.9
3.3.11 zMM  722.0  711.1  696.8  721.1  719.4  732.2  731.7  732.7  741.6  735.3
3.3.10 zLU  411.2  491.4  544.8  559.0  555.2  533.0  605.4  595.9  607.2  621.4
3.3.11 zLU  405.7  499.1  553.2  568.4  584.4  587.4  609.5  612.0  626.8  645.4
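
As a rough way to read the table above, the sketch below computes the ratio of
the 3.3.11 number to the earlier release's number at N=1000 for each routine.
The values are copied from the last column of the table; the dictionary and
function names are hypothetical and exist only for this illustration.

```python
# N=1000 column of the table above: (earlier release, 3.3.11).
# The earlier release is 3.3.0 for dMM/dLU and 3.3.10 for the rest.
N1000 = {
    "sMM": (784.3, 1000.0),
    "sLU": (686.8, 793.1),
    "dMM": (696.9, 925.9),
    "dLU": (569.4, 693.9),
    "cMM": (823.0, 958.1),
    "cLU": (774.9, 838.3),
    "zMM": (663.9, 735.3),
    "zLU": (621.4, 645.4),
}


def speedups(results):
    """Return {routine: new / old} for every routine in results."""
    return {name: new / old for name, (old, new) in results.items()}


if __name__ == "__main__":
    for name, ratio in speedups(N1000).items():
        print(f"{name}: {ratio:.2f}x")
```

At N=1000, zMM shows the smallest gain of the four MM rows (about 1.11x,
against about 1.33x for dMM), which matches the ZGEMM remark above.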