>Hi Clint!  Sorry I've been out of touch recently.  Did you get those

Yep, I got the files; I finally had the sysadmins set your directory
so I could read it (since you pointed me at files in there, I was pretty
sure that was OK with you).

As might be expected, there were some peculiarities.  First, the best 
performance was given by ATL_dgemm_SSE_1x1xkb.c, not ATL_dgemm_SSE_1x4.c.
Also, ATL_dgemm_SSE_1x1xkb.c didn't work for cleanup (it gave wrong answers
when N was not a multiple of NB).  Using an NB of 80 (and keeping N a
multiple of 80), I was able to build a complete dgemm getting a little over
2.1Gflop on torc19.  However, an NB that large used up too much memory,
causing swapping very early, so I dropped back to NB=56.  I didn't build the
complete gemm there, though, since cleanup wasn't rolling.
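
For anyone following along, here is a rough sketch of what "cleanup" means
for a blocked gemm.  This is not the ATLAS code and not either of the SSE
kernels above; the names blocked_dgemm, naive_dgemm and min_sz are made up
for illustration, the matrices are assumed square and row-major, and NB is
fixed at 56 only because that is the value mentioned above.  The point is
just that each block dimension has to be clipped to whatever is left when N
is not a multiple of NB, which is the case the 1x1xkb kernel got wrong.

```c
#include <stddef.h>

/* Hypothetical block size for this sketch (the message tried 80 and 56). */
#define NB 56

/* Helper: smaller of two sizes. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C += A * B for row-major n x n matrices, walked in NB x NB blocks.
 * The min_sz() calls are the "cleanup": the last block in each
 * dimension is shorter than NB when n is not a multiple of NB. */
void blocked_dgemm(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += NB) {
        size_t ib = min_sz(NB, n - ii);          /* rows in this block */
        for (size_t jj = 0; jj < n; jj += NB) {
            size_t jb = min_sz(NB, n - jj);      /* cols in this block */
            for (size_t kk = 0; kk < n; kk += NB) {
                size_t kb = min_sz(NB, n - kk);  /* inner dimension    */
                for (size_t i = ii; i < ii + ib; i++) {
                    for (size_t j = jj; j < jj + jb; j++) {
                        double sum = C[i * n + j];
                        for (size_t k = kk; k < kk + kb; k++)
                            sum += A[i * n + k] * B[k * n + j];
                        C[i * n + j] = sum;
                    }
                }
            }
        }
    }
}

/* Unblocked reference, useful for checking the blocked version. */
void naive_dgemm(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}
```

Running both routines on an N that is not a multiple of NB (say N=60 with
NB=56) and comparing the results is the kind of check that exposes a kernel
that only handles full blocks.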

The interesting thing is that if the 2.1Gflop holds up (as I think it
will), the P4 will overtake the Athlon on the double precision flops/$ . . .