
error in M cleanup



Camm,

The good news is that using your new SSE2 stuff I'm now getting a complete
DGEMM (not just mmcase) of roughly 2 Gflop.  The bad news is that it still
doesn't always get the right answer.  In particular, there appears to be
an error in the M cleanup: for any M of the form M = 2 + 4i (i = 0, 1, 2, ...),
it produces the wrong answer.  Here are some examples that make the tester fail:

>> make mmutstcase mmrout=../CASES/ATL_gemm_SSE.c mb=0 nb=56 M=2 N=56 K=56
>> make mmutstcase mmrout=../CASES/ATL_gemm_SSE.c mb=0 nb=56 M=10 N=56 K=56

Seems like an error in the cleanup of a loop unrolled by 4, but I obviously
don't know.  Can you confirm it's an error, and not just something I'm doing
wrong?
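
To make the suspicion concrete, here is a minimal sketch in plain C of what
"M cleanup of a loop unrolled by 4" means.  This is NOT the code in
ATL_gemm_SSE.c; every name in it (dot_row, matvec_ref, matvec_unroll4,
count_mismatches) is made up for illustration, and it uses a matrix-vector
product rather than a full GEMM to stay short.  The point is only that a
dedicated path for M % 4 == 2 exists, so a defect confined to that path would
fail for exactly M = 2, 6, 10, ...

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Dot product of one row of A (length K) with x. */
static double dot_row(int K, const double *a, const double *x)
{
    double s = 0.0;
    for (int k = 0; k < K; k++)
        s += a[k] * x[k];
    return s;
}

/* Reference: y = A*x, A is M x K, row-major, no unrolling. */
static void matvec_ref(int M, int K, const double *A, const double *x, double *y)
{
    for (int i = 0; i < M; i++)
        y[i] = dot_row(K, A + (size_t)i * K, x);
}

/* Same product with the M loop unrolled by 4 plus explicit cleanup. */
static void matvec_unroll4(int M, int K, const double *A, const double *x, double *y)
{
    assert(M >= 0 && K >= 0);
    int M4 = M - (M % 4);            /* largest multiple of 4 <= M */
    int i;

    for (i = 0; i < M4; i += 4) {    /* main body: 4 rows per pass */
        const double *a = A + (size_t)i * K;
        y[i]     = dot_row(K, a,         x);
        y[i + 1] = dot_row(K, a + K,     x);
        y[i + 2] = dot_row(K, a + 2 * K, x);
        y[i + 3] = dot_row(K, a + 3 * K, x);
    }

    /* M cleanup: one separate path per remainder.  An error that lives
     * only in "case 2" would show up exactly when M = 2 + 4i. */
    const double *a = A + (size_t)M4 * K;
    switch (M % 4) {
    case 3:
        y[M4]     = dot_row(K, a,         x);
        y[M4 + 1] = dot_row(K, a + K,     x);
        y[M4 + 2] = dot_row(K, a + 2 * K, x);
        break;
    case 2:
        y[M4]     = dot_row(K, a,     x);
        y[M4 + 1] = dot_row(K, a + K, x);
        break;
    case 1:
        y[M4] = dot_row(K, a, x);
        break;
    default:
        break;
    }
}

/* Returns the number of rows where the unrolled result differs from the
 * reference, or -1 on allocation failure.  Requires M >= 1 and K >= 1.
 * Inputs are small integers, so all sums are exact and the results can
 * be compared with ==. */
static int count_mismatches(int M, int K)
{
    double *A  = malloc(sizeof *A  * (size_t)M * K);
    double *x  = malloc(sizeof *x  * (size_t)K);
    double *y0 = malloc(sizeof *y0 * (size_t)M);
    double *y1 = malloc(sizeof *y1 * (size_t)M);
    int bad = -1;

    if (A && x && y0 && y1) {
        for (int j = 0; j < M * K; j++) A[j] = (double)(j % 7 + 1);
        for (int k = 0; k < K; k++)     x[k] = (double)(k % 5 - 2);
        matvec_ref(M, K, A, x, y0);
        matvec_unroll4(M, K, A, x, y1);
        bad = 0;
        for (int i = 0; i < M; i++)
            if (y0[i] != y1[i]) bad++;
    }
    free(A); free(x); free(y0); free(y1);
    return bad;
}
```

Calling count_mismatches(M, 56) for M = 1 through 12 exercises every
remainder at least twice.  With the cleanup written as above, each call
returns 0.  A kernel whose two-row path is broken would return nonzero only
at M = 2, 6 and 10, which is the pattern the tester is showing.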

To give some good news with all this, I include timings below (in Mflop, by
problem size) comparing the new SSE2 DGEMM with the x86 FPU implementation.

Thanks,
Clint

             100    200    300    400    500    600    700    800    900   1000
          ====== ====== ====== ====== ====== ====== ====== ====== ====== ======
P4   x86  1025.6 1194.0 1181.2 1238.7 1209.7 1234.3 1247.3 1264.2 1276.8 1242.2
P4  SSE2  1351.4 1837.0 1944.0 1828.6 1851.9 1878.3 1960.0 1932.1 1944.0 2000.0

            1200   1400   1600   1800   2000   2200   2400   2600   2800   3000
          ====== ====== ====== ====== ====== ====== ====== ====== ====== ======
P4   x86  1256.7 1250.1 1254.5 1262.3 1261.8 1258.6 1261.3 1261.7 1262.0 1260.5
P4  SSE2  1986.2 1974.1 1974.0 1970.3 1990.0 1999.6 1991.9 1991.6 2002.0 1974.4