
Re: developer release 3.1.2


I've finished a complex Level 2 SSE implementation for ATLAS.  It can be found at


All seems to work well, with the exception of the inlining issue I
mentioned earlier.  I now have a workaround that has been successfully
tested with gcc.  The info pages for gcc describe several conditions
under which the compiler will not inline a function, one of which is
the presence of a nested function declaration.  I've defined such a
dummy declaration via a cpp macro called NO_INLINE, and used it in the
functions I don't want inlined.  If the compiler isn't gcc, NO_INLINE
expands to nothing.
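
For illustration, here is a minimal sketch of what such a macro could
look like.  It is not the actual ATLAS code: the names no_inline_dummy
and dot_kernel are hypothetical, and the extra check for Clang is an
assumption added here because Clang also defines __GNUC__ but rejects
nested functions.  Whether a given gcc version still refuses to inline
a function containing a nested function has not been verified.

```c
#include <stddef.h>

/* Assumption: only gcc proper accepts nested functions (a GNU C
   extension).  Everywhere else the macro expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_INLINE double no_inline_dummy(double x) { return x * x; }
#else
#define NO_INLINE
#endif

/* Hypothetical kernel that should stay out of line.  The macro must be
   the first thing in the function body so that it sits with the other
   declarations. */
float dot_kernel(const float *x, const float *y, size_t n)
{
    NO_INLINE
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

The dummy is never called, so gcc may warn about an unused function
when -Wall is given; the result of the kernel is the same either way.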

This whole procedure may not be necessary if I rebuild the whole tree
with the same compilation settings, but I haven't tested that yet.  I
also have no way of testing other compilers.  Any other suggestions
would be most appreciated.

Rough timings on a PIII 450 MHz:

         SSE          Standard ATLAS

cgemvT:  400 MFLOPS   160 MFLOPS
cgemvN:  380 MFLOPS   190 MFLOPS
cger:    200 MFLOPS   100 MFLOPS

Take care,

R Clint Whaley <rwhaley@cs.utk.edu> writes:

> Guys,
> The new developer release is out, and available from the usual site.  This
> one includes Camm's SSE-enabled SGER and SGEMV, plus various upgrades
> in config and mvsearch to support it. Also, fixes the reported errors
> in Level 1 C blas, and linking problems with Level 2 packed BLAS. 
> Camm, let me know if what I've done with your stuff is OK or not.
> Thanks,
> Clint

Camm Maguire			     			camm@enhanced.com
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah