R Clint Whaley wrote:
> >thanks. Actually, I was just interested in getting
> >on your mailing list to see what gets said when you
> >are starting to evaluate our PIII kernels. Just in
> >case some explanation on our part is necessary.
> I doubt we'll do much with your PIII kernels until you release the source.
> We're not much into using binary-only stuff, particularly since the GPL,
> important to a lot of ATLAS users, does not play well with binary-only codes.
> Just so you know, you timed against the ATLAS PII kernel, tuned for a 512K L2
> cache on your 256K PIII. I can't say how large the improvement will be, but
> on a PII, we get 73% of peak, and on a PIII with on-chip cache, we get
> 76%, so that should be the minimum gemm improvement. It would be easier to
> evaluate if you had published GEMM numbers rather than LU; do you have gemm
> numbers?

Let me explain our philosophy: we believe that the best performance comes from
coding the inner-kernel in assembly and very carefully wrapping code around
this inner-kernel. If you do an "ls -l libitxaux.a", you will find that the
code is absolutely tiny, and it will actually shrink considerably by the next
release.

The inner-kernel has a well-defined interface and can easily be written for
different platforms. The rest of the code is based on solid theoretical results
and is very clean. This "rest" will be released under the GNU license in the
near future, at which time we will see whether other vendors become
interested. We have experimented with the ATLAS inner-kernel and see a
considerable performance improvement when we wrap our outer-kernels around
your inner-kernel. To really do well, you would have to change the
functionality of your inner-kernel somewhat.
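
The split described above, a small kernel with a fixed interface plus outer loops that tile the problem around it, can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `inner_kernel` and `blocked_gemm` and the single square `block` parameter are hypothetical, the kernel is a plain loop standing in for hand-written assembly, and neither function reflects the actual interface of ATLAS or of libitxaux.a.

```python
def inner_kernel(A, B, C, i0, j0, k0, mb, nb, kb):
    """Hypothetical inner-kernel: C[i0:i0+mb][j0:j0+nb] += A-block * B-block.

    In a tuned library this is the piece that would be hand-coded in
    assembly; here it is a plain loop so the fixed interface is easy to see.
    """
    for i in range(i0, i0 + mb):
        Ci, Ai = C[i], A[i]
        for k in range(k0, k0 + kb):
            a, Bk = Ai[k], B[k]
            for j in range(j0, j0 + nb):
                Ci[j] += a * Bk[j]


def blocked_gemm(A, B, C, block=2):
    """Hypothetical outer-kernel: tile C += A * B and hand each tile to the
    inner-kernel. Edge tiles are simply made smaller."""
    m, kdim, n = len(A), len(B), len(B[0])
    for i0 in range(0, m, block):
        mb = min(block, m - i0)
        for j0 in range(0, n, block):
            nb = min(block, n - j0)
            for k0 in range(0, kdim, block):
                kb = min(block, kdim - k0)
                inner_kernel(A, B, C, i0, j0, k0, mb, nb, kb)
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = [[0, 0], [0, 0]]
print(blocked_gemm(A, B, C, block=2))  # [[58, 64], [139, 154]]
```

Only `inner_kernel` would need to change per platform; the tiling loops stay the same, which is the portability argument made in the post.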

Our theory gives a clear indication of which block sizes are optimal at each
level, so we don't need to run (very many) experiments to find them.
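
The post does not spell out the theory behind its block sizes. For comparison only, here is a common rule of thumb (not the authors' method): choose the largest square block size NB such that three NB x NB blocks of 8-byte doubles fit in a cache of a given size, optionally rounded down to a multiple of a kernel's unroll factor. The helper name `pick_block_size` and its parameters are hypothetical.

```python
import math


def pick_block_size(cache_bytes, elem_bytes=8, n_blocks=3, multiple=1):
    """Rule-of-thumb block size (hypothetical helper, not the authors' theory).

    Returns the largest NB such that n_blocks * NB * NB * elem_bytes fits in
    cache_bytes, rounded down to a multiple of `multiple` (for example a
    kernel's unroll factor).
    """
    # NB*NB may not exceed the number of elements one block is allowed to hold.
    max_elems_per_block = cache_bytes // (n_blocks * elem_bytes)
    nb = math.isqrt(max_elems_per_block)
    return nb - nb % multiple


# The two L2 sizes mentioned in the thread, assuming 8-byte doubles.
print(pick_block_size(256 * 1024))  # 104
print(pick_block_size(512 * 1024))  # 147
```

A real library would leave some cache free for other data, so this gives an upper bound; it also shows why a kernel tuned for a 512K L2 is a poor fit for a 256K part.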