
Re: subscribe

To: R Clint Whaley <rwhaley@cs.utk.edu>
Subject: Re: subscribe
From: Robert van de Geijn <rvdg@cs.utexas.edu>
Date: Mon, 13 Nov 2000 15:32:09 -0600
CC: atlas-comm@cs.utk.edu, flame@cs.utexas.edu
Organization: The University of Texas at Austin
References: <200011131918.OAA27438@enterprise.cs.utk.edu>
Reply-To: rvdg@cs.utexas.edu
Sender: root@mail.cs.utexas.edu

R Clint Whaley wrote:

> Robert,
>
> >thanks.  Actually, I was just interested in getting
> >on your mailing list to see what gets said when you
> >are starting to evaluate our PIII kernels.  Just in
> >case some explanation on our part is necessary.
>
> I doubt we'll do much with your PIII kernels until you release the source.
> We're not much into using binary-only stuff, particularly since the GPL,
> important to a lot of ATLAS users, does not play well with binary-only codes.
> Just so you know, you timed against the ATLAS PII kernel, tuned for a 512K L2
> cache on your 256K PIII.  I can't say how large the improvement will be, but
> on a PII, we get 73% of peak, and on a PIII with on-chip cache, we get
> 76%, so that should be the minimum gemm improvement.  It would be easier to
> evaluate if you had published GEMM numbers rather than LU; do you have gemm
> numbers?
>
> Cheers,
> Clint

Clint,

Let me explain our philosophy: we believe that one can get the best
performance by assembly coding the inner-kernel and very carefully wrapping
code around it.  If you do an "ls -l libitxaux.a" you will find that the
inner-kernel code is absolutely tiny, and it will actually shrink
considerably by the next release.  The inner-kernel has a well-defined
interface and can easily be written for different platforms.  The rest of
the code is based on solid theoretical results and is very clean.  This
"rest" will be released under the GNU license in the near future, at which
time we will see if other vendors become interested.  We have experimented
with the ATLAS inner-kernel, and we see a considerable performance
improvement when we wrap our outer-kernels around your inner-kernel.  To
really do well, you would have to change the functionality of your
inner-kernel somewhat.
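
To make the inner-kernel / outer-kernel split concrete, here is a minimal
sketch in Python.  It is an illustration only, not our code and not the
ATLAS code: the names reference_kernel and blocked_gemm, the argument
layout, and the default block size are all assumptions made for this
example.  The point is that the outer loops only decide which blocks to
visit, while the kernel behind the fixed interface does the arithmetic and
could be swapped for a tuned, platform-specific version.

```python
def reference_kernel(A, B, C, i0, i1, j0, j1, p0, p1):
    """Hypothetical inner-kernel: C[i0:i1, j0:j1] += A[i0:i1, p0:p1] * B[p0:p1, j0:j1].

    A tuned implementation would replace this function and keep the same
    interface; this version is plain Python for readability.
    """
    for i in range(i0, i1):
        for p in range(p0, p1):
            a_ip = A[i][p]
            for j in range(j0, j1):
                C[i][j] += a_ip * B[p][j]


def blocked_gemm(A, B, C, block=2, kernel=reference_kernel):
    """Hypothetical outer code: walk C in block-by-block tiles and call the kernel.

    A is m x k, B is k x n, C is m x n (lists of rows). C is updated in place
    and also returned. Edge tiles are clipped with min().
    """
    m, k, n = len(A), len(B), len(B[0])
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            for p0 in range(0, k, block):
                kernel(A, B, C,
                       i0, min(i0 + block, m),
                       j0, min(j0 + block, n),
                       p0, min(p0 + block, k))
    return C
```

Calling blocked_gemm with a different block value or a different kernel
leaves the result unchanged; only the order of the work differs, which is
what makes the kernel replaceable per platform.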

Our theory gives a clear indication of what optimal block sizes to pick at
each level, so we don't need to run (very many) experiments for that.
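
As a rough illustration of picking a block size from the cache size rather
than from timing runs, the sketch below uses a common back-of-the-envelope
rule: choose the largest square block such that a fixed number of blocks fit
in the cache at once.  This is not the theory referred to above; the
function name square_block_size, the choice of three resident blocks, and
the 8-byte element size are assumptions for the example.  The 256K and 512K
cache sizes are the ones mentioned in the quoted message.

```python
import math


def square_block_size(cache_bytes, elem_bytes=8, resident_blocks=3):
    """Largest b such that resident_blocks * b * b * elem_bytes <= cache_bytes.

    Assumed rule of thumb only: it ignores associativity, cache lines,
    and anything else a real model would account for.
    """
    return math.isqrt(cache_bytes // (resident_blocks * elem_bytes))


b_256k = square_block_size(256 * 1024)  # 104 for double precision
b_512k = square_block_size(512 * 1024)  # 147 for double precision
```

Under this rule a kernel tuned for a 512K cache would use noticeably larger
blocks than one tuned for 256K, which is one way to see why tuning for the
wrong cache size can cost performance.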

Regards
Robert
