
Re: 3.3.10

Hi Clint!  OK, quite a few items, but just a quick email for now.

1) Extract problems: warnings about many lines being too long, and the
   following error with a fresh CVS snapshot this morning:
make[1]: *** No rule to make target `/scratch/camm/cvs/AtlasBase/TexDoc/atlas_contrib.ps', needed by `atlas_contrib.ps'.  Stop.
make[1]: Leaving directory `/scratch/camm/cvs/ATLAS/doc'
make: *** [ATLAS/doc] Error 2

2) A small patch to the cblas tester has been submitted via SourceForge.

3) Assembler labels: here is a small example. In a gcc __asm__ block, say
   we have the following:

#undef KB
#define KB ( 1 / DIV )
#include "camm_pipe3.h"



This is pseudocode written as cpp macros, but you can see what it does:
it tests whether register eax has the bit with value 4 (bit 2) set,
meaning that, interpreted as the address of a float, it sits at 1 or 3
floats mod 4 (4 or 12 bytes mod 16).  If not, it jumps to label a2,
which handles the 2 mod 4 case.  Otherwise, it falls through to a block
that scales one float, advances the pointer by one float (4 bytes),
decrements edx, and then reaches the label.
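To make the alignment logic concrete, here is a rough C rendering of the branch structure described above. It is only a sketch: the name peel_to_16 is hypothetical (not an ATLAS routine), and it assumes 4-byte floats and a 16-byte SSE alignment target. It peels at most three leading elements so that the rest start on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not ATLAS code. Scales leading elements of *px by
   alpha until *px is 16-byte aligned (or too few elements remain),
   advances *px past them, and returns how many elements it consumed. */
static size_t peel_to_16(float **px, size_t n, float alpha)
{
    float *x = *px;
    size_t done = 0;

    /* Any float array is at least 4-byte aligned. */
    assert(((uintptr_t)x & 0x3) == 0);

    /* Like "test $0x4,%eax": the bit with value 4 is set exactly when x
       sits at 1 or 3 floats mod 4, i.e. 4 or 12 bytes mod 16. */
    if (((uintptr_t)x & 0x4) && n > 0) {
        x[0] *= alpha;          /* movss / mulss / movss */
        x += 1;                 /* add $0x4,%eax */
        done += 1;              /* sub $0x1,%edx */
    }

    /* Label a2: x is now at 0 or 2 floats mod 4; the bit with value 8 is
       set in the 2 mod 4 case, and two more floats reach alignment. */
    if (((uintptr_t)x & 0x8) && n - done >= 2) {
        x[0] *= alpha;
        x[1] *= alpha;
        x += 2;
        done += 2;
    }

    *px = x;
    return done;
}
```

A caller would then run the aligned SSE loop on the remaining n - done elements starting at the updated pointer, with a scalar cleanup for any tail.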

The labels I've been using are defined like this:

#define lab(a_)     "\n" __FUNCTION__  "_" str(a_) ":\n\t"

so all labels have the function name prepended.  Here is what this
looks like in assembler:

 804a026:	a9 04 00 00 00       	test   $0x4,%eax
 804a02b:	74 14                	je     804a041 <ATL_USCAL_a2>
 804a02d:	f3 0f 10 48 00       	movss  0x0(%eax),%xmm1
 804a032:	f3 0f 59 c8          	mulss  %xmm0,%xmm1
 804a036:	f3 0f 11 48 00       	movss  %xmm1,0x0(%eax)
 804a03b:	83 c0 04             	add    $0x4,%eax
 804a03e:	83 ea 01             	sub    $0x1,%edx

0804a041 <ATL_USCAL_a2>:

So my problem is that if this function is inlined at more than one
place in the same compilation unit, the label ATL_USCAL_a2 will be
defined twice in the generated assembly, and the code won't assemble.
Your idea of something like

#undef camm_label
#define camm_label __FILE__ ## _ ## __LINE__

would work (I would then call lab(camm_label ## a2)), but there might
be something more elegant than having to redefine camm_label before
every call of the user function.
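One option for the duplicate-label problem, not proposed in this thread, is GNU as numeric local labels: "1:" may be defined any number of times, and "1f" / "1b" refer to the nearest following or preceding definition, so copies of the asm created by inlining do not clash. GCC's extended-asm "%=" operand, which expands to a number unique to each asm instance, is another possibility. The sketch below assumes GCC on x86 or x86-64; the function at_odd_float_offset is hypothetical, and a plain C fallback is used on other targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example, not ATLAS code. Returns 1 if p sits at 1 or 3
   floats mod 4, else 0. The asm uses the numeric local label "1:" instead
   of a named one, so inlining this function at several call sites in one
   file does not produce duplicate symbols. */
static inline int at_odd_float_offset(const float *p)
{
    int r;
    /* Float pointers are 4-byte aligned. */
    assert(((uintptr_t)p & 0x3) == 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uintptr_t a = (uintptr_t)p;
    __asm__ ("movl $0, %0\n\t"
             "test $4, %1\n\t"  /* bit with value 4 of the address */
             "je 1f\n\t"        /* jump forward to the nearest "1:" */
             "movl $1, %0\n"
             "1:"
             : "=&r" (r)        /* early clobber: %0 is written before %1 is read */
             : "r" (a)
             : "cc");
#else
    /* Portable fallback so the sketch still builds on non-x86 targets. */
    r = ((uintptr_t)p & 4) != 0;
#endif
    return r;
}
```

With several labels in one block, as in camm_pipe3.h, numbering them 1:, 2:, ... and branching with 1f, 2f, ... would be one way to carry this over.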

4) I notice the case dsc files are not in our own little section of
   the AtlasBase cvs tree.  How should we provide these to you?  If
   the p4 timing issue we discussed previously still exists, then I
   think we need to add a line hardcoding values for n,m,k for the SSE
   dgemm I provided.  Also, I have several s and d Level 1 kernels I'd
   like to upload.

Take care,

R Clint Whaley <rwhaley@cs.utk.edu> writes:

> Guys,
> 3.3.10 is out.  There are a bunch of bug fixes.  If you are using a developer
> release on an IA64, you need to go to 3.3.10, as I found an error in the
> complex GEMM that is present in the old code.  Frankly, anyone using a
> non-x86 dev release would be well advised to upgrade.
> By the way, if you do a sanity test on the IA64, it'll show failures in a
> level 1 routine.  This appears to be a g77 error: it happens using F77 BLAS,
> and goes away if you compile the tester with no optimization . . .
> There are now arch defaults for most archs, with the exception of UltraSparc.
> I have applied a patch from Goto that should speed up Linux/ev6.  If anyone
> ever builds the ev5/6 BLAS without using Goto's stuff, that build is a lot
> faster now as well.
> If all goes well, I should freeze the kernel submission sometime next week,
> so if you've got something, getting it in quick is the right idea . . .
> Cheers,
> Clint

Camm Maguire			     			camm@enhanced.com
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah