
Re: ATLAS version 3.2 available

Hi Clint!

R Clint Whaley <rwhaley@cs.utk.edu> writes:

> Camm,
> >> >Greetings!  I have a little script which plays back the output of
> >> >'make install arch=foo' on an arbitrary box of the same major
> >> >architecture, skipping all timings.  Clint, would it be possible for
> >> >you to make available the output of the installs on the alphas and
> >> >sparcs at your disposal, so I can configure the Debian package
> >> >accordingly? 
> >> 
> >> Sure.  What, exactly, do you need?  Will the contents of
> >>    ATLAS/bin/<arch>/INSTALL_LOG
> >> give you everything?  Also, do you just want linux versions, or tru64 and
> >> solaris installs as well (I have no access to linux+sparcs, for instance)?
> >> 
> >
> >Well, right now I'm using the output of 'make install arch=foo >out
> >2>&1' plus the contents of gemm/foo/res, gemv/foo/res and ger/foo/res.
> >Anything you have that would not be too time consuming to generate
> >would be great.  Other OS's are fine, as long as the compilation
> >commands will work under linux.  I can handle a cc->gcc, for example. 
> I'm still not getting what it is that you need.  What do you use this
> information for?  What information are you looking for?  Here in a bit, we
> will be making precompiled binaries, and we can probably capture that output
> for you (I just realized that I only have a few installs of v3.2.0
> around, most of them being pre-release versions) . . .

Well, to be brief, I have a little script which enables subarch
cross-compilation without reference to another machine.  If you're
interested, I can post the script.  Basically, if you could make the
following modifications to your build processes and post the little tar
files somewhere, that would be *very* helpful and much appreciated!

Replace 'make install arch=foo' with

1. make install arch=foo >/tmp/foo.out 2>&1
2. mkdir /tmp/mm ; mkdir /tmp/mv ; mkdir /tmp/r1;
3. cp tune/blas/gemm/foo/res/* /tmp/mm ;
4. cp tune/blas/gemv/foo/res/* /tmp/mv ;
5. cp tune/blas/ger/foo/res/* /tmp/r1 ;
6. tar zcvf /tmp/foo.tgz /tmp/foo.out /tmp/mm /tmp/mv /tmp/r1
7. rm -rf /tmp/foo.out /tmp/mm /tmp/mv /tmp/r1
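The seven steps above can be collected into a small shell function, if that's
more convenient; the function and script names are just for illustration, and
it assumes you run from the ATLAS top-level directory:

```shell
# Sketch of the capture steps above (run from the ATLAS top level):
#   capture_install foo
capture_install () {
    arch=$1
    make install arch="$arch" >/tmp/"$arch".out 2>&1 &&
    mkdir -p /tmp/mm /tmp/mv /tmp/r1 &&
    cp tune/blas/gemm/"$arch"/res/* /tmp/mm &&
    cp tune/blas/gemv/"$arch"/res/* /tmp/mv &&
    cp tune/blas/ger/"$arch"/res/*  /tmp/r1 &&
    tar zcf /tmp/"$arch".tgz /tmp/"$arch".out /tmp/mm /tmp/mv /tmp/r1 &&
    rm -rf /tmp/"$arch".out /tmp/mm /tmp/mv /tmp/r1
}
```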

> >OK, from what I've tested, I don't recall ever seeing a
> >segfault/memory access problem, but rather an fpu stack problem.  I've
> >run all the routines under efence as my memory checker too.  
> I played with efence a while back, when I was looking for a Linux
> "purify" (since I do the majority of my development on Linux, and
> purify is both monumentally expensive and not available for Linux), and
> I was able to write a bunch of illegal memory access code that efence didn't
> catch but purify did.  Checkergcc looked like it might be an option, but
> it was hard to install and unbelievably slow to run.  It now looks like it
> has been essentially abandoned: it still expects gcc 2.8.1 and a really
> old Linux kernel.  I tried to get it up and running this weekend, and failed.
> Probably I could get it working if I installed one of the old 2.0 kernels . . .
> Anyway, I guess one method is for me to reproduce the error under Windows, and
> we could at least then experimentally determine what code tweak(s) would
> make that error go away . . .
> >>Anyway, I thought it possible you might want to eyeball these routines and see
> >>if you are reading/writing out of bounds (for instance, the classic error
> >>of not ending the prefetch loop an iteration ahead so you don't read off the
> >>end of the array);  if the routines have such errors, it'll need to be
> >>errataed ASAP . . .
> >
> >Well, this could be the issue right here.  I had thought that one
> >could prefetch any address one wanted to, with the result being
> >ignored if the process did not have access to the block.  I thought I
> >read this in the Intel docs, but I could be mistaken.  In any case, I
> >had checked this extensively under Linux, and prefetching beyond the
> >end of the loop never caused any violation, even when running my own
> >code outside the atlas tester-timers.  I deliberately left a prefetch
> >beyond the end of the inner loop in, as I'm often striding by more
> >than one column, and this prefetch appeared to give a leg up on the
> >next iteration on average.  If any of these assumptions are not
> >correct, it can certainly be removed relatively easily.
> Hmm.  Can you find the reference in the Intel docs again?  I can certainly
> see how convenient it would be to allow such prefetches, but am unsure
> how you could do so with the memory security model (not that I really 
> understand it that well) . . .

OK, this is from the 'Intel Architecture Software Developer's Manual':

"The architectural implementation of this instruction in no way effects
(sic) the function of a program.  Locality hints are processor
implementation-dependent, and can be overloaded or ignored by a
processor implementation.  The prefetch instruction does not cause any
exceptions (except for code breakpoints), does not affect program
behavior, and may be ignored by the processor implementation ...
Numeric Exceptions -- none, Protected Mode Exceptions -- none, Real
Address Mode Exceptions -- none, Virtual 8086 Mode Exceptions -- none."

This contrasts with, for example, mulss or anything else that actually
reads the contents of memory, where the instruction is equipped with
many memory exceptions, including illegal addresses, page faults,
etc.  I guess I just assumed that these exceptions were the mechanism
whereby segfaults were passed to the OS.  This could, as always, be
mistaken.

> Again, once I reproduce the error under Windows, we ought to be able to
> experimentally determine the cause of at least this particular error . . .
> >I'm combing through the output of these compiles more carefully now,
> >and have come across another more alarming issue, or I'm probably just
> >confused.  I noticed that include/contrib has camm_dpa.h and
> >camm_dpa.h0.  camm_dpa.h appears to be a very old file that was geared
> >toward a single routine, whereas what is called camm_dpa.h0 is the
> >latest merged file used for all routines.  camm_dpa.h0, though,
> >doesn't seem to be included by ATL_gemv_ger.h.  This causes several l2
> >testing routines to segfault.  I'm checking into this more carefully
> >(i.e. I may be missing something else you're doing), but it appears that
> >h0 -> h is required for l2 to go into the lib for all routines
> This looks like an error on my part.  My guess is that it happened right
> before the release, during all the confusion about errors in the L2, 
> where half of them you fixed, and the other half were a result of improper
> installs on my part.  At one point I regressed to an older version of your
> stuff, and it looks like I never went back to the newest version of the
> include file.  This might be the problem on Windows as well, I guess,
> but as to segfaulting, it passed the ATLAS testers at least for Linux . . .
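In the meantime, the workaround implied above amounts to just promoting the
newer merged header.  A sketch as a small shell function (the paths are my
guess at the tree layout, so verify locally before building):

```shell
# Sketch of the h0 -> h fix discussed above (paths assumed):
# promote the newer merged header so ATL_gemv_ger.h picks it up.
promote_header () {
    cp include/contrib/camm_dpa.h  include/contrib/camm_dpa.h.bak  # keep old
    cp include/contrib/camm_dpa.h0 include/contrib/camm_dpa.h
}
```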

Did you get 3.2.0 to compile in all the l2 under Linux?  On my box, at
least some routines were passed over (ger specifically, if I recall)
because the tester segfaulted.  I can possibly reproduce the output if

> I'll let you know as I find out more . . .

Great!  Please let me know if you need the end prefetch removed. 

Take care,

> Cheers,
> Clint

Camm Maguire			     			camm@enhanced.com
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah