Re: ATLAS version 3.2 available
>> >Greetings! I have a little script which plays back the output of
>> >'make install arch=foo' on an arbitrary box of the same major
>> >architecture, skipping all timings. Clint, would it be possible for
>> >you to make available the output of the installs on the alphas and
>> >sparcs at your disposal, so I can configure the Debian package
>> Sure. What, exactly, do you need? Will the contents of
>> give you everything? Also, do you just want linux versions, or tru64 and
>> solaris installs as well (I have no access to linux+sparcs, for instance)?
>Well, right now I'm using the output of 'make install arch=foo' >out
>2>&1, plus the contents of gemm/foo/res, gemv/foo/res and ger/foo/res.
>Anything you have that would not be too time consuming to generate
>would be great. Other OSes are fine, as long as the compilation
>commands will work under Linux. I can handle a cc->gcc change, for example.
I'm still not getting what it is that you need. What do you use this
information for? What information are you looking for? Here in a bit, we
will be making precompiled binaries, and we can probably capture that output
for you (I just realized that I only have a few installs of v3.2.0
around, most of them being pre-release versions) . . .
>OK, from what I've tested, I don't recall ever seeing a
>segfault/memory access problem, but rather an fpu stack problem. I've
>run all the routines under efence as my memory checker too.
I played with efence a while back, when I was looking for a Linux
"purify" (since I do the majority of my development on Linux, and
purify is both monumentally expensive and not available for Linux), and
I was able to write a bunch of illegal memory access crap that efence didn't
catch but purify did. Checkergcc looked like it might be an option, but
it was hard to install and unbelievably slow to run. It now looks like it
has been essentially abandoned: it still expects gcc 2.8.1 and a really
old Linux kernel. I tried to get it up and running this weekend, and failed.
Probably I could get it working if I installed one of the old 2.0 kernels . . .
Anyway, I guess one method is for me to reproduce the error under Windows, and
we could at least then experimentally determine what code tweak(s) would
make that error go away . . .
>>Anyway, I thought it possible you might want to eyeball these routines and see
>>if you are reading/writing out of bounds (for instance, the classic error
>>of not ending the prefetch loop an iteration ahead so you don't read off the
>>end of the array); if the routines have such errors, it'll need to be
>>errataed ASAP . . .
>Well, this could be the issue right here. I had thought that one
>could prefetch any address one wanted to, with the result being
>ignored if the process did not have access to the block. I thought I
>read this in the Intel docs, but I could be mistaken. In any case, I
>had checked this extensively under Linux, and prefetching beyond the
>end of the array never caused any violation, even when running my own
>code outside the atlas tester-timers. I deliberately left in a prefetch
>beyond the end of the inner loop, as I'm often striding by more
>than one column, and this prefetch appeared to give a leg up on the
>next iteration on average. If any of these assumptions are not
>correct, it can certainly be removed relatively easily.
Hmm. Can you find the reference in the Intel docs again? I can certainly
see how convenient it would be to allow such prefetches, but am unsure
how you could do so under the memory protection model (not that I really
understand it that well) . . .
Again, once I reproduce the error under Windows, we ought to be able to
experimentally determine the cause of at least this particular error . . .
>I'm combing through the output of these compiles more carefully now,
>and have come across another more alarming issue, or I'm probably just
>confused. I noticed that include/contrib has camm_dpa.h and
>camm_dpa.h0. camm_dpa.h appears to be a very old file that was geared
>toward a single routine, whereas what is called camm_dpa.h0 is the
>latest merged file used for all routines. camm_dpa.h0, though,
>doesn't seem to be included by ATL_gemv_ger.h. This causes several L2
>testing routines to segfault. I'm checking into this more carefully
>(i.e. I may be missing something else you're doing), but it appears that
>h0 -> h is required for the L2 to go into the lib for all routines
This looks like an error on my part. My guess is that it happened right
before the release, during all the confusion about errors in the L2,
half of which you fixed, and the other half of which were a result of improper
installs on my part. At one point I regressed to an older version of your
stuff, and it looks like I never went back to the newest version of the
include file. This might be the problem on Windows as well, I guess,
but as to segfaulting, it passed the ATLAS testers at least on Linux . . .
I'll let you know as I find out more . . .