Re: developer release 3.1.2
R Clint Whaley <email@example.com> writes:
> >1) It looks as though the prefetch-assisted double precision level2
> > will max out at about 50% above standard atlas. Transpose: 94 -> 140,
> > Notrans: 67 -> 97. dger remains to be completed. Basically, I
> > just looked at the atlas compiled assembler, and added prefetch.
> > So the rule of thumb appears to be SIMD +50%, prefetch +50%,
> > both +100%.
> 50% from prefetch alone is quite nice; do you have a way of packaging
> the assembler in a C file, or will I need to modify the makefiles to
> support assembler? I wouldn't be surprised to see a greater gain
> for double precision complex . . .
I have a header/C-file setup, like the ones for single precision and
complex, just about ready. dger only shows about +25%, due to the extreme cache
pollution, I suppose.
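For anyone who wants to try the "look at the compiled code and add prefetch" trick from C rather than assembler, here is a minimal sketch of a Notrans dgemv with software prefetch. It assumes GCC or Clang (`__builtin_prefetch` is a compiler extension), column-major storage, and a hypothetical prefetch distance `PF_DIST`; the name `dgemv_n_prefetch` is made up for illustration and is not an ATLAS kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements: 32 doubles = 256 bytes,
   i.e. a few cache lines ahead of the current load. Tune per machine. */
#define PF_DIST 32

/* y := y + alpha * A * x  (the "Notrans" case).
   A is M x N, column-major, leading dimension lda >= M. */
void dgemv_n_prefetch(size_t M, size_t N, double alpha,
                      const double *A, size_t lda,
                      const double *x, double *y)
{
    assert(lda >= M);
    for (size_t j = 0; j < N; j++) {
        const double *col = A + j * lda;
        const double s = alpha * x[j];
        for (size_t i = 0; i < M; i++) {
            /* Once per 8 doubles (one 64-byte line on many x86 parts),
               request the data PF_DIST elements ahead in this column.
               Arguments: 0 = prefetch for read, 0 = no temporal locality,
               since each element of A is used exactly once. */
            if ((i & 7) == 0 && i + PF_DIST < M)
                __builtin_prefetch(col + i + PF_DIST, 0, 0);
            y[i] += s * col[i];
        }
    }
}
```

On targets without a prefetch instruction GCC generates no code for the builtin, so the sketch still runs there, just without the hint. The best distance and locality hint are machine-dependent, and would presumably need the same kind of empirical timing search ATLAS already does for its other parameters.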
> >3) I do hope we can find a solution for distributed atlas binaries. I
> > know the idea is for the user to build atlas on each platform they
> > will use, and that the current tree will skip any routines which
> > fail to compile on a given platform (e.g., if there is no SIMD
> > support). Serious users will no doubt do this.
> I don't know much about the .deb format, but I thought I read once that
> it could run scripts. Is there any chance you could run a simple example SIMD
> program, and install the SIMD-enabled lib when it works and the PII-style one
> when it does not?
Good idea. Thanks!
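The suggestion is to actually run a small SIMD program at install time. A lighter-weight variant, swapped in here, is to query CPUID through GCC/Clang's `__builtin_cpu_supports`. This is a minimal sketch; the helper names `cpu_has_sse` and `atlas_variant` and the "sse"/"pii" flavor labels are hypothetical. Note that CPUID only reports what the processor implements, not whether the running kernel saves SSE register state, which is one argument for running a real SSE test program as suggested.

```c
#include <assert.h>

/* Returns 1 if the CPU reports SSE, 0 otherwise (or when we cannot tell).
   The GCC/Clang builtins used here read CPUID and exist on x86 only. */
int cpu_has_sse(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse") ? 1 : 0;
#else
    return 0;  /* non-x86 or unknown compiler: fall back to the plain build */
#endif
}

/* Hypothetical mapping from the probe result to a prebuilt library flavor. */
const char *atlas_variant(int has_sse)
{
    assert(has_sse == 0 || has_sse == 1);
    return has_sse ? "sse" : "pii";
}
```

A postinst script could compile this into a tiny probe whose `main` returns 0 when `cpu_has_sse()` is 1, and pick which library to install from the exit status.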
> >4) Do we have an idea as to when we might want to release a
> > SIMD-enhanced atlas, say in Debian?
> You guys can of course release any time you wish. Antoine and I are trying
> to get the next official release of ATLAS ready by the end of the summer,
> but we'll see if we get it rolling or not. There are two main additions
> for this next big release: the opening up of the kernels for outside
> contribution, and the addition of SMP support via pthreads. As soon as we
> get these guys in and tested, we'll have a release.
> For the first phase of the work, Antoine worked separately on threading while
> I worked on the infrastructure necessary to open up the kernels. We have
> just started the process of bringing the work back together so it is all
> in one package. When we get something working at all reliably, we'll have
> developer releases that include threading, so you should be able to follow,
> at least roughly, the progress to the next ATLAS release.
> After the release, I hope to formalize the developer/regular release split
> a bit more. I certainly plan to keep both around: having a developer release
> with the newest stuff that we have is certainly a boon to people working on
> the package, and lets everyone start using new stuff much sooner than we
> can with the "stable" releases . . .
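The SMP work is only described in outline here, so the following is not ATLAS's threading design. It is a minimal pthreads sketch, assuming a simple static split of rows, of how a Level 2 operation such as y += A*x could be spread over SMP threads. `gemv_threaded`, `gemv_slice`, and `NTHREADS` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4  /* hypothetical; a real build would probe the CPU count */

/* One thread's share of y := y + A*x: rows [row0, row1) of y. */
typedef struct {
    size_t row0, row1;
    size_t N, lda;
    const double *A, *x;  /* A is M x N, column-major */
    double *y;
} gemv_slice;

static void *gemv_worker(void *arg)
{
    gemv_slice *t = arg;
    for (size_t j = 0; j < t->N; j++) {
        const double *col = t->A + j * t->lda;
        for (size_t i = t->row0; i < t->row1; i++)
            t->y[i] += col[i] * t->x[j];
    }
    return NULL;
}

/* Split the rows of y across up to NTHREADS pthreads. Each thread writes a
   disjoint slice of y, so no locking is needed; the caller just joins. */
void gemv_threaded(size_t M, size_t N, const double *A, size_t lda,
                   const double *x, double *y)
{
    pthread_t tid[NTHREADS];
    gemv_slice task[NTHREADS];
    int running[NTHREADS] = {0};
    size_t chunk = (M + NTHREADS - 1) / NTHREADS;

    assert(lda >= M);
    for (int k = 0; k < NTHREADS; k++) {
        size_t r0 = (size_t)k * chunk;
        size_t r1 = (r0 + chunk < M) ? r0 + chunk : M;
        if (r0 >= r1)
            break;  /* fewer rows than threads */
        task[k] = (gemv_slice){ r0, r1, N, lda, A, x, y };
        if (pthread_create(&tid[k], NULL, gemv_worker, &task[k]) == 0)
            running[k] = 1;
        else
            gemv_worker(&task[k]);  /* could not spawn: do this slice here */
    }
    for (int k = 0; k < NTHREADS; k++)
        if (running[k])
            pthread_join(tid[k], NULL);
}
```

Splitting by rows keeps every thread's writes to y disjoint, which is why no mutex appears; on systems that need it, link with `-pthread`.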
> >Is there any word on the most important level3 front?
> I haven't heard from the emmerald guys since they said that giving GEMM to
> us as a kernel provided too poor a performance. Since it apparently
> beat our current kernel, I disagree, but you can't release code you don't
> have :) Last I heard they were working on a complete GEMM instead . . .
> As a general rule, if I hear anything important on the developer front,
> I'll CC to the list . . .
This sounds somewhat like the blocking issues we were discussing with
the level2 some time back. In that case, while there certainly is a
hit, it appears to be small for reasonable routines. I suppose I'm
persuaded of the virtue of an all-purpose kernel, though I don't
exactly know why :-). Seriously, though, I'm persuaded by the overall
quality of atlas.
> I agree that the Level 3 is the most important for performance reasons, but
> to me the main thing is to have the ability to contribute in the stable
> package; I think particular contributions will come later. So far, I have your
> stuff, and Goto's gemm: these are already a significant proof of concept,
> and once people see the power of this building block approach, I hope
> that people will fill in the pieces we don't have . . .
You had mentioned trying a gemv-based gemm for the complex in an
earlier message. As a lark, I just tried that for single
precision. I seem to get about the same performance as the standard atlas gemm
(~350 MFLOPS, sgemv was ~250 MFLOPS), but the mmsearch did not pick
my routine. You had also indicated that this strategy was most likely not
the best way to go. Could you elaborate a bit on what would
likely be needed beyond a loop over gemv? It seems as though one
cannot count on longer contiguous vectors than kb no matter what one
> That's why we will have an ATLAS release as soon as Antoine and I get our
> stuff together, regardless of what outside contribution we have in place
> at that time: the quicker we get this stuff in front of all our users
> (remember, right now the only people who know about the developer release
> and kernel contribution are a few people I sent mail to, and those who
> have stumbled over the web page somehow) the quicker the holes in our
> coverage will fill up . . .
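To make the gemv-based gemm question concrete, here is a minimal sketch of GEMM written as a loop over gemv, one call per column of C. It uses plain reference loops in single precision, column-major storage, and hypothetical names (`sgemv_n`, `sgemm_via_gemv`); it is not the routine that was timed.

```c
#include <assert.h>
#include <stddef.h>

/* Reference sgemv, Notrans: y := y + A*x, A is M x K column-major. */
static void sgemv_n(size_t M, size_t K, const float *A, size_t lda,
                    const float *x, float *y)
{
    for (size_t k = 0; k < K; k++) {
        const float xk = x[k];
        const float *col = A + k * lda;
        for (size_t i = 0; i < M; i++)
            y[i] += col[i] * xk;
    }
}

/* C := C + A*B as N independent gemv calls:
   column j of C accumulates A times column j of B.
   A: M x K (lda), B: K x N (ldb), C: M x N (ldc), all column-major. */
void sgemm_via_gemv(size_t M, size_t N, size_t K,
                    const float *A, size_t lda,
                    const float *B, size_t ldb,
                    float *C, size_t ldc)
{
    assert(lda >= M && ldb >= K && ldc >= M);
    for (size_t j = 0; j < N; j++)
        sgemv_n(M, K, A, lda, B + j * ldb, C + j * ldc);
}
```

Each gemv call streams all of A again and does about two flops per element of A it loads, so the loop can only beat gemv when A stays in cache. A dedicated GEMM kernel typically also keeps a small block of C in registers so each loaded element of A and B feeds several multiply-adds, which is presumably part of what is needed beyond a loop over gemv.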
Camm Maguire firstname.lastname@example.org
"The earth is but one country, and mankind its citizens." -- Baha'u'llah