Re: Altivec and ATLAS
Unaligned C is okay. I've written unaligned load and store code for C,
and it results in roughly a 5 to 10% performance penalty. My
Altivec-based single-precision L1 matmul is getting in the neighborhood
of 1.2 to 1.3 Gflops on my 533 MHz G4. I can probably make it better
than that (scalar code gets about 670 Mflops).
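
As a rough illustration of what handling an unaligned C involves, here is
a portable C sketch. It is not the Altivec code described above: the helper
names (is_aligned16, axpy4_unaligned_c) are hypothetical, and the two
memcpy calls merely stand in for the unaligned vector load and store. The
point is only that the arithmetic runs on a 16-byte aligned temporary
while C itself may sit at any float-aligned address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if p sits on a 16-byte (128-bit) boundary, else 0. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: c[0..3] += alpha * x[0..3], where c may be
 * unaligned. The tile is staged through a 16-byte aligned temporary so
 * the arithmetic only touches aligned storage; the memcpy calls play the
 * role of the unaligned load and the unaligned store. */
static void axpy4_unaligned_c(float *c, const float *x, float alpha)
{
    _Alignas(16) float tmp[4];

    assert(is_aligned16(tmp));
    memcpy(tmp, c, sizeof tmp);          /* "unaligned load" of the C tile  */
    for (int i = 0; i < 4; ++i)
        tmp[i] += alpha * x[i];          /* update on aligned storage       */
    memcpy(c, tmp, sizeof tmp);          /* "unaligned store" back into C   */
}
```

Calling axpy4_unaligned_c(buf + 1, x, 2.0f) on a 16-byte aligned float
array buf updates elements 1 through 4 and leaves the rest untouched. The
extra load/store staging is where a penalty of the kind mentioned above
would come from.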
The G4 has prefetch instructions as well, which may improve the copy
performance - right now I have no idea where in ATLAS these instructions
should go though!
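
For readers unfamiliar with software prefetching, a minimal sketch of a
copy loop with prefetch hints follows. It makes several assumptions: it
uses the GCC/Clang builtin __builtin_prefetch rather than the G4's own
prefetch instructions, the function name copy_with_prefetch and the
PREFETCH_AHEAD distance are hypothetical, and the one-hint-per-8-floats
spacing assumes 32-byte cache lines. It says nothing about where such
hints belong inside ATLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real value needs tuning. */
#define PREFETCH_AHEAD 64

/* Copies n floats from src to dst, hinting upcoming source data to the
 * cache. One hint is issued per 8 floats (32 bytes, an assumed line size). */
static void copy_with_prefetch(float *dst, const float *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (i % 8 == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&src[i + PREFETCH_AHEAD], 0, 0); /* read hint */
        dst[i] = src[i];
    }
}
```

The hints do not change the result of the copy; whether they help at all
depends on the hardware and on how far ahead they are issued.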
On Thursday, June 7, 2001, at 11:21 AM, R Clint Whaley wrote:
>> I have a question about the L1 copy matmul. Altivec code generally
>> requires data to be aligned on 128-bit boundaries. One can work with
>> unaligned data but it requires extra work. Is it possible to guarantee
>> that the copied version of the matrices in ATLAS are 128-bit aligned
>> even if the original matrices aren't? Which portion of the code should
>> I look at? The Altivec extensions include 128-bit aligned versions of
>> malloc and calloc, so perhaps I can just do a one or two line
> ATLAS already guarantees 128 bit alignment for everything except
> This was put in during the last release for SSE and 3DNow! support.
> is the relevant thread (note that 16 byte == 128 bit for discussion):
> Note that this is the alignment of the input matrices A and B *ONLY*, C
> has no guaranteed alignment (C is often passed in by the user and not
> by ATLAS). Are A and B enough, or do you believe you will need C
> aligned as well
> (http://www.netlib.org/atlas/atlas-comm/msg00274.html gives a brief
> overview of why copying C can be too costly)?
Nicholas Coult, Ph.D., web: http://melby.augsburg.edu/~coult
Assistant Professor, Department of Mathematics, Augsburg College
email@example.com, phone: (612) 330-1064 office: Science Hall 137B