
Re: SSE-enabled level 2

>1) There still seems to be some noticeable hit for symv wrt gemv, most
>   probably due to the very different data access patterns, as I
>   understand it.  Is there a way around this?

There is a way around this, but it needs to be applied to the ATLAS install,
not your code.  The symmetric routines (SYMV, and to a lesser extent SYR2)
are special in that they can reuse A, the dominant cost of the algorithm,
from the L1 cache.  Taking SYMV as an example, SYMV is built by calling GEMV
twice: once with Notrans, and once with Trans.  ATLAS blocks the operation so
that in the second call, A comes from L1.
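
To make the blocking concrete, here is a rough sketch in plain C.  This is
not the ATLAS code; the names gemv_n, gemv_t, symv_lower_blocked and the block
size nb are made up for illustration, and which of the two calls goes first is
my assumption.  It only shows the idea: each off-diagonal block of the lower
triangle is used twice back to back, so the second use should find it in L1
if nb is chosen so that an nb-by-nb block fits there.

```c
#include <stddef.h>

/* y += B*x for an m-by-n column-major block B with leading dimension lda */
static void gemv_n(size_t m, size_t n, const double *B, size_t lda,
                   const double *x, double *y)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            y[i] += B[i + j * lda] * x[j];
}

/* y += B^T*x for the same kind of block */
static void gemv_t(size_t m, size_t n, const double *B, size_t lda,
                   const double *x, double *y)
{
    for (size_t j = 0; j < n; j++) {
        double s = 0.0;
        for (size_t i = 0; i < m; i++)
            s += B[i + j * lda] * x[i];
        y[j] += s;
    }
}

/* y += A*x, A symmetric n-by-n, column-major; only the lower triangle
 * of A is referenced.  nb is a hypothetical L1 blocking factor. */
void symv_lower_blocked(size_t n, size_t nb, const double *A, size_t lda,
                        const double *x, double *y)
{
    for (size_t j0 = 0; j0 < n; j0 += nb) {
        size_t jb = (n - j0 < nb) ? n - j0 : nb;

        /* Diagonal block: plain symmetric update on its lower triangle. */
        for (size_t j = j0; j < j0 + jb; j++) {
            y[j] += A[j + j * lda] * x[j];
            for (size_t i = j + 1; i < j0 + jb; i++) {
                double a = A[i + j * lda];
                y[i] += a * x[j];
                y[j] += a * x[i];
            }
        }

        /* Off-diagonal blocks below the diagonal: two GEMV calls each. */
        for (size_t i0 = j0 + jb; i0 < n; i0 += nb) {
            size_t ib = (n - i0 < nb) ? n - i0 : nb;
            const double *B = A + i0 + j0 * lda;
            gemv_t(ib, jb, B, lda, x + i0, y + j0); /* first touch: B is loaded  */
            gemv_n(ib, jb, B, lda, x + j0, y + i0); /* second touch: B reused    */
        }
    }
}
```

In this sketch gemv_n is the call that sees the block already resident in
cache, so it is the one where a prefetch tuned for L2 or main memory would
buy nothing.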

Now, your code is optimized for A coming from L2 or main memory.  When A is
already in L1, I'm guessing the prefetch becomes pure overhead, and slows you
down.

Right now, ATLAS uses the fastest individual gemvN and gemvT for SYMV.  What
we *should* do is take the fastest gemvT, then retime all the gemvN's
as used by SYMV, and use the winner in SYMV.  We've known about this for a long
time; it's just a question of finding the time to modify the install process
appropriately.  I guess if I had any users begging for faster SYMV times,
it would be up higher on the do-it queue.  Note that this would be a general
speedup for all SYMV; your SSE-stuff would just get more benefit than usual.

This does bring up an interesting, if probably unusable, point.  The Level 2
BLAS optimized for main memory or L2 cache are *worse* for a guy keeping
things in L1.  So users with tight loops whose data stays in L1 will not thank
us for adding prefetch.

>2) Any idea of what a new SSE sgemm based sgemv would do?  Gemm based
>   routines won out in the original atlas, if memory serves.

Won't help.  Could only be used when M = KB.  You obviously can't afford a
data copy for an N^2 algorithm like GEMV.  The reason GEMM used to win is
not because GEMM is a good way to do GEMV, but because the generated GEMM was
so much better than the hand-implementations we had . . .
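
For what it's worth, the copy argument comes down to simple counting (the
dimension names below are just the usual BLAS ones, nothing ATLAS-specific):

```latex
\begin{align*}
\text{GEMV } (M \times N):\quad
  & 2MN \text{ flops on } MN \text{ elements of } A
    \;\Rightarrow\; 2 \text{ flops per element of } A, \\
\text{copy of } A:\quad
  & MN \text{ loads} + MN \text{ stores}, \\
\text{GEMM } (M \times K \text{ times } K \times N):\quad
  & 2MNK \text{ flops on } MK + KN + MN \text{ elements}.
\end{align*}
```

For square operands of order N, GEMM spreads an O(N^2) copy over 2N^3 flops,
while for GEMV the copy moves as much data as the operation itself reads, so
there is nothing to amortize it against.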