
Re: An Intel SIMD drop in kernel


>As Clint knows, I've been working on a SIMD SGEMM for PIIIs. I've
>been attempting to write a drop-in kernel for ATLAS, but I've found
>the restriction of N=M=K<64 to be pretty incompatible with the
>fundamental algorithm I've been employing (maximal length dot

As we've discussed, that limit is a heuristic, not hard and fast . . .

>For our current research I've been tinkering with my old
>implementation of Emmerald (http://csl.anu.edu.au/~daa/research.html)
>in an attempt to boost performance. The result is a new version of
>Emmerald roughly 1.1 times faster than the old, with peak Mflops
>1.86 times the clock rate. It uses ATLAS for some of its smaller
>cleanup cases.
>Getting to the point, I'd like to offer it as a drop in SGEMM for
>ATLAS. I am reasonably confident that it is not possible to
>write a user kernel which achieves similar performance (having tried
>and failed!).
>I'd like to know the best way to package up Emmerald to make it easy
>for Clint to integrate as he sees fit.

As always, I am running behind here.  I've been trying to get a new
developer release ready, with the contributions I have so far (Goto's
GEMM and Camm's SSI GEMV/GER), but keep getting sidetracked.  Part
of the update to a new developer release would be updating the paper
to discuss drop-in GEMMs.  Since I'm moving so slowly, you might find
it helpful to look over the quick and dirty job I did to incorporate
Goto's drop-in GEMM, available at:

The relevant code is in ATLAS/src/blas/gemm/GOTO, and the API you need to
provide ATLAS is shown in ATL_usergemm.c.  Essentially, you provide the
API, a Makefile, and the code, and it should be set . . .
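
To give a rough idea of the kind of routine such a drop-in wraps, here is a
minimal reference SGEMM in plain C. This is only a sketch: the name
my_sgemm_nn and its argument list are hypothetical and are NOT the interface
declared in ATL_usergemm.c, which remains the authority on what ATLAS expects.
It assumes column-major storage and no transposes, and it builds each element
of C from one full-length dot product over K.

```c
#include <stddef.h>

/* Hypothetical name and signature, for illustration only.
 * The real interface is whatever ATL_usergemm.c declares.
 *
 * Computes C = alpha * A * B + beta * C for column-major,
 * non-transposed single-precision matrices:
 *   A is M x K (leading dimension lda),
 *   B is K x N (leading dimension ldb),
 *   C is M x N (leading dimension ldc). */
void my_sgemm_nn(int M, int N, int K, float alpha,
                 const float *A, int lda,
                 const float *B, int ldb,
                 float beta, float *C, int ldc)
{
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < M; i++) {
            /* Dot product of row i of A with column j of B. */
            float dot = 0.0f;
            for (int k = 0; k < K; k++)
                dot += A[i + (size_t)k * lda] * B[k + (size_t)j * ldb];

            float *c = &C[i + (size_t)j * ldc];
            /* When beta is zero, C is overwritten without being read. */
            *c = (beta == 0.0f) ? alpha * dot : alpha * dot + beta * (*c);
        }
    }
}
```

A tuned kernel would replace the inner loops with SIMD code; a plain version
like this is mainly useful as a correctness check against the fast one.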

I'll send mail when I have the real stuff ready.