
Re: ATHLON performance tips

Hello Peter,

> I have been trying to use some of your tricks to speed up ATLAS on the
> Athlon (and on my own K6-2). So far I have only gotten a speedup on my
> K6-2, but I am optimistic :-)

As far as I remember, I achieved a speedup of 20% in an Athlon-optimized DGEMM kernel
just by using the code padding tricks. I don't know whether the K6 series requires the same
code pattern as the Athlon, but keeping the instructions short is a good strategy on every
x86 CPU. Tomorrow I will try to put together an example program that demonstrates the speedup
on Athlons.

> I would like to ask you (or anyone else) if you know an easy way to force
> an assembly instruction to use a specific addressing method. For example if
> I load the first element with an offset of 0 by:
>        movq   (%ecx),%mm6
> the instruction will be 3 bytes long, however all subsequent instructions
> that load data with an offset bigger than 0:
>        movq 0x8(%ecx),%mm4
> will be 4 bytes long. How do I most easily make
> the first instruction 4 bytes long? Putting a rep prefix in front is fun
> and works, but using the same address mode would be the proper way to do it.
> I have the same problem if I want all instructions to use the "offset
> bigger than 128" address mode, even though the offset is lower than 128.

I doubt that there is an elegant solution, because one must tell the assembler which
instruction encoding to use, and an assembler usually selects the shortest encoding possible.
The only solution I see is to encode the instructions by hand, but that would make
a "db xxx0h" necessary. I don't know whether MASM or GAS have any built-in mechanisms that can
influence instruction encoding. NASM, which I use, has none as far as I know...
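
To make the hand-encoding idea concrete, here is a rough sketch of how the "db" bytes could be
generated instead of being worked out on paper. It covers only the MMX load from your example
(movq mm, [base + disp], opcode 0F 6F /r) and lets you pick the displacement size. The helper
names (encode_movq_load, as_db) are made up for this mail, the script is not part of ATLAS or
of any assembler, and it skips ESP as a base register because that needs an extra SIB byte.

```python
# Sketch: hand-encode MMX "movq mm, [base + disp]" (opcode 0F 6F /r) with a
# chosen displacement size. All names here are hypothetical illustrations.

# 32-bit base registers and their ModRM r/m numbers.
# esp (4) is left out on purpose: it requires an additional SIB byte.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "ebp": 5, "esi": 6, "edi": 7}


def encode_movq_load(mm, base, disp=0, disp_size=None):
    """Return the machine code bytes of movq mm<mm>, [base + disp].

    disp_size is the number of displacement bytes (0, 1 or 4).
    None picks the shortest legal form, as an assembler normally does.
    """
    if not 0 <= mm <= 7:
        raise ValueError("MMX register must be mm0..mm7")
    rm = REG32[base]

    if disp_size is None:
        if disp == 0 and base != "ebp":
            disp_size = 0
        elif -128 <= disp <= 127:
            disp_size = 1
        else:
            disp_size = 4

    if disp_size == 0:
        # mod=00 with r/m=ebp means "disp32, no base", so it is not usable here.
        if disp != 0 or base == "ebp":
            raise ValueError("no-displacement form not available here")
        mod, tail = 0b00, b""
    elif disp_size == 1:
        if not -128 <= disp <= 127:
            raise ValueError("displacement does not fit in a signed byte")
        mod, tail = 0b01, disp.to_bytes(1, "little", signed=True)
    elif disp_size == 4:
        mod, tail = 0b10, disp.to_bytes(4, "little", signed=True)
    else:
        raise ValueError("disp_size must be 0, 1, 4 or None")

    modrm = (mod << 6) | (mm << 3) | rm
    return bytes([0x0F, 0x6F, modrm]) + tail


def as_db(code):
    """Format the bytes as a db line that can be pasted into the source."""
    return "db " + ", ".join("0x%02X" % b for b in code)


if __name__ == "__main__":
    print(as_db(encode_movq_load(6, "ecx")))               # shortest form
    print(as_db(encode_movq_load(6, "ecx", disp_size=1)))  # forced 1-byte offset
    print(as_db(encode_movq_load(6, "ecx", disp_size=4)))  # forced 4-byte offset
```

With disp_size=1 the load from (%ecx) becomes 4 bytes long, the same length as the loads with
a non-zero offset, and with disp_size=4 it becomes 7 bytes long. The exact "db" syntax may
need adjusting for your assembler.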

> bigger than 128" address mode, even though the offset is lower than 128.

BTW, please don't forget that the 1-byte offset is signed, so you can use offsets from -128 to
+127, not only from 0 to 127.
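
A small sketch of what the signed range buys you, under the assumption (mine, not taken from
your kernel) that the base register is advanced by 128 bytes before the loop so that it points
into the middle of the block. The function names are again just for illustration.

```python
# Sketch: how many 8-byte loads can use a 1-byte displacement, with and
# without biasing the base pointer by 128 bytes. Names are hypothetical.

def fits_disp8(disp):
    """True if disp can be encoded as a signed 1-byte displacement."""
    return -128 <= disp <= 127


def biased_displacements(offsets, bias=128):
    """Displacements relative to a base pointer that was advanced by bias."""
    return [off - bias for off in offsets]


# Byte offsets of 32 consecutive quadwords: 0, 8, ..., 248.
offsets = list(range(0, 256, 8))

plain_ok = sum(fits_disp8(off) for off in offsets)
biased_ok = sum(fits_disp8(d) for d in biased_displacements(offsets))

print("unbiased base:", plain_ok, "of", len(offsets), "loads fit in 1 byte")
print("base + 128:   ", biased_ok, "of", len(offsets), "loads fit in 1 byte")
```

With the unbiased base only the offsets 0 to 120 fit into one byte, while with the biased base
all 32 displacements (-128 to +120) do, so every load keeps the short encoding.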