single precision on Sparcs
>> From firstname.lastname@example.org Tue Jun 26 15:24:54 2001
>> QUESTION 2:
>> Does anyone know why gcc produces *much* slower code for single precision
>> than double precision on UltraSparcs (see below for more detail)?
>> We use the same C code for both single and double precision. I would expect
>> performance to be similar (though you might need to vary NB), but what I
>> see is that double precision runs roughly 25% faster. Obviously, if one
I'm afraid I can't be a lot of help here: since f.p. runs at the same speed
for double as for single on the UltraSparcs, I never bothered writing special
code for single precision. Some years ago I did the same thing with my
own codes: I changed double to single in a code written for double, and it
did indeed run slower.
The only thing I can say is that I don't think it's gcc's fault: I
inspected the assembler output for my code and I could not fault it.
>> is to be faster, it should be single. The only thing I can think of is that
>> it has something to do with the load instruction used; I know double precision
>> performance takes a beating if you don't assume it is 8-byte aligned. Perhaps
This is because without this assumption, you will be doing two 4-byte
loads for each double, so it doubles the total number of loads (and
stores), and slows you down for that reason.
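
To make the alignment point concrete, here is a minimal C sketch of the two ways a double can be fetched. The function names are hypothetical, and what instructions a given compiler actually emits is an assumption: the first form permits one 8-byte load, the second spells out the two 4-byte accesses needed when only 4-byte alignment can be assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the pointer type promises 8-byte alignment,
 * so the compiler is free to use a single 8-byte load. */
static double load_double_aligned(const double *p)
{
    return *p;
}

/* Hypothetical helper: only 4-byte alignment is assumed, so the value
 * is fetched as two 4-byte words and reassembled. This is twice as
 * many load operations per double. */
static double load_double_as_two_words(const void *p)
{
    uint32_t w[2];
    double d;

    memcpy(&w[0], p, 4);                     /* first 4-byte word  */
    memcpy(&w[1], (const char *)p + 4, 4);   /* second 4-byte word */
    memcpy(&d, w, sizeof d);                 /* reassemble the double */
    return d;
}
```

Both functions return the same value; only the number of memory accesses differs.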
There is nothing in the UltraSparc specifications that I know of that
suggests that a single floating point load is slower than a double
floating point load. The only thing I can think of is that for single,
with the same degree of unrolling, you might be increasing your chances
of direct-mapped conflicts in the top-level cache; still, that would not
seem to be able to account for so large a difference.
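
For readers unfamiliar with direct-mapped conflicts, the following C sketch shows how two addresses can compete for the same cache line. The cache geometry (16 KB, 32-byte lines) and the function names are assumptions chosen for illustration, not figures taken from this thread; check the processor manual for the real values.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry, for illustration only. */
enum { CACHE_BYTES = 16 * 1024, LINE_BYTES = 32 };

/* In a direct-mapped cache each address has exactly one possible line. */
static size_t cache_line_index(uintptr_t addr)
{
    return (size_t)((addr / LINE_BYTES) % (CACHE_BYTES / LINE_BYTES));
}

/* Two addresses conflict when they lie in different memory lines
 * that map to the same cache line, so each evicts the other. */
static int conflicts(uintptr_t a, uintptr_t b)
{
    return (a / LINE_BYTES != b / LINE_BYTES) &&
           (cache_line_index(a) == cache_line_index(b));
}
```

Under these assumed parameters, any two addresses exactly 16 KB apart conflict, while addresses within the same 32-byte line do not.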
>> this is the problem with single precision? Does anyone know of any reason
>> for single to be slower than double on UltraSparcs, in particular with gcc?
I think that to get good performance for single, you need to unroll
more than for double and make full use of the register file. Another
trick which I experimented with on the V7 and V8 sparcs (long ago :)
was to use a load double instruction to load two consecutive singles.
You had to assume that the leading dimension was even, but it did make
a difference, at least on the V7 sparcs.
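
A rough C sketch of the paired-load idea, applied to a dot product, is below. The names `float_pair`, `load_pair` and `sdot_paired` are hypothetical. The sketch only expresses each pair of singles as one 8-byte copy; whether the compiler turns that into a single load double instruction depends on the compiler and on it knowing the data is 8-byte aligned, which is where the even leading dimension comes in.

```c
#include <stddef.h>
#include <string.h>

/* Two consecutive singles occupying 8 bytes. */
typedef struct { float lo, hi; } float_pair;

/* Fetch two consecutive floats with one 8-byte copy. */
static float_pair load_pair(const float *p)
{
    float_pair v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Dot product unrolled by two, with separate accumulators so the
 * two multiply-adds in each iteration are independent. */
static float sdot_paired(const float *x, const float *y, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        float_pair a = load_pair(x + i);
        float_pair b = load_pair(y + i);
        s0 += a.lo * b.lo;
        s1 += a.hi * b.hi;
    }
    if (i < n)                 /* leftover element when n is odd */
        s0 += x[i] * y[i];
    return s0 + s1;
}
```

Unrolling further (four or eight singles per iteration, each with its own accumulator) follows the same pattern and uses more of the register file.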