
Re: abs()



Greetings!  Well, I guess I should chime in here.

On Intel, the one-cycle instruction is 'fabs'.  It works on the
floating point register at the top of the stack, so if you are
going to call it from a C macro, you need to be sure that the value
you're abs'ing has been loaded into st(0).  I really should study your
C/register wizardry here, as I've never been able to reliably get a C
variable into the register I want once it passes through the compiler
with various optimization flags.  But assuming you've done this, the
macro you want is

#define FABS __asm__ __volatile__ ("fabs\n\t")
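
As a hedged aside on the register problem: GCC's x86 machine
constraint "t" names st(0), so the compiler can be asked to do the
load and store itself.  The sketch below assumes GCC or Clang on
i386/x86-64; the name x87_fabs and the fallback to the library fabs()
on other targets are illustrative assumptions, not anything from ATLAS.

```c
#include <math.h>

/* Hypothetical helper: absolute value via the x87 fabs instruction.
   The "+t" constraint asks the compiler to place x in st(0) before
   the asm and to read the result back from st(0) afterwards. */
static double x87_fabs(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("fabs" : "+t" (x));
    return x;
#else
    return fabs(x);   /* assumption: plain library call elsewhere */
#endif
}
```

Calling x87_fabs(-2.5) should give 2.5.  Whether the extra moves the
compiler inserts around the asm cost more than the branch they replace
is something to time, not assume.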

If you are doing an asum, and you have the sum in st(0), and you want
the macro to do the load and add as well, use

#define ASUM(a) __asm__ __volatile__ ("fldl %0\n\tfabs\n\tfaddp %%st(1)\n\t"::"m" (a))

Here is a little program showing these things with an asum:
=============================================================================
intech19:~$ !cc
cc -Wall -O6 a.c -o a -L/home/camm/lib/i386 -lnum -lmisc 
intech19:~$ ls -l foo
-rw-r--r--    1 camm     camm     32000032 May 14 13:19 foo
intech19:~$ ./a foo
3194162.297533, 104582 musec
3194162.297533, 228127 musec
=============================================================================
a.c
=============================================================================
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <unistd.h>

#define FABS(a) if ((a)<0.0) (a)=-(a)
#define MFABS(a) __asm__ __volatile__ ("fabs\n\t")
#define M1FABS(a) __asm__ __volatile__ ("fldl %0\n\tfabs\n\t"::"m" (a))
#define M2FABS(a) __asm__ __volatile__ ("fldl %0\n\tfabs\n\tfaddp %%st(1)"::"m" (a))

#include "misc.h"
#include "num.h"

int
main(int argc,char * argv[]) {

  double *d1,*d,*de,x;
  int l;
  struct stat ss;
  register double xx;
  struct timeval tv,tv1;

  if (argc!=2)
    errret("Usage: %s <random file>\n",*argv);

  if (stat(argv[1],&ss))
    errret("Cannot stat %s\n",argv[1]);

  if (ss.st_size%sizeof(*d))
    errret("%s must have size an integral multiple of %u\n",argv[1],(unsigned)sizeof(*d));

  if ((l=open(argv[1],O_RDONLY))<0)
    errret("Cannot open %s read-only\n",argv[1]);

  if ((d1=mmap(0,ss.st_size,PROT_READ,MAP_SHARED,l,0))==MAP_FAILED)
    errret("Cannot mmap %s\n",argv[1]);
  de=d1+ss.st_size/sizeof(*d1);

  gettimeofday(&tv,NULL);
  __asm__ __volatile__ ("fldz\n\t");
  for (d=d1;d<de;d++) {

    M2FABS(*d);

  }
  __asm__ __volatile__ ("fstpl %0\n\t":"=m" (x));
  gettimeofday(&tv1,NULL);

  printf("%f, %lu musec\n",x,1000000*(tv1.tv_sec-tv.tv_sec)+(tv1.tv_usec-tv.tv_usec));

  gettimeofday(&tv,NULL);
  xx=0.0;
  for (d=d1;d<de;d++) {

    register double dd;

    dd=*d;
    if (dd<0.0) dd=-dd;
    xx+=dd;

  }
  gettimeofday(&tv1,NULL);

  printf("%f, %lu musec\n",xx,1000000*(tv1.tv_sec-tv.tv_sec)+(tv1.tv_usec-tv.tv_usec));

  return 0;

}
=============================================================================

P.S.  If you're going to do this, a prefetch macro would probably help
a lot too.
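
A minimal sketch of what I mean by a prefetch macro, assuming GCC or
Clang, where __builtin_prefetch is available.  The names PREFETCH,
PF_DIST and asum_prefetch are made up for illustration, and the
distance of 64 doubles is an untuned guess.

```c
#include <stddef.h>

/* Hypothetical prefetch macro: read access, low temporal locality.
   Compilers without the builtin get a no-op (assumption). */
#if defined(__GNUC__)
#define PREFETCH(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH(p) ((void)0)
#endif

/* Assumed tuning value: how many doubles ahead to prefetch. */
#define PF_DIST 64

/* Sum of absolute values of d[0..n-1], same arithmetic as the plain
   C loop in a.c, with a prefetch issued ahead of the loads. */
static double asum_prefetch(const double *d, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        double v = d[i];

        if (i + PF_DIST < n)          /* stay inside the array */
            PREFETCH(&d[i + PF_DIST]);
        if (v < 0.0) v = -v;
        sum += v;
    }
    return sum;
}
```

The right distance depends on the machine and on how much work each
iteration does, so it would need timing against the loops above.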

Take care,


R Clint Whaley <rwhaley@cs.utk.edu> writes:

> Guys,
> 
> Several of the Level 1 BLAS have a strong dependence on the speed of real
> absolute value for their performance.  This operation should be a 1-cycle
> bit level operation (mask off the sign bit), but ANSI C supports bit operations
> on integer only, so ATLAS is unable to employ this operation (I made something
> work with a bunch of casts, but by the time all that was done, it was slower
> than an if), and so must instead substitute an if of some sort, which of
> course implies a branch, which implies poor performance.
> 
> My guess is that there are system-dependent ways to make fabs() one cycle
> nonetheless, and I'm hoping some of you know or can easily discover them.
> Anyway, I want to ask anyone who can figure out how to do fabs() without an if
> to post to the list.  The solution can be as nonportable as you want;
> I figure in-line assembler may be required, but hopefully it can be used
> with a C macro.  Here's an example macro for double precision:
> 
>    #define ATL_dabs(x) ( (x) >= 0.0 ? (x) : -(x) )
> 
> If anyone can do it without the if, I think we can speed up quite a few
> routines . . .
> 
> Any pointers appreciated,
> Clint
> 
> 
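
For reference, the sign-bit mask described above can be written in
standard C without pointer casts by copying the bits through memcpy.
This is only a sketch: it assumes a 64-bit IEEE-754 double with the
sign in the top bit, and the name mask_fabs is made up here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clear the sign bit of a double.
   Assumes double is 64-bit IEEE-754 (sign bit is bit 63). */
static double mask_fabs(double x)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);         /* well-defined type pun */
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF);   /* mask off the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

There is no branch in the source, but whether the compiler keeps the
value in a floating point register or bounces it through memory is up
to the optimizer, so this may or may not beat the if.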

-- 
Camm Maguire			     			camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah