
binary installation issues (cont'd)


I apologize --- I looked a bit more carefully at the search code,
and it looks like you minimize search time by using binary
search instead of a full grid search.  

Given that a significant fraction of the time spent tuning ATLAS
is spent compiling, and given that you already use binary 
search, I think that building binaries for the full grid search 
would likely be prohibitively expensive.  Also, I think the
resulting installation package would be huge, since the 
aggregate size of all .o files for the full search space would
be overwhelming.

If the package were distributed by CD-ROM, size wouldn't
matter so much, and since the build only happens once,
the huge build time would not be a showstopper.  However,
it feels a bit like a kludge to me...

So, do you have any suggestions as to whether it might be
possible to extend/enhance/modify ATLAS to enable
distributors to build/redistribute binary packages of ATLAS
which could tune themselves with minimal delay and
recompilation on target machines?



PS Some buglets in 3.1.4D:
	- config.c line 2131: There is a newline in the string
	  which causes HP's compiler to complain/barf
	- Instead of using "-Aa -D_INCLUDE_POSIX_SOURCE"
	  or simply "-Aa" for HP flags, you should probably use
	  "-Ae" instead.
	- Add HP-UX-specific code to discover the number of CPUs.
	- HP-PA machines generally have only an L1 cache; they
	  have no L2/L3 cache.  However, that L1 cache is
	  usually between 256KB and 2MB on many machines, while
	  tune/sysinfo/L1CacheSize.c assumes that L1 caches are
	  at most 256KB.  I think a better algorithm might be a
	  binary search whose two end points are very small
	  (say 1K) and very large (say 2x MaxL2CacheSize).  The
	  binary search would be on the log2() of the
	  prospective cache size.
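
To make the last suggestion concrete, here is a rough sketch of a binary search over the log2() of the candidate cache size. It is not the attached L1CacheSize.c. The names find_cache_size, fits_fn and simulated_fits are made up for illustration, and the probe is a stand-in: a real one would time a sweep over a working set of the given size, and the sketch assumes the probe's answer is monotone (true up to the cache size, false beyond it).

```c
#include <stddef.h>

/* Hypothetical probe: nonzero if a working set of `bytes` appears to
 * fit in the cache.  A real probe would time memory sweeps. */
typedef int (*fits_fn)(size_t bytes, void *ctx);

/* Binary search on the exponent.  Returns the largest power of two in
 * [2^lo_log2, 2^hi_log2] for which the probe reports a fit, or 0 if
 * even the smallest candidate does not fit.  Assumes a monotone probe. */
size_t find_cache_size(unsigned lo_log2, unsigned hi_log2,
                       fits_fn fits, void *ctx)
{
    if (!fits((size_t)1 << lo_log2, ctx))
        return 0;

    while (lo_log2 < hi_log2) {
        /* Upper midpoint, so the loop always makes progress. */
        unsigned mid = lo_log2 + (hi_log2 - lo_log2 + 1) / 2;

        if (fits((size_t)1 << mid, ctx))
            lo_log2 = mid;       /* 2^mid still fits: search higher */
        else
            hi_log2 = mid - 1;   /* 2^mid is too big: search lower */
    }
    return (size_t)1 << lo_log2;
}

/* Stand-in probe for experimenting: pretends the cache holds exactly
 * *ctx bytes.  Purely illustrative, it does no timing at all. */
int simulated_fits(size_t bytes, void *ctx)
{
    size_t true_size = *(size_t *)ctx;
    return bytes <= true_size;
}
```

With end points of 1K (lo_log2 = 10) and 8MB (hi_log2 = 23), this needs only a handful of probes rather than one per candidate size. Because it searches exponents, it reports the largest power of two that fits, so a cache that is not a power of two in size would be rounded down.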

	I have attached a copy of the updated config.c incorporating
	these bug fixes...  I have also attached a copy of the updated
	L1CacheSize.c which is perhaps more reliable.  You should
	call the new program with a value that is roughly 2x MaxL2