
L1 matmul description file

In the install process, ATLAS first searches through the L1 matmul implementations provided by the ATLAS matmul generator. When the best generated code is found, the user contributed codes are timed to see if they can beat the generated code. The matmul search script opens a description file for each precision (scases.dsc, dcases.dsc, ccases.dsc, zcases.dsc) in the ATLAS/tune/blas/gemm/<arch> directory, to see what user-contributed codes are available. This master index file is actually generated based on several user-supplied files from ATLAS/tune/blas/gemm/CASES (see Section 5.2.4 for the names and definitions of these files). The format for all these files is the same, and is described in the following paragraphs.

The first line of each file is a comment line, and is ignored. The next line indicates the number of user-contributed codes to search, and each subsequent line supplies information about a given user-supplied L1 matmul. The form of these lines is:
<ID> <flag> <mb> <nb> <kb> <muladd> <lat> <mu> <nu> <ku> <rout> "<author>"

<rout> and "<author>" are strings, and the rest of the parameters are signed integers.
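The layout just described (an ignored comment line, a count line, then one twelve-field line per kernel) can be read with a few lines of Python. This is a minimal sketch and not part of ATLAS: the names KernelEntry, parse_entry, and parse_index are hypothetical, and it assumes the author string is always enclosed in double quotes and that <rout> contains no whitespace.

```python
import shlex
from dataclasses import dataclass


@dataclass
class KernelEntry:
    """One user-contributed kernel line (hypothetical helper type)."""
    id: int
    flag: int
    mb: int
    nb: int
    kb: int
    muladd: int
    lat: int
    mu: int
    nu: int
    ku: int
    rout: str
    author: str


def parse_entry(line):
    # shlex.split keeps the double-quoted author name together as one token.
    tokens = shlex.split(line)
    if len(tokens) != 12:
        raise ValueError(f"expected 12 fields, got {len(tokens)}: {line!r}")
    ints = [int(t) for t in tokens[:10]]  # the ten signed-integer fields
    return KernelEntry(*ints, rout=tokens[10], author=tokens[11])


def parse_index(text):
    lines = text.splitlines()
    # Line 1 is a comment and is ignored; line 2 holds the entry count.
    count = int(lines[1])
    body = [ln for ln in lines[2:] if ln.strip()]
    return [parse_entry(ln) for ln in body[:count]]
```

Calling parse_index on the text of a description file returns one KernelEntry per listed kernel, with the integer fields converted and the quotes stripped from the author name.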

The meanings of these parameters are:

Table 1 summarizes the presently defined flag values.

Table 1: Matmul index routine flag variables

FLAG   MEANING
   0   Normal
   8   Do not consider this kernel for cleanup
  16   Consider this kernel for cleanup only
  32   lda and ldb are not restricted to KB
  64   mb provides run-time constraint, not compile-time
 128   nb provides run-time constraint, not compile-time
 256   kb provides run-time constraint, not compile-time
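
The non-zero flag values in Table 1 are distinct powers of two, which suggests they can be combined in a single <flag> field. The text does not say so explicitly, so the sketch below treats that as an assumption; FLAG_MEANINGS and describe_flag are hypothetical names used only for illustration.

```python
# Flag values and meanings copied from Table 1.
FLAG_MEANINGS = {
    8: "Do not consider this kernel for cleanup",
    16: "Consider this kernel for cleanup only",
    32: "lda and ldb are not restricted to KB",
    64: "mb provides run-time constraint, not compile-time",
    128: "nb provides run-time constraint, not compile-time",
    256: "kb provides run-time constraint, not compile-time",
}


def describe_flag(flag):
    # ASSUMPTION: values combine as a bitmask (not stated in the text).
    if flag == 0:
        return ["Normal"]
    found = [text for bit, text in FLAG_MEANINGS.items() if flag & bit]
    # Report any bits that Table 1 does not define instead of dropping them.
    leftover = flag & ~sum(FLAG_MEANINGS)
    if leftover:
        found.append(f"undefined bits: {leftover}")
    return found
```

Under that assumption, describe_flag(264) would list both the cleanup exclusion (8) and the run-time kb constraint (256).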


Here's an example:

<ID> <flag> <mb> <nb> <kb> <muladd> <lat> <mu> <nu> <ku> <rout> "<Contributer>"
3
 1 0 0 0 0 1 1 1 1 1 ATL_mm1x1x1.c "R. Clint Whaley"
 2 0 1 1 1 1 1 1 1 1 ATL_mm1x1x1b.c "R. Clint Whaley"
 3 0 1 1 8 1 1 1 1 4 ATL_mm2.c "R. Clint Whaley"

So, we have 3 user-supplied routines, all written by me. The first loops over M, N, and K, but the following two routines loop over the cpp macros MB, NB, and KB. The third routine insists that KB be a multiple of 8. The first two routines don't unroll any of the loops, while the third unrolls the K loop to a depth of 4. They all use a combined muladd style of programming, and don't worry about latency.
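
The reading of the <mb>, <nb>, and <kb> fields used in this example can be restated as a small function. This is only a sketch of the interpretation given in the paragraph above for entries whose flag is 0: describe_block_field is a hypothetical name, and negative values are rejected because the example does not cover them.

```python
def describe_block_field(name, value):
    # name is "mb", "nb", or "kb"; value is the integer from the index line.
    macro = name.upper()   # MB, NB, or KB
    dim = macro[0]         # M, N, or K
    if value == 0:
        return f"{name}: loop bound is the run-time dimension {dim}"
    if value > 0:
        return f"{name}: loop bound is the cpp macro {macro}, a multiple of {value}"
    raise ValueError("negative values are not covered by the example")
```

For the third routine, describe_block_field("kb", 8) reports that the loop runs over the macro KB and that KB must be a multiple of 8.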


R. Clint Whaley 2001-08-04