Adaptive Approach for Basic Linear Algebra Subprograms
• Perform a parameter study of the operation on the target machine; this is done only once.
• The only generated code is the on-chip multiply.
• The BLAS operation is written in terms of the generated on-chip multiply.
• All transpose cases are coerced, through a data copy, to a single case of the on-chip multiply.
  ◦ Only one case is generated per platform.
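A minimal sketch of the idea in the list above, assuming row-major storage: every transpose combination is packed into one fixed block layout, so a single kernel serves all cases. The names `gemm_sketch`, `pack_block`, and `onchip_mm`, and the block size `NB = 2`, are hypothetical; the hand-written kernel here only stands in for the generated on-chip multiply.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NB ((size_t)2)  /* hypothetical block size; a real one would come from the parameter study */

/* Element (i, j) of op(X), where op(X) is X or its transpose (row-major). */
static double elem(const double *X, size_t ld, int trans, size_t i, size_t j)
{
    return trans ? X[j * ld + i] : X[i * ld + j];
}

/* Copy the NB x NB block of op(X) starting at (r0, c0) into dst, zero-padding
   past the edges. The copy absorbs the transpose, so the kernel never sees it. */
static void pack_block(double *dst, const double *X, size_t ld, int trans,
                       size_t r0, size_t c0, size_t rows, size_t cols,
                       int store_transposed)
{
    for (size_t i = 0; i < NB; ++i) {
        for (size_t j = 0; j < NB; ++j) {
            size_t r = r0 + i, c = c0 + j;
            double v = (r < rows && c < cols) ? elem(X, ld, trans, r, c) : 0.0;
            if (store_transposed)
                dst[j * NB + i] = v;
            else
                dst[i * NB + j] = v;
        }
    }
}

/* Stand-in for the single generated case: C += A * B on NB x NB blocks,
   with B supplied in transposed storage (Bt) so both operands are read
   with unit stride in the inner loop. */
static void onchip_mm(const double *A, const double *Bt, double *C)
{
    for (size_t i = 0; i < NB; ++i) {
        for (size_t j = 0; j < NB; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < NB; ++k)
                sum += A[i * NB + k] * Bt[j * NB + k];
            C[i * NB + j] += sum;
        }
    }
}

/* C (M x N) += op(A) (M x K) * op(B) (K x N), all row-major. */
void gemm_sketch(int transA, int transB, size_t M, size_t N, size_t K,
                 const double *A, size_t lda, const double *B, size_t ldb,
                 double *C, size_t ldc)
{
    double a[NB * NB], bt[NB * NB], c[NB * NB];
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t i0 = 0; i0 < M; i0 += NB) {
        for (size_t j0 = 0; j0 < N; j0 += NB) {
            memset(c, 0, sizeof c);
            for (size_t k0 = 0; k0 < K; k0 += NB) {
                pack_block(a, A, lda, transA, i0, k0, M, K, 0);
                pack_block(bt, B, ldb, transB, k0, j0, K, N, 1);
                onchip_mm(a, bt, c);  /* same kernel for every transpose case */
            }
            /* Write back only the part of the block that lies inside C. */
            for (size_t i = 0; i < NB && i0 + i < M; ++i)
                for (size_t j = 0; j < NB && j0 + j < N; ++j)
                    C[(i0 + i) * ldc + (j0 + j)] += c[i * NB + j];
        }
    }
}
```

The copy costs O(N²) work against the O(N³) multiply, which is why folding the transpose into it can be cheap for large matrices.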
[Figure: C (M × N) = A (M × K) * B (K × N), with block size NB]
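The one-time parameter study could be sketched as a timing sweep over candidate values of the block size NB. This is only an illustration under simple assumptions (square row-major matrices, `clock()` timing, a single run per candidate); `blocked_mm` and `pick_block_size` are hypothetical names, and a real study would likely vary more parameters than NB.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Blocked C += A * B for n x n row-major matrices with runtime block size nb. */
static void blocked_mm(size_t n, size_t nb,
                       const double *A, const double *B, double *C)
{
    for (size_t i0 = 0; i0 < n; i0 += nb) {
        size_t imax = (i0 + nb < n) ? i0 + nb : n;
        for (size_t k0 = 0; k0 < n; k0 += nb) {
            size_t kmax = (k0 + nb < n) ? k0 + nb : n;
            for (size_t j0 = 0; j0 < n; j0 += nb) {
                size_t jmax = (j0 + nb < n) ? j0 + nb : n;
                for (size_t i = i0; i < imax; ++i)
                    for (size_t k = k0; k < kmax; ++k)
                        for (size_t j = j0; j < jmax; ++j)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
        }
    }
}

/* Time each candidate block size once on an n x n problem and return the
   fastest one. Returns 0 if allocation fails or no candidate is valid. */
size_t pick_block_size(size_t n, const size_t *candidates, size_t count)
{
    double *A = malloc(n * n * sizeof *A);
    double *B = malloc(n * n * sizeof *B);
    double *C = malloc(n * n * sizeof *C);
    size_t best = 0;
    double best_time = -1.0;

    if (A == NULL || B == NULL || C == NULL) {
        free(A); free(B); free(C);
        return 0;
    }
    for (size_t i = 0; i < n * n; ++i) {
        A[i] = 1.0;
        B[i] = 2.0;
    }

    for (size_t c = 0; c < count; ++c) {
        size_t nb = candidates[c];
        if (nb == 0)
            continue;  /* a zero block size would never advance the loops */
        for (size_t i = 0; i < n * n; ++i)
            C[i] = 0.0;

        clock_t start = clock();
        blocked_mm(n, nb, A, B, C);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;

        if (best_time < 0.0 || elapsed < best_time) {
            best_time = elapsed;
            best = nb;
        }
    }

    free(A); free(B); free(C);
    return best;
}
```

The chosen value would be recorded once per machine and then baked into the generated on-chip multiply rather than re-measured on every call.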