Code Generation Strategy
• Code is iteratively generated and timed until the best-performing case is found. We try:
  - Different blocking factors (NB)
  - Breaking false dependencies
  - M, N, and K loop unrolling
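The generate-and-time loop can be illustrated with a minimal sketch in C. It is not the actual generator: it only varies the blocking factor NB of one fixed blocked multiply, times each candidate, and keeps the fastest. The names `blocked_matmul`, `time_candidate`, and `search_best_nb` are hypothetical, as are the use of `clock()` for timing and the loop order.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Keeps the compiler from discarding the timed work (illustrative only). */
static volatile double sink;

/* Hypothetical kernel: C += A * B for n x n row-major matrices,
 * processed in nb x nb blocks. Edge blocks are clipped to n. */
static void blocked_matmul(size_t n, size_t nb,
                           const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += nb)
        for (size_t kk = 0; kk < n; kk += nb)
            for (size_t jj = 0; jj < n; jj += nb) {
                size_t imax = ii + nb < n ? ii + nb : n;
                size_t kmax = kk + nb < n ? kk + nb : n;
                size_t jmax = jj + nb < n ? jj + nb : n;
                for (size_t i = ii; i < imax; ++i)
                    for (size_t k = kk; k < kmax; ++k) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jmax; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

/* Times `reps` runs of one candidate NB.
 * Returns CPU seconds, or -1.0 if allocation fails. */
static double time_candidate(size_t n, size_t nb, int reps)
{
    double *A = malloc(n * n * sizeof *A);
    double *B = malloc(n * n * sizeof *B);
    double *C = malloc(n * n * sizeof *C);
    if (!A || !B || !C) {
        free(A); free(B); free(C);
        return -1.0;
    }
    for (size_t i = 0; i < n * n; ++i) {
        A[i] = 1.0;
        B[i] = 2.0;
        C[i] = 0.0;
    }
    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r)
        blocked_matmul(n, nb, A, B, C);
    clock_t t1 = clock();
    sink = C[0];
    free(A); free(B); free(C);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Tries every candidate and returns the one with the lowest measured time.
 * Returns 0 if no candidate could be timed. */
static size_t search_best_nb(size_t n, const size_t *cands, size_t ncands,
                             int reps)
{
    size_t best = 0;
    double best_t = -1.0;
    for (size_t c = 0; c < ncands; ++c) {
        double t = time_candidate(n, cands[c], reps);
        if (t < 0.0)
            continue;
        if (best_t < 0.0 || t < best_t) {
            best_t = t;
            best = cands[c];
        }
    }
    return best;
}
```

A call such as `search_best_nb(512, cands, ncands, 5)` returns whichever candidate happened to run fastest on the machine at hand, so the result can differ between machines and between runs. A fuller search would also vary the unrolling factors and instruction ordering, not just NB.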
• On-chip multiply optimizes for:
  - TLB access
  - L1 cache reuse
  - FP unit usage
  - Memory fetch
  - Register reuse
  - Loop overhead minimization
• Takes 30 minutes to an hour to run.
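Several of the on-chip multiply goals can be seen in a small hand-written kernel. The sketch below is an assumption-laden illustration, not generated output: the hypothetical function `kernel_2x2` keeps a 2 x 2 block of C in four scalar accumulators (register reuse, and four independent dependency chains for the FP unit) and unrolls the K loop by two (less loop overhead). The 2 x 2 register block and the unroll factor of 2 are arbitrary choices for illustration.

```c
#include <stddef.h>

/* Hypothetical micro-kernel: C (m x n) += A (m x k) * B (k x n),
 * all row-major. m and n must be even. */
static void kernel_2x2(size_t m, size_t n, size_t k,
                       const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < m; i += 2)
        for (size_t j = 0; j < n; j += 2) {
            /* Four independent accumulators held in registers. */
            double c00 = 0.0, c01 = 0.0, c10 = 0.0, c11 = 0.0;
            size_t p = 0;

            /* K loop unrolled by 2. */
            for (; p + 1 < k; p += 2) {
                double a0 = A[i * k + p];
                double a1 = A[(i + 1) * k + p];
                double b0 = B[p * n + j];
                double b1 = B[p * n + j + 1];
                c00 += a0 * b0;  c01 += a0 * b1;
                c10 += a1 * b0;  c11 += a1 * b1;

                a0 = A[i * k + p + 1];
                a1 = A[(i + 1) * k + p + 1];
                b0 = B[(p + 1) * n + j];
                b1 = B[(p + 1) * n + j + 1];
                c00 += a0 * b0;  c01 += a0 * b1;
                c10 += a1 * b0;  c11 += a1 * b1;
            }

            /* Remainder iteration when k is odd. */
            for (; p < k; ++p) {
                double a0 = A[i * k + p];
                double a1 = A[(i + 1) * k + p];
                double b0 = B[p * n + j];
                double b1 = B[p * n + j + 1];
                c00 += a0 * b0;  c01 += a0 * b1;
                c10 += a1 * b0;  c11 += a1 * b1;
            }

            /* Each element of C is written once per block. */
            C[i * n + j]           += c00;
            C[i * n + j + 1]       += c01;
            C[(i + 1) * n + j]     += c10;
            C[(i + 1) * n + j + 1] += c11;
        }
}
```

Each loaded element of A and B is used twice before being discarded, and C is touched only once per 2 x 2 block. Which register block shape and unroll depth actually perform best is machine-dependent, which is why such parameters are found by timing rather than fixed in advance.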