for I<>i = 0 todo
for I<>j = 0 to
do
a(i,j) = a(i-3,j) + a(i,j-2)
b(i,j) = a(i-2,j-3) + b(i-2,j-1)
enddo
enddo
This loop nest has depth 2. The dependences are uniform (intuitively, dependence vectors are translations), and they can be
encapsulated into the dependence matrix
![]()
The atomicity constraint can be expressed by the analytical condition
, where H is the matrix of vectors normal to the edges of the tile.
Irigoin and Triolet [13] were the first to derive the condition
. In the context of vector machines with a two-level memory hierarchy, they have introduced tiling as a much more powerful technique
than strip-mining or rectangular partitioning [15]. In their paper,
tiling consists in aggregating elemental computation points in tiles (or supernode) defined as multi-dimensional parallelepipeds so that each tile executes atomically with no intervening synchronization or communication. This approach has very close objectives to those of Schreiber and Dongarra [17] who aim at automatically designing block versions of nested loop kernels. Schreiber and Dongarra [17] have discussed how to determine
the size and shape of the tiles so as to optimize the communication-to-computation ratio. Their work has been extended by
Ramanujam and Sadayappan [16], and by Boulet et al. [3]. Several other papers have discussed the same
framework, including [18, 4, 1, 5]
.