Next: Discussion
Up: Literature overview
 Previous: Back to Example 
 
 
Ohta et al. [14] aim at determining the best tile size under the
following hypotheses:
- (H1)
 -  There are P available processors interconnected as a ring.
 - (H2)
 -  The computation domain is a two-dimensional rectangle of size
.
 - (H3)
 -  Tiles are rectangular and their edges are parallel to the axes
(see Figure 2). The size of a tile is
 
, where 
 and 
 are unknowns.
 - (H4)
 -  Dependences between tiles are summarized by the vector pair
 (as in Example 1).
 - (H5)
 -  Tiles are assigned to processor using a one-dimensional cyclic distribution: in other words, tile (i,j) is allocated to
processor 
.
 - (H6)
 -  The scheduling of the tiles is column-wise:
at step 0, processor 
 executes tile (0,0) and the computed value is communicated to the adjacent processor 
 (more precisely, a rectangular
slice of width w and height 
 is sent). At step 1, 
processors 
 and 
 execute tiles (0,1) and (1,0) simultaneously.
After having executed a whole column of tiles, a processor moves on to its
next column.
 
  
Figure 2: Mapping rectangular tiles onto a ring of processors.
A step is the time required to compute a tile and to communicate data.
Ohta et al. [14] use the following expression:

where t is the elemental computation time, a is a communication start-up
and b is the inverse of the communication bandwidth times the width w
of the slice being communicated (the communication cost is a
linear expression in the message size).
To compute the total execution time, Ohta et al. [14] use the
formula 
, where 
 is the latency (the step
at which the last processor begins to work) and 
 is the number of tiles per processor (assumed to be an integer). Using the approximation  
, they derive the total execution time T as

The execution time is found to be minimal when choosing

 
 
   
 Next: Discussion
Up: Literature overview
 Previous: Back to Example 
Jack Dongarra 
Sat Feb  8 08:17:58 EST 1997