next up previous
Next: Optimizing the tile size Up: Determining the total execution Previous: Determining the total execution

Proof

According to hypothesis (H4), the computation goes column-wise. When a processor has completed the execution of a whole column of tiles, it starts the next column that has been assigned to it. The time to process a whole column of tiles is the number of tiles in the column, namely tex2html_wrap_inline910, times the time to compute a tile, namely tex2html_wrap_inline912. We obtain the value tex2html_wrap_inline914 for processing a whole tile column.

Now, according to hypothesis (H5), tile columns are distributed cyclically to processors. If a processor starts the execution of the first tile in a given column at time-step t, its right neighbor cannot start the execution of the first tile in the next column before time-step tex2html_wrap_inline918, where tex2html_wrap_inline920 (this is due to the dependence vector tex2html_wrap_inline922). Note that tex2html_wrap_inline924 is the same as in Section 2.2, but we pay a communication cost only when the processors owning the tiles are not the same. Two cases can occur:

  figure218
Figure 4: Scheduling tiles with tex2html_wrap_inline926, tex2html_wrap_inline928 and P=3.


next up previous
Next: Optimizing the tile size Up: Determining the total execution Previous: Determining the total execution

Jack Dongarra
Sat Feb 8 08:17:58 EST 1997