Hypotheses (H5) and (H6) are very restrictive in that they impose the mapping of tiles to processors as well as their scheduling. The intuitive motivation for (H5) is that a cyclic distribution of tiles is quite natural to load-balance computations. Once the distribution of tiles to processors is fixed, there are several possible schedulings (any wavefront execution that goes along a left-to-right diagonal is valid). Specifying a column-wise execution may lead to the simplest code generation.
It turns out that (H5) and (H6) provide the best solution among all possible
distributions of tiles to processors, which is a very strong result.
This result holds true under the assumption that the communication cost for a tile is not larger than its
computation cost. Since the communication cost for a tile grows linearly
with its size, while the
computation costs grows quadratically, this hypothesis will be satisfied if the tile is large enough
. This result is formally stated in the theorem below.
Beforehand, we need to refine the communication cost as follows:
