Proof

According to hypothesis (H4), the computation goes column-wise. When a processor has completed the execution of a whole column of tiles, it starts the next column that has been assigned to it. The time to process a whole column of tiles is the number of tiles in the column, namely , times the time to compute a tile, namely . We obtain the value for processing a whole tile column.

Now, according to hypothesis (H5), tile columns are distributed cyclically to processors. If a processor starts the execution of the first tile in a given column at time-step t, its right neighbor cannot start the execution of the first tile in the next column before time-step , where (this is due to the dependence vector ). Note that is the same as in Section 2.2, but we pay a communication cost only when the processors owning the tiles are not the same. Two cases can occur:

Figure 4: Scheduling tiles with , and P=3.

Either there are enough tiles in each column so that when a processor has completed the execution of a whole tile column, it does not have to wait for its next tile column to be ready. This will happen if is greater than or equal to the delay imposed by horizontal constraints, i.e. if

If this condition holds, all processors remain active throughout the entire computation, once they have started execution. Since the last processor starts at time and has tiles to execute (each in time ), we obtain , the first expression in Equation (1). See Figure 3 where , and P=3. There are tiles per column, and , hence the condition is satisfied.

Figure 3: Scheduling tiles with , and P=3.

Or each processor has to wait upon finishing a tile column until the next one is ready. This translates into the condition . In that case, the total computation time is equal to the time at which the last processor starts the execution of the first tile in the last column, namely plus the time needed to process this column. We obtain the expression , as stated in the second formula of Equation (1). See Figure 4 where , , and P=3. There are tiles per column, and , hence . Processors remain idle at the end of each tile column, waiting for their next column to be ready.
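
The two cases can be checked with a small simulation. The following is a minimal sketch, not the authors' formulation: the names `tiles_per_column`, `tile_time` and `neighbor_delay` are hypothetical stand-ins for the quantities of the proof, the delay between neighboring columns is taken as one constant, and at least two processors are assumed so that consecutive columns always belong to different processors. Each column starts as soon as both the horizontal constraint and the availability of its owning processor allow it.

```python
def simulate_schedule(num_columns, num_procs, tiles_per_column, tile_time, neighbor_delay):
    """Return (start_times, total_time) for a column-wise, cyclic tile schedule.

    All parameter names are illustrative assumptions, not notation from the paper.
    Column c is owned by processor c % num_procs (cyclic distribution, as in H5).
    """
    column_time = tiles_per_column * tile_time  # time to process one whole tile column
    starts = []
    for c in range(num_columns):
        # Horizontal constraint: the first tile of column c cannot start before
        # the first tile of column c - 1 has started, plus the neighbor delay.
        ready = starts[c - 1] + neighbor_delay if c > 0 else 0
        # Resource constraint: the owning processor must have finished
        # its previous column, which is column c - num_procs.
        free = starts[c - num_procs] + column_time if c >= num_procs else 0
        starts.append(max(ready, free))
    # The computation ends when the last column has been fully processed.
    return starts, starts[-1] + column_time


# Illustrative values only (not the values used in Figures 3 and 4).
busy = simulate_schedule(6, 3, 4, 1, 1)  # column time 4 >= 3 * delay: no waiting
idle = simulate_schedule(6, 3, 2, 1, 1)  # column time 2 <  3 * delay: processors wait
print(busy)  # ([0, 1, 2, 4, 5, 6], 10)
print(idle)  # ([0, 1, 2, 3, 4, 5], 7)
```

In the first run, the last processor starts at time 2 and then executes its two columns back to back, giving 2 + 2 × 4 = 10. In the second run, every column starts one delay after its left neighbor, so the total is the start time of the last column, 5, plus the time to process it, 2.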

Jack Dongarra
Sat Feb 8 08:17:58 EST 1997