Next: 13.5.5 Global Strategies Up: 13.5 ASPAR Previous: 13.5.3 The Local View

## 13.5.4 The ``Global'' View

Up to now, the discussion has rested mainly on the properties of small portions of code, often a single loop or a single group of nested loops in practical cases. While this is generally sufficient for a ``vectorizing'' compiler, it is too little for effective parallelization. To make the issues a little clearer, consider the following piece of code:

```
      DO 10 I=1,100
         A(I) = B(I) + C(I)
   10 CONTINUE
      DO 20 I=1,100
         D(I) = B(I) + C(100-I+1)
   20 CONTINUE
```

Taken in isolation (the local view), both of these loop constructs are trivially parallelizable and have no dependences. For the first loop, we would assign the first few values of the arrays A, B, and C to the first processor, the next few to the second, and so on until we had accounted for each loop iteration. For the second loop, we would assign the first few elements of D and B and the last few of C to the first node, and so on. Unfortunately, there is a conflict here in that one loop wants to assign the elements of array C to processors in increasing order while the other wants them in decreasing order. This is the global decomposition problem.

The simplest solution in this particular case can be derived from the fact that array C only appears on the right-hand side of the two sets of expressions. Thus, we can avoid the problem altogether by not distributing array C at all; instead, a complete copy is kept on every node. In this case, we have to perform a few extra index calculations, but we can still achieve good speedup in parallel.

Unfortunately, life is not usually as simple as presented in this case. In typical codes, we would find that the logic which led to the ``nondistribution'' of array C would gradually spread to the other data structures as well, with the final result that we end up distributing nothing and often fail to achieve any speedup at all.


Guy Robinson
Wed Mar 1 10:19:35 EST 1995