next up previous
Next: Other Applications Up: Automatic Blocking of Nested Previous: Geometric Considerations

A Procedure for Choosing the Tiling Basis

In this section, we describe a practical procedure for choosing a tiling basis A\ given the dependence matrix D. The procedure's complexity is a function of k, the nesting depth of the loop; m, the number of dependence directions; and p, the number of rays of the cone tex2html_wrap_inline1766. (We define what we mean by the rays of a polygonal cone below.) In these terms, the complexity of the procedure is tex2html_wrap_inline2102. While the exponential term here may be cause for some uneasiness, the reader should keep in mind that in the practical application of these ideas k will rarely exceed four.

The procedure can be described as follows:

1. Construct the set of rays of the cone tex2html_wrap_inline1766. A ray is a vector tex2html_wrap_inline2108 that is on the boundary of tex2html_wrap_inline1766 and is at the intersection of at least k-1 of the half spaces tex2html_wrap_inline2114. Thus, the rays satisfy
 equation493
where tex2html_wrap_inline2116 is a subset of the integers tex2html_wrap_inline2118. This is a necessary condition. Let us suppose that there is a unique integer solution (up to scaling) of equation (6). For the solution r to be a ray, we must check whether tex2html_wrap_inline2122. We also check to see whether tex2html_wrap_inline2124 because, if that is the case, then -r is a ray of tex2html_wrap_inline1766. If we find that the rows of D selected by tex2html_wrap_inline2132 are linearly dependent so that (6) has a two or more dimensional subspace of solutions, then we just ignore the set tex2html_wrap_inline2132.

The method we use for the construction is simply to form all of the tex2html_wrap_inline2136 subsets tex2html_wrap_inline2132 and then solve (6) for r. Our implementation uses a QR factorization with column pivoting, which is very effective at detecting linear dependence of the columns tex2html_wrap_inline2144  [10]. It is straightforward to find the integer solution to (6) by computing a solution in floating-point and then finding the smallest scalar multiple that makes the solution integer (after perturbations on the order of roundoff error). Implementations that use only integer arithmetic would also be feasible and perhaps better.

The complexity of these decompositions is tex2html_wrap_inline2146. However, we may update the QR decomposition after changing one column of the matrix at cost tex2html_wrap_inline2150. Bischof has recently shown how to do so and still monitor the linear independence of columns [2]. In our experiments, we do not use such an updating procedure.

We must consider the case in which tex2html_wrap_inline2152 itself has a nontrivial null space, which in fact happens quite often. In this case, the set tex2html_wrap_inline1766 is a wedge,
displaymath2082
where tex2html_wrap_inline2158 is the null space of tex2html_wrap_inline2152 and tex2html_wrap_inline2162 is the intersection of tex2html_wrap_inline2164 with the orthogonal complement of tex2html_wrap_inline2158, the row space of tex2html_wrap_inline2152. To detect this case, we always start with a QR factorization of tex2html_wrap_inline2152 itself. This allows us to find the rank of D and an integer basis for the null space of tex2html_wrap_inline2152 in a standard manner. We then construct the rays of tex2html_wrap_inline2162 by applying a variant of the procedure above to the augmented matrix [D, N], where the columns of N are the computed basis for tex2html_wrap_inline2158. In the variant, the subsets tex2html_wrap_inline2132 always include all of the columns of N, and enough other columns to make up a set of k-1. The resulting rays must therefore be members of tex2html_wrap_inline2162; together with the columns of N they are the of rays of tex2html_wrap_inline1766.

  figure512
Figure 2: Operation counts versus number of rays for selection in 2 dimensions. Solid line: Subset selection; Broken line: Exhaustive search.

Having obtained the rays as the columns of a matrix tex2html_wrap_inline2198, we next choose as our first approximation to the optimal basis a subset of these rays. As the cone is invariant under scaling of these rays, we normalize them so that their length is one. Then we select a subset of k of them, chosen to approximately minimize the condition number of the subset. (We show below that this also results in a nearly maximum determinant.) This is a standard problem, called subset selection, in statistics. We employ the heuristic procedure of Golub, Klema, and Stewart [9], which is described in the text of Golub and Van Loan [10]. This procedure involves a singular value decomposition (SVD) of R and the QR decomposition with column pivoting of a matrix that is part of the SVD, with an overall cost of tex2html_wrap_inline2204 floating-point operations (and operation being a floating-point addition and a floating-point multiplication).

  figure521
Figure 3: Operation counts versus number of rays for selection in 3 dimensions. Solid line: Subset selection; Broken line: Exhaustive search.

We know of no method for finding the optimal subset of rays other than an exhaustive search, at a cost of tex2html_wrap_inline2206 flops. The relative costs of our implementation and exhaustive search for the optimum subset are illustrated in Figures 2 --  4. Obviously, the exhaustive procedure is prohibitively expensive for large problems, but may be used for k = 2, for k = 3 and p < 6, and for k = 4 and p < 6. On the other hand, subset selection does very well. In a test of 1000 randomly generated tex2html_wrap_inline2218 matrices D, subset selection produced a suboptimal choice in 18 cases. In the worst of these, the determinant of the basis that it found was 17% smaller than that of the optimum basis.

  figure533
Figure 4: Operation counts versus number of rays for selection in 4 dimensions. Solid line: Subset selection; Broken line: Exhaustive search.

The basis chosen at this point may be far from optimal. Consider the case
displaymath2083
The two rays of the cone are the columns of
displaymath2084
These rays make an angle of tex2html_wrap_inline2222; clearly there are orthogonal bases whose elements are in tex2html_wrap_inline1766, but not all at the boundaries. To catch cases like this, we have implemented a generalized orthogonalization process. Let angle(x, y) denote the angle between the vectors x and y, given by
displaymath2085
The procedure is

 mmmmmmmmmmmmmmmmmm¯

for j = 1 to k do

Find tex2html_wrap_inline2236 such that angletex2html_wrap_inline2238 is maximum;

if angletex2html_wrap_inline2240

tex2html_wrap_inline2242

so that tex2html_wrap_inline1576 is orthogonal to tex2html_wrap_inline2246;

Replace tex2html_wrap_inline1576 with an integer vector in tex2html_wrap_inline1766

of approximately the same direction; fi

od

if tex2html_wrap_inline2078 and the normalized determinant is larger than before

improvement, accept the new tex2html_wrap_inline2254, else use the old one; fi

In a test of 1000 randomly generated tex2html_wrap_inline2218 dependence matrices D, the basis selected by finding the rays of tex2html_wrap_inline1766 and then using the subset selection procedure above was improved by this procedure. The average determinant was improved tex2html_wrap_inline2262, from .63 to .71 . In comparisons with several similar procedures, this one did the best job of maximizing the normalized determinant. We also considered the following variants:

  1. As above, but replace tex2html_wrap_inline2246 rather than tex2html_wrap_inline1576 after making it orthogonal to tex2html_wrap_inline1576.
  2. For j = 1 to k, tex2html_wrap_inline1576 is made orthogonal to each other basis vector with which it makes an obtuse angle; this continues until there are no such obtuse angles involving tex2html_wrap_inline1576.
  3. For every pair of basis vectors tex2html_wrap_inline2246 and tex2html_wrap_inline1576 with i < j, orthogonalize tex2html_wrap_inline1576 and tex2html_wrap_inline2246 by adding a multiple of tex2html_wrap_inline1576 to tex2html_wrap_inline2246.
  4. For every pair of basis vectors tex2html_wrap_inline2246, and tex2html_wrap_inline1576 with i < j, orthogonalize tex2html_wrap_inline1576 and tex2html_wrap_inline2246 by adding a multiple of tex2html_wrap_inline2246 to tex2html_wrap_inline1576.

Procedures 2, 3, and 4 are more costly with little in the way of improved performance. Procedure 1 actually does worse. Thus, we recommend the use of the procedure above.


next up previous
Next: Other Applications Up: Automatic Blocking of Nested Previous: Geometric Considerations

Jack Dongarra
Tue Feb 18 15:39:11 EST 1997