### CRS-based Factorization Transpose Solve

Solving the transpose system is slightly more involved. In the usual formulation we traverse rows when solving a factored system, but here we can only access columns of the matrices and (at less than prohibitive cost). The key idea is to distribute each newly computed component of a triangular solve immediately over the remaining right-hand-side.

For instance, if we write a lower triangular matrix as , then the system can be written as . Hence, after computing we modify , and so on. Upper triangular systems are treated in a similar manner. With this algorithm we only access columns of the triangular systems. Solving a transpose system with a matrix stored in CRS format essentially means that we access rows of and .

The algorithm now becomes

```for i = 1, n
x_tmp(i) = x(i)
end;
for i = 1, n
z(i) = x_tmp(i)
tmp = pivots(i) * z(i)
for j = diag_ptr(i)+1, row_ptr(i+1)-1
x_tmp(col_ind(j)) = x_tmp(col_ind(j)) - tmp * val(j)
end;
end;

for i = n, 1 (step -1)
y(i) = pivots(i) * z(i)
for j = row_ptr(i), diag_ptr(i)-1
z(col_ind(j)) = z(col_ind(j)) - val(j) * y(i)
end;
end;```
The extra temporary x_tmp is used only for clarity, and can be overlapped with z. Both x_tmp and z can be considered to be equivalent to y. Overall, a CRS-based preconditioner solve uses short vector lengths, indirect addressing, and has essentially the same memory traffic patterns as that of the matrix-vector product. 