ICCGLU A Fortran IV subroutine to solve large sparse general systems of linear equations. J.J. Dongarra, G.K. Leaf and M. Minkoff July, 1982 1. Purpose The Fortran program ICCGLU solves a linear system of equations A*x = b , where A is a large sparse real gereral matrix. The solution is found through an iterative procedure. The iterative method is based on the conjugate gradient algorithm applied to the implicitly formed normal equations. The method uses a preconditioner to accerlerate convergence. The preconditioner is based on an incomplete LU factorization. This routine can be used to solve symmetric systems as well as non-symmetric. 2. Usage A. Calling Sequence. The subroutine statement is: SUBROUTINE ICCGLU(A,N,IA,JA,AF,B,X,R,P,S,TOL,MAXITS,ITS,INFO) where On input: A double precision(nmat) contains the non-zero elements of the matrix, stored row-wise. nmat is the number of locations supplied to represent the array. (See section 3 for more details.) N integer is the order of the matrix. IA integer(N+1) contains pointers to the start of the rows in the arrays A and JA. JA integer(nmat) contains the column indices of the corresponding elements in the array A. nmat is the number of locations supplied to represent the array. B double precision (N) contains the right hand vector. X double precision (N) contains an estimate of the solution, the closer the estimate is to the true solution the faster the method will converge. TOL double precision is the accuracy desired in the solution. MAXITS integer is the maximum number of iterations to be taken by the routine. On output AF double precision (nmat) contains the incomplete LU factorization of the matrix contained in the array A. nmat is the number of locations supplied to represent the array. X double precision (N) contains the solution. R,P,S double precision (N) these are scratch work arrays. ITS integer contains the number of iterations needed to converge. INFO integer signals nature of termination. INFO = 0 method converged in ITS iterations INFO = -999 method did not converge in MAXITS iterations. B. Error conditions and returns. The conjugate Gradient iteration may require more steps to attain the desired accuracy then has been allowed by MAXITS. In this case the subroutine returns with INFO set to -999. C. Applicability and restrictions. This routine is intended for use with very large sparse non-symmetric matrices. There exist excellent routines in LINPACK for dense matrices of general form, banded form and tridiagonal form, ICCGLU should not be used in these instances. (See the LINPACK package which is on our computers). There are also special routines which deal with very large sparse symmetric positive definite matrices which should be used where applicable. (See the SPARSPAK package which is on our computers or its description in [2].) 3. Discussion of method and algorithm. This program grew out of a need to solve large scale linear systems where the coefficient matrix arose from a finite difference approximation (seven point) to an elliptic partial differential equation in three dimensions. The matrices associated with three dimensional problems tend to be large and sparse. For example, a finite difference mesh of 15x15x30 cells leads to a linear system of order 6750 with about 45,000 non-zero coefficients, for a density of about .001. The conjugate gradient algorithm has been known for some time. This algorithm has finite termination after n steps when the matrix is symmetric positive definite. The rate of convergence can often be accelerated by multiplying the problem by an approximation to the inverse of the matrix. The approximation used for sparse matrices often is derived from an incomplete factorization of the matrix. By incomplete factorization we mean that the sparsity structure of the original matrix is retained in its factors. This requires ignoring or skipping certain operations during the factorization. This incomplete factorization then serves to precondition the original matrix, in some sense, by clustering the eigenvalues. For a complete description see [1]. In order to that the non-symmetric matrix problem be handled by the conjugate gradient algorithm, the problem must be further transformed into a symmetric positive definite one. This is accomplished by implicitly solving the normal equations. The incomplete factorization and implicit solution of the normal equations are transparent to the user. The algorithm has the following form. ( inv = inverse, trans = transpose, invtrans = inverse transpose ) form incomplete factors L and U x(0) = initial solution estimate r(0) = b - A*x(0) R(0) = trans(A)*invtrans(L)*invtrans(U)*inv(U)*inv(L)*r(0) p(0) = R(0) i = 0 while norm(r)/(norm(a)*norm(x)) > tol do s = inv(U)*inv(L)*A*p(i) a(i) = trans(r(i))*r(i)/(trans(s)*s) x(i+1) = x(i) + a(i)*p(i) r(i+1) = r(i) - a(i)*trans(A)*invtrans(L)*invtrans(U)*s b(i) = trans(r(i+1))*r(i+1)/(trans(r(i))*r(i)) p(i+1) = r(i+1) + b(i)*p(i) i = i + 1 end The matrix must be stored in a sparse storage scheme to minimize the amount of memory required. The storage scheme used by ICCGLU is standard in dealing with sparse matrices [2] and is designed to facilitate row operations on the matrix; for this reason the storage is row oriented. The non-zero elements of a row are stored consecutively. To locate and identify these elements, we need to know where the row starts in storage, how long it is and in what columns the elements of the row lie. The scheme requires three arrays, A, IA, and JA. The array A contains non-zero elements of the matrix. The array JA contains the column indices which correspond to the entries of A. If for example, A(K) contains the value of the i,j element of the matrix then JA(K) contains j. The array IA contains pointers to the start of rows in A and JA. IA(1) = 1 and IA(I+1) is defined so that IA(I+1) - IA(I) is the number of non-zero elements in the i-th row of the matrix. The elements themselves are stored in locations IA(I) through IA(I+1)-1 of the array A. a11 a12 0 0 0 0 a21 a22 a23 a24 0 a26 0 a32 a33 0 a35 0 0 0 0 a44 0 0 a51 0 0 0 a55 a56 0 0 0 0 0 a66 A: a11 a12 a21 a22 a23 a24 a26 a32 a33 a35 a44 a51 a55 a56 a66 JA: 1 2 1 2 3 4 6 2 3 5 4 1 5 6 6 IA: 1 3 8 11 12 15 16 We have provided utility subroutines prepare the input arrays. Subroutine INITS initializes A, IA, and JA in the appropriate manner. Subroutines PUT and PUTSUM are provided to insert information about the elements of the matrix into the appropriate arrays. The call to PUT has the form: CALL PUT(T,A,N,IA,JA,I,J) where T is the numerical value to be inserted into the structure and I and J are its row and column in the matrix. Subroutine PUT will then insert T or overwrite an existing value if one has previously been defined at that location. Subroutine PUTSUM performs the same function as PUT except that if a value already exists PUTSUM will add T to it. The following program initializes the structure, sets up the matrix, and finds a solution for an example problem. double precision a(100),af(100),b(10),s(10),x(10),r(10),p(10) integer ia(11),ja(100) c c this is a test main program for the incomplete factorization conjugate c gradient algorithm which uses an incomplete lu decomposition. c lda = 10 n = 10 c c initialize the structure for the matrix. c call inits(a,n,ia,ja) c c setup the matrix and the right hand side. c do 10 i = 2,n call put(1.0d0,a,n,ia,ja,i,i-1) call put(1.0d0,a,n,ia,ja,i-1,i) call put(4.0d0,a,n,ia,ja,i,i) b(i) = 6.0d0 10 continue call put(1.0d0,a,n,ia,ja,1,n) call put(1.0d0,a,n,ia,ja,n,1) b(1) = 5.0d0 b(n) = 5.0d0 call put(4.0d0,a,n,ia,ja,1,1) c c estimate the solution c x(1) = 1.0d0 do 20 i = 2,n x(i) = 0.0d0 20 continue c c solve the system. c maxits = 20 tol = 1.0d-10 call iccglu(a,n,ia,ja,af,b,x,r,p,s,tol,maxits,its,info) write(6,30) its 30 format(' number of iterations taken',i5) write(6,40)(x(i),i=1,n) 40 format(' solution',5e16.8) stop end 4. References. 1) J.J. Dongarra, G.K. Leaf and M. Minkoff, A Preconditioned Conjugate Gradient Method for Solving a Class of Non-Symmetric Linear Systems, ANL-81-71. 2) Alan George and Joseph W. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall, New Jersey 1981. 5. Program Statistics. subroutine iccglu(a,n,ia,ja,af,b,x,r,p,s,tol,maxits,its,info) c integer n,ia(1),ja(1),maxits,its,info double precision a(1),af(1),x(1),r(1),p(1),s(1),b(1),tol c c this routine performs preconditioned conjugate gradient on a c sparse matrix. The preconditioner is an incomplete LU of c the matrix. c c on entry: c c a double precision() c contains the elements of the matrix. c c n integer c is the order of the matrix. c c ia integer(n+1) c contains pointers to the start of the rows in the arrays c a and ja. c c ja integer() c contains the column location of the corresponding elements c in the array a. c c b double precision (n) c contains the right hand side. c c x double precision (n) c contains an estimate of the solution, the closer the c estimate is to the true solution the faster the method c will converge. c c tol double precision c is the accuracy desired in the solution. c c maxits integer c is the maximum number of iterations to be taken c by the routine. c c on output c c af double precision () c contains the incomplete factorization of the matrix c contained in the array a. c c x double precision (n) c contains the solution. c c r,p,s double precision (n) c these are scratch work arrays. c c its integer c contains the number of iterations need to converge. c c info integer c signals if normal termination. c info = 0 method converged in its iterations c info = -999 method did not converge in maxits iterations. c c c the algorithm has the following form. c c form incomplete factors L and U c x(0) <- initial estimate c r(0) <- b - A*x(0) c R(0) <- trans(A)*invtrans(L)*invtrans(U)*inv(U)*inv(L)*r(0) c p(0) <- R(0) c i <- 0 c while r(i) > tol do c s <- inv(U)*inv(L)*A*p(i) c a(i) <- trans(r(i))*r(i)/(trans(s)*s) c x(i+1) <- x(i) + a(i)*p(i) c r(i+1) <- r(i) - a(i)*trans(A)*invtrans(L)*invtrans(U)*s c b(i) <- trans(r(i+1))*r(i+1)/(trans(r(i))*r(i)) c p(i+1) <- r(i+1) + b(i)*p(i) c i <- i + 1 c end c double precision ai,bi,rowold,rownew,xnrm,anrm double precision ddot anrm = dabs(a(idamax(ia(n+1)-1,a,1))) c c form incomplete factors L and U c call dcopy(ia(n+1)-1,a,1,af,1) call lu(af,n,ia,ja) c c r(0) <- b - A*x(0) c call res(a,n,ia,ja,x,b,r) c c R(0) <- trans(A)*invtrans(L)*invtrans(U)*inv(U)*inv(L)*r(0) c c inv(U)*inv(L)*r(0) c call dcopy(n,r,1,s,1) call ssol(af,n,ia,ja,s,1) call ssol(af,n,ia,ja,s,2) c c invtrans(L)*invtrans(U)*above c call ssol(af,n,ia,ja,s,-2) call ssol(af,n,ia,ja,s,-1) c c trans(A)*above c call mmult(a,n,ia,ja,r,s,-1) c c p(0) <- R(0) c call dcopy(n,r,1,p,1) rowold = ddot(n,r,1,r,1) i = 0 c c while r(i) > tol do c ai = 1.0d0 10 continue xnrm = dabs(x(idamax(n,x,1))) if( dsqrt(rowold)/(anrm*xnrm) .le. tol ) go to 12 c c s <- inv(U)*inv(L)*A*p(i) c call mmult(a,n,ia,ja,s,p,1) call ssol(af,n,ia,ja,s,1) call ssol(af,n,ia,ja,s,2) c c a(i) <- trans(r(i))*r(i)/(trans(s)*s) c ai = rowold/ddot(n,s,1,s,1) c c x(i+1) <- x(i) + a(i)*p(i) c call daxpy(n,ai,p,1,x,1) c c c r(i+1) <- r(i) - a(i)*trans(A)*invtrans(L)*invtrans(U)*s c call ssol(af,n,ia,ja,s,-2) call ssol(af,n,ia,ja,s,-1) call mmult(a,n,ia,ja,b,s,-1) call daxpy(n,-ai,b,1,r,1) c c b(i) <- trans(r(i+1))*r(i+1)/(trans(r(i))*r(i)) c rownew = ddot(n,r,1,r,1) bi = rownew/rowold c c p(i+1) <- r(i+1) + b(i)*p(i) c call daxpy2(n,bi,p,1,r,1) rowold = rownew c c i <- i + 1 c i = i + 1 if( i .gt. maxits ) go to 20 go to 10 12 continue 15 continue its = i return 20 continue info = -999 its = maxits return end