ICCGLU
 

   A Fortran IV subroutine to solve large sparse general
                 systems of linear equations.

 
           J.J. Dongarra, G.K. Leaf and M. Minkoff
 

                        July, 1982

 
   1. Purpose

       The Fortran program ICCGLU solves a linear system of equations

                   A*x = b ,

where A is a large sparse real gereral matrix. The solution is found through
an iterative procedure. The iterative method is based on the conjugate gradient 
algorithm applied to the implicitly formed normal equations. The method uses 
a preconditioner to accerlerate convergence. The preconditioner is based on an 
incomplete LU factorization.  This routine can be used to solve symmetric 
systems as well as non-symmetric.


   2. Usage

      A. Calling Sequence.
         
         The subroutine statement is:

      SUBROUTINE ICCGLU(A,N,IA,JA,AF,B,X,R,P,S,TOL,MAXITS,ITS,INFO)

      where
      
      On input:

         A  double precision(nmat)
            contains the non-zero elements of the matrix, stored row-wise.
            nmat is the number of locations supplied to represent
            the array. (See section 3 for more details.)
 
         N  integer
            is the order of the matrix.
 
         IA integer(N+1)
            contains pointers to the start of the rows in the arrays
            A and JA.
 
         JA integer(nmat)
            contains the column indices of the corresponding elements
            in the array A. 
            nmat is the number of locations supplied to represent
            the array.
 
         B double precision (N)
            contains the right hand vector.
             
         X double precision (N)
            contains an estimate of the solution, the closer the
            estimate is to the true solution the faster the method
            will converge.

         TOL double precision
            is the accuracy desired in the solution.
 
         MAXITS integer
            is the maximum number of iterations to be taken
            by the routine.
 

      On output
 
         AF double precision (nmat)
            contains the incomplete LU factorization of the matrix
            contained in the array A.
            nmat is the number of locations supplied to represent
            the array.
           
         X double precision (N)
           contains the solution.
 
         R,P,S double precision (N)
           these are scratch work arrays.
 
         ITS integer
           contains the number of iterations needed to converge.
 
         INFO integer
           signals nature of termination.
           INFO = 0    method converged in ITS iterations
           INFO = -999 method did not converge in MAXITS iterations.
 
         
       B. Error conditions and returns.

           The conjugate Gradient iteration may require more
           steps to attain the desired accuracy then has been
           allowed by MAXITS.  In this case the subroutine returns 
           with INFO set to -999.

       C. Applicability and restrictions.

           This routine is intended for use with very large sparse
           non-symmetric matrices. There exist excellent routines in
           LINPACK for dense matrices of general form, banded form and
           tridiagonal form, ICCGLU should not be used in these
           instances. (See the LINPACK package which is on our computers).
           There are also special routines which deal with very large 
           sparse symmetric positive definite matrices which should 
           be used where applicable. (See the SPARSPAK package which is 
           on our computers or its description in [2].)


   3. Discussion of method and algorithm.

     This program grew out of a need to solve large scale linear systems
where the coefficient matrix arose from a finite difference approximation
(seven point) to an elliptic partial differential equation in three dimensions.
The matrices associated with three dimensional problems tend to be large
and sparse. For example, a finite difference mesh of 15x15x30 cells
leads to a linear system of order 6750 with about 45,000 non-zero
coefficients, for a density of about .001.

     The conjugate gradient algorithm has been known for
some time. This algorithm has finite termination after n steps when
the matrix is symmetric positive definite. The rate of convergence
can often be accelerated by multiplying the problem by an approximation
to the inverse of the matrix. The approximation used for sparse matrices
often is derived from an incomplete factorization of the matrix. By
incomplete factorization we mean that the sparsity structure of the original
matrix is retained in its factors. This requires ignoring or skipping 
certain operations during the factorization. This incomplete factorization
then serves to precondition the original matrix, in some sense, by
clustering the eigenvalues. For a complete description see [1]. In order to 
that the non-symmetric matrix problem be handled by the conjugate gradient
algorithm, the problem must be further transformed into a symmetric positive 
definite one. This is accomplished by implicitly solving the normal equations.
The incomplete factorization and implicit solution of the normal equations are 
transparent to the user.

    The algorithm has the following form.
    ( inv = inverse, trans = transpose, invtrans = inverse transpose )
 
         form incomplete factors L and U
         x(0) = initial solution estimate
         r(0) = b - A*x(0)
         R(0) = trans(A)*invtrans(L)*invtrans(U)*inv(U)*inv(L)*r(0)
         p(0) = R(0)
         i    = 0
         while norm(r)/(norm(a)*norm(x)) > tol do
            s      = inv(U)*inv(L)*A*p(i)
            a(i)   = trans(r(i))*r(i)/(trans(s)*s)
            x(i+1) = x(i) + a(i)*p(i)
            r(i+1) = r(i) - a(i)*trans(A)*invtrans(L)*invtrans(U)*s
            b(i)   = trans(r(i+1))*r(i+1)/(trans(r(i))*r(i))
            p(i+1) = r(i+1) + b(i)*p(i)
            i      = i + 1
         end

 
     The matrix must be stored in a sparse storage scheme to minimize the
amount of memory required. The storage scheme used by ICCGLU is standard in 
dealing with sparse matrices [2] and is designed to facilitate row operations 
on the matrix; for this reason the storage is row oriented. The non-zero
elements of a row are stored consecutively. To locate and identify these 
elements, we need to know where the row starts in storage, how long it is 
and in what columns the elements of the row lie. The scheme requires three 
arrays, A, IA, and JA. The array A contains non-zero elements of the matrix. 
The array JA contains the column indices which correspond to the entries of A. 
If for example, A(K) contains the value of the i,j element of the matrix 
then JA(K) contains j. The array IA contains pointers to the start of rows
in A and JA.  IA(1) = 1 and IA(I+1) is defined so that IA(I+1) - IA(I) is the 
number of non-zero elements in the i-th row of the matrix. The elements 
themselves are stored in locations IA(I) through IA(I+1)-1 of the array A.


                 a11  a12  0     0     0     0
                 a21  a22  a23   a24   0     a26
                 0    a32  a33   0     a35   0
                 0    0    0     a44   0     0
                 a51  0    0     0     a55   a56
                 0    0    0     0     0     a66


A:   a11 a12 a21 a22 a23 a24 a26 a32 a33 a35 a44 a51 a55 a56 a66

JA:   1   2   1   2   3   4   6   2   3   5   4   1   5   6   6

IA:   1   3   8   11  12  15  16

We have provided utility subroutines prepare the input arrays.
Subroutine INITS initializes A, IA, and JA in the appropriate manner.  
Subroutines PUT and PUTSUM are provided to insert information about the 
elements of the matrix into the appropriate arrays.  The call to PUT 
has the form:

       CALL PUT(T,A,N,IA,JA,I,J)

where T is the numerical value to be inserted into the structure
and I and J are its row and column in the matrix. Subroutine PUT
will then insert T or overwrite an existing value if one has previously 
been defined at that location. Subroutine PUTSUM performs the same function 
as PUT except that if a value already exists PUTSUM will add T to it.
     The following program initializes the structure, sets up the matrix,
and finds a solution for an example problem.


      double precision a(100),af(100),b(10),s(10),x(10),r(10),p(10)
      integer ia(11),ja(100)
c
c     this is a test main program for the incomplete factorization conjugate
c     gradient algorithm which uses an incomplete lu decomposition.
c
      lda = 10
      n = 10
c
c     initialize the structure for the matrix.
c
      call inits(a,n,ia,ja)
c
c     setup the matrix and the right hand side.
c
      do 10 i = 2,n
         call put(1.0d0,a,n,ia,ja,i,i-1)
         call put(1.0d0,a,n,ia,ja,i-1,i)
         call put(4.0d0,a,n,ia,ja,i,i)
         b(i) = 6.0d0
   10 continue
      call put(1.0d0,a,n,ia,ja,1,n)
      call put(1.0d0,a,n,ia,ja,n,1)
      b(1) = 5.0d0
      b(n) = 5.0d0
      call put(4.0d0,a,n,ia,ja,1,1)
c
c     estimate the solution
c
      x(1) = 1.0d0
      do 20 i = 2,n
         x(i) = 0.0d0
   20 continue
c  
c     solve the system.
c
      maxits = 20
      tol = 1.0d-10
      call iccglu(a,n,ia,ja,af,b,x,r,p,s,tol,maxits,its,info)
      write(6,30) its
   30 format(' number of iterations taken',i5)
      write(6,40)(x(i),i=1,n)
   40 format(' solution',5e16.8)
      stop
      end


    4. References.

       1) J.J. Dongarra, G.K. Leaf and M. Minkoff, A Preconditioned
          Conjugate Gradient Method for Solving a Class of Non-Symmetric
          Linear Systems, ANL-81-71.

       2) Alan George and Joseph W. Liu, Computer Solution of Large 
          Sparse Positive Definite Systems, Prentice-Hall, 
          New Jersey 1981.
       
    5. Program Statistics.


      subroutine iccglu(a,n,ia,ja,af,b,x,r,p,s,tol,maxits,its,info)
c
      integer n,ia(1),ja(1),maxits,its,info
      double precision a(1),af(1),x(1),r(1),p(1),s(1),b(1),tol
c
c     this routine performs preconditioned conjugate gradient on a
c     sparse matrix. The preconditioner is an incomplete LU of
c     the matrix.
c
c     on entry:
c
c        a  double precision()
c           contains the elements of the matrix.
c
c        n  integer
c           is the order of the matrix.
c
c        ia integer(n+1)
c           contains pointers to the start of the rows in the arrays
c           a and ja.
c
c        ja integer()
c           contains the column location of the corresponding elements
c           in the array a. 
c
c        b double precision (n)
c           contains the right hand side.
c            
c        x double precision (n)
c           contains an estimate of the solution, the closer the
c           estimate is to the true solution the faster the method
c           will converge.
c
c        tol double precision
c           is the accuracy desired in the solution.
c
c        maxits integer
c           is the maximum number of iterations to be taken
c           by the routine.
c
c     on output
c
c        af double precision ()
c           contains the incomplete factorization of the matrix
c           contained in the array a.
c          
c        x double precision (n)
c          contains the solution.
c
c        r,p,s double precision (n)
c          these are scratch work arrays.
c
c        its integer
c          contains the number of iterations need to converge.
c
c        info integer
c          signals if normal termination.
c          info = 0 method converged in its iterations
c          info = -999 method did not converge in maxits iterations.
c
c        
c     the algorithm has the following form.
c
c        form incomplete factors L and U
c        x(0) <- initial estimate
c        r(0) <- b - A*x(0)
c        R(0) <- trans(A)*invtrans(L)*invtrans(U)*inv(U)*inv(L)*r(0)
c        p(0) <- R(0)
c        i    <- 0
c        while r(i) > tol do
c           s      <- inv(U)*inv(L)*A*p(i)
c           a(i)   <- trans(r(i))*r(i)/(trans(s)*s)
c           x(i+1) <- x(i) + a(i)*p(i)
c           r(i+1) <- r(i) - a(i)*trans(A)*invtrans(L)*invtrans(U)*s
c           b(i)   <- trans(r(i+1))*r(i+1)/(trans(r(i))*r(i))
c           p(i+1) <- r(i+1) + b(i)*p(i)
c           i      <- i + 1
c        end
c
      double precision ai,bi,rowold,rownew,xnrm,anrm
      double precision ddot
      anrm = dabs(a(idamax(ia(n+1)-1,a,1)))
c
c     form incomplete factors L and U
c
      call dcopy(ia(n+1)-1,a,1,af,1)
      call lu(af,n,ia,ja)
c
c     r(0) <- b - A*x(0)
c
      call res(a,n,ia,ja,x,b,r)
c
c     R(0) <- trans(A)*invtrans(L)*invtrans(U)*inv(U)*inv(L)*r(0)
c
c     inv(U)*inv(L)*r(0)
c
      call dcopy(n,r,1,s,1)
      call ssol(af,n,ia,ja,s,1)
      call ssol(af,n,ia,ja,s,2)
c
c     invtrans(L)*invtrans(U)*above
c
      call ssol(af,n,ia,ja,s,-2)
      call ssol(af,n,ia,ja,s,-1)
c
c     trans(A)*above
c
      call mmult(a,n,ia,ja,r,s,-1)
c
c     p(0) <- R(0)
c
      call dcopy(n,r,1,p,1)
      rowold = ddot(n,r,1,r,1)
      i = 0
c
c     while r(i) > tol do
c
      ai = 1.0d0
   10 continue
      xnrm = dabs(x(idamax(n,x,1)))
      if( dsqrt(rowold)/(anrm*xnrm) .le. tol ) go to 12
c
c        s      <- inv(U)*inv(L)*A*p(i)
c
         call mmult(a,n,ia,ja,s,p,1)
         call ssol(af,n,ia,ja,s,1)
         call ssol(af,n,ia,ja,s,2)
c
c        a(i)   <- trans(r(i))*r(i)/(trans(s)*s)
c
         ai = rowold/ddot(n,s,1,s,1)
c
c        x(i+1) <- x(i) + a(i)*p(i)
c
         call daxpy(n,ai,p,1,x,1)
c
c
c        r(i+1) <- r(i) - a(i)*trans(A)*invtrans(L)*invtrans(U)*s
c
         call ssol(af,n,ia,ja,s,-2)
         call ssol(af,n,ia,ja,s,-1)
         call mmult(a,n,ia,ja,b,s,-1)
         call daxpy(n,-ai,b,1,r,1)
c
c        b(i)   <- trans(r(i+1))*r(i+1)/(trans(r(i))*r(i))
c
         rownew = ddot(n,r,1,r,1)
         bi = rownew/rowold
c
c        p(i+1) <- r(i+1) + b(i)*p(i)
c
         call daxpy2(n,bi,p,1,r,1)
         rowold = rownew
c
c        i      <- i + 1
c
         i = i + 1
         if( i .gt. maxits ) go to 20
         go to 10
   12 continue
   15 continue
      its = i
      return
   20 continue
      info = -999
      its = maxits
      return
      end