#!/bin/sh PATH=/bin:/usr/bin # To unbundle the double precision version, sh this file as is. # for the single precision version, make the changes as directed at the lines # marked with *SINGLE*, then sh this file cat > read.me << 'EOF' A Level 3 BLAS Based Solver for Almost Block Diagonal Systems Cliff Cyphers and Marcin Paprzycki Department of Mathematics and Computer Science University of Texas of the Permian Basin Odessa, TX 79762 Email: cliffc@tenet.edu m_paprzycki@utpb.pb.utexas.edu SMU Software Report 92-3 1. Introduction The software presented here is a continuation of the work done by Marcin Paprzycki and Ian Gladwell at Southern Methodist University [8,9]. This software provides an efficient, direct method of solving almost block diagonal(ABD) systems. The decomposi- tion is based on block Gaussian elimination with partial pivoting (alternate row and column interchanges [10]) and generates no fill-in. The algorithm utilizes level 3 BLAS [4] primitives where possible and, therefore, is a natural extension to the one proposed by Brankin and Gladwell [2], which uses level 2 BLAS. For a precise description of the algorithm see [8,9]. 2. Block Structure of the ABD System The structure of the almost block diagonal that can be solved by our program is characterized as follows (see also the description of variable IBLKST below): - the overlap between blocks is non-negative - the total amount of overlap in a given block must not be greater than the number of columns in the block - the total number of rows in the system must be equal to the total number of columns in the system - the index of the first column of overlap between i-th and (i+1)th blocks must be less than or equal to the index of the last row of the i-th block - the index of the last column of overlap between i-th and (i+1)th blocks must be greater than or equal to the index of the last row of the i-th block. Figure 1 illustrates the structure of an N block ABD system. |-------| | 1 | |--|----|---| | 2 | | | |----|---| . . . |-------| | | | N-1 | |---|---|-----| | N | |_________| Figure 1 3. Calling the Program The software contains three major subroutines to per- form the LU decomposition (_GEABD), solution for multiple right hand sides (_SOLN) and solution for multiple right hand sides for the transposed system (_SOLT). The arguments to the call to _GEABD ate the same as the call to the NAG routine F01LHF with the exception of the TOLerance parameter used by F01LHF but not present here. The arguments in the call to _SOLN and _SOLT are similar to those in the call to the NAG routine F04LHF. Calls to them have the following forms: CALL _GEABD(N,NBLOKS,IBLKST,A,LENA,IPIV,INDEX,INFO) CALL _SOLN(N,NBLOKS,IBLKST,A,LENA,IPIV,B,LDB,K) CALL _SOLT(N,NBLOKS,IBLKST,A,LENA,IPIV,B,LDB,K) where: N - INTEGER, number of rows in the ABD system, N > 0 NBLOKS - INTEGER, total number of blocks, 0 < NBLOKS <= N IBLKST - INTEGER, matrix of size (3, NBLOKS) which describes the structure of the system IBLKST(1,j) - number of rows in the j-th block j = 1,2, . . . NBLOKS IBLKST(2,j) - number of columns in the j-th block j = 1,2, . . . NBLOKS IBLKST(3,j) - overlap between the jth and (j+1)th blocks j = 1,2, . . . NBLOKS-1 IBLKST(3,NBLOKS) does not need to be set. The following conditions must be satisfied: IBLKST(1,k),IBLKST(2,k) > 0 for k = 1,2,...,NBLOKS, IBLKST(3,k) >= 0 for k = 1,2,...,NBLOKS-1, IBLKST(3,k-1)+IBLKST(3,k) <= IBLKST(2,k) for k = 2,3,...,NBLOKS-1, IBLKST(2,1) + (sum from k=2 to j) [IBLKST(2,k)-IBLKST(3,k-1)] >= (sum from k=1 to j) IBLKST(1,k) for j = 2,3,...,NBLOKS-1, (sum from k=1 to j) [IBLKST(2,k)-IBLKST(3,k)] <= (sum from k=1 to j) [IBLKST(1,k)] for j = 1,2,...,NBLOKS-1 (sum from k=1 to NBLOKS) IBLKST(1,k) = N IBLKST(2,1) + (sum from k=2 to NBLOKS) [IBLKST(2,k)-IBLKST(3,k-1)]= N A - REAL, vector in which the ABD system is stored block by block in the column major order within each block. A must remain unchanged between the call to _GEABD and calls to _SOLN or _SOLT. LENA - INTEGER, length of A in the calling program where LENA >= (sum from j=1 to NBLOKS) (IBLKST(1,j) * IBLKST(2,j)) IPIV - INTEGER, vector of dimension N, on exit contains information about row and column interchanges IPIV must remain unchanged between the call to _GEABD and calls to _SOLN or _SOLT. INFO - INTEGER, error indicator, should be checked after each call to _GEABD 1 - zero pivot detected 0 - normal end -1 - illegal block structure BLKCHK will print an error message indicating the test that failed, and execution will return to the calling program INDEX - INTEGER, zero for successful completion or the location k where a zero pivot has been detected at A(k,k). K - INTEGER, number of right hand sides stored in matrix B B - REAL, matrix containing the right hand sides stored as columns LDB - INTEGER, first dimension of B as declared in the calling program REAL may be either single or double precision as selected during installation. 4. Support Routines Routine _GEABD calls the following support subroutines: - _GETRX - performs a blocked (SAXPY version[5]) LU factorization of an MxN dense block of A using partial pivoting with column interchanges; _GETRX is a column interchange variation of the routine _GETRF described in [1]. - _GETRY - performs a blocked (SAXPY version) LU factorization of an MxN dense block of A using partial pivoting with row interchanges. This is an adaptation of routine _GETRF described in [1]. - BLKCHK - called by _GEABD, _SOLN, and _SOLT to check the structure of the ABD system; BLKCHK was adapted with permission from the NAG routine F01LHZ; users with access to the NAG library should replace calls to BLKCHK with calls to F01LHZ. Since the arguments to these two routines are the same, the call to F01LHZ can be made simply by changing BLKCHK to F01LHZ. Both _GETRX and _GETRY utilize level 3 BLAS routines _GEMM and _TRSM. Some systems may provide an implementation of _GETRF which may be called in place of _GETRY. _GETRX and _GETRY perform calls to the following routines: - _GTX - called by _GETRX; performs unblocked (SAXPY version) LU factorization inside the blocked elimination using partial pivoting with column interchanges - _GTY - called by _GETRY; performs unblocked (SAXPY version) LU factorization inside the blocked elimination using partial pivoting with row interchanges Both _GTX and _GTY utilize calls to level 1 BLAS routines I_AMAX, _SWAP and _SCAL and to the level 2 BLAS routine _GER. Figure 2 shows the call dependencies of all of the programs involved. - ENVIR - called by _GETRY and _GETRX; specifies the optimal blocksize to use in the elimination; it is located in the file ABD2.F and must be set to the optimum blocksize for a given system before compilation; the optimum size will depend on many factors such as the system load, number of processors, the size of the problem, etc; see [1] for details. For example, on a Cray Y-MP, a blocksize of 128 or 256 makes a good choice [9,10]. Figure 2 _GEABD-------\ |--BLKCHK |--_GETRY----\ | |--ENVIR | |--_SWAP | |--_TRSM | |--_GEMM | |--_GTY----\ |--_SWAP |--I_AMAX |--_TRSM |--_SWAP |--_GEMM |--_SCAL | |--_GER |--_GETRX----\ |--ENVIR |--_SWAP |--_TRSM |--_GEMM |--_GTX----\ |--I_AMAX |--_SWAP |--_SCAL |--_GER _SOL_------\ |--_SWAP |--_TRSM |--_GEMM 5. Installation It is assumed that the user is familiar with the edi- tors, utilities, compiler, and linker on the target machine. The first step is to upload the files onto the target machine. Single or double precision needs to be specified in all routines. This can be achieved by a global search and replace. All occurrences of the underscore "_" charac- ter need to be replaced with an "S" for single precision or a "D" for double precision. This step modifies the names of called routines so that the appropriate single or double precision routines will be called. No other changes are needed for single precision. For double precision string "CDBL" needs to be replaced with 4 (four) spaces. This change will uncomment an implicit double precision declara- tion in all of the programs. If the target machine is a UNIX machine then the easiest way to make these changes is: For single precision: sed -e "s/_/S/g" filename.f > newfile.f For double precision: sed -e "s/_/D/g" -e "s/CDBL/ /g" filename.f > newfile.f In both cases filename.f is edited leaving the edited version in newfile.f. finally, the blocksize needs to be specified in the routine ENVIR. To set the blocksize: sed -e "s/++ENVIR++/nnn/g" file.f > newfile.f Where nnn is the desired blocksize, and file.f is the file that was created from ABD2.F by the sed command above. For a non-Unix machine, edit the file ABD2.F and replace ++ENVIR++ with the blocksize. A minimum installation requires that the above pro- cedure be applied to the files ABD2.F and SGET.F. To test the software ABDML.F and TEST1.F will need to be modified. And to run the example BVP program include the file EXAMPLE1.F in the installation. 5.1. Testing After all the files are appropriately modified, the software should be tested. The program test1.f can be used to test the installation. Test1.f makes calls to the random number generator ranf(). If the random number generator on the target machine is different then calls to ranf() should be replaced with calls to the available function. Test1.f creates several ABD systems of varying size blocks and overlaps. The matrices A and X are filled with random numbers between zero and one. Where A is an NxN ABD system and X is an NxM matrix. The right hand side B is generated by multiplying A by X by a call to _ABDML[2]. The LU factorization of A is performed by a call to _GEABD. The solution to the system A~X = B is then calculated by a call to _SOLN. Next the accuracy of the solution is checked by calculating ||X - ~X||/sqrt(N), the two-norm of the differ- ence between the original X and the calculated X divided by the square root of the number of rows. This procedure is repeated for the transpose of A by calling _SOLT after A has been decomposed. For each system tested test1.f will output the calculated error, the number of rows, number of blocks, length of A, number of right hand sides, the size and over- lap of each block, and whether the system is transpose or not. This software has been installed and tested using test1.f on the following machines: Computer Precision Used Convex 220 Double IBM RISC 6000 Double Cray YMP Single DEC Station 5000 Double The following describes the systems tested by test1.f along with the error calculated on the indicated hardware plat- form. The differences in the error for the machines using double precision is caused by random number generator producing different values on each machine. No. Rows: 1851 No. Blocks: 4 Computer Error Len of A: 1016485 No. RHS: 10 Convex 220 6E-12 Rows Cols Overlap IBM RISC 6000 7E-11 684 X 718 68 Cray YMP 1E-10 437 X 513 84 DEC Station 5000 1E-11 322 X 388 48 408 X 432 No. Rows: 3000 No. Blocks:10 Computer Error Len of A: 900000 No. RHS: 10 Convex 220 2E-12 Rows Cols Overlap IBM RISC 6000 7E-11 300 X 300 0 Cray YMP 1E-10 DEC Station 5000 2E-11 No. Rows: 1900 No. Blocks:20 Computer Error Len of A: 370000 No. RHS: 37 Convex 220 2E-12 Rows Cols Overlap IBM RISC 6000 4E-10 50 X 100 100 First and last blocks Cray YMP 1E-10 100 X 200 100 Eighteen blocks DEC Station 5000 2E-12 No. Rows: 552 No. Blocks: 6 Computer Error Len of A: 86626 No. RHS: 13 Convex 220 3E-12 Rows Cols Overlap IBM RISC 6000 8E-11 10 X 20 10 Cray YMP 8E-10 200 X 200 0 DEC Station 5000 6E-12 100 X 100 37 50 X 147 108 66 X 200 86 126 X 126 No. Rows: 200 No. Blocks:20 Computer Error Len of A: 2190 No. RHS: 37 Convex 220 3E-12 Rows Cols Overlap Block IBM RISC 6000 5E-10 10 X 10 1 1 Cray YMP 1E-09 10 X 11 1 2-19 DEC Station 5000 2E-09 10 X 11 20 No. Rows: 12 No. Blocks: 3 Computer Error Len of A: 113 No. RHS: 9 Convex 220 3E-15 Rows Cols Overlap IBM RISC 6000 2E-13 1 X 2 1 Cray YMP 1E-11 1 X 11 10 DEC Station 5000 1E-15 10 X 10 The accuracy achieved by _GEABD and _SOL_ on the Cray is on the same order as the accuracy achieved by calls to the NAG routines F01LHF and F04LHF. The single precision arithmetic on the Cray provides slightly less precision than the double precision obtained on the other machines. 6. Examples Example1.f uses a finite difference scheme to discre- tize a linear two-point boundary value problem for ordinary differential equations[6]. The resulting ABD linear system is decomposed by a call to _GEABD and solved by a call to _SOLN to give an approximate solution to the ODE's. Table 1 gives the explicit solution y, and the approximate solution ~y for each equation at each value of x. The errors in Table 1 are what can be expected when solving a BVP on such a crude mesh as that used here. These results were obtained on a Convex 220 using double precision. Replacing the calls to _GEABD and _SOLN, by calls to the NAG routines F01LHF and F04LHF respectively produced identical results for this example for a variety of different number of mesh points. To try this example for a different number of mesh points, modify the line NP = in example1.f. The ODE's and there explicit solutions are: 1. ey'' - y' = 0 ; y = A + B exp(x/e) , e = 0.1 2. ey'' + xy' = 0 ; y = A + B erf(x/sqrt(2e)) , e = 0.1 3. ey'' - y = 0 ; y = A exp(-x/sqrt(e)) + B exp(x/sqrt(e)) , e = 0.1 Table 1 x Equation y ~y ------------ -------- ---------------- ---------------- Left Boundary condition -1.0000 1 2.00000000000000 2.00000000000000 2 6.00000000000000 6.00000000000000 3 2.00000000000000 2.00000000000000 -0.7500 1 2.00000004609769 1.99999953538875 2 2.75289346838977 2.62370502510638 3 2.05059530023110 2.02919062500771 -0.5000 1 2.00000060768247 2.00000371689032 2 1.31784565069080 1.19089521967209 3 2.11719506215659 2.20433436925429 -0.2500 1 2.00000744918544 1.99996608337369 2 0.750253737967740 0.640230005823079 3 2.24303669653980 2.69792127795623 0.0 1 2.00009079575093 2.00030478504576 2 0.676507242490030 0.563809232356443 3 3.50000028435189 3.50000000000000 0.2500 1 2.00110616474389 1.99725646979526 2 1.04806342866501 0.905024919926263 3 4.75696330346020 4.30207872204377 0.5000 1 2.01347589090767 2.02469130886667 2 2.10949464649351 1.91662942523160 3 4.88280493784341 4.79566563074571 0.7500 1 2.16416999957968 1.77777774087150 2 4.55947511260207 4.34795940563524 3 4.94940469976890 4.97080937499229 Right Boundary condition 1.0000 1 4.00000000000000 4.00000000000000 2 10.0106747335822 10.0000000000000 3 5.00000000000000 5.00000000000000 ------------------------------------------------------------------------ 7. FILES: Directory File Name Contents --------- ------------ ------------------------ READ.ME This file SOURCE ABD2.F _GEABD _SOLN _SOLT SUBR SGET.F _GETRX _GETRY _GTX _GTY SUBR BLKCHK.F BLKCHK EROUT SUBR ABDML.F _ABDML[2] EXAMPLES TEST1.F TEST EXAMPLE1.F Acknowledgement We would like to express our deepest gratitude to Ian Gladwell for his continued guidance and support, to NAG for allowing us to adapt their routine F01LHZ, and to Richard Brankin for allowing us to use his routine _ABDML, described in [2]. References 1. Bischof, C., Demmel, J., Dongarra, J. J., Du Croz, J., Greenbaum, A., Hammarling, S., and Sorensen, D. "LAPACK Working Note #5 Provisional Contents," Technical Report ANL-88-38, Argonne National Laboratory, 1988. 2. Brankin, R.W., Gladwell, I., "Codes for Almost Block Diagonal Systems," Computers and Mathematics with Applications, 19, 1990, 1-7. 3. Dongarra, J. J., Du Croz, J., Hammarling, S., and Hanson, R.J., "An Extended Set of FORTRAN Basic Linear Algebra Subprograms," ACM Transactions on Mathematical Software, 14(1), 1988, 1-17. 4. Dongarra, J. J., Du Croz, J., and Hammarling, S., "A Set of Level 3 Basic Linear Algebra Subprograms," Technical Report ANL-MCS-TM57, Argonne National Laboratory, 1988. 5. Dongarra, J. J., Gustavson, F. G., Karp, A., "Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine," SIAM Review, 26, No. 1, January 1984, pp. 91-112. 6. Hemker, P. W., "A Numerical Study of Stiff Two-point Boundary Problems," Amsterdam: Mathematisch Centrum, 1977. 7. Paprzycki, M., "Parallelization of Boundary Value Problem Software," Ph.D. Dissertation, SMU, 1990 8. Paprzycki, M., Gladwell, I., "Using Level 3 BLAS to Solve Almost Block Diagonal Systems," "Proceedings of The Fifth SIAM Conference on Parallel Processing for Scientific Computing," Ed. Dongarra, J., Kennedy, K., Messina, P., Sorensen, D.C., Voigt, R.G., 52-62. 9. Paprzycki, M., Gladwell, I., "Parallel Solution of Almost Block Diagonal Systems Using Level 3 BLAS," to appear in: "Journal for Computational and Applied Mathematics." 10. Varah, J. M., "Alternate Row and Column Elimination for Solving Certain Linear Systems", SIAM J. Numer. Anal., 13, 1976, 71-75. EOF #*SINGLE* uncomment the line below, and delete the second line # for single precision #sed -e "s/_/S/g" > abd2.f <<'EOF' sed -e "s/CDBL/ /g" -e "s/_/D/g" > abd2.f <<'EOF' C====================================================================== C Cliff Cyphers and Marcin Paprzycki C Department of Mathematics and Computer Science C The University of Texas of the Permian Basin C Odessa, Texas 79762 C cliffc@tenet.edu C m_paprzycki@utpb.pb.utexas.edu C C====================================================================== C C Return the optimum block size for this environment. C See READ.ME for a discusion of this. SUBROUTINE ENVIR(I,N) CHARACTER*3 I C CHANGE ++ENVIR TO THE APPROPRIATE VALUE N = ++ENVIR RETURN END C C====================================================================== C SUBROUTINE _GEABD(N,NBLOKS,IBLKST,A,LENA,IPIV,INDEX,INFO) CDBL IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION A(LENA), IPIV(N), IBLKST(3,NBLOKS) CHARACTER*6 SRNAME PARAMETER (ONE = 1.0E0, IONE = 1) LOGICAL REPORT C C COMPUTES LU FACTORIZATION OF ALMOST BLOCK DIAGONAL SYSTEM ARISING C FROM DISCRETIZATION OF A BVP USING FINNITE DIFFERENCES (BOX SCHEME), C PARTIAL PIVOTING, ALTERNATE ROW AND COLUMN INTERCHANGES, BLOCKED C VERSION C C THIS VERSION WILL WORK WITH VARYING SIZES OF BLOCKS. C C THIS TAKES THE SAME PARAMETERS AS THE NAG ROUTINE F01LHF WITH THE C EXCEPTION OF THE TOLERANCE PERAMETER IN F01LHF. HERE ONLY A ZERO C ON THE DIAGONAL WILL BE DETECTED AS A SINGULARITY C C N - NUMBER OF ROWS IN A C C NBLOKS - TOTAL NUMBER OF BLOCKS, UNCHANGED C C IBLKST - 3 X NBLOKS TABLE WHICH DESCRIBES THE BLOCK STRUCTURE OF A C IBLKST(1,K) - NUMBER OF ROWS IN KTH BLOCK C IBLKST(2,K) - NUMBER OF COLUMNS IN KTH BLOCK C IBLKST(3,K) - OVERLAP BETWEEN THE KTH BLOCK AND (K+1)TH BLOCK C C A - SYSTEM TO BE DECOMPOSED, STORED BLOCK BY BLOCK, COLUMN BY COLUMN, C OVERWRITTEN BY THE LU DECOMPOSITION, C C LENA - LENGTH OF A IN CALLING PROGRAM C C IPIV - VECTOR - ON EXIT CONTAINS PIVOT INDICES C C C INFO - ERROR INDICATOR, 0 - NORMAL END C -1 - ILLEGAL BLOCK STRUCTUE C 1 - SIGULARITY DETECTED C C INDEX - LOCATION OF THE ZERO PIVOT HAS BEEN DETECTED OR C ZERO FOR SUCCESSFUL COMPLETION C C IA1PTR - POINTER TO BLOCK decomposed in phase I C IA2PTR - pointer to block calculated in phase I C IA3PTR - pointer to block updated in phase I C IA3PTR - also points to the block decomposed in phase II C IA5PTR - pointer to block calculated in phase II C IA6PRT - pointer to block updated in phase II C IPP - POINTS TO THE LOCATION IN PIVOT VECTOR FOR THE CURRENT BLOCK C EXECUTABLE STATEMENTS SRNAME='_GEABD' C C C TEST FOR A BAD BLOCK STRUCTURE IERR = 0 C THE FOLLOWING CALL MAY BE REPLACED BY A CALL TO THE NAG ROUTINE C F01LHZ WITH THE SAME PARAMETERS CALL BLKCHK(IBLKST,NBLOKS,N,LENARQ,SRNAME,REPORT,IERR) IF (IERR.NE.0) THEN info = -1 RETURN else if (lenarq.gt.lena) then info = -1 return ENDIF C C INITIALIZE POINTERS C IPP = 1 IA1PTR = 1 NA5 = 0 IA5PTR = 1 C C MAIN LOOP AROUND THE FIRST NBLOKS-1 BLOCKS C DO 100 J = 1, NBLOKS-1 C C SET UP POINTERS FOR SUBLOCKS IN THIS BLOCK C MA1 = IBLKST(1,J) NA1 = IBLKST(2,J) - IBLKST(3,J) - NA5 MA2 = NA1 NA2 = IBLKST(3,J) IA2PTR = IA1PTR + MA1 * NA1 MA3 = IBLKST(1,J) - NA1 NA3 = IBLKST(3,J) IA3PTR = IA2PTR + MA2 MA4 = MA3 NA4 = NA1 IA4PTR = IA1PTR + NA1 C C IA5PTR AND IA6PTR point to BLOCKS in the J+1 BLOCK of A C CACLULATED AND UPDATED during phase II MA5 = IBLKST(1,J+1) NA5 = MA3 IA5PTR = IA5PTR + IBLKST(1,J) * IBLKST(2,J) MA6 = MA5 NA6 = IBLKST(3,J) - NA5 IA6PTR = IA5PTR + MA5 * NA5 C MA7 = MA3 NA7 = IBLKST(3,J) - MA3 IA7PTR = IA2PTR + MA3 * (MA3 + MA2) + MA2 IAL1PT = IA1PTR C C PHASE I - DECOMPOSE MA1 X NA1 SUB-BLOCK of the jth BLOCK of A C with ROW INTERCHANGES C CALL _GETRY(MA1,NA1,A(IA1PTR),MA1,IPIV(IPP),INFO) C IF(INFO .NE. 0) THEN INDEX = INFO + IPP - 1 INFO = 1 RETURN ENDIF C C APPLY INTERCHANGES TO BLOCKS TO THE "LEFT" AND "RIGHT" C CAN BE PERFORMED IN PARALLEL C Check to see if this is the first block, if it is then C there is no BLOCK to the left. IF (J.GT.1) THEN DO 40 I = IPP, IPP + MA2 - 1 IPIV(I) = IPIV(I) + IPP - 1 IP = IPIV(I) IF(IP .NE. I) THEN CALL _SWAP(NA2,A(IA2PTR + I - IPP),MA1, & A(IA2PTR + IP - IPP),MA1) CALL _SWAP(I5PRVN,A(I5PVPT + I - IPP), I5PRVM, & A(I5PVPT + IP - IPP),I5PRVM) ENDIF 40 CONTINUE ELSE DO 41 I = IPP, IPP + MA2 - 1 IPIV(I) = IPIV(I) + IPP - 1 IP = IPIV(I) IF(IP .NE. I) THEN CALL _SWAP(NA2,A(IA2PTR + I - IPP),MA1, & A(IA2PTR + IP - IPP),MA1) ENDIF 41 CONTINUE ENDIF C C CALCULATE A BLOCK OF L -- This is the MA2 X NA2 BLOCK to the C right of the Block that was just decomposed. C CALL _TRSM('LEFT', 'LOWER', 'NO TRANSPOSE', 'UNIT', & MA2, NA2, ONE, A(IAL1PT), MA1, A(IA2PTR), MA1) C C UPDATE DIAGONAL BLOCK -- This is the MA3 X NA3 block on the C diagonal C CALL _GEMM('NO TRANSPOSE', 'NO TRANSPOSE', MA4,NA2,NA4, -ONE, & A(IA4PTR),MA1, A(IA2PTR),MA1, ONE, A(IA3PTR),MA1) C C END OF PHASE I C IPP = IPP + NA1 C C PHASE II - DECOMPOSE BLOCK WITH COL INTERCHANGES C This is the MA3 X NA3 block on the diagonal C CALL _GETRX(MA3, NA3, A(IA3PTR),MA1, IPIV(IPP), INFO) C IF(INFO .NE. 0) THEN INDEX = INFO + IPP - 1 INFO = 1 RETURN ENDIF C C APPLY INTERCHANGES TO BLOCKS "ABOVE" AND "BELOW", CAN C BE PERFORMED IN PARALLEL C DO 50 I= IPP, IPP + MA3 - 1 IPIV(I) = IPIV(I) + IPP - 1 IP = IPIV(I) IF (IP .NE. I) THEN CALL _SWAP(MA2,A(IA2PTR + ((I - IPP) * MA1)),IONE, & A((IA2PTR + (IP - IPP) * MA1)),IONE) CALL _SWAP(MA5,A(IA5PTR + ((I - IPP) * MA5)),IONE, & A((IA5PTR + (IP - IPP) * MA5)),IONE) ENDIF 50 CONTINUE C C CALCULATE A BLOCK OF L -- This is the MA5 X NA5 sub-block of C the j+1 BLOCK of A. This is the left most part of the j+1 C block of A which is in the overlap area to the left of the C diagonal. C CALL _TRSM('RIGHT', 'UPPER', 'NO TRANSPOSE', 'NO UNIT', & MA5, NA5, ONE, A(IA3PTR),MA1,A(IA5PTR),MA5) C C UPDATE DIAGONAL BLOCK -- This is the MA6 X NA6 sub-block of C the j+1 block of A, which extends from the diagonal to the C right edge of the jth block of A C CALL _GEMM('NO TRANSPOSE', 'NO TRANSPOSE', MA5,NA7,NA5, -ONE, & A(IA5PTR),MA5, A(IA7PTR),MA1, ONE, A(IA6PTR),MA5) C C END OF PHASE II C IPP = IPP + MA3 C WILL NEED POINTER TO PREVIOUS A5, THE PREVIOUS A5 IS THE BLOCK C TO THE LEFT OF THE CURRENT A1 I5PVPT = IA5PTR I5PRVM = MA5 I5PRVN = NA5 IA1PTR = IA6PTR 100 CONTINUE C C ============================================================= C DECOMPOSITION OF THE LAST BLOCK C C LAST BLOCK WILL ALWAYS BE SQUARE AND WILL SIMPLY NEED TO BE DECOMPSED C C PHASE I - DECOMPOSE BLOCK, ROW INTERCHANGES MA1 = IBLKST(1,NBLOKS) NA1 = IBLKST(2,NBLOKS) - NA5 C CALL _GETRY(MA1, NA1, A(IA1PTR), MA1, IPIV(IPP), INFO) IF(INFO .NE. 0) THEN INDEX = INFO + IPP - 1 INFO = 1 RETURN ENDIF C C APPLY INTERCHANGES TO BLOCKS "LEFT" CAN BE PERFORMED C IN PARALLEL DO 60 I = IPP, IPP + NA1 - 1 IPIV(I) = IPIV(I) + IPP - 1 IP = IPIV(I) IF(IP .NE. I) THEN CALL _SWAP(I5PRVN,A(I5PVPT + I - IPP), & I5PRVM,A(I5PVPT + IP - IPP),I5PRVM) ENDIF 60 CONTINUE RETURN END C C C=========================================================================== C SUBROUTINE _SOLN(N,NBLOKS,IBLKST,A,LENA,IPIV,B,LDB,K) CDBL IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION A(LENA), IPIV(N), B(LDB, K), IBLKST(3,NBLOKS) PARAMETER (ONE = 1.0E0) C C SOLUTION OF ALMOST BLOCK DIAGONAL SYSTEM DECOMPOSED BY ROUTINE C _GEABD, MULTIPLE RIGHT HAND SIDES, BLOCK VERSION C C N - NUMBER OF ROWS IN A C C NBLOKS - TOTAL NUMBER OF BLOCKS, INPUT ONLY C C A - MATRIX (STORED AS A VECTOR), CONTAINS LU DECOMPOSITION FROM _GEABD, C SHOULD REMAIN UNCHANGED BETWEEN CALLS, INPUT ONLY C C IBLKST - 3 X NBLOKS TABLE WHICH DESCRIBES THE BLOCK STRUCTURE OF A C IBLKST(1,K) - NUMBER OF ROWS IN KTH BLOCK C IBLKST(2,K) - NUMBER OF COLUMNS IN KTH BLOCK C IBLKST(3,K) - OVERLAP BETWEEN THE KTH BLOCK AND (K+1)TH BLOCK C C IPIV - VECTOR CONTAINING PIVOT INDICES FROM _GEABD, SHOULD REMAIN C UNCHANGED BETWEEN CALLS, INPUT ONLY C C K - NUMBER OF RIGHT HAND SIDES STORED IN MATRIX B, INPUT ONLY C C B - MATRIX CONTAINING RIGHT HAND SIDES STORED AS COLUMNS, C ON EXIT OVERWRITTEN BY SOLUTIONS C C LDB - FIRST DIMENSION OF B AS DECLARED IN CALLING (SUB)PROGRAM, C INPUT ONLY C C EXECUTABLE STATEMENTS C C C APPLY ROW PERMUTATION MATRIX P TO THE RIGHT HAND SIDE C IBLKST(3,NBLOKS) = 0 L = 0 IPP = 1 DO 290 J = 1, NBLOKS M1 = IBLKST(2,J) - IBLKST(3,J) - L L = IBLKST(1,J) - M1 DO 295 I = IPP, (IPP + M1 - 1) IP = IPIV(I) IF(IP .NE. I) THEN CALL _SWAP(K, B(I, 1), LDB, B(IP, 1), LDB) ENDIF 295 CONTINUE IPP = IPP + IBLKST(1,J) 290 CONTINUE C C C SOLUTION OF LY = B C C CALCULATE TOP PART OF Y USING TOP BLOCK C MB = IBLKST(1,1) CALL _TRSM ('LEFT', 'LOWER', 'NO TRANSPOSE', 'UNIT', & MB, K, ONE, A(1), MB, B(1,1), LDB) C ILPTR = 1 IB1PTR = 1 ITPTR = 1 MT = 0 NT = 0 MB1 = IBLKST(1,1) IF (NBLOKS .EQ. 1 ) THEN ML = MB IUPTR = 1 IB2PTR = 1 ENDIF C ILPTR = 1 C C USE INTERNAL BLOCKS TO CALCULATE Y C DO 110 J = 2, NBLOKS C C SET UP POINTERS C MT = IBLKST(1,J) NT = IBLKST(1,J-1) - IBLKST(2,J-1) + IBLKST(3,J-1) + NT IB2PTR = IB1PTR + MB1 IB1PTR = IB1PTR + MB1 - NT MB1 = NT MB2 = MT ITPTR = ITPTR + IBLKST(1,J-1) * IBLKST(2,J-1) C C UPDATE PART OF Y USING PREVIOUSLY COMPUTED SOLUTION C CALL _GEMM('NO TRANSPOSE', 'NO TRANSPOSE', MT, K, NT, & -ONE, A(ITPTR), MT, B(IB1PTR,1), LDB, & ONE, B(IB2PTR,1), LDB) C C SET UP POINTERS C ILPTR = ITPTR + MT * NT ML = IBLKST(1,J) MB1 = ML IB1PTR = IB1PTR + NT C C CALCULATE PART OF Y USING PART OF INTERNAL BLOCK C CALL _TRSM ('LEFT', 'LOWER','NO TRANSPOSE', 'UNIT', & ML, K, ONE, A(ILPTR), ML, B(IB2PTR,1),LDB) C 110 CONTINUE C C C SOLUTION OF UX = Y C C CALCULATE BOTTOM PART OF X USING BOTTOM BLOCK OF U C C IUPTR = ILPTR CALL _TRSM('LEFT', 'UPPER', 'NO TRANSPOSE', 'NO UNIT', & ML, K, ONE, A(IUPTR), ML, B(IB2PTR,1), LDB) MQ = IBLKST(1,NBLOKS - 1) IUPTR = IUPTR - ML * NT C C C CALCULATE X EXCEPT TOP PART USING INTERNAL BLOCKS OF U C DO 210 J = NBLOKS-1, 1, -1 C C SET UP POINTERS C NQ = IBLKST(3,J) - NT ML = IBLKST(1,J) NL = ML MQ = ML IB1PTR = IB2PTR - ML IUPTR = IUPTR - (MQ * NQ) - (ML * NL) IQPTR = IUPTR + ML * NL C C UPDATE PART OF X USING PREVIOUS SOLUTION C CALL _GEMM('NO TRANSPOSE', 'NO TRANSPOSE', ML, K, NQ, & -ONE,A(IQPTR),ML, B(IB2PTR,1), LDB, & ONE, B(IB1PTR,1), LDB) C C CALCULATE PART OF X USING BLOCK OF U C CALL _TRSM('LEFT', 'UPPER', 'NO TRANSPOSE','NO UNIT', & MQ, K, ONE, A(IUPTR), ML, B(IB1PTR,1), LDB) C NT = IBLKST(2,J) - (NL + NQ) IB2PTR = IB1PTR IUPTR = IUPTR - (ML * NT) 210 CONTINUE C C C C APPLY COLUMN PERMUTATION MATRIX Q TO THE SOLUTION C L = 0 IPP = 1 DO 310 J = 1, NBLOKS - 1 M1 = IBLKST(2,J) - IBLKST(3,J) - L L = IBLKST(1,J) - M1 IPP = IPP + M1 DO 305 I = (IPP + L - 1), IPP, (-1) IP = IPIV(I) IF(IP .NE. I) THEN CALL _SWAP(K, B(I, 1), LDB, B(IP, 1), LDB) ENDIF 305 CONTINUE IPP = IPP + L 310 CONTINUE C RETURN END C C================================================================= C SUBROUTINE _SOLT(N,NBLOKS,IBLKST,A,LENA,IPIV,B,LDB,K) CDBL IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION A(LENA), IPIV(N), B(LDB, K), IBLKST(3,NBLOKS) PARAMETER (ONE = 1.0E0) C SOLUTION OF OP(A)X = B WHERE OP IS TRANSPOSE. C SOLUTION OF ALMOST BLOCK DIAGONAL SYSTEM DECOMPOSED BY ROUTINE C _GEABD, MULTIPLE RIGHT HAND SIDES, BLOCK VERSION C C N - NUMBER OF ROWS IN A C C NBLOKS - TOTAL NUMBER OF BLOCKS, INPUT ONLY C C A - MATRIX (STORED AS A VECTOR), CONTAINS LU DECOMPOSITION FROM _GEABD, C SHOULD REMAIN UNCHANGED BETWEEN CALLS, INPUT ONLY C C IBLKST - 3 X NBLOKS TABLE WHICH DESCRIBES THE BLOCK STRUCTURE OF A C IBLKST(1,K) - NUMBER OF ROWS IN KTH BLOCK C IBLKST(2,K) - NUMBER OF COLUMNS IN KTH BLOCK C IBLKST(3,K) - OVERLAP BETWEEN THE KTH BLOCK AND (K+1)TH BLOCK C C IPIV - VECTOR CONTAINING PIVOT INDICES FROM _GEABD; SHOULD REMAIN C UNCHANGED BETWEEN CALLS, INPUT ONLY C C K - NUMBER OF RIGHT HAND SIDES STORED IN MATRIX B, INPUT ONLY C C B - MATRIX CONTAINING RIGHT HAND SIDES STORED AS COLUMNS, C ON EXIT OVERWRITTEN BY SOLUTIONS C C LDB - FIRST DIMENSION OF B AS DECLARED IN CALLING (SUB)PROGRAM, C INPUT ONLY C C EXECUTABLE STATEMENTS C C C APPLY THE TRANSPOSE COLUMN PERMUTATION MATRIX Q TO THE RIGHT HAND SIDE C IBLKST(3,NBLOKS) = 0 L = 0 IPP = 1 DO 310 J = 1, NBLOKS - 1 M1 = IBLKST(2,J) - IBLKST(3,J) - L L = IBLKST(1,J) - M1 IPP = IPP + M1 DO 305 I = IPP, (IPP + L - 1) IP = IPIV(I) IF(IP .NE. I) THEN CALL _SWAP(K, B(I, 1), LDB, B(IP, 1), LDB) ENDIF 305 CONTINUE IPP = IPP + L 310 CONTINUE C C C SOLUTION OF OP(U)Y = B C C INITIALIZE POINTERS C NT = 0 IUPTR = 1 IB1PTR = 1 IB2PTR = 1 DO 110 J = 1, NBLOKS - 1 C C SET UP POINTERS C MU = IBLKST(1,J) NQ = IBLKST(2,J) - MU - NT IQPTR = IUPTR + MU * MU IB2PTR = IB2PTR + MU C C CALCULATE PART OF Y USING PART OF INTERNAL BLOCK C CALL _TRSM ('LEFT', 'UPPER','TRANSPOSE', 'NO UNIT', & MU, K, ONE, A(IUPTR), MU, B(IB1PTR,1),LDB) C C UPDATE PART OF Y USING PREVIOUSLY COMPUTED SOLUTION C CALL _GEMM('TRANSPOSE', 'NO TRANSPOSE', NQ, K, MU, & -ONE, A(IQPTR), MU, B(IB1PTR,1), LDB, & ONE, B(IB2PTR,1), LDB) C C SET UP POINTERS C NT = IBLKST(3,J) - NQ IUPTR = IUPTR + MU * (MU + NQ) + NT * IBLKST(1,J + 1) IB1PTR = IB2PTR C 110 CONTINUE C C CALCULATE LAST PART OF Y MU = IBLKST(1,NBLOKS) CALL _TRSM ('LEFT', 'UPPER','TRANSPOSE', 'NO UNIT', & MU, K, ONE, A(IUPTR), MU, B(IB1PTR,1),LDB) C C C SOLUTION OF OP(L) X= Y C C ILPTR = LENA + 1 IB1PTR = N + 1 NQ = 0 NT = 0 C DO 210 J = NBLOKS, 2, -1 C C SET UP POINTERS C ML = IBLKST(1,J) ILPTR = ILPTR - ML * ML - NQ * ML NT = IBLKST(2,J) - (ML + NQ) ITPTR = ILPTR - ML * NT IB1PTR = IB1PTR - ML IB2PTR = IB1PTR - NT C C CALCULATE PART OF X USING BLOCK OF L C CALL _TRSM('LEFT', 'LOWER', 'TRANSPOSE','UNIT', & ML, K, ONE, A(ILPTR), ML, B(IB1PTR,1), LDB) C C UPDATE PART OF X USING PREVIOUS SOLUTION C CALL _GEMM('TRANSPOSE', 'NO TRANSPOSE', NT, K, ML, & -ONE,A(ITPTR),ML, B(IB1PTR,1), LDB, & ONE, B(IB2PTR,1), LDB) C ILPTR = ILPTR - ML * NT NQ = IBLKST(3,J - 1) - NT C 210 CONTINUE C C CALCULATE LAST PART OF X USING BLOCK OF L C ML = IBLKST(1,1) CALL _TRSM('LEFT', 'LOWER', 'TRANSPOSE','UNIT', & ML, K, ONE, A(1), ML, B(1,1), LDB) C C C C APPLY THE TRANSPOSE ROW PERMUTATION MATRIX P TO THE SOLUTION C L = 0 IPP = 1 DO 290 J = 1, NBLOKS M1 = IBLKST(2,J) - IBLKST(3,J) - L L = IBLKST(1,J) - M1 DO 295 I = (IPP + M1 - 1), IPP, (-1) IP = IPIV(I) IF(IP .NE. I) THEN CALL _SWAP(K, B(I, 1), LDB, B(IP, 1), LDB) ENDIF 295 CONTINUE IPP = IPP + IBLKST(1,J) 290 CONTINUE C RETURN END EOF #*SINGLE* uncomment the line below, and delete the second line # for single precision #sed -e "s/_/S/g" > sget.f <<'EOF' sed -e "s/CDBL/ /g" -e "s/_/D/g" > sget.f <<'EOF' C======================================================================= C Cliff Cyphers and Marcin Paprzycki C Department of Mathematics and Computer Science C The University of Texas of the Permian Basin C Odessa, Texas 79762 C cliffc@tenet.edu C m_paprzycki@utpb.pb.utexas.edu C======================================================================= SUBROUTINE _GETRX(M, N, A, LDA, IPIV, INFO) CDBL IMPLICIT DOUBLE PRECISION(A-H,O-Z) C PARAMETER(ONE = 1.0E0, IONE = 1, ZERO = 0.0E0) DIMENSION A(LDA, N), IPIV(*) C C COMPUTES LU FACTORIZATION OF M-BY-N MATRIX; PARTIAL PIVOTING C COLUMN INTERCHANGES C C M - NUMBER OF ROWS IN A, UNCHANGED C C N - NUMBER OF COLUMNS IN A, UNCHANGED C C A - ARRAY OF DIMENSION (LDA, N), OVERWRITTEN BY FACTORIZATION C L - UNIT LOVER TRIANGULAR C - U UPPER TRIANGULAR C C LDA - FIRST DIMENSION OF A AS DECLARED IN THE CALLING (SUB)PROGRAM, C UNCHANGED C C IPIV - ARRAY OF DIMENSION (M) ON EXIT CONTAINS PIVOT INDICES C C INFO - ERROR INDICATOR, 0 - NORMAL RETURN, POSITIVE VALUE (K) INDICATE C THAT U(K, K) = 0.E0 EXACTLY, ALWAYS CHECK AFTER CALL C C C DETERMINE BLOCK SIZE FOR THIS ENVIRONMENT C IF (M .EQ. ZERO .OR. N .EQ. ZERO) RETURN CALL ENVIR('GET',MB) IF (MB .EQ. 1) MB = M C DO 40 J = 1, M, MB JB = MIN(M-J+1, MB) C C PREVIOUS INTERCHANGES IN CURRENT BLOCK C DO 10 I = 1, J-1 IP = IPIV(I) IF(IP .NE. I) & CALL _SWAP(JB, A(J,I), IONE, A(J,IP), IONE) 10 CONTINUE C C COMPUTE SUPERDIAGONAL BLOCK OF U C CALL _TRSM('RIGHT', 'UPPER', 'NO TRANSPOSE', 'NO UNIT', & JB, J-1, ONE, A, LDA, A(J,1), LDA) C C UPDATE DIAGONAL AND SUBDIAGONAL BLOCKS C CALL _GEMM('NO TRANSPOSE', 'NO TRANSPOSE', JB, N-J+1, J-1, & -ONE, A(J,1), LDA, A(1,J), LDA, ONE, A(J,J), LDA) C C FACTORIZE DIAGONAL AND SUBDIAGONAL BLOCKS AND TEST FOR SINGULARITY C CALL _GTX(JB, N-J+1, A(J,J), LDA, IPIV(J), INFO) DO 20 I = J, J+JB-1 IPIV(I) = J-1+IPIV(I) 20 CONTINUE IF(INFO .EQ. 0) THEN C C APPLY INTERCHANGES TO PREVIOUS BLOCKS C DO 30 I = J, J+JB-1 IP = IPIV(I) IF(IP .NE. I) & CALL _SWAP(J-1, A(1,I), IONE, A(1,IP), IONE) 30 CONTINUE ELSE C C IF INFO NOT ZERO, A ZERO PIVOT FOUND IN _GTX, CALCULATE INDEX, RETURN C INFO = INFO + J - 1 RETURN ENDIF 40 CONTINUE RETURN C C END OF _GETRX C END C======================================================================= C SUBROUTINE _GTX(M, N, A, LDA, IPIV, INFO) CDBL IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION A(LDA, N), IPIV(*) PARAMETER (ONE = 1.0E0, IONE = 1) C C LU DECOMPOSITION, COLUMN INTERCHANGES DO 10 J = 1,M MX = I_AMAX(N-J+1, A(J,J), LDA) IPIV(J) = MX + J - 1 IF(J .NE. IPIV(J)) & CALL _SWAP(M, A(1,J), IONE, A(1,IPIV(J)), IONE) C IF(ABS(A(J,J)) .EQ. 0.0E0) THEN INFO = J RETURN ENDIF C COLM = ONE/A(J,J) CALL _SCAL(M-J, COLM, A(J+1,J), IONE) CALL _GER (M-J, N-J, -ONE, A(J+1,J), IONE, & A(J,J+1), LDA, A(J+1,J+1), LDA) 10 CONTINUE RETURN END C======================================================================= SUBROUTINE _GETRY(M, N, A, LDA, IPIV, INFO) CDBL IMPLICIT DOUBLE PRECISION(A-H,O-Z) C PARAMETER(ONE = 1.0E0, IONE = 1, ZERO = 0.0E0) DIMENSION A(LDA,N), IPIV(*) C C COMPUTES LU FACTORIZATION OF M-BY-N MATRIX; PARTIAL PIVOTING C ROW INTERCHANGES C C M - NUMBER OF ROWS IN A, UNCHANGED C C N - NUMBER OF COLUMNS IN A, UNCHANGED C C A - ARRAY OF DIMENSION (LDA, N), OVERWIRITTEN BY FACTORIZATION C L - UNIT LOVER TRIANGULAR C - U UPPER TRIANGULAR C C LDA - FIRST DIMENSION OF A AS DECLARED IN THE CALLING (SUB)PROGRAM, C UNCHANGED C C IPIV - ARRAY OF DIMENSION (M), ON EXIT CONTAINS PIVOT INDICES C C INFO - ERROR INDICATOR, 0 - NORMAL RETURN, POSITIVE VALUE (K) INDICATE C U(K, K) = 0.E0 EXACTLY, ALWAYS CHECK AFTER CALL C C C DETERMINE BLOCK SIZE FOR THIS ENVIRONMENT C IF (M .EQ. ZERO .OR. N .EQ. ZERO) RETURN CALL ENVIR('GET',NB) IF (NB .EQ. 1) NB = N C DO 40 J = 1, N, NB JB = MIN(N-J+1, NB) C C PREVIOUS INTERCHANGES IN CURRENT BLOCK C DO 10 I = 1, J-1 IP = IPIV(I) IF(IP .NE. I) & CALL _SWAP(JB, A(I,J), LDA, A(IP,J), LDA) 10 CONTINUE C C COMPUTE SUPERDIAGONAL BLOCK OF U C CALL _TRSM('LEFT', 'LOWER', 'NO TRANSPOSE', 'UNIT', & J-1, JB, ONE, A, LDA, A(1,J), LDA) C C UPDATE DIAGONAL AND SUBDIAGONAL BLOCKS C CALL _GEMM('NO TRANSPOSE', 'NO TRANSPOSE', M-J+1, JB, J-1, & -ONE, A(J,1), LDA, A(1,J), LDA, ONE, A(J,J), LDA) C C FACTORIZE DIAGONAL AND SUBDIAGONAL BLOCKS AND TEST FOR SINGULARITY C CALL _GTY(M-J+1, JB, A(J,J), LDA, IPIV(J), INFO) DO 20 I = J, J+JB-1 IPIV(I) = J-1+IPIV(I) 20 CONTINUE IF(INFO .EQ. 0) THEN C C APPLY INTERCHANGES TO PREVIOUS BLOCKS C DO 30 I = J, J+JB-1 IP = IPIV(I) IF(IP .NE. I) & CALL _SWAP(J-1, A(I,1), LDA, A(IP,1), LDA) 30 CONTINUE ELSE C C IF INFO NOT ZERO, A ZERO PIVOT FOUD IN _GTY, CALCULATE INDEX, RETURN C INFO = INFO + J - 1 RETURN ENDIF 40 CONTINUE RETURN C C END OF _GETRY C END C======================================================================= C SUBROUTINE _GTY(M, N, A, LDA, IPIV, INFO) CDBL IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION A(LDA,N), IPIV(*) PARAMETER(ONE = 1.0E0, IONE = 1) C C LU DECOMPOSITION WITH ROW INTERCHANGES C DO 10 J = 1,N MX = I_AMAX(M-J+1, A(J,J), IONE) IPIV(J) = MX + J - 1 IF(J .NE. IPIV(J)) & CALL _SWAP(N, A(J,1), LDA, A(IPIV(J),1), LDA) C IF(ABS(A(J,J)) .EQ. 0.0E0) THEN INFO = J RETURN ENDIF C ROWM = ONE/A(J,J) CALL _SCAL(M-J, ROWM, A(J+1,J), IONE) CALL _GER (M-J, N-J, -ONE, A(J+1,J), IONE, & A(J,J+1), LDA, A(J+1,J+1), LDA) 10 CONTINUE RETURN END EOF #*SINGLE* uncomment the line below, and delete the second line # for single precision #sed -e "s/_/S/g" > blkchk.f <<'EOF' sed -e "s/CDBL/ /g" -e "s/_/D/g" > blkchk.f <<'EOF' C Cliff Cyphers and Marcin Paprzycki C Department of Mathematics and Computer Science C The University of Texas of the Permian Basin C Odessa, Texas 79762 C C This routine is an adaptation of the NAG routine F01LHZ used C by permission. Thanks to NAG for allowing us to use this code. C If the NAG library is available then, the call to this routine may C be replaced by a call to F01LHZ SUBROUTINE BLKCHK(BLKSTR,NBLOKS,N,LENARQ,PASNAM,REPORT,IERR) C MARK 13 RELEASE. NAG COPYRIGHT 1988. C .. Scalar Arguments .. INTEGER IERR, LENARQ, N, NBLOKS LOGICAL REPORT CHARACTER*6 PASNAM C .. Array Arguments .. INTEGER BLKSTR(3,NBLOKS) C .. Local Scalars .. INTEGER IDEV, JOVER, K, KCOLS, KOVER, KROWS, NCOLS, * NCOLS1, NROWS CHARACTER*80 REC C .. External Subroutines .. EXTERNAL EROUT C .. Executable Statements .. C C SET IDEV TO THE LOGICAL UNIT NUMBER WHERE ERROR MESSAGES C SHOULD BE WRITTEN IDEV = 6 C IF (NBLOKS.LT.1) THEN IERR = -3 IF (REPORT) THEN WRITE (REC,FMT=99999) PASNAM, NBLOKS CALL EROUT(IDEV,REC) END IF END IF IF (N.LT.1) THEN IERR = -2 IF (REPORT) THEN WRITE (REC,FMT=99998) PASNAM, N CALL EROUT(IDEV,REC) END IF END IF IF (IERR.LT.0) GO TO 60 IF (N.LT.NBLOKS) THEN IERR = -1 IF (REPORT) THEN WRITE (REC,FMT=99997) PASNAM, N, NBLOKS CALL EROUT(IDEV,REC) END IF END IF DO 20 K = 1, NBLOKS - 1 KROWS = BLKSTR(1,K) KCOLS = BLKSTR(2,K) KOVER = BLKSTR(3,K) IF (KROWS.LT.1) THEN IERR = -4 IF (REPORT) THEN WRITE (REC,FMT=99996) PASNAM, K, KROWS CALL EROUT(IDEV,REC) END IF END IF IF (KCOLS.LT.1) THEN IERR = -4 IF (REPORT) THEN WRITE (REC,FMT=99995) PASNAM, K, KCOLS CALL EROUT(IDEV,REC) END IF END IF IF (KOVER.LT.0) THEN IERR = -4 IF (REPORT) THEN WRITE (REC,FMT=99994) PASNAM, K, KOVER CALL EROUT(IDEV,REC) END IF END IF 20 CONTINUE KROWS = BLKSTR(1,NBLOKS) KCOLS = BLKSTR(2,NBLOKS) IF (KROWS.LT.1) THEN IERR = -4 IF (REPORT) THEN WRITE (REC,FMT=99996) PASNAM, NBLOKS, KROWS CALL EROUT(IDEV,REC) END IF END IF IF (KCOLS.LT.1) THEN IERR = -4 IF (REPORT) THEN WRITE (REC,FMT=99995) PASNAM, NBLOKS, KCOLS CALL EROUT(IDEV,REC) END IF END IF IF (IERR.LT.0) GO TO 60 KROWS = BLKSTR(1,1) KCOLS = BLKSTR(2,1) KOVER = BLKSTR(3,1) IF (KCOLS.LT.KROWS) THEN IERR = -5 IF (REPORT) THEN WRITE (REC,FMT=99988) PASNAM, KCOLS, KROWS CALL EROUT(IDEV,REC) END IF GO TO 60 ELSE IF (KCOLS-KOVER.GT.KROWS) THEN IERR = -5 IF (REPORT) THEN WRITE (REC,FMT=99987) PASNAM, KCOLS - KOVER, KROWS CALL EROUT(IDEV,REC) END IF GO TO 60 END IF LENARQ = KROWS*KCOLS NROWS = KROWS NCOLS = KCOLS JOVER = KOVER DO 40 K = 2, NBLOKS - 1 KROWS = BLKSTR(1,K) KCOLS = BLKSTR(2,K) KOVER = BLKSTR(3,K) LENARQ = LENARQ + KROWS*KCOLS NROWS = NROWS + KROWS NCOLS = NCOLS + KCOLS - JOVER NCOLS1 = NCOLS - KOVER IF ( .NOT. (NCOLS1.LE.NROWS .AND. NROWS.LE.NCOLS)) THEN IERR = -5 IF (REPORT) THEN WRITE (REC,FMT=99986) PASNAM, K CALL EROUT(IDEV,REC) WRITE (REC,FMT=99985) CALL EROUT(IDEV,REC) WRITE (REC,FMT=99984) CALL EROUT(IDEV,REC) WRITE (REC,FMT=99983) CALL EROUT(IDEV,REC) END IF GO TO 60 ELSE IF (KOVER+JOVER.GT.KCOLS) THEN IERR = -5 IF (REPORT) THEN WRITE (REC,FMT=99993) PASNAM, K - 1, K, JOVER + KOVER CALL EROUT(IDEV,REC) WRITE (REC,FMT=99992) K, KCOLS CALL EROUT(IDEV,REC) END IF GO TO 60 END IF JOVER = KOVER 40 CONTINUE IF (NBLOKS.GT.1) THEN KROWS = BLKSTR(1,NBLOKS) KCOLS = BLKSTR(2,NBLOKS) NROWS = NROWS + KROWS NCOLS = NCOLS + KCOLS - JOVER LENARQ = LENARQ + KROWS*KCOLS END IF IF (NROWS.NE.N) THEN IERR = -5 IF (REPORT) THEN WRITE (REC,FMT=99991) PASNAM CALL EROUT(IDEV,REC) WRITE (REC,FMT=99990) CALL EROUT(IDEV,REC) END IF END IF IF (NCOLS.NE.N) THEN IERR = -5 IF (REPORT) THEN WRITE (REC,FMT=99991) PASNAM CALL EROUT(IDEV,REC) WRITE (REC,FMT=99989) CALL EROUT(IDEV,REC) END IF END IF 60 CONTINUE RETURN C 99999 FORMAT (' ** ',A6,' - NBLOKS(=',I16,') .le. 0 **') 99998 FORMAT (' ** ',A6,' - N(=',I16,') .le. 0 **') 99997 FORMAT (' ** ',A6,' - N(=',I16,') .lt. NBLOKS(=',I16,') **') 99996 FORMAT (' ** ',A6,' - BLKSTR(1,',I6,') (=',I16,') .lt. 1 **') 99995 FORMAT (' ** ',A6,' - BLKSTR(2,',I6,') (=',I16,') .lt. 1 **') 99994 FORMAT (' ** ',A6,' - BLKSTR(3,',I6,') (=',I16,') .lt. 0 **') 99993 FORMAT (' ** ',A6,' - BLKSTR(3,',I6,')+BLKSTR(3,',I6,')(=',I16, * ')') 99992 FORMAT (13X,'.lt. BLKSTR(2,',I6,') (=',I16,') **') 99991 FORMAT (' ** ',A6,' - the following equality does not hold') 99990 FORMAT (' sum( BLKSTR(1,k) :k=1,NBLOKS) .eq. N') 99989 FORMAT (' BLKSTR(2,1) + sum(BLKSTR(2,k)-BLKSTR(3,k-1) :k=2,N', * 'BLOKS) .eq. N **') 99988 FORMAT (' ** ',A6,' - BLKSTR(2,1) (=',I6,') .lt. BLKSTR(1,1) (=', * I6,') **') 99987 FORMAT (' ** ',A6,' - BLKSTR(2,1)-BLKSTR(3,1) (=',I6,') .gt. BLK', * 'STR(1,1) (=',I6,') **') 99986 FORMAT (' ** ',A6,' - the following inequality was not satisfied', * ' for j (=',I6,')') 99985 FORMAT (' sum(BLKSTR(2,k)-BLKSTR(3,k):k=1,J) .le. ') 99984 FORMAT (' sum(BLKSTR(1,k):k=1,J) .le. ') 99983 FORMAT (' BLKSTR(2,1)+sum(BLKSTR(2,k)-BLKSTR(3,k-1):k=2,j) **' * ) END SUBROUTINE EROUT(NOUT,REC) C This subroutine is adapted from the NAG routine X04BAF used C with permission. C MARK 11.5(F77) RELEASE. NAG COPYRIGHT 1986. C C Write the contents of REC to the unit defined by NOUT. C C Trailing blanks are not output, except that if REC is entirely C blank, a single blank character is output. C If NOUT.lt.0, i.e. if NOUT is not a valid Fortran unit identifier, C then no output occurs. C C .. Scalar Arguments .. INTEGER NOUT CHARACTER*(*) REC C .. Local Scalars .. INTEGER I C .. Intrinsic Functions .. INTRINSIC LEN C .. Executable Statements .. IF (NOUT.GE.0) THEN C Remove trailing blanks DO 20 I = LEN(REC), 2, -1 IF (REC(I:I).NE.' ') GO TO 40 20 CONTINUE C Write record to external file 40 WRITE (NOUT,FMT=99999) REC(1:I) END IF RETURN C 99999 FORMAT (A) END EOF #*SINGLE* uncomment the line below, and delete the second line # for single precision #sed -e "s/_/S/g" > abdml.f <<'EOF' sed -e "s/CDBL/ /g" -e "s/_/D/g" > abdml.f <<'EOF' SUBROUTINE _ABDML(TRANSA,N,NBLOKS,BLKSTR,ALPHA,A,LENA,B,LDB,IR, & BETA,C,LDC) CDBL IMPLICIT DOUBLE PRECISION(A-H,O-Z) C C .. Parameters .. PARAMETER (ONE=1.0D0) C .. Scalar Arguments .. INTEGER IR, LDB, LDC, LENA, N, NBLOKS CHARACTER TRANSA C .. Array Arguments .. DIMENSION A(LENA), B(LDB,IR), C(LDC,IR) INTEGER BLKSTR(3,NBLOKS) C .. Local Scalars .. INTEGER I, LDB1, LDC1, NCOLS, NMID, NOVRL1, NOVRL2, & NOVRLP, NROWS, NRWNXT, PTA, PTAMID, PTANXT, & PTAOV2, PTB, PTB1, PTB2, PTC, PTC1, PTC2 LOGICAL NORMAL, SINGLE C .. External Subroutines .. EXTERNAL _GEMM C .. Executable Statements .. C LDB1 = LDB LDC1 = LDC SINGLE = IR .EQ. 1 NORMAL = TRANSA .EQ. 'N' .OR. TRANSA .EQ. 'n' C C C = alpha * op(A) * B + beta * C C C where op(A) = A or op(A) = A(transpose) C IF (NORMAL) THEN PTA = 1 PTB = 1 PTC = 1 DO 20 I = 1, NBLOKS NROWS = BLKSTR(1,I) NCOLS = BLKSTR(2,I) IF (I.LT.NBLOKS) NOVRLP = BLKSTR(3,I) IF (SINGLE) THEN LDB1 = LDB - PTB + 1 LDC1 = LDC - PTC + 1 END IF CALL _GEMM(TRANSA,'N',NROWS,IR,NCOLS,ALPHA,A(PTA),NROWS, & B(PTB,1),LDB1,BETA,C(PTC,1),LDC1) PTA = PTA + NROWS*NCOLS PTB = PTB + NCOLS - NOVRLP PTC = PTC + NROWS 20 CONTINUE C ELSE C C NOVRL1: overlap of columns between current block and previous block C PTA: pointer to start of block C NOVRL2: overlap of columns between current block and next block C PTAOV2: pointer to part of block which overlaps with next block C NMID: columns of current block not in overlap C PTAMID: pointer to part of block not in overlap C C PTANXT: pointer to next block C NRWNXT: number of rows in next block C NROWS = BLKSTR(1,1) NCOLS = BLKSTR(2,1) NOVRL1 = 0 PTA = 1 PTB1 = 1 PTC1 = 1 DO 40 I = 1, NBLOKS - 1 NOVRL2 = BLKSTR(3,I) NMID = NCOLS - NOVRL1 - NOVRL2 PTAMID = PTA + NROWS*NOVRL1 PTAOV2 = PTAMID + NROWS*NMID PTB2 = PTB1 + NROWS PTC2 = PTC1 + NMID NRWNXT = BLKSTR(1,I+1) PTANXT = PTA + NROWS*NCOLS IF (NMID.GT.0) THEN C C Contribution from part of current block not in overlap C IF (SINGLE) THEN LDB1 = LDB - PTB1 + 1 LDC1 = LDC - PTC1 + 1 END IF CALL _GEMM(TRANSA,'N',NMID,IR,NROWS,ALPHA,A(PTAMID), & NROWS,B(PTB1,1),LDB1,BETA,C(PTC1,1),LDC1) END IF C IF (NOVRL2.GT.0) THEN C C Contribution from part of current block in overlap with the next block C IF (SINGLE) THEN LDB1 = LDB - PTB1 + 1 LDC1 = LDC - PTC2 + 1 END IF CALL _GEMM(TRANSA,'N',NOVRL2,IR,NROWS,ALPHA,A(PTAOV2), & NROWS,B(PTB1,1),LDB1,BETA,C(PTC2,1),LDC1) IF (SINGLE) THEN C C Add in the contribution from part of the next block in overlap with current C block - hence the argument BETA in _GEMM is set to ONE C LDB1 = LDB - PTB2 + 1 LDC1 = LDC - PTC2 + 1 END IF CALL _GEMM(TRANSA,'N',NOVRL2,IR,NRWNXT,ALPHA,A(PTANXT), & NRWNXT,B(PTB2,1),LDB1,ONE,C(PTC2,1),LDC1) END IF C PTB1 = PTB2 PTC1 = PTC2 + NOVRL2 PTA = PTANXT NOVRL1 = NOVRL2 NROWS = NRWNXT NCOLS = BLKSTR(2,I+1) C 40 CONTINUE C C Last block handled separately C NMID = NCOLS - NOVRL1 PTAMID = PTA + NOVRL1*NROWS IF (NMID.GT.0) THEN IF (SINGLE) THEN LDB1 = LDB - PTB1 + 1 LDC1 = LDC - PTC1 + 1 END IF c PRINT *, ' LAST BLOCK' c PRINT *, ' NMID = ', NMID c PRINT *, ' PTAMID = ', PTAMID c PRINT *, ' PTB1 = ', PTB1 c PRINT *, ' PTC1 = ', PTC1 CALL _GEMM(TRANSA,'N',NMID,IR,NROWS,ALPHA,A(PTAMID),NROWS, & B(PTB1,1),LDB1,BETA,C(PTC1,1),LDC1) C END IF C END IF C RETURN C 99999 FORMAT (' NROWS=',I2,' NCOLS=',I2,' NOVRL1=',I2,' NOVRL2=',I2, & ' NMID=',I2,/' PTA=',I2,' PTAMID=',I2,' PTAOV2=',I2,/' PT', & 'B1=',I2,' PTB2=',I2,' PTC1=',I2,' PTC2=',I2,/' NRWNXT=', & I2,' PTANXT=',I2,/) END EOF #*SINGLE* uncomment the line below, and delete the second line # for single precision #sed -e "s/_/S/g" > test1.f <<'EOF' sed -e "s/CDBL/ /g" -e "s/_/D/g" > test1.f <<'EOF' CDBL IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION A(1016485), AH(1016485) DIMENSION B(333000), IPIV(3000), IBLK(3,20) IBLK(1,1) = 1 IBLK(2,1) = 2 IBLK(3,1) = 1 IBLK(1,2) = 1 IBLK(2,2) = 11 IBLK(3,2) = 10 IBLK(1,3) = 10 IBLK(2,3) = 10 IBLK(3,3) = 0 LENA = 113 NRO = 12 NBLK = 3 NRHS = 9 CALL TEST('N',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) CALL TEST('T',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) IBLK(1,1) = 684 IBLK(2,1) = 718 IBLK(3,1) = 68 IBLK(1,2) = 437 IBLK(2,2) = 513 IBLK(3,2) = 84 IBLK(1,3) = 322 IBLK(2,3) = 388 IBLK(3,3) = 48 IBLK(1,4) = 408 IBLK(2,4) = 432 IBLK(3,4) = 0 LENA = 1016485 NRO = 1851 NBLK = 4 NRHS = 10 CALL TEST('N',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) CALL TEST('T',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) IBLK(1,1) = 10 IBLK(2,1) = 20 IBLK(3,1) = 10 IBLK(1,2) = 200 IBLK(2,2) = 200 IBLK(3,2) = 0 IBLK(1,3) = 100 IBLK(2,3) = 100 IBLK(3,3) = 37 IBLK(1,4) = 50 IBLK(2,4) = 147 IBLK(3,4) = 108 IBLK(1,5) = 66 IBLK(2,5) = 200 IBLK(3,5) = 86 IBLK(1,6) = 126 IBLK(2,6) = 126 IBLK(3,6) = 0 LENA = 0 NRO = 0 DO 5 I=I,6 LENA = LENA + IBLK(1,I) * IBLK(2,I) NRO = NRO + IBLK(1,I) 5 CONTINUE NBLK = 6 NRHS = 13 CALL TEST('N',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) CALL TEST('T',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) C C Test for 10 300X300 blocks with no overlap DO 10 I=1,10 IBLK(1,I) = 300 IBLK(2,I) = 300 10 IBLK(3,I) = 0 LENA = 900000 NRO = 3000 NBLK = 10 NRHS = 10 CALL TEST('N',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) CALL TEST('T',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) C C Test for 20 blocks, first and last 50X100, rest 100X200 with 100 overlap IBLK(1,1) = 50 IBLK(2,1) = 100 IBLK(3,1) = 100 DO 20 I=2,19 IBLK(1,I) = 100 IBLK(2,I) = 200 20 IBLK(3,I) = 100 IBLK(1,20) = 50 IBLK(2,20) = 100 IBLK(3,20) = 0 LENA = 370000 NRO = 1900 NBLK = 20 NRHS = 37 CALL TEST('N',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) CALL TEST('T',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) C Test for 20 blocks IBLK(1,1) = 10 IBLK(2,1) = 10 IBLK(3,1) = 1 LENA=IBLK(1,1) * IBLK(2,1) DO 44 I=2,20 IBLK(1,I) = 10 IBLK(3,I) = 1 IBLK(2,I) = 11 LENA = LENA + IBLK(1,I) * IBLK(2,I) 44 CONTINUE IBLK(3,20) = 0 NBLK = 20 NRHS = 37 NRO = 200 CALL TEST('N',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) CALL TEST('T',NRO,NBLK,LENA,IBLK,NRHS,A,IPIV,AH,B) STOP END C================================================================== SUBROUTINE TEST(TRANSA,NRO,NBLOKS,NAE,IBLK,NRHS,A,IPIV, & AH,B) CDBL IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION A(NAE), IPIV(NRO), B(NRO,NRHS*3), IBLK(3,NBLOKS) DIMENSION AH(NAE) CHARACTER TRANSA PARAMETER (ONE = 1.0E0, IONE = 1, ZERO = 0.0E0) C WRITE(*,*) 'TRANSA=',TRANSA, ' NRO=',NRO, ' NBLOKS=',NBLOKS WRITE(*,*) 'LENA=',NAE,' NRHS=',NRHS WRITE(*,*) 'IBLK=',iblk C DO 10 I=1,NAE A(I) = 1.0E0*RANF() 10 CONTINUE DO 11 I=1,NAE AH(I) = A(I) 11 CONTINUE C c Number of right hand side K = NRHS DO 100 J= 1, K DO 100 I = 1, NRO B(I,J) = 1.0E0*RANF() B(I,J+K) = 0.0E0 100 B(I,J+(K*2)) = B(I,J) C C Calculate B <-- A*X IF (TRANSA .EQ. 'T' .OR. TRANSA .EQ. 't') THEN CALL _abdml('T',NRO,NBLOKS,IBLK,one,AH,NAE, & B(1,1),NRO,K,zero,B(1,K+1),NRO) ELSE CALL _abdml('N',NRO,NBLOKS,IBLK,ONE,AH,NAE, & B(1,1),NRO,K,ZERO,B(1,K+1),NRO) ENDIF IFAIL = 0 C Perform LU factorization of A CALL _GEABD(NRO,NBLOKS,IBLK,A,NAE,IPIV,INDEX,IFAIL) IF (IFAIL.NE.0) THEN WRITE(*,*) 'IFAIL=',IFAIL, INDEX RETURN ENDIF IFAIL = 0 C Calculate solution ~X IF (TRANSA .EQ. 'T' .OR. TRANSA .EQ. 't') THEN CALL _SOLT(NRO,NBLOKS,IBLK,A,NAE,IPIV,B(1,K+1),NRO,K) ELSE CALL _SOLN(NRO,NBLOKS,IBLK,A,NAE,IPIV,B(1,K+1),NRO,K) ENDIF C C Calculate the error ||X - ~X|| SUM = 0.0E0 do 299 j=1,k DO 299 I = 1,NRO SUM = SUM + (B(I,J+K) - B(I,J+(K*2)))**2 299 CONTINUE WRITE(*,*)'ERROR=',SQRT(SUM)/SQRT(FLOAT(NRO)) RETURN END C C=========================================================================== C EOF #*SINGLE* uncomment the line below, and delete the second line # for single precision #sed -e "s/_/S/g" > example1.f <<'EOF' sed -e "s/CDBL/ /g" -e "s/_/D/g" > example1.f <<'EOF' C SAMPLE BOX SCHEME DISCRETIZATION PROGRAM IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION RHS(200),A(4000),Y(200),Z1(200),Z2(200), AH(4000) DIMENSION RHS2(200) DIMENSION X(2),IBLK(3,33),IPIV(200) CHARACTER*1 ITR PARAMETER (IONE = 1,ZERO = 0.D0) COMMON /PARAMS/ EPS, GAMMA, ZETA, B1, B2, B21,B22,B31,B32 EXTERNAL ASUB,QSUB,BASUB,BBSUB N = 6 C SET THE NUMBER OF MESH POINTS HERE NP = 9 NBL = NP + 1 NRO = NP*N WRITE(*,*)'N',N,'NP',NP,'NBL',NBL,'NRO',NRO EPS = 0.1 GAMMA = 0.1 ZETA = 0.1 ALEFT = -1.0D0 ARIGHT = 1.0D0 H = (ARIGHT - ALEFT)/(NP - 1) C WRITE(*,*)'EPS',EPS,'AL,AR',ALEFT,ARIGHT,'H',H C C BOUNDARY CONDITIONS I C IBCL = 3 IBCR = 3 B1 = 2.D0 B2 = 4.D0 B21 = 6.D0 B22 = 10.D0 B31 = 2.D0 B32 = 5.D0 C SET UP THE CONSTANTS NEEDED IN THE CALCULATION OF THE EXACT SOLUTION. ATEMP1 = EXP(-1.0D0/EPS) ATEMP2 = EXP(1.0D0/EPS) BB1 = (B2 -B1) /(ATEMP2 - ATEMP1) AA1 = B1 - BB1 * ATEMP1 BB2 = (B22 - EXP(-ARIGHT/SQRT(GAMMA))*EXP(ALEFT/SQRT(GAMMA))/ & EXP(-ALEFT/SQRT(GAMMA))) / (EXP(ARIGHT/SQRT(GAMMA)) & - EXP(-ARIGHT/SQRT(GAMMA))*EXP(ALEFT/SQRT(GAMMA))/ & EXP(-ALEFT/SQRT(GAMMA))) AA2 = (B21/EXP(-ALEFT/SQRT(GAMMA))) - BB2*(EXP(ALEFT/SQRT(GAMMA)) & /EXP(-ALEFT/SQRT(GAMMA))) C Some systems may require DERF for double precision. ATEMP1 = ERF(ALEFT/DSQRT(2.0D0*ZETA)) ATEMP2 = ERF(ARIGHT/DSQRT(2.0D0*ZETA)) BB3= (B32 - B31) / (ATEMP2 - ATEMP1) AA3= B31 - BB3*ATEMP1 CALL BASUB(A(1),N,IBCL,RHS(1)) IA = N*IBCL C WRITE(*,*)'A',A(1),A(2),'IA',IA,'IBC',IBCL,IBCR C WRITE(*,*)'RHS',RHS(1),RHS(2) IBLK(1,1) = IBCL IBLK(2,1) = N IBLK(3,1) = N IIB = 2 C WRITE(*,*)'IBLK',IBLK(1,1),IBLK(2,1),IBLK(3,1),'IIB',IIB C X1 = ALEFT X2 = X1 + H C C POINT BY POINT EQUATIONS SET UP C DO 100 K = 1,NP-1 C WRITE(*,*)'K=',K C CALCULATE THE "EXACT" SOLUTION FOR THIS POINT ATEMP = AA1 + BB1 * EXP(X1/EPS) WRITE(*,*)'X',X1, ATEMP ATEMP= AA2 *EXP(-X1/DSQRT(GAMMA)) + & BB2 *EXP(X1/DSQRT(GAMMA)) WRITE(*,*)'2 AT X1 ', ATEMP ATEMP = AA3 + BB3 * ERF(X1/DSQRT(2*ZETA)) WRITE(*,*)'3 AT X1 ', ATEMP C C S - BLOCK C II = IA + 2*(K - 1)*(N**2) C WRITE(*,*)'II =',II C CALL ASUB(A(II + 1),N,X1) C DO 10 I = 1,N*N A(II + I) = -A(II + I)/2.D0 C WRITE(*,*)'A',II+I,A(II+I) 10 CONTINUE DO 20 I = 1,N*N,(N + 1) A(II + I) = A(II + I) - 1/H C WRITE(*,*)'A',II+I,A(II+I) 20 CONTINUE C II = II + N*N C C R - BLOCK C CALL ASUB(A(II + 1),N,X2) C DO 30 I = 1,N*N 30 A(II + I) = -A(II + I)/2.D0 C 30 WRITE(*,*)'A',II+I,A(II+I) C DO 40 I = 1,N*N,(N + 1) 40 A(II + I) = A(II + I) + 1/H C 40 WRITE(*,*)'A',II+I,A(II+I) C C RIGHT HAND SIDE C IRHS = IBCL + (K - 1)*N + 1 CALL QSUB(RHS(IRHS),N,X1,X2) C WRITE(*,*)'RHS',(I,RHS(I),I=IRHS,IRHS+N-1) C X1 = X2 X2 = X1 + H IBLK(1,IIB) = N IBLK(2,IIB) = 2*N IBLK(3,IIB) = N C WRITE(*,*)'IBLK',IBLK(1,IIB),IBLK(2,IIB),IBLK(3,IIB),'IIB',IIB IIB = IIB + 1 C 100 CONTINUE ATEMP = AA1 + BB1 * EXP(ARIGHT/EPS) WRITE(*,*)'1 AT X2 ', ATEMP ATEMP= AA2 *EXP(-ARIGHT/DSQRT(GAMMA)) + & BB2 *EXP(ARIGHT/DSQRT(GAMMA)) WRITE(*,*)'2 AT ARIGHT ', ATEMP ATEMP = AA3 + BB3 * ERF(ARIGHT/DSQRT(2*ZETA)) WRITE(*,*)'3 AT ARIGHT ', ATEMP C C BOUNDARY CONDITIONS II C IA = IBCL*N + 2*(NP - 1)*N*N + 1 IRHS = IBCL + (K - 1)*N + 1 CALL BBSUB(A(IA),N,IBCR,RHS(IRHS)) C WRITE(*,*)'A L',IA,A(IA),IA+1,A(IA+1) C WRITE(*,*)'RHS L',IRHS,RHS(IRHS) IBLK(1,IIB) = IBCR IBLK(2,IIB) = N IBLK(3,IIB) = N C WRITE(*,*)'IBLK',IBLK C WRITE(*,*)'IBLK',IBLK(1,IIB),IBLK(2,IIB),IBLK(3,IIB),'IIB',IIB C C WRITE(*,*) C WRITE(*,*)(I,A(I),I=1,IA+9) C WRITE(*,*) C WRITE(*,*)(I,RHS(I),I=1,IRHS+2) WRITE(*,*) WRITE(*,*)' * * * * SOLUTION * * * * ' C LENA = 4000 ITR = 'N' LDB = 200 IFAIL = 0 ILENA = 0 NRO = 0 C WRITE(*,*) 'A BEFORE LUFACT ' DO 765 I=1,NBL ILENA = ILENA + IBLK(1,I) * IBLK(2,I) NRO = NRO + IBLK(1,I) 765 CONTINUE C WRITE(*,*) NRO,NBL,LENA C WRITE(*,*) 'NRO NBL LENA IBLK' C WRITE(*,*) NRO,NBL,LENA,IBLK DO 711 I=1,LENA 711 AH(I) = A(I) ITR = 'N' C WRITE(*,*)(I,A(I),I=1,ILENA) IBLK(3,NBL) = 0 CALL _GEABD(NRO,NBL,IBLK,A,ILENA,IPIV,INDEX,IFAIL) C WRITE(*,*) 'A AFTER LUFACT ' C WRITE(*,*)(I,A(I),I=1,ILENA) C WRITE(*,*) 'BACK FROM SGABD2', IRHS IF (IFAIL.NE.0) THEN WRITE(*,*) 'IFAIL=',IFAIL, INDEX STOP ENDIF IFAIL = 0 C WRITE(*,*) NRO,NBL,IBLK,LENA C WRITE(*,*) 'IPIV=',IPIV C WRITE(*,*) 'RHS BEFORE SOLVE ' C WRITE(*,*)(I,RHS(I),I=1,IRHS+2) CALL _SOLN(NRO,NBL,IBLK,A,ILENA,IPIV,RHS,NRO,IONE) WRITE(*,*)'IFAIL',IFAIL IF(IFAIL .NE. 0) STOP WRITE(*,*) 'RHS AFTER SOLVE ' WRITE(*,*)(I,RHS(I),I=1,IRHS+2) C CALL SBEMM(NRO,NBL,IBLK,AH,LENA,ZERO,RHS,RHS2,IONE) C CALL CEQAXB('N',NRO,NBL,IBLK,AH,LENA, C & RHS,LDB,IONE,RHS2,LDB) C WRITE(*,*)(I,RHS2(I),I=1,IRHS+2) END C SUBROUTINE ASUB(S,N,X) IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION S(N,N) COMMON /PARAMS/ EPS, GAMMA, ZETA, B1, B2, B21,B22,B31,B32 C DO 10 I = 1,N DO 10 J = 1,N 10 S(I,J) = 0.0D0 C S(1,2) = 1.0D0 S(2,2) = 1.0D0/EPS S(3,4) = 1.0D0 S(4,3) = 1.0D0/GAMMA S(5,6) = 1.0D0 S(6,6) = -(X/ZETA) C END C SUBROUTINE QSUB(R,N,X1,X2) IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION R(N) C DO 10 I = 1,N 10 R(I) = 0.0D0 C RETURN END C SUBROUTINE BASUB(B,N,NB,R) IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION B(NB,N), R(NB) COMMON /PARAMS/ EPS, GAMMA, ZETA, B1, B2, B21,B22,B31,B32 C DO 10 I = 1,N DO 10 J = 1,N 10 B(I,J) = 0.0D0 C B(1,1) = 1.0D0 B(2,3) = 1.0D0 B(3,5) = 1.0D0 C DO 20 I = 1,NB 20 R(I) = 0.0D0 C R(1) = B1 R(2) = B21 R(3) = B31 C RETURN END C SUBROUTINE BBSUB(B,N,NB,R) IMPLICIT DOUBLE PRECISION(A-H,O-Z) DIMENSION B(NB,N), R(NB) COMMON /PARAMS/ EPS, GAMMA, ZETA, B1, B2, B21,B22,B31,B32 C DO 10 I = 1,N DO 10 J = 1,N 10 B(I,J) = 0.0D0 C B(1,1) = 1.0D0 B(2,3) = 1.0D0 B(3,5) = 1.0D0 C DO 20 I = 1,NB 20 R(I) = 0.0D0 C R(1) = B2 R(2) = B22 R(3) = B32 C RETURN END EOF