|
ScaLAPACK
2.0.2
ScaLAPACK: Scalable Linear Algebra PACKage
|
00001 SUBROUTINE PDGEHD2( N, ILO, IHI, A, IA, JA, DESCA, TAU, WORK, 00002 $ LWORK, INFO ) 00003 * 00004 * -- ScaLAPACK auxiliary routine (version 1.7) -- 00005 * University of Tennessee, Knoxville, Oak Ridge National Laboratory, 00006 * and University of California, Berkeley. 00007 * May 1, 1997 00008 * 00009 * .. Scalar Arguments .. 00010 INTEGER IA, IHI, ILO, INFO, JA, LWORK, N 00011 * .. 00012 * .. Array Arguments .. 00013 INTEGER DESCA( * ) 00014 DOUBLE PRECISION A( * ), TAU( * ), WORK( * ) 00015 * .. 00016 * 00017 * Purpose 00018 * ======= 00019 * 00020 * PDGEHD2 reduces a real general distributed matrix sub( A ) 00021 * to upper Hessenberg form H by an orthogonal similarity transforma- 00022 * tion: Q' * sub( A ) * Q = H, where 00023 * sub( A ) = A(IA+N-1:IA+N-1,JA+N-1:JA+N-1). 00024 * 00025 * Notes 00026 * ===== 00027 * 00028 * Each global data object is described by an associated description 00029 * vector. This vector stores the information required to establish 00030 * the mapping between an object element and its corresponding process 00031 * and memory location. 00032 * 00033 * Let A be a generic term for any 2D block cyclicly distributed array. 00034 * Such a global array has an associated description vector DESCA. 00035 * In the following comments, the character _ should be read as 00036 * "of the global array". 00037 * 00038 * NOTATION STORED IN EXPLANATION 00039 * --------------- -------------- -------------------------------------- 00040 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 00041 * DTYPE_A = 1. 00042 * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 00043 * the BLACS process grid A is distribu- 00044 * ted over. The context itself is glo- 00045 * bal, but the handle (the integer 00046 * value) may vary. 00047 * M_A (global) DESCA( M_ ) The number of rows in the global 00048 * array A. 00049 * N_A (global) DESCA( N_ ) The number of columns in the global 00050 * array A. 00051 * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 00052 * the rows of the array. 00053 * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 00054 * the columns of the array. 00055 * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 00056 * row of the array A is distributed. 00057 * CSRC_A (global) DESCA( CSRC_ ) The process column over which the 00058 * first column of the array A is 00059 * distributed. 00060 * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 00061 * array. LLD_A >= MAX(1,LOCr(M_A)). 00062 * 00063 * Let K be the number of rows or columns of a distributed matrix, 00064 * and assume that its process grid has dimension p x q. 00065 * LOCr( K ) denotes the number of elements of K that a process 00066 * would receive if K were distributed over the p processes of its 00067 * process column. 00068 * Similarly, LOCc( K ) denotes the number of elements of K that a 00069 * process would receive if K were distributed over the q processes of 00070 * its process row. 00071 * The values of LOCr() and LOCc() may be determined via a call to the 00072 * ScaLAPACK tool function, NUMROC: 00073 * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 00074 * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 00075 * An upper bound for these quantities may be computed by: 00076 * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 00077 * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 00078 * 00079 * Arguments 00080 * ========= 00081 * 00082 * N (global input) INTEGER 00083 * The number of rows and columns to be operated on, i.e. the 00084 * order of the distributed submatrix sub( A ). N >= 0. 00085 * 00086 * ILO (global input) INTEGER 00087 * IHI (global input) INTEGER 00088 * It is assumed that sub( A ) is already upper triangular in 00089 * rows IA:IA+ILO-2 and IA+IHI:IA+N-1 and columns JA:JA+JLO-2 00090 * and JA+JHI:JA+N-1. See Further Details. If N > 0, 00091 * 1 <= ILO <= IHI <= N; otherwise set ILO = 1, IHI = N. 00092 * 00093 * A (local input/local output) DOUBLE PRECISION pointer into the 00094 * local memory to an array of dimension (LLD_A,LOCc(JA+N-1)). 00095 * On entry, this array contains the local pieces of the N-by-N 00096 * general distributed matrix sub( A ) to be reduced. On exit, 00097 * the upper triangle and the first subdiagonal of sub( A ) are 00098 * overwritten with the upper Hessenberg matrix H, and the ele- 00099 * ments below the first subdiagonal, with the array TAU, repre- 00100 * sent the orthogonal matrix Q as a product of elementary 00101 * reflectors. See Further Details. 00102 * 00103 * IA (global input) INTEGER 00104 * The row index in the global array A indicating the first 00105 * row of sub( A ). 00106 * 00107 * JA (global input) INTEGER 00108 * The column index in the global array A indicating the 00109 * first column of sub( A ). 00110 * 00111 * DESCA (global and local input) INTEGER array of dimension DLEN_. 00112 * The array descriptor for the distributed matrix A. 00113 * 00114 * TAU (local output) DOUBLE PRECISION array, dimension LOCc(JA+N-2) 00115 * The scalar factors of the elementary reflectors (see Further 00116 * Details). Elements JA:JA+ILO-2 and JA+IHI:JA+N-2 of TAU are 00117 * set to zero. TAU is tied to the distributed matrix A. 00118 * 00119 * WORK (local workspace/local output) DOUBLE PRECISION array, 00120 * dimension (LWORK) 00121 * On exit, WORK( 1 ) returns the minimal and optimal LWORK. 00122 * 00123 * LWORK (local or global input) INTEGER 00124 * The dimension of the array WORK. 00125 * LWORK is local input and must be at least 00126 * LWORK >= NB + MAX( NpA0, NB ) 00127 * 00128 * where NB = MB_A = NB_A, IROFFA = MOD( IA-1, NB ), 00129 * IAROW = INDXG2P( IA, NB, MYROW, RSRC_A, NPROW ), 00130 * NpA0 = NUMROC( IHI+IROFFA, NB, MYROW, IAROW, NPROW ), 00131 * 00132 * INDXG2P and NUMROC are ScaLAPACK tool functions; 00133 * MYROW, MYCOL, NPROW and NPCOL can be determined by calling 00134 * the subroutine BLACS_GRIDINFO. 00135 * 00136 * If LWORK = -1, then LWORK is global input and a workspace 00137 * query is assumed; the routine only calculates the minimum 00138 * and optimal size for all work arrays. Each of these 00139 * values is returned in the first entry of the corresponding 00140 * work array, and no error message is issued by PXERBLA. 00141 * 00142 * INFO (local output) INTEGER 00143 * = 0: successful exit 00144 * < 0: If the i-th argument is an array and the j-entry had 00145 * an illegal value, then INFO = -(i*100+j), if the i-th 00146 * argument is a scalar and had an illegal value, then 00147 * INFO = -i. 00148 * 00149 * Further Details 00150 * =============== 00151 * 00152 * The matrix Q is represented as a product of (ihi-ilo) elementary 00153 * reflectors 00154 * 00155 * Q = H(ilo) H(ilo+1) . . . H(ihi-1). 00156 * 00157 * Each H(i) has the form 00158 * 00159 * H(i) = I - tau * v * v' 00160 * 00161 * where tau is a real scalar, and v is a real vector with 00162 * v(1:i) = 0, v(i+1) = 1 and v(ihi+1:n) = 0; v(i+2:ihi) is stored on 00163 * exit in A(ia+ilo+i:ia+ihi-1,ja+ilo+i-2), and tau in TAU(ja+ilo+i-2). 00164 * 00165 * The contents of A(IA:IA+N-1,JA:JA+N-1) are illustrated by the follo- 00166 * wing example, with n = 7, ilo = 2 and ihi = 6: 00167 * 00168 * on entry on exit 00169 * 00170 * ( a a a a a a a ) ( a a h h h h a ) 00171 * ( a a a a a a ) ( a h h h h a ) 00172 * ( a a a a a a ) ( h h h h h h ) 00173 * ( a a a a a a ) ( v2 h h h h h ) 00174 * ( a a a a a a ) ( v2 v3 h h h h ) 00175 * ( a a a a a a ) ( v2 v3 v4 h h h ) 00176 * ( a ) ( a ) 00177 * 00178 * where a denotes an element of the original matrix sub( A ), h denotes 00179 * a modified element of the upper Hessenberg matrix H, and vi denotes 00180 * an element of the vector defining H(ja+ilo+i-2). 00181 * 00182 * Alignment requirements 00183 * ====================== 00184 * 00185 * The distributed submatrix sub( A ) must verify some alignment proper- 00186 * ties, namely the following expression should be true: 00187 * ( MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA ) 00188 * 00189 * ===================================================================== 00190 * 00191 * .. Parameters .. 00192 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 00193 $ LLD_, MB_, M_, NB_, N_, RSRC_ 00194 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 00195 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 00196 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 00197 DOUBLE PRECISION ONE 00198 PARAMETER ( ONE = 1.0D+0 ) 00199 * .. 00200 * .. Local Scalars .. 00201 LOGICAL LQUERY 00202 INTEGER I, IAROW, ICOFFA, ICTXT, IROFFA, J, K, LWMIN, 00203 $ MYCOL, MYROW, NPA0, NPCOL, NPROW 00204 DOUBLE PRECISION AII 00205 * .. 00206 * .. External Subroutines .. 00207 EXTERNAL BLACS_ABORT, BLACS_GRIDINFO, CHK1MAT, PDELSET, 00208 $ PDLARF, PDLARFG, PXERBLA 00209 * .. 00210 * .. External Functions .. 00211 INTEGER INDXG2P, NUMROC 00212 EXTERNAL INDXG2P, NUMROC 00213 * .. 00214 * .. Intrinsic Functions .. 00215 INTRINSIC DBLE, MAX, MIN, MOD 00216 * .. 00217 * .. Executable Statements .. 00218 * 00219 * Get grid parameters 00220 * 00221 ICTXT = DESCA( CTXT_ ) 00222 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 00223 * 00224 * Test the input parameters 00225 * 00226 INFO = 0 00227 IF( NPROW.EQ.-1 ) THEN 00228 INFO = -(700+CTXT_) 00229 ELSE 00230 CALL CHK1MAT( N, 1, N, 1, IA, JA, DESCA, 7, INFO ) 00231 IF( INFO.EQ.0 ) THEN 00232 IROFFA = MOD( IA-1, DESCA( MB_ ) ) 00233 ICOFFA = MOD( JA-1, DESCA( NB_ ) ) 00234 IAROW = INDXG2P( IA, DESCA( MB_ ), MYROW, DESCA( RSRC_ ), 00235 $ NPROW ) 00236 NPA0 = NUMROC( IHI+IROFFA, DESCA( MB_ ), MYROW, IAROW, 00237 $ NPROW ) 00238 LWMIN = DESCA( NB_ ) + MAX( NPA0, DESCA( NB_ ) ) 00239 * 00240 WORK( 1 ) = DBLE( LWMIN ) 00241 LQUERY = ( LWORK.EQ.-1 ) 00242 IF( ILO.LT.1 .OR. ILO.GT.MAX( 1, N ) ) THEN 00243 INFO = -2 00244 ELSE IF( IHI.LT.MIN( ILO, N ) .OR. IHI.GT.N ) THEN 00245 INFO = -3 00246 ELSE IF( IROFFA.NE.ICOFFA ) THEN 00247 INFO = -6 00248 ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN 00249 INFO = -(700+NB_) 00250 ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN 00251 INFO = -10 00252 END IF 00253 END IF 00254 END IF 00255 * 00256 IF( INFO.NE.0 ) THEN 00257 CALL PXERBLA( ICTXT, 'PDGEHD2', -INFO ) 00258 CALL BLACS_ABORT( ICTXT, 1 ) 00259 RETURN 00260 ELSE IF( LQUERY ) THEN 00261 RETURN 00262 END IF 00263 * 00264 DO 10 K = ILO, IHI-1 00265 I = IA + K - 1 00266 J = JA + K - 1 00267 * 00268 * Compute elementary reflector H(j) to annihilate 00269 * A(i+2:ihi+ia-1,j) 00270 * 00271 CALL PDLARFG( IHI-K, AII, I+1, J, A, MIN( I+2, N+IA-1 ), J, 00272 $ DESCA, 1, TAU ) 00273 CALL PDELSET( A, I+1, J, DESCA, ONE ) 00274 * 00275 * Apply H(k) to A(ia:ihi+ia-1,j+1:ihi+ja-1) from the right 00276 * 00277 CALL PDLARF( 'Right', IHI, IHI-K, A, I+1, J, DESCA, 1, TAU, A, 00278 $ IA, J+1, DESCA, WORK ) 00279 * 00280 * Apply H(j) to A(i+1:ia+ihi-1,j+1:ja+n-1) from the left 00281 * 00282 CALL PDLARF( 'Left', IHI-K, N-K, A, I+1, J, DESCA, 1, TAU, A, 00283 $ I+1, J+1, DESCA, WORK ) 00284 * 00285 CALL PDELSET( A, I+1, J, DESCA, AII ) 00286 10 CONTINUE 00287 * 00288 WORK( 1 ) = DBLE( LWMIN ) 00289 * 00290 RETURN 00291 * 00292 * End of PDGEHD2 00293 * 00294 END