|
ScaLAPACK
2.0.2
ScaLAPACK: Scalable Linear Algebra PACKage
|
00001 SUBROUTINE PDLARFG( N, ALPHA, IAX, JAX, X, IX, JX, DESCX, INCX, 00002 $ TAU ) 00003 * 00004 * -- ScaLAPACK auxiliary routine (version 1.7) -- 00005 * University of Tennessee, Knoxville, Oak Ridge National Laboratory, 00006 * and University of California, Berkeley. 00007 * May 1, 1997 00008 * 00009 * .. Scalar Arguments .. 00010 INTEGER IAX, INCX, IX, JAX, JX, N 00011 DOUBLE PRECISION ALPHA 00012 * .. 00013 * .. Array Arguments .. 00014 INTEGER DESCX( * ) 00015 DOUBLE PRECISION TAU( * ), X( * ) 00016 * .. 00017 * 00018 * Purpose 00019 * ======= 00020 * 00021 * PDLARFG generates a real elementary reflector H of order n, such 00022 * that 00023 * 00024 * H * sub( X ) = H * ( x(iax,jax) ) = ( alpha ), H' * H = I. 00025 * ( x ) ( 0 ) 00026 * 00027 * where alpha is a scalar, and sub( X ) is an (N-1)-element real 00028 * distributed vector X(IX:IX+N-2,JX) if INCX = 1 and X(IX,JX:JX+N-2) if 00029 * INCX = DESCX(M_). H is represented in the form 00030 * 00031 * H = I - tau * ( 1 ) * ( 1 v' ) , 00032 * ( v ) 00033 * 00034 * where tau is a real scalar and v is a real (N-1)-element 00035 * vector. 00036 * 00037 * If the elements of sub( X ) are all zero, then tau = 0 and H is 00038 * taken to be the unit matrix. 00039 * 00040 * Otherwise 1 <= tau <= 2. 00041 * 00042 * Notes 00043 * ===== 00044 * 00045 * Each global data object is described by an associated description 00046 * vector. This vector stores the information required to establish 00047 * the mapping between an object element and its corresponding process 00048 * and memory location. 00049 * 00050 * Let A be a generic term for any 2D block cyclicly distributed array. 00051 * Such a global array has an associated description vector DESCA. 00052 * In the following comments, the character _ should be read as 00053 * "of the global array". 00054 * 00055 * NOTATION STORED IN EXPLANATION 00056 * --------------- -------------- -------------------------------------- 00057 * DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case, 00058 * DTYPE_A = 1. 00059 * CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating 00060 * the BLACS process grid A is distribu- 00061 * ted over. The context itself is glo- 00062 * bal, but the handle (the integer 00063 * value) may vary. 00064 * M_A (global) DESCA( M_ ) The number of rows in the global 00065 * array A. 00066 * N_A (global) DESCA( N_ ) The number of columns in the global 00067 * array A. 00068 * MB_A (global) DESCA( MB_ ) The blocking factor used to distribute 00069 * the rows of the array. 00070 * NB_A (global) DESCA( NB_ ) The blocking factor used to distribute 00071 * the columns of the array. 00072 * RSRC_A (global) DESCA( RSRC_ ) The process row over which the first 00073 * row of the array A is distributed. 00074 * CSRC_A (global) DESCA( CSRC_ ) The process column over which the 00075 * first column of the array A is 00076 * distributed. 00077 * LLD_A (local) DESCA( LLD_ ) The leading dimension of the local 00078 * array. LLD_A >= MAX(1,LOCr(M_A)). 00079 * 00080 * Let K be the number of rows or columns of a distributed matrix, 00081 * and assume that its process grid has dimension p x q. 00082 * LOCr( K ) denotes the number of elements of K that a process 00083 * would receive if K were distributed over the p processes of its 00084 * process column. 00085 * Similarly, LOCc( K ) denotes the number of elements of K that a 00086 * process would receive if K were distributed over the q processes of 00087 * its process row. 00088 * The values of LOCr() and LOCc() may be determined via a call to the 00089 * ScaLAPACK tool function, NUMROC: 00090 * LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ), 00091 * LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). 00092 * An upper bound for these quantities may be computed by: 00093 * LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A 00094 * LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A 00095 * 00096 * Because vectors may be viewed as a subclass of matrices, a 00097 * distributed vector is considered to be a distributed matrix. 00098 * 00099 * Arguments 00100 * ========= 00101 * 00102 * N (global input) INTEGER 00103 * The global order of the elementary reflector. N >= 0. 00104 * 00105 * ALPHA (local output) DOUBLE PRECISION 00106 * On exit, alpha is computed in the process scope having the 00107 * vector sub( X ). 00108 * 00109 * IAX (global input) INTEGER 00110 * The global row index in X of X(IAX,JAX). 00111 * 00112 * JAX (global input) INTEGER 00113 * The global column index in X of X(IAX,JAX). 00114 * 00115 * X (local input/local output) DOUBLE PRECISION, pointer into the 00116 * local memory to an array of dimension (LLD_X,*). This array 00117 * contains the local pieces of the distributed vector sub( X ). 00118 * Before entry, the incremented array sub( X ) must contain 00119 * the vector x. On exit, it is overwritten with the vector v. 00120 * 00121 * IX (global input) INTEGER 00122 * The row index in the global array X indicating the first 00123 * row of sub( X ). 00124 * 00125 * JX (global input) INTEGER 00126 * The column index in the global array X indicating the 00127 * first column of sub( X ). 00128 * 00129 * DESCX (global and local input) INTEGER array of dimension DLEN_. 00130 * The array descriptor for the distributed matrix X. 00131 * 00132 * INCX (global input) INTEGER 00133 * The global increment for the elements of X. Only two values 00134 * of INCX are supported in this version, namely 1 and M_X. 00135 * INCX must not be zero. 00136 * 00137 * TAU (local output) DOUBLE PRECISION array, dimension LOCc(JX) 00138 * if INCX = 1, and LOCr(IX) otherwise. This array contains the 00139 * Householder scalars related to the Householder vectors. 00140 * TAU is tied to the distributed matrix X. 00141 * 00142 * ===================================================================== 00143 * 00144 * .. Parameters .. 00145 INTEGER BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_, 00146 $ LLD_, MB_, M_, NB_, N_, RSRC_ 00147 PARAMETER ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1, 00148 $ CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6, 00149 $ RSRC_ = 7, CSRC_ = 8, LLD_ = 9 ) 00150 DOUBLE PRECISION ONE, ZERO 00151 PARAMETER ( ONE = 1.0D+0, ZERO = 0.0D+0 ) 00152 * .. 00153 * .. Local Scalars .. 00154 INTEGER ICTXT, IIAX, INDXTAU, IXCOL, IXROW, J, JJAX, 00155 $ KNT, MYCOL, MYROW, NPCOL, NPROW 00156 DOUBLE PRECISION BETA, RSAFMN, SAFMIN, XNORM 00157 * .. 00158 * .. External Subroutines .. 00159 EXTERNAL BLACS_GRIDINFO, DGEBR2D, DGEBS2D, PDSCAL, 00160 $ INFOG2L, PDNRM2 00161 * .. 00162 * .. External Functions .. 00163 DOUBLE PRECISION DLAMCH, DLAPY2 00164 EXTERNAL DLAMCH, DLAPY2 00165 * .. 00166 * .. Intrinsic Functions .. 00167 INTRINSIC ABS, SIGN 00168 * .. 00169 * .. Executable Statements .. 00170 * 00171 * Get grid parameters. 00172 * 00173 ICTXT = DESCX( CTXT_ ) 00174 CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL ) 00175 * 00176 IF( INCX.EQ.DESCX( M_ ) ) THEN 00177 * 00178 * sub( X ) is distributed across a process row. 00179 * 00180 CALL INFOG2L( IX, JAX, DESCX, NPROW, NPCOL, MYROW, MYCOL, 00181 $ IIAX, JJAX, IXROW, IXCOL ) 00182 * 00183 IF( MYROW.NE.IXROW ) 00184 $ RETURN 00185 * 00186 * Broadcast X(IAX,JAX) across the process row. 00187 * 00188 IF( MYCOL.EQ.IXCOL ) THEN 00189 J = IIAX+(JJAX-1)*DESCX( LLD_ ) 00190 CALL DGEBS2D( ICTXT, 'Rowwise', ' ', 1, 1, X( J ), 1 ) 00191 ALPHA = X( J ) 00192 ELSE 00193 CALL DGEBR2D( ICTXT, 'Rowwise', ' ', 1, 1, ALPHA, 1, 00194 $ MYROW, IXCOL ) 00195 END IF 00196 * 00197 INDXTAU = IIAX 00198 * 00199 ELSE 00200 * 00201 * sub( X ) is distributed across a process column. 00202 * 00203 CALL INFOG2L( IAX, JX, DESCX, NPROW, NPCOL, MYROW, MYCOL, 00204 $ IIAX, JJAX, IXROW, IXCOL ) 00205 * 00206 IF( MYCOL.NE.IXCOL ) 00207 $ RETURN 00208 * 00209 * Broadcast X(IAX,JAX) across the process column. 00210 * 00211 IF( MYROW.EQ.IXROW ) THEN 00212 J = IIAX+(JJAX-1)*DESCX( LLD_ ) 00213 CALL DGEBS2D( ICTXT, 'Columnwise', ' ', 1, 1, X( J ), 1 ) 00214 ALPHA = X( J ) 00215 ELSE 00216 CALL DGEBR2D( ICTXT, 'Columnwise', ' ', 1, 1, ALPHA, 1, 00217 $ IXROW, MYCOL ) 00218 END IF 00219 * 00220 INDXTAU = JJAX 00221 * 00222 END IF 00223 * 00224 IF( N.LE.0 ) THEN 00225 TAU( INDXTAU ) = ZERO 00226 RETURN 00227 END IF 00228 * 00229 CALL PDNRM2( N-1, XNORM, X, IX, JX, DESCX, INCX ) 00230 * 00231 IF( XNORM.EQ.ZERO ) THEN 00232 * 00233 * H = I 00234 * 00235 TAU( INDXTAU ) = ZERO 00236 * 00237 ELSE 00238 * 00239 * General case 00240 * 00241 BETA = -SIGN( DLAPY2( ALPHA, XNORM ), ALPHA ) 00242 SAFMIN = DLAMCH( 'S' ) 00243 RSAFMN = ONE / SAFMIN 00244 IF( ABS( BETA ).LT.SAFMIN ) THEN 00245 * 00246 * XNORM, BETA may be inaccurate; scale X and recompute them 00247 * 00248 KNT = 0 00249 10 CONTINUE 00250 KNT = KNT + 1 00251 CALL PDSCAL( N-1, RSAFMN, X, IX, JX, DESCX, INCX ) 00252 BETA = BETA*RSAFMN 00253 ALPHA = ALPHA*RSAFMN 00254 IF( ABS( BETA ).LT.SAFMIN ) 00255 $ GO TO 10 00256 * 00257 * New BETA is at most 1, at least SAFMIN 00258 * 00259 CALL PDNRM2( N-1, XNORM, X, IX, JX, DESCX, INCX ) 00260 BETA = -SIGN( DLAPY2( ALPHA, XNORM ), ALPHA ) 00261 TAU( INDXTAU ) = ( BETA-ALPHA ) / BETA 00262 CALL PDSCAL( N-1, ONE/(ALPHA-BETA), X, IX, JX, DESCX, INCX ) 00263 * 00264 * If ALPHA is subnormal, it may lose relative accuracy 00265 * 00266 ALPHA = BETA 00267 DO 20 J = 1, KNT 00268 ALPHA = ALPHA*SAFMIN 00269 20 CONTINUE 00270 ELSE 00271 TAU( INDXTAU ) = ( BETA-ALPHA ) / BETA 00272 CALL PDSCAL( N-1, ONE/(ALPHA-BETA), X, IX, JX, DESCX, INCX ) 00273 ALPHA = BETA 00274 END IF 00275 END IF 00276 * 00277 RETURN 00278 * 00279 * End of PDLARFG 00280 * 00281 END