ScaLAPACK  2.0.2
ScaLAPACK: Scalable Linear Algebra PACKage
pssytd2.f
Go to the documentation of this file.
00001       SUBROUTINE PSSYTD2( UPLO, N, A, IA, JA, DESCA, D, E, TAU, WORK,
00002      $                    LWORK, INFO )
00003 *
00004 *  -- ScaLAPACK auxiliary routine (version 1.7) --
00005 *     University of Tennessee, Knoxville, Oak Ridge National Laboratory,
00006 *     and University of California, Berkeley.
00007 *     May 1, 1997
00008 *
00009 *     .. Scalar Arguments ..
00010       CHARACTER          UPLO
00011       INTEGER            IA, INFO, JA, LWORK, N
00012 *     ..
00013 *     .. Array Arguments ..
00014       INTEGER            DESCA( * )
00015       REAL                A( * ), D( * ), E( * ), TAU( * ), WORK( * )
00016 *     ..
00017 *
00018 *  Purpose
00019 *  =======
00020 *
00021 *  PSSYTD2 reduces a real symmetric matrix sub( A ) to symmetric
00022 *  tridiagonal form T by an orthogonal similarity transformation:
00023 *  Q' * sub( A ) * Q = T, where sub( A ) = A(IA:IA+N-1,JA:JA+N-1).
00024 *
00025 *  Notes
00026 *  =====
00027 *
00028 *  Each global data object is described by an associated description
00029 *  vector.  This vector stores the information required to establish
00030 *  the mapping between an object element and its corresponding process
00031 *  and memory location.
00032 *
00033 *  Let A be a generic term for any 2D block cyclicly distributed array.
00034 *  Such a global array has an associated description vector DESCA.
00035 *  In the following comments, the character _ should be read as
00036 *  "of the global array".
00037 *
00038 *  NOTATION        STORED IN      EXPLANATION
00039 *  --------------- -------------- --------------------------------------
00040 *  DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
00041 *                                 DTYPE_A = 1.
00042 *  CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
00043 *                                 the BLACS process grid A is distribu-
00044 *                                 ted over. The context itself is glo-
00045 *                                 bal, but the handle (the integer
00046 *                                 value) may vary.
00047 *  M_A    (global) DESCA( M_ )    The number of rows in the global
00048 *                                 array A.
00049 *  N_A    (global) DESCA( N_ )    The number of columns in the global
00050 *                                 array A.
00051 *  MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
00052 *                                 the rows of the array.
00053 *  NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
00054 *                                 the columns of the array.
00055 *  RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
00056 *                                 row of the array A is distributed.
00057 *  CSRC_A (global) DESCA( CSRC_ ) The process column over which the
00058 *                                 first column of the array A is
00059 *                                 distributed.
00060 *  LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
00061 *                                 array.  LLD_A >= MAX(1,LOCr(M_A)).
00062 *
00063 *  Let K be the number of rows or columns of a distributed matrix,
00064 *  and assume that its process grid has dimension p x q.
00065 *  LOCr( K ) denotes the number of elements of K that a process
00066 *  would receive if K were distributed over the p processes of its
00067 *  process column.
00068 *  Similarly, LOCc( K ) denotes the number of elements of K that a
00069 *  process would receive if K were distributed over the q processes of
00070 *  its process row.
00071 *  The values of LOCr() and LOCc() may be determined via a call to the
00072 *  ScaLAPACK tool function, NUMROC:
00073 *          LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
00074 *          LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
00075 *  An upper bound for these quantities may be computed by:
00076 *          LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
00077 *          LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
00078 *
00079 *  Arguments
00080 *  =========
00081 *
00082 *  UPLO    (global input) CHARACTER
00083 *          Specifies whether the upper or lower triangular part of the
00084 *          symmetric matrix sub( A ) is stored:
00085 *          = 'U':  Upper triangular
00086 *          = 'L':  Lower triangular
00087 *
00088 *  N       (global input) INTEGER
00089 *          The number of rows and columns to be operated on, i.e. the
00090 *          order of the distributed submatrix sub( A ). N >= 0.
00091 *
00092 *  A       (local input/local output) REAL pointer into the
00093 *          local memory to an array of dimension (LLD_A,LOCc(JA+N-1)).
00094 *          On entry, this array contains the local pieces of the
00095 *          symmetric distributed matrix sub( A ).  If UPLO = 'U', the
00096 *          leading N-by-N upper triangular part of sub( A ) contains
00097 *          the upper triangular part of the matrix, and its strictly
00098 *          lower triangular part is not referenced. If UPLO = 'L', the
00099 *          leading N-by-N lower triangular part of sub( A ) contains the
00100 *          lower triangular part of the matrix, and its strictly upper
00101 *          triangular part is not referenced. On exit, if UPLO = 'U',
00102 *          the diagonal and first superdiagonal of sub( A ) are over-
00103 *          written by the corresponding elements of the tridiagonal
00104 *          matrix T, and the elements above the first superdiagonal,
00105 *          with the array TAU, represent the orthogonal matrix Q as a
00106 *          product of elementary reflectors; if UPLO = 'L', the diagonal
00107 *          and first subdiagonal of sub( A ) are overwritten by the
00108 *          corresponding elements of the tridiagonal matrix T, and the
00109 *          elements below the first subdiagonal, with the array TAU,
00110 *          represent the orthogonal matrix Q as a product of elementary
00111 *          reflectors. See Further Details.
00112 *
00113 *  IA      (global input) INTEGER
00114 *          The row index in the global array A indicating the first
00115 *          row of sub( A ).
00116 *
00117 *  JA      (global input) INTEGER
00118 *          The column index in the global array A indicating the
00119 *          first column of sub( A ).
00120 *
00121 *  DESCA   (global and local input) INTEGER array of dimension DLEN_.
00122 *          The array descriptor for the distributed matrix A.
00123 *
00124 *  D       (local output) REAL array, dimension LOCc(JA+N-1)
00125 *          The diagonal elements of the tridiagonal matrix T:
00126 *          D(i) = A(i,i). D is tied to the distributed matrix A.
00127 *
00128 *  E       (local output) REAL array, dimension LOCc(JA+N-1)
00129 *          if UPLO = 'U', LOCc(JA+N-2) otherwise. The off-diagonal
00130 *          elements of the tridiagonal matrix T: E(i) = A(i,i+1) if
00131 *          UPLO = 'U', E(i) = A(i+1,i) if UPLO = 'L'. E is tied to the
00132 *          distributed matrix A.
00133 *
00134 *  TAU     (local output) REAL, array, dimension
00135 *          LOCc(JA+N-1). This array contains the scalar factors TAU of
00136 *          the elementary reflectors. TAU is tied to the distributed
00137 *          matrix A.
00138 *
00139 *  WORK    (local workspace/local output) REAL array,
00140 *                                                    dimension (LWORK)
00141 *          On exit, WORK( 1 ) returns the minimal and optimal LWORK.
00142 *
00143 *  LWORK   (local or global input) INTEGER
00144 *          The dimension of the array WORK.
00145 *          LWORK is local input and must be at least
00146 *          LWORK >= 3*N.
00147 *
00148 *          If LWORK = -1, then LWORK is global input and a workspace
00149 *          query is assumed; the routine only calculates the minimum
00150 *          and optimal size for all work arrays. Each of these
00151 *          values is returned in the first entry of the corresponding
00152 *          work array, and no error message is issued by PXERBLA.
00153 *
00154 *  INFO    (local output) INTEGER
00155 *          = 0:  successful exit
00156 *          < 0:  If the i-th argument is an array and the j-entry had
00157 *                an illegal value, then INFO = -(i*100+j), if the i-th
00158 *                argument is a scalar and had an illegal value, then
00159 *                INFO = -i.
00160 *
00161 *  Further Details
00162 *  ===============
00163 *
00164 *  If UPLO = 'U', the matrix Q is represented as a product of elementary
00165 *  reflectors
00166 *
00167 *     Q = H(n-1) . . . H(2) H(1).
00168 *
00169 *  Each H(i) has the form
00170 *
00171 *     H(i) = I - tau * v * v'
00172 *
00173 *  where tau is a real scalar, and v is a real vector with
00174 *  v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on exit in
00175 *  A(ia:ia+i-2,ja+i), and tau in TAU(ja+i-1).
00176 *
00177 *  If UPLO = 'L', the matrix Q is represented as a product of elementary
00178 *  reflectors
00179 *
00180 *     Q = H(1) H(2) . . . H(n-1).
00181 *
00182 *  Each H(i) has the form
00183 *
00184 *     H(i) = I - tau * v * v'
00185 *
00186 *  where tau is a real scalar, and v is a real vector with
00187 *  v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is stored on exit in
00188 *  A(ia+i+1:ia+n-1,ja+i-1), and tau in TAU(ja+i-1).
00189 *
00190 *  The contents of sub( A ) on exit are illustrated by the following
00191 *  examples with n = 5:
00192 *
00193 *  if UPLO = 'U':                       if UPLO = 'L':
00194 *
00195 *    (  d   e   v2  v3  v4 )              (  d                  )
00196 *    (      d   e   v3  v4 )              (  e   d              )
00197 *    (          d   e   v4 )              (  v1  e   d          )
00198 *    (              d   e  )              (  v1  v2  e   d      )
00199 *    (                  d  )              (  v1  v2  v3  e   d  )
00200 *
00201 *  where d and e denote diagonal and off-diagonal elements of T, and vi
00202 *  denotes an element of the vector defining H(i).
00203 *
00204 *  Alignment requirements
00205 *  ======================
00206 *
00207 *  The distributed submatrix sub( A ) must verify some alignment proper-
00208 *  ties, namely the following expression should be true:
00209 *  ( MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA ) with
00210 *  IROFFA = MOD( IA-1, MB_A ) and ICOFFA = MOD( JA-1, NB_A ).
00211 *
00212 *  =====================================================================
00213 *
00214 *     .. Parameters ..
00215       INTEGER            BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
00216      $                   LLD_, MB_, M_, NB_, N_, RSRC_
00217       PARAMETER          ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1,
00218      $                     CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
00219      $                     RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
00220       REAL               HALF, ONE, ZERO
00221       PARAMETER          ( HALF = 0.5E+0, ONE = 1.0E+0, ZERO = 0.0E+0 )
00222 *     ..
00223 *     .. Local Scalars ..
00224       LOGICAL            LQUERY, UPPER
00225       INTEGER            IACOL, IAROW, ICOFFA, ICTXT, II, IK, IROFFA, J,
00226      $                   JJ, JK, JN, LDA, LWMIN, MYCOL, MYROW, NPCOL,
00227      $                   NPROW
00228       REAL               ALPHA, TAUI
00229 *     ..
00230 *     .. External Subroutines ..
00231       EXTERNAL           BLACS_ABORT, BLACS_GRIDINFO, CHK1MAT, INFOG2L,
00232      $                   PXERBLA, SAXPY, SGEBR2D, SGEBS2D,
00233      $                   SLARFG, SSYMV, SSYR2
00234 *     ..
00235 *     .. External Functions ..
00236       LOGICAL            LSAME
00237       REAL               SDOT
00238       EXTERNAL           LSAME, SDOT
00239 *     ..
00240 *     .. Intrinsic Functions ..
00241       INTRINSIC          REAL
00242 *     ..
00243 *     .. Executable Statements ..
00244 *
00245 *     Get grid parameters
00246 *
00247       ICTXT = DESCA( CTXT_ )
00248       CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
00249 *
00250 *     Test the input parameters
00251 *
00252       INFO = 0
00253       IF( NPROW.EQ.-1 ) THEN
00254          INFO = -(600+CTXT_)
00255       ELSE
00256          UPPER = LSAME( UPLO, 'U' )
00257          CALL CHK1MAT( N, 2, N, 2, IA, JA, DESCA, 6, INFO )
00258          LWMIN = 3 * N
00259 *
00260          WORK( 1 ) = REAL( LWMIN )
00261          LQUERY = ( LWORK.EQ.-1 )
00262          IF( INFO.EQ.0 ) THEN
00263             IROFFA = MOD( IA-1, DESCA( MB_ ) )
00264             ICOFFA = MOD( JA-1, DESCA( NB_ ) )
00265             IF( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN
00266                INFO = -1
00267             ELSE IF( IROFFA.NE.ICOFFA ) THEN
00268                INFO = -5
00269             ELSE IF( DESCA( MB_ ).NE.DESCA( NB_ ) ) THEN
00270                INFO = -(600+NB_)
00271             ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN
00272                INFO = -11
00273             END IF
00274          END IF
00275       END IF
00276 *
00277       IF( INFO.NE.0 ) THEN
00278          CALL PXERBLA( ICTXT, 'PSSYTD2', -INFO )
00279          CALL BLACS_ABORT( ICTXT, 1 )
00280          RETURN
00281       ELSE IF( LQUERY ) THEN
00282          RETURN
00283       END IF
00284 *
00285 *     Quick return if possible
00286 *
00287       IF( N.LE.0 )
00288      $   RETURN
00289 *
00290 *     Compute local information
00291 *
00292       LDA = DESCA( LLD_ )
00293       CALL INFOG2L( IA, JA, DESCA, NPROW, NPCOL, MYROW, MYCOL, II, JJ,
00294      $              IAROW, IACOL )
00295 *
00296       IF( UPPER ) THEN
00297 *
00298 *        Process(IAROW, IACOL) owns block to be reduced
00299 *
00300          IF( MYCOL.EQ.IACOL ) THEN
00301             IF( MYROW.EQ.IAROW ) THEN
00302 *
00303 *              Reduce the upper triangle of sub( A )
00304 *
00305                DO 10 J = N-1, 1, -1
00306                   IK = II + J - 1
00307                   JK = JJ + J - 1
00308 *
00309 *                 Generate elementary reflector H(i) = I - tau * v * v'
00310 *                 to annihilate A(IA:IA+J-1,JA:JA+J-1)
00311 *
00312                   CALL SLARFG( J, A( IK+JK*LDA ), A( II+JK*LDA ), 1,
00313      $                         TAUI )
00314                   E( JK+1 ) = A( IK+JK*LDA )
00315 *
00316                   IF( TAUI.NE.ZERO ) THEN
00317 *
00318 *                    Apply H(i) from both sides to
00319 *                    A(IA:IA+J-1,JA:JA+J-1)
00320 *
00321                      A( IK+JK*LDA ) = ONE
00322 *
00323 *                    Compute  x := tau * A * v  storing x in TAU(1:i)
00324 *
00325                      CALL SSYMV( UPLO, J, TAUI, A( II+(JJ-1)*LDA ),
00326      $                           LDA, A( II+JK*LDA ), 1, ZERO,
00327      $                           TAU( JJ ), 1 )
00328 *
00329 *                    Compute  w := x - 1/2 * tau * (x'*v) * v
00330 *
00331                      ALPHA = -HALF*TAUI*SDOT( J, TAU( JJ ), 1,
00332      $                                        A( II+JK*LDA ), 1 )
00333                      CALL SAXPY( J, ALPHA, A( II+JK*LDA ), 1,
00334      $                           TAU( JJ ), 1 )
00335 *
00336 *                    Apply the transformation as a rank-2 update:
00337 *                       A := A - v * w' - w * v'
00338 *
00339                      CALL SSYR2( UPLO, J, -ONE, A( II+JK*LDA ), 1,
00340      $                           TAU( JJ ), 1, A( II+(JJ-1)*LDA ),
00341      $                           LDA )
00342                      A( IK+JK*LDA ) = E( JK+1 )
00343                   END IF
00344 *
00345 *                 Copy D, E, TAU to broadcast them columnwise.
00346 *
00347                   D( JK+1 ) = A( IK+1+JK*LDA )
00348                   WORK( J+1 ) = D( JK+1 )
00349                   WORK( N+J+1 ) = E( JK+1 )
00350                   TAU( JK+1 ) = TAUI
00351                   WORK( 2*N+J+1 ) = TAU( JK+1 )
00352 *
00353    10          CONTINUE
00354                D( JJ ) = A( II+(JJ-1)*LDA )
00355                WORK( 1 ) = D( JJ )
00356                WORK( N+1 ) = ZERO
00357                WORK( 2*N+1 ) = ZERO
00358 *
00359                CALL SGEBS2D( ICTXT, 'Columnwise', ' ', 1, 3*N, WORK, 1 )
00360 *
00361             ELSE
00362                CALL SGEBR2D( ICTXT, 'Columnwise', ' ', 1, 3*N, WORK, 1,
00363      $                       IAROW, IACOL )
00364                DO 20 J = 2, N
00365                   JN = JJ + J - 1
00366                   D( JN ) = WORK( J )
00367                   E( JN ) = WORK( N+J )
00368                   TAU( JN ) = WORK( 2*N+J )
00369    20          CONTINUE
00370                D( JJ ) = WORK( 1 )
00371             END IF
00372          END IF
00373 *
00374       ELSE
00375 *
00376 *        Process (IAROW, IACOL) owns block to be factorized
00377 *
00378          IF( MYCOL.EQ.IACOL ) THEN
00379             IF( MYROW.EQ.IAROW ) THEN
00380 *
00381 *              Reduce the lower triangle of sub( A )
00382 *
00383                DO 30 J = 1, N - 1
00384                   IK = II + J - 1
00385                   JK = JJ + J - 1
00386 *
00387 *                 Generate elementary reflector H(i) = I - tau * v * v'
00388 *                 to annihilate A(IA+J-JA+2:IA+N-1,JA+J-1)
00389 *
00390                   CALL SLARFG( N-J, A( IK+1+(JK-1)*LDA ),
00391      $                         A( IK+2+(JK-1)*LDA ), 1, TAUI )
00392                   E( JK ) = A( IK+1+(JK-1)*LDA )
00393 *
00394                   IF( TAUI.NE.ZERO ) THEN
00395 *
00396 *                    Apply H(i) from both sides to
00397 *                    A(IA+J-JA+1:IA+N-1,JA+J+1:JA+N-1)
00398 *
00399                      A( IK+1+(JK-1)*LDA ) = ONE
00400 *
00401 *                    Compute  x := tau * A * v  storing y in TAU(i:n-1)
00402 *
00403                      CALL SSYMV( UPLO, N-J, TAUI, A( IK+1+JK*LDA ),
00404      $                           LDA, A( IK+1+(JK-1)*LDA ), 1,
00405      $                           ZERO, TAU( JK ), 1 )
00406 *
00407 *                    Compute  w := x - 1/2 * tau * (x'*v) * v
00408 *
00409                      ALPHA = -HALF*TAUI*SDOT( N-J, TAU( JK ), 1,
00410      $                        A( IK+1+(JK-1)*LDA ), 1 )
00411                      CALL SAXPY( N-J, ALPHA, A( IK+1+(JK-1)*LDA ),
00412      $                           1, TAU( JK ), 1 )
00413 *
00414 *                    Apply the transformation as a rank-2 update:
00415 *                       A := A - v * w' - w * v'
00416 *
00417                      CALL SSYR2( UPLO, N-J, -ONE,
00418      $                           A( IK+1+(JK-1)*LDA ), 1,
00419      $                           TAU( JK ), 1, A( IK+1+JK*LDA ),
00420      $                           LDA )
00421                      A( IK+1+(JK-1)*LDA ) = E( JK )
00422                   END IF
00423 *
00424 *                 Copy D(JK), E(JK), TAU(JK) to broadcast them
00425 *                 columnwise.
00426 *
00427                   D( JK ) = A( IK+(JK-1)*LDA )
00428                   WORK( J ) = D( JK )
00429                   WORK( N+J ) = E( JK )
00430                   TAU( JK ) = TAUI
00431                   WORK( 2*N+J ) = TAU( JK )
00432    30          CONTINUE
00433                JN = JJ + N - 1
00434                D( JN ) = A( II+N-1+(JN-1)*LDA )
00435                WORK( N ) = D( JN )
00436                TAU( JN ) = ZERO
00437                WORK( 2*N ) = ZERO
00438 *
00439                CALL SGEBS2D( ICTXT, 'Columnwise', ' ', 1, 3*N-1, WORK,
00440      $                            1 )
00441 *
00442             ELSE
00443                CALL SGEBR2D( ICTXT, 'Columnwise', ' ', 1, 3*N-1, WORK,
00444      $                       1, IAROW, IACOL )
00445                DO 40 J = 1, N - 1
00446                   JN = JJ + J - 1
00447                   D( JN ) = WORK( J )
00448                   E( JN ) = WORK( N+J )
00449                   TAU( JN ) = WORK( 2*N+J )
00450    40          CONTINUE
00451                JN = JJ + N - 1
00452                D( JN ) = WORK( N )
00453                TAU( JN ) = ZERO
00454             END IF
00455          END IF
00456       END IF
00457 *
00458       WORK( 1 ) = REAL( LWMIN )
00459 *
00460       RETURN
00461 *
00462 *     End of PSSYTD2
00463 *
00464       END