ScaLAPACK  2.0.2
ScaLAPACK: Scalable Linear Algebra PACKage
pzgebrd.f
Go to the documentation of this file.
00001       SUBROUTINE PZGEBRD( M, N, A, IA, JA, DESCA, D, E, TAUQ, TAUP,
00002      $                    WORK, LWORK, INFO )
00003 *
00004 *  -- ScaLAPACK routine (version 1.7) --
00005 *     University of Tennessee, Knoxville, Oak Ridge National Laboratory,
00006 *     and University of California, Berkeley.
00007 *     May 25, 2001
00008 *
00009 *     .. Scalar Arguments ..
00010       INTEGER            IA, INFO, JA, LWORK, M, N
00011 *     ..
00012 *     .. Array Arguments ..
00013       INTEGER            DESCA( * )
00014       DOUBLE PRECISION   D( * ), E( * )
00015       COMPLEX*16         A( * ), TAUP( * ), TAUQ( * ), WORK( * )
00016 *     ..
00017 *
00018 *  Purpose
00019 *  =======
00020 *
00021 *  PZGEBRD reduces a complex general M-by-N distributed matrix
00022 *  sub( A ) = A(IA:IA+M-1,JA:JA+N-1) to upper or lower bidiagonal
00023 *  form B by an unitary transformation: Q' * sub( A ) * P = B.
00024 *
00025 *  If M >= N, B is upper bidiagonal; if M < N, B is lower bidiagonal.
00026 *
00027 *  Notes
00028 *  =====
00029 *
00030 *  Each global data object is described by an associated description
00031 *  vector.  This vector stores the information required to establish
00032 *  the mapping between an object element and its corresponding process
00033 *  and memory location.
00034 *
00035 *  Let A be a generic term for any 2D block cyclicly distributed array.
00036 *  Such a global array has an associated description vector DESCA.
00037 *  In the following comments, the character _ should be read as
00038 *  "of the global array".
00039 *
00040 *  NOTATION        STORED IN      EXPLANATION
00041 *  --------------- -------------- --------------------------------------
00042 *  DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
00043 *                                 DTYPE_A = 1.
00044 *  CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
00045 *                                 the BLACS process grid A is distribu-
00046 *                                 ted over. The context itself is glo-
00047 *                                 bal, but the handle (the integer
00048 *                                 value) may vary.
00049 *  M_A    (global) DESCA( M_ )    The number of rows in the global
00050 *                                 array A.
00051 *  N_A    (global) DESCA( N_ )    The number of columns in the global
00052 *                                 array A.
00053 *  MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
00054 *                                 the rows of the array.
00055 *  NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
00056 *                                 the columns of the array.
00057 *  RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
00058 *                                 row of the array A is distributed.
00059 *  CSRC_A (global) DESCA( CSRC_ ) The process column over which the
00060 *                                 first column of the array A is
00061 *                                 distributed.
00062 *  LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
00063 *                                 array.  LLD_A >= MAX(1,LOCr(M_A)).
00064 *
00065 *  Let K be the number of rows or columns of a distributed matrix,
00066 *  and assume that its process grid has dimension p x q.
00067 *  LOCr( K ) denotes the number of elements of K that a process
00068 *  would receive if K were distributed over the p processes of its
00069 *  process column.
00070 *  Similarly, LOCc( K ) denotes the number of elements of K that a
00071 *  process would receive if K were distributed over the q processes of
00072 *  its process row.
00073 *  The values of LOCr() and LOCc() may be determined via a call to the
00074 *  ScaLAPACK tool function, NUMROC:
00075 *          LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
00076 *          LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
00077 *  An upper bound for these quantities may be computed by:
00078 *          LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
00079 *          LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
00080 *
00081 *  Arguments
00082 *  =========
00083 *
00084 *  M       (global input) INTEGER
00085 *          The number of rows to be operated on, i.e. the number of rows
00086 *          of the distributed submatrix sub( A ). M >= 0.
00087 *
00088 *  N       (global input) INTEGER
00089 *          The number of columns to be operated on, i.e. the number of
00090 *          columns of the distributed submatrix sub( A ). N >= 0.
00091 *
00092 *  A       (local input/local output) COMPLEX*16 pointer into the
00093 *          local memory to an array of dimension (LLD_A,LOCc(JA+N-1)).
00094 *          On entry, this array contains the local pieces of the
00095 *          general distributed matrix sub( A ). On exit, if M >= N,
00096 *          the diagonal and the first superdiagonal of sub( A ) are
00097 *          overwritten with the upper bidiagonal matrix B; the elements
00098 *          below the diagonal, with the array TAUQ, represent the
00099 *          unitary matrix Q as a product of elementary reflectors, and
00100 *          the elements above the first superdiagonal, with the array
00101 *          TAUP, represent the orthogonal matrix P as a product of
00102 *          elementary reflectors. If M < N, the diagonal and the first
00103 *          subdiagonal are overwritten with the lower bidiagonal
00104 *          matrix B; the elements below the first subdiagonal, with the
00105 *          array TAUQ, represent the unitary matrix Q as a product of
00106 *          elementary reflectors, and the elements above the diagonal,
00107 *          with the array TAUP, represent the orthogonal matrix P as a
00108 *          product of elementary reflectors. See Further Details.
00109 *
00110 *  IA      (global input) INTEGER
00111 *          The row index in the global array A indicating the first
00112 *          row of sub( A ).
00113 *
00114 *  JA      (global input) INTEGER
00115 *          The column index in the global array A indicating the
00116 *          first column of sub( A ).
00117 *
00118 *  DESCA   (global and local input) INTEGER array of dimension DLEN_.
00119 *          The array descriptor for the distributed matrix A.
00120 *
00121 *  D       (local output) DOUBLE PRECISION array, dimension
00122 *          LOCc(JA+MIN(M,N)-1) if M >= N; LOCr(IA+MIN(M,N)-1) otherwise.
00123 *          The distributed diagonal elements of the bidiagonal matrix
00124 *          B: D(i) = A(i,i). D is tied to the distributed matrix A.
00125 *
00126 *  E       (local output) DOUBLE PRECISION array, dimension
00127 *          LOCr(IA+MIN(M,N)-1) if M >= N; LOCc(JA+MIN(M,N)-2) otherwise.
00128 *          The distributed off-diagonal elements of the bidiagonal
00129 *          distributed matrix B:
00130 *          if m >= n, E(i) = A(i,i+1) for i = 1,2,...,n-1;
00131 *          if m < n, E(i) = A(i+1,i) for i = 1,2,...,m-1.
00132 *          E is tied to the distributed matrix A.
00133 *
00134 *  TAUQ    (local output) COMPLEX*16 array dimension
00135 *          LOCc(JA+MIN(M,N)-1). The scalar factors of the elementary
00136 *          reflectors which represent the unitary matrix Q. TAUQ is
00137 *          tied to the distributed matrix A. See Further Details.
00138 *
00139 *  TAUP    (local output) COMPLEX*16 array, dimension
00140 *          LOCr(IA+MIN(M,N)-1). The scalar factors of the elementary
00141 *          reflectors which represent the unitary matrix P. TAUP is
00142 *          tied to the distributed matrix A. See Further Details.
00143 *
00144 *  WORK    (local workspace/local output) COMPLEX*16 array,
00145 *                                                 dimension (LWORK)
00146 *          On exit, WORK( 1 ) returns the minimal and optimal LWORK.
00147 *
00148 *  LWORK   (local or global input) INTEGER
00149 *          The dimension of the array WORK.
00150 *          LWORK is local input and must be at least
00151 *          LWORK >= NB*( MpA0 + NqA0 + 1 ) + NqA0
00152 *
00153 *          where NB = MB_A = NB_A,
00154 *          IROFFA = MOD( IA-1, NB ), ICOFFA = MOD( JA-1, NB ),
00155 *          IAROW = INDXG2P( IA, NB, MYROW, RSRC_A, NPROW ),
00156 *          IACOL = INDXG2P( JA, NB, MYCOL, CSRC_A, NPCOL ),
00157 *          MpA0 = NUMROC( M+IROFFA, NB, MYROW, IAROW, NPROW ),
00158 *          NqA0 = NUMROC( N+ICOFFA, NB, MYCOL, IACOL, NPCOL ).
00159 *
00160 *          INDXG2P and NUMROC are ScaLAPACK tool functions;
00161 *          MYROW, MYCOL, NPROW and NPCOL can be determined by calling
00162 *          the subroutine BLACS_GRIDINFO.
00163 *
00164 *          If LWORK = -1, then LWORK is global input and a workspace
00165 *          query is assumed; the routine only calculates the minimum
00166 *          and optimal size for all work arrays. Each of these
00167 *          values is returned in the first entry of the corresponding
00168 *          work array, and no error message is issued by PXERBLA.
00169 *
00170 *  INFO    (global output) INTEGER
00171 *          = 0:  successful exit
00172 *          < 0:  If the i-th argument is an array and the j-entry had
00173 *                an illegal value, then INFO = -(i*100+j), if the i-th
00174 *                argument is a scalar and had an illegal value, then
00175 *                INFO = -i.
00176 *
00177 *  Further Details
00178 *  ===============
00179 *
00180 *  The matrices Q and P are represented as products of elementary
00181 *  reflectors:
00182 *
00183 *  If m >= n,
00184 *
00185 *     Q = H(1) H(2) . . . H(n)  and  P = G(1) G(2) . . . G(n-1)
00186 *
00187 *  Each H(i) and G(i) has the form:
00188 *
00189 *     H(i) = I - tauq * v * v'  and G(i) = I - taup * u * u'
00190 *
00191 *  where tauq and taup are complex scalars, and v and u are complex
00192 *  vectors;
00193 *  v(1:i-1) = 0, v(i) = 1, and v(i+1:m) is stored on exit in
00194 *  A(ia+i:ia+m-1,ja+i-1);
00195 *  u(1:i) = 0, u(i+1) = 1, and u(i+2:n) is stored on exit in
00196 *  A(ia+i-1,ja+i+1:ja+n-1);
00197 *  tauq is stored in TAUQ(ja+i-1) and taup in TAUP(ia+i-1).
00198 *
00199 *  If m < n,
00200 *
00201 *     Q = H(1) H(2) . . . H(m-1)  and  P = G(1) G(2) . . . G(m)
00202 *
00203 *  Each H(i) and G(i) has the form:
00204 *
00205 *     H(i) = I - tauq * v * v'  and G(i) = I - taup * u * u'
00206 *
00207 *  where tauq and taup are complex scalars, and v and u are complex
00208 *  vectors;
00209 *  v(1:i) = 0, v(i+1) = 1, and v(i+2:m) is stored on exit in
00210 *  A(ia+i+1:ia+m-1,ja+i-1);
00211 *  u(1:i-1) = 0, u(i) = 1, and u(i+1:n) is stored on exit in
00212 *  A(ia+i-1,ja+i:ja+n-1);
00213 *  tauq is stored in TAUQ(ja+i-1) and taup in TAUP(ia+i-1).
00214 *
00215 *  The contents of sub( A ) on exit are illustrated by the following
00216 *  examples:
00217 *
00218 *  m = 6 and n = 5 (m > n):          m = 5 and n = 6 (m < n):
00219 *
00220 *    (  d   e   u1  u1  u1 )           (  d   u1  u1  u1  u1  u1 )
00221 *    (  v1  d   e   u2  u2 )           (  e   d   u2  u2  u2  u2 )
00222 *    (  v1  v2  d   e   u3 )           (  v1  e   d   u3  u3  u3 )
00223 *    (  v1  v2  v3  d   e  )           (  v1  v2  e   d   u4  u4 )
00224 *    (  v1  v2  v3  v4  d  )           (  v1  v2  v3  e   d   u5 )
00225 *    (  v1  v2  v3  v4  v5 )
00226 *
00227 *  where d and e denote diagonal and off-diagonal elements of B, vi
00228 *  denotes an element of the vector defining H(i), and ui an element of
00229 *  the vector defining G(i).
00230 *
00231 *  Alignment requirements
00232 *  ======================
00233 *
00234 *  The distributed submatrix sub( A ) must verify some alignment proper-
00235 *  ties, namely the following expressions should be true:
00236 *  ( MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA )
00237 *
00238 *  =====================================================================
00239 *
00240 *     .. Parameters ..
00241       INTEGER            BLOCK_CYCLIC_2D, CSRC_, CTXT_, DLEN_, DTYPE_,
00242      $                   LLD_, MB_, M_, NB_, N_, RSRC_
00243       PARAMETER          ( BLOCK_CYCLIC_2D = 1, DLEN_ = 9, DTYPE_ = 1,
00244      $                     CTXT_ = 2, M_ = 3, N_ = 4, MB_ = 5, NB_ = 6,
00245      $                     RSRC_ = 7, CSRC_ = 8, LLD_ = 9 )
00246       COMPLEX*16         ONE
00247       PARAMETER          ( ONE = ( 1.0D+0, 0.0D+0 ) )
00248 *     ..
00249 *     .. Local Scalars ..
00250       LOGICAL            LQUERY
00251       CHARACTER          COLCTOP, ROWCTOP
00252       INTEGER            I, IACOL, IAROW, ICTXT, IINFO, IOFF, IPW, IPY,
00253      $                   IW, J, JB, JS, JW, K, L, LWMIN, MN, MP, MYCOL,
00254      $                   MYROW, NB, NPCOL, NPROW, NQ
00255 *     ..
00256 *     .. Local Arrays ..
00257       INTEGER            DESCWX( DLEN_ ), DESCWY( DLEN_ ), IDUM1( 1 ),
00258      $                   IDUM2( 1 )
00259 *     ..
00260 *     .. External Subroutines ..
00261       EXTERNAL           BLACS_GRIDINFO, CHK1MAT, DESCSET, PCHK1MAT,
00262      $                   PB_TOPGET, PB_TOPSET, PXERBLA, PZELSET,
00263      $                   PZGEBD2, PZGEMM, PZLABRD
00264 *     ..
00265 *     .. External Functions ..
00266       INTEGER            INDXG2L, INDXG2P, NUMROC
00267       EXTERNAL           INDXG2L, INDXG2P, NUMROC
00268 *     ..
00269 *     .. Intrinsic Functions ..
00270       INTRINSIC          DCMPLX, DBLE, MAX, MIN, MOD
00271 *     ..
00272 *     .. Executable Statements ..
00273 *
00274 *     Get grid parameters
00275 *
00276       ICTXT = DESCA( CTXT_ )
00277       CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
00278 *
00279 *     Test the input parameters
00280 *
00281       INFO = 0
00282       IF( NPROW.EQ.-1 ) THEN
00283          INFO = -(600+CTXT_)
00284       ELSE
00285          CALL CHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, INFO )
00286          IF( INFO.EQ.0 ) THEN
00287             NB = DESCA( MB_ )
00288             IOFF = MOD( IA-1, DESCA( MB_ ) )
00289             IAROW = INDXG2P( IA, NB, MYROW, DESCA( RSRC_ ), NPROW )
00290             IACOL = INDXG2P( JA, NB, MYCOL, DESCA( CSRC_ ), NPCOL )
00291             MP = NUMROC( M+IOFF, NB, MYROW, IAROW, NPROW )
00292             NQ = NUMROC( N+IOFF, NB, MYCOL, IACOL, NPCOL )
00293             LWMIN = NB*( MP+NQ+1 ) + NQ
00294 *
00295             WORK( 1 ) = DCMPLX( DBLE( LWMIN ) )
00296             LQUERY = ( LWORK.EQ.-1 )
00297             IF( IOFF.NE.MOD( JA-1, DESCA( NB_ ) ) ) THEN
00298                INFO = -5
00299             ELSE IF( NB.NE.DESCA( NB_ ) ) THEN
00300                INFO = -(600+NB_)
00301             ELSE IF( LWORK.LT.LWMIN .AND. .NOT.LQUERY ) THEN
00302                INFO = -12
00303             END IF
00304          END IF
00305          IF( LQUERY ) THEN
00306             IDUM1( 1 ) = -1
00307          ELSE
00308             IDUM1( 1 ) = 1
00309          END IF
00310          IDUM2( 1 ) = 12
00311          CALL PCHK1MAT( M, 1, N, 2, IA, JA, DESCA, 6, 1, IDUM1, IDUM2,
00312      $                  INFO )
00313       END IF
00314 *
00315       IF( INFO.LT.0 ) THEN
00316          CALL PXERBLA( ICTXT, 'PZGEBRD', -INFO )
00317          RETURN
00318       ELSE IF( LQUERY ) THEN
00319          RETURN
00320       END IF
00321 *
00322 *     Quick return if possible
00323 *
00324       MN = MIN( M, N )
00325       IF( MN.EQ.0 )
00326      $   RETURN
00327 *
00328 *     Initialize parameters.
00329 *
00330       CALL PB_TOPGET( ICTXT, 'Combine', 'Columnwise', COLCTOP )
00331       CALL PB_TOPGET( ICTXT, 'Combine', 'Rowwise',    ROWCTOP )
00332       CALL PB_TOPSET( ICTXT, 'Combine', 'Columnwise', '1-tree' )
00333       CALL PB_TOPSET( ICTXT, 'Combine', 'Rowwise',    '1-tree' )
00334 *
00335       IPY = MP * NB + 1
00336       IPW = NQ * NB + IPY
00337 *
00338       CALL DESCSET( DESCWX, M+IOFF, NB, NB, NB, IAROW, IACOL, ICTXT,
00339      $              MAX( 1, MP ) )
00340       CALL DESCSET( DESCWY, NB, N+IOFF, NB, NB, IAROW, IACOL, ICTXT,
00341      $              NB )
00342 *
00343       MP = NUMROC( M+IA-1, NB, MYROW, DESCA( RSRC_ ), NPROW )
00344       NQ = NUMROC( N+JA-1, NB, MYCOL, DESCA( CSRC_ ), NPCOL )
00345       K  = 1
00346       JB = NB - IOFF
00347       IW = IOFF + 1
00348       JW = IOFF + 1
00349 *
00350       DO 10 L = 1, MN+IOFF-NB, NB
00351          I = IA + K - 1
00352          J = JA + K - 1
00353 *
00354 *        Reduce rows and columns i:i+nb-1 to bidiagonal form and return
00355 *        the matrices X and Y which are needed to update the unreduced
00356 *        part of the matrix.
00357 *
00358          CALL PZLABRD( M-K+1, N-K+1, JB, A, I, J, DESCA, D, E, TAUQ,
00359      $                 TAUP, WORK, IW, JW, DESCWX, WORK( IPY ), IW,
00360      $                 JW, DESCWY, WORK( IPW ) )
00361 *
00362 *        Update the trailing submatrix A(i+nb:ia+m-1,j+nb:ja+n-1), using
00363 *        an update of the form  A := A - V*Y' - X*U'.
00364 *
00365          CALL PZGEMM( 'No transpose', 'No transpose', M-K-JB+1,
00366      $                N-K-JB+1, JB, -ONE, A, I+JB, J, DESCA,
00367      $                WORK( IPY ), IW, JW+JB, DESCWY, ONE, A, I+JB,
00368      $                J+JB, DESCA )
00369          CALL PZGEMM( 'No transpose', 'No transpose', M-K-JB+1,
00370      $                N-K-JB+1, JB, -ONE, WORK, IW+JB, JW, DESCWX, A, I,
00371      $                J+JB, DESCA, ONE, A, I+JB, J+JB, DESCA )
00372 *
00373 *        Copy last off-diagonal elements of B back into sub( A ).
00374 *
00375          IF( M.GE.N ) THEN
00376             JS = MIN( INDXG2L( I+JB-1, NB, 0, DESCA( RSRC_ ), NPROW ),
00377      $                MP )
00378             IF( JS.GT.0 )
00379      $         CALL PZELSET( A, I+JB-1, J+JB, DESCA, DCMPLX( E( JS ) ) )
00380          ELSE
00381             JS = MIN( INDXG2L( J+JB-1, NB, 0, DESCA( CSRC_ ), NPCOL ),
00382      $                NQ )
00383             IF( JS.GT.0 )
00384      $         CALL PZELSET( A, I+JB, J+JB-1, DESCA, DCMPLX( E( JS ) ) )
00385          END IF
00386 *
00387          K = K + JB
00388          JB = NB
00389          IW = 1
00390          JW = 1
00391          DESCWX( M_ ) = DESCWX( M_ ) - JB
00392          DESCWX( RSRC_ ) = MOD( DESCWX( RSRC_ ) + 1, NPROW )
00393          DESCWX( CSRC_ ) = MOD( DESCWX( CSRC_ ) + 1, NPCOL )
00394          DESCWY( N_ ) = DESCWY( N_ ) - JB
00395          DESCWY( RSRC_ ) = MOD( DESCWY( RSRC_ ) + 1, NPROW )
00396          DESCWY( CSRC_ ) = MOD( DESCWY( CSRC_ ) + 1, NPCOL )
00397 *
00398    10 CONTINUE
00399 *
00400 *     Use unblocked code to reduce the remainder of the matrix.
00401 *
00402       CALL PZGEBD2( M-K+1, N-K+1, A, IA+K-1, JA+K-1, DESCA, D, E, TAUQ,
00403      $              TAUP, WORK, LWORK, IINFO )
00404 *
00405       CALL PB_TOPSET( ICTXT, 'Combine', 'Columnwise', COLCTOP )
00406       CALL PB_TOPSET( ICTXT, 'Combine', 'Rowwise',    ROWCTOP )
00407 *
00408       WORK( 1 ) = DCMPLX( DBLE( LWMIN ) )
00409 *
00410       RETURN
00411 *
00412 *     End of PZGEBRD
00413 *
00414       END