=====================
== ScaLAPACK 1.8.0 ==
=====================
Release date: Th 04/05/2007.
This material is based upon work supported by the National Science Foundation
under Grant No. NSF-0444486.
* ScaLAPACK 1.8.0: What's new
* Thanks
* Developer list
* More details
=============================================
== ScaLAPACK 1.8.0: What's new since 1.7.0 ==
=============================================
1) externalisation of the LAPACK routines: starting from 1.8.0, you NEED the
LAPACK library installed on your machine in order to link/run a ScaLAPACK
application
2) add p[cz]gesvd, the complex version of the SVD driver
3) add p[sdcz]lawrite and p[sdcz]laread, tools for easy I/O
4) new directory EXAMPLE that contains a ScaLAPACK example in the 4
precisions
5) bug fixes
=======================================
== Thanks for bug-report/patches to ==
=======================================
Ake Sandgren
HPC2N, Umea University
Robert Granat
Umea University
Greg Henry
Intel
Alan Edelman, Sudarshan Raghunathan
Interactive Super Computing
Yasuhiro Nakahara
Canon Inc.
Mark Fahey
ORNL
Desheng Wang
Caltech
===========================
= Principal Investigators =
===========================
Jim Demmel (University of California at Berkeley, USA)
Jack Dongarra (University of Tennessee and ORNL, USA)
===================================================
== ScaLAPACK developers involved in this release ==
===================================================
Peng Du (University of Tennessee, USA)
Julie Langou (University of Tennessee, USA)
Julien Langou (University of Colorado at Denver and Health Sciences Center, USA)
Piotr Luszczek (University of Tennessee, USA)
Osni Marques (Lawrence Berkeley National Laboratory, USA)
==================
== More details ==
==================
----------------------------------------------------------------------------
1) externalisation of the LAPACK library
----------------------------------------------------------------------------
Comments:
=========
Until 1.7.x, the LAPACK routines were included in ScaLAPACK; they have been
removed starting from 1.8.0. Consequently, the ScaLAPACK library needs to
link with an existing LAPACK library in order to work properly.
Changes:
========
Remove all the LAPACK routines from TOOLS/LAPACK
----------------------------------------------------------------------------
2) add the complex version of the SVD driver
----------------------------------------------------------------------------
Comments:
=========
Code contributed by Peng Du (Graduate Research Assistant at UTK, Fall
2005), supervised by Julien.
Changes:
========
A SRC/pcgesvd.f
A SRC/pzgesvd.f
M SRC/Makefile
----------------------------------------------------------------------------
3) add p[sdcz]lawrite and p[sdcz]laread: they have been adapted from the
ScaEx example by Antoine Petitet.
----------------------------------------------------------------------------
Comments:
=========
p[sdcz]lawrite and p[sdcz]laread are in the TOOLS directory.
They provide an easy way to write/read a matrix to/from a file.
Changes:
========
M TOOLS/Makefile
A TOOLS/pclaread.f
A TOOLS/pclawrite.f
A TOOLS/pdlaread.f
A TOOLS/pdlawrite.f
A TOOLS/pslaread.f
A TOOLS/pslawrite.f
A TOOLS/pzlaread.f
A TOOLS/pzlawrite.f
---------------------------------------------------------------------------------
4) a new directory EXAMPLE that contains a ScaLAPACK example in the 4 precisions.
---------------------------------------------------------------------------------
Comments:
=========
In the EXAMPLE directory, you now have a program (provided in the 4
precisions) that solves a linear system by calling the ScaLAPACK routine
PDGESV. The input matrix and right-hand side are read from a file. The
solution is written to a file. To compile and create the example
executables (assuming that all libraries have previously been built), type
***make example***, or ***make*** if you are in the EXAMPLE directory. This
will create the four executables in the TESTING directory:
- xsscaex: for the example using single precision,
- xdscaex: for the example using double precision,
- xcscaex: for the example using complex precision,
- xzscaex: for the example using double complex precision,
and copy the input files into the TESTING directory. The input files are
CSCAEXMAT.dat, CSCAEXRHS.dat, DSCAEXMAT.dat, DSCAEXRHS.dat, SCAEX.dat,
SSCAEXMAT.dat, SSCAEXRHS.dat, ZSCAEXMAT.dat and ZSCAEXRHS.dat.
To run the example programs using MPI, type
mpirun -np <number of processes> xsscaex
(This is the single precision example.)
The results will be written to CSCAEXSOL.dat for xcscaex, DSCAEXSOL.dat for
xdscaex, SSCAEXSOL.dat for xsscaex and ZSCAEXSOL.dat for xzscaex.
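The naming pattern of the executables and data files above can be summarised
with a small helper. This is only an illustrative sketch in Python, not part
of ScaLAPACK: the function names are hypothetical, the number of processes is
left to the caller, and it assumes that SCAEX.dat (listed among the input
files) is shared by the four programs.

```python
# Illustrative helper, not part of ScaLAPACK: derives the file names listed
# above from the one-letter precision prefix (s, d, c, z).
PRECISIONS = {"s": "single", "d": "double", "c": "complex", "z": "double complex"}


def example_files(prefix):
    """Return the executable, input files and output file for one precision."""
    up = prefix.upper()
    return {
        "executable": f"x{prefix}scaex",
        # Assumption: SCAEX.dat is common to the four example programs.
        "inputs": [f"{up}SCAEXMAT.dat", f"{up}SCAEXRHS.dat", "SCAEX.dat"],
        "output": f"{up}SCAEXSOL.dat",
    }


def mpirun_command(prefix, nprocs):
    """Build the mpirun argument list; nprocs is chosen by the user."""
    return ["mpirun", "-np", str(nprocs), example_files(prefix)["executable"]]


if __name__ == "__main__":
    for p in PRECISIONS:
        print(" ".join(mpirun_command(p, 4)), "->", example_files(p)["output"])
```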
Changes:
========
A EXAMPLE
A EXAMPLE/CSCAEXMAT.dat
A EXAMPLE/CSCAEXRHS.dat
A EXAMPLE/DSCAEXMAT.dat
A EXAMPLE/DSCAEXRHS.dat
A EXAMPLE/Makefile
A EXAMPLE/SCAEX.dat
A EXAMPLE/SSCAEXMAT.dat
A EXAMPLE/SSCAEXRHS.dat
A EXAMPLE/ZSCAEXMAT.dat
A EXAMPLE/ZSCAEXRHS.dat
A EXAMPLE/pcscaex.f
A EXAMPLE/pdscaex.f
A EXAMPLE/pdscaexinfo.f
A EXAMPLE/psscaex.f
A EXAMPLE/pzscaex.f
M Makefile
----------------------------------------------------------------------------
4) bug fixes
----------------------------------------------------------------------------
---------------------------------------------------
4.1) Add a define for crot and zrot in SRC/pblas.h
---------------------------------------------------
Changes:
========
M SRC/pblas.h
-------------------------------------------------------
4.2) Patches provided by Ake Sandgren and Robert Granat
-------------------------------------------------------
All of these were found with the PathScale compiler using -trapuv -O0 -g,
which initializes everything to NaN and turns FPE traps on.
Comments:
=========
The set of patches does two things.
1 - reduce the usage of uninitialized variables
2 - fix a couple of incorrect calls to BLACS (bad LDA)
* gehdrv *
The gehdrv patch is just the complete patch related to
https://icl.cs.utk.edu/lapack-forum/viewtopic.php?p=1153#1153
* pxsepinfo *
pxsepinfo doesn't initialize THRESH when INFO != 0.
* pxlahrd and lasorte *
The lahqr patch, plus a fix to lasorte needed by lahqr, which used to get
IERR != 0 back from lasorte.
The T2 = T1*V2 and T3 = T1*V3 moves are needed due to uninitialized
data.
The 2 changed IF-statements were introduced to make getting and
sending SMALLA consistent.
The ISTOP change at the bottom is a copy of the corresponding statement
at the top of the loop.
The initialization of VCOPY and SMALLA is necessary.
lasorte couldn't handle a situation where the top S(1,1) eigenvalue was
real.
This set of patches has been tested, as can be seen at
https://icl.cs.utk.edu/lapack-forum/viewtopic.php?p=1196#1196
The current pxlahrd fix might not be the best. Maybe something should be
done in pxlarfg instead since alpha isn't set in all cases there, like
myrow != ixrow for row distribution and likewise for column distribution.
* pxlasmsub *
pxlasmsub destroys irow1/icol1 in the "find some norm of the local H"
part.
* pxrot *
pxrot used incorrect LDA values for buff in several places. It is not clear
whether the intention was to have buff Mx1 or 1xM, but it shouldn't really
matter.
* PBLAS/pxscal *
PBLAS/pxscal must not test ALPHA unless it is really going to be used
since ScaLAPACK routines sometimes call pxscal with ALPHA uninitialized
when myrow != Xrow/mycol != Xcol.
* pxstein *
pxstein must initialize ONENRM since it isn't always initialized in the
"IF( NBLK.EQ.IBLOCK( NEXT-1 ) .AND. NBLK.NE.OLNBLK ) THEN" case before
being used in the "IF( TMPFAC.GT.ODM18 ) THEN" case. Maybe setting to
ZERO is wrong but it's not worse than the original code.
* pxtrevc *
The pxtrevc and pxevcdriver patches just fix incorrect LDA parameters passed
to BLACS routines.
Changes:
========
M PBLAS/SRC/pcscal_.c
M PBLAS/SRC/pdscal_.c
M PBLAS/SRC/psscal_.c
M PBLAS/SRC/pzscal_.c
M SLmake.inc
M SRC/dlasorte.f
M SRC/pclahqr.f
M SRC/pclahrd.f
M SRC/pclasmsub.f
M SRC/pcrot.c
M SRC/pcstein.f
M SRC/pctrevc.f
M SRC/pdlahqr.f
M SRC/pdlahrd.f
M SRC/pdlasmsub.f
M SRC/pdstein.f
M SRC/pslahqr.f
M SRC/pslahrd.f
M SRC/pslasmsub.f
M SRC/psstein.f
M SRC/pzlahqr.f
M SRC/pzlahrd.f
M SRC/pzlasmsub.f
M SRC/pzrot.c
M SRC/pzstein.f
M SRC/pztrevc.f
M SRC/slasorte.f
M TESTING/EIG/pcevcdriver.f
M TESTING/EIG/pcgehdrv.f
M TESTING/EIG/pcgsepreq.f
M TESTING/EIG/pdgehdrv.f
M TESTING/EIG/pdgsepreq.f
M TESTING/EIG/psgehdrv.f
M TESTING/EIG/psgsepreq.f
M TESTING/EIG/pzevcdriver.f
M TESTING/EIG/pzgehdrv.f
M TESTING/EIG/pzgsepreq.f
----------------
4.3) pxinvdriver
----------------
Comments:
=========
This follows up on the latest modification (see below), where we increased
the size of the integer workspace in the rectangular case. We have now
carried the new integer workspace size calculation over to the tester, so
that the LIWORK given by the tester to PxGETRI is big enough.
Changes:
========
M TESTING/LIN/pcinvdriver.f
M TESTING/LIN/pdinvdriver.f
M TESTING/LIN/psinvdriver.f
M TESTING/LIN/pzinvdriver.f
-----------------------------------------------------------------
4.4) Correct the integer workspace (IWORK) calculation in PxGETRI
-----------------------------------------------------------------
Comments:
=========
Bug report sent by Desheng Wang from Caltech on scalapack@cs.utk.edu,
Mon. May 1st, 2006.
Fix:
Replace lines 221-222:
LIWMIN = NQ + MAX( ICEIL( ICEIL( MP, DESCA( MB_ ) ),
$ LCM / NPROW ), DESCA( NB_ ) )
By:
LIWMIN = NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW
$ + MOD ( IA - 1, DESCA( MB_ ) ), DESCA ( NB_ ),
$ MYCOL, DESCA( CSRC_ ), NPCOL ) +
$ MAX ( DESCA( MB_ ) * ICEIL ( ICEIL(
$ NUMROC( DESCA( M_ ) + DESCA( MB_ ) * NPROW,
$ DESCA( MB_ ), MYROW, DESCA( RSRC_ ), NPROW ),
$ DESCA( MB_ ) ), LCM / NPROW ), DESCA( NB_ ) )
Yep, slightly more complex...
The error in the first computation is that it misinterprets the statement
in PxLAPIV. The formula for the integer workspace calculation in PxLAPIV is
LDW = LOCc( M_P + MOD(IP-1, MB_P) ) +
MB_P * CEIL( CEIL(LOCr(M_P)/MB_P) / (LCM/NPROW) )
where M_P is the global length of IPIV. But IPIV is slightly bigger
than A; the global size of IPIV is:
MP = DESCA( M_ ) + DESCA( MB_ ) * NPROW (and not DESCA(M_)).
The other quantities are given by
- M_P is the global length of the pivot vector:
  MP = DESCA( M_ ) + DESCA( MB_ ) * NPROW
- I_P is IA:
  I_P = IA
- MB_P is the block size used for the block cyclic distribution of the
  pivot vector:
  MB_P = DESCA( MB_ )
- LOCc( . ) is
  NUMROC( . , DESCA( NB_ ), MYCOL, DESCA( CSRC_ ), NPCOL )
- LOCr( . ) is
  NUMROC( . , DESCA( MB_ ), MYROW, DESCA( RSRC_ ), NPROW )
- CEIL( X / Y ) is
  ICEIL( X, Y )
- LCM is
  LCM = ILCM( NPROW, NPCOL )
and this gives the new formula to compute the integer workspace.
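To make the substitution concrete, the new LIWMIN expression can be
evaluated outside Fortran. The following is a minimal Python sketch, not the
ScaLAPACK source: iceil, ilcm and numroc are re-implementations written here
to mirror the behaviour of ICEIL, ILCM and NUMROC for positive arguments, and
the function name liwmin_getri and its argument names are hypothetical.

```python
from math import gcd


def iceil(inum, idenom):
    """Ceiling of inum / idenom for positive integers (mirrors ICEIL)."""
    return (inum + idenom - 1) // idenom


def ilcm(m, n):
    """Least common multiple of two positive integers (mirrors ILCM)."""
    return m * n // gcd(m, n)


def numroc(n, nb, iproc, isrcproc, nprocs):
    """Local count of a block-cyclically distributed dimension of global
    length n owned by process iproc (mirrors NUMROC)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    count = (nblocks // nprocs) * nb
    extra = nblocks % nprocs
    if mydist < extra:
        count += nb
    elif mydist == extra:
        count += n % nb
    return count


def liwmin_getri(m, mb, nb, ia, myrow, mycol, rsrc, csrc, nprow, npcol):
    """New integer workspace formula, term by term as in the fix above."""
    lcm = ilcm(nprow, npcol)
    mp = m + mb * nprow  # global length of the pivot vector
    locc = numroc(mp + (ia - 1) % mb, nb, mycol, csrc, npcol)
    locr = numroc(mp, mb, myrow, rsrc, nprow)
    return locc + max(mb * iceil(iceil(locr, mb), lcm // nprow), nb)


if __name__ == "__main__":
    # Example: M = 10, MB = NB = 2, IA = 1, process (0,0) of a 2 x 3 grid.
    print(liwmin_getri(10, 2, 2, 1, 0, 0, 0, 0, 2, 3))
```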
Changes:
========
M SRC/pcgetri.f
M SRC/pdgetri.f
M SRC/psgetri.f
M SRC/pzgetri.f
-----------------------------------------------------------------
4.5) Fix an invalid memory access (segmentation fault) in pxlahqr
-----------------------------------------------------------------
Comments:
=========
Bug report from Yasuhiro Nakahara (Canon Inc.) on 03/13/2006.
Patch from Greg Henry (Intel) and Mark Fahey (ORNL).
Description: the pzlahqr routine aborted due to a segmentation fault.
I found an invalid memory access at line 525 in pzlahqr.f.
In the DO-loop, with II=1, S1(1, 0) was accessed.
Greg said:
> There is an easy fix for this- the idea of exceptional shifts is to
> just try something outside the norm based on the size of the diagonal
> elements. The offending part can be removed from the code without a
> loss of generality. I think I may be able to come with an alternate
> solution.
move from
DO 20 II = 2*JBLK, 1, -1
S1( II, II ) = CONST*( CABS1( S1( II, II ) )+
$ CABS1( S1( II, II-1 ) ) )
S1( II, II-1 ) = ZERO
S1( II-1, II ) = ZERO
20 CONTINUE
(with problem when II=1 ...) to
DO 20 II = 2*JBLK, 2, -1
S1( II, II ) = CONST*( CABS1( S1( II, II ) )+
$ CABS1( S1( II, II-1 ) ) )
S1( II, II-1 ) = ZERO
S1( II-1, II ) = ZERO
20 CONTINUE
S1( 1, 1 ) = CONST*CABS1( S1( 1, 1 ) )
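The effect of changing the loop bound can be checked by listing the S1
entries each version of the loop refers to. This is a small Python sketch
for illustration only; it assumes S1 is declared with lower bound 1 in both
dimensions, and the helper names are hypothetical.

```python
def touched_indices(jblk, lower_bound):
    """1-based (row, col) pairs of S1 referenced by the DO 20 loop running
    II = 2*JBLK down to lower_bound with step -1."""
    touched = []
    for ii in range(2 * jblk, lower_bound - 1, -1):
        touched += [(ii, ii), (ii, ii - 1), (ii - 1, ii)]
    return touched


def out_of_bounds(indices):
    """Entries whose row or column index is below 1 (assumed lower bound)."""
    return [(i, j) for (i, j) in indices if i < 1 or j < 1]


if __name__ == "__main__":
    jblk = 2
    old = touched_indices(jblk, 1)
    # The fixed loop stops at II = 2 and handles S1(1,1) separately.
    new = touched_indices(jblk, 2) + [(1, 1)]
    print("old loop, invalid references:", out_of_bounds(old))
    print("new loop, invalid references:", out_of_bounds(new))
```

With the old bound, the II=1 iteration refers to S1(1,0) and S1(0,1); with
the new bound, every reference stays inside the array.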
Note that this part of the code is not exercised by the testing.
(So the bug was hard to find.)
Changes:
========
M SRC/pclahqr.f
M SRC/pdlahqr.f
M SRC/pslahqr.f
M SRC/pzlahqr.f
----------------------------------------------------------------------
4.6) Correct typo in the p[s,d,c,z]gesvd files for the declaration of
P[S,D,C,Z]ORMBRQLN
----------------------------------------------------------------------
Changes:
========
M SRC/pcgesvd.f
M SRC/pdgesvd.f
M SRC/psgesvd.f
M SRC/pzgesvd.f
-----------------------------------------------------------------
4.7) Correct typo in comment + description of workspace.
-----------------------------------------------------------------
Comments:
=========
When RANGE='V', work needs to be of dimension 3
Changes:
========
M SRC/pcheevx.f
M SRC/pchegvx.f
M SRC/pdsyevx.f
M SRC/pdsygvx.f
M SRC/pssyevx.f
M SRC/pssygvx.f
M SRC/pzheevx.f
M SRC/pzhegvx.f
-----------------------------------------------------------------
4.8) Correction of a typo in the work comment.
-----------------------------------------------------------------
Changes:
========
M SRC/pdsyevx.f
----------------------------------------------------------------------
4.9) modify the workspace size of xBDSQR to follow the revision 184 of
LAPACK
----------------------------------------------------------------------
Comments:
=========
Modify the workspace size of xBDSQR to follow the revision 184 of LAPACK.
The workspace size of xBDSQR has moved from
* WDBDSQR = MAX(1, 4*SIZE )
to
* WDBDSQR = MAX(1, 2*SIZE + (2*SIZE - 4)*MAX(WANTU, WANTVT))
and is now back to
* WDBDSQR = MAX(1, 4*SIZE )
so the SVD of ScaLAPACK follows suit (at least, let us take the max of both
until LAPACK settles on its workspace size).
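Taking the max of the two expressions is easy to check numerically. The
sketch below is illustrative Python, not the ScaLAPACK code; the function
names are hypothetical and WANTU/WANTVT are assumed to be 0 or 1.

```python
def wbdsqr_first(size):
    """First (and current) form: MAX(1, 4*SIZE)."""
    return max(1, 4 * size)


def wbdsqr_second(size, wantu, wantvt):
    """Intermediate form: MAX(1, 2*SIZE + (2*SIZE - 4)*MAX(WANTU, WANTVT))."""
    return max(1, 2 * size + (2 * size - 4) * max(wantu, wantvt))


def wbdsqr_safe(size, wantu, wantvt):
    """Max of both forms, large enough whichever one xBDSQR expects."""
    return max(wbdsqr_first(size), wbdsqr_second(size, wantu, wantvt))


if __name__ == "__main__":
    for size in (0, 1, 5, 100):
        print(size, wbdsqr_first(size), wbdsqr_second(size, 1, 1),
              wbdsqr_safe(size, 1, 1))
```

For non-negative SIZE the second form never exceeds 4*SIZE, so under these
assumptions the max coincides with MAX(1, 4*SIZE).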
Changes:
========
M SRC/psgesvd.f
M SRC/pcgesvd.f
M SRC/pzgesvd.f
M SRC/pdgesvd.f
-----------------------------------------------------------
4.10) correct a bug in the workspace utilisation of p_gesvd
-----------------------------------------------------------
Comments:
=========
[Julien/Osni] correct a bug in the workspace utilisation of p_gesvd. In
the case jobU='V' and jobVT='V', the routine has good pointers;
otherwise the pointers in the workspace were shifted as if matrices U
and VT existed, which implied out-of-bounds references for the values
stored at the end of the workspace. There were also a few problems at
the end of the code with some sizes in the case of rectangular matrices.
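The kind of pointer shift described above can be pictured with a toy
workspace layout. This Python sketch is purely hypothetical: the segment
names (U, VT, TAIL), their sizes and the buggy flag are invented for
illustration and do not reproduce the actual layout used in p_gesvd.

```python
def layout(segments):
    """Assign consecutive 1-based start offsets to (name, length) segments.
    Returns (offsets, total_length)."""
    offsets, pos = {}, 1
    for name, length in segments:
        offsets[name] = pos
        pos += length
    return offsets, pos - 1


def segments(size_u, size_vt, size_tail, want_u, want_vt, buggy=False):
    """Hypothetical segment list: U, VT, then a trailing array.
    With buggy=True, space for U and VT is skipped even when they are
    not requested, which shifts the trailing array."""
    segs = []
    if want_u or buggy:
        segs.append(("U", size_u))
    if want_vt or buggy:
        segs.append(("VT", size_vt))
    segs.append(("TAIL", size_tail))
    return segs


if __name__ == "__main__":
    # Neither U nor VT requested: the workspace only needs to hold TAIL.
    good_off, good_len = layout(segments(20, 30, 5, False, False))
    bad_off, bad_len = layout(segments(20, 30, 5, False, False, buggy=True))
    print("workspace length needed:", good_len)
    print("correct TAIL offset:", good_off["TAIL"])
    print("shifted TAIL offset:", bad_off["TAIL"], "(past the end)")
```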
Changes:
========
M SRC/psgesvd.f
M SRC/pdgesvd.f
------------------------------
4.11) Documentation correction
------------------------------
Comments:
=========
* SRC/p[s,d,c,z]gesv.f *
[Julien]
correction in the description of the parameter NRHS
(it's the number of columns of B not A)
* SRC/p[s,d]lared1d.f *
* SRC/p[s,d]lared2d.f *
[Julien]
The comments in the routines p[s,d]lared2d (where the initial vectors are stored by row)
were wrong (basically replace BYCOL by BYROW)
Changes:
========
M SRC/p[s,d,c,z]gesv.f
M SRC/p[s,d]lared1d.f
M SRC/p[s,d]lared2d.f
------------------------
4.12) bug in p[s/d]lahrd
------------------------
Comments:
=========
Although the Schur form returned by p[s/d]lahqr was correct (as tested by
the testing routine), the returned eigenvalues were not computed
correctly. This bug was reported by Interactive Supercomputing
(Thanks!). The bug was already found by Greg Henry in March 2002 but the
patch has never been released. Here we go.
Changes:
========
M SRC/p[s/d]lahrd.f
-----------------------------------------------------------------
4.13) Initial import from netlib
-----------------------------------------------------------------
Comments:
=========
For ScaLAPACK: ScaLAPACK 1.7 + patch
The patch contains:
PBLAS/SRC/PBtools.h     3/12/2002  Comment out CSYMM reference (line 57)
PBLAS/SRC/pblas.h       3/15/2002  Added missing crot define
SRC/psdbtrf.f           3/12/2002  Typo (DLACPY->SLACPY) in EXTERNAL declaration (line 374)
SRC/pcheevd.f           3/25/2002  Correction to LRWORK (lines 117, 248) and INFO=0 return
SRC/pzheevd.f           3/25/2002  Correction to LRWORK (lines 117, 248) and INFO=0 return
TESTING/EIG/pcseptst.f  3/15/2002  Correction to LHEEVDSIZE calculation (line 1064)
TESTING/EIG/pzseptst.f  3/15/2002  Correction to LHEEVDSIZE calculation (line 1064)
for more information, please visit:
http://www.netlib.org/scalapack/errata.html#sourcecode
Changes:
========
M PBLAS/SRC/PBtools.h
M PBLAS/SRC/pblas.h
M SRC/psdbtrf.f
M SRC/pcheevd.f
M SRC/pzheevd.f
M TESTING/EIG/pcseptst.f
M TESTING/EIG/pzseptst.f
----------------------------------------
4.14) Modification on the BLACS tar ball
----------------------------------------
Comments:
=========
For BLACS: pvmblacs + mpiblacs + blacs tester from netlib +
patch-3 + correction on the Makefile from the INSTALL directory.
For patch details, see:
http://www.netlib.org/blacs/old_errata.blacs
***make clean*** now deletes the following files:
tc_cCsameF77.o tc_fCsameF77.o tc_UseMpich.o
Changes:
========
M INSTALL/Makefile