================
== LAPACK 3.2 ==
================

Release date: Sunday, 11/16/2008.

This material is based upon work supported by the National Science Foundation,
the Department of Energy, and The MathWorks under Grant Nos. NSF-CCF-00444486,
NSF-CNS-0325873, NSF-EIA 0122599, NSF-ACI-0090127, DOE-DE-FC02-01ER25478, and
DOE-DE-FC02-06ER25768.

  * LAPACK 3.2: What's new
  * References
  * Contributor list
  * Developer list
  * Interface changes
  * More details
  * Expected additions and improvements for the future

=============================
== LAPACK 3.2: What's new  ==
=============================

(1) Extra Precise Iterative Refinement: New linear solvers that "guarantee"
fully accurate answers (or give a warning that the answer cannot be trusted).
The matrix types supported in this release are: GE (general), SY (symmetric),
PO (positive definite), HE (Hermitian), and GB (general band) in all the
relevant precisions.  See reference [3] below.
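The idea behind (1) fits in a few lines of Python. This is a minimal illustration, not the LAPACK driver interface. The function names are hypothetical. The residual b - A*x is computed exactly with fractions.Fraction, standing in for the XBLAS extra-precise residual, and the stopping test is simplified. The returned flag plays the role of the "answer cannot be trusted" warning.

```python
from fractions import Fraction

def lu_factor(a):
    """LU with partial pivoting in double precision; L and U share one array."""
    n = len(a)
    lu = [row[:] for row in a]
    piv = list(range(n))          # row i of lu corresponds to row piv[i] of A
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv

def lu_solve(lu, piv, b):
    """Solve A x = b from the packed factors (forward, then back substitution)."""
    n = len(lu)
    y = [b[piv[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y

def exact_residual(a, x, b):
    """r = b - A x evaluated exactly in rationals, then rounded once to double."""
    return [float(Fraction(bi) - sum(Fraction(aij) * Fraction(xj)
                                     for aij, xj in zip(row, x)))
            for row, bi in zip(a, b)]

def refine(a, b, max_iter=10):
    """Return (x, converged); converged=False means x should not be trusted."""
    lu, piv = lu_factor(a)
    x = lu_solve(lu, piv, b)
    eps = 2.0 ** -52
    for _ in range(max_iter):
        dx = lu_solve(lu, piv, exact_residual(a, x, b))
        x = [xi + di for xi, di in zip(x, dx)]
        if max(abs(d) for d in dx) <= eps * max(abs(xi) for xi in x):
            return x, True
    return x, False
```

The factorization is reused for every correction, so each refinement step costs only O(n^2) once A has been factored.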

(2) XBLAS, or portable "extra precise BLAS": our new linear solvers in (1)
depend on these to perform iterative refinement. See reference [3] below.
The XBLAS will be released in a separate package. See "More details".
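A common portable way to get extra precision from ordinary double arithmetic is to carry rounding errors alongside the running result. The sketch below is the compensated dot product Dot2 of Ogita, Rump and Oishi, built from Knuth's TwoSum and Dekker's TwoProduct. It illustrates the technique only. It is not the XBLAS code or interface, and it ignores overflow and underflow.

```python
def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e

def split(a):
    """Dekker's splitting of a double into two halves of at most 26 bits."""
    c = 134217729.0 * a          # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Dekker's TwoProduct: p = fl(a * b) and p + e == a * b exactly
    (assuming no overflow or underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e

def dot2(x, y):
    """Dot product as accurate as if accumulated in twice the working precision."""
    p, s = 0.0, 0.0
    for a, b in zip(x, y):
        h, r = two_prod(a, b)
        p, q = two_sum(p, h)
        s += q + r               # collect the rounding errors separately
    return p + s
```

For x = [1e16, 1.0, -1e16] and y = [1.0, 1.0, 1.0], plain summation loses the 1.0 and returns 0.0, while dot2 returns 1.0.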

(3) Non-Negative Diagonals from Householder QR: The QR factorization routines
now guarantee that the diagonal is both real and non-negative.  Factoring a
matrix with independent standard normal entries now correctly generates an
orthogonal Q from the Haar distribution.  See reference [4] below.
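Consider a real vector x = (alpha, x2). A reflector H = I - tau*v*v' with v(1) = 1 maps x to beta*e1, where beta = +||x||. It can be formed without cancellation by writing alpha - beta as -||x2||^2/(alpha + beta) when alpha > 0. The Python sketch below shows that construction. It covers the real case only, uses hypothetical names, and does no scaling against overflow. It is not the LAPACK routine itself.

```python
import math

def house_nonneg(x):
    """Return (v, tau, beta) with v[0] = 1 and (I - tau v v^T) x = beta e1, beta >= 0.
    Real vectors only; no scaling to guard against overflow."""
    alpha, tail = x[0], x[1:]
    sigma = sum(t * t for t in tail)           # ||x2||^2
    if sigma == 0.0 and alpha >= 0.0:
        return [1.0] + [0.0] * len(tail), 0.0, alpha   # H = I already works
    beta = math.sqrt(alpha * alpha + sigma)    # beta = +||x||, never negative
    if alpha <= 0.0:
        v0 = alpha - beta                      # both terms <= 0: no cancellation
    else:
        v0 = -sigma / (alpha + beta)           # = alpha - beta, rewritten to avoid cancellation
    tau = -v0 / beta                           # = (beta - alpha) / beta
    v = [1.0] + [t / v0 for t in tail]
    return v, tau, beta

def apply_house(v, tau, y):
    """Return (I - tau v v^T) y."""
    w = tau * sum(vi * yi for vi, yi in zip(v, y))
    return [yi - w * vi for vi, yi in zip(v, y)]
```

Note that when x2 = 0 and alpha < 0, the reflector is not the identity: tau = 2 flips the sign so that beta = -alpha > 0.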

(4) High Performance QR and Householder Reflections on Low-Profile Matrices:
The auxiliary routines to apply Householder reflections (e.g. DLARFB)
automatically reduce the cost of QR from O(n^3) to O(n^2) for matrices stored
in a dense format but with a "narrow profile" (including but not limited to
band matrices) with no user interface changes.  Other users of these routines
can see similar benefits.  See reference [4] below.
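The saving in (4) comes from skipping work on zeros. The sketch below applies a single reflector from the left to a dense list-of-lists C. Rows past the last nonzero of v are never touched, and columns that are zero in the active rows are left alone. The helper names are hypothetical, and this is not DLARFB's blocked algorithm.

```python
def last_nonzero(v):
    """Index of the last nonzero entry of v, or -1 if v is all zero."""
    for i in range(len(v) - 1, -1, -1):
        if v[i] != 0.0:
            return i
    return -1

def apply_left_trimmed(v, tau, c):
    """Overwrite C with (I - tau v v^T) C, skipping rows and columns the
    reflector cannot change."""
    lastv = last_nonzero(v)
    if tau == 0.0 or lastv < 0:
        return c                              # H = I: nothing to do
    rows = range(lastv + 1)                   # rows below lastv are never modified
    lastc = -1
    for j in range(len(c[0]) if c else 0):    # last column with a nonzero in those rows
        if any(c[i][j] != 0.0 for i in rows):
            lastc = j
    for j in range(lastc + 1):
        w = tau * sum(v[i] * c[i][j] for i in rows)   # w = tau * (v^T c_j)
        for i in rows:
            c[i][j] -= w * v[i]
    return c
```

For a band matrix, every Householder vector has only about bandwidth-many nonzeros, so the work per reflector is proportional to the bandwidth rather than to n.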

(5) New fast and accurate Jacobi SVD: A high-accuracy SVD routine for dense
matrices that can compute tiny singular values to many more correct digits
than xGESVD when the matrix has columns differing widely in norm, and that
usually runs faster than xGESVD too.  See references [5,6,7] below.
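The routine in (5) is based on one-sided Jacobi rotations. The sketch below is the plain one-sided (Hestenes) Jacobi method. It rotates pairs of columns until they are numerically orthogonal, then reads off the singular values as column norms. It omits the preconditioning and other refinements of the Drmac-Veselic algorithm in references [6,7]. The tolerance and the sweep limit are arbitrary choices.

```python
import math

def jacobi_singular_values(a, tol=1e-15, max_sweeps=60):
    """One-sided (Hestenes) Jacobi: orthogonalize the columns of A by plane
    rotations; the final column norms are the singular values."""
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]                  # working copy of A
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[i][p] ** 2 for i in range(m))
                beta = sum(u[i][q] ** 2 for i in range(m))
                gamma = sum(u[i][p] * u[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue                   # columns p, q already orthogonal enough
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):             # rotation zeroes the (p, q) inner product
                    up, uq = u[i][p], u[i][q]
                    u[i][p] = c * up - s * uq
                    u[i][q] = s * up + c * uq
        if not rotated:
            break
    sv = [math.sqrt(sum(u[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sorted(sv, reverse=True)
```

Because each rotation acts only on the columns and never forms A^T A, small singular values are not swamped by the rounding errors of the large ones. This property is what the accuracy claim in (5) builds on.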

(6) Routines for Rectangular Full Packed format:  The RFP format (SF, HF, PF,
TF) enables efficient routines with optimal storage for symmetric, Hermitian or
triangular matrices.  Since these routines utilise the Level 3 BLAS, they are
generally much more efficient than the existing packed storage routines (SP,
HP, PP, TP). See reference [8] below.
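As an illustration of RFP, the sketch below stores the lower triangle of an n-by-n symmetric matrix (n even) in an (n+1)-by-(n/2) full array. That array holds exactly n*(n+1)/2 entries, the same as packed storage. The leading n/2 columns of the lower trapezoid move down one row, and the trailing (n/2)-by-(n/2) triangle is stored transposed in the top rows. The index map is meant to reproduce the N = 6, TRANSR = 'N', UPLO = 'L' example in the documentation of the RFP routines. Other cases (n odd, TRANSR = 'T', UPLO = 'U') are not covered, and the helper names are hypothetical.

```python
def rfp_index(n, r, c):
    """(row, col) of lower-triangle entry A[r][c], r >= c, inside an
    (n+1) x (n/2) RFP array; n even, TRANSR='N', UPLO='L' (0-based)."""
    if n % 2 != 0 or not 0 <= c <= r < n:
        raise ValueError("need even n and 0 <= c <= r < n")
    k = n // 2
    if c < k:
        return r + 1, c          # first k columns of the lower trapezoid, shifted down a row
    return c - k, r - k          # trailing k x k triangle, stored transposed in the top rows

def to_rfp(a):
    """Copy the lower triangle of a square symmetric matrix into RFP storage."""
    n = len(a)
    rfp = [[None] * (n // 2) for _ in range(n + 1)]
    for r in range(n):
        for c in range(r + 1):
            i, j = rfp_index(n, r, c)
            rfp[i][j] = a[r][c]
    return rfp
```

Every slot is filled exactly once. The result is therefore as compact as packed storage, yet it is an ordinary full array that the Level 3 BLAS can operate on.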

(7) Pivoted Cholesky: The Cholesky factorization with diagonal pivoting for
symmetric positive semi-definite matrices.  Pivoting is required for reliable
rank detection. See reference [9] below.
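Here is a minimal unblocked sketch of Cholesky with diagonal pivoting. At each step the largest remaining diagonal entry is moved to the pivot position. The factorization stops when every remaining diagonal entry falls below a tolerance, which gives the numerical rank. The default tolerance n*eps*max(diag(A)) and the function name are assumptions for illustration. This is not the LAPACK routine or its interface.

```python
import math

def pivoted_cholesky(a, tol=None):
    """Return (L, piv, rank) with A[piv[i]][piv[j]] ~= sum_k L[i][k] * L[j][k].
    A must be symmetric positive semi-definite; L is n x rank."""
    n = len(a)
    w = [row[:] for row in a]        # overwritten by L (lower part) and Schur complements
    piv = list(range(n))
    if tol is None:                  # assumed default: n * eps * max diagonal entry
        tol = n * 2.0 ** -52 * max((a[i][i] for i in range(n)), default=0.0)
    rank = n
    for k in range(n):
        p = max(range(k, n), key=lambda i: w[i][i])   # largest remaining diagonal
        if w[p][p] <= tol:
            rank = k                 # the rest is negligible: numerical rank is k
            break
        w[k], w[p] = w[p], w[k]      # symmetric interchange: rows ...
        for row in w:
            row[k], row[p] = row[p], row[k]   # ... and columns
        piv[k], piv[p] = piv[p], piv[k]
        w[k][k] = math.sqrt(w[k][k])
        for i in range(k + 1, n):
            w[i][k] /= w[k][k]
        for i in range(k + 1, n):    # update the full trailing block so it stays symmetric
            for j in range(k + 1, n):
                w[i][j] -= w[i][k] * w[j][k]
    L = [[w[i][j] if j <= i else 0.0 for j in range(rank)] for i in range(n)]
    return L, piv, rank
```

Without pivoting, a zero or tiny diagonal entry can appear early even though larger ones remain, and the rank decision would then depend on the ordering of A.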

(8) Mixed precision iterative refinement routines for exploiting fast single
precision hardware. On platforms like the Cell processor that do single
precision much faster than double, linear systems can be solved many times
faster. Even on commodity processors, single precision is typically about
twice as fast as double precision.  The matrix types supported in this
release are: GE (general), PO (positive definite).  See reference [1] below.
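The scheme in (8) factors once in single precision and then refines in double. Python has no native single-precision scalar type, so the sketch below emulates it by rounding through struct. It therefore shows the numerics, not the speed. The helper names, the iteration limit, and the stopping test (residual small relative to ||A||*||x||) are assumptions for illustration, not the DSGESV interface.

```python
import struct

def f32(x):
    """Round a double to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_single(a):
    """LU with partial pivoting, every operation rounded to single precision."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    piv = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, piv

def solve_single(lu, piv, b):
    """Forward and back substitution in (emulated) single precision."""
    n = len(lu)
    y = [f32(b[piv[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y

def mixed_solve(a, b, max_iter=30):
    """Return (x, iterations); iterations is None if the test never passed."""
    n = len(a)
    lu, piv = lu_single(a)                        # the O(n^3) part, in single
    x = solve_single(lu, piv, b)
    anorm = max(sum(abs(v) for v in row) for row in a)
    cte = n * 2.0 ** -52 * anorm                  # assumed stopping constant
    for it in range(max_iter):
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))   # residual in double
             for row, bi in zip(a, b)]
        if max(map(abs, r)) <= cte * max(map(abs, x)):
            return x, it
        dx = solve_single(lu, piv, r)             # O(n^2) correction, in single
        x = [xi + di for xi, di in zip(x, dx)]    # update in double
    return x, None
```

Only the O(n^2) residual and update run in double. On hardware where single precision is much faster, the O(n^3) factorization dominates the cost, and that is where the speedup comes from.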

(9) New variants of the one-sided factorizations: LU gets Right-Looking,
Left-Looking, Crout and Recursive; QR gets Right-Looking and Left-Looking;
Cholesky gets Left-Looking, Right-Looking and Top-Looking.  Depending on the
computer architecture (or the speed of the underlying BLAS), one of these
variants may be faster than the original LAPACK implementation.
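The variants in (9) compute the same factors in different orders. For Cholesky, the unblocked sketch below contrasts two loop orders. The right-looking loop updates the whole trailing matrix right after each column. The left-looking loop updates a column with all earlier columns just before factoring it. LAPACK's variants are blocked and call the Level 3 BLAS; these two functions only illustrate the loop orderings.

```python
import math

def cholesky_right_looking(a):
    """Right-looking: after computing column k, immediately update the
    whole trailing submatrix (lower triangle only)."""
    n = len(a)
    l = [row[:] for row in a]
    for k in range(n):
        l[k][k] = math.sqrt(l[k][k])
        for i in range(k + 1, n):
            l[i][k] /= l[k][k]
        for j in range(k + 1, n):
            for i in range(j, n):
                l[i][j] -= l[i][k] * l[j][k]
    return [[l[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]

def cholesky_left_looking(a):
    """Left-looking: column j is updated with all previously computed
    columns only when it is about to be factored."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l
```

The right-looking form writes to the large trailing block at every step. The left-looking form mostly reads earlier columns and writes one column at a time. That difference in memory traffic is why one variant may outrun the other on a given machine.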

(10) More robust DQDS: Fixed some rare convergence failures for the bidiagonal
DQDS SVD routine.

(11) Better documentation for the multishift Hessenberg QR algorithm with
aggressive early deflation, and various improvements of the code.  See
reference [2] below.

================
== References ==
================

[1] Alfredo Buttari, Jack Dongarra, Julie Langou, Julien Langou, Piotr
Luszczek, and Jakub Kurzak. Mixed Precision Iterative Refinement Techniques for
the Solution of Dense Linear Systems.  International Journal of High Performance
Computing Applications, 21(4):457-466, 2007.

[2] Ralph Byers.  LAPACK 3.1 xHSEQR: Tuning and Implementation Notes on the
Small Bulge Multi-shift QR Algorithm with Aggressive Early Deflation. LAPACK
Working Note 187, May 2007.

[3] James Demmel, Yozo Hida, William Kahan, Xiaoye S. Li, Sonil Mukherjee, and
E. Jason Riedy.  Error Bounds from Extra Precise Iterative Refinement.  ACM
Transactions on Mathematical Software (TOMS), 32(2):325-351, 2006. (Also
LAWN-165).

[4] James W. Demmel, Mark Hoemmen, Yozo Hida, and E. Jason Riedy.  Non-Negative
Diagonals and High Performance on Low-Profile Matrices from Householder QR.
LAPACK Working Note 203, May 2008.

[5] Zlatko Drmac.  A global convergence proof of cyclic Jacobi methods with
block rotations. LAPACK Working Note 196, December 2007.

[6] Zlatko Drmac and Kresimir Veselic.  New fast and accurate Jacobi SVD
algorithm: I.  SIAM Journal on Matrix Analysis and Applications,
29(4):1322-1342, 2007. (Also LAWN-169).

[7] Zlatko Drmac and Kresimir Veselic.  New fast and accurate Jacobi SVD
algorithm: II.  SIAM Journal on Matrix Analysis and Applications,
29(4):1343-1362, 2007. (Also LAWN-170).

[8] Fred G. Gustavson, Jerzy Wasniewski, and Jack J. Dongarra.  Rectangular
Full Packed Format for Cholesky's Algorithm: Factorization, Solution and
Inversion.  LAPACK Working Note 199, April 2008.

[9] Craig Lucas.  LAPACK-Style Codes for Level 2 and 3 Pivoted Cholesky
Factorizations.  LAPACK Working Note 161, February 2004.

==================
== Contributors ==
==================

    Ralph Byers (University of Kansas, USA)
    Zlatko Drmac (University of Zagreb, Croatia)
    Peng Du (University of Tennessee, Knoxville, USA)
    Fred Gustavson (IBM Watson Research Center, NY, USA)
    Craig Lucas (University of Manchester / NAG Ltd., UK)
    Kresimir Veselic (Fernuniversitaet Hagen, Hagen, Germany)
    Jerzy Wasniewski (Technical University of Denmark, Lyngby, Copenhagen, Denmark)

======================================
== Thanks for bug-report/patches to ==
======================================

   Fernando Guevara (Dept. of Mathematics, University of Utah)

=============================
== Principal Investigators ==
=============================

    Jim Demmel (University of California at Berkeley, USA)
    Jack Dongarra (University of Tennessee and ORNL, USA)

================================================
== LAPACK developers involved in this release ==
================================================

    Deaglan Halligan (University of California at Berkeley, USA)
    Sven Hammarling (NAG Ltd., UK)
    Yozo Hida (University of California at Berkeley, USA)
    Daniel Kressner (ETH Zurich, Switzerland)
    Julie Langou (University of Tennessee, USA)
    Julien Langou (University of Colorado Denver, USA)
    Osni Marques (Lawrence Berkeley Laboratory, USA)
    E. Jason Riedy (University of California at Berkeley, USA)
    Edward Smyth (NAG Ltd., UK)

================================================
== XBLAS developers involved in this release ==
================================================

    David Bailey (Lawrence Berkeley Laboratory, USA)
    Deaglan Halligan (University of California at Berkeley, USA)
    Greg Henry (Intel)
    Yozo Hida (University of California at Berkeley, USA)
    Jimmy Iskandar (University of California at Berkeley, USA)
    William Kahan (University of California at Berkeley, USA)
    Anil Kapur (University of California at Berkeley, USA)
    Suh Y. Kang (University of California at Berkeley, USA)
    Xiaoye Li (Lawrence Berkeley Laboratory, USA)
    Sonil Mukherjee (University of California at Berkeley, USA)
    Jason Riedy (University of California at Berkeley, USA)
    Michael Martin (University of California at Berkeley, USA)
    Brandon Thompson (University of California at Berkeley, USA)
    Teresa Tung (University of California at Berkeley, USA)
    Daniel Yoo (University of California at Berkeley, USA)

=======================
== Install Procedure ==
=======================

* A Fortran 90 compiler is required.
* XBLAS and iterref integration
* VARIANTS integration

======================
== Interface change ==
======================

The interfaces of the following routines changed between LAPACK 3.1 and 3.2:
    DSGESV ZCGESV

==================
== More details ==
==================

-----------------------------------------------------------------------
(1) Extra Precise Iterative Refinement
-----------------------------------------------------------------------
The matrix types supported in this release are

1. GE (general)
2. SY (symmetric)
3. PO (positive definite)
4. HE (Hermitian)
5. GB (general band)

in all the relevant precisions.

-----------------------------------------------------------------------
(2) XBLAS, or portable "extra precise BLAS"
-----------------------------------------------------------------------

-----------------------------------------------------------------------
(3) Non-Negative Diagonals and High Performance on Low-Profile Matrices
from Householder QR
-----------------------------------------------------------------------

   * contributors: James W. Demmel, Mark Hoemmen, Yozo Hida, and E.
     Jason Riedy.

   * lapacker: Jason Riedy.

   * see: James W. Demmel, Mark Hoemmen, Yozo Hida, and E. Jason Riedy,
     "Non-Negative Diagonals and High Performance on Low-Profile
     Matrices from Householder QR", LAPACK Working Note 203,
     UCB/EECS-2008-76, May 30, 2008.

-----------------------------------------------------------------------
(4) New fast and accurate Jacobi SVD
-----------------------------------------------------------------------

   * contributors: Zlatko Drmac and Kresimir Veselic.

   * lapacker: Julien Langou.

-----------------------------------------------------------------------
(5) Rectangular Full Packed format
-----------------------------------------------------------------------

   * contributors: Fred Gustavson and Jerzy Wasniewski.

   * lapacker: Julien Langou.

-----------------------------------------------------------------------
(6) Pivoted Cholesky
-----------------------------------------------------------------------

   * contributor: Craig Lucas.

   * lapacker: Jason Riedy.

-----------------------------------------------------------------------
(7) Mixed precision iterative refinement subroutines for exploiting
fast single precision hardware
-----------------------------------------------------------------------

   * contributor: Julie Langou.

   * lapacker: Julie Langou.

-----------------------------------------------------------------------
(8) New variants of the one-sided factorizations
-----------------------------------------------------------------------

   * contributors: Peng Du and Jason Riedy.

   * lapackers: Julie Langou and Jason Riedy.

   * see:

     LAPACK QR blocked factorization (xGEQRF) is Right-Looking,
     - add the Left-Looking variant. (Peng)
     LAPACK Cholesky blocked factorization (xPOTRF) is Left-Looking,
     - add the Right-Looking variant. (Peng)
     - add the Top-Looking variant. (Peng)
     LAPACK LU blocked factorization (xGETRF) is Right-Looking,
     - add the Left-Looking variant. (Peng)
     - add the Crout variant. (Peng)
     - add the recursive variant, in Fortran 77. (Jason)

-----------------------------------------------------------------------
(9) Bug fixes for the bidiagonal SVD routine that fix some rare
convergence failures.
-----------------------------------------------------------------------

   * contributors: Osni Marques and Beresford Parlett.

   * lapackers: Osni Marques, Jim Demmel, and Julien Langou.

-----------------------------------------------------------------------
(10) New TTQRE from Ralph Byers.
-----------------------------------------------------------------------

   * contributor: Ralph Byers.

   * lapackers: Edward Smyth and Daniel Kressner.

Most of the revisions fix typographical errors, but a few have a small
effect on how the program works.  Even these are relatively minor:

        o revised the choice of the size of the deflation window
          slightly to make the code a little more robust against
          convergence failures.

        o revised the section of code that tries to reintroduce bulges
          after they have collapsed due to underflow.  The new version
          is cleaner and more robust.

        o revised xLAQR1 so that it does not assume that H(2,1) is
          real.  In the complex case, the previous version of this
          small subroutine did not quite do what it claimed to do.
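For reference, xLAQR1 works with a scalar multiple of the first column of (H - s1*I)*(H - s2*I). For an upper Hessenberg H, that column has only three nonzero entries. The Python sketch below computes them directly in complex arithmetic, so H(2,1) may be complex. It uses 0-based indexing (h[0][0] is H(1,1)) and a hypothetical name, and it does none of the scaling against overflow that a library routine would need.

```python
def shift_column(h, s1, s2):
    """First column of (H - s1 I)(H - s2 I) for an upper Hessenberg H with
    at least 3 rows; every entry, including h[1][0], may be complex."""
    h11, h21 = h[0][0], h[1][0]
    h12, h22, h32 = h[0][1], h[1][1], h[2][1]
    # (H - s2 I) e1 = (h11 - s2, h21, 0, ...); multiply that by (H - s1 I).
    return [
        (h11 - s1) * (h11 - s2) + h12 * h21,
        h21 * (h11 + h22 - s1 - s2),
        h21 * h32,
    ]
```

Nothing in the formulas requires h21 to be real. A version that silently replaced h21 by its real part would return a wrong column whenever the subdiagonal entry is genuinely complex.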


========================================================
== Expected additions and improvements for the future ==
========================================================

  * Add a new multishift QZ algorithm; see:

         Bo Kagstrom and Daniel Kressner.
         Multishift variants of the QZ algorithm with aggressive early deflation.
         SIAM J. Matrix Anal. Appl., 29(1):199-227, 2006.

  * Add a new block reordering algorithm; see:

         Daniel Kressner.
         Block algorithms for reordering standard and generalized Schur forms.
         ACM Trans. Math. Software, 32(4):521-532, 2006.

  * Add the accurate and efficient Givens rotations from David Bindel, Jim Demmel, W. Kahan, and Osni Marques.

    See http://www.cs.berkeley.edu/~demmel/Givens/ and:
         David Bindel, James Demmel, William Kahan, and Osni Marques
         On Computing Givens Rotations Reliably and Efficiently.
         ACM Transactions on Mathematical Software (TOMS)
         Volume 28, Issue 2, 2002.  Pages: 206-238.

  * Change the default Cholesky factorization in SRC from right-looking to
    left-looking, and move left-looking to the VARIANTS directory.

  * Add some recursive variants for QR and Cholesky.

  * Remove IEEE=.FALSE. in DQDS (DLASQ3, SLASQ3 of Osni).

  * Look at the Matlab laundry list (sent by Penny Anderson).

  * Support more matrix types for extra-precise iterative refinement.
    Matrix types SB (symmetric band), PB (positive definite band), HB
    (Hermitian band), and packed storage.  Tridiagonal types such as GT
    (general tridiagonal) are also on the wish list but first we need to derive
    adequate test cases.