1. Improvements in LAPACK 3.1.0
See Release Notes for 3.1.0 for more details

Hessenberg QR algorithm using small-bulge multishifts together with aggressive early deflation. This is an implementation of the 2003 SIAM SIAG/LA Prize winning algorithm of Braman, Byers and Mathias that significantly speeds up the nonsymmetric eigenproblem.

Improvements of the Hessenberg reduction subroutines. These accelerate the first phase of the nonsymmetric eigenvalue problem.

New MRRR eigenvalue algorithms that also support subset computations. These implementations of the 2006 SIAM SIAG/LA Prize winning algorithm of Dhillon and Parlett are also significantly more accurate than the version in LAPACK 3.0.

Mixed precision iterative refinement subroutines for exploiting fast single precision hardware. On platforms like the Cell processor that do single precision much faster than double, linear systems can be solved many times faster. Even on commodity processors there is a factor of 2 in speed between single and double precision. These are prototype routines in the sense that their interfaces might change based on user feedback.
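
The scheme these routines implement can be sketched in a few lines. The sketch below uses NumPy's solvers as stand-ins for the actual single-precision factorization kernels; the function name, tolerance, and iteration cap are illustrative, not LAPACK's.

```python
import numpy as np

def mixed_precision_solve(A, b, tol=1e-12, max_iter=30):
    """Solve A x = b by 'factoring' in single precision and refining in
    double: the expensive O(n^3) work runs in float32, the cheap O(n^2)
    residual/correction steps in float64. Names and tolerances here are
    illustrative, not LAPACK's."""
    A32 = A.astype(np.float32)                    # single-precision copy
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                             # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Correction with the single-precision data (a real implementation
        # would reuse the float32 LU factors instead of re-solving).
        d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += d
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
b = rng.standard_normal(50)
x = mixed_precision_solve(A, b)
```

The refinement loop typically converges in a handful of iterations whenever the matrix is well enough conditioned for single precision to be usable at all.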

New partial column norm updating strategy for QR factorization with column pivoting. This fixes a subtle numerical bug dating back to LINPACK that can give completely wrong results.
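
The underlying issue is easy to see: the fast O(1) downdate of a partial column norm cancels catastrophically when the eliminated entry carries almost all of the column's mass, and the fix is to detect that case and recompute the norm from scratch. A sketch, where the threshold and names are illustrative, not LAPACK's:

```python
import numpy as np

def update_norm(nu_old, r, column_tail, threshold=1e-8):
    """Downdate a partial column norm after eliminating the entry r, or
    recompute it from the remaining entries when the downdate would
    cancel. Threshold and names are illustrative, not LAPACK's."""
    t = 1.0 - (abs(r) / nu_old) ** 2              # nu_new^2 = nu_old^2 - r^2
    if t <= threshold:
        return np.linalg.norm(column_tail)        # unreliable: recompute
    return nu_old * np.sqrt(t)

col = np.array([3.0, 4.0, 0.0003])
nu = np.linalg.norm(col)                          # ~5.0
nu = update_norm(nu, 3.0, col[1:])                # fast downdate: ~4.0
nu = update_norm(nu, 4.0, col[2:])                # cancels: recomputed as 3e-4
```

Without the recomputation branch, the second downdate would return a value with almost no correct digits, which is exactly how the old strategy could select the wrong pivot column.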

Thread safety: Removed all the SAVE and DATA statements (or provided alternate routines without those statements), increasing reliability on SMPs.

Additional support for matrices with NaN/subnormal elements and optimization of the balancing subroutines, improving reliability.
2. Improvements in LAPACK 3.1.1
See Release Notes for 3.1.1 for more details

Added missing BLAS routines so that the BLAS provided is complete

Provide 5 flavours of SECOND and DSECND, and add the TIMER variable in make.inc to choose which routine to use
3. Improvements in LAPACK 3.2.0
See Release Notes for 3.2 for more details

Extra Precise Iterative Refinement: New linear solvers that “guarantee” fully accurate answers (or give a warning that the answer cannot be trusted). The matrix types supported in this release are: GE (general), SY (symmetric), PO (positive definite), HE (Hermitian), and GB (general band) in all the relevant precisions.

XBLAS, or portable “extra precise BLAS”: The new extra-precise linear solvers above depend on these to perform iterative refinement. The XBLAS will be released in a separate package.

Non-Negative Diagonals from Householder QR: The QR factorization routines now guarantee that the diagonal of R is both real and nonnegative. Factoring a uniformly random matrix now correctly generates an orthogonal Q from the Haar distribution.
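
A minimal sketch of the sign convention with NumPy, whose qr makes no sign promise, so the fix-up must be done by hand:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Q, R = np.linalg.qr(A)                 # no guarantee on the signs of diag(R)

d = np.sign(np.diag(R))
d[d == 0] = 1.0                        # guard against an exact zero pivot
Q_haar = Q * d                         # flip column j of Q by sign(R[j, j])
R_fixed = d[:, None] * R               # flip row j of R to match

# Now diag(R_fixed) >= 0, the product Q_haar @ R_fixed still equals A,
# and for Gaussian A the matrix Q_haar is Haar-distributed.
```

Without the sign fix, each column of Q carries an arbitrary ±1 factor chosen by the factorization, which biases the distribution of Q away from Haar.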

High Performance QR and Householder Reflections on Low-Profile Matrices: The auxiliary routines that apply Householder reflections (e.g. DLARFB) automatically reduce the cost of QR from O(n^{3}) to O(n^{2}+nb^{2}) for band matrices of bandwidth b stored in dense format, with no user interface changes. Other users of these routines can see similar benefits, in particular on “narrow profile” matrices.

New fast and accurate Jacobi SVD: High accuracy SVD routine for dense matrices, which can compute tiny singular values to many more correct digits than xGESVD when the matrix has columns differing widely in norm, and usually runs faster than xGESVD too.
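
The routine is based on one-sided Jacobi iteration. A toy sketch of that method, without the preconditioning that makes the LAPACK routines fast; names and tolerances are illustrative:

```python
import numpy as np

def jacobi_svd_values(A, tol=1e-14, max_sweeps=30):
    """Toy one-sided Jacobi SVD: rotate pairs of columns of a working copy
    of A until all pairs are orthogonal; the singular values are then the
    column norms. Tiny singular values come out to high *relative*
    accuracy. Names and tolerances are illustrative."""
    U = A.astype(np.float64).copy()
    n = U.shape[1]
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                app = U[:, p] @ U[:, p]
                aqq = U[:, q] @ U[:, q]
                apq = U[:, p] @ U[:, q]
                if abs(apq) > tol * np.sqrt(app * aqq):
                    converged = False
                    # Jacobi rotation orthogonalizing columns p and q.
                    tau = (aqq - app) / (2.0 * apq)
                    t = 1.0 if tau == 0 else np.sign(tau) / (abs(tau) + np.sqrt(1.0 + tau * tau))
                    c = 1.0 / np.sqrt(1.0 + t * t)
                    s = c * t
                    U[:, [p, q]] = U[:, [p, q]] @ np.array([[c, s], [-s, c]])
        if converged:
            break
    return np.sort(np.linalg.norm(U, axis=0))[::-1]

A = np.array([[1.0, 1.0], [1e-10, -1e-10]])   # singular values sqrt(2) and sqrt(2)*1e-10
sigma = jacobi_svd_values(A)
```

On this matrix the tiny singular value is recovered to nearly full relative accuracy, which is the property that distinguishes the Jacobi approach from bidiagonalization-based SVD.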

Routines for Rectangular Full Packed format: The RFP format (SF, HF, PF, TF) enables efficient routines with optimal storage for symmetric, Hermitian or triangular matrices. Since these routines utilize the Level 3 BLAS, they are generally much more efficient than the existing packed storage routines (SP, HP, PP, TP).

Pivoted Cholesky: The Cholesky factorization with diagonal pivoting for symmetric positive semidefinite matrices. Pivoting is required for reliable rank detection.
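
A sketch of the idea, assuming the textbook right-looking algorithm with diagonal pivoting; the LAPACK routines are blocked and more careful about their stopping criterion:

```python
import numpy as np

def pivoted_cholesky(A, tol=1e-12):
    """Cholesky with diagonal pivoting, P^T A P = L L^T, stopping when the
    largest remaining diagonal entry drops below a tolerance; the number
    of completed steps is the numerical rank. A sketch only: the
    threshold and names are illustrative."""
    A = A.astype(np.float64).copy()
    n = A.shape[0]
    piv = np.arange(n)
    L = np.zeros_like(A)
    rank = 0
    for k in range(n):
        j = k + np.argmax(np.diag(A)[k:])        # largest remaining diagonal
        A[[k, j]] = A[[j, k]]                    # symmetric row/column swap
        A[:, [k, j]] = A[:, [j, k]]
        L[[k, j], :k] = L[[j, k], :k]
        piv[[k, j]] = piv[[j, k]]
        if A[k, k] <= tol * A[0, 0]:
            break                                # numerical rank found
        L[k, k] = np.sqrt(A[k, k])
        L[k+1:, k] = A[k+1:, k] / L[k, k]
        A[k+1:, k+1:] -= np.outer(L[k+1:, k], L[k+1:, k])
        rank += 1
    return L[:, :rank], piv, rank

B = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [1.0, 1.0]])
A = B @ B.T                                      # rank-2 positive semidefinite 4x4
L, piv, rank = pivoted_cholesky(A)
```

Without pivoting, a semidefinite matrix can place a (numerically) zero pivot in the way before the factorization has captured the full range, which is why pivoting is essential for rank detection.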

Mixed precision iterative refinement routines for exploiting fast single precision hardware: On platforms like the Cell processor that do single precision much faster than double, linear systems can be solved many times faster. Even on commodity processors there is a factor of 2 in speed between single and double precision. The matrix types supported in this release are: GE (general), PO (positive definite).

New variants of the one-sided factorizations: LU gets right-looking, left-looking, Crout and recursive versions; QR gets right-looking and left-looking; Cholesky gets left-looking, right-looking and top-looking. Depending on the computer architecture (or the speed of the underlying BLAS), one of these variants may be faster than the original LAPACK implementation.
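
The variants differ only in how the loops of Gaussian elimination are ordered, not in what they compute. A minimal sketch of two LU variants in NumPy (no pivoting, for illustration only, so the example matrix is made diagonally dominant):

```python
import numpy as np

def lu_right_looking(A):
    """Right-looking LU (no pivoting): as soon as a column of L is formed,
    the whole trailing submatrix is updated with a rank-1 correction."""
    A = A.astype(np.float64).copy()
    n = A.shape[0]
    for k in range(n - 1):
        A[k+1:, k] /= A[k, k]                               # column k of L
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])   # rank-1 update
    return A                                                # L\U packed in one array

def lu_left_looking(A):
    """Left-looking LU (no pivoting): column k is touched only once, after
    all earlier eliminations have been applied to it lazily."""
    A = A.astype(np.float64).copy()
    n = A.shape[0]
    for k in range(n):
        for j in range(k):
            A[j+1:, k] -= A[j+1:, j] * A[j, k]              # apply update j
        if k < n - 1:
            A[k+1:, k] /= A[k, k]                           # scale column k of L
    return A

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)   # dominant: safe without pivoting
F1 = lu_right_looking(A)
F2 = lu_left_looking(A)
```

Both orderings produce the same factors up to rounding; they differ in memory-access pattern, which is why one variant can be faster than another on a given machine or BLAS.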

More robust DQDS algorithm: Fixed some rare convergence failures for the bidiagonal DQDS SVD routine.

Better documentation for the multishift Hessenberg QR algorithm with aggressive early deflation, and various improvements to the code.
4. Improvements in LAPACK 3.2.1
See Release Notes for 3.2.1 for more details

Changed xLARRD (MRRR routines) to deal with some matrices from Godunov.
5. Improvements in LAPACK 3.3.0
See Release Notes for 3.3.0 for more details

Standard C language APIs for LAPACK: The standard interfaces include support for both Fortran and C data layouts (column-major and row-major) to ease interoperability with, and migration of, Fortran code. Full LAPACK functionality is now accessible in a C-friendly manner. These new interfaces are a key feature of the new LAPACK 3.3 release and the just-released Intel MKL 10.3 product. PLASMA 2.3 relies on them.
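
The layout support rests on a simple storage identity, sketched here with NumPy (the actual C wrappers may transpose or copy behind the scenes):

```python
import numpy as np

A = np.arange(6, dtype=np.float64).reshape(2, 3)   # logical 2x3 matrix

# The bytes of A in row-major (C) order are exactly the bytes of A^T in
# column-major (Fortran) order, so a row-major wrapper can hand A to a
# column-major routine as "A^T" and transpose the result, without any
# copy of the input data.
same_bytes = A.tobytes(order='C') == A.T.tobytes(order='F')
```

This is why supporting both layouts costs little: for many routines, the row-major case reduces to calling the column-major one on the transposed problem.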

Computing the complete CS decomposition: New routines compute the complete CS decomposition of a partitioned unitary matrix. The algorithm is designed for numerical stability. See: Brian D. Sutton. Computing the complete CS decomposition. Numer. Algorithms 50 (2009), no. 1, 33–65.

Level-3 BLAS symmetric indefinite solve and symmetric indefinite inversion: This enables better performance for xSYTRF and xSYTRI.

Thread-safe xLAMCH: SLAMCH and DLAMCH were the only two routines in LAPACK 3.2 that were not thread-safe. This is fixed: all routines in LAPACK 3.3 are now thread-safe.
6. Improvements in LAPACK 3.3.1
See Release Notes for 3.3.1 for more details

Improved the CMake build system:

Added a FindBLAS module to easily search for and link against various optimized BLAS implementations

Added proper default compiler flags for various commercial Fortran compilers

Checked for and disabled SIGFPE handlers (LAPACK code handles these internally) on known compilers

Removed testing drivers and BLAS source from nightly code coverage to get a more accurate representation of the LAPACK code getting exercised

Suppressed harmless warning messages for various platforms and compilers

Added nightly builds (http://my.cdash.org/index.php?project=LAPACK)

Standardized binary output locations to play nicely when building Windows DLLs

Cleaned up CMake packaging to allow find_package(LAPACK) in applications

Added missing link dependencies for test libraries
7. Improvements in LAPACK 3.4.0
See Release Notes for 3.4.0 for more details

xGEQRT: QR factorization (improved interface). Contribution by Rodney James (UC Denver). xGEQRT is analogous to xGEQRF with a modified interface which enables better performance when the blocked reflectors need to be reused. The companion subroutines xGEMQRT apply the reflectors.

xGEQRT3: recursive QR factorization. Contribution by Rodney James (UC Denver). The recursive QR factorization is cache-oblivious and enables high performance on tall and skinny matrices.

xTPQRT: Communication-Avoiding QR sequential kernels. Contribution by Rodney James (UC Denver). These subroutines are useful for updating a QR factorization and are used in sequential and parallel Communication-Avoiding QR. They support the general Triangle-on-top-of-Pentagon case, which includes as special cases the so-called Triangle-on-top-of-Triangle and Triangle-on-top-of-Square. This is the right-looking, blocked version of the subroutines; the T matrices and the block size are part of the interface. The companion subroutines xTPMQRT apply the reflectors.

CMake build system. We are striving to help our users install our libraries seamlessly on their machines. The CMake team contributed to our effort to port LAPACK and ScaLAPACK to the CMake build system. Building under Windows has never been easier. This also allows us to release DLLs for Windows, so users no longer need a Fortran compiler to use LAPACK under Windows.

Doxygen documentation. LAPACK routine documentation has never been more accessible. See http://www.netlib.org/lapack/explorehtml/.

New website allowing for easier navigation.

LAPACKE: Standard C language APIs for LAPACK. Since LAPACK 3.3.0, LAPACK includes new C interfaces. With the LAPACK 3.4.0 release, LAPACKE is directly integrated within the LAPACK library and has been enriched with the full set of LAPACK subroutines.
8. Improvements in LAPACK 3.4.1
See Release Notes for 3.4.1 for more details

BLAS test codes now use the Fortran 90 intrinsic machine epsilon, so they work correctly when compiled with optimization

Fixed a thread-safety issue: the xLARFT routines were modified so that V is now [in] only instead of [in/out], making them thread-safe

LAPACK test suite modified so that all tests pass with gfortran (and with ifort at -O0, and hopefully other compilers)
9. Improvements in LAPACK 3.4.2
See Release Notes for 3.4.2 for more details

Modified all of the matrix norm routines. Now, if a matrix contains a NaN, a NaN is returned for all choices of norm. Previously, the norm routines would not necessarily return a NaN for a matrix containing one (they would return regular floating-point numbers in some cases).
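
The pitfall being fixed is easy to reproduce: a max-style norm built from ordinary comparisons silently skips NaNs, because every comparison against NaN is false. A sketch (function names are illustrative):

```python
import numpy as np

def max_abs_norm_naive(a):
    """Max norm via plain comparisons: 'abs(x) > best' is False when x is
    NaN, so a NaN in the input is silently skipped."""
    best = 0.0
    for x in a:
        if abs(x) > best:
            best = abs(x)
    return best

def max_abs_norm_fixed(a):
    """NaN-aware variant in the spirit of the 3.4.2 fix: the norm of any
    input containing a NaN is reported as NaN."""
    best = 0.0
    for x in a:
        if np.isnan(x):
            return np.nan
        if abs(x) > best:
            best = abs(x)
    return best

data = np.array([1.0, np.nan, 3.0])
```

The naive version reports 3.0 for this input, hiding the invalid data; the fixed version propagates the NaN to the caller.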
10. Improvements in LAPACK 3.5.0
See Release Notes for 3.5.0 for more details

xSYSV_ROOK: LDL^{T} with rook pivoting. Contribution by Craig Lucas (University of Manchester and NAG) and Sven Hammarling (NAG). These subroutines offer better stability than the Bunch-Kaufman pivoting scheme currently used in the LAPACK xSYSV solvers; also, the elements of L are bounded, which is important in some applications. The computational time of xSYSV_ROOK is slightly higher than that of xSYSV.

2-by-1 CSD, to be used for tall and skinny matrices with orthonormal columns (in LAPACK 3.4.0, we already integrated the CSD of a full square orthogonal matrix). More info here: http://faculty.rmc.edu/bsutton/simultaneousbidiagonalization.html
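
The defining property of the 2-by-1 CSD, that the cosines and sines satisfy c_i^2 + s_i^2 = 1, can be checked numerically with plain SVDs (the CSD itself additionally produces consistent singular-vector bases, which separate SVDs do not):

```python
import numpy as np

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((7, 3)))   # 7x3 with orthonormal columns
X1, X2 = Q[:4], Q[4:]                              # the 2-by-1 partition

# Because X1^T X1 + X2^T X2 = I, the squared singular values of X1 and X2
# pair up as cos^2(theta_i) and sin^2(theta_i) for common angles theta_i:
c = np.linalg.svd(X1, compute_uv=False)            # cosines, descending
s = np.linalg.svd(X2, compute_uv=False)            # sines, descending
identity_holds = np.allclose(np.sort(c**2) + np.sort(s**2)[::-1], 1.0)
```

The sorting pairs the largest cosine with the smallest sine, mirroring how the CSD orders its angles.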

New stopping criteria for balancing.

New complex division algorithm, following the algorithm of Michael Baudin and Robert L. Smith (see arXiv:1210.4539).
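
The core of Smith's method is to scale by the larger component of the denominator, so that intermediate quantities stay representable where the textbook formula overflows. A sketch (the full Baudin-Smith algorithm adds further scaling for still harder cases):

```python
def naive_cdiv(a, b, c, d):
    """Textbook (a+bi)/(c+di): c*c + d*d overflows long before the true
    quotient does."""
    den = c * c + d * d
    return (a * c + b * d) / den, (b * c - a * d) / den

def smith_cdiv(a, b, c, d):
    """Smith's algorithm: divide through by the larger of |c| and |d| so
    that intermediate products stay representable."""
    if abs(c) >= abs(d):
        r = d / c
        t = 1.0 / (c + d * r)
        return (a + b * r) * t, (b - a * r) * t
    r = c / d
    t = 1.0 / (c * r + d)
    return (a * r + b) * t, (b * r - a) * t

# (1 + i) / (1e200 + 1e200 i) = 1e-200 exactly.
re_bad, im_bad = naive_cdiv(1.0, 1.0, 1e200, 1e200)  # denominator overflows to inf
re_ok, im_ok = smith_cdiv(1.0, 1.0, 1e200, 1e200)
```

The naive version returns 0 because its denominator overflows to infinity; Smith's version returns the correct tiny result.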

Other improvements:

slasyf.f, clasyf.f, clahef.f, dlasyf.f, zlahef.f, zlasyf.f :: various improvements by Igor Kozachenko

dhgeqz.f, shgeqz.f :: improvements by Duncan Po (MathWorks), committed by Rodney James

dlasd4.f, slasd4.f :: improvements by Sergey Kuznetsov (Intel), committed by Rodney James

clarfb.f, dlarfb.f, slarfb.f, zlarfb.f :: improvements by Rodney James
