LAPACK Wishlist

updated on Tue December 10 2013
maintained by J. Langou, U. Colorado Denver and U. of Tennessee

(*) remove unnecessary transpositions from the lapacke_?_work layer
o Lawrence Mulholland, NAG, DEC-10-2013
o use tricks like those in the CBLAS layer to remove unnecessary transpositions from LAPACKE
o see: forum topic 4469
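The CBLAS trick rests on a simple fact: an m-by-n matrix stored row-major with leading dimension lda occupies memory exactly like the n-by-m column-major matrix A^T with the same leading dimension. When the underlying routine accepts a transpose option, the row-major case can therefore be handled by swapping the dimensions and flipping the option instead of copying the matrix. The minimal sketch below shows this for a matrix-vector product. gemv_col and gemv_row are hypothetical helpers, not LAPACKE or CBLAS functions, and the trick applies this directly only to routines that have a TRANS-like argument.

```c
#include <assert.h>

/* Hypothetical column-major kernel: y = op(A)*x, where A is m-by-n with
   leading dimension lda, op(A) = A for trans == 'N' and A^T for trans == 'T'. */
void gemv_col(char trans, int m, int n, const double *a, int lda,
              const double *x, double *y)
{
    assert(trans == 'N' || trans == 'T');
    assert(m >= 0 && n >= 0 && lda >= (m > 0 ? m : 1));
    if (trans == 'N') {
        for (int i = 0; i < m; i++)
            y[i] = 0.0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++)
                y[i] += a[i + j * lda] * x[j];
    } else {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int i = 0; i < m; i++)
                sum += a[i + j * lda] * x[i];
            y[j] = sum;
        }
    }
}

/* Hypothetical row-major entry point: the row-major m-by-n array is the
   column-major n-by-m array A^T, so swap m and n and flip trans.
   No transposed copy of A is made. */
void gemv_row(char trans, int m, int n, const double *a, int lda,
              const double *x, double *y)
{
    assert(trans == 'N' || trans == 'T');
    gemv_col(trans == 'N' ? 'T' : 'N', n, m, a, lda, x, y);
}
```

For example, with A = [1 2 3; 4 5 6] stored row-major as {1, 2, 3, 4, 5, 6} and x = {1, 1, 1}, gemv_row('N', 2, 3, a, 3, x, y) yields y = {6, 15} without forming a transposed copy.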

(*) add in-place transposition algorithm
o Julien, DEC-10-2013
o add an in-place transposition algorithm to LAPACK and use it in LAPACKE (if appropriate)
o see: forum topic 4469
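One classical way to transpose a rectangular matrix in place is cycle following. For an m-by-n column-major array (lda = m), the entry at linear index k = i + j*m belongs at index j + i*n in the n-by-m result; for 0 < k < mn-1 that target equals (k*n) mod (mn-1), and indices 0 and mn-1 never move. The sketch below walks each permutation cycle once, starting from its smallest index, so it needs only O(1) extra storage. It is a minimal illustration assuming lda = m (no padding); its leader test can make the running time superlinear, and transpose_inplace is a hypothetical name, not an existing LAPACK routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a LAPACK routine: in-place transpose by cycle following.
   On entry a holds an m-by-n matrix in column-major order with lda == m;
   on exit it holds the n-by-m transpose in column-major order with lda == n.
   Assumes m*n*n fits in size_t, since the index map computes k*n. */
void transpose_inplace(double *a, size_t m, size_t n)
{
    assert(a != NULL || m * n == 0);
    size_t total = m * n;
    if (total < 2)
        return;                     /* nothing can move */
    size_t last = total - 1;        /* indices 0 and last are fixed points */
    for (size_t start = 1; start < last; start++) {
        /* Process each cycle once, from its smallest index: skip 'start'
           if following its cycle reaches a smaller index first. */
        size_t k = (start * n) % last;
        while (k > start)
            k = (k * n) % last;
        if (k != start)
            continue;
        /* Carry values around the cycle: the entry at index k moves to (k*n) mod last. */
        double carried = a[start];
        k = start;
        do {
            size_t next = (k * n) % last;
            double displaced = a[next];
            a[next] = carried;
            carried = displaced;
            k = next;
        } while (k != start);
    }
}
```

For instance, the 2-by-3 matrix stored as {1, 2, 3, 4, 5, 6} (columns {1, 2}, {3, 4}, {5, 6}) becomes {1, 3, 5, 2, 4, 6}, the column-major storage of its 3-by-2 transpose.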

(*) ScaLAPACK :: PDLARFB

o Keita Teranishi, Cray, 16-12-10
o fact: PDLARFB does not use the PBLAS (it relies on the BLACS and BLAS)
o todo: investigate why, and whether using the PBLAS would be better; if so, use the PBLAS
o see: forum topic 2094

(*) LAPACK/ScaLAPACK

o Nichols A. Romero, Argonne Leadership Computing Facility, 12-21-10
o include Jack Poulson's PDSYNTRD algorithm (see his master's thesis)
o include the "faster Householder algorithm" (see "Accumulating Householder Transformations, Revisited" by Joffrain, Low, Quintana-Ortí, van de Geijn, and Van Zee)
o see: forum topic 2113

(*) LAPACK :: DSPEVR routine

Currently, LAPACK contains several flavors of driver routines for eigenproblems, for example:
DSYEV{D/R/X}
DSTEV{D/R/X}
DSPEV{D/X}

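Request made by a user on the LAPACK mailing list: add DSPEVR, the packed-storage counterpart of DSYEVR (MRRR algorithm), since the DSPEV family currently has only the D and X flavors. See email.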

(*) ScaLAPACK :: P[SDCZ]LATRS is not the ScaLAPACK equivalent of LAPACK [SDCZ]LATRS

    o Jill Reese, MathWorks, 04/13/2010

    o ScaLAPACK P[SDCZ]LATRS is not the ScaLAPACK equivalent of LAPACK
      [SDCZ]LATRS: it is a wrapper on top of P[SDCZ]TRSV. In other words,
      it has none of the checks and scaling that [SDCZ]LATRS uses to prevent possible overflow.

    o As a consequence, the numerical behavior of the LAPACK and ScaLAPACK routines can be
      quite different.

(*) LAPACK :: support multiple pairs (c,d) in xGGLSE

    o date: Sep 06 2009, "kyewong"
    o problem description: The interface of xGGLSE supports only one pair (c,d).
      When solving a linear equality-constrained least squares problem, most of the work
      is in the generalized RQ factorization of the matrices A and B (with xGGRQF). If you
      have several pairs (c,d), you do not want to repeat the factorization for each pair;
      you want to reuse it.
    o learn more: see http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=1615
    o This is a rather easy task. Do not forget to add testing of the new functionality
      in the TESTING directory!

(*) LAPACK :: interface (and/or) source code with 64-bit integers
(*) ScaLAPACK :: interface (and/or) source code with 64-bit integers

    o interfaces and source code using 64-bit integers would fix bug0020
    o source code using 64-bit integers would enable packed-format routines to work for N > sqrt(2^31)
    o interfaces using 64-bit integers are needed for N > 2^31 (possible for large vectors in the ScaLAPACK context)
    o provide the possibility of compiling LAPACK with a flag that sets all integers to 64 bits
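    To make the packed-format bound concrete: packed storage of an n-by-n triangle holds n(n+1)/2 entries. If a routine forms the product n*(n+1) in default 32-bit INTEGER arithmetic before halving, that product exceeds 2^31 - 1 once n > sqrt(2^31), that is, from n = 46341 on; the packed length itself stops fitting at n = 65536. The sketch below computes both thresholds in 64-bit arithmetic, because signed 32-bit overflow is undefined behavior in C and cannot be demonstrated directly. Whether a particular routine forms the product first depends on its code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: length of packed storage for an n-by-n triangle,
   computed in 64-bit arithmetic so that n*(n+1) cannot overflow. */
int64_t packed_len64(int32_t n)
{
    assert(n >= 0);
    return (int64_t)n * ((int64_t)n + 1) / 2;
}

/* Hypothetical helper: 1 if n*(n+1), formed before halving,
   exceeds the largest 32-bit signed integer, else 0. */
int product_overflows_int32(int32_t n)
{
    assert(n >= 0);
    return (int64_t)n * ((int64_t)n + 1) > INT32_MAX;
}

/* Hypothetical helper: 1 if the packed length n*(n+1)/2 itself
   exceeds the largest 32-bit signed integer, else 0. */
int packed_len_overflows_int32(int32_t n)
{
    return packed_len64(n) > INT32_MAX;
}
```

    product_overflows_int32 first returns 1 at n = 46341, and packed_len_overflows_int32 first returns 1 at n = 65536.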

(*) LAPACK :: New routines: ILAxLV, scan a vector for trailing zeros.
(*) LAPACK :: Use the vector scanning routines in xLARFG and xLARFP.

    o see Jason's email: jason_20090323_001.txt and jason_20090323_002.txt
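    LAPACK already provides ILAxLR and ILAxLC, which scan a matrix for its last nonzero row and column. A vector counterpart could look like the sketch below: ila_dlv is a hypothetical C rendering of the proposed double precision routine, returning the 1-based index of the last nonzero entry (0 if the vector is zero) for a stride incx > 0. A caller could then restrict subsequent work to the leading entries of the vector; exactly how xLARFG and xLARFP would use it is left open here.

```c
#include <assert.h>

/* Hypothetical C version of the proposed double precision ILAxLV: return the
   1-based index of the last nonzero entry of the n-vector x stored with
   stride incx > 0, or 0 if every entry is zero. NaNs count as nonzero. */
int ila_dlv(int n, const double *x, int incx)
{
    assert(n >= 0 && incx > 0);
    for (int i = n; i >= 1; i--) {
        if (x[(i - 1) * incx] != 0.0)
            return i;
    }
    return 0;
}
```

    For v = {1, 0, 2, 0, 0}, ila_dlv(5, v, 1) returns 3.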

(*) LAPACK :: multishift QZ with early aggressive deflation

    o Bo Kågström and Daniel Kressner. Multishift variants of the QZ algorithm with aggressive early deflation. SIAM J. Matrix Anal. Appl., 29(1):199-227, 2006.
    o This will fix as well the problem described in: http://www-math.cudenver.edu/~langou/lapack-3.2/lapack_known_issues.html#QZ

(*) LAPACK :: block reordering algorithm

    o Daniel Kressner. Block algorithms for reordering standard and generalized Schur forms. ACM Trans. Math. Software, 32(4):521-532, 2006.

(*) LAPACK :: Extra-precise iterative refinement for overdetermined least squares 

    o James Demmel, Yozo Hida, Xiaoye S. Li, and E. Jason Riedy. Extra-precise Iterative Refinement for Overdetermined Least Squares Problems. LAPACK Working Note 188, May 2007.

(*) LAPACK :: accurate and efficient Givens rotations

    o David Bindel, James Demmel, William Kahan, and Osni Marques. On computing Givens rotations reliably and efficiently. ACM Transactions on Mathematical Software (TOMS) Volume 28, Issue 2, 2002. Pages: 206-238.
    o http://www.cs.berkeley.edu/~demmel/Givens/

(*) LAPACK :: BLAS 2.5

    o Gary W. Howell, James Demmel, Charles T. Fulton, Sven Hammarling, and Karen Marmol. Cache efficient bidiagonalization using BLAS 2.5 operators. ACM Transactions on Mathematical Software (TOMS) Volume 34, Issue 3, 2008.
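    The idea behind such operators is to fuse two matrix-vector operations that touch the same matrix, so that the matrix is read from memory once instead of twice; bidiagonalization typically needs products with both A and A^T at each step. The sketch below fuses y = A*x and z = A^T*w over a column-major matrix. It illustrates the principle only: gemv_gemvt_fused is a hypothetical name, and the operators used in the paper may combine different operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fused kernel in the spirit of BLAS 2.5: for a column-major
   m-by-n matrix A (leading dimension lda), compute y = A*x (length m) and
   z = A^T*w (length n) while reading each column of A only once. */
void gemv_gemvt_fused(int m, int n, const double *a, int lda,
                      const double *x, double *y,
                      const double *w, double *z)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 0 ? m : 1));
    for (int i = 0; i < m; i++)
        y[i] = 0.0;
    for (int j = 0; j < n; j++) {
        const double *col = a + (size_t)j * lda;   /* column j of A */
        double xj = x[j];
        double dot = 0.0;
        for (int i = 0; i < m; i++) {
            y[i] += col[i] * xj;    /* contribution of column j to A*x */
            dot += col[i] * w[i];   /* entry j of A^T*w */
        }
        z[j] = dot;
    }
}
```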

(*) LAPACK :: support more matrix types for extra-precise iterative refinement.

    o Matrix types SB (symmetric band), PB (positive definite band), HB (Hermitian band), and packed storage.
      Tridiagonal types such as GT (general tridiagonal) are also on the wish list, but adequate test cases must be derived first.

(*) LAPACK :: make xLARFB thread friendly

    o Take into account the comments of Robert van de Geijn (Univ. of Texas at Austin) concerning the interface of xLARFB.
      The interface labels V as input/output, which is not at all convenient for multithreaded implementations: threads sharing V cannot treat it as read-only.

(*) LAPACK :: Change the default Cholesky factorization in SRC from right-looking to left-looking 

    o Change the default Cholesky factorization in SRC from right-looking to left-looking, and move the right-looking version to the VARIANTS directory.

(*) LAPACK :: Add some recursive variants for QR and Cholesky. 

    o Add some recursive variants for QR and Cholesky.

(*) LAPACK :: bidiagonal SVD (DQDS) rare convergence failures

    o See: http://www.netlib.org/lapack/lapack-3.2.html#_9_10_bug_fixes_for_the_bidiagonal_svd_routine_that_fix_some_rare_convergence_failures
    o Remove IEEE=.FALSE. in DQDS (DLASQ3 and SLASQ3, from Osni).
    o Note: on Tue 21 Jul 2009, Collin Engstrom emailed us that he was able to reproduce the numerical failure described there with LAPACK but not with CLAPACK.
      Sounds like fun to debug!