lawn295 [pdf]

New robust ScaLAPACK routine for computing the QR factorization with column pivoting

by Zvonimir Bujanovic and Zlatko Drmac

XXXXX Oct 2019

lawn294 [pdf]

Aasen’s Symmetric Indefinite Linear Solvers in LAPACK

by Ichitaro Yamazaki and Jack Dongarra

ICL-UT-17-13 Dec 2017

lawn293 [pdf]

PLASMA 17.1 Functionality Report

by Maksims Abalenkovs, Negin Bagherpour, Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Samuel Relton, Jakub Sistek, David Stevens, Panruo Wu, Ichitaro Yamazaki, Asim YarKhan, and Mawussi Zounon

UT-EECS-17-751 June 2017

lawn292 [pdf]

PLASMA 17 Performance Report

by Maksims Abalenkovs, Negin Bagherpour, Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Samuel Relton, Jakub Sistek, David Stevens, Panruo Wu, Ichitaro Yamazaki, Asim YarKhan, and Mawussi Zounon

UT-EECS-17-750 June 2017

lawn291 [pdf]

On block-asynchronous execution on GPUs

by Hartwig Anzt, Jack Dongarra and Edmond Chow

UT-EECS-16-746 November 2016

lawn290 [pdf]

2016 Dense Linear Algebra Software Packages Survey

by Jack Dongarra, Jim Demmel, Julien Langou and Julie Langou

UT-EECS-16-744 September 2016

lawn289 [pdf]

Fault tolerance techniques for high-performance computing

by Jack Dongarra, Thomas Herault and Yves Robert

UT-EECS-15-734 May 2015

lawn288 [pdf]

PULSAR Users’ Guide

by Jack Dongarra, Jakub Kurzak, Piotr Luszczek, and Ichitaro Yamazaki

UT-EECS-14-733 December 2014

lawn287 [pdf]

Efficient checkpoint/verification patterns for silent error detection

by Anne Benoit, Saurabh K. Raina and Yves Robert

UT-EECS-14-729 May 2014

lawn286 [pdf]

Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem

by Mark Gates, Azzam Haidar, and Jack Dongarra

UT-EECS-14-724 March 2014

lawn285 [pdf]

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods

by Marc Baboulin, Xiaoye S. Li, and François-Henry Rouet

Inria Research Report RR-8481 (Feb. 2014)

lawn284 [pdf]

FlexiBLAS - A flexible BLAS library with runtime exchangeable backends

by Martin Köhler and Jens Saak

lawn283 [pdf]

An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware

by Azzam Haidar, Piotr Luszczek, Jakub Kurzak, and Jack Dongarra

UT-EECS-13-720 October 2013

lawn282 [pdf]

Designing LU-QR hybrid solvers for performance and stability

by Mathieu Faverge, Julien Herrmann, Julien Langou, Bradley Lowery, Yves Robert and Jack Dongarra

UT-EECS-13-719 October 2013

lawn281 [pdf]

Optimal Checkpointing Period: Time vs. Energy

by Guillaume Aupy, Anne Benoit, Thomas Herault, Yves Robert and Jack Dongarra

UT-EECS-13-718 October 2013

lawn280 [pdf]

On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties

by Simplice Donfack, Jack Dongarra, Mathieu Faverge, Mark Gates, Jakub Kurzak, Piotr Luszczek, and Ichitaro Yamazaki

UT-CS-13-715 July 2013

lawn279 [pdf]

Transient Error Resilient Hessenberg Reduction on GPU-based Hybrid Architectures

by Yulu Jia, Piotr Luszczek, and Jack Dongarra

UT-CS-13-712 June 2013

lawn278 [pdf]

On the Combination of Silent Error Detection and Checkpointing

by Guillaume Aupy, Anne Benoit, Thomas Hérault, Yves Robert, Frédéric Vivien and Dounia Zaidouni

UT-CS-13-710 June 2013

lawn277 [pdf]

Implementing a systolic algorithm for QR factorization on multicore clusters with PaRSEC

by Guillaume Aupy, Mathieu Faverge, Yves Robert, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra

UT-CS-13-709 May 2013

lawn276 [pdf]

Communication Avoiding Rank Revealing QR Factorization With Column Pivoting

by James W. Demmel, Laura Grigori, Ming Gu, and Hua Xiang

UCB/EECS-2013-46 May 2013

lawn275 [pdf]

clMAGMA: High Performance Dense Linear Algebra with OpenCL

by Chongxiao Cao, Jack Dongarra, Peng Du, Mark Gates, Piotr Luszczek, and Stanimire Tomov

UT-CS-13-706 Mar 2013

lawn274 [pdf]

Revisiting the double checkpointing algorithm

by Jack Dongarra, Thomas Herault and Yves Robert

UT-CS-13-705 Dec 2012

lawn273 [pdf]

Efficient computation of condition estimates for linear least squares problems

by Marc Baboulin, Serge Gratton, Remi Lacroix and Alan Laub

Inria Research Report 8065 Sep 2012

lawn272 [pdf]

Providing GPU Capability to LU and QR within the ScaLAPACK Framework

by Peng Du, Stanimire Tomov, and Jack Dongarra

UT-CS-12-699 Sep 2012

lawn271 [pdf]

Optimally packed chains of bulges in multishift QR algorithms

by Lars Karlsson and Daniel Kressner

lawn270 [pdf]

How LAPACK library enables Microsoft Visual Studio support with CMake and LAPACKE

by Julie Langou, Bill Hoffman, and Brad King

UT-CS-12-698 Jul 2, 2012

lawn269 [pdf]

Unified Model for Assessing Checkpointing Protocols at Extreme-Scale

by George Bosilca, Aurélien Bouteiller, Elisabeth Brunet, Franck Cappello, Jack Dongarra, Amina Guermouche, Thomas Hérault, Yves Robert, Frédéric Vivien, and Dounia Zaidouni

UT-CS-12-697 Jun 4, 2012

lawn268 [pdf]

Combining Process Replication and Checkpointing for Resilience on Exascale Systems

by Henri Casanova, Yves Robert, Frédéric Vivien, and Dounia Zaidouni

UT-CS-12-696 Jun 4, 2012

lawn267 [pdf]

Preliminary Results of Autotuning GEMM Kernels for the NVIDIA Kepler Architecture – GeForce GTX 680.

by Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, and Jack Dongarra

lawn266 [pdf]

LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System.

by Jakub Kurzak, Piotr Luszczek, Mathieu Faverge, and Jack Dongarra

accepted to VECPAR’12

lawn265 [pdf]

Using group replication for resilience on exascale systems.

by Marin Bougeret, Henri Casanova, Yves Robert, Frédéric Vivien and Dounia Zaidouni

INRIA Research report RR-7876, February 2012

lawn264 [pdf]

Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach.

by George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Herault, Piotr Luszczek, and Jack J. Dongarra

To be published in

Title: Scalable Computing and Communications: Theory and Practice

Editors: Samee U. Khan, Lizhe Wang, and Albert Y. Zomaya

Publisher: John Wiley & Sons

lawn263 [pdf]

LU factorization with panel rank revealing pivoting and its communication avoiding version

by Amal Khabou, James W. Demmel, Laura Grigori, and Ming Gu

UCB/EECS-20112-XX, January 24, 2012.

lawn262 [pdf]

Using replication for resilience on exascale systems.

by Marin Bougeret, Henri Casanova, Yves Robert, Frédéric Vivien and Dounia Zaidouni

UT-CS-11-691 Dec 10, 2011

lawn261 [pdf]

A parallel tiled solver for dense symmetric indefinite systems on multicore architectures.

by Marc Baboulin, Dulceneia Becker, and Jack Dongarra

ICL-UT-11-07, INRIA RR-7762 Dec 14, 2011

To appear in the proceedings of IPDPS 2012.

lawn260 [pdf]

GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement.

by Hartwig Anzt, Piotr Luszczek, Jack Dongarra, and Vincent Heuveline

UT-CS-11-690, Dec 14, 2011

lawn259 [pdf]

Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization.

by Jack Dongarra, Mathieu Faverge, Hatem Ltaief and Piotr Luszczek

UT-CS-11-688, Dec 9, 2011

lawn258 [pdf]

A Block-Asynchronous Relaxation Method for Graphics Processing Units.

by Hartwig Anzt, Stanimire Tomov, Jack Dongarra and Vincent Heuveline

UT-CS-11-687, Dec 1, 2011

lawn257 [pdf]

Hierarchical QR factorization algorithms for multi-core cluster systems.

by Jack Dongarra, Mathieu Faverge, Thomas Herault, Julien Langou and Yves Robert

UT-CS-11-684, Oct 4, 2011

lawn256 [pdf]

High Performance Linear System Solver with Resilience to Multiple Soft Errors.

by Peng Du, Piotr Luszczek and Jack Dongarra

UT-CS-11-683, Oct 4, 2011

lawn255 [pdf]

Improving communication performance in dense linear algebra via topology aware collectives.

by Edgar Solomonik, Abhinav Bhatele and James Demmel

UCB/EECS-2011-92, Aug 15, 2011

lawn254 [pdf]

Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels.

by Azzam Haidar, Hatem Ltaief and Jack Dongarra

UT-CS-11-677, Aug 5, 2011

lawn253 [pdf]

Algorithm-based Fault Tolerance for Dense Matrix Factorizations.

by Peng Du, Aurelien Bouteiller, George Bosilca, Thomas Herault and Jack Dongarra

UT-CS-11-676, Aug 5, 2011

lawn252 [pdf]

Soft Error Resilient QR Factorization for Hybrid System.

by Peng Du, Piotr Luszczek, Stanimire Tomov, and Jack Dongarra

UT-CS-11-675, July 1, 2011

lawn251 [pdf]

Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency.

by Hatem Ltaief, Piotr Luszczek, and Jack Dongarra

UT-CS-11-674, June 21, 2011

lawn250 [pdf]

Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures.

by Fengguang Song, Stanimire Tomov, and Jack Dongarra

UT-CS-11-668, June 16, 2011

lawn249 [pdf]

Level-3 Cholesky Factorization Routines as Part of Many Cholesky Algorithms.

by Fred G. Gustavson, Jerzy Waśniewski, Jack J. Dongarra, José R. Herrero and Julien Langou

DTU/IMM-Technical-Report-2011-11.

Submitted to TOMS

lawn248 [pdf]

Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms.

by Edgar Solomonik and James Demmel

UCB/EECS-2011-72, Jun 7, 2011

lawn247 [pdf]

High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures.

by Hatem Ltaief, Piotr Luszczek, and Jack Dongarra

UT-CS-11-673, May 18, 2011

Submitted to TOMS

lawn246 [pdf]

Accelerating linear system solutions using randomization techniques.

by Marc Baboulin, Jack Dongarra, Julien Herrmann, and Stanimire Tomov

INRIA RR-7616, May 15 2011

lawn245 [pdf]

Autotuning GEMMs for Fermi.

by Jakub Kurzak, Stanimire Tomov, and Jack Dongarra

UT-CS-11-671, Apr 18, 2011

accepted to IEEE TPDS

lawn244 [pdf]

Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures.

by Piotr Luszczek, Hatem Ltaief, and Jack Dongarra

UT-CS-11-670, Apr 18, 2011

lawn243 [pdf]

Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures.

by Azzam Haidar, Hatem Ltaief, Asim YarKhan and Jack Dongarra

UT-CS-11-666, Mar 10, 2011

Submitted to Concurrency and Computation: Practice and Experience.

lawn242 [pdf]

A Fully Empirical Autotuned Dense QR Factorization For Multicore Architectures.

by Emmanuel Agullo, Jack Dongarra, Rajib Nath and Stanimire Tomov

INRIA-7526, Mar 9, 2011

lawn241 [pdf]

Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems.

by Fengguang Song, Hatem Ltaief, Bilel Hadri and Jack Dongarra

UT-CS-10-653, Mar 4, 2011

Published at SC’10.

lawn240 [pdf]

Communication-Avoiding QR Decomposition for GPUs.

by Michael Anderson, Grey Ballard, James Demmel and Kurt Keutzer

update of UCB/EECS-2010-131, Feb 18, 2011

To appear in IPDPS’11

lawn239 [pdf]

Communication bounds for heterogeneous architectures.

by Grey Ballard, James Demmel, and Andrew Gearhart

UCB/EECS-2011-13, Feb 11, 2011

lawn238 [pdf]

Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms.

by Edgar Solomonik and James Demmel

UCB/EECS-2011-10, Feb 9, 2011

lawn237 [pdf]

Minimizing Communication for Eigenproblems and the Singular Value Decomposition.

by Grey Ballard, James Demmel, and Ioana Dumitriu

UCB/EECS-2010-136, Nov 13, 2010

lawn236 [pdf]

A contribution to the conditioning of the total least squares problem.

by Marc Baboulin and Serge Gratton

INRIA, Nov 5, 2010.

lawn235 [pdf]

Reducing the time to tune parallel dense linear algebra routines with partial execution and performance modelling.

by Jack Dongarra and Piotr Luszczek

UT-CS-10-661, Oct 8, 2010.

lawn233 [pdf]

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators.

by Agullo, E., Augonnet, C., Dongarra, J., Faverge, M., Ltaief, H., Thibault, S. and Tomov, S.

UT-CS-10-XXX, Oct, 2010.

Proceedings of IPDPS 2011

lawn232 [pdf]

Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project.

by Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Haidar, A., Herault, T., Kurzak, J., Langou, J., Lemarinier, P., Ltaief, H., Luszczek, P., YarKhan, A., Dongarra, J.

UT-CS-10-660, Sep 15, 2010.

lawn231 [pdf]

DAGuE: A generic distributed DAG engine for high performance computing.

by Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Lemarinier, P., Dongarra, J.

UT-CS-10-659, Sep 15, 2010.

lawn230 [pdf]

Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs.

by Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, Samuel Thibault, and Stanimire Tomov

UT-CS-10-658, Sep 15, 2010.

To appear in GPU Computing Gems, vol. 2

lawn229 [pdf]

An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs.

by Jakub Kurzak, Rajib Nath, Peng Du, and Jack Dongarra

UT-CS-10-657, Sep 15, 2010.

Submitted to PARA’10

lawn228 [pdf]

From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming.

by Peng Du, Rick Weber, Piotr Luszczek, Stanimire Tomov, Gregory Peterson and Jack Dongarra

UT-CS-10-656, Sep 6, 2010.

lawn227 [pdf]

An Improved MAGMA GEMM for Fermi GPUs.

by Rajib Nath, Stanimire Tomov, and Jack Dongarra

UT-CS-10-655, July 29, 2010.

lawn226 [pdf]

CALU: a communication optimal LU factorization algorithm.

by Laura Grigori, James W. Demmel, and Hua Xiang

UCB/EECS-2010-29, March 15, 2010.

Submitted to SIAM Journal on Matrix Analysis and Applications (SIMAX).

lawn225 [pdf]

Dense Linear Algebra Solvers for Multicore with GPU Accelerators.

by Stanimire Tomov, Rajib Nath, Hatem Ltaief, and Jack Dongarra

UT-CS-09-649, February 18, 2010.

Published in the Proceedings of IPDPS 2010: 24th IEEE International Parallel and Distributed Processing Symposium, Atlanta, GA, April 2010.

lawn224 [pdf]

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment.

by Emmanuel Agullo, Camille Coti, Jack Dongarra, Thomas Herault, and Julien Langou

UT-CS-10-651, January 6, 2010.

Published in the Proceedings of IPDPS 2010: 24th IEEE International Parallel and Distributed Processing Symposium, Atlanta, GA, April 2010.

lawn223 [pdf]

A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators.

by Hatem Ltaief, Stanimire Tomov, Rajib Nath, Peng Du, and Jack Dongarra

UT-CS-09-646, November 25, 2009.

lawn222 [pdf]

Enhancing Parallelism of Tile QR Factorization for Multicore Architectures.

by Bilel Hadri, Hatem Ltaief, Emmanuel Agullo, and Jack Dongarra

UT-CS-09-645, September 4, 2009.

lawn221 [pdf]

Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems.

by Fengguang Song, Asim YarKhan and Jack Dongarra

UT-CS-09-638, April 13, 2009.

lawn220 [pdf]

Fully Dynamic Scheduler for Numerical Computing on Multicore Processors.

by Jakub Kurzak and Jack Dongarra

UT-CS-09-643, June 4, 2009.

lawn219 [pdf]

Accelerating the reduction to upper Hessenberg form through hybrid GPU-based computing.

by Stanimire Tomov and Jack Dongarra

UT-CS-09-642, May 24, 2009.

lawn218 [pdf]

Minimizing Communication in Linear Algebra.

by Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz

UCB/EECS-2009-62, May 15, 2009.

lawn217 [pdf]

Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware.

by Emmanuel Agullo, Bilel Hadri, Hatem Ltaief and Jack Dongarra

UT-CS-09-640, April 28, 2009.

lawn216 [pdf]

A novel parallel QR algorithm for hybrid distributed memory HPC systems.

by Robert Granat, Bo Kågström and Daniel Kressner

UMINF-09.06, April 6, 2009.

lawn215 [pdf]

Communication-optimal Parallel and Sequential Cholesky decomposition.

by Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz

UCB/EECS-2009-29, February 13, 2009.

lawn214 [pdf]

Scheduling Two-sided Transformations using Algorithms by-Tiles on Multicore Architectures.

by Hatem Ltaief, Jakub Kurzak and Jack Dongarra

UT-CS-09-637, February 11, 2009.

lawn213 [pdf]

Scheduling Linear Algebra Operations on Multicore Processors.

by Jakub Kurzak, Hatem Ltaief, Jack Dongarra, and Rosa M. Badia

UT-CS-09-636, February 5, 2009.

lawn212 [pdf]

A Note on Auto-tuning GEMM for GPUs.

by Yinan Li, Jack Dongarra, and Stanimire Tomov

UT-CS-09-635, January 12, 2009.

lawn211 [pdf]

Level-3 Cholesky kernel subroutine of a fully portable High Performance minimal storage hybrid format Cholesky algorithm.

by Fred G. Gustavson, Jerzy Wasniewski, and Jack Dongarra

UT-CS-08-634, December 1, 2008.

lawn210 [pdf]

Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems.

by Stanimire Tomov, Jack Dongarra, and Marc Baboulin

UT-CS-08-632, October 17, 2008.

lawn209 [pdf]

Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures.

by Hatem Ltaief, Jakub Kurzak, and Jack Dongarra

UT-CS-08-631, October 1, 2008.

lawn208 [pdf]

Parallel Block Hessenberg Reduction using Algorithms by Tiles for Multicore Architectures Revisited.

by Hatem Ltaief, Jakub Kurzak, and Jack Dongarra

UT-CS-08-624, August 7, 2008.

lawn207 [pdf]

Using dual techniques to derive componentwise and mixed condition numbers for a linear functional of a linear least squares solution.

by Marc Baboulin and Serge Gratton

UT-CS-08-622, August 4, 2008.

lawn206 [pdf]

The Problem with the Linpack Benchmark Matrix Generator.

by Jack Dongarra and Julien Langou

UCD-CCM-271, June 28, 2008.

lawn205 [pdf]

Algorithmic Based Fault Tolerance Applied to High Performance Computing.

by George Bosilca, Remi Delmas, Jack Dongarra, and Julien Langou

UT-CS-08-620, June 19, 2008.

lawn204 [pdf]

Communication-optimal parallel and sequential QR and LU factorizations.

by James W. Demmel, Laura Grigori, Mark Frederick Hoemmen and Julien Langou

UCB/EECS-2008-89, August 4, 2008.

lawn203 [pdf]

Non-Negative Diagonals and High Performance on Low-Profile Matrices from Householder QR.

by James W. Demmel, Mark Hoemmen, Yozo Hida, and E. Jason Riedy

UCB/EECS-2008-76, May 30, 2008.

lawn202 [pdf]

LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs.

by Vasily Volkov and James W. Demmel

UCB/EECS-2008-49, May 15, 2008.

lawn201 [pdf]

QR Factorization for the CELL Processor

by Jakub Kurzak and Jack Dongarra

UT-CS-08-616, May 22, 2008.

lawn200 [pdf]

Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures.

by Marc Baboulin, Jack J. Dongarra and Stanimire Tomov

UT-CS-08-615, May 6, 2008.

lawn199 [pdf]

Rectangular Full Packed Format for Cholesky’s Algorithm: Factorization, Solution and Inversion.

by Fred G. Gustavson, Jerzy Wasniewski, Julien Langou and Jack J. Dongarra

UT-CS-08-614, April 28, 2008.

lawn198 [pdf]

Blocked Algorithms for the Reduction to Hessenberg-Triangular Form Revisited.

by Bo Kågström, Daniel Kressner, Enrique S. Quintana-Ortí, and Gregorio Quintana-Ortí

February 2008.

lawn197 [pdf]

Using GPUs to Accelerate the Bisection Algorithm for Finding Eigenvalues of Symmetric Tridiagonal Matrices.

by Vasily Volkov and James W. Demmel

UCB/EECS-2007-179, January 2008.

lawn196 [pdf]

A global convergence proof of cyclic Jacobi methods with block rotations.

by Zlatko Drmač

December 2007.

lawn195 [pdf]

ScaLAPACK’s MRRR Algorithm.

by Christof Vömel

November 2007.

lawn194 [pdf]

A Refined Representation Tree for MRRR.

by Christof Vömel

November 2007.

lawn193 [pdf]

Computing the Conditioning of the Components of a Linear Least Squares Solution.

by Marc Baboulin, Jack Dongarra, Serge Gratton, and Julien Langou

UT-CS-07-604, September 2007.

lawn192 [pdf]

Parallel eigenvalue reordering in real Schur forms.

by R. Granat, B. Kågström, D. Kressner

September 2007.

lawn191 [pdf]

A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures.

by Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra

UT-CS-07-600, September 7, 2007.

lawn190 [pdf]

Parallel Tiled QR Factorization for Multicore Architectures.

by Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra

UT-CS-07-598, July 2007.

Published in Concurrency and Computation: Practice and Experience, volume 20, Issue 13, pages 1573-1590, Sep 2008, DOI: 10.1002/cpe.1301

lawn189 [pdf]

Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the CELL Processor.

by Wesley Alvaro, Jakub Kurzak, and Jack Dongarra

UT-CS-08-609, January 2008.

lawn188 [pdf]

Extra-precise Iterative Refinement for Overdetermined Least Squares Problems.

by James Demmel, Yozo Hida, Xiaoye S. Li, and E. Jason Riedy

May 2007

Published in ACM Transactions on Mathematical Software, Vol. 35, No. 4, 2009.

lawn187 [pdf]

LAPACK 3.1 xHSEQR: Tuning and Implementation Notes on the Small Bulge Multi-shift QR Algorithm with Aggressive Early Deflation.

by Ralph Byers

May 2007

lawn186 [pdf]

Fast Linear Algebra is Stable.

by James Demmel, Ioana Dumitriu, and Olga Holtz

May 2007

Published in Numerische Mathematik, Volume 108, Issue 1 (October 2007), Pages: 59-91, Year of Publication: 2007, ISSN:0029-599X

lawn185 [pdf]

Limitations of the PlayStation 3 for High Performance Cluster Computing.

by Alfredo Buttari, Jack Dongarra, and Jakub Kurzak

UT-CS-07-597, May 2007

lawn184 [pdf]

Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization.

by Jakub Kurzak, Alfredo Buttari, and Jack Dongarra

UT-CS-07-596, May 2007

lawn183 [pdf]

Performance and Accuracy of LAPACK’s Symmetric Tridiagonal Eigensolvers.

by James W. Demmel, Osni A. Marques, Beresford N. Parlett, and Christof Vömel

April 2007

lawn182 [pdf]

A Testing Infrastructure for LAPACK’s Symmetric Eigensolvers.

by James W. Demmel, Osni A. Marques, Beresford N. Parlett, and Christof Vömel

April 2007

lawn181 [pdf]

Prospectus for the Next LAPACK and ScaLAPACK Libraries.

by James Demmel, Jack Dongarra, Beresford Parlett, William Kahan, Ming Gu, David Bindel, Yozo Hida, Xiaoye Li, Osni Marques, E. Jason Riedy, Christof Vömel, Julien Langou, Piotr Luszczek, Jakub Kurzak, Alfredo Buttari, Julie Langou, and Stanimire Tomov

UT-CS-07-592, February 2007

lawn180 [pdf]

Computations to Enhance the Performance while Achieving the 64-bit Accuracy.

by Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, and Stanimire Tomov

UT-CS-06-584, November 2006.

lawn179 [pdf]

Parallel tools for solving incremental dense least squares problems. Application to space geodesy.

by Marc Baboulin, Luc Giraud, Serge Gratton, and Julien Langou

UT-CS-06-582, September 2006.

lawn178 [pdf]

Implementing Linear Algebra Routines on Multi-Core Processors with Pipelining and a Look Ahead

by Jakub Kurzak and Jack Dongarra

UT-CS-06-581, September 2006.

lawn177 [pdf]

Implementation of the Mixed-Precision High Performance LINPACK Benchmark on the CELL Processor

by Jakub Kurzak and Jack Dongarra

UT-CS-06-580, September 2006.

lawn176 [pdf]

On the failure of rank revealing QR factorization software - a case study

by Zlatko Drmac and Zvonimir Bujanovic

June 2006.

lawn175 [pdf]

Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems)

by Julie Langou, Julien Langou, Piotr Luszczek, Jakub Kurzak, Alfredo Buttari and Jack Dongarra

June 2006.

lawn174 [pdf]

Cache Efficient Bidiagonalization Using BLAS 2.5 Operators

by G. W. Howell, J. W. Demmel, C. T. Fulton, S. Hammarling, and K. Marmol

February 2006.

lawn173 [pdf]

Multishift Variants of the QZ Algorithm with Aggressive Early Deflation

by Bo Kågström and Daniel Kressner

February 2006.

lawn172 [pdf]

Benefits of IEEE-754 Features in Modern Symmetric Tridiagonal Eigensolvers

by Osni A. Marques, E. Jason Riedy, and Christof Vömel

Technical Report UCB/CSD-05-1414, September 2005.

lawn171 [pdf]

Block Algorithms for Reordering Standard and Generalized Schur Forms

by Daniel Kressner

September 2005. Updated February 2006.

lawn170 [pdf]

New fast and accurate Jacobi SVD algorithm: II

by Zlatko Drmac and Kresimir Veselic

August 2005.

lawn169 [pdf]

New fast and accurate Jacobi SVD algorithm: I

by Zlatko Drmac and Kresimir Veselic

August 2005.

lawn168 [pdf]

PDSYEVR. ScaLAPACK’s parallel MRRR algorithm for the symmetric eigenvalue problem

by Dominic Antonelli and Christof Vömel

Technical Report UCB/CSD-05-1399, August 2005.

lawn167 [pdf]

Subset Computations with the MRRR algorithm

by Osni A. Marques, Beresford N. Parlett, and Christof Vömel

Technical Report UCB/CSD-05-1392, August 2005.

lawn166 [pdf]

Computing The Bidiagonal SVD Using Multiple Relatively Robust Representations

by Paul R. Willems, Bruno Lang, and Christof Vömel

UT-CS-05-551, April 2005.

lawn165 [pdf]

Error Bounds from Extra Precise Iterative Refinement

by James Demmel, Yozo Hida, W. Kahan, Xiaoye S. Li, Sonil Mukherjee, and E. Jason Riedy

UT-CS-05-547, February 2005.

lawn164 [pdf]

LAPACK 2005 Prospectus: Reliable and Scalable Software for Linear Algebra Computations on High End Computers

by Jim Demmel and Jack Dongarra

UT-CS-05-546, February 2005.

lawn163 [pdf]

How the MRRR Algorithm Can Fail on Tight Eigenvalue Clusters

by Beresford N. Parlett and Christof Vömel

UT-CS-04-542, December, 2004.

lawn162 [pdf]

The Design and Implementation of the MRRR Algorithm

by Inderjit S. Dhillon, Beresford N. Parlett, and Christof Vömel

UT-CS-04-541, December, 2004.

Published in ACM Transactions on Mathematical Software (TOMS), Volume 32, Issue 4 (December 2006), Pages: 533-560, Year of Publication: 2006, ISSN:0098-3500

lawn161 [pdf]

LAPACK-Style Codes for Level 2 and 3 Pivoted Cholesky Factorizations

by Craig Lucas

UT-CS-04-522, February 2004

lawn160 [pdf]

Self Adapting Software for Numerical Linear Algebra and LAPACK for Clusters

by Zizhong Chen, Jack Dongarra, Piotr Luszczek, and Kenneth Roche

UT-CS-03-499, January 2003

Published in Parallel Computing, Volume 29, Issues 11-12, November-December 2003, Pages 1723-1743

lawn159 [pdf]

Finite-choice algorithm optimization in Conjugate Gradients

by Jack Dongarra and Victor Eijkhout

UT-CS-03-502, January 2003

lawn158 [pdf]

LAPACK3E — A Fortran 90-enhanced version of LAPACK

by Edward Anderson

UT-CS-02-497, December 2002

lawn157 [pdf]

Self-adapting Numerical Software for Next Generation Applications

by Jack Dongarra and Victor Eijkhout

UT-CS-02-484, August 2002

Published in International Journal of High Performance Computing Applications, Vol. 17, Year. 2, pages: 2-7, DOI: 10.1177

lawn156 [pdf]

Polynomial acceleration of optimised multi-grid smoothers: basic theory

by Victor Eijkhout

UT-CS-02-477, August 2002

lawn155 [pdf]

An implementation of the dqds algorithm (positive case)

by Beresford N. Parlett and Osni A. Marques

UT-CS-02-475, August 2002

Published in Linear Algebra and Applications, year 1999, volume 309, page 2000

lawn154 [pdf]

Orthogonal Eigenvectors and Relative Gaps

by Inderjit S. Dhillon and Beresford N. Parlett

UT-CS-02-474, August 2002

lawn153 [pdf]

New Complex Parallel Eigenvalue and Eigenvector Routines

by M. Fahey

UT-CS-01-471, August 2001.

lawn152 [pdf]

Implementation for LAPACK of a Block Algorithm for Matrix 1-Norm Estimation

by S. Cheng and N. Higham

UT-CS-01-470, August 2001.

lawn151 [pdf]

Automatic Determination of Matrix-Blocks

by V. Eijkhout

UT-CS-01-458, April 2001.

lawn150 [pdf]

Discontinuous Plane Rotations and the Symmetric Eigenvalue Problem

by E. Anderson

UT-CS-00-454, December 2000.

lawn149 [pdf]

Design, Implementation and Testing of Extended and Mixed Precision BLAS

by X. Li, J. Demmel, D. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, A. Kapur, M. Martin, T. Tung, D. J. Yoo

UT-CS-00-451, October 2000.

lawn148 [pdf]

On Computing Givens rotations reliably and efficiently

by D. Bindel, J. Demmel, W. Kahan, O. Marques

UT-CS-00-449, October 2000.

lawn147 [pdf]

Automated Empirical Optimization of Software and the ATLAS Project

by R. C. Whaley, A. Petitet, J. Dongarra

UT-CS-00-448, September 2000.

lawn146 [pdf]

A recursive formulation of Cholesky factorization of a matrix in packed storage

by B. Andersen, F. Gustavson and J. Wasniewski

UT-CS-00-441, May 2000.

lawn145 [pdf]

The weighted modification incomplete factorisation method

by V. Eijkhout

UT-CS-99-436, Dec 1999.

lawn144 [pdf]

On the Existence Problem of Incomplete Factorisation Methods

by V. Eijkhout

UT-CS-99-435, Dec 1999.

lawn143 [pdf]

A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow-Banded Linear Systems II

by P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland

UT-CS-99-415, May 1999.

lawn142 [pdf]

A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow-Banded Linear Systems

by P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland

UT-CS-99-414, Feb 1999.

lawn141 [pdf]

Overview of Iterative Linear System Solver Packages

by Victor Eijkhout

UT-CS-98-411, Dec 1998.

lawn140 [pdf]

NetSolve version 1.2: Design and Implementation

by H. Casanova, J. Dongarra

UT-CS-98-406, Nov 1998.

lawn139 [pdf]

A Numerical Linear Algebra Problem Solving Environment Designer’s Perspective

by A. Petitet, H. Casanova, J. Dongarra, Y. Robert & R.C. Whaley

UT-CS-98-405, Oct 1998.

lawn138 [pdf]

Testing Software for LAPACK90

by J. Dongarra, W. Owczarz, J. Wasniewski, P. Yalamov

UT-CS-98-401, Sept 1998.

lawn137 [pdf]

Installation Guide and Design of the HPF 1.1 interface to ScaLAPACK, SLHPF

by L. S. Blackford, J. J. Dongarra, C. A. Papadopoulos and R. C. Whaley

UT-CS-98-396, August 1998.

lawn136 [pdf]

ScaLAPACK Evaluation and Performance at the DoD MSRCs

by L. S. Blackford and R. C. Whaley

UT-CS-98-388, April 1998.

lawn135 [pdf]

Packed Storage Extensions for ScaLAPACK

by E. D’Azevedo and J. Dongarra

UT-CS-98-385, April 1998.

lawn134 [pdf]

High Performance Linear Algebra Package — LAPACK90

by J. Wasniewski and J. Dongarra

UT-CS-98-384, April 1998.

lawn133 [pdf]

Algorithmic Redistribution Methods for Block Cyclic Distributions

by A. Petitet and J. Dongarra

UT-CS-98-383, March 1998.

lawn132 [pdf]

Parallelizing the Divide and Conquer Algorithm for the Symmetric Tridiagonal Eigenvalue Problem on Distributed Memory Architectures

by F. Tisseur and J. Dongarra

UT-CS-98-382, March 1998.

lawn131 [pdf]

Automatically Tuned Linear Algebra Software

by R. Whaley and J. Dongarra

UT-CS-97-366, December 1997.

lawn130 [pdf]

Accurate SVDs of Structured Matrices

by J. Demmel

UT-CS-97-375, October 1997.

lawn129 [pdf]

A New Parallel Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers

by J. Choi

UT-CS-97-369, September 1997.

lawn128 [pdf]

Algorithmic Redistribution Methods for Block Cyclic Decompositions

by A. Petitet

UT-CS-97-371, July 1997.

lawn127 [pdf]

Sparse Gaussian Elimination on High Performance Computers

by X. Li

UT-CS-97-368, June 1997.

lawn126 [pdf]

Performance Improvements to LAPACK for the Cray Scientific Library

by E. Anderson and M. Fahey

UT-CS-97-359, April 1997.

lawn125 [pdf]

Implementation in ScaLAPACK of Divide-and-Conquer Algorithms for Banded and Tridiagonal Linear Systems

by A. Cleary and J. Dongarra

UT-CS-97-358, April 1997.

lawn124 [pdf]

An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination

by J. Demmel, J. Gilbert, and X. Li

UT-CS-97-357, April 1997.

lawn123 [pdf]

A Test Matrix Collection for Non-Hermitian Eigenvalue Problems

by Z. Bai, D. Day, J. Demmel and J. Dongarra

UT-CS-97-355, March 1997.

lawn122 [pdf]

A New Deflation Criterion for the QR Algorithm

by M. Ahues and F. Tisseur

UT-CS-97-353, March 1997.

lawn121 [pdf]

A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures

by G. Henry, D. Watkins, and J. Dongarra

UT-CS-97-352, March 1997.

lawn120 [pdf]

Scheduling Block-Cyclic Array Redistribution

by F. Desprez, J. Dongarra, A. Petitet, C. Randriamaro and Y. Robert

UT-CS-97-349, February 1997.

lawn119 [pdf]

Computing the Singular Value Decomposition with High Relative Accuracy

by J. Demmel, M. Gu, S. Eisenstat, I. Slapnicar, K. Veselic and Z. Drmac

UT-CS-97-348, February 1997.

lawn118 [pdf]

The Design and Implementation of the Parallel Out-of-Core ScaLAPACK LU, QR, and Cholesky Factorization Routines

by J. J. Dongarra and E. F. D’Azevedo

UT-CS-97-347, January 1997.

lawn117 [pdf]

A Fortran 90 Interface for LAPACK

by L. Susan Blackford, Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, Jerzy Wasniewski

UT-CS-96-341, December 1996.

lawn116 [pdf]

Parallel Matrix Distributions: Have we been doing it all right?

by M. Sidani and B. Harrod

UT-CS-96-340, November 1996.

lawn115 [pdf]

On the Error Analysis and Implementation of Some Eigenvalue Decomposition and Singular Value Decomposition Algorithms

by H. Ren

UT-CS-96-336, September 1996.

lawn114 [pdf]

A BLAS-3 Version of the QR Factorization with Column Pivoting

by G. Quintana-Orti, X. Sun, and C. Bischof

UT-CS-96-334, August 1996.

lawn113 [pdf]

Block-Partitioned Algorithms for Solving the Linear Least Squares Problem

by G. Quintana-Orti, E. S. Quintana-Orti, and A. Petitet

UT-CS-96-333, July 1996.

lawn112 [pdf]

Practical Experience in the Dangers of Heterogeneous Computing

by L. S. Blackford, A. Cleary, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, A. Petitet, H. Ren, K. Stanley, and R. C. Whaley

UT-CS-96-330, July 1996.

lawn111 [pdf]

Optimizing Matrix Multiply using PHiPAC: a Portable, High-Performance, ANSI C Coding Methodology

by J. Bilmes, K. Asanovic, J. Demmel, D. Lam, and C.-W. Chin

UT-CS-96-326, May 1996.

lawn110 [pdf]

Key Concepts for Parallel Out-of-Core LU Factorization

by J. J. Dongarra, S. Hammarling, and D. W. Walker

UT-CS-96-324, April 1996.

lawn109 [pdf]

BLAS Technical Workshop

by J. Dongarra, S. Hammarling, and S. Ostrouchov

UT-CS-95-317, November 1995.

lawn108 [pdf]

GEMM-Based Level 3 BLAS: Installation, Tuning and Use of the Model Implementations and the Performance Evaluation Benchmark

by B. Kagstrom, P. Ling, and C. Van Loan

UT-CS-95-316, November 1995.

lawn107 [pdf]

GEMM-Based Level 3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark

by B. Kagstrom, P. Ling, and C. Van Loan

UT-CS-95-315, November 1995.

lawn106 [pdf]

Templates for Linear Algebra Problems

by Z. Bai, D. Day, J. Demmel, J. Dongarra, M. Gu, A. Ruhe, and H. van der Vorst

UT-CS-95-311, October 1995.

lawn105 [pdf]

Stability of the Diagonal Pivoting Method with Partial Pivoting

by N. J. Higham

UT-CS-95-309, October 1995.

lawn104 [pdf]

Iterative Refinement and LAPACK

by N. J. Higham

UT-CS-95-308, October 1995.

lawn103 [pdf]

A Supernodal Approach to Sparse Partial Pivoting

by J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, and J. W. H. Liu

UT-CS-95-304, September 1995.

lawn102 [pdf]

IML++ v. 1.2: Iterative Methods Library Reference Guide

by J. Dongarra, A. Lumsdaine, R. Pozo, and K. Remington

UT-CS-95-303, August 1995.

lawn101 [pdf]

A Proposal for a Fortran 90 Interface for LAPACK

by J. J. Dongarra, J. Du Croz, S. Hammarling, J. Wasniewski, and A. Zemla

UT-CS-95-295, July 1995.

lawn100 [pdf]

A Proposal for a Set of Parallel Basic Linear Algebra Subprograms

by J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker and R. C. Whaley

UT-CS-95-292, May 1995.

lawn99 [pdf]

Reverse Communication Interface for Linear Algebra Templates for Iterative Methods

by J. Dongarra, V. Eijkhout, and A. Kalhan

UT-CS-95-291, May 1995.

lawn98 [pdf]

LAPACK++ V. 1.0: High Performance Linear Algebra Users' Guide

by J. Dongarra, R. Pozo, and D. Walker

UT-CS-95-290, May 1995.

lawn97 [pdf]

Modeling the Benefits of Mixed Data and Task Parallelism

by S. Chakrabarti, J. Demmel, and D. Yelick

UT-CS-95-289, May 1995.

lawn96 [pdf]

SUMMA: Scalable Universal Matrix Multiplication Algorithm

by R. A. van de Geijn and J. Watts

UT-CS-95-286, April 1995.

lawn95 [pdf]

ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance

by J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley

UT-CS-95-283, March 1995.

lawn94 [pdf]

A User’s Guide to the BLACS v1.1

by J. Dongarra and R. C. Whaley

UT-CS-95-281, March 1995. UPDATED: May 5, 1997 (VERSION 1.1).

lawn93 [pdf]

Installation Guide for ScaLAPACK

by J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley

UT-CS-95-280, March, 1995. UPDATED: August 31, 2001 (VERSION 1.7).

lawn92 [pdf]

The Design of a Parallel Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal, and Bidiagonal Form

by J. Choi, J. Dongarra, and D. Walker

UT-CS-95-275, February 1995.

lawn91 [pdf]

The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Computers

by Z. Bai, J. Demmel, J. Dongarra, A. Petitet, H. Robinson, and K. Stanley

UT-CS-95-273, January 1995.

lawn90 [pdf]

Algorithm-Based Diskless Checkpointing for Fault Tolerant Matrix Operations

by J. S. Plank, Y. Kim, and J. J. Dongarra

UT-CS-94-268, December 1994.

lawn89 [pdf]

Solving Secular Equations Stably and Efficiently

by Ren-Cang Li

UT-CS-94-260, November, 1994.

lawn88 [pdf]

Efficient Computation of the Singular Value Decomposition with Applications to Least Squares Problems

by Ming Gu, James Demmel, and Inderjit Dhillon

UT-CS-94-257, October, 1994.

lawn87 [pdf]

Computing Eigenspaces with Specified Eigenvalues of a Regular Matrix Pair (A,B) and Condition Estimation: Theory, Algorithms and Software

by B. Kagstrom and P. Poromaa

UT-CS-94-255, September, 1994.

lawn86 [pdf]

The Performance of Finding Eigenvalues and Eigenvectors of Dense Symmetric Matrices on Distributed Memory Computers

by J. Demmel and K. Stanley

UT-CS-94-254, September, 1994.

lawn85 [pdf]

Relative Perturbation Theory: (II) Eigenspace Variations

by Ren-Cang Li

UT-CS-94-253, September, 1994.

lawn84 [pdf]

Relative Perturbation Theory: (I) Eigenvalue Variations

by Ren-Cang Li

UT-CS-94-252, September, 1994.

lawn83 [pdf]

Relative Perturbation Bounds for the Unitary Polar Factor

by Ren-Cang Li

UT-CS-94-251, September, 1994.

lawn82 [pdf]

Call Conversion Interface (CCI) for LAPACK/ESSL

by J. Dongarra and M. Kolatis

UT-CS-94-250, August, 1994.

lawn81 [pdf]

Quick Installation Guide for LAPACK on Unix Systems

by S. Blackford and J. Dongarra

UT-CS-94-249, September, 1994. UPDATED: June 30, 1999. (VERSION 3.0)

lawn80 [pdf]

The Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines

by J. Choi, J. J. Dongarra, S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley

UT-CS-94-246, September, 1994.

lawn79 [pdf]

Parallelizing the QR Algorithm for the Unsymmetric Algebraic Eigenvalue Problem: Myths and Reality

by Greg Henry and Robert van de Geijn

UT-CS-94-244, August, 1994.

lawn78 [pdf]

Computational variants of the CGS and BiCGstab methods

by Victor Eijkhout

UT-CS-94-241, August, 1994.

lawn77 [pdf]

Basic Concepts for Distributed Sparse Linear Algebra Operations

by Victor Eijkhout and Roldan Pozo

UT-CS-94-240, August, 1994.

lawn76 [pdf]

Algorithmic Bombardment for the Iterative Solution of Linear Systems: A Poly-Iterative Approach

by Richard Barrett, Michael Berry, Jack Dongarra, Victor Eijkhout, and Charles Romine

UT-CS-94-239, August, 1994.

lawn75 [pdf]

LAPACK-Style Algorithms and Software for Solving the Generalized Sylvester Equation and Estimating the Separation Between Regular Matrix Pairs

by Bo Kagstrom and Peter Poromaa

UT-CS-94-237, July 1994.

lawn74 [pdf]

A Sparse Matrix Library in C++ for High Performance Architectures

by J. Dongarra, A. Lumsdaine, X. Niu, R. Pozo, and K. Remington

UT-CS-94-236, July 1994.

lawn73 [pdf]

Basic Linear Algebra Communication Subprograms: Analysis and Implementation Across Multiple Parallel Architectures

by R. Clint Whaley

UT-CS-94-234, May 1994.

lawn72 [pdf]

The Computation of Elementary Unitary Matrices

by R. Lehoucq

UT-CS-94-233, October 1995.

lawn71 [pdf]

IBM RS/6000-550 & -590 Performance for Selected Routines in ESSL

by Jack Dongarra and Michael Kolatis

UT-CS-94-231, April 1994.

lawn70 [pdf]

On the Correctness of Parallel Bisection in Floating Point

by James Demmel, Inderjit Dhillon, and Huan Ren

UT-CS-94-228, March 1994.

lawn69 [pdf]

A Serial Implementation of Cuppen’s Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem

by J. Rutter

UT-CS-94-225, March 1994.

lawn68 [pdf]

A Highly Parallel Algorithm for the Reduction of a Nonsymmetric Matrix to Block Upper-Hessenberg Form

by Michael W. Berry, Jack J. Dongarra and Youngbae Kim

UT-CS-94-221, February 1994.

lawn67 [pdf]

Performance Complexity of $LU$ Factorization with Efficient Pipelining and Overlap on a Multiprocessor

by F. Desprez, J. Dongarra, and B. Tourancheau

UT-CS-93-218, December, 1993.

lawn66 [pdf]

A Characterization of Polynomial Iterative Methods

by Victor Eijkhout

UT-CS-93-216, November 1993.

lawn65 [pdf]

Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers

by Jaeyoung Choi, Jack J. Dongarra, and David W. Walker

UT-CS-93-215, November, 1993.

lawn64 [pdf]

Distributed Sparse Gaussian Elimination and Orthogonal Factorization

by Padma Raghavan

UT-CS-93-203, August 1993.

lawn63 [pdf]

Line and Plane Separators

by Michael T. Heath and Padma Raghavan

UT-CS-93-202, August 1993.

lawn62 [pdf]

Distributed Solution of Sparse Linear Systems

by Michael T. Heath and Padma Raghavan

UT-CS-93-201, August 1993.

lawn61 [pdf]

An Object Oriented Design for High Performance Linear Algebra on Distributed Memory Architectures

by J. Dongarra, R. Pozo, and D. Walker

UT-CS-93-200, August 1993.

lawn60 [pdf]

Parallel Numerical Linear Algebra

by James W. Demmel, Michael T. Heath, and Henk A. van der Vorst

UT-CS-93-192, March 1993.

lawn59 [pdf]

Faster Numerical Algorithms via Exception Handling

by James W. Demmel and Xiaoye Li

UT-CS-93-192, March 1993.

lawn58 [pdf]

The Design of Linear Algebra Libraries for High Performance Computers

by Jack Dongarra and David Walker

UT-CS-93-188, June 1993.

lawn57 [pdf], lawn57.tgz

PUMMA: Parallel Universal Matrix Multiplication Algorithms on Distributed Memory Concurrent Computers

by Jaeyoung Choi, Jack J. Dongarra, and David W. Walker

UT-CS-93-187, May 1993.

lawn56 [pdf], lawn56.tgz

Reducing Communication Costs in the Conjugate Gradient Algorithm on Distributed Memory Multiprocessors

by E.F. D’Azevedo, V.L. Eijkhout and C.H. Romine

UT-CS-93-185, January 1993.

lawn55 [pdf]

ScaLAPACK: A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers

by J. Choi, J. Dongarra, R. Pozo, and D. Walker

UT-CS-92-181, November 1992.

lawn54 [pdf]

On Swapping Diagonal Blocks in Real Schur Form

by Z. Bai and J.W. Demmel

UT-CS-92-182, October 1992.

lawn53 [pdf]

Trading Off Parallelism and Numerical Stability

by J.W. Demmel

UT-CS-92-179, June 1992.

lawn52 [pdf]

A Cartesian Parallel Nested Dissection Algorithm

by Michael T. Heath and Padma Raghavan

UT-CS-92-178, June 1992.

lawn51 [pdf]

Qualitative Properties of the Conjugate Gradient and Lanczos Methods in a Matrix Framework

by Victor Eijkhout

UT-CS-92-170, May 1992.

lawn50 [pdf]

Distributed Sparse Data Structures for Linear Algebra Operations

by Victor Eijkhout

UT-CS-92-169, May 1992.

lawn49 [pdf]

A Specification for Floating Point Parallel Prefix

by J. Demmel

UT-CS-92-167, May 1992.

lawn48 [pdf]

On Computing Accurate Singular Values and Eigenvalues of Matrices with Acyclic Graphs

by J. Demmel and W. Gragg

UT-CS-92-166, May 1992.

lawn47 [pdf]

Open Problems in Numerical Linear Algebra

by J. Demmel

UT-CS-92-164, May 1992.

lawn46 [pdf]

Computing the Generalized Singular Value Decomposition

by Z. Bai and J. Demmel

UT-CS-92-163, May 1992.

lawn45 [pdf]

The Inherent Inaccuracy of Implicit Tridiagonal QR

by J. Demmel

UT-CS-92-162, May 1992.

lawn44 [pdf]

Performance of LAPACK: A Portable Library of Numerical Linear Algebra Routines

by Edward Anderson and Jack Dongarra

UT-CS-92-156, May 1992.

lawn43 [pdf]

A Look at Scalable Dense Linear Algebra Libraries

by Jack Dongarra, Robert van de Geijn and David Walker

UT-CS-92-155, April, 1992.

lawn42 [pdf]

Perturbation Theory and Backward Error for $AX-XB=C$

by Nick Higham

UT-CS-92-153, April, 1992.

lawn41 [pdf]

Installation Guide for LAPACK

by Susan Blackford and Jack Dongarra

UT-CS-92-151, March, 1992. Updated: June 30, 1999 (VERSION 3.0)

lawn40 [pdf]

Block LU Factorization

by James Demmel, Nick Higham, Rob Schreiber

UT-CS-92-149, February 1992.

lawn39 [pdf]

On Designing Portable High Performance Numerical Libraries

by James Demmel, Jack Dongarra, and W. Kahan

UT-CS-91-141, July, 1991.

lawn38 [pdf]

On a Direct Algorithm for Computing Invariant Subspaces with Specified Eigenvalues

by Z. Bai and J. Demmel

UT-CS-91-139, November, 1991.

lawn37 [pdf]

Two Dimensional Basic Linear Algebra Communication Subprograms

by Jack J. Dongarra and Robert A. van de Geijn

UT-CS-91-138, October, 1991.

lawn36 [pdf]

Robust Triangular Solvers

by E. Anderson

UT-CS-91-142, August, 1991.

lawn35 [pdf]

Implementation Guide for LAPACK

by E. Anderson, J. Dongarra, and S. Ostrouchov

UT-CS-91-138, August 1991.

lawn34 [pdf]

Workshop on the BLACS

by J. J. Dongarra

UT-CS-91-134, May 1991.

lawn33 [pdf]

Robust Incremental Condition Estimation

by C. Bischof, P.T.P. Tang

UT-CS-91-133, May 1991.

lawn32 [pdf]

Generalized Incremental Condition Estimation

by C. Bischof, P.T.P. Tang

UT-CS-91-132, May 1991.

lawn31 [pdf]

Generalized QR Factorization and its Applications

by E. Anderson, Z. Bai, J. Dongarra

UT-CS-91-131, April 1991.

lawn30 [pdf]

Reduction to Condensed Form for the Eigenvalue Problem on Distributed Memory Architectures

by J. Dongarra, R. van de Geijn

UT-CS-91-130, April 1991.

lawn29 [pdf]

On Global Combine Operations

by R. van de Geijn

UT-CS-91-129, April 1991.

lawn28 [pdf]

The IBM RISC System/6000 and Linear Algebra Operations

by J. Dongarra, P. Mayes, G. Radicati

UT-CS-90-122, December 1990.

lawn27 [pdf]

Stability of Methods for Matrix Inversion

by J. DuCroz, N. Higham

UT-CS-90-119, October, 1990.

lawn26 [pdf]

Prospectus for an Extension to LAPACK: A Portable Linear Algebra Library for High-Performance Computers

by E. Anderson, C. Bischof, J. Demmel, J. Dongarra, J. DuCroz, S. Hammarling, and W. Kahan

UT-CS-90-118, November 1990.

lawn25 [pdf]

Numerical Considerations in Computing Invariant Subspaces

by J. Dongarra, S. Hammarling, and J. Wilkinson

UT-CS-90-117, October, 1990.

lawn24 [pdf]

LAPACK Block Factorization Algorithms on the Intel iPSC/860

by J. Dongarra and S. Ostrouchov

UT-CS-90-115, October, 1990.

lawn23 [pdf]

Improved Error Bounds for Underdetermined System Solvers

by J. Demmel and N. Higham

UT-CS-90-113, August 1990.

lawn22 [pdf]

Stability of Block Algorithms with Fast Level 3 BLAS

by J. Demmel and N. Higham

UT-CS-90-110, July 1990.

lawn21 [pdf]

Factorizations of Band Matrices Using Level 3 BLAS

by Jeremy Du Croz, Peter Mayes, and Giuseppe Radicati

UT-CS-90-109, July 1990.

lawn20 [pdf]

LAPACK: A Portable Linear Algebra Library for High-Performance Computers

by E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. DuCroz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen

UT-CS-90-105, May 1990.

lawn19 [pdf], scanned copy with figures [pdf scan]

Evaluating Block Algorithm Variants in LAPACK

by E. Anderson and J. Dongarra

UT-CS-90-103, April 1990.

lawn18 [pdf]

Implementation Guide for LAPACK

by E. Anderson and J. Dongarra

UT-CS-90-101, April 1990.

lawn17 [pdf]

Experiments with QR/QL Methods for the Symmetric Tridiagonal Eigenproblem

by A. Greenbaum and J. Dongarra

UT-CS-89-92, November 1989.

lawn16 [pdf]

Results from the Initial Release of LAPACK

by E. Anderson and J. Dongarra

UT-CS-89-89, November 1989. (Replaced by lawn41 or lawn81.)

lawn15 [pdf]

Jacobi’s Method is More Accurate than QR

by J. Demmel and K. Veselic

UT-CS-89-88, October 1989.

lawn14 [pdf]

On Floating Point Errors in Cholesky

by J. Demmel

UT-CS-89-87, October 1989.

lawn13 [pdf]

On the Conditioning of the Nonsymmetric Eigenproblem: Theory and Software

by Z. Bai, J. Demmel, and A. McKenney

UT-CS-89-86, October 1989.

lawn12 [pdf]

Banded Cholesky Factorization Using Level 3 BLAS

by Peter Mayes and Giuseppe Radicati

ANL, MCS-TM-134, August 1989.

lawn11 [pdf]

The Bidiagonal Singular Value Decomposition and Hamiltonian Mechanics

by P. Deift, J. Demmel, L.-C. Li, and C. Tomei

ANL, MCS-TM-133, August 1989.

lawn10 [pdf]

Installing and Testing the Initial Release of LAPACK - Unix and Non-Unix Versions

by E. Anderson and J. Dongarra

ANL, MCS-TM-130, May 1989.

lawn09 [pdf]

A Test Matrix Generation Suite

by J. Demmel and A. McKenney

ANL, MCS-P69-0389, March 1989.

lawn08 [pdf]

On a Block Implementation of Hessenberg Multishift QR Iteration

by Z. Bai and J. Demmel

ANL, MCS-TM-127, January 1989.

lawn07 [pdf]

Computing Accurate Eigensystems of Scaled Diagonally Dominant Matrices

by J. Barlow and J. Demmel

ANL, MCS-TM-126, December 1988.

lawn06 [pdf]

Tools to Aid in the Analysis of Memory Access Patterns for FORTRAN Programs

by O. Brewer, J. Dongarra, and D. Sorensen

ANL, MCS-TM-120, June 1988

lawn05 [pdf]

Provisional Contents

by C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, and D. Sorensen

ANL, MCS-TM-38, September 1988

lawn04 [pdf]

Guidelines for the Design of Symmetric Eigenroutines, SVD, and Iterative Refinement and Condition Estimation for Linear Systems

by J. Demmel, J. Du Croz, S. Hammarling, and D. Sorensen

ANL, MCS-TM-111, March 1988

lawn03 [pdf]

Computing Small Singular Values of Bidiagonal Matrices with Guaranteed High Relative Accuracy

by J. Demmel and W. Kahan

ANL, MCS-TM-110, February 1988

lawn02 [pdf]

Block Reduction of Matrices to Condensed Forms for Eigenvalue Computations

by J. Dongarra, S. Hammarling, and D. Sorensen

ANL, MCS-TM-99, September 1987

lawn01 [pdf]

Prospectus for the Development of a Linear Algebra Library for High-Performance Computers

by J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, and D. Sorensen

ANL, MCS-TM-97, September 1987