BLAS Technical Workshop

November 13-14, 1995

University of Tennessee, Knoxville

The University of Tennessee, Cray Research, and Rutherford Lab are organizing a workshop on November 13th and 14th, 1995, to study two related topics in linear algebra software.

  1. Developing a set of parallel BLAS and related interfaces for linear algebra
  2. Algorithm implementations on today's high-performance computers

The existing BLAS have proven very effective in supporting portable, efficient software for sequential computers and for some of the current class of high-performance computers. We would like to investigate extending the currently accepted standards to provide greater coverage of sparse matrices and additional facilities for parallel computing. In particular, we would like to standardize on a set of Parallel BLAS along the lines of the existing BLAS for the dense and sparse cases.

The goal of this workshop is to stimulate thought, discussion, and comment on the future development of a set of standards for basic matrix data structures, both dense and sparse, as well as calling sequences for a set of low-level computational kernels for the parallel and sequential settings. These new "standards" are needed to complement and supplement the existing ones for sparse and parallel computation. One of the major aims of these standards will be to enable linear algebra libraries (both public domain and commercial) to interoperate efficiently and easily.

BLAS Technical Forum homepage

Workshop Agenda

Sunday November 12
7:00 pm Reception at the Hilton Hotel
Monday November 13
University Center Room 221
8:00 am Registration
Session 1: Jack Dongarra, Chair
8:50 am Local Information and Particulars
Jack Dongarra, University of Tennessee
9:00 am Historical Perspective
Sven Hammarling, NAG and University of Tennessee
9:15 am Standard Sequential Mathematical Libraries: Promises and Pitfalls, Opportunities and Challenges
Andrew Lumsdaine, University of Notre Dame
9:30 am Requirements for Parallel BLAS: A Library Writer's Perspective
Bill Gropp, Argonne National Laboratory
9:45 am On the Sparse BLAS Work
Iain Duff, Rutherford Appleton Laboratory
10:00 am Discussion
10:30 am Break
Session 2: Iain Duff, Chair
11:00 am Sparse BLAS, Toolkits and Primitives
Mike Heroux, Cray Research, Inc.
11:15 am Basic Linear Algebra Communication Subroutines Used by ScaLAPACK
Clint Whaley, University of Tennessee
11:30 am Parallel BLAS Used by ScaLAPACK
Antoine Petitet, University of Tennessee
11:45 am Discussion
12:15 pm Lunch
Session 3: Mike Heroux, Chair
1:30 pm Parallel Givens and a Better Symmetric Update
Linda Kaufman, Bell Labs
1:45 pm Physically Based Matrix Distribution: Theory and Interface
John Gunnels and Carter Edwards, U of Texas - Austin
2:15 pm OBLAS: Objective Basic Linear Algebra Subprograms (One Call for all LAS)
Craig C. Douglas, IBM TJ Watson Research
2:30 pm Discussion
3:00 pm Break
Session 4: Bo Kagstrom, Chair
3:30 pm Fortran 90 Version of the BLAS
Jeremy Du Croz, NAG
3:45 pm Key Concepts for Parallel Out-Of-Core LU Factorization
David Walker, Oak Ridge National Laboratory
4:00 pm The Importance of Highly Efficient Computational Kernels for All Block Sizes
Barry Smith, Argonne National Laboratory
4:15 pm Future Research Directions in Scalable Software Libraries
Anthony Skjellum, Mississippi State University
4:30 pm P_SPARSLIB: A Parallel Sparse Iterative Solution Package
Yousef Saad, University of Minnesota
4:45 pm Discussion
6:00 pm Dinner
8:00 pm Birds-of-a-Feather Sessions
Extensions to the existing BLAS, Jeremy Du Croz and Linda Kaufman
Sparse BLAS, Iain Duff and Mike Heroux
Matrix Distributions, Jack Dongarra, Bo Kagstrom, and Robert van de Geijn
Tuesday November 14
University Center Room 221
Session 5: Jeremy Du Croz, Chair
9:00 am GEMM-Based Level 3 BLAS: High Performance Model Implementations and Performance Evaluations Benchmark
Bo Kagstrom, Umea University
9:15 am Portable Automatic Generation of Fast BLAS-GEMM Compatible Matrix-Matrix Multiply Using PHiPAC Techniques
Jeff Bilmes, University of California, Berkeley
9:30 am A GAM Implementation of the BLACS
Melody Y. Ivory, University of California, Berkeley
9:45 am Discussion
10:15 am Break
Session 6: Sven Hammarling, Chair
10:45 am Parallel BLAS and ScaLAPACK results on Meiko
Michel Dayde, ENSEEIHT-IRIT, Toulouse
11:00 am 3D || Matrix Multiply
Fred Gustavson, IBM
11:15 am A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies
Jin Li, Mississippi State University
11:30 am Issues in Standardizing the Parallel BLAS
Steve Huss-Lederman, Supercomputing Research Center
11:45 am Discussion
12:15 pm Lunch
Session 7: Tony Skjellum, Chair
1:30 pm Performance impacts of PBLAS interface and implementation decisions on LU decomposition and reduction to tridiagonal form
Ken Stanley, University of California, Berkeley
2:00 pm On C/C++ Work on the Sparse BLAS
Roldan Pozo, NIST
2:15 pm Discussion
2:45 pm Break
Session 8: Jack Dongarra, Chair
2:45 pm Highly Parallel Formulations of Sparse Matrix Computations
Vipin Kumar, University of Minnesota
3:00 pm Summary and Wrap-up
Pete Stewart, University of Maryland
3:15 pm Discussion

If you would like to hand out reports or documents during the workshop, please bring the material with you. We will not be able to reproduce large volumes of material.

We estimate an attendance of 50 persons at the maximum.


This workshop is organized by Jack Dongarra, Iain Duff, and Mike Heroux, and supported in part by the National Science Foundation Science and Technology Center CRPC.


We have made arrangements with the Downtown Hilton Hotel in Knoxville.

Hilton Hotel
501 W. Church Street
Knoxville, TN
Phone: 615-523-2300

When making arrangements, tell the hotel you are associated with the BLAS Workshop. Rooms cost $68 for a single and $78 for a double.

PostScript maps of the area are available for download:

Map of Knoxville downtown area.

Map of Knoxville region.

Map of University of Tennessee campus.

We estimate an attendance of 40 persons at the maximum.

All presentations will take place in the University Center on campus, about a 15-minute walk from the hotel.

There will be a $25.00 registration fee, payable at the meeting, to cover the meeting room, reception, and refreshments during the breaks.

You can rent a car or get a cab from the airport to the hotel.

The airport is the Knoxville McGhee-Tyson Airport.

In general, people are on their own for all meals.

Please let me know if you are planning to attend.


If you would like to make a presentation, let me know.

List of working notes to be discussed.

(If you would like to add to this list, send me the URL.)

  • A Proposal for a Set of Parallel Basic Linear Algebra Subprograms, J. Choi, J. Dongarra, S. Ostrouchov, A. Petitet, D. Walker, and R. Whaley, LAPACK Working Note #100 (PostScript version available).

  • A Revised Proposal for a Sparse BLAS Toolkit, S. Carney, M. A. Heroux, G. Li, and K. Wu, AHPCRC Preprint 94-034, SPARKER Working Note #3.

  • A Set of Level 3 Basic Linear Algebra Subprograms for Sparse Matrices, I. Duff, M. Marrone, G. Radicati, C. Vittoli, RAL-TR-95-049.

  • Scalable Parallel Algorithms for Sparse Linear Systems, V. Kumar and G. Karypis.

  • P_SPARSLIB: a library of distributed sparse iterative solvers, Y. Saad.

  • A GAM Implementation of the BLACS, Melody Y. Ivory, UC Berkeley.

  • A Comprehensive Approach to Parallel Linear Algebra Libraries, Almadena Chtchelkanova, Carter Edwards, John Gunnels, Sam Guyer, Ken Klimkowski, Greg Morrow, James Overfelt, Abani Patra, Jaehoon Seol, Robert van de Geijn.

  • OBLAS: Objective basic linear algebra subprograms, C. C. Douglas.

  • Portable automatic generation of fast BLAS-GEMM compatible matrix-matrix multiply using PHiPAC techniques, Jeff Bilmes, UC Berkeley.

  • Parallel Matrix Distributions: have we been doing it all wrong? Robert van de Geijn, Department of Computer Sciences, U of Texas, and Texas Institute for Computational and Applied Mathematics.

  • Physically Based Matrix Distribution: Theory and Interface, John Gunnels, Department of Computer Sciences, U of Texas, and Carter Edwards, Texas Institute for Computational and Applied Mathematics, U of Texas.

  • Efficient Parallel Level 3 BLAS Implementation, Almadena Chtchelkanova, Department of Computer Sciences, U of Texas.

  • A User's Guide to the BLACS v1.0, Jack J. Dongarra and R. Clint Whaley, University of Tennessee.

  • Matrix Multiplication Work and How Data Layouts Effect Performance And Coding, Steve Huss-Lederman.

  • A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies, Jin Li [SPEAKER], Anthony Skjellum, Robert D. Falgout.

  • Future Research Directions in Scalable Software Libraries, Anthony Skjellum [SPEAKER], Jin Li, Purushotham V. Bangalore, Andrew Lumsdaine.

  • Standard sequential mathematical libraries: Promises and pitfalls, opportunities and challenges, Andrew Lumsdaine and Anthony Skjellum.

  • GEMM-based Level 3 BLAS: High Performance Model Implementations and Performance Evaluation Benchmark, Bo Kagstrom, Per Ling and Charles Van Loan.

  • Distributed General Matrix Multiply and Add for a 2D Mesh Processor Network, Bo Kagstrom and Mikael Rannar.

  • GEMM-based Level 3 BLAS: Installation, Tuning and Use of the Model Implementations and the Performance Evaluation Benchmark, Bo Kagstrom, Per Ling and Charles Van Loan.

  • Using BLACS and MPI in ScaLAPACK, R. Clint Whaley.