See my Google Scholar page for a close-to-comprehensive list of publications and my ResearchGate page for the full text of many. (My DBLP page provides a good list too, organized nicely by year, and with a coauthor index.)

Years
2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1998 1997 1996 1995 1994 1993 1992 1991 1990 1989 1988 1987 1986 1985 1984 1983 1982 1979 
Papers
2020
Harnessing the Computing Continuum for Programming Our World, P. Beckman, J. Dongarra, N. Ferrier, G. Fox, T. Moore, D. Reed, and M. Beck, Fog Computing: Theory and Practice, John Wiley & Sons, Inc., 2020. DOI: 10.1002/9781119551713.ch7 A PDF version is available.
Numerical Algorithms for High-Performance Computational Science, J. Dongarra, L. Grigori, and N. J. Higham, Philosophical Transactions of the Royal Society A, vol. 378, issue 2166, 2020. DOI: 10.1098/rsta.2019.0066 A PDF version is available.
FFT-ECP API and High-Performance Library Prototype for 2D and 3D FFTs on Large-Scale Heterogeneous Systems with GPUs, S. Tomov, A. Ayala, A. Haidar, and J. Dongarra, no. FFT-ECP STML1327, Innovative Computing Laboratory, University of Tennessee, January 2020. A PDF version is available.
Formulation of Requirements for new PAPI++ Software Package: Part I: Survey Results, H. Jagode, A. Danalis, and J. Dongarra, PAPI++ Working Notes, no. 1, ICL-UT-20-02, Innovative Computing Laboratory, University of Tennessee Knoxville, January 2020. A PDF version is available.
Project-Based Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning, K. Wong, S. Tomov, and J. Dongarra, The Journal of Computational Science Education, vol. 11, issue 1, pp. 36–44, January 2020. DOI: 10.22369/issn.2153-4136/11/1/7 A PDF version is available.
Performance Tuning SLATE, M. Gates, A. Charara, A. YarKhan, D. Sukkari, M. Al Farhan, and J. Dongarra, SLATE Working Notes, no. 14, ICL-UT-20-01, Innovative Computing Laboratory, University of Tennessee, January 2020. A PDF version is available.
Load-balancing Sparse Matrix Vector Product Kernels on GPUs, H. Anzt, YC. Chen, T. Cojean, J. Dongarra, G. Flegar, R. Nayak, E. S. Quintana-Orti, Y. Tsai, and W. Wang, ACM Transactions on Parallel Computing, issue 2, March 2020. DOI: 10.1145/3380930 A PDF version is available.
Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures, F. Lopez, E. Chow, S. Tomov, and J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04, University of Tennessee, Knoxville, March 2020. A PDF version is available.
Reducing the Amount of out-of-core Data Access for GPU-Accelerated Randomized SVD, Y. Lu, I. Yamazaki, F. Ino, Y. Matsushita, S. Tomov, and J. Dongarra, Concurrency and Computation: Practice and Experience, April 2020. DOI: 10.1002/cpe.5754 A PDF version is available.
Using Arm Scalable Vector Extension to optimize Open MPI, D. Zhong, P. Shamis, Q. Cao, G. Bosilca, and J. Dongarra, 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020.
Asynchronous SGD for DNN training on Shared-memory Parallel Architectures, F. Lopez, E. Chow, S. Tomov, and J. Dongarra, Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020. A PDF version is available.
Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing, A. Haidar, H. Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05, University of Tennessee, May 2020. A PDF version is available.
Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime, Y. Pei, Q. Cao, G. Bosilca, P. Luszczek, V. Eijkhout, and J. Dongarra, 21st IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2020), New Orleans, LA, IEEE, May 2020. A PDF version is available.
Twenty Years of Computational Science, V. Krzhizhanovskaya, G. Závodszky, M. Lees, J. Dongarra, P. Sloot, S. Brissos, and J. Teixeira, International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.
heFFTe: Highly Efficient FFT for Exascale, A. Ayala, S. Tomov, A. Haidar, and J. Dongarra, International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.
Investigating the Benefit of FP16-enabled Mixed-precision Solvers for Symmetric Positive Definite Matrices using GPUs, A. Abdelfattah, S. Tomov, and J. Dongarra, International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, Elsevier, June 2020.
Report on the Fujitsu Fugaku System, J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-20-06, University of Tennessee, June 2020. A PDF version is available.
Improving the Performance of the GMRES method using Mixed-Precision Techniques, N. Lindquist, P. Luszczek, and J. Dongarra, Smoky Mountains Computational Sciences & Engineering Conference (SMC2020), August 2020.
2019
Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers, Anzt, H., J. Dongarra, G. Flegar, N. J. Higham, and E. S. Quintana-Orti, Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019. DOI: 10.1002/cpe.4460 A PDF version is available.
Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices, Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, Parallel Computing, vol. 81, pp. 1–21, January 2019. DOI: 10.1016/j.parco.2018.10.003 A PDF version is available.
CEED ECP Milestone Report: Performance Tuning of CEED Software and 1st and 2nd Wave Apps: Zenodo, Tomov, S., A. Abdelfattah, V. Barra, N. Beams, J. Brown, JS. Camier, V. Dobrev, J. Dongarra, Y. Dudouit, P. Fischer, et al., October 2019. DOI: 10.5281/zenodo.3477618 A PDF version is available.
Characterization of Power Usage and Performance in Data-Intensive Applications using MapReduce over MPI, Davis, J., T. Gao, S. Chandrasekaran, H. Jagode, A. Danalis, P. Balaji, J. Dongarra, and M. Taufer, 2019 International Conference on Parallel Computing (ParCo2019), Prague, Czech Republic, September 2019.
Checkpointing Strategies for Shared High-Performance Computing Platforms, Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019. A PDF version is available.
Comparing the Performance of Rigid, Moldable, and Grid-Shaped Applications on Failure-Prone HPC Platforms, Le Fevre, V., T. Herault, Y. Robert, A. Bouteiller, A. Hori, G. Bosilca, and J. Dongarra, Parallel Computing, vol. 85, pp. 1–12, July 2019. DOI: 10.1016/j.parco.2019.02.002 A PDF version is available.
Counter Inspection Toolkit: Making Sense out of Hardware Performance Events, Danalis, A., H. Jagode, H. Hanumantharayappa, S. Ragate, and J. Dongarra, 11th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Cham, Switzerland: Springer, February 2019. DOI: 10.1007/978-3-030-11987-4_2 A PDF version is available.
Design and Implementation for FFT-ECP on Distributed Accelerated Systems, Tomov, S., A. Haidar, A. Ayala, D. Schultz, and J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-19-05: University of Tennessee, April 2019. A PDF version is available.
Distributed-Memory Lattice H-Matrix Factorization, Yamazaki, I., A. Ida, R. Yokota, and J. Dongarra, The International Journal of High Performance Computing Applications, vol. 33, issue 5, pp. 1046–1063, August 2019. DOI: 10.1177/1094342019861139 A PDF version is available.
An Empirical View of SLATE Algorithms on Scalable Hybrid System, YarKhan, A., J. Kurzak, A. Abdelfattah, and J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-19-08: University of Tennessee, Knoxville, September 2019. A PDF version is available.
Evaluation of Directive-Based Performance Portable Programming Models, Lopez, M. G., W. Joubert, V. Larrea, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, International Journal of High Performance Computing and Networking, vol. 14, issue 2, pp. 165–182. DOI: http://dx.doi.org/10.1504/IJHPCN.2017.10009064 A PDF version is available.
Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization, Pei, Y., G. Bosilca, I. Yamazaki, A. Ida, and J. Dongarra, PAW-ATM Workshop at SC19, Denver, CO, ACM, November 2019. A PDF version is available.
Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs, Abdelfattah, A., S. Tomov, and J. Dongarra, 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019. A PDF version is available.
FFT-ECP Implementation Optimizations and Features Phase, Tomov, S., A. Haidar, A. Ayala, H. Shaiek, and J. Dongarra, Innovative Computing Laboratory Technical Report, no. ICL-UT-19-12: University of Tennessee, October 2019. A PDF version is available.
Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC, Herault, T., Y. Robert, G. Bosilca, and J. Dongarra, ScalA'19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019. A PDF version is available.
GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems, Shaiek, H., S. Tomov, A. Ayala, A. Haidar, and J. Dongarra, EuroMPI'19 Posters, Zurich, Switzerland, no. icl-ut-19-06: ICL, September 2019. A PDF version is available.
Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments, Wong, K., S. Tomov, and J. Dongarra, ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019. A PDF version is available.
Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation, Ayala, A., S. Tomov, X. Luo, H. Shaiek, A. Haidar, G. Bosilca, and J. Dongarra, Workshop on Exascale MPI (ExaMPI) at SC19, Denver, CO, November 2019. A PDF version is available.
Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators, Luszczek, P., I. Yamazaki, and J. Dongarra, IEEE High Performance Extreme Computing Conference (HPEC 2019), Best Paper Finalist, Waltham, MA, IEEE, September 2019. A PDF version is available.
Least Squares Solvers for Distributed-Memory Machines with GPU Accelerators, Kurzak, J., M. Gates, A. Charara, A. YarKhan, and J. Dongarra, ACM International Conference on Supercomputing (ICS '19), Phoenix, Arizona, ACM, pp. 117–126, June 2019. DOI: 10.1145/3324989.3325719 A PDF version is available.
Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators, Kurzak, J., M. Gates, A. Charara, A. YarKhan, I. Yamazaki, and J. Dongarra, Euro-Par 2019: Parallel Processing, vol. 11725: Springer, pp. 495–506, August 2019. DOI: 10.1007/978-3-030-29400-7_35
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing, Nichols, D., NS. Tomov, F. Betancourt, S. Tomov, K. Wong, and J. Dongarra, ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019. A PDF version is available.
Massively Parallel Automated Software Tuning, Kurzak, J., Y. Tsai, M. Gates, A. Abdelfattah, and J. Dongarra, 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019. DOI: 10.1145/3337821.3337908 A PDF version is available.
Solving Linear Diophantine Systems on Parallel Architectures, Zaitsev, D., S. Tomov, and J. Dongarra, IEEE Transactions on Parallel and Distributed Systems, vol. 30, issue 5, pp. 1158–1169, May 2019, 2018. DOI: http://dx.doi.org/10.1109/TPDS.2018.2873354 A PDF version is available.
Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation, Bai, Z., J. Dongarra, D. Lu, and I. Yamazaki, International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019. A PDF version is available.
PAPI Software-Defined Events for in-Depth Performance Analysis, Jagode, H., A. Danalis, H. Anzt, and J. Dongarra, The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113–1127, November 2019. A PDF version is available.
ParILUT - A Parallel Threshold ILU for GPUs, Anzt, H., T. Ribizel, G. Flegar, E. Chow, and J. Dongarra, IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019. DOI: 10.1109/IPDPS.2019.00033 A PDF version is available.
Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools, Cao, Q., Y. Pei, T. Herault, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19, Denver, CO, ACM, November 2019. A PDF version is available.
Performance of Asynchronous Optimized Schwarz with One-sided Communication, Yamazaki, I., E. Chow, A. Bouteiller, and J. Dongarra, Parallel Computing, vol. 86, pp. 66–81, August 2019. DOI: 10.1016/j.parco.2019.05.004 A PDF version is available.
PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP, Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, P. Wu, I. Yamazaki, A. YarKhan, M. Abalenkovs, N. Bagherpour, et al., ACM Transactions on Mathematical Software, vol. 45, issue 2, June 2019. DOI: 10.1145/3264491 A PDF version is available.
Progressive Optimization of Batched LU Factorization on GPUs, Abdelfattah, A., S. Tomov, and J. Dongarra, IEEE High Performance Extreme Computing Conference (HPEC’19), Waltham, MA, IEEE, September 2019. A PDF version is available.
Race to Exascale, Dongarra, J., S. Gottlieb, and W. T. Kramer, Computing in Science and Engineering, vol. 21, issue 1, pp. 4–5, March 2019. DOI: 10.1109/MCSE.2018.2882574 A PDF version is available.
SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library, Gates, M., J. Kurzak, A. Charara, A. YarKhan, and J. Dongarra, International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, ACM, November 2019. DOI: 10.1145/3295500.3356223 A PDF version is available.
SLATE Developers' Guide, Charara, A., M. Gates, J. Kurzak, A. YarKhan, and J. Dongarra, SLATE Working Notes, no. 11, ICL-UT-19-02: Innovative Computing Laboratory, University of Tennessee, December 2019. A PDF version is available.
SLATE Mixed Precision Performance Report, Charara, A., J. Dongarra, M. Gates, J. Kurzak, and A. YarKhan, Innovative Computing Laboratory Technical Report, no. ICL-UT-19-03: University of Tennessee, April 2019. A PDF version is available.
SLATE Users' Guide, Gates, M., A. Charara, J. Kurzak, and J. Dongarra, SLATE Working Notes, no. 10, ICL-UT-19-01: Innovative Computing Laboratory, University of Tennessee, January 2019.
SLATE Working Note 12: Implementing Matrix Inversions, Kurzak, J., M. Gates, A. Charara, A. YarKhan, and J. Dongarra, SLATE Working Notes, no. 12, ICL-UT-19-04: Innovative Computing Laboratory, University of Tennessee, June 2019. A PDF version is available.
SLATE Working Note 13: Implementing Singular Value and Symmetric/Hermitian Eigenvalue Solvers, Gates, M., M. Al Farhan, A. Charara, J. Kurzak, D. Sukkari, A. YarKhan, and J. Dongarra, SLATE Working Notes, no. 13, ICL-UT-19-07: Innovative Computing Laboratory, University of Tennessee, September 2019. A PDF version is available.
Software-Defined Events through PAPI, Danalis, A., H. Jagode, T. Herault, P. Luszczek, and J. Dongarra, 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019. DOI: 10.1109/IPDPSW.2019.00069 A PDF version is available.
Towards Continuous Benchmarking, Anzt, H., Y. Chen Chen, T. Cojean, J. Dongarra, G. Flegar, P. Nayak, E. S. Quintana-Orti, Y. M. Tsai, and W. Wang, Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019. DOI: 10.1145/3324989.3325719 A PDF version is available.
Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs, Abdelfattah, A., S. Tomov, and J. Dongarra, ScalA'19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019. A PDF version is available.
What it Takes to keep PAPI Instrumental for the HPC Community, Jagode, H., A. Danalis, and J. Dongarra, 1st Workshop on Sustainable Scientific Software (CW3S19), Collegeville, Minnesota, July 2019. A PDF version is available.
2018
Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters, Yamazaki, I., A. Abdelfattah, A. Ida, S. Ohshima, S. Tomov, R. Yokota, and J. Dongarra, IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, IEEE, May 2018. https://www.semanticscholar.org/paper/AnalyzingPerformanceofBiCGStabwithHierarchicalYamazakiAbdelfattah/9f5a5449a04f09fb8b27a106f363dc5a5035a1b9 A pdf version is available.
Do moldable applications perform better on failure-prone HPC platforms? Le Fèvre, V., G. Bosilca, A. Bouteiller, T. Herault, A. Hori, Y. Robert, and J. Dongarra, 11th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids, Turin, Italy, Springer Verlag, August 2018. https://link.springer.com/chapter/10.1007/978-3-030-10549-5_61 A pdf version is available.
ADAPT: An Event-Based Adaptive Collective Communication Framework, Luo, X., W. Wu, G. Bosilca, T. Patinyasakdikul, L. Wang, and J. Dongarra, The 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18), Tempe, Arizona, ACM Press, June 2018, http://dx.doi.org/10.1145/3208040.3208054. A pdf version is available.
Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms, Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Best Paper Award, Vancouver, BC, Canada, IEEE, May 2018. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8425494 A pdf version is available.
Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers, Haidar, A., S. Tomov, J. Dongarra, and N. J. Higham, International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, IEEE, November 2018. DOI: 10.1109/SC.2018.00050 https://dl.acm.org/citation.cfm?id=3291719 A pdf version is available.
The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques, Haidar, A., A. Abdelfattah, M. Zounon, P. Wu, S. Pranesh, S. Tomov, and J. Dongarra, International Conference on Computational Science (ICCS 2018), vol. 10860, Wuxi, China, Springer, pp. 586–600, June 2018, https://doi.org/10.1007/978-3-319-93698-7_45 A pdf version is available.
Variable-Size Batched Condition Number Calculation on GPUs, Anzt, H., J. Dongarra, G. Flegar, and T. Gruetzmacher, SBAC-PAD, Lyon, France, September 2018. https://ieeexplore.ieee.org/document/8645907 A pdf version is available.
A Jaccard Weights Kernel Leveraging Independent Thread Scheduling on GPUs, Anzt, H. and J. Dongarra, SBAC-PAD, Lyon, France, September 2018. https://ieeexplore.ieee.org/document/8645946 A pdf version is available.
Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architectures, Yamazaki, I., J. Kurzak, P. Wu, M. Zounon, and J. Dongarra, IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 8, pp. 1879–1892, August 2018, http://dx.doi.org/10.1109/TPDS.2018.2808964. A pdf version is available.
Computational Benefit of GPU Optimization for Atmospheric Chemistry Modeling, Sun, J., J. Fu, J. Drake, Q. Zhu, A. Haidar, M. Gates, S. Tomov, and J. Dongarra, Journal of Advances in Modeling Earth Systems, vol. 10, issue 8, pp. 1952–1969, August 2018, https://doi.org/10.1029/2018MS001276. A pdf version is available.
Evaluation of Dataflow Programming Models for Electronic Structure Theory, Jagode, H., A. Danalis, R. Hoque, M. Faverge, and J. Dongarra, Concurrency and Computation: Practice and Experience: Special Issue on Parallel and Distributed Algorithms, vol. 2018, issue e4490, pp. 1–20, May 2018, https://doi.org/10.1002/cpe.4490. A pdf version is available.
Accelerating NWChem Coupled Cluster through Dataflow-Based Execution, Jagode, H., A. Danalis, and J. Dongarra, International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540–551, July 2018, https://doi.org/10.1007/978-3-319-32149-3_35. A pdf version is available.
Investigating Power Capping toward Energy-Efficient Scientific Applications, Haidar, A., H. Jagode, P. Vaccaro, A. YarKhan, S. Tomov, and J. Dongarra, Concurrency and Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1–14, April 2018, http://dx.doi.org/10.1002/cpe.4485. A pdf version is available.
A Guide for Achieving High Performance with Very Small Matrices on GPUs: A Case Study of Batched LU and Cholesky Factorizations, Haidar, A., A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 5, pp. 973–984, May 2018, https://doi.org/10.1109/TPDS.2017.2783929. A pdf version is available.
Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs, Gates, M., S. Tomov, and J. Dongarra, Parallel Computing, vol. 74, pp. 3–18, May 2018, http://dx.doi.org/10.1016/j.parco.2017.10.004. A pdf version is available.
The 30th Anniversary of the Supercomputing Conference: Bringing the Future Closer - Supercomputing History and the Immortality of Now, Dongarra, J., V. Getov, and K. Walsh, Computer, vol. 51, issue 10, pp. 74–85, November 2018, http://dx.doi.org/10.1109/MC.2018.3971352. A pdf version is available.
Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators, Dongarra, J., M. Gates, J. Kurzak, P. Luszczek, and Y. Tsai, Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018, http://dx.doi.org/10.1109/JPROC.2018.2868961. A pdf version is available.
A Failure Detection for HPC Platforms, G. Bosilca, A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, International Journal of High Performance Computing Applications, Volume 32 Issue 1, January 2018, pp 139–158, http://journals.sagepub.com/doi/10.1177/1094342017711505 A pdf version is available.
Accelerating the SVD Bidiagonalization of a Batch of Small Matrices using GPUs, Tingxing Dong, Azzam Haidar, Stanimire Tomov, Jack Dongarra, Journal of Computational Science, January 2018, https://doi.org/10.1016/j.jocs.2018.01.007 A pdf version is available.
Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers, H. Anzt, J. Dongarra, G. Flegar, N. Higham, E. Quintana-Orti, Concurrency and Computation: Practice and Experience, http://dx.doi.org/10.1002/cpe.4460, January 2018. A pdf version is available.
A Guide for Achieving High Performance With Very Small Matrices On GPU: A case Study of Batched LU and Cholesky Factorizations, Azzam Haidar, Ahmad Abdelfattah, Mawussi Zounon, Stanimire Tomov, Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, Vol. 29, No. 5, May 2018, DOI: 10.1109/TPDS.2017.2783929 A pdf version is available.
Investigating Power Capping toward Energy-Efficient Scientific Applications, A. Haidar, H. Jagode, A. YarKhan, P. Vaccaro, S. Tomov, J. Dongarra, Concurrency and Computation: Practice and Experience, February 2018, DOI: 10.1002/cpe.4485. A pdf version is available.
Evaluation of Dataflow Programming Models for Electronic Structure Theory, H. Jagode, A. Danalis, R. Hoque, M. Faverge, and J. Dongarra, Concurrency and Computation: Practice and Experience, vol. 2018, issue e4490, pp. 1–20, May 2018. https://doi.org/10.1002/cpe.4490 A pdf version is available.
Big Data and Extreme-Scale Computing: Pathways to Convergence - Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry, M. Asch, et al., International Journal of High Performance Computing Applications, Volume 32 Issue 4, Fall 2018, pp 435–479. doi.org/10.1177/1094342018778123 A pdf version is available.
PARILUT - A New Parallel Threshold ILU Factorization, H. Anzt, E. Chow, J. Dongarra, SIAM SISC, Vol 40 No 4, pp C503–C519. https://doi.org/10.1137/16M1079506 A pdf version is available.
Autotuning in High-Performance Computing Applications, Prasanna Balaprakash, Jack Dongarra, Todd Gamblin, Mary Hall, Jeffrey K. Hollingsworth, Boyana Norris, and Richard Vuduc, IEEE Proceedings, August 2018. DOI: 10.1109/JPROC.2018.2841200 A pdf version is available.
Evaluation of Directive-based Performance Portable Programming Models, M. Graham Lopez, Wayne Joubert, Veronica Vergara Larrea, Oscar Hernandez, Azzam Haidar, Stanimire Tomov, Jack Dongarra, Int. J. Signal and Imaging Systems Engineering, Vol. x, No. x, 2017. DOI: 10.1504/IJHPCN.2017.10009064 A pdf version is available.
Accelerating the SVD Two Stage Reduction and Divide-and-Conquer Using GPUs, Mark Gates, Stanimire Tomov, Jack Dongarra, Parallel Computing, accepted November 2017. A pdf version is available.
Batched One-sided Factorizations of Tiny Matrices using GPUs: Challenges and Countermeasure, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack Dongarra, Journal of Computational Science, Volume 26, May 2018, pp 226–236. https://doi.org/10.1016/j.jocs.2018.01.005 A pdf version is available.
Symmetric Indefinite Linear Solver using OpenMP Task on Manycore Architecture, I. Yamazaki, J. Kurzak, P. Wu, Z. Mawussi, J. Dongarra, IEEE Transactions on Parallel and Distributed Systems, Volume: 29, Issue: 8, Aug. 1 2018. DOI: 10.1109/TPDS.2018.2808964 A pdf version is available.
The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Exascale, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, SIAM Review, vol. 60, issue 4, pp. 808–865, November 2018, https://doi.org/10.1137/17M1117732. A pdf version is available.
PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, P. Wu, I. Yamazaki, A. YarKhan, M. Abalenkovs, N. Bagherpour, S. Hammarling, J. Sistek, accepted in ACM TOMS, July 2018. A pdf version is available.
Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning, Edmond Chow, Hartwig Anzt, Jennifer Scott, Jack Dongarra, Journal of Parallel and Distributed Computing, 119, pp 219–230, 2018. https://doi.org/10.1016/j.jpdc.2018.04.017 A pdf version is available.
Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs, IEEE Transactions on Parallel and Distributed Systems, accepted May 2018. DOI: 10.1109/TPDS.2018.2842785 A pdf version is available.
Computational Benefit of GPU Optimization for the Atmospheric Chemistry Modeling, Jian Sun, Joshua Fu, John Drake, Qingzhao Zhu, Azzam Haidar, Mark Gates, Stanimire Tomov, Jack Dongarra, Journal of Advances in Modeling Earth Systems, https://doi.org/10.1029/2018MS001276 A pdf version is available.
Analyzing Performance of BiCGStab with Hierarchical-matrix on GPU clusters, Ichitaro Yamazaki, Ahmad Abdelfattah, Akihiro Ida, Satoshi Ohshima, Stanimire Tomov, Rio Yokota and Jack Dongarra, IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, British Columbia, Canada, IEEE, May 2018. A pdf version is available.
Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms, T. Herault, Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, J. Dongarra, APDCM Workshop at IPDPS 2018, Best paper award. A pdf version is available.
ADAPT: An Event-Based Adaptive Collective Communication Framework, Wu, W., G. Bosilca, X. Luo, T. Patinyasakdikul, L. Wang, and J. Dongarra, Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '18, Tempe, Arizona, ACM Press, June 2018. DOI: 10.1145/3208040.3208054 A pdf version is available.
Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers, Tomov, Azzam, Dongarra, Higham, submitted to SC18. A pdf version is available.
Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra, accepted in IEEE HPEC, September 2018, Waltham, MA. A pdf version is available.
Do moldable applications perform better on failure-prone HPC platforms?, Euro-Par 2018, Resilience Workshop, Turin, Italy, accepted June 2018. A pdf version is available.
Incomplete Sparse Approximate Inverses for Parallel Preconditioning, H. Anzt, T. Huckle, J. Brackle, J. Dongarra, Parallel Computing, Volume 71, January 2018, Pages 1–22, doi.org/10.1016/j.parco.2017.10.003. A pdf version is available.
2017
A Look Back on 30 Years of the Gordon Bell Prize, Gordon Bell, David Bailey, Alan H. Karp, Jack Dongarra, Kevin Walsh, International Journal of High Performance Computing and Networking, 2017, Vol. 31(6) 469–484, DOI: 10.1177/1094342017738610 A pdf version is available.
The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems, Jack Dongarra, Sven Hammarling, Nick Higham, Samuel Relton, Pedro Valero-Lara, and Mawussi Zounon, ICCS’17, ETH Zurich, Procedia Computer Science, Volume 108, 2017, Pages 495–504, DOI: 10.1016/j.procs.2017.05.138 A pdf version is available.
Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices, Mark Gates, Jakub Kurzak, Piotr Luszczek, Yu Pei and Jack Dongarra, iWAPT 2017 at IPDPS 2017. A pdf version is available.
Variable-Size Batched LU for Small Matrices and its Integration into Block-Jacobi Preconditioning, Hartwig Anzt, Jack Dongarra, Goran Flegar and Enrique S. Quintana-Orti, 2017 46th International Conference on Parallel Processing (ICPP), August 2017, pp 91–100, DOI: 10.1109/ICPP.2017.18 A pdf version is available.
Out of Memory SVD Solver for Big Data, Azzam Haidar, Khairul Kabir, Diana Fayad, Stanimire Tomov, Jack Dongarra, 2017 IEEE High Performance Extreme Computing Conference, September 2017, pp 1–7, DOI: 10.1109/HPEC.2017.8091029 A pdf version is available.
Towards Numerical Benchmark for Half-Precision Floating Point Arithmetic, Piotr Luszczek, Jakub Kurzak, Ichitaro Yamazaki and Jack Dongarra, 2017 IEEE High Performance Extreme Computing Conference, Boston, 2017, DOI: 10.1109/HPEC.2017.8091031 A pdf version is available.
Sampling Algorithms to Update Truncated SVD, Ichitaro Yamazaki, Stanimire Tomov and Jack Dongarra, accepted at the IEEE Big Data 2017 Conference, Boston MA, December 11–16, 2017. A pdf version is available.
Flexible Batched Sparse Matrix-Vector Product on GPUs, H. Anzt, G. Collins, J. Dongarra, G. Flegar, and E. S. Quintana-Orti, The 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '17), Denver, Colorado, ACM Press, November 2017. A pdf version is available.
Power-Aware Computing: Measurement, Control, and Performance Analysis for Intel Xeon Phi, Azzam Haidar, Heike Jagode, Asim Yarkhan, Phil Vaccaro, Stanimire Tomov and Jack Dongarra, 2017 IEEE High Performance Extreme Computing Conference (HPEC), September 2017, DOI: 10.1109/HPEC.2017.8091085 A pdf version is available.
Scaling Point Set Registration in 3D Across Thread Counts on Multicore and Hardware Accelerator Platforms through Autotuning for Large Scale Analysis of Scientific Point Clouds, Piotr Luszczek, Jakub Kurzak, Ichitaro Yamazaki, David Keffer, and Jack Dongarra, accepted in IEEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2017), December 2017, Boston, MA. A pdf version is available. Optimized Batched Linear Algebra for Modern Architectures, J. Dongarra, S. Hammarling, N. J. Higham, S.D. Relton, and M. Zounon, In EuroPar 2017: Parallel Processing, F.F. Rivera, T.F. Pena, and J.C. Cabaleiro, editors, volume 10417 of Lecture Notes in Computer Science, SpringerVerlag, Cham, 2017, pages 511522. DOI: 10.1007/9783319642031_37. A pdf version is available. VariableSize Batched GaussHuard for BlockJacobi Preconditioning, Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. QuintanaOrt�, Andr�s E. Tom�s, Procedia Computer Science, Volume 108, pp 1783  1792, 2017, International Conference on Computational Science, ICCS 2017, 1214 June 2017, Zurich, Switzerland, ISSN 18770509, DOI:10.1016/j.procs.2017.05.186. A pdf version is available. Bidiagonalization and RBidiagonalization: Parallel Tiled Algorithms, Critical Paths and DistributedMemory Implementation, Mathieu Faverge, Julien Langou, Yves Robert, Jack J. Dongarra, 2017 IPDPS Conference, DOI:10.1109/IPDPS.2017.46 A pdf version is available. Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack Dongarra, ICCS’17, ETH Zurich, Procedia Computer Science, Volume 108, 2017, Pages 606615, DOI:10.1016/j.procs.2017.05.250. A pdf version is available. Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives, I. Yamazaki, M. Hoemmen, P. Luszczek, J. Dongarra, IPDPS Workshop PDSEC2017, Workshop Best Paper Award, 2017. 
DOI: 10.1109/IPDPSW.2017.65 A pdf version is available. Novel HPC Techniques to Batch Execution of Many Variable Size BLAS Computations on GPUs, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack Dongarra, ICS’17, Frankfurt, ISBN: 9781450350204, DOI:10.1145/3079079.3079103. A pdf version is available. Preconditioned Krylov solvers on GPUs, Hartwig Anzt, Mark Gates, Jack Dongarra, Moritz Kreutzer, Gerhard Wellein, Martin Köhler, Parallel Computing, DOI:10.1016/j.parco.2017.05.006, June 2017. A pdf version is available. Scaling Point Set Registration in 3D Across Thread Counts on Multicore and Hardware Accelerator Platforms through Autotuning for Large Scale Analysis of Scientific Point Clouds, Piotr Luszczek, Jakub Kurzak, Ichitaro Yamazaki, David Keffer, and Jack Dongarra, accepted in IEEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2017), December 2017, Boston, MA. DOI: 10.1109/BigData.2017.8258258 A pdf version is available. VariableSize Batched GaussJordan Elimination for BlockJacobi Preconditioning on Graphics Processors, H. Anzt, G. Flegar, J. Dongarra, E. S. QuintanaOrti, Parallel Computing, DOI:10.1016/j.parco.2017.12.006. A pdf version is available. Evaluation of Directivebased Performance Portable Programming Models, M.
Graham Lopez, Wayne Joubert, Veronica Vergara Larrea, Oscar Hernandez, Azzam Haidar, Stanimire Tomov, Jack Dongarra, International Journal of High Performance Computing and Networking, accepted May 2017. A pdf version is available. A Framework for Out of Memory Algorithms, K. Kabir, A. Haidar, S. Tomov, A. Bouteiller, J. Dongarra, in Kunkel J., Yokota R., Balaji P., Keyes D. (eds) High Performance Computing, ISC 2017. Lecture Notes in Computer Science, vol 10266. Springer, Frankfurt, Germany, June 1921, 2017, DOI:10.1007/9783319586670_9 A pdf version is available. Batched GaussJordan Elimination for BlockJacobi Preconditioner Generation on GPUs, Hartwig Anzt, Jack Dongarra, Goran Flegar and Enrique S. QuintanaOrti, Proceeding PMAM'17 Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, Pages 110, Austin, TX, USA — February 04  08, 2017, ISBN: 9781450348836 DOI:10.1145/3026937.3026940 A pdf version is available. HighPerformance Cholesky Factorization for GPUOnly Execution, Azzam Haidar, Ahmad Abdelfattah, Stanimire Tomov and Jack Dongarra, Proceeding GPGPU10 Proceedings of the General Purpose GPUs, Pages 4252 Austin, TX, USA — February 04  08, 2017, DOI:10.1145/3038228.3038237 A pdf version is available. Updating Incomplete Factorization Preconditioners for Model Order Reduction, Hartwig Anzt, Edmond Chow, Jens Saak, and Jack Dongarra, Numerical Algorithms, November 2016, Volume 73, Issue 3, pp 611–630, DOI:10.1007/s1107501601102 A pdf version is available. Accelerating NWChem Coupled Cluster through dataflowbased Execution, A. Danalis, H. Jagode, and J. Dongarra, The International Journal of High Performance Computing Applications, 2017, DOI:10.1177/1094342016672543 A pdf version is available. 
On the Performance and Energy Efficiency of Sparse Linear Algebra on GPU, Hartwig Anzt, Stanimire Tomov, and Jack Dongarra, International Journal of High Performance Computing Applications, 2017, DOI:10.1177/1094342016672081 A pdf version is available. Solving Dense Symmetric Indefinite Systems using GPUs, M. Baboulin, J. Dongarra, A. Remy, S. Tomov, I. Yamazaki, Concurrency and Computation: Practice and Experience, 2017, DOI:10.1002/cpe.4055 A pdf version is available. Finegrained BitFlip Protection for Relaxation Methods, H. Anzt, J. Dongarra, and E. QuintanaOrti, Journal of Computational Science, 2017, DOI:10.1016/j.jocs.2016.11.013 A pdf version is available. Fast Cholesky Factorization on GPUs for Batch and Native Modes in MAGMA, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, and Jack Dongarra, Journal of Computational Science, Volume 20, May 2017, Pages 85–93 DOI:10.1016/j.jocs.2016.12.009 A pdf version is available. With Extreme Computing, the Rules Have Changed, Jack Dongarra, Stanimire Tomov, Piotr Luszczek, Jakub Kurzak, Mark Gates, Ichitaro Yamazaki, Hartwig Anzt, Azzam Haidar, and Ahmad Abdelfattah, IEEE CISE, April 2017, DOI:10.1109/MCSE.2017.48 A pdf version is available. Structureaware Linear Solver for Realtime Convex Optimization for Embedded Systems, I. Yamazaki, S. Tomov, J. Dongarra, IEEE Embedded Systems Letters, May 2017, DOI: 10.1109/LES.2017.2700401 A pdf version is available. Design and Implementation of the PULSAR Programming System for Large Scale Computing, J. Kurzak, P. Luszczek, I. Yamazaki, Y. Robert, J. Dongarra, Supercomputing Frontiers and Innovations, 2017, DOI:10.14529/jsfi170101 A pdf version is available. Bringing High Performance Computing to Big Data Algorithms, H. Anzt, J. Dongarra, M. Gates, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki in Handbook of Big Data Technologies, Editors: Albert Y. Zomaya, Sherif Sakr, ISBN: 9783319493398 (Print) 9783319493404 (Online), DOI:10.1007/9783319493404, Springer, 2017.
A pdf version is available. Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices, Tingxing Dong, Azzam Haidar, Stanimire Tomov and Jack Dongarra, ICCS’17, ETH Zurich, Procedia Computer Science, Volume 108, 2017, Pages 1008–1018, DOI:10.1016/j.procs.2017.05.237 A pdf version is available. The Design and Performance of Batched BLAS on Modern HighPerformance Computing Systems, Jack Dongarra, Sven Hammarling, Nick Higham, Samuel Relton, Pedro ValeroLara and Mawussi Zounon, ICCS’17, ETH Zurich, Procedia Computer Science, Volume 108, 2017, Pages 495504, DOI:10.1016/j.procs.2017.05.138 A pdf version is available. Novel HPC Techniques to Batch Execution of Many Variable Size BLAS Computations on GPUs, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov and Jack Dongarra, ICS 2017 Chicago, June 14 2017, DOI:10.1145/3079079.3079103 A pdf version is available. Batched GaussJordan Elimination for BlockJacobi Preconditioner Generation on GPUs, Hartwig Anzt, Jack Dongarra, Goran Flegar and Enrique S.
QuintanaOrti, accepted PMAM 2017, December 2016. A pdf version is available.  2016 Linear algebra software for largescale accelerated multicore computing, A. Abdelfattah, H. Anzt, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki and A. YarKhan, Acta Numerica / Volume 25 / May 2016, pp 1–160, DOI: 10.1017/S0962492916000015. A pdf version is available. Report on the Sunway TaihuLight System, Jack Dongarra, University of Tennessee, Department of Electrical Engineering and Computer Science Tech Report UTEECS16742, June 2016. A pdf version is available. Sunway TaihuLight Supercomputer Makes Its Appearance, Jack Dongarra, The National Science Review 2016 3: 265266, September 2016, DOI: 10.1093/nsr/nww044. A pdf version is available. Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU, I. Yamazaki, S. Tomov, and J. Dongarra, ACM Transactions on Mathematical Software (TOMS), Volume 43 Issue 2, September 2016 DOI: 10.1145/2898347 A pdf version is available. On the Performance and Energy Efficiency of Sparse Linear Algebra on GPUs, H. Anzt, S. Tomov, and J. Dongarra, The International Journal of High Performance Computing Applications, DOI: 10.1177/1094342016672081. A pdf version is available. Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs, A. Abdelfattah, A. Haidar, S. Tomov, and J. Dongarra, International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016. A pdf version is available. HighPerformance Tensor Contractions for GPUs, A. Abdelfattah, M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016. A pdf version is available.
Efficiency of General Krylov Methods on GPUs – An Experimental Study, Hartwig Anzt, Jack Dongarra, Moritz Kreutzer, Gerhard Wellein, Martin Köhler, AsHES Workshop, IPDPS, 2016. A pdf version is available. On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra, The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016. A pdf version is available. GPUAware Noncontiguous Data Movement In Open MPI, W. Wu, G. Bosilca, R. vandeVaart, S. Jeaugey, and J. Dongarra, The 25th International Symposium on High Performance Distributed Computing (HPDC2016). A pdf version is available. Creating a Standardised Set of Batched BLAS Routines, Jack Dongarra, Sven Hammarling, Nicholas J. Higham, Samuel D. Relton, Pedro ValeroLara and Mawussi Zounon, in the Proceedings of the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4, 2016), Gabrielle Allen, Jeffrey Carver et al, volume 1686, CEUR Workshop Proceedings, http://ceur-ws.org/Vol-1686/WSSSPE4_paper_3.pdf. A pdf version is available. Hessenberg Reduction with Transient Error Resilience on GPUBased Hybrid Architectures, Y. Jia, P. Luszczek, and J. Dongarra, The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2016, May 2016, Chicago. DOI: 10.1109/IPDPSW.2016.34 A pdf version is available. NonGPUresident Dense Symmetric Indefinite Factorization, I. Yamazaki, S. Tomov, and J. Dongarra, Concurrency and Computation: Practice and Experience, DOI: 10.1002/cpe.4012, November 2016. A pdf version is available. A New Metric for Ranking High Performance Computing Systems, Jack Dongarra, Michael A. Heroux, and Piotr Luszczek, National Science Review, Volume 3, Issue 1, March 2016, pp 3035, DOI: 10.1093/nsr/nwv084. A pdf version is available.
Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results, Julien Herrmann, George Bosilca, Thomas Hérault, Loris Marchal, Yves Robert, and Jack Dongarra, Parallel Computing, Volume 52, February 2016, pp. 22–41, DOI: 10.1016/j.parco.2015.09.005. A pdf version is available. Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs, Hartwig Anzt, Moritz Kreutzer, Eduardo Ponce, Gregory D. Peterson, Gerhard Wellein, Jack Dongarra, The International Journal of High Performance Computing Applications, 1–11, 2016, DOI: 10.1177/1094342016646844 A pdf version is available. Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs, Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096–5113, DOI: 10.1002/cpe.3516. A pdf version is available. High Performance Conjugate Gradient Benchmark: A new Metric for Ranking High Performance Computing Systems, J. Dongarra, M. Heroux, P. Luszczek, The International Journal of High Performance Computing Applications, Volume 30 Issue 1, Spring 2016. DOI: 10.1177/1094342015593158. A pdf version is available. Updating Incomplete Factorization Preconditioners for Model Order Reduction, Hartwig Anzt, Edmond Chow, Jens Saak, and Jack Dongarra, accepted in Numerical Algorithms, January 2016. A pdf version is available. Stability and Performance of Various Singular Value QR Implementations and Casestudies with Adaptive Mixed Precision on Multicore CPU with GPUs, Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra, Accepted TOMS, February 2016.
A pdf version is available. Performance Optimization of Sparse MatrixVector Multiplication for Multicomponent PDEbased Applications using GPUs, Ahmad Abdelfattah, Hatem Ltaief, David Keyes, and Jack Dongarra, accepted Concurrency and Computation: Practice and Experience, April 2016. A pdf version is available. Porting the PLASMA Numerical Library to the OpenMP Standard, Asim YarKhan, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra, accepted in International Journal of Parallel Programming, May 2016. A pdf version is available. Domain Overlap for Iterative Sparse Triangular Solves on GPUs, Hartwig Anzt, Edmond Chow, Daniel Szyld, and Jack Dongarra, Software for Exascale Computing, Leibniz Supercomputing Centre, Munich, Germany, Volume 113 of the series Lecture Notes in Computational Science and Engineering pp 527545, Jan 25–27, 2016. DOI: 10.1007/9783319405285_24 A pdf version is available. Performance, Design, and Autotuning of Batched GEMM for GPUs, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra, High Performance Computing, Volume 9697 of the series Lecture Notes in Computer Science pp 2138, 2016, DOI: 10.1007/9783319413211_2 A pdf version is available. Accelerating the Conjugate Gradient Algorithm with GPU in CFD Simulations, Hartwig Anzt, Marc Baboulin, Jack Dongarra, Yvan Fournier, Frank Hulsemann, Amal Khabou and Yushan Wang, VECPAR 2016. A pdf version is available. TaskBased Cholesky Decomposition on Knights Corner using OpenMP, Joseph Dorris, Jakub Kurzak, Piotr Luszczek, Asim Yarkhan, Jack Dongarra, Awarded the Best Paper Award at the P^3MA workshop colocated with ISC, High Performance Computing, Volume 9945 of the series Lecture Notes in Computer Science pp 544562, DOI: 10.1007/9783319460796_37 A pdf version is available.
LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi, Azzam Haidar, Stanimire Tomov, Konstantin Arturov, Murat Guney, Shane Story, Jack Dongarra, 2016 IEEE High Performance Extreme Computing Conference (HPEC ‘16) Twentieth Annual HPEC Conference 13–15 September 2016, Waltham, MA USA. A pdf version is available. Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations, A. Haidar, B. Brock, S. Tomov, M. Guidry, J. Billings, D. Shyles, J. Dongarra, 2016 IEEE High Performance Extreme Computing Conference (HPEC ‘16), September 1315, 2016. A pdf version is available. Failure Detection and Propagation in HPC systems, George Bosilca, Aurelien Bouteiller, Amina Guermouche, Thomas Herault, Yves Robert, Pierre Sens, Jack Dongarra, Nominated for Best Paper, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:127:11, November 2016. A pdf version is available. PerformancePortable Autotuning of OpenCL Kernels for Convolutional Layers of Deep Neural Networks, Yaohung Tsai, Piotr Luszczek, Jakub Kurzak and Jack Dongarra, in the Machine Learning and HPC Environments Workshop associated with SC16, November 2016. A pdf version is available. Batched Generation of Incomplete Sparse Approximate Inverses on GPUs, H. Anzt, E. Chow, T. Huckle, J. Dongarra, Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for LargeScale Systems, pp. 49–56, November 2016. A pdf version is available. Towards Achieving Performance Portability Using Directives for Accelerators, M. Lopez, V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J.
Dongarra, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016. A pdf version is available. Power Management and Event Verification in PAPI, H. Jagode, A. YarKhan, A. Danalis, and J. Dongarra, Tools for High Performance Computing 2015: Proceedings of the 9th International Workshop on Parallel Tools for High Performance Computing, September 2015, Dresden, Germany, Springer International Publishing, pp. 4151, 2016. A pdf version is available. Search Space Generation and Pruning System for Autotuners, Piotr Luszczek, Mark Gates, Jakub Kurzak, Anthony Danalis, and Jack Dongarra, the 30th IEEE International Parallel & Distributed Processing Symposium, Chicago, IL, IEEE, May 2016. A pdf version is available. Highperformance MatrixMatrix Multiplications of Very Small Matrices, I. Masliah, A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, 22nd International European Conference on Parallel and Distributed Computing (EuroPar'16), Grenoble, France, Springer International Publishing, August 2016. A pdf version is available. Heterogeneous Streaming, C. Newburn, et al., The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016. A pdf version is available.
CUDAaware noncontiguous data movement in Open MPI, Wei Wu, George Bosilca, Rolf vandeVaart, and Jack Dongarra, 25th International Symposium on HighPerformance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, ACM, June 2016. A pdf version is available.  2015 Exascale Computing and Big Data: The Next Frontier, Daniel A. Reed and Jack Dongarra, Communications of the ACM, Vol. 58 No. 7, Pages 5668, DOI: 10.1145/2699414. A pdf version is available. Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures, M. Baboulin, J. Dongarra, A. Rémy, S. Tomov, I. Yamazaki, the Proceedings of the 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Volume 9573 of the series Lecture Notes in Computer Science pp 8695, DOI: 10.1007/9783319321493_9 A pdf version is available. Accelerating Collaborative Filtering Using Concepts from High Performance Computing, Mark Gates, Hartwig Anzt, Jakub Kurzak, and Jack Dongarra, 2015 IEEE International Conference on Big Data (IEEE BigData, November 2015). DOI: 10.1109/BigData.2015.7363811 A pdf version is available. Strengthening compute and data intensive capacities of Armenia, H. Astsatryan, V. Sahakyan, Y. Shoukourian, P.H. Cros, M. Dayde, J. Dongarra, P. Oster, in 2015 14th RoEduNet International Conference – Networking in Education and Research (RoEduNet NER), pp. 28–33, 24–26 Sept. 2015, DOI: 10.1109/RoEduNet.2015.7311823 A pdf version is available. Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems, M. Abalenkovs, A. Abdelfattah, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, A. YarKhan, Supercomputing Frontiers and Innovations, Volume 2, Number 4, pages 6786, 2015, DOI: 10.14529/jsfi1504 A pdf version is available. The TOP500 List of Supercomputers and Progress in High Performance Computing, Erich Strohmaier, Hans W. Meuer, Jack Dongarra, Horst D. Simon, IEEE Computer, vol. 48, no. 11, Nov. 2015, pp.
42–49, http://doi.ieeecomputersociety.org/10.1109/MC.2015.338. A pdf version is available. Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs, Jakub Kurzak, Hartwig Anzt, Mark Gates, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, no. 1045–9219, November 2015. A pdf version is available. Mixing LUQR Factorization Algorithms to Design HighPerformance Dense Linear Algebra Solvers, Mathieu Faverge, Julien Herrmann, Julien Langou, Bradley Lowery, Yves Robert, and Jack Dongarra, Journal on Parallel and Distributed Computing, Volume 85, November 2015, pp. 32–46, http://dx.doi.org/10.1016/j.jpdc.201. A pdf version is available. A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPUGPU Systems, Fengguang Song and Jack Dongarra, Concurrency and Computation: Practice and Experience, Volume 27, Issue 14, 25 September 2015, pp. 3702–3723, DOI: 10.1002/cpe.3403. A pdf version is available. A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination, Simplice Donfack, Jack Dongarra, Mathieu Faverge, Mark Gates, Jakub Kurzak, Piotr Luszczek, and Ichitaro Yamazaki, Concurrency and Computation: Practice and Experience Volume 27, Issue 5, pp. 1292–1309, 10 April 2015, http://dx.doi.org/10.1002/cpe.3306. A pdf version is available. Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs, Hartwig Anzt, Blake Haugen, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra, Concurrency and Computation: Practice and Experience, Volume 27, Issue 17, December 2015, pp. 5096–5113, http://dx.doi.org/10.1002/cpe.3516. A pdf version is available. MixedPrecision Cholesky QR Factorization and its Case Studies on Multicore CPUS with Multiple GPUs, Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra, SIAM J. Sci. Comput. 37(3) (2015), pp. C307C330, http://dx.doi.org/10.1137/14M0973773. A pdf version is available.
A New Metric for Ranking High Performance Computing Systems, Jack Dongarra, Michael A. Heroux, and Piotr Luszczek, National Science Review, January 2016, DOI: 10.1093/nsr/nwv084. A pdf version is available. Batched Matrix Computations on Hardware Accelerators Based on GPUs, Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, and Jack Dongarra, The International Journal of High Performance Computing Applications, May 2015 29: 193208, first published on February 9, 2015, http://dx.doi.org/10.1177/1094342014567546. A pdf version is available. PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed TaskBased Execution, Anthony Danalis, Heike Jagode, George Bosilca and Jack Dongarra, to appear IEEE Cluster 2015, Chicago, Illinois, USA, Sept. 811, 2015. A pdf version is available. Random Sampling to Update Partial Singular Value Decomposition on a Hybrid CPU/GPU Cluster, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, Jack Dongarra, to appear SC15, November 2015. A pdf version is available. Performance of Random Sampling for Computing Lowrank Approximations of a Dense Matrix on GPUs, Théo Mary, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Jack Dongarra, to appear SC15, November 2015. A pdf version is available. Practical Scalable Consensus for PseudoSynchronous Distributed Systems, Thomas Herault, Aurelien Bouteiller, George Bosilca, Marc Gamell, Keita Teranishi, Manish Parashar, Jack Dongarra, to appear SC15, November 2015. A pdf version is available.
Efficient Implementation Of Quantum Materials Simulations On Distributed CPUGPU Systems, Raffaele Solcà, Anton Kozhevnikov, Azzam Haidar, Stanimire Tomov, Thomas C. Schulthess, Jack Dongarra, to appear SC15, finalist for the Best Paper Award, November 2015. A pdf version is available. Dense Symmetric Indefinite Factorization on GPU Accelerated Architecture, Marc Baboulin, Jack Dongarra, Adrien Remy, Stanimire Tomov, and Ichitaro Yamazaki, to appear PPAM 2015, Krakow Poland, 2015. A pdf version is available. Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery, Aurelien Bouteiller, George Bosilca and Jack Dongarra, to appear EUROMPI Conference, September 2015. A pdf version is available. Flexible Linear Algebra Development and Scheduling with Cholesky Factorization, Azzam Haidar, Asim YarKhan, Chongxiao Cao, Piotr Luszczek, Stanimire Tomov, Jack Dongarra, 17th IEEE International Conference on High Performance Computing and Communications, New York, New York, August 2015. A pdf version is available. Iterative Sparse Triangular Solves for Preconditioning, Hartwig Anzt, Edmond Chow and Jack Dongarra, to appear in EuroPar 2015, Vienna Austria, August 2015. A pdf version is available. Design for a Soft Error Resilient Dynamic Taskbased Runtime, Chongxiao Cao, George Bosilca, Thomas Herault, and Jack Dongarra, 29th IEEE International Parallel & Distributed Processing Symposium, Hyderabad, INDIA, May 2015. A pdf version is available. Hierarchical DAG Scheduling for Hybrid Distributed Systems, Wei Wu, George Bosilca, Aurelien Bouteiller, Mathieu Faverge, and Jack Dongarra, 29th IEEE International Parallel & Distributed Processing Symposium, Hyderabad, INDIA, May 2015. A pdf version is available.
Performance Analysis and Optimisation of TwoSided Factorization Algorithms for Heterogeneous Platform, International Conference on Computational Science 2015, ICCS 2015, Computational Science at the Gates of Nature Edited By Slawomir Koziel, Leifur Leifsson, Michael Lees, Valeria V. Krzhizhanovskaya, Jack Dongarra and Peter M.A. Sloot. doi:10.1016/j.procs.2015.05.222 A pdf version is available. Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product, H. Anzt, S. Tomov, and J. Dongarra, In Spring Simulation MultiConference 2015 (SpringSim15), 2015. A pdf version is available. Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architecture, Khairul Kabir, Azzam Haidar, Stanimire Tomov, Jack Dongarra. Best Paper Award at 2015 Spring Simulation Multiconference, 23rd High Performance Computing Symposium (HPC 2015). A pdf version is available. Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers, Hartwig Anzt, Stan Tomov, and Jack Dongarra, PMAM '15 Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, ACM New York, NY, USA 2015, doi:10.1145/2712386.2712387 A pdf version is available. Towards Batched Linear Solvers on Accelerated Hardware Platforms, Azzam Haidar, Piotr Luszczek, Stanimire Tomov, and Jack Dongarra, In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, February 711, 2015. 10.1145/2688500.2688534 A pdf version is available. Optimization for Performance and Energy for Batched Matrix Computations on GPUs, Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, and Jack Dongarra, 8th Workshop on General Purpose Processing Using GPUs, (GPGPU 8), San Francisco, February 7, 2015. 10.1145/2716282.2716288 A pdf version is available. 
Optimizing Krylov Subspace Solvers on Graphics Processing Units, Hartwig Anzt, Stanimire Tomov, Piotr Luszczek, Ichitaro Yamazaki, Jack Dongarra, and William Sawyer, Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International, pp 941949, DOI: 10.1109/IPDPSW.2014.107 A pdf version is available. Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs, Hartwig Anzt, Blake Haugen, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra, accepted in Concurrency and Computing: Practice and Experience, March 2015. DOI: 10.1002/cpe.3516 A pdf version is available. Mixing LUQR Factorization Algorithms to Design HighPerformance Dense Linear Algebra Solvers, Mathieu Faverge, Julien Herrmann, Julien Langou, Bradley Lowery, Yves Robert, and Jack Dongarra, accepted in Journal on Parallel and Distributed Computing, March 2015. http://dx.doi.org/10.1016/j.jpdc.201 A pdf version is available. MixedPrecision Cholesky QR Factorization and its Case Studies on Multicore CPUS with Multiple GPUs, I. Yamazaki, S. Tomov, and J. Dongarra, SIAM J. Sci. Comput., Volume 37, Issue 3, DOI:10.1137/14M0973773 A pdf version is available. Updating Incomplete Factorization Preconditioners for Model Order Reduction, Hartwig Anzt, Edmond Chow, Jens Saak, and Jack Dongarra, To appear in Parallel Computing. A pdf version is available. A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination, Simplice Donfack, Jack Dongarra, Mathieu Faverge, Mark Gates, Jakub Kurzak, Piotr Luszczek, Ichitaro Yamazaki, Submitted to Concurrency and Computation: Practice and Experience, Volume 27, Issue 5, pages 12921309, April 2015. DOI: 10.1002/cpe.3306 A pdf version is available. Computing Lowrank Approximation of a Dense Matrix on Multicore CPUs with a GPU and its Application to Solving a Hierarchically Semiseparable Linear System of Equations, Ichitaro Yamazaki, Stanimire Tomov and Jack Dongarra, Scientific Programming, vol. 
2015, Article ID 246019, 17 pages, 2015. http://dx.doi.org/10.1155/2015/246019. A pdf version is available. Acceleration of GPUbased Krylov Solvers via Data Transfer Reduction, Hartwig Anzt, Stanimire Tomov, Piotr Luszczek, William Sawyer and Jack Dongarra, The International Journal of High Performance Computing Applications, accepted April 2015, http://dx.doi.org/10.1177/1094342015580139. A pdf version is available. Algorithmbased Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy, Aurelien Bouteiller, Thomas Herault, George Bosilca, Peng Du, and Jack Dongarra, ACM Transactions on Parallel Computing, Volume 1 Issue 2, January 2015, http://dx.doi.org/10.1145/2686892. A pdf version is available. HPC Programming on Intel ManyIntegratedCore Hardware with MAGMA Xeon Phi, Jack Dongarra, Mark Gates, Azzam Haidar, Yulu Jia, Khairul Kabir, Piotr Luszczek, and Stanimire Tomov, Scientific Programming, Volume 2015 (2015), Article ID 502593, 11 pages http://dx.doi.org/10.1155/2015/502593. A pdf version is available. Composing Resilience Techniques: ABFT, Periodic and Incremental Checkpointing, George Bosilca, Aurelien Bouteiller, Thomas Herault, Yves Robert, and Jack Dongarra, International Journal of Networking and Computing, Volume 5, Number 1, pages 225, January 2015. A pdf version is available.
2014 Unified Model for Assessing Checkpointing Protocols at ExtremeScale, George Bosilca, Aurelien Bouteiller, Elisabeth Brunet, Franck Cappello, Jack Dongarra, Amina Guermouche, Thomas Herault, Yves Robert, Frederic Vivien, and Dounia Zaidouni, Concurrency and Computation: Practice and Experience, Volume 26, Issue 17, pp. 2772–2791, 10 December 2014, DOI: 10.1002/cpe.3173.A pdf version is available. Performance of Various Computers Using Standard Linear Equations Software, (Linpack Benchmark Report), Jack J. Dongarra, University of Tennessee Computer Science Technical Report, CS8985, 2014. A postscript version is available. Parallel Simulation of Superscalar Scheduling, Blake Haugen, Piotr Luszczek, Jakub Kurzak, Asim YarKhan, and Jack Dongarra, CPP'14: International Conference on Parallel Processing, Minneapolis, MN, 2014, DOI: 10.1109/ICPP.2014.21 A pdf version is available. Performance and Portability with OpenCL for ThroughputOriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors, Azzam Haidar, Chongxiao Cao, Ichitaro Yamazaki, Jack Dongarra, Mark Gates, Piotr Luszczek, and Stan Tomov, Scala 2014, ACM, New Orleans, LA, November 17, 2014, DOE:10.1109/ScalA.2014.8 A pdf version is available. Accessaverse Framework for Computing Lowrank Matrix Approximations, Ichitaro Yamazaki, Theo Mary, Jakub Kurzak, Stanimire Tomov, and Jack Dongarra, First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining (in Conjunction with IEEE BigData'14), October, 27, 2014, Bethesda, MD, Pages: 70  77, DOI: 10.1109/BigData.2014.7004374 A pdf version is available. 
PTG: An Abstraction for Unhindered Parallelism, Anthony Danalis, George Bosilca, Aurelien Bouteiller, Thomas Herault, and Jack Dongarra, WOLFHPC '14 Proceedings of the Fourth International Workshop on DomainSpecific Languages and HighLevel Frameworks for High Performance Computing Pages 2130, SC14 Workshop, New Orleans, LA, November 17, 2014, DOI:10.1109/WOLFHPC.2014.8 A pdf version is available. Deflation Strategies to Improve the Convergence of CommunicationAvoiding GMRES, Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra, ScalA2014, Workshop on Latest Advances in Scalable Algorithms for LargeScale Systems (ScalA), New Orleans, LA, November 17, 2014. DOI:10.1109/ScalA.2014.6 A pdf version is available. Power Monitoring with PAPI for Extreme Scale Architectures and Dataflowbased Programming Models, McCraw, Heike, Ralph, James, Danalis, Anthony, Dongarra, Jack, Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications (HPCMASPA 2014), IEEE Cluster 2014, IEEE, Madrid, Spain, September, 2014. DOI: 10.1109/CLUSTER.2014.6968672 A pdf version is available. LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU, Tingxing Dong, Azzam Haidar, Piotr Luszczek, James Austin Harris, Stanimire Tomov, and Jack Dongarra, High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), Paris, France, 2014, DOI:10.1109/HPCC.2014.30 A pdf version is available. A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPUGPU, Tingxing Dong, Veselin Dobrev, Tzanio Kolev, Robert Rieben, Stanimire Tomov, and Jack Dongarra, 28th IEEE International Parallel & Distributed Processing Symposium, 2014, DOI: 10.1109/IPDPS.2014.103 A pdf version is available. 
clMAGMA: High Performance Dense Linear Algebra with OpenCL, Chongxiao Cao, Jack Dongarra, Peng Du, Mark Gates, Piotr Luszczek, Stanimire Tomov, IWOCL '14, May 12–13, 2014, Bristol, United Kingdom. A pdf version is available. A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPUGPU Systems, Fengguang Song and Jack Dongarra, accepted in Concurrency and Computation: Practice and Experience, August 2014. DOI: 10.1002/cpe.3403 A pdf version is available. DOE: Assessment of Workforce Development Needs in Office of Science Research Disciplines, DOE ASCAC Subcommittee Report, B. Chapman, et al., July 2014. A pdf version is available. Top Ten Exascale Research Challenges, DOE ASCAC Subcommittee Report, 2014, R. Lucas, et al. A pdf version is available. Applied Mathematics Research for Exascale Computing, Jack Dongarra (cochair, Oak Ridge National Laboratory) and Jeffrey Hittinger (cochair, Lawrence Livermore National Laboratory), et al., DOE Report for the Office of Science, Advanced Scientific Computing Research, 2014. A pdf version is available. Unified Model for Assessing Checkpointing Protocols at ExtremeScale, George Bosilca, Aurelien Bouteiller, Elisabeth Brunet, Franck Cappello, Jack Dongarra, Amina Guermouche, Thomas Herault, Yves Robert, Frederic Vivien, and Dounia Zaidouni, accepted in Concurrency and Computation: Practice and Experience, Volume 26, Issue 17, pages 27722791, 10 December 2014, DOI: 10.1002/cpe.3173. A pdf version is available. Accelerating Numerical Dense Linear Algebra Calculations with GPUs, Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, and Ichitaro Yamazaki, pp. 328, in Numerical Computations with GPUs, edited by Volodymyr Kindratenko, Springer, 2014, DOI:10.1007/9783319065489_1. A pdf version is available. Looking Back at Dense Linear Algebra Software, Piotr Luszczek, Jakub Kurzak, and Jack Dongarra, Journal of Parallel and Distributed Computing, pp 25482560, 2014. 
http://dx.doi.org/10.1016/j.jpdc.2013.10.005 A pdf version is available. A Novel Hybrid CPUGPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks, Azzam Haidar, Stanimire Tomov, Jack Dongarra, Raffaele Solcà, Thomas Schulthess, International Journal of High Performance Computing Applications, volume 28, number 2 pp 196209, 2014. DOI: 10.1177/1094342013502097 A pdf version is available. Update Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization, J. Dongarra, M. Faverge, P. Luszczek, Concurrency and Computation: Practice and Experience, Volume 26, Issue 7, pp 14081431, DOI: 10.1002/cpe.3110, 2014. A pdf version is available. ModelDriven OneSided Factorizations on Multicore, Accelerated Systems, Jack Dongarra, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Asim YarKhan, Supercomputing Frontiers and Innovations, volume 1, number 1, 2014. A pdf version is available. Performance and Reliability Tradeoffs for the Double Checkpointing Algorithm, Jack Dongarra, Thomas Herault and Yves Robert, The International Journal of Networking and Computing, Vol 4 No 1, p. 2341, 2014. A pdf version is available. An Efficient Distributed Randomized Algorithm For Solving Large Dense Symmetric Indefinite Linear Systems, Marc Baboulin, Dulceneia Becker, George Bosilca, Anthony Danalis, and Jack Dongarra, Parallel Computing, Volume 40 Issue 7, July 2014, pp 213223. DOI: 10.1016/j.parco.2013.12.003 A pdf version is available. HPC Programming on Intel ManyIntegratedCore Hardware with MAGMA Port to Xeon Phi, Jack Dongarra, Mark Gates, Azzam Haidar, Yulu Jia, Khairul Kabir, Piotr Luszczek, and Stanimire Tomov, Volume 2015 (2015), Article ID 502593, Scientific Programming. DOI: 10.1155/2015/502593 A pdf version is available. Exascale Computing and Big Data: The Next Frontier, Daniel A. Reed and Jack Dongarra, DOI: 10.1145/2699414, Communications of the ACM, Vol. 58 No. 7, Pages 5668, July 2015. 
A pdf version is available. CommunicationAvoiding SymmetricIndefinite Factorization, G. Ballard, D. Becker, J. Demmel, J. Dongarra, A. Druinsky, I. Peled, O. Schwartz, S. Toledo, and I. Yamazaki, DOI:10.1137/130929060, SIAM J. Matrix Anal. Appl. 35(4): 13641460 (2014). A pdf version is available. Algorithmbased Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy, Aurelien Bouteiller, Thomas Herault, George Bosilca, Peng Du, and Jack Dongarra, DOI: 10.1145/2686892, ACM Transactions on Parallel Computing, Volume 1 Issue 2, January 2015. A pdf version is available. Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results, Julien Herrmann, George Bosilca, Thomas Herault, Loris Marchal, Yves Robert, Jack Dongarra, submitted to Parallel Computing May 2014. A pdf version is available. Optimizing Krylov Subspace Solvers on Graphics Processing Units, Hartwig Anzt, Stanimire Tomov, Piotr Luszczek, Ichitaro Yamazaki, Jack Dongarra, and William Sawyer, submitted to International Journal of High Performance Computing Applications 2014. A pdf version is available. A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPUGPU Systems, Fengguang Song and Jack Dongarra, DOI: 10.1002/cpe.3403, Concurrency and Computation: Practice and Experience, October 2014. A pdf version is available. LAPACK, CRC Handbook on Linear Algebra, Second Edition, Zhaojun Bai, James Demmel, Jack Dongarra, Julien Langou, and Jenny Wang, Editor Leslie Hogben, CRC Press, ISBN 9781466507289, 2014. A pdf version is available. Accelerating Numerical Dense Linear Algebra Calculations with GPUs, Jack Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, and Ichitaro Yamazaki, to appear in Numerical Computations with GPUs, edited by Volodymyr Kindratenko, Springer, 2014. A pdf version is available. Computing Least Squares Condition Numbers on Hybrid Multicore/GPU Systems, M. Baboulin and J. 
Dongarra and R. Lacroix, Proceedings for the Applied Mathematics, Modeling and Computational Science (AMMCS) conference, Vol. 117 (2015). A pdf version is available. New MultiStage Algorithm for Symmetric Eigenvalues and Eigenvectors Achieves TwoFold Speedup, A. Haidar, P. Luszczek, J. Dongarra, Best Paper Award, Workshop on Parallel and Distributed Scientific and Engineering Computing, Phoenix, AZ, May, 2014. A pdf version is available. Designing LUQR Hybrid Solvers for Performance and Stability, Mathieu Faverge, Julien Herrmann, Julien Langou, Bradley Lowery, Yves Robert, and Jack Dongarra, 28th IEEE International Parallel & Distributed Processing Symposium, 2014. A pdf version is available. Redesigning A Hydrodynamic Application on CPUGPU, Tingxing Dong, Veselin Dobrev, Tzanio Kolev, Robert Rieben, Stanimire Tomov, Jack Dongarra, 28th IEEE International Parallel & Distributed Processing Symposium. A pdf version is available. Improving the Performance of CAGMRES on Multicores with Multiple GPUs, I. Yamazaki, H. Anzt, S. Tomov, M. Hoemmen, and J. Dongarra, 28th IEEE International Parallel & Distributed Processing Symposium. A pdf version is available. Unified Development for Mixed MultiGPU and MultiCoprocessor Environments using a Lightweight Runtime Environment, A. Haidar, C. Cao, J. Dongarra, P. Luszczek, S. Tomov, A. YarKhan, K. Kabir, 28th IEEE International Parallel & Distributed Processing Symposium. A pdf version is available. MixedPrecision Orthogonalization Scheme and Adaptive Step Size for CAGMRES on GPUs, Best Paper Award, Ichitaro Yamazaki, Stanimire Tomov, Tingxing Dong and Jack Dongarra VECPAR 2014, June 30  July 3, 2014, Eugene, Oregon. A pdf version is available. Accelerating computation of eigenvectors in the nonsymmetric eigenvalue problem, Mark Gates, Azzam Haidar and Jack Dongarra VECPAR 2014, June 30  July 3, 2014, Eugene, Oregon. A pdf version is available. 
SelfAdaptive Multiprecision Preconditioners on Multicore and Manycore Architectures, Hartwig Anzt, Dimitar Lukarski, Stan Tomov and Jack Dongarra VECPAR 2014, June 30  July 3, 2014, Eugene, Oregon. A pdf version is available. Hybrid MultiElimination ILU Preconditioners on GPUs, Dimitar Lukarski, Hartwig Anzt, Stanimire Tomov, and Jack Dongarra, 23rd Heterogeneity in Computing Workshop (HCW 2014), in Proc. of IPDPS 2014, Phoenix, Arizona, May 1923, 2014. A pdf version is available. Optimizing Krylov Subspace Solvers on Graphics Processing Units, Hartwig Anzt, Stanimire Tomov, Piotr Luszczek, Ichitaro Yamazaki, Jack Dongarra, and William Sawyer, The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 19, 2014, Phoenix, AZ, part of IPDPS Conference. A pdf version is available. MIAMI: A Framework for Application Performance Diagnosis, G. Marin, J. Dongarra, and D. Terpstra, ISPASS2014 2014 IEEE International Symposium on Performance Analysis of Systems and Software March 2325, 2014 Hyatt Regency Hotel in Monterey, CA. A pdf version is available. Assessing the Impact of ABFT and Checkpoint Composite Strategies, Bosilca, G., Bouteiller, A., Herault, T., Robert, Y., Dongarra, J. IPDPSW, APDCM 2014, Phoenix, AZ, May, 2014. A pdf version is available. Dynamically balanced synchronizationavoiding LU factorization with multicore and GPUs, Simplice Donfack, Stanimire Tomov and Jack Dongarra, Fourth International Workshop on Accelerators and Hybrid Exascale Systems, May 19, 2014. A pdf version is available. Design and Implementation of a Large Scale TreeBased QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime, Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek, Jack Dongarra, Parallel Processing Letters, Volume 24, Number 4, December 2014, doi: 10.1142/S0129626414420043. A pdf version is available. 
Scaling Up Matrix Computations on SharedMemory Manycore Systems with 1000 CPU Cores, Fengguang Song and Jack Dongarra, Proceeding ICS '14 Proceedings of the 28th ACM international conference on Supercomputing, pp 333342, ACM New York, NY, USA, ISBN: 9781450326421 DOI: 10.1145/2597652.2597670 A pdf version is available. Heterogeneous Acceleration for Linear Algebra in MultiCoprocessor Environments, Azzam Haidar, Piotr Luszczek, Stanimire Tomov and Jack Dongarra VECPAR 2014, June 30  July 3, 2014, Eugene, Oregon, accepted March 2014. A pdf version is available. A Fast Batched Cholesky Factorization on a GPU, Tingxing Dong, Azzam Haidar, Stanimire Tomov and Jack Dongarra, 43rd International Conference on Parallel Processing (ICPP2014), Minneapolis, USA, during September 912, 2014. A pdf version is available. clMAGMA: High Performance Dense Linear Algebra with OpenCL, Chongxiao Cao, Jack Dongarra, Peng Du, Mark Gates, Piotr Luszczek, Stanimire Tomov, The International Workshop on OpenCL, Bristol University, England, May 1213, 2014. A pdf version is available. Utilizing Dataflowbased Execution for Coupled Cluster Methods, Heike McCraw, Anthony Danalis, Thomas Herault, George Bosilca, Jack Dongarra, Karol Kowalski, Theresa L. Windus, Poster at Clusters 2014. A pdf version is available.  2013 Trip Report to Changsha and the Tianhe2 Supercomputer, J. Dongarra, June 3, 2013. A pdf version is available. Extending the Scope of the CheckpointonFailure Protocol for Forward Recovery in Standard MPI, Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Herault, George Bosilca, and Jack J. Dongarra, Concurrency and Computing: Practice and Experience, Volume 25, Issue 17, pp. 2381–2393, DOI: 10.1002/cpe.3100. A pdf version is available. Extending the Scope of the CheckpointonFailure Protocol for Forward Recovery in Standard MPI, Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Herault, George Bosilca, and Jack J. 
Dongarra, Concurrency and Computing: Practice and Experience, Volume 25, Issue 17, pages 23812393, 2013, DOI: 10.1002/cpe.3100. A pdf version is available. Toward a New Metric for Ranking High Performance Computing Systems, M. Heroux and J. Dongarra, UTK EECS Tech Report and Sandia National Labs Report SAND20134744, June 2013. A pdf version is available. A Novel Hybrid CPUGPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks, Azzam Haidar, Stanimire Tomov, Jack Dongarra, Raffaele Solcà, Thomas Schulthess, International Journal of High Performance Computing Applications, accepted July 2013. A pdf version is available. PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability, George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Herault, Jack J. Dongarra, accepted in IEEE Computing in Science and Engineering, September 2013. A pdf version is available. Unified Model for Assessing Checkpointing Protocols at ExtremeScale, George Bosilca, Aurelien Bouteiller, Elisabeth Brunet, Franck Cappello, Jack Dongarra, Amina Guermouche, Thomas Herault, Yves Robert, Frederic Vivien, and Dounia Zaidouni, accepted in Concurrency and Computation: Practice and Experience, October 2013. A pdf version is available. Tridiagonalization of a Dense Symmetric Matrix On Multiple GPUs and Its Application to Symmetric Eigenvalue Problems, Ichitaro Yamazaki, Tingxing Dong, Raffaele Solcà, Stanimire Tomov, Jack Dongarra, Thomas Schulthess, Concurrency and Computation: Practice and Experience, published online, October 2013, DOI: 10.1002/cpe.3152 A pdf version is available. PostFailure Recovery of MPI Communication Capability: Design and Rationale, Wesley Bland, Aurelien Bouteiller, Thomas Herault, George Bosilca and Jack J. Dongarra, International Journal of High Performance Computing Applications, Volume 27, Issue 3, Fall 2013, pp 44254, DOI: 10.1177/1094342013488238. A pdf version is available. 
Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices, Azzam Haidar, Hatem Ltaief, and Jack Dongarra, SIAM SISC, Vol. 34, No. 6, pp. C249C274. A pdf version is available. Accelerating Linear System Solutions Using Randomization Techniques, Marc Baboulin, Jack Dongarra, Julien Herrmann, and Stanimire Tomov, ACM TOMS, Vol. 39, No 2 (2013). A pdf version is available. Level3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms, Fred G. Gustavson, Jerzy Wasniewski, Jack J. Dongarra, J. Herrero, and J. Langou, ACM Transactions on Mathematical Software (TOMS), Vol. 39, No 2 (2013). A pdf version is available. High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures, H. Ltaief, P. Luszczek, and J. Dongarra, ACM Transactions on Mathematical Software, Volume 39, Issue 3, April 2013. A pdf version is available. An Evaluation of UserLevel Failure Mitigation support in MPI, Aurelien Bouteiller, Wesley Bland, Thomas Herault, Joshua Hursey, George Bosilca and Jack Dongarra, Recent Advances in the Message Passing Interface, Lecture Notes in Computer Science Volume 7490, 2012, pp 193203, ISSN: 0010485X, April 2013. A pdf version is available. KernelAssisted and TopologyAware Collective Communications on Multicore/Manycore Platforms, Teng Ma, George Bosilca, Aurelien Bouteiller, Jack Dongarra, Journal of Parallel and Distributed Computing, Volume 73, Issue 7, pp. 10001010, July 2013. (Best paper award IPDPS 2013 Conference) A pdf version is available. BlackjackBench: Portable Hardware Characterization with Automated Results Analysis, Anthony Danalis, Piotr Luszczek, Gabriel Marin, Jeffrey S. Vetter and Jack Dongarra, Computer Journal, 2013; doi: 10.1093/comjnl/bxt057. A pdf version is available. 
Enabling Workflows in GridSolve: Request Sequencing and Service Trading, Yinan Li, Asim YarKhan, Jack Dongarra, Keith Seymour, and Aurélie Hurault, The Journal of Supercomputing, June 2013, Volume 64, Issue 3, pp 11331152. A pdf version is available. Correlated Set Coordination in Fault Tolerant Message Logging Protocols, A. Bouteiller, T. Herault, G. Bosilca, J. Dongarra, Concurrency and Computation: Practice and Experience, Volume 25, Issue 4, pages 572585, 2013. A pdf version is available. LU Factorization with Partial Pivoting for a Multicore System with Accelerators, J. Kurzak, P. Luszczek, and J. Dongarra, IEEE Transactions on Parallel and Distributed Computing, August 2013 (vol. 24 no. 8), pp. 16131621. A pdf version is available. Soft Error Resilient QR Factorization for Hybrid System with GPGPU, P. Du, P. Luszczek, S. Tomov, and J. Dongarra, accepted in Journal of Computational Science, January 2013. A pdf version is available. Hierarchical QR factorization algorithms for multicore cluster systems, Jack Dongarra, Mathieu Faverge, Thomas Herault, Mathias Jacquelin, Julien Langou, Yves Robert, Parallel Computing, Volume 39, Issues 45, AprilMay 2013, Pages 212–232. A pdf version is available. A BlockAsynchronous Relaxation Method for Graphics Processing Units, Hartwig Anzt, Stanimire Tomov, Jack Dongarra, Vincent Heuveline, Journal of Parallel and Distributed Computing, Online June 6, 2013, http://dx.doi.org/10.1016/j.bbr.2011.03.031 A pdf version is available. Extending the Scope of the CheckpointonFailure Protocol for Forward Recovery in Standard MPI, Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Herault, George Bosilca, Jack J. Dongarra, accepted in Concurrency and Computing: Practice and Experience, June 2013. A pdf version is available. Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization, J. Dongarra, M. Faverge, P. 
Luszczek, Accepted Concurrency and Computation: Practice and Experience, July 2013. A pdf version is available. Optimizing MemoryBound Numerical Kernels on GPU Hardware Accelerators, A. Abdelfattah, J. Dongarra, D. Keyes, and H. Ltaief, 10th International Meeting on HighPerformance Computing for Computational Science (VECPAR 2012), Lecture Notes in Computer Science 7851, pp 7279, 2013. A pdf version is available. Programming the LU Factorization for a Multicore System with Accelerators, Jakub Kurzak, Piotr Luszczek, Mathieu Faverge, and Jack Dongarra, 10th International Meeting on HighPerformance Computing for Computational Science (VECPAR 2012), Lecture Notes in Computer Science 7851, pp 2835, 2013. A pdf version is available. Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach, George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Herault, Piotr Luszczek, and Jack J. Dongarra, in the book Scalable Computing and Communications: Theory and Practice, edited by Samee U. Khan, Lizhe Wang, and Albert Y. Zomaya, Publisher John Wiley & Sons, ISBN: 9781118162651, 2013. A pdf version is available. Keeneland: Computational Science Using Heterogeneous GPU Computing, J. Vetter, R. Glassbrook, K. Schwan, S. Yalamanchili, M. Horton, A. Gavrilovska, M. Slawinska, J. Meredith, P. Roth, K. Spafford, S. Tomov, J. Wynkoop, Ed. Jeffrey S. Vetter, Contemporary High Performance Computing: From Petascale Toward Exascale, Taylor and Francis, Boca Raton, CRC Computational Science Series, 2013. A pdf version is available. HPC Challenge: Design, History, and Implementation Highlights, J. Dongarra and P. Luszczek, Ed. Jeffrey S. Vetter, Contemporary High Performance Computing: From Petascale Toward Exascale, Taylor and Francis, Boca Raton, CRC Computational Science Series, 2013, ISBN: 9781466568341. A pdf version is available. 
Multithreading in the PLASMA Library, Jakub Kurzak, Piotr Luszczek, Asim YarKhan, Mathieu Faverge, Julien Langou, Henricus Bouwmeester, and Jack Dongarra in Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications, Edited by Mohamed Ahmed, Reda A. Ammar, Sanguthevar Rajasekaran Series: Chapman & Hall/CRC Computer & Information Science Series, published by Taylor & Francis, 2013. A pdf version is available. Looking Back at Dense Linear Algebra Software, Piotr Luszczek, Jakub Kurzak, and Jack Dongarra, submitted to Journal of Parallel and Distributed Computing, August 2013. A pdf version is available. Scalable Dense Linear Algebra on Heterogeneous Hardware, George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Herault, Jakub Kurzak, Piotr Luszczek, Stan Tomov, Jack Dongarra, to appear in the book HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, IOS Press. A pdf version is available. LAPACK, CRC Handbook on Linear Algebra, Second Edition, Zhaojun Bai, James Demmel, Jack Dongarra, Julien Langou, and Jenny Wang, Editor Leslie Hogben, CRC Press, to appear 2013. A pdf version is available. Revisiting the Double Checkpointing Algorithm, Jack Dongarra, Thomas Herault and Yves Robert, 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium 2013, Boston MA, January 2013. A pdf version is available. Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures, Ichitaro Yamazaki, Dulceneia Becker, Jack Dongarra, Alex Druinsky, Inon Peled, and Sivan Toledo, Grey Ballard, James Demmel, and Oded Schwartz, 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium 2013, (Best Paper Award), Boston MA, January 2013. A pdf version is available. 
Virtual Systolic Array for QR Decomposition, Jakub Kurzak, Piotr Luszczek, Mark Gates, Ichitaro Yamazaki, and Jack Dongarra, 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium 2013, Boston MA, January 2013. A pdf version is available. clMAGMA: High Performance Dense Linear Algebra with OpenCL, C. Cao, Jack Dongarra, Peng Du, Mark Gates, Piotr Luszczek, Stanimire Tomov, International Workshop on OpenCL (IWOCL), GATech, May 1314, 2013. A pdf version is available. A Parallel solver for Incompressible Fluid Flows, Y. Wang, M. Baboulin, J. Dongarra, J. Falcou, Y Fraigneau, and O. Le Maitre, International Conference on Computational Science, ICCS 2013, Barcelona, Spain, May, 2013. A pdf version is available. Leading Edge Hybrid MultiGPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations, Azzam Haidar, Raffaele Solca, Mark Gates, Stanimire Tomov, Thomas Schulthess, and Jack Dongarra, International Supercomputing Conference ISC, Germany, Lecture Notes in Computer Science, Volume 7905, 2013, pp 6780. A pdf version is available. Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q, Dan Terpstra, Kris Davis, Heike McCraw, Jack Dongarra, International Supercomputing Conference ISC, Germany, Lecture Notes in Computer Science, Volume 7905, 2013, pp 213225. A pdf version is available. Toward a scalable multiGPU eigensolver via computeintensive kernels and efficient communication, Azzam Haidar, Mark Gates, Stanimire Tomov, Jack Dongarra, ICS '13 Proceedings of the 27th international ACM conference on International conference on supercomputing, Pages 223232, ACM New York, NY, USA, June 2013, Eugene Oregon. A pdf version is available. 
Portable HPC Programming on Intel ManyIntegratedCore Hardware with MAGMA Port to Xeon Phi, Jack Dongarra, Mark Gates, Azzam Haidar, Yulu Jia, Khairul Kabir, Piotr Luszczek and Stan Tomov, To appear in the PPAM Conference 2013, Warsaw, Poland, September 2013. A pdf version is available. Standards for Graph Algorithm Primitives, Tim Mattson et. al, to appear HPEC�€™2013, Boston, September 10, 2013. A pdf version is available. Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC, Guillaume Aupy, Mathieu Faverge, Yves Robert, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra, accepted in the 6th Workshop on Productivity and Performance held in conjunction with EuroPar 2013, Aachen, Germany August 26 or 27, 2013. A pdf version is available. Parallel Reduction to Hessenberg Form with Algorithmbased Fault Tolerance, Yulu Jia, George Bosilca, Piotr Luszczek, and Jack J. Dongarra, accepted in SC2013, July 2013. A pdf version is available.  2012 Autotuning GEMMs for Fermi, Jakub Kurzak, Stanimire Tomov, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012, pp 20452057.A pdf version is available. Energy Footprint of Advanced Dense Numerical Linear Algebra using Tile Algorithms on Multicore Architecture, Jack Dongarra, Hatem Ltaief, Piotr Luszczek, and Vince M. Weaver, The 2nd International Conference on Cloud and Green Computing(CGC 2012), pp 274  281, ISBN: 9781467330275, November 13, 2012, Xiangtan, Hunan, China. A pdf version is available. A Novel Hybrid CPUGPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks, Raffaele Solc�, Azzam Haidar, Stanimire Tomov, Jack Dongarra, and Thomas C. Schulthess, Proceeding SC '12 Proceedings of the 2012, High Performance Computing, Networking Storage and Analysis, Pages 13381339 IEEE Computer Society Washington, DC, USA. A pdf version is available. 
Autotuning GEMMs for Fermi,Jakub Kurzak, Stanimire Tomov, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012, pp 20452057. A pdf version is available. Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures, A. Haidar, H. Ltaief, A, YarKhan, J. Dongarra, Concurrency and Computations, Volume 24, Issue 3, pages 305�€“321, 10 March 2012. A pdf version is available. From CUDA to OpenCL: Towards a Performanceportable Solution for Multiplatform GPU Programming, P. Du, R. Weber, P. Luszczek, S. Tomov, G. Peterson, and J. Dongarra, Parallel Computing, Volume 38, Issue 8, August 2012, pp. 391407. A pdf version is available. Highperformance computing systems: Status and Outlook, , Jack Dongarra and A. J. van der Steen, Acta Numerica (2012), pp. 196. A pdf version is available. An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs, Jakub Kurzak, Rajib Nath, Peng Du, and Jack Dongarra, in Applied Parallel and Scientific Computing, PARA 2010, Editor Lristjan Jonasson, Springer, LNCS, Volume 7133, pp 248257, 2012. A pdf version is available. DAGuE: A generic distributed DAG engine for high performance computing, G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, J. Dongarra, Parallel Computing, Volume 38, Issue 12, pp. 37 �€“ 51, 2012. A pdf version is available. Divide and Conquer on Hybrid GPUAccelerated Multicore Systems, Christof V�mel, Stanimire Tomov, and Jack Dongarra, SIAM J. Sci. Comput. Volume 34, pp. C70C82, 2012. A pdf version is available. A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a TwoStage Bidiagonal Reduction, A. Haidar, H. Ltaief, P. Luszczek, and J. Dongarra, 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012. A pdf version is available. A Tiled Parallel Solver For Symmetric Indefinite Systems On Multicore Architectures,Marc Babolin, D. 
Becker, and J. Dongarra, 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012. A pdf version is available. AlgorithmBased Fault Tolerance for Dense Matrix Factorization,Peng Du, Aurelien Bouteiller, George Bosilca, Jack J. Dongarra, Thomas Herault, 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), February 2529, 2012, New Orleans, LA. A pdf version is available. Blockasynchronous Multigrid Smoothers for GPUaccelerated Systems,Hartwig Anzt, Stan Tomov, Mark Gates, Jack Dongarra, and Vincent Heuveline, Procedia Computer Science, Proceedings of the International Conference on Computational Science, ICCS 2012, Volume 9, 2012, Pages 7�€“16, 2012. A pdf version is available. From Serial Loops to Parallel Execution on Distributed Systems, Anthony Danalis, Aurelien Bouteiller, George Bosilca, Jack J. Dongarra, Thomas Herault, submitted to PPoPP 2012. A pdf version is available. HierKNEM: An Adaptive Framework for KernelAssisted and TopologyAware Collective Communications on Manycore Clusters,(Best Paper), Teng Ma, G. Bosilca, A. Bouteiller, J. Dongarra, 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012.. A pdf version is available. Weighted BlockAsynchronous Relaxation for GPUAccelerated Systems, Hartwig Anzt, Jack Dongarra, and Vincent Heuveline, submitted to SIAM Journal on Computing March 2012. A pdf version is available. Dense Linear Algebra on Accelerated Multicore Hardware, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, and Stanimire Tomov, in High Performance Scientific Computing: Algorithms and Applications, Editors Michael W. Berry, Kyle A. Gallivan, Efstratios Gallopoulos, Ananth Grama, Bernard Philippe, Yousef Saad and Faisal Saied, Springer, 2012. A pdf version is available. Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures using Tree Reduction, H. Ltaief, P. Luszczek, and J. 
Dongarra, in Lecture Notes in Computer Science, Volume 7203, 2012, Parallel Processing and Applied Mathematics 9th International Conference, PPAM 2011, Torun, Poland, September 1114, 2011, Part I, Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski and Jerzy Wasniewski, pp 661670, 2012. A pdf version is available. Reducing the Amount of Pivoting in Symmetric Indefinite Systems, D. Becker, M. Baboulin, J. Dongarra, in Lecture Notes in Computer Science, Volume 7203, 2012, Parallel Processing and Applied Mathematics 9th International Conference, PPAM 2011, Torun, Poland, September 1114, 2011, Part I, Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski and Jerzy Wasniewski, pp 133142, 2012. A pdf version is available. Blockasynchronous Multigrid Smoothers for GPUaccelerated Systems, Hartwig Anzt, Stan Tomov, Mark Gates, Jack Dongarra, and Vincent Heuveline, International Conference on Computational Science (ICCS) 2012, May 2012, Omaha NE. A pdf version is available. Onesided dense matrix factorizations on a multicore with multiple GPU accelerators in MAGMA, Ichitaro Yamazaki, Stanimire Tomov, and Jack Dongarra, International Conference on Computational Science, ICCS 2012, Omaha NE. A pdf version is available. A Class of CommunicationAvoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines, Marc Baboulin, Simplice Donfack, Jack Dongarra, Laura Grigori, Adrien Rémy, Stanimire Tomov, International Conference on Computational Science, ICCS 2012, Omaha NE. A pdf version is available. High Performance Dense Linear System Solver with Resilience to Multiple Soft Errors, P. Du, P. Luszczek, and J. Dongarra, International Conference on Computational Science, ICCS 2012, Omaha NE. A pdf version is available. 
Enabling and Scaling Matrix Computations on Heterogeneous MultiCore and MultiGPU Systems, Fengguang Song and Jack Dongarra, ICS 2012 Conference, 26th International Conference on Supercomputing, 2529 June 2012, San Servolo Island, Venice, Italy. A pdf version is available. A Scalable Framework for Heterogeneous GPUBased Clusters, F. Song and J. Dongarra, ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '12), Pittsburgh, USA, January 2012. A pdf version is available. A CheckpointonFailure Protocol for AlgorithmBased Recovery in Standard MPI, Wesley Bland, Peng Du, Aurelien Bouteiller, Thomas Herault, George Bosilca, and Jack J. Dongarra, EuroPar 2012 Parallel Processing, Lecture Notes in Computer Science Volume 7484, 2012, pp 477488 as a distinguished paper. A pdf version is available. From Serial Loops to Parallel Execution on Distributed Systems, Anthony Danalis, Aurelien Bouteiller, George Bosilca, Jack J. Dongarra, Thomas Herault, EuroPar 2012 Parallel Processing, Lecture Notes in Computer Science Volume 7484, 2012, pp 246257. A pdf version is available. Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems, George Bosilca, Jack Dongarra, and Hatem Ltaief, accepted at EnAHPC 2012: Third International Conference on EnergyAware High Performance Computing, September 1214, 2012. A pdf version is available. Energy Footprint of Advanced Dense Numerical Linear Algebra using Tile Algorithms on Multicore Architecture, Jack Dongarra, Hatem Ltaief, Piotr Luszczek, and Vince M. Weaver, submitted to The 2nd International Conference on Cloud and Green Computing (CGC 2012), November 13, 2012, Xiangtan, Hunan, China. A pdf version is available. Anatomy of a Globally Recursive Embedded LINPACK Benchmark, Piotr Luszczek and Jack Dongarra, accepted in 2012 IEEE High Performance Extreme Computing Conference, Waltham, Massachusetts, September 2012. 
A pdf version is available. Weights for BlockAsynchronous Iteration on GPUAccelerated Systems, Hartwig Anzt, Stanimire Tomov, Jack Dongarra, and Vincent Heuveline, to appear in the 10th HeteroPar'2012 (Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms), Rhodes Island, Greece, August 2012. A pdf version is available. GPUAccelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement, H. Anzt, P. Luszczek, J. Dongarra, V. Heuveline, EuroPar 2012 Parallel Processing, Lecture Notes in Computer Science Volume 7484, 2012, pp 908919, Rhodes Island, Greece, August 2012. A pdf version is available.  2011 HighPerformance HighResolution SemiLagrangian Tracer Transport on a Sphere, T. White and J. Dongarra, Journal of Computational Physics, Volume 230 Issue 17, July, 2011, pp 67786799. A pdf version is available. A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures, M. Horton, S. Tomov, and J. Dongarra, to appear in 2011 Symposium on Application Accelerators in High Performance Computing, 1921 July, 2011, Knoxville TN. A pdf version is available. Algorithmbased Fault Tolerance Method for Soft Error Resilience in HighPerformance Linpack, Peng Du, Piotr Luszczek, and Jack Dongarra, IEEE Cluster 2011, September 2630, Austin, TX. A pdf version is available. Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures, Azzam Haidar, Hatem Ltaief, Asim YarKhan and Jack Dongarra, IPDPS 2011, Anchorage, AK, May 2011. A pdf version is available. BLAS for GPUs, R. Nath, S. Tomov, and J. Dongarra, pp 5780, in Scientific Computing with Multicore and Accelerators, Edited by Jakub Kurzak, David Bader, and Jack Dongarra, Chapman & Hall/CRC Computational Science Series, ISBN 9781439825365, 2011. A pdf version is available. 
Changes in Dense Linear Algebra Kernels, Decadeslong perspective, Piotr Luszczek, Jakub Kurzak, and Jack Dongarra, pp 313342, in Solving the Schrödinger equation: has everything been tried? Editor Paul Popelier, Imperial College Press, 2011, ISBN13 9781848167247. A pdf version is available. DAGuE: A generic distributed DAG engine for high performance computing, G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, J. Dongarra, Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on, pp. 11511158, 1620 May 2011, ISSN: 15302075. A pdf version is available. Dense Linear Algebra for Hybrid GPUBased Systems, S. Tomov and J. Dongarra, pp 3756, in Scientific Computing with Multicore and Accelerators, Edited by Jakub Kurzak, David Bader, and Jack Dongarra, Chapman & Hall/CRC Computational Science Series, ISBN 9781439825365, 2011. A pdf version is available. Evaluation of the HPC Challenge Benchmarks in Virtualized Environments, P. Luszczek, E. Meek, S. Moore, D. Terpstra, J. Dongarra, 6th Workshop on Virtualization in HighPerformance Cloud Computing (VHPC '11) as part of EuroPar 2011, Bordeaux, France. A pdf version is available. Exploiting FineGrain Parallelism in Recursive LU Factorization, Jack Dongarra, Mathieu Faverge, Hatem Ltaief, Piotr Luszczek, International Conference on Parallel Computing, 30 August - 2 September 2011, Ghent, Belgium. A pdf version is available. Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Azzam Haidar, Thomas Herault, Jakub Kurzak, Julien Langou, Pierre Lemarinier, Hatem Ltaief, Piotr Luszczek, Asim YarKhan, Jack Dongarra, 12th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC11), May 1620, 2011, Anchorage, Alaska, USA. A pdf version is available. 
Fully Empirical Autotuned Dense QR Factorization For Multicore Architectures, E. Agullo, J. Dongarra, R. Nath, S. Tomov, EuroPar 2011. A pdf version is available. High Performance Matrix Inversion Based on LU Factorization for Multicore Architectures, J. Dongarra, M. Faverge, H. Ltaief, P. Luszczek, 4th Workshop on ManyTask Computing on Grids and Supercomputers (MTAGS) 2011, Colocated with Supercomputing/SC 2011, Seattle Washington, November 14th, 2011. A pdf version is available. Impact of KernelAssisted MPI Communication over Scientific Applications: CPMD and FFTW, T. Ma, A. Bouteiller, G. Bosilca, J. Dongarra, EuroMPI2011, September 1921, 2011, Santorini Greece. A pdf version is available. Implementing Matrix Factorization on the Cell B.E., J. Kurzak, and J. Dongarra, pp. 2135, in Scientific Computing with Multicore and Accelerators, Edited by Jakub Kurzak, David Bader, and Jack Dongarra, Chapman & Hall/CRC Computational Science Series, ISBN 9781439825365, 2011. A pdf version is available. Implementing Matrix Multiplication on the Cell B.E., W. Alvaro, J. Kurzak, and J. Dongarra, pp 320, in Scientific Computing with Multicore and Accelerators, Edited by Jakub Kurzak, David Bader, and Jack Dongarra, Chapman & Hall/CRC Computational Science Series, ISBN 9781439825365, 2011. A pdf version is available. Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI, Volodymyr Turchenko, Lucio Grandinetti, George Bosilca and Jack J. Dongarra, International Conference on Computational Science, ICCS 2010, Amsterdam The Netherlands, June 2010. A pdf version is available. Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community, J.S. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. 
McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, and S. Yalamanchili, IEEE Computing in Science and Engineering, 13(5):905, 2011, ISSN: 15219615. A pdf version is available. Level3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms, Fred G. Gustavson, Jerzy Wasniewski, Jack J. Dongarra, J. Herrero, and J. Langou, accepted in ACM TOMS, June 2011. A pdf version is available. LU Factorization for Acceleratorbased Systems, Emmanuel Agullo, Cédric Augonnet, Jack Dongarra, Mathieu Faverge, Julien Langou, Hatem Ltaief, Stanimire Tomov, The 9th ACS/IEEE International Conference on Computer Systems and Applications AICCSA 2011, June 27th - June 30th 2011, Sharm ElSheikh, Egypt. A pdf version is available. Multithreading in the PLASMA Library, Jakub Kurzak, Piotr Luszczek, Asim YarKhan, Mathieu Faverge, Julien Langou, Henricus Bouwmeester, and Jack Dongarra, in Multi and ManyCore Technologies: Architecture, Programming, Algorithms, & Applications, published by Taylor & Francis, 2011. A pdf version is available. OMPIO: A Modular Software Architecture for MPI I/O, Mohamad Chaarawi, Edgar Gabriel, Rainer Keller, Richard Graham, George Bosilca and Jack Dongarra, EuroMPI2011, September 1921, 2011, Santorini Greece. A pdf version is available. On Scalability for MPI Runtime Systems, George Bosilca, Thomas Herault, Ala Rezmerita and Jack Dongarra, The International Workshop on Runtime and Operating Systems for Supercomputers, May 31, 2011. A pdf version is available. Optimizing Symmetric Dense MatrixVector Multiplication on GPUs, Jakub Kurzak, Jack Dongarra, and Rajib Nath, IEEE/ACM SC11 Conference, Seattle WA, November 2011. A pdf version is available. Overlapping Computation and Communication for Advection on Hybrid Parallel Computers, J. White and J. Dongarra, IPDPS 2011, Anchorage, AK, May 2011. A pdf version is available. 
Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated FineGrained and MemoryAware Kernels, Hatem Ltaief, Azzam Haidar, and Jack Dongarra, IEEE/ACM SC11 Conference, Seattle WA, November 2011. A pdf version is available. Performance Portability of a GPU Enabled Factorization with the DAGuE Framework, Aurelien Bouteiller, George Bosilca, Jack J. Dongarra, Thomas Herault, Pierre Lemarinier, Stanimire Tomov and Narapat Ohm Saengpatsa, IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 24, 2011. A pdf version is available. Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency, Hatem Ltaief, Piotr Luszczek and Jack Dongarra, the International Conference on EnergyAware High Performance Computing, September 0709, 2011, Hamburg, Germany. A pdf version is available. QCGOMPI: MPI Applications on Grids, Emmanuel Agullo, Camille Coti, Thomas Herault, Julien Langou, Sylvain Peyronnet, Ala Rezmerita, Franck Cappello, Jack Dongarra, Future Generation Computer Systems, Volume 27, Issue 4, pp 357369, April 2011. A pdf version is available. QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment, Emmanuel Agullo, Camille Coti, Jack Dongarra, Thomas Herault, and Julien Langou, UTCS10651, January 6, 2010. A pdf version is available. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, S. Tomov, IPDPS 2011, Anchorage, AK, May 2011. A pdf version is available. Recent Advances in the Message Passing Interface 18th European MPI Users' Group Meeting, EuroMPI 2011, Santorini, Greece, September 1821, 2011, Yiannis Cotronis, Anthony Danalis, Dimitrios S. Nikolopoulos, and Jack Dongarra (Eds.), Springer, LNCS, Volume 6960, 2011, ISSN 03029743, ISBN 9783642244483. Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution, and Inverse. 
Fred G. Gustavson, Jerzy Wasniewski, Jack J. Dongarra, and J. Langou, ACM TOMS, Volume 37, Number 2, 2011, pp. 181:1821, 2011, ISSN 00983500. A pdf version is available. Reducing the Amount of Pivoting in Symmetric Indefinite Systems, D. Becker, M. Baboulin, J. Dongarra, to appear PPAM, October 2011. A pdf version is available. Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure, G. Bosilca, T. Herault, P. Lemarinier, A. Rezmerita, and J. Dongarra, EuroMPI2011, September 1921, 2011, Santorini Greece. A pdf version is available. Scientific Computing with Multicore and Accelerators, Edited by Jakub Kurzak, David Bader, and Jack Dongarra, Chapman & Hall/CRC Computational Science Series, ISBN 9781439825365, 2011. Soft Error Resilient QR Factorization for Hybrid System with GPGPU, P. Du, P. Luszczek, S. Tomov, and J. Dongarra, Workshop on Latest Advances in Scalable Algorithms for LargeScale Systems (ScalA) held in conjunction with the 24th IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2011, November 14, 2011, Seattle, WA, USA. A pdf version is available. Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures, Hatem Ltaief, Piotr Luszczek, and Jack Dongarra, International Conference on Parallel Computing, 30 August - 2 September 2011, Ghent, Belgium. A pdf version is available. The International Exascale Software Roadmap, J. Dongarra, P. Beckman, et al., International Journal of High Performance Computing, Volume 25, Number 1, pp. 360, 2011, ISSN 10943420. A pdf version is available. Toward High Performance Divide and Conquer Eigensolver for Dense Symmetric Matrices, Azzam Haidar, Hatem Ltaief, and Jack Dongarra, submitted to SIAM SISC, February 2011. A pdf version is available. 
Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures, Agullo, E., Bouwmeester, H., Dongarra, J., Kurzak, J., Langou, J., and Rosenberg, L., In Proceedings of the 9th International Meeting on High Performance Computing for Computational Science, VECPAR'10, Berkeley, CA, June 2225 2011. A pdf version is available. Tracebased Performance Analysis for the Petascale Simulation Code FLASH, Heike Jagode, Jack Dongarra, Andreas Knupfer, Matthias Jurenz, Matthias S. Muller, and Wolfgang E. Nagel, International Journal of High Performance Computing, Volume 25, Number 4, Winter 2011, pp. 428439, ISSN 10943420. A pdf version is available. TwoStage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures, Piotr Luszczek, Hatem Ltaief, and Jack Dongarra, IPDPS 2011, Anchorage, AK, May 2011. A pdf version is available.  2010 Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms Through Hybrid GPUBased Computing, S. Tomov, R. Nath, and J. Dongarra, Parallel Computing, Volume 36, Number 12, 2010, pp. 45654. A pdf version is available. An Improved MAGMA GEMM for Fermi GPUs, Rajib Nath, Stanimire Tomov, and Jack Dongarra, International Journal of High Performance Computing Applications, Volume 24, number 4, 2010, pp 511515, ISSN 10943420. A pdf version is available. Dense Linear Algebra Solvers for Multicore with GPU Accelerators, Stanimire Tomov, Rajib Nath, Hatem Ltaief, and Jack Dongarra, Proceedings of IPDPS 2010: 24th IEEE International Parallel and Distributed Processing Symposium, Atlanta, GA, April 2010. A pdf version is available. Empirical Performance Tuning of Dense Linear Algebra Software, Jack Dongarra and Shirley Moore, pp 255272, in Performance Tuning of Scientific Applications, David H. Bailey, Robert F. Lucas, Samuel W. Williams, Editors, Chapman & Hall/CRC Computational Science Series, ISBN 9781439815694, 2010. A pdf version is available. 
Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs, Emmanuel Agullo, Cedric Augonnet, Jack Dongarra, Hatem Ltaief, Raymond Namyst, Samuel Thibault, and Stanimire Tomov, Nvidia GPU Gems, Morgan Kaufmann (Ed.), 2010. A pdf version is available. Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators, H. Ltaief, S. Tomov, R. Nath, and J. Dongarra, Submitted to IEEE Transaction on Parallel and Distributed Computing, March 2010. A pdf version is available. Parallel Band TwoSided Matrix Bidiagonalization for Multicore Architectures, Hatem Ltaief, Jakub Kurzak, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, April 2010, pp 417423. A pdf version is available. Redesigning the Message Logging Model for High Performance, A. Bouteiller, G. Bosilca, and J. Dongarra, Concurrency and Computation Practice and Experience, Volume 22, Number 15, November 2010, pp 21962212, ISSN 15320626. A pdf version is available. Scheduling Linear Algebra Operations on Multicore Processors, Jakub Kurzak, Hatem Ltaief, Jack Dongarra, and Rosa M. Badia, Concurrency and Computation: Practice and Experience, Vol. 22, no. 1, pp. 1544, January, 2010. A pdf version is available. Scheduling Twosided Transformations using AlgorithmsbyTiles on Multicore Architectures, H. Ltaief, J. Kurzak, J. Dongarra, and R. Badia, Scientific Programming, Volume 18, Number 1, pp 3550, 2010, ISSN 10589244. A pdf version is available. SelfHealing Network for Scalable FaultTolerant Runtime Environments, T. Angskun, G. Fagg, G. Bosilca, J. PjesivacGrbovic, and J. Dongarra, Future Generation Computer Systems, Volume 26, Issue 3, pp 479485, March 2010, ISSN 0167739X, 2010. A pdf version is available. SmartGridRPC: The new RPC model for high performance Grid computing and its implementation in SmartGridSolve, T. Brady, A. Lastovetsky, K. Seymour, M. Guidolin, and J. 
Dongarra, Concurrency Practice and Experience, pp 24672487, Volume 22 Number 18, ISSN 15320626, 2010. A pdf version is available. Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems, Parallel Computing, Volume 36, Issues 56, pp 232240, 2010, ISSN 01678191. A pdf version is available. Reliability and Performance Modeling and Analysis for Grid Computing, YuanShun Dai, Jack Dongarra, in Handbook of Research on Scalable Computing Technologies, Editors KuanChing Li, ChingHsien Hsu, Laurence Tianruo Yang, Jack Dongarra, Hans Zima, IGI Global, 2010. A pdf version is available. Transparent CrossPlatform Access to Software Services using GridSolve and GridRPC, Keith Seymour, Asim YarKhan, and Jack Dongarra, to appear in Cloud Computing and Software Services: Theory and Techniques, editors Syed Ahson and Mohammad Ilyas, 2010, CRC Press. A pdf version is available.  2009 A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra, Parallel Computing, Volume 35, Issue 1, pp 3853, 2009, ISSN: 01678191. A pdf version is available. Accelerating Scientific Computations with Mixed Precision Algorithms, Marc Baboulin, Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Julie Langou, Julien Langou, Piotr Luszczek, and Stanimire Tomov, Computer Physics Communications 180 (2009) 25262533. A pdf version is available. Accelerating TimeToSolution for Computational Science and Engineering, J. Demmel, J. Dongarra, A. Fox, S. Williams, V. Volkov, and K. Yelick, SciDAC Review, Winter 2009, pp 4657. A pdf version is available. Algorithmic Based Fault Tolerance Applied to High Performance Computing, Jack J. Dongarra, George Bosilca, Remi Delmas, and Julien Langou, Journal of Parallel and Distributed Computing, Volume 69, pp 410416, 2009. A pdf version is available. 
Computing the Conditioning of the Components of a Linear Least Squares Solution, Marc Baboulin, Jack Dongarra, and Julien Langou, Numerical Linear Algebra with Applications, July 2009, Volume 16 Issue 7, pp 517533. A pdf version is available. Highly Scalable SelfHealing Algorithms for High Performance Scientific Computing, Zizhong Chen and J. Dongarra, IEEE Transactions on Computers, Volume 58, Number 11, November 2009, pp 15121524, ISSN 00189340. A pdf version is available. Optimizing Matrix Multiplication for a ShortVector SIMD Architecture - CELL Processor, Wesley Alvaro, Jakub Kurzak, and Jack Dongarra, Parallel Computing, Volume 35, pp 138150, 2009. A pdf version is available. Paravirtualization Effect on Single and Multithreaded MemoryIntensive Linear Algebra Software, Lamia Youseff, Keith Seymour, Haihang You, Dmitrii Zagorodnov, Jack Dongarra, and Rich Wolski, Cluster Computing Journal, Volume 12, Number 2 / June, 2009, pp 101122, ISSN 13867857. A pdf version is available. QR Factorization for the CELL Processor, Jakub Kurzak and Jack Dongarra, Scientific Programming, Volume 17, Issue 12, January 2009, pp 3142, ISSN: 10589244. A pdf version is available. Scheduling Linear Algebra Operations on Multicore Processors, Jakub Kurzak, Hatem Ltaief, Jack Dongarra, and Rosa Badia, to appear in Trends in High Performance and Large Scale Computing, editors L. Grandinetti, G. Joubert, and W. Gentzsch, IOP Press, to be published in 2009. A pdf version is available. The International Exascale Software Project: A Call to Cooperative Action by the Global High Performance Community, Jack Dongarra, Pete Beckman, Patrick Aerts, Frank Cappello, Thomas Lippert, Satoshi Matsuoka, Paul Messina, Terry Moore, Rick Stevens, Anne Trefethen, Mateo Valero, Volume 23, Number 4, Winter 2009, International Journal of High Performance Computer Applications, pp 309322, ISSN 10943420. A pdf version is available. 
The Problem with the Linpack Benchmark Matrix Generator, Julien Langou and Jack Dongarra, International Journal of High Performance Computer Applications, Volume 23, Number 1, Spring 2009, pp 5-14. A pdf version is available.  2008 A Comparison of Search Techniques for Empirical Code Optimization, Keith Seymour, Haihang You, and Jack Dongarra, submitted to The Third International Workshop on Automatic Performance Tuning, October 1st, 2008, Tsukuba International Congress Center, Epochal Tsukuba, Japan. A pdf version is available. A Tribute to Gene Golub, Jack Dongarra, Computing in Science and Engineering, IEEE, March/April 2008, pp 5. A pdf version is available. AlgorithmBased Fault Tolerance for FailStop Failures, Zizhong Chen and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, Vol. 19, No. 12, December, 2008. A pdf version is available. Interactive GridAccess Using Gridsolve and Giggle, M. Hardt, K. Seymour, J. Dongarra, M. Zapf, and N.V. Ruiter, Computing and Informatics, Vol. 27, No. 2, pp 233248, 2008, ISSN 13359150. A pdf version is available. Interior State Computation of Nano Structures, Andrew Canning, Jack Dongarra, Julien Langou, Osni Marques, Stanimire Tomov, Christof Voemel, and LinWang Wang, PARA 2008, 9th International Workshop on StateoftheArt in Scientific and Parallel Computing, May 1316, 2008, Trondheim Norway. A pdf version is available. Netlib and NANet: Building a Scientific Computing Community, J. Dongarra, G. Golub, E. Grosse, C. Moler, K. Moore, IEEE Annals of the History of Computing, Volume 3 Number 2, April - June 2008, pp 30-41. A pdf version is available. Parallel Tiled QR Factorization for Multicore Architectures, Alfredo Buttari, Julien Langou, Jakub Kurzak, and Jack Dongarra, Concurrency and Computation: Practice and Experience, 2008; 20:15731590. A pdf version is available. 
Revisiting Matrix Product on MasterWorker Platforms, Jack Dongarra, JeanFrançois Pineau, Yves Robert, Zhiao Shi and Frédéric Vivien, International Journal of Foundations of Computer Science (IJFCS), Volume 19, Number 6, December 2008, pp 13171336. A pdf version is available. Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization, Jakub Kurzak, Alfredo Buttari, and Jack Dongarra, IEEE Transactions on Parallel and Distributed Systems, Volume 19, Number 9, September 2008, pp 1-11. A pdf version is available. Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures, Marc Baboulin, Stan Tomov and Jack Dongarra, PARA 2008, 9th International Workshop on StateoftheArt in Scientific and Parallel Computing, EECS Tech Report UTCS08615, LAWN #200, May 1316, 2008, Trondheim Norway. A pdf version is available. StateoftheArt Eigensolvers for Electronic Structure Calculations of Large Scale NanoSystems, Christof Vomel, Stanimire Z. Tomov, Osni A. Marques, A. Canning, LinWang Wang, and Jack J. Dongarra, Journal of Computational Physics, Volume 227, Issue 15 (July 2008), pages 71137124. A pdf version is available. The PlayStation 3 for High Performance Scientific Computing, Jakub Kurzak, Alfredo Buttari, Piotr Luszczek, and Jack Dongarra, Computing in Science and Engineering, IEEE, May/June 2008, pp 8083. A pdf version is available. Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64bit Accuracy, Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Piotr Luszczek, and Stanimire Tomov, ACM Transactions on Mathematical Software, Volume 34 Number 4, July 2008, pp 1-22. A pdf version is available.  2007 Automatic Analysis of Inefficiency Patterns in Parallel Applications, Felix Wolf, Bernd Mohr, Jack Dongarra, Shirley Moore, Concurrency and Computation: Practice and Experience, Volume 19, Issue 11, pp 14811496, August 2007. A pdf version is available. 
Implementation of Mixed Precision in Solving Systems of Linear Equations on the Cell Processor, Jakub Kurzak, Jack Dongarra, Concurrency and Computation: Practice and Experience, Volume 19, Issue 10, pp 13711385, July 2007. A pdf version is available. Improved Runtime and Transfer Time Prediction Mechanisms in a Network Enabled Servers Middleware, Emmanuel Jeannot, Keith Seymour, Asim YarKhan, and Jack J. Dongarra, Parallel Processing Letters, March 2007, Volume 17, Number 1, pp 4759, ISSN 01296264. A pdf version is available. Performance Analysis of MPI Collective Operations, Jelena PjesivacGrbović, Thara Angskun, George Bosilca, Graham E. Fagg, Edgar Gabriel, and Jack J. Dongarra, Cluster Computing Journal, Volume 10, pp 127143, 2007. A pdf version is available. Recovery Patterns for Iterative Methods in a Parallel Unstable Environment, G. Bosilca, Z. Chen, J. Dongarra, and J. Langou, SIAM Journal on Scientific Computing, pp 102116, Volume 30, Number 1, 2007. A pdf version is available. Scalability Analysis of the SPEC OpenMP Benchmarks on LargeScale Shared Memory Multiprocessors, K. Fuerlinger, M. Gerndt, J. Dongarra, in Lecture Notes in Computer Science, Volumes 44874490, Computational Science - ICCS 2007, 7th International Conference Beijing, China, May 27-30, 2007, Editors Yong Shi, Geert Dick van Albada, Jack Dongarra, and Peter M.A. Sloot, ISBN10 354072589X, ISSN 03029743, Springer Berlin / Heidelberg, 2007. A pdf version is available. The Impact of Multicore on Computational Science Software, Jack Dongarra, Dennis Gannon, Geoffrey Fox, and Ken Kennedy, CTWatch Quarterly, Volume 3 Number 1, February 2007, (Unreviewed). A pdf version is available. The Use of Bulk States to Accelerate the Band Edge State Calculation of a Semiconductor Quantum Dot, Christof Vomel, Stanimire Z. Tomov, LinWang Wang, Osni A. Marques, and Jack J. Dongarra, Journal of Computational Physics, Volume 223, Number 2, pp 774782, ISSN 00219991, 2007. A pdf version is available.  
2006 ConjugateGradient Eigenvalue Solvers in Computing Electronic Properties of Nanostructure Architectures, Stanimire Tomov, Julien Langou, Andrew Canning, LinWang Wang, and Jack Dongarra, The International Journal of Computational Science and Engineering, Volume 2, Number 3/4, pp 205212, 2006, ISSN 17427185. A pdf version is available. Design and Implementation of the HPC Challenge Benchmark Suite, Piotr Luszczek, Jack Dongarra, Jeremy Kepner, CTWatch Quarterly, November 2006, Volume 2, Number 4A, http://www.ctwatch.org/quarterly/archives/november2006/ (Unreviewed). A pdf version is available. NanoPSE: A Nanoscience Problem Solving Environment for Atomistic Electronic Structure of Semiconductor Nanostructures, W. B. Jones, G. Bester, A. Canning, A. Franceschetti, P. A. Graf, K. Kim, J. Langou, L.W. Wang, J. Dongarra, and A. Zunger, in "the Proceedings of Science Discovery through Advanced Computing (SciDAC 2005)", Journal of Physics: Conference Series 16, 277282, 2005. A pdf version is available. Predicting the Electronic Properties of 3D, MillionAtom Semiconductor Nanostructure Architectures, A. Zunger, A. Franceschetti, G. Bester, W.B. Jones, Kwiseon Kim, P. A. Graf, LW. Wang, A. Canning, O. Marques, C. Voemel, J. Dongarra, J. Langou and S. Tomov, Journal of Physics: Conference Series 46 (2006) 292298. A pdf version is available. Scheduling Workflow Applications on Processors with Different Capabilities, Zhiao Shi and Jack Dongarra, Future Generation Computing Systems, Volume 22, pp 665675, 2006. A pdf version is available. Recent Developments in GridSolve, Asim YarKhan, Keith Seymour, Kiran Sagi, Zhiao Shi, and Jack Dongarra, International Journal of High Performance Applications and Supercomputing, Volume 20 Number 1 Spring 2006, ISSN 10943420, pp 131132. A pdf version is available. Self Adapting Numerical Software (SANS) Effort, George Bosilca, Zizhong Chen, Jack Dongarra, Victor Eijkhout, Graham E. 
Fagg, Erika Fuentes, Julien Langou, Piotr Luszczek, Jelena PjesivacGrbovic, Keith Seymour, Haihang You, and Sathish S. Vadhiyar, IBM Journal of Research and Development, pp. 223238, Volume 50, Number 2/3, 2006. A pdf version is available. Trends in HighPerformance Computing, Jack Dongarra, January/February 2006, IEEE Circuits & Devices Magazine, pp 2227, ISSN 87553996. A pdf version is available. TwentyPlus Years of Netlib and NANet, Part 1 and 2, SIAM News, pp 13, Volume 39, Number 3&4, April & May 2006 (Unreviewed news article). A pdf version is available.  2005 A Not So Simple Matter of Software, Jack Dongarra, NCSA Access, Summer 2005 (nonrefereed magazine publication). A pdf version is available. A Scalable Approach to MPI Application Performance Analysis, Shirley Moore, Felix Wolf, Jack Dongarra, Sameer Shende, Patricia Teller, and Bernd Mohr, Volume 3666, Recent Advances in Parallel Virtual Machine and Message Passing Interface Users' Group Meeting Euro PVMMPI 2005, pp 309316, Springer Heidelberg, 2005, ISSN: 03029743. A pdf version is available. An Asynchronous Algorithm on NetSolve Global Computing System, Jack Dongarra, Nahid Emad, S. A. Shahzadeh Fazeli, Future Generation Computing Systems, Vol. 22, No. 3, pp 279290, 2005. A pdf version is available. Biological Sequence Alignment on the Computational Grid using the GrADS Framework, Asim YarKhan and Jack Dongarra, Future Generation Computer Systems, Volume 21, Issue 6, pp 980986, June 2005. A pdf version is available. Condition Numbers of Gaussian Random Matrices, Zizhong Chen and Jack Dongarra, SIAM Matrix Analysis and Applications, Volume 27, Number 3, pp 603620, 2005. A pdf version is available. Evaluating Dynamic Communicators and OneSided Operations for Current MPI Libraries, Edgar Gabriel, Graham E. Fagg, and Jack J. Dongarra, International Journal of High Performance Computing Applications, Volume 19, Number 1, pp 6781, Spring 2005, ISSN 10943420. A pdf version is available. 
Hash Functions for Datatype Signatures in MPI, George Bosilca, Jack Dongarra, Graham Fagg, and Julien Langou, Lecture Notes in Computer Science, Volume 3666, Recent Advances in Parallel Virtual Machine and Messaging Passing Interface Users' Group Meeting Euro PVMMPI 2005, pp 7683, Springer Heidelberg, 2005, ISSN: 03029743. A pdf version is available. High Performance Computing: Clusters, Constellations, MPPs, and Future Directions, Jack Dongarra, Thomas Sterling, Horst Simon, and Erich Strohmaier, Computing in Science and Engineering, Volume 7, Number 2, March/April 2005, pp. 5159, ISSN 15219615. A pdf version is available. New Grid Scheduling and Rescheduling Methods in the GrADS Project, F. Berman, H. Casanova, A. Chien, K. Cooper, H. Dail, A. Dasgupta, W. Deng, J. Dongarra, L. Johnsson, K. Kennedy, C. Koelbel, B. Liu, X. Liu, A. Mandal, G. Marin, M. Mazina, J. MellorCrummey, C. Mendes, A. Olugbile, M. Patel, D. Reed, Z. Shi, O. Sievert, H. Xia, and A. YarKhan, International Journal of Parallel Programming, Vol. 33, No. 2, June 2005. A pdf version is available. Process FaultTolerance: Semantics, Design and Applications for High Performance Computing, Graham E. Fagg, Edgar Gabriel, Zizhong Chen, Thara Angskun, George Bosilca, Jelena PjesivacGrbovic, and Jack J. Dongarra, International Journal for High Performance Applications and Supercomputing, Vol. 19, No. 4, pp 465478, 2005. A pdf version is available. Recent Trends in the Marketplace of High Performance Computing, Erich Strohmaier, Jack J. Dongarra, Hans W. Meuer, and Horst D. Simon, Parallel Computing, Volume 31, Issues 34, pp 261273, MarchApril 2005. A pdf version is available. Scalable Fault Tolerant MPI: Extending the Recovery Algorithm, Graham E. Fagg, Thara Angskun, George Bosilca, Jelena PjesivacGrbovic, and Jack J. 
Dongarra, Lecture Notes in Computer Science, Volume 3666, Recent Advances in Parallel Virtual Machine and Messaging Passing Interface Users' Group Meeting Euro PVMMPI 2005, pp 6775, Springer Heidelberg, 2005, ISSN: 03029743. A pdf version is available. Scanning the Special Issue on Program Generation Optimization and Platform Adaptation, J.M.F. Moura, M. Puschel, D. Padua, and J. Dongarra, Proceedings of the IEEE, Volume 93, Number 2, February 2005, pp 211215, ISSN 00189219. A pdf version is available. Self Adapting Linear Algebra Algorithms and Software, Jim Demmel, Jack Dongarra, Victor Eijkhout, Erika Fuentes, Antoine Petitet, Rich Vuduc, R. Clint Whaley, Katherine Yelick, Proceedings of the IEEE, Volume 93, Number 2, February 2005, pp 293312, ISSN 00189219. A pdf version is available. Self Adaptivity in Grid Computing, S. Vadhiyar and J. Dongarra, Concurrency and Computation: Practice and Experience. Volume 17, Issue 24, 2005, pp. 235257. A pdf version is available. The Component Structure of a SelfAdapting Numerical Software System, Victor Eijkhout, Erika Fuentes, Thomas Eidson, and Jack Dongarra, International Journal of Parallel Programming, Vol. 33, No. 2, June 2005. A pdf version is available. The Top500 and Computational Science, A not so simple matter of software, Jack Dongarra, Scientific Computing, pp 1416, August 2005 (nonrefereed magazine publication). A pdf version is available.  2004 Simplified Grid Computing through Spreadsheets and NetSolve, David Abramson, Jack Dongarra, Eric Meek, Paul Roe, Zhiao Shi, High Performance Computing and Grid in Asia Pacific Region, 2004. Proceedings. Seventh International Conference, 2222 July 2004 DOI: 10.1109/HPCASIA.2004.1324012 A pdf version is available. Building and Using a Fault Tolerant MPI Implementation, Graham E Fagg and Jack J Dongarra, International Journal of High Performance Applications and Supercomputing, Volume 18, number 3, Fall 2004, pp 353362, ISSN 10943420. A pdf version is available. 
GrADSolve  A Gridbased RPC system for Remote Invocation of Parallel Software, Sathish Vadhiyar and Jack Dongarra, Journal of Parallel and Distributed Computing, 64(6):774783, June 2004, ISSN 07437315. A pdf version is available. Self Adapting Software for Numerical Linear Algebra and LAPACK for Clusters, Z. Chen, J. Dongarra, P. Luszczek, and K. Roche, Parallel Computing 29(1112):17231743, November/December 2003, ISSN 01678191. A pdf version is available. The Virtual Instrument: Support for Gridenabled MCell Simulations, Henri Casanova, Thomas Bartol, Francine Berman, Erhan Gokcay, Adam Birnbaum, Jack Dongarra, Mark Ellisman, Marcio Faerman, Michelle Miller, Graziano Obertelli, Stuart Pomerantz, Terry Sejnowski, Joel Stiles, Rich Wolski, International Journal of High Performance Computing Applications, Volume 18, Number 1, Spring 2004, pp 318, ISSN 10943420. A pdf version is available. Toward an Accurate Model for Collective Communications, Sathish Vadhiyar, Graham Fagg, Jack Dongarra, International Journal of High Performance Computing Applications, Volume 18, Number 1, Spring 2004, pp 159166, ISSN 10943420. A pdf version is available. Trends in High Performance Computing, Jack Dongarra, The Computer Journal, 47(4):399403, The British Computer Society, 2004. A pdf version is available.  2003 Self Adaptability in Grid Computing, S. Vadhiyar and J. Dongarra, Concurrency and Computation: Practice and Experience, January 2003, ISSN 15320634. A pdf version is available. Selfadapting Numerical Algorithm for Next Generation Applications, J. Dongarra and V. Eijkhout, International Journal of High Performance Computing Applications 17(2):125132, Summer 2003, ISSN 10943420. A pdf version is available. Selfadapting Numerical Software and Automatic Tuning of Heuristics, Jack Dongarra and Victor Eijkhout, Lecture Notes in Computer Science, Volume 2660, SpringerVerlag Heidelberg, pp 759  770, ISSN: 03029743, June 2003. A pdf version is available. 
SRS: A Framework for Developing Malleable and Migratable Parallel Applications for Distributed Systems, S. S. Vadhiyar and J. J. Dongarra, Parallel Processing Letters 13(2):291312, June 2003, ISSN 01296264. A pdf version is available. The LINPACK Benchmark: Past, Present, and Future, J. J. Dongarra, P. Luszczek, and A. Petitet, Concurrency and Computation: Practice and Experience 15(9):803820, August 2003, ISSN 15320634. A pdf version is available.  2002 A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures, G. Henry, D. Watkins, and J. Dongarra, SIAM Journal on Scientific Computing 24(1):284311, January 2003, ISSN 10648275. A pdf version is available. An Updated Set of Basic Linear Algebra Subprograms (BLAS), L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley, ACM Transactions on Mathematical Software 28(2):135151, June 2002, ISSN 00983500. A pdf version is available. Automatic Translation of Fortran to JVM Bytecode, K. Seymour and J. Dongarra, Concurrency and Computation: Practice and Experience 15(35):207222, March/April 2003, ISSN 15320626 (print), 15320634 (electronic). A pdf version is available. Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard, Special Issue  Part I, International Journal of High Performance Computing Applications 16(1):1111, Spring 2002, ISSN 10943420. A pdf version is available. Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard, Special Issue  Part II, International Journal of High Performance Computing Applications 16(2):115199, Spring 2002, ISSN 10943420. A pdf version is available. HARNESS Fault Tolerant MPI Design, Usage and Performance Issues, G. E. Fagg and J. J. Dongarra, Future Generation Computer Systems 18(8):11271142, October 2002, ISSN 0167739X. A pdf version is available. Innovations of the NetSolve Grid Computing System, D. C. Arnold, H. 
Casanova, and J. Dongarra, Concurrency and Computation: Practice and Experience, Special Issue: Grid Computing Environments 14(1315):14571479, November/December 2002, ISSN 15320626 (print), 15320634 (electronic). A pdf version is available. Middleware for the Use of Storage in Communication, M. Beck, D. Arnold, A. Bassi, F. Berman, H. Casanova, J. Dongarra, T. Moore, G. Obertelli, J. Plank, M. Swany, S. Vadhiyar, and R. Wolski, Parallel Computing 28(12):17731788, December 2002, ISSN 01678191. A pdf version is available. NetBuild: Transparent CrossPlatform Access to Computational Software Libraries, K. Moore and J. Dongarra, Concurrency and Computation: Practice and Experience 14(1315):14451456, November/December 2002, ISSN 15320626 (print), 15320634 (electronic). A pdf version is available.  2001 A Comparison of Parallel Solvers for Diagonally Dominant and General NarrowBanded Linear Systems, P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland, Parallel and Distributed Computing Practices, Special Issue: Parallel Numerical Linear Algebra 2(4):385400, November 1999, ISSN 10972803. A pdf version is available. Automated Empirical Optimization of Software and the ATLAS Project, R. Whaley, A. Petitet, and J. Dongarra, Parallel Computing 27(12):325, January 2001, ISSN 01678191. A pdf version is available. Biannual Top500 Computer Lists Track Changing Environments for Scientific Computing, J. Dongarra, H. Meuer, H. Simon, and E. Strohmaier, SIAM News 34(9), November 2001, ISSN 00361445. A pdf version is available. HARNESS and Fault Tolerant MPI, G. Fagg, A. Bukovsky, and J. Dongarra, Parallel Computing 27(11):14791496, October 2001, ISSN 01678191. A pdf version is available. High Performance Computing Trends, J. J. Dongarra, H. W. Meuer, H. D. Simon, and E. Strohmaier, HERMIS 2:155163, November 2001, ISSN 11087609. A pdf version is available. Iterative Solver Benchmark, J. Dongarra, V. Eijkhout, and H. van der Vorst, Scientific Programming 9(4):223231, 2001, ISSN 10589244. 
A pdf version is available. Measuring Computer Performance: A Practitioner's Guide, Book Review by D. Lilja, Cambridge University Press (ISBN 0521641055), SIAM Review 43(2):383384, 2001, ISSN 00361445. A pdf version is available. NetworkEnabled Solvers: A Step Toward GridBased Computing, J. Dongarra, SIAM News 34(10), December 2001, ISSN 00361445. A pdf version is available. Numerical Libraries and the Grid, A. Petitet, S. Blackford, J. Dongarra, B. Ellis, G. Fagg, K. Roche, and S. Vadhiyar, International Journal of High Performance Computing Applications 15(4):359374, Winter 2001, ISSN 10943420. A pdf version is available. Numerical Libraries and Tools for Scalable Parallel Cluster Computing, J. Dongarra, S. Moore, and A. Trefethen, International Journal of High Performance Computing Applications 15(2):175180, Summer 2001, ISSN 10943420. A pdf version is available. On the Convergence of Computational and Data Grids, D. C. Arnold, S. S. Vadhiyar, and J. J. Dongarra, Parallel Processing Letters 11(23):187202, June/September 2001, ISSN 01296264. A pdf version is available. Recursive Approach in Sparse Matrix LU Factorization, J. Dongarra, V. Eijkhout, and P. Luszczek, Scientific Programming 9(1):5160, 2001, ISSN 10589244. A pdf version is available. Telescoping Languages: A Strategy for Automatic Generation of Scientific ProblemSolving Systems from Annotated Libraries, K. Kennedy, B. Broom, K. Cooper, J. Dongarra, R. Fowler, D. Gannon, L. Johnsson, J. MellorCrummey, and L. Torczon, Journal of Parallel and Distributed Computing 61(12):18031826, December 2001, ISSN 07437315. A pdf version is available. The GrADS Project: Software Support for HighLevel Grid Application Development, F. Berman, A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. MellorCrummey, D. Reed, L. Torczon, and R. Wolski, International Journal of High Performance Computing Applications 15(4):327344, Winter 2001, ISSN 10943420. 
A pdf version is available. The Quest for Petascale Computing, J. Dongarra and D. Walker, Computing in Science and Engineering 3(3):3239, May/June 2001, ISSN 15219615. A pdf version is available.  2000 A Portable Programming Interface for Performance Evaluation on Modern Processors, S. Browne, J Dongarra, N. Garner, G. Ho, and P. Mucci, International Journal of High Performance Computing Applications 14(3):189204, Fall 2000, ISSN 10943420. A pdf version is available. The Design And Implementation Of The Parallel OutOfCore Scalapack LU, QR, And Cholesky Factorization Routines, E. D'Azevedo and J. Dongarra, Concurrency: Practice and Experience 12(15):14811493, 2000, ISSN 10403108. A pdf version is available.  1999 A Comparison Of Parallel Solvers For General Narrow Banded Linear Systems, P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland, Parallel and Distributed Computing Practices 2(4):385400, December 1999, ISSN 10972803. A pdf version is available. A Parallel Divide and Conquer algorithm for the Symmetric Eigenvalue Problem, F. Tisseur and J. Dongarra, SIAM Journal on Scientific Computing 6(20):22232236, 1999, ISSN 10648275. A pdf version is available. Adaptive Scheduling for Task Farming with Grid Middleware, H. Casanova, M. Kim, J. Plank, and J. Dongarra, International Journal of High Performance Computing Applications 13(3):231240, Fall 1999, ISSN 10943420. A pdf version is available. Algorithmic Issues on Heterogeneous Computing Platforms, Pierre Boulet, J. Dongarra, F. Rastello, Y. Robert, and F. Vivien, Parallel Processing Letters 9(2):197213, 1999, ISSN 01296264. A pdf version is available. Algorithmic Redistribution Methods for BlockCyclic Decompositions, A. P. Petitet and J. J. Dongarra, IEEE Transactions on Parallel and Distributed Systems 10(12):201220, 1999, ISSN 10459219. A pdf version is available. Atlanta Organizers Put Mathematics to Work For the Math Sciences Community, M. Berry and J. Dongarra, SIAM News 32(6), July/August 1999, ISSN 00361445. 
A pdf version is available. Deploying Fault Tolerance and Task Migration with NetSolve, J. S. Plank, H. Casanova, M. Beck, and J. J. Dongarra, Future Generation Computer Systems 15(56):745755, October 1999, ISSN 0167739X. A pdf version is available. Experiences with Windows NT as a Cluster Computing Platform for Parallel Computing, M. Fischer and J. Dongarra, Parallel and Distributed Computing Practices, Special Issue: Cluster Computing 2(2):119128, June 1999, ISSN 10972803. A pdf version is available. HARNESS: A Next Generation Distributed Virtual Machine, M. Beck, J. J. Dongarra, G. E. Fagg, G. A. Geist, P. Gray, J. Kohl, M. Migliardi, K. Moore, T. Moore, P. Papadopoulous, S. L. Scott, and V. Sunderam, Future Generation Computer Systems 15(56):571582, October 1999, ISSN 0167739X. A pdf version is available. JLAPACK  Compiling LAPACK Fortran to Java, D. Doolin, J. Dongarra, and K. Seymour, Scientific Programming 7(2):111138, 1999, ISSN 10589244. A pdf version is available. Logistical Quality of Service in NetSolve, M. Beck, H. Casanova, J. Dongarra, T. Moore, J. Plank, F. Berman, and R. Wolski, Computer Communications 22(11):10341044, 1999, ISSN 01403664. A pdf version is available. Numerical Linear Algebra Algorithms and Software, J. Dongarra and V. Eijkhout, Journal of Computational and Applied Mathematics 123(12):489514, November 1, 2000, ISSN 03770427. A pdf version is available. Scalable Networked Information Processing Environment (SNIPE), G. E. Fagg, K. Moore, and J. J. Dongarra, Future Generation Computer Systems 15(56):595605, October 1999, ISSN 0167739X. A pdf version is available. Static Tiling For Heterogeneous Computing Platforms, P. Boulet, J. Dongarra, Y. Robert, and F. Vivien, Parallel Computing 25(5):547568, 1999, ISSN 01678191. A pdf version is available. Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments, H. Casanova, M. Thomason, and J. 
Dongarra, Journal of Parallel and Distributed Computing 58(1):6891, July 1999, ISSN 07437315. A pdf version is available. The Marketplace for HighPerformance Computers, E. Strohmaier, J. Dongarra, H. Meuer, and H. Simon, Parallel Computing 25(1314):15171545, December 1999, ISSN 01678191. A pdf version is available. Tiling On Systems with Communication/Computation Overlap, P.Y. Calland, J. Dongarra, and Y. Robert, Concurrency: Practice and Experience 11(3):139153, 1999, ISSN 10403108. A pdf version is available.  1998 Applying NetSolve's Network Enabled Server, H. Casanova and J. Dongarra, IEEE Computational Science and Engineering 5(3):5767, July/September 1998, ISSN 10709924. A pdf version is available. Determining the Idle Time of a Tiling: New Results, F. Desprez, J. Dongarra, F. Rastello, and Yves Robert, Journal of Computing and Information Science in Engineering (Special Issue on Compiler Techniques for HighPerformance Computing) 14(1):167190, March 1998, ISSN 15309827. A pdf version is available. Developing Numerical Libraries in Java, R. F. Boisvert, J. J. Dongarra, R. Pozo, K. A. Remington, and G. W. Stewart, Concurrency: Practice and Experience 10(1113):11171129, 1998, ISSN 10403108. A pdf version is available. National HPCC Software Exchange (NHSE): Uniting the High Performance Computing and Communications Community, S. Browne, J. Dongarra, J. Horner, P. McMahan, S. Wells, DLib Magazine (Electronic), May 1998, ISSN 10829873. A pdf version is available. Programming Tools and Environments, J. Saltz, A. Sussman, S. Graham, J. Demmel, S. Baden, and J. Dongarra, Communications of the ACM 41(11):6473, November 1998, ISSN 00010782 A pdf version is available. Scheduling BlockCyclic Array Redistribution, F. Desprez, J. Dongarra, A. Petitet, C. Randriamaro, and Y. Robert, IEEE Transactions on Parallel and Distributed Systems 9(2):192205, February 1998, ISSN 10459219. A pdf version is available. 
Using Agentbased Software for Scientific Computing in the NetSolve System, H. Casanova and J. Dongarra, Parallel Computing 24(1213):17771790, November 1998, ISSN 01678191. A pdf version is available.  1997 Changing Technologies of HPC, J. J. Dongarra, H. W. Meuer, H. D. Simon, and E. Strohmaier, Future Generation Computer Systems 12(5):461474, April 1997, ISSN 0167739X. A pdf version is available. Fault Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing, J. Plank, Y. Kim, and J. Dongarra, Journal of Parallel and Distributed Computing 43(2):125138, 1997, ISSN 07437315. A pdf version is available. Java Access to Numerical Libraries, H. Casanova, J. Dongarra, and D. Doolin, Concurrency: Practice and Experience 9(11):12791291, 1997, ISSN 10403108. A pdf version is available. Key Concepts for Parallel Out of Core LU Factorization, J. Dongarra, S. Hammarling, and D. Walker, Parallel Computing 23(12):4970, April 1997, ISSN 01678191. A pdf version is available. MessagePassing Performance of Various Computers, J. Dongarra and T. Dunigan, Concurrency: Practice and Experience 9(10):915926, 1997, ISSN 10403108. A pdf version is available. NetSolve: A NetworkEnabled Server for Solving Computational Science Problems, H. Casanova and J. Dongarra, The International Journal of Supercomputer Applications and High Performance Computing 11(3):212223, Fall 1997, ISSN 10783482. A pdf version is available. Practical Experience in the Numerical Dangers of Heterogeneous Computing, L. S. Blackford, A. Cleary, J. Demmel, J. Dongarra, I. Dhillon, S. Hammarling, A. Petitet, H. Ren, K. Stanley, and R. C. Whaley, ACM Transactions on Mathematical Software 23(2):133147, June 1997, ISSN 00983500. A pdf version is available. The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Computers, J. Bai, J. Demmel, J. Dongarra, A. Petitet, H. Robinson, and K. Stanley, SIAM Journal on Scientific Computing 18(5):14461461, 1997, ISSN 01965204. 
A pdf version is available. Top500 Supercomputer Sites, J. Dongarra, H. W. Meuer and E. Strohmaier, Supercomputer 67:89120, 1997, ISSN 01687875. A pdf version is available.  1996 A Message Passing Standard for MPP and Workstations, J. Dongarra, S. W. Otto, M. Snir, and D. Walker, Communications of the ACM 39(7):8490, July 1996, ISSN 00010782. A pdf version is available. Algorithmic Bombardment for the Iterative Solution of Linear Systems: A PolyIterative Approach, R. Barrett, M. Berry, J. Dongarra, V. Eijkhout, and C. Romine, Journal of Computational and Applied Mathematics 74(12):91110, November 1996, ISSN 03770427. A pdf version is available. Chebyshev tau  QZ Algorithm Methods for Calculating Spectra of Hydrodynamic Stability Problems, J. Dongarra, B. Straughan and D. W. Walker, Applied Numerical Mathematics 22(4):399435, 1996, ISSN 01689274. A pdf version is available. Future Linear Algebra Libraries, J. Dongarra, IEEE Computational Science and Engineering 3(2):3840, Summer 1996, ISSN 10709924. A pdf version is available. LAPACK for Fortran90, J. Dongarra, J. Du Croz, S. Hammarling, J. Wasniewski, A. Zemla, Applied Mathematics and Computer Science 6(2):101109, 1996, ISSN 1641876X. A pdf version is available. MPI: A Standard Message Passing Interface, J. Dongarra and D. Walker, Supercomputer 12(1):5668, January 1996, ISSN 01687875. Overview of HighPerformance Computers, A. van der Steen and J. Dongarra, Electronic Journal of the NHSE Review 1(1), 1996, HTML. PBBLAS: A Set of Parallel Block Basic Linear Algebra Subroutines, J. Choi, J. Dongarra, and D. Walker, Concurrency: Practice and Experience 8(7):517535, September 1996, ISSN 10403108. A pdf version is available. PVMPI: An Integration of PVM and MPI Systems, G. Fagg and J. Dongarra, Calculateurs Parallèles 8(2):151166, 1996, Hermes, ISSN 12603198. A pdf version is available. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers  Design Issues and Performance, J. Choi, J. Demmel, J. 
Dongarra, I. Dhillon, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley, Computer Physics Communications 97(12):115, August 1996, ISSN 00104655. A pdf version is available. The Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, J. Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet, D. W. Walker and R. C. Whaley, Scientific Programming 5(3):173184, Fall 1996, ISSN 10589244. A pdf version is available.  1995 A Highly Parallel Algorithm for the Reduction of a Nonsymmetric Matrix to Block UpperHessenberg Form, M. W. Berry, J. Dongarra, and Y. Kim, Parallel Computing 21(8):11891212, August 1995, ISSN 01678191. A pdf version is available. Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers, J. Choi, J. Dongarra, and D. Walker, Parallel Computing 21(9):13871405, 1995, ISSN 01678191. A pdf version is available. Performance Study of LU Factorization with Low Communication Overhead on Multiprocessors, F. Desprez, J. Dongarra, and B. Tourancheau, Parallel Processing Letters 5(2):157169, June 1995, ISSN 01296264. A pdf version is available. Recent Enhancements to PVM, A. Beguelin, J. Dongarra, A. Geist, R. Manchek, and V. Sunderam, International Journal of Supercomputer Applications and High Performance Computing 9(2):108127, Summer 1995, ISSN 10783482. A pdf version is available. Software Distribution Using XNETLIB, J. Dongarra, T. Rowan and R. Wade, ACM Transactions on Mathematical Software 21(1):7988, March 1995, ISSN 00983500. A pdf version is available. Software Libraries for Linear Algebra Computations on High Performance Computers, J. Dongarra and D. Walker, SIAM Review 37(2):151180, June 1995, ISSN 00361445. A pdf version is available. The Design of a Parallel, Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal, and Bidiagonal Form, J. Choi, J. Dongarra, and D. Walker, Numerical Algorithms 10(34):379400, 1995, ISSN 10171398. A pdf version is available. 
The National HPCC Software Exchange, S. Browne, J. Dongarra, S. Green, K. Moore, T. Rowan, R. Wade, G. Fox, K. Hawick, K. Kennedy, J. Pool, R. Stevens, B. Olsen, and T. Disz, IEEE Computational Science and Engineering 2(2):6269, Summer 1995, ISSN 10709924. A pdf version is available. The Netlib Mathematical Software Repository, S. Browne, J. Dongarra, E. Grosse, and T. Rowan, DLib Magazine, Electronic Journal, September 1995, ISSN 10829873, http://www.dlib.org/dlib/september95/netlib/09browne.html. A pdf version is available. The ParkBench Benchmark Collection, J. Dongarra and T. Hey, Supercomputer 11(23):94115, June 1995, ISSN 01687875. Top500 Supercomputer Sites, J. Dongarra, H. Meuer and E. Strohmaier, Supercomputer 11(23):133194, June 1995, ISSN 01687875. A pdf version is available.  1994 CRPC Research into Linear Algebra Software for HighPerformance Computers, J. Choi, J. J. Dongarra, R. Pozo, D. C. Sorensen, and D. W. Walker, International Journal of Supercomputing Applications 8(2):99118, Summer 1994, ISSN 08902720. A pdf version is available. Experiences with CODE and HeNCE in Visual Programming for Parallel Computing, J. C. Browne, J. Dongarra, S. I. Hyder, K. Moore, and P. Newton, IEEE Parallel and Distributed Technology 3(1):7583, Spring 1994, ISSN 10636552. A pdf version is available. HeNCE: A Heterogeneous Network Computing Environment, A. Beguelin, J. J. Dongarra, G. A. Geist, R. Manchek, and K. Moore, Scientific Programming 3(1):4960, Spring 1994, ISSN 10589244. A pdf version is available. MPI: A Message Passing Interface Standard, Special Issue, International Journal of Supercomputer Applications 8(34):159416, Fall/Winter 1994, ISSN 08902720. A pdf version is available. PARKBENCH Report  1: Public International Benchmarks for Parallel Computers, PARKBENCH Committee (assembled by R. Hockney and M. Berry, with contributions from D. Bailey, M. Berry, J. Dongarra, V. Getov, T. Haupt, T. Hey, R. Hockney, and D. 
Walker), Scientific Programming 3(2):101146, 1994, ISSN 10589244. A pdf version is available. PDS: A Performance Database Server, M. W. Berry, J. Dongarra, B. H. LaRose, and T. Letsche, Scientific Programming 3(2):147156, 1994, ISSN 10589244. A pdf version is available. PUMMA: Parallel Universal Matrix Multiplication Algorithms on Distributed Memory Concurrent Computers, J. Choi, J. J. Dongarra, and D. W. Walker, Concurrency: Practice and Experience 6(7):543570, October 1994, ISSN 10403108. A pdf version is available. Scalability Issues in the Design of a Library for Dense Linear Algebra, J. J. Dongarra, R. A. van de Geijn, and D. W. Walker, Journal of Parallel and Distributed Computing 22(3):523537, September 1994, ISSN 07437315. A pdf version is available. The PVM Concurrent Computing System: Evolution, Experiences, and Trends, V. S. Sunderam, J. Dongarra, G. A. Geist, and R. Manchek, Parallel Computing 20(4):531545, March 31, 1994, ISSN 01678191. A pdf version is available.  1993 A Parallel Algorithm for the NonSymmetric Eigenvalue Problem, J. J. Dongarra and M. Sidani, SIAM Journal on Scientific Computing 14(3):542569, May 1993, ISSN 10648275. A pdf version is available. Integrated PVM Framework Supports Heterogeneous Network Computing, J. Dongarra, G. A. Geist, R. Manchek, and V. S. Sunderam, Computers in Physics 7(2):166175, April 1993, ISSN 08956111. A pdf version is available. Linear Algebra Libraries for HighPerformance Computers: A Personal Perspective, J. Dongarra, IEEE Parallel and Distributed Technology: Systems and Applications 1(1):1724, February 1993, ISSN 10636552. A pdf version is available. Performance of LAPACK: A Portable Library of Numerical Linear Algebra Routines, E. C. Anderson and J. Dongarra, Proceedings of the IEEE 81(8):10941102, August 1993, ISSN 00189219. A pdf version is available. Supporting Heterogeneous Network Computing: PVM, J. Dongarra, A. Geist, R. Manchek, and V. 
Sunderam, Chemical Design Automation News 8(910):3642, September/October 1993, ISSN 08866716. A pdf version is available. Visualization and Debugging in a Heterogeneous Environment, A. Beguelin, J. Dongarra, A. Geist, and V. Sunderam, IEEE Computer 26(6):8895, June 1993, ISSN 00189162. A pdf version is available.  1992 ALGORITHM 710; FORTRAN Subroutines for Computing the Eigenvalues and Eigenvectors of a General Matrix by Reduction to General Tridiagonal Form, J. J. Dongarra, G. A. Geist, and C. H. Romine, ACM Transactions on Mathematical Software 18(4):392400, December 1992, ISSN 00983500. A pdf version is available. Generalized QR Factorization and Its Applications, E. Anderson, Z. Bai, and J. Dongarra, Linear Algebra and Its Applications 162164:243271, February 1992, ISSN 00243795. A pdf version is available. Numerical Considerations in Computing Invariant Subspaces, J. J. Dongarra, S. Hammarling and J. H. Wilkinson, SIAM Journal on Matrix Analysis and Applications 13(1):145161, January 1992, ISSN 08954798. A pdf version is available. Performance of Various Computers Using Standard Sparse Linear Equations Solving Techniques, J. J. Dongarra and H. A. van der Vorst, Supercomputer 9(5):1729, September 1992, ISSN 01687875. A pdf version is available. Reduction to Condensed Form for the Eigenvalue Problem on Distributed Memory Architectures, J. J. Dongarra and R. A. van de Geijn, Parallel Computing 18(9):973982, September 1992, ISSN 01678191. A pdf version is available.  1991 A Comparative Study of Automatic Vectorizing Compilers, D. Levine, D. Callahan, and J. Dongarra, Parallel Computing, 17(1011):12231244, December 1991, ISSN 01678191. A pdf version is available. Opening the Door to Heterogeneous Network Supercomputing, A. Beguelin, J. Dongarra, A. Geist, R. Manchek, and V. Sunderam, Supercomputing Review 4(9):4445, September 1991, ISSN 10486836. A pdf version is available. 
Parallel Loops - A Test Suite for Parallelizing Compilers: Description and Example Results, J. Dongarra, M. Furtney, S. Reinhardt, and J. Russell, Parallel Computing 17(10-11):1247-1257, December 1991, ISSN 0167-8191. A PDF version is available.
Special Report: 1990 Gordon Bell Prize Winners, J. Dongarra, A. H. Karp, K. Miura, and H. Simon, IEEE Software 8(3):92-97, 102, May/June 1991, ISSN 0740-7459. A PDF version is available.
The IBM RISC System/6000 and Linear Algebra Operations, J. Dongarra, P. Mayes, and G. Radicati di Brozolo, Supercomputer 8(4):15-30, July 1991, ISSN 0168-7875. A PDF version is available.

1990
A Set of Level 3 Basic Linear Algebra Subprograms, J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff, ACM Transactions on Mathematical Software 16(1):1-17, March 1990, ISSN 0098-3500. A PDF version is available.
Evolution of Numerical Software for Dense Linear Algebra, Jack Dongarra and Sven Hammarling, in M. G. Cox and S. Hammarling, editors, Reliable Numerical Computation, pages 297-327, Oxford University Press, Oxford, UK, 1990. A PDF version is available.
Automatic Blocking of Nested Loops, R. Schreiber and J. Dongarra, University of Tennessee Technical Report CS-90-108, Knoxville, TN 37996, USA, 1990. A PDF version is available.
A Tool to Aid in the Design, Implementation, and Understanding of Matrix Algorithms for Parallel Processors, J. Dongarra, O. Brewer, J. A. Kohl, and S. Fineberg, Journal of Parallel and Distributed Computing 9(2):185-202, June 1990, ISSN 0743-7315. A PDF version is available.
Algorithm 679: A Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test Programs, J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff, ACM Transactions on Mathematical Software 16(1):18-28, March 1990, ISSN 0098-3500. A PDF version is available.

1989
Block Reduction of Matrices to Condensed Forms for Eigenvalue Computations, J. J. Dongarra, S. J. Hammarling, and D. C. Sorensen, Journal of Computational and Applied Mathematics 27(1-2):215-227, September 1989, ISSN 0377-0427. A PDF version is available.
Shopping for Mathematical Software Electronically, J. Dongarra and E. Grosse, IEEE Potentials 8(1):37-38, February 1989, ISSN 0278-6648. A PDF version is available.

1988
Algorithm 656: An Extended Set of Basic Linear Algebra Subprograms: Model Implementation and Test Programs, J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, ACM Transactions on Mathematical Software 14(1):18-32, March 1988, ISSN 0098-3500. A PDF version is available.
An Extended Set of Fortran Basic Linear Algebra Subprograms, J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, ACM Transactions on Mathematical Software 14(1):1-17, March 1988, ISSN 0098-3500. A PDF version is available.
Programming Methodology and Performance Issues for Advanced Computer Architectures, J. J. Dongarra, D. C. Sorensen, K. Connolly, and J. Patterson, Parallel Computing 8(1-3):41-58, October 1988, ISSN 0167-8191. A PDF version is available.
Tools to Aid in the Analysis of Memory Access Patterns for FORTRAN Programs, O. Brewer, J. Dongarra, and D. Sorensen, Parallel Computing 9(1):25-35, December 1988, ISSN 0167-8191. A PDF version is available.

1987
A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem, J. J. Dongarra and D. C. Sorensen, SIAM Journal on Scientific and Statistical Computing 8(2):139-154, March 1987, ISSN 0196-5204. A PDF version is available.
A Portable Environment for Developing Parallel FORTRAN Programs, J. J. Dongarra and D. C. Sorensen, Parallel Computing 5(1-2):175-186, July 1987, ISSN 0167-8191. A PDF version is available.
Computer Benchmarking: Paths and Pitfalls, J. Dongarra, J. Martin, and J. Worlton, IEEE Spectrum 24(7):38-43, June 1987, ISSN 0018-9235. A PDF version is available.
Distribution of Mathematical Software via Electronic Mail, J. J. Dongarra and E. Grosse, Communications of the ACM 30(5):403-407, May 1987, ISSN 0001-0782. A PDF version is available.
Solving Banded Systems on a Parallel Processor, J. J. Dongarra and L. Johnsson, Parallel Computing 5(1-2):219-246, July 1987, ISSN 0167-8191. A PDF version is available.

1986
How Do the "Minisupers" Stack Up?, J. J. Dongarra, IEEE Computer 19(3):93, 100, March 1986, ISSN 0018-9162. A PDF version is available.
Implementing Dense Linear Algebra Algorithms Using Multitasking on the CRAY X-MP-4 (Or Approaching the Gigaflop), J. J. Dongarra and T. Hewitt, SIAM Journal on Scientific and Statistical Computing 7(1):347-350, January 1986, ISSN 0196-5204. A PDF version is available.
Implementation of Some Concurrent Algorithms for Matrix Factorization, J. J. Dongarra, A. H. Sameh, and D. C. Sorensen, Parallel Computing 3(1):25-34, March 1986, ISSN 0167-8191. A PDF version is available.
Linear Algebra on High-Performance Computers, J. Dongarra and D. Sorensen, Applied Mathematics and Computation 20(1-2):57-88, September 1986, ISSN 0096-3003. A PDF version is available.
Squeezing the Most out of High Performance Computers for Finding the Eigenvalues, J. Dongarra, L. Kaufman, and S. Hammarling, Linear Algebra and Its Applications 77:113-136, May 1986, ISSN 0024-3795. A PDF version is available.

1985
A Proposal for an Extended Set of Fortran Basic Linear Algebra Subprograms, J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, ACM SIGNUM Newsletter 20(1):2-18, January 1985, ISSN 0163-5778. A PDF version is available.
Algorithm Design for Different Computer Architectures, J. J. Dongarra, B. T. Smith, and D. Sorensen, IEEE Software 2(4):79-80, July 1985. A PDF version is available.

1984
A Collection of Parallel Linear Equations Routines for the Denelcor HEP, J. J. Dongarra and R. E. Hiromoto, Parallel Computing 1(2):133-142, December 1984, ISSN 0167-8191. A PDF version is available.
EISPACK - A Collection for Solving Eigenvalue Problems, J. Dongarra and C. Moler, in Sources and Development of Mathematical Software, W. R. Cowell, ed., pp. 68-87, Prentice-Hall: Upper Saddle River, NJ, 1984, ISBN 0138235015. A PDF version is available.
Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine, J. J. Dongarra, F. G. Gustavson, and A. Karp, SIAM Review 26(1):91-112, January 1984, ISSN 0036-1445. A PDF version is available.
Multiprocessing Linear Algebra Algorithms on the CRAY X-MP-2: Experiences with Small Granularity, S. S. Chen, J. J. Dongarra, and C. C. Hsiung, Journal of Parallel and Distributed Computing 1(1):22-31, August 1984, ISSN 0743-7315. A PDF version is available.
On Some Parallel Banded System Solvers, J. J. Dongarra and A. H. Sameh, Parallel Computing 1(3):223-235, December 1984. A PDF version is available.
Performances comparés de 80 ordinateurs sur des programmes Fortran, J. J. Dongarra, Technique et Science Informatiques 3(5):355-360, 1984, ISSN 0752-4072. A PDF version is available.
Solving the Secular Equation Including Spin Orbit Coupling for Systems with Inversion and Time Reversal Symmetry, J. J. Dongarra, J. R. Gabriel, D. D. Koelling, and J. H. Wilkinson, Journal of Computational Physics 54(2):278-288, May 1984, ISSN 0021-9991. A PDF version is available.
Squeezing the Most out of an Algorithm in CRAY FORTRAN, J. J. Dongarra and S. C. Eisenstat, ACM Transactions on Mathematical Software 10(3):219-230, September 1984, ISSN 0098-3500. A PDF version is available.
The Eigenvalue Problem for Hermitian Matrices with Time Reversal Symmetry, J. J. Dongarra, J. R. Gabriel, D. D. Koelling, and J. H. Wilkinson, Linear Algebra and Its Applications 60:27-42, August 1984, ISSN 0024-3795. A PDF version is available.

1983
Improving the Accuracy of Computed Eigenvalues and Eigenvectors, J. J. Dongarra, C. B. Moler, and J. H. Wilkinson, SIAM Journal on Numerical Analysis 20(1):23-45, February 1983, ISSN 0036-1429. A PDF version is available.
Improving the Accuracy of Computed Singular Values, J. J. Dongarra, SIAM Journal on Scientific and Statistical Computing 4(4):712-719, December 1983, ISSN 0196-5204. A PDF version is available.
Performance of Various Computers Using Standard Linear Equations Software in a Fortran Environment, J. J. Dongarra, ACM SIGARCH Computer Architecture News 11(5):22-27, December 1983, ISSN 0163-5964. A PDF version is available.

1982
Algorithm 589: SICEDR: A FORTRAN Subroutine for Improving the Accuracy of Computed Matrix Eigenvalues, J. J. Dongarra, ACM Transactions on Mathematical Software 8(4):371-375, December 1982, ISSN 0098-3500. A PDF version is available.

1979
Unrolling Loops in Fortran, J. Dongarra and A. R. Hinds, Software: Practice and Experience 9(3):219-226, March 1979, ISSN 0038-0644. A PDF version is available.



