Workshop on Clusters, Clouds, and Data for Scientific Computing

CCDSC 2018

(last update 9/6/18 12:50 AM)

 

September 4-7, 2018

Châteauform’

La Maison des Contes

427 Chemin de Chanzé, France

 

 

 

 

 

Sponsored by:

NSF, AMD, PGI, Nvidia, Intel, Mellanox, STFC, ICL/UTK, Vanderbilt University, Grenoble Alps University.

 

 

        


 

 

Clusters, Clouds, and Data for Scientific Computing

 2018

Châteauform’

La Maison des Contes

427 Chemin de Chanzé, France

September 4th – 7th, 2018

 

CCDSC 2018 will be held at a resort outside of Lyon, France, called La Maison des Contes: http://www.chateauform.com/en/chateauform/maison/17/chateau-la-maison-des-contes

 

 

The address of the Chateau is:

Châteauform’ La Maison des Contes

427 chemin de Chanzé

69490 Dareizé

 

Telephone: +33 1 30 28 69 69

 

1 hr 30 min from the Saint Exupéry Airport

45 minutes from Lyon

 

 

GPS Coordinates: North latitude 45° 54' 20" East longitude 4° 30' 41"

 

Go to http://maps.google.com and type in: “427 chemin de Chanzé 69490 Dareizé”.

Message from the Program Chairs

 

This proceeding gathers information about the participants of the Workshop on Clusters, Clouds, and Data for Scientific Computing, which will be held at La Maison des Contes, 427 Chemin de Chanzé, France, on September 4th-7th, 2018.  This workshop continues a series of workshops begun in 1992 under the title Workshop on Environments and Tools for Parallel Scientific Computing. These workshops are held every two years and alternate between the U.S. and France. The purpose of this workshop, which is by invitation only, is to evaluate the state of the art and future trends for cluster computing and the use of computational clouds for scientific computing.

This workshop addresses a number of themes in the development and use of both clusters and computational clouds. In particular, the talks:

§  Survey and analyze the key deployment, operational, and usage issues for clusters, clouds, and grids, focusing especially on the discontinuities produced by multicore and hybrid architectures, data-intensive science, and the increasing need for wide-area/local-area interaction.

§  Document the current state of the art in each of these areas, identifying interesting questions and limitations, including experiences with clusters, clouds, and grids in the science research communities and science domains that are benefiting from the technology.

§  Explore interoperability among disparate clouds, as well as between clouds and grids, and the impact on the domain sciences.

§  Explore directions for future research and development against the background of disruptive trends and technologies and the recognized gaps in the current state of the art.

 

Speakers will present their research and interact with all participants on the future software technologies that will make parallel computers easier to use.

 

This workshop was made possible thanks to sponsorship from NSF, AMD, PGI, Nvidia, Intel, Mellanox, STFC, ICL/UTK, Vanderbilt University, and Grenoble Alps University.

Thanks!

 

Jack Dongarra, Knoxville, Tennessee, USA.

Bernard Tourancheau, Grenoble, France


Draft agenda (9/6/18 12:50 AM)

September 4-7, 2018

 

 

Tuesday

September 4th

Jack Dongarra, U of Tenn

Bernard Tourancheau, U Grenoble

Introduction and Welcome       

6:30  – 7:45

Session Chair: Jack Dongarra

 (5 talks – 15 minutes each)

6:30

Luiz DeRose, Cray

Scaling DL Training Workloads with the Cray PE Plugin

6:45

Joe Curley, Intel

Optimizing Deep Learning on General Purpose hardware – Part 2

7:00

Steve Scalpone, PGI/Nvidia

The F18 Fortran Compiler

7:15

Bill Brantley, AMD

AMD HPC Server Update

7:30

 

 

8:00 pm – 9:00 pm

Dinner

 

 

Wednesday,               September 5th

 

 

7:30 - 8:30

Breakfast

 

8:30 - 10:30

Session Chair: Bernard Tourancheau

 

(6 talks – 20 minutes each)

8:30

Ian Foster

Learning Systems for Science

8:50

Rosa Badia

TANGO: How to dance with multiple task-based programming approaches

9:10

Bill Gropp

Managing Code Transformations for Better Performance Portability

9:30

Geoffrey Fox

AI-Driven Science and Engineering with the Global AI Supercomputer

9:50

Ewa Deelman

Building a Cyberinfrastructure Community

10:10

Ilkay Altintas

Collaborative Workflow-Driven Science in a Rapidly Evolving Cyberinfrastructure Ecosystem 

10:30 -11:00

Coffee

 

11:00  - 1:00

Session Chair: Padma Raghavan

 (6 talks – 20 minutes each)

11:00

Pete Beckman

The Tortoise and the Hare:  Is there still time for HPC to catch up in the performance race?

11:20

Vaidy Sunderam

Data Driven Systems For Spatio-Temporal Applications

11:40

Anne Benoit

Co-scheduling HPC Workloads on Cache-Partitioned CMP Platforms

12:00

Ken Birman

High Performance State Machine Replication for HPC: Bridging Two Worlds

12:20

Nicola Ferrier

Computing at the Edge

12:40

Guillaume Aupy

I/O Management in HPC Systems, from Burst-Buffers to I/O Scheduling

1:00  - 2:00

Lunch - break

 

2:30 – 3:00

Coffee

 

3:00 - 5:20

Session Chair: Dorian Arnold

(6 talks – 20 minutes each)

3:00

Barbara Chapman

Compiler Optimizations for Parallelism and Locality on Emerging Hardware

3:20

Al Geist          

Latest Results from Summit – the New #1 System on the TOP500

3:40

Tony Hey

Machine Learning and Big Scientific Data Benchmarks

4:00

Franck Cappello

 Frontiers of Lossy Compression for Scientific Data

4:20

Heike Jagode

PAPI's new Software-Defined Events for In-depth Performance Analysis

4:40

Yves Robert

 A Little Scheduling Problem

6:30 – 7:30

Organic wine tasting in the « salon des contes », where the welcome gathering was held

 

8:00 – 9:00

Dinner

                                        

 

Thursday, September 6th

 

 

7:30 - 8:30

Breakfast

 

8:30 - 10:30

Session Chair: Emmanuel Jeannot

 (6 talks – 20 minutes each)

8:30

Joel Saltz

Integrative Everything, Deep Learning and Streaming Data

8:50

Judy Qiu

Real-Time Anomaly Detection from Edge to HPC-Cloud 

9:10

Ron Brightwell

Resource Management in the Era of Extreme Heterogeneity

9:30

Andrew Grimshaw

Timing is Everything: The CCC as an Alternative to Commercial Clouds

9:50

Phil Papadopoulos

Virtualization is the answer. What was the question?

10:10

Alok Choudhary

The Ultimate Self-Driving Machine

10:30 – 11:00

Coffee

 

11:00  - 1:00

Session Chair: Laurent Lefevre

 (6 talks – 20 minutes each)

11:00

Rich Vuduc

Algorithm-Level Control of Performance and Power Tradeoffs

11:20

Michela Taufer

Modeling Record-and-Replay for Nondeterministic Applications on Exascale Systems

11:40

Emmanuel Jeannot

Process Placement from Monitoring to Data Analysis         

12:00

Haohuan Fu

Extreme-Scale Earthquake Simulation on Sunway TaihuLight

12:20

Carl Kesselman

Computation as an Experimental Science 

12:40

Torsten Hoefler

Quantum Computing from an HPC System's Perspective

1:00 - 2:00

Lunch

 

2:00 – 4:00

Session Chair: Michela Taufer

(6 talks – 20 minutes each)

2:00

Dimitrios Nikolopoulos

Realistic Fault Injection and Analysis for Exascale Systems

2:20

Rob Ross

Versatile Data Services for Computational Science

2:40

Mary Hall

Mainstreaming Autotuning Compilers for Performance Portability: What will it Take?

3:00

Manish Parashar

Enabling Data-Driven Edge/Cloud Application Workflows

3:20

David Abramson

Energy Efficiency Modeling of Parallel Applications

3:40

Dorian Arnold

Big Deal, Little Deal or No Deal? The Realities of the HPC Resilience Challenge

4:00 - 4:30

Coffee

 

4:30 – 6:10

Session Chair: Rosa Badia

(5 talks – 20 minutes each)

4:30

Jeff Vetter

Preparing for Extreme Heterogeneity in High Performance Computing

4:50

George Bosilca

MPI as perceived by the ECP community

5:10

Padma Raghavan

Rethinking the Computational Complexity and Efficiency in the Age of “Big Data”

5:30

Laurent Lefevre

Building and exploiting the table of energy and power leverage for energy efficient large scale HPC systems

5:50

Hartwig Anzt

Towards a Modular Precision Ecosystem

8:00 – 9:00

Dinner

 

9:00 pm -

 

 

 

 

Friday,  September 7th

 

 

7:30 - 8:30

Breakfast

 

8:30 - 10:30

Session Chair: Padma Raghavan

 (6 talks – 20 minutes each)

8:30

Bernd Mohr

On the ROI of Parallel Performance Optimization

8:50

Christian Obrecht

Building Simulation: an Illusion

9:10

Laércio Lima Pilla

Decoupling schedulers from runtime systems for increased reuse and portability 

9:30

Frederic Vivien

A Generic Approach to Scheduling and Checkpointing Workflows

9:50

Martin Swany

Network Microservices and Edge Computing

10:10

Jonathan Churchhill

Managing Mismatched Network Interface Performance in Multi Terabit Converged Ethernet Software Defined Storage

10:30 -11:00

Coffee

 

11:00 – 12:00

Session Chair:  Bernard  Tourancheau  

 (3 talks – 20 minutes each)

11:00

Frederic Desprez

SILECS: Super Infrastructure for Large-scale Experimental Computer Science

11:20

Minh Quan Ho

Standard Libraries on Non-Standard Processors

11:40

Rich Graham, Mellanox

The Network’s Role in the Large-Scale Computational Eco-System

12:00  - 1:30

Lunch

 

1:30

Depart

 

 

 

 


 

Arrival / Departure Information:

 

Here is some information on the meeting in Lyon.  We have updated the workshop webpage http://bit.ly/ccdsc-2018 with the workshop agenda.

 

On Tuesday September 4th there will be a bus to pick up participants at Lyon's Saint Exupéry (formerly Satolas) Airport at 3:00 pm. (Note that the Saint Exupéry airport has its own train station with direct TGV connections to Paris via Charles de Gaulle.) If you arrive by train at the Saint Exupéry airport, please go to the airport meeting point (point-rencontre) on the second floor, next to the shuttles, near the hallway between the two terminals; see http://www.lyonaeroports.com/en/practicals-informations/information-points .

 

 

The bus will be at the TGV station, which is reached via a long corridor from the airport terminal. The bus stop is near the station entrance, in the parking lot called "dépose minute".

 

 

The bus will then travel to pick up people at the Lyon Part Dieu railway station at 4:45 pm. (There are two train stations in Lyon; you want the Part Dieu station, not the Perrache station.) There will be someone with a sign at the "Meeting Point/point de rencontre" of the station to direct you to the bus.

 

The bus is expected to arrive at La Maison des Contes around 5:30. We would like to hold the first session on Tuesday evening from 6:30 pm to 8:00 pm, with dinner following the session. La Maison des Contes is about 43 km from Lyon. For a map, go to http://maps.google.com and type in: “427 chemin de Chanzé 69490 Dareizé”.

 

VERY IMPORTANT: Please send your arrival and departure times to Jack so we can arrange an appropriately sized bus for transportation.  VERY VERY IMPORTANT: If your flight will cause you to miss the bus on Tuesday September 4th at 3:00 pm, send Bernard your flight arrival information so he can arrange transportation to pick you up at the train station or the airport in Lyon. A taxi from Lyon to the Chateau can cost as much as 100 Euro, and the Chateau may be hard to find at night if you rent a car and are not a French driver :-).

 

At the end of the meeting on Friday afternoon, we will arrange for a bus to transport people to the train station and airport. If you are catching an early flight on the morning of Saturday September 8th, you may want to stay at the hotel located at Lyon's Saint Exupéry Airport;

see http://www.lyonaeroports.com/eng/Shops-facilities/Hotels for details.

There are also many hotels in the Lyon area; see: http://www.en.lyon-france.com/

 

Due to room constraints at La Maison des Contes, we ask that you not bring a guest. Dress at the workshop is informal.  Please tell us if you have special requirements (vegetarian food, etc.). We are expecting to have internet and wireless connections at the meeting, but you know this is France.

 

Please send this information to Jack (dongarra@icl.utk.edu) by August 5th.

Name:

Institute:

Title:

Abstract:

Participant’s brief biography:

 


 

 

Arrival / Departure Details:

 

 

 

Arrival

Departure

Special

David

Abramson

9/4

Part Dieu

9/7

Part Dieu

Vegetarian

Ilkay

Altintas

9/4

UA9487

12:15pm

9/8

UA8881

7:40am

 

Hartwig

Anzt

9/4

Part Dieu

9/7

Part Dieu

2:41pm Train

 

Dorian

Arnold

9/4

Part Dieu

9/7

Part Dieu

 

Guillaume

Aupy

Car

Bus to Part Dieu

 

Rosa

Badia

9/4 VY1220 12:25pm

9/7 EZY4417 2:10pm

Will need a car on September 7th mid morning.

Pete

Beckman

9/4

Train to Airport 2:01

9/7

airport

 

Anne

Benoit

Car

Car

 

Ken

Birman

9/4

TAP 476

11:15 am Taxi

9/7

TAP 473 6:00am

Will need a car on September 6th for the early flight back on September 7th

George

Bosilca

9/4

Part Dieu

9/7

Early

Will need a car on September 7th mid morning.

Bill

Brantley

9/4

Part Dieu

9/7

Part Dieu

 

Ron

Brightwell

9/4 4:00pm

Part Dieu

9/7

Part Dieu

 

Franck

Cappello

9/4

Car

9/7

Bus to airport

 

Barbara

Chapman

9/4

UA9959

10:50am

9/7

Part Dieu

 

Alok

Choudhary

9/4

LH 1076

1:35pm

9/7

LH 1077

2:15pm

Will need a car on September 7th mid morning.

Jonathan

Churchhill

9/4 EZY8415

10:50am

9/7 EZY8418 7:15pm

 

Joe

Curley

9/4 LH1074 10:05am

Bus to airport

 

Ewa

Deelman

9/4

Part Dieu

Bus back to Part Dieu

 

Luiz

DeRose

9/4

Part Dieu

9/7

Part Dieu

 

Frederic

Desprez

Drive

Drive

 

Jack

Dongarra

9/4

KLM 1415

1:20pm

9/7

KLM 1412 6:10am

Will need a car on September 6th for the early flight back on September 7th

Fanny

Dufosse

9/4

Part Dieu

 

 

Nicola

Ferrier

9/4

UA8914

10:05am

9/7 bus

 

Ian

Foster

9/4

UA9487 12:15pm

9/7

UA9026

2:15pm

Will need a car on September 7th mid morning.

Geoffrey

Fox

9/4

Airport Bus

9/7

Airport Bus

 

Haohuan

Fu

9/4

Will take a taxi to chateau

9/7

Part Dieu

 

Al

Geist

9/4

KLM 1415

1:20pm

9/8

DL 8611 6:15am

 

Rich

Graham

9/4 Late; Taxi to Chateau

9/7

Part Dieu

 

Andrew

Grimshaw

9/4 LH1076 1:35pm

9/6 LH2253 7:45pm

Will need a car on September 6th mid afternoon

Bill

Gropp

9/4

Part Dieu

9/7

Part Dieu

 

Mary

Hall

9/4

Part Dieu

9/7

Part Dieu

 

Li

Han

Car

 

 

Tony

Hey

9/4

BA 362 4:30pm

9/7

BA363 5:20pm

Arriving too late for the bus, will arrange a car

Torsten

Hoefler

9/4

Part Dieu

9/7

Part Dieu

 

Minh Quan

Ho

9/4

Part Dieu

9/7

Part Dieu

 

Heike

Jagode

9/4

LH1076

1:35pm

9/7 LH1077

2:15pm

Vegan

Will need a car on September 7th mid morning.

Emmanuel

Jeannot

9/4

Part Dieu

9/7 3:50pm flight

 

Carl

Kesselman

9/4

UA8916

1:35pm

9/8

UA9486

9:20am

 

Laurent

Lefevre

Drive to workshop

Drive

vegetarian (no meat, but fish, milk, eggs OK)

Laércio

Lima Pilla

Car

Car

 

Bernd

Mohr

9/4 4:00pm Part Dieu TGV 9828

9/7 3:04pm Part Dieu TGV 6622

 

Dimitrios

Nikolopoulos

9/3

4:30pm

BA0362

9/6

6:50am

BA0365

Will need a car on September 5th evening.

Christian

Obrecht

Drive to workshop

Drive

 

Phil

Papadopoulos

9/4

KLM 1415 1:20pm

9/9

DL 9499

9:55am

 

Manish

Parashar

9/4 UA9959

10:50am

9/7 UA9944 7:20am

Vegetarian

Will need a car on September 6th for the early flight back on the 7th

Judy

Qiu

9/4

Airport Bus

9/7

Airport Bus

 

Padma

Raghavan

9/4

AA 6487 4:30pm

9/8

BA365

6:50am

Arriving too late for the bus, will arrange a car

Yves

Robert

Drive to workshop

Drive

 

Rob

Ross

9/4

(AC 828)

United 8024 8:10am

9/7

(LH 4229)

United 9488 12:50pm

Will need a car on September 7th early morning.

Joel

Saltz

9/4

Part Dieu

2:44pm

9/7

Part Dieu

6:04pm

 

Steve

Scalpone

9/4

KLM1415

1:20pm

9/8

LH1075

10:45am

 

Vaidy

Sunderam

9/4

TGV #6615

Part Dieu 2:56pm

9/8

TGV 08:00am Part Dieu

 

Martin

Swany

Part Dieu

Part Dieu

 

Michela

Taufer

9/4

AA 6487 4:30pm

9/8

AA 8602

10:30am

Vegetarian (no meat, fish OK)

Arriving too late for the bus, will arrange a car

Bernard

Tourancheau

Airport bus

 

 

Jeff

Vetter

9/4

AF 7652 1:35pm

9/8

AF 7651 6:15am

 

Frederic

Vivien

Car

Car

 

Rich

Vuduc

9/4

Lyon Airport

9/7

DL 85 3:20pm  CDG

Will need a car on September 7th early morning or September 6th.

 

 

 

 

 

 

 

 


 

Abstracts:

 

 

David Abramson, U of Queensland

Energy Efficiency Modeling of Parallel Applications

Abstract: Energy efficiency has become increasingly important in high performance computing (HPC), as power constraints and costs escalate. Workload and system characteristics form a complex optimization search space in which optimal settings for energy efficiency and performance often diverge. Thus, we must identify trade-off options for performance and energy efficiency to find the desired balance between them. We present an innovative statistical model that accurately predicts the Pareto optimal performance and energy efficiency trade-off options using only user-controllable parameters. Our approach can also tolerate both measurement and model errors. We study model training and validation using several HPC kernels, then explore the feasibility of applying the model to more complex workloads, including AMG and LAMMPS. We can calibrate an accurate model from as few as 12 runs, with prediction error of less than 10%. Our results identify trade-off options allowing up to 40% improvement in energy efficiency at the cost of under 20% performance loss. For AMG, we reduce the required sample measurement time from 13 hours to 74 minutes (about 90%).

 

 

Ilkay Altintas, UCSD

Collaborative Workflow-Driven Science in a Rapidly Evolving Cyberinfrastructure Ecosystem 


ABSTRACT: Scientific workflows are powerful tools for computational data scientists to perform scalable experiments, often composed of complex tasks and algorithms distributed on a potentially heterogeneous set of resources. Existing cyberinfrastructure provides powerful components that can be utilized as building blocks within workflows to translate the newest advances into impactful, repeatable solutions that execute at scale. However, any workflow development activity today depends on the effective collaboration and communication of a multi-disciplinary data science team, not only with humans but also with analytical systems and infrastructure. Dynamic, predictable and programmable interfaces to systems and scalable infrastructure are key to building effective systems that can bridge the exploratory and scalable activities in the scientific process. This talk will focus on our recent work on the development of methodologies and tools for effective workflow-driven collaborations, namely the PPoDS methodology and the SmartFlows family of tools for the practice and smart utilization of workflows.

 

 

Hartwig Anzt, Karlsruher Institut für Technologie

Towards a modular precision ecosystem

Abstract: Over recent years, we have observed a growing mismatch between the arithmetic performance of processors, in terms of the number of floating point operations per second (FLOPS), on the one side, and the memory performance, in terms of how fast data can be brought into the computational elements (memory bandwidth), on the other side. As a result, more and more applications can utilize only a fraction of the available compute power as they wait for the required data. With memory operations being the primary energy consumer, data access is also pivotal for the resource balance and the battery life of mobile devices.  In this talk I will introduce a disruptive paradigm change with respect to how scientific data is stored and processed in computing applications. The goal is to 1) radically decouple the data storage format from the processing format; 2) design a "modular precision ecosystem'' that allows for more flexibility in terms of customized data access; and 3) develop algorithms and applications that dynamically adapt data access accuracy to the numerical requirements.

 

 

Dorian Arnold, Emory U

Big Deal, Little Deal or No Deal? The Realities of the HPC Resilience Challenge

 

Abstract: Conceptual work on fault-tolerant distributed systems dates back to at least the 1960s, and practical software and hardware systems began to appear by the early 1970s. Today, fault-tolerance or resilience is cited as one of the major challenges to realizing exascale computational capabilities. Yet there are widely varying perspectives on the extent to which fault-tolerance is or will be an impediment. In this talk, we briefly survey landmark hardware and software technologies from the early days to the present to posit answers to the questions: Should we be worried about fault-tolerance? If so, how much, and specifically what about?

 

 

Rosa M Badia, Barcelona Supercomputing Center

TANGO: How to dance with multiple task-based programming approaches

Abstract: In the EU-funded project TANGO, BSC has been integrating two instances of task-based programming models: COMPSs and OmpSs. The combination of the two is very interesting, since it enables parallelizing applications at the task level on distributed computing platforms (including clouds) through COMPSs, and exploiting finer-level parallelism by offloading OmpSs tasks to GPUs and FPGAs. Additionally, a new elasticity concept for large clusters has been integrated into COMPSs. The talk will introduce the TANGO programming model and illustrate its application with several use cases, from HPC to embedded areas.

 

 

Pete Beckman, ANL

The Tortoise and the Hare:  Is there still time for HPC to catch up in the performance race?

 

Abstract:  Speed and scale define supercomputing.  By some metrics, our supercomputers are the fastest, most capable systems on the planet.  However, over the last twenty years, the HPC community has become overconfident.  Instead of leading the race for new architectures, methods, and software stacks, we pride ourselves on uptime, reliability, and the performance of a handful of hero computations.  For many HPC deployments, lowering risk is more important than sprinting ahead.  Has the cloud computing community already won the race?  Can HPC regain leadership?

 

 

Anne Benoit, ENS Lyon, France

Co-scheduling HPC workloads on cache-partitioned CMP platforms

 

Abstract: Co-scheduling techniques are used to improve the throughput of applications on chip multiprocessors (CMP), but sharing resources often generates critical interferences. We focus on the interferences in the last level of cache (LLC) and use the Cache Allocation Technology (CAT) recently provided by Intel to partition the LLC and give each co-scheduled application its own cache area. We consider m iterative HPC applications running concurrently and answer the following questions: (i) how to precisely model the behavior of these applications on the cache-partitioned platform? and (ii) how many cores and cache fractions should be assigned to each application to maximize the platform efficiency? Here, platform efficiency is defined as maximizing the performance either globally, or as guaranteeing a fixed ratio of iterations per second for each application. Through extensive experiments using CAT, we demonstrate the impact of cache partitioning when multiple HPC applications are co-scheduled onto CMP platforms.

 

 

Ken Birman, Cornell U

High Performance State Machine Replication for HPC: Bridging two worlds

 

Abstract:  Our new Derecho system shows that by leveraging HPC hardware (such as RDMA or Intel OMNI Path), state machine replication can run at stunning speeds and scale.  Derecho’s main target is to support a new kind of edge computing with massive data rates and demanding real-time response requirements, a need also seen in many of today’s most exciting HPC settings.  Indeed, many cloud-edge uses are basically HPC scenarios.  Meanwhile, the HPC community has long struggled with issues of fault-tolerance for very large computations.  Can high performance state machine replication bridge the two worlds?

 

 

George Bosilca, UTK

MPI as perceived by the ECP community

 

Abstract: The Exascale Computing Project (ECP) is currently the primary effort in the United States focused on developing “exascale” levels of computing capabilities, including hardware, software and applications. In order to obtain a more thorough understanding of how the software projects under the ECP are using, and planning to use, the Message Passing Interface (MPI), and to help guide the work of our own project within the ECP, we created a survey. This talk presents some results of the survey, providing a picture of MPI capabilities as perceived by some of its "power users".

 

 

Bill Brantley, AMD

AMD HPC Server Update

 

Abstract: AMD HPC products have changed a great deal since the last CCDSC.  I will give a very brief overview of current CPU and GPU server products and thus far public information about 2019 products.

 

 

 

Ron Brightwell, Sandia Labs

Resource Management in the Era of Extreme Heterogeneity

Abstract: Future HPC systems will be characterized by a large number and variety of complex, interacting components including processing units, accelerators, deep memory hierarchies, multiple interconnects, and alternative storage technologies. In addition to extreme hardware diversity, there is a broadening community of computational scientists using HPC as a tool to address challenging problems. Systems will be expected to efficiently support a wider variety of applications, including not only traditional HPC modeling and simulation codes, but also data analytics and machine learning workloads. This era of extreme heterogeneity creates several challenges that will need to be addressed to enable future systems to be effective tools for enabling scientific discovery. A recent DOE/ASCR workshop discussed these challenges and potential research directions to address them. In this talk, I will give my perspective on the resource management challenges and approaches stemming from extreme heterogeneity and offer my views on the most important system software capabilities that will need to be explored to meet these challenges.

 

 

Franck Cappello, ANL

Frontiers of lossy compression for scientific data

 

Abstract: Lossy compression is becoming popular for scientific data because of the need to reduce scientific data significantly. However, while the application of lossy compression is well understood in many domains (audio, video, image), it opens many questions when the data is produced and consumed by scientific simulations. In this talk, we explore three frontiers of lossy compression for scientific data: (i) the compression algorithms, (ii) the application of lossy compression for scientific simulation and (iii) the methodology to evaluate, compare and assess the impacts of lossy compression.

 

 

Barbara Chapman, SUNY Stonybrook

Compiler Optimizations for Parallelism and Locality on Emerging Hardware

 

Abstract: Pre-exascale computing systems are already giving us insight into the level of complexity that next-generation HPC architectures will entail.  In order to enable their exploitation, and strive to meet the expectations of application developers, intra-node programming interfaces may provide constructs that enable code to explicitly use new architectural features.  On the other hand, approaches are also needed that reduce the level of effort required to port codes to a potentially diverse array of computers.

 

The feature set of the OpenMP API is being enhanced in its 5.0 specification, now available in a draft form, to address these challenges. Its implementation technology will also need to be extended to meet new challenges.  In this talk, we describe some of the ways in which we are working in the ECP-funded SOLLVE project to implement anticipated new features and enhance the state of the art in OpenMP implementations.

 

 

Alok N. Choudhary, Northwestern University

The Ultimate Self-Driving Machine

 

Abstract: HPC, ML, data mining, IOT, and control systems technologies among others have played a central role in advancing the cause of self-driving vehicles. This talk is not about the technologies. The talk will explore the possible impact on business and society and potential transformation that may occur or may be required if and when self-driving vehicles become a reality.

 

 

Jonathan Churchhill, STFC

Managing Mismatched Network Interface Performance In Multi Terabit Converged Ethernet Software Defined Storage

 

Abstract: In this talk I will discuss some of the issues we’ve encountered with mixing 10/25/40/50/100Gb-equipped compute and storage servers in our very large multi-terabit converged Ethernet network that is the heart of JASMIN. This is in the context of our move from traditional parallel file systems to similarly high-performance software-defined object storage.

 

 

Joe Curley, Intel

Optimizing Deep Learning on General Purpose Hardware – Part 2

 

 

Ewa Deelman, ISI

Building a Cyberinfrastructure Community

Abstract:  This talk will examine the opportunities of building a community around cyberinfrastructure design and deployment in scientific projects.  It will examine what types of capabilities can be shared across large cyberinfrastructure projects, and it will ask how to build such a community and how to sustain it over time.

 

 

 

Luiz DeRose, Cray

Scaling DL Training Workloads with the Cray PE Plugin

  

Abstract: Deep Learning with convolutional neural networks is emerging as a powerful tool for analyzing complex datasets through classification, prediction and regression. Neural networks can also be trained to produce datasets in scenarios traditionally addressed with simulation and at significantly lower computational cost. However, training neural networks is a computationally intensive workload with training times measured in days or weeks on a single server or node. Thus, the computational resources needed to train sufficiently complex networks can limit the use of Deep Learning in production. High Performance Computing, in particular efficient scaling to large numbers of nodes, is ideal for addressing this problem. In this talk I will present the Cray Programming Environments Deep Learning Scalability Plugin, a portable solution for high performance scaling of deep learning frameworks.

 

 

 

Frederic Desprez, INRIA

SILECS: Super Infrastructure for Large-scale Experimental Computer Science

Abstract: SILECS, based on two existing infrastructures (FIT and Grid'5000), aims to provide a large, robust, trustable and scalable instrument for research in distributed computing and networks. Experiments spanning the Internet of Things, data centers, cloud computing, security services, and the networks connecting them will be possible, in a reproducible way, on various hardware and software. This instrument will offer a multi-platform experimental infrastructure (HPC, Cloud, Big Data, Software Defined Storage, IoT, wireless, Software Defined Network / Radio) capable of exploring the infrastructures that will be deployed tomorrow, and will assist researchers and industry in designing, building and operating multi-scale, robust and safe computer systems. Diverse digital resources (compute, storage, links, I/O devices) can be assembled to support a “playground” at scale.

 

 

Nicola Ferrier, ANL

Computing at the Edge

 

 

 

Ian Foster, U of Chicago and ANL

Learning Systems for Science

 

Abstract: New learning technologies seem likely to transform much of science, as they are already doing for many areas of industry and society. We can expect these technologies to be used, for example, to obtain new insights from massive scientific data and to automate research processes. However, success in such endeavors will require new learning systems: scientific computing platforms, methods, and software that enable the large-scale application of learning technologies. These systems will need to enable learning from extremely large quantities of data; the management of large and complex data, models, and workflows; and the delivery of learning capabilities to many thousands of scientists. In this talk, I review these challenges and opportunities and describe systems that my colleagues and I are developing to enable the application of learning throughout the research process, from data acquisition to analysis.

 

 

Geoffrey Fox, Indiana University

AI-Driven Science and Engineering with the Global AI Supercomputer

 

Abstract: Most things are dominated by Artificial Intelligence (AI). Technology companies like Amazon, Google, Facebook, and Microsoft are AI-first organizations. Engineering achievement today is highlighted by the AI buried in a vehicle or machine. Industry (Manufacturing) 4.0 focuses on the AI-driven future of the Industrial Internet of Things. Software is eating the world. We can describe much computer-systems work as designing, building, and using the Global AI Supercomputer, which is itself autonomously tuned by AI. We suggest that this is not just a bunch of buzzwords but has profound significance, and we examine its consequences for education and research. Naively, high-performance computing should be relevant to the AI supercomputer, but somehow the corporate juggernaut is not making much use of it. We discuss how to change this.

 

 

Haohuan Fu, Tsinghua University

Extreme-Scale Earthquake Simulation on Sunway TaihuLight

 

Abstract: This talk will first introduce and discuss the design philosophy of the Sunway TaihuLight system, and then describe our recent efforts on performing earthquake simulations on such a large-scale system. Our work in 2017 accomplished a complete redesign of AWP-ODC for Sunway architectures, achieving over 15% of the system's peak, better than the 11.8% achieved by similar software running on Titan, whose byte-to-flop ratio is 5 times better than TaihuLight's. The extreme cases demonstrate a sustained performance of over 18.9 Pflops, enabling the simulation of the Tangshan earthquake as an 18-Hz scenario with an 8-meter resolution. Our recent work further improves the simulation framework with capabilities to describe complex surface topography and to drive building-damage prediction and landslide simulation, which are demonstrated with a case study of the Wenchuan earthquake with accurate surface topography and improved coda-wave effects.

 

 

Al Geist, ORNL

Latest results from Summit – the new #1 system on the TOP500 

 

Abstract: In June 2018 the Summit system at the Oak Ridge Leadership Computing Facility became the new #1 system on the TOP500 at 122 PF. This talk will describe the design of this system, the complex three-lab collaboration used in its development, and the latest application results from Summit. The design of Summit provides a very green computer that is able to do traditional high-performance computing as well as machine learning and data analytics. Summit’s peak double-precision performance is 200 PF, but more amazing is that its peak machine-learning capability is over 3 exaops. This talk will describe a data analytics application that has already achieved 1.9 exaops on Summit.

 

 

Rich Graham, Mellanox

The Network’s Role in the Large-Scale Computational Eco-System

 

Abstract: As the volume of data transfers increases, so do the opportunities to manipulate data in flight, offering a chance to improve overall system efficiency and application performance. This presentation will present several capabilities Mellanox Technologies has introduced to support in-network computing and describe the improvements that result from them. Technologies such as SHARP, MPI hardware tag matching, and UMR will be discussed.

 

 

Andrew Grimshaw, U of Virginia

Timing is Everything: The CCC as an Alternative to Commercial Clouds

 

Abstract: "Grid computing is where I give you access to my resources and get nothing in return." - A Skeptical Resource Administrator.

Wide-area, federated, compute-sharing systems (such as Condor, gLite, Globus, and Legion) have been around for over twenty years. Outside of particular domains such as physics, these systems have not been widely adopted. Recently, however, universities are starting to propose and join resource-sharing platforms. Why this sudden change?

Mostly, this change has come in response to cost concerns. HPC managers are under new pressure from university administrators who demand that infrastructure outlays be economically justified. "Why not just put it all on Amazon?" goes the administration's refrain. In response, HPC managers have begun to document the true cost of university-, department-, and research-group-owned infrastructure, thus enabling a legitimate cost comparison with Amazon or Azure. Additionally, it may be noted, this pressure to consider outsourcing computing infrastructure has legitimized both remote computing and paying for computation.

In this talk I will briefly describe the Campus Compute Cooperative (CCC). I will then detail both the results of our market simulations and the take-aways from interviews with stakeholders. By both of these measures, the CCC is valuable and viable: first, the simulation results clearly show the gains in institutional value; second, stakeholders indicated that many institutions are open to trading resources. Most promising, some institutions expressed interest in selling resources and others expressed willingness to pay.

 

 

William Gropp, University of Illinois at Urbana-Champaign

Managing Code Transformations for Better Performance Portability

 

Abstract: With the end of Dennard scaling, performance has depended on innovations in processor architecture. While these innovations have allowed per-chip performance to continue to increase, they have made it increasingly difficult to write and maintain high-performance code. Many different approaches to this problem have been tried, including enhancements to existing languages, new programming languages, libraries, tools, and even general techniques.

 

I will discuss the Illinois Coding Environment (ICE), which is used to provide code transformations for the primary code used by the Center for the Exascale Simulation of Plasma-Coupled Combustion. ICE is an example of an approach that uses annotations to an existing language to provide additional information that can guide performance optimizations, and it uses a framework that can invoke third-party tools to apply performance-enhancing transformations.

 

 

 

Mary Hall, U Utah

Mainstreaming Autotuning Compilers for Performance Portability: What will it Take?

 

Abstract:  We describe research on mainstreaming autotuning compiler technology, whereby the compiler automatically explores a search space of alternative implementations of a computation to find the best implementation for a target architecture.  Autotuning has demonstrated success in achieving performance portability as it enables the compiler to tailor optimization and code generation to a specific architectural context, starting from the same high-level program specification.  Still, mainstream adoption requires availability in widely-used compilers and demonstrated impact on production application codes while under development. This talk will highlight an example of the impact of autotuning compiler technology, recent work on a brick data layout and associated code generator for stencil computations that uses fine-grained data blocking as a tunable abstraction for performance portability across CPUs and GPUs.  It also will describe research on migrating autotuning technology into Clang/LLVM to support autotuning of OpenMP and complex loop transformation sequences.
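The core loop the abstract describes — exploring a search space of implementation variants and keeping the fastest for the target machine — can be sketched in a few lines. The example below is a toy illustration only (a hypothetical pure-Python blocked matrix multiply with a tile-size search space), not the Clang/LLVM or brick-layout work described above:

```python
import random
import time

def matmul_blocked(A, B, n, bs):
    """Blocked n-by-n matrix multiply; bs is the tunable tile size."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += a * B[k][j]
    return C

def autotune(n=64, candidates=(4, 8, 16, 32, 64)):
    """Time each variant in the search space and keep the fastest tile size."""
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    B = [[random.random() for _ in range(n)] for _ in range(n)]
    best_bs, best_t = None, float("inf")
    for bs in candidates:
        t0 = time.perf_counter()
        matmul_blocked(A, B, n, bs)
        t = time.perf_counter() - t0
        if t < best_t:
            best_bs, best_t = bs, t
    return best_bs
```

A production autotuner adds a smarter search (pruning, models) and per-architecture code generation, but the structure — same high-level specification, machine-specific winner — is the same.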

 

 

Tony Hey, Science and Technology Facilities Council, UK

Machine Learning and Big Scientific Data Benchmarks

 

Abstract: This talk will review the challenges posed by the growth of experimental data generated by the new generation of large-scale experiments at UK national facilities, such as the Diamond synchrotron at the Rutherford Appleton Laboratory site at Harwell near Oxford. Increasingly, scientists need to use sophisticated machine learning and other AI technologies to automate parts of the data pipeline and to make new scientific discoveries from the deluge of experimental data. In industry, Deep Learning is now transforming many areas of computing, and researchers are now exploring its use in analyzing their ‘Big Scientific Data’. The talk will include a discussion of the creation of a set of Big Scientific Data machine learning ‘benchmarks’ for exploring the use of these technologies in the analysis of experimental research data. Such benchmarks could also provide new research insights into the robustness and transparency of these algorithms.

 

 

Torsten Hoefler, ETH Zurich

Quantum Computing from an HPC System's Perspective

Abstract: Quantum computation may be the big paradigm shift of the next century. Yet the specifics of how computations happen are subtle and entangled with quantum-mechanical concepts. The situation is further confused by many popular-science and even real-science misconceptions about the basic concepts. This talk tries to provide an as-intuitive-as-possible view of the field and its challenges from a computer-systems perspective.

 

 

Minh Quan Ho, University Grenoble

Standard libraries on non-standard processors

Abstract: The potential of non-conventional many-core processors for future HPC and AI platforms is clear. However, the difficulty of developing the standard software stack on those architectures deters system and application developers. In this talk, we present our approaches to porting and optimizing BLAS and FFT libraries on the MPPA processor, a DMA-based many-core architecture, while keeping a minimal footprint in a memory-constrained environment.

 

 

Heike Jagode, UTK

PAPI's new Software-Defined Events for in-depth Performance Analysis.

Abstract: One of the most recent developments of the Performance API (PAPI) is the addition of Software-Defined Events (SDE). PAPI has successfully served the role of the abstraction and unification layer for hardware performance counters for over a decade. This talk presents our effort to extend this role to encompass performance critical information that does not originate in hardware, but rather in critical software layers, such as libraries and runtime systems. Our overall objective is to enable monitoring of both types of performance events, hardware- and software-related events, in a uniform way, through one consistent PAPI interface. Performance analysts will be able to form a complete picture of the entire application performance without learning new instrumentation primitives. In this talk, we outline PAPI's new SDE API and showcase the usefulness of SDE through its employment in software layers as diverse as the math library MAGMA, the dataflow runtime PaRSEC, and the state-of-the-art chemistry application NWChem. We outline the process of instrumenting these software packages and highlight the performance information that can be acquired with SDEs.

 

 

Emmanuel Jeannot, INRIA Bordeaux

Process Placement from Monitoring to Data Analysis     

Abstract: In this talk we will review the complete chain of topology-aware process mapping: gathering the topology, monitoring the application, mapping processes, and analyzing the results. We will review the latest advances in this research field by my group (MPI monitoring, hwloc, TreeMatch, etc.).
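As a toy illustration of the mapping step only (not the actual TreeMatch algorithm), a greedy heuristic can co-locate heavily communicating process pairs on the same node, given a communication matrix and a simple nodes-of-cores topology. All names below are hypothetical:

```python
def greedy_placement(comm, cores_per_node, num_nodes):
    """Greedy placement: co-locate heavily communicating process pairs.
    comm[i][j] is the communication volume between processes i and j.
    Assumes cores_per_node * num_nodes >= number of processes."""
    n = len(comm)
    # consider process pairs in decreasing order of communication volume
    pairs = sorted(((comm[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    placement, free = {}, [cores_per_node] * num_nodes

    def node_with_room(k):
        # first node that still has k free cores (None if none does)
        return next((nd for nd in range(num_nodes) if free[nd] >= k), None)

    def put(p, node):
        placement[p] = node
        free[node] -= 1

    for vol, i, j in pairs:
        if i in placement and j in placement:
            continue
        if i in placement or j in placement:
            anchor, other = (i, j) if i in placement else (j, i)
            node = placement[anchor]
            put(other, node if free[node] >= 1 else node_with_room(1))
        else:
            node = node_with_room(2)
            if node is not None:
                put(i, node)
                put(j, node)
            else:
                put(i, node_with_room(1))
                put(j, node_with_room(1))
    for p in range(n):  # processes that never communicate
        if p not in placement:
            put(p, node_with_room(1))
    return placement
```

TreeMatch works on the full hierarchical topology tree delivered by hwloc rather than this flat node/core model, but the input (a monitored communication matrix) and the goal (keep heavy communication local) are the same.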

 

 

Carl Kesselman, ISI

Computation as an Experimental Science 

 

 

 

Laurent Lefevre, Inria

Building and exploiting the table of energy and power leverages for energy-efficient large-scale HPC systems

 

Abstract: Large-scale distributed systems and supercomputers consume huge amounts of energy.

To address this issue, a set of hardware and software capabilities and techniques (leverages) exists to modify power and energy consumption in large-scale systems.

Discovering, benchmarking, and efficiently exploiting such leverages remains a real challenge for most users. This talk will address the building of the table of energy and power leverages and will present how to exploit it for energy-efficient systems.

 

 

Laércio Lima Pilla

Decoupling schedulers from runtime systems for increased reuse and portability 

 

Abstract: Global schedulers are components used in parallel solutions, especially in dynamic applications, to optimize resource usage. Nonetheless, their development is a cumbersome process due to the adaptations necessary to cope with the programming interfaces and abstractions of runtime systems. This presentation will focus on our model for dissociating schedulers from runtime systems in order to lower software complexity. Our model is based on breaking the scheduler down into modular and reusable concepts that better express the scheduler's requirements. Through the use of meta-programming and design patterns, we are able to achieve fully reusable workload-aware scheduling strategies with increased reuse, fewer lines of code for the algorithms, and negligible run-time overhead.
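A minimal sketch of the decoupling idea: the strategy below sees only abstract task and PE loads, so any runtime can reuse it through a thin adapter. The names and interfaces are hypothetical illustrations, not the authors' actual model (which uses meta-programming in the runtime's own language):

```python
def greedy_map(task_loads, pe_loads):
    """Workload-aware list scheduling: largest task first onto the
    currently least-loaded processing element (PE). The strategy knows
    nothing about any particular runtime system."""
    loads = dict(pe_loads)
    mapping = {}
    for task in sorted(task_loads, key=task_loads.get, reverse=True):
        pe = min(loads, key=loads.get)
        mapping[task] = pe
        loads[pe] += task_loads[task]
    return mapping

class ToyRuntimeAdapter:
    """Hypothetical adapter: converts a runtime's own task objects into
    the abstract loads the strategy expects, then applies the mapping."""
    def __init__(self, tasks, pes):
        self.tasks, self.pes = tasks, pes  # tasks: [(name, load)], pes: [name]

    def schedule(self, strategy):
        return strategy(dict(self.tasks), {p: 0.0 for p in self.pes})
```

The point of the separation is that `greedy_map` can be reused unchanged by any runtime that can supply the same abstract view, which is what keeps the scheduling algorithms short and portable.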

 

 

Bernd Mohr, Juelich Supercomputing Centre, Germany

On the ROI of Parallel Performance Optimization

Abstract: Developers of HPC applications can count on free advice from European experts to analyse the performance of their scientific codes. The Performance Optimization and Productivity (POP) Centre of Excellence, funded by the European Commission under H2020, ran from October 2015 to March 2018. It gathered together experts from BSC, JSC, HLRS, RWTH Aachen University, NAG and Ter@tec. The objective of POP was to provide performance measurement and analysis services to the industrial and academic HPC community, help them better understand the performance behaviour of their codes, and suggest improvements to increase their efficiency. Training and user education regarding application tuning was also provided. Further information can be found at http://www.pop-coe.eu/. The talk will give an overview of the POP Centre of Excellence and describe the common performance assessment strategy and metrics developed and defined by the project partners. The presentation will close with success stories and reports from the over 150 performance assessments performed during the project.

 

 

Dimitrios Nikolopoulos, Queen's University Belfast

Realistic fault injection and analysis for Exascale systems

 

Abstract: We explore compiler-based tools for accelerating resilience studies on Exascale systems. We look into how these tools can achieve accurate fault injection compared to binary-level tools, how they can handle multithreaded and parallel code, and how they can be scaled to conduct realistic fault resilience analysis campaigns.

 

 

Christian Obrecht, Centre for Energy and Thermal Sciences of Lyon (CETHIL)
National Institute of Applied Sciences of Lyon (INSA Lyon)
Building simulation: an illusion?

Abstract: Global-warming mitigation through the reduction of greenhouse-gas emissions requires a drastic decrease in our energy consumption. In this perspective, residential buildings represent one of the largest potential sources of energy savings. However, harnessing this resource will require considerable advances in building design. In this presentation, we will first focus on the computational challenges of designing energy-efficient buildings and on why current practices in building simulation are inadequate. Secondly, we will address the computational effort needed for accurate building simulation and evaluate to what extent it is economically viable. To conclude, we will give some insights on how recent advances in computational sciences, such as deep learning, could help in designing more efficient buildings.

 

 

Phil Papadopoulos, UC Irvine

"Virtualization is the answer. What was the question?"

 

Abstract: Ten years ago, virtualization in high-performance computing was a "non-starter" in the community. This talk will take an abbreviated tour of the short history of virtualization in HPC, starting with para-virtualization (of the nearly dead Xen project), going through full-system virtualization (KVM as the exemplar) and high-performance virtual clusters (enabled fundamentally by SR-IOV), and finally arriving at containers. There is demonstrated success for virtual clusters on SDSC's Comet cluster where, as unseen infrastructure, they facilitated some recent key science discoveries. Virtualized systems bring more software control to the end user, but this can exact significant, though hidden, costs. As more users want to "containerize" their applications, there are open questions about how container orchestration engines like Kubernetes fit within the landscape. We'll use the Pacific Research Platform as a motivator for some possible directions while illuminating some dark corners.

 

 

Manish Parashar, NSF/Rutgers University

Enabling Data-Driven Edge/Cloud Application Workflows 

Abstract: The proliferation of edge devices and their associated data streams is enabling new classes of dynamic, data-driven applications. However, processing these data streams in a robust, effective, and timely manner as part of application workflows presents challenges that are not addressed by current stream-programming frameworks. In this talk, I will present R-Pulsar, a unified cloud and edge data-processing platform that extends the serverless computing model to the edge to enable streaming data analytics across cloud and edge resources in a location-, content-, and resource-aware manner. R-Pulsar has been deployed on edge devices and is being used to support disaster-recovery workflows. This research is part of the Computing in the Continuum project at the Rutgers Discovery Informatics Institute.

 

 

Judy Qiu, Indiana University

Real-Time Anomaly Detection from Edge to HPC-Cloud 

 

Abstract: Detection of anomalies in real-time streaming data is of significant importance to a wide variety of application domains. These domains require high-performance analytics and prediction that yield actionable information in critical scenarios: racing cars, autonomous vehicles, medical monitoring, security detection, nanoparticle interactions, and fusion reactions. At the Indianapolis 500 motor-racing event, telemetry data is observed sequentially and gathered from multiple vehicles at the edge of the network, and then stored in a MongoDB database. To enable car simulators and analytics on the fly, we leverage a novel HPC-Cloud convergence framework named Harp-DAAL and demonstrate that the combination of Big Data and HPC techniques can simultaneously achieve productivity and performance. We show how simulations and Big Data analytics can use common programming environments with a runtime based on a rich set of collectives and libraries.

 

 

Padma Raghavan, Vanderbilt University

Rethinking the Computational Complexity and Efficiency in the Age of “Big Data”

 

Abstract: “Big data” sets are here, and as they continue to get bigger, there is an important and growing “small data” challenge, namely the energy costs of moving small numbers of bits and bytes within the hardware. This will impact high-performance computing disproportionately, as it has a higher susceptibility to hardware errors while single-thread performance is not improving, despite the multi-megawatt power consumption of even modest-sized systems with multi-million-way thread parallelism. We need to rethink how we optimize computational performance and resiliency, starting with the key measure, namely the computational complexity and efficiency of an algorithm, which has traditionally counted only the number of calculations. I will provide some illustrative examples drawn from sparse computations, where the number of data elements moved per operation is high for traditional algorithms, with a view to informing alternative approaches that could potentially increase performance, energy efficiency, and resiliency.
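A back-of-the-envelope illustration of why data movement dominates in sparse computations: the arithmetic intensity of sparse matrix-vector multiply in CSR format. The traffic model below is an assumption for illustration (4-byte indices, 8-byte values, every operand fetched from memory once, no cache reuse of the input vector), not a measurement from the talk:

```python
def csr_spmv_intensity(n, nnz, idx_bytes=4, val_bytes=8):
    """Rough arithmetic intensity (flops per byte) of y = A*x for an
    n-by-n CSR matrix with nnz nonzeros, under a once-through traffic model."""
    flops = 2 * nnz                       # one multiply + one add per nonzero
    bytes_moved = (nnz * val_bytes        # A values
                   + nnz * idx_bytes      # column indices
                   + (n + 1) * idx_bytes  # row pointers
                   + n * val_bytes        # x read (optimistic: once)
                   + n * val_bytes)       # y write
    return flops / bytes_moved
```

For a matrix with a million rows and ten million nonzeros this gives roughly 0.14 flops per byte, far below what modern processors can sustain per byte of memory bandwidth, which is why traditional sparse algorithms move many data elements per useful operation.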

 

 

Yves Robert, ENS Lyon

A Little Scheduling Problem

Abstract: The talk addresses a little scheduling problem related to these large HPC platforms that we love to play with.

Participant’s brief biography: Yves Robert has attended all CCDSC meetings but one. That says it all!

 

 

Rob Ross, ANL

Versatile Data Services for Computational Science

Abstract: On the data management side of HPC, the adoption of new services over the past decade has been slow. Globally available parallel file systems still dominate the scene, despite the availability and success of alternatives outside the HPC community. At the same time, the approach of composing codes from multiple coordinating components is having great success in other areas of computational science. In this presentation we motivate the composition of data services from reusable components and describe our efforts in this direction under the Mochi project (https://www.mcs.anl.gov/research/projects/mochi/). Mochi aims to provide the tools for an ecosystem of specialized data services for HPC. We will discuss the approach and our rationale, the components built to date, and some motivating computational science use cases.

 

 

Joel Saltz, SUNY Stony Brook

Integrative Everything,  Deep Learning and Streaming Data

 

Abstract: The need to label information and segment regions in individual sensor data sources, and to create syntheses from multiple disparate data sources, spans many areas of science, biomedicine, and technology. The rapid evolution in sensor technologies, from digital microscopes to UAVs, drives requirements in this area. I will describe a variety of use cases and technical challenges, as well as tools, algorithms, and techniques developed by our group and collaborators.

 

 

Steve Scalpone, PGI/Nvidia

The F18 Fortran Compiler

 

Abstract: F18 is an all-new open-source Fortran compiler infrastructure. It is being developed as part of the Flang project, a collaboration between NVIDIA, the US Dept of Energy, and a growing community of contributors to create a Fortran front-end for LLVM. F18 is written in C++ in the style of Clang and LLVM, is designed to be integrated with LLVM code generation, and is designed to facilitate ready development of language extensions and Fortran-related tools. F18 source code is available under the LLVM and Apache licenses, so it is friendly for both research and commercial use. F18 is designed for a future in which all proposed extensions to Fortran and its related directives can be implemented and proven out prior to formal adoption in a standard or specification; where Fortran language features, extensions, pragmas, and directives are portable across all HPC platforms; where each processor manufacturer can support their latest hardware optimizations in an end-to-end Fortran compiler built around modern software engineering principles; and where researchers and academics can implement state-of-the-art language and optimization features on a production-quality Fortran source base.

 

 

Vaidy Sunderam, Emory University

Data-driven systems for spatio-temporal applications

Abstract: Data-driven systems are rapidly increasing in prevalence, especially in spatio-temporal domains. Numerous smart devices collect and report observations, for individual and collective value, but with variable reliability and potential loss of privacy. We present several research contributions aimed at addressing: (1) task assignment in crowdsourcing systems with privacy protection; (2) truth discovery, whereby reports or observations from multiple entities can be fused to improve veracity; and (3) tensor-factorization methods for extracting patterns from spatio-temporal data. Models, issues, approaches, and preliminary results will be presented.

 

 

Martin Swany, Indiana University

Network Microservices and Edge Computing

Abstract: With proliferating sensor networks and Internet of Things-scale devices, networks are increasingly diverse and heterogeneous.  To enable the most efficient use of network bandwidth with the lowest possible latency, we propose InLocus, a stream-oriented architecture situated at (or near) the network's edge which balances hardware-accelerated performance with the flexibility of asynchronous software-based control.

 

 

Michela Taufer, University of Tennessee

Modeling Record-and-Replay for Nondeterministic Applications on Exascale Systems

 

Abstract: Record-and-replay (R&R) techniques present an attractive method for mitigating the harmful aspects of nondeterminism in HPC applications (e.g., numerical irreproducibility and hampered debugging), but are hamstrung by two problems. First, there is insufficient understanding of how existing R&R techniques' cost of recording responds to changes in application communication patterns, inputs, other aspects of configuration, and the degree of concurrency. Second, current R&R techniques have insufficient ability to exploit regularities in the communication patterns of individual applications.

 

To tackle these problems, it is crucial that the HPC community be equipped with modeling and simulation methodologies to assess the response of R&R tools, both in terms of execution-time overhead and memory overhead, to changes in the configuration of the applications they monitor. To realize effective modeling of the relationship between application configuration, R&R tool configuration, and the cost of recording, we apply a fourfold approach. First, we design a general and expressive representation of executions of the parallel applications that record-and-replay tools target. Second, we define a rigorous notion of the dissimilarity between multiple executions of the same nondeterministic application. Third, we implement a method of determining, for a particular event graph, the cost of recording that execution with a given record-and-replay tool. Finally, we implement a method of extracting, from those event graphs that correspond to costly recordings, the components that contribute most to the cost. In our talk, we describe our approach towards each of these four contributions.
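The basic record-and-replay idea underlying all of this can be shown in miniature: record the nondeterministic delivery order of messages, then force the same order on a later run. This is a toy illustration of the concept only, not the authors' tool or model:

```python
def run_app(arrivals, log=None, replay=None):
    """Toy record-and-replay of message-delivery order. In record mode the
    delivery order is appended to `log`; in replay mode messages are
    delivered in the previously recorded order, regardless of the order
    in which they arrive."""
    if replay is not None:
        arrived = set(arrivals)
        order = [m for m in replay if m in arrived]  # enforce recorded order
    else:
        order = list(arrivals)
        if log is not None:
            log.extend(order)
    # an order-sensitive 'computation': different delivery orders
    # produce different results, mimicking nondeterministic reductions
    result = 0
    for m in order:
        result = result * 31 + m
    return result
```

The cost question the abstract raises is visible even here: the log grows with every nondeterministic delivery, so exploiting regularity in the communication pattern (recording only deviations from an expected order) is what keeps recording overhead manageable at scale.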

 

This is joint work with Dylan Chapp and Danny Rudabaugh (UTK), and Kento Sato and Dong Ahn (LLNL).

 

 

Jeff Vetter, ORNL

Preparing for Extreme Heterogeneity in High Performance Computing

Abstract: Concerns about energy efficiency and cost are forcing our community to reexamine system architectures, including the memory and storage hierarchy. While computing technologies have remained relatively stable for nearly two decades, new architectural features, such as heterogeneous cores, deep memory hierarchies, non-volatile memory (NVM), and near-memory processing, have emerged as possible solutions to address these concerns. However, we expect this ‘golden age’ of architectural change to lead to extreme heterogeneity, and it will have a major impact on software systems and applications. Software will need to be redesigned to exploit these new capabilities and provide some level of performance portability across these diverse architectures. In this talk, I will sample these emerging memory technologies, discuss their architectural and software implications, and describe several new approaches to addressing these challenges. One system is Papyrus (Parallel Aggregate Persistent Storage), a programming system that aggregates NVM from across the system for use as application data structures, such as vectors and key-value stores, while providing performance portability across emerging NVM hierarchies.

 

 

Frédéric Vivien, Inria

A Generic Approach to Scheduling and Checkpointing Workflows

Abstract: This work deals with scheduling and checkpointing strategies to execute scientific workflows on failure-prone large-scale platforms. To the best of our knowledge, this work is the first to target fail-stop errors for arbitrary workflows. Most previous work addresses soft errors, which corrupt the task being executed by a processor but do not cause the entire memory of that processor to be lost, contrarily to fail-stop errors. We revisit classical mapping heuristics such as HEFT and MinMin and complement them with several checkpointing strategies. The objective is to derive an efficient trade-off between checkpointing every task (CkptAll), which is overkill when failures are rare events, and checkpointing no task (CkptNone), which induces dramatic re-execution overhead even when only a few failures strike during execution. Contrarily to previous work, our approach applies to arbitrary workflows, not just special classes of dependence graphs such as M-SPGs (Minimal Series-Parallel Graphs). Extensive experiments report significant gains over both CkptAll and CkptNone for a wide variety of workflows.
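The CkptAll/CkptNone trade-off has a classical first-order analysis for divisible work: Young's approximation for the optimal checkpoint period, which balances checkpoint overhead against expected re-execution after a failure. This is the standard textbook result, shown here for intuition, not the paper's workflow-specific strategy:

```python
import math

def young_period(checkpoint_cost, mtbf):
    """First-order optimal checkpoint period (Young's approximation).
    checkpoint_cost: time C to take one checkpoint; mtbf: platform mean
    time between failures M. Minimizes C/T + T/(2M) at T = sqrt(2*C*M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

def expected_overhead(period, checkpoint_cost, mtbf):
    """Rough overhead fraction for period T: checkpointing cost per period
    plus expected lost (re-executed) work per period."""
    return checkpoint_cost / period + period / (2.0 * mtbf)
```

With a 60-second checkpoint cost and a one-day MTBF, the optimal period is roughly 54 minutes; checkpointing every task (small T) or no task (T toward infinity) both sit far up the overhead curve, which is the trade-off the workflow heuristics in the talk navigate per task.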

 

 

Rich Vuduc, GATech

Algorithm-level control of performance and power tradeoffs

Abstract: I'll discuss a novel technique to control power consumption by tuning the amount of parallelism that is available during the execution of an algorithm. The specific algorithm is a tunable variation of delta-stepping for computing a single-source shortest path (SSSP); its available parallelism is highly irregular and depends strongly on the input. Informed by an analysis of these runtime characteristics, we propose a software-based controller that uses online learning techniques to dynamically tune the available parallelism to meet a given target, thereby improving the average available parallelism while reducing its variability. We verify experimentally that this mechanism makes it possible for the algorithm to “self-tune” the tradeoff between performance and power. The prototype extends Gunrock's GPU SSSP implementation, and the experimental apparatus consists of embedded CPU+GPU development boards (NVIDIA Tegra series), which have separately tunable GPU core and memory frequency knobs, attached to an external power monitoring device (PowerMon 2). This work is led by Sara Karamati, a Ph.D. student, and joint with Jeff Young, both at Georgia Tech.
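Delta-stepping itself is easy to sketch sequentially; each frontier pass over the current bucket is the unit of available parallelism that the abstract describes tuning via delta. The sketch below merges the usual light/heavy edge split for brevity and is an illustration of the algorithm, not the Gunrock GPU implementation:

```python
def delta_stepping(graph, source, delta):
    """Sequential sketch of delta-stepping SSSP. graph: {u: [(v, w), ...]},
    nonnegative weights. Vertices are grouped into buckets of width `delta`;
    each `frontier` below is one relaxation phase that a parallel
    implementation would execute concurrently."""
    INF = float("inf")
    dist = {source: 0.0}
    buckets = {0: {source}}

    def relax(v, d):
        old = dist.get(v, INF)
        if d < old:
            if old < INF:  # drop v from its old bucket, if any
                b = int(old // delta)
                if b in buckets:
                    buckets[b].discard(v)
            dist[v] = d
            buckets.setdefault(int(d // delta), set()).add(v)

    while True:
        nonempty = [b for b, s in buckets.items() if s]
        if not nonempty:
            break
        i = min(nonempty)
        while buckets.get(i):  # settle bucket i to a fixpoint
            frontier, buckets[i] = buckets[i], set()
            for u in frontier:
                du = dist[u]
                for v, w in graph.get(u, []):
                    relax(v, du + w)
        del buckets[i]
    return dist
```

A small delta gives many small buckets (Dijkstra-like, little parallelism per phase); a large delta gives few large buckets (Bellman-Ford-like, wide frontiers but wasted relaxations), which is exactly the knob being exposed to the power controller.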


 

 

 

Biographies of Attendees:

 

David Abramson has been involved in computer architecture and high performance computing research since 1979.  He has held appointments at Griffith University, CSIRO, RMIT and Monash University.  Prior to joining UQ, he was the Director of the Monash e-Education Centre, Science Director of the Monash e-Research Centre, and a Professor of Computer Science in the Faculty of Information Technology at Monash.  From 2007 to 2011 he was an Australian Research Council Professorial Fellow.  David has expertise in High Performance Computing, distributed and parallel computing, computer architecture and software engineering.  He has produced in excess of 200 research publications, and some of his work has also been integrated in commercial products. One of these, Nimrod, has been used widely in research and academia globally, and is also available as a commercial product, called EnFuzion, from Axceleon.  His world-leading work in parallel debugging is sold and marketed by Cray Inc, one of the world's leading supercomputing vendors, as a product called ccdb. David is a Fellow of the Association for Computing Machinery (ACM), the Institute of Electrical and Electronic Engineers (IEEE), the Australian Academy of Technology and Engineering (ATSE), and the Australian Computer Society (ACS). He is currently a visiting Professor in the Oxford e-Research Centre at the University of Oxford.

 

 

Ilkay Altintas is the Chief Data Science Officer at the San Diego Supercomputer Center (SDSC), UC San Diego, where she is also the Founder and Director of the Workflows for Data Science Center of Excellence. In her various roles and projects, she leads collaborative, multi-disciplinary teams with a research objective of delivering impactful results by making computational data-science work more reusable, programmable, scalable, and reproducible. Since joining SDSC in 2001, she has been a principal investigator and a technical leader in a wide range of cross-disciplinary projects. Her work has been applied to many scientific and societal domains, including bioinformatics, geoinformatics, high-energy physics, multi-scale biomedical science, smart cities, and smart manufacturing. She is a co-initiator of the popular open-source Kepler Scientific Workflow System and the co-author of publications related to computational data science at the intersection of workflows, provenance, distributed computing, big data, reproducibility, and software modeling in many different application areas.

 

 

Hartwig Anzt is a Helmholtz Young Investigator Group leader at the Steinbuch Centre for Computing at the Karlsruhe Institute of Technology. He obtained his PhD in Mathematics at the Karlsruhe Institute of Technology and joined Jack Dongarra's Innovative Computing Lab at the University of Tennessee in 2013. Since 2015 he has also held a Senior Research Scientist position at the University of Tennessee. Hartwig Anzt has a strong background in numerical mathematics and specializes in iterative methods and preconditioning techniques for next-generation hardware architectures. His Helmholtz group on fixed-point methods for numerics at Exascale ("FiNE") is funded until 2022. Hartwig Anzt has a long track record of high-quality software development. He is the author of the MAGMA-sparse open-source software package, the managing lead and a developer of the Ginkgo numerical linear algebra library, and part of the US Exascale Computing Project delivering production-ready numerical linear algebra libraries.

 

 

Dorian Arnold is an associate professor of Computer Science at Emory University with research interests in operating and distributed systems, fault-tolerance, online (streaming) data analysis and high-performance software tools. Dorian’s projects target a productive balance of principles and practice: his 60+ research articles have been cited over 1600 times, and two of his research projects (NetSolve and STAT) have won Top 100 R&D awards in 1999 and 2011. He is a senior member of the IEEE and an ACM Distinguished Speaker. Arnold received Ph.D. and M.S. degrees in Computer Science from the Universities of Wisconsin and Tennessee, respectively. He also received his B.S. in Math and Computer Science from Regis University (Denver, CO) and his A.S. in Physics, Chemistry and Math from St. John's Junior College (Belize).

 

 

 

 

Guillaume Aupy is a researcher at Inria Bordeaux Sud-Ouest. He currently works on data-aware scheduling at the different levels of the memory hierarchy (cache, memory, buffers, disks). He completed his PhD at ENS Lyon in 2014 on reliable and energy-efficient scheduling strategies in High-Performance Computing. He served as the Technical Program vice-chair for SC'17, workshop chair for SC'18, and algorithms track vice-chair for ICPP'18.

 

 

Rosa M. Badia holds a PhD in Computer Science (1994) from the Technical University of Catalonia (UPC). She is the manager of the Workflows and Distributed Computing research group at the Barcelona Supercomputing Center (BSC).
Her current research interests are programming models for complex platforms (from multicore and GPUs to the Cloud).  The group led by Dr. Badia has been developing the StarSs programming model for more than 10 years, with strong adoption by application developers. Currently the group focuses its efforts on PyCOMPSs/COMPSs, an instance of the programming model for distributed computing, including the Cloud.
Dr. Badia has published nearly 200 papers in international conferences and journals on the topics of her research. Her group is very active in projects funded by the European Commission and in contracts with industry.

 

 

Pete Beckman is the co-director of the Northwestern University/Argonne Institute for Science and Engineering and a recognized global expert in high-end computing systems. During the past 25 years, his research has focused on software and architectures for large-scale parallel and distributed computing systems. For the DOE's Exascale Computing Project, Pete leads the Argo team focused on extreme-scale operating systems and run-time software.  He is the founder and leader of the Waggle project for smart sensors and edge computing, which is used by the Array of Things project. Pete also coordinates the collaborative technical research activities in extreme-scale computing between the US Department of Energy and Japan's Ministry of Education, Science, and Technology, and helps lead the BDEC (Big Data and Extreme Computing) series of international workshops.  Pete leads the extreme computing research activities at Argonne National Laboratory.  He received his Ph.D. in computer science from Indiana University.

 

 

Anne Benoit received the PhD degree from Institut National Polytechnique de Grenoble in 2003, and the Habilitation à Diriger des Recherches (HDR) from Ecole Normale Supérieure de Lyon (ENS Lyon) in 2009. She is currently an associate professor in the Computer Science Laboratory LIP at ENS Lyon, France. She is the author of one book on algorithm design, 43 papers published in international journals, and 87 papers published in international conferences. She is the advisor of 9 PhD theses. Her research interests include algorithm design and scheduling techniques for parallel and distributed platforms, as well as the performance evaluation of parallel systems and applications, with a focus on energy awareness and resilience. She is Associate Editor in Chief of Elsevier ParCo, and Associate Editor of IEEE TPDS and Elsevier JPDC. She has been the program chair of several workshops and conferences; in particular, she was program chair for HiPC'16, ICPP'17, SC'17 (papers chair), and IPDPS'18. She is a senior member of the IEEE, and was elected a Junior Member of the Institut Universitaire de France in 2009.

 

 

Ken Birman is the N. Rama Rao Professor of Computer Science at Cornell.  An ACM Fellow and the winner of the IEEE Tsutomu Kanai Award, Ken has written 3 textbooks and published more than 150 papers in prestigious journals and conferences.  Software he developed operated the New York Stock Exchange for more than a decade without trading disruptions, and plays central roles in the French Air Traffic Control System and the US Navy AEGIS warship. Other technologies from his group found their way into IBM’s Websphere product, Amazon’s EC2 and S3 systems, Microsoft’s cluster management solutions, and the US Northeast bulk power grid.   The new Derechos system is intended for demanding settings such as the smart power grid, smart highways and homes, and scalable vision systems.  Download it, open source, from http://GitHub.com/Derecho-Project.

 

 

George Bosilca is a Research Director and Adjunct Assistant Professor at the Innovative Computing Laboratory at the University of Tennessee, Knoxville. His research interests revolve around designing support for parallel applications to maximize their efficiency, scalability, heterogeneity and resiliency at any scale and in any setting. He is actively involved in projects such as Open MPI, ULFM, PaRSEC, DPLASMA, and TESSE.

 

 

Bill Brantley is a Fellow Design Engineer in the Research Division of Advanced Micro Devices, leading parts of the *Forward research contracts as well as other efforts.  Prior to AMD he was at the IBM T.J. Watson Research Center, where he was one of the architects and implementers of the 64-CPU RP3 (a DARPA-supported HPC system development in the mid-80s), including a hardware performance monitor.  At IBM Austin he held a number of roles, including the analysis of server performance in the Linux Technology Center.  Prior to joining IBM, he completed his Ph.D. in ECE at Carnegie Mellon University after working for 3 years at Los Alamos National Laboratory.

 

 

Ron Brightwell leads the Scalable System Software Department at Sandia National Laboratories. After joining Sandia in 1995, he was a key contributor to the high-performance interconnect software and lightweight operating system for the world’s first terascale system, the Intel ASCI Red machine. He was also part of the team responsible for the high-performance interconnect and lightweight operating system for the Cray Red Storm machine, which was the prototype for Cray’s successful XT product line. The impact of his interconnect research is visible in technologies available today from Atos/Bull, Intel, and Mellanox. He has also contributed to the development of the MPI-2 and MPI-3 specifications. He has authored more than 115 peer-reviewed journal, conference, and workshop publications. He is an Associate Editor for the IEEE Transactions on Parallel and Distributed Systems, has served on the technical program and organizing committees for numerous high-performance and parallel computing conferences, and is a Senior Member of the IEEE and the ACM. 

 

 

Franck Cappello is a senior computer scientist at Argonne National Laboratory and an adjunct associate professor in the department of computer science at the University of Illinois at Urbana-Champaign. He is the director of the Joint Laboratory on Extreme Scale Computing, gathering six of the leading high-performance computing institutions in the world: Argonne National Laboratory (ANL), the National Center for Supercomputing Applications (NCSA), Inria, Barcelona Supercomputing Center (BSC), Jülich Supercomputing Centre (JSC), Riken CCS, and UTK-ICL.  Franck is an expert in parallel/distributed computing and high-performance computing. Recently he started investigating lossy compression for scientific datasets, responding to the pressing need for significant data reduction among scientists performing large-scale simulations and experiments. Franck is a member of the editorial board of IEEE Transactions on Parallel and Distributed Systems and of the IEEE CCGRID steering committee. He is a fellow of the IEEE and recipient of the 2018 IEEE TCPP outstanding service award.

 

 

Barbara Chapman is a Professor of Applied Mathematics and Statistics, and of Computer Science, at Stony Brook University, where she is affiliated with the Institute for Advanced Computational Science.  She also directs Computer Science and Mathematics Research at Brookhaven National Laboratory.  She performs research on parallel programming interfaces and the related implementation technology, and has been involved in several efforts to develop community standards for parallel programming, including OpenMP, OpenACC and OpenSHMEM.  Her research group created the OpenUH compiler that enabled practical experimentation with proposed enhancements to application programming interfaces and a reference implementation of the library-based OpenSHMEM standard. Dr. Chapman has co-authored over 200 papers and two books. She obtained her Ph.D. in Computer Science from Queen's University Belfast.

 

 

Alok Choudhary is the Henry & Isabelle Dever Professor of Electrical Engineering and Computer Science and a professor at Kellogg School of Management. He is also the founder, chairman and chief scientist (served as its CEO during 2011-2013) of 4C insights (formerly Voxsup Inc.), a big data analytics and marketing technology software company. He received the National Science Foundation's Young Investigator Award in 1993. He is a fellow of IEEE, ACM and AAAS. His research interests are in high-performance computing, data intensive computing, scalable data mining, high-performance I/O systems, software and their applications in science, medicine and business. Alok Choudhary has published more than 400 papers in various journals and conferences and has graduated 40+ PhD students. Alok Choudhary’s work and interviews have appeared in many traditional media including New York Times, Chicago Tribune, The Telegraph, ABC, PBS, NPR, AdExchange, Business Daily and many international media outlets all over the world.                                   

 

 

Jonathan Churchhill

After a 20+ year career in the semiconductor business designing high-performance SRAMs and their associated CAD systems, he joined STFC in 2006, focusing on HPC systems and support. For the last six years he has been responsible for the architecture and systems operations of JASMIN, the UK's platform for environmental research data analysis, taking it from day one to today's 70 racks and 45 PB of high-performance storage attached to cloud and physical HPC totalling ~10k cores. He is the named author on 15 US and UK patents.

 

 

Joe Curley serves Intel® Corporation as Senior Director, HPC Platform and Ecosystem Enablement in the High Performance Computing Platform Group (HPG). His primary responsibilities include supporting global ecosystem partners to develop their own powerful and energy-efficient HPC computing solutions utilizing Intel hardware and software products. Mr. Curley joined Intel Corporation in 2007, and has served in multiple other planning and business leadership roles.

Prior to joining Intel, Joe worked at Dell, Inc., leading the global workstation product line and consumer and small-business desktops, and serving in a series of engineering roles. He began his career at computer graphics pioneer Tseng Labs.

 

 

Ewa Deelman is a Research Professor at the USC Computer Science Department and a Research Director at the USC Information Sciences Institute (ISI). Dr. Deelman's research interests include the design and exploration of collaborative, distributed scientific environments, with particular emphasis on automation of scientific workflows and management of computing resources, as well as the management of scientific data. Her work involves close collaboration with researchers from a wide spectrum of disciplines.  At ISI she leads the Science Automation Technologies group, which is responsible for the development of the Pegasus Workflow Management software.   In 2007, Dr. Deelman edited a book, "Workflows in e-Science: Scientific Workflows for Grids," published by Springer. She is also the founder of the annual Workshop on Workflows in Support of Large-Scale Science, which is held in conjunction with the Supercomputing (SC) conference. In 1997 Dr. Deelman received her PhD in Computer Science from the Rensselaer Polytechnic Institute.

 

 

Luiz DeRose is a Senior Principal Engineer and the Programming Environments Director at Cray Inc., where he is responsible for the programming environment strategy for all Cray systems. Before joining Cray in 2004, he was a research staff member and the Tools Group Leader at the Advanced Computing Technology Center at IBM Research. Dr. DeRose has a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. With more than 25 years of high performance computing experience and a deep knowledge of its programming environments, he has published more than 50 peer-reviewed articles in scientific journals, conferences, and book chapters, primarily on the topics of compilers and tools for high performance computing.

 

 

Frédéric Desprez is a Chief Senior Research Scientist at Inria and holds a position at the LIG laboratory (UGA, Grenoble, France) in the Corse research team. He is also Deputy Scientific Director at Inria. He received his PhD in C.S. from Institut National Polytechnique de Grenoble, France, in 1994 and his MS in C.S. from ENS Lyon in 1990. His research interests include parallel high performance computing algorithms and scheduling for large scale distributed platforms. He leads the Grid'5000 project, which offers a platform to evaluate large scale algorithms, applications, and middleware systems. See https://fdesprez.github.io/ for further information.

 

 

Jack Dongarra holds an appointment at the University of Tennessee, Oak Ridge National Laboratory, and the University of Manchester. He specializes in numerical algorithms in linear algebra, parallel computing, use of advanced-computer architectures, programming methodology, and tools for parallel computers. He was awarded the IEEE Sid Fernbach Award in 2004; in 2008 he was the recipient of the first IEEE Medal of Excellence in Scalable Computing; in 2010 he was the first recipient of the SIAM Special Interest Group on Supercomputing's award for Career Achievement; in 2011 he was the recipient of the IEEE IPDPS Charles Babbage Award; and in 2013 he received the ACM/IEEE Ken Kennedy Award. He is a Fellow of the AAAS, ACM, IEEE, and SIAM and a member of the National Academy of Engineering.

 

 

Fanny Dufosse

 

Nicola Ferrier is a Senior Computer Scientist in ANL's Mathematics and Computer Science Division, a Senior Fellow of the University of Chicago's Consortium for Advanced Science and Engineering (UChicago CASE) and the Institute for Molecular Engineering, and a member of the Northwestern Argonne Institute for Science and Engineering.  Ferrier's research interests are in the use of computer vision (digital images) to control robots, machinery, and devices, with applications as diverse as medical systems, manufacturing, and biology.  At Argonne National Laboratory and the University of Chicago she collaborates with scientists from the Institute for Molecular Engineering, the Advanced Photon Source, materials science, and the biological sciences on various projects where images and computation facilitate scientific discovery. Prior to joining MCS in 2013, she was a professor of mechanical engineering at the University of Wisconsin-Madison, where she directed the Robotics and Intelligent Systems lab (1996-2013).

 

 

Ian Foster is Distinguished Fellow and director of the Data Science and Learning Division at Argonne National Laboratory. He is also the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago. Ian received a BSc (Hons I) degree from the University of Canterbury, New Zealand, and a PhD from Imperial College, United Kingdom, both in computer science. His research deals with distributed, parallel, and data-intensive computing technologies, and innovative applications of those technologies to scientific problems in such domains as materials science, climate change, and biomedicine. His Globus software is widely used in national and international cyberinfrastructures. Foster is a fellow of the American Association for the Advancement of Science, Association for Computing Machinery, and British Computer Society. His awards include the Global Information Infrastructure Next Generation award, the British Computer Society's Lovelace Medal, the IEEE’s Kanai award, and honorary doctorates from the University of Canterbury, New Zealand, and the Mexican Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV). He co-founded Univa, Inc., a company established to deliver grid and cloud computing solutions, and Praedictus Climate Solutions, which combines data science and high performance computing for quantitative agricultural forecasting.

 

 

Geoffrey Fox is a professor of Engineering, Computing, and Physics at Indiana University, where he is director of the Digital Science Center and Department Chair for Intelligent Systems Engineering at the School of Informatics, Computing, and Engineering.  He has supervised the Ph.D. theses of 71 students and is a Fellow of the APS (Physics) and the ACM (Computing).

 

 

Haohuan Fu is a professor in the Ministry of Education Key Laboratory for Earth System Modeling and the Department of Earth System Science at Tsinghua University, where he leads the research group of High Performance Geo-Computing (HPGC). He is also the deputy director of the National Supercomputing Center in Wuxi, leading the research and development division. Fu has a PhD in computing from Imperial College London. His research focuses on providing both the most efficient simulation platforms and the most intelligent data management and analysis platforms for geoscience applications.

 

 

Al Geist is a Corporate Research Fellow at Oak Ridge National Laboratory. He is the Chief Technology Officer of ORNL's Leadership Computing Facility and Chief Scientist for the Computer Science and Mathematics Division. He is on the Leadership Team of the U.S. Exascale Computing Project. His recent research is on Exascale computing and resilience needs of the hardware and software.

 

 

Rich Graham is a Senior Director for HPC technology at Mellanox Technologies, Inc.  His primary focus is High Performance Computing: Mellanox's HPC technical roadmap and working with customers on their HPC needs.  Prior to moving to Mellanox, Rich spent thirteen years at Los Alamos National Laboratory and Oak Ridge National Laboratory in computer science technical and administrative roles, with a technical focus on communication libraries and application analysis tools.  He is a cofounder of the Open MPI collaboration and was chairman of the MPI 3.0 standardization effort.

 

 

Andrew Grimshaw received his Ph.D. from the University of Illinois at Urbana-Champaign in 1988. He joined the University of Virginia as an Assistant Professor of Computer Science, becoming Associate Professor in 1994 and Professor in 1999. He is the chief designer and architect of Mentat, Legion, Genesis II, and the co-architect for XSEDE. In 1999 he co-founded Avaki Corporation, and served as its Chairman and Chief Technical Officer until 2003. In 2003 he won the Frost and Sullivan Technology Innovation Award. In 2008 he became the founding director of the University of Virginia Alliance for Computational Science and Engineering (UVACSE). The mission of UVACSE is to change the culture of computation at the University of Virginia and to accelerate computationally oriented research.

 

Andrew is the chairman of the Open Grid Forum (OGF), having served both as a member of the OGF's Board of Directors and as Architecture Area Director.  Andrew is the author or co-author of over 100 publications and book chapters. His current projects are IT, Genesis II, and XSEDE. IT is a next-generation portable parallel language based on the PCubeS type architecture. Genesis II is an open-source, standards-based Grid system that focuses on making Grids easy to use and accessible to non-computer-scientists. XSEDE (eXtreme Science and Engineering Discovery Environment) is the NSF follow-on to the TeraGrid project.

 

 

William Gropp holds the Thomas M. Siebel Chair in computer science at the University of Illinois at Urbana-Champaign, is the Director and Chief Scientist of the National Center for Supercomputing Applications, and was the founding director of the Parallel Computing Institute.  Prior to joining Illinois in 2007, he held positions at Argonne National Laboratory, including Associate Director for the Mathematics and Computer Science Division and Senior Computer Scientist.  He is known for his work on scalable numerical algorithms and software (sharing an R&D 100 award and the SIAM/ACM Prize in Computational Science and Engineering for the PETSc software) and for the Message Passing Interface (sharing an R&D 100 award for MPICH, the dominant high-end implementation, as well as co-authoring the leading books on MPI). For his accomplishments in parallel algorithms and programming, he received the IEEE Computer Society's Sidney Fernbach Award in 2008, the SIAM-SC Career Award in 2014, and the ACM/IEEE-CS Ken Kennedy Award in 2016. He is a fellow of ACM, IEEE, and SIAM, and is an elected member of the National Academy of Engineering.

 

 

Mary Hall is a Professor at the University of Utah, where she has been since 2008.  Her research interests focus on programming systems for high-performance computing, with a particular interest in autotuning compilers, parallel code generation and domain-specific optimization.  She leads the Y-Tune project that is part of the U.S. Dept. of Energy Exascale Computing Project, in collaboration with Lawrence Berkeley National Laboratory and Argonne National Laboratory.  Mary Hall is an ACM Distinguished Scientist and serves on the Computing Research Association Board of Directors.

 

 

Li Han

 

 

Tony Hey began his career as a theoretical physicist with a doctorate in particle physics from the University of Oxford in the UK. After a career in physics that included research positions at Caltech and CERN, and a professorship at the University of Southampton in England, he became interested in parallel computing and moved into computer science. In the 1980s he was one of the pioneers of distributed-memory message-passing computing and co-wrote the first draft of the successful MPI message-passing standard.

After being both Head of Department and Dean of Engineering at Southampton, Tony Hey was appointed to lead the U.K.’s ground-breaking ‘eScience’ initiative in 2001. He recognized the importance of Big Data for science and wrote one of the first papers on the ‘Data Deluge’ in 2003. He joined Microsoft in 2005 as a Vice President and was responsible for Microsoft’s global university research engagements. He worked with Jim Gray and his multidisciplinary eScience research group and edited a tribute to Jim called ‘The Fourth Paradigm: Data-Intensive Scientific Discovery.’ Hey left Microsoft in 2014 and spent a year as a Senior Data Science Fellow at the eScience Institute at the University of Washington. He returned to the UK in November 2015 and is now Chief Data Scientist at the Science and Technology Facilities Council.

In 1987 Tony Hey was asked by Caltech Nobel physicist Richard Feynman to write up his ‘Lectures on Computation’. This covered such unconventional topics as the thermodynamics of computing as well as an outline for a quantum computer. Feynman’s introduction to the workings of a computer in terms of the actions of a ‘dumb file clerk’ was the inspiration for Tony Hey’s attempt to write ‘The Computing Universe’, a popular book about computer science. Tony Hey is a fellow of the AAAS and of the UK's Royal Academy of Engineering. In 2005, he was awarded a CBE by Prince Charles for his ‘services to science.’

 

 

Torsten Hoefler is an Associate Professor of Computer Science at ETH Zürich, Switzerland. He is best described as an HPC systems person with interests across the whole stack. Recently, he started to investigate the potential of quantum computation. Torsten won best paper awards at the ACM/IEEE Supercomputing Conference SC10, SC13, SC14, EuroMPI'13, HPDC'15, HPDC'16, IPDPS'15, and other conferences.  He has published numerous peer-reviewed scientific conference and journal articles and authored chapters of the MPI-2.2 and MPI-3.0 standards. He received the Latsis prize of ETH Zürich as well as an ERC starting grant in 2015. His research interests revolve around the central topic of "Performance-centric System Design" and include scalable networks, parallel programming techniques, and performance modeling. Additional information about Torsten can be found on his homepage at htor.inf.ethz.ch.

 

 

Minh Quan Ho is an embedded-library and high-performance-computing solutions expert at Kalray. He joined Kalray in 2014 during his PhD, which focused on optimizing stencil computations and linear algebra on the Kalray MPPA processor. Minh Quan received his Master's degree in Computer Science from the Ecole Polytechnique de Grenoble and his PhD from the University Grenoble Alpes.

 

 

Heike Jagode is a Research Assistant Professor with the Innovative Computing Laboratory at the University of Tennessee, Knoxville. She specializes in high-performance computing and the efficient use of advanced computer architectures, focusing primarily on developing methods and tools for performance analysis and tuning of parallel scientific applications. Her research interests include the multi-disciplinary effort to convert computational chemistry algorithms into a dataflow-based form to make them compatible with next-generation task-scheduling systems, such as PaRSEC. She received a Ph.D. in Computer Science from the University of Tennessee, Knoxville. Previously, she received an M.S. in High-Performance Computing from The University of Edinburgh, Scotland, UK, and an M.S. in Applied Techno-Mathematics and a B.S. in Applied Mathematics from the University of Applied Sciences Mittweida, Germany.

 

 

Emmanuel Jeannot is a Senior Research Scientist at Inria. He has done his research at Inria Bordeaux Sud-Ouest and at the LaBRI laboratory since 2009. From 2005 to 2006 he was a researcher at INRIA Nancy Grand-Est, and in 2006 he was a visiting researcher at the University of Tennessee's ICL laboratory. From 2000 to 2005 he was an assistant professor at the Université Henri Poincaré, and during the period from 2000 to 2009 he did his research at the LORIA laboratory. He received his Master's degree (1996) and PhD (1999) in computer science, both from the Ecole Normale Supérieure de Lyon, at the LIP laboratory. His main research interests lie in parallel and high-performance computing, and more precisely: process placement, topology-aware algorithms, scheduling for heterogeneous environments, data redistribution, algorithms and models for parallel machines, distributed computing software, adaptive online compression, and programming models.

 

 

 

Carl Kesselman

 

Laurent Lefevre is a permanent researcher in computer science at Inria (the French Institute for Research in Computer Science and Control). He is a member of the Avalon team (Algorithms and Software Architectures for Distributed and HPC Platforms) of the LIP laboratory at the Ecole Normale Supérieure of Lyon, France.  He has organized several conferences in high performance networking and computing, and he is a member of several program committees. He has co-authored more than 100 papers published in refereed journals and conference proceedings. For more than a decade, he has been working on the energy efficiency of large-scale systems (HPC centers, datacenters, clouds and large networks). His other interests include high performance computing, distributed computing and networking, and high performance network protocols and services.

See http://perso.ens-lyon.fr/laurent.lefevre  for further information.

 

 

Laércio Lima Pilla received his PhD in Computer Science from the Université Grenoble Alpes, France, and the Universidade Federal do Rio Grande do Sul, Brazil, in 2014. He holds an Associate Professor position at the Universidade Federal de Santa Catarina in Florianópolis, Brazil. He is currently working as a postdoctoral researcher in the CORSE project-team in Grenoble, working on the hybrid parallelization of a high-order finite element solver for the numerical modeling of nanoscale light/matter interaction. Starting this October, he will hold a position as CNRS researcher in the ParSys team at LRI - University of Paris-Saclay. His research interests are mainly related to parallel computing, runtime systems, computer architecture, and global scheduling.

 

 

Bernd Mohr started to design and develop tools for performance analysis of parallel programs at the University of Erlangen in Germany in 1987. During a three-year postdoc position at the University of Oregon, he designed and implemented the original TAU performance analysis framework. Since 1996 he has been a senior scientist at Forschungszentrum Jülich, and since 2000 he has been the team leader of the group "Programming Environments and Performance Analysis". Besides being responsible for user support and training with regard to performance tools at the Jülich Supercomputing Centre (JSC), he leads the Scalasca performance tools effort in collaboration with Prof. Felix Wolf of TU Darmstadt. Since 2007, he has also served as deputy head of the JSC division "Application support".

 

 

Dimitrios Nikolopoulos FBCS FIET is a Professor at Queen’s University Belfast where he holds a personal chair in High Performance and Distributed Computing and is Director of the University’s Global Research Institute on Electronics, Communication and Information Technologies. Dimitrios currently holds a Royal Society Wolfson Research Merit Award and an SFI-DEL Investigator Award. 

 

 

Christian Obrecht is an associate professor of applied physics at the Department of Civil Engineering and Urban Planning of the National Institute of Applied Sciences in Lyon (INSA Lyon). Dr Obrecht first graduated in mathematics from University of Strasbourg in 1990 and served as a teacher of mathematics from 1993 to 2008. He obtained a master’s degree in computer science from University of Lyon in 2009 and a doctoral degree in civil engineering from INSA Lyon in 2012. He was appointed associate professor in 2015 and joined the Centre for Energy and Thermal Sciences of Lyon (CETHIL). His research work is devoted to energy efficiency in buildings and focuses more specifically on innovative approaches in computational building physics.

 

 

Phil Papadopoulos received his PhD in Electrical Engineering from UC Santa Barbara in 1993. He spent five years at Oak Ridge National Laboratory (ORNL) as part of the Parallel Virtual Machine (PVM) development team. In 1998, he moved to UC San Diego as a research professor in computer science. In 1999, he began a 19-year career at the San Diego Supercomputer Center (SDSC), becoming its Chief Technology Officer in 2008. He is the chief architect of the NSF-funded Comet cluster, which supports high-performance virtual clusters. In 2018, Dr. Papadopoulos moved to UC Irvine to become the inaugural Director of the Research Cyberinfrastructure Center. While his current job focuses more on CI development and deployment for a leading research university, his own research interests revolve around distributed, clustered, and cloud-based systems and how they can be used more effectively in an expanding bandwidth-rich environment. Dr. Papadopoulos has been a key investigator for several research projects at UCSD, including the National Biomedical Computation Resource (NBCR) and the Pacific Rim Applications and Grid Middleware Assembly (PRAGMA, OCI-1234983). He is well known for leading the development of the open-source, NSF-funded Rocks cluster toolkit (OCI-0721623), which has an installed base of thousands of clusters. Since his formative days at ORNL, Dr. Papadopoulos has focused on the practicalities and challenges of defining and building cluster and distributed cyberinfrastructure for local, national, and international communities. He likes to hike, too.

 

 

 

Manish Parashar is Distinguished Professor of Computer Science at Rutgers University. He is also the founding Director of the Rutgers Discovery Informatics Institute (RDI2). He is currently on an IPA appointment at the National Science Foundation. His research interests are in the broad areas of Parallel and Distributed Computing and Computational and Data-Enabled Science and Engineering. Manish is the founding chair of the IEEE Technical Consortium on High Performance Computing (TCHPC) and Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems. He has received a number of awards for his research and leadership, and is a Fellow of AAAS, a Fellow of the IEEE/IEEE Computer Society, and an ACM Distinguished Scientist. For more information please visit http://parashar.rutgers.edu/.

 

 

Judy Qiu is an associate professor of Intelligent Systems Engineering at Indiana University. Her general area of research is data-intensive computing at the intersection of cloud and HPC multicore technologies. This includes a specialization in programming models that support iterative computation, ranging from storage to analysis, and that can scalably execute data-intensive applications. Her research has been funded by NSF, NIH, Microsoft, Google, Intel, and Indiana University.

 

 

Padma Raghavan is a Professor of Computer Science in the Department of Electrical Engineering and Computer Science at Vanderbilt University, where she is also Vice Provost for Research. Prior to joining Vanderbilt in February 2016, she was a Distinguished Professor of Computer Science and Engineering at the Pennsylvania State University and served as the Associate Vice President for Research and Director of Strategic Initiatives, in addition to being the founding Director of the Institute for CyberScience, the coordinating unit on campus for developing interdisciplinary computation and data-enabled science and engineering and the provider of high-performance computing services for the university. Raghavan received her Ph.D. in computer science from Penn State. Prior to joining Penn State in August 2000, she served as an associate professor in the Department of Computer Science at the University of Tennessee and as a research scientist at the Oak Ridge National Laboratory.

Raghavan specializes in high-performance computing and computational science and engineering. She has led the development of "sparse algorithms" that derive from and operate on compact yet accurate representations of high-dimensional data, complex models, and computed results. Raghavan has developed parallel sparse linear solvers that limit the growth of computational costs and utilize the concurrent computing capability of advanced hardware to enable the solution of complex large-scale modeling and simulation problems that are otherwise beyond reach. Raghavan was also among the first to propose the design of energy-efficient supercomputing systems by combining results from sparse scientific computing with energy-aware hardware optimizations used for small embedded computers. In her professorial role, Raghavan is deeply involved in education and research, with 46 Masters and Ph.D. theses supervised and more than a hundred peer-reviewed publications. She has earned several awards including an NSF CAREER Award (1995), the Maria Goeppert-Mayer Distinguished Scholar Award (2002, University of Chicago and the Argonne National Laboratory), and selection as an IEEE Fellow (2013). Raghavan is also a prominent member of major professional societies including SIAM (Society for Industrial and Applied Mathematics) and IEEE (Institute of Electrical and Electronics Engineers). She served as the Chair of the Technical Program of the 2017 IEEE/ACM Conference on Supercomputing, and she is a member of the SIAM Committee on Science Policy and the SIAM Council, which together with its Board and officers leads SIAM. Raghavan also serves on the Advisory Board of the Computing and Information Science and Engineering Directorate of the National Science Foundation.

 

 

Yves Robert

 

 

Robert Ross is a Senior Computer Scientist at Argonne National Laboratory and a Senior Fellow at the Northwestern-Argonne Institute for Science and Engineering. He is the Director of the DOE SciDAC RAPIDS Institute for Computer Science and Data. Rob’s research interests are in system software for high performance computing systems, in particular distributed storage systems and libraries for I/O and message passing. Rob received his Ph.D. in Computer Engineering from Clemson University in 2000 and was a recipient of the 2004 Presidential Early Career Award for Scientists and Engineers.

 

 

Joel Saltz is an MD, PhD in Computer Science with a long career spanning the development of compiler, runtime-system, and filter-stream methods in computer science, as well as multi-scale imaging and digital-pathology tools, algorithms, and methods in biomedical informatics. He is currently Chair of Biomedical Informatics and Professor of Computer Science at Stony Brook.

 

 

Steve Scalpone is Director of Engineering for PGI compilers and tools at NVIDIA. He has worked on compilers for 20 years at Verdix, Rational Software, STMicroelectronics, and NVIDIA. He also worked on mobile device security at Nukona, Wind River, Intel, and Symantec.

 

 

Vaidy Sunderam is Professor of Computer Science at Emory University and Chair of the Computer Science Department. His research interests are in parallel and distributed computing systems, security and privacy issues in spatiotemporal systems, high-performance message passing environments, and infrastructures for collaborative computing. His prior and current research efforts, supported by grants from NSF, DoE, AFOSR, and NASA, have focused on systems for metacomputing middleware, collaboration, and data-driven systems. Sunderam teaches computer science at the beginning, advanced, and graduate levels, and advises graduate theses in the areas of computer systems and data science.

 

 

Martin Swany is Associate Chair and Professor in the Intelligent Systems Engineering Department in the School of Informatics and Computing at Indiana University, and the Deputy Director of the Center for Research in Extreme Scale Technologies (CREST).  His research interests include high-performance parallel and distributed computing and networking.

 

 

Michela Taufer holds the Jack Dongarra Professorship in High Performance Computing in the Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville (UTK). Before joining UTK, she was a Professor of Computer and Information Sciences and a J.P. Morgan Case Scholar at the University of Delaware, where she also held a joint appointment in the Biomedical Department and the Bioinformatics Program. She earned her undergraduate degrees in Computer Engineering from the University of Padova (Italy) and her doctoral degree in Computer Science from the Swiss Federal Institute of Technology (ETH) in Switzerland. From 2003 to 2004 she was a La Jolla Interfaces in Science Training Program (LJIS) Postdoctoral Fellow at the University of California San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on interdisciplinary projects in computer systems and computational chemistry.

 

 

Bernard Tourancheau received an MSc in Applied Mathematics from Grenoble University in 1986 and an MSc in Renewable Energy Science and Technology from Loughborough University in 2007. He was awarded the best computer science PhD prize by the Institut National Polytechnique de Grenoble in 1989 for his work on parallel computing for distributed-memory architectures.

He was appointed assistant professor at the École Normale Supérieure de Lyon LIP laboratory in 1989 before joining CNRS as a junior researcher. After initiating a CNRS-NSF collaboration, he worked on leave at the University of Tennessee in a senior researcher position with the US Center for Research in Parallel Computation at the ICL laboratory.

He then took a Professor position at the University of Lyon in 1995, where he created a research laboratory and the INRIA RESO team, specialized in high-speed networking and HPC.

In 2001, he joined Sun Microsystems Laboratories for a six-year sabbatical as a Principal Investigator in the DARPA HPCS project, where he led the backplane networking group.

Back in academia, he oriented his research toward wireless sensor networks for building energy efficiency at the ENS LIP and INSA CITI labs.

He was appointed Professor at University Joseph Fourier of Grenoble in 2012. Since then, in the LIG lab's Drakkar team, he has been developing research on protocols and architectures for the Internet of Things. He also pursues research on optimizing communication algorithms for HPC multicore and GPGPU systems. In addition, he is a scientific promoter of the renewable energy transition, relocalization, and low-tech approaches to address peak oil and global warming.

He has authored more than 140 peer-reviewed publications and filed 10 patents.

 

 

Jeffrey Vetter, Ph.D., is a Distinguished R&D Staff Member at Oak Ridge National Laboratory (ORNL). At ORNL, Vetter is the founding group leader of the Future Technologies Group in the Computer Science and Mathematics Division. Vetter also holds joint appointments at the Georgia Institute of Technology and the University of Tennessee-Knoxville. Vetter earned his Ph.D. in Computer Science from the Georgia Institute of Technology. Vetter is a Fellow of the IEEE, and a Distinguished Scientist Member of the ACM. In 2010, Vetter, as part of an interdisciplinary team from Georgia Tech, NYU, and ORNL, was awarded the ACM Gordon Bell Prize. Also, his work has won awards at major conferences including Best Paper Awards at the International Parallel and Distributed Processing Symposium (IPDPS), the AsHES workshop, and EuroPar, Best Student Paper Finalist at SC14, and Best Presentation at EASC 2015. In 2015, Vetter served as the SC15 Technical Program Chair. His recent books, entitled "Contemporary High Performance Computing: From Petascale toward Exascale (Vols. 1 and 2)," survey the international landscape of HPC. See his website for more information: http://ft.ornl.gov/~vetter/.

 

 

Frédéric Vivien received his Ph.D. degree from the École Normale Supérieure de Lyon in 1997. From 1998 to 2002, he was an associate professor at the Louis Pasteur University in Strasbourg, France. He spent the year 2000 working with the Computer Architecture Group of the MIT Laboratory for Computer Science. He is currently a senior researcher at INRIA, working at ENS Lyon, France. He leads the INRIA project-team Roma, which focuses on designing models, algorithms, and scheduling strategies to optimize the execution of scientific applications. He is the author of two books, more than 35 papers published in international journals, and more than 50 papers published in international conferences. His main research interests are scheduling techniques and parallel algorithms for distributed and/or heterogeneous systems.

 

 

Rich Vuduc is an Associate Professor at the Georgia Institute of Technology (“Georgia Tech”), in the School of Computational Science and Engineering, a department devoted to the study of computer-based modeling and simulation of natural and engineered systems. His research lab, The HPC Garage (@hpcgarage), is interested in high-performance computing, with an emphasis on algorithms, performance analysis, and performance engineering. He is a recipient of a DARPA Computer Science Study Group grant; an NSF CAREER award; a collaborative Gordon Bell Prize in 2010; Lockheed-Martin Aeronautics Company Dean’s Award for Teaching Excellence (2013); and Best Paper Awards at the SIAM Conference on Data Mining (SDM, 2012) and the IEEE Parallel and Distributed Processing Symposium (IPDPS, 2015), among others. He has also served as his department’s Associate Chair and Director of its graduate programs. External to Georgia Tech, he was elected to be Vice President of the SIAM Activity Group on Supercomputing (2016-2018); co-chaired the Technical Papers Program of the “Supercomputing” (SC) Conference in 2016; and serves as an associate editor of both the International Journal of High-Performance Computing Applications and IEEE Transactions on Parallel and Distributed Systems. He received his Ph.D. in Computer Science from the University of California, Berkeley, and was a postdoctoral scholar at Lawrence Livermore National Laboratory's Center for Advanced Scientific Computing.

 

 

 

 


 

 

 

CCGSC 1992 Participants (some of them)

 

 

CCGSC 1994 Participants (some of them), Blackberry Farm, Tennessee

 

Missing CCGSC 1996 - anyone have a picture?

 

 

 

CCGSC 1998 Participants, Blackberry Farm, Tennessee

 

 

 

CCGSC 2000 Participants, Faverges, France

 

 

 

CCGSC 2002 Participants, Faverges, France

 

 

 

CCGSC 2004 Participants, Faverges, France

 

 

 

CCGSC 2006 Participants, Flat Rock, North Carolina

Some additional pictures can be found here.

http://web.eecs.utk.edu/~dongarra/ccgsc2006/

 

 

CCGSC 2008 Participants, Flat Rock, North Carolina

http://web.eecs.utk.edu/~dongarra/ccgsc2008/

 

 

 

CCGSC 2010 Participants, Flat Rock, North Carolina

http://web.eecs.utk.edu/~dongarra/ccgsc2010/

 

CCDSC 2012 Participants, Dareize, France

http://web.eecs.utk.edu/~dongarra/CCDSC-2012/index.htm

 

CCDSC 2014 Participants, Dareizé, France

http://web.eecs.utk.edu/~dongarra/CCDSC-2014/index.htm

CCDSC 2016 Participants, Dareizé, France

http://web.eecs.utk.edu/~dongarra/CCDSC-2016/index.htm