Workshop on Clusters, Clouds, and Data for Scientific Computing

CCDSC 2016

(last update 4/5/17 3:31 AM)

 

October 3-6, 2016

Châteauform’

La Maison des Contes

427 Chemin de Chanzé, France

 

Workshop Final Report

 

 

 

 

Sponsored by:

NSF, AMD, PGI, Nvidia, Intel, Mellanox, GENCI, CGG-Veritas, ICL/UTK, Grenoble Alps University.

 

 

        



 

Clusters, Clouds, and Data for Scientific Computing

 2016

Châteauform’

La Maison des Contes

427 Chemin de Chanzé, France

October 3rd – 6th, 2016

 

CCDSC 2016 will be held at a resort outside of Lyon, France, called La Maison des Contes: http://www.chateauform.com/en/chateauform/maison/17/chateau-la-maison-des-contes

 

 

The address of the Chateau is:

Châteauform’ La Maison des Contes

427 chemin de Chanzé

69490 Dareizé

 

Telephone: +33 1 30 28 69 69

 

1 hr 30 min from the Saint Exupéry Airport

45 minutes from Lyon

 

 

GPS Coordinates: North latitude 45° 54' 20" East longitude 4° 30' 41"

 

Go to http://maps.google.com and type in: “427 chemin de Chanzé 69490 Dareizé”

Message from the Program Chairs

 

These proceedings gather information about the participants of the Workshop on Clusters, Clouds, and Data for Scientific Computing, held at La Maison des Contes, 427 Chemin de Chanzé, France on October 3-6, 2016.  This workshop is a continuation of a series of workshops started in 1992 entitled Workshop on Environments and Tools for Parallel Scientific Computing. These workshops have been held every two years and alternate between the U.S. and France. The purpose of the workshop, which is by invitation only, is to evaluate the state-of-the-art and future trends for cluster computing and the use of computational clouds for scientific computing.

This workshop addresses a number of themes in developing and using both clusters and computational clouds. In particular, the talks covered:

§  Survey and analyze the key deployment, operational and usage issues for clusters, clouds and grids, especially focusing on discontinuities produced by multicore and hybrid architectures, data intensive science, and the increasing need for wide area/local area interaction.

§  Document the current state-of-the-art in each of these areas, identifying interesting questions and limitations, and report experiences with clusters, clouds and grids relative to the science research communities and science domains that are benefitting from the technology.

§  Explore interoperability among disparate clouds as well as interoperability between various clouds and grids and the impact on the domain sciences.

§  Explore directions for future research and development against the background of disruptive trends and technologies and the recognized gaps in the current state-of-the-art.

 

Speakers will present their research and interact with all the participants on the future software technologies that will make parallel computers easier to use. 

 

This workshop was made possible thanks to sponsorship from NSF, AMD, PGI, Nvidia, Intel, Mellanox, GENCI, CGG-Veritas, ICL/UTK, and Grenoble Alps University.

Thanks!

 

Jack Dongarra, Knoxville, Tennessee, USA.

Bernard Tourancheau, Grenoble, France


Draft agenda (4/5/17 3:31 AM)

October 3rd – 6th, 2016

 

 

Monday

October 3rd

Jack Dongarra, U of Tenn

Bernard Tourancheau, U Grenoble

Introduction and Welcome               

6:30 – 8:00

Session Chair: Jack Dongarra

 (6 talks – 15 minutes each)

6:30

Doug Miles, PGI/Nvidia

On the Role of Compiler Directives in High-Performance Computing

6:45

Patrick Demichel, HP

The Future of IT Technologies

7:00

Rich Graham, Mellanox

The Active Network

7:15

Joe Curley, Intel

Tales of Parallel App Enabling on the Path to Exascale

7:30

Bill Brantley, AMD

AMD Accelerating Technologies for Exascale Computing

7:45

Gunter Roeth, Nvidia

Deep Learning – Impact on Modern Life

8:00 pm – 9:00 pm

Dinner

 

 

Tuesday, October 4th

 

 

7:30 - 8:30

Breakfast

 

8:30 - 10:30

Session Chair: Bernard Tourancheau, U Grenoble

 

(6 talks – 20 minutes each)

8:30

Pete Beckman

WaggleVision

8:50

Rosa Badia

Task-based programming in COMPSs to converge from HPC to Big Data

9:10

Ian Foster

New Directions in Globus: Collections, Responsive Storage, and Safe Data 

9:30

Geoffrey Fox

Distinguishing Parallel and Distributed Computing Performance

9:50

Ewa Deelman

What is missing in workflow technologies

10:10

Laurent Lefevre

GreenFactory : orchestrating power capabilities and leverages at large scale for energy efficient infrastructures

10:30 -11:00

Coffee

 

11:00  - 1:00

Session Chair: Patrick Demichel

 (6 talks – 20 minutes each)

11:00

Alok Choudhary

Scaling Resiliency via machine learning and compression

11:20

Vaidy Sunderam

Cost and Utility Tradeoffs on IaaS Clouds, Grids, and On-Premise Resources

11:40

Anne Benoit

Resilient application co-scheduling with processor redistribution

12:00

Frank Mueller

Mini-Ckpts: Surviving OS Failures in Persistent Memory

12:20

Fran Berman

Sustaining the Data Ecosystem

12:40

Emmanuel Jeannot

Towards System-Scale Optimization of HPC Applications

1:00  - 2:00

Lunch break

 

2:30 – 3:00

Coffee

 

3:00 - 5:20

Session Chair: Fran Berman

(6 talks – 20 minutes each)

3:00

Barbara Chapman

?

3:20

Rusty Lusk

From Automated Theorem Proving to Nuclear Structure Analysis with Self-Scheduled Task Parallelism

3:40

Tony Hey

The Revolution in Experimental and Observational Science: The Convergence of Data-Intensive and Compute-Intensive Infrastructure

4:00

Franck Cappello

Lossy Compression of scientific data: from Stone Age to Renaissance

4:20

Al Geist          

Are Killer Apps Killing Exascale?

4:40

Yves Robert

Some recent results about resilience and/or co-scheduling for large-scale systems

6:30 – 7:30

Organic wine tasting in the « salon des contes » where we had the first welcome gathering

 

8:00 – 9:00

Dinner

                                        

 

Wednesday, October 5th

 

 

7:30 - 8:30

Breakfast

 

8:30 - 10:10

Session Chair: Emmanuel Jeannot

 (5 talks – 20 minutes each)

8:30

Bill Gropp

Do You Know What Your I/O is Doing?

8:50

Andrew Grimshaw

PCubeS/IT - A Type Architecture and Portable Parallel Language for Hierarchical Parallel Machines

9:10

Ron Brightwell

Embracing Diversity: OS Support for Integrating High-Performance Computing and Data Analytics

9:30

Joel Saltz

Convergence of Data and Computation:  Integration of Sensors and Simulation

9:50

Laura Grigori

Low rank approximation and write avoiding algorithms

10:10 -10:40

Coffee

 

10:40 - 12:20

Session Chair: Laurent Lefevre

 (5 talks – 20 minutes each)

10:40

Samuel Thibault

Task-graph-based applications, from theory to exascale?

11:00

Michela Taufer

In Situ Data Analysis of Protein Trajectories

11:20

Jeff Hollingsworth

Handling Phase Behavior of Parallel Programs

11:40

Haohuan Fu

Refactoring and Optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer

12:00

David Walker

Morton Ordering of 2D Arrays for Parallelism and Efficient Access to Hierarchical Memory

12:20  - 2:00

Lunch

 

2:00 – 4:00

Session Chair: Michela Taufer

(4 talks – 20 minutes each)

2:00

Dimitrios Nikolopoulos

Computational Significance and its Implications for HPC

2:20

Rob Ross

From File Systems to Services: Changing the Data Management Model in HPC

2:40

Torsten Hoefler

Automatic GPU compilation and why you want to run MPI on your GPU

3:00

Manish Parashar

Experiments with Software-Defined Environments for Science

3:20 – 4:00

Coffee

 

4:00 – 6:00

Session Chair: Rosa Badia

(4 talks – 20 minutes each)

4:00

Jeff Vetter

Performance Portability for Extreme Scale High Performance Computing

4:20

Bernd Mohr

POP -- Parallel Performance Analysis and Tuning as a Service

4:40

Padma Raghavan

Synchronization, Load-Balancing and Redundant Calculations:  Finding the Sweet Spot of High Performance Computing

5:00

Carl Kesselman

Scientific Data Asset Management

8:00 – 9:00

Dinner

 

9:00 pm -

 

 

 

 

Thursday, October 6th

 

 

7:30 - 8:30

Breakfast

 

8:30 - 10:10

Session Chair: Padma Raghavan

 (5 talks – 20 minutes each)

8:30

Ilkay Altintas

Workflows as an Operation Tool for Scientific Computing using Data Science

8:50

Christian Obrecht

On a Novel Method for High Performance Computational Fluid Dynamics

9:10

Anthony Danalis

Dataflow programming: Do we need it for exascale?

9:30

Pavan Balaji

How I Learned to Stop Worrying about Exascale and Love MPI

9:50

Martin Swany

Offloading Collective Operations to Programmable Logic

10:10 -10:40

Coffee

 

10:40  - 12:20

Session Chair:  Jack Dongarra

 (3 talks – 20 minutes each)

10:40

Frederic Desprez

BOAST: Performance Portability Using Meta-Programming and Auto-Tuning

11:00

Frederic Vivien

Multi-level checkpointing and silent error detection

11:20

Minh Quan Ho

LBM 3D stencil memory bound improvement with many-core processors asynchronous transfers

11:40

 

 

12:00  - 1:30

Lunch

 

1:30

Depart

 

 

 

 


 

Arrival / Departure Information:

 

Here is some information on the meeting in Lyon.  We have updated the workshop webpage http://bit.ly/ccdsc-2016 with the workshop agenda.

 

On Monday October 3rd there will be a bus to pick up participants at Lyon's Saint Exupéry (formerly Satolas) Airport at 3:00. Note that the Saint Exupéry airport has its own train station with direct TGV connections to Paris via Charles de Gaulle. If you arrive by train at Saint Exupéry airport, please go to the airport meeting point (point-rencontre), on the second floor, next to the shuttles, near the hallway between the two terminals; see http://www.lyonaeroports.com/en/practicals-informations/information-points .

 

 

The bus will be at the TGV station, which is reached via a long corridor from the airport terminal. The bus stop is near the station entrance, in the parking lot called "depose minute".

 

 

The bus will then travel to the Lyon Part Dieu railway station to pick up people at 4:45. (There are two train stations in Lyon; you want the Part Dieu station, not the Perrache station.) There will be someone with a sign at the "Meeting Point/point de rencontre" of the station to direct you to the bus.

 

The bus is expected to arrive at La Maison des Contes around 5:30. We would like to hold the first session on Monday evening from 6:30 pm to 8:00 pm, with dinner following the session. La Maison des Contes is about 43 km from Lyon. For a map, go to http://maps.google.com and type in: “427 chemin de Chanzé 69490 Dareizé”

 

VERY IMPORTANT: Please send your arrival and departure times to Jack so we can arrange the appropriate size bus for transportation.  VERY VERY IMPORTANT: If your flight is such that you will miss the bus on Monday, October 3rd at 3:00, send Bernard your flight arrival information so he can arrange for transportation to pick you up at the train station or the airport in Lyon. It turns out that a taxi from Lyon to the Chateau can cost as much as 100 Euro, and the Chateau may be hard to find at night if you rent a car and are not a French driver :-). 

 

At the end of the meeting on Thursday afternoon, we will arrange for a bus to transport people to the train station and airport. If you are catching an early flight on the morning of Friday October 7th, you may want to stay at the hotel located at Lyon's Saint Exupéry Airport;

see http://www.lyonaeroports.com/eng/Shops-facilities/Hotels for details.

There are also many hotels in Lyon area, see: http://www.en.lyon-france.com/

 

Due to room constraints at La Maison des Contes, we ask that you not bring a guest. Dress at the workshop is informal.  Please tell us if you have special requirements (vegetarian food, etc.). We are expecting to have internet and wireless connections at the meeting, but you know this is France.

 

Please send this information to Jack (dongarra@icl.utk.edu) by August 5th.

Name:

Institute:

Title:

Abstract:

Participant’s brief biography:

 


 

 

Arrival / Departure Details:

 

 

 

Name | Arrival in Lyon | Departure from Lyon | Special

Ilkay Altintas | 10/3 10:25 UA 8024 | 10/7 12:00 UA 8067 |
Rosa Badia | 10/3 13:55 U2 4417 | 10/6 19:00 VY 1223 |
Pavan Balaji | 10/3 14:10 UA 8916 | 10/7 7:20 UA 8881 |
Pete Beckman | 10/3 14:10 UA 8916 | 10/6 Train |
Anne Benoit | Drive | Drive |
Fran Berman | 10/3 11:00 BR 3587 | 10/7 9:00 LH 2247 |
Bill Brantley | 10/3 10:55 TA 1807 | 10/6 18:20 TA 1810 |
Ron Brightwell | 10/3 13:20 DL 9515 | 10/7 6:40 DL 8611 |
Franck Cappello | 10/3 14:00 Part Dieu | 10/7 16:57 Part Dieu |
Barbara Chapman | 10/3 Part Dieu | 10/4 21:00 |
Alok Choudhary | 10/3 13:20 KL 1417 | 10/6 19:05 LH 1079 |
Joe Curley | 10/3 13:00 Part Dieu | 10/6 |
Anthony Danalis | 10/3 11:15 DL 8320 | 10/6 16:25 |
Ewa Deelman | Part Dieu | 10/6 19:30 to LHR |
Patrick Demichel | Drive | Drive |
Frederic Desprez | Drive 10/5 | Drive 10/6 |
Jack Dongarra | 10/3 11:15 DL 8320 | 10/7 6:40 DL 8611 |
Ian Foster | 10/3 14:10 UA 8916 | 10/7 9:00 UA 9486 | depart 10/6 10:00
Geoffrey Fox | 10/3 13:20 KL 1417 | 10/7 10:05 KL 1414 |
Haohuan Fu | 10/3 10:20 LH 1074 | 10/6 19:05 LH 1079 |
Al Geist | 10/3 11:15 DL 8320 | 10/7 6:40 DL 8611 |
Rich Graham | 10/3 10:20 UN 8914 | 10/7 9:00 UA 9486 |
Laura Grigori | Part Dieu | Departs Wednesday 10/4 |
Andrew Grimshaw | 10/3 12:15 LH 2248 | 10/6 17:05 LH 2251 |
Bill Gropp | 10/3 train Part Dieu | 10/7 flights |
Tony Hey | 10/3 18:35 BA 362 | 10/5 19:30 BA 363 | pickup: Roland Taxi, cellular +33 6 08 25 55 46
Minh Quan Ho | Part Dieu | Part Dieu |
Torsten Hoefler | 10/3 Part Dieu 13:30 from Geneva | Part Dieu |
Jeff Hollingsworth | 10/4 12:15 UA 9487 | 10/7 9:00 UA 9486 |
Emmanuel Jeannot | 10/3 14:30 AF 5372 | 10/6 15:15 AS 4126 | depart 10/6 12:00
Carl Kesselman | 10/3 14:10 LH 1076 | 10/6 13:55 AF 8285 | depart 10/6 10:00
Laurent Lefevre | Drive 10/3 20:00 | Drive 10/6 5:00 | vegetarian (no meat, but fish, milk, eggs OK)
Rusty Lusk | 10/3 14:10 UA 8916 | 10/6 Train |
Doug Miles | 10/3 bus from airport | 10/6 bus to airport |
Bernd Mohr | 10/3 Part Dieu TGV 9826 14:00 | 10/6 Part Dieu TGV 6622 15:00 |
Frank Mueller | 10/3 14:10 LH 1076 | 10/6 17:05 LH 2251 |
Dimitrios Nikolopoulos | 10/3 15:05 EI 552 | 10/6 15:45 EI 1553 | depart 10/6 12:00
Christian Obrecht | Drive | Drive |
Manish Parashar | 10/3 12:15 UA 9487 | 10/7 7:20 UA 8881 | Vegetarian
Padma Raghavan | 10/3 11:00 UA 9959 | 10/7 10:00 UA 973 |
Yves Robert | Drive | Drive |
Rob Ross | 10/3 14:00 UA 8916 | 10/7 7:20 UA 8881 |
Gunter Roeth | 10/3 Drive | 10/5 Drive |
Joel Saltz | 10/3 10:20 LH 1074 | 10/6 14:25 LH 1077 | depart 10/6 10:00
Vaidy Sunderam | 10/3 13:20 DL 9515 | 10/7 10:05 DL 9468 |
Martin Swany | 10/3 13:20 DL 9515 | 10/7 10:05 DL 9468 |
Michela Taufer | Part Dieu | 10/6 19:30 |
Samuel Thibault | 10/3 14:30 AF 5372 | 10/6 20:20 AF 5383 |
Bernard Tourancheau | Make the airport bus at 15:00 | 10/6 16:00 |
Jeff Vetter | 10/3 9:45 AF 7640 | 10/7 6:40 AF 7651 |
Frederic Vivien | Drive | Drive |
David Walker | 10/3 13:20 KL 1417 | 10/6 18:20 KL 1416 |
 

 

 

 

 

 

 

 

 


 

Abstracts:

 

 

Ilkay Altintas, UCSD

Workflows as an Operation Tool for Scientific Computing using Data Science

 

Workflows are used by many scientific communities to capture, automate and standardize computational and data practices in science. In addition to the earlier use of workflows in HPC and HTC applications, they present an opportunity to operationalize dynamic data-driven solutions in which big data systems are merged with existing big data and cloud solutions, especially in scenarios where a scalable and reusable integration of streaming data, analytical tools and computational infrastructure is needed. This talk will focus on using workflows as a scalable and reproducible programming model for data streaming and steering within dynamic data-driven applications, e.g., wildfire prediction, smart manufacturing, smart grids and traffic control. A summary of our ongoing research efforts on using data science techniques for end-to-end performance prediction and dynamic steering of workflow-driven applications will also be presented.

 

 

Rosa M Badia, Barcelona Supercomputing Center

Task-based programming in COMPSs to converge from HPC to Big Data

 

Task-based programming has proven to be a suitable model for HPC applications. The different instances of StarSs have been good demonstrators of this and have promoted the acceptance of task-based programming in the OpenMP standard.  Big Data programming models have been dominated by approaches like MapReduce/Hadoop or Spark, which define a set of operators to be used by the applications.  Since COMPSs is the StarSs instance that tackles distributed computing (including Clouds), it can be considered as a way to provide a task-based programming model for Big Data applications.

The talk will describe why we consider that task-based programming models are a good approach for Big Data and will compare examples between COMPSs and Apache Spark, including performance results.

 

 

Pavan Balaji, ANL

How I Learned to Stop Worrying about Exascale and Love MPI

 

 

Pete Beckman, ANL

WaggleVision

 

Sensors and embedded computing devices are being woven into buildings, roads, household appliances, and light bulbs. Most sensors and actuators are designed to be as simple as possible, with low-power microprocessors that just push sensor values up to the cloud.  However, another class of powerful, programmable sensor node is emerging.  The Waggle (www.wa8.gl) platform supports parallel computing, machine learning, and computer vision for advanced intelligent sensing applications. Waggle is an open source and open hardware project at Argonne National Laboratory that has developed a novel wireless sensor system to enable a new breed of smart city research and sensor-driven environmental science. Leveraging machine learning tools such as Google’s TensorFlow and Berkeley’s Caffe and computer vision packages such as OpenCV, Waggle sensors can understand their surroundings while also measuring air quality and environmental conditions.  Waggle is the core technology for the Chicago ArrayOfThings (AoT) project (https://arrayofthings.github.io). The AoT will deploy 500 Waggle-based nodes on the streets of Chicago beginning in 2016. Prototype versions are already deployed on a couple of campuses. The presentation will outline the progress of designing and deploying the platform, and our progress on research topics in computer science, including parallel computing, operating system resilience, data aggregation, and HPC modeling and simulation.

 

 

Anne Benoit, ENS Lyon, France

Resilient application co-scheduling with processor redistribution

 

Recently, the benefits of co-scheduling several applications have been demonstrated in a fault-free context, both in terms of performance and energy savings.  However, large-scale computer systems are confronted with frequent failures, and resilience techniques must be employed to ensure the completion of large applications.  Indeed, failures may create severe imbalance between applications and significantly degrade performance. We propose to redistribute the resources assigned to each application when failures strike, in order to minimize the expected completion time of a set of co-scheduled applications.  First, we introduce a formal model and establish complexity results. When no redistribution is allowed, we can minimize the expected completion time in polynomial time, while the problem becomes NP-complete with redistributions, even in a fault-free context. Therefore, we design polynomial-time heuristics that perform redistributions and account for processor failures. A fault simulator is used to perform extensive simulations that demonstrate the usefulness of redistribution and the performance of the proposed heuristics.

 

 

Fran Berman, RPI

Sustaining the Data Ecosystem

 

Innovation in a digital world presupposes that the data will be there when we need it, but will it? Without enabling technical infrastructure, supporting social infrastructure, and sufficient attention to the stewardship and long-term preservation of digital data, data may become inaccessible or lost. This is particularly critical for data generated by sponsored research projects where the focus is typically on innovation rather than infrastructure, and support for stewardship and preservation may be short-term.   In this talk, we provide a holistic perspective on the opportunities and challenges involved in creating a sustainable data ecosystem to drive data-driven innovation for current and future applications.

 

 

Ron Brightwell, Sandia Labs

Embracing Diversity: OS Support for Integrating High-Performance Computing and Data Analytics

 

It is unlikely that one operating system or a single software stack will support the emerging and future needs of the high-performance computing and high-performance data analytics applications. There are many technical and non-technical reasons why functional partitioning through customized software stacks will continue to persist. Rather than pursuing approaches that constrain the ability to provide a system software environment that satisfies a diverse and competing set of requirements, methods and interfaces that enable the use and integration of multiple software stacks should be pursued. This talk will describe the challenges that motivate the need to support multiple concurrent software stacks for enabling application composition, more complex application workflows, and a potentially richer set of usage models for extreme-scale high-performance computing systems. The Hobbes project led by Sandia National Laboratories has been exploring operating system infrastructure for supporting multiple concurrent software stacks. This talk will describe this infrastructure, relevant interfaces, and highlight issues that motivate future exploration.

 

 

 

Franck Cappello, ANL

Lossy Compression of scientific data: from Stone Age to Renaissance

 

Data reduction is already necessary for many scientific simulations and experiments.  Exascale simulations and upgrades of large-scale instruments will require significantly more reduction. Compression is one of the fundamental techniques that can help address this challenge. Can we compress floating-point datasets more than with Gzip or JPEG (Stone Age)? The answer is yes, with the best state-of-the-art compressors (Renaissance). But it's not easy. Good compressors are intricate machineries optimizing multiple objectives: compression factors, respect of error bounds, compression speed, decompression speed, etc. In this talk, I will present the best-in-class lossy compressor for floating-point datasets, strictly respecting user-set error bounds. Can we declare victory?  No, some datasets are "hard to compress". I will present our understanding of them and techniques to improve their compression.  Are users ready to use lossy compressors?  Well, there is no consensus here: one key to adoption is understanding the controls that users need on the compression errors.

 

 

Alok N. Choudhary, Northwestern University

Scaling Resiliency via machine learning and compression

 

Checkpoint-restart is the most common fault-tolerance technique in High Performance Computing (HPC) systems: the full state of the machine is written to stable storage, and execution restarts from the last checkpoint. As HPC systems move towards exascale, the external storage space and the time and power costs of moving data off the system with traditional checkpointing threaten to overwhelm not only the simulations but also the post-simulation data analysis. One conventional practice to address this problem is to apply data compression in order to achieve data reduction. However, most lossless compression techniques that look for repeated patterns are ineffective for scientific data, since with high-precision data, common patterns are rare.

In this talk, we present machine learning techniques that learn the distribution of changes in state values of simulations, and an algorithm that significantly compresses data with guaranteed point-wise error bounds. Capturing the distribution of relative changes in pair-wise data elements instead of storing the data itself provides an opportunity to incorporate the temporal dimension of the data and learn the distribution evolution of the changes. The algorithm consists of the following steps. (1) Similar to forward predictive coding in video compression, it first computes the relative change in data values between two consecutive timestamps or iterations. (2) A machine learning-based data approximation algorithm will be designed to transform the change distribution into groups (an index domain with a much smaller space required), and encode data points by their group indices. (3) As each data point is approximated by its corresponding group index value, our approach allows a program restart or post-simulation analysis to use the compressed data with a controlled approximation.
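The three numbered steps above can be sketched end to end. The following is a hypothetical illustration, not the authors' implementation: it quantizes per-timestep deltas into uniform group indices of width 2*err (a much simpler "grouping" than the learned distribution described in the talk), which is enough to see how a guaranteed point-wise error bound survives restart or post-simulation use of the compressed data.

```python
import numpy as np

def compress(snapshots, err):
    """Quantize per-timestep deltas into group indices with |error| <= err.

    Deltas are computed against the *reconstructed* previous state, so the
    quantization error does not accumulate across timesteps.
    """
    prev = snapshots[0].astype(np.float64)
    streams = []
    for snap in snapshots[1:]:
        delta = snap - prev                                  # step (1): change vs. previous state
        idx = np.round(delta / (2 * err)).astype(np.int64)   # step (2): map change to a group index
        streams.append(idx)
        prev = prev + idx * (2 * err)                        # step (3): track the approximation
    return snapshots[0].copy(), streams

def decompress(first, streams, err):
    # Rebuild every snapshot by accumulating dequantized deltas.
    out = [first.astype(np.float64)]
    for idx in streams:
        out.append(out[-1] + idx * (2 * err))
    return out
```

The integer index streams are far more repetitive than the raw floating-point values, so a standard entropy coder applied afterwards would achieve the actual size reduction.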

 

 

Joe Curley, Intel

Tales of Parallel App Enabling on the Path to Exascale

 

 

Anthony Danalis, UTK

Dataflow programming: Do we need it for exascale?

 

Task based execution has been growing in popularity in the last few years. Several new runtime systems are being actively developed and some more traditional ones are adding tasking support. However, task execution and dataflow programming are not the same thing. This talk will discuss the differences between the two and examine the pros and cons of the different ways of supporting task based parallelism. The presentation will also attempt to extrapolate into the future of high performance computing and examine the possibility of tighter integration between the different layers of the stack, namely the runtime, compiler and the human developer.

 

 

Ewa Deelman, ISI

What is missing in workflow technologies

 

This talk will look at the current state of workflow technologies. It will give examples from the Pegasus Workflow Management System and describe current capabilities. Examples will be drawn from a variety of applications in astronomy, gravitational-wave physics, earthquake science and bioinformatics. These workflows are executing on heterogeneous environments including clouds, HPC, and HTC resources. Finally,  the talk will lay out the missing capabilities that are necessary to broaden the use of workflows in science.

 

 

Patrick Demichel, HP

The future of IT technologies

 

Our industry is in perpetual technological transformation, even acceleration, with an immense appetite for ever more compute, storage, network, applications, etc.  We now face an immense opportunity with the Internet of Things and all its potential to solve many challenges of our society; but for the first time, our industry is showing symptoms of reaching some fundamental limits of scaling. We therefore need a more drastic transformation. We will see what technologies will help us continue on the road towards the massively distributed Exascale systems we envision, and all the technologies we need to develop.

 

 

Frederic Desprez, INRIA

BOAST: Performance Portability Using Meta-Programming and Auto-Tuning

 

Performance portability of HPC applications is of paramount importance, but tedious and costly in terms of human resources. Unfortunately, those efforts are often lost when migrating to new architectures, as optimizations are not generally applicable. In the Mont-Blanc European project we tackle this problem from several angles. One of them is using a task-based runtime (OmpSs) to get adaptive scientific applications. Another is promoting scientific application auto-tuning. Unfortunately, the investment to set up a dedicated auto-tuning framework is usually too expensive for a single application. Source-to-source transformations and compiler-based solutions exist but sometimes prove too restrictive to cover all use cases.

 

We thus propose BOAST, a meta-programming framework aiming at generating parametrized source code. The aim is for the programmer to be able to express optimizations on a computing kernel orthogonally, enabling a thorough search of the optimization space. This also allows a lot of code factorization and thus code base reduction. We will demonstrate the use of BOAST on a classical Laplace kernel, showing how our embedded DSL allows the description of non-trivial optimizations. We will also show how the BOAST framework enables performance and non-regression tests to be performed on the generated code versions, resulting in proven and efficient computing kernels on several architectures.
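As a rough illustration of the meta-programming idea (written here in Python purely for compactness; it is not BOAST's actual DSL or API), one can generate a kernel parametrized by an unrolling factor and then sweep the parameter space, keeping the fastest variant:

```python
import time

def gen_sum_kernel(unroll):
    """Emit and compile a summation kernel whose inner loop is unrolled `unroll` times."""
    body = " + ".join(f"a[i + {u}]" for u in range(unroll))
    src = (
        f"def kernel(a):\n"
        f"    s = 0.0\n"
        f"    n = len(a) - len(a) % {unroll}\n"
        f"    for i in range(0, n, {unroll}):\n"
        f"        s += {body}\n"
        f"    for i in range(n, len(a)):\n"  # remainder loop for leftover elements
        f"        s += a[i]\n"
        f"    return s\n"
    )
    ns = {}
    exec(src, ns)          # "compile" the generated source into a callable
    return ns["kernel"]

def autotune(data, candidates=(1, 2, 4, 8)):
    """Time every generated variant on the target data and return the best parameter."""
    best, best_t = None, float("inf")
    for unroll in candidates:
        k = gen_sum_kernel(unroll)
        t0 = time.perf_counter()
        k(data)
        t = time.perf_counter() - t0
        if t < best_t:
            best, best_t = unroll, t
    return best
```

The point is that the optimization (unrolling) lives in the generator, orthogonal to the kernel's logic, so every variant comes from one factorized code base and can be regression-tested against a reference result.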

 

 

Ian Foster, U of Chicago and ANL

New Directions in Globus: Collections, Responsive Storage, and Safe Data 

 

The Globus team has spent the past five years developing new cloud software-as-a-service approaches to research data management. This work has produced powerful cloud services and a widely deployed network of more than 10,000 Globus endpoints that together enable ubiquitous, secure, and efficient access to large quantities of scientific data. We are now investigating how we can build on this distributed infrastructure to automate further research data management tasks, such as mapping distributed data collections, detecting and responding to storage system events, and collaborative analysis of sensitive information. I present here some of the use cases that motivate this work, the ideas that we are exploring, and early results.

 

 

Geoffrey Fox, Indiana University

Distinguishing Parallel and Distributed Computing Performance

 

We are pursuing the concept of HPC-ABDS -- High Performance Computing Enhanced Apache Big Data Stack -- where we try to blend the usability and functionality of the community big data stack with the performance of HPC. Here we examine major Apache Programming environments including Spark, Flink, Hadoop, Storm, Heron and Beam. We suggest that parallel and distributed computing often implement similar concepts (such as reduction, communication or dataflow) but that these need to be implemented differently to respect the different performance, fault-tolerance, synchronization, and execution flexibility requirements of parallel and distributed programs. We present early results on the HPC-ABDS strategy of implementing best-practice runtimes for both these computing paradigms in major Apache environments.

 

 

Haohuan Fu, Tsinghua University

Refactoring and Optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer

 

This paper reports our efforts on refactoring and optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight supercomputer, which uses a many-core processor that consists of management processing elements (MPEs) and clusters of computing processing elements (CPEs). To map the large code base of CAM to the millions of cores on the Sunway system, we take OpenACC-based refactoring as the major approach, and apply source-to-source translator tools to exploit the most suitable parallelism for the CPE cluster, and to fit the intermediate variables into the limited on-chip fast buffer. For individual kernels, when comparing the original ported version using only MPEs and the refactored version using both the MPE and CPE clusters, we achieve up to 22x speedup for the compute-intensive kernels. For the 25km resolution CAM global model, we manage to scale to 24,000 MPEs, and 1,536,000 CPEs, and achieve a simulation speed of 2.81 model years per day.

 

 

Al Geist, ORNL

Are Killer Apps Killing Exascale?

 

In 2009 the goal was to get to exascale by 2018. In 2013 the goal slipped to 2020. Today the U.S. Exascale Computing Project is targeting 2023 for the first U.S. exascale computer. Is it technology, politics, or the lack of any compelling killer apps that is pushing out the target date for exascale? This talk examines all three of these reasons and shows that while technology and politics play a role, it is the lack of even a single killer exascale app that is killing exascale.

 

 

Rich Graham, Mellanox

The Active Network

 

 

Laura Grigori, INRIA

Low rank approximation and write avoiding algorithms

 

In this talk we will discuss algorithms for computing a low rank approximation of a matrix. This problem has numerous and diverse applications, ranging from scientific computing problems such as fast solvers for integral equations to data analytics problems such as principal component analysis (PCA) and image processing. We discuss various approaches for computing such an approximation that trade off speed against deterministic or probabilistic accuracy guarantees.
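To illustrate the probabilistic end of this trade-off, the textbook randomized range-finder idea can be sketched in a few lines of plain Python. This is a generic sketch only; the specific algorithms covered in the talk are not stated in the abstract, and the helper names below are invented for the example.

```python
import random

def matmul(A, B):
    """Naive dense matrix product (matrices as lists of rows)."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthonormalize(vectors):
    """Modified Gram-Schmidt; drops (numerically) dependent vectors."""
    Q = []
    for v in vectors:
        w = v[:]
        for q in Q:
            proj = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > 1e-12:
            Q.append([wi / norm for wi in w])
    return Q

def randomized_low_rank(A, rank, oversample=5, seed=0):
    """Return (Q, B) with A ~= Q @ B, via a randomized range finder:
    sample the range of A with Gaussian test vectors, orthonormalize."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    k = rank + oversample
    Omega = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(m)]  # m x k
    Y = matmul(A, Omega)                    # n x k sample of range(A)
    Q_cols = orthonormalize(transpose(Y))   # orthonormal basis for range(Y)
    Q = transpose(Q_cols)                   # n x k'
    B = matmul(transpose(Q), A)             # k' x m
    return Q, B
```

For a matrix of exact rank r, the sampled basis Q captures the range of A almost surely and A ≈ Q (Q^T A) recovers it; for general matrices, the oversampling parameter controls the probabilistic accuracy, which is the speed-versus-accuracy trade-off the abstract refers to.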

We also discuss the need to avoid writes, motivated by emerging memory technologies such as nonvolatile memories. Writes can be much more expensive than reads in some current and emerging technologies, in terms of both time and energy. This motivates us to study algorithms that reduce the number of writes, in addition to more generally reducing communication between processors and between different levels of the memory hierarchy.

 

 

William Gropp, University of Illinois at Urbana-Champaign

Do You Know What Your I/O is Doing?

 

Even though supercomputers are typically described in terms of their floating point performance, science applications also need significant I/O performance for all parts of the science workflow.  This ranges from reading input data, to writing simulation output, to conducting analysis across years of simulation data.  This talk presents recent data on the use of I/O at several supercomputing centers and what that suggests about the challenges and open problems in I/O on HPC systems.  The talk concludes with an example of how I/O performance can be improved in an application.

 

 

Andrew Grimshaw, U of Virginia

PCubeS/IT - A Type Architecture and Portable Parallel Language for Hierarchical Parallel Machines

 

Writing portable parallel applications remains a challenge, particularly in the presence of increasingly heterogeneous and deep node architectures. Achieving good performance is especially challenging for rookie parallel programmers who lack the experience of optimizing performance on different types of hardware. Snyder, in his seminal work “Type Architectures, Shared Memory, and the Corollary of Modest Potential,” argues that a reflection of the salient features of an architecture in the programming language is necessary. The PCubeS type architecture represents a parallel machine as a finite hierarchy of parallel processing spaces each having fixed, possibly zero, compute and memory capacities and containing a finite set of uniform, independent sub-spaces.

 

The IT language is a PCubeS language in which computations are defined to take place in a corresponding hierarchy of logical processing spaces, each of which may impose a different partitioning of data structures. The programmer is responsible for decomposing the problem into multiple spaces, selecting the best decomposition of variables in each space, and for mapping the logical IT spaces to the physical spaces of the target machine. Only the last step, mapping logical to physical spaces, is different for each target machine. The rest of the IT program remains the same for different target machines.

 

The IT compiler and run-time system are responsible for breaking up and executing the code for each logical space on the specified hardware space and managing all communication and synchronization between partitions within a space and between different logical spaces. Two IT compilers have been completed: a multicore compiler and a distributed memory/multicore compiler that uses MPI for inter-host communication. A third compiler, adding GPGPU support to the distributed memory/multicore compiler, is in development.

 

In this talk I will briefly review what makes efficient parallel programming difficult, and why the problem is only getting worse as machines are developed with deeper and deeper memory hierarchies and more and more node heterogeneity. I will then introduce the PCubeS type architecture and the IT programming language as a mechanism for addressing the efficient, portable parallel programming problem via a series of sample programs. Finally, I will present performance results for several application kernels on multicore, distributed memory/multicore, and distributed memory/GPGPU platforms.

 

 

Tony Hey, Science and Technology Facilities Council, UK

The Revolution in Experimental and Observational Science: The Convergence of Data-Intensive and Compute-Intensive Infrastructure

 

The revolution in Experimental and Observational Science (EOS) is being driven by the new generation of facilities and instruments, and by dramatic advances in detector technology. In addition, the experiments now being performed at large-scale facilities, such as the Diamond Light Source in the UK and the Argonne Advanced Photon Source in the US, are becoming increasingly complex, often requiring advanced computational modelling to interpret the results. There is also an increasing requirement for the facilities to provide near real-time feedback on the progress of an experiment as the data is being collected. A final complexity comes from the need to understand multi-modal data, which combines data from several different experiments on the same instrument or data from several different instruments. All of these trends require a closer coupling between data and compute resources.

 

 

Minh Quan HO, Laboratoire Informatique de Grenoble (LIG)

LBM 3D stencil memory bound improvement with many-core processors asynchronous transfers

 

State-of-the-art academic and industrial many-core processors present an alternative to mainstream CPU and GPU processors. In particular, the 93-Petaflops Sunway supercomputer, built from NoC-based many-core processors, has opened a new era for high performance computing that does not rely on GPU acceleration. However, memory bandwidth remains the main challenge for these architectures. This motivates our approach for optimizing 3D Lattice Boltzmann Method (LBM) applications, one of the most data-intensive kinds of stencil computations, on many-core processors by using local memory and asynchronous software prefetching. A representative 3D LBM solver is taken as an example. We achieve a 33% performance gain on the Kalray MPPA-256 processor by actively prefetching data, compared to a "passive" programming model (OpenCL). We also introduce two-wall, a new LBM propagation algorithm which performs in-place lattice updates. This method cuts the memory requirement in half, reduces bandwidth losses in copying halo cells, and offers another 5% improvement, delivering an overall 38% performance gain.

 

 

Torsten Hoefler, ETH Zurich

Automatic GPU compilation and why you want to run MPI on your GPU

 

Auto-parallelization of programs that have not been developed with parallelism in mind is one of the holy grails in computer science.  It requires understanding the source code's data flow to automatically distribute the data, parallelize the computations, and infer synchronizations where necessary. We will discuss our new LLVM-based research compiler Polly-ACC that enables automatic compilation to accelerator devices such as GPUs. Unfortunately, its applicability is limited to codes for which the iteration space and all accesses can be described as affine functions. In the second part of the talk, we will discuss dCUDA, a way to express parallel codes in MPI-RMA, a well-known communication library, to map them automatically to GPU clusters. The dCUDA approach enables simple and portable programming across heterogeneous devices due to programmer-specified locality. Furthermore, dCUDA enables hardware-supported overlap of computation and communication and is applicable to next-generation technologies such as NVLINK. We will demonstrate encouraging initial results and show limitations of current devices in order to start a discussion.

 

 

Jeff Hollingsworth, U Maryland

Handling Phase Behavior of Parallel Programs

 

The execution of high-performance computing applications can often be classified into phases with distinct behavior. These periods of time are relatively short and create problems for many tools. One type of tool that can be confused by short phases is the auto-tuner. In particular, optimal parameters for one phase may not be optimal for another. I will describe how run-time phase annotations can be combined with an online auto-tuner such as Active Harmony. This allows the auto-tuner to reassemble the disjoint (and possibly interleaved) execution phases of an application and optimize each as if it were a whole, uninterrupted tuning target. I will present some results of phase analysis and talk about future challenges as auto-tuning is scaled up to larger systems.
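To make the phase problem concrete, here is a toy per-phase tuner in Python: each annotated phase gets its own search state, so interleaved phases converge to different parameter values. This is an invented illustration of the idea only; it does not use the Active Harmony API, and the class name and exhaustive-sweep strategy are assumptions for the sketch.

```python
class PhaseAwareTuner:
    """Keep independent tuning state per phase annotation: sweep a fixed
    set of candidate parameter values per phase, then lock in the best."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.state = {}  # phase name -> {'trials', 'next', 'best'}

    def suggest(self, phase):
        st = self.state.setdefault(
            phase, {'trials': {}, 'next': 0, 'best': None})
        if st['next'] < len(self.candidates):   # still exploring this phase
            return self.candidates[st['next']]
        return st['best']                       # converged for this phase

    def report(self, phase, param, runtime):
        st = self.state[phase]
        if st['best'] is not None:
            return  # already converged; ignore further measurements
        st['trials'][param] = runtime
        st['next'] += 1
        if st['next'] >= len(self.candidates):
            st['best'] = min(st['trials'], key=st['trials'].get)
```

Because the state is keyed by the phase annotation, the tuner effectively reassembles each interleaved phase into its own uninterrupted tuning target, as the abstract describes.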

 

 

Emmanuel Jeannot, INRIA Bordeaux

Towards System-Scale Optimization of HPC Applications

 

TADaaM (Topology Aware DAtA Management) is a new Inria project team targeting the optimization of HPC applications, taking into account the topology, the affinity, the memory hierarchy, the network contention, the input data, and other factors impacting performance. In this talk, we will give an overview of the problems we want to address and examples of concrete research issues we are looking at, as well as the set of existing software and results that will form the basis of this project for the coming years. We welcome comments and feedback about this project, as well as collaborations and use cases.

 

 

Carl Kesselman, ISI

Scientific Data Asset Management

 

In his seminal 1960 paper "Man-Computer Symbiosis," J. C. R. Licklider observed: "my choices of what to attempt and what not to attempt [are] determined to an embarrassingly great extent by considerations of clerical feasibility, not intellectual capability." Today, with the advances in data-driven discovery enabled by big data, the internet of things, and the associated data deluge, the situation more than fifty years later is, if anything, worse, not better. One could argue that the dismal rates of scientific repeatability are attributable at least in part to this situation. Issues associated with the costs and complexity of data management are rife across all scientific disciplines, yet surprisingly, almost no tools exist to address this problem. In my talk, I will introduce the idea of scientific data asset management as a missing element of the scientific software ecosystem and illustrate how common concepts and tools can be developed and applied across many diverse use cases to significantly streamline the process of data-driven discovery.

 

 

 

Laurent Lefevre, Inria

GreenFactory: orchestrating power capabilities and levers at large scale for energy-efficient infrastructures

 

With hardware improvements, several levers over power and energy capabilities are now available (shutdown, slowdown, ...). Exploiting such capabilities at large scale remains a real challenge, as some of them impact performance or cooling, or pursue contradictory objectives. We will present models of several such levers and show how orchestrating these capabilities can improve the energy efficiency of large-scale infrastructures and applications.

 

 

Rusty Lusk, Argonne National Laboratory

From Automated Theorem Proving to Nuclear Structure Analysis with Self-Scheduled Task Parallelism

 

Long ago, we tried, with some success, to parallelize the automated theorem proving system we were working on at the time.  Theorem proving is a challenging application for parallelism because of its unpredictable paths forward and irregular subtask sizes.  The technique we used then, before there was such a thing as a taxonomy of parallel programming models, is now perhaps worthy of a name; we propose “self-scheduled task parallelism,” which is different from many current “task parallel” models and systems.  In this talk I will introduce the central idea in the context of automated theorem proving and then show how it has been modernized and adapted for the exascale age, with particular application to large-scale Monte Carlo calculations of nuclear structure.
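The central idea can be sketched with Python's standard library: a shared queue from which idle workers pull tasks, where any task may push new subtasks back onto the queue. This handles unpredictable paths and irregular subtask sizes naturally, since scheduling decisions are made as work appears. It is a generic sketch of the model only, not the theorem-prover framework itself, and the function names are invented for the example.

```python
import queue
import threading

def self_scheduled_run(initial_tasks, worker_count=4):
    """Self-scheduled task pool: idle workers pull the next task from a
    shared queue, and any task may spawn further subtasks dynamically."""
    tasks = queue.Queue()
    results, lock = [], threading.Lock()
    for t in initial_tasks:
        tasks.put(t)

    def worker():
        while True:
            task = tasks.get()            # block until work is available
            value, subtasks = task()      # irregular work; may spawn more tasks
            for s in subtasks:            # enqueue children *before* task_done,
                tasks.put(s)              # so join() cannot fire early
            with lock:
                results.append(value)
            tasks.task_done()

    for _ in range(worker_count):
        threading.Thread(target=worker, daemon=True).start()
    tasks.join()                          # returns once every task has finished
    return results
```

Because workers grab work as soon as they are free, load balance emerges automatically even when subtask sizes vary wildly, which is the property that made the approach attractive for theorem proving.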

 

 

Doug Miles, PGI/Nvidia

On the Role of Compiler Directives in High-Performance Computing

 

Compiler directives were originally conceived and used as hints to enable better optimization and more efficient code generation.  The advent of directive-based parallelization extended the use of directives into the realm of de facto language extensions.  What role should directives play in HPC programming, and how should they interact with and impact parallel constructs in current and future standardized programming languages?  This brief talk will explore the issue and provide a few thoughts on a logical path forward for compiler directives in HPC.

 

 

Bernd Mohr, Juelich Supercomputing Centre, Germany

POP -- Parallel Performance Analysis and Tuning as a Service

 

Developers of HPC applications can now count on free advice from European experts to analyse the performance of their scientific codes. The Performance Optimization and Productivity (POP) Centre of Excellence, funded by the European Commission under H2020, started operating at the end of 2015. The POP Centre of Excellence gathers together experts from BSC, JSC, HLRS, RWTH Aachen University, NAG and Ter@tec. The objective of POP is to provide performance measurement and analysis services to the industrial and academic HPC community, help them to better understand the performance behaviour of their codes, and suggest improvements to increase their efficiency. Training and user education regarding application tuning is also provided. Further information can be found at http://www.pop-coe.eu/. The talk will give an overview of the POP Centre of Excellence and describe the common performance assessment strategy and metrics developed and defined by the project partners. The presentation will close with some success stories and reports from performance assessments already performed by POP personnel in the first year of operation.

 

 

Frank Mueller, NC State University

Mini-Ckpts: Surviving OS Failures in Persistent Memory

 

Current resilience efforts in HPC have focused on application fault-tolerance rather than the operating system (OS), despite recent studies suggesting that failures in OS memory may be more likely. The OS is critical to the correct and efficient operation of the node and the processes it governs, and the parallel nature of HPC applications means that any single node failure generally forces all processes of the application to terminate, due to tight communication in HPC. Therefore, a robust system requires an OS that is itself capable of tolerating failures.

We contribute mini-ckpts, a framework which enables application survival despite the occurrence of a fatal OS failure or crash. Mini-ckpts achieves this tolerance by ensuring that the critical data describing a process is preserved in persistent memory prior to the failure. Following the failure, the OS is rejuvenated via a warm reboot and the application continues execution, effectively making the failure and restart transparent. The mini-ckpts rejuvenation and recovery process is measured to take between three and six seconds, with a failure-free overhead of 3-5% for a number of key HPC workloads. In contrast to current fault-tolerance methods, this work ensures that the operating and runtime systems can continue in the presence of faults. This is a much finer-grained and more dynamic method of fault-tolerance than current coarse-grained, application-centric methods. Handling faults at this level has the potential to greatly reduce overheads and enables mitigation of additional faults.

 

 

Dimitrios Nikolopoulos, Queens University Belfast

Computational Significance and its Implications for HPC

 

This talk explores the scope for relaxing the accuracy of computation and storage in HPC systems and applications by leveraging abstractions and metrics of computational significance. While this effort is largely motivated by a desire to reduce the energy footprint of HPC systems, we explore broader implications of the approach on resilience, performance, and the design of the system software stack.

 

 

Christian Obrecht, National Institute of Applied Sciences in Lyon

On a novel method for high performance computational fluid dynamics

 

In engineering applications of computational fluid dynamics, using unstructured meshes has been the obvious choice for decades. However, building an appropriate unstructured mesh is often a time-consuming task. In recent years, much attention has been drawn to alternative methods operating on regular Cartesian meshes. Besides making meshing trivial, this kind of approach is usually well-suited to massively parallel processors. The lattice Boltzmann method (LBM) is the most popular of these alternatives. It has, however, the disadvantage of involving significantly more data per fluid cell than classic Navier-Stokes solvers, which impinges upon performance in a memory-bound context.

In this contribution, we will introduce the link-wise artificial compressibility method (LW-ACM), a recently proposed approach which combines the advantages of the LBM with the lower memory requirements of classic Navier-Stokes solvers. For three-dimensional simulations, memory consumption is reduced by a factor of 5 and performance on GPU increases by a factor of 2 with respect to the LBM. Several implementations of the LW-ACM, using either CUDA or OpenCL, will be presented. Performance and optimisation issues on both CPU and GPU will be discussed as well.

 

 

Manish Parashar, Rutgers University

Experiments with Software-Defined Environments for Science

 

Software-defined platforms, such as those enabled by Cloud services, provide new levels of flexibility, which combined with autonomic capabilities can lead to very dynamic infrastructures that can adapt themselves to application and user needs. Such platforms can enable new formulations in science and engineering by opportunistically leveraging heterogeneous and loosely connected data and computing resources. In this talk I will explore how elastic software-defined execution based on autonomic federation of resources and management of applications can support such dynamic and data-driven workflows. I will also explore how such abstractions can potentially lead to new paradigms and practices in science and engineering. This talk is based on research that is part of the CometCloud project at the Cloud and Autonomic Computing Center at Rutgers and at the Rutgers Discovery Informatics Institute.

 

 

Padma Raghavan, Vanderbilt University

Synchronization, Load-Balancing and Redundant Calculations:  Finding the Sweet Spot of High Performance Computing

 

Parallel processing at scale, as well as workflows that operate on "big data", is becoming increasingly pervasive. This is changing the space of trade-offs between synchronization levels, load-balancing, and redundant calculations in achieving high performance. We will explore this matter with some limit studies that illustrate the value of finding the sweet spots, in order to start a discussion on how these trade-offs could inform the development of algorithms, programming models, and software environments.

 

 

Guther Roeth, NVIDIA

Deep Learning – Impact on Modern Life

 

An introduction to deep learning, the new computing paradigm and subset of machine learning sometimes summarized as "teaching computers to think!". Accelerated by NVIDIA hardware and software, and whether your application involves natural language processing, robotics, bioinformatics, speech, video, search engines, online advertising or finance, this overview will highlight the incredible ability of the algorithms currently helping us address some of our grandest challenges.

 

 

Rob Ross, ANL

From File Systems to Services: Changing the Data Management Model in HPC

 

HPC applications are composed from software components that provide only the communication, concurrency, and synchronization needed for the task at hand. In contrast, parallel file systems are kernel-resident, fully consistent services with semantic obligations developed on single-core machines 50 years ago; parallel file systems are old-fashioned system services forced to scale as fast as the HPC system. Rather than the monolithic storage services seen today, we envision an ecosystem of services composed to meet the specific needs of science activities at extreme scale. In fact, a nascent ecosystem of such services is present today. In this talk we will discuss the drivers leading to this development, some examples in existence today, and the work we are undertaking to accelerate the rate at which these services are developed and matured to meet application needs.

 

 

Joel Saltz, SUNY Stony Brook

Convergence of Data and Computation:  Integration of Sensors and Simulation

 

A great variety of biomedical and physical application areas involve the need to make predictions that require 1) multi-scale sensor data, 2) computations aimed at creating quantitative multi-scale characterizations of material/chemical/biological properties, and 3) the use of those characterizations to make predictions. I will describe this paradigm and discuss ways in which it plays out in the rapidly emerging area of exascale cancer research. I will also describe ideas for, and prototypes of, applicable tools and methods.

 

 

Vaidy Sunderam, Emory University

Cost and Utility Tradeoffs on IaaS Clouds, Grids, and On-Premise Resources

 

Cloud computing is now a mainstream technology in many application domains and user constituencies. For scientific high-performance applications in academic and research settings, however, the trade-offs between cost and elasticity on the one hand, and performance and access on the other, are not always clear. We discuss our experiences comparing cost and utility for a class of numerical codes on three typical platform types available to researchers: IaaS clouds, grids, and on-premise local resources. To rank the tested platforms, we introduce a simple utility function describing the value of a completed computational task to the user as a function of the wait time and the cost of the computation. Our results suggest that each platform has situational value, providing trade-offs between cost and turnaround.
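For illustration, one plausible shape for such a utility function, with value decaying as the user waits and reduced by the money spent, might look like the following. The functional form, parameter names, and default values here are all hypothetical, invented for this sketch rather than taken from the talk.

```python
def task_utility(wait_time_s, cost_usd, base_value=100.0, half_life_s=3600.0):
    """Hypothetical utility of a completed task: its value to the user
    halves every `half_life_s` seconds of waiting, and the monetary cost
    is subtracted. (Illustrative only; not the talk's actual function.)"""
    decay = 0.5 ** (wait_time_s / half_life_s)
    return base_value * decay - cost_usd
```

Under such a function, a fast-but-paid IaaS run can out-rank a free on-premise run that sits in a long batch queue, which is the kind of situational ranking the abstract describes.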

 

 

Martin Swany, Indiana University

Offloading Collective Operations to Programmable Logic

 

This talk describes our architecture and implementation for offloading collective operations to programmable logic in the communication substrate. Collective operations -- operations that involve communication between groups of cooperating processes -- are widely used in parallel processing. The design and implementation strategies of collective operations play a significant role in their performance and thus affect the performance of many high performance computing applications that utilize them. The programmable logic provided by FPGAs is a powerful option for creating task-specific logic to aid applications. Leveraging FPGAs to improve collective operation performance stands to offer significant capability and performance.

 

 

Michela Taufer, University of Delaware

In Situ Data Analysis of Protein Trajectories

 

The transition towards exascale computing will be accompanied by a performance dichotomy. Computational peak performance will rapidly increase; I/O performance will either grow slowly or be completely stagnant. Essentially, the rate at which data are generated will grow much faster than the rate at which data can be read from and written to the disk. Molecular Dynamics (MD) simulations will soon face the I/O problem of efficiently writing to and reading from disk on the next generation of supercomputers.

 

This talk targets MD simulations at the exascale and proposes a novel technique for in situ data analysis of MD trajectories. Our technique maps individual trajectories' substructures (i.e., alpha-helices and beta-strands) to metadata frame by frame. The metadata captures the conformational properties of the substructures. The ensemble of metadata can be used for automatic, strategic analysis within a trajectory or across trajectories, without manually identifying the portions of trajectories in which critical changes take place. We demonstrate the technique's effectiveness by applying it to 26.3k helices and 31.2k strands from 9,917 PDB proteins and by presenting three empirical case studies.

 

 

Samuel Thibault, Université Bordeaux, INRIA

Task-graph-based applications, from theory to exascale?

 

Expressing parallelism through task graphs (DAGs), well studied theoretically for the past decades, has recently gained a lot of attention in HPC: a flurry of task-based runtime systems have appeared, and task graphs have been standardized in OpenMP. Will they be the holy grail of parallelism, allowing applications to scale easily to exascale? The StarPU runtime system has previously proved able to seamlessly exploit heterogeneous systems thanks to task graphs and state-of-the-art scheduling heuristics. We here present our work on HPC over clusters, which shows promising results with very reasonable effort from the application programmer, both on real platforms and with task-accurate simulation.
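The dependency-driven execution that such runtimes automate can be sketched with Kahn-style dependency counting: a task becomes ready once all of its predecessors have completed. The toy sequential scheduler below only illustrates this core mechanism; a runtime such as StarPU additionally dispatches ready tasks onto heterogeneous workers using scheduling heuristics, and the function names here are invented for the sketch.

```python
from collections import defaultdict, deque

def run_task_graph(tasks, deps):
    """Execute a DAG of named tasks in dependency order.
    `tasks` maps name -> callable; `deps` maps name -> list of prerequisites."""
    indegree = {name: len(deps.get(name, [])) for name in tasks}
    dependents = defaultdict(list)
    for name, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(name)
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        tasks[n]()                    # a runtime would dispatch this to a worker
        order.append(n)
        for child in dependents[n]:   # completing n may release its dependents
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(tasks):
        raise ValueError("cycle in task graph")
    return order
```

Everything in the `ready` queue at any moment could run in parallel, which is exactly the parallelism a task-based runtime extracts from the DAG without the programmer writing any explicit synchronization.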

 

 

Jeff Vetter, ORNL

Performance Portability for Extreme Scale High Performance Computing

Concerns about energy-efficiency and reliability have forced our community to reexamine the full spectrum of architectures, software, and algorithms that constitute the HPC ecosystem. While architectures have remained relatively stable for almost two decades, new architectural features, such as heterogeneous processing, non-volatile memory, and optical interconnection networks, have emerged as possible solutions to these constraints. In turn, these architectural changes will force the community to redesign software systems and applications to exploit these new capabilities. However, these dramatic architectural changes are leading to the new challenge of performance portability, where very few applications can make productive use of these very complex systems. In fact, we believe this is the most critical challenge facing HPC. To this end, our group has designed a number of novel methods and tools to help scientists predict and program these increasingly complex systems. In this talk, I will describe a few of these efforts, with a specific focus on emerging non-volatile memory systems and performance prediction.

 

 

Frédéric Vivien, INRIA

Multi-level checkpointing and silent error detection

 

In this talk we will survey a set of recent results in fault-tolerance for HPC applications. Multi-level checkpointing makes it possible to exploit different trade-offs between checkpointing cost and fault resilience. Silent data corruptions (SDCs) pose a potentially important threat to next-generation systems, and several mechanisms to detect them have been proposed. We will see how one can design an efficient multi-level checkpointing protocol, how to choose which SDC detectors to use and when to look for data corruptions, and how to do everything at the same time (or almost).
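As background for the cost/resilience trade-off, the classical single-level, first-order Young/Daly result gives the checkpoint period that balances checkpoint overhead against expected lost work after a failure; the multi-level protocols discussed in the talk generalize this trade-off across several checkpoint types. A minimal sketch of the classical formula:

```python
from math import sqrt

def young_daly_interval(mtbf_s, checkpoint_cost_s):
    """First-order Young/Daly optimal checkpoint period W = sqrt(2 * mu * C),
    where mu is the platform MTBF and C the cost of taking one checkpoint.
    Valid when C is small relative to mu (the usual first-order assumption)."""
    return sqrt(2.0 * mtbf_s * checkpoint_cost_s)
```

For example, with a one-day MTBF and a ten-minute checkpoint, the formula suggests checkpointing roughly every three hours; checkpointing more often wastes time writing checkpoints, less often wastes time re-executing lost work.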

 

 

David Walker, Cardiff University

Morton Ordering of 2D Arrays for Parallelism and Efficient Access to Hierarchical Memory

 

This talk describes the recursive Morton ordering, which supports efficient access to hierarchical memory across a range of heterogeneous computer platforms, from many-core devices and multi-core processors to clusters and distributed environments. Programmer-level control of the memory hierarchy is also considered. A brief overview of previous research in this area is given, and algorithms that make use of recursive blocking are described. These are then used to demonstrate the efficiency of the Morton ordering approach in performance experiments on different processors. In particular, timing results are presented for matrix multiplication, Cholesky factorisation, and fast Fourier transform algorithms. The Morton ordering approach leads naturally to algorithms that are recursive and that expose parallelism at each level of recursion. Thus, the approach advocated in this talk not only provides convenient and efficient access to hierarchical memory, but also provides a basis for exploiting parallelism.
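The Morton (Z-order) index itself is computed by interleaving the bits of the row and column indices, so that every 2x2, 4x4, 8x8, ... block of the array is stored contiguously. A minimal sketch (the fixed bit width is an arbitrary choice for the example):

```python
def morton_index(row, col, bits=16):
    """Interleave the bits of (row, col) into a Z-order (Morton) index.
    Nearby 2D indices map to nearby storage locations at every scale of
    the recursion, which is what makes recursive blocking cache-friendly."""
    z = 0
    for b in range(bits):
        z |= ((row >> b) & 1) << (2 * b + 1)   # row bits in odd positions
        z |= ((col >> b) & 1) << (2 * b)       # col bits in even positions
    return z
```

Storing a matrix element (i, j) at offset `morton_index(i, j)` means a recursive algorithm that splits the matrix into quadrants touches one contiguous memory region per quadrant at every level of recursion.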

 


 

 

 

Biographies of Attendees:

 

Ilkay Altintas is the chief data science officer at the San Diego Supercomputer Center (SDSC), UC San Diego, where she is also the founder and director for the Workflows for Data Science Center of Excellence. Since joining SDSC in 2001, she has worked on different aspects of dataflow-based computing and workflows as a principal investigator and in other leadership roles across a wide range of cross-disciplinary NSF, DOE, NIH, and Moore Foundation projects. Ilkay is a co-initiator of and an active contributor to the popular open-source Kepler Scientific Workflow System and the co-author of publications related to computational data science and e-Sciences at the intersection of scientific workflows, provenance, distributed computing, bioinformatics, observatory systems, conceptual data querying, and software modeling.

 

 

Rosa M. Badia holds a PhD in Computer Science (1994) from the Technical University of Catalonia (UPC). She is the manager of the Workflows and Distributed Computing research group at the Barcelona Supercomputing Center (BSC). She is also a Scientific Researcher at the Consejo Superior de Investigaciones Cientificas (CSIC). She was involved in teaching and research activities at UPC from 1989 to 2008, where she was an Associate Professor from 1997. From 1999 to 2005 she was involved in research and development activities at the European Center of Parallelism of Barcelona (CEPBA). Her current research interests are programming models for complex platforms (from multicore and GPUs to Grid/Cloud). The group led by Dr. Badia has been developing the StarSs programming model for more than 10 years, with great success in adoption by application developers. Currently the group focuses its efforts on two instances of StarSs: OmpSs for heterogeneous platforms and COMPSs/PyCOMPSs for distributed computing, including Cloud. Dr. Badia has published more than 150 papers in international conferences and journals on the topics of her research. She is currently participating in the following European funded projects: ASCETIC, Euroserver, The Human Brain Project, EU-Brazil CloudConnect, the BioExcel CoE, NEXTGenIO, MUG, EUBra BIGSEA, and TANGO, and is a member of the HiPEAC2 NoE.

 

 

Anne Benoit received the PhD degree from Institut National Polytechnique de Grenoble in 2003, and the Habilitation à Diriger des Recherches (HDR) from Ecole Normale Supérieure de Lyon (ENS Lyon) in 2009. She is currently an associate professor in the Computer Science Laboratory LIP at ENS Lyon, France. She is the author of 37 papers published in international journals and 77 papers published in international conferences, and the advisor of 7 PhD theses. Her research interests include algorithm design and scheduling techniques for parallel and distributed platforms, as well as the performance evaluation of parallel systems and applications, with a focus on energy awareness and resilience. She is Associate Editor of IEEE TPDS, JPDC, and SUSCOM. She is the program chair of several workshops and conferences; in particular, she is the program chair for HiPC'2016 and for the Algorithms track of SC'2016. She is a senior member of the IEEE, and was elected a Junior Member of the Institut Universitaire de France in 2009.

 

 

Francine Berman is the Edward P. Hamilton Distinguished Professor in Computer Science at Rensselaer Polytechnic Institute. She currently serves as U.S. Chair of the Research Data Alliance (RDA) and Co-Chair of RDA's international leadership Council. Previously, she served as the Vice President for Research at RPI, and as the High Performance Computing Endowed Chair and Director of the San Diego Supercomputer Center at UC San Diego. She also serves as Chair of the Anita Borg Institute Board of Trustees, co-Chair of the National Science Foundation CISE Advisory Committee, and a member of the Board of Trustees of the Sloan Foundation. Her research interests include data cyberinfrastructure and the policy and practice of digital stewardship and preservation. In 2009, Dr. Berman was the inaugural recipient of the ACM/IEEE-CS Ken Kennedy Award for "influential leadership in the design, development, and deployment of national-scale cyberinfrastructure," and in 2015 she was nominated by President Obama and confirmed by the U.S. Senate to become a member of the National Council on the Humanities.

 

 

Bill Brantley is a Fellow Design Engineer in the Research Division of Advanced Micro Devices, leading parts of the FastForward and DesignForward research contracts as well as other efforts. Prior to AMD he was at the IBM T.J. Watson Research Center, where he was one of the architects and implementers of the 64-CPU RP3 (a DARPA-supported HPC system developed in the mid-80s), including its hardware performance monitor. At IBM Austin he held a number of roles, including the analysis of server performance in the Linux Technology Center. Prior to joining IBM, he completed his Ph.D. in ECE at Carnegie Mellon University after working for three years at Los Alamos National Laboratory.

 

 

Ron Brightwell currently manages the Scalable System Software Department at Sandia National Laboratories. He joined Sandia in 1995 after receiving his BS in mathematics and his MS in computer science from Mississippi State University. While at Sandia, he has designed and developed software for lightweight compute node operating systems and high-performance networks on several large-scale massively parallel systems, including the Intel Paragon and TeraFLOPS, and the Cray T3 and XT series of machines.  He has authored more than 100 peer-reviewed journal, conference, and workshop publications. He is a Senior Member of the IEEE and the ACM.  

 

 

Franck Cappello leads the resilience group at ANL, conducting research on fault modeling and tolerance. In the past three years he has focused particularly on silent data corruption detection and checkpoint/restart environments. He recently started an effort on lossy compression of floating-point data, in particular for reducing checkpoint size.

 

 

Alok Choudhary is the Henry & Isabelle Dever Professor of Electrical Engineering and Computer Science and a professor at the Kellogg School of Management. He is also the founder, chairman and chief scientist (and served as CEO during 2011-2013) of 4C Insights (formerly Voxsup Inc.), a big data analytics and social media marketing company. He received the National Science Foundation's Young Investigator Award in 1993. He is a fellow of the IEEE, ACM and AAAS. His research interests are in high-performance computing, data-intensive computing, scalable data mining, computer architecture, high-performance I/O systems and software, and their applications in science, medicine and business. Alok Choudhary has published more than 400 papers in various journals and conferences and has graduated 33 PhD students. Techniques developed by his group can be found on every modern processor, and scalable software developed by his group can be found on many supercomputers. Alok Choudhary's work and interviews have appeared in many traditional media outlets, including the New York Times, Chicago Tribune, The Telegraph, ABC, PBS, NPR, AdExchange, Business Daily, and many international media outlets all over the world.

 

 

Joe Curley serves Intel® Corporation as Senior Director, HPC Platform and Ecosystem Enablement in the High Performance Computing Platform Group (HPG). His primary responsibilities include supporting global ecosystem partners to develop their own powerful and energy-efficient HPC computing solutions utilizing Intel hardware and software products. Mr. Curley joined Intel Corporation in 2007, and has served in multiple other planning and business leadership roles.

Prior to joining Intel, Joe worked at Dell, Inc., where he led the global workstation product line and the consumer and small business desktop lines, following a series of engineering roles. He began his career at computer graphics pioneer Tseng Labs.

 

 

Anthony Danalis is currently a Research Scientist II with the Innovative Computing Laboratory at the University of Tennessee, Knoxville. His research interests lie in the areas of High Performance Computing and Performance Analysis. Recently, his work has focused on compiler analysis and optimization, system benchmarking, MPI, and accelerators. He received his Ph.D. in Computer Science from the University of Delaware for work on compiler optimizations for HPC. Previously, he received an M.Sc. from the University of Delaware and an M.Sc. from the University of Crete, both on computer networks, and a B.Sc. in Physics from the University of Crete.

 

 

Ewa Deelman is a Research Professor in the USC Computer Science Department and a Research Director at the USC Information Sciences Institute (ISI). Dr. Deelman's research interests include the design and exploration of collaborative, distributed scientific environments, with particular emphasis on the automation of scientific workflows and the management of computing resources, as well as the management of scientific data. In 1997 Dr. Deelman received her PhD in Computer Science from the Rensselaer Polytechnic Institute.

 

 

Frédéric Desprez is a Chief Senior Research Scientist at Inria (Corse team, Grenoble). He received his PhD in C.S. from Institut National Polytechnique de Grenoble, France, in 1994 and his MS in C.S. from ENS Lyon in 1990. At Inria, he holds the position of Deputy Scientific Director in charge of High Performance Computing, Distributed Systems, Networks, and Software Engineering. In 2008, he received an IBM Faculty Award for his work on data distribution and scheduling for grid and Cloud platforms.

 

Frederic's current activities include parallel algorithms, scheduling for large scale distributed platforms (clusters, grids, and Clouds), data management, and grid and cloud computing. He leads the Grid'5000 project, which offers a platform to evaluate large scale algorithms, applications, and middleware systems.

 

See http://graal.ens-lyon.fr/~desprez/ for further information.

 

 

Patrick Demichel has worked at HPE for 36 years on computer technologies, with a focus on scientific domains. He is now a Distinguished Technologist in EMEA for HPC, Big Data and IoT, helping with the development and integration of IT innovations for extreme-scale systems. He works on The Machine program with HPE Laboratories, Moonshot, and other emerging technologies. Earlier in his career, he worked on Itanium development in the USA.

 

Jack Dongarra holds an appointment at the University of Tennessee, Oak Ridge National Laboratory, and the University of Manchester. He specializes in numerical algorithms in linear algebra, parallel computing, use of advanced-computer architectures, programming methodology, and tools for parallel computers. He was awarded the IEEE Sid Fernbach Award in 2004; in 2008 he was the recipient of the first IEEE Medal of Excellence in Scalable Computing; in 2010 he was the first recipient of the SIAM Special Interest Group on Supercomputing's award for Career Achievement; in 2011 he was the recipient of the IEEE IPDPS Charles Babbage Award; and in 2013 he received the ACM/IEEE Ken Kennedy Award. He is a Fellow of the AAAS, ACM, IEEE, and SIAM and a member of the National Academy of Engineering.

 

Ian Foster is a Professor of Computer Science at the University of Chicago, a Distinguished Fellow at Argonne National Laboratory, and Director of the Computation Institute. He is also a fellow of the American Association for the Advancement of Science, the Association for Computing Machinery, and the British Computer Society. His awards include the British Computer Society's Lovelace Medal, honorary doctorates from the University of Canterbury, New Zealand, and CINVESTAV, Mexico, and the IEEE Tsutomu Kanai award.

 

 

Geoffrey Fox received a Ph.D. in Theoretical Physics from Cambridge University and is now Distinguished Professor of Computing, Engineering and Physics at Indiana University, where he is director of the Digital Science Center and Chair of the Department of Intelligent Systems Engineering at the School of Informatics and Computing. He previously held positions at Caltech, Syracuse University and Florida State University, after postdoctoral positions at the Institute for Advanced Study at Princeton, Lawrence Berkeley Laboratory, and Peterhouse, Cambridge. He has supervised the PhDs of 67 students and published around 1200 papers in physics and computer science, with an h-index of 72 and over 28000 citations.

He currently works in applying computer science from infrastructure to analytics in Biology, Pathology, Sensor Clouds, Earthquake and Ice-sheet Science, Image processing, Deep Learning, Network Science, Financial Systems and Particle Physics. The infrastructure work is built around Software Defined Systems on Clouds and Clusters. The analytics focuses on scalable parallelism. He is involved in several projects to enhance the capabilities of Minority Serving Institutions. He has experience in online education and its use in MOOCs for areas like Data and Computational Science. He is a Fellow of APS (Physics) and ACM (Computing).

 

 

Haohuan Fu is an associate professor in the Ministry of Education Key Laboratory for Earth System Modeling and the Center for Earth System Science at Tsinghua University. He is also the deputy director of the National Supercomputing Center in Wuxi. His research interests include design methodologies for highly efficient and highly scalable simulation applications that can take advantage of emerging multi-core, many-core, and reconfigurable architectures and make full use of current peta-flops and future exa-flops supercomputers, as well as intelligent data management, analysis, and mining platforms that combine statistical methods and machine learning technologies. Fu has a PhD in computing from Imperial College London. He is a member of the IEEE.

 

 

Al Geist is a Corporate Research Fellow at Oak Ridge National Laboratory. He is on the Leadership Team of the U.S. Exascale Computing Project and wrote most of the planning documents.  He is the Chief Technology Officer of ORNL's Leadership Computing Facility and Chief Scientist for the Computer Science and Mathematics Division. His recent research is on Exascale computing and resilience needs of the hardware and software. He leads the U.S. Department of Energy technical Council on Resilience.

 

 

Laura Grigori is a senior research scientist at INRIA in France, where she leads the Alpines group, a joint group between INRIA and the J.L. Lions Laboratory, UPMC, in Paris. Her field of expertise is high performance scientific computing and numerical linear algebra. In recent years she has co-developed communication-avoiding algorithms. She has given several invited plenary talks on this topic, including at the SIAM Conference on Parallel Processing 2012 and the IEEE/ACM Supercomputing 2015 Conference. With her co-authors, she received the first SIAM SIAG on Supercomputing Best Paper Prize in 2016.

 

 

Andrew Grimshaw received his Ph.D. from the University of Illinois at Urbana-Champaign in 1988. He joined the University of Virginia as an Assistant Professor of Computer Science, becoming Associate Professor in 1994 and Professor in 1999. He is the chief designer and architect of Mentat, Legion, Genesis II, and the co-architect for XSEDE. In 1999 he co-founded Avaki Corporation, and served as its Chairman and Chief Technical Officer until 2003. In 2003 he won the Frost and Sullivan Technology Innovation Award. In 2008 he became the founding director of the University of Virginia Alliance for Computational Science and Engineering (UVACSE). The mission of UVACSE is to change the culture of computation at the University of Virginia and to accelerate computationally oriented research.

 

Andrew is the chairman of the Open Grid Forum (OGF), having served both as a member of the OGF's Board of Directors and as Architecture Area Director. Andrew is the author or co-author of over 100 publications and book chapters. His current projects are IT, Genesis II, and XSEDE. IT is a next-generation portable parallel language based on the PCubeS type architecture. Genesis II is an open-source, standards-based Grid system that focuses on making Grids easy to use and accessible to non-computer-scientists. XSEDE (eXtreme Science and Engineering Discovery Environment) is the NSF follow-on to the TeraGrid project.

 

 

William Gropp

Acting Director and Chief Scientist, NCSA

Director, Parallel Computing Institute

Thomas M. Siebel Chair in Computer Science              

University of Illinois Urbana-Champaign

 

 

Tony Hey began his career as a theoretical physicist with a doctorate in particle physics from the University of Oxford in the UK. After a career in physics that included research positions at Caltech and CERN, and a professorship at the University of Southampton in England, he became interested in parallel computing and moved into computer science. In the 1980s he was one of the pioneers of distributed memory message-passing computing and co-wrote the first draft of the successful MPI message-passing standard.

After being both Head of Department and Dean of Engineering at Southampton, Tony Hey escaped to lead the U.K.’s ground-breaking ‘eScience’ initiative in 2001. He recognized the importance of Big Data for science and wrote one of the first papers on the ‘Data Deluge’ in 2003. He joined Microsoft in 2005 as a Vice President and was responsible for Microsoft’s global university research engagements. He worked with Jim Gray and his multidisciplinary eScience research group and edited a tribute to Jim called ‘The Fourth Paradigm: Data-Intensive Scientific Discovery.’ Hey left Microsoft in 2014 and spent a year as a Senior Data Science Fellow at the eScience Institute at the University of Washington. He returned to the UK in November 2015 and is now Chief Data Scientist at the Science and Technology Facilities Council.

 

In 1987 Tony Hey was asked by Caltech Nobel physicist Richard Feynman to write up his ‘Lectures on Computation’. This covered such unconventional topics as the thermodynamics of computing as well as an outline for a quantum computer. Feynman’s introduction to the workings of a computer in terms of the actions of a ‘dumb file clerk’ was the inspiration for his new book ‘The Computing Universe’ – his attempt to write a popular book about computer science.

 

Tony Hey is a fellow of the AAAS and of the UK's Royal Academy of Engineering. In 2005, he was awarded a CBE by Prince Charles for his ‘services to science.’

 

 

Minh Quan Ho is currently a PhD student at the University of Grenoble Alps. His main research topics are optimizing 3D stencil codes and dense linear algebra libraries on many-core processors (Kalray MPPA). Prior to that, Minh Quan did his graduate internship porting the HPL benchmark to the MPPA processor by implementing a lightweight subset of MPI tuned for the MPPA. He also participated in parallelizing PSi, a Markov-systems simulator developed by the Grenoble Informatics Laboratory (LIG) and INRIA. Minh Quan received his master's degree from the Ecole Polytechnique de Grenoble.

 

 

Torsten Hoefler is an Assistant Professor of Computer Science at ETH Zürich, Switzerland. Before joining ETH, he led the performance modeling and simulation efforts of parallel petascale applications for the NSF-funded Blue Waters project at NCSA/UIUC. He is also a key member of the Message Passing Interface (MPI) Forum, where he chairs the "Collective Operations and Topologies" working group. Torsten won best paper awards at the ACM/IEEE Supercomputing Conference SC10, SC13, SC14, EuroMPI'13, HPDC'15, HPDC'16, IPDPS'15, and other conferences. He has published numerous peer-reviewed scientific conference and journal articles and authored chapters of the MPI-2.2 and MPI-3.0 standards. He received the Latsis prize of ETH Zurich as well as an ERC starting grant in 2015. His research interests revolve around the central topic of "Performance-centric System Design" and include scalable networks, parallel programming techniques, and performance modeling. Additional information about Torsten can be found on his homepage at htor.inf.ethz.ch.

 

 

Jeffrey K. Hollingsworth is a Professor in the Computer Science Department at the University of Maryland, College Park. He also holds appointments in the University of Maryland Institute for Advanced Computer Studies and the Electrical and Computer Engineering Department. He received his PhD and MS degrees in computer sciences from the University of Wisconsin, and a B.S. in Electrical Engineering from the University of California at Berkeley. Dr. Hollingsworth's research seeks to develop a unified framework for understanding the performance of large systems, focusing on performance measurement and auto-tuning. He is Editor-in-Chief of the journal Parallel Computing, was general chair of the SC12 conference, and is Chair of ACM SIGHPC.

 

 

Emmanuel Jeannot is a Senior Research Scientist at Inria and has been doing his research at INRIA Bordeaux Sud-Ouest and the LaBRI laboratory since 2009. Before that he held the same position at INRIA Nancy Grand-Est. In 2006, he was a visiting researcher at the University of Tennessee's ICL laboratory. From 1999 to 2005 he was an assistant professor at the Université Henri Poincaré, Nancy 1. From 2000 to 2009 he did his research at the LORIA laboratory. He received his Master's and PhD degrees in computer science (in 1996 and 1999, respectively) from the Ecole Normale Supérieure de Lyon, at the LIP laboratory. After his PhD he spent one year as a postdoc at the LaBRI laboratory in Bordeaux. He currently leads the TADaaM Inria team. His main research interests lie in parallel and high-performance computing, and more precisely: process placement, topology-aware algorithms, scheduling for heterogeneous environments, data redistribution, algorithms and models for parallel machines, distributed computing software, adaptive online compression, and programming models.

 

 

Laurent Lefevre is a permanent researcher in computer science at Inria (the French Institute for Research in Computer Science and Control). He is a member of the Avalon team (Algorithms and Software Architectures for Distributed and HPC Platforms) of the LIP laboratory at the Ecole Normale Supérieure of Lyon, France. He has organized several conferences in high performance networking and computing, and he is a member of several program committees. He has co-authored more than 100 papers published in refereed journals and conference proceedings. His interests include: energy efficiency in large scale distributed systems, high performance computing, distributed computing and networking, and high performance network protocols and services.

 

See http://perso.ens-lyon.fr/laurent.lefevre  for further information.

 

 

Ewing “Rusty” Lusk is currently Argonne Distinguished Fellow Emeritus at Argonne National Laboratory.  After obtaining his degree in pure mathematics at the University of Maryland in 1970, he spent 12 years as professor of mathematics and computer science at Northern Illinois University before moving to Argonne in 1982.  There he worked on automated reasoning, logic programming, and parallel computing software.  His primary contributions have been in standardizing the message-passing model (MPI, with many others) and implementing it (MPICH, with Bill Gropp and others).  Most recently, he has been an active member of the UNEDF and NUCLEI DOE SciDAC projects in computational nuclear physics.  He doesn’t know any physics, but the physicists don’t know any computer science, so it works out.

 

 

Doug Miles is Director of PGI compilers & tools at NVIDIA.  He has worked in HPC for 30 years in math library development, benchmarking, programming model development, technical marketing and SW engineering management at Floating Point Systems, Cray Research Superservers, The Portland Group, STMicroelectronics and NVIDIA.

 

 

Bernd Mohr began designing and developing tools for performance analysis of parallel programs as early as 1987, at the University of Erlangen in Germany. During a three-year postdoc position at the University of Oregon, he designed and implemented the original TAU performance analysis framework. Since 1996 he has been a senior scientist at Forschungszentrum Jülich, and since 2000 he has led the group "Programming Environments and Performance Optimization". Besides being responsible for user support and training with regard to performance tools at the Jülich Supercomputing Centre (JSC), he leads the Scalasca and Score-P performance tools efforts in Jülich. Since 2007, he has also served as deputy head of the JSC division "Application Support". He serves on the Steering Committee of the SC and ISC conference series. He is the author of many conference and journal articles about performance analysis and tuning of parallel programs.

 

 

Frank Mueller is a Professor in Computer Science and a member of multiple research centers at North Carolina State University. Previously, he held positions at Lawrence Livermore National Laboratory and Humboldt University Berlin, Germany. He received his Ph.D. from Florida State University in 1994.  He has published papers in the areas of parallel and distributed systems, embedded and real-time systems and compilers.  He is a member of ACM SIGPLAN, ACM SIGBED and a senior member of the ACM and IEEE Computer Societies as well as an IEEE Fellow and an ACM Distinguished Scientist.  He is a recipient of an NSF Career Award, an IBM Faculty Award, a Google Research Award and two Fellowships from the Humboldt Foundation.

 

 

Dimitrios Nikolopoulos is Professor and Head of the School of Electronics, Electrical Engineering and Computer Science, and Acting Director of the Centre for Data Science and Scalable Computing, at Queen's University Belfast. His current research focuses on low-latency analytics and new computing paradigms that push the boundaries of performance and energy efficiency. His accolades include a Royal Society Wolfson Research Fellowship, NSF and DOE Career Awards, and an IBM Faculty Award. He directs a research group of 30 staff and a research grant portfolio of approximately £30 million.

 

 

Christian Obrecht is an associate professor of Applied Physics at the department of Civil Engineering and Urban Planning of the National Institute of Applied Sciences in Lyon (INSA Lyon). Dr Obrecht first graduated in Mathematics from Université de Strasbourg in 1990. From 1993 to 2008, he served as a teacher of mathematics in French secondary education. He obtained a master’s degree in Computer Science from Université Lyon 1 in 2009 and a doctoral degree in Civil Engineering from INSA Lyon in 2012. He was appointed associate professor in 2015 and joined both the Thermal Energy Storage and the Building Physics research groups at the Centre of Energy and Thermal Sciences of Lyon (CETHIL).  His research work focuses on innovative approaches in computational fluid dynamics suited to massively parallel processors with applications to high performance simulations of heat storage processes and of urban microclimatic conditions.

 

 

Manish Parashar is Distinguished Professor of Computer Science at Rutgers University. He is also the founding Director of the Rutgers Discovery Informatics Institute (RDI2). His research interests are in the broad areas of Parallel and Distributed Computing and Computational and Data-Enabled Science and Engineering. Manish serves on the editorial boards and organizing committees of a large number of journals and international conferences and workshops, and has deployed several software systems that are widely used.  He has also received a number of awards and is Fellow of AAAS, Fellow of IEEE/IEEE Computer Society and ACM Distinguished Scientist. For more information please visit http://parashar.rutgers.edu/.

 

 

Padma Raghavan specializes in high-performance computing and its applications, with a particular focus on sparse graph and matrix problems. Her contributions are in the areas of scalable parallel computing; energy-aware supercomputing; and computational modeling, simulation and knowledge extraction. Prior to joining Vanderbilt in February 2016, Raghavan served as the Associate Vice President for Research and Director of Strategic Initiatives at Penn State, where she was also a Distinguished Professor of Computer Science and Engineering and the founding director of the university-wide Institute for CyberScience. Raghavan is now a Professor of Computer Science and Computer Engineering and the Vice Provost for Research at Vanderbilt University.

 

 

Gunter Roeth joined NVIDIA as a Solution Architect in October of last year, having previously worked at Cray, HP, Sun Microsystems and, most recently, BULL. He holds a master's degree in geophysics from the Institut de Physique du Globe (IPG) in Paris and completed a PhD in seismology on the use of neural networks (artificial intelligence) for interpreting geophysical data.

 

 

Robert Ross is Interim Director of the Mathematics and Computer Science (MCS) Division at Argonne National Laboratory. Rob is a senior fellow in the Northwestern-Argonne Institute for Science and Engineering and in the University of Chicago and Argonne Computation Institute. He also serves as an adjunct assistant professor in the Department of Electrical and Computer Engineering at Clemson University. His research interests include the design, implementation, and deployment of complex distributed systems and of data and communication system software for high-performance computing. Rob received his Ph.D. in computer engineering from Clemson University in 2000. He currently holds several leadership positions at Argonne and in the U.S. Department of Energy (DOE) computing community, including serving as deputy director of the Scientific Data Management, Analysis and Visualization Institute and as lead of the Data Management and Workflow Software component of the DOE Office of Science Exascale Computing activity.

 

 

Vaidy Sunderam is a faculty member at Emory University.  His research interests are in parallel and distributed systems, infrastructures for collaborative computing, and data security. His prior and recent research efforts have focused on system architectures and implementations for heterogeneous metacomputing, collaborative resource sharing, and data management systems. Vaidy teaches computer science at the beginning, advanced, and graduate levels, and advises graduate theses in the area of computer systems.

 

 

Martin Swany is Associate Chair and Professor in the Intelligent Systems Engineering Department in the School of Informatics and Computing at Indiana University, and the Deputy Director of the Center for Research in Extreme Scale Technologies (CREST).  His research interests include high-performance parallel and distributed computing and networking.

 

 

Michela Taufer is an associate professor in the Computer and Information Sciences Department at the University of Delaware. She earned her master's degree in Computer Engineering from the University of Padova (Italy) and her doctoral degree in Computer Science from the Swiss Federal Institute of Technology (Switzerland). From 2003 to 2004 she was a La Jolla Interfaces in Science Training Program (LJIS) Postdoctoral Fellow at the University of California San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on interdisciplinary projects in computer systems and computational chemistry. From 2005 to 2007, she was an Assistant Professor in the Computer Science Department of the University of Texas at El Paso (UTEP). She joined the University of Delaware in 2007 as an Assistant Professor and was promoted to Associate Professor with tenure in 2012.

Taufer's research interests include scientific applications and their advanced programmability in heterogeneous computing (i.e., multi-core and many-core platforms, GPUs); performance analysis, modeling, and optimization of multi-scale applications on heterogeneous computing, cloud computing, and volunteer computing; numerical reproducibility and stability of large-scale simulations on multi-core platforms; big data analytics and MapReduce.

 

 

Samuel Thibault has been an Assistant Professor at the University of Bordeaux since 2008 and is part of the Inria STORM team. His research revolves around thread, task, and data-transfer scheduling in parallel and distributed runtime systems. He is currently focused on the design of the StarPU runtime, and more particularly on its scheduling heuristics for heterogeneous architectures and distributed systems.

 

 

Bernard Tourancheau received an MSc in Applied Mathematics from Grenoble University in 1986 and an MSc in Renewable Energy Science and Technology from Loughborough University in 2007. He was awarded the best Computer Science PhD prize by the Institut National Polytechnique of Grenoble in 1989 for his work on parallel computing for distributed-memory architectures.

He was appointed assistant professor at the Ecole Normale Supérieure de Lyon LIP lab in 1989 before joining CNRS as a junior researcher. After initiating a CNRS-NSF collaboration, he worked on leave at the University of Tennessee in a senior researcher position with the US Center for Research in Parallel Computation, at the ICL laboratory.

He then took a Professor position at the University of Lyon in 1995, where he created a research laboratory and the INRIA RESO team, specializing in high-speed networking and HPC.

In 2001, he joined Sun Microsystems Laboratories for a six-year sabbatical as a Principal Investigator in the DARPA HPCS project, where he led the backplane networking group.

Back in academia, he oriented his research toward wireless sensor networks for building energy efficiency at the ENS LIP and INSA CITI labs.

He was appointed Professor at University Joseph Fourier of Grenoble in 2012. Since then, in the LIG lab's Drakkar team, he has been developing research on protocols and architectures for the Internet of Things. He also pursues research on optimizing communication algorithms for HPC multicore and GPGPU systems. He is, in addition, a scientific advocate of the renewable energy transition, relocalization, and low tech as answers to peak oil and global warming.

He has authored more than 140 peer-reviewed publications and filed 10 patents.

 

 

Jeffrey Vetter, Ph.D., is a Distinguished R&D Staff Member at Oak Ridge National Laboratory (ORNL). At ORNL, Vetter is the founding group leader of the Future Technologies Group in the Computer Science and Mathematics Division. Vetter also holds joint appointments at the Georgia Institute of Technology and the University of Tennessee-Knoxville. Vetter earned his Ph.D. in Computer Science from the Georgia Institute of Technology. Vetter is a Senior Member of the IEEE, and a Distinguished Scientist Member of the ACM. In 2010, Vetter, as part of an interdisciplinary team from Georgia Tech, NYU, and ORNL, was awarded the ACM Gordon Bell Prize. Also, his work has won awards at major conferences including Best Paper Awards at the International Parallel and Distributed Processing Symposium (IPDPS) and EuroPar, Best Student Paper Finalist at SC14, and Best Presentation at EASC 2015. In 2015, Vetter served as the SC15 Technical Program Chair. His recent books, entitled "Contemporary High Performance Computing: From Petascale toward Exascale (Vols. 1 and 2)," survey the international landscape of HPC. See his website for more information: http://ft.ornl.gov/~vetter/.

 

 

Frédéric Vivien received his Ph.D. degree from the École Normale Supérieure de Lyon in 1997. From 1998 to 2002, he was an associate professor at the Louis Pasteur University in Strasbourg, France. He spent the year 2000 working with the Computer Architecture Group of the MIT Laboratory for Computer Science. He is currently a senior researcher at INRIA, working at ENS Lyon, France. He leads the INRIA project-team ROMA, which focuses on designing models, algorithms, and scheduling strategies to optimize the execution of scientific applications. He is the author of two books, more than 35 papers published in international journals, and more than 50 papers published in international conferences. His main research interests are scheduling techniques and parallel algorithms for distributed and/or heterogeneous systems.

 

 

David Walker is Professor of High Performance Computing in the School of Computer Science and Informatics at Cardiff University, where he heads the Distributed Collaborative Computing group. From 2002 to 2010 he was also Director of the Welsh e-Science Centre. He received a B.A. (Hons) in Mathematics from Jesus College, Cambridge in 1976, an M.Sc. in Astrophysics from Queen Mary College, London, in 1979, and a Ph.D. in Physics from the same institution in 1983. Professor Walker has conducted research into parallel and distributed algorithms and applications for the past 25 years in the UK and USA, and has published over 140 papers on these subjects. Professor Walker was instrumental in initiating and guiding the development of the MPI message-passing specification, and has co-authored a book on MPI. He also contributed to the ScaLAPACK library for parallel numerical linear algebra computations. His research interests include software environments for distributed scientific computing, problem-solving environments and portals, and parallel applications and algorithms. Professor Walker is a Principal Editor of Computer Physics Communications, co-editor of Concurrency and Computation: Practice and Experience, and serves on the editorial boards of the International Journal of High Performance Computing Applications and the Journal of Computational Science.

 

 


 

 

 

CCGSC 1992 Participants (Some)

 

 

CCGSC 1994 Participants (Some), Blackberry Farm, Tennessee

 

CCGSC 1996 photo missing - does anyone have a picture?

 

 

 

CCGSC 1998 Participants, Blackberry Farm, Tennessee

 

 

 

CCGSC 2000 Participants, Faverges, France

 

 

 

CCGSC 2002 Participants, Faverges, France

 

 

 

CCGSC 2004 Participants, Faverges, France

 

 

 

CCGSC 2006 Participants, Flat Rock, North Carolina

Some additional pictures can be found here.

http://web.eecs.utk.edu/~dongarra/ccgsc2006/

 

 

CCGSC 2008 Participants, Flat Rock, North Carolina

http://web.eecs.utk.edu/~dongarra/ccgsc2008/

 

 

 

CCGSC 2010 Participants, Flat Rock, North Carolina

http://web.eecs.utk.edu/~dongarra/ccgsc2010/

 

CCDSC 2012 Participants, Dareizé, France

http://web.eecs.utk.edu/~dongarra/CCDSC-2012/index.htm

 

CCDSC 2014 Participants, Dareizé, France

http://web.eecs.utk.edu/~dongarra/CCDSC-2014/index.htm