Clusters, Clouds, and Data for Scientific Computing

CCDSC 2014

 

September 2nd – 5th, 2014

Châteauform'

La Maison des Contes

427 Chemin de Chanzé, France

 

Sponsored by:

Grenoble Alps University, ICL/UT, AMD, ANR, Google, ParTec, The Portland Group, Intel, INRIA, Nvidia, CGG Veritas, HP

 

        



 

Clusters, Clouds, and Data for Scientific Computing

 2014

Châteauform'

La Maison des Contes

427 Chemin de Chanzé, France

September 2nd – 5th, 2014

 

CCDSC 2014 will be held at a resort outside of Lyon, France, called La Maison des Contes: http://www.chateauform.com/en/chateauform/maison/17/chateau-la-maison-des-contes

 

 

The address of the Chateau is:

Châteauform' La Maison des Contes

427 chemin de Chanzé

69490 Dareizé

 

Telephone: +33 1 30 28 69 69

 

1 hr 30 min from the Saint Exupéry Airport

45 minutes from Lyon

 

 

GPS coordinates: north latitude 45° 54' 20", east longitude 4° 30' 41"

 

Go to http://maps.google.com and type in: "427 chemin de Chanzé 69490 Dareizé".


Message from the Program Chairs

 

This proceedings volume gathers information about the participants of the Workshop on Clusters, Clouds, and Data for Scientific Computing, to be held at La Maison des Contes, 427 Chemin de Chanzé, France, on September 2nd – 5th, 2014. This workshop is a continuation of a series of workshops started in 1992 under the title Workshop on Environments and Tools for Parallel Scientific Computing. These workshops have been held every two years, alternating between the U.S. and France. The purpose of this workshop, which is by invitation only, is to evaluate the state of the art and future trends in cluster computing and the use of computational clouds for scientific computing.

This workshop addresses a number of themes for developing and using both clusters and computational clouds. In particular, the talks will:

•  Survey and analyze the key deployment, operational, and usage issues for clusters, clouds, and grids, especially focusing on the discontinuities produced by multicore and hybrid architectures, data-intensive science, and the increasing need for wide area/local area interaction.

•  Document the current state of the art in each of these areas, identifying interesting questions and limitations, and report experiences with clusters, clouds, and grids in the science research communities and domains that are benefiting from the technology.

•  Explore interoperability among disparate clouds, as well as interoperability between various clouds and grids, and the impact on the domain sciences.

•  Explore directions for future research and development against the background of disruptive trends and technologies and the recognized gaps in the current state of the art.

 

Speakers will present their research and interact with all participants on the future software technologies that will make parallel computers easier to use.

 

This workshop was made possible thanks to sponsorship from ANR, Google, Hewlett-Packard, The Portland Group, and the Rhône-Alpes Region, with the scientific support of the Innovative Computing Laboratory at the University of Tennessee, Knoxville (UTK) and the University Joseph Fourier of Grenoble.

 Thanks!

 

Jack Dongarra, Knoxville, Tennessee, USA.

Bernard Tourancheau, Grenoble, France


Draft agenda (1/15/16 11:54 AM)

September 2nd – 5th, 2014

 

 

 

 

 

Tuesday

September 2nd

Introduction and Welcome: Jack Dongarra, U of Tennessee, and Bernard Tourancheau, U Grenoble

 

6:30  – 7:45

Session Chair: Jack Dongarra

(2 talks – 25 minutes each)

7:00

Michael Wolfe

A Compiler Engineer's View of High Performance Technical Computing

7:30

Patrick Geoffray

Google Cloud HPC

8:00 pm – 9:00 pm

Dinner

 

9:00 pm -

 

 

 

 

 

Wednesday, September 3rd

 

 

7:30 - 8:30

Breakfast

 

8:30 - 10:35

Session Chair: Bernard Tourancheau, U Grenoble

 

(5 talks – 25 minutes each)

8:30

Pete Beckman

Cognitive Dissonance in HPC

8:55

Rosa Badia

Task-based programming with PyCOMPSs and its integration with data management activities at BSC

9:20

David Abramson

The WorkWays Problem Solving Environment

9:45

Jelena Pjesivac-Grbovic

Google Cloud Platform focusing on Data Processing and Analytics tools available in GCP

10:10

Patrick Demichel

The Machine

10:35 -11:00

Coffee

 

11:00  - 1:05

Session Chair: Patrick Demichel

 (5 talks – 25 minutes each)

11:00

Franck Cappello

Toward Approximate Detection of Silent Data Corruptions.

11:25

George Bosilca

Mixed resilience solutions     

11:50

Yves Robert

Algorithms for coping with silent errors

12:15

Frank Mueller

On Determining a Viable Path to Resilience at Exascale

12:40

Satoshi Matsuoka

Towards Billion-Way Resiliency

1:05  - 2:00

Lunch

 

2:30 – 3:00

Coffee

 

3:00 - 5:30

Panel Chair: Rusty Lusk

 

 

Jean-Yves Berthou

Settling the Important Questions, Once and for All

 

Geoffrey Fox

 

Al Geist

 

Thilo Kielmann

 

JL Philippe

 

Vaidy Sunderam

5:45 – 7:30

Wine tasting at Cellar Bruno

Travel time to Cellar Bruno is 15-20 minutes; two options: walking or cycling

8:00 – 9:00

Dinner

                                        

9:00 pm -

 

 

 

 

 

 

 

 

 

 

Thursday, September 4th

 

 

7:30 - 8:30

Breakfast

 

8:30 - 10:35

Session Chair: Emmanuel Jeannot

 (4 talks – 25 minutes each)

8:30

Bill Gropp

Computing at a cross-roads: Big Data, Big Compute, and the Long Tail

8:55

Barbara Chapman

Portable Application Development in an Age of Node Diversity

9:20

Marc Buffat

High Performance computing and Big Data for turbulent transition analysis

9:45

Joel Saltz

Exascale Challenges in Integrative Multi-scale  Spatio-Temporal Analyses

10:35 -11:00

Coffee

 

11:00  - 1:05

Session Chair: Laurent Lefevre

 (5 talks – 25 minutes each)

11:00

Dan Reed

Adaptive, Large-Scale Computing Systems

11:25

Ewa Deelman

Building Community Resources For Scientific Workflow Research

11:50

Christian Perez

Evaluation of an HPC Component Model on Jacobi and 3D FFT Kernels.

12:15

Jeff Hollingsworth

NEMO: Autotuning power and performance

12:40

Rajeev Thakur

Future Node Architectures and their Implications for MPI

1:05  - 2:00

Lunch

 

2:00 – 4:00

Session Chair: Xavier Vigouroux

(3 talks – 25 minutes each)

2:30

Dimitrios Nikolopoulos

The Challenges and Opportunities of Micro-servers in the HPC Ecosystem

2:55

Mary Hall

Leveraging HPC Expertise and Technology in Data Analytics

3:20

Torsten Hoefler

Slim Fly: A Cost Effective Low-Diameter Network Topology

4:00 – 5:00

Coffee

 

5:00 - 7:05

Session Chair: Christian Perez

 (5 talks – 25 minutes each)

5:00

Jeff Vetter

Exploring Emerging Memory Technologies in the Extreme Scale HPC Co-Design Space

5:25

Bernd Mohr

The Score-P Tool Universe

5:50

Padma Raghavan

Multilevel Data Structures for Accelerating Parallel Sparse Matrix Computations

6:15

Frederic Suter

Scalable Off-line Simulation of MPI applications

6:40

Christian Obrecht

Early attempts of implementing the lattice Boltzmann method on Intel's MIC architecture

8:00 – 9:00

Dinner

 

9:00 pm -

 

 

 

 

Friday, September 5th

 

 

7:30 - 8:30

Breakfast

 

8:30 - 10:35

Session Chair:  Rosa Badia

 (5 talks – 25 minutes each)

8:30

Anthony Danalis

Why PaRSEC is the right runtime for exascale computing

8:55

Michela Taufer

Performance and Cost Effectiveness of DAG-based Workflow Executions on the Cloud

9:20

Martin Swany

Network Acceleration for Data Logistics in Distributed Computing

9:45

Satoshi Sekiguchi

Dataflow-centric Warehouse-scale Computing

10:10

Frederic Vivien

Scheduling Tree-Shaped Task Graphs to Minimize Memory and Makespan

10:35 -11:00

Coffee

 

11:00  - 1:05

Session Chair: Frédéric Suter

 (3 talks – 25 minutes each)

11:00

David Walker

Algorithms for In-Place Matrix Transposition

11:25

Laurent Lefevre

Towards Energy Proportional HPC and Cloud Infrastructures

11:50

Emmanuel Jeannot

Topology-aware Resource Selection

12:30  - 2:00

Lunch

 

2:00

Depart

 

 

 

 


 

 

 

Attendee List:

David Abramson (U of Queensland)
Rosa Badia (BSC)
Pete Beckman (ANL)
Jean-Yves Berthou (ANR)
George Bosilca (UTK)
Bill Brantley (AMD)
Marc Buffat (U of Lyon)
Franck Cappello (ANL/INRIA)
Barbara Chapman (U of Houston)
Francois Courteille (Nvidia)
Joe Curley (Intel)
Anthony Danalis (UTK)
Ewa Deelman (ISI)
Patrick Demichel (HP)
Benoit Dinechin (Kalray)
Jack Dongarra (UTK/ORNL)
Geoffrey Fox (Indiana)
Al Geist (ORNL)
Patrick Geoffray (Google)
Andrew Grimshaw (U Virginia)
Bill Gropp (UIUC)
Mary Hall (Utah)
Torsten Hoefler (ETH)
Jeff Hollingsworth (U Maryland)
Emmanuel Jeannot (INRIA)
Thilo Kielmann (Vrije Universiteit)
Laurent Lefevre (INRIA)
Rusty Lusk (ANL)
Satoshi Matsuoka (Tokyo Institute of Technology)
Bernd Mohr (Juelich)
Frank Mueller (NC State)
Raymond Namyst (U Bordeaux & INRIA)
Dimitrios Nikolopoulos (Queen's University of Belfast)
Christian Obrecht (INSA Lyon)
Jean-Laurent Philippe (Intel)
Christian Perez (INRIA)
Jelena Pjesivac-Grbovic (Google)
Padma Raghavan (Penn State)
Dan Reed (U of Iowa)
Yves Robert (ENS & INRIA)
Joel Saltz (Emory U)
Satoshi Sekiguchi (Grid Technology Research Center, AIST)
Vaidy Sunderam (Emory U)
Frederic Suter (CNRS/IN2P3)
Martin Swany (Indiana U)
Michela Taufer (U of Delaware)
Marc Tchiboukdjian (CGG)
Rajeev Thakur (Argonne)
Bernard Tourancheau (University Grenoble)
Stéphane Ubéda (INRIA)
Jeff Vetter (ORNL)
Xavier Vigouroux (Bull)
Frederic Vivien (ENS & INRIA)
David Walker (Cardiff)
Michael Wolfe (PGI)

 

 

Arrival / Departure Information:

 

Here is some information on the meeting in Lyon.  We have updated the workshop webpage http://tiny.cc/ccdsc-2014 with the workshop agenda.

 

On Tuesday, September 2nd, there will be a bus to pick up participants at Lyon's Saint Exupéry airport (formerly called Satolas) at 3:00 pm. (Note that the Saint Exupéry airport has its own train station with direct TGV connections to Paris via Charles de Gaulle. If you arrive by train at the Saint Exupéry airport station, please go to the airport meeting point (point-rencontre), on the second floor, next to the shuttles, near the hallway between the two terminals; see http://www.lyonaeroports.com/eng/Access-maps-car-parks/Maps.)

 

 

The TGV station is reached via a long corridor from the airport terminal. The bus stop is near the station entrance, in the parking lot called "dépose minute".

 

 

The bus will then travel to pick up people at the Lyon Part Dieu railway station at 4:45 pm. (There are two train stations in Lyon; you want the Part Dieu station, not the Perrache station.) There will be someone with a sign at the "Meeting Point / point de rencontre" of the station to direct you to the bus.

 

The bus is expected to arrive at La Maison des Contes around 5:30 pm. We would like to hold the first session on Tuesday evening from 6:30 pm to 8:00 pm, with dinner following the session. La Maison des Contes is about 43 km from Lyon. For a map, go to http://maps.google.com and type in: "427 chemin de Chanzé 69490 Dareizé".


 

VERY IMPORTANT: Please send your arrival and departure times to Jack so we can arrange an appropriately sized bus for transportation. VERY VERY IMPORTANT: If your flight is such that you will miss the bus on Tuesday, September 2nd at 3:00 pm, send Bernard your flight arrival information so he can arrange for transportation to pick you up at the train station or the airport in Lyon. A taxi from Lyon to the Chateau can cost as much as 100 Euros, and the Chateau may be hard to find at night if you rent a car and are not a French driver :-).

 

At the end of the meeting on Friday afternoon, we will arrange for a bus to transport people to the train station and airport. If you are catching an early flight on the morning of Saturday, September 6th, you may want to stay at the hotel located at Lyon's Saint Exupéry Airport; see http://www.lyonaeroports.com/eng/Shops-facilities/Hotels for details.

There are also many hotels in the Lyon area; see http://www.en.lyon-france.com/

 

Due to room constraints at La Maison des Contes, you may have to share a room with another participant. Dress at the workshop is informal. Please tell us if you have special requirements (vegetarian food, etc.). We are expecting to have internet and wireless connections at the meeting.

 

Please send this information to Jack (dongarra@eecs.utk.edu) by July 18th.

Name:

Institute:

Title:

Abstract:

Participant's brief biography:

 


 

 

Arrival / Departure Details:

 

 

 

Arrival and departure times in Lyon:

David Abramson: arrival 9/2 Part Dieu 2:00 pm; departure 9/5 train to Paris at 4:00 pm
Rosa Badia: arrival 9/2 VY1220 11:15; departure 9/5 VY1223 19:00
Pete Beckman: arrival 9/2 UA8914 10:10 am; departure 9/5
Jean-Yves Berthou: arrival 9/3 (Wednesday) by car; departure 9/5
George Bosilca: arrival 9/2 DL8344 2:30 pm; departure 9/5
Bill Brantley: arrival 9/2 airport 10:00 am (late: 4:30 pm); departure 9/6 airport 8:15 am
Marc Buffat: arrival 9/3 by car; departure 9/4
Franck Cappello: arrival 9/2 Part Dieu 4:00 pm; departure 9/5
Barbara Chapman: arrival 9/2 Part Dieu 7:26 pm (taxi from the train station to the chateau); departure 9/5
Francois Courteille: arrival 9/2 Part Dieu; departure 9/4 by train
Joe Curley: arrival 9/2 UA8914 10:10 am; departure 9/5
Anthony Danalis: arrival 9/2 AF7644 2:30 pm; departure 9/5 St. Exupery 3:15 pm
Ewa Deelman: arrival 9/2 Part Dieu; departure 9/5
Patrick Demichel: arrival by car; departure by car
Benoit Dinechin: arrival by car; departure by car
Jack Dongarra: arrival 9/2 DL9288 11:15 am; departure 9/6 DL9521 6:35 am
Geoffrey Fox: arrival 9/2 BA360 11:00 am; departure 9/6 BA365 8:15 am
Al Geist: arrival 9/2 DL9515 1:20 pm; departure 9/6 DL8611 8:10 am
Patrick Geoffray: arrival 9/2 DL8344 2:30 pm; departure 9/5
Bill Gropp: arrival 9/2 Part Dieu 2:00 pm; departure 9/5 Part Dieu 4:00 pm
Mary Hall: arrival 9/2 airport via train; departure 9/5 train to Paris
Torsten Hoefler: arrival 9/2 Part Dieu 3:26 pm; departure 9/5
Jeff Hollingsworth: arrival 9/2 UA8914 10:10 am; departure 9/6 UA8881 6:55 am
Emmanuel Jeannot: arrival 9/2 airport 4:50 pm (pickup by Benoit Dinechin at 5:30 pm); departure 9/5 airport 4:25 pm
Thilo Kielmann: arrival 9/2 KL1417 13:20; departure 9/5 KL1416 18:15
Laurent Lefevre: arrival by car; departure 9/5
Rusty Lusk: arrival 9/2 UA8914 10:10 am; departure 9/5
Satoshi Matsuoka: arrival 9/2 at the airport (arriving 9/1); departure 9/6 airport LH1077 2:40 pm
Bernd Mohr: arrival 9/2 4U9414 3:15 pm (pickup by Benoit Dinechin at 5:30 pm); departure 9/5 4U9417 8:30 pm
Frank Mueller: arrival 9/1 Part Dieu; departure 9/5 Part Dieu 3:34 pm
Raymond Namyst: arrival by car; departure Thursday morning
Dimitrios Nikolopoulos: arrival 9/3 (Wednesday) by car; departure 9/5 by car
Christian Obrecht: arrival by car; departure 9/5
Christian Perez: arrival by car; departure 9/4 by car
Jean-Laurent Philippe: arrival by car; departure Wednesday evening
Jelena Pjesivac-Grbovic: arrival 9/2 BA360 11:00 am; departure 9/7 BA365 8:15 am
Padma Raghavan: arrival 9/2 CH532 1:50 pm; departure 9/6 AA8602 8:40 am
Dan Reed: arrival 9/2 AA6592 11:00 am (late: 6:40 pm); departure 9/6 AA8602 8:40 am
Yves Robert: arrival by car; departure 9/5
Joel Saltz: arrival 9/2 AF7644 2:30 pm; departure 9/6 AF7641 10:55 am
Satoshi Sekiguchi: arrival 9/2 Part Dieu 4:28 pm; departure 9/5 Part Dieu 4:00 pm
Vaidy Sunderam: arrival 9/2 airport by 3:00 pm; departure 9/5
Frederic Suter: arrival by car; departure 9/5
Martin Swany: arrival 9/2 DL8344 2:30 pm; departure 9/6 DL8611 8:10 am
Michela Taufer: arrival 9/2 TGV 4:45 pm; departure 9/5 TGV (pm)
Marc Tchiboukdjian: arrival 9/2 Part Dieu 4:00 pm; departure 9/5 Part Dieu 6:00 pm
Rajeev Thakur: arrival 9/2 LF1076 1:55 pm; departure 9/5
Bernard Tourancheau: arrival 9/2 airport; departure 9/5
Stéphane Ubéda: arrival 9/2 by car; departure 9/3
Jeff Vetter: arrival 9/2 Part Dieu from Paris; departure 9/6 airport
Xavier Vigouroux: arrival by car; departure 9/4
Frederic Vivien: arrival by car; departure 9/5
David Walker: arrival 9/2 KL1413 11:15 am; departure 9/5 KL1416 6:15 pm
Michael Wolfe: arrival 9/2 from AMS at 1:20 pm; departure 9/6 at 6:35 am to AMS

 

 

 

 

 

 

 

 


 

Abstracts:

 

David Abramson and Hoang Nguyen, University of Queensland

 

The WorkWays Problem Solving Environment

 

Science gateways allow computational scientists to interact with a complex mix of mathematical models, software tools and techniques, and high performance computers. Accordingly, various groups have built high-level problem-solving environments that allow these to be mixed freely. In this talk, we introduce an interactive workflow-based science gateway, called WorkWays. WorkWays integrates different domain specific tools, and at the same time is flexible enough to support user input, so that users can monitor and steer simulations as they execute. A benchmark design experiment is used to demonstrate WorkWays.

 

 

Rosa M Badia, Barcelona Supercomputing Center

 

Task-based programming with PyCOMPSs and its integration with data management activities at BSC

 

StarSs is a family of task-based programming models based on the idea of writing sequential code that is executed in parallel at runtime, taking into account the data dependences between tasks.

COMPSs is an instance of StarSs that aims to simplify the execution of Java applications on distributed infrastructures, including clusters and clouds. For that purpose, COMPSs provides both a straightforward Java-based programming model and a componentized runtime that is able to interact with a wide variety of distributed computing middleware (e.g., gLite, Globus) and cloud APIs (e.g., OpenStack, OpenNebula, Amazon EC2).

 

The talk will focus on recent extensions to COMPSs: PyCOMPSs, a binding for the Python language that will enable a larger number of scientific applications in fields such as the life sciences, and the integration of COMPSs with new Big Data resource management methodologies developed at BSC, such as the Wasabi self-contained objects library and Cassandra data management policies. These activities are performed under the Human Brain Project flagship and the Spanish BSC Severo Ochoa project.
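To make the task-based idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the PyCOMPSs API (no PyCOMPSs decorators or runtime calls are reproduced here); it only mimics the StarSs principle that the code reads sequentially, independent tasks run in parallel, and a data dependence forces a synchronization point.

    # Minimal, generic task-based sketch (not PyCOMPSs): independent tasks are
    # submitted to a pool and run concurrently; the final reduction depends on
    # all of them, so the code waits for their results at that point.
    from concurrent.futures import ThreadPoolExecutor

    def increment(block):
        # A "task": operates on one data block, independently of the others.
        return [x + 1 for x in block]

    data = [list(range(i, i + 4)) for i in range(0, 16, 4)]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(increment, b) for b in data]   # tasks run in parallel
        blocks = [f.result() for f in futures]                # data dependence: wait here
    print(sum(sum(b) for b in blocks))                        # reduction over all blocks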

 

 

Pete Beckman, ANL                                    

 

Cognitive Dissonance in HPC

 

At extreme-scale, the gulf between what we want and what we can have becomes more pronounced.  The list of conflicting truths, wants, and needs within the HPC community is probably too long to analyze and enumerate, which of course means it is Big Data.  For extreme-scale hardware and system software we must re-examine our investments, designs, beliefs, and performance tradeoffs.  

 

 

 

George Bosilca, UTK

 

Mixed resilience solutions

 

For too long, sub-optimal resilience mechanisms have been praised as one-size-fits-all fault management approaches in production-grade applications. Moving to larger and more powerful computing platforms, we started to realize that these solutions, while valid at certain sizes, are only able to support our programming paradigms or applications at a prohibitive hardware cost. In this talk I will focus on a particular method to cope with these imperfect approaches: combining different resilience methodologies in order to capitalize on their benefits and create cheaper, more efficient, and more stable ways to deal with failures. More specifically, this talk will cover the mixed case of coordinated checkpoint/restart together with algorithmic fault tolerance.

 

 

Marc Buffat, Université Claude Bernard Lyon 1

 

High Performance Computing and Big Data for turbulent transition analysis

 

Understanding turbulent transition using numerical experiments is a computational challenge, because it requires very large, accurate simulations. In the past, studies in scientific simulation have mainly focused on the solver, because it was the most CPU-consuming part. Nowadays, highly accurate numerical solvers, such as the NadiaSpectral code developed in our group, allow very large turbulent transition simulations using billions of modes to be run on HPC systems. However, due to the size of such simulations, specific issues are emerging related to input/output and the analysis of the results. Particularly when large simulations are performed as experiments that must be analyzed in detail without a priori knowledge, saving the computed data to disk at regular time steps for post-processing is a source of worrisome overhead. Thus new trends emerge that consider analysis and visualization as part of a high-performance simulation, using "in-situ visualization". Tightly coupled in-situ processing using general-purpose visualization tools such as VisIt or ParaView is, however, not well adapted to our needs. In this talk, I will present a case study of hybrid in-situ concurrent processing that allows users to interact with the simulation and to analyze and visualize time-dependent results while preserving the accuracy of large simulations.

 

 

Franck Cappello, Univ Paris/ANL

 

Toward Approximate Detection of Silent Data Corruptions.

 

Exascale systems will suffer more frequent soft errors than current systems. Hardware protections will detect, and may correct, most of them. However, the probability that a soft error stays unnoticed will become significant. These errors, known as silent soft errors, may ultimately lead to wrong results. In this talk we will focus on the SDC detection problem and review existing system and algorithmic techniques. We will also introduce low-cost approximate detection approaches that are promising in the Exascale context and beyond.
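As a deliberately simplified illustration of what an approximate detector can look like (this is not one of the specific detectors of the talk; the prediction model and thresholds are invented for the example), the sketch below flags values that deviate too far from a linear extrapolation of the two previous time steps.

    # Toy approximate SDC detector: predict each value from the two previous
    # time steps and flag entries that fall far outside the prediction.
    def detect_sdc(prev2, prev1, current, rel_tol=0.05, abs_tol=1e-12):
        suspects = []
        for i, (a, b, c) in enumerate(zip(prev2, prev1, current)):
            predicted = 2.0 * b - a                            # linear extrapolation
            tolerance = rel_tol * max(abs(a), abs(b)) + abs_tol
            if abs(c - predicted) > tolerance:
                suspects.append(i)
        return suspects

    # A smooth field with one corrupted entry at index 3.
    prev2 = [1.0, 2.0, 3.0, 4.0]
    prev1 = [1.1, 2.1, 3.1, 4.1]
    curr  = [1.2, 2.2, 3.2, 40.2]
    print(detect_sdc(prev2, prev1, curr))                      # -> [3]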

 

 

Barbara Chapman, U of Houston         

 

Portable Application Development in an Age of Node Diversity                                  

 

 

Anthony Danalis, UTK

 

Why PaRSEC is the right runtime for exascale computing

 

Current HPC systems feature increasing core counts, accelerators, and unpredictable memory access times. Developing efficient applications for such systems requires new programming paradigms. Solutions must react and adapt quickly to unexpected contentions and delays, and have the flexibility to rearrange the load balance to improve the resource utilization.  In this talk, we demonstrate why PaRSEC is the right solution for this problem. We outline the dataflow-based task execution model of PaRSEC and describe the Parameterized Task Graph (PTG) that enables this model. Then the PTG is contrasted with the more traditional Bulk Synchronous and Coarse Grain Parallelism model that is embodied in applications that use MPI for explicit message passing. Also, the PTG model is contrasted with the alternative approach for task execution, where the entire dynamic DAG of tasks is created and maintained in memory.  We then showcase example success stories and discuss future directions.

 

Ewa Deelman, ISI

 

Building Community Resources For Scientific Workflow Research

 

A significant amount of recent research in scientific workflows aims to develop new techniques, algorithms, and systems that can overcome the challenges of efficient and robust execution of ever larger workflows on increasingly complex distributed infrastructures. Since the infrastructures, systems, and applications are complex, and their behavior is difficult to reproduce using physical experiments, much of this research is based on simulation. However, there exists a shortage of realistic datasets and tools that can be used for such simulations. This talk describes a collection of tools and data that have enabled research on new techniques, algorithms, and systems for scientific workflows. These resources include: 1) execution traces of real workflow applications from which workflow and system characteristics such as resource usage and failure profiles can be extracted, 2) a synthetic workflow generator that can produce realistic synthetic workflows based on profiles extracted from execution traces, and 3) a simulator framework that can simulate the execution of synthetic workflows on realistic distributed infrastructures. The talk describes how these resources have been used to investigate new techniques for efficient and robust workflow execution, and how they have provided the basis for improvements to the Pegasus Workflow Management System and other workflow tools. All the tools and data are freely available online for the community.

 

 

Patrick Demichel, HP

 

THE MACHINE

 

Our industry is challenged by the simultaneous end of regime of most of the old technologies we have developed for decades, and by the insatiable demand for 10X more every 3 years to process the tsunami of data coming at us. HP Labs identified this challenge many years ago and developed the technologies, and then a program, to disrupt the natural trends by at least 2 orders of magnitude and enable the Exascale story. This time it will be a radically more disruptive evolution of our systems; we are forced to holistically redesign most of our hardware and software components to achieve this goal and deliver the promise of extracting the value in the data. This program is called "THE MACHINE"; it is not just the design of a massive data center, but the redesign from scratch of a new infrastructure that will integrate the full ecosystem, from the data centers to the billions of connected intelligent objects.

 

 

 

Patrick Geoffray, Google

 

Google Cloud HPC

 

 

Bill Gropp, UIUC

 

Computing at a cross-roads: Big Data, Big Compute, and the Long Tail

 

The US National Science Foundation has commissioned a study on the future of advanced computing for NSF. The committee is soliciting input on the impact of computing, the tradeoffs between different kinds of computing and data capabilities, and alternative methods of providing cyberinfrastructure resources.  This talk will give an overview of the issues, pose questions for the audience, and invite input for the report.

 

 

Mary Hall, University of Utah

 

Leveraging HPC Expertise and Technology in Data Analytics

 

Scalable approaches to scientific simulation and to data analytics have mostly followed separate technology paths. In HPC, performance and simulation accuracy have been the principal drivers of technology, while data analytics research has primarily focused on programming tools and systems that are productive and resilient in the presence of frequent faults. This talk discusses how future large-scale systems for both HPC and data analytics will face similar challenges in addressing scalability, energy efficiency, resilience, and programmability. We make several observations about programming trends and future architectures by surveying contemporary work in both areas, with a particular emphasis on architectures, programming systems, and algorithms. We then discuss where research on HPC can be leveraged in data analytics and how applications that are both compute- and data-intensive can evolve.

 

 

Torsten Hoefler, ETH Zürich

 

Slim Fly: A Cost Effective Low-Diameter Network Topology

 

We introduce a high-performance cost-effective network topology called Slim Fly that approaches the theoretically optimal network diameter. Slim Fly is based on graphs that approximate the solution to the degree-diameter problem. We analyze Slim Fly and compare it to both traditional and state-of-the-art networks. Our analysis shows that Slim Fly has significant advantages over other topologies in latency, bandwidth, resiliency, cost, and power consumption. Finally, we propose deadlock-free routing schemes and physical layouts for large computing centers as well as a detailed cost and power model. Slim Fly enables constructing cost effective and highly resilient datacenter and HPC networks that offer low latency and high bandwidth under different HPC workloads such as stencil or graph computations.

 

 

Jeff Hollingsworth, U Maryland

 

NEMO: Autotuning power and performance

 

Autotuning has demonstrated its utility in many domains.   However, increasingly there is a need to autotune for multiple objective functions (such as power and performance).  In this talk I will describe NEMO, a system for multi-objective autotuning.   NEMO allows efficiently finding solutions near the Pareto front without having to explicitly build the full Pareto front.   I will present some preliminary results of using NEMO to autotune a GPU kernel.
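For readers unfamiliar with the multi-objective setting, the toy sketch below shows the Pareto-dominance test that underlies it. It is purely illustrative and is not NEMO code; the (runtime, power) measurements are made up.

    # Pareto dominance for two minimized objectives, e.g. (runtime, power).
    def dominates(a, b):
        # a dominates b if it is no worse in every objective and better in one.
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def pareto_front(points):
        # Keep only the non-dominated measurements.
        return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

    measurements = [(1.0, 250.0), (1.2, 180.0), (0.9, 300.0), (1.3, 260.0)]
    print(pareto_front(measurements))    # (1.3, 260.0) is dominated and dropped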

 

 

Emmanuel Jeannot, INRIA

 

Topology-aware Resource Selection

 

The way resources are allocated to an application plays a crucial role in the performance of its execution. It has been shown recently that a non-contiguous allocation can slow down performance by more than 30%. However, a batch scheduler cannot always provide a contiguous allocation, and even in the case of such an allocation, the way processes are mapped to the allocated resources has a big impact on performance. The reason is that the topology of an HPC machine is hierarchical and that the process affinity is not uniform (some pairs of processes exchange more data than other pairs). Hence, taking into account the topology of the machine and the process affinity is an effective way to increase application performance.

 

Nowadays, allocation and mapping are decoupled. For instance, in Zoltan, processors are first allocated to the application and then processes are mapped to the allocated resources depending on the topology and the communication pattern. Decoupling allocation and mapping can lead to sub-optimal solutions where a better mapping could have been found if the resource selection had taken the process affinity into account.

 

In this talk, we will present our work on coupling resource allocation and topology mapping. We have designed and implemented a new Slurm plug-in that takes as input the process affinity of the application and that, according to the machine topology, selects resources and maps processes taking both inputs (affinity and topology) into account. It is based on our process placement tool called TreeMatch, which provides the algorithmic engine to compute the solution. We will present our preliminary results, obtained by emulating traces of the Curie machine, which features 5040 nodes (2 sockets of 8 cores each), and comparing our solution with plain Slurm.
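As a toy illustration of what taking both affinity and topology into account means (this is neither TreeMatch nor the Slurm plug-in; the affinity matrix and node sizes are invented), the sketch below greedily places the most heavily communicating process pairs on the same node.

    # Greedy affinity-aware placement onto identical nodes of a fixed size.
    def greedy_pairwise_mapping(affinity, cores_per_node):
        n = len(affinity)
        # Process pairs sorted by decreasing communication volume.
        pairs = sorted(((affinity[i][j], i, j)
                        for i in range(n) for j in range(i + 1, n)), reverse=True)
        mapping, load = {}, {}
        for _, i, j in pairs:
            for p, partner in ((i, j), (j, i)):
                if p in mapping:
                    continue
                node = mapping.get(partner)          # prefer the partner's node
                if node is None or load[node] >= cores_per_node:
                    # otherwise the least-loaded node that still has a free core
                    node = min(range(n), key=lambda k: (load.get(k, 0) >= cores_per_node,
                                                        load.get(k, 0)))
                mapping[p] = node
                load[node] = load.get(node, 0) + 1
        return mapping

    # 4 processes, 2 cores per node; 0-1 and 2-3 communicate heavily.
    affinity = [[0, 9, 1, 0],
                [9, 0, 0, 1],
                [1, 0, 0, 8],
                [0, 1, 8, 0]]
    print(greedy_pairwise_mapping(affinity, 2))   # -> {0: 0, 1: 0, 2: 1, 3: 1}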

 

 

Laurent Lefevre, INRIA ENS

 

Towards Energy Proportional HPC and Cloud Infrastructures

 

Reducing energy consumption is one of the main concerns in cloud and HPC environments. Today, server energy consumption is far from ideal, mostly because it remains very high even in low-usage states. Energy consumption proportional to server load would bring important savings in electricity consumption, and hence in financial costs, for a datacenter infrastructure. This talk will present our first results in this domain.
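To make the potential savings concrete, here is a back-of-the-envelope sketch with invented numbers (they are not measurements from the talk), comparing a server whose idle power is a large fraction of its peak power with an ideal energy-proportional server over a lightly loaded day.

    # Daily energy under a linear power model: power = idle + (peak - idle) * load.
    def energy_wh(hourly_loads, idle_w, peak_w):
        return sum(idle_w + (peak_w - idle_w) * u for u in hourly_loads)

    loads = [0.1] * 18 + [0.8] * 6                 # a day that is mostly lightly loaded
    typical      = energy_wh(loads, idle_w=150.0, peak_w=250.0)   # high idle draw
    proportional = energy_wh(loads, idle_w=0.0,   peak_w=250.0)   # ideal proportional server
    print(typical, proportional, 1 - proportional / typical)      # about 61% saved here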

 

 

 

Satoshi Matsuoka and Kento Sato, Tokyo Institute of Technology

 

Towards Billion-Way Resiliency

 

Our "Billion-Way Resiliency" project aims at creating algorithms and software frameworks to achieve scalable resiliency in future exascale systems with high failure rates and limited I/O bandwidth. Currently, many future architectural plans assume burst-buffers to alleviate the I/O limitations; our modeling of resiliency I/O behavior demonstrates that, due to the burst buffer itself failing, there are various architectural tradeoffs. The good news is that, given the current failure rates we observe on todayÕs machines, controlling the reliability of exascale machines seem feasible, but the bad news is that it might not scale beyond. Also, issues such as fault detection, programming abstractions, as well as recovery protocols have been previously neglected in most research. While the recent UFLM proposal for MPI has been definitely a step forward, it is largely confined to the MPI layer, and the jury is out on whether such containment would be the formidable choice, or a software framework design that accommodates higher-level programming abstractions to the end-user, while communicating to lower-level system substrates such as batch-queue schedulers via a standardized interface at the same time, would be more powerful. We will touch upon the issue, along with other techniques such as checkpoint compression; all the technologies combined at this point seems to make billion-scale resiliency feasible for future exascale systems.

 

 

 

Bernd Mohr, Juelich

 

The Score-P Tool Universe

 

The talk will present an overview of Score-P, a community effort that provides a scalable and feature-rich run-time recording package for parallel performance monitoring. It supports profiling, event trace recording, and online monitoring; support for sampling is already on the roadmap. Score-P supports a variety of parallel programming models (MPI, OpenMP, CUDA, OpenSHMEM, GASPI, and others) and all common HPC architectures (Linux clusters, the Cray family, the Blue Gene family, and more). Unlike comparable run-time monitoring packages, it is not tied to a particular analysis tool nor to one of the involved groups. Instead, it works natively with the four well-established analysis tools Periscope, Scalasca, TAU, and Vampir, and thus leverages their complementary analysis methodologies. The presentation will highlight the major features of Score-P as well as of Periscope, Scalasca, TAU, and Vampir, give an outlook on their roadmaps, and showcase selected application scenarios.

 

 

Frank Mueller, NC State

 

On Determining a Viable Path to Resilience at Exascale

 

Exascale computing is projected to feature billion-core parallelism. At such large processor counts, faults will become more commonplace. Current techniques to tolerate faults focus on reactive schemes for recovery and generally rely on a simple checkpoint/restart mechanism. Yet they have a number of shortcomings: (1) they do not scale and require complete job restarts; (2) projections indicate that the mean time between failures is approaching the overhead required for checkpointing; and (3) existing approaches are application-centric, which increases the burden on application programmers and reduces portability.

 

To address these problems, we discuss a number of techniques and their level of maturity (or lack thereof). These include (a) scalable network overlays, (b) on-the-fly process recovery, (c) proactive process-level fault tolerance, (d) redundant execution, (e) the effect of SDCs on IEEE floating-point arithmetic, and (f) resilience modeling. In combination, these methods aim to pave the path to exascale computing.

 

 

 

Dimitrios Nikolopoulos, Queen's University of Belfast

 

The Challenges and Opportunities of Micro-servers in the HPC Ecosystem

 

 

Raymond Namyst, U Bordeaux & INRIA

 

Co-scheduling parallel codes over heterogeneous machines: a supervised approach

 

Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great challenge. Even if a uniform runtime system is used underneath, scheduling tasks or threads coming from different libraries over the same set of hardware resources introduces many issues, such as resource oversubscription, undesirable cache flushes or memory bus contention.  We present an extension of StarPU, a runtime system specifically designed for heterogeneous architectures, that allows multiple parallel codes to run concurrently with reduced interference. Such parallel codes run within scheduling contexts that provide confined execution environments which are used to partition computing resources. A hypervisor automatically expands or shrinks Scheduling Contexts using feedback from the runtime system to optimize resource utilization.

 

Christian Obrecht, CETHIL UMR 5008 (CNRS, INSA-Lyon, UCB-Lyon 1), Université de Lyon

 

Early attempts of implementing the lattice Boltzmann method on Intel's MIC architecture

 

Since its beginnings in the early 1990s, the lattice Boltzmann method (LBM) has become a well-acknowledged approach in computational fluid dynamics, used in numerous industry-grade software packages such as PowerFLOW, X-Flow, Fluidyna, and LaBS. From an algorithmic standpoint, the LBM operates on regular Cartesian grids (potentially with hierarchical refinement) with nearest-neighbour synchronisation constraints. It is therefore considered a representative example of stencil computations (see SPEC CPU2006 benchmark 470.lbm), and it proves to be well suited for high-performance implementations.

 

In this contribution, we present two attempts to implement a three-dimensional LBM solver on Intel's MIC processor. The first version is based on the OpenCL framework and shows strong analogies with CUDA implementations of the LBM. The second version takes advantage of Intel's MPI support for the MIC. We then report and discuss the performance of both solvers on the MIC, as well as on other target systems such as GPUs and distributed systems.

 

 

Jelena Pjesivac-Grbovic, Google

 

Google Cloud Platform focusing on Data Processing and Analytics tools available in GCP

 

 

Christian Perez, INRIA

 

Evaluation of an HPC Component Model on Jacobi and 3D FFT Kernels.

 

Scientific applications are becoming increasingly complex, e.g., to improve their accuracy by taking more phenomena into account. Meanwhile, computing infrastructures are continuing their fast evolution. Thus, software engineering is becoming a major issue in achieving portability while delivering high performance. Software component models are a promising approach, as they enable the software architecture of an application to be manipulated. However, existing models do not provide enough support for portability across different hardware architectures. This talk summarizes experience gained with L2C, a low-level component model targeting HPC in particular, on Jacobi and 3D FFT kernels.

 

 

 

Padma Raghavan, Penn State

 

Multilevel Data Structures for Accelerating Parallel Sparse Matrix Computations

 

We propose multilevel forms of the traditional compressed sparse row (CSR) representation for sparse matrices that map to the non-uniform memory architecture of multicore processors. We seek to reduce the latencies of data accesses by leveraging temporal locality to enhance cache performance. We discuss and provide results that demonstrate that our CSR-K forms can greatly accelerate sparse matrix-vector multiplication and sparse triangular solution on multicores. We will also comment on how CSR-K dovetails with dynamic scheduling to enable these sparse computations with multilevel data structures to approach the high execution rates commanded by their dense matrix counterparts.
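For reference, the sketch below shows the classic single-level CSR representation and a sparse matrix-vector product, which is the baseline the abstract starts from; the multilevel CSR-K reorganization discussed in the talk is not shown here.

    # y = A * x with A stored in compressed sparse row (CSR) form.
    def csr_spmv(row_ptr, col_idx, values, x):
        n = len(row_ptr) - 1
        y = [0.0] * n
        for i in range(n):                               # one pass per row
            for k in range(row_ptr[i], row_ptr[i + 1]):  # nonzeros of row i
                y[i] += values[k] * x[col_idx[k]]
        return y

    # 3x3 example:  [[4, 0, 1],
    #                [0, 3, 0],
    #                [2, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values  = [4.0, 1.0, 3.0, 2.0, 5.0]
    print(csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))   # -> [5.0, 3.0, 7.0]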

 

 

Daniel A. Reed, University of Iowa

 

Adaptive, Large-Scale Computing Systems

 

HPC systems continue to grow in size and complexity. Today's leading-edge systems contain many thousands of multicore, accelerator-driven nodes, and proposed next-generation systems are likely to contain even more. At this scale, maintaining system operation when hardware components may fail every few minutes or hours is increasingly difficult. Increasing system sizes bring a complementary challenge surrounding energy availability and costs, with projected systems expected to consume twenty or more megawatts of power. For future HPC systems to be usable and cost effective, we must develop new design methodologies and operating principles that embody two important realities of large-scale systems: (a) frequent hardware component failures are a part of normal operation, and (b) energy consumption and power costs must be managed as carefully as performance and resilience. This talk will survey some of the challenges and current work on resilience and energy management.

 

 

Yves Robert, ENS Lyon

 

Algorithms for coping with silent errors

 

Silent errors have become a major problem for large-scale distributed systems. Detection is hard, and correction is even harder. This talk presents generic algorithms to achieve both detection and correction of silent errors by coupling verification mechanisms and checkpointing protocols. Application-specific techniques will also be investigated for sparse numerical linear algebra.
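A deliberately simplified sketch of the coupling idea follows: verification runs before each checkpoint is committed, and a detected silent error triggers a rollback to the last verified checkpoint. The step() and verify() functions and the injected-error model are stand-ins for illustration, not the algorithms of the talk.

    # Toy execution loop coupling verification with checkpointing.
    import copy, random

    def step(state):
        state["t"] += 1
        state["x"] += 1.0
        if random.random() < 0.05:      # inject a rare silent error
            state["x"] += 1000.0
        return state

    def verify(state):
        # Stand-in detector: in this toy model the invariant x == t must hold.
        return abs(state["x"] - state["t"]) < 1e-9

    def run(n_steps, checkpoint_period=5):
        state = {"t": 0, "x": 0.0}
        checkpoint = copy.deepcopy(state)
        while state["t"] < n_steps:
            state = step(state)
            if state["t"] % checkpoint_period == 0:
                if verify(state):
                    checkpoint = copy.deepcopy(state)   # verified: safe to commit
                else:
                    state = copy.deepcopy(checkpoint)   # corrupted: roll back and recompute
        return state

    print(run(50))                       # ends in a verified state {'t': 50, 'x': 50.0}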

 

 

Joel Saltz, SUNY Stony Brook

 

Exascale Challenges in Integrative Multi-scale  Spatio-Temporal Analyses

 

Integrative analyses of large-scale spatio-temporal datasets play increasingly important roles in many areas of science and engineering. Our recent work in this area is motivated by application scenarios involving complementary digital microscopy, radiology, and "omic" analyses in cancer research. In these scenarios, the objective is to use a coordinated set of image analysis, feature extraction, and machine learning methods to predict disease progression and to aid in targeting new therapies. I will describe tools and methods our group has developed for the extraction, management, and analysis of features, along with the systems software methods for optimizing execution on high-end CPU/GPU platforms. Having presented our current work as an introduction, I will then describe 1) related but much more ambitious exascale biomedical and non-biomedical use cases that also involve the complex interplay between multi-scale structure and molecular mechanism, and 2) concepts and requirements for methods and tools that address these challenges.

 

 

Satoshi Sekiguchi, Grid Technology Research Center, AIST

 

Dataflow-centric Warehouse-scale Computing

 

Foreseeing real-time big data processing in 2020, much more data will have to be processed to gain better understanding; however, simply scaling up current systems may not satisfy the requirement, due to the narrow bandwidth between I/O and CPUs when dealing with Big Data. Furthermore, technical trends in IT infrastructure suggest that no commodity-based servers will survive between gigantic data centers and trillions of edge devices. What will the data center of 2020 look like? We have started a small project to design a data processing infrastructure from scratch, considering applications so as to maximize use of the wide variety and multiple velocities of Big Data. This talk will introduce the concept and preliminary thoughts on its design.

 

 

 

Martin Swany, Indiana U

 

Network Acceleration for Data Logistics in Distributed Computing

 

Data movement is a key overhead in distributed computing environments, from HPC to big data applications in the cloud. Data logistics concerns having data where it needs to be, minimizing effective overheads. This talk will cover perspectives and specific examples from our recent work, including software-defined networks and programmable network interfaces.

 

 

 

Frédéric Suter, CNRS/IN2P3

 

Scalable Off-line Simulation of MPI applications

 

In this talk, I will present the latest developments related to the simulation of MPI applications with Time-Independent Traces and SimGrid. After an overview of the encouraging results we have achieved and the capabilities of this simulation framework, I will detail our ongoing work to further increase the scalability of our simulations.

 

 

Michela Taufer, U of Delaware

 

Performance and Cost Effectiveness of DAG-based Workflow Executions on the Cloud

 

When executing DAG-like workflows on a virtualized platform such as the Cloud, we always search for scheduling policies that assure performance and cost effectiveness. The fact that we know the platform's physical characteristics only imperfectly makes our goal hard to achieve. In this talk we address this challenge by performing an exhaustive performance and cost analysis of "oblivious" scheduling heuristics on a Cloud platform whose computational characteristics cannot be known reliably. Our study considers three scheduling policies (AO, Greedy, and Sidney) under static and dynamic resource allocation on an EC2 testing environment. Our results outline the strength of the AO policy and show how this policy can effectively reallocate workflows of up to 4000-task DAGs from 2 to 32 vCPUs while providing up to 90% performance gain for 70% additional cost. In contrast, the other policies provide only marginal performance gain for much higher cost. Our empirical observations therefore make a strong case for adopting AO on the Cloud.

 

 

Rajeev Thakur, ANL

 

Future Node Architectures and their Implications for MPI

 

 

Jeffrey Vetter, ORNL and Georgia Tech

 

Exploring Emerging Memory Technologies in the Extreme Scale HPC Co-Design Space

Concerns about energy efficiency and reliability have forced our community to reexamine the full spectrum of architectures, software, and algorithms that constitute our ecosystem. While architectures and programming models have remained relatively stable for almost two decades, new architectural features, such as heterogeneous processing, nonvolatile memory, and optical interconnection networks, will demand that software systems and applications be redesigned so that they expose massive amounts of hierarchical parallelism, carefully orchestrate data movement, and balance concerns over performance, power, resiliency, and productivity. In what DOE has termed 'co-design,' teams of architects, software designers, and applications scientists are working collectively to realize an integrated solution to these challenges. To tackle this challenge of power consumption and cost, we are investigating the design of future memory hierarchies, which include nonvolatile memory. In this talk, I will sample these emerging memory technologies and discuss how we are preparing applications and software for these upcoming systems with radically different memory hierarchies.

 

 

Frédéric Vivien, ENS & INRIA

 

Scheduling Tree-Shaped Task Graphs to Minimize Memory and Makespan

 

We investigate the execution of tree-shaped task graphs using multiple processors. Each edge of such a tree represents a large IO file. A task can only be executed if all input and output files fit into memory, and a file can only be removed from memory after it has been consumed. Such trees arise, for instance, in the multifrontal method of sparse matrix factorization. The maximum amount of memory needed depends on the execution order of the tasks. With one processor the objective of the tree traversal is to minimize the required memory. This problem was well studied and optimal polynomial algorithms were proposed. Here, we extend the problem by considering multiple processors, which is of obvious interest in the application area of matrix factorization. With the multiple processors comes the additional objective to minimize the time needed to traverse the tree, i.e., to minimize the makespan. Not surprisingly, this problem proves to be much harder than the sequential one. We study the computational complexity of this problem and provide an inapproximability result even for unit weight trees. Several heuristics are proposed, each with a different optimization focus, and they are analyzed in an extensive experimental evaluation using realistic trees.
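To make the role of the traversal order concrete, the sketch below computes the peak memory of a given sequential postorder schedule under the model of the abstract; the tree, file sizes, and schedules are invented, and no scheduling heuristic from the talk is implemented.

    # Peak memory of one sequential postorder traversal of a task tree.
    # To run a task, its input files (edges from its children) and its output
    # file (edge to its parent) must be in memory; inputs are freed afterwards.
    def peak_memory(children, out_size, order):
        resident = {}                    # node -> size of its output file in memory
        peak = 0
        for task in order:
            peak = max(peak, sum(resident.values()) + out_size[task])
            for c in children.get(task, []):     # inputs are consumed and freed
                del resident[c]
            resident[task] = out_size[task]      # output stays until the parent runs
        return peak

    # Root r with children a, b; a has children c, d. Edge sizes given by out_size.
    children = {"r": ["a", "b"], "a": ["c", "d"]}
    out_size = {"r": 0, "a": 3, "b": 2, "c": 4, "d": 1}
    print(peak_memory(children, out_size, ["c", "d", "a", "b", "r"]))   # -> 8
    print(peak_memory(children, out_size, ["b", "c", "d", "a", "r"]))   # -> 10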

 

 

David Walker and Fred G. Gustavson

 

Algorithms for In-Place Matrix Transposition

 

This talk presents an implementation of an in-place, swap-based algorithm for transposing rectangular matrices. The implementation is based on an algorithm described by Tretyakov and Tyrtyshnikov [Optimal in-place transposition of rectangular matrices. Journal of Complexity 25 (2009), pp. 377–384], but we have introduced a number of variations. In particular, we show how the original algorithm can be modified to require constant additional memory. We also identify opportunities for exploiting parallelism. Performance measurements for different algorithm variants are presented and discussed.
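For orientation, here is a minimal cycle-following in-place transpose in Python. It is not the Tretyakov-Tyrtyshnikov algorithm nor the constant-additional-memory variant presented in the talk: it tracks visited positions, so its extra memory grows with the matrix size.

    # In-place transpose of an m x n matrix stored row-major in a flat list.
    # Element at index i moves to index (i * m) mod (m * n - 1).
    def transpose_in_place(a, m, n):
        size = m * n
        if size <= 1:
            return a
        visited = [False] * size
        for start in range(1, size - 1):       # first and last entries never move
            if visited[start]:
                continue
            i, carried = start, a[start]
            while True:                        # follow one permutation cycle
                j = (i * m) % (size - 1)
                a[j], carried = carried, a[j]
                visited[j] = True
                i = j
                if i == start:
                    break
        return a

    # 2x3 example: [[1, 2, 3], [4, 5, 6]] becomes 3x2 [[1, 4], [2, 5], [3, 6]].
    print(transpose_in_place([1, 2, 3, 4, 5, 6], 2, 3))   # -> [1, 4, 2, 5, 3, 6]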

 

 

Michael Wolfe, PGI

 

A Compiler Engineer's View of High Performance Technical Computing

 

Looking through the lens of a compiler writer, I explore the past few decades, the present state and likely future of HPC.  What computer architectures have come and gone?  What has survived, and what will we be using ten years hence?  How have programming languages changed?  What requirements and expectations does all this place on a compiler?

 

 


 

 

Biographies of Attendees:

 

David Abramson and Hoang Nguyen

University of Queensland

 

Professor David Abramson has been involved in computer architecture and high performance computing research since 1979. He has held appointments at Griffith University, CSIRO, RMIT, and Monash University. Most recently, at Monash, he was the Director of the Monash e-Education Centre, Deputy Director of the Monash e-Research Centre, and a Professor of Computer Science in the Faculty of Information Technology. He held an Australian Research Council Professorial Fellowship from 2007 to 2011. He has worked on a variety of HPC middleware components, including the Nimrod family of tools and the Guard relative debugger.

 

Abramson is currently the Director of the Research Computing Centre at the University of Queensland. He is a Fellow of the Association for Computing Machinery (ACM) and of the Academy of Technological Sciences and Engineering (ATSE), and a Senior Member of the IEEE.

 

Nguyen is a PhD student in the School of Information Technology and Electrical Engineering at the University of Queensland.

 

 

Rosa M Badia

Barcelona Supercomputing Center

 

Rosa M. Badia holds a PhD in Computer Science (1994) from the Technical University of Catalonia (UPC). She is a Scientific Researcher at the Consejo Superior de Investigaciones Científicas (CSIC) and team leader of the Grid Computing and Cluster research group at the Barcelona Supercomputing Center (BSC). She was involved in teaching and research activities at UPC from 1989 to 2008, where she was an Associate Professor from 1997. From 1999 to 2005 she was involved in research and development activities at the European Center of Parallelism of Barcelona (CEPBA). Her current research interests are programming models for complex platforms (from multicore and GPUs to Grid/Cloud). The group led by Dr. Badia has been developing the StarSs programming model for more than 10 years, with high success in adoption by application developers. Currently the group focuses its efforts on two instances of StarSs: OmpSs for heterogeneous platforms and COMPSs for distributed computing (i.e., Cloud). Dr. Badia has published more than 120 papers in international conferences and journals on the topics of her research. She has participated in several European projects, for example BEinGRID, Brein, CoreGRID, OGF-Europe, SIENA, TEXT, and VENUS-C, and currently she is participating in the Severo Ochoa project (at the Spanish level), TERAFLUX, ASCETIC, the Human Brain Project, EU-Brazil CloudConnect, and TransPlant, and she is a member of the HiPEAC2 NoE.

 

 

 


Pete Beckman

Argonne National Laboratory

Director, Exascale Technology and Computing Institute

Co-Director, Northwestern-Argonne Institute for Science and Engineering

 

Pete Beckman is the founder and director of the Exascale Technology and Computing Institute at Argonne National Laboratory and the co-director of the Northwestern-Argonne Institute for Science and Engineering. From 2008-2010 he was the director of the Argonne Leadership Computing Facility, where he led the Argonne team working with IBM on the design of Mira, a 10 petaflop Blue Gene/Q, and helped found the International Exascale Software Project.

 

Pete joined Argonne in 2002, serving first as director of engineering and later as chief architect for the TeraGrid, where he led the design and deployment team that created the world's most powerful Grid computing system for linking production HPC computing centers for the National Science Foundation. After the TeraGrid became fully operational, Pete started a research team focusing on petascale high-performance system software.

 

As an industry leader, he founded a Turbolinux-sponsored research laboratory in 2000 that developed the world's first dynamic provisioning system for cloud computing and HPC clusters. The following year, Pete became vice president of Turbolinux's worldwide engineering efforts, managing development offices in the U.S., Japan, China, Korea, and Slovenia.

 

Dr. Beckman has a Ph.D. in computer science from Indiana University (1993) and a B.A. in Computer Science, Physics, and Math from Anderson University (1985).

 

 

Jean-Yves Berthou, ANR

 

Jean-Yves Berthou joined the French National Research Agency (ANR) in September 2011 as Director of the Department for Information and Communication Science and Technologies. Before that, he had been the Director of the EDF R&D Information Technologies program since 2008 and the coordinator of the EESI European Support Action (European Exascale Software Initiative, www.eesi-project.eu).

 

Jean-Yves joined EDF R&D in 1997 as a researcher. He was the head of the Applied Scientific Computing Group (High Performance Computing, Simulation Platforms Development, Scientific Software Architecture) at EDF R&D from 2002 to 2006. He was Chargé de Mission (Strategic Steering Manager for Simulation), in charge of the simulation program at EDF R&D, from 2006 to 2009.

 

Jean-Yves received a Ph.D. in computer science from Pierre et Marie Curie University (Paris VI) in 1993. His research deals mainly with parallelization, parallel programming, and software architecture for scientific computing.

 

 

George Bosilca, UTK

 

George Bosilca is a Research Director at the Innovative Computing Laboratory at the University of Tennessee, Knoxville. His areas of interest revolve around parallel computer architectures and systems, high performance computing, programming paradigms, distributed algorithms, and resilience.

 

 

Bill Brantley, AMD

 

Dr. Brantley is a Fellow Design Engineer in the Research Division of Advanced Micro Devices. He completed his Ph.D. in ECE at Carnegie Mellon University after working 3 years at Los Alamos National Laboratory. Next, he joined the T.J. Watson Research Center, where he began a project which led to the vector instruction set extensions for the Z-Series CPUs. He was one of the architects of the 64-CPU RP3 (a DARPA-supported HPC system development in the mid-80s) and led the processor design, including a hardware performance monitor. Later he contributed to a number of projects at IBM, mostly dealing with system-level performance of RISC 6000 and other systems, eventually joining the Linux Technology Center. In 2002, he joined Advanced Micro Devices and helped to launch the Opteron. He was leader of the performance team focused on HPC until 2012, when he joined the AMD Research Division, where he has led both the interconnect and programmability efforts of AMD's DoE Exascale Fast Forward contract.

 

 

Marc Buffat

Université Claude Bernard Lyon 1

 

Marc Buffat is a professor in the mechanical engineering department at the University Claude Bernard Lyon 1. His fields of expertise are fluid mechanics, computational fluid dynamics, and high performance computing. He heads the FLMSN ("Fédération Lyonnaise de Modélisation et Sciences Numériques"), which gathers the HPC mesocenter at Lyon, a member of the EQUIPEX EQUIP@MESO project, IXXI, and CBP.

 

 

Franck Cappello, Univ Paris/ANL

 

Franck Cappello has been a senior scientist and the project manager of research on resilience at the extreme scale at Argonne National Laboratory since 2013. He also holds a position as Adjunct Professor in the CS department of the University of Illinois at Urbana-Champaign. Cappello is the director of the Joint Laboratory on Extreme Scale Computing, gathering Inria, UIUC/NCSA, ANL, and BSC. He received his Ph.D. from the University of Paris XI in 1994 and joined CNRS, the French National Center for Scientific Research. In 2003, he joined INRIA, where he holds the position of permanent senior researcher. He has initiated and directed several R&D projects, including XtremWeb, MPICH-V, Grid5000, and FTI. He also initiated and directed the G8 "Enabling Climate Simulation at Exascale" project, gathering 7 partner institutions from 6 countries. As a member of the executive committee of the International Exascale Software Project and as leader of the resilience topic in the European Exascale Software Initiative 1 & 2, he led the roadmap and strategy efforts for projects related to resilience at the extreme scale.

 

 

Barbara Chapman, U of Houston

Dr. Chapman studied Mathematics and Computer Science in New Zealand and at Queen's University of Belfast, Northern Ireland, U.K. She is currently a Professor at the University of Houston, TX, where she is also the founding Director of the Center for Advanced Computing and Data Systems.

Professor Chapman performs research into programming models, compilers, and tools for parallel and distributed computations, as well as into program development tools. She has been involved in the development of the OpenMP industry standard for parallel programming for 15 years, and is moreover engaged in the development of the OpenACC programming interface, the Multicore Association's library interfaces, and OpenSHMEM. Her group has created a near-industry-strength compiler, OpenUH, which provides implementations of OpenMP and OpenACC, as well as of Fortran co-arrays. The group has also produced a reference implementation of OpenSHMEM.

 

 

Francois Courteille

NVIDIA Corp

 

"F. Courteille got a MS degree in Computer Science from Institut National des Sciences AppliquŽes (INSA) de Lyon in 1977.He worked for more than 35 years as technical leader in HPC, first as pre-sales application project leader at Control Data Corporation on the supercomputer vector lines (CYBER 2xx and ETA) before moving to Convex then NEC Corp. (SX line and Earth Simulator) to lead the Western Europe pre-sales and benchmark teams. He has specialized in high performance computing application software porting and tuning on large scale parallel and vector systems with a specific interest on dense and sparse linear algebra. Today as Solution Architect at NVIDIA he is helping to design and/or promote high performance computing solutions (hardware & software) using GPUs."

 

 

Joe Curley, Intel

 

Joe Curley is director of marketing in the Technical Computing Group at Intel Corporation in Hillsboro, OR, USA. Joe joined Intel in 2007 and has served in a series of technology planning and business leadership roles on what is now the Intel(r) Xeon Phi(tm) product line. Prior to joining Intel, Joe served in a series of business and engineering executive positions at Dell, Inc. from 1996 to 2007.  He started his career in technology in 1986 at computer graphics pioneer Tseng Labs, Inc., ultimately serving as general manager of advanced systems for the company.

 

 

Anthony Danalis, UTK

 

Anthony Danalis is currently a Research Scientist II with the Innovative Computing Laboratory at the University of Tennessee, Knoxville. His research interests come from the area of High Performance Computing. Recently, his work has been focused on the subjects of Compiler Analysis and Optimization, System Benchmarking, MPI, and Accelerators. He received his Ph.D. in Computer Science from the University of Delaware on Compiler Optimizations for HPC. Previously, he received an M.Sc. from the University of Delaware and an M.Sc. from the University of Crete, both on Computer Networks, and a B.Sc. in Physics from the University of Crete.

 

 

Ewa Deelman, ISI

 

Ewa Deelman is a Research Associate Professor at the USC Computer Science Department and an Assistant Director of Science Automation Technologies at the USC Information Sciences Institute. Dr. Deelman's research interests include the design and exploration of collaborative, distributed scientific environments, with particular emphasis on workflow management as well as the management of large amounts of data and metadata. At ISI, Dr. Deelman is leading the Pegasus project, which designs and implements workflow mapping techniques for large-scale applications running in distributed environments. Pegasus is being used today in a number of scientific disciplines, enabling researchers to formulate complex computations in a declarative way. Dr. Deelman received her PhD in Computer Science from the Rensselaer Polytechnic Institute in 1997.

 

 

Patrick Demichel, HP 

 

Patrick Demichel received an MS degree in computer architecture from the Control Data Corporation Institute in Paris in 1975 and has since worked for 34 years on scientific computing at Hewlett-Packard. He worked on real-time computing on the HP1000 (hardware and software), then on Linux on the HP9000 family, spent 5 years porting CATIA to HP platforms, and spent 5 years in HP Labs in Fort Collins on the development of the IA64 processor. For the past 10 years he has been a senior HPC architect focused on the largest and most innovative projects in EMEA. Now a Distinguished Technologist, he works with HP Labs on emerging technologies such as sensors, memristors, photonics, cognitive computing, and low-power technologies for the Moonshot and The Machine programs.

 

 

Benoit Dinechin, Kalray

 

Benoît Dupont de Dinechin is Chief Technology Officer of Kalray (http://www.kalray.eu), a company that manufactures integrated manycore processors for embedded and industrial applications. He is also the main architect of the Kalray VLIW core, and co-architect of the Kalray Multi Purpose Processing Array (MPPA). Before joining Kalray, Benoît was in charge of Research and Development of the STMicroelectronics Software, Tools, Services division, with special focus on compiler design, virtual machines for embedded systems, and component-based software development frameworks. He was promoted to STMicroelectronics National Fellow in 2008. Prior to his work at STMicroelectronics, Benoît worked part-time at the Cray Research Park (Minnesota, USA), where he developed the software pipeliner of the Cray T3E production compilers. Benoît earned an engineering degree in Radar and Telecommunications from the Ecole Nationale Supérieure de l'Aéronautique et de l'Espace (Toulouse, France), and a doctoral degree in computer systems from the University Pierre et Marie Curie (Paris) under the direction of Prof. P. Feautrier. He completed his post-doctoral studies at McGill University (Montreal, Canada) in the ACAPS laboratory led by Prof. G. R. Gao.

 

 

Jack Dongarra, UTK/ORNL

 

Jack Dongarra received a Bachelor of Science in Mathematics from Chicago State University in 1972 and a Master of Science in Computer Science from the Illinois Institute of Technology in 1973. He received his Ph.D. in Applied Mathematics from the University of New Mexico in 1980. He worked at the Argonne National Laboratory until 1989, becoming a senior scientist. He now holds an appointment as University Distinguished Professor of Computer Science in the Computer Science Department at the University of Tennessee and holds the title of Distinguished Research Staff in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL), Turing Fellow at Manchester University, and an Adjunct Professor in the Computer Science Department at Rice University. He is the director of the Innovative Computing Laboratory at the University of Tennessee. He is also the director of the Center for Information Technology Research at the University of Tennessee which coordinates and facilitates IT research efforts at the University.

 

 

Geoffrey Fox, Indiana

 

Fox received a Ph.D. in Theoretical Physics from Cambridge University and is now Distinguished Professor of Informatics and Computing, and Physics at Indiana University, where he is director of the Digital Science Center and Senior Associate Dean for Research and Director of the Data Science program at the School of Informatics and Computing.  He previously held positions at Caltech, Syracuse University and Florida State University after being a postdoc at the Institute for Advanced Study at Princeton, Lawrence Berkeley Laboratory and Peterhouse College Cambridge. He has supervised the PhD of 66 students and published around 1000 papers in physics and computer science, with an h-index of 70 and over 25000 citations.

 

He currently works in applying computer science from infrastructure to analytics in Biology, Pathology, Sensor Clouds, Earthquake and Ice-sheet Science, Image processing, Deep Learning, Network Science and Particle Physics. The infrastructure work is built around Software Defined Systems on Clouds and Clusters. He is involved in several projects to enhance the capabilities of Minority Serving Institutions, including the eHumanity portal. He has experience in online education and its use in MOOCs for areas like Data and Computational Science. He is a Fellow of APS and ACM.

 

 

Al Geist, ORNL

 

Al Geist is a Corporate Research Fellow at Oak Ridge National Laboratory. He is the Chief Technology Officer of the Oak Ridge Leadership Computing Facility and also leads the Extreme-scale Algorithms and Solver Resilience project.  His recent research is on Exascale computing and resilience needs of the hardware and software.

 

In his 31 years at ORNL, he has published two books and over 200 papers in areas ranging from heterogeneous distributed computing, numerical linear algebra, parallel computing, and collaboration technologies to solar energy, materials science, biology, and solid-state physics.

 

 

Patrick Geoffray

Google

Patrick received his PhD from the University of Lyon in 2000 under the direction of Bernard Tourancheau. He worked at Myricom for 13 years implementing communication software and HPC interconnect technology. He joined Google in 2013 to work on amazing things.

 

 

Bill Gropp, UIUC

 

William Gropp is the Thomas M. Siebel Chair in the Department of Computer Science and Director of the Parallel Computing Institute at the University of Illinois in Urbana-Champaign.  He received his Ph.D. in Computer Science from Stanford University in 1982 and worked at Yale University and Argonne National Laboratory.  His research interests are in parallel computing, software for scientific computing, and numerical methods for partial differential equations.  He is a Fellow of ACM, IEEE, and SIAM and a member of the National Academy of Engineering.

 

 

Mary Hall

University of Utah

 

Mary Hall is a Professor in the School of Computing at the University of Utah. Her research focuses on compiler technology for exploiting performance-enhancing features of a variety of computer architectures, with a recent emphasis on compiler-based performance tuning technology targeting many-core graphics processors and multi-core nodes in supercomputers. Hall's prior work has focused on compiler techniques for exploiting parallelism and locality on a diversity of architectures: automatic parallelization for SMPs, superword-level parallelism, processing-in-memory architectures and FPGAs. Professor Hall is an ACM Distinguished Scientist. She has published over 70 refereed conference, journal and book chapter articles, and has given more than 50 invited presentations.  She has co-authored several reports for government agencies to establish the research agenda in compilers and high-performance computing.

 

 

 

 

Torsten Hoefler, ETH

 

Torsten is an Assistant Professor of Computer Science at ETH Zürich, Switzerland.  Before joining ETH, he led the performance modeling and simulation efforts of parallel petascale applications for the NSF-funded Blue Waters project at NCSA/UIUC.  He is also a key member of the Message Passing Interface (MPI) Forum, where he chairs the "Collective Operations and Topologies" working group.  Torsten won best paper awards at the ACM/IEEE Supercomputing Conference 2010 (SC10), EuroMPI 2013, the ACM/IEEE Supercomputing Conference 2013 (SC13), and other conferences.  He has published numerous peer-reviewed scientific conference and journal articles and authored chapters of the MPI-2.2 and MPI-3.0 standards.  For his work, Torsten received the SIAM SIAG/Supercomputing Junior Scientist Prize in 2012 and the IEEE TCSC Young Achievers in Scalable Computing Award in 2013. Following his Ph.D., he received the Young Alumni Award 2014 from Indiana University.  Torsten was elected into the first steering committee of ACM's SIGHPC in 2013.  He was the first European to receive those honors. In addition, he received the Best Student Award 2005 of the Chemnitz University of Technology. His research interests revolve around the central topic of "Performance-centric Software Development" and include scalable networks, parallel programming techniques, and performance modeling.  Additional information about Torsten can be found on his homepage at htor.inf.ethz.ch.

 

 

Jeff Hollingsworth, U Maryland

Jeffrey K. Hollingsworth is a Professor of the Computer Science Department at the University of Maryland, College Park. He also has an appointment in the University of Maryland Institute for Advanced Computer Studies and the Electrical and Computer Engineering Department. He received his PhD and MS degrees in computer sciences from the University of Wisconsin. He received a B. S. in Electrical Engineering from the University of California at Berkeley.

 

Dr. Hollingsworth's research seeks to develop a unified framework to understand the performance of large systems and focuses in several areas. First, he developed a new approach, called dynamic instrumentation, to permit the efficient measurement of large parallel applications. Second, he has developed an auto-tuning framework called Active Harmony that can be used to tune kernels, libraries, or full applications. Third, he is investigating the interactions between different layers of software and hardware to understand how they influence performance. He is Editor-in-Chief of the journal Parallel Computing, was general chair of the SC12 conference, and is Vice Chair of ACM SIGHPC.

 

 

Emmanuel Jeannot, Inria

 

Emmanuel Jeannot is a senior research scientist at INRIA (Institut National de Recherche en Informatique et en Automatique) and has been conducting his research at INRIA Bordeaux Sud-Ouest and at the LaBRI laboratory since Sept. 2009. Before that, he held the same position at INRIA Nancy Grand-Est. From Jan. 2006 to Jul. 2006, he was a visiting researcher at the University of Tennessee, ICL laboratory. From Sept. 1999 to Sept. 2005, he was an assistant professor at the Université Henri Poincaré, Nancy 1. During the period 2000–2009, he did research at the LORIA laboratory. He received his Master's and PhD degrees in computer science in 1996 and 1999, respectively, both from the Ecole Normale Supérieure de Lyon, at the LIP laboratory. After his PhD, he spent one year as a postdoc at the LaBRI laboratory in Bordeaux. His main research interests are scheduling for heterogeneous environments and grids, data redistribution, algorithms and models for parallel machines, grid computing software, adaptive online compression and programming models.

 

 

Thilo Kielmann

VU University Amsterdam

 

Thilo Kielmann studied Computer Science at Darmstadt University of Technology, Germany. He received his Ph.D. in Computer Engineering in 1997, and his habilitation in Computer Science in 2001, both from Siegen University, Germany. Since 1998, he has been working at VU University Amsterdam, The Netherlands, where he is currently Associate Professor in the Computer Science Department. His research studies performability of large-scale HPC systems, especially the trade-offs between application performance and other properties like monetary cost, energy consumption, or failure resilience. Being a systems person, he favours running code and solid experimentation.

 

 

 

Laurent Lefevre

Inria Avalon, Ecole Normale Superieure of Lyon, France

 

Since 2001, Laurent Lefèvre has been a permanent researcher in computer science at Inria (the French Institute for Research in Computer Science and Control). He is a member of the Avalon team (Algorithms and Software Architectures for Distributed and HPC Platforms) of the LIP laboratory at the Ecole Normale Supérieure of Lyon, France. From 1997 to 2001, he was an assistant professor in computer science at Lyon 1 University. He has organized several conferences in high performance networking and computing (ICPP 2013, HPCC 2009, CCGrid 2008) and is a member of several program committees. He has co-authored more than 100 papers published in refereed journals and conference proceedings. His interests include: energy efficiency in large-scale distributed systems, high performance computing, distributed computing and networking, and high performance network protocols and services. He is a member of IEEE and takes part in several research projects. He led the Inria Action de Recherche Cooperative GREEN-NET project on power-aware software frameworks. Laurent Lefèvre was nominated as a Management Committee member and WG leader of the European COST action IC0804 on energy efficiency in large-scale distributed systems (2009-2013) and is co-WG leader of the European COST Action IC1305 NESUS on sustainable ultrascale computing (2014-2018). He was a work package leader in the PrimeEnergyIT project (Intelligent Energy in Europe European call, 2010-2012). He is the scientific representative for Inria and an executive board member in the GreenTouch consortium dedicated to energy efficiency in networks (2010-2015).

 

 

Rusty Lusk

Argonne National Laboratory

 

Ewing "Rusty" Lusk received his Ph.D. in mathematics from the University of Maryland in 1970.  He has published in mathematics (algebraic topology), automated theorem proving, database technology, logic programming, and parallel computing. He is best known for his work with the definition, implementation, and evangelization of the message-passing interface (MPI) standard.  He is currently the co-director for computer science of the NUCLEI (Nuclear Computational Low-Energy Initiative) SciDAC-3 project.  He has been Director of the Mathematics and Computer Science Division at Argonne National Laboratory and currently holds the title of Argonne Distinguished Fellow Emeritus.

 

 

Satoshi Matsuoka, Tokyo Institute of Technology

 

 

Bernd Mohr, Juelich

 

Bernd Mohr started to design and develop tools for performance analysis of parallel programs already with his diploma thesis (1987) at the University of Erlangen in Germany, and continued this in his Ph.D. work (1987 to 1992). During a three-year postdoc position at the University of Oregon, he designed and implemented the original TAU performance analysis framework. Since 1996 he has been a senior scientist at Forschungszentrum Jülich, Germany's largest multidisciplinary research center and home of one of Europe's most powerful HPC systems, a 28-rack BlueGene/Q. Since 2000, he has been the team leader of the group "Programming Environments and Performance Optimization". Besides being responsible for user support and training with regard to performance tools at the Jülich Supercomputing Centre (JSC), he is leading the KOJAK and Scalasca performance tools efforts in collaboration with Prof. Dr. Felix Wolf of GRS Aachen. Since 2007, he also serves as deputy head of the JSC division "Application support". In 2012, Bernd Mohr joined the ISC program team to help set up the programs of the ISC conference series. He is an active member in the International Exascale Software Project (IESP) and a work package leader in the European (EESI2) and Jülich (EIC, ECL) Exascale efforts. For the SC and ISC conference series, he serves on the Steering Committee. He is the author of several dozen conference and journal articles about performance analysis and tuning of parallel programs.

 

 

Frank Mueller, NC State

 

Frank Mueller (mueller@cs.ncsu.edu) is a Professor in Computer Science and a member of multiple research centers at North Carolina State University. Previously, he held positions at Lawrence Livermore National Laboratory and Humboldt University Berlin, Germany. He received his Ph.D. from Florida State University in 1994.  He has published papers in the areas of parallel and distributed systems, embedded and real-time systems and compilers.  He is a member of ACM SIGPLAN, ACM SIGBED and a senior member of the ACM and IEEE Computer Societies as well as an ACM Distinguished Scientist.  He is a recipient of an NSF Career Award, an IBM Faculty Award, a Google Research Award and a Fellowship from the Humboldt Foundation.

 

 

Raymond Namyst, University of Bordeaux

 

Raymond Namyst received his PhD from the University of Lille in 1997. He was a lecturer at the Ecole Normale Superieure de Lyon from 1998 to 2001 and became a full Professor at the University of Bordeaux in September 2002.

 

He is the scientific leader of the "Runtime" Inria Research Group, devoted to the design of high performance runtime systems for parallel architectures. His main research interests are parallel computing, scheduling on heterogeneous multiprocessor architectures (multicore, NUMA, accelerators), and communications over high speed networks. He has contributed to the development of many significant runtime systems (MPI, OpenMP) and most notably the StarPU software (http://runtime.bordeaux.inria.fr/StarPU/).

 

 

Dimitrios Nikolopoulos

Queen's University of Belfast

 

Dimitrios S. Nikolopoulos is Professor in the School of Electronics, Electrical Engineering and Computer Science at Queen's University of Belfast, where he holds the Chair in High Performance and Distributed Computing (HPDC) and is Director of Research in the HPDC Cluster.  His current research activity explores real-time data-intensive systems, energy-efficient computing and new computing paradigms at the limits of power and reliability. Professor Nikolopoulos has been awarded the NSF CAREER Award, the US DoE Early Career Principal Investigator Award, an IBM Faculty Award, a Marie Curie Fellowship, a Fellowship from HiPEAC and seven best paper awards. His research has been supported with over £20 million of highly competitive, external research funding. He is a Senior Member of the ACM and a Senior Member of the IEEE.

 

 

Christian Obrecht

CETHIL UMR 5008 (CNRS, INSA-Lyon, UCB-Lyon 1), Université de Lyon

 

Christian Obrecht is currently working as a post-doctoral researcher at the CETHIL laboratory in INSA-Lyon. He graduated in mathematics from ULP-Strasbourg in 1990 and taught at high school level until 2008. He received a MSc degree in computer science from UCB-Lyon in 2009 and afterwards served as a research engineer for EDF until 2013. He received a PhD degree from INSA-Lyon in 2012.  His research work focuses on implementation and optimization strategies for parallel CFD applications on emerging many-core architectures.

 

 

Jean-Laurent Philippe, Intel

 

Dr. Jean-Laurent Philippe is the Technical Sales Director for Enterprises at Intel Europe. His charter is to help large enterprises find the best solutions based on Intel platforms, technologies and products. Dr. Philippe has been with Intel for over 20 years, has held various positions in technical support and technical sales, and has since managed several teams and groups in technical pre-sales.

Dr. Philippe holds a PhD from INPG (Grenoble, France) in computer science (automatic parallelization for distributed-memory supercomputers) and applied mathematics (cryptography). Dr. Philippe holds 2 patents in Japan on automated parallelization techniques.

 

 

Christian Perez, INRIA

 

Dr Christian Perez is an Inria researcher. He received his Ph.D. from the Ecole Normale Supérieure de Lyon, France, in 1999. He is leading the Avalon research team at LIP (Lyon, France), a joint team between Inria, CNRS, ENS Lyon, and the University Lyon 1. Avalon deals with energy consumption, data management, programming models, and scheduling of parallel and distributed applications on distributed and HPC platforms. His research topics include parallel and distributed programming models, application deployment, and resource management. He is also leading the Inria project laboratory Héméra, which gathers more than 20 French research groups to demonstrate ambitious up-scaling techniques for large-scale distributed computing on the Grid'5000 experimental testbed.

 

 

Jelena Pjesivac-Grbovic

Google

 

Jelena Pjesivac-Grbovic is a staff software engineer in Systems Infrastructure at Google, focusing on building large-scale distributed data processing frameworks. 

 

 

Padma Raghavan, Penn State

 

 

Dan Reed

University of Iowa

 

Daniel A. Reed is Vice President for Research and Economic Development, as well as University Chair in Computational Science and Bioinformatics and Professor of Computer Science, Electrical and Computer Engineering and Medicine, at the University of Iowa.  Previously, he was Microsoft's Corporate Vice President for Technology Policy and Extreme Computing, where he helped shape Microsoft's long-term vision for technology innovations in cloud computing and the company's associated policy engagement with governments and institutions around the world.  Before joining Microsoft, he was the Chancellor's Eminent Professor at UNC Chapel Hill, as well as the Director of the Renaissance Computing Institute (RENCI) and the Chancellor's Senior Advisor for Strategy and Innovation for UNC Chapel Hill.  Prior to that, he was Gutgsell Professor and Head of the Department of Computer Science at the University of Illinois at Urbana-Champaign (UIUC) and Director of the National Center for Supercomputing Applications (NCSA).

 

 

Yves Robert

ENS Lyon & Univ. Tenn. Knoxville

 

Yves Robert received his PhD degree from the Institut National Polytechnique de Grenoble. He is currently a full professor in the Computer Science Laboratory LIP at ENS Lyon. He is the author of 7 books, 130+ papers published in international journals, and 200+ papers published in international conferences. He is the editor of 11 book proceedings and 13 journal special issues, and the advisor of 26 PhD theses. His main research interests are scheduling techniques and resilient algorithms for large-scale platforms. Yves Robert has served on many editorial boards, including IEEE TPDS. He was the program chair of HiPC'2006 in Bangalore, IPDPS'2008 in Miami, ISPDC'2009 in Lisbon, ICPP'2013 in Lyon and HiPC'2013 in Bangalore. He is a Fellow of the IEEE. He was elected a Senior Member of the Institut Universitaire de France in 2007 and renewed in 2012. He was awarded the 2014 IEEE TCSC Award for Excellence in Scalable Computing. He has held a Visiting Scientist position at the University of Tennessee Knoxville since 2011.

 

 

Joel Saltz, Emory U

 

 

Satoshi Sekiguchi

National Institute of Advanced Industrial Science and Technology (AIST)

 

He received a BS from The University of Tokyo, an ME from the University of Tsukuba, and a Ph.D. in Information Science and Technology from The University of Tokyo. He joined the Electrotechnical Laboratory (ETL), Japan, in 1984 to engage in research on high-performance computing, ranging widely from system architecture to applications. He has extraordinary knowledge in applying IT-based solutions to many of society's problems related to global climate change, environmental management and resource efficiency. He served as Director of the Grid Technology Research Center and Director of the Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), and is currently Deputy General Director, Directorate for Information Technology and Electronics, AIST. He has contributed to the Open Grid Forum as a member of its board of directors, and is a member of the IEEE Computer Society and ACM, and a fellow of the Information Processing Society of Japan.

 

 

Vaidy Sunderam, Emory U

 

Vaidy Sunderam is Samuel Candler Dobbs Professor of Computer Science at Emory University. He is also Chair of the Department of Mathematics and Computer Science, and Director of the University's strategic initiative in Computational and Life Sciences. Professor Sunderam joined the Emory faculty in 1986 after receiving his PhD from the University of Kent, England where he was a Commonwealth Scholar. His research interests are in heterogeneous distributed systems and infrastructures for collaborative computing. He is the principal architect of several frameworks for metacomputing and collaboration, and his work is supported by grants from the National Science Foundation and the U.S. Department of Energy. Professor Sunderam teaches computer science courses at the beginning, advanced, and graduate levels, and advises graduate theses in the area of computer systems.  He is the recipient of several recognitions for teaching and research, including the Emory Williams Teaching award, the IEEE Gordon Bell prize for parallel processing, the IBM supercomputing award, and an R&D 100 research innovation award.

 

 

Frédéric Suter

IN2P3 Computing Center / CNRS, Lyon-Villeurbanne, France

 

Frédéric Suter has been a CNRS junior researcher at the IN2P3 Computing Center in Lyon, France, since October 2008. His research interests include scheduling, Grid computing, and platform and application simulation. He obtained his M.S. from the Université de Picardie Jules Verne, Amiens, France, in 1999 and his Ph.D. from the Ecole Normale Supérieure de Lyon, France, in 2002.

 

 

Martin Swany, Indiana U

 

Martin Swany is an Associate Professor of Computer Science in Indiana University's School of Informatics and Computing and the Associate Director of the Center for Research in Extreme Scale Technologies (CREST).  His research interests include high-performance parallel and distributed computing and networking.

 

 

Michela Taufer, U of Delaware

 

Michela Taufer joined the University of Delaware in 2007, where she was promoted to associate professor with tenure in 2012. She earned her M.S. degree in Computer Engineering from the University of Padova and her Ph.D. in Computer Science from the Swiss Federal Institute of Technology (ETH). She was a post-doctoral researcher supported by the La Jolla Interfaces in Science Training Program (also called LJIS) at UC San Diego and The Scripps Research Institute. Before she joined the University of Delaware, Michela was a faculty member in Computer Science at the University of Texas at El Paso.

 

Michela has a long history of interdisciplinary work with high-profile computational biophysics groups in several research and academic institutions. Her research interests include software applications and their advanced programmability in heterogeneous computing (i.e., multi-core platforms and GPUs); cloud computing and volunteer computing; and performance analysis, modeling and optimization of multi-scale applications.

 

She has been serving as the principal investigator of several NSF collaborative projects. She also has significant experience in mentoring a diverse population of students on interdisciplinary research. Michela's training expertise includes efforts to spread high-performance computing participation in undergraduate education and research as well as efforts to increase the interest and participation of diverse populations in interdisciplinary studies.

 

Michela has served on numerous IEEE program committees (SC and IPDPS among others) and has reviewed for most of the leading journals in parallel computing.

 

 

Marc Tchiboukdjian, CGG

 

Marc Tchiboukdjian is an HPC architect at CGG, where he is investigating new technologies for CGG's processing centers. He received his PhD in 2010 from the University of Grenoble and did his postdoc at the Exascale Computing Research center in Paris.

 

 

Rajeev Thakur, Argonne National Lab

 

Rajeev Thakur is the Deputy Director of the Mathematics and Computer Science Division at Argonne National Laboratory, where he is also a Senior Computer Scientist. He is also a Senior Fellow in the Computation Institute at the University of Chicago and an Adjunct Professor in the Department of Electrical Engineering and Computer Science at Northwestern University. He received a Ph.D. in Computer Engineering from Syracuse University in 1995.  His research interests are in the area of high-performance computing in general and particularly in parallel programming models, runtime systems, communication libraries, and scalable parallel I/O. He is a member of the MPI Forum that defines the Message Passing Interface (MPI) standard. He is also co-author of the MPICH implementation of MPI and the ROMIO implementation of MPI-IO, which have thousands of users all over the world and form the basis of commercial MPI implementations from IBM, Cray, Intel, Microsoft, and other vendors. MPICH received an R&D 100 Award in 2005. Rajeev is a co-author of the book "Using MPI-2: Advanced Features of the Message Passing Interface" published by MIT Press, which has also been translated into Japanese. He was an associate editor of IEEE Transactions on Parallel and Distributed Systems (2003-2007) and was Technical Program Chair of the SC12 conference.

 

 

Bernard Tourancheau, University Grenoble

 

Bernard Tourancheau received an MSc in Applied Maths from Grenoble University in 1986 and an MSc in Renewable Energy Science and Technology from Loughborough University in 2007. He was awarded the best Computer Science PhD by the Institut National Polytechnique of Grenoble in 1989 for his work on parallel computing for distributed memory architectures.

Working for the LIP laboratory, he was appointed assistant professor at the Ecole Normale Supérieure de Lyon in 1989 before joining CNRS as a junior researcher. After initiating a CNRS-NSF collaboration, he worked for two and a half years on leave at the University of Tennessee in a senior researcher position with the US Center for Research in Parallel Computation at the ICL laboratory.

He then took a Professor position at University of Lyon in 1995 where he created a research laboratory and the INRIA RESO team, specialized in High Speed Networking and HPC Clusters.

In 2001, he joined Sun Microsystems Laboratories for a 6-year sabbatical as a Principal Investigator in the DARPA HPCS project, where he led the backplane networking group.

Back in academia he oriented his research on sensor and actuator networks for building energy efficiency at ENS LIP and INSA CITI labs.

He was appointed Professor at University Joseph Fourier of Grenoble in 2012. Since then, he has been developing research in the LIG laboratory's Drakkar team on protocols and architectures for the Internet of Things and their applications to energy efficiency in buildings. He also continues research on communication algorithm optimization for HPC multicore and GPGPU systems, as well as scientific promotion of the renewable energy transition in the face of peak oil.

He has authored more than a hundred peer-reviewed publications and filed 10 patents.

 

 

 

Stéphane Ubeda, INRIA

 

After a PhD in Computer Science from the Ecole Normale Supérieure de Lyon in 1993, Stéphane Ubéda was an associate professor at the Swiss Federal Institute of Technology until 1994. He was an associate professor at Jean-Monnet University (Saint-Etienne) until 2000 and then joined the Institut National des Sciences Appliquées de Lyon (INSA Lyon) as a full professor in the Telecommunications department.  From 2000 to 2010, he was head of the CITI Lab, attached to INSA Lyon and associated with Inria.  His main interest concerns global mobility management architectures and protocols. In mobility management he is interested in self-organized networks, but also in sensitive issues like temporary addresses, multi-homing of mobile hosts and resource optimization (especially radio resources). He is also interested in fundamental studies of models of interaction of smart objects, such as the notion of trust in such an environment and what kind of security can be built on top of it.  Since 2010, he has been Director of Technological Development at Inria and a member of the national board of the institute. He is in charge of the coordination of software and technological developments at the national level, as well as large-scale research infrastructures.

 

 

Jeff Vetter, ORNL

 

Jeffrey Vetter, Ph.D., holds a joint appointment between Oak Ridge National Laboratory (ORNL) and the Georgia Institute of Technology (GT). At ORNL, Vetter is a Distinguished R&D Staff Member, and the founding group leader of the Future Technologies Group in the Computer Science and Mathematics Division. At GT, Vetter is a Joint Professor in the Computational Science and Engineering School, the Principal Investigator for the NSF-funded Keeneland Project that brings large scale GPU resources to NSF users through XSEDE, and the Director of the NVIDIA CUDA Center of Excellence. His papers have won awards at the International Parallel and Distributed Processing Symposium and EuroPar; he was awarded the ACM Gordon Bell Prize in 2010. His recent book "Contemporary High Performance Computing" surveys the international landscape of HPC. See his website for more information: http://ft.ornl.gov/~vetter/.

 

 

Xavier Vigouroux, Bull

 

Xavier Vigouroux, after a PhD in distributed computing from the Ecole Normale Supérieure de Lyon, worked for several major companies in different positions, from investigator at Sun Labs to support engineer for HP. He has now been working for Bull for eight years. He led the HPC benchmarking team for the first five years and was then in charge of the "Education and Research" market for HPC at Bull; he is now leading the "Center for Excellence in Parallel Programming" of Bull.

 

 

Frederic Vivien, ENS

 

 

David  Walker, Cardiff

 

David Walker is Professor of High Performance Computing in the School of Computer Science and Informatics at Cardiff University, where he heads the Distributed Collaborative Computing group. From 2002 to 2010 he was also Director of the Welsh e-Science Centre. He received a B.A. (Hons) in Mathematics from Jesus College, Cambridge in 1976, an M.Sc. in Astrophysics from Queen Mary College, London, in 1979, and a Ph.D. in Physics from the same institution in 1983. Professor Walker has conducted research into parallel and distributed algorithms and applications for the past 25 years in the UK and USA, and has published over 140 papers on these subjects. Professor Walker was instrumental in initiating and guiding the development of the MPI specification for message-passing, and has co-authored a book on MPI. He also contributed to the ScaLAPACK library for parallel numerical linear algebra computations. Professor Walker's research interests include software environments for distributed scientific computing, problem-solving environments and portals, and parallel applications and algorithms. Professor Walker is a Principal Editor of Computer Physics Communications, the co-editor of Concurrency and Computation: Practice and Experience, and serves on the editorial boards of the International Journal of High Performance Computing Applications, and the Journal of Computational Science.

 

 

 

Michael Wolfe

NVIDIA/PGI

 

Michael Wolfe has over 35 years of experience working on compilers in academia and industry.  He joined PGI in 1996 and has most recently been working on compilers for heterogeneous host+accelerator systems.  He was formerly an associate professor at OGI, and a cofounder and lead compiler engineer at KAI prior to that.  He has published one textbook, "High Performance Compilers for Parallel Computing."

 

 

 

 

 

 

 

 

CCGSC 1998 Participants, Blackberry Tennessee

 

 

 

CCGSC 2000 Participants, Faverges, France

 

 

 

CCGSC 2002 Participants, Faverges, France

 

 

 

CCGSC 2004 Participants, Faverges, France

 

 

 

CCGSC 2006 Participants, Flat Rock North Carolina

Some additional pictures can be found here.

http://web.eecs.utk.edu/~dongarra/ccgsc2006/

 

 

CCGSC 2008 Participants, Flat Rock North Carolina

http://web.eecs.utk.edu/~dongarra/ccgsc2008/

 

 

 

CCGSC 2010 Participants, Flat Rock North Carolina

http://web.eecs.utk.edu/~dongarra/ccgsc2010/

 

 

 

CCDSC 2012 Participants, Dareize, France

http://web.eecs.utk.edu/~dongarra/CCDSC-2012/index.htm

 

 

CCDSC 2014 Participants, Dareize, France