




# DARPA HPCS Overview Productivity Evaluation

David Koester, Ph.D.  
DARPA HPCS Productivity Team

HPCchallenge Benchmarks Panel  
SC2004  
12 November 2004

- This work is sponsored by the Department of Defense under Army Contract W15P7T-05-C-D001. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States

Slide-1  
SC2004  
HPCC Panel

MITRE — MIT Lincoln Laboratory — ISI




## Outline

- Brief DARPA HPCS Overview
  - Impacts
  - Programmatic
  - HPCS Phase II Teams
  - Program Goals
  - HPCS Productivity Team Benchmarking Working Group
- Productivity Evaluation
  - Development Time Productivity Indicators
  - Publications on HPC Productivity
- Summary




## DARPA High Productivity Computing Systems (HPCS)

- Create a new generation of **economically viable computing systems (2010)** and a **procurement methodology (2007-2010)** for the security/industrial community

### Impact:

- **Performance** (time-to-solution): speed up critical national security applications by a factor of 10X to 40X
- **Programmability** (idea-to-first-solution): reduce cost and time of developing application solutions
- **Portability** (transparency): insulate research and operational application software from system
- **Robustness** (reliability): apply all known techniques to protect against outside attacks, hardware faults, & programming errors



### Applications:

- Intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling and biotechnology

Fill the Critical Technology and Capability Gap  
Today (late 80's HPC technology) to Future (Quantum/Bio Computing)



## High Productivity Computing Systems (HPCS): Program Overview

- Create a new generation of **economically viable computing systems (2010)** and a **procurement methodology (2007-2010)** for the security/industrial community

Figure: HPCS program timeline.

- Phases: Phase 1; Phase 2 (2003-2005); Phase 3 (2006-2010)
- Stages: Concept Study; Advanced Design & Prototypes; Full Scale Development
- Milestones and products: Half-Way Point (Phase 2); Test Evaluation Framework; New Evaluation Framework; Validated Procurement Evaluation Methodology; Petascale/s Systems
- Participants: Vendors



## HPCS Phase II Teams

**Industry**

- PI: Elnozahy
- PI: Mitchell
- PI: Smith

**Mission Partners**

- Office of Science, U.S. Department of Energy
- NASA
- NSF
- National Security Agency, U.S. Department of Defense
- Intelligence Community Computing Center, U.S. Department of Defense
- HPC, NASA, National Nuclear Security Administration

**Productivity Team Working Groups**

- Development Time Experiments
- Execution Time Modeling
- Benchmarks
- Programming Models and Definitions
- Test and Spec Environment
- Workflows, Models and Metrics
- Existing Codes Analysis

**Productivity Team (Lincoln Lead)**

- MIT Lincoln Laboratory
- PI: Kepner
- PI: Lucas
- PI: Basili
- PI: Benson & Snavely
- PI: Dongarra
- MITRE
- Oak Ridge
- Los Alamos National Laboratory
- Argonne National Laboratory
- UCSB
- CSAIL
- Ohio State
- CodeSourcery
- PI: Koester
- PIs: Vetter, Lusk, Post, Bailey
- PIs: Gilbert, Edelman, Ahalt, Mitchell


## HPCS Program Goals: Productivity Goals

- HPCS overall productivity goals:
  - Execution (sustained performance)
    - 1 Petaflop/s (scalable to greater than 4 Petaflop/s)
    - Reference: Production workflow
  - Development
    - 10X over today's systems
    - Reference: Lone researcher and Enterprise workflows

Figure: the three reference workflows (diagram labels: Development, Execution).

- Production: Observe, Orient, Decide, Act
- Lone Researcher: Theory, Experiment
- Enterprise: Design, Simulation, Visualize, Port Legacy Software

**10x improvement in time to first solution!**


## HPCS Program Goals: Productivity Framework

Figure: productivity framework diagram. Labels: Activity & Purpose Benchmarks; Work Flows; Actual System or Model; Execution Time; Development Time; Productivity Metrics; Productivity (Utility/Cost).

System Parameters (Examples)

- BW bytes/flop (Balance)
- Memory latency
- **Memory size**
- Processor flop/cycle
- Number of processors
- Clock frequency
- Bisection bandwidth
- Power/system
- **# of racks**
- Code size
- Restart time
- Peak flops/sec
- ...




## HPCS Benchmark Spectrum: HPCchallenge Benchmarks

**HPCchallenge Benchmarks**  
<http://icl.cs.utk.edu/hpcc/>

Figure: HPCS benchmark spectrum. Labels: Execution Indicators; Development Indicators; Execution Bounds; System Bounds; Future Applications; Emerging Applications; Existing Applications.

Execution Bounds

- Local: DGEMM, STREAM, RandomAccess, 1DFFT
- Global: Linpack, PTRANS, RandomAccess, 1DFFT

8 HPCchallenge Benchmarks

- To examine the performance of HPC architectures using kernels with more *challenging* memory access patterns than HPL
- To *complement* the Top500 list
- To provide benchmarks that *bound* the performance of many real applications as a function of memory access characteristics, e.g., spatial and temporal locality
- To *outlive* HPCS

9 Simulation Applications

- Current: UM2000, GAMESS, OVERFLOW, LBMDH, RFCTH, HYCOM
- Near-Future: NWChem, ALEGRA, CCSM

• HPCchallenge pushes spatial and temporal boundaries; sets performance bounds  
• Available for download <http://icl.cs.utk.edu/hpcc/>


## HPCS Benchmark Spectrum: HPCchallenge Benchmarks

8 HPCchallenge Benchmarks

1. EP-DGEMM (matrix-matrix multiply)
2. STREAM
   - COPY
   - SCALE
   - ADD
   - TRIAD
3. EP-RandomAccess
4. EP-1DFFT
5. High Performance LINPACK (HPL)
6. PTRANS (parallel matrix transpose)
7. G-RandomAccess
8. G-1DFFT
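The four STREAM kernels named in the benchmark list can be illustrated with a small sketch. This is a pure-Python illustration of the arithmetic only, not the HPCchallenge code itself; the function names, the helper `bandwidth_bytes_per_second`, and the assumption of 8-byte double-precision elements are choices made for this example.

```python
# Illustrative sketch of the four STREAM kernels (not the HPCchallenge source).
# Assumption: elements are 8-byte doubles, so each kernel moves the number of
# bytes per loop iteration given below (two or three arrays touched).
BYTES_PER_ITERATION = {"COPY": 16, "SCALE": 16, "ADD": 24, "TRIAD": 24}


def stream_copy(a):
    """COPY: c[i] = a[i]"""
    return [x for x in a]


def stream_scale(c, q):
    """SCALE: b[i] = q * c[i]"""
    return [q * x for x in c]


def stream_add(a, b):
    """ADD: c[i] = a[i] + b[i]"""
    return [x + y for x, y in zip(a, b)]


def stream_triad(b, c, q):
    """TRIAD: a[i] = b[i] + q * c[i]"""
    return [x + q * y for x, y in zip(b, c)]


def bandwidth_bytes_per_second(kernel, n, seconds):
    """Sustained bandwidth implied by running `kernel` over n elements."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return BYTES_PER_ITERATION[kernel] * n / seconds


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    q = 3.0
    c = stream_copy(a)
    b = stream_scale(c, q)
    c = stream_add(a, b)
    a = stream_triad(b, c, q)
    print(a)
```

A real measurement would time each kernel over arrays much larger than the caches; the sketch only shows what each kernel computes and how a bandwidth figure would be derived from a timing.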







## HPCS Program Goals: Productivity Framework



$$\text{Productivity} = \text{Utility}/\text{Cost}$$
$$\Psi \equiv \frac{U}{C} = \frac{U(T)}{C_S + C_O + C_M}$$



Figure: productivity framework diagram. Labels: Activity & Purpose Benchmarks; Productivity Metrics; Execution Time; Development Time.

System Parameters (Examples)

- BW bytes/flop (Balance)
- Memory latency
- Memory size
- Processor flop/cycle
- Number of processors
- Clock frequency
- Bisection bandwidth
- Power/system
- # of racks
- Code size
- Restart time
- Peak flops/sec
- ...


## Productivity Factors: Execution Time & Development Time

**Productivity = Utility/Cost**

$$\Psi \equiv \frac{U}{C} = \frac{U(T)}{C_S + C_O + C_M}$$

- Utility and some Costs are relative to
  - Workflow (WkFlow)
  - Execution Time (ExecTime)
  - Development Time (DevTime)

| Utility      | Software & Operating Costs | Machine Costs |
|--------------|----------------------------|---------------|
| DevTime      | DevTime                    | DevTime       |
| Low ExecTime | Low ExecTime               | Low ExecTime  |
- Reductions in both Execution Time and Development Time increase Utility
  - Utility generally is inversely related to time
  - Quicker is better
- Reductions in both Execution Time and Development Time also reduce Software and Operating costs
  - Reduction in programmer costs
  - More work performed over a period
- However, systems that will provide increased utility and decreased operating costs may have a higher initial procurement cost
  - Need productivity metrics to justify the higher initial cost
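As a minimal sketch of how the formula $\Psi = U(T)/(C_S + C_O + C_M)$ could be evaluated, the example below reads $C_S$, $C_O$, and $C_M$ as software, operating, and machine costs, following the column headings above. The utility function $U(T) = 1/T$ is an assumption chosen only because the slide says utility is generally inversely related to time; the function names and all numbers are hypothetical.

```python
# Sketch of Psi = U(T) / (C_S + C_O + C_M).
# Assumptions (not from the slides): U(T) = 1/T, and the cost figures used in
# the demo are made-up values in arbitrary but consistent units.

def inverse_time_utility(time_to_solution):
    """Assumed utility model: utility falls off as 1/T."""
    if time_to_solution <= 0:
        raise ValueError("time_to_solution must be positive")
    return 1.0 / time_to_solution


def productivity(time_to_solution, software_cost, operating_cost,
                 machine_cost, utility=inverse_time_utility):
    """Return utility divided by the sum of the three cost terms."""
    total_cost = software_cost + operating_cost + machine_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return utility(time_to_solution) / total_cost


if __name__ == "__main__":
    # Hypothetical baseline system: cheaper machine, slower time to solution.
    baseline = productivity(10.0, software_cost=4.0, operating_cost=2.0,
                            machine_cost=4.0)
    # Hypothetical candidate: higher machine cost, but lower time to solution
    # and lower software and operating costs.
    candidate = productivity(5.0, software_cost=2.0, operating_cost=1.0,
                             machine_cost=6.0)
    print(baseline, candidate, candidate > baseline)
```

Under these made-up inputs the candidate has the higher procurement cost yet the higher productivity, which is the kind of comparison a productivity metric is meant to support.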


## Development Time Productivity Indicators

- Several key indicators which can be applied directly or indirectly to HPCchallenge, CompactApps, Full App, and Classroom Experiments
- Actual User Performance Achieved
  - Direct: timing of user code
  - Indirect: paper analysis of code/features => connection to workflows
- Effort required
  - Direct: measure time to implement/modify code
  - Indirect: software lines of code (SLOC)
- Expertise level required
  - Direct: fraction of users who can achieve a certain level of performance
  - Indirect: paper analysis of code/features => connection to workflows, number of experts needed
- Many additional factors are important
- Performance, Effort and Expertise were mentioned the most




## Strawman Development Time Productivity Formula

$$\text{Dev Time Productivity} = \frac{\text{Relative Speedup}}{\text{Relative Effort}}$$

$$\text{Relative Speedup} = \frac{\text{Parallel Performance}}{\text{Serial Performance}}$$

$$\text{Relative Effort} = \frac{\text{Parallel SLOC}}{\text{Serial SLOC}}$$

- **Dev Time Productivity = Utility/Effort**
  - Units: speedup per relative effort
- **Utility = median user speedup**
  - Compared to serial on workstation
- **Effort = relative time to implement**
  - Compared to serial on workstation
- **Simplest way to combine currently measurable quantities**
- **Too simplistic?**
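The strawman formula can be sketched in a few lines, assuming the median of measured user speedups stands in for utility and the SLOC ratio stands in for relative effort. The function names and the sample measurements are hypothetical and only illustrate the arithmetic.

```python
from statistics import median


def relative_speedup(parallel_performance, serial_performance):
    """Parallel performance divided by serial performance."""
    return parallel_performance / serial_performance


def relative_effort(parallel_sloc, serial_sloc):
    """Parallel SLOC divided by serial SLOC (an indirect effort proxy)."""
    return parallel_sloc / serial_sloc


def dev_time_productivity(user_speedups, parallel_sloc, serial_sloc):
    """Median user speedup per unit of relative effort."""
    return median(user_speedups) / relative_effort(parallel_sloc, serial_sloc)


if __name__ == "__main__":
    # Hypothetical measurements: three users' speedups over a serial
    # workstation run, and SLOC counts for the two code versions.
    speedups = [4.0, 16.0, 100.0]
    print(dev_time_productivity(speedups, parallel_sloc=2000, serial_sloc=1000))
```

Using the median rather than the best speedup keeps a single expert result from dominating the figure, which matches the slide's choice of median user speedup as the utility term.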




## Hypothetical Formula Usage

- Consider an application implemented using various approaches (Productivity = median speedup / relative effort)

| Approach                   | Median Speedup | Expert Speedup | Relative Effort | Productivity |
|----------------------------|----------------|----------------|-----------------|--------------|
| C/MPI on a 128 CPU cluster | 16             | 100            | 2               | 8            |
| OpenMP on Shared Memory    | 16             | 100            | 1.2             | 13.3         |
| HPCS hardware              | 32             | 200            | 1.2             | 26.7         |
| HPCS performance tools     | 64             | 200            | 1.2             | 53.3         |
| High Level Language        | 64             | 200            | 0.2             | 320          |

- Max HPCS development productivity benefit: $320/8 = 40\times$
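The productivity column can be recomputed from the median speedup and effort columns. The sketch below does that and derives the maximum benefit relative to the C/MPI row; the `Approach` class and function names are illustrative, and the figures are the hypothetical ones from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    name: str
    median_speedup: float
    relative_effort: float

    @property
    def productivity(self) -> float:
        """Median speedup per unit of relative effort."""
        return self.median_speedup / self.relative_effort


# Hypothetical values taken from the table (median speedup, relative effort).
APPROACHES = [
    Approach("C/MPI on a 128 CPU cluster", 16, 2.0),
    Approach("OpenMP on Shared Memory", 16, 1.2),
    Approach("HPCS hardware", 32, 1.2),
    Approach("HPCS performance tools", 64, 1.2),
    Approach("High Level Language", 64, 0.2),
]


def max_benefit(approaches):
    """Best productivity relative to the first (baseline) approach."""
    baseline = approaches[0].productivity
    return max(a.productivity for a in approaches) / baseline


if __name__ == "__main__":
    for a in APPROACHES:
        print(f"{a.name:28s} {a.productivity:6.1f}")
    print(f"max benefit: {max_benefit(APPROACHES):.0f}x")
```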




## Special Issue on “HPC Productivity”

- *International Journal of High Performance Computing Applications*, Volume 18, Number 4, Winter 2004 (November)



1. "HPC Productivity: An Overarching View" Jeremy Kepner
2. "Software Project Management and Quality Engineering Practices for Complex, Coupled Multi-Physics, Massively Parallel Computational Simulations: Lessons Learned from ASCI" Doug Post and Richard Kendall
3. "A Framework for Measuring Supercomputer Productivity" Marc Snir and David A. Bader
4. "Productivity Metrics and Models for High Performance Computing" Thomas Sterling
5. "A Strategy for Measuring the Productivity of Programming Interfaces" Ken Kennedy, Charles Koelbel and Rob Schreiber
6. "Performance Metrics Based on Computation Action" Robert W. Numrich
7. "Measuring HPC Productivity" Stuart Faulk, Philip Johnson, Adam Porter, Walter Tichy, and Lawrence Votta
8. "Purpose-Based Benchmarks" John L. Gustafson
9. "Productivity in HPC" David J. Kuck
10. "HPC Productivity Model Synthesis" Jeremy Kepner

- Inventing a new field







## Summary



- Create a new generation of economically viable computing systems (2010)
  - Impacts
    - Performance
    - Programmability
    - Portability
    - Robustness
  - Hardware Challenges
    - 2+ PF/s LINPACK
    - 6.5 PB/sec STREAM bandwidth
    - 3.2 PB/sec Bisection bandwidth
    - 64,000 GUPS
- Create a new procurement methodology based on Productivity (2007-2010)


