
The HP 9000 SuperDome.

Machine type: RISC-based ccNUMA system.
Models: HP 9000 SuperDome.
Operating system: HP-UX (HP's usual Unix flavour).
Connection structure: Crossbar.
Compilers: Fortran 77, Fortran 90, Parallel Fortran, HPF, C, C++.
Vendors information: Web page.
Year of introduction: 2000, 2004 with PA-RISC 8800.

System parameters:

Model                          HP 9000 SuperDome
Clock cycle                    1 GHz
Theor. peak performance
  Per proc. (64-bits)          8 Gflop/s
  Maximal (64-bits)            512 Gflop/s
Main memory
  Memory/node                  ≤ 64 GB
  Memory/maximal               1 TB
No. of processors              ≤ 64
Communication bandwidth
  Aggregate (global)           64 GB/s
  Cell to backplane            8 GB/s
  Within cell (see below)      16 GB/s


The Superdome replaced the Exemplar V2600 system, which has been withdrawn by HP (see section Systems Disappeared from the List). The connection structure of the Superdome is significantly improved over that of the former V2600. The Superdome has a 2-level crossbar: one level within a 4-processor cell and another level that connects the cells through the crossbar backplane. Every cell connects to the backplane at a speed of 8 GB/s and the global aggregate bandwidth for a fully configured system is therefore 64 GB/s.

As said, the basic building block of the Superdome is the 4-processor cell. All data traffic within a cell is controlled by the Cell Controller, a 10-port ASIC. It connects to the four local memory subsystems at 16 GB/s, to the backplane crossbar at 8 GB/s, and to two ports that each serve two processors at 6.4 GB/s per port. As each processor houses two CPU cores, the available bandwidth per CPU core is 1.6 GB/s. Like the SGI Altix systems, the Superdome secures cache coherency by using directory memory. The NUMA factor for a full 64-processor system is by HP's account very modest: only 1.8.
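The per-core figure follows from sharing one processor port among the processors and cores behind it. A minimal sketch of that arithmetic in Python, using only the numbers quoted above; the function and parameter names are illustrative and not part of any HP tool:

```python
def bandwidth_per_core(port_gb_s, processors_per_port, cores_per_processor):
    """Share of one Cell Controller processor port available to a single CPU core (GB/s)."""
    return port_gb_s / (processors_per_port * cores_per_processor)


# Figures from the text: 6.4 GB/s per port, 2 processors per port,
# 2 CPU cores per processor.
per_core = bandwidth_per_core(6.4, 2, 2)
print(f"{per_core:.1f} GB/s per CPU core")
```

This is an even split and therefore an upper bound per core when all four cores behind a port are active at once.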

The PA-RISC 8800 processors run at a clock frequency of 1 GHz. Each processor contains two processor cores, each of which in turn contains 2 floating-point units that are able to execute a combined floating multiply-add instruction. In favourable circumstances 8 flops/cycle can therefore be achieved, which gives a Theoretical Peak Performance of 8 Gflop/s per processor. This amounts to a peak speed of 512 Gflop/s for a full configuration.
Because a shared-memory parallel model is supported over the entire system, OpenMP can be employed on the total of 64 processors (128 CPU cores).
The Superdome can be partitioned into different complexes that run with different processors, e.g., the Itanium 2. In that case the same backplane can be used but the cells are of a different type. In theory one can therefore have a mixed system made up of an HP 9000 Superdome and an Integrity Superdome (see below).
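The Theoretical Peak Performance quoted above is a product of clock frequency, cores, floating-point units and flops per instruction. A small Python sketch of that calculation, using the figures from the text; the helper name and its parameters are illustrative assumptions:

```python
def peak_gflops(clock_ghz, cores, fpus_per_core, flops_per_fpu_per_cycle=2):
    """Theoretical peak in Gflop/s.

    A combined multiply-add counts as 2 flops per floating-point unit per cycle.
    """
    return clock_ghz * cores * fpus_per_core * flops_per_fpu_per_cycle


# PA-RISC 8800 as described in the text: 1 GHz, 2 cores, 2 FPUs per core.
per_processor = peak_gflops(clock_ghz=1.0, cores=2, fpus_per_core=2)

# Full configuration: 64 processors.
full_system = 64 * per_processor

print(f"{per_processor:.0f} Gflop/s per processor, {full_system:.0f} Gflop/s maximal")
```

The result is only reachable when every floating-point unit issues a multiply-add in every cycle, which is why the text speaks of favourable circumstances.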

Measured Performances:
No performance results (in the HPC realm) are known to the author for the new model with the dual-core PA-RISC 8800 processors; the system has been on the market since April 2004. In [42] a speed of 756 Gflop/s is reported for solving a full linear system of unspecified size. This result was achieved on an older 8-way coupled system with a total of 512 PA-RISC 8700+ processors at 875 MHz. As the Theoretical Peak Performance of such a cluster is 1792 Gflop/s, the efficiency is 42%.
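The efficiency figure is the measured speed divided by the Theoretical Peak Performance. The Python sketch below reproduces it from the numbers in the text. The value of 4 flops/cycle per PA-RISC 8700+ processor is an assumption, inferred from the quoted peak of 1792 Gflop/s for 512 processors at 875 MHz; it is not stated in the text.

```python
def efficiency(measured_gflops, peak_gflops):
    """Fraction of the theoretical peak that was actually achieved."""
    return measured_gflops / peak_gflops


processors = 512
clock_ghz = 0.875
flops_per_cycle = 4  # assumption: inferred from the quoted 1792 Gflop/s peak

peak = processors * clock_ghz * flops_per_cycle
eff = efficiency(756.0, peak)

print(f"peak = {peak:.0f} Gflop/s, efficiency = {eff:.0%}")
```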


Aad van der Steen
Wed Oct 13 11:33:00 CEST 2004