|Machine type||RISC-based SMP-clustered DM-MIMD system.|
|Operating system||Tru64 Unix (Compaq's flavour of Unix)|
|Connection structure||Fat Tree|
|Compilers||Fortran 77, HPF, C, C++|
|Vendor's information Web page||h18002.www1.hp.com/alphaserver/sc/|
|Year of introduction||2002.|
|Clock cycle||1.25 GHz|
|Theor. peak performance|
|Per proc. (64-bit)||2.5 Gflop/s|
|Maximal (64-bit)||10 Tflop/s|
|Main memory||≤ 8 TB|
|No. of processors||≤ 4096|
|Between cluster nodes||280 MB/s|
The AlphaServer SC is the very high end of HP's AlphaServer line (SC stands for SuperComputer). The system is typical of the current development of SMP-based clustered systems. In the SC system the basic SMP node is the Compaq ES45, a 4-CPU SMP system with the Alpha 21264a (EV68) as its processor. The clock frequency is 1.25 GHz. The SMP node has a crossbar as its internal network with an aggregate bandwidth of 5.2 GB/s (1.33 GB/s/processor). This is sufficient to deliver 1.064 byte/clock cycle to each processor in the node simultaneously.
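The bandwidth-per-cycle figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes 1 GB/s = 10^9 bytes/s, and the constant and function names are not taken from any vendor documentation. It also shows that an even four-way split of the quoted 5.2 GB/s aggregate gives a slightly lower value than the quoted per-processor figure.

```python
# Figures quoted in the text for the ES45 node (1 GB/s taken as 1e9 bytes/s).
CPUS_PER_NODE = 4
AGGREGATE_BW = 5.2e9   # bytes/s, aggregate crossbar bandwidth
PER_CPU_BW = 1.33e9    # bytes/s, per-processor figure as quoted
CLOCK_HZ = 1.25e9      # clock cycles per second


def bytes_per_cycle(bandwidth, clock_hz):
    """Bytes delivered per clock cycle at the given bandwidth."""
    return bandwidth / clock_hz


# Using the per-processor bandwidth quoted in the text.
quoted = bytes_per_cycle(PER_CPU_BW, CLOCK_HZ)
# Using an even split of the quoted aggregate bandwidth instead.
even_split = bytes_per_cycle(AGGREGATE_BW / CPUS_PER_NODE, CLOCK_HZ)

print(f"quoted per-CPU figure:   {quoted:.3f} byte/cycle")
print(f"even split of aggregate: {even_split:.3f} byte/cycle")
```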
Within a node the system is a shared-memory machine that allows for shared-memory parallel processing, for instance by using OpenMP. When more than four processors are required, one has to use a distributed-memory programming model: message passing with MPI or PVM, or data-parallel programming with HPF (HP/Compaq is one of the few companies that still provides its own HPF compiler).
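The split between the two programming models follows from where processes are placed. As a minimal sketch, assume that process ranks are assigned to nodes in consecutive blocks of four; this placement, and the helper names `locate` and `share_memory`, are hypothetical and only serve to illustrate the idea.

```python
CPUS_PER_NODE = 4  # an ES45 node holds four processors


def locate(rank, cpus_per_node=CPUS_PER_NODE):
    """Map a global process rank to (node index, CPU index within the node).

    Assumes block placement: ranks 0-3 on node 0, ranks 4-7 on node 1, etc.
    """
    return divmod(rank, cpus_per_node)


def share_memory(rank_a, rank_b, cpus_per_node=CPUS_PER_NODE):
    """True if both ranks sit in the same SMP node and can share memory."""
    return locate(rank_a, cpus_per_node)[0] == locate(rank_b, cpus_per_node)[0]


# Ranks 1 and 3 are in the same node; ranks 3 and 4 are in different nodes
# and would have to communicate over the inter-node network.
print(locate(5), share_memory(1, 3), share_memory(3, 4))
```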
For communication between the SMP nodes the SC uses QsNet, a network manufactured by QSW Limited. QsNet is in fact the successor of the network employed in the former Meiko CS-2 systems (see QsNet). The network has the structure of a fat tree, is based on PCI technology, and has a point-to-point bandwidth of 280 MB/s. Because of its fat tree structure the bandwidth in the upper level of the network is 340 MB/s sustained. According to the documentation the peak bandwidth is “500 MB/s per server” without further specification, which looks impressive but is not very informative. QSW claims a very low latency of 5 µs for MPI messages.
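To get a feeling for what the quoted latency and point-to-point bandwidth mean for message sizes, one can use a simple linear cost model, time = latency + size / bandwidth. This model is an assumption made here for illustration, not something stated by the vendor, and real transfers will deviate from it. The sketch takes 1 MB/s = 10^6 bytes/s; the function names are illustrative.

```python
LATENCY_S = 5e-6        # claimed MPI latency: 5 microseconds
BANDWIDTH_BPS = 280e6   # point-to-point bandwidth: 280 MB/s (1 MB = 1e6 bytes)


def message_time(n_bytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Estimated transfer time in seconds under the linear cost model."""
    return latency + n_bytes / bandwidth


def half_performance_size(latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Message size (bytes) at which half of the bandwidth is attained.

    Under the linear model this is simply latency * bandwidth.
    """
    return latency * bandwidth


for size in (0, 1_000, 1_000_000):
    print(f"{size:>9} bytes: {message_time(size) * 1e6:10.2f} microseconds")
print(f"half-performance message size: {half_performance_size():.0f} bytes")
```

Under these assumptions, messages well below about 1.4 kB are dominated by latency, while much larger messages approach the quoted bandwidth.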
Measured Performances: A performance of 13.88 Tflop/s was reported on a 2-way cluster of fully configured AlphaServer SC45s (8192 processors), solving a full linear system of order 633,000 with an efficiency of 68.0%.
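The efficiency figure relates the measured speed to the theoretical peak of the machine. The sketch below recomputes it from the numbers given on this page (2.5 Gflop/s per processor, 8192 processors, 13.88 Tflop/s measured); the variable and function names are illustrative. With these rounded inputs the result comes out just under 68%, in line with the reported value.

```python
PEAK_PER_PROC = 2.5e9   # flop/s per processor (64-bit), from the table
CLOCK_HZ = 1.25e9       # clock frequency in Hz
N_PROCS = 8192          # two fully configured 4096-processor systems
MEASURED = 13.88e12     # flop/s reported for the linear system solve


def peak_performance(n_procs, peak_per_proc=PEAK_PER_PROC):
    """Theoretical peak in flop/s for a given number of processors."""
    return n_procs * peak_per_proc


flops_per_cycle = PEAK_PER_PROC / CLOCK_HZ       # results per clock cycle
single_system_peak = peak_performance(4096)      # one maximal configuration
cluster_peak = peak_performance(N_PROCS)         # the 2-way cluster
efficiency = MEASURED / cluster_peak

print(f"flops per cycle:    {flops_per_cycle:.1f}")
print(f"single-system peak: {single_system_peak / 1e12:.2f} Tflop/s")
print(f"2-way cluster peak: {cluster_peak / 1e12:.2f} Tflop/s")
print(f"efficiency:         {efficiency:.1%}")
```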