| Property | Value |
|---|---|
| Machine type | RISC-based ccNUMA system |
| Models | Altix 3300, Altix 3700 |
| Connection structure | Crossbar, hypercube (see remarks) |
| Compilers | Fortran 95, C, C++ |
| Vendor's information Web page | www.sgi.com/altix/ |
| Year of introduction | 2003 |
| Clock frequency | 1.5 GHz |
| Theor. peak performance, per proc. (64-bit) | 6 Gflop/s |
| Theor. peak performance, maximum (64-bit) | 1.5 Tflop/s |
| Maximal memory | ≤ 512 GB |
| No. of processors | 4–256 |
| Aggregate peak bandwidth per 64-proc. frame | 44.8 GB/s |
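The two peak-performance entries in the table follow from the clock frequency. The sketch below reproduces them under one assumption not stated in the table: an Itanium 2 processor completes two fused multiply-adds, i.e. 4 floating-point operations, per cycle. The function name `peak_gflops` is illustrative only.

```python
# Peak-performance arithmetic for the Altix 3700 figures in the table.
# Assumption (not in the table): an Itanium 2 processor completes two fused
# multiply-adds per cycle, i.e. 4 floating-point operations per cycle.
FLOPS_PER_CYCLE = 4
CLOCK_GHZ = 1.5


def peak_gflops(n_procs: int, clock_ghz: float = CLOCK_GHZ,
                flops_per_cycle: int = FLOPS_PER_CYCLE) -> float:
    """Theoretical 64-bit peak in Gflop/s for n_procs processors."""
    return n_procs * clock_ghz * flops_per_cycle


per_proc = peak_gflops(1)      # 6 Gflop/s per processor
max_system = peak_gflops(256)  # 1536 Gflop/s, i.e. about 1.5 Tflop/s

if __name__ == "__main__":
    print(f"per processor: {per_proc:.1f} Gflop/s")
    print(f"256 processors: {max_system / 1000:.2f} Tflop/s")
```

The quoted 1.5 Tflop/s maximum is thus 1536 Gflop/s rounded down to two significant digits.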
The structure of the Altix 3700 is very similar to that of the SGI Origin systems (see the SGI Origin). The smaller variant of the system, the Altix 3300, is not discussed here. Like the Origin systems, the Altix has so-called C-bricks that contain boards with four Itanium 2 processors, two memory modules, two I/O ports, and two ASICs called SHUBs. Each SHUB connects to a memory module, an I/O port, and a shared path to two processors. In addition, the two SHUBs are connected to each other by a 6.4 GB/s link. The bandwidths from the memory modules and the I/O ports to the SHUBs are 10.2 and 2.4 GB/s, respectively. For the connection to other bricks, the same routers and network as in the Origin 3000 systems are used: the so-called Numalink3 network, with a bisection bandwidth of 25.6 GB/s. Like the Origin, the Altix is a ccNUMA system, which means that the address space is shared between all processors (although it is physically distributed and therefore not uniformly accessible). Note that the bandwidth within the nodes is higher than that of the off-board connections; on the boards the new Numalink4 technology is employed.
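To make the non-uniform access concrete, the following toy model encodes one C-brick board as described above and classifies the route a memory access takes. The processor numbering (processors 0–3 on board 0, 4–7 on board 1, and so on, two per SHUB) and all function names are hypothetical; the model also simplifies the off-board path to "via Numalink routers" without modelling router hops.

```python
from dataclasses import dataclass

# Link bandwidths (GB/s) for one C-brick board, as quoted in the text.
MEM_BW_PER_SHUB = 10.2   # memory module <-> SHUB
IO_BW_PER_SHUB = 2.4     # I/O port <-> SHUB
SHUB_TO_SHUB_BW = 6.4    # link between the two SHUBs of a board

PROCS_PER_BOARD = 4
PROCS_PER_SHUB = 2       # each SHUB has one shared path to two processors


@dataclass(frozen=True)
class Location:
    board: int
    shub: int  # 0 or 1 within the board


def locate(proc_id: int) -> Location:
    """Hypothetical numbering: procs 0-3 on board 0, 4-7 on board 1, ..."""
    board, local = divmod(proc_id, PROCS_PER_BOARD)
    return Location(board, local // PROCS_PER_SHUB)


def memory_path(proc_id: int, mem_board: int, mem_shub: int) -> str:
    """Classify the route from a processor to a given memory module."""
    here = locate(proc_id)
    if here.board != mem_board:
        return "remote: via Numalink routers"
    if here.shub != mem_shub:
        return "on-board: via SHUB-SHUB link"
    return "local: own SHUB"


def streaming_share(active_procs_on_shub: int) -> float:
    """Memory bandwidth (GB/s) per processor when all active ones stream."""
    return MEM_BW_PER_SHUB / active_procs_on_shub


if __name__ == "__main__":
    print(memory_path(0, mem_board=0, mem_shub=0))  # local
    print(memory_path(1, mem_board=0, mem_shub=1))  # other SHUB, same board
    print(memory_path(5, mem_board=0, mem_shub=0))  # different board
    print(f"{streaming_share(2):.1f} GB/s per processor")
```

Because two processors share one path to their SHUB, each sees roughly half of the 10.2 GB/s module bandwidth when both stream from local memory, which is one reason data placement matters on a ccNUMA machine.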
SGI does not provide its own suite of compilers; rather, it distributes the Intel compilers for the Itanium processors. Also, the operating system is Linux and not IRIX, SGI's own Unix flavour. SGI is developing a Linux version of its cluster file system CXFS, which is expected to be available shortly.
The 64-processor frames can in turn be coupled with Numalink3 connections, making the configuration effectively a cluster of Altix systems. Up to 4 frames can be presented as a single system image, giving a 256-processor system with a peak performance of 1.5 Tflop/s, so OpenMP programs with up to 256 threads can be run. On larger configurations, because Numalink allows remote addressing, one can employ, apart from MPI, the Cray-style shmem library for one-sided communication. SGI is expected to extend the number of processors within a single system image in the very near future.
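The point of one-sided communication is that only the initiating process takes part in a transfer. The sketch below is a toy, single-process model of that idea, not the shmem API: the class `SymmetricHeap` and its methods are hypothetical. In the real library, routines such as `shmem_put` and `shmem_get` play these roles, and explicit synchronisation (e.g. barriers) is needed before the target may rely on the data; in this sequential model, ordering is trivially guaranteed.

```python
# Toy, single-process model of one-sided (shmem-style) communication.
# Each "PE" (processing element) owns a symmetric array: the same array
# exists on every PE, so a remote location is just (pe, index).
# Illustrates the semantics only; this is not the SGI/Cray shmem API.


class SymmetricHeap:
    def __init__(self, n_pes: int, size: int):
        self.mem = [[0] * size for _ in range(n_pes)]

    def put(self, target_pe: int, index: int, values: list) -> None:
        """Write into target_pe's copy; the target posts no receive."""
        self.mem[target_pe][index:index + len(values)] = values

    def get(self, source_pe: int, index: int, count: int) -> list:
        """Read from source_pe's copy without its participation."""
        return self.mem[source_pe][index:index + count]


heap = SymmetricHeap(n_pes=4, size=8)
# PE 0 deposits data directly into PE 3's array ...
heap.put(target_pe=3, index=2, values=[7, 8, 9])
# ... and PE 1 reads it back, again without involving PE 3.
fetched = heap.get(source_pe=3, index=2, count=3)

if __name__ == "__main__":
    print(fetched)
```

Contrast this with two-sided MPI send/receive, where the target must post a matching receive; hardware remote addressing, as Numalink provides, is what makes the one-sided style efficient.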
In the TOP500 list, a complex of 8 Altix 64-processor frames attained a speed of 2439 Gflop/s solving a linear system of order 252,960. The efficiency for this complex is 79%.
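The 79% efficiency is the ratio of the measured Linpack speed to the theoretical peak of the complex. A minimal check, using only the figures in this section (512 processors at 6 Gflop/s each):

```python
# Linpack efficiency of the 8-frame Altix complex quoted above.
PROCS = 8 * 64          # 8 frames of 64 processors
PEAK_PER_PROC = 6.0     # Gflop/s, from the specification table
RMAX = 2439.0           # Gflop/s, measured Linpack speed

rpeak = PROCS * PEAK_PER_PROC   # theoretical peak of the complex
efficiency = RMAX / rpeak       # fraction of peak actually attained

if __name__ == "__main__":
    print(f"Rpeak = {rpeak:.0f} Gflop/s, efficiency = {efficiency:.1%}")
```

The peak works out to 3072 Gflop/s, and 2439/3072 ≈ 0.794 rounds to the 79% stated.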