The SGI Origin3900.

Machine type	RISC-based ccNUMA system.
Models	Origin 3900
Operating system	IRIX (SGI's Unix variant)
Connection structure	Crossbar, hypercube (see remarks)
Compilers	Fortran 77, Fortran 90, C, C++, Ada, Pascal
Vendor's information Web page	www.sgi.com/origin/3000/overview.html
Year of introduction	2003 (with new R16000-based nodes).

System parameters:

Model	Origin 3900
Clock cycle	800 MHz
Theor. peak performance
Per proc. (64-bits)	1.6 Gflop/s
Maximum (64-bits)	819 Gflop/s
Main memory
Memory/maximal	≤ 1 TB
No. of processors	16-512
Communication bandwidth
Point-to-point	1.6 GB/s
Aggregate peak	717 GB/s
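
The peak performance figures in the table are consistent with simple arithmetic. The following is a minimal sketch in Python, assuming two 64-bit floating-point results per clock cycle per processor. That value is inferred by dividing 1.6 Gflop/s by 800 MHz; the table does not state it explicitly. The function name is illustrative only.

```python
def peak_gflops(clock_mhz, flops_per_cycle, n_procs=1):
    """Theoretical peak performance in Gflop/s for n_procs processors."""
    return clock_mhz * flops_per_cycle * n_procs / 1000.0

# Assumption: 2 floating-point results per cycle, inferred from the table
# (1.6 Gflop/s per processor at 800 MHz), not stated in the source.
FLOPS_PER_CYCLE = 2

per_proc = peak_gflops(800, FLOPS_PER_CYCLE)
maximum = peak_gflops(800, FLOPS_PER_CYCLE, n_procs=512)

print(f"per processor:  {per_proc:.1f} Gflop/s")
print(f"512 processors: {maximum:.1f} Gflop/s")
```

Under this assumption the full 512-processor configuration comes to 819.2 Gflop/s, which the table rounds to 819 Gflop/s.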

Remarks:

By July 2000 SGI had moved from its Origin2000 series to its new Origin3000 series, comprising the Origin3200, Origin3400, and Origin3800 models. The smaller intermediate models have since been dropped in favour of the Origin 350 midrange server, so in the high-end range the Origin3900 is now the only representative. Many of the characteristics of the Origin2000 have been retained, of which the most important is its ccNUMA character. The processor used is presently the MIPS R16000, a direct successor of the R14000s in the Origin2000 systems. The R16000 is identical to the R14000 processor, save for the clock frequency, which has gone from 600 MHz to 800 MHz.

SGI has further modularised the Origin3900 in comparison with its predecessor. A system contains so-called C-bricks: CPU boards with 2-4 processors, on-board memory, and a router chip that connects them. The router chip links each C-brick to router boards called R-bricks for communication with the rest of the system, and to I-bricks that contain disks, PCI expansion slots, etc., and that together make up the I/O sub-system of the machine. Recently an ultra-dense Cx-brick has been introduced that packs 16 R16000s in a 4U-height brick. This is possible because of the very low power consumption of the MIPS processors. Using Cx-bricks can therefore greatly reduce the floor space needed.
The basic hardware bandwidth within a C-brick is 1.6 GB/s from the router chip to one pair of CPUs and 3.2 GB/s from memory to the router chip (2×1.6 GB/s full duplex). The same bandwidth is available for inter-node communication. The off-board I/O bandwidth is 2.4 GB/s (2×1.2 GB/s full duplex). The R-brick can be connected to 16 C-bricks and it has 8 ports to connect it to other R-bricks. So, at most 128 C-bricks or 512 processors can be interconnected in this way.
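
The brick counts and full-duplex figures above can be checked with a few lines of Python. This is only a sketch: the constant and function names are illustrative, and the Cx-brick count assumes that a maximal system could be built entirely from Cx-bricks, which the text does not state.

```python
PROCS_PER_C_BRICK = 4     # upper end of the 2-4 range quoted above
MAX_C_BRICKS = 128        # maximum number of C-bricks quoted above
PROCS_PER_CX_BRICK = 16   # ultra-dense Cx-brick

def full_duplex_gbs(one_way_gbs):
    """Full-duplex bandwidth counted as the sum of both directions."""
    return 2 * one_way_gbs

max_procs = MAX_C_BRICKS * PROCS_PER_C_BRICK

# Hypothetical: Cx-bricks needed if all processors sat in Cx-bricks.
cx_bricks_needed = max_procs // PROCS_PER_CX_BRICK

memory_to_router = full_duplex_gbs(1.6)   # GB/s
off_board_io = full_duplex_gbs(1.2)       # GB/s

print(f"maximum processors: {max_procs}")
print(f"Cx-bricks for the same count: {cx_bricks_needed}")
print(f"memory to router chip: {memory_to_router:.1f} GB/s")
print(f"off-board I/O: {off_board_io:.1f} GB/s")
```

With 4 processors in each of the 128 C-bricks this reproduces the 512-processor maximum, and the doubled one-way rates reproduce the 3.2 GB/s and 2.4 GB/s figures.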

The machine is a typical representative of the ccNUMA class of systems. The memory is physically distributed over the node boards but there is one system image. Because of the structure of the system, the bi-sectional bandwidth remains constant from 8 processors upwards, at 210 GB/s. This is a large improvement over the earlier Origin2000 systems, where the bi-sectional bandwidth was 82 GB/s.

Parallelisation is done either automatically by the (Fortran or C) compiler or explicitly by the user, mainly through the use of directives. All synchronisation, etc., has to be done via memory, which can cause a fairly large parallelisation overhead. A message passing model is also supported on the Origin, using the optimised SGI versions of PVM and MPI and the SGI/Cray-specific shmem library. Programs implemented in this way can potentially run very efficiently on the system.

A nice feature of the Origins is that they may migrate processes to the nodes that have to satisfy the data requests of these processes, which minimises the overhead involved in transferring data across the machine. The technique is reminiscent of the systems of the late Kendall Square Research, although in those systems the data were moved to the active process. SGI claims that the time for non-local memory references is on average about 2 times longer than for local memory references, an improvement of 50% over the Origin2000 series.
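
The effect of the quoted factor of 2 can be illustrated with a simple weighted-average model. This is a sketch only: the function name, the normalised local access time, and the remote fractions are hypothetical values chosen for illustration. Only the remote-to-local ratio of about 2 comes from the text.

```python
def mean_access_time(t_local, remote_fraction, remote_ratio=2.0):
    """Average memory access time when a fraction of references is remote.

    remote_ratio is the remote/local time ratio (about 2 per SGI's claim).
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must lie between 0 and 1")
    t_remote = remote_ratio * t_local
    return (1.0 - remote_fraction) * t_local + remote_fraction * t_remote

# Hypothetical workloads; local access time normalised to 1.0.
for fraction in (0.0, 0.25, 1.0):
    slowdown = mean_access_time(1.0, fraction)
    print(f"{fraction:4.0%} remote references -> {slowdown:.2f}x local time")
```

In this model, process migration helps by lowering the remote fraction, which moves the average access time back towards the local value.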

Measured Performances:
No performance figures for the 800 MHz-based systems are available but in

Aad van der Steen
Wed Oct 13 15:16:36 CEST 2004