|Machine type||ccNUMA system.|
|Models||NovaScale 5080, 5160.|
|Operating system||Linux, Windows Server 2003, GCOS 8|
|Connection structure||Full crossbar|
|Compilers||Intel's Fortran 95, C(++)|
|Vendors information Web page||http://www.bull.com/novascale/|
|Year of introduction||2002.|
|Model||NovaScale 5080||NovaScale 5160|
|Clock cycle||1.5 GHz||1.5 GHz|
|Theor. peak performance||48 Gflop/s||96 Gflop/s|
|No. of processors||8||16|
|Point-to-point bandwidth||6.4 GB/s||6.4 GB/s|
|Aggregate bandwidth||12.8 GB/s||25.6 GB/s|
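The peak figures in the table can be reproduced from the processor count and clock rate. The following is a minimal sketch, assuming that an Itanium 2 completes 4 floating-point operations per cycle (two fused multiply-adds); that factor and the function name `peak_gflops` are assumptions for illustration and do not come from the table.

```python
# Sketch: reproduce the "Theor. peak performance" row of the table.
# Assumption (not stated in the table): an Itanium 2 completes
# 4 floating-point operations per clock cycle (2 fused multiply-adds).
FLOPS_PER_CYCLE = 4


def peak_gflops(processors, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in Gflop/s: processors x clock (GHz) x flops/cycle."""
    return processors * clock_ghz * flops_per_cycle


# Processor counts as listed in the table; both models run at 1.5 GHz.
models = {"NovaScale 5080": 8, "NovaScale 5160": 16}

for name, procs in models.items():
    print(f"{name}: {peak_gflops(procs, 1.5):.0f} Gflop/s")
```

Under this assumption the sketch yields the 48 and 96 Gflop/s listed for the two models.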
The availability of the Itanium 2 has spurred some vendors that are traditionally not active in the HPC business to try their hand in this area. One of these is Bull, which markets its NovaScale ccNUMA SMPs with up to 16 nodes. The NovaScale systems are built from standard Intel Quad Building Blocks (QBBs), each housing 4 Itanium 2 processors and a part of the memory. The QBBs in turn are connected by Bull's proprietary FAME Scalability Switch (FSS), which provides an aggregate bandwidth of 25.6 GB/s. For reliability reasons a NovaScale 5160 is equipped with 2 FSSes. This ensures that the system remains operational when any link between a QBB and a switch, or between switches, fails, albeit at a lower level of communication performance. As each FSS has 8 ports and only 6 of these are occupied within a 5160 system, the remaining ports can be used to couple two of these systems, thus making a 32-processor ccNUMA system. Larger configurations can be made by coupling systems via QsNet II (see section QsNet). Bull provides its own MPI implementation, which turns out to be very efficient (see "Measured Performances" below).
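The building-block arithmetic in this paragraph can be written out as a short sketch. The constants are taken from the text and the table; the relation between aggregate and point-to-point bandwidth is only an observation that happens to match the listed figures, not a formula given by Bull, and the function names are hypothetical.

```python
# Sketch of the building-block arithmetic described above.
PROCESSORS_PER_QBB = 4        # from the text: each QBB houses 4 Itanium 2s
FSS_PORTS = 8                 # from the text: ports per FSS
FSS_PORTS_USED_IN_5160 = 6    # from the text: ports occupied in a 5160
POINT_TO_POINT_GBS = 6.4      # from the table: point-to-point bandwidth


def qbb_count(processors):
    """Number of QBBs needed to house the given number of processors."""
    return -(-processors // PROCESSORS_PER_QBB)  # ceiling division


def aggregate_bandwidth_gbs(processors):
    # Assumption (an observation, not a vendor formula): the aggregate
    # figures in the table equal QBB count x point-to-point bandwidth.
    return qbb_count(processors) * POINT_TO_POINT_GBS


free_ports = FSS_PORTS - FSS_PORTS_USED_IN_5160  # ports left per FSS
coupled_processors = 2 * 16  # two 16-processor 5160 systems coupled

print(qbb_count(8), qbb_count(16))
print(aggregate_bandwidth_gbs(8), aggregate_bandwidth_gbs(16))
print(free_ports, coupled_processors)
```

For the 5160 this gives 4 QBBs and 2 free ports per FSS, which is what allows two such systems to be coupled into the 32-processor configuration mentioned above.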
A distinctive feature of the NovaScale systems is that they can be partitioned so that different nodes run different operating systems, and that repartitioning can be done dynamically. Although this is not particularly enticing for HPC users, it might be interesting for other markets, especially as Bull still has clients that use its proprietary GCOS operating system.
A smaller system, the NovaScale 4040, with 4 processors, is also available as a departmental server. As Bull employs the Itanium 2, the Fortran 95 and C compilers from Intel are automatically available. Bull's documentation gives no information about other HPC software that might be available, but the systems should be able to run all third-party software that has been ported to the Itanium 2 platform.
In the spring of 2004 rather extensive benchmark experiments with the EuroBen Benchmark were performed on a 16-processor NovaScale 5160 with the 1.3 GHz variant of the processor. The MPI version of a dense matrix-vector multiply was found to run at 13.3 Gflop/s on 16 processors, while speeds of 3.3 to 3.4 Gflop/s were observed both for solving a dense linear system of size N = 1,000 and for a 1-D FFT of size N = 65,536.
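To put the matrix-vector result in perspective, it can be compared with the theoretical peak of the 1.3 GHz test machine. This is a rough sketch, again assuming 4 floating-point operations per cycle per Itanium 2 (an assumption, not a figure from the benchmark report); the names are hypothetical. The other two results are left out because the text does not state the processor count used for them.

```python
# Sketch: measured matrix-vector speed as a fraction of theoretical peak.
# Assumption: 4 floating-point operations per cycle per Itanium 2.
FLOPS_PER_CYCLE = 4


def peak_gflops(processors, clock_ghz):
    """Theoretical peak in Gflop/s for the given configuration."""
    return processors * clock_ghz * FLOPS_PER_CYCLE


def fraction_of_peak(measured_gflops, processors, clock_ghz):
    """Measured speed divided by theoretical peak (dimensionless)."""
    return measured_gflops / peak_gflops(processors, clock_ghz)


# Figures from the text: 16 processors at 1.3 GHz, 13.3 Gflop/s measured.
peak = peak_gflops(16, 1.3)
frac = fraction_of_peak(13.3, 16, 1.3)
print(f"peak: {peak:.1f} Gflop/s, matrix-vector: {100 * frac:.0f}% of peak")
```

Under this assumption the dense matrix-vector multiply reaches roughly 16% of the peak of the machine on which it was measured.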