| Machine type | Processor array |
|---|---|
| Front-end | Almost any Unix workstation |
| Operating system | Internal OS transparent to the user, Unix on front-end |
| Connection structure | 3-D mesh (see remarks) |
| Compilers | TAO: a Fortran 77 compiler with some Fortran 90 and some proprietary array extensions |
| Vendor's information Web page | http://www.quadrics.com |
| Year of introduction | 1999 |
| Clock frequency | 267 MHz |
| Theor. peak performance, per proc. (32-bit) | 533 Mflop/s |
| Theor. peak performance, maximal (32-bit) | 1 Tflop/s |
| Memory | ≤ 64 GB |
| No. of processors | 8-2048 |
The Apemille is a commercial spin-off of the APE-1000 project of the Italian National Institute for Nuclear Physics and a successor to the APE-100 systems. Systems are available in multiples of 8 processor nodes (one board), with up to 16 boards fitting into one crate; beyond that they grow in multiples of 128 nodes by adding up to 15 crates to the minimal 1-crate system, for a maximum of 2048 nodes. The interconnection topology of the Quadrics is a 3-D grid whose boundary nodes are also connected to the opposite sides, so in effect it is a 3-D torus. The 8-node floating-point boards (FPBs) plug into the crate backplane, which provides point-to-point communication and distributes global control. Each FPB is configured as a 2×2×2 cube and connected to the other boards so that together they form the 3-D grid.
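The torus wrap-around and the node counts above can be illustrated with a short Python sketch that computes the six nearest neighbours of a node. The (x, y, z) coordinate scheme and the 8×16×16 shape used in the example are assumptions for illustration only; the text does not say how the 2048 nodes are arranged along the three axes.

```python
# Sketch: node counts and nearest neighbours in a 3-D torus.
# Assumption: nodes are addressed by (x, y, z) coordinates on a grid of shape
# dims; the real Apemille node numbering is not described in the text.

NODES_PER_BOARD = 8     # one FPB is a 2x2x2 cube
BOARDS_PER_CRATE = 16
MAX_CRATES = 16         # the 1-crate system plus up to 15 more crates

nodes_per_crate = NODES_PER_BOARD * BOARDS_PER_CRATE  # 128
max_nodes = nodes_per_crate * MAX_CRATES              # 2048


def torus_neighbours(coord, dims):
    """Return the six nearest neighbours of coord on a torus of shape dims.

    The modulo wraps a step off one face back onto the opposite face,
    which is what turns the 3-D grid into a 3-D torus.
    """
    neighbours = []
    for axis in range(3):
        for step in (-1, +1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbours.append(tuple(n))
    return neighbours


# Hypothetical 8 x 16 x 16 arrangement of a full 2048-node system.
print(torus_neighbours((0, 0, 0), (8, 16, 16)))
# [(7, 0, 0), (1, 0, 0), (0, 15, 0), (0, 1, 0), (0, 0, 15), (0, 0, 1)]
```

Node (0, 0, 0) has neighbours at x = 7, y = 15 and z = 15 only because of the wrap-around links; in a plain 3-D mesh it would have just three neighbours.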
The basic floating-point processor, the so-called MAD chip, contains a register file of 128 registers. The first two registers permanently hold the values 0 and 1, so that every addition and every multiplication can be expressed as a "normal operation", i.e., a combined multiply-add a×b+c: a multiplication takes the form a×b+0 and an addition the form a×1+b. Under favourable circumstances the processor can therefore deliver two floating-point operations per cycle. The controller issues instructions centrally at a rate of one instruction every two clock cycles.
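A minimal Python sketch of the "normal operation" idea, assuming a plain list as a stand-in for the 128-entry register file (the class and method names are hypothetical): multiplication and addition are both routed through one multiply-add primitive by reading the constant registers 0 and 1. The last lines redo the peak-rate arithmetic from the table.

```python
# Sketch of the MAD "normal operation" r[dst] = r[a] * r[b] + r[c].
# Assumption: a Python list stands in for the 128-entry register file; only the
# convention that registers 0 and 1 hold the constants 0 and 1 is from the text.

REG_ZERO, REG_ONE = 0, 1


class MadRegisters:  # hypothetical name
    def __init__(self, size=128):
        self.r = [0.0] * size
        self.r[REG_ONE] = 1.0  # r[0] == 0.0 and r[1] == 1.0 permanently

    def normal(self, dst, a, b, c):
        """The single arithmetic primitive: r[dst] = r[a] * r[b] + r[c]."""
        if dst in (REG_ZERO, REG_ONE):
            raise ValueError("registers 0 and 1 hold read-only constants")
        self.r[dst] = self.r[a] * self.r[b] + self.r[c]

    def mul(self, dst, a, b):
        self.normal(dst, a, b, REG_ZERO)  # a * b + 0

    def add(self, dst, a, b):
        self.normal(dst, a, REG_ONE, b)   # a * 1 + b


regs = MadRegisters()
regs.r[2], regs.r[3] = 3.0, 4.0
regs.mul(4, 2, 3)   # r[4] = 3 * 4 + 0 = 12.0
regs.add(5, 2, 3)   # r[5] = 3 * 1 + 4 = 7.0

# Peak-rate arithmetic from the table: one multiply-add = 2 flops per cycle.
CLOCK_HZ = 267e6
peak_per_proc = 2 * CLOCK_HZ           # 534e6 flop/s
peak_system = 2048 * peak_per_proc     # about 1.09e12 flop/s
```

With the clock taken as exactly 267 MHz the per-processor figure comes out at 534 Mflop/s; the table's 533 Mflop/s presumably reflects rounding of the clock frequency. Multiplied by 2048 processors this gives about 1.09 Tflop/s, quoted as 1 Tflop/s.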
Communication is handled by the Memory Controller and the Communication Controller, both of which are housed on the backplane of a crate. When the Memory Controller generates an address, the Communication Controller decodes it; if the address refers to non-local memory, the Communication Controller performs the necessary data transfer. Neither the memory bandwidth per processor nor the bandwidth for non-local communication is given in the documentation. Regrettably, Quadrics provides no figures for local or global communication speeds at all.
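Because the actual address format is not documented, the following Python sketch only illustrates the decode-then-forward idea under a hypothetical layout: the upper bits of an address select a node and the lower `OFFSET_BITS` bits select a location in that node's memory. The 25-bit offset (32 MB per node, i.e. 64 GB spread over 2048 nodes) and the function name `decode` are assumptions, not Quadrics specifications.

```python
# Hypothetical address layout: the real Apemille address format is not
# documented. Assumed: upper bits = node number, lower OFFSET_BITS = offset
# within that node's memory.

OFFSET_BITS = 25  # hypothetical: 2**25 bytes = 32 MB per node


def decode(address, my_node):
    """Mimic the Communication Controller's local/non-local decision."""
    node = address >> OFFSET_BITS
    offset = address & ((1 << OFFSET_BITS) - 1)
    if node == my_node:
        return ("local", offset)          # served by the node's own memory
    return ("remote", node, offset)       # needs a transfer over the network


print(decode((5 << OFFSET_BITS) | 123, my_node=5))  # ('local', 123)
print(decode((6 << OFFSET_BITS) | 7, my_node=5))    # ('remote', 6, 7)
```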
The Apemille communicates with the front-end system via a PCI adapter card, so this link should have a bandwidth of about 100 MB/s, although the actual speed is not specified. Through this interface the front-end can read and write the memories of the nodes and of the Controller. According to the documentation, I/O should reach a bandwidth of up to 8.5 GB/s.
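To put the two bandwidth figures in perspective, here is a back-of-envelope Python calculation of how long it would take to move the contents of a fully configured 64 GB memory over each path, assuming decimal units and that the quoted rates are actually sustained.

```python
# Back-of-envelope transfer times for the two bandwidth figures in the text.
# Assumptions: decimal units (1 GB = 1000 MB) and fully sustained rates.

PCI_MB_PER_S = 100.0    # front-end link, "about 100 MB/s"
IO_MB_PER_S = 8500.0    # I/O, "up to 8.5 GB/s"


def transfer_seconds(megabytes, mb_per_s):
    return megabytes / mb_per_s


full_memory_mb = 64_000.0  # maximal 64 GB configuration
print(f"PCI: {transfer_seconds(full_memory_mb, PCI_MB_PER_S):.0f} s")  # PCI: 640 s
print(f"I/O: {transfer_seconds(full_memory_mb, IO_MB_PER_S):.1f} s")   # I/O: 7.5 s
```

Under these assumptions the front-end link needs over ten minutes for what the I/O path could do in under ten seconds.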
The TAO language has several extensions to exploit the SIMD features of the Quadrics. First, floating-point variables are assumed to be local to the processor that owns them, while integer variables are assumed to be global; local variables can be promoted to global ones. Further extensions are the ANY, ALL, and WHERE/END WHERE keywords, which can be used for global testing and control: processors that do not meet a global condition effectively skip the operation(s) associated with it. For easy referencing of nearest-neighbour locations, the special constants LEFT, RIGHT, UP, DOWN, FRONT, and BACK are provided. In addition, new data types and operators on these data types are supported, together with overloading of operators, which enables very concise code for certain types of calculations.
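The semantics of WHERE, ANY, ALL and the neighbour constants can be mimicked in Python, with one list element per processor. This is an emulation for illustration, not TAO syntax; the helper names (`where`, `any_`, `all_`, `from_left`, `from_right`), the 1-D periodic ring, and the direction convention for "left" are all assumptions.

```python
# A Python emulation (not TAO syntax) of SIMD masking and global tests.
# Each list element stands for the local value held by one processor;
# the 1-D periodic ring and the meaning of "left"/"right" are assumptions.


def where(mask, op, values):
    """Like WHERE ... END WHERE: processors whose mask is False skip op."""
    return [op(v) if m else v for v, m in zip(values, mask)]


def any_(mask):
    """Like ANY: a global condition, true if some processor's test holds."""
    return any(mask)


def all_(mask):
    """Like ALL: true only if every processor's test holds."""
    return all(mask)


def from_left(values):
    """Processor i reads the value of processor i - 1 (wrapping around)."""
    return values[-1:] + values[:-1]


def from_right(values):
    """Processor i reads the value of processor i + 1 (wrapping around)."""
    return values[1:] + values[:1]


x = [1.0, -2.0, 3.0, -4.0]          # one local value per processor
negative = [v < 0 for v in x]
if any_(negative):                   # global test: same answer everywhere
    x = where(negative, lambda v: -v, x)
print(x)                             # [1.0, 2.0, 3.0, 4.0]
print(all_([v > 0 for v in x]))      # True
print(from_left(x))                  # [4.0, 1.0, 2.0, 3.0]
```

Processors 0 and 2 hold non-negative values, so they skip the negation inside `where`, just as processors that fail a WHERE condition do nothing during the masked operation.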
Measured performances: No measured performances have been reported for this machine.