
The Cray Inc. XD1.

Machine type            Distributed-memory multi-processor.
Models                  XD1.
Operating system        Linux (kernel 2.4.21 with Cray HPC enhancements).
Connection structure    Variable (see remarks).
Compilers               Fortran 90, C, C++.
Vendor's information    Web page www.cray.com/products/xd1/
Year of introduction    2004.

System parameters:

Model                          Cray XD1
Clock cycle                    2.2 GHz
Theor. peak performance
  Per Chassis (see remarks)    52.8+ Gflop/s
  Per Rack (see remarks)       633.6+ Gflop/s
Memory
  Per Chassis                  96 GB
  Per Rack                     1.2 TB
No. of processors
  Per Chassis                  12
  Per Rack                     144
Communication bandwidth
  Point-to-point               ≤ 2.9 GB/s
  Aggregate per Chassis        96 GB/s

Remarks:

The Cray XD1 was originally developed by OctigaBay before that company was taken over by Cray. A distinctive feature of the OctigaBay systems was the possibility to add FPGAs (see Glossary) to the compute boards to accelerate algorithms of special interest to the user, such as massive FFTs or DNA sequence alignments. Hence the plus signs in the entries for the Theoretical Peak Performance in the System Parameters list above. Cray turned the system into a product by adding its proprietary RapidArray network, which connects the compute boards and the nodes (called “chassis” by Cray).

The general structure of an XD1 is as follows: one chassis houses up to 6 compute cards. Each compute card has 2 AMD Opterons at 2.2 GHz and one or two RapidArray Processors (RAPs) that handle the communication. The two Opterons on a card are connected via AMD's HyperTransport with a bandwidth of 3.2 GB/s, forming a 2-way SMP. Because of the high bandwidth of the HyperTransport bus, memory access does not suffer from using two processors on a board, unlike in most clusters with 2 processors per node. Optionally, an application acceleration processor (FPGA) can be put onto a compute board. With 2 RAPs per board a bandwidth of 8 GB/s (4 GB/s bi-directional) between boards is available via a RapidArray switch. This switch has 48 links, half of which are used to connect to the RAPs on the compute boards within the chassis, while the others can be used to connect to other chassis. Twelve chassis fit into a standard rack and, because of the number of free links per RapidArray switch, the chassis in two racks can be connected directly. Larger configurations can be built by connecting the links in a sparser network, like a 3-D torus or a fat tree.
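
The configuration arithmetic described above can be checked with a short sketch. The figures for cards, processors, clock frequency, chassis per rack and switch links are taken from the text. Two things are assumptions made here for illustration: the factor of 2 floating-point results per cycle per Opteron, which is not stated in the text but reproduces the quoted 52.8 Gflop/s per chassis, and the reading of "connected directly" as one link from every chassis to every other chassis. All function and constant names are hypothetical.

```python
# Configuration figures quoted in the text.
CARDS_PER_CHASSIS = 6
OPTERONS_PER_CARD = 2
CHASSIS_PER_RACK = 12
CLOCK_GHZ = 2.2
SWITCH_LINKS = 48

# Assumption (not stated in the text): each Opteron delivers
# 2 floating-point results per clock cycle.
FLOPS_PER_CYCLE = 2


def processors(chassis: int = 1) -> int:
    """Number of Opterons in the given number of fully populated chassis."""
    return chassis * CARDS_PER_CHASSIS * OPTERONS_PER_CARD


def peak_gflops(chassis: int = 1) -> float:
    """Theoretical peak in Gflop/s, excluding any FPGA contribution."""
    return processors(chassis) * CLOCK_GHZ * FLOPS_PER_CYCLE


def can_connect_directly(chassis: int) -> bool:
    """True if every chassis can have one link to every other chassis.

    Half of the switch links go to the RAPs inside the chassis; the
    other half is free for links to other chassis.
    """
    free_links = SWITCH_LINKS // 2
    return chassis - 1 <= free_links


if __name__ == "__main__":
    print(f"Processors per chassis: {processors(1)}")
    print(f"Processors per rack:    {processors(CHASSIS_PER_RACK)}")
    print(f"Peak per chassis:       {peak_gflops(1):.1f} Gflop/s")
    print(f"Peak per rack:          {peak_gflops(CHASSIS_PER_RACK):.1f} Gflop/s")
    print(f"Two racks fully linked: {can_connect_directly(2 * CHASSIS_PER_RACK)}")
```

Under these assumptions the 24 chassis of two racks each need 23 links to the others, which fits within the 24 free links per switch. The FPGAs are left out of the peak figure because their contribution depends on the algorithm that is implemented on them.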

The RAPs relieve the Opteron processors of communication tasks and have hardware support for MPI, Cray-style shmem and Global Arrays (a virtual shared memory system). The communication characteristics for MPI via the RapidArray network as stated by Cray are impressive: a bandwidth of 2.9 GB/s for long messages and a latency of 1.6 µs for small messages.
An extra feature of the Cray-enhanced Linux OS is the synchronisation of tasks in the system. Random scheduling of tasks within the system (by the OS or otherwise) can cause large latencies (see [29]) that may be detrimental to MPI performance. Task synchronisation avoids this problem.
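
To put the MPI figures quoted by Cray into perspective, the sketch below uses the common linear model in which the transfer time of a message is the latency plus the message length divided by the asymptotic bandwidth. This model is an assumption made here for illustration, not a characterisation given by Cray, and it takes 1 GB/s to mean 10^9 bytes/s. The function names are hypothetical.

```python
# Figures as stated by Cray for MPI over the RapidArray network.
LATENCY_S = 1.6e-6       # latency for small messages, in seconds
BANDWIDTH_BPS = 2.9e9    # bandwidth for long messages, in bytes/s (assumes GB = 10^9 bytes)


def transfer_time(nbytes: float) -> float:
    """Estimated time in seconds to send nbytes under the linear model."""
    return LATENCY_S + nbytes / BANDWIDTH_BPS


def effective_bandwidth(nbytes: float) -> float:
    """Bandwidth in bytes/s actually achieved for a message of nbytes."""
    return nbytes / transfer_time(nbytes)


def half_performance_length() -> float:
    """Message length in bytes at which half the asymptotic bandwidth is reached."""
    return LATENCY_S * BANDWIDTH_BPS


if __name__ == "__main__":
    for n in (8, 1024, 65536, 1 << 20):
        print(f"{n:>8} bytes: {effective_bandwidth(n) / 1e9:.3f} GB/s")
    print(f"Half-performance length: {half_performance_length():.0f} bytes")
```

In this model, messages of a few KB or less are dominated by the latency, which is why a low small-message latency matters as much as the bandwidth quoted for long messages.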

Measured Performances: The Cray XD1 is quite new and as yet no independent performance results are available.





Aad van der Steen
Tue Oct 12 16:26:22 CEST 2004