|Machine type||Distributed-memory multi-processor.|
|Operating system||Linux (kernel 2.4.21 with Cray HPC enhancements).|
|Connection structure||Variable (see remarks).|
|Compilers||Fortran 90, C, C++.|
|Vendor's information Web page||www.cray.com/products/xd1/|
|Year of introduction||2004.|
|Clock cycle||2.2 GHz|
|Theor. peak performance|
|Per Chassis (see remarks)||52.8+ Gflop/s|
|Per Rack (see remarks)||633+ Gflop/s|
|Memory|
|Per Chassis||96 GB|
|Per Rack||1.2 TB|
|No. of processors|
|Per Chassis||12|
|Per Rack||144|
|Communication bandwidth|
|Point-to-point||≤ 2.9 GB/s|
|Aggregate per Chassis||96 GB/s|
The Cray XD1 was originally developed by OctigaBay, a company that was subsequently taken over by Cray. A distinctive feature of the OctigaBay systems was the option to add FPGAs (see Glossary) to the compute boards to accelerate algorithms of special interest to the user, such as massive FFTs or DNA sequence alignments. Hence the plus symbols in the Theoretical Peak Performance entries in the System Parameters list above. Cray turned the system into a product by adding its proprietary RapidArray network, which connects the compute boards within a node (called a “chassis” by Cray) and the chassis to each other.
The general structure of an XD1 is as follows: one chassis houses up to 6 compute cards. Each compute card has 2 AMD Opterons at 2.2 GHz and one or two RapidArray Processors (RAPs) that handle the communication. The two Opterons on a card are connected via AMD's HyperTransport with a bandwidth of 3.2 GB/s, forming a 2-way SMP. Because of the high bandwidth of the HyperTransport bus, memory access does not suffer from having two processors on a board, unlike in most clusters with 2 processors per node. Optionally, an application acceleration processor (FPGA) can be put onto a compute board. With 2 RAPs per board, a bandwidth of 8 GB/s (4 GB/s in each direction) between boards is available via a RapidArray switch. This switch has 48 links, of which half are used to connect to the RAPs on the compute boards within the chassis, while the others can be used to connect to other chassis. Twelve chassis fit into a standard rack and, because of the number of free links per RapidArray switch, the chassis in two racks can be connected directly. Larger configurations can be put together by connecting the links in a more sparsely connected network, like a 3-D torus or a fat tree.
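The peak-performance and processor-count figures in the System Parameters list can be retraced from the structure described above. The following is a minimal sketch of that arithmetic. The card, chassis, rack, clock and link counts come from the text; the value of 2 floating-point operations per clock cycle per Opteron is an assumption that is not stated here, and all names are illustrative only.

```python
# Back-of-the-envelope check of the XD1 system parameters.
CLOCK_GHZ = 2.2          # Opteron clock frequency (from the text)
CPUS_PER_CARD = 2        # Opterons per compute card (from the text)
CARDS_PER_CHASSIS = 6    # maximum compute cards per chassis (from the text)
CHASSIS_PER_RACK = 12    # chassis per standard rack (from the text)
SWITCH_LINKS = 48        # links on one RapidArray switch (from the text)
FLOPS_PER_CYCLE = 2      # ASSUMPTION: flops per cycle per Opteron


def peak_gflops(n_cpus):
    """Theoretical peak in Gflop/s for n_cpus processors, FPGAs excluded."""
    return n_cpus * CLOCK_GHZ * FLOPS_PER_CYCLE


cpus_per_chassis = CPUS_PER_CARD * CARDS_PER_CHASSIS
cpus_per_rack = cpus_per_chassis * CHASSIS_PER_RACK

# Half of the switch links serve the RAPs inside the chassis,
# the remainder is free for connections to other chassis.
internal_links = SWITCH_LINKS // 2
external_links = SWITCH_LINKS - internal_links

print(f"CPUs per chassis: {cpus_per_chassis}")
print(f"CPUs per rack:    {cpus_per_rack}")
print(f"Peak per chassis: {peak_gflops(cpus_per_chassis):.1f} Gflop/s")
print(f"Peak per rack:    {peak_gflops(cpus_per_rack):.1f} Gflop/s")
print(f"Links to other chassis per switch: {external_links}")
```

Under these assumptions a fully populated chassis holds 12 processors and a rack 144. Any FPGA contribution comes on top of the processor-only peak, which is what the plus symbols in the list indicate.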
The RAPs offload the Opteron processors from communication tasks and have
hardware support for MPI, Cray-style shmem and Global Arrays (a
virtual shared memory system). The communication characteristics for MPI via
the RapidArray network as stated by Cray are impressive: 2.9 GB/s for long
messages and a 1.6 µs latency for small messages.
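To put the two quoted figures in perspective, one can use the common linear model in which the time to send a message is the latency plus the message length divided by the asymptotic bandwidth. The sketch below applies that model to the values stated by Cray. The model itself is a simplification chosen here for illustration, not something Cray or the text prescribes, and the function names are hypothetical.

```python
# Simple latency/bandwidth model for point-to-point messages.
LATENCY_S = 1.6e-6        # small-message latency stated by Cray
BANDWIDTH_BPS = 2.9e9     # long-message bandwidth stated by Cray (bytes/s)


def transfer_time(nbytes):
    """Modelled time in seconds to transfer a message of nbytes bytes."""
    return LATENCY_S + nbytes / BANDWIDTH_BPS


def effective_bandwidth(nbytes):
    """Modelled bandwidth in bytes/s actually achieved for nbytes bytes."""
    return nbytes / transfer_time(nbytes)


def half_performance_length():
    """Message length in bytes at which half the peak bandwidth is reached."""
    return LATENCY_S * BANDWIDTH_BPS


for size in (64, 1024, 65536, 1048576):
    gbps = effective_bandwidth(size) / 1e9
    print(f"{size:>8} bytes: {gbps:.2f} GB/s")

print(f"Half-performance length: {half_performance_length():.0f} bytes")
```

In this model, messages of roughly 4.6 KB already reach half of the stated long-message bandwidth, which illustrates why the low latency matters as much as the peak bandwidth.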
An extra feature of the Cray-enhanced Linux OS is the synchronisation of tasks in the system. Random scheduling of tasks within the system (by the OS or otherwise) can result in large latencies that may be detrimental to MPI performance. Task synchronisation avoids this problem.
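Why synchronising such tasks helps can be illustrated with a deliberately simplified model. Assume every process performs a fixed amount of work per step, then waits for all others, and is interrupted by a system task with some probability per step. If the interruptions are unsynchronised, the step is delayed whenever any one process is hit. If they are synchronised, all processes are hit at the same moment. All parameters and names below are assumptions made for this sketch and do not describe the actual Cray implementation.

```python
# Expected time per step of a bulk-synchronous program under system noise.
def expected_step_time(work_s, delay_s, p_interrupt, n_procs, synchronised):
    """Expected duration of one compute step followed by a barrier.

    work_s       -- useful work per step in seconds (assumed value)
    delay_s      -- duration of one interruption in seconds (assumed value)
    p_interrupt  -- probability that a process is interrupted in a step
    n_procs      -- number of processes taking part in the barrier
    synchronised -- True if interruptions coincide on all processes
    """
    if synchronised:
        # All processes are interrupted together or not at all.
        p_delayed = p_interrupt
    else:
        # The step is delayed if at least one process is interrupted.
        p_delayed = 1.0 - (1.0 - p_interrupt) ** n_procs
    return work_s + delay_s * p_delayed


# Hypothetical numbers: 100 us of work, 50 us interruptions, 1% chance.
for n in (2, 12, 144):
    t_free = expected_step_time(100e-6, 50e-6, 0.01, n, synchronised=False)
    t_sync = expected_step_time(100e-6, 50e-6, 0.01, n, synchronised=True)
    print(f"{n:>4} procs: unsynchronised {t_free * 1e6:.1f} us, "
          f"synchronised {t_sync * 1e6:.1f} us")
```

In this model the unsynchronised penalty grows with the number of processes, while the synchronised penalty stays constant, which is the effect the task synchronisation is meant to exploit.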
Measured Performances: The Cray XD1 is quite new and as yet no independent performance results are available.