|Machine type||RISC-based shared-memory multi-processor|
|Operating system||Solaris (Sun's Unix flavour)|
|Connection structure||Crossbar (see remarks)|
|Compilers||Fortran 77, Fortran 90, C, C++|
|Vendor's information Web page||http://www.sun.com/servers/highend/sunfire_e25k/index.xml|
|Year of introduction||2004|
|Clock cycle||1.2 GHz|
|Theor. peak performance|
|Per proc. (64-bit)||4.8 Gflop/s|
|Maximal (64-bit)||345.6 Gflop/s|
|Main memory||≤ 576 GB|
|Number of processors||≤ 72|
|Aggregate crossbar bandwidth||≤ 172.8 GB/s|
The E25K is the successor of the 3800-15K system. It employs the newest UltraSPARC IV processors at a clock frequency of 1.2 GHz. The structure of the system is exactly the same as that of its predecessor. In fact, the backplane is identical and one can turn one system into the other by just exchanging the processor boards. The processor/memory boards are plugged into a backplane that is an 18×18 flat crossbar. Each board contains four 1.2 GHz UltraSPARC IV processors and a maximum of 32 GB of memory, so the maximum number of processors is 72 and the maximum amount of memory is 576 GB. Whereas the 3800-15K allows slots of an additional crossbar to be used for extra 2-CPU boards, extending the computational power at the cost of I/O capacity, this possibility is not offered for the E25K system.
Because the crossbar is flat, memory access is uniform. The aggregate bandwidth of the crossbar is 172.8 GB/s, which is equivalent to 2.4 GB/s per processor or, at 1.2 GHz, 2 B/cycle. An 8-byte operand therefore needs 4 cycles to be shipped to the processor.
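The bandwidth figures above follow from simple division. The short sketch below reproduces them from the numbers quoted in the text; the function and variable names are illustrative only and do not come from any Sun tool.

```python
# Figures quoted in the text above.
AGGREGATE_BANDWIDTH_GB_S = 172.8  # aggregate crossbar bandwidth
NUM_PROCESSORS = 72               # 18 boards x 4 processors
CLOCK_GHZ = 1.2                   # UltraSPARC IV clock frequency
OPERAND_BYTES = 8                 # one 64-bit operand


def crossbar_figures(aggregate_gb_s, processors, clock_ghz, operand_bytes):
    """Return (GB/s per processor, bytes per cycle, cycles per operand)."""
    per_processor = aggregate_gb_s / processors
    # GB/s divided by Gcycles/s gives bytes per clock cycle.
    bytes_per_cycle = per_processor / clock_ghz
    cycles_per_operand = operand_bytes / bytes_per_cycle
    return per_processor, bytes_per_cycle, cycles_per_operand


per_proc, b_per_cycle, cycles = crossbar_figures(
    AGGREGATE_BANDWIDTH_GB_S, NUM_PROCESSORS, CLOCK_GHZ, OPERAND_BYTES
)
print(f"{per_proc:.1f} GB/s per processor")
print(f"{b_per_cycle:.1f} B/cycle")
print(f"{cycles:.1f} cycles per 8-byte operand")
```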
The main (almost only) difference from its predecessor is that the E25K system
employs the dual-core UltraSPARC IV processors.
This formally doubles the peak performance per processor, although because of
bandwidth constraints the effective performance increase will be lower. Sun
maintains a somewhat confusing terminology with respect to multi-threading: it
uses the term to express the fact that the two processor cores on an UltraSPARC
IV chip each do their processing independently. It calls this chip
multi-threading (CMT). Yet the cores are not capable of switching between
process threads, which is what multi-threading is normally understood to mean.
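As a rough check on the peak figures in the table, the sketch below derives the per-core rate from the quoted per-processor peak and scales it to the full system. The two-cores-per-processor value comes from the text; the flops-per-cycle value is only what the table figures imply, not a number stated by Sun, and the names are illustrative.

```python
# Figures quoted in the table and text above.
CLOCK_GHZ = 1.2                  # clock frequency
CORES_PER_PROCESSOR = 2          # dual-core UltraSPARC IV
PEAK_PER_PROCESSOR_GFLOPS = 4.8  # theoretical peak per processor (64-bit)
MAX_PROCESSORS = 72              # maximum configuration


def flops_per_cycle_per_core(peak_gflops, cores, clock_ghz):
    """Floating-point results per cycle per core implied by the peak."""
    return peak_gflops / (cores * clock_ghz)


def system_peak(peak_per_processor_gflops, processors):
    """Theoretical peak of the whole system in Gflop/s."""
    return peak_per_processor_gflops * processors


per_core_rate = flops_per_cycle_per_core(
    PEAK_PER_PROCESSOR_GFLOPS, CORES_PER_PROCESSOR, CLOCK_GHZ
)
# Peak of one core alone: half the dual-core figure.
single_core_peak = PEAK_PER_PROCESSOR_GFLOPS / CORES_PER_PROCESSOR
total_peak = system_peak(PEAK_PER_PROCESSOR_GFLOPS, MAX_PROCESSORS)

print(f"{per_core_rate:.1f} flop/cycle per core (implied)")
print(f"{single_core_peak:.1f} Gflop/s for a single core")
print(f"{total_peak:.1f} Gflop/s for {MAX_PROCESSORS} processors")
```

The system total matches the 345.6 Gflop/s in the table. It is a theoretical bound only, since the text notes that bandwidth constraints limit the effective gain from the second core.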
The E25K is a typical SMP machine with provisions for shared-memory parallelism with OpenMP over the full range of processors in the system.
A speed of 891.4 Gflop/s has been reported for a 7-way cluster with a total of 672 processors in solving a dense linear system of unspecified size. The efficiency for this problem is 74%.
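Efficiency here is the measured speed divided by the theoretical peak, so the quoted figures imply a peak for the cluster. The sketch below does that back-calculation. It assumes the 74% figure is rounded, so the results are approximate, and the names are illustrative.

```python
# Figures quoted in the text above.
MEASURED_GFLOPS = 891.4  # reported speed for the dense linear system
EFFICIENCY = 0.74        # reported efficiency (assumed to be rounded)
PROCESSORS = 672         # total processors in the 7-way cluster


def implied_peak(measured_gflops, efficiency):
    """Theoretical peak implied by efficiency = measured / peak."""
    return measured_gflops / efficiency


cluster_peak = implied_peak(MEASURED_GFLOPS, EFFICIENCY)
peak_per_processor = cluster_peak / PROCESSORS

print(f"Implied cluster peak: {cluster_peak:.0f} Gflop/s")
print(f"Implied peak per processor: {peak_per_processor:.2f} Gflop/s")
```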