The HP/Convex Exemplar SPP-2000.

Machine type	RISC-based distributed-memory multi-processor
Models	SPP-2000
Operating system	SPP-UX, based on OSF/1 AD microkernel
Connection structure	Ring
Compilers	Fortran, C
Vendor's information Web page	http://www.hp.com/go/techservers

System parameters:

Model	SPP-2000K	SPP-2000S	SPP-2000X
Clock cycle	5.55 ns	5.55 ns	5.55 ns
Theor. peak performance
Per proc. (64-bit)	720 Mflop/s	720 Mflop/s	720 Mflop/s
Maximal (64-bit)	2.9 Gflop/s	11.5 Gflop/s	46.1 Gflop/s
Memory/node	<=1 GB	<=1 GB	<=1 GB
Main memory	<=4 GB	<=16 GB	<=64 GB
Communication bandwidth
aggregate (see remarks)	3.84 GB/s	15.4 GB/s	15.4/3.84 GB/s
No. of processors	1-4	4-16	16-64

Remarks:

The SPP-2000 systems are the successors of the SPP-1200/1600 family and differ significantly from that generation. The SPP-2000K and SPP-2000S are shared-memory machines that connect at most 4 and 16 PA-RISC 8000 processors, respectively, by a crossbar. Each processor has a peak performance of 720 Mflop/s. Because the processors feature out-of-order execution of instructions, memory latency can be expected to be hidden or reduced in many cases, which should make the impact of cache misses much less severe. The data and instruction caches are large (1 MB each), which also helps to minimise cache misses.
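
The peak figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch; the value of 4 floating-point results per clock cycle is an assumption inferred from 720 Mflop/s at a 5.55 ns cycle and is not stated in the text, and the function names are purely illustrative.

```python
# Values taken from the system parameters table.
CLOCK_CYCLE_NS = 5.55
PEAK_PER_PROC_MFLOPS = 720.0

# Assumption (not stated in the text): 4 floating-point results per cycle.
FLOPS_PER_CYCLE = 4


def clock_mhz(cycle_ns):
    """Convert a clock cycle time in ns to a clock frequency in MHz."""
    return 1000.0 / cycle_ns


def peak_per_proc_mflops(cycle_ns, flops_per_cycle):
    """Theoretical peak of one processor in Mflop/s."""
    return clock_mhz(cycle_ns) * flops_per_cycle


def peak_system_gflops(n_procs, per_proc_mflops=PEAK_PER_PROC_MFLOPS):
    """Theoretical peak of a system with n_procs processors in Gflop/s."""
    return n_procs * per_proc_mflops / 1000.0


if __name__ == "__main__":
    print(round(peak_per_proc_mflops(CLOCK_CYCLE_NS, FLOPS_PER_CYCLE), 1))
    for n in (4, 16, 64):
        print(n, peak_system_gflops(n))
```

Under this assumption the per-processor figure comes out slightly above 720 Mflop/s because 5.55 ns is a rounded cycle time.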

One SPP-2000S can be viewed as the successor of a hypernode in the earlier SPP-1200/SPP-1600 systems. The number of processors within a hypernode has doubled, and the amount of memory per system has increased 8-fold, from 8 × 256 MB to 16 × 1 GB. The internal aggregate bandwidth is 15.36 GB/s for the 2000S and 3.84 GB/s for the 2000K. I/O can be done at an aggregate rate of 960 MB/s.

As in the earlier SPP-1200/1600 systems, the hypernodes are connected by uni-directional SCI rings with an aggregate bandwidth of 3.84 GB/s. This makes the SPP-2000X a NUMA machine when it is operated in a shared-memory fashion.
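
To give a rough feeling for the two bandwidth levels, the sketch below compares the aggregate figures quoted in these remarks. It is simple arithmetic under a strong assumption: bandwidth is divided evenly over the processors, which ignores latency and actual traffic patterns. The names are illustrative only.

```python
# Aggregate bandwidths quoted in the remarks (GB/s).
HYPERNODE_BW_GBS = 15.36  # inside one SPP-2000S (crossbar)
RING_BW_GBS = 3.84        # SCI rings between hypernodes
PROCS_PER_HYPERNODE = 16


def per_proc_share(aggregate_gbs, n_procs):
    """Even share of an aggregate bandwidth per processor (a simplification)."""
    return aggregate_gbs / n_procs


def local_to_remote_ratio(local_gbs, remote_gbs):
    """Ratio of intra-hypernode to inter-hypernode aggregate bandwidth."""
    return local_gbs / remote_gbs


if __name__ == "__main__":
    print(per_proc_share(HYPERNODE_BW_GBS, PROCS_PER_HYPERNODE))
    print(local_to_remote_ratio(HYPERNODE_BW_GBS, RING_BW_GBS))
```

The factor between the two aggregate figures is one simple indicator of why data placement matters on the SPP-2000X; it says nothing about the latency difference, which is not given here.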

The Exemplar programming environment that was available for the SPP-1200/1600 carries over to the SPP-2000K/S/X without changes. This environment includes a message-passing programming model (PVM) and a virtual shared-memory model that gives the user a shared-memory view of the system. The shared-memory model is not surprising for a symmetric multiprocessor machine like the SPP-2000S, but it remains valid in the SPP-2000X systems, which effectively cluster four SPP-2000S systems.
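
The difference between the two programming models can be illustrated with a small analogy. The sketch below uses Python threads and queues only; it does not use PVM or any Exemplar interface, and all names in it are hypothetical. In the message-passing variant each worker receives its own copy of the data and sends back a partial result; in the shared-memory variant all workers read the same array directly.

```python
import queue
import threading


def message_passing_sum(data, n_workers):
    """Workers receive a private copy of their chunk and send back a partial sum."""
    results = queue.Queue()

    def worker(inbox):
        chunk = inbox.get()      # "receive" the work as a message
        results.put(sum(chunk))  # "send" the partial result back

    threads = []
    for rank in range(n_workers):
        inbox = queue.Queue()
        inbox.put(list(data[rank::n_workers]))  # explicit copy acts as the message
        t = threading.Thread(target=worker, args=(inbox,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(n_workers))


def shared_memory_sum(data, n_workers):
    """Workers read the shared data in place and write into a shared result array."""
    partial = [0] * n_workers

    def worker(rank):
        total = 0
        for i in range(rank, len(data), n_workers):
            total += data[i]     # direct access to the shared data
        partial[rank] = total    # each worker writes only its own slot

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


if __name__ == "__main__":
    values = list(range(100))
    print(message_passing_sum(values, 4), shared_memory_sum(values, 4))
```

Both variants compute the same sum; the point is only where the data lives and who moves it.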

Measured Performances: In [4] a speed of 7.8 Gflop/s is reported for a 16-processor system when solving a dense linear system of order 13,320. For the EuroBen mod2a matrix-vector multiplication benchmark a speed of 417 Mflop/s is found on 16 processors. This is, however, for straight Fortran 77 code with PVM and without the use of library routines.
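
These measurements can be related to the theoretical peak with a short calculation. The sketch below assumes that both results were obtained on 16 processors with a peak of 720 Mflop/s each, as in the system parameters; the function name is illustrative.

```python
PEAK_PER_PROC_GFLOPS = 0.72  # 720 Mflop/s per processor, from the table


def efficiency(measured_gflops, n_procs, peak_per_proc=PEAK_PER_PROC_GFLOPS):
    """Fraction of the theoretical peak reached by a measured speed."""
    return measured_gflops / (n_procs * peak_per_proc)


if __name__ == "__main__":
    # Dense linear system of order 13,320 on 16 processors.
    print(round(100 * efficiency(7.8, 16), 1), "% of peak")
    # EuroBen mod2a on 16 processors (417 Mflop/s = 0.417 Gflop/s).
    print(round(100 * efficiency(0.417, 16), 1), "% of peak")
```

Under these assumptions the linear solver reaches roughly 68% of peak, while the plain Fortran 77 mod2a code with PVM reaches roughly 3.6%.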

Aad van der Steen
Tue Mar 4 16:23:28 MET 1997