The first company to announce a system fulfilling all requirements of the architecture is NEC Corporation, with its new SX-4 series. The SX-4 processor is upward compatible with the SX-3R vector processor and adds enhancements for scalar processing, short vector processing, and parallel processing. Each processor has a peak performance of 2 Gflop/s. From 1 to 32 of these processors are combined into a single shared memory node with uniform memory access speed. The shared memory is built from very fast synchronous SRAM components, which allow a bank cycle rate of 62.5 MHz (a 16 ns bank cycle time). This means that the shared memory can sustain 512 GByte/s, or 64 GWord/s (8-byte words), with only 256 memory banks. Conflict-free unit-stride as well as stride-2 access is therefore guaranteed from all 32 processors simultaneously. Higher strides and list vector access benefit from the very short bank cycle time. Up to 16 of these shared memory processing nodes can be combined through a fiber optic crossbar with a bisection bandwidth of 128 GByte/s. A full SX-4 configuration then consists of 512 processors with a total memory bandwidth of more than 8 TByte/s, namely 8 TByte/s from node memories to arithmetic pipelines, 128 GByte/s bisection bandwidth to other node memories, and 192 GByte/s from node memories to I/O. The internode crossbar supports global hardware addressing. The hardware specifications suggest that such a full configuration would be a good candidate to sustain a processing power close to 1 Tflop/s. The system is announced as a parallel computer with a multithreaded Unix operating system and compilers that allow parallel processing on shared and distributed memory in a multi-user environment.
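The aggregate figures quoted above follow from the per-processor and per-node numbers by simple multiplication. The short Python sketch below reproduces that arithmetic; the constant and function names are illustrative choices and are not taken from NEC documentation. Results are kept in Gflop/s and GByte/s so that no assumption about decimal or binary prefixes is needed.

```python
# Figures quoted in the text; all names here are illustrative.
PEAK_GFLOPS_PER_CPU = 2        # peak Gflop/s per SX-4 processor
CPUS_PER_NODE = 32             # maximum processors per shared memory node
MAX_NODES = 16                 # maximum nodes on the internode crossbar
NODE_MEM_GBYTE_PER_S = 512     # sustained memory bandwidth per node
WORD_BYTES = 8                 # one word is 8 bytes
BISECTION_GBYTE_PER_S = 128    # internode crossbar bisection bandwidth
IO_GBYTE_PER_S = 192           # node memories to I/O


def full_configuration():
    """Return the aggregate figures for a maximal 16-node system."""
    processors = CPUS_PER_NODE * MAX_NODES
    node_memory = NODE_MEM_GBYTE_PER_S * MAX_NODES
    return {
        "processors": processors,
        "peak_gflops": processors * PEAK_GFLOPS_PER_CPU,
        "node_gword_per_s": NODE_MEM_GBYTE_PER_S // WORD_BYTES,
        "node_memory_gbyte_per_s": node_memory,
        "total_gbyte_per_s": node_memory
        + BISECTION_GBYTE_PER_S
        + IO_GBYTE_PER_S,
    }


for name, value in full_configuration().items():
    print(f"{name}: {value}")
```

The peak of 1024 Gflop/s for 512 processors is the basis for the estimate of close to 1 Tflop/s, and the 8192 GByte/s of node memory bandwidth plus the crossbar and I/O paths gives the total of more than 8 TByte/s.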