MPI includes variants of each of the reduce operations where the result is scattered to all processes in the group on return.
MPI_REDUCE_SCATTER( sendbuf, recvbuf, recvcounts,
datatype, op, comm)
[ IN sendbuf] starting address of send buffer (choice)
[ OUT recvbuf] starting address of receive buffer (choice)
[ IN recvcounts] integer array specifying the number of elements in result distributed to each process. Array must be identical on all calling processes.
[ IN datatype] data type of elements of input buffer (handle)
[ IN op] operation (handle)
[ IN comm] communicator (handle)
int MPI_Reduce_scatter(void* sendbuf, void* recvbuf, int *recvcounts, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
MPI_REDUCE_SCATTER(SENDBUF, RECVBUF, RECVCOUNTS, DATATYPE, OP, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER RECVCOUNTS(*), DATATYPE, OP, COMM, IERROR
MPI_REDUCE_SCATTER first does an element-wise reduction on the vector of count = the sum of recvcounts[i] elements in the send buffer defined by sendbuf, count and datatype. Next, the resulting vector of results is split into n disjoint segments, where n is the number of members in the group. Segment i contains recvcounts[i] elements. The ith segment is sent to process i and stored in the receive buffer defined by recvbuf, recvcounts[i] and datatype.
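As a minimal sketch of this behavior (assuming a working MPI installation; the buffer names and the uneven segment sizes are illustrative, not mandated), every process contributes a vector of ones, so after a sum reduction each received element equals the group size:

```c
/* Sketch: each of the n processes contributes a vector of ones.
 * After MPI_Reduce_scatter with MPI_SUM, process i receives the ith
 * segment of recvcounts[i] elements, each equal to n.
 * Compile with mpicc and run under mpirun. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Give process i a segment of i+1 elements; an uneven split is allowed,
     * but recvcounts must be identical on all calling processes. */
    int *recvcounts = malloc(size * sizeof(int));
    int count = 0;                         /* count = sum of recvcounts[i] */
    for (int i = 0; i < size; i++) {
        recvcounts[i] = i + 1;
        count += recvcounts[i];
    }

    /* Every process supplies the full input vector of `count` ones. */
    int *sendbuf = malloc(count * sizeof(int));
    for (int i = 0; i < count; i++)
        sendbuf[i] = 1;
    int *recvbuf = malloc(recvcounts[rank] * sizeof(int));

    MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts,
                       MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* Each received element is the element-wise sum over all processes. */
    for (int j = 0; j < recvcounts[rank]; j++)
        assert(recvbuf[j] == size);
    printf("rank %d received %d elements, each equal to %d\n",
           rank, recvcounts[rank], size);

    free(sendbuf); free(recvbuf); free(recvcounts);
    MPI_Finalize();
    return 0;
}
</imports>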
Advice to implementors.
The MPI_REDUCE_SCATTER routine is functionally equivalent to an MPI_REDUCE operation with count equal to the sum of recvcounts[i], followed by MPI_SCATTERV with sendcounts equal to recvcounts. However, a direct implementation may run faster.
( End of advice to implementors.)