In plasma PIC codes, the orbits of the many interacting plasma electrons and ions are followed as an initial value problem as the particles move in self-consistently calculated electromagnetic fields. The fields are found by solving Maxwell's equations, or a subset, with the plasma currents and charge density as source terms; the electromagnetic fields determine the forces on the particles. In a PIC code, the particles can be anywhere in the simulation domain, but the field equations are solved on a discrete grid. At each time step in a PIC code, there are two stages to the computation. In the first stage, the position and velocities of the particles are updated by calculating the forces on the particles from interpolation of the field values at the grid points; the new charge and current densities at the grid points are then calculated by interpolation from the new positions and velocities of the particles. In the second stage, the updated fields are found by solving the field equations on the grid using the new charge and current densities. Generally, the first stage accounts for most of the computation time because there are many more particles than grid points.

The GCPIC algorithm [Liewer:89c] is designed to make the most computationally intensive portion of a PIC code, which updates the particles and the resulting charge and current densities, run efficiently on a parallel processor. The time used to make these updates is generally on the order of 90% of the total time for a sequential code, with the remaining time divided between the electromagnetic field solution and the diagnostic computations.

To implement a PIC code in parallel using the GCPIC algorithm, the physical domain of the particle simulation is partitioned into subdomains, equal in number to the number of processors, such that all subdomains have roughly equal numbers of particles. For problems with nonuniform particle densities, these subdomains will be of unequal physical size. Each processor is assigned a subdomain and is responsible for storing the particles and the electromagnetic field quantities for its subdomain and for performing the particle computations for its particles. For a one-dimensional code on a hypercube, nearest-neighbor subdomains are assigned to nearest-neighbor processors. When particles move to new subdomains, they are passed to the appropriate processor. As long as the number of particles per subdomain is approximately equal, the processors' computational loads will be balanced. Dynamic load balancing is accomplished by repartitioning the simulations domain into subdomains with roughly equal particle numbers when the processor loads become sufficiently unbalanced. The computation of the new partitions, done in a simple way using a crude approximation to the plasma density profile, adds very little overhead to the parallel code.

The decomposition used for dividing the particles is termed the *primary decomposition*. Because the primary decomposition is not
generally the optimum one for the field solution on the grid, a *secondary decomposition* is used to divide the field computation.
The secondary decomposition remains fixed. At each time step, grid
data must be transferred between the two decompositions
[Ferraro:90b;93a],
[Liewer:89c].

The GCPIC algorithm has led to a very efficient parallel implementation of the benchmarked one-dimensional electrostatic PIC code [Liewer:89c]. In this electrostatic code, only forces from self-consistent (and external) electric fields are included; neither an external nor a plasma-generated magnetic field is included.

Wed Mar 1 10:19:35 EST 1995