From owner-pbwg-compactapp@CS.UTK.EDU Fri May 21 08:42:24 1993
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-UTK)
	id AA03711; Fri, 21 May 93 08:42:24 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA15863; Fri, 21 May 93 08:42:58 -0400
X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Fri, 21 May 1993 08:42:58 EDT
Errors-To: owner-pbwg-compactapp@CS.UTK.EDU
Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA15855; Fri, 21 May 93 08:42:56 -0400
Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA18681; Fri, 21 May 1993 08:42:55 -0400
Date: Fri, 21 May 1993 08:42:55 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9305211242.AA18681@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: Compact applications


Dear Compact Applications People,

At last I have roughed out some notes on compact applications to serve as
a basis for discussion at next week's meeting in Knoxville. See you there,

David
------------------ Latex file below --------------------------------
%file: compac2.tex
\chapter{Compact Applications}
\footnote{assembled by David Walker for Compact Applications subcommittee}

\section{Introduction}
\label{sec:compact.intro}
While kernel applications, such as those described in Chapter 4, provide
a fairly straightforward way of assessing the performance of parallel
systems, they are not representative of scientific applications in general
since they do not reflect certain types of system behavior. In particular,
many scientific applications involve data movement between phases of
an application, and may also require significant amounts of I/O. These types
of behavior are difficult to gauge using kernel applications. 

One factor
that has hindered the use of full application codes for benchmarking parallel
computers in the past is that such codes are difficult to parallelize and to
port between target architectures. In addition, full application codes that
have been successfully parallelized are often proprietary, and/or subject
to distribution restrictions. To minimize the negative impact of these factors
we propose to make use of compact applications in our benchmarking effort.

Compact applications are typical of those found in research environments 
(as opposed to production or engineering environments), and usually consist of 
up to a few thousand lines of source code. Compact applications are distinct 
from kernel applications since they are capable of producing scientifically
useful results. In many cases, compact applications are made up of several
kernels, interspersed with data movements and I/O operations between the 
kernels.

In this chapter we will discuss a number of compact applications in terms of 
their purpose, the algorithms used, the types of data movements required, 
the memory requirements, and
the amount of I/O. The compact applications below are not meant to form a 
definitive or complete list.

\section{Proposed Compact Application Benchmarks}
\label{sec:compact.proposed}
To ensure that those areas of scientific computing that make the most use of
high performance computers are adequately represented in the benchmark
suite we shall classify compact applications by scientific field.

\subsection{Plasma Physics}
\label{subsec:plasmas}
Plasma physics is a large consumer of high performance computer cycles. Among
the areas studied are the design of tokamaks, high power microwave devices, and 
astrophysical plasmas. It would be nice to have a compact application from 
each of these three fields in the benchmark suite. Currently we have Hockney's
device simulation, LPM1, from the GENESIS suite.

\subsubsection{Electronic Device Simulation with LPM1}
\label{subsubsec:lpm1}
LPM1 is a time-dependent simulation of an electronic device
using a particle-mesh or PIC-type algorithm. It uses a two-dimensional
$(r,z)$ geometry with the fields being computed on a regular mesh
of size $33\times 75\cdot\alpha$, where $\alpha$ is a size parameter that can
take the values 1, 2, 4, and 8, corresponding to runs with between about 700 and
6000 particles.
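
For orientation, and assuming the mesh dimensions scale exactly as quoted
above, the total number of mesh points for each permitted value of $\alpha$
follows from simple arithmetic. The symbol $N_{\rm mesh}$ is introduced here
only for illustration and does not come from the LPM1 documentation.
\begin{eqnarray*}
N_{\rm mesh}(\alpha) & = & 33 \times 75\,\alpha \;=\; 2475\,\alpha, \\
N_{\rm mesh}(1) & = & 2475, \qquad N_{\rm mesh}(2) \;=\; 4950, \\
N_{\rm mesh}(4) & = & 9900, \qquad N_{\rm mesh}(8) \;=\; 19800.
\end{eqnarray*}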

\subsection{Quantum Chromodynamics}
\label{subsubsec:qcd}
Quantum Chromodynamics (QCD) is the gauge theory of the strong
interaction, which binds quarks and gluons into hadrons, the
constituents of nuclear matter. Analytical perturbation methods can be applied
to QCD only at high energies; hence computer simulations are necessary to study
QCD at lower, more realistic, energies. In these lattice gauge theory
simulations the quantum field is discretized onto a periodic, four-dimensional,
space-time lattice. Quarks are located at the lattice sites, and the gluons
that bind them are associated with the lattice links. The gluons are
represented by SU(3) matrices, which are a particular type of $3\!\times\! 3$
complex matrix. A major component of the QCD code involves updating these
matrices.

\subsubsection{Quenched QCD}
\label{subsubsec:quenched}
The QCD code in the Perfect benchmark suite is derived from the work of
Fox, Flower, Otto, and Stolorz at Caltech. The Perfect QCD code uses the 
Cabibbo-Marinari pseudo heat bath algorithm to update the SU(3) matrices on
the lattice links. This algorithm uses a Monte Carlo technique to generate a 
chain of configurations which are distributed with a probability proportional
to $\exp{(-S(U))}$, where $S(U)$ is the action of the configuration $U$.
If the only contributions to the action come from the gauge field then
the action is local. The inclusion of dynamical fermions gives rise to a
nonlocal action. This code ignores the effects of dynamical fermions, and so
represents a pure-gauge model in the quenched approximation.

A major component of this QCD code is the updating of the SU(3) matrices
associated with each link in the lattice, and it is this operation which
is benchmarked in the Perfect timings. Two basic operations are involved in
updating the lattice. The first is the multiplication of SU(3) matrices,
and the second is the generation of pseudo-random numbers.
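
To give a feel for the cost of the first operation, the following count
assumes a straightforward evaluation of the product of two
$3\!\times\! 3$ complex matrices $U$ and $V$, with no use made of the special
structure of SU(3) (namely $U^{\dagger}U = I$ and $\det U = 1$). The
per-operation flop counts are the conventional ones and are not taken from
the Perfect code.
\[
(UV)_{ij} \;=\; \sum_{k=1}^{3} U_{ik}V_{kj}, \qquad i,j = 1,2,3 .
\]
Each of the 9 entries needs 3 complex multiplications and 2 complex
additions. Counting a complex multiplication as 6 real operations and a
complex addition as 2 real operations gives
\[
9\times 3\times 6 \;+\; 9\times 2\times 2 \;=\; 162 + 36 \;=\; 198
\]
floating point operations per matrix product.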

\subsubsection{Genesis QCD}
\label{subsubsection:dynamical}
Is the Genesis benchmark QCD1 similar to the Caltech QCD code? Which one
should be used?

\subsection{General Relativity}
\label{subsec:gr}
\subsubsection{Evolution of Gravitational Field}
The Genesis code GR1 solves a system of hyperbolic PDEs, derived from general
relativity, which describe the evolution of a gravitational field from an
initial state. Although conceptually similar to the solution of the wave
equation, the equations are long and complicated. This application solves the
axisymmetric problem to reduce the problem to manageable size. Solution of
the general problem requires three orders of magnitude more compute power,
and is likely to become of substantial interest as more powerful parallel
machines are developed.

\subsubsection{Quantum Theory of Gravity}
\label{subsec:gravity}
This code, which derives from the work of Sorkin and Daughton of
Syracuse University, is part of an effort to provide a
satisfactory quantum theory of gravity by the use of causal set
theory$\ldots$whatever that is. The main computational task is the LU
factorization of large, dense matrices ($10000\times 10000$).
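
As a rough indication of the scale of this task, the estimate below uses
the standard leading-order operation count for LU factorization of a dense
$n\times n$ matrix and assumes 64-bit real entries. Both the word length and
the neglect of lower-order terms are assumptions made here, not properties
reported for the Syracuse code.
\begin{eqnarray*}
\mbox{operations} & \approx & \frac{2}{3}\,n^{3}
   \;=\; \frac{2}{3}\times (10^{4})^{3} \;\approx\; 6.7\times 10^{11}, \\
\mbox{storage} & = & 8\,n^{2}\ \mbox{bytes}
   \;=\; 8\times 10^{8}\ \mbox{bytes} \;=\; 800\ \mbox{Mbytes}.
\end{eqnarray*}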

\subsection{Climate and Weather Prediction}
\label{subsec:climate}
Mesoscale weather prediction and global climate modeling have become
important application areas in recent years. They typically involve the
solution of nonlinear PDEs.

\subsubsection{Spectral Solver for the Shallow Water Equations}
\label{subsubsec:swe}
The spectral transform method 
is the standard numerical technique
used to solve partial differential equations on the sphere in
global climate modeling. For example, it is used in CCM1 
(the Community Climate Model 1), and its successor CCM2.
The solution of the shallow water equations on a sphere constitutes an 
important component in such global climate models.
The SSWMSB code uses the spectral transform method to solve the shallow water
equations on the surface of a sphere which is discretized as a regular
longitude-latitude grid. In each timestep the state variables of 
the problem are transformed
between the physical domain, where most of the physical forces are calculated,
and the spectral domain, where the terms of the differential equation
are evaluated. This transformation involves first the evaluation of FFTs along
lines of constant latitude, followed by Legendre integration (i.e., weighted
summation) over latitude.
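
The two stages of the forward transform can be written out as follows. This
is a sketch in generic notation, not the notation of the SSWMSB code: it
assumes $I$ equally spaced longitudes $\lambda_i$, $J$ latitudes with
$\mu_j = \sin\theta_j$ and quadrature weights $w_j$, and suitably normalized
associated Legendre functions $P_n^m$. Normalization conventions vary between
codes. For a state variable $\xi$, the FFT along each latitude circle gives
the Fourier coefficients
\[
\xi^{m}(\mu_j) \;=\; \frac{1}{I}\sum_{i=1}^{I}
   \xi(\lambda_i,\mu_j)\, e^{-{\rm i} m \lambda_i},
\]
and the weighted summation over the latitudes then gives the spectral
coefficients
\[
\xi_n^{m} \;=\; \sum_{j=1}^{J} w_j\, \xi^{m}(\mu_j)\, P_n^{m}(\mu_j).
\]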

\subsubsection{Helmholtz Solvers for Meteorological Modeling}
\label{subsubsec:helmholtz}
The Genesis suite includes two meteorological applications based on 
Helmholtz solvers. One uses a pseudo-spectral solution method, and the other
a multigrid algorithm.

\subsection{Molecular Dynamics}
\label{subsec:moldyn}

\subsubsection{Dislocation Studies in Crystals}
\label{subsubsec:dislocation}
A parallel Fortran 77 plus message passing code has been developed at ORNL to 
study dislocation phenomena in crystals. This three-dimensional code divides
space into cells, with each processor being assigned a rectangular block of
cells. Each cell contains a set of particles. Communication is necessary to
exchange particles lying in cells on the boundary of a processor with a
neighboring processor. Particles must also be migrated between processors
as they move in space.
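
As an illustration of this kind of decomposition, and not a description of
the ORNL code itself, suppose the cells are cubes of side $\ell$ and each
processor owns a block of $b_x\times b_y\times b_z$ cells. All of these
symbols are assumptions made for this sketch. A particle at position
$(x,y,z)$, measured from a corner of the simulation box, lies in cell
\[
(c_x,c_y,c_z) \;=\; \left( \lfloor x/\ell \rfloor,\;
   \lfloor y/\ell \rfloor,\; \lfloor z/\ell \rfloor \right),
\]
which is owned by processor
\[
(p_x,p_y,p_z) \;=\; \left( \lfloor c_x/b_x \rfloor,\;
   \lfloor c_y/b_y \rfloor,\; \lfloor c_z/b_z \rfloor \right).
\]
In this picture a particle is migrated whenever a move changes
$(p_x,p_y,p_z)$, and its data are exchanged with a neighboring processor
whenever its cell lies on a face of the owning block.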

\subsubsection{The Genesis Molecular Dynamics Code}
\label{subsubsec:genesis_md}
I don't know much about this, but I expect it's similar to the ORNL code.

\subsubsection{The PERFECT Molecular Dynamics Code}
\label{subsubsec:perfect_md}
The Perfect benchmark suite included two molecular dynamics codes, both of
which use data sets that are too small to be used to evaluate current
parallel computers. BDNA, which simulates the hydration structure of potassium
counterions and water in a B-DNA molecule, involves 1500 water molecules and
20 counterions. MDG performs a molecular dynamics calculation on 343 water
molecules in the liquid state.

\subsection{Geophysics}
Two important geophysics computations are flow through porous media and
seismic migration. The Perfect suite includes a seismic migration code,
MG3D. This code is dominated by FFTs. A parallel code for modeling groundwater
flow is under development at ORNL and may be a good code to include in the
suite as an example of a flow through porous media code.

\subsection{Other Codes}
Clearly we would want to include CFD codes, astrophysics codes such as the
tree-based simulations of gravitating systems, quantum chemistry and
superconductor simulations. We also need to include codes from the NAS, NPAC,
PERFECT2, and SLALOM benchmark suites, as well as providing better 
descriptions of the codes above.

\section{Concluding Remarks}
There are probably two or three dozen compact applications that
we might consider for inclusion in the benchmark suite. We should consider
what is a reasonable number of codes to include, and the criteria for
accepting a code in terms of documentation, usefulness, and software quality.


From owner-pbwg-compactapp@CS.UTK.EDU Fri May 21 09:06:07 1993
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-UTK)
	id AA03860; Fri, 21 May 93 09:06:07 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA17282; Fri, 21 May 93 09:06:44 -0400
X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Fri, 21 May 1993 09:06:43 EDT
Errors-To: owner-pbwg-compactapp@CS.UTK.EDU
Received: from BERRY.CS.UTK.EDU by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA17276; Fri, 21 May 93 09:06:41 -0400
Received: from LOCALHOST.cs.utk.edu by berry.cs.utk.edu with SMTP (5.61++/2.7c-UTK)
	id AA01842; Fri, 21 May 93 09:06:40 -0400
Message-Id: <9305211306.AA01842@berry.cs.utk.edu>
To: walker@rios2.epm.ornl.gov (David Walker)
Cc: pbwg-compactapp@cs.utk.edu
Subject: Re: Compact applications 
In-Reply-To: Your message of "Fri, 21 May 1993 08:42:55 EDT."
             <9305211242.AA18681@rios2.epm.ornl.gov> 
Date: Fri, 21 May 1993 09:06:39 -0400
From: "Michael W. Berry" <berry@cs.utk.edu>

Fellow Compact Applic. Members: Here is a copy of the minutes
from the SPEC/Perfect meeting I attended in Huntsville.  Some
of this information may be useful to PBWG.

Mike B.
---------------------------------------------------------------

                        Draft Minutes: The SPEC Perfect Group
                                   11-13 May 1993

          The Perfect Club Steering Committee voted to merge with the  SPEC
          organization.  The first joint meeting with SPEC occurred  during
          11-13 May 1993.  The original SPEC organization has been modified
          so that  the name  "SPEC" refers  to the  non-profit  corporation
          which acts as  a financial umbrella  for benchmarking  subgroups.
          The original SPEC  group is now  known as the  SPEC Open  Systems
          Group.  The Perfect Club is now known as the SPEC Perfect Group.

          In accordance with the  vote taken by  David Schneider in  April,
          the initial  SPEC Perfect  Steering Committee  includes  Margaret
          Simmons (LANL), George Cybenko (Dartmouth), David Schneider (CSRD),
          John Larson (CSRD),  Mike Berry (U.of  Tenn), Satish Rege  (DEC),
          Joanne Martin (IBM), and Philip Tannenbaum (HNSX).  This  meeting
          was attended by David Schneider  (CSRD), Mike Berry (U.of  Tenn),
          Satish Rege  (DEC),  Philip  Tannenbaum  (HNSX),  Leo  Boelhouwer
          (IBM-Kingston,  representing   Joanne   Martin),   Jacob   Thomas
          (IBM-Austin), Larry Gray  (Chairman, SPEC BOD),  and Rod  Skinner
          (Treasurer, SPEC).   Hwa Lai (Fujitsu)  attended as an  observer.
          Various SPEC Open  Systems members  periodically sat  in.   David
          Schneider indicated  that  he  anticipated  Cray  Research  would
          rejoin because of marketing necessity.

          The meeting  began  with David  Schneider,  Larry Gray,  and  Rod
          Skinner presenting the framework for  the merger.  The SPEC  Open
          Systems Group  and  the SPEC  Perfect  Group will  be  autonomous
          subgroups within  SPEC.   SPEC  itself  will act  as  a  business
          umbrella organization.  Each Group will assess dues and  allocate
          budgets independently.   The  overhead which  SPEC Perfect  Group
          will  be  responsible  for   will  include  legal  retainer   and
          accounting fees  for  NCGA,  and additional  costs  of  printing,
          duplication,  distribution,  or  other  services  that  the  SPEC
          Perfect Group may elect  to utilize in the  future.  It was  also
          stated that the  SPEC organization was  flexible on many  issues,
          but the  underlying  requirement  was to  ensure  that  corporate
          non-profit  status  regulations  are  not  violated.    SPEC   is
          incorporated as a non-profit organization in California.

          It was  generally  agreed  by  all that  mutual  trust  would  be
          required from SPEC Open Systems  Group and SPEC Perfect Group  to
          minimize formality and unnecessary bureaucracy.

          The Perfect Group will be given one SPEC BOD seat on a  temporary
          basis until January 1994.  The  SPEC BOD currently consists of  5
          members:  HP,  Intel, Sun,  ATT/NCR, and  IBM.   The
          Perfect Group seat will add 1 member to the BOD.  In January 1994
          this 6th BOD  seat will  be open for  voting by  the entire  SPEC
          membership (SPEC Perfect Group and  SPEC Open Systems Group).   A
          discussion about who should fill the temporary SPEC Perfect Group
          BOD seat resulted in agreement  that University people could  not
          practically take the  position because  of travel  expense.   IBM
          already was  represented  on the  SPEC  BOD, so  David  Schneider
          nominated Satish  Rege  (DEC)  and Philip  Tannenbaum  (HNSX)  as
          candidates for  the  BOD  seat.    Leo  Boelhouwer  seconded  the
          nomination  for  Philip  Tannenbaum;  Mike  Berry  seconded   the
          nomination for Satish Rege.   A vote will  be conducted by  email
          on/about 1 June 1993.  The initial  7 Steering Committee  members
          are the eligible voters.

          During June  a  press  announcement about  the  merger  would  be
          jointly written.

          There was discussion about  inclusion of academic and  government
          members.   As  a  result of  SPEC  non-profit  requirements,  all
          members must be  either full members  ($5,000/year) or  associate
          members ($1,000/year).    It was  agreed  that few  academics  or
          government members could  acquire funding for  membership.   SPEC
          Perfect Group  Steering  Committee  could elect  to  sponsor  the
          memberships of  selected  individuals;  and  certain  individuals
          could  be  included  by  creation  of  "SPEC  Fellows"  or  "SPEC
          Affiliates" whereby  specific services  could  be paid  for  with
          membership.     Seeking  industrial   sponsorship  for   academic
          participation was  discussed as  desirable.   Each  member  will
          initiate a "check is  in the mail"  process for their  membership
          fees.   Diane  Dean,  NCGA,  2722  Merrilee  Drive,  Fairfax,  VA
          22301-4499 (703-698-9600  x318) is  our contact  in this  regard.
          SPEC Open  Systems  Group  members  received  6  free  pages  for
          SPEC/OSG reporting  in the  publications; additional  pages  were
          billed at $500  each--it was  noted that DEC  purchased 60  extra
          pages in the last publication to kick off a new product line.

          The SPEC Perfect group organization was discussed.  It was agreed
          that the SPEC Perfect Group should have a Chairman, a  Secretary,
          and a Technical Coordinator.   The Chairman would be  responsible
          for interfacing  with  SPEC  and the  SPEC  Open  Systems  Group,
          organizing meetings, and general management.  The Secretary would
          be   responsible    for   generating    minutes   and    handling
          correspondence.  The Technical  Coordinator would be  responsible
          for benchmarking status,  benchmark production and  distribution,
          coordinating the benchmark subgroups,  and being the focal  point
          for technical issues.  Each benchmark subgroup would have its own
          leadership.

          Temporary assignments were accepted to fill these positions until
          the next SPEC Perfect Group  meeting, targeted for August at  ATT
          (Chicago).    Satish  Rege  is  the  temporary  Chairman,  Philip
          Tannenbaum  the  temporary  Secretary,  and  Leo  Boelhouwer  the
          temporary Technical Coordinator.   Specific action items for  the
          period include:

             Completing the benchmark codes
             Generating verification tests and timing instrumentation
             Publishing minutes
             Writing a  solicitation for  vendors  and industry  to  attract
             membership or sponsorship support

          A discussion about the benchmark rules and reporting resulted  in
          general  agreement  that  there  would  be  baseline  ("As   Is")
          executions which  allowed only  the minimal  changes required  to
          obtain correct  results.   There would  also be  an optimized  or
          alternative solution execution which would allow unlimited use of
          standard vendor libraries and unlimited rewriting in a high level
          language.  

          It was agreed  that the benchmark  programs would be  distributed
          via netlib  or  anonymous ftp.    Text  would be  added  to  each
          benchmark program  requiring that  any use  of benchmark  results
          from the program, which are  not formally accepted and  published
          by SPEC  Perfect  Group,  must   state  "these  results  are  not
          officially approved  and  reported  by  the  SPEC  Perfect  Group
          Steering Committee.    They may  not  be directly  comparable  to
          accepted and verified results."
          Only actual execution results would be permitted.  All executions
          must be  on  hardware  and  software  systems  that  are  current
          products or  which  will be  generally  available in  the  market
          within 6 months.  

          There was  a  spirited debate  on  the  metrics to  be  used  for
          reporting results.  Discussion about  the pros and cons of  using
          normalized  ratings,  MFLOPS,  wall  clock  times,  and  absolute
          numbers took  place. The  discussion  resulted in  the  benchmark
          publications including 1)elapsed wall clock time, 2)startup time,
          3)time step  timing,  4)cleanup  time,   5)total  user  cpu  time
          accumulated, and 6)total system cpu time accumulated per program.
            No MFLOPS rate will be reported.   This was agreed to be the  most
          scientifically  sound  approach  that  would  be  meaningful  and
          unambiguous.

          All execution results presented for approval and publication must
          include  sufficient   detail  of   the  hardware   and   software
          configuration such that the  run could be essentially  duplicated
          with comparable  timings.   Acceptable  results will  have  valid
          answers and meet  SPEC Perfect Group  standards for code  changes
          and execution requirements.   Optimized and alternative  solution
          results must include the entire  program code as executed, and  a
          statement that the code  may be used,  without restriction, as  a
          SPEC Perfect Group baseline benchmark  code.  All vendor  library
          codes  used   must  include   copies  of   the  relevant   vendor
          documentation page that include sufficient detail to describe the
          processes done within  the library routine.   New vendor  library
          routines   must   have    copies   of   equivalent    preliminary
          documentation.   All  library  routines used  must  be  generally
          available to all vendor customers, and must either be  documented
          products, or  become  documented  products  within  6  months  of
          benchmark submission.    Results on  prototype  or  preproduction
          systems could  be removed  from  publication if  the  benchmarked
          products were not released within the 6 month window.

          The goal  is to  provide  all codes  in  a FORTRAN77  version,  a
          FORTRAN90 version, and a message passing version.  It was  agreed
          that version control  should be  instituted so  that all  results
          would be grouped according to benchmark version.  If any one code
          in a  benchmark group  changed,  all codes  would receive  a  new
          version number.  The benchmark groups will be aligned to  address
          vertical industrial areas such as petroleum, chemistry,  finance,
          etc. 

          The codes available  for the initial  release include the  FDMOD,
          FKMIG, and  SEIS from  the ARCO  suite, QCD,  FALSE, PUEBLO,  and
          TURB3D.   The ARCO suite codes are farthest along.  All codes are
          expected  to  represent  scalable  problem  solutions  that   are
          appropriate to vector, vector parallel, and MPP architectures.  A
          goal is to  maintain the benchmark  set at a  level whereby  only
          supercomputer class  and extreme  high end  workstations/clusters
          could reasonably  execute the  problems.   There is  no  specific
          exclusion intended; this goal was stated in order to maintain the
          SPEC Perfect Group focus on  true supercomputing rather than  the
          broader high performance computing classification.  The goals may
          not all be addressed initially because of practical limitations in
          how much can be accomplished with available resources.

          Coding and  language standards  were discussed.   Proposals  were
          made.  John Larson's work in this area will be circulated.   Leo
          Boelhouwer will  edit  the  V1 execution  rules  and  present  an
          updated draft for approval during the next meeting.

          Language standards  were  presented as  a  basis for  creating  a
          benchmark code  standard  by  David  Schneider.    They  included
          numerous items that were accepted by the group, and a few  (noted
          below) where no final conclusion was made.

               Variables could not exceed 31 characters
               No Pointers
               No DOUBLE PRECISION; REAL*8 and COMPLEX*16 should be used
               No CHARACTER-Floating Point equivalences
               No Hollerith constants or data
               No 128 bit requirements (REAL*16, COMPLEX*32)
               All 64 bit constants should be specified in D format
               All 32 bit constants should be specified in E format
               Machine constant limitations were discussed--no conclusions
                 agreed
               INTEGER*8 and LOGICAL*8 should not be used unless necessary
                 for execution
               Tests for floating point equality were discussed--no
                 conclusions agreed
               Known vector directive information will be translated to a
                 "C*PERFECT" syntax to preserve information; implementing
                 compiler recognition of "C*PERFECT" information will be
                 explicitly prohibited
               DO WHILE and DO-ENDDO syntax is allowed
               "!" inlined comments were discussed--no conclusions agreed


          Additional action items were summarized:

               Distribute old by-laws for review (DS)
               Review old by-laws and offer suggestions for revision (all)
               Contact NCGA regarding our new status (DS)
               Present our proposals for membership specific issues to the
                 SPEC BOD (SR)
               Identify manpower requirements to complete V2 benchmark
                 suite (all)
               Transfer "Perfect Benchmark" trademark from U.Ill. to SPEC (DS)
               Distribute Minutes (PT)
               Set up address and email lists (DS)
               Next meeting at ATT, Chicago, in August (with SPEC Open
                 Systems Group) (all)
               Schedule a benchathon to finalize all V2 initial codes (all).

---
Michael W. Berry     ___-___  o==o======   .   .   .   .   .
Ayres 114         =========== ||//         
Department of             \ \ |//__        
Computer Science          #_______/        berry@cs.utk.edu
University of Tennessee                    (615) 974-3838 [OFF]
Knoxville, TN 37996-1301                   (615) 974-4404 [FAX]
From owner-pbwg-compactapp@CS.UTK.EDU Wed May 26 17:49:18 1993
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-UTK)
	id AA09519; Wed, 26 May 93 17:49:18 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA25937; Wed, 26 May 93 17:49:43 -0400
X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Wed, 26 May 1993 17:49:42 EDT
Errors-To: owner-pbwg-compactapp@CS.UTK.EDU
Received: from BERRY.CS.UTK.EDU by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA25931; Wed, 26 May 93 17:49:41 -0400
Received: from LOCALHOST.cs.utk.edu by berry.cs.utk.edu with SMTP (5.61++/2.7c-UTK)
	id AA11808; Wed, 26 May 93 17:49:40 -0400
Message-Id: <9305262149.AA11808@berry.cs.utk.edu>
To: pbwg-compactapp@cs.utk.edu
Subject: We can get ARCO
Date: Wed, 26 May 1993 17:49:39 -0400
From: "Michael W. Berry" <berry@cs.utk.edu>

Here's a note I received from Mosher at ARCO - looks pretty good!
Mike

Return-Path: <ccm@Arco.COM>
Received: from inetg1.Arco.COM by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA14228; Wed, 26 May 93 14:48:57 -0400
Received: by Arco.COM (4.1/SMI-4.1)
	id AA06937; Wed, 26 May 93 13:48:55 CDT
Date: Wed, 26 May 93 13:48:55 CDT
From: ccm@Arco.COM (Chuck Mosher (214)754-6468)
Message-Id: <9305261848.AA06937@Arco.COM>
To: berry@cs.utk.edu
Subject: ARCO/Perfect Seismic Benchmark


Version 1.0 of SeisPerf is due for Beta release June 1.  The
suite provides a working seismic processing executive with
examples of common industry algorithms.  Version 1.0 is built
over a simple message passing layer, which calls PVM, P4, or
native message passing services.  The applications call several
of the kernel routines mentioned in the PBWG minutes, including
3D FFTs, tri-diagonal and Toeplitz matrix solvers, convolutions,
and integral methods.  The codes are designed to be scalable
from single processor workstations to ~1000 processor MPP systems.

Verification tools include a simple X-windows frame viewer, and
a checksum table that is printed at the end of each run.  The 1.0
release is based on Fortran 77.  MasPar has provided a Fortran 90
port of the codes for their systems, which could form the base for
an HPF version of the codes.

I'd be happy to participate in PARKBENCH and provide support for
including SeisPerf results.

Regards,
Chuck Mosher
ccm@arco.com

From owner-pbwg-compactapp@CS.UTK.EDU Thu May 27 12:54:03 1993
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-UTK)
	id AA13555; Thu, 27 May 93 12:54:03 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA10406; Thu, 27 May 93 12:54:28 -0400
X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 27 May 1993 12:54:27 EDT
Errors-To: owner-pbwg-compactapp@CS.UTK.EDU
Received: from BERRY.CS.UTK.EDU by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA10400; Thu, 27 May 93 12:54:26 -0400
Received: from LOCALHOST.cs.utk.edu by berry.cs.utk.edu with SMTP (5.61++/2.7c-UTK)
	id AA13805; Thu, 27 May 93 12:54:25 -0400
Message-Id: <9305271654.AA13805@berry.cs.utk.edu>
To: ccm@arco.com (Chuck Mosher (214)754-6468)
Cc: pbwg-compactapp@cs.utk.edu
Subject: Re: ARCO/Perfect Seismic Benchmark 
In-Reply-To: Your message of "Thu, 27 May 1993 06:59:31 CDT."
             <9305271159.AA15941@Arco.COM> 
Date: Thu, 27 May 1993 12:54:24 -0400
From: "Michael W. Berry" <berry@cs.utk.edu>


> An earlier release of the codes is available on the U of Illinois
> anonymous ftp server 'csrd.uiuc.edu' in the directory '/pub/perfect'.
> The file 'arco_beta.tar.Z' contains code, installation scripts,
> and documentation for an earlier f77 version for uniprocessors.
> You might want to get this file and have a look at the documentation
> and source structure.  The message-passing source is pretty close
> in structure to the f77 version.
> 
> We have a mailing list for discussion of the codes:
> 	'perfect_seismic@csrd.uiuc.edu'
> Let me know if you want to be on the list.  We'll announce the
> new codes there.
> 
> Regards,
> Chuck Mosher
 Yes, please add my email addr and pbwg-compactapp@cs.utk.edu to
the mailing list. Thanks Mike

From owner-pbwg-compactapp@CS.UTK.EDU Thu Sep 16 11:20:48 1993
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib)
	id AA00187; Thu, 16 Sep 93 11:20:48 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA25374; Thu, 16 Sep 93 11:19:13 -0400
X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 16 Sep 1993 11:19:10 EDT
Errors-To: owner-pbwg-compactapp@CS.UTK.EDU
Received: from sun4.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA25344; Thu, 16 Sep 93 11:19:07 -0400
Received: by sun4.epm.ornl.gov (4.1/1.34)
	id AA00634; Thu, 16 Sep 93 11:19:06 EDT
Date: Thu, 16 Sep 93 11:19:06 EDT
From: worley@sun4.epm.ornl.gov (Pat Worley)
Message-Id: <9309161519.AA00634@sun4.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: potential compact benchmark
Forwarding: Mail from 'MAILER-DAEMON (Mail Delivery Subsystem)'
      dated: Thu, 16 Sep 93 11:16:12 EDT

Ian Foster and I are just finishing version 1.0 of PSTSWM, a parallel
algorithm testbed and benchmark code developed for the climate modelling
community. It will be made available to this community via netlib, but it
may also be interesting as a PARKBENCH compact application. There are a
few difficulties with this though, and I would like some
feedback/suggestions on how to proceed.

Description
-----------
PSTSWM is a parallel implementation of a serial code (STSWM 2.0) written
by Jim Hack and Rudy Jakob at NCAR to solve the shallow water equations
on a sphere using the spectral transform method. It was originally
developed as a numerical algorithm testbed, to allow comparison of
spectral methods with finite difference methods with finite element
methods, etc., and has 6 runtime-selectable test cases in the code.
These test cases specify initial conditions, forcing, and analytic
solutions (for error analysis), and were chosen to test the ability of
the numerical methods to simulate important flow phenomena.

For PSTSWM, we completely rewrote STSWM to add vertical levels, in order
to get the correct communication and computation granularity for 3-D
climate codes, and to allow the problem size to be selected at runtime
without depending on such nonportable features as dynamic memory. 

PSTSWM is meant to be a compromise between paper benchmarks and the
usual fixed benchmarks by allowing a significant amount of
runtime-selectable algorithm tuning. Thus, the goal is to see how
quickly the numerical simulation can be run on different machines
without fixing the parallel implementation, but forcing all
implementations to execute the same numerical code (to guarantee
fairness). To enable this, PSTSWM supports:

a) 4 classes of parallel algorithms (distributed or transpose
   based for each of two major parallel phases)
b) each class has 3-4 specific parallel algorithms (e.g. using a
   recursive-halving vector sum, using a pipelined ring vector sum,
   etc.)
c) each algorithm has 2-4 variants 
d) each algorithm is built on top of two communication constructs,
   swap and sendrecv, and each of these has 5-6 different communication
   protocol options (synchronous, blocking, nonblocking, forcetypes,
   etc.)
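
The two vector-sum variants named in item b) can be illustrated with a
small serial simulation. This is only a sketch under stated assumptions,
not PSTSWM code: the function names (recursive_halving_sum, ring_sum,
assemble) are hypothetical, the "processors" are plain Python lists, and
the number of processors is assumed to be a power of two that divides the
vector length. Both routines leave each simulated processor holding one
block of the global sum.

```python
def recursive_halving_sum(vectors):
    """Simulated recursive-halving vector sum (log2(p) exchange steps)."""
    p, n = len(vectors), len(vectors[0])
    assert (p & (p - 1)) == 0 and n % p == 0   # assumed: p is a power of two
    data = [list(v) for v in vectors]
    lo, hi = [0] * p, [n] * p                   # segment each rank still owns
    dist = p // 2
    while dist >= 1:
        new_data = [list(d) for d in data]
        new_lo, new_hi = list(lo), list(hi)
        for r in range(p):
            partner = r ^ dist
            mid = (lo[r] + hi[r]) // 2
            if (r & dist) == 0:                 # keep the lower half
                new_lo[r], new_hi[r] = lo[r], mid
            else:                               # keep the upper half
                new_lo[r], new_hi[r] = mid, hi[r]
            # "Receive" the partner's copy of the kept half and add it in.
            for i in range(new_lo[r], new_hi[r]):
                new_data[r][i] = data[r][i] + data[partner][i]
        data, lo, hi = new_data, new_lo, new_hi
        dist //= 2
    return [(lo[r], data[r][lo[r]:hi[r]]) for r in range(p)]


def ring_sum(vectors):
    """Simulated pipelined ring vector sum (p - 1 nearest-neighbour steps)."""
    p, n = len(vectors), len(vectors[0])
    assert n % p == 0
    blk = n // p
    data = [list(v) for v in vectors]
    for step in range(p - 1):
        # Every rank sends one block to its right-hand neighbour.
        msgs = []
        for r in range(p):
            b = (r - step) % p
            msgs.append((b, data[r][b * blk:(b + 1) * blk]))
        for r in range(p):
            b, payload = msgs[(r - 1) % p]
            for k, value in enumerate(payload):
                data[r][b * blk + k] += value
    # Rank r ends up owning the fully summed block (r + 1) mod p.
    return [(((r + 1) % p) * blk,
             data[r][((r + 1) % p) * blk:((r + 1) % p + 1) * blk])
            for r in range(p)]


def assemble(segments, n):
    """Gather (offset, values) pairs into one vector for checking."""
    full = [None] * n
    for start, values in segments:
        full[start:start + len(values)] = values
    return full


if __name__ == "__main__":
    vecs = [[10 * r + i for i in range(8)] for r in range(4)]
    print(assemble(recursive_halving_sum(vecs), 8))
    print(assemble(ring_sum(vecs), 8))
```

The two routines produce the same sums but differ in the number and size
of messages, which is the kind of trade-off the runtime options are
meant to expose.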

We are quite happy with the code, and are getting good results with it.
Most interesting to us is how the best algorithm changes across
platforms and as the problem size changes on the same platform.

Problems
--------
There are a couple of issues to be dealt with in using this code as part
of PARKBENCH.

1) The code currently is in single precision with double precision
   parts. Single precision is sufficient for the problem sizes of
   interest, but the Legendre polynomial values and Gauss quadrature
   weights and nodes must be calculated in higher precision. For larger
   problem sizes, double precision computation will be appropriate, but
   the Gauss weights, etc., will then need to be calculated in quad
   precision. I do not think that this sort of mixed case has been
   discussed yet. 

2) In one sense, PSTSWM is not a single benchmark, but many of them.
   We can fix the problem and parallel algorithm specifications by
   providing (a set of) default input files, but which ones should we
   choose? All of them are arguably good algorithms in some setting, and
   I would hate to compare two machines when the algorithm is good for
   one and inappropriate for another.

3) PSTSWM is currently written using PICL (because that is what I
   normally use and because I have embedded instrumentation in the
   research version of the code). I made a real effort to isolate the
   message passing bits, so porting to anything else will be trivial.
   But the message passing interface that is used does affect the
   parallel algorithms that are supported. For example, PICL supports
   nonblocking send and receive and passes through forcetype message
   types. These are important to performance on some Intel machines.
   This is not a problem so much as something to be aware of. PSTSWM
   will also be available in its original form, but a pointer to some of
   the issues in cross-machine comparisons should be made. This may be
   an issue that should be mentioned in the methodology section as
   pertains to compact applications. Unlike low level benchmarks,
   compact applications are less likely to be "done right" by the vendor
   for their particular machines. 

Comments and suggestions would be appreciated. I imagine every proposed
compact application will be unsuitable in one form or another when it is
first submitted, and precise guidelines on what should or should not be
permitted are important. On the other hand, as a developer, I will not be
interested in doing too much work in modifying the code in order to
include it in the benchmark suite. Even with the best intentions, it
will not be a high priority item for me and is likely to be put off
(forever) if not fairly simple.

Thanks.

Pat Worley

From owner-pbwg-compactapp@CS.UTK.EDU Tue Sep 21 11:49:13 1993
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib)
	id AA02710; Tue, 21 Sep 93 11:49:13 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA08554; Tue, 21 Sep 93 11:47:15 -0400
X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Tue, 21 Sep 1993 11:47:14 EDT
Errors-To: owner-pbwg-compactapp@CS.UTK.EDU
Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA08546; Tue, 21 Sep 93 11:47:13 -0400
Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA12782; Tue, 21 Sep 1993 11:47:07 -0400
Date: Tue, 21 Sep 1993 11:47:07 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9309211547.AA12782@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: Application submission form


I'm trying to put together a submission form for people to use to submit 
applications for inclusion in the ParkBench Compact Applications suite. Also
I'd like to establish a procedure for submission. Below is a first stab at
these 2 things. Please send me feedback. Later this week I intend to send
out a filled in version of the submission form as an example.

David
                 PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM

To submit a compact application to the ParkBench suite, follow the
procedure below:

1. Complete the submission form below, and email it to David Walker
   at walker@msr.epm.ornl.gov. The data on this form will be reviewed 
   by the ParkBench Compact Applications Subcommittee, and you will
   be notified if the application is to be considered further for
   inclusion in the ParkBench suite.
   
2. If the ParkBench Compact Applications Subcommittee decides to consider
   your application further you will be asked to submit the source code
   and input and output files, together with any documentation and papers
   about the application. Source code and input and output files should
   be submitted by email, or ftp, unless the files are very large, in
   which case a tar file on a 1/4 inch cassette tape should be sent. Wherever
   possible, email submission is preferred for all documents in man page, LaTeX,
   and/or PostScript format. These files, documents, and papers together
   constitute your application package. Your application package should
   be sent to:
		David Walker
                Oak Ridge National Laboratory
                Bldg. 6012/MS-6367
                P. O. Box 2008
                Oak Ridge, TN 37831-6367
                (615) 574-7401/0680 (phone/fax)
                walker@msr.epm.ornl.gov

   The street address is "Bethel Valley Road" if Fedex insists on this.
   The subcommittee will then make a final decision on whether to include 
   your application in the ParkBench suite.

3. If your application is approved for inclusion in the ParkBench suite
   you (or some authorized person from your organization) will be asked
   to complete and sign a form giving ParkBench authority to distribute,
   and modify (if necessary), your application package.

-------------------------------------------------------------------------------
Name of Program         :
-------------------------------------------------------------------------------
Submitter's Name        :
Submitter's Organization:
Submitter's Address     :


Submitter's Telephone # :
Submitter's Fax #       :
Submitter's Email       :
-------------------------------------------------------------------------------
Cognizant Expert(s)     :
CE's Organization       :
CE's Address            :



CE's Telephone #        :
CE's Fax #              :
CE's Email              :
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :


-------------------------------------------------------------------------------
Major Application Field :
Application Subfield(s) :
-------------------------------------------------------------------------------
Application "pedigree"  :




-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :


-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

        Integers :     bytes
	Floats   :     bytes

-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :



-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :



-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :



-------------------------------------------------------------------------------
Other relevant research papers:



-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :




-------------------------------------------------------------------------------
Total number of lines in source code:
Number of lines excluding comments  :
Size in bytes of source code        :
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :



-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :



-------------------------------------------------------------------------------
Brief, high-level description of what application does:




-------------------------------------------------------------------------------
Main algorithms used:



-------------------------------------------------------------------------------
Skeleton sketch of application:




-------------------------------------------------------------------------------
Brief description of I/O behavior:




-------------------------------------------------------------------------------
Brief description of load balance behavior :




-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :



-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :




-------------------------------------------------------------------------------
Give parameters that determine the problem size :



-------------------------------------------------------------------------------
Give memory as function of problem size :


-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :


-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :




-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files,
memory required, and number of floating point operations) :






-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :



-------------------------------------------------------------------------------
From owner-pbwg-compactapp@CS.UTK.EDU Tue Oct  5 15:29:11 1993
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib)
	id AA06534; Tue, 5 Oct 93 15:29:11 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK)
	id AA00420; Tue, 5 Oct 93 15:28:34 -0400
X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Tue, 5 Oct 1993 15:28:29 EDT
Errors-To: owner-pbwg-compactapp@CS.UTK.EDU
Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK)
	id AA00402; Tue, 5 Oct 93 15:28:23 -0400
Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA20677; Tue, 5 Oct 1993 15:28:21 -0400
Message-Id: <9310051928.AA20677@rios2.epm.ornl.gov>
To: spb@epcc.edinburgh.ac.uk, mia@unixa.nerc-bidston.ac.uk,
        pbwg-compactapp@cs.utk.edu
Subject: Submission form for ParkBench compact applications
Date: Tue, 05 Oct 93 15:28:20 -0500
From: David W. Walker <walker@rios2.epm.ornl.gov>


Below is an example (prepared by Pat Worley of Oak Ridge National Lab) of
the use of the ParkBench Compact Applications submission form. This form (or
something like it) is intended to be used by all persons wishing to submit 
an application to be included in the suite. The first page or so explains
the submission procedure. Pat has been very thorough in filling out the form.
I don't think it is practical to expect every submission to be this detailed.

If you have applications that you would like to submit please go ahead and
fill in the form. Also, any comments on the form would be appreciated. I hope
to give the form wider distribution in a couple of weeks so we can (I hope)
get a good number of submissions before the SC93 ParkBench meeting.

David

                 PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM

To submit a compact application to the ParkBench suite, follow the
procedure below:

1. Complete the submission form below, and email it to David Walker
   at walker@msr.epm.ornl.gov. The data on this form will be reviewed 
   by the ParkBench Compact Applications Subcommittee, and you will
   be notified if the application is to be considered further for
   inclusion in the ParkBench suite.
   
2. If the ParkBench Compact Applications Subcommittee decides to consider
   your application further you will be asked to submit the source code
   and input and output files, together with any documentation and papers
   about the application. Source code and input and output files should
   be submitted by email, or ftp, unless the files are very large, in
   which case a tar file on a 1/4 inch cassette tape should be sent. Wherever
   possible, email submission is preferred for all documents in man page, LaTeX,
   and/or PostScript format. These files, documents, and papers together
   constitute your application package. Your application package should
   be sent to:
                David Walker
                Oak Ridge National Laboratory
                Bldg. 6012/MS-6367
                P. O. Box 2008
                Oak Ridge, TN 37831-6367
                (615) 574-7401/0680 (phone/fax)
                walker@msr.epm.ornl.gov

   The street address is "Bethel Valley Road" if Fedex insists on this.
   The subcommittee will then make a final decision on whether to include 
   your application in the ParkBench suite.

3. If your application is approved for inclusion in the ParkBench suite
   you (or some authorized person from your organization) will be asked
   to complete and sign a form giving ParkBench authority to distribute,
   and modify (if necessary), your application package.

-------------------------------------------------------------------------------
Name of Program         : PSTSWM 
                        : (Parallel Spectral Transform Shallow Water Model)
-------------------------------------------------------------------------------
Submitter's Name        : Patrick H. Worley
Submitter's Organization: Oak Ridge National Laboratory
Submitter's Address     : Bldg. 6012/MS-6367
                          P. O. Box 2008
                          Oak Ridge, TN 37831-6367
Submitter's Telephone # : (615) 574-3128
Submitter's Fax #       : (615) 574-0680
Submitter's Email       : worley@msr.epm.ornl.gov
-------------------------------------------------------------------------------
Cognizant Expert(s)     : Patrick H. Worley
CE's Organization       : Oak Ridge National Laboratory
CE's Address            : Bldg. 6012/MS-6367
                          P. O. Box 2008
                          Oak Ridge, TN 37831-6367
CE's Telephone #        : (615) 574-3128
CE's Fax #              : (615) 574-0680
CE's Email              : worley@msr.epm.ornl.gov

Cognizant Expert(s)     : Ian T. Foster
CE's Organization       : Argonne National Laboratory
CE's Address            : MCS 221/D-235
                          9700 S. Cass Avenue
                          Argonne, IL 60439
CE's Telephone #        : (708) 252-4619
CE's Fax #              : (708) 252-5986
CE's Email              : itf@mcs.anl.gov
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :

Modulo other commitments, Worley is prepared to respond quickly to questions
and bug reports, but expects to be kept informed as to results of experiments
and modifications to the code.

-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Climate Modeling
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :

PSTSWM Version 1.0 is a message-passing benchmark code and parallel algorithm
testbed that solves the nonlinear shallow water equations using the spectral
transform method. The spectral transform algorithm of the code follows
closely how CCM2, the NCAR Community Climate Model, handles the dynamical
part of the primitive equations, and the parallel algorithms implemented in
the model include those currently used in the message-passing parallel
implementation of CCM2. PSTSWM was written by Patrick Worley of Oak Ridge
National Laboratory and Ian Foster of Argonne National Laboratory, and is
based partly on previous parallel algorithm research by John Drake, David
Walker, and Patrick Worley of Oak Ridge National Laboratory. Both the code
development and parallel algorithms research were funded by the DOE Computer
Hardware, Advanced Mathematics, and Model Physics (CHAMMP) program. The
features of version 1.0 were frozen on 8/1/93, and it is this version we
would offer initially as a benchmark.  

PSTSWM is a parallel implementation of a sequential code (STSWM 2.0) written
by James Hack and Ruediger Jakob at NCAR to solve the shallow water equations 
on a sphere using the spectral transform method. STSWM evolved from a
spectral shallow water model written by Hack (NCAR/CGD) to compare numerical
schemes designed to solve the divergent barotropic equations in spherical
geometry. STSWM was written partially to provide the reference solutions
to the test cases proposed by Williamson et al. (see citation [4] below),
which were chosen to test the ability of numerical methods to simulate
important flow phenomena. These test cases are embedded in the code and 
are selectable at run-time via input parameters, specifying initial conditions,
forcing, and analytic solutions (for error analysis). The solutions are also
published in a Technical Note by Jakob et al. [3]. In addition, this code is
meant to serve as an educational tool for numerical studies of the shallow
water equations. A detailed description of the spectral transform method, and
a derivation of the equations used in this software, can be found in the
Technical Note by Hack and Jakob [2].  

For PSTSWM, we rewrote STSWM to add vertical levels (in order to get the
correct communication and computation granularity for 3-D weather and climate
codes), to increase modularity and support code reuse, and to allow the
problem size to be selected at runtime without depending on dynamic memory
allocation. PSTSWM is meant to be a compromise between paper benchmarks and
the usual fixed benchmarks by allowing a significant amount of
runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the
numerical simulation can be run on different machines without fixing the
parallel implementation, but forcing all implementations to execute the same
numerical code (to guarantee fairness). The code has also been written in
such a way that linking in optimized library functions for common operations
instead of the "portable" code will be simple.

-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

Yes, but users are requested to acknowledge the authors (Worley and
Foster) and the program that supported the development of the code
(DOE CHAMMP program) in any resulting research or publications, and are
encouraged to send reprints of their work with this code to the authors.
Also, the authors would appreciate being notified of any modifications to 
the code. Finally, the code has been written to allow easy reuse of code in
other applications, and for educational purposes. The authors encourage this,
but also request that they be notified when pieces of the code are used.

-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

The program currently uses INTEGER, REAL, COMPLEX, and DOUBLE PRECISION
variables. The code should work correctly for any system in which COMPLEX is
represented as 2 REALs. The include file params.i has parameters that can be
used to specify the length of these. Also, some REAL and DOUBLE PRECISION
parameter values may need to be modified for floating point number systems with large
mantissas, e.g., PI, TWOPI. PSTSWM is currently being used on systems where

        Integers : 4   bytes
	Floats   : 4   bytes

The use of two precisions can be eliminated, but at the cost of a significant
loss of precision. (For 4-byte REALs, not using DOUBLE PRECISION increases
the error by approximately three orders of magnitude.) DOUBLE PRECISION
results are only used in set-up (computing Gauss weights and nodes and
Legendre polynomial values), and are not used in the body of the computation.
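
The set-up step described above can be sketched as follows. This is an
illustrative sketch only, not the routine used in PSTSWM: the function
names (legendre_and_derivative, gauss_legendre, to_single) are
hypothetical, Python floats stand in for DOUBLE PRECISION, and rounding
through the struct module stands in for storing a result in a 4-byte
REAL. The nodes are found by Newton iteration on the Legendre polynomial
P_n, and the weights follow from w = 2 / ((1 - x^2) P_n'(x)^2).

```python
import math
import struct


def legendre_and_derivative(n, x):
    """Return P_n(x) and P_n'(x) via the three-term recurrence (|x| < 1)."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp


def gauss_legendre(n):
    """Gauss-Legendre nodes and weights on [-1, 1], computed in double."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))   # initial guess
        for _ in range(100):                             # Newton iteration
            p, dp = legendre_and_derivative(n, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre_and_derivative(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def to_single(value):
    """Round a double to the nearest 4-byte float (stand-in for REAL)."""
    return struct.unpack("f", struct.pack("f", value))[0]


if __name__ == "__main__":
    nodes, weights = gauss_legendre(8)
    single_weights = [to_single(w) for w in weights]
    print("sum of double weights:", sum(weights))
    print("largest rounding change:",
          max(abs(w - s) for w, s in zip(weights, single_weights)))
```

Computing in the higher precision first and rounding once at the end
keeps the stored single-precision values as close as possible to the
exact nodes and weights.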

-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

The sequential code is documented in a file included in the distribution of the
code from NCAR:

Jakob, Ruediger, Description of Software for the Spectral Transform Shallow
Water Model Version 2.0. National Center for Atmospheric Research,
Boulder, CO 80307-3000, August 1992

and in 

Hack, J.J. and R. Jakob, Description of a global shallow water model based on
the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. 

Documentation of the parallel code is in preparation, but extensive
documentation is present in the code.

-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

1) Browning, G.L., J.J. Hack and P.N. Swarztrauber, A comparison of
   three numerical methods for solving differential equations on
   the sphere, Monthly Weather Review, 117:1058-1075, 1989.

2) Hack, J.J. and R. Jakob, Description of a global
   shallow water model based on the spectral transform method,
   NCAR Technical Note TN-343+STR, January 1992.

3) Jakob, R., J.J. Hack and D.L. Williamson, Reference solutions to
   shallow water test set using the spectral transform method,
   NCAR Technical Note TN-388+STR (in preparation).

4) Williamson, D.L., J.B. Drake, J.J. Hack, R. Jakob and P.N. Swarztrauber,
   A standard test set for numerical approximations to the shallow
   water equations in spherical geometry, Journal of Computational Physics,
   Vol. 102, pp. 211-224, 1992.
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

5) Worley, P. H. and J. B. Drake, Parallelizing the Spectral Transform Method,
   Concurrency: Practice and Experience, Vol. 4, No. 4 (June 1992), 
   pp. 269-291.

6) Walker, D. W., P. H. Worley, and J. B. Drake, Parallelizing the Spectral
   Transform Method. Part II, 
   Concurrency: Practice and Experience, Vol. 4, No. 7 (October 1992), 
   pp. 509-531.

7) Foster, I. T. and P. H. Worley,
   Parallelizing the Spectral Transform Method: A Comparison of Alternative
   Parallel Algorithms,
   Proceedings of the Sixth SIAM Conference on Parallel Processing for
   Scientific Computing (March 22-24, 1993), pp. 100-107.

8) Foster, I. T. and P. H. Worley,
   Parallel Algorithms for the Spectral Transform Method,
   (in preparation)

9) Worley, P. H. and I. T. Foster,
   PSTSWM: A Parallel Algorithm Testbed and Benchmark.
   (in preparation)

-------------------------------------------------------------------------------
Other relevant research papers:

10) I. Foster, W. Gropp, and R. Stevens, 
    The parallel scalability of the spectral transform method, 
    Mon. Wea. Rev., 120(5), 1992, pp. 835--850. 

11) Drake, J. B., R. E. Flanery, I. T. Foster, J. J. Hack, J. G. Michalakes,
    R. L. Stevens, D. W. Walker, D. L. Williamson, and P. H. Worley,
    The Message-Passing Version of the Parallel Community Climate Model,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 500-513.

12) Sato, R. K. and R. D. Loft,
    Implementation of the NCAR CCM2 on the Connection Machine,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 371-393.

13) Barros, S. R. M. and Kauranne, T.,
    On the Parallelization of Global Spectral Eulerian Shallow-Water Models,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 36-43.

14) Kauranne, T. and S. R. M. Barros,
    Scalability Estimates of Parallel Spectral Atmospheric Models,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 312-328.

15) Pelz, R. B. and W. F. Stern,
    A Balanced Parallel Algorithm for Parallel Processing,
    Proceedings of the Sixth SIAM Conference on Parallel Processing for
    Scientific Computing (March 22-24, 1993), pp. 126-128.

-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :

The model code is primarily written in Fortran 77, but also uses
DO ... ENDDO and DO WHILE ... ENDDO, and the INCLUDE extension (to pull in
common and parameter declarations). It has been compiled and run on the Intel
iPSC/2, iPSC/860, Delta, and Paragon, the IBM SP1, and on Sun Sparcstation,
IBM RS/6000, and Stardent 3000/1500 workstations (as a sequential code).

Message passing is implemented using the PICL message passing system.
All message passing is encapsulated in 3 high-level routines:

BCAST0 (broadcast)
GMIN0  (global minimum)
GMAX0  (global maximum)

two classes of low-level routines:
 SWAP, SWAP_SEND, SWAP_RECV, SWAP_RECVBEGIN, SWAP_RECVEND, SWAP1, SWAP2, SWAP3
 (variants and/or pieces of the swap operation)
and
 SENDRECV, SRBEGIN, SREND, SR1, SR2, SR3
 (variants and/or pieces of the send/recv operation)

and one synchronization primitive:
CLOCKSYNC0

PICL instrumentation commands are also embedded in the code.

Porting the code to another message passing library will be simple, although
some of the runtime communication options may become illegal then.
The PICL instrumentation calls can be stubbed out (or removed) without
changing the functionality of the code, but some sort of synchronization is
needed when timing short benchmark runs.

-------------------------------------------------------------------------------
Total number of lines in source code: 28,204
Number of lines excluding comments  : 12,434
Size in bytes of source code        : 994,299
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :

problem:   23 lines, 559 bytes, ascii
algorithm: 33 lines, 874 bytes, ascii

-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :

standard output: Number of lines and bytes is a function of the input
                 specifications, but for benchmarking would normally be
                 63 lines (2000 bytes) of meaningful output. (On the Intel
                 machine, FORTRAN STOP messages are sent from each processor
                 at the end of the run, increasing this number.)

timings:         Each run produces one line of output, containing approx.
                 150 bytes.

Both files are ascii.


-------------------------------------------------------------------------------
Brief, high-level description of what application does:

(P)STSWM solves the nonlinear shallow water equations on the sphere.
The nonlinear shallow water equations constitute a simplified
atmospheric-like fluid prediction model that exhibits many of the features of
more complete models, and that has been used to investigate numerical
methods and benchmark a number of machines.
Each run of PSTSWM uses one of 6 embedded initial conditions and forcing
functions. These cases were chosen to stress test numerical methods for this
problem, and to represent important flows that develop in atmospheric
modeling. STSWM also supports reading in arbitrary initial conditions, but
this was removed from the parallel code to simplify the development of the
initial implementation. 

-------------------------------------------------------------------------------
Main algorithms used:

PSTSWM uses the spectral transform method to solve the shallow water
equations. During each timestep, the state variables of the
problem are transformed between the physical domain, where most of the
physical forces are calculated, and the spectral domain, where the terms of
the differential equation are evaluated. The physical domain is a tensor
product longitude-latitude grid. The spectral domain is the set of spectral
coefficients in a spherical harmonic expansion of the state variables, and
is normally characterized as a triangular array (using a "triangular"
truncation of spectral coefficients). 
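
The shape of the triangular spectral array can be illustrated with a short
sketch. It assumes the usual indexing convention for a triangular truncation,
0 <= m <= n <= MM, which is not stated explicitly above; the function names
are hypothetical and are not taken from the PSTSWM source.

```python
def triangular_columns(mm):
    """Length of each spectral column for wavenumbers m = 0..mm.

    Assumes a triangular truncation keeps degrees n = m..mm for each
    wavenumber m (an assumed convention, not read from PSTSWM itself).
    """
    return [mm - m + 1 for m in range(mm + 1)]


def total_coefficients(mm):
    """Total number of retained spectral coefficients."""
    return sum(triangular_columns(mm))


cols = triangular_columns(21)
# Longest column, shortest column, and total count for T21.
print(cols[0], cols[-1], total_coefficients(21))  # prints: 22 1 253
```

The column for the largest wavenumber holds a single coefficient, which is
why the array is described as triangular.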

Transforming from physical coordinates to spectral coordinates involves
performing a real FFT for each line of constant latitude, followed by 
integration over latitude using Gaussian quadrature (approximating the
Legendre transform) to obtain the spectral coefficients. The inverse
transformation involves evaluating sums of spectral harmonics and inverse
real FFTs, analogous to the forward transform.

Parallel algorithms are used to compute the FFTs and to compute the 
vector sums used to approximate the forward and inverse Legendre transforms.
Two major alternatives are available for both transforms: distributed
algorithms, which use a fixed data decomposition and compute results where they
are assigned, and transpose algorithms, which remap the domains so that the
transforms can be calculated sequentially. This translates to four major
parallel algorithms:

a) distributed FFT/distributed Legendre transform (LT)
b) transpose FFT/distributed LT
c) distributed FFT/transpose LT
d) transpose FFT/transpose LT

Multiple implementations are supported for each type of algorithm, and
the assignment of processors to transforms is also determined by input
parameters. For example, input parameters specify a logical 2-D processor
grid and define the data decomposition of the physical and spectral domains
onto this grid. If 16 processors are used, these can be arranged as
a 4x4 grid, an 8x2 grid, a 16x1 grid, a 2x8 grid, or a 1x16 grid.
This specification determines how many processors are used to calculate each
parallel FFT and how many are used to calculate each parallel LT.
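
As an illustration, the legal logical grids for a given processor count can
be enumerated as below. This is only a sketch: the function name is
hypothetical, and the reading that PX processors cooperate in each FFT and PY
in each LT is inferred from the communication cost formulas later in this
form rather than from the code.

```python
def processor_grids(nprocs):
    """Return all (PX, PY) pairs with PX * PY == nprocs."""
    return [(px, nprocs // px)
            for px in range(1, nprocs + 1)
            if nprocs % px == 0]


for px, py in processor_grids(16):
    # Assumed reading: PX processors share each FFT, PY share each LT.
    print(f"{px}x{py} grid: {px} per FFT, {py} per LT")
```

For 16 processors this yields the five grids listed above (1x16, 2x8, 4x4,
8x2, and 16x1).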

-------------------------------------------------------------------------------
Skeleton sketch of application:

The main program calls INPUT to read problem and algorithm parameters
and set up arrays for spectral transformations, and then calls
INIT to set up the test case parameters. Routines ERRANL and
NRGTCS are called once before the main timestepping loop for
error normalization, once after the main timestepping for 
calculating energetics data and errors, and periodically during 
the timestepping, as requested. The prognostic fields are 
initialized using routine ANLYTC, which provides the analytic
solution. Each call to STEP advances the computed fields by a 
timestep DT. Timing logic surrounds the timestepping loop, so the
initialization phase is not timed. Also, a fake timestep is calculated before
beginning timing to eliminate the first time "paging" effect currently seen
on the Intel Paragon systems. 

STEP computes the first two time levels by two semi-implicit timesteps;
normal time-stepping is by a centered leapfrog-scheme. STEP calls COMP1,
which chooses between an explicit numerical algorithm, a semi-implicit
algorithm, and a simplified algorithm associated with solving the advection
equation, one of the embedded test cases. The numerical algorithm used is an
input parameter. 

The basic outline of each timestep is the following:
1) Evaluate non-linear product and forcing terms.
2) Fourier transform non-linear terms in place as a block transform.
3) Compute and update divergence, geopotential, and vorticity spectral
   coefficients. (Much of the calculation of the time update is "bundled"
   with the Legendre transform.)
4) Compute velocity fields and transform divergence, geopotential,
   and vorticity back to gridpoint space using 
   a) an inverse Legendre transform and associated computations and
   b) an inverse real block FFT.

PSTSWM has "fictitious" vertical levels, and all computations are duplicated
on the different levels, potentially significantly increasing the granularity
of the computation. (The number of vertical levels is an input parameter.)
For error analysis, a single vertical level is extracted and analyzed. 

-------------------------------------------------------------------------------
Brief description of I/O behavior:

Processor 0 reads in the input parameters and broadcasts them to the rest of
the processors. Processor 0 also receives the error analysis and timing
results from the other processors and writes them out.

-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :

The processors are treated as a logical 2-D grid. There are 3 domains to be
distributed:
 a) physical domain: tensor product longitude-latitude grid
 b) Fourier domain: tensor product wavenumber-latitude grid
 c) spectral domain: triangular array, where each column contains the
                     spectral coefficients associated with a given
                     wavenumber. The larger the wavenumber is, the shorter
                     the column is.
An unordered FFT is used, and the Fourier and spectral domains use the
"unordered" permutation when the data is being distributed.

I) distributed FFT/distributed LT
   1) The tensor-product longitude-latitude grid is mapped onto the 
      processor grid by assigning a block of contiguous longitudes 
      to each processor column and by assigning one or two blocks of
      contiguous latitudes to each processor row. The vertical dimension is
      not distributed.   
   2) After the FFT, the subsequent wavenumber-latitude grid is similarly
      distributed over the processor grid, with a block of the permuted
      wavenumbers assigned to each processor column.
   3) After the LT, the wavenumbers are distributed as before and the spectral
      coefficients associated with any given wavenumber are either
      distributed evenly over the processors in the column containing that
      wavenumber, or are duplicated over the column. What happens is a
      function of the particular distributed LT algorithm used.

II) transpose FFT/distributed LT
   1) same as in (I)
   2) Before the FFT, the physical domain is first remapped to
      a vertical layer-latitude decomposition, with a block of contiguous
      vertical layers assigned to each processor column and the longitude
      dimension not distributed. After the transform, the vertical
      level-latitude grid is distributed as before, and the wavenumber
      dimension is not distributed. 
   3) After the LT, the spectral coefficients for a given vertical layer are
      either distributed evenly over the processors in a column, or are
      duplicated over that column. What happens is a function of the
      particular distributed LT algorithm used. 

III) distributed FFT/transpose LT
   1) same as (I)
   2) same as (I)
   3) Before the LT, the wavenumber-latitude grid is first remapped to
      a wavenumber-vertical layer decomposition, with a block of contiguous
      vertical layers assigned to each processor row and the latitude
      dimension not distributed. After the transform, the spectral
      coefficients associated with a given wavenumber and vertical layer
      are all on one processor, and the wavenumbers and vertical layers are
      distributed as before.

IV) transpose FFT/transpose LT
   1) same as (I)
   2) same as (II)
   3) Before the LT, the vertical level-latitude grid is first remapped to
      a vertical level-wavenumber decomposition, with a block of the permuted 
      wavenumbers now assigned to each processor row and the latitude
      dimension not distributed. After the transform, the spectral
      coefficients associated with a given wavenumber and vertical layer
      are all on one processor, and the wavenumbers and vertical layers are
      distributed as before.
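
The block assignments described above can be sketched as a simple contiguous
partition. The helper below is hypothetical; in particular, the way it
spreads a remainder over the first few processors is an assumption, and it
does not model the "one or two blocks" of latitudes per processor row or the
unordered wavenumber permutation.

```python
def block_ranges(n, p):
    """Split indices 0..n-1 into p contiguous half-open ranges.

    Block sizes differ by at most one; any remainder goes to the
    lowest-numbered blocks (an assumed policy, not PSTSWM's).
    """
    base, extra = divmod(n, p)
    ranges = []
    start = 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# 128 longitudes over 4 processor columns: 32 contiguous longitudes each.
print(block_ranges(128, 4))
```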

-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :

The distribution is a function of the problem size (longitude, latitude,
vertical levels), the logical processor grid (PX, PY), and the algorithm
(transpose vs. distributed for FFT and LT).

-------------------------------------------------------------------------------
Brief description of load balance behavior :

The load is fairly well balanced. If PX and PY evenly divide the number of
longitudes, latitudes, and vertical levels, then all load imbalances are due
to the unequal distribution of spectral coefficients. As described above, the
spectral coefficients are laid out as a triangular array in most runs, where
each column corresponds to a different Fourier wavenumber. The wavenumbers are
partitioned among the processors in most of the parallel algorithms. Since
each column is a different length, a wrap mapping of the columns would
approximately balance the load. Instead, the natural "unordered" ordering of
the FFT is used with a block partitioning, which does a reasonable job of
load balancing without any additional data movement. The load imbalance is
quantified in Walker, et al [5]. 
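
The effect of the column ordering on load balance can be seen in a small
sketch. It compares a wrap mapping with a block partition of the columns in
plain ascending wavenumber order. It does not model the "unordered" FFT
permutation that PSTSWM actually uses, and it assumes column m has length
MM - m + 1; all names are hypothetical.

```python
def column_lengths(mm):
    """Assumed triangular truncation: column m holds mm - m + 1 values."""
    return [mm - m + 1 for m in range(mm + 1)]


def wrap_loads(lengths, p):
    """Deal columns out to processors in round-robin (wrap) fashion."""
    loads = [0] * p
    for m, length in enumerate(lengths):
        loads[m % p] += length
    return loads


def block_loads(lengths, p):
    """Give each processor one contiguous block of columns."""
    base, extra = divmod(len(lengths), p)
    loads = []
    start = 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        loads.append(sum(lengths[start:start + size]))
        start += size
    return loads


def imbalance(loads):
    """Ratio of the heaviest load to the mean load (1.0 is perfect)."""
    return max(loads) / (sum(loads) / len(loads))


lengths = column_lengths(21)
print("wrap :", wrap_loads(lengths, 4), imbalance(wrap_loads(lengths, 4)))
print("block:", block_loads(lengths, 4), imbalance(block_loads(lengths, 4)))
```

A block partition in ascending order concentrates the long columns on the
first processor, which is why some reordering of the wavenumbers is needed
before a block partition balances well.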

If PX and PY do not evenly divide the dimensions of the physical domain,
then other load imbalances may be as large as a factor of 2 in the worst
case. 

-------------------------------------------------------------------------------
Give parameters that determine the problem size :

MM, NN, KK - specify the number of Fourier wavenumbers and the spectral
             truncation used. For a triangular truncation, MM = NN = KK.
NLON, NLAT, NVER
           - number of longitudes, latitudes, and vertical levels. There
             are required relationships between NLON, NLAT, and NVER, and
             between these and MM. These relationships are checked in the
             code. We will also provide a selection of input files that
             specify legal (and interesting) problems.
DT         - timestep (in seconds). (Must be small enough to satisfy the Courant
             stability condition. The code warns if DT is too large, but does
             not abort.)
TAUE       - end of model run (in hours)

-------------------------------------------------------------------------------
Give memory as function of problem size :

Executable size is determined at compile time by setting the parameters
COMPSZ in params.i. Per node memory requirements are approximately
(in REALs)

associated Legendre polynomial values:
   MM*MM*NLAT/(PX*PY)
physical grid fields: 
   8*NLON*NLAT*NVER/(PX*PY)
spectral grid fields: 
   3*MM*MM*NVER/(PX*PY) 
 or (if spectral coefficients duplicated within a processor column)
   3*MM*MM*NVER/PX
work space:
   8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/(PX*PY)
 or (if spectral coefficients duplicated within a processor column)
   8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/PX

where BUFS1 and BUFS2 are input parameters (number of communication buffers).
BUFS1 and BUFS2 can be as small as 0 and as large as PX or PY.

In standard test cases, NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1, so memory
requirements are approximately:

    (2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY)
  or
    (2 + 108*(1+BUFS1))*(M**3)/(4*PX*PY) + 3*(1+BUFS2)*(M**3)/(4*PX)
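
The term-by-term estimates above can be collected into a small calculator.
This is a sketch only: the function name is hypothetical, and it assumes that
the Legendre polynomial term is divided by the product PX*PY, as the other
terms are.

```python
def memory_reals(mm, nlon, nlat, nver, px, py, bufs1, bufs2,
                 duplicated=False):
    """Approximate per-node memory in REALs, following the terms above.

    duplicated=True selects the variant in which spectral coefficients
    are duplicated within a processor column (divide by PX only).
    """
    nodes = px * py
    spectral_div = px if duplicated else nodes
    legendre = mm * mm * nlat / nodes          # assumed: divided by PX*PY
    physical = 8 * nlon * nlat * nver / nodes
    spectral = 3 * mm * mm * nver / spectral_div
    work = (8 * nlon * nlat * nver * bufs1 / nodes
            + 3 * mm * mm * nver * bufs2 / spectral_div)
    return legendre + physical + spectral + work


# Example call with illustrative values only (not a recommended setup).
print(memory_reals(mm=42, nlon=128, nlat=64, nver=16,
                   px=2, py=2, bufs1=1, bufs2=1))
```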


-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :

for a serial run per timestep (very rough):
  nonlinear terms:
        10*NLON*NLAT*NVER
  forward FFT:
        40*NLON*NLAT*NVER*LOG2(NLON)
  forward LT and time update:
       48*MM*NLAT*NVER + 7*(MM**2)*NLAT*NVER
  inverse LT and calculation of velocities:
       20*MM*NLAT*NVER + 14*(MM**2)*NLAT*NVER
  inverse FFT:
       25*NLON*NLAT*NVER*LOG2(NLON)

Using standard assumptions (NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1):

approx. 460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4) flops per timestep.

For a total run, multiply by the number of timesteps, TAUE/DT (with TAUE and
DT expressed in the same units).
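
The per-timestep counts above can be evaluated directly, as in the sketch
below. The function names are hypothetical. Since TAUE is given in hours and
DT in seconds in the parameter list, the sketch converts TAUE to seconds
before computing the number of timesteps. The counts themselves are described
above as very rough, so the results are order-of-magnitude estimates only.

```python
from math import log2


def flops_per_timestep(mm, nlon, nlat, nver):
    """Sum of the rough serial operation counts for one timestep."""
    grid = nlon * nlat * nver
    nonlinear = 10 * grid
    forward_fft = 40 * grid * log2(nlon)
    forward_lt = 48 * mm * nlat * nver + 7 * mm**2 * nlat * nver
    inverse_lt = 20 * mm * nlat * nver + 14 * mm**2 * nlat * nver
    inverse_fft = 25 * grid * log2(nlon)
    return nonlinear + forward_fft + forward_lt + inverse_lt + inverse_fft


def total_flops(mm, nlon, nlat, nver, taue_hours, dt_seconds):
    """Per-timestep count times the number of timesteps in the run."""
    steps = taue_hours * 3600.0 / dt_seconds
    return steps * flops_per_timestep(mm, nlon, nlat, nver)


# Illustrative call; the argument values here are examples only.
print(total_flops(mm=42, nlon=128, nlat=64, nver=16,
                  taue_hours=120.0, dt_seconds=2400.0))
```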

-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :

This is a function of the algorithm chosen.

I) transpose FFT
   a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
        2*(PX-1) steps, D volume
      or
        2*LOG2(PX) steps, D*LOG2(PX) volume 

II) distributed FFT
   a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
        2*LOG2(PX) steps, D*LOG2(PX) volume

III) transpose LT

   a) forward LT:  let D = 8*NLON*NLAT*NVER/(PX*PY)
        2*(PY-1) steps, D volume
      or
        2*LOG2(PY) steps, D*LOG2(PY) volume 

   b) inverse LT:  let D = (3/2)*(MM**2)*NVER/(PX*PY)
        (PY-1) steps, D volume
       or
        LOG2(PY) steps, D*PY volume

IV) distributed LT

   a) forward + inverse LT:  let D = 3*(MM**2)*NVER/(PX*PY)
        2*(PY-1) steps, D*PY volume
       or
        2*LOG2(PY) steps, D*PY volume

These are per timestep costs. Multiply by the number of timesteps, TAUE/DT
(in consistent units), for the total communication overhead.
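
As an example of how these estimates might be used, the sketch below
evaluates the FFT entries (I and II) only. The function name and its
arguments are hypothetical; the "logarithmic" flag selects the second of the
two transpose variants listed above and is ignored for the distributed FFT.

```python
from math import log2


def fft_comm_cost(nlon, nlat, nver, px, py, algorithm, logarithmic=False):
    """Return (steps, volume) per timestep for forward + inverse FFT."""
    d = 13 * nlon * nlat * nver / (px * py)
    if algorithm == "transpose" and not logarithmic:
        # 2*(PX-1) steps, D volume
        return 2 * (px - 1), d
    if algorithm in ("transpose", "distributed"):
        # 2*LOG2(PX) steps, D*LOG2(PX) volume
        return 2 * log2(px), d * log2(px)
    raise ValueError("algorithm must be 'transpose' or 'distributed'")


# Compare the variants on an 8x2 grid; the problem size is illustrative.
for name, flag in (("transpose", False), ("transpose", True),
                   ("distributed", False)):
    print(name, flag, fft_comm_cost(128, 64, 16, 8, 2, name, flag))
```

The first transpose variant takes more steps but moves less data, so which
one is cheaper depends on the message startup cost relative to the bandwidth
of the target machine.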

-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files,
memory required, and number of floating point operations) :

Standard input files will be provided for 

T21: MM=KK=NN=21      T42: MM=KK=NN=42        T85: MM=NN=KK=85
     NLON=32               NLON=64                 NLON=128
     NLAT=64               NLAT=128                NLAT=256
     NVER=8                NVER=16                 NVER=32
     ICOND=2               ICOND=2                 ICOND=2
     DT=4800.0             DT=2400.0               DT=1200.0
     TAUE=120.0            TAUE=120.0              TAUE=120.0

These are 5 day runs of the "benchmark" case specified in Williamson, et al
[3]. Flops and memory requirements for serial runs are as follows (approx.):

T21:           500,000 REALs
         2,000,000,000 flops
     
T42:         4,000,000 REALs
        45,000,000,000 flops

T85:        34,391,000 REALs
     1,000,000,000,000 flops

Both memory and flops scale well, so, for example, the T42 run fits in
approx. 4MB of memory for a 4 processor run. But different algorithms and 
different aspect ratios of the processor grid use different amounts of memory.

-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :

Count by hand (looking primarily at inner loops, but eliminating common
subexpressions that compiler is expected to find).

-------------------------------------------------------------------------------
Other relevant information:



-------------------------------------------------------------------------------
From owner-pbwg-compactapp@CS.UTK.EDU Fri Oct  8 09:17:11 1993
Message-Id: <9310081316.AA20027@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: Compact applications chapter
Date: Fri, 08 Oct 93 09:16:19 -0500
From: David W. Walker <walker@rios2.epm.ornl.gov>


I just sent the following to Mike Berry, but some of you might also like to make
suggestions.

David

Mike,
	I am at a bit of a loss as to what to put into the ParkBench report
for Compact Applications since we haven't had any codes submitted (except
for maybe 2 or 3).  It seems to me that we can't really say much without
the codes, apart from very general requirements.

David
From owner-pbwg-compactapp@CS.UTK.EDU Fri Oct  8 10:17:35 1993
Date: Fri, 8 Oct 93 10:16:56 EDT
From: worley@haven.EPM.ORNL.GOV (Pat Worley)
Message-Id: <9310081416.AA15407@haven.EPM.ORNL.GOV>
To: walker@rios2.epm.ornl.gov, pbwg-compactapp@cs.utk.edu
Subject: Re: Compact applications chapter
In-Reply-To: Mail from 'David W. Walker <walker@rios2.epm.ornl.gov>'
      dated: Fri, 08 Oct 93 09:16:19 -0500
Cc: worley@haven.EPM.ORNL.GOV

>I just sent the following to Mike Berry, but some of you might also like to make
>suggestions.
>
>David
>
>Mike,
>I am at a bit of a loss as to what to put into the ParkBench report
>for Compact Applications since we haven't had any codes submitted (except
>for maybe 2 or 3).  It seems to me that we can't really say much without
>the codes, apart from very general requirements.
>
>David

Since I imagine that there will always be a dearth of (good) compact
applications, a requirements document (or, at least, a wish list) would be a
useful contribution, particularly if the wishlist were prioritized by what is
most important for the code to have, e.g.,

1) scientific relevance (does anyone care about this type of problem)
2) numerical relevance (are the numerical algorithms representative or
   interesting) 
3) algorithmic relevance (are the parallel algorithms representative or
   interesting)
4) portability (language, parallel programming model, etc.)
5) runability (easy to run, easy to validate results, easy to use for
   benchmarking)
6) ...

This can probably be broken into requirements and desirable features.

Pat

From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 14 13:38:54 1993
Date: Thu, 14 Oct 1993 13:37:27 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9310141737.AA19646@rios2.epm.ornl.gov>
To: berry@cs.utk.edu
Subject: ParkBench compact applications
Cc: pbwg-compactapp@cs.utk.edu


Mike,
	Below is the latest version of the Compact Applications section of the
ParkBench document. I also intend to send a LaTeX version of the submission
form to you later today for inclusion as Appendix A. I hope the other members
of the subcommittee will send comments on this section, and that there will
be an opportunity to update it.

David
%file: compac3.tex
%date: October 14, 1993
\chapter{Compact Applications}
\footnote{assembled by David Walker for Compact Applications subcommittee}

\section{Introduction}
\label{sec:compact.intro}
While kernel applications, such as those described in Chapter 3, provide
a fairly straightforward way of assessing the performance of parallel
systems, they are not representative of scientific applications in general
since they do not reflect certain types of system behavior. In particular,
many scientific applications involve data movement between phases of
an application, and may also require significant amounts of I/O. These types
of behavior are difficult to gauge using kernel applications. 

One factor
that has hindered the use of full application codes for benchmarking parallel
computers in the past is that such codes are difficult to parallelize and to
port between target architectures. In addition, full application codes that
have been successfully parallelized are often proprietary, and/or subject
to distribution restrictions. To minimize the negative impact of these factors
we propose to make use of compact applications in our benchmarking effort.

Compact applications are typical of those found in research environments 
(as opposed to production or engineering environments), and usually consist of 
up to a few thousand lines of source code. Compact applications are distinct 
from kernel applications since they are capable of producing scientifically
useful results. In many cases, compact applications are made up of several
kernels, interspersed with data movements and I/O operations between the 
kernels.

In this chapter the criteria for selecting compact applications
for the ParkBench suite will be discussed. In addition, the general 
research areas that will be represented in the suite are outlined.

%In this chapter we will discuss a number of compact applications in terms of 
%their purpose, the algorithms used, the types of data movements required, 
%the memory requirements, and
%the amount of I/O. The compact application below are not meant to form a 
%definite or complete list.

\section{Criteria for Selection}
\label{sec:criteria}
The three main criteria for inclusion of a parallel code
in the Compact Applications suite are as follows.
\begin{enumerate}
\item
The code must be a complete application and be capable of producing results
of research interest. These two points distinguish a compact application from
a kernel. For example, a code that only solves a randomly-generated, dense, 
linear system by LU factorization should be considered a kernel. Even though 
the code is complete, it does not produce results of research interest. 
However, if the LU factorization is embedded in an application that uses
the boundary element method to solve, for example, a two-dimensional
elastodynamics problem, then such an application could legitimately be
considered a compact application. 
Compact applications and full production codes are distinguished by their
software complexity, which is difficult to quantify. Software complexity gives
an indication of how hard it is to write, port and maintain an application, 
and may be gauged very roughly by the length of the source code. However, there
is no hard upper limit on the length of a code in the Compact Applications 
suite.  It is expected that the source code (excluding comments and repeated 
common blocks) for most compact applications will be between 2000 and 10000 
lines, but some may be longer.

\item
The code must be of high quality. This means it must have been extensively
tested and validated, preferably on a wide selection of different parallel
architectures. The problem size and number of processors used must not be
hard-coded into the application, and should be specified at runtime as input 
to the program. Ideally, the parallel code should not impose restrictions on 
the problem size that are not applicable for the corresponding sequential code.
Thus, the parallel code should not require that the problem size be exactly 
divisible by the number of processors, or that the number of processors be 
a power of two. In some cases this latter requirement may have to be relaxed.
For example, most parallel fast Fourier transform routines require the number
of processors to be a power of two. It is preferable that the code be
written so that it works correctly for
an arbitrary one-to-one mapping between the logical process topology of the
application and the hardware topology of the parallel computer.
This is desirable so
that the assignment of a location in the logical process topology to a
physical processor can be easily adjusted when porting
the application between platforms. For example, a Gray code assignment may
be best for a hypercube, and a natural ordering for a mesh architecture.

\item
The application must be well documented. The source code itself should 
contain an adequate number of comments, and each module should begin
with a comment section that describes what the routine does, and the
arguments passed to it. In addition, there should be a ``Users' Guide''
to the application that describes the input and output, the parameterization
of the problem size and processor layout, and details of what the application
does. The Users' Guide should also contain a bibliography of related
papers.
\end{enumerate}

In addition to the three criteria discussed above, there are a number of
other desirable features that a ParkBench Compact Application should have.
These are discussed in the following subsections.

\subsection{Self Checking Applications}
\label{subsec:checking}
The application should be self-checking. That is, at the end of the computation
the application should perform a check to validate the results of the run.
The application may also output a summary of performance results for the run,
such as the Mflop rate, and other pertinent information.

\subsection{Programming Languages}
\label{subsec:languages}
The code should be written in Fortran 77, Fortran 90, High Performance Fortran,
or C. Data should be passed between processors by explicit message passing.
ParkBench does not specify which message passing system should be used, but
one that is available on a number of parallel platforms is preferable. 
Eventually it is expected that MPI will become the message passing system
of choice, but in the meantime portable systems such as PVM, PICL, Express,
PARMACS, and P4 are acceptable alternatives. The codes in the
Compact Applications suite should not contain any assembly coded portions,
although assembly code may be used in optimized versions of the code.

\section{Proposed Compact Application Benchmarks}
\label{sec:compact.proposed}
At the time of writing (October 1993) the ParkBench organization is in
the process of soliciting submission of applications for inclusion in
the Compact Applications suite. Thus, the applications that comprise the suite
cannot yet be listed here. However, in this section the main application areas
that are expected to be in the suite are outlined. The intention is that
these areas should be representative of the fields in which parallel
computers are actually used. The codes should exercise a number of different
algorithms, and possess different communication and I/O characteristics.
Initially the Compact Applications suite will
consist of no more than ten codes. This restriction is imposed so that
the resources needed to manage and distribute the suite can be assessed. The
suite may be enlarged in the future if this seems manageable.
Below is a list of the application areas that are expected to be
represented in the suite. This is
not meant to be an exclusive list; submissions from other application areas
will be considered for inclusion in the suite.
\begin{itemize}
\item
Climate and meteorological modeling
\item
Computational fluid dynamics (CFD)
\item
Finance, e.g., portfolio optimization
\item
Molecular dynamics
\item
Plasma physics
\item
Quantum chemistry
\item
Quantum chromodynamics (QCD)
\item
Reservoir modeling
\end{itemize}

\section{Submitting to the Compact Application Suite}
\label{sec:submit}
The procedure for submitting codes to the ParkBench Compact Applications suite
is as follows.
\begin{enumerate}
\item
Complete the submission form in Appendix A, and email it to David Walker
at walker@msr.epm.ornl.gov. The data on this form will be reviewed
by the ParkBench Compact Applications Subcommittee, and the submitter will
be notified if the application is to be considered further for
inclusion in the ParkBench suite.
\item
If the ParkBench Compact Applications Subcommittee decides to consider
the application further, the submitter will be asked to submit the source code
and input and output files, together with any documentation and papers
about the application. Source code and input and output files should
be submitted by email or ftp, unless the files are very large, in
which case a tar file on a 1/4 inch cassette tape should be sent. Wherever
possible, email submission is preferred for all documents, in man page, \LaTeX{}
and/or PostScript format. These files, documents, and papers together
constitute the application package. The application package should
be sent to the following address, and the subcommittee will then make a final 
decision on whether to include the application in the ParkBench suite.\par
\smallskip
\indent David W. Walker\par
\indent Oak Ridge National Laboratory\par
\indent Bldg.~6012/MS-6367\par
\indent P. O. Box 2008\par
\indent Oak Ridge, TN 37831-6367\par
\indent (615) 574-7401/0680 (phone/fax)\par
\indent walker@msr.epm.ornl.gov\par

\item
If the application is approved for inclusion in the ParkBench suite
an authorized person from the submitting organization will be asked
to complete and sign a form giving ParkBench authority to distribute,
and modify (if necessary), the application package.
From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:51:57 1993
Date: Thu, 28 Oct 1993 08:51:41 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9310281251.AA13437@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: Compact Appl. Submissions


So far I've received 3 submissions for the ParkBench Compact
Applications suite. I'm sending you the completed forms in 3 
separate email messages.

David
From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:52:38 1993
Date: Thu, 28 Oct 1993 08:52:21 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9310281252.AA11913@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: POLMP Compact Application


-------------------------------------------------------------------------------
Name of Program         : POLMP
                 (Proudman Oceanographic Laboratory Multiprocessing Program)
-------------------------------------------------------------------------------
Submitter's Name        : Mike Ashworth
Submitter's Organization: NERC Computer Services
Submitter's Address     : Bidston Observatory
			  Birkenhead, L43 7RA, UK
Submitter's Telephone # : +44-51-653-8633
Submitter's Fax #       : +44-51-653-6269
Submitter's Email       : mia@ua.nbi.ac.uk
-------------------------------------------------------------------------------
Cognizant Expert 	: Mike Ashworth
CE's Organization	: NERC Computer Services
CE's Address     	: Bidston Observatory
			  Birkenhead, L43 7RA, UK
CE's Telephone # 	: +44-51-653-8633
CE's Fax #       	: +44-51-653-6269
CE's Email       	: mia@ua.nbi.ac.uk
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :

Bearing in mind other commitments, Mike Ashworth is prepared to respond 
quickly to questions and bug reports, and expects to be kept informed as 
to results of experiments and modifications to the code.

-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Ocean and Shallow Sea Modeling
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :

     The POLMP project was created to develop numerical
     algorithms for shallow sea 3D hydrodynamic models that run
     efficiently on modern parallel computers. A code was
     developed, using a set of portable programming conventions
     based upon standard Fortran 77, which follows the wind
     induced flow in a closed rectangular basin including a number
     of arbitrary land areas. The model solves a set of
     hydrodynamic partial differential equations, subject to a set of
     initial conditions, using a mixed explicit/implicit forward time
     integration scheme. The explicit component corresponds to a
     horizontal finite difference scheme and the implicit to a
     functional expansion in the vertical (Davies, Grzonka and
     Stephens, 1989).

     By the end of 1989 the code had been implemented on the RAL
     4 processor Cray X-MP using Cray's microtasking system,
     which provides parallel processing at the level of the Fortran
     DO loop. Acceptable parallel performance was achieved by
     integrating each of the vertical modes in parallel, referred to
     in Ashworth and Davies (1992) as vertical partitioning. In
     particular, a speed-up of 3.15 over single processor execution
     was obtained, with an execution rate of 548 Megaflops
     corresponding to 58 per cent of the peak theoretical
     performance of the machine. Execution on an 8 processor Cray
     Y-MP gave a speed-up of 7.9 and 1768 Megaflops, or
     67 per cent of the peak (Davies, Proctor and O'Neill, 1991).
     The latter resulted in Davies and Grzonka being awarded a
     prize in the 1990 Cray Gigaflop Performance Awards.

     The project has been extended by implementing the shallow
     sea model in a form which is more appropriate to a variety of
     parallel architectures, especially distributed memory
     machines, and to a larger number of processors. It is especially
     desirable to be able to compare shared memory parallel
     architectures with distributed memory architectures. Such a
     comparison is currently relevant to NERC science generally
     and will be a factor in the considerations for the purchase of
     new machines, bids for allocations on other academic
     machines, and for the design of new codes and the
     restructuring of existing codes.

     In order to simplify development of the new code and to ensure
     a proper comparison between machines, a restructured version
     of the Davies and Grzonka rectangle was designed which will
     perform partitioning of the region in the horizontal dimension.
     This has the advantage over vertical partitioning that the
     communication between processors is limited to a few points
     at the boundaries of each sub-domain. The ratio of interior
     points to boundary points, which determines the ratio of
     computation to communication and hence the efficiency on
     message passing, distributed memory machines, may be
     increased by increasing the size of the individual sub-domains.
     This design may also improve the efficiency on shared memory
     machines by reducing the time of the critical section and
     reducing memory conflicts between processors. In addition, the
     required number of vertical modes is only about 16, which,
     though well suited to a 4 or 8 processor machine, does not
     contain sufficient parallelism for more highly parallel
     machines.

     The code has been designed with portability in mind, so that
     essentially the same code may be run on parallel computers
     with a range of architectures. 

-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

Yes, but users are requested to acknowledge the authors (Ashworth and
Davies) in any resulting research or publications, and are
encouraged to send reprints of their work with this code to the authors.
Also, the authors would appreciate being notified of any modifications to 
the code. 

-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

Some 8 byte floating point numbers are used in some of the initialization
code, but calculations on the main field arrays may be done using
4 byte floating point variables without grossly affecting the solution.
Nevertheless, precision conversion is facilitated by a switch supplied
to the C preprocessor. By specifying -DSINGLE, variables will be declared
as REAL, normally 4 bytes, whereas -DDOUBLE will cause declarations to be
DOUBLE PRECISION, normally 8 bytes.

-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

The README file supplied with the code describes how the various versions
of the code should be built. Extensive documentation, including the 
definition of all variables in COMMON, is present as comments in the code.

-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

1) Davies, A.M., Formulation of a linear three-dimensional hydrodynamic
   sea model using a Galerkin-eigenfunction method, Int. J. Num. Meth.
   in Fluids, 1983, Vol. 3, 33-60.

2) Davies, A.M., Solution of the 3D linear hydrodynamic equations using
   an enhanced eigenfunction approach, Int. J. Num. Meth. in Fluids,
   1991, Vol. 13, 235-250.

-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

1) Ashworth, M. and Davies, A.M., Restructuring three-dimensional
   hydrodynamic models for computers with low and high degrees of
   parallelism, in Parallel Computing '91, eds D.J.Evans, G.R.Joubert
   and H.Liddell (North Holland, 1992), 553-560.
   
2) Ashworth, M., Parallel Processing in Environmental Modelling, in
   Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
   Meteorology (Nov. 23-27, 1992)
   Hoffman, G.-R and T. Kauranne, ed., 
   World Scientific Publishing Co. Pte. Ltd, Singapore, 1993.

3) Ashworth, M. and Davies, A.M., Performance of a Three Dimensional
   Hydrodynamic Model on a Range of Parallel Computers, in
   Proceedings of the Euromicro Workshop on Parallel and Distributed
   Computing, Gran Canaria 27-29 January 1993, pp 383-390, (IEEE
   Computer Society Press)
   
4) Davies, A.M., Ashworth, M., Lawrence, J., O'Neill, M.,
   Implementation of three dimensional shallow sea models on vector
   and parallel computers, 1992a, CFD News, Vol. 3, No. 1, 18-30.
   
5) Davies, A.M., Grzonka, R.G. and Stephens, C.V., The implementation
   of hydrodynamic numerical sea models on the Cray X-MP, 1992b, in
   Advances in Parallel Computing, Vol. 2, edited D.J. Evans.
   
6) Davies, A.M., Proctor, R. and O'Neill, M., "Shallow Sea
   Hydrodynamic Models in Environmental Science", Cray Channels,
   Winter 1991.

-------------------------------------------------------------------------------
Other relevant research papers:

-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :

Code is initially passed through the C preprocessor, allowing a 
number of versions with different programming styles, precisions
and machine dependencies to be generated.

Fortran 77 version

     A sequential version of POLMP is available, which conforms
     to the Fortran 77 standard. This version has been run on a
     large number of machines from workstations to supercomputers 
     and any code which caused problems, even if it conformed to 
     the standard, has been changed or removed. Thus its conformance 
     to the Fortran 77 standard is well established.

     In order to allow the code to run on a wide range of problem
     sizes without recompilation, the major arrays are defined
     dynamically by setting up pointers, with names starting with
     IX, which point to locations in a single large data array: SA.
     Most pointers are allocated in subroutine MODSUB and the
     starting location passed down into subroutines in which they
     are declared as arrays. For example :

     IX1 = 1
     IX2 = IX1 + N*M
     CALL SUB ( SA(IX1), SA(IX2), N, M )

     SUBROUTINE SUB ( A1, A2, N, M )
     DIMENSION A1(N,M), A2(N,M)
     END

     Although this is probably against the spirit of the Fortran 77
     standard, it is considered the best compromise between
     portability and utility, and has caused no problems on any of
     the machines on which it has been tried. 
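
     The same idea can be sketched outside Fortran. The Python fragment
     below is only an illustration and is not POLMP code: the names
     allocate and View2D are hypothetical, indices are zero-based rather
     than one-based, and column-major storage is assumed in order to
     mirror the Fortran array layout.

```python
from array import array


def allocate(sizes):
    """Return (total, starts): the start offset of each block in one flat pool."""
    starts = []
    next_free = 0
    for size in sizes:
        starts.append(next_free)
        next_free += size
    return next_free, starts


class View2D:
    """An n x m column-major window onto a shared flat pool (no copying)."""

    def __init__(self, pool, start, n, m):
        self.pool, self.start, self.n, self.m = pool, start, n, m

    def _offset(self, i, j):
        if not (0 <= i < self.n and 0 <= j < self.m):
            raise IndexError((i, j))
        return self.start + i + j * self.n

    def __getitem__(self, ij):
        return self.pool[self._offset(*ij)]

    def __setitem__(self, ij, value):
        self.pool[self._offset(*ij)] = value


n, m = 3, 4
total, (ix1, ix2) = allocate([n * m, n * m])
sa = array("d", [0.0] * total)   # the single large data array
a1 = View2D(sa, ix1, n, m)       # plays the role of A1(N,M)
a2 = View2D(sa, ix2, n, m)       # plays the role of A2(N,M)
a2[0, 0] = 7.0                   # lands at sa[ix2]
```

     Because the offsets are computed at run time, changing n and m only
     changes where each window starts in the pool; nothing needs to be
     redeclared.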

     The code has been run on a number of traditional vector
     supercomputers, mainframes and workstations. In addition,
     key loops can be parallelized automatically by some
     compilers on shared (or virtual shared) memory MIMD machines, 
     allowing parallel execution on the Convex C2 and C3, Cray X-MP, 
     Y-MP, and Y-MP/C90, and Kendall Square Research KSR-1. Cray 
     macrotasking calls may also be enabled for an alternative
     mode of parallel execution on Cray multiprocessors.

Message passing version

     POLMP has been implemented on a number of message-passing machines:
     Intel iPSC/2 and iPSC/860, Meiko CS-1 i860 and CS-2 and nCUBE 2.
     Code is also present for the PVM and Parmacs portable message
     passing systems, and POLMP has run successfully, though not 
     efficiently, on a network of Silicon Graphics workstations. 
     Calls to message passing routines are concentrated 
     in a small number of routines for ease of portability and 
     maintenance. POLMP performs housekeeping tasks on one node of the 
     parallel machine, usually node zero, referred to in the code as the 
     driver process, the remaining processes being workers. For Parmacs
     version 5 which requires a host program, a simple host program has 
     been provided which loads the node program onto a two dimensional 
     torus and then takes no further part in the run, other than to 
     receive a completion code from the driver, in case terminating the 
     host early would interfere with execution of the nodes.

Data parallel versions

     A data parallel version of the code has been run on the
     Thinking Machines CM-2, CM-200 and MasPar MP-1 machines.

     High Performance Fortran (HPF) defines extensions to the
     Fortran 90 language in order to provide support for parallel
     execution on a wide variety of machines using a data parallel
     programming model. 

     The subset-HPF version of the POLMP code has been written
     to the draft standard specified by the High Performance
     Fortran Forum in the HPF Language Specification version 0.4
     dated November 6, 1992. Fortran 90 code was developed on a
     Thinking Machines CM-200 machine and checked for
     conformance with the Fortran 90 standard using the
     NAGWare Fortran 90 compiler. HPF directives were inserted
     by translating from the CM Fortran directives, but have not
     been tested due to the lack of access to an HPF compiler. The
     only HPF features used are the PROCESSORS, TEMPLATE,
     ALIGN and DISTRIBUTE directives and the system inquiry
     intrinsic function NUMBER_OF_PROCESSORS.

-------------------------------------------------------------------------------
Total number of lines in source code: 26,699
Number of lines excluding comments  : 11,313
Size in bytes of source code        : 756,107

-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :

steering file:   13 lines, 250 bytes, ascii (typical size)

-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :

standard output: 700 lines, 62,000 bytes, ascii (typical size)

-------------------------------------------------------------------------------
Brief, high-level description of what application does:

POLMP solves the linear three-dimensional hydrodynamic equations 
for the wind induced flow in a closed rectangular basin of constant depth
which may include an arbitrary number of land areas. 

-------------------------------------------------------------------------------
Main algorithms used:

The discretized form of the hydrodynamic equations is solved for the field
variables z (surface elevation) and u and v (horizontal components of
velocity). The fields are represented in the horizontal by a staggered 
finite difference grid. The profile of vertical velocity with depth
is represented by the superposition of a number of spectral components.
The functions used in the vertical are arbitrary, although the 
computational advantages of using eigenfunctions (modes) of the eddy
viscosity profile have been demonstrated (Davies, 1983). Velocities
at the closed boundaries are set to zero.

Each timestep in the forward time integration of the model involves
successive updates to the three fields, z, u and v. New field values 
computed in each update are used in the subsequent calculations. A
five point finite difference stencil is used, requiring only nearest 
neighbours on the grid. 

A number of different data storage and data processing methods are
included (e.g. index arrays, packed data), mainly for handling cases with
significant amounts of land. In particular the program may be 
switched between masked operation, more suitable for vector processors, 
in which computation is done on all points, but land and boundary points
are masked out, and strip-mining, more suitable for scalar and RISC 
processors, in which calculations are only done for sea points.
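
The difference between the two modes can be illustrated with a toy
one-dimensional update. This Python sketch is not taken from POLMP: the
function names, the update rule (adding a constant) and the 1 = sea,
0 = land mask convention are assumptions, chosen only to show that both
modes give the same result while doing different amounts of work.

```python
def masked_update(field, mask, increment):
    """Compute on every point, then multiply by the mask (1 = sea, 0 = land)."""
    return [value + mask_i * increment for value, mask_i in zip(field, mask)]


def strip_mined_update(field, sea_index, increment):
    """Compute only at the precomputed indices of sea points."""
    result = list(field)
    for i in sea_index:
        result[i] += increment
    return result


mask = [1, 1, 0, 0, 1, 0, 1, 1]
sea_index = [i for i, is_sea in enumerate(mask) if is_sea]
field = [0.0] * len(mask)

a = masked_update(field, mask, 0.5)            # touches all 8 points
b = strip_mined_update(field, sea_index, 0.5)  # touches only the 5 sea points
```

The masked form does redundant arithmetic on land points but has a
regular, unconditional loop body; the indexed form does less arithmetic
at the cost of indirect addressing.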

-------------------------------------------------------------------------------
Skeleton sketch of application:

The call chart of the major subroutines is represented thus:

  AAAPOL -> APOLMP -> INIT
                   -> RUNPOL -> INIT2  -> MAP
                                       -> DIVIDE
                                       -> PRMAP
                                       -> GENSTP
                                       -> SPEC   -> ROOTS  -> TRANS
                             -> SNDWRK
                             -> RCVWRK
                             -> SETUP
                             -> MODSUB -> MODEL  -> ASSIGN -> GENMSK
                                                           -> GENSTP
                                                           -> GENIND
                                                           -> GENPAC
                                                           -> METRIC
                                                 -> CLRFLD
                                                 -> TIME*  -> SNDBND
                                                           -> RCVBND
                                                 -> RESULT
                             -> SNDRES
                             -> RCVRES
                             -> MODOUT -> OZUVW  -> OUTFLD -> GETRES
                                                           -> OUTARR
                                                           -> GRYARR
                                       -> WSTATE

AAAPOL is a dummy main program calling APOLMP. APOLMP calls INIT which
reads parameters from the steering file, checks and monitors them.
RUNPOL is then called which calls another initialization routine INIT2.
Called from INIT2, MAP forms a map of the domain to be modelled, DIVIDE
divides the domain between processors, PRMAP maps sub-domains onto
processors, GENSTP counts indexes for strip-mining, and SPEC, ROOTS
and TRANS set up the coefficients for the spectral expansion.

SNDWRK on the driver process sends details of the sub-domain to be
worked on to each worker. RCVWRK receives that information. SETUP
does some array allocation and MODSUB does the main allocation of array 
space to the field and ancillary arrays. MODEL is the main driver 
subroutine for the model. ASSIGN calls routines to generate masks,
strip-mining indexes, packing indexes and measurement metrics.
CLRFLD initializes the main data arrays. Then one of seven time-
stepping routines, TIME*, is chosen dependent on the vectorization
and packing/indexing method used to cope with the presence of land.
SNDBND and RCVBND handle the sending and reception of boundary
data between sub-domains. After the required number of time-steps
is complete, RESULT saves results from the desired region, and
SNDRES on the workers and RCVRES on the driver collect the result data.
MODOUT handles the writing of model output to standard output and disk
files, as required.

For a non-trivial run, 99% of time is spent in whichever of the 
timestepping routines, TIME*, has been chosen.

-------------------------------------------------------------------------------
Brief description of I/O behavior:

The driver process, usually processor 0, reads in the input parameters 
and broadcasts them to the rest of the processors. The driver also receives 
the results from the other processors and writes them out.

-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :

The processors are treated as a logical 2-D grid. The simulation domain
is divided into a number of sub-domains which are allocated, one sub-domain
per processor.

-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :

The number of processors, p, and the number of sub-domains are provided 
as steering parameters, as is a switch which requests either one-dimensional
or two-dimensional partitioning. 

Partitioning is only actually carried out for the message passing versions
of the code. For two-dimensional partitioning p is factored into px and py 
where px and py are as close as possible to sqrt(p). 
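
One way to obtain such a factorization is sketched below in Python. This
is an assumed method, not the routine used in POLMP: the name factor_grid
is hypothetical, and the convention that px is the smaller factor is an
arbitrary choice.

```python
import math


def factor_grid(p):
    """Split p into (px, py) with px * py == p, px <= py, both near sqrt(p)."""
    # Search downwards from floor(sqrt(p)); the first divisor found is the
    # largest factor not exceeding sqrt(p), so px and py are as close as possible.
    for px in range(math.isqrt(p), 0, -1):
        if p % px == 0:
            return px, p // px
    raise ValueError("p must be a positive integer")


print(factor_grid(16))  # (4, 4)
print(factor_grid(12))  # (3, 4)
print(factor_grid(7))   # (1, 7)
```

A prime p degenerates to a 1 x p grid, which is equivalent to
one-dimensional partitioning.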

For the data parallel version the number of sub-domains is set to one 
and decomposition is performed by the compiler via data distribution 
directives.

-------------------------------------------------------------------------------
Brief description of load balance behavior :

Unless land areas are specified, the load is fairly well balanced. 
If px and py evenly divide the number of grid points, then the
model is perfectly balanced except that boundary sub-domains have 
fewer communications.
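
The effect of an uneven division can be estimated with a short Python
sketch. It assumes that leftover grid points are spread one per block
over the first blocks; this is a common convention and may differ from
what POLMP's DIVIDE routine actually does. The function names are
hypothetical, and land areas are ignored.

```python
def block_sizes(n, parts):
    """Split n grid points into `parts` contiguous blocks differing by at most one."""
    base, extra = divmod(n, parts)
    return [base + 1 if k < extra else base for k in range(parts)]


def imbalance(nx, ny, px, py):
    """Ratio of the largest sub-domain to the mean sub-domain size (1.0 = balanced)."""
    largest = max(block_sizes(nx, px)) * max(block_sizes(ny, py))
    return largest * px * py / (nx * ny)


print(block_sizes(10, 4))          # [3, 3, 2, 2]
print(imbalance(512, 512, 4, 4))   # 1.0
print(imbalance(10, 10, 4, 4))     # 1.44
```

Since every processor waits for the slowest one at each boundary
exchange, the largest sub-domain sets the time per timestep.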

No tests with land areas have yet been performed with the parallel 
code, and more sophisticated domain decomposition algorithms have
not yet been included.

-------------------------------------------------------------------------------
Give parameters that determine the problem size :

nx, ny      Size of horizontal grid
m           Number of vertical modes
nts         Number of timesteps to be performed

-------------------------------------------------------------------------------
Give memory as function of problem size :

See below for specific examples.

-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :

Assuming standard compiler optimizations, there is a requirement for
29 floating point operations (18 add/subtracts and 11 multiplies) per 
grid point, so the total computational load is

          29 * nx * ny * m * nts
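
As a worked example, the expression can be evaluated directly. The
Python helper below is illustrative only; the name total_flops is
hypothetical.

```python
FLOPS_PER_POINT = 29  # 18 add/subtracts + 11 multiplies, as stated above


def total_flops(nx, ny, m, nts):
    """Total floating point operations: 29 * nx * ny * m * nts."""
    return FLOPS_PER_POINT * nx * ny * m * nts


print(total_flops(512, 512, 16, 200))  # 24326963200
```

For a 512 x 512 grid with 16 modes and 200 timesteps this gives roughly
24 x 10^9 operations.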

-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :

During each timestep each sub-domain of size nsubx=nx/px by nsuby=ny/py 
requires the following communications in words :

             nsubx * m     from N
             nsubx         from S
             nsubx * m     from S
             nsuby * m     from W
             nsuby         from E
             nsuby * m     from E
             m             from NE
             m             from SW

making a total of 

             (2 * m + 1)*(nsubx + nsuby) + 2*m words 

in eight messages from six directions.
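
The per-timestep volume can be checked by summing the eight messages
listed above. This Python sketch assumes an interior sub-domain and that
px and py divide nx and ny exactly; the function name is hypothetical.

```python
def words_per_timestep(nx, ny, m, px, py):
    """Sum the eight boundary messages received by one sub-domain per timestep."""
    nsubx, nsuby = nx // px, ny // py
    messages = {
        "N": [nsubx * m],
        "S": [nsubx, nsubx * m],
        "W": [nsuby * m],
        "E": [nsuby, nsuby * m],
        "NE": [m],
        "SW": [m],
    }
    return sum(sum(sizes) for sizes in messages.values())


print(words_per_timestep(512, 512, 16, 4, 4))  # 8480
```

For a 512 x 512 grid with 16 modes on a 4 x 4 processor grid, each
sub-domain receives 8480 words per timestep while updating
128 * 128 * 16 = 262144 mode values, so the volume grows with the
perimeter of the sub-domain rather than its area.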

-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files,
memory required, and number of floating point operations) :

     The data sizes and computational requirements for the various
     problems supplied are :

     Name      nx x ny x m x nts        Computational    Memory
                                        Load (Gflop)     (Mword)

     dbg        10 x   10 x  1 x 2      Small debugging test case

     dbg2d      10 x   10 x  1 x 2      Small debugging test case
                                        for a 2 x 2 decomposition

     v200      512 x  512 x 16 x 200        24             14 

     wa200    1024 x 1024 x 40 x 200       226            126

     xb200    2048 x 2048 x 80 x 200      1812            984

     The memory sizes are the number of Fortran real elements
     (words) required for the strip-mined case on a single processor.
     For the masked case the memory requirement is approximately doubled 
     for the extra mask arrays. For the message passing versions, the 
     total memory requirement will also tend to increase slightly (<10%) 
     with the number of processors employed.

-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :

Count by hand looking at inner loops and making reasonable assumptions
about common compiler optimizations.

-------------------------------------------------------------------------------
Other relevant information:



-------------------------------------------------------------------------------

-- 
                                    ,?,
                                   (o o)
|------------------------------oOO--(_)--OOo----------------------------|
|                                                                       |
| Dr Mike Ashworth                          NERC Computer Services      |
| NERC Supercomputing Consultant            Bidston Observatory         |
| Tel:         +44 51 653 8633              BIRKENHEAD                  |
| Fax:         +44 51 653 6269              L43 7RA                     |
| email:       mia@ua.nbi.ac.uk             United Kingdom              |
| alternative: M.Ashworth@ncs.nerc.ac.uk                                |
|-----------------------------------------------------------------------|

From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:52:55 1993
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib)
	id AA11653; Thu, 28 Oct 93 08:52:55 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK)
	id AA07365; Thu, 28 Oct 93 08:52:35 -0400
X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 28 Oct 1993 08:52:34 EDT
Errors-To: owner-pbwg-compactapp@CS.UTK.EDU
Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK)
	id AA07357; Thu, 28 Oct 93 08:52:32 -0400
Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA16524; Thu, 28 Oct 1993 08:52:41 -0400
Date: Thu, 28 Oct 1993 08:52:41 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9310281252.AA16524@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: PSTSWM Compact Application


Received: from msr.EPM.ORNL.GOV by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA20602; Tue, 5 Oct 1993 09:58:22 -0400
Received: from haven.EPM.ORNL.GOV by msr.epm.ornl.gov (4.1/1.34)
	id AA09050; Tue, 5 Oct 93 09:58:21 EDT
Received: by haven.EPM.ORNL.GOV (4.1/1.34)
	id AA13369; Tue, 5 Oct 93 09:58:14 EDT
Date: Tue, 5 Oct 93 09:58:14 EDT
From: worley@haven.epm.ornl.gov (Pat Worley)
Message-Id: <9310051358.AA13369@haven.EPM.ORNL.GOV>
To: walker@msr.epm.ornl.gov

                 PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM

To submit a compact application to the ParkBench suite you must follow the
following procedure:

1. Complete the submission form below, and email it to David Walker
   at walker@msr.epm.ornl.gov. The data on this form will be reviewed 
   by the ParkBench Compact Applications Subcommittee, and you will
   be notified if the application is to be considered further for
   inclusion in the ParkBench suite.
   
2. If the ParkBench Compact Applications Subcommittee decides to consider
   your application further you will be asked to submit the source code
   and input and output files, together with any documentation and papers
   about the application. Source code and input and output files should
   be submitted by email, or ftp, unless the files are very large, in
   which case a tar file on a 1/4 inch cassette tape should be sent.
   Wherever possible email submission is preferred for all documents, in
   man page, Latex and/or PostScript format. These files, documents and
   papers together constitute your application package. Your application
   package should be sent to:

                David Walker
                Oak Ridge National Laboratory
                Bldg. 6012/MS-6367
                P. O. Box 2008
                Oak Ridge, TN 37831-6367
                (615) 574-7401/0680 (phone/fax)
                walker@msr.epm.ornl.gov

   The street address is "Bethel Valley Road" if Fedex insists on this.
   The subcommittee will then make a final decision on whether to include 
   your application in the ParkBench suite.

3. If your application is approved for inclusion in the ParkBench suite
   you (or some authorized person from your organization) will be asked
   to complete and sign a form giving ParkBench authority to distribute,
   and modify (if necessary), your application package.

-------------------------------------------------------------------------------
Name of Program         : PSTSWM 
                        : (Parallel Spectral Transform Shallow Water Model)
-------------------------------------------------------------------------------
Submitter's Name        : Patrick H. Worley
Submitter's Organization: Oak Ridge National Laboratory
Submitter's Address     : Bldg. 6012/MS-6367
                          P. O. Box 2008
                          Oak Ridge, TN 37831-6367
Submitter's Telephone # : (615) 574-3128
Submitter's Fax #       : (615) 574-0680
Submitter's Email       : worley@msr.epm.ornl.gov
-------------------------------------------------------------------------------
Cognizant Expert(s)     : Patrick H. Worley
CE's Organization       : Oak Ridge National Laboratory
CE's Address            : Bldg. 6012/MS-6367
                          P. O. Box 2008
                          Oak Ridge, TN 37831-6367
CE's Telephone #        : (615) 574-3128
CE's Fax #              : (615) 574-0680
CE's Email              : worley@msr.epm.ornl.gov

Cognizant Expert(s)     : Ian T. Foster
CE's Organization       : Argonne National Laboratory
CE's Address            : MCS 221/D-235
                          9700 S. Cass Avenue
                          Argonne, IL 60439
CE's Telephone #        : (708) 252-4619
CE's Fax #              : (708) 252-5986
CE's Email              : itf@mcs.anl.gov
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :

Modulo other commitments, Worley is prepared to respond quickly to questions
and bug reports, but expects to be kept informed as to results of experiments
and modifications to the code.

-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Climate Modeling
-------------------------------------------------------------------------------
Application "pedigree"  :

PSTSWM Version 1.0 is a message-passing benchmark code and parallel algorithm
testbed that solves the nonlinear shallow water equations using the spectral
transform method. The spectral transform algorithm of the code follows
closely how CCM2, the NCAR Community Climate Model, handles the dynamical
part of the primitive equations, and the parallel algorithms implemented in
the model include those currently used in the message-passing parallel
implementation of CCM2. PSTSWM was written by Patrick Worley of Oak Ridge
National Laboratory and Ian Foster of Argonne National Laboratory, and is
based partly on previous parallel algorithm research by John Drake, David
Walker, and Patrick Worley of Oak Ridge National Laboratory. Both the code
development and parallel algorithms research were funded by the DOE Computer
Hardware, Advanced Mathematics, and Model Physics (CHAMMP) program. The
features of version 1.0 were frozen on 8/1/93, and it is this version we
would offer initially as a benchmark.  

PSTSWM is a parallel implementation of a sequential code (STSWM 2.0) written
by James Hack and Ruediger Jakob at NCAR to solve the shallow water equations 
on a sphere using the spectral transform method. STSWM evolved from a
spectral shallow water model written by Hack (NCAR/CGD) to compare numerical
schemes designed to solve the divergent barotropic equations in spherical
geometry. STSWM was written partially to provide the reference solutions
to the test cases proposed by Williamson et al. (see citation [4] below),
which were chosen to test the ability of numerical methods to simulate
important flow phenomena. These test cases are embedded in the code and 
are selectable at run-time via input parameters, specifying initial conditions,
forcing, and analytic solutions (for error analysis). The solutions are also
published in a Technical Note by Jakob et al. [3]. In addition, this code is
meant to serve as an educational tool for numerical studies of the shallow
water equations. A detailed description of the spectral transform method, and
a derivation of the equations used in this software, can be found in the
Technical Note by Hack and Jakob [2].  

For PSTSWM, we rewrote STSWM to add vertical levels (in order to get the
correct communication and computation granularity for 3-D weather and climate
codes), to increase modularity and support code reuse, and to allow the
problem size to be selected at runtime without depending on dynamic memory
allocation. PSTSWM is meant to be a compromise between paper benchmarks and
the usual fixed benchmarks by allowing a significant amount of
runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the
numerical simulation can be run on different machines without fixing the
parallel implementation, but forcing all implementations to execute the same
numerical code (to guarantee fairness). The code has also been written in
such a way that linking in optimized library functions for common operations
instead of the "portable" code will be simple.

-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

Yes, but users are requested to acknowledge the authors (Worley and
Foster) and the program that supported the development of the code
(DOE CHAMMP program) in any resulting research or publications, and are
encouraged to send reprints of their work with this code to the authors.
Also, the authors would appreciate being notified of any modifications to 
the code. Finally, the code has been written to allow easy reuse of code in
other applications, and for educational purposes. The authors encourage this,
but also request that they be notified when pieces of the code are used.

-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

The program currently uses INTEGER, REAL, COMPLEX, and DOUBLE PRECISION
variables. The code should work correctly for any system in which COMPLEX is
represented as 2 REALs. The include file params.i has parameters that can be
used to specify the length of these. Also, some REAL and DOUBLE PRECISION
parameter values (e.g., PI, TWOPI) may need to be modified for floating-point
number systems with large mantissas. PSTSWM is currently being used on systems
where

        Integers : 4   bytes
        Floats   : 4   bytes

The use of two precisions can be eliminated, but at the cost of a significant
loss of precision. (For 4-byte REALs, not using DOUBLE PRECISION increases
the error by approximately three orders of magnitude.) DOUBLE PRECISION
results are only used in set-up (computing Gauss weights and nodes and
Legendre polynomial values), and are not used in the body of the computation.

-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

The sequential code is documented in a file included in the distribution of the
code from NCAR:

Jakob, Ruediger, Description of Software for the Spectral Transform Shallow
Water Model Version 2.0. National Center for Atmospheric Research,
Boulder, CO 80307-3000, August 1992

and in 

Hack, J.J. and R. Jakob, Description of a global shallow water model based on
the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. 

Documentation of the parallel code is in preparation, but extensive
documentation is present in the code.

-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

1) Browning, G.L., J.J. Hack and P.N. Swarztrauber, A comparison of
   three numerical methods for solving differential equations on
   the sphere, Monthly Weather Review, 117:1058-1075, 1989.

2) Hack, J.J. and R. Jakob, Description of a global
   shallow water model based on the spectral transform method,
   NCAR Technical Note TN-343+STR, January 1992.

3) Jakob, R., J.J. Hack and D.L. Williamson, Reference solutions to
   shallow water test set using the spectral transform method,
   NCAR Technical Note TN-388+STR (in preparation).

4) Williamson, D.L., J.B. Drake, J.J. Hack, R. Jakob and P.S. Swarztrauber,
   A standard test set for numerical approximations to the shallow
   water equations in spherical geometry, Journal of Computational Physics,
   Vol. 102, pp.211-224, 1992.
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

5) Worley, P. H. and J. B. Drake, Parallelizing the Spectral Transform Method,
   Concurrency: Practice and Experience, Vol. 4, No. 4 (June 1992), 
   pp. 269-291.

6) Walker, D. W., P. H. Worley, and J. B. Drake, Parallelizing the Spectral
   Transform Method. Part II, 
   Concurrency: Practice and Experience, Vol. 4, No. 7 (October 1992), 
   pp. 509-531.

7) Foster, I. T. and P. H. Worley,
   Parallelizing the Spectral Transform Method: A Comparison of Alternative
   Parallel Algorithms,
   Proceedings of the Sixth SIAM Conference on Parallel Processing for
   Scientific Computing (March 22-24, 1993), pp. 100-107.

8) Foster, I. T. and P. H. Worley,
   Parallel Algorithms for the Spectral Transform Method,
   (in preparation)

9) Worley, P. H. and I. T. Foster,
   PSTSWM: A Parallel Algorithm Testbed and Benchmark.
   (in preparation)

-------------------------------------------------------------------------------
Other relevant research papers:

10) I. Foster, W. Gropp, and R. Stevens, 
    The parallel scalability of the spectral transform method, 
    Mon. Wea. Rev., 120(5), 1992, pp. 835--850. 

11) Drake, J. B., R. E. Flanery, I. T. Foster, J. J. Hack, J. G. Michalakes,
    R. L. Stevens, D. W. Walker, D. L. Williamson, and P. H. Worley,
    The Message-Passing Version of the Parallel Community Climate Model,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 500-513.

12) Sato, R. K. and R. D. Loft,
    Implementation of the NCAR CCM2 on the Connection Machine,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 371-393.

13) Barros, S. R. M. and Kauranne, T.,
    On the Parallelization of Global Spectral Eulerian Shallow-Water Models,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 36-43.

14) Kauranne, T. and S. R. M. Barros,
    Scalability Estimates of Parallel Spectral Atmospheric Models,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 312-328.

15) Pelz, R. B. and W. F. Stern,
    A Balanced Parallel Algorithm for Parallel Processing,
    Proceedings of the Sixth SIAM Conference on Parallel Processing for
    Scientific Computing (March 22-24, 1993), pp. 126-128.

-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :

The model code is primarily written in Fortran 77, but also uses
DO ... ENDDO and DO WHILE ... ENDDO, and the INCLUDE extension (to pull in
common and parameter declarations). It has been compiled and run on the Intel
iPSC/2, iPSC/860, Delta, and Paragon, the IBM SP1, and on Sun Sparcstation,
IBM RS/6000, and Stardent 3000/1500 workstations (as a sequential code).

Message passing is implemented using the PICL message passing system.
All message passing is encapsulated in 3 high-level routines:

BCAST0 (broadcast)
GMIN0  (global minimum)
GMAX0  (global maximum)

two classes of low-level routines:
 SWAP, SWAP_SEND, SWAP_RECV, SWAP_RECVBEGIN, SWAP_RECVEND, SWAP1, SWAP2, SWAP3
 (variants and/or pieces of the swap operation)
and
 SENDRECV, SRBEGIN, SREND, SR1, SR2, SR3
 (variants and/or pieces of the send/recv operation)

and one synchronization primitive:
CLOCKSYNC0

PICL instrumentation commands are also embedded in the code.

Porting the code to another message passing library will be simple, although
some of the runtime communication options may become illegal then.
The PICL instrumentation calls can be stubbed out (or removed) without
changing the functionality of the code, but some sort of synchronization is
needed when timing short benchmark runs.

-------------------------------------------------------------------------------
Total number of lines in source code: 28,204
Number of lines excluding comments  : 12,434
Size in bytes of source code        : 994,299
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :

problem:   23 lines, 559 bytes, ascii
algorithm: 33 lines, 874 bytes, ascii

-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :

standard output: Number of lines and bytes is a function of the input
                 specifications, but for benchmarking would normally be
                 63 lines (2000 bytes) of meaningful output. (On the Intel
                 machine, FORTRAN STOP messages are sent from each processor
                 at the end of the run, increasing this number.)

timings:         Each run produces one line of output, containing approx.
                 150 bytes.

Both files are ascii.


-------------------------------------------------------------------------------
Brief, high-level description of what application does:

(P)STSWM solves the nonlinear shallow water equations on the sphere.
The nonlinear shallow water equations constitute a simplified
atmospheric-like fluid prediction model that exhibits many of the features of
more complete models, and that has been used to investigate numerical
methods and benchmark a number of machines.
Each run of PSTSWM uses one of 6 embedded initial conditions and forcing
functions. These cases were chosen to stress test numerical methods for this
problem, and to represent important flows that develop in atmospheric
modeling. STSWM also supports reading in arbitrary initial conditions, but
this was removed from the parallel code to simplify the development of the
initial implementation. 

-------------------------------------------------------------------------------
Main algorithms used:

PSTSWM uses the spectral transform method to solve the shallow water
equations. During each timestep, the state variables of the
problem are transformed between the physical domain, where most of the
physical forces are calculated, and the spectral domain, where the terms of
the differential equation are evaluated. The physical domain is a tensor
product longitude-latitude grid. The spectral domain is the set of spectral
coefficients in a spherical harmonic expansion of the state variables, and
is normally characterized as a triangular array (using a "triangular"
truncation of spectral coefficients). 
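
As an illustration of the triangular array, the following Python sketch counts
the coefficients held in each wavenumber column. It assumes the usual
convention for a triangular truncation, in which the column for wavenumber m
holds the degrees n = m, ..., MM. The function names are hypothetical and are
not taken from PSTSWM.

```python
def column_lengths(mm):
    """Number of spectral coefficients in each wavenumber column m = 0..mm.

    Assumes a triangular truncation: column m holds degrees n = m..mm.
    """
    return [mm - m + 1 for m in range(mm + 1)]


def total_coefficients(mm):
    """Total number of coefficients in the triangular array."""
    return sum(column_lengths(mm))
```

Under this assumption the total is (MM+1)*(MM+2)/2, and the columns shrink
from MM+1 entries at wavenumber 0 to a single entry at wavenumber MM.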

Transforming from physical coordinates to spectral coordinates involves
performing a real FFT for each line of constant latitude, followed by 
integration over latitude using Gaussian quadrature (approximating the
Legendre transform) to obtain the spectral coefficients. The inverse
transformation involves evaluating sums of spectral harmonics and inverse
real FFTs, analogous to the forward transform.

Parallel algorithms are used to compute the FFTs and to compute the 
vector sums used to approximate the forward and inverse Legendre transforms.
Two major alternatives are available for both transforms: distributed
algorithms, which use a fixed data decomposition and compute results where
they are assigned, and transpose algorithms, which remap the domains to allow
the transforms to be calculated sequentially. This translates to four major
parallel algorithms:

a) distributed FFT/distributed Legendre transform (LT)
b) transpose FFT/distributed LT
c) distributed FFT/transpose LT
d) transpose FFT/transpose LT

Multiple implementations are supported for each type of algorithm, and
the assignment of processors to transforms is also determined by input
parameters. For example, input parameters specify a logical 2-D processor
grid and define the data decomposition of the physical and spectral domains
onto this grid. If 16 processors are used, these can be arranged as
a 4x4 grid, an 8x2 grid, a 16x1 grid, a 2x8 grid, or a 1x16 grid.
This specification determines how many processors are used to calculate each
parallel FFT and how many are used to calculate each parallel LT.
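
The possible logical processor grids for a given processor count can be
enumerated with a short Python sketch. The function name is hypothetical; the
only assumption is that a grid is any pair (PX, PY) whose product equals the
number of processors.

```python
def processor_grids(nprocs):
    """All logical 2-D grids (px, py) with px * py == nprocs."""
    return [(px, nprocs // px)
            for px in range(1, nprocs + 1)
            if nprocs % px == 0]
```

For 16 processors this yields the five grid shapes listed above (1x16, 2x8,
4x4, 8x2, and 16x1).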

-------------------------------------------------------------------------------
Skeleton sketch of application:

The main program calls INPUT to read problem and algorithm parameters
and set up arrays for spectral transformations, and then calls
INIT to set up the test case parameters. Routines ERRANL and
NRGTCS are called once before the main timestepping loop for
error normalization, once after the main timestepping for 
calculating energetics data and errors, and periodically during 
the timestepping, as requested. The prognostic fields are 
initialized using routine ANLYTC, which provides the analytic
solution. Each call to STEP advances the computed fields by a 
timestep DT. Timing logic surrounds the timestepping loop, so the
initialization phase is not timed. Also, a fake timestep is calculated before
beginning timing to eliminate the first time "paging" effect currently seen
on the Intel Paragon systems. 

STEP computes the first two time levels by two semi-implicit timesteps;
normal time-stepping is by a centered leapfrog-scheme. STEP calls COMP1,
which chooses between an explicit numerical algorithm, a semi-implicit
algorithm, and a simplified algorithm associated with solving the advection
equation, one of the embedded test cases. The numerical algorithm used is an
input parameter. 
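
The centered leapfrog scheme mentioned above can be illustrated on a scalar
test problem. The Python sketch below is not PSTSWM code: the function name is
hypothetical, the start-up step uses forward Euler as a stand-in for the
semi-implicit start-up steps that STEP performs, and the test equation
dy/dt = i*y is chosen only because its exact solution exp(i*t) is known.

```python
import cmath


def leapfrog(f, y0, dt, nsteps):
    """Integrate dy/dt = f(y) for nsteps >= 1 steps with centered leapfrog."""
    y_prev = y0
    # Start-up step (assumption: forward Euler stands in for the
    # semi-implicit start-up used by the real code).
    y_curr = y0 + dt * f(y0)
    for _ in range(nsteps - 1):
        # Centered difference: y(n+1) = y(n-1) + 2*dt*f(y(n)).
        y_next = y_prev + 2.0 * dt * f(y_curr)
        y_prev, y_curr = y_curr, y_next
    return y_curr


# Oscillation test problem dy/dt = i*y, integrated to t = 1.
approx = leapfrog(lambda y: 1j * y, 1.0 + 0.0j, 0.01, 100)
error = abs(approx - cmath.exp(1j * 1.0))
```

The scheme needs two previous time levels, which is why a different method
must supply the first step.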

The basic outline of each timestep is the following:
1) Evaluate non-linear product and forcing terms.
2) Fourier transform non-linear terms in place as a block transform.
3) Compute and update divergence, geopotential, and vorticity spectral
   coefficients. (Much of the calculation of the time update is "bundled"
   with the Legendre transform.)
4) Compute velocity fields and transform divergence, geopotential,
   and vorticity back to gridpoint space using 
   a) an inverse Legendre transform and associated computations and
   b) an inverse real block FFT.

PSTSWM has "fictitious" vertical levels, and all computations are duplicated
on the different levels, potentially significantly increasing the granularity
of the computation. (The number of vertical levels is an input parameter.)
For error analysis, a single vertical level is extracted and analyzed. 

-------------------------------------------------------------------------------
Brief description of I/O behavior:

Processor 0 reads in the input parameters and broadcasts them to the rest of
the processors. Processor 0 also receives the error analysis and timing
results from the other processors and writes them out.

-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :

The processors are treated as a logical 2-D grid. There are 3 domains to be
distributed:
 a) physical domain: tensor product longitude-latitude grid
 b) Fourier domain: tensor product wavenumber-latitude grid
 c) spectral domain: triangular array, where each column contains the
                     spectral coefficients associated with a given
                     wavenumber. The larger the wavenumber is, the shorter
                     the column is.
An unordered FFT is used, and the Fourier and spectral domains use the
"unordered" permutation when the data is being distributed.

I) distributed FFT/distributed LT
   1) The tensor-product longitude-latitude grid is mapped onto the 
      processor grid by assigning a block of contiguous longitudes 
      to each processor column and by assigning one or two blocks of
      contiguous latitudes to each processor row. The vertical dimension is
      not distributed.   
   2) After the FFT, the subsequent wavenumber-latitude grid is similarly
      distributed over the processor grid, with a block of the permuted
      wavenumbers assigned to each processor column.
   3) After the LT, the wavenumbers are distributed as before and the spectral
      coefficients associated with any given wavenumber are either
      distributed evenly over the processors in the column containing that
      wavenumber, or are duplicated over the column. What happens is a
      function of the particular distributed LT algorithm used.

II) transpose FFT/distributed LT
   1) same as in (I)
   2) Before the FFT, the physical domain is first remapped to
      a vertical layer-latitude decomposition, with a block of contiguous
      vertical layers assigned to each processor column and the longitude
      dimension not distributed. After the transform, the vertical
      level-latitude grid is distributed as before, and the wavenumber
      dimension is not distributed. 
   3) After the LT, the spectral coefficients for a given vertical layer are
      either distributed evenly over the processors in a column, or are
      duplicated over that column. What happens is a function of the
      particular distributed LT algorithm used. 

III) distributed FFT/transpose LT
   1) same as (I)
   2) same as (I)
   3) Before the LT, the wavenumber-latitude grid is first remapped to
      a wavenumber-vertical layer decomposition, with a block of contiguous
      vertical layers assigned to each processor row and the latitude
      dimension not distributed. After the transform, the spectral
      coefficients associated with a given wavenumber and vertical layer
      are all on one processor, and the wavenumbers and vertical layers are
      distributed as before.

IV) transpose FFT/transpose LT
   1) same as (I)
   2) same as (II)
   3) Before the LT, the vertical level-latitude grid is first remapped to
      a vertical level-wavenumber decomposition, with a block of the permuted 
      wavenumbers now assigned to each processor row and the latitude
      dimension not distributed. After the transform, the spectral
      coefficients associated with a given wavenumber and vertical layer
      are all on one processor, and the wavenumbers and vertical layers are
      distributed as before.

-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :

The distribution is a function of the problem size (longitude, latitude,
vertical levels), the logical processor grid (PX, PY), and the algorithm
(transpose vs. distributed for FFT and LT).
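
A contiguous block assignment of one dimension to one axis of the processor
grid can be sketched in Python as follows. This is an illustration only: the
function name is hypothetical, and the rule used for spreading a remainder
(one extra item to each of the first few blocks) is an assumption, since the
form does not say how PSTSWM handles uneven division.

```python
def block_range(n, p, i):
    """Half-open index range [lo, hi) of block i (0-based) when n items
    are split into p contiguous blocks.

    Assumption: the first n % p blocks receive one extra item.
    """
    base, extra = divmod(n, p)
    lo = i * base + min(i, extra)
    hi = lo + base + (1 if i < extra else 0)
    return lo, hi
```

For example, 128 longitudes over 4 processor columns give each column 32
contiguous longitudes.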

-------------------------------------------------------------------------------
Brief description of load balance behavior :

The load is fairly well balanced. If PX and PY evenly divide the number of
longitudes, latitudes, and vertical levels, then all load imbalances are due
to the unequal distribution of spectral coefficients. As described above, the
spectral coefficients are laid out as a triangular array in most runs, where
each column corresponds to a different Fourier wavenumber. The wavenumbers are
partitioned among the processors in most of the parallel algorithms. Since
each column is a different length, a wrap mapping of the columns will
approximately balance the load. Instead, the natural "unordered" ordering of
the FFT is used with a block partitioning, which does a reasonable job of
load balancing without any additional data movement. The load imbalance is
quantified in Walker, et al. [6].
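
The effect of the column-to-processor mapping on load balance can be
illustrated with a Python sketch. It assumes a triangular truncation in which
the column for wavenumber m holds MM - m + 1 coefficients, and it compares a
wrap (cyclic) mapping with a contiguous block mapping in natural wavenumber
order. The "unordered" FFT permutation actually used by PSTSWM is not
modelled, and the function names are hypothetical.

```python
def loads(mm, p, mapping):
    """Coefficients per processor when columns m = 0..mm are spread over
    p processors using a 'wrap' or a natural-order 'block' mapping."""
    ncols = mm + 1
    load = [0] * p
    for m in range(ncols):
        length = mm - m + 1              # triangular truncation (assumption)
        if mapping == "wrap":
            owner = m % p                # cyclic assignment of columns
        elif mapping == "block":
            owner = m * p // ncols       # contiguous blocks, natural order
        else:
            raise ValueError("unknown mapping: " + mapping)
        load[owner] += length
    return load


def imbalance(load):
    """Ratio of the largest per-processor load to the mean load."""
    return max(load) / (sum(load) / len(load))
```

With MM = 42 and 4 processors, the wrap mapping stays within a few percent of
the mean, while a natural-order block mapping gives the first processor well
over one and a half times the mean load.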

If PX and PY do not evenly divide the dimensions of the physical domain,
then other load imbalances may be as large as a factor of 2 in the worst
case. 

-------------------------------------------------------------------------------
Give parameters that determine the problem size :

MM, NN, KK - specify the number of Fourier wavenumbers and the spectral
             truncation used. For a triangular truncation, MM = NN = KK.
NLON, NLAT, NVER
           - number of longitudes, latitudes, and vertical levels. There
             are required relationships between NLON, NLAT, and NVER, and
             between these and MM. These relationships are checked in the
             code. We will also provide a selection of input files that
             specify legal (and interesting) problems.
DT         - timestep (in seconds). (Must be small enough to satisfy the
             Courant stability condition. Code warns if too large, but does
             not abort.)
TAUE       - end of model run (in hours)

-------------------------------------------------------------------------------
Give memory as function of problem size :

Executable size is determined at compile time by setting the parameter
COMPSZ in params.i. Per node memory requirements are approximately
(in REALs)

associated Legendre polynomial values:
   MM*MM*NLAT/(PX*PY)
physical grid fields: 
   8*NLON*NLAT*NVER/(PX*PY)
spectral grid fields: 
   3*MM*MM*NVER/(PX*PY) 
 or (if spectral coefficients duplicated within a processor column)
   3*MM*MM*NVER/PX
work space:
   8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/(PX*PY)
 or (if spectral coefficients duplicated within a processor column)
   8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/PX

where BUFS1 and BUFS2 are input parameters (number of communication buffers).
BUFS1 and BUFS2 can be as small as 0 and as large as PX or PY.

In standard test cases, NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1, so memory
requirements are approximately (writing M for MM):

    (2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY)
  or
    (2 + 108*(1+BUFS1))*(M**3)/(4*PX*PY) + 3*(1+BUFS2)*(M**3)/(4*PX)
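
The per-node terms listed above can be collected into a small Python function.
This is a sketch under stated assumptions: the function and argument names are
hypothetical, the Legendre polynomial term is read as being divided by the
product PX*PY, and the result is a count of REALs, not bytes.

```python
def memory_per_node(mm, nlon, nlat, nver, px, py,
                    bufs1=0, bufs2=0, duplicated=False):
    """Approximate per-node memory in REALs.

    duplicated=True selects the case where spectral coefficients are
    duplicated within a processor column (divide by PX, not PX*PY).
    """
    nodes = px * py
    legendre = mm * mm * nlat / nodes          # associated Legendre values
    physical = 8 * nlon * nlat * nver / nodes  # physical grid fields
    spectral = 3 * mm * mm * nver / (px if duplicated else nodes)
    # Work space: one copy of each field set per communication buffer.
    work = physical * bufs1 + spectral * bufs2
    return legendre + physical + spectral + work
```

Increasing BUFS1 or BUFS2, or duplicating the spectral coefficients, raises
the estimate, which is consistent with the remark that different algorithms
use different amounts of memory.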


-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :

for a serial run per timestep (very rough):
  nonlinear terms:
        10*NLON*NLAT*NVER
  forward FFT:
        40*NLON*NLAT*NVER*LOG2(NLON)
  forward LT and time update:
       48*MM*NLAT*NVER + 7*(MM**2)*NLAT*NVER
  inverse LT and calculation of velocities:
       20*MM*NLAT*NVER + 14*(MM**2)*NLAT*NVER
  inverse FFT:
       25*NLON*NLAT*NVER*LOG2(NLON)

Using standard assumptions (NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1), and
writing M for MM:

approx. 460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4) flops per timestep.

For a total run, multiply by the number of timesteps, TAUE/DT (with TAUE
converted from hours to seconds).
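
The per-timestep terms and the timestep count can be evaluated with a short
Python sketch. The function names are hypothetical. Because TAUE is given in
hours and DT in seconds, the sketch converts TAUE to seconds before dividing;
the counts themselves are the very rough estimates listed above, not measured
values.

```python
import math


def flops_per_timestep(mm, nlon, nlat, nver):
    """Rough serial flop count for one timestep, term by term."""
    grid = nlon * nlat * nver
    log2_nlon = math.log2(nlon)
    nonlinear = 10 * grid
    forward_fft = 40 * grid * log2_nlon
    forward_lt = 48 * mm * nlat * nver + 7 * mm**2 * nlat * nver
    inverse_lt = 20 * mm * nlat * nver + 14 * mm**2 * nlat * nver
    inverse_fft = 25 * grid * log2_nlon
    return nonlinear + forward_fft + forward_lt + inverse_lt + inverse_fft


def timesteps(taue_hours, dt_seconds):
    """Number of timesteps in a run of TAUE hours with timestep DT seconds."""
    return taue_hours * 3600.0 / dt_seconds


def total_flops(mm, nlon, nlat, nver, taue_hours, dt_seconds):
    """Rough flop count for a whole run."""
    return flops_per_timestep(mm, nlon, nlat, nver) * timesteps(taue_hours, dt_seconds)
```

For example, a run with TAUE=120.0 and DT=4800.0 takes 90 timesteps.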

-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :

This is a function of the algorithm chosen.

I) transpose FFT
   a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
        2*(PX-1) steps, D volume
      or
        2*LOG2(PX) steps, D*LOG2(PX) volume 

II) distributed FFT
   a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
        2*LOG2(PX) steps, D*LOG2(PX) volume

III) transpose LT

   a) forward LT:  let D = 8*NLON*NLAT*NVER/(PX*PY)
        2*(PY-1) steps, D volume
      or
        2*LOG2(PY) steps, D*LOG2(PY) volume 

   b) inverse LT:  let D = (3/2)*(MM**2)*NVER/(PX*PY)
        (PY-1) steps, D volume
       or
        LOG2(PY) steps, D*PY volume

IV) distributed LT

   a) forward + inverse LT:  let D = 3*(MM**2)*NVER/(PX*PY)
        2*(PY-1) steps, D*PY volume
       or
        2*LOG2(PY) steps, D*PY volume

These are per timestep costs. Multiply by the number of timesteps, TAUE/DT
(with TAUE converted from hours to seconds), for total communication
overhead.
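
As a worked illustration of the FFT entries above, the Python sketch below
returns the number of communication steps and the volume per timestep for the
forward plus inverse FFT. The function name and the variant labels are
hypothetical; the formulas are the ones listed under (I) and (II), and the
volume is in whatever unit D is expressed in.

```python
import math


def fft_comm(nlon, nlat, nver, px, py, variant):
    """(steps, volume) per timestep for forward + inverse FFT communication.

    variant: 'transpose-linear' -> 2*(PX-1) steps, D volume
             'transpose-log'    -> 2*LOG2(PX) steps, D*LOG2(PX) volume
             'distributed'      -> 2*LOG2(PX) steps, D*LOG2(PX) volume
    """
    d = 13 * nlon * nlat * nver / (px * py)
    if variant == "transpose-linear":
        return 2 * (px - 1), d
    if variant in ("transpose-log", "distributed"):
        log2_px = math.log2(px)
        return 2 * log2_px, d * log2_px
    raise ValueError("unknown variant: " + variant)
```

The two transpose variants trade step count against volume: the first takes
more steps as PX grows but moves each datum once, while the second takes
fewer steps but moves LOG2(PX) times the volume.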

-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files,
memory required, and number of floating point operations) :

Standard input files will be provided for 

T21: MM=KK=NN=21      T42: MM=KK=NN=42        T85: MM=NN=KK=85
     NLON=32               NLON=64                 NLON=128
     NLAT=64               NLAT=128                NLAT=256
     NVER=8                NVER=16                 NVER=32
     ICOND=2               ICOND=2                 ICOND=2
     DT=4800.0             DT=2400.0               DT=1200.0
     TAUE=120.0            TAUE=120.0              TAUE=120.0

These are 5 day runs of the "benchmark" case specified in Williamson, et al.
[4]. Flops and memory requirements for serial runs are as follows (approx.):

T21:           500,000 REALs
         2,000,000,000 flops
     
T42:         4,000,000 REALs
        45,000,000,000 flops

T85:        34,391,000 REALs
     1,000,000,000,000 flops

Both memory and flops scale well, so, for example, the T42 run fits in
approx. 4MB of memory for a 4 processor run. But different algorithms and 
different aspect ratios of the processor grid use different amounts of memory.

-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :

Count by hand (looking primarily at inner loops, but eliminating common
subexpressions that compiler is expected to find).

-------------------------------------------------------------------------------

From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:53:23 1993
Date: Thu, 28 Oct 1993 08:52:59 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9310281252.AA13457@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: SOLVER Compact Application


Date: Sun, 17 Oct 93 16:28:46 BST
Message-Id: <2567.9310171528@subnode.epcc.ed.ac.uk>
From: S P Booth <spb@epcc.edinburgh.ac.uk>
Subject: Re: ParkBench applications
To: "David W. Walker" <walker@rios2.epm.ornl.gov>
In-Reply-To: David W. Walker's message of Fri, 15 Oct 93 13:23:46 -0500


Sorry I took so long to reply to this.
If any of this needs any further clarification don't hesitate to send me
some email.
		spb

-------------------------------------------------------------------------
                  PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM

To submit a compact application to the ParkBench suite you must follow the
procedure below:

1. Complete the submission form below, and email it to David Walker
   at walker@msr.epm.ornl.gov. The data on this form will be reviewed 
   by the ParkBench Compact Applications Subcommittee, and you will
   be notified if the application is to be considered further for
   inclusion in the ParkBench suite.
   
2. If the ParkBench Compact Applications Subcommittee decides to consider
   your application further you will be asked to submit the source code
   and input and output files, together with any documentation and papers
   about the application. Source code and input and output files should
   be submitted by email, or ftp, unless the files are very large, in
   which case a tar file on a 1/4 inch cassette tape should be sent.
   Wherever possible email submission is preferred for all documents in
   man page, Latex and/or Postscript format. These files, documents, and
   papers together constitute your application package. Your application
   package should be sent to:
                David Walker
                Oak Ridge National Laboratory
                Bldg. 6012/MS-6367
                P. O. Box 2008
                Oak Ridge, TN 37831-6367
                (615) 574-7401/0680 (phone/fax)
                walker@msr.epm.ornl.gov

   The street address is "Bethel Valley Road" if Fedex insists on this.
   The subcommittee will then make a final decision on whether to include 
   your application in the ParkBench suite.

3. If your application is approved for inclusion in the ParkBench suite
   you (or some authorized person from your organization) will be asked
   to complete and sign a form giving ParkBench authority to distribute,
   and modify (if necessary), your application package.

-------------------------------------------------------------------------------
Name of Program         : SOLVER
                        : 
-------------------------------------------------------------------------------
Submitter's Name        : Stephen P. Booth
Submitter's Organization: UKQCD collaboration
Submitter's Address     : EPCC
			  The University of Edinburgh
			  James Clerk Maxwell Building
			  The King's Buildings 
			  Mayfield Road
			  Edinburgh EH9 3JZ
		          Scotland
Submitter's Telephone # : +44 (0)31 650 5746
Submitter's Fax #       : +44 (0)31 622 4712
Submitter's Email       : spb@epcc.ed.ac.uk
-------------------------------------------------------------------------------
Cognizant Expert(s)     : Dr S.P.Booth
CE's Organization       : EPCC/UKQCD
CE's Address            : The University of Edinburgh
			  James Clerk Maxwell Building
			  The King's Buildings 
			  Mayfield Road
			  Edinburgh EH9 3JZ
		          Scotland
CE's Telephone #        : +44 (0)31 650 5746
CE's Fax #              : +44 (0)31 622 4712
CE's Email              : spb@epcc.ed.ac.uk

Cognizant Expert(s)     : Dr R.D. Kenway
CE's Organization       : EPCC/UKQCD
CE's Address            : The University of Edinburgh
			  James Clerk Maxwell Building
			  The King's Buildings 
			  Mayfield Road
			  Edinburgh EH9 3JZ
		          Scotland
CE's Telephone #        : +44 (0)31 650 5245
CE's Fax #              : +44 (0)31 622 4712
CE's Email              : rdk@epcc.ed.ac.uk

-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :

S.Booth is prepared to respond quickly to questions and bug reports.
We have a strong interest in the portability and performance of this code.


-------------------------------------------------------------------------------
Major Application Field : Lattice gauge theory
Application Subfield(s) : QCD
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :

SOLVER is part of an ongoing software development exercise carried out
by UKQCD (the United Kingdom Quantum Chromo-Dynamics collaboration)
to develop a new generation of simulation codes. The current generation
of codes were highly tuned for a particular machine architecture so a
software development exercise was started to design and develop a set of
portable codes. This code was developed by S.Booth and N.Stanford of
the University of Edinburgh during the course of 1993.
SOLVER is a benchmark code derived from the codes used to generate quark
propagators. It is designed to benchmark and validate the computational
sections of this operation. It differs from the production code in that
it self-initialises to non-trivial test data rather than performing file
access. This is because there is no accepted standard for parallel file
access.
The benchmark was originally developed as part of a national UK procurement
exercise.

-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

The code may be freely distributed for benchmarking purposes but 
the code remains the property of UKQCD and we ask to be contacted
if anyone wishes to use it as an application code.

-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

All floating point numbers are defined as macros (either Fpoint or Dpoint).
The majority of the variables are Fpoint. Dpoint is only used for
accumulation values that may require higher precision. This allows the
precision of the program to be changed easily. For small and
intermediate problem sizes 4 byte Fpoints and 8 byte Dpoints should be 
sufficient. For large problems higher precision may be required.
INTEGERS must be large enough to hold the number of sites
allocated to a processor (4 bytes is almost certainly sufficient).
The COMPLEX type is not used.

-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

Documentation exists for all program routines except some low level
routines local to a single source file.

-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

-------------------------------------------------------------------------------
Other relevant research papers:

-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :

Two versions of the application were developed in parallel.
1) An HPF version (both CMF and HPF directives)
2) A message passing version.

The message passing version uses ANSI F77 with the following extensions:
a) CPP is used for include files, some simple macros, and build-time
   conditionals.
b) The F77 restrictions on variable names are not adhered to, though the
   authors have tools to convert the code to conform.

All of the message passing operations are confined to a small number of
routines. These routines were designed to be implementable in as many
different message passing systems as possible. Current versions are
1) fake - converts the program to a single processor code.
2) PARMACS - original parallel versions
3) PVM - under development.

-------------------------------------------------------------------------------
Total number of lines in source code: 15567
Number of lines excluding comments  : 10679
Size in bytes of source code        : 432398
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :

None 

-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :

standard output: formatted text

-------------------------------------------------------------------------------
Brief, high-level description of what application does:

The application generates quark propagators from a  background gauge
configuration and a fermionic source. This is equivalent to solving 
M psi = source 
where psi is the quark propagator and M (a function operating on psi)
depends on the gauge fields.
The benchmark performs a cut down version of this operation.

-------------------------------------------------------------------------------
Main algorithms used:

Conjugate gradient least norm with red-black pre-conditioning.
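
As an illustration only, the sketch below applies conjugate gradients to
the normal equations M M^T y = source and then sets psi = M^T y, which is
one common reading of "least norm". It assumes a small real dense matrix
and omits the red-black pre-conditioning; the function names are
hypothetical and this is not the SOLVER implementation.

```python
def matvec(matrix, vector):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgne(matrix, source, max_iterations=50, tolerance=1e-20):
    """Solve matrix * psi = source by CG on matrix * matrix^T * y = source."""
    transposed = [list(column) for column in zip(*matrix)]
    y = [0.0] * len(source)
    residual = list(source)          # source - M M^T y, with y = 0
    direction = list(residual)
    rr = dot(residual, residual)
    for _ in range(max_iterations):
        if rr <= tolerance:          # squared residual norm small enough
            break
        q = matvec(matrix, matvec(transposed, direction))
        alpha = rr / dot(direction, q)
        y = [yi + alpha * di for yi, di in zip(y, direction)]
        residual = [ri - alpha * qi for ri, qi in zip(residual, q)]
        rr_new = dot(residual, residual)
        direction = [ri + (rr_new / rr) * di
                     for ri, di in zip(residual, direction)]
        rr = rr_new
    return matvec(transposed, y)     # psi = M^T y


psi = cgne([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
print(psi)
```

The fixed iteration cap mirrors the benchmark's practice of running a set
number of iterations rather than iterating to full convergence.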

-------------------------------------------------------------------------------
Skeleton sketch of application:

The benchmark code initialises the gauge field to a unit gauge
configuration. (The results for a unit gauge can be calculated
analytically, allowing a check on the results.)
A gauge transformation is then applied to the gauge field. A unit gauge
field consists only of zeros and ones; applying a gauge transformation
generates non-trivial values. Quantities corresponding to physical
observables should be unchanged by such a transformation.
In application code the gauge field would have been read in from disk.
The source field is initialised to a point source (a single non-zero
point on one lattice site).
An iterative solver is called to generate the quark propagator.
The solver routine also generates timing information.
In application code the propagator would then be dumped to disk.
In the benchmark we use the quark propagator to generate a physically
significant quantity (the pion propagator). This generates a single real
number for each timeslice of the lattice. These values are printed to
standard out.

This procedure requires a large number of iterations. For benchmarking
we are only interested in the time per iteration and some check on the
validity of the results. We therefore usually only perform a fixed
number of iterations (say 50) to generate accurate timing information
and verify the results by comparison with other machines.

-------------------------------------------------------------------------------
Brief description of I/O behaviour:

Unless an error occurs a single processor outputs to standard out.

-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :
A spatial decomposition is used to distribute the 4-D arrays over a 4-D
grid of processors. Each dimension is distributed independently.
The program supports non-regular decomposition,
e.g. a lattice of width 22 will be distributed across a processor-grid
of width 4 as (6, 6, 5, 5).
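
The non-regular split of one dimension can be sketched as follows. This
is a minimal illustration that reproduces the (6, 6, 5, 5) example; the
function name is hypothetical, and the rule that the larger blocks come
first is an assumption inferred from that example, not taken from the code.

```python
def split_dimension(lattice_width, grid_width):
    """Return the number of lattice sites owned by each processor on one axis."""
    base, extra = divmod(lattice_width, grid_width)
    # Assumption: the first `extra` processors each take one additional site.
    return [base + 1 if p < extra else base for p in range(grid_width)]


print(split_dimension(22, 4))  # [6, 6, 5, 5]
```

Because each dimension is distributed independently, the same function
would be applied in turn to NX/NPX, NY/NPY, NZ/NPZ and NT/NPT.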

-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :
Lattice size:     NX NY NZ NT
processor grid:   NPX NPY NPZ NPT

-------------------------------------------------------------------------------
Brief description of load balance behavior :

Load balancing depends only on the distribution; if the lattice size can
be exactly divided by the processor grid size, all processors will have
the same workload. In practice it is often useful to trade load
balancing for a larger number of processors.

-------------------------------------------------------------------------------
Give parameters that determine the problem size :
Lattice size, NX NY NZ NT
problem size is NX*NY*NZ*NT
-------------------------------------------------------------------------------
Give memory as function of problem size :

In a production environment there are build time parameters that
set the array sizes and problem/machine sizes can be set at runtime. 
When creating a benchmark program it seemed less confusing to set
lattice and processor-grid sizes at build time and derive all other
quantities from them. The appropriate parameters for memory use are
Max_body (maximum number of data points per processor) and
Max_bound (maximum number of data points on a single boundary between
   two processors).
Let LX LY LZ LT be the local lattice sizes obtained by dividing the
lattice size by the processor grid size and rounding up to the nearest
integer. Then
Max_body = (LX*LY*LZ*LT)/2
Max_bound = MAX( LX*LY*LZ/2 ,LY*LZ*LT/2 ,LX*LZ*LT/2 ,LX*LY*LT/2 )

The code contains a number of build-time switches for variations
in the implementation that may be beneficial on some machines. The
memory usage depends on these switches but typical values are:
108 * Max_body + 36 * Max_bound Fpoints
16 * (Max_body + Max_bound) INTEGERS
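
The formulas above can be evaluated with a short script. This is a sketch
under stated assumptions: the divisions by 2 are taken as integer
divisions, the "typical" coefficients 108, 36 and 16 are used as given,
and the 2x2x2x4 processor grid in the example is hypothetical.

```python
def local_extent(lattice, procs):
    """Lattice size divided by grid size, rounded up."""
    return -(-lattice // procs)


def solver_memory(lattice, grid, fpoint_bytes=4, integer_bytes=4):
    """Estimate per-processor storage from (NX, NY, NZ, NT) and (NPX, NPY, NPZ, NPT)."""
    lx, ly, lz, lt = (local_extent(n, p) for n, p in zip(lattice, grid))
    max_body = (lx * ly * lz * lt) // 2
    max_bound = max(lx * ly * lz, ly * lz * lt, lx * lz * lt, lx * ly * lt) // 2
    fpoints = 108 * max_body + 36 * max_bound
    integers = 16 * (max_body + max_bound)
    return {
        "max_body": max_body,
        "max_bound": max_bound,
        "fpoints": fpoints,
        "integers": integers,
        "bytes": fpoints * fpoint_bytes + integers * integer_bytes,
    }


print(solver_memory((24, 24, 24, 48), (2, 2, 2, 4)))
```

For this example each processor holds a 12^4 local lattice, which gives
about 5.3 MBytes with 4 byte Fpoints and INTEGERS.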

-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :

Each iteration performs 2760 floating point operations per lattice site.
For example, 50 iterations on a 24^3*48 lattice perform 9.16e+10 operations.

-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :

For each iteration every processor sends 24 messages to each of its 8
neighbours. Each message contains one floating point number for each
lattice point in the common boundary. Two global sum operations are also
performed for each iteration.
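
A rough per-iteration message volume can be sketched from this
description. The sketch assumes that the "common boundary" with a
neighbour is the full face of the local lattice perpendicular to that
direction; if only one parity of sites is exchanged (as the factor 1/2 in
Max_bound may suggest) the result halves. The function name is
hypothetical and the global sums are not included.

```python
def floats_sent_per_iteration(local, messages_per_neighbour=24):
    """Floating point numbers sent by one processor in one iteration.

    `local` is the local lattice size (LX, LY, LZ, LT).
    """
    lx, ly, lz, lt = local
    volume = lx * ly * lz * lt
    # Face perpendicular to each axis: the local volume divided by that extent.
    faces = [volume // extent for extent in local]
    # Two neighbours (forward and backward) per axis gives 8 neighbours in 4-D.
    return messages_per_neighbour * 2 * sum(faces)


print(floats_sent_per_iteration((12, 12, 12, 12)))  # 331776
```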

-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files,
memory required, and number of floating point operations) :

18^3*36		2.90e+10 fp operations
24^3*48		9.16e+10 fp operations
36^3*72		4.64e+11 fp operations
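
The operation counts above follow from 2760 floating point operations per
lattice site per iteration with 50 iterations, and can be reproduced with
the short check below (the function name is illustrative only).

```python
FLOPS_PER_SITE_PER_ITERATION = 2760


def solver_flops(nx, ny, nz, nt, iterations=50):
    """Total floating point operations for a given lattice and iteration count."""
    return FLOPS_PER_SITE_PER_ITERATION * nx * ny * nz * nt * iterations


for n, nt in [(18, 36), (24, 48), (36, 72)]:
    print(f"{n}^3*{nt}\t{solver_flops(n, n, n, nt):.2e} fp operations")
```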

-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :

Operations in each loop were counted by hand. The code contains a counter to
sum these values.

-------------------------------------------------------------------------------
Other relevant information:



-------------------------------------------------------------------------------


From owner-pbwg-compactapp@CS.UTK.EDU Wed Nov  3 09:19:23 1993
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib)
	id AA22427; Wed, 3 Nov 93 09:19:23 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK)
	id AA27464; Wed, 3 Nov 93 09:18:54 -0500
X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Wed, 3 Nov 1993 09:18:53 EST
Errors-To: owner-pbwg-compactapp@CS.UTK.EDU
Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK)
	id AA27455; Wed, 3 Nov 93 09:18:52 -0500
Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA15591; Wed, 3 Nov 1993 09:18:51 -0500
Date: Wed, 3 Nov 1993 09:18:51 -0500
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9311031418.AA15591@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: ARCO Compact Application Submission


-------------------------------------------------------------------------------
Name of Program         : ARCO Parallel Seismic Processing Benchmarks
-------------------------------------------------------------------------------
Submitter's Name        : Charles C. Mosher
Submitter's Organization: ARCO Exploration and Production Technology
Submitter's Address     : 2300 West Plano Parkway
                          Plano, TX 75075-8499

Submitter's Telephone # : (214)754-6468
Submitter's Fax #       : (214)754-3016
Submitter's Email       : ccm@arco.com
-------------------------------------------------------------------------------
Cognizant Expert(s)     : Charles C. Mosher

Cognizant Expert(s)     : Siamak Hassanzadeh (co-author)
CE's Organization       : Fujitsu America
CE's Email              : siamak@fai.com
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :

Will handle reasonable requests in a timely fashion.

-------------------------------------------------------------------------------
Major Application Field : Seismic Data Processing
Application Subfield(s) : Parallel I/O, signal processing, solution of PDE's
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :

The application began as a prototype system for seismic data processing
on parallel computing architectures.  The prototype was used to design
and implement production seismic processing on ARCO's Intel iPSC/860, where
it is used today.

Like other companies, ARCO continues to upgrade our HPC facilities.  We found
that we were spending a large amount of time on benchmarking, as were other
companies in the oil industry.  We decided to place our system in the public
domain as a benchmark suite, in the hopes that the benchmarking effort could
be spread across many participants. In addition, we hope to use the system
as a mechanism for code development and sharing between academia, national
labs, and industry.

Our first attempt was to work with the Perfect Benchmark Club at the
University of Illinois Center for Supercomputing Research and Development.
Many members of that group provided valuable input that significantly
improved the structure and content of the suite.  Special thanks to 
David Schneider for his work on organizing and managing the Perfect effort.

Perfect has since disbanded, which leads us to the ParkBench submission.
A consulting organization (Resource 2000) has also picked up the code and
is providing newsletter subscriptions to participants in the oil industry
describing both benchmark numbers and commentary on usability of the systems
tested.  Thanks to Randy Premont, Gary Montry, and Clive Bailley of Resource
2000 for their continuing work to make the ARCO suite a viable benchmark.

-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

The code may be freely distributed.  We request that ARCO and the authors be
acknowledged in publications.

In order to ensure relevance of the codes in the suite, the authors plan
to retain control of the source and algorithms contained therein, and request
that suggestions for changes and updates be directed to the authors only.

-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

        Integers :   4 bytes
        Floats   :   4 bytes

-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

High level: ARCO Seismic Benchmark Suite User's Guide
Low  level: source comments

-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

Yilmaz, Ozdogan, 1990, Seismic Data Processing: Investigations in Geophysics
    vol. 2, Society of Exploration Geophysicists, P.O. Box 702740,
    Tulsa, Oklahoma, 74170

-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

Mosher, C., Hassanzadeh, S., and Schneider, D., 1992, A Benchmark Suite 
    for Parallel Seismic Processing, Supercomputing 1992 proceedings.

-------------------------------------------------------------------------------
Other relevant research papers:


-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :

Language: 
    Fortran 77

Message Passing: 
    Yet Another Message Passing Layer (YAMPL)
	Sample implementations for PVM, Intel NX, TCGMSG

Machines Supported:
	Workstation clusters and multiprocessors (e.g. Sun, DEC, HP, IBM, SGI)
	Cray YMP 
	Intel iPSC/860

-------------------------------------------------------------------------------
Total number of lines in source code: ~ 20000
Number of lines excluding comments  : ~ 15000
Size in bytes of source code        : ~ 1 MByte 
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :

ASCII parameter files, 10-100 lines

-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :

Binary seismic data files, 1 MByte (small), 1 GByte (medium),
                          10 GByte (large), 100 GByte (huge)


-------------------------------------------------------------------------------
Brief, high-level description of what application does:

Synthetic seismic data for small, medium and large test cases are generated
in the native format of the target machine.  The test data are read and
processed in parallel, and the output is written to disk.  Simple checksum
and timing tables are printed to standard output.  A simple x-windows image
display tool is used to verify correctness of results.

-------------------------------------------------------------------------------
Main algorithms used:

Signal processing (FFTs, Toeplitz equation solvers, interpolation)
Seismic Imaging (Fourier domain, Kirchhoff integral, 
   finite difference algorithms)

-------------------------------------------------------------------------------
Skeleton sketch of application:

Processing modules are applied in a pipeline fashion to 2D arrays of seismic
data read from disk.  Processing flows are of the form READ-FLTR-MIGR-WRIT.
The same flow is executed on all processors.  Individual modules communicate
via message passing to implement parallel algorithms.  Nearly all message
passing is hidden via transpose operations that change the parallel data 
distribution as appropriate for each algorithm.

-------------------------------------------------------------------------------
Brief description of I/O behavior:

2D arrays are read/written from HDF style files on disk.  Parallel I/O is
supported for both a single large file read by multiple processors and
a separate file read by each processor.  A significant part of the seismic
processing flow requires data to be read in transposed fashion across all
processors.

-------------------------------------------------------------------------------
Brief description of load balance behavior :

Assumes a homogeneous array of processors with similar capabilities.
Load balance is rudimentary, with an attempt to distribute equal-sized
'workstation' chunks of work.

-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :

Seismic data is inherently parallel, with large data sets that offer multiple
opportunities for parallel operation.  Typically, the data is treated as a
collection of 2D arrays, with each processor owning a 'slab' of data.

-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :

The data is defined as a 4-dimensional array with Fortran dimensions
(sample, trace, frame, volume).  The third dimension (frame) is typically
spread across the processors.

-------------------------------------------------------------------------------
Give parameters that determine the problem size :

The ASCII parameter files define the data set size in terms of the number
of samples per seismic trace, the number of traces per shot, the number
of shooting lines, and the number of 3D volumes.

-------------------------------------------------------------------------------
Give memory as function of problem size :

Requires enough memory to hold 2 frames on each node, and a 3D volume
spread across the nodes.
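
As a rough illustration of this requirement, the sketch below estimates
bytes per node. It assumes that a frame holds samples*traces 4 byte
floats, that a 3D volume holds `frames` such frames, and that frames are
divided evenly across nodes (rounded up). The function name and the
example sizes are hypothetical, not taken from the suite.

```python
def arco_bytes_per_node(samples, traces, frames, nodes, float_bytes=4):
    """Estimate memory for 2 frames per node plus a share of one 3D volume."""
    frame_floats = samples * traces
    frames_per_node = -(-frames // nodes)  # ceiling division
    return float_bytes * frame_floats * (2 + frames_per_node)


# Hypothetical case: 1000 samples, 100 traces, 64 frames on 16 nodes.
print(arco_bytes_per_node(1000, 100, 64, 16))  # 2400000
```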

-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :

Reported by code as appropriate. On a Cray YMP, medium sized problems with
750 MB of output run at 30-100 Mflops for about an hour.

-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :

On an Intel iPSC/860, there are parts of the suite that have comp/comm
ratios ranging from near infinite to 1/10.

-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files,
memory required, and number of floating point operations) :

small: 1 MB output, 10 sec on YMP
medium: 1 GB output, 1 hour on YMP
large: 10 GB output, 10 hours on YMP

-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :

Hand count for simple operations, regression analysis of Cray HPM results
for more complex operations.

-------------------------------------------------------------------------------
Other relevant information:



-------------------------------------------------------------------------------
From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 09:57:45 1994
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib)
	id JAA13757; Tue, 22 Mar 1994 09:57:44 -0500
Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK)
	id JAA09199; Tue, 22 Mar 1994 09:57:20 -0500
X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 09:57:19 EST
Errors-to: owner-parkbench-compactapp@CS.UTK.EDU
Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK)
	id JAA09186; Tue, 22 Mar 1994 09:57:17 -0500
Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA24475; Tue, 22 Mar 1994 09:57:26 -0500
Message-Id: <9403221457.AA24475@rios2.epm.ornl.gov>
To: ccm@arco.com
Cc: pbwg-compactapp@CS.UTK.EDU
Subject: ParkBench code
Date: Tue, 22 Mar 94 09:57:26 -0500
From: "David W. Walker" <walker@rios2.epm.ornl.gov>


Dear Dr. Mosher,

Thank you for submitting the ARCO Parallel Seismic Processing Benchmarks for
inclusion in the ParkBench Compact Applications benchmark suite. After due
consideration the Compact Applications subcommittee has decided to include
the code in the benchmark suite.  I would be grateful if you would arrange
for the source code, input, and output files to be sent to me.

To submit your code please send me the following:

1. The complete source code

2. Input files corresponding to the small, medium, and large cases
   described in your submission

3. An output file corresponding to the small case to be used for
   validation purposes

4. PostScript files of the following papers (if available)

   Mosher, C., Hassanzadeh, S., and Schneider, D., 1992, A Benchmark Suite
   for Parallel Seismic Processing, Supercomputing 1992 proceedings.

   ARCO Seismic Benchmark Suite User's Guide

   and any other relevant papers you may have online.

If you have versions of the code using different message passing packages
please supply multiple versions of the source code.

Ultimately we would like the codes to be self-validating. Please can you
let me have any suggestions on what quantities might be checked to
validate the code.

All the above will probably come to several Mbytes, so it is probably not
appropriate to email it to me. Do you have an anonymous ftp site where I
could copy the files from?

Best Regards,
David Walker
From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 10:12:48 1994
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib)
	id KAA13948; Tue, 22 Mar 1994 10:12:48 -0500
Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK)
	id KAA10288; Tue, 22 Mar 1994 10:11:05 -0500
X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 10:10:55 EST
Errors-to: owner-parkbench-compactapp@CS.UTK.EDU
Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK)
	id KAA10257; Tue, 22 Mar 1994 10:10:50 -0500
Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA18866; Tue, 22 Mar 1994 10:07:46 -0500
Message-Id: <9403221507.AA18866@rios2.epm.ornl.gov>
To: mia@unixa.nerc-bidston.ac.uk
Cc: pbwg-compactapp@CS.UTK.EDU
Subject: ParkBench code
Date: Tue, 22 Mar 94 10:07:46 -0500
From: "David W. Walker" <walker@rios2.epm.ornl.gov>


Dear Dr. Ashworth,

Thank you for submitting the POLMP code for inclusion in
the ParkBench Compact Applications benchmark suite. After due
consideration the Compact Applications subcommittee has decided to include
the code in the benchmark suite.  I would be grateful if you would arrange
for the source code, input, and output files to be sent to me.

To submit your code please send me the following:

1. The complete source code

2. Input files corresponding to the small, medium, and large cases
   described in your submission (v200, wa200, xb200)

3. An output file corresponding to the small case to be used for
   validation purposes

4. PostScript files of the following papers mentioned in your submission
   describing the sequential and parallel codes (if available). Also the
   users guide if there is one.

If you have versions of the code using different message passing packages
please supply multiple versions of the source code.

Ultimately we would like the codes to be self-validating. Please can you
let me have any suggestions on what quantities might be checked to
validate the code.

If the above files are too large to email to me, please let me know if there
is an anonymous ftp site where I can copy them from.

Best Regards,
David Walker
--------------------------------------------------------------------------
| David W. Walker                 |   Office   : (615) 574-7401          |
| Oak Ridge National Laboratory   |   Fax      : (615) 574-0680          |
| Building 6012/MS-6367           |   Messages : (615) 574-1936          |
| P. O. Box 2008                  |   Email    : walker@msr.epm.ornl.gov |
| Oak Ridge, TN 37831-6367        |                                      |
--------------------------------------------------------------------------

-------------------------------------------------------------------------------
Name of Program         : POLMP
                 (Proudman Oceanographic Laboratory Multiprocessing Program)
-------------------------------------------------------------------------------
Submitter's Name        : Mike Ashworth
Submitter's Organization: NERC Computer Services
Submitter's Address     : Bidston Observatory
			  Birkenhead, L43 7RA, UK
Submitter's Telephone # : +44-51-653-8633
Submitter's Fax #       : +44-51-653-6269
Submitter's Email       : mia@ua.nbi.ac.uk
-------------------------------------------------------------------------------
Cognizant Expert 	: Mike Ashworth
CE's Organization	: NERC Computer Services
CE's Address     	: Bidston Observatory
			  Birkenhead, L43 7RA, UK
CE's Telephone # 	: +44-51-653-8633
CE's Fax #       	: +44-51-653-6269
CE's Email       	: mia@ua.nbi.ac.uk
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :

Bearing in mind other commitments, Mike Ashworth is prepared to respond 
quickly to questions and bug reports, and expects to be kept informed as 
to results of experiments and modifications to the code.

-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Ocean and Shallow Sea Modeling
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :

     The POLMP project was created to develop numerical
     algorithms for shallow sea 3D hydrodynamic models that run
     efficiently on modern parallel computers. A code was
     developed, using a set of portable programming conventions
     based upon standard Fortran 77, which follows the wind
     induced flow in a closed rectangular basin including a number
     of arbitrary land areas. The model solves a set of
     hydrodynamic partial differential equations, subject to a set of
     initial conditions, using a mixed explicit/implicit forward time
     integration scheme. The explicit component corresponds to a
     horizontal finite difference scheme and the implicit to a
     functional expansion in the vertical (Davies, Grzonka and
     Stephens, 1989).

     By the end of 1989 the code had been implemented on the RAL
     4 processor Cray X-MP using Cray's microtasking system,
     which provides parallel processing at the level of the Fortran
     DO loop. Acceptable parallel performance was achieved by
     integrating each of the vertical modes in parallel, referred to
     in Ashworth and Davies (1992) as vertical partitioning. In
     particular, a speed-up of 3.15 over single processor execution
     was obtained, with an execution rate of 548 Megaflops
     corresponding to 58 per cent of the peak theoretical
     performance of the machine. Execution on an 8 processor Cray
     Y-MP gave a speed-up of 7.9 and 1768 Megaflops or
     67 per cent of the peak (Davies, Proctor and O'Neill, 1991).
     The latter resulted in Davies and Grzonka being awarded a
     prize in the 1990 Cray Gigaflop Performance Awards.

     The project has been extended by implementing the shallow
     sea model in a form which is more appropriate to a variety of
     parallel architectures, especially distributed memory
     machines, and to a larger number of processors. It is especially
     desirable to be able to compare shared memory parallel
     architectures with distributed memory architectures. Such a
     comparison is currently relevant to NERC science generally
     and will be a factor in the considerations for the purchase of
     new machines, bids for allocations on other academic
     machines, and for the design of new codes and the
     restructuring of existing codes.

     In order to simplify development of the new code and to ensure
     a proper comparison between machines, a restructured version
     of the Davies and Grzonka rectangle was designed which will
     perform partitioning of the region in the horizontal dimension.
     This has the advantage over vertical partitioning that the
     communication between processors is limited to a few points
     at the boundaries of each sub-domain. The ratio of interior
     points to boundary points, which determines the ratio of
     computation to communication and hence the efficiency on
     message passing, distributed memory machines, may be
     increased by increasing the size of the individual sub-domains.
     This design may also improve the efficiency on shared memory
     machines by reducing the time of the critical section and
     reducing memory conflicts between processors. In addition, the
     required number of vertical modes is only about 16, which,
     though well suited to a 4 or 8 processor machine, does not
     contain sufficient parallelism for more highly parallel
     machines.
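
     The dependence of the interior-to-boundary ratio on sub-domain
     size can be illustrated with a short calculation. This is a sketch
     only: it assumes a rectangular nx by ny horizontal sub-domain with a
     boundary one point wide, whereas the actual width of the exchanged
     boundary in POLMP is not stated here. The function name is
     hypothetical.

```python
def interior_to_boundary_ratio(nx, ny):
    """Ratio of interior points to edge points for an nx by ny sub-domain."""
    boundary = 2 * nx + 2 * ny - 4   # points on the perimeter (nx, ny >= 2)
    interior = nx * ny - boundary
    return interior / boundary


for n in (10, 50, 100):
    print(n, round(interior_to_boundary_ratio(n, n), 2))
```

     The ratio grows roughly linearly with the sub-domain width, so
     larger sub-domains spend proportionally less time communicating.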

     The code has been designed with portability in mind, so that
     essentially the same code may be run on parallel computers
     with a range of architectures. 

-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

Yes, but users are requested to acknowledge the authors (Ashworth and
Davies) in any resulting research or publications, and are
encouraged to send reprints of their work with this code to the authors.
Also, the authors would appreciate being notified of any modifications to 
the code. 

-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

Some 8 byte floating point numbers are used in some of the initialization
code, but calculations on the main field arrays may be done using
4 byte floating point variables without grossly affecting the solution.
Nevertheless, precision conversion is facilitated by a switch supplied
to the C preprocessor. By specifying -DSINGLE, variables will be declared
as REAL, normally 4 bytes, whereas -DDOUBLE will cause declarations to be
DOUBLE PRECISION, normally 8 bytes.

-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

The README file supplied with the code describes how the various versions
of the code should be built. Extensive documentation, including the
definition of all variables in COMMON, is present as comments in the code.

-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

1) Davies, A.M., Formulation of a linear three-dimensional hydrodynamic
   sea model using a Galerkin-eigenfunction method, Int. J. Num. Meth.
   in Fluids, 1983, Vol. 3, 33-60.

2) Davies, A.M., Solution of the 3D linear hydrodynamic equations using
   an enhanced eigenfunction approach, Int. J. Num. Meth. in Fluids,
   1991, Vol. 13, 235-250.

-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

1) Ashworth, M. and Davies, A.M., Restructuring three-dimensional
   hydrodynamic models for computers with low and high degrees of
   parallelism, in Parallel Computing '91, eds D.J.Evans, G.R.Joubert
   and H.Liddell (North Holland, 1992), 553-560.
   
2) Ashworth, M., Parallel Processing in Environmental Modelling, in
   Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
   Meteorology (Nov. 23-27, 1992)
   Hoffman, G.-R and T. Kauranne, ed., 
   World Scientific Publishing Co. Pte. Ltd, Singapore, 1993.

3) Ashworth, M. and Davies, A.M., Performance of a Three Dimensional
   Hydrodynamic Model on a Range of Parallel Computers, in
   Proceedings of the Euromicro Workshop on Parallel and Distributed
   Computing, Gran Canaria 27-29 January 1993, pp 383-390, (IEEE
   Computer Society Press)
   
4) Davies, A.M., Ashworth, M., Lawrence, J., O'Neill, M.,
   Implementation of three dimensional shallow sea models on vector
   and parallel computers, 1992a, CFD News, Vol. 3, No. 1, 18-30.
   
5) Davies, A.M., Grzonka, R.G. and Stephens, C.V., The implementation
   of hydrodynamic numerical sea models on the Cray X-MP, 1992b, in
   Advances in Parallel Computing, Vol. 2, edited D.J. Evans.
   
6) Davies, A.M., Proctor, R. and O'Neill, M., "Shallow Sea
   Hydrodynamic Models in Environmental Science", Cray Channels,
   Winter 1991.

-------------------------------------------------------------------------------
Other relevant research papers:

-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :

Code is initially passed through the C preprocessor, allowing a 
number of versions with different programming styles, precisions
and machine dependencies to be generated.

Fortran 77 version

     A sequential version of POLMP is available, which conforms
     to the Fortran 77 standard. This version has been run on a
     large number of machines from workstations to supercomputers 
     and any code which caused problems, even if it conformed to 
     the standard, has been changed or removed. Thus its conformance 
     to the Fortran 77 standard is well established.

     In order to allow the code to run on a wide range of problem
     sizes without recompilation, the major arrays are defined
     dynamically by setting up pointers, with names starting with
     IX, which point to locations in a single large data array: SA.
     Most pointers are allocated in subroutine MODSUB and the
     starting location passed down into subroutines in which they
     are declared as arrays. For example :

     IX1 = 1
     IX2 = IX1 + N*M
     CALL SUB ( SA(IX1), SA(IX2), N, M )

     SUBROUTINE SUB ( A1, A2, N, M )
     DIMENSION A1(N,M), A2(N,M)
     END

     Although this is probably against the spirit of the Fortran 77
     standard, it is considered the best compromise between
     portability and utility, and has caused no problems on any of
     the machines on which it has been tried. 
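
     The bookkeeping behind this scheme can be sketched outside
     Fortran. The Python fragment below is only an illustration: the
     helper names are hypothetical, offsets are 0-based (the Fortran
     pointers above are 1-based), and elements are addressed in
     column-major order to mimic Fortran array storage.

```python
def carve_workspace(shapes):
    """Return start offsets and total length for arrays packed back to back.

    shapes maps a (hypothetical) array name to its (n, m) extent.
    Offsets are 0-based positions in one flat workspace.
    """
    offsets = {}
    next_free = 0
    for name, (n, m) in shapes.items():
        offsets[name] = next_free
        next_free += n * m
    return offsets, next_free


n, m = 4, 3
offsets, total = carve_workspace({"A1": (n, m), "A2": (n, m)})
sa = [0.0] * total  # plays the role of the single large data array SA


def element_index(name, i, j):
    """Flat position of element (i, j), both 0-based, in column-major order."""
    return offsets[name] + i + j * n


sa[element_index("A2", 1, 2)] = 7.0
```

     Changing n and m changes only the offsets, which is what lets
     the problem size vary without recompilation.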

     The code has been run on a number of traditional vector
     supercomputers, mainframes and workstations. In addition,
     key loops can be parallelized automatically by some
     compilers on shared (or virtual shared) memory MIMD machines, 
     allowing parallel execution on the Convex C2 and C3, Cray X-MP, 
     Y-MP, and Y-MP/C90, and Kendall Square Research KSR-1. Cray 
     macrotasking calls may also be enabled for an alternative
     mode of parallel execution on Cray multiprocessors.

Message passing version

     POLMP has been implemented on a number of message-passing machines:
     Intel iPSC/2 and iPSC/860, Meiko CS-1 i860 and CS-2, and nCUBE 2.
     Code is also present for the PVM and Parmacs portable message
     passing systems, and POLMP has run successfully, though not 
     efficiently, on a network of Silicon Graphics workstations. 
     Calls to message passing routines are concentrated 
     in a small number of routines for ease of portability and 
     maintenance. POLMP performs housekeeping tasks on one node of the 
     parallel machine, usually node zero, referred to in the code as the 
     driver process, the remaining processes being workers. Parmacs
     version 5 requires a host program, so a simple one has been
     provided. It loads the node program onto a two dimensional torus
     and then takes no further part in the run, other than to receive
     a completion code from the driver. The host waits for this code
     in case terminating the host early would interfere with
     execution of the nodes.

Data parallel versions

     A data parallel version of the code has been run on the
     Thinking Machines CM-2, CM-200 and MasPar MP-1 machines.

     High Performance Fortran (HPF) defines extensions to the
     Fortran 90 language in order to provide support for parallel
     execution on a wide variety of machines using a data parallel
     programming model. 

     The subset-HPF version of the POLMP code has been written
     to the draft standard specified by the High Performance
     Fortran Forum in the HPF Language Specification version 0.4
     dated November 6, 1992. Fortran 90 code was developed on a
     Thinking Machines CM-200 machine and checked for
     conformance with the Fortran 90 standard using the
     NAGWare Fortran 90 compiler. HPF directives were inserted
     by translating from the CM Fortran directives, but have not
     been tested due to the lack of access to an HPF compiler. The
     only HPF features used are the PROCESSORS, TEMPLATE,
     ALIGN and DISTRIBUTE directives and the system inquiry
     intrinsic function NUMBER_OF_PROCESSORS.

-------------------------------------------------------------------------------
Total number of lines in source code: 26,699
Number of lines excluding comments  : 11,313
Size in bytes of source code        : 756,107

-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :

steering file:   13 lines, 250 bytes, ascii (typical size)

-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :

standard output: 700 lines, 62,000 bytes, ascii (typical size)

-------------------------------------------------------------------------------
Brief, high-level description of what application does:

POLMP solves the linear three-dimensional hydrodynamic equations 
for the wind induced flow in a closed rectangular basin of constant depth
which may include an arbitrary number of land areas. 

-------------------------------------------------------------------------------
Main algorithms used:

The discretized form of the hydrodynamic equations is solved for field 
variables, z, surface elevation, and u and v, horizontal components of
velocity. The fields are represented in the horizontal by a staggered 
finite difference grid. The profile of vertical velocity with depth
is represented by the superposition of a number of spectral components.
The functions used in the vertical are arbitrary, although the 
computational advantages of using eigenfunctions (modes) of the eddy
viscosity profile have been demonstrated (Davies, 1983). Velocities
at the closed boundaries are set to zero.

Each timestep in the forward time integration of the model involves
successive updates to the three fields, z, u and v. New field values 
computed in each update are used in the subsequent calculations. A
five point finite difference stencil is used, requiring only nearest 
neighbours on the grid. 

A number of different data storage and data processing methods are 
included, mainly for handling cases with significant amounts of land, 
e.g. index array, packed data. In particular the program may be 
switched between masked operation, more suitable for vector processors, 
in which computation is done on all points, but land and boundary points
are masked out, and strip-mining, more suitable for scalar and RISC 
processors, in which calculations are only done for sea points.
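
The difference between the two modes can be sketched as follows. This
is an illustration only: a plain five-point average stands in for the
POLMP update, the function and variable names are hypothetical, and a
precomputed list of sea points stands in for the strip-mining indexes.

```python
def five_point(field, j, i):
    """Stand-in five-point stencil: average of a point and its four neighbours."""
    return 0.2 * (field[j][i] + field[j - 1][i] + field[j + 1][i]
                  + field[j][i - 1] + field[j][i + 1])


def masked_update(field, sea):
    """Compute at every interior point, then mask out land (sea is 1, land is 0)."""
    new = [row[:] for row in field]
    for j in range(1, len(field) - 1):
        for i in range(1, len(field[0]) - 1):
            new[j][i] = five_point(field, j, i) * sea[j][i]
    return new


def indexed_update(field, sea_points):
    """Compute only at the listed sea points; land values are left untouched."""
    new = [row[:] for row in field]
    for j, i in sea_points:
        new[j][i] = five_point(field, j, i)
    return new


# Closed basin: the outer ring is land, plus one land point in the interior.
sea = [[0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 0],
       [0, 1, 0, 1, 1, 0],
       [0, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0]]
# The field is zero on land, so both methods leave land points at zero.
field = [[float(i + j) * sea[j][i] for i in range(6)] for j in range(5)]
sea_points = [(j, i) for j in range(1, 4) for i in range(1, 5) if sea[j][i]]

assert masked_update(field, sea) == indexed_update(field, sea_points)
```

The masked form does redundant work on land points but keeps a regular
loop structure, while the indexed form touches only sea points at the
cost of indirect addressing.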

-------------------------------------------------------------------------------
Skeleton sketch of application:

The call chart of the major subroutines is represented thus:

  AAAPOL -> APOLMP -> INIT
                   -> RUNPOL -> INIT2  -> MAP
                                       -> DIVIDE
                                       -> PRMAP
                                       -> GENSTP
                                       -> SPEC   -> ROOTS  -> TRANS
                             -> SNDWRK
                             -> RCVWRK
                             -> SETUP
                             -> MODSUB -> MODEL  -> ASSIGN -> GENMSK
                                                           -> GENSTP
                                                           -> GENIND
                                                           -> GENPAC
                                                           -> METRIC
                                                 -> CLRFLD
                                                 -> TIME*  -> SNDBND
                                                           -> RCVBND
                                                 -> RESULT
                             -> SNDRES
                             -> RCVRES
                             -> MODOUT -> OZUVW  -> OUTFLD -> GETRES
                                                           -> OUTARR
                                                           -> GRYARR
                                       -> WSTATE

AAAPOL is a dummy main program calling APOLMP. APOLMP calls INIT which
reads parameters from the steering file, checks and monitors them.
RUNPOL is then called which calls another initialization routine INIT2.
Called from INIT2, MAP forms a map of the domain to be modelled, DIVIDE
divides the domain between processors, PRMAP maps sub-domains onto
processors, GENSTP counts indexes for strip-mining, and SPEC, ROOTS
and TRANS set up the coefficients for the spectral expansion.

SNDWRK on the driver process sends details of the sub-domain to be
worked on to each worker. RCVWRK receives that information. SETUP
does some array allocation and MODSUB does the main allocation of array 
space to the field and ancillary arrays. MODEL is the main driver 
subroutine for the model. ASSIGN calls routines to generate masks,
strip-mining indexes, packing indexes and measurement metrics.
CLRFLD initializes the main data arrays. Then one of seven time-
stepping routines, TIME*, is chosen dependent on the vectorization
and packing/indexing method used to cope with the presence of land.
SNDBND and RCVBND handle the sending and reception of boundary
data between sub-domains. After the required number of time-steps
is complete, RESULT saves results from the desired region, and 
SNDRES on the workers and RCVRES on the driver collect the result data.
MODOUT handles the writing of model output to standard output and disk
files, as required.

For a non-trivial run, 99% of time is spent in whichever of the 
timestepping routines, TIME*, has been chosen.

-------------------------------------------------------------------------------
Brief description of I/O behavior:

The driver process, usually processor 0, reads in the input parameters 
and broadcasts them to the rest of the processors. The driver also receives 
the results from the other processors and writes them out.

-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :

The processors are treated as a logical 2-D grid. The simulation domain
is divided into a number of sub-domains which are allocated, one sub-domain
per processor.

-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :

The number of processors, p, and the number of sub-domains are provided 
as steering parameters, as is a switch which requests either one-dimensional
or two-dimensional partitioning. 

Partitioning is only actually carried out for the message passing versions
of the code. For two-dimensional partitioning p is factored into px and py 
where px and py are as close as possible to sqrt(p). 
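
One way to obtain such a factorization is sketched below. The function
name is hypothetical, and the choice of px as the smaller factor is an
assumption; the text above does not say which dimension receives the
larger factor.

```python
import math


def factor_processors(p):
    """Split p into (px, py) with px * py == p and px <= py,
    choosing px as the largest divisor of p not exceeding sqrt(p)."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    px = math.isqrt(p)
    while p % px != 0:
        px -= 1
    return px, p // px


for p in (7, 12, 16, 128):
    print(p, factor_processors(p))
```

A prime p degenerates to a 1 x p grid, which is equivalent to
one-dimensional partitioning.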

For the data parallel version the number of sub-domains is set to one 
and decomposition is performed by the compiler via data distribution 
directives.

-------------------------------------------------------------------------------
Brief description of load balance behavior :

Unless land areas are specified, the load is fairly well balanced. 
If px and py evenly divide nx and ny respectively, then the
model is perfectly balanced except that boundary sub-domains have 
fewer communications.

No tests with land areas have yet been performed with the parallel 
code, and more sophisticated domain decomposition algorithms have
not yet been included.

-------------------------------------------------------------------------------
Give parameters that determine the problem size :

nx, ny      Size of horizontal grid
m           Number of vertical modes
nts         Number of timesteps to be performed

-------------------------------------------------------------------------------
Give memory as function of problem size :

See below for specific examples.

-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :

Assuming standard compiler optimizations, there is a requirement for
29 floating point operations (18 add/subtracts and 11 multiplies) per 
grid point per vertical mode per timestep, so the total computational
load is

          29 * nx * ny * m * nts
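
As a worked instance of this formula, the short sketch below evaluates
it for a 512 x 512 grid with 16 modes and 200 timesteps. The function
and constant names are hypothetical.

```python
FLOPS_PER_POINT = 29  # 18 add/subtracts + 11 multiplies, from the hand count


def total_flops(nx, ny, m, nts):
    """Total floating point operations: 29 * nx * ny * m * nts."""
    return FLOPS_PER_POINT * nx * ny * m * nts


print(total_flops(512, 512, 16, 200))  # 24326963200, about 24.3e9
```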

-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :

During each timestep each sub-domain of size nsubx=nx/px by nsuby=ny/py 
requires the following communications in words :

             nsubx * m     from N
             nsubx         from S
             nsubx * m     from S
             nsuby * m     from W
             nsuby         from E
             nsuby * m     from E
             m             from NE
             m             from SW

making a total of 

             (2 * m + 1)*(nsubx + nsuby) + 2*m words 

in eight messages from six directions.
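
Summing the eight messages itemized above can be sketched as follows.
The sketch assumes an interior sub-domain (one with neighbours in all
six directions) and that px and py divide nx and ny exactly; the
function name is hypothetical.

```python
def halo_words_per_timestep(nx, ny, px, py, m):
    """List the per-timestep messages for one interior sub-domain
    and return them together with the total number of words."""
    nsubx, nsuby = nx // px, ny // py
    messages = [
        ("N", nsubx * m),
        ("S", nsubx),
        ("S", nsubx * m),
        ("W", nsuby * m),
        ("E", nsuby),
        ("E", nsuby * m),
        ("NE", m),
        ("SW", m),
    ]
    return messages, sum(words for _, words in messages)


messages, total = halo_words_per_timestep(512, 512, 4, 4, 16)
nsubx = nsuby = 128
m = 16
# The itemized sum agrees with the closed form (2m + 1)(nsubx + nsuby) + 2m.
assert total == (2 * m + 1) * (nsubx + nsuby) + 2 * m
print(total)  # 8480
```

For this 4 x 4 decomposition each interior sub-domain therefore
receives 8480 words per timestep.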

-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files,
memory required, and number of floating point operations) :

     The data sizes and computational requirements for the various
     problems supplied are :

     Name      nx x ny x m x nts        Computational    Memory
                                        Load (Gflop)     (Mword)

     dbg        10 x   10 x  1 x 2      Small debugging test case

     dbg2d      10 x   10 x  1 x 2      Small debugging test case
                                        for a 2 x 2 decomposition

     v200      512 x  512 x 16 x 200        24             14 

     wa200    1024 x 1024 x 40 x 200       226            126

     xb200    2048 x 2048 x 80 x 200      1812            984

     The memory sizes are the number of Fortran real elements
     (words) required for the strip-mined case on a single processor.
     For the masked case the memory requirement is approximately doubled 
     for the extra mask arrays. For the message passing versions, the 
     total memory requirement will also tend to increase slightly (<10%) 
     with the number of processors employed.

-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :

Count by hand looking at inner loops and making reasonable assumptions
about common compiler optimizations.

-------------------------------------------------------------------------------
Other relevant information:



-------------------------------------------------------------------------------

-- 
                                    ,?,
                                   (o o)
|------------------------------oOO--(_)--OOo----------------------------|
|                                                                       |
| Dr Mike Ashworth                          NERC Computer Services      |
| NERC Supercomputing Consultant            Bidston Observatory         |
| Tel:         +44 51 653 8633              BIRKENHEAD                  |
| Fax:         +44 51 653 6269              L43 7RA                     |
| email:       mia@ua.nbi.ac.uk             United Kingdom              |
| alternative: M.Ashworth@ncs.nerc.ac.uk                                |
|-----------------------------------------------------------------------|









From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 10:14:36 1994
To: worley@rios2.epm.ornl.gov
Cc: pbwg-compactapp@CS.UTK.EDU
Subject: ParkBench code
Date: Tue, 22 Mar 94 10:14:23 -0500
From: "David W. Walker" <walker@rios2.epm.ornl.gov>


Dear Pat,

Thank you for submitting PSTSWM for
inclusion in the ParkBench Compact Applications benchmark suite. After due
consideration the Compact Applications subcommittee has decided to include
the code in the benchmark suite.  I would be grateful if you would arrange
for the source code, input, and output files to be sent to me.

To submit your code please send me the following:

1. The complete source code

2. Input files corresponding to the small, medium, and large cases
   described in your submission (T21, T42, and T85)

3. An output file corresponding to the small case to be used for
   validation purposes

4. PostScript files of any papers describing the sequential and parallel
   algorithms that you may have available.

If you have versions of the code using different message passing packages
please supply multiple versions of the source code.

Ultimately we would like the codes to be self-validating. Please can you
let me have any suggestions on what quantities might be checked to
validate the code.

Best Regards,
David Walker
--------------------------------------------------------------------------
| David W. Walker                 |   Office   : (615) 574-7401          |
| Oak Ridge National Laboratory   |   Fax      : (615) 574-0680          |
| Building 6012/MS-6367           |   Messages : (615) 574-1936          |
| P. O. Box 2008                  |   Email    : walker@msr.epm.ornl.gov |
| Oak Ridge, TN 37831-6367        |                                      |
--------------------------------------------------------------------------

                 PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM

To submit a compact application to the ParkBench suite you must follow
this procedure:

1. Complete the submission form below, and email it to David Walker
   at walker@msr.epm.ornl.gov. The data on this form will be reviewed 
   by the ParkBench Compact Applications Subcommittee, and you will
   be notified if the application is to be considered further for
   inclusion in the ParkBench suite.
   
2. If the ParkBench Compact Applications Subcommittee decides to consider
   your application further you will be asked to submit the source code
   and input and output files, together with any documentation and papers
   about the application. Source code and input and output files should
   be submitted by email, or ftp, unless the files are very large, in
   which case a tar file on a 1/4 inch cassette tape may be sent.
   Wherever possible email submission is preferred for all documents,
   in man page, LaTeX and/or PostScript format. These files, documents
   and papers together constitute your application package. Your
   application package should be sent to:
David Walker
--------------------------------------------------------------------------
| David W. Walker                 |   Office   : (615) 574-7401          |
| Oak Ridge National Laboratory   |   Fax      : (615) 574-0680          |
| Building 6012/MS-6367           |   Messages : (615) 574-1936          |
| P. O. Box 2008                  |   Email    : walker@msr.epm.ornl.gov |
| Oak Ridge, TN 37831-6367        |                                      |
--------------------------------------------------------------------------


-------------------------------------------------------------------------------
Name of Program         : PSTSWM 
                        : (Parallel Spectral Transform Shallow Water Model)
-------------------------------------------------------------------------------
Submitter's Name        : Patrick H. Worley
Submitter's Organization: Oak Ridge National Laboratory
Submitter's Address     : Bldg. 6012/MS-6367
                          P. O. Box 2008
                          Oak Ridge, TN 37831-6367
Submitter's Telephone # : (615) 574-3128
Submitter's Fax #       : (615) 574-0680
Submitter's Email       : worley@msr.epm.ornl.gov
-------------------------------------------------------------------------------
Cognizant Expert(s)     : Patrick H. Worley
CE's Organization       : Oak Ridge National Laboratory
CE's Address            : Bldg. 6012/MS-6367
                          P. O. Box 2008
                          Oak Ridge, TN 37831-6367
CE's Telephone #        : (615) 574-3128
CE's Fax #              : (615) 574-0680
CE's Email              : worley@msr.epm.ornl.gov

Cognizant Expert(s)     : Ian T. Foster
CE's Organization       : Argonne National Laboratory
CE's Address            : MCS 221/D-235
                          9700 S. Cass Avenue
                          Argonne, IL 60439
CE's Telephone #        : (708) 252-4619
CE's Fax #              : (708) 252-5986
CE's Email              : itf@mcs.anl.gov
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :

Modulo other commitments, Worley is prepared to respond quickly to questions
and bug reports, but expects to be kept informed as to results of experiments
and modifications to the code.

-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Climate Modeling
-------------------------------------------------------------------------------
Application "pedigree"  :

PSTSWM Version 1.0 is a message-passing benchmark code and parallel algorithm
testbed that solves the nonlinear shallow water equations using the spectral
transform method. The spectral transform algorithm of the code follows
closely how CCM2, the NCAR Community Climate Model, handles the dynamical
part of the primitive equations, and the parallel algorithms implemented in
the model include those currently used in the message-passing parallel
implementation of CCM2. PSTSWM was written by Patrick Worley of Oak Ridge
National Laboratory and Ian Foster of Argonne National Laboratory, and is
based partly on previous parallel algorithm research by John Drake, David
Walker, and Patrick Worley of Oak Ridge National Laboratory. Both the code
development and parallel algorithms research were funded by the DOE Computer
Hardware, Advanced Mathematics, and Model Physics (CHAMMP) program. The
features of version 1.0 were frozen on 8/1/93, and it is this version we
would offer initially as a benchmark.  

PSTSWM is a parallel implementation of a sequential code (STSWM 2.0) written
by James Hack and Ruediger Jakob at NCAR to solve the shallow water equations 
on a sphere using the spectral transform method. STSWM evolved from a
spectral shallow water model written by Hack (NCAR/CGD) to compare numerical
schemes designed to solve the divergent barotropic equations in spherical
geometry. STSWM was written partially to provide the reference solutions
to the test cases proposed by Williamson et al. (see citation [4] below),
which were chosen to test the ability of numerical methods to simulate
important flow phenomena. These test cases are embedded in the code and 
are selectable at run-time via input parameters, specifying initial conditions,
forcing, and analytic solutions (for error analysis). The solutions are also
published in a Technical Note by Jakob et al. [3]. In addition, this code is
meant to serve as an educational tool for numerical studies of the shallow
water equations. A detailed description of the spectral transform method, and
a derivation of the equations used in this software, can be found in the
Technical Note by Hack and Jakob [2].  

For PSTSWM, we rewrote STSWM to add vertical levels (in order to get the
correct communication and computation granularity for 3-D weather and climate
codes), to increase modularity and support code reuse, and to allow the
problem size to be selected at runtime without depending on dynamic memory
allocation. PSTSWM is meant to be a compromise between paper benchmarks and
the usual fixed benchmarks by allowing a significant amount of
runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the
numerical simulation can be run on different machines without fixing the
parallel implementation, but forcing all implementations to execute the same
numerical code (to guarantee fairness). The code has also been written in
such a way that linking in optimized library functions for common operations
instead of the "portable" code will be simple.

-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

Yes, but users are requested to acknowledge the authors (Worley and
Foster) and the program that supported the development of the code
(DOE CHAMMP program) in any resulting research or publications, and are
encouraged to send reprints of their work with this code to the authors.
Also, the authors would appreciate being notified of any modifications to 
the code. Finally, the code has been written to allow easy reuse of code in
other applications, and for educational purposes. The authors encourage this,
but also request that they be notified when pieces of the code are used.

-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

The program currently uses INTEGER, REAL, COMPLEX, and DOUBLE PRECISION
variables. The code should work correctly for any system in which COMPLEX is
represented as 2 REALs. The include file params.i has parameters that can be
used to specify the length of these. Also, some REAL and DOUBLE parameter
values may need to be modified for floating point number systems with large
mantissas, e.g., PI, TWOPI. PSTSWM is currently being used on systems where

        Integers : 4   bytes
        Floats   : 4   bytes

The use of two precisions can be eliminated, but at the cost of a significant
loss of precision. (For 4-byte REALs, not using DOUBLE PRECISION increases
the error by approximately three orders of magnitude.) DOUBLE PRECISION
results are only used in set-up (computing Gauss weights and nodes and
Legendre polynomial values), and are not used in the body of the computation.

-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

The sequential code is documented in a file included in the distribution of the
code from NCAR:

Jakob, Ruediger, Description of Software for the Spectral Transform Shallow
Water Model Version 2.0. National Center for Atmospheric Research,
Boulder, CO 80307-3000, August 1992

and in 

Hack, J.J. and R. Jakob, Description of a global shallow water model based on
the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. 

Documentation of the parallel code is in preparation, but extensive
documentation is present in the code.

-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

1) Browning, G.L., J.J. Hack and P.N. Swarztrauber, A comparison of
   three numerical methods for solving differential equations on
   the sphere, Monthly Weather Review, 117:1058-1075, 1989.

2) Hack, J.J. and R. Jakob, Description of a global
   shallow water model based on the spectral transform method,
   NCAR Technical Note TN-343+STR, January 1992.

3) Jakob, R., J.J. Hack and D.L. Williamson, Reference solutions to
   shallow water test set using the spectral transform method,
   NCAR Technical Note TN-388+STR (in preparation).

4) Williamson, D.L., J.B. Drake, J.J. Hack, R. Jakob and P.N. Swarztrauber,
   A standard test set for numerical approximations to the shallow
   water equations in spherical geometry, Journal of Computational Physics,
   Vol. 102, pp.211-224, 1992.
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

5) Worley, P. H. and J. B. Drake, Parallelizing the Spectral Transform Method,
   Concurrency: Practice and Experience, Vol. 4, No. 4 (June 1992), 
   pp. 269-291.

6) Walker, D. W., P. H. Worley, and J. B. Drake, Parallelizing the Spectral
   Transform Method. Part II, 
   Concurrency: Practice and Experience, Vol. 4, No. 7 (October 1992), 
   pp. 509-531.

7) Foster, I. T. and P. H. Worley,
   Parallelizing the Spectral Transform Method: A Comparison of Alternative
   Parallel Algorithms,
   Proceedings of the Sixth SIAM Conference on Parallel Processing for
   Scientific Computing (March 22-24, 1993), pp. 100-107.

8) Foster, I. T. and P. H. Worley,
   Parallel Algorithms for the Spectral Transform Method,
   (in preparation)

9) Worley, P. H. and I. T. Foster,
   PSTSWM: A Parallel Algorithm Testbed and Benchmark.
   (in preparation)

-------------------------------------------------------------------------------
Other relevant research papers:

10) I. Foster, W. Gropp, and R. Stevens, 
    The parallel scalability of the spectral transform method, 
    Mon. Wea. Rev., 120(5), 1992, pp. 835--850. 

11) Drake, J. B., R. E. Flanery, I. T. Foster, J. J. Hack, J. G. Michalakes,
    R. L. Stevens, D. W. Walker, D. L. Williamson, and P. H. Worley,
    The Message-Passing Version of the Parallel Community Climate Model,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 500-513.

12) Sato, R. K. and R. D. Loft,
    Implementation of the NCAR CCM2 on the Connection Machine,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 371-393.

13) Barros, S. R. M. and Kauranne, T.,
    On the Parallelization of Global Spectral Eulerian Shallow-Water Models,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 36-43.

14) Kauranne, T. and S. R. M. Barros,
    Scalability Estimates of Parallel Spectral Atmospheric Models,
    Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in
    Meteorology (Nov. 23-27, 1992)
    Hoffman, G.-R and T. Kauranne, ed., 
    World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, 
    pp. 312-328.

15) Pelz, R. B. and W. F. Stern,
    A Balanced Parallel Algorithm for Parallel Processing,
    Proceedings of the Sixth SIAM Conference on Parallel Processing for
    Scientific Computing (March 22-24, 1993), pp. 126-128.

-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :

The model code is primarily written in Fortran 77, but also uses
DO ... ENDDO and DO WHILE ... ENDDO, and the INCLUDE extension (to pull in
common and parameter declarations). It has been compiled and run on the Intel
iPSC/2, iPSC/860, Delta, and Paragon, the IBM SP1, and on Sun Sparcstation,
IBM RS/6000, and Stardent 3000/1500 workstations (as a sequential code).

Message passing is implemented using the PICL message passing system.
All message passing is encapsulated in 3 high-level routines:

BCAST0 (broadcast)
GMIN0  (global minimum)
GMAX0  (global maximum)

two classes of low level routines:
 SWAP, SWAP_SEND, SWAP_RECV, SWAP_RECVBEGIN, SWAP_RECVEND, SWAP1, SWAP2, SWAP3
 (variants and/or pieces of the swap operation)
and
 SENDRECV, SRBEGIN, SREND, SR1, SR2, SR3
 (variants and/or pieces of the send/recv operation)

and one synchronization primitive:
CLOCKSYNC0

PICL instrumentation commands are also embedded in the code.

Porting the code to another message passing library will be simple, although
some of the runtime communication options may no longer be legal.
The PICL instrumentation calls can be stubbed out (or removed) without
changing the functionality of the code, but some sort of synchronization is
needed when timing short benchmark runs.

-------------------------------------------------------------------------------
Total number of lines in source code: 28,204
Number of lines excluding comments  : 12,434
Size in bytes of source code        : 994,299
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :

problem:   23 lines, 559 bytes, ascii
algorithm: 33 lines, 874 bytes, ascii

-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :

standard output: Number of lines and bytes is a function of the input
                 specifications, but for benchmarking would normally be
                 63 lines (2000 bytes) of meaningful output. (On the Intel
                 machine, FORTRAN STOP messages are sent from each processor
                 at the end of the run, increasing this number.)

timings:         Each run produces one line of output, containing approx.
                 150 bytes.

Both files are ascii.


-------------------------------------------------------------------------------
Brief, high-level description of what application does:

(P)STSWM solves the nonlinear shallow water equations on the sphere.
The nonlinear shallow water equations constitute a simplified
atmospheric-like fluid prediction model that exhibits many of the features of
more complete models, and that has been used to investigate numerical
methods and benchmark a number of machines.
Each run of PSTSWM uses one of 6 embedded initial conditions and forcing
functions. These cases were chosen to stress test numerical methods for this
problem, and to represent important flows that develop in atmospheric
modeling. STSWM also supports reading in arbitrary initial conditions, but
this was removed from the parallel code to simplify the development of the
initial implementation. 

-------------------------------------------------------------------------------
Main algorithms used:

PSTSWM uses the spectral transform method to solve the shallow water
equations. During each timestep, the state variables of the
problem are transformed between the physical domain, where most of the
physical forces are calculated, and the spectral domain, where the terms of
the differential equation are evaluated. The physical domain is a tensor
product longitude-latitude grid. The spectral domain is the set of spectral
coefficients in a spherical harmonic expansion of the state variables, and
is normally characterized as a triangular array (using a "triangular"
truncation of spectral coefficients). 

Transforming from physical coordinates to spectral coordinates involves
performing a real FFT for each line of constant latitude, followed by 
integration over latitude using Gaussian quadrature (approximating the
Legendre transform) to obtain the spectral coefficients. The inverse
transformation involves evaluating sums of spectral harmonics and inverse
real FFTs, analogous to the forward transform.

Parallel algorithms are used to compute the FFTs and to compute the 
vector sums used to approximate the forward and inverse Legendre transforms.
Two major alternatives are available for both transforms: distributed
algorithms, which use a fixed data decomposition and compute results where
they are assigned, and transpose algorithms, which remap the domains to allow
the transforms to be calculated sequentially. This translates to four major
parallel algorithms:

a) distributed FFT/distributed Legendre transform (LT)
b) transpose FFT/distributed LT
c) distributed FFT/transpose LT
d) transpose FFT/transpose LT

Multiple implementations are supported for each type of algorithm, and
the assignment of processors to transforms is also determined by input
parameters. For example, input parameters specify a logical 2-D processor
grid and define the data decomposition of the physical and spectral domains
onto this grid. If 16 processors are used, these can be arranged as
a 4x4 grid, an 8x2 grid, a 16x1 grid, a 2x8 grid, or a 1x16 grid.
This specification determines how many processors are used to calculate each
parallel FFT and how many are used to calculate each parallel LT.
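
As a minimal sketch (Python, not part of PSTSWM; the function name is
hypothetical), the logical 2-D grids available for a given processor count
are simply its factor pairs (PX, PY):

```python
def processor_grids(nprocs):
    """Return all (PX, PY) pairs with PX * PY == nprocs, largest PX first."""
    grids = []
    for px in range(nprocs, 0, -1):
        if nprocs % px == 0:
            grids.append((px, nprocs // px))
    return grids

# The five grids listed above for 16 processors.
print(processor_grids(16))
```

Which aspect ratio performs best depends on the parallel algorithms chosen
for the FFT and the LT, so all factor pairs are worth trying.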

-------------------------------------------------------------------------------
Skeleton sketch of application:

The main program calls INPUT to read problem and algorithm parameters
and set up arrays for spectral transformations, and then calls
INIT to set up the test case parameters. Routines ERRANL and
NRGTCS are called once before the main timestepping loop for
error normalization, once after the main timestepping for 
calculating energetics data and errors, and periodically during 
the timestepping, as requested. The prognostic fields are 
initialized using routine ANLYTC, which provides the analytic
solution. Each call to STEP advances the computed fields by a 
timestep DT. Timing logic surrounds the timestepping loop, so the
initialization phase is not timed. Also, a fake timestep is calculated before
beginning timing to eliminate the first time "paging" effect currently seen
on the Intel Paragon systems. 

STEP computes the first two time levels by two semi-implicit timesteps;
normal time-stepping is by a centered leapfrog scheme. STEP calls COMP1,
which chooses between an explicit numerical algorithm, a semi-implicit
algorithm, and a simplified algorithm associated with solving the advection
equation, one of the embedded test cases. The numerical algorithm used is an
input parameter. 
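
The centered leapfrog update can be illustrated on a scalar model problem.
This is only a sketch under stated assumptions: it is Python rather than the
Fortran of PSTSWM, the function name is hypothetical, and a forward Euler
step stands in for the semi-implicit starting steps that STEP actually uses.

```python
import cmath

def leapfrog(f, y0, dt, nsteps):
    """Integrate dy/dt = f(y) over nsteps >= 1 steps with centered leapfrog.

    The first step is forward Euler (an assumption made for this sketch);
    every later step uses y[n+1] = y[n-1] + 2*dt*f(y[n]).
    """
    y_prev = y0
    y_curr = y0 + dt * f(y0)                    # starting step
    for _ in range(nsteps - 1):
        y_next = y_prev + 2.0 * dt * f(y_curr)  # centered leapfrog update
        y_prev, y_curr = y_curr, y_next
    return y_curr

# Model problem: dy/dt = i*omega*y, with exact solution exp(i*omega*t).
omega, dt, nsteps = 1.0, 0.01, 1000
y = leapfrog(lambda v: 1j * omega * v, 1.0 + 0.0j, dt, nsteps)
print(abs(y - cmath.exp(1j * omega * dt * nsteps)))  # small error
```

The scheme needs two previous time levels, which is why a separate starting
procedure is required before normal time-stepping begins.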

The basic outline of each timestep is the following:
1) Evaluate non-linear product and forcing terms.
2) Fourier transform non-linear terms in place as a block transform.
3) Compute and update divergence, geopotential, and vorticity spectral
   coefficients. (Much of the calculation of the time update is "bundled"
   with the Legendre transform.)
4) Compute velocity fields and transform divergence, geopotential,
   and vorticity back to gridpoint space using 
   a) an inverse Legendre transform and associated computations and
   b) an inverse real block FFT.

PSTSWM has "fictitious" vertical levels, and all computations are duplicated
on the different levels, potentially significantly increasing the granularity
of the computation. (The number of vertical levels is an input parameter.)
For error analysis, a single vertical level is extracted and analyzed. 

-------------------------------------------------------------------------------
Brief description of I/O behavior:

Processor 0 reads in the input parameters and broadcasts them to the rest of
the processors. Processor 0 also receives the error analysis and timing
results from the other processors and writes them out.

-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :

The processors are treated as a logical 2-D grid. There are 3 domains to be
distributed:
 a) physical domain: tensor product longitude-latitude grid
 b) Fourier domain: tensor product wavenumber-latitude grid
 c) spectral domain: triangular array, where each column contains the
                     spectral coefficients associated with a given
                     wavenumber. The larger the wavenumber is, the shorter
                     the column is.
An unordered FFT is used, and the Fourier and spectral domains use the
"unordered" permutation when the data is being distributed.

I) distributed FFT/distributed LT
   1) The tensor-product longitude-latitude grid is mapped onto the 
      processor grid by assigning a block of contiguous longitudes 
      to each processor column and by assigning one or two blocks of
      contiguous latitudes to each processor row. The vertical dimension is
      not distributed.   
   2) After the FFT, the subsequent wavenumber-latitude grid is similarly
      distributed over the processor grid, with a block of the permuted
      wavenumbers assigned to each processor column.
   3) After the LT, the wavenumbers are distributed as before and the spectral
      coefficients associated with any given wavenumber are either
      distributed evenly over the processors in the column containing that
      wavenumber, or are duplicated over the column. What happens is a
      function of the particular distributed LT algorithm used.

II) transpose FFT/distributed LT
   1) same as in (I)
   2) Before the FFT, the physical domain is first remapped to
      a vertical layer-latitude decomposition, with a block of contiguous
      vertical layers assigned to each processor column and the longitude
      dimension not distributed. After the transform, the vertical
      level-latitude grid is distributed as before, and the wavenumber
      dimension is not distributed. 
   3) After the LT, the spectral coefficients for a given vertical layer are
      either distributed evenly over the processors in a column, or are
      duplicated over that column. What happens is a function of the
      particular distributed LT algorithm used. 

III) distributed FFT/transpose LT
   1) same as (I)
   2) same as (I)
   3) Before the LT, the wavenumber-latitude grid is first remapped to
      a wavenumber-vertical layer decomposition, with a block of contiguous
      vertical layers assigned to each processor row and the latitude
      dimension not distributed. After the transform, the spectral
      coefficients associated with a given wavenumber and vertical layer
      are all on one processor, and the wavenumbers and vertical layers are
      distributed as before.

IV) transpose FFT/transpose LT
   1) same as (I)
   2) same as (II)
   3) Before the LT, the vertical level-latitude grid is first remapped to
      a vertical level-wavenumber decomposition, with a block of the permuted 
      wavenumbers now assigned to each processor row and the latitude
      dimension not distributed. After the transform, the spectral
      coefficients associated with a given wavenumber and vertical layer
      are all on one processor, and the wavenumbers and vertical layers are
      distributed as before.
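
The block mapping of the physical domain in case (I) can be sketched as
follows. This is an illustrative Python fragment, not code from PSTSWM: the
function names are hypothetical, and it assigns a single contiguous block of
latitudes to each processor row, whereas the code may assign one or two.

```python
def block_range(n, nparts, part):
    """Half-open index range [lo, hi) of block `part` when n items are split
    into `nparts` contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, nparts)
    lo = part * base + min(part, extra)
    hi = lo + base + (1 if part < extra else 0)
    return lo, hi

def physical_block(nlon, nlat, px, py, col, row):
    """Longitude and latitude index ranges owned by processor (col, row).
    The vertical dimension is not distributed, so it is not split here."""
    return block_range(nlon, px, col), block_range(nlat, py, row)

# Processor in column 1, row 2 of a 4x4 grid, for a 128 x 64 grid.
print(physical_block(128, 64, 4, 4, 1, 2))
```

When PX or PY does not divide the corresponding dimension evenly, some
processors receive one more index than others, which is one source of load
imbalance.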

-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :

The distribution is a function of the problem size (longitude, latitude,
vertical levels), the logical processor grid (PX, PY), and the algorithm
(transpose vs. distributed for FFT and LT).

-------------------------------------------------------------------------------
Brief description of load balance behavior :

The load is fairly well balanced. If PX and PY evenly divide the number of
longitudes, latitudes, and vertical levels, then all load imbalances are due
to the unequal distribution of spectral coefficients. As described above, the
spectral coefficients are laid out as a triangular array in most runs, where
each column corresponds to a different Fourier wavenumber. The wavenumbers are
partitioned among the processors in most of the parallel algorithms. Since
each column is a different length, a wrap mapping of the columns will
approximately balance the load. Instead, the natural "unordered" ordering of
the FFT is used with a block partitioning, which does a reasonable job of
load balancing without any additional data movement. The load imbalance is
quantified in Walker, et al [6]. 
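
The effect of the column mapping on load balance can be illustrated with a
small Python sketch. It assumes a triangular truncation in which the column
for wavenumber m holds MM - m + 1 coefficients, and it compares only a block
mapping of the columns in natural order against a wrap mapping. It does not
model the "unordered" FFT ordering that PSTSWM actually uses, and the
function names are hypothetical.

```python
def column_lengths(mm):
    """Coefficients per wavenumber column for a triangular truncation
    (assumed here: wavenumber m holds degrees n = m..mm)."""
    return [mm - m + 1 for m in range(mm + 1)]

def loads(lengths, nprocs, mapping):
    """Total coefficients per processor under a 'block' or 'wrap' mapping."""
    ncols = len(lengths)
    per_proc = [0] * nprocs
    for col, length in enumerate(lengths):
        if mapping == "wrap":
            owner = col % nprocs            # deal columns out cyclically
        else:
            owner = col * nprocs // ncols   # contiguous blocks of columns
        per_proc[owner] += length
    return per_proc

lengths = column_lengths(42)
for mapping in ("block", "wrap"):
    per_proc = loads(lengths, 4, mapping)
    # Ratio of the heaviest load to the average load (1.0 is perfect).
    print(mapping, per_proc, max(per_proc) / (sum(per_proc) / 4))
```

For MM=42 on 4 processors the natural-order block mapping leaves the first
processor with far more coefficients than the last, while the wrap mapping
is close to even.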

If PX and PY do not evenly divide the dimensions of the physical domain,
then other load imbalances may be as large as a factor of 2 in the worst
case. 

-------------------------------------------------------------------------------
Give parameters that determine the problem size :

MM, NN, KK - specify the number of Fourier wavenumbers and the spectral
             truncation used. For a triangular truncation, MM = NN = KK.
NLON, NLAT, NVER
           - number of longitudes, latitudes, and vertical levels. There
             are required relationships between NLON, NLAT, and NVER, and
             between these and MM. These relationships are checked in the
             code. We will also provide a selection of input files that
             specify legal (and interesting) problems.
DT         - timestep (in seconds). (Must be small enough to satisfy the
             Courant stability condition. Code warns if too large, but does
             not abort.)
TAUE       - end of model run (in hours)

-------------------------------------------------------------------------------
Give memory as function of problem size :

Executable size is determined at compile time by setting the parameters
COMPSZ in params.i. Per node memory requirements are approximately
(in REALs)

associated Legendre polynomial values:
   MM*MM*NLAT/(PX*PY)
physical grid fields: 
   8*NLON*NLAT*NVER/(PX*PY)
spectral grid fields: 
   3*MM*MM*NVER/(PX*PY) 
 or (if spectral coefficients duplicated within a processor column)
   3*MM*MM*NVER/PX        
work space:
   8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/(PX*PY)
 or (if spectral coefficients duplicated within a processor column)
   8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/PX

where BUFS1 and BUFS2 are input parameters (number of communication buffers).
BUFS1 and BUFS2 can be as small as 0 and as large as PX or PY.

In standard test cases, NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1, so memory
requirements are approximately (writing M for MM):

    (2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY)
  or
    (2 + 108*(1+BUFS1))*(M**3)/(4*PX*PY) + 3*(1+BUFS2)*(M**3)/(4*PX)
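
The itemized per-node estimates can be collected into a small helper. This
is a sketch in Python with a hypothetical function name. It reads the divisor
of the Legendre polynomial term as (PX*PY), like the other terms, and the
problem size in the example is illustrative only.

```python
def memory_reals(mm, nlon, nlat, nver, px, py, bufs1, bufs2, duplicated=False):
    """Approximate per-node memory (in REALs) from the itemized estimates.

    duplicated=True selects the variant in which spectral coefficients are
    duplicated within a processor column (divide by PX instead of PX*PY).
    """
    nodes = px * py
    spectral_div = px if duplicated else nodes
    legendre = mm * mm * nlat / nodes          # assumes divisor is (PX*PY)
    physical = 8 * nlon * nlat * nver / nodes
    spectral = 3 * mm * mm * nver / spectral_div
    work = bufs1 * physical + bufs2 * spectral  # communication buffers
    return legendre + physical + spectral + work

# Illustrative sizes satisfying NLON=2*NLAT and NLON=4*NVER, on a 2x2 grid.
print(memory_reals(42, 128, 64, 32, 2, 2, bufs1=1, bufs2=1))
```

Increasing BUFS1 or BUFS2 trades memory for additional communication
buffering, so the work space term can dominate the total.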


-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :

for a serial run per timestep (very rough):
  nonlinear terms:
        10*NLON*NLAT*NVER
  forward FFT:
        40*NLON*NLAT*NVER*LOG2(NLON)
  forward LT and time update:
       48*MM*NLAT*NVER + 7*(MM**2)*NLAT*NVER
  inverse LT and calculation of velocities:
       20*MM*NLAT*NVER + 14*(MM**2)*NLAT*NVER
  inverse FFT:
       25*NLON*NLAT*NVER*LOG2(NLON)

Using standard assumptions (NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1) and
writing M for MM:

approx. 460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4) flops per timestep.

For a total run, multiply by TAUE/DT.
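
Since TAUE is given in hours and DT in seconds, the number of timesteps is
TAUE*3600/DT. A minimal Python sketch of the total flop estimate (function
names are hypothetical; the approximate per-timestep formula is taken as
written above):

```python
import math

def flops_per_timestep(m):
    """Approximate serial flops per timestep for truncation M."""
    return 460 * m**3 + 348 * m**3 * math.log2(m) + 24 * m**4

def total_flops(m, dt_seconds, taue_hours):
    """Flops for a whole run: per-step cost times the number of timesteps."""
    nsteps = taue_hours * 3600.0 / dt_seconds  # TAUE in hours, DT in seconds
    return flops_per_timestep(m) * nsteps

# 120-hour runs with the timesteps of the standard T21, T42, and T85 cases.
for name, m, dt in (("T21", 21, 4800.0), ("T42", 42, 2400.0), ("T85", 85, 1200.0)):
    print(name, f"{total_flops(m, dt, 120.0):.2e}")
```

With these inputs the estimates come out at roughly 2e9, 4.5e10, and 1e12
flops, in line with the serial flop counts quoted for the standard problem
sizes.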

-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :

This is a function of the algorithm chosen.

I) transpose FFT
   a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
        2*(PX-1) steps, D volume
      or
        2*LOG2(PX) steps, D*LOG2(PX) volume 

II) distributed FFT
   a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
        2*LOG2(PX) steps, D*LOG2(PX) volume

III) transpose LT

   a) forward LT:  let D = 8*NLON*NLAT*NVER/(PX*PY)
        2*(PY-1) steps, D volume
      or
        2*LOG2(PY) steps, D*LOG2(PY) volume 

   b) inverse LT:  let D = (3/2)*(MM**2)*NVER/(PX*PY)
        (PY-1) steps, D volume
       or
        LOG2(PY) steps, D*PY volume

IV) distributed LT

   a) forward + inverse LT:  let D = 3*(MM**2)*NVER/(PX*PY)
        2*(PY-1) steps, D*PY volume
       or
        2*LOG2(PY) steps, D*PY volume

These are per timestep costs. Multiply by TAUE/DT for total communication
overhead. 
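
As an illustration, the FFT entries (I) and (II) can be evaluated with a few
lines of Python. This is a sketch only: the function name and the algorithm
labels are hypothetical, and the volume is returned in the same units as D.

```python
import math

def fft_comm(nlon, nlat, nver, px, py, algorithm):
    """(steps, volume) per timestep for forward + inverse FFT communication."""
    d = 13 * nlon * nlat * nver / (px * py)
    if algorithm == "transpose_linear":
        # 2*(PX-1) steps, D volume
        return 2 * (px - 1), d
    if algorithm in ("transpose_log", "distributed"):
        # 2*LOG2(PX) steps, D*LOG2(PX) volume
        return 2 * math.log2(px), d * math.log2(px)
    raise ValueError(f"unknown algorithm: {algorithm}")

# Compare the two transpose variants on an 8x2 processor grid.
print(fft_comm(128, 64, 16, 8, 2, "transpose_linear"))
print(fft_comm(128, 64, 16, 8, 2, "transpose_log"))
```

The first transpose variant takes more steps but moves less data, so which
one is cheaper depends on the latency and bandwidth of the machine.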

-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files,
memory required, and number of floating point operations) :

Standard input files will be provided for 

T21: MM=KK=NN=21      T42: MM=KK=NN=42        T85: MM=NN=KK=85
     NLON=32               NLON=64                 NLON=128
     NLAT=64               NLAT=128                NLAT=256
     NVER=8                NVER=16                 NVER=32
     ICOND=2               ICOND=2                 ICOND=2
     DT=4800.0             DT=2400.0               DT=1200.0
     TAUE=120.0            TAUE=120.0              TAUE=120.0

These are 5 day runs of the "benchmark" case specified in Williamson, et al
[4]. Flops and memory requirements for serial runs are as follows (approx.):

T21:           500,000 REALs
         2,000,000,000 flops
     
T42:         4,000,000 REALs
        45,000,000,000 flops

T85:        34,391,000 REALs
     1,000,000,000,000 flops

Both memory and flops scale well, so, for example, the T42 run fits in
approx. 4MB of memory per processor for a 4 processor run. But different
algorithms and different aspect ratios of the processor grid use different
amounts of memory.

-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :

Count by hand (looking primarily at inner loops, but eliminating common
subexpressions that compiler is expected to find).

-------------------------------------------------------------------------------

From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 10:19:48 1994
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib)
	id KAA14012; Tue, 22 Mar 1994 10:19:45 -0500
Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK)
	id KAA10903; Tue, 22 Mar 1994 10:19:27 -0500
X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 10:19:17 EST
Errors-to: owner-parkbench-compactapp@CS.UTK.EDU
Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK)
	id KAA10892; Tue, 22 Mar 1994 10:19:14 -0500
Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA23268; Tue, 22 Mar 1994 10:18:26 -0500
Message-Id: <9403221518.AA23268@rios2.epm.ornl.gov>
To: spb@epcc.ed.ac.uk
Cc: pbwg-compactapp@CS.UTK.EDU
Subject: ParkBench code
Date: Tue, 22 Mar 94 10:18:26 -0500
From: "David W. Walker" <walker@rios2.epm.ornl.gov>


Dear Dr. Booth,

Thank you for submitting the SOLVER code for inclusion in
the ParkBench Compact Applications benchmark suite. After due
consideration the Compact Applications subcommittee has decided to include
the code in the benchmark suite.  I would be grateful if you would arrange
for the source code, input, and output files to be sent to me.

To submit your code please send me the following:


1. The complete source code

2. Input files corresponding to the small, medium, and large cases
   described in your submission

3. An output file corresponding to the small case to be used for
   validation purposes

4. PostScript files of the papers mentioned in your submission
   describing the sequential and parallel codes (if available). Also the
   user's guide if there is one.

If you have versions of the code using different message passing packages
please supply multiple versions of the source code.

Ultimately we would like the codes to be self-validating. Please can you
let me have any suggestions on what quantities might be checked to
validate the code.

If the above files are too large to email to me, please let me know if there
is an anonymous ftp site where I can copy them from.

Best Regards,
David Walker
--------------------------------------------------------------------------
| David W. Walker                 |   Office   : (615) 574-7401          |
| Oak Ridge National Laboratory   |   Fax      : (615) 574-0680          |
| Building 6012/MS-6367           |   Messages : (615) 574-1936          |
| P. O. Box 2008                  |   Email    : walker@msr.epm.ornl.gov |
| Oak Ridge, TN 37831-6367        |                                      |
--------------------------------------------------------------------------


-------------------------------------------------------------------------
                  PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM

To submit a compact application to the ParkBench suite you must follow this
procedure:

1. Complete the submission form below, and email it to David Walker
   at walker@msr.epm.ornl.gov. The data on this form will be reviewed 
   by the ParkBench Compact Applications Subcommittee, and you will
   be notified if the application is to be considered further for
   inclusion in the ParkBench suite.
   
2. If the ParkBench Compact Applications Subcommittee decides to consider
   your application further you will be asked to submit the source code
   and input and output files, together with any documentation and papers
   about the application. Source code and input and output files should
   be submitted by email or ftp, unless the files are very large, in
   which case a tar file on a 1/4 inch cassette tape should be sent.
   Wherever possible, email submission is preferred for all documents, in
   man page, Latex and/or Postscript format. These files, documents, and
   papers together constitute your application package. Your application
   package should be sent to:
                David Walker
                Oak Ridge National Laboratory
                Bldg. 6012/MS-6367
                P. O. Box 2008
                Oak Ridge, TN 37831-6367
                (615) 574-7401/0680 (phone/fax)
                walker@msr.epm.ornl.gov

   The street address is "Bethel Valley Road" if Fedex insists on this.
   The subcommittee will then make a final decision on whether to include 
   your application in the ParkBench suite.

3. If your application is approved for inclusion in the ParkBench suite
   you (or some authorized person from your organization) will be asked
   to complete and sign a form giving ParkBench authority to distribute,
   and modify (if necessary), your application package.

-------------------------------------------------------------------------------
Name of Program         : SOLVER
                        : 
-------------------------------------------------------------------------------
Submitter's Name        : Stephen P. Booth
Submitter's Organization: UKQCD collaboration
Submitter's Address     : EPCC
			  The University of Edinburgh
			  James Clerk Maxwell Building
			  The King's Buildings 
			  Mayfield Road
			  Edinburgh EH9 3JZ
		          Scotland
Submitter's Telephone # : +44 (0)31 650 5746
Submitter's Fax #       : +44 (0)31 622 4712
Submitter's Email       : spb@epcc.ed.ac.uk
-------------------------------------------------------------------------------
Cognizant Expert(s)     : Dr S.P.Booth
CE's Organization       : EPCC/UKQCD
CE's Address            : The University of Edinburgh
			  James Clerk Maxwell Building
			  The King's Buildings 
			  Mayfield Road
			  Edinburgh EH9 3JZ
		          Scotland
CE's Telephone #        : +44 (0)31 650 5746
CE's Fax #              : +44 (0)31 622 4712
CE's Email              : spb@epcc.ed.ac.uk

Cognizant Expert(s)     : Dr R.D. Kenway
CE's Organization       : EPCC/UKQCD
CE's Address            : The University of Edinburgh
			  James Clerk Maxwell Building
			  The King's Buildings 
			  Mayfield Road
			  Edinburgh EH9 3JZ
		          Scotland
CE's Telephone #        : +44 (0)31 650 5245
CE's Fax #              : +44 (0)31 622 4712
CE's Email              : rdk@epcc.ed.ac.uk

-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :

S.Booth is prepared to respond quickly to questions and bug reports.
We have a strong interest in the portability and performance of this code.


-------------------------------------------------------------------------------
Major Application Field : Lattice gauge theory
Application Subfield(s) : QCD
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :

SOLVER is part of an ongoing software development exercise carried out
by UKQCD (the United Kingdom Quantum Chromo-Dynamics collaboration)
to develop a new generation of simulation codes. The current generation
of codes was highly tuned for a particular machine architecture so a
software development exercise was started to design and develop a set of
portable codes. This code was developed by S.Booth and N.Stanford of
the University of Edinburgh during the course of 1993.
SOLVER is a benchmark code derived from the codes used to generate quark
propagators. It is designed to benchmark and validate the computational 
sections of this operation. It differs from the production code in that
it self initialises to non-trivial test data rather than performing file
access. This is because there is no accepted standard for parallel file
access.
The benchmark was originally developed as part of a national UK procurement
exercise.

-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

The code may be freely distributed for benchmarking purposes but 
the code remains the property of UKQCD and we ask to be contacted
if anyone wishes to use it as an application code.

-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

All floating point numbers are defined as macros (either Fpoint or Dpoint).
The majority of the variables are Fpoint. Dpoint is only used for
accumulation values that may require higher precision. This allows the
precision of the program to be changed easily. For small and
intermediate problem sizes 4 byte Fpoints and 8 byte Dpoints should be 
sufficient. For large problems higher precision may be required.
INTEGERS must be large enough to hold the number of sites 
allocated to a processor (4 bytes is almost certainly sufficient).
The COMPLEX type is not used.

-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

Documentation exists for all program routines except some low level
routines local to a single source file.

-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

-------------------------------------------------------------------------------
Other relevant research papers:

-------------------------------------------------------------------------------
Application available in the following languages (give message passing system
used, if applicable, and machines application runs on) :

Two versions of the application were developed in parallel.
1) An HPF version (both CMF and HPF directives)
2) A message passing version.

The message passing version uses ANSI F77 with the following extensions:
a) CPP is used for include files and some simple macros and build-time 
   conditionals.
b) The F77 restrictions on variable names are not adhered to, though the
   authors have tools to convert the code to conform.

All of the message passing operations are confined to a small number of
routines. These routines were designed to be implementable in as many
different message passing systems as possible. Current versions are
1) fake - converts the program to a single processor code.
2) PARMACS - original parallel versions
3) PVM - under development.

-------------------------------------------------------------------------------
Total number of lines in source code: 15567
Number of lines excluding comments  : 10679
Size in bytes of source code        : 432398
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :

None 

-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :

standard output: formatted text

-------------------------------------------------------------------------------
Brief, high-level description of what application does:

The application generates quark propagators from a background gauge
configuration and a fermionic source. This is equivalent to solving 
M psi = source 
where psi is the quark propagator and M (a function operating on psi)
depends on the gauge fields.
The benchmark performs a cut-down version of this operation.

-------------------------------------------------------------------------------
Main algorithms used:

Conjugate gradient least norm with red-black pre-conditioning.
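
As an illustration only, the sketch below applies conjugate gradient to the
normal equations M^T M psi = M^T source, which is one common reading of
"conjugate gradient least norm". It is not taken from the benchmark source:
the function names are hypothetical, a small real dense matrix stands in for
the gauge-field-dependent operator M, and the red-black pre-conditioning is
omitted.

```python
def matvec(a, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*a)]

def dot(x, y):
    """Inner product of two real vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def cg_normal_equations(m, source, iterations=50, tol=1e-12):
    """Solve M psi = source by CG on M^T M psi = M^T source (assumed variant)."""
    mt = transpose(m)
    psi = [0.0] * len(m[0])
    r = matvec(mt, source)      # residual of the normal equations for psi = 0
    p = list(r)
    rr = dot(r, r)
    for _ in range(iterations):
        if rr < tol:            # stop once the squared residual norm is tiny
            break
        mp = matvec(m, p)
        alpha = rr / dot(mp, mp)
        psi = [xi + alpha * pi for xi, pi in zip(psi, p)]
        mtmp = matvec(mt, mp)
        r = [ri - alpha * qi for ri, qi in zip(r, mtmp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return psi

# Hypothetical 2x2 stand-in for M; the exact solution is (0.1, 0.6).
psi = cg_normal_equations([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
print(psi)
```

The default of 50 iterations mirrors the fixed iteration count mentioned in
the skeleton sketch of the application.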

-------------------------------------------------------------------------------
Skeleton sketch of application:

The benchmark code initialises the gauge field to a unit gauge
configuration. (The results for a unit gauge can be calculated
analytically, allowing a check on the results.)
A gauge transformation is then applied to the gauge field. A unit gauge
field consists only of zeros and ones; applying a gauge transformation
generates non-trivial values. Quantities corresponding to physical
observables should be unchanged by such a transformation.
In application code the gauge field would have been read in from disk.
The source field is initialised to a point source (a single non-zero
point on one lattice site).
An iterative solver is called to generate the quark propagator.
The solver routine also generates timing information.
In application code the quark propagator would then be dumped to disk.
In the benchmark we use the quark propagator to generate a physically
significant quantity (the pion propagator). This generates a single real
number for each timeslice of the lattice. These values are printed to
standard output.

This procedure requires a large number of iterations. For benchmarking
we are only interested in the time per iteration and some check on the
validity of the results. We therefore usually only perform a fixed
number of iterations (say 50) to generate accurate timing information
and verify the results by comparison with other machines.

-------------------------------------------------------------------------------
Brief description of I/O behaviour:

Unless an error occurs, a single processor writes to standard output.

-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :
A spatial decomposition is used to distribute the 4-D arrays over a 4-D
grid of processors. Each dimension is distributed independently.
The program supports non-regular decomposition,
e.g. a lattice of width 22 will be distributed across a processor grid
of width 4 as (6, 6, 5, 5).
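
A minimal sketch of such a non-regular split along one dimension is given
below. The function name is hypothetical, and the rule of giving the extra
sites to the lowest-numbered processors is an assumption chosen to reproduce
the (6, 6, 5, 5) example; the benchmark code may assign them differently.

```python
def block_sizes(n_sites, n_procs):
    """Split n_sites lattice points over n_procs processors as evenly as possible."""
    base, extra = divmod(n_sites, n_procs)
    # Assumption: the first `extra` processors each take one additional site.
    return [base + 1 if p < extra else base for p in range(n_procs)]

print(block_sizes(22, 4))  # [6, 6, 5, 5]
```

Because each dimension is distributed independently, the same function would
be applied separately to NX/NPX, NY/NPY, NZ/NPZ and NT/NPT.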

-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :
Lattice size:     NX NY NZ NT
processor grid:   NPX NPY NPZ NPT

-------------------------------------------------------------------------------
Brief description of load balance behavior :

Load balancing depends only on the distribution. If the lattice size can
be exactly divided by the processor grid size, all processors will have
the same workload. In practice it is often useful to trade load
balancing for a larger number of processors.

-------------------------------------------------------------------------------
Give parameters that determine the problem size :
Lattice size: NX NY NZ NT
The problem size is NX*NY*NZ*NT lattice sites.
-------------------------------------------------------------------------------
Give memory as function of problem size :

In a production environment there are build-time parameters that
set the array sizes, and problem/machine sizes can be set at runtime.
When creating a benchmark program it seemed less confusing to set
lattice and processor-grid sizes at build time and derive all other
quantities from them. The appropriate parameters for memory use are
Max_body (maximum number of data points per processor)
Max_bound (maximum number of data points on a single boundary between
   two processors)
If LX LY LZ LT are the local lattice sizes obtained by dividing the
lattice size by the processor grid size and rounding up to the nearest
integer, then
Max_body = (LX*LY*LZ*LT)/2
Max_bound = MAX( LX*LY*LZ/2 ,LY*LZ*LT/2 ,LX*LZ*LT/2 ,LX*LY*LT/2 )

The code contains a number of build-time switches for variations
in the implementation that may be beneficial on some machines. The
memory usage depends on these switches but typical values are:
108 * Max_body + 36 * Max_bound Fpoints
16 * (Max_body + Max_bound) INTEGERS
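
The formulas above can be evaluated with a short sketch such as the one
below. The function name and the 2x2x2x4 processor grid are hypothetical,
and the divisions by 2 are assumed to be integer divisions; the coefficients
108, 36 and 16 are the "typical values" quoted above and change with the
build-time switches.

```python
def memory_estimate(lattice, grid):
    """Return (floating-point values, integers) needed per processor."""
    # Local lattice sizes: lattice size / grid size, rounded up.
    lx, ly, lz, lt = [-(-n // p) for n, p in zip(lattice, grid)]
    max_body = (lx * ly * lz * lt) // 2
    max_bound = max(lx * ly * lz, ly * lz * lt, lx * lz * lt, lx * ly * lt) // 2
    fpoints = 108 * max_body + 36 * max_bound
    integers = 16 * (max_body + max_bound)
    return fpoints, integers

# 24^3*48 lattice on a hypothetical 2x2x2x4 processor grid (local size 12^4).
print(memory_estimate((24, 24, 24, 48), (2, 2, 2, 4)))  # (1150848, 179712)
```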

-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :

Each iteration performs 2760 floating point operations per lattice site,
i.e. 50 iterations using a 24^3*48 lattice require 9.16e+10 floating point operations.
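
As a quick check of this arithmetic, the sketch below multiplies the quoted
2760 operations per site by the number of lattice sites and iterations. The
function and constant names are hypothetical; the lattice sizes are the
three problem sizes given in this form.

```python
FLOPS_PER_SITE_PER_ITERATION = 2760  # figure quoted in this form

def total_flops(lattice, iterations):
    """Floating point operations for a given lattice and iteration count."""
    nx, ny, nz, nt = lattice
    return FLOPS_PER_SITE_PER_ITERATION * nx * ny * nz * nt * iterations

for lattice in [(18, 18, 18, 36), (24, 24, 24, 48), (36, 36, 36, 72)]:
    print(lattice, f"{total_flops(lattice, 50):.2e}")
# (18, 18, 18, 36) 2.90e+10
# (24, 24, 24, 48) 9.16e+10
# (36, 36, 36, 72) 4.64e+11
```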

-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :

For each iteration every processor sends 24 messages to each of its 8
neighbours. Each message contains one floating point number for each
lattice point in the common boundary. Two global sum operations are also
performed for each iteration.
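
A rough per-processor estimate of this traffic can be sketched as follows.
The function names are hypothetical, and the sketch assumes that "common
boundary" means the full 3-D face between two neighbouring processors; if
only the sites of one red-black parity are sent, the word count would be
halved. The two global sums are not included.

```python
def messages_per_iteration(messages_per_neighbour=24, neighbours=8):
    """Number of messages one processor sends in one iteration."""
    return messages_per_neighbour * neighbours

def words_per_iteration(local, messages_per_neighbour=24):
    """Floating point numbers one processor sends in one iteration.

    `local` is the local lattice size (LX, LY, LZ, LT). Each of the four
    dimensions has two neighbours, and the boundary in that dimension is
    assumed to be the product of the other three local extents.
    """
    lx, ly, lz, lt = local
    volume = lx * ly * lz * lt
    total = 0
    for extent in local:
        face = volume // extent          # points on one 3-D boundary
        total += 2 * messages_per_neighbour * face
    return total

print(messages_per_iteration())             # 192
print(words_per_iteration((12, 12, 12, 12)))  # 331776
```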

-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark
should be run (give parameters for problem size, sizes of I/O files,
memory required, and number of floating point operations) :

small:   18^3*36    2.90e+10 fp operations
medium:  24^3*48    9.16e+10 fp operations
large:   36^3*72    4.64e+11 fp operations

-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :

The operations in each loop were counted by hand. The code contains a counter to
sum these values.

-------------------------------------------------------------------------------
Other relevant information:



-------------------------------------------------------------------------------


From owner-parkbench-compactapp@CS.UTK.EDU Mon Mar 13 08:44:32 1995
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id IAA15646; Mon, 13 Mar 1995 08:44:31 -0500
Received: from localhost by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id IAA14363; Mon, 13 Mar 1995 08:45:00 -0500
X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Mon, 13 Mar 1995 08:44:57 EST
Errors-to: owner-parkbench-compactapp@CS.UTK.EDU
Received: from vax.darpa.mil by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id IAA14339; Mon, 13 Mar 1995 08:44:55 -0500
Received: from next63.darpa.mil  (next63.darpa.mil) by vax.darpa.mil (5.65c/5.61+local-5)
	id <AA14843>; Mon, 13 Mar 1995 08:44:53 -0500
Received: by  next63.darpa.mil  (NX5.67d/NeXT-2.0)
	id AA00427; Mon, 13 Mar 95 08:43:24 -0500
Message-Id: <9503131343.AA00427@ next63.darpa.mil >
Content-Type: text/plain
Mime-Version: 1.0 (NeXT Mail 3.3 v118.2)
Received: by NeXT.Mailer (1.118.2)
From: Jose Munoz <jmunoz@next63.darpa.mil>
Date: Mon, 13 Mar 95 08:43:22 -0500
To: pbwg-compactapp@CS.UTK.EDU
Subject: realtime?

Hello,
I'm interested in identifying a set of realtime benchmarks for
embedded appls.
Is this a good place to start (I think so)?  I'm in the process of
downloading a copy of the report (as I write) and hopefully will have
more focused questions.  In general I'm interested in (1) has a
benchmark std. been def'd, (2) are metrics id'd, (3) how is the
underlying hw id'd?

Thanks.
Jose
---
<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<  Dr. Jose L. Munoz        | email: jmunoz@arpa.mil    >
<  ARPA/CSTO                |                           >
<  3701 N. Fairfax Dr.      | Phone: (703)696-4468      >
<  Arlington, VA 22203-1714 | FAX:   (703)696-2202      >
<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
From owner-parkbench-compactapp@CS.UTK.EDU Mon Mar 13 12:10:57 1995
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id MAA19933; Mon, 13 Mar 1995 12:10:56 -0500
Received: from localhost by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id LAA25609; Mon, 13 Mar 1995 11:08:01 -0500
X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Mon, 13 Mar 1995 11:07:59 EST
Errors-to: owner-parkbench-compactapp@CS.UTK.EDU
Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id LAA25596; Mon, 13 Mar 1995 11:07:56 -0500
Received: (from walker@localhost) by rios2.EPM.ORNL.GOV (8.6.10/8.6.10) id LAA18850; Mon, 13 Mar 1995 11:07:20 -0500
From: David Walker <walker@rios2.EPM.ORNL.GOV>
Message-Id: <199503131607.LAA18850@rios2.EPM.ORNL.GOV>
To: Jose Munoz <jmunoz@next63.darpa.mil>
Cc: pbwg-compactapp@CS.UTK.EDU
Subject: Re: realtime? 
In-reply-to: (Your message of Mon, 13 Mar 95 08:43:22 EST.)
             <9503131343.AA00427@ next63.darpa.mil > 
Date: Mon, 13 Mar 95 11:07:19 -0500

Jose,

ParkBench is a proposed set of standard benchmarks, but has not
been officially sanctioned by any standards body such as ISO.
Several metrics, detailed in the ParkBench report, have been identified.
For more information, please take a look at the www page at:

http://www.epm.ornl.gov/~walker/parkbench/

Regards,
David
--------------------------------------------------------------------------
| David W. Walker                 |   Office   : (615) 574-7401          |
| Oak Ridge National Laboratory   |   Fax      : (615) 574-0680          |
| Building 6012/MS-6367           |   Messages : (615) 574-1936          |
| P. O. Box 2008                  |   Email    : walker@msr.epm.ornl.gov |
| Oak Ridge, TN 37831-6367        |                                      |
|               WEB: http://www.epm.ornl.gov/~walker/                    |
--------------------------------------------------------------------------
From owner-parkbench-compactapp@CS.UTK.EDU Fri Sep  8 16:36:42 1995
Return-Path: <owner-parkbench-compactapp@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id QAA14450; Fri, 8 Sep 1995 16:36:42 -0400
Received: from localhost by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id QAA04473; Fri, 8 Sep 1995 16:36:21 -0400
X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Fri, 8 Sep 1995 16:36:20 EDT
Errors-to: owner-parkbench-compactapp@CS.UTK.EDU
Received: from franklin.seas.gwu.edu by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id QAA04465; Fri, 8 Sep 1995 16:36:18 -0400
Received: from felix.seas.gwu.edu (abdullah@felix.seas.gwu.edu [128.164.9.3]) by franklin.seas.gwu.edu (v8) with ESMTP id QAA10099 for <parkbench-compactapp@cs.utk.edu>; Fri, 8 Sep 1995 16:36:16 -0400
Received: (from abdullah@localhost) by felix.seas.gwu.edu (8.6.12/8.6.12) id QAA07113 for parkbench-compactapp@cs.utk.edu; Fri, 8 Sep 1995 16:36:12 -0400
Date: Fri, 8 Sep 1995 16:36:12 -0400
From: Abdullah Meajil <abdullah@seas.gwu.edu>
Message-Id: <199509082036.QAA07113@felix.seas.gwu.edu>
To: parkbench-compactapp@CS.UTK.EDU
Subject: subscribe

subscribe

From owner-parkbench-compactapp@CS.UTK.EDU Fri Jun 28 10:51:58 1996
Return-Path: <owner-parkbench-compactapp@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id KAA09606; Fri, 28 Jun 1996 10:51:57 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id KAA20519; Fri, 28 Jun 1996 10:51:17 -0400
Received: from convex.convex.com (convex.convex.com [130.168.1.1]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id KAA20506; Fri, 28 Jun 1996 10:51:07 -0400
Received: from bach.convex.com by convex.convex.com (8.6.4.2/1.35)
	id JAA01420; Fri, 28 Jun 1996 09:50:28 -0500
Received: from localhost by bach.convex.com (8.6.4/1.28)
	id JAA09161; Fri, 28 Jun 1996 09:50:27 -0500
From: hari@bach.convex.com (Harikumar Sivaraman)
Message-Id: <199606281450.JAA09161@bach.convex.com>
Subject: Bug report on COMMS3.f in PARKBENCH2.0
To: parkbench-comments@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU
Date: Fri, 28 Jun 96 9:50:26 CDT
Cc: romero@bach.convex.com (Paco Romero)
X-Mailer: ELM [version 2.3 PL11]

DISCLAIMER: The contents of this mail are not an official HP position.
	    I do not speak for HP.


The COMMS3 benchmark in PARKBENCH2.0 is in apparent violation of 
the specifications in the MPI standard. The benchmark attempts to do an
MPI_RECV into the same buffer on which it has posted an MPI_ISEND
before it does an MPI_WAIT. The relevant code fragment is as below:

COMMS3  (This code fragment applies in the case of two processors)
------
CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, .....

CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ......

CALL MPI_WAIT(request(NSLAVE), status, ierr)


COMMS3  (Multiple processors)
------

do i = 1, #processors
   CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, .....
enddo

// The MPI_ISEND statements in the loop violate the MPI standard since the buffer "A"
//  is reused inside the loop.

do i = 1, #processors
   CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ......
enddo

do i = 1, #processors
   CALL MPI_WAIT(request(NSLAVE), status, ierr)
enddo

Comments:
---------
The MPI standard (page 40, last but one paragraph) says "the sender should
not access any part of the send buffer after a nonblocking send operation
is called, until the send completes." Page 41, line 1 of the MPI standard says
"the functions MPI_WAIT and MPI_TEST are used to complete a nonblocking
communication". Clearly the reuse of buffer "A" in the code fragments
above is in violation of the standard. 

-------
H. Sivaraman                                  (214) 497 - 4374
HP; 3000 Waterview Pk.way
Dallas, TX - 75080


From owner-parkbench-compactapp@CS.UTK.EDU Mon Sep  9 20:31:06 1996
Return-Path: <owner-parkbench-compactapp@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id UAA24848; Mon, 9 Sep 1996 20:31:05 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id UAA10076; Mon, 9 Sep 1996 20:29:21 -0400
Received: from convex.convex.com (convex.convex.com [130.168.1.1]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id UAA10069; Mon, 9 Sep 1996 20:29:17 -0400
Received: from brittany.rsn.hp.com by convex.convex.com (8.6.4.2/1.35)
	id PAA25214; Mon, 9 Sep 1996 15:42:49 -0500
Received: from localhost by brittany.rsn.hp.com with SMTP
	(1.38.193.4/16.2) id AA16691; Mon, 9 Sep 1996 15:39:52 -0500
Sender: sercely@convex.convex.com
Message-Id: <32348098.3BF5@convex.com>
Date: Mon, 09 Sep 1996 15:39:52 -0500
From: Ron Sercely <sercely@convex.convex.com>
Organization: Hewlett-Packard Convex Technology Center
X-Mailer: Mozilla 2.0 (X11; I; HP-UX A.09.05 9000/710)
Mime-Version: 1.0
To: parkbench-lowlevel@CS.UTK.EDU
Cc: wallach@convex.convex.com, romero@convex.convex.com,
        sercely@convex.convex.com
Subject: comms2 and comms3 bugs, mpi release 
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

HP/Convex wants to release lowlevel numbers in two weeks, but we are
trying to
figure out what to do about the bugs we have reported in these codes.

Options are:
Submitting results without these tests
HP/Convex Re-writing the benchmarks to "do the right thing"
other ?

I would appreciate a phone call to discuss these issues.
-- 
Ron Sercely
214.497.4667

HP/CXTC Toolsmith

From owner-parkbench-compactapp@cs.utk.edu Tue Sep 10 07:23:38 1996
Return-Path: <owner-parkbench-compactapp@cs.utk.edu>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id HAA00602; Tue, 10 Sep 1996 07:23:36 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA24084; Tue, 10 Sep 1996 05:20:31 -0400
Received: from postoffice.npac.syr.edu (postoffice.npac.syr.edu [128.230.7.30]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id FAA24037; Tue, 10 Sep 1996 05:20:22 -0400
Received: from yosemite (pc280.sis.port.ac.uk [148.197.205.60]) by postoffice.npac.syr.edu (8.7.5/8.7.1) with SMTP id FAA00584; Tue, 10 Sep 1996 05:13:39 -0400 (EDT)
From: Mark Baker <mab@npac.syr.edu>
Date: Tue, 10 Sep 96 10:10:24    
Subject: RE: comms2 and comms3 bugs, mpi release 
To: parkbench-lowlevel@cs.utk.edu, Ron Sercely <sercely@convex.convex.com>
Cc: wallach@convex.convex.com, romero@convex.convex.com,
        sercely@convex.convex.com, erich@cs.utk.edu, dongarra@cs.utk.edu,
        ajgh@ecs.soton.ac.uk
X-PRIORITY: 3 (Normal)
X-Mailer: Chameleon notFound, TCP/IP for Windows, NetManage Inc.
Message-ID: <Chameleon.842347097.mab@yosemite>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=us-ascii

Ron,

Ian Glendenning and I produced the first MPI port of the low-level
codes for Parkbench approximately a year ago. 

Erich Strohmaier (who works for Jack Dongarra at UTK) has been managing
and maintaining all the parkbench codes since then.

I would suggest he reply to you on the subject.

If you do not get a reply I am willing to help.

Regards

Mark


On Mon, 09 Sep 1996 15:39:52 -0500  Ron Sercely <sercely@convex.convex.com> 
wrote:

>HP/Convex wants to release lowlevel numbers in two weeks, but we are
>trying to
>figure out what to do about the bugs we have reported in these codes.
>
>Options are:
>Submitting results without these tests
>HP/Convex Re-writing the benchmarks to "do the right thing"
>other ?
>
>I would appreciate a phone call to discuss these issues.
>-- 
>Ron Sercely
>214.497.4667
>
>HP/CXTC Toolsmith
>

-------------------------------------
Dr Mark Baker
DIS, University of Portsmouth, Hants, UK
E-mail: mab@npac.syr.edu
Date: 10/09/96 - Time: 10:10:24
URL http://www.npac.syr.edu/
-------------------------------------


From owner-parkbench-compactapp@cs.utk.edu Tue Sep 10 07:27:37 1996
Return-Path: <owner-parkbench-compactapp@cs.utk.edu>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id HAA00650; Tue, 10 Sep 1996 07:27:37 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id EAA15736; Tue, 10 Sep 1996 04:02:25 -0400
Received: from beech.soton.ac.uk (beech.soton.ac.uk [152.78.128.78]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id DAA15421; Tue, 10 Sep 1996 03:59:42 -0400
Received: from bright.ecs.soton.ac.uk (bright.ecs.soton.ac.uk [152.78.64.201])
   by beech.soton.ac.uk (8.6.12/hub-8.5a) with SMTP id IAA22959;
   Tue, 10 Sep 1996 08:57:52 +0100
Received: from landlord.ecs.soton.ac.uk by bright.ecs.soton.ac.uk; Tue, 10 Sep 96 08:57:21 BST
From: Vladimir Getov <vsg@ecs.soton.ac.uk>
Received: from caesar.ecs.soton.ac.uk by landlord.ecs.soton.ac.uk; Tue, 10 Sep 96 08:59:09 BST
Date: Tue, 10 Sep 96 08:58:36 BST
Message-Id: <2546.9609100758@caesar.ecs.soton.ac.uk>
To: parkbench-comm@cs.utk.edu, parkbench-lowlevel@cs.utk.edu,
        sercely@convex.convex.com
Subject: Re: comms2 and comms3 bugs, mpi release
Cc: wallach@convex.convex.com, romero@convex.convex.com

Hi Ron,

Are you talking about the same or similar bugs as the ones reported for
the comms3 benchmark by Harikumar Sivaraman at the end of June (see the
included message below)?

			-Vladimir Getov

p.s. Apologies if you receive this message more than once - I have 
included parkbench-comm@CS.UTK.EDU on the "To:" line but do not know
the cross membership.
> 
> HP/Convex wants to release lowlevel numbers in two weeks, but we are
> trying to
> figure out what to do about the bugs we have reported in these codes.
> 
> Options are:
> Submitting results without these tests
> HP/Convex Re-writing the benchmarks to "do the right thing"
> other ?
> 
> I would appreciate a phone call to discuss these issues.
> -- 
> Ron Sercely
> 214.497.4667
> 
> HP/CXTC Toolsmith
> 
____________________________  included message  _______________________
>From owner-parkbench-compactapp@CS.UTK.EDU Fri Jun 28 15:54:32 1996
From: hari@bach.convex.com (Harikumar Sivaraman)
Subject: Bug report on COMMS3.f in PARKBENCH2.0
To: parkbench-comments@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU
Date: Fri, 28 Jun 96 9:50:26 CDT
Cc: romero@bach.convex.com (Paco Romero)
X-Mailer: ELM [version 2.3 PL11]
Content-Length: 1559
X-Status: 

DISCLAIMER: The contents of this mail are not an official HP position.
	    I do not speak for HP.


The COMMS3 benchmark in PARKBENCH2.0 is in apparent violation of 
the specifications in the MPI standard. The benchmark attempts to do an
MPI_RECV into the same buffer on which it has posted an MPI_ISEND
before it does an MPI_WAIT. The relevant code fragment is as below:

COMMS3  (This code fragment applies in the case of two processors)
------
CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, .....

CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ......

CALL MPI_WAIT(request(NSLAVE), status, ierr)


COMMS3  (Multiple processors)
------

do i = 1, #processors
   CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, .....
enddo

// The MPI_ISEND statements in the loop violate the MPI standard since the buffer "A"
//  is reused inside the loop.

do i = 1, #processors
   CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ......
enddo

do i = 1, #processors
   CALL MPI_WAIT(request(NSLAVE), status, ierr)
enddo

Comments:
---------
The MPI standard (page 40, last but one paragraph) says "the sender should
not access any part of the send buffer after a nonblocking send operation
is called, until the send completes." Page 41, line 1 of the MPI standard says
"the functions MPI_WAIT and MPI_TEST are used to complete a nonblocking
communication". Clearly the reuse of buffer "A" in the code fragments
above is in violation of the standard. 

-------
H. Sivaraman                                  (214) 497 - 4374
HP; 3000 Waterview Pk.way
Dallas, TX - 75080



From owner-parkbench-compactapp@CS.UTK.EDU Tue Sep 10 08:46:41 1996
Return-Path: <owner-parkbench-compactapp@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id IAA01821; Tue, 10 Sep 1996 08:46:40 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id IAA13971; Tue, 10 Sep 1996 08:41:06 -0400
Received: from rudolph.cs.utk.edu (RUDOLPH.CS.UTK.EDU [128.169.92.87]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id IAA13960; Tue, 10 Sep 1996 08:40:59 -0400
From: Erich Strohmaier <erich@CS.UTK.EDU>
Received:  by rudolph.cs.utk.edu (cf v2.11c-UTK)
          id IAA13912; Tue, 10 Sep 1996 08:40:58 -0400
Date: Tue, 10 Sep 1996 08:40:58 -0400
Message-Id: <199609101240.IAA13912@rudolph.cs.utk.edu>
To: parkbench-lowlevel@CS.UTK.EDU, sercely@convex.convex.com
Subject: Re:  comms2 and comms3 bugs, mpi release
Cc: romero@convex.convex.com, wallach@convex.convex.comh

Ron,

We fixed the two bugs you mentioned and we are currently testing the
new codes.  The new version should be out by the end of this week.  If you
would like to get it earlier, please let me know.


Best Regards

Erich



===========================================================================
Erich Strohmaier                       email:  erich@cs.utk.edu
Department of Computer Science         phone:  ++ 1 (423) 974 0293
104 Ayres Hall                         fax  :  ++ 1 (423) 974 8296
Knoxville TN, 37996 - USA              http://www.cs.utk.edu/~erich/

From owner-parkbench-compactapp@CS.UTK.EDU Tue Sep 10 18:13:11 1996
Return-Path: <owner-parkbench-compactapp@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id SAA06946; Tue, 10 Sep 1996 18:13:11 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id SAA05907; Tue, 10 Sep 1996 18:12:17 -0400
Received: from VNET.IBM.COM (vnet.ibm.com [199.171.26.4]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id SAA05894; Tue, 10 Sep 1996 18:12:13 -0400
Message-Id: <199609102212.SAA05894@CS.UTK.EDU>
Received: from PKEDVM9 by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 2875;
   Tue, 10 Sep 96 18:12:14 EDT
Date: Tue, 10 Sep 96 18:11:11 EDT
From: "C. George Hsi" <HSI@PKEDVM9.VNET.IBM.COM>
To: parkbench-lowlevel@CS.UTK.EDU

Hi, could you please add my name to the ParkBench Low-Level mailing
list?  I work in the RS/6000 SP performance measurement area at IBM
Poughkeepsie, and have been involved in using the ParkBench Low-Level
code recently.  My address is:   hsi@pkedvm9.vnet.ibm.com

Thanks for your help,

C. George Hsi




From owner-parkbench-compactapp@CS.UTK.EDU Mon Sep 16 15:02:05 1996
Return-Path: <owner-parkbench-compactapp@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id PAA24616; Mon, 16 Sep 1996 15:02:04 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id OAA17941; Mon, 16 Sep 1996 14:51:47 -0400
Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id OAA17934; Mon, 16 Sep 1996 14:51:45 -0400
Received:  by blueberry.cs.utk.edu (cf v2.11c-UTK)
          id SAA05937; Mon, 16 Sep 1996 18:49:20 GMT
From: "Erich Strohmaier" <erich@CS.UTK.EDU>
Message-Id: <9609161449.ZM5935@blueberry.cs.utk.edu>
Date: Mon, 16 Sep 1996 14:49:20 -0400
X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=<YfvQ8HrQFkH>P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2
X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail)
To: parkbench-comm@@CS.UTK.EDU, cs.utk.edu@CS.UTK.EDU,
        parkbench-lowlevel@CS.UTK.EDU
Subject: ParKBench Release 2.1
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Hello,

Release 2.1 of ParKBench is available from netlib:
   http://www.netlib.org/parkbench/

It contains the following bug fixes:
- Comms2 for MPI made to be a true exchange benchmark using MPI_SENDRECV.
- Comms3 for MPI now uses a wild-card and a second buffer.
- Added missing mpif.f for the MPI2PVM library.
- Fixed Makefiles.
- make.local.def modifications.
- Updated conf/make.def.SP2MPI.
- LU Solver fixed through the use of a flag to the Blacs build in the Bmakes.
- Addition of the definition for mpi_group_translate_ranks in Bdef.h.
- PBLAS bug solved with new BLACS compilation.

Best Regards


Erich Strohmaier

email:  erich@cs.utk.edu

From owner-parkbench-compactapp@CS.UTK.EDU Mon Oct 14 14:28:34 1996
Return-Path: <owner-parkbench-compactapp@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id OAA06896; Mon, 14 Oct 1996 14:28:34 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id OAA07493; Mon, 14 Oct 1996 14:22:58 -0400
Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id OAA07485; Mon, 14 Oct 1996 14:22:53 -0400
Received:  by blueberry.cs.utk.edu (cf v2.11c-UTK)
          id SAA13307; Mon, 14 Oct 1996 18:20:29 GMT
From: "Erich Strohmaier" <erich@CS.UTK.EDU>
Message-Id: <9610141420.ZM13305@blueberry.cs.utk.edu>
Date: Mon, 14 Oct 1996 14:20:27 -0400
X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=<YfvQ8HrQFkH>P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2
X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail)
To: parkbench-comm@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU
Subject: ParkBench Workshop: Tentative Agenda
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii


Dear Colleague,

The ParkBench (Parallel Benchmark Working Group) will meet in
Knoxville, Tennessee on October 31st, 1996.

The format of the meeting is:
Thursday October 31st
   9:00 - 12.00  Full group meeting
  12.00 -  1.30  Lunch
   1.30 -  5.00  Full group meeting

The tentative agenda for the meeting is:

  1. Minutes of last meeting

     Current release:
  2. Status report and experience with the current release
  3. Examine the results obtained

     Next release:
  4. New HPF Low Level benchmarks
  5. New shared memory Low Level benchmarks
  6. New performance database design and new benchmark output format
  7. Update of GBIS with new Web front-end

  8. Report from other benchmark activities

     ParkBench:
  9. Discussion of ParkBench group structure
 10. ParkBench Bibliography
 11. Status of ParkBench funding

     Other Activities:
 12. Discussion of the Supercomputing'96 activities
 13. "Electronic Benchmarking Journal" - status report

 14. Miscellaneous

 15. Date and venue for next meeting


The meeting site will be the Knoxville Downtown Hilton Hotel.
We have made arrangements with the Hilton Hotel in Knoxville.
You can download a postscript map of the area by looking at
http://www.netlib.org/utk/people/JackDongarra.html.

When making arrangements tell the hotel you are associated
with the Parallel Benchmarking or ParkBench or Park.
The rate is about $75.00/night.

  Hilton Hotel
  501 W. Church Street
  Knoxville, TN
  Phone:  423-523-2300

==>  Please make your reservation as soon as possible!



Jack Dongarra
Erich Strohmaier


From owner-parkbench-compactapp@CS.UTK.EDU Mon Oct 21 16:14:12 1996
Return-Path: <owner-parkbench-compactapp@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id QAA11230; Mon, 21 Oct 1996 16:14:11 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id PAA21293; Mon, 21 Oct 1996 15:57:23 -0400
Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id PAA20796; Mon, 21 Oct 1996 15:54:50 -0400
Received:  by blueberry.cs.utk.edu (cf v2.11c-UTK)
          id TAA16003; Mon, 21 Oct 1996 19:52:28 GMT
From: "Erich Strohmaier" <erich@CS.UTK.EDU>
Message-Id: <9610211552.ZM16001@blueberry.cs.utk.edu>
Date: Mon, 21 Oct 1996 15:52:27 -0400
X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=<YfvQ8HrQFkH>P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2
X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail)
To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU
Subject: ParKBench Workshop
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Dear Colleague,

All of you who are planning to come to the next meeting
---  http://www.netlib.org/parkbench/ ---
please send email to us so we can make local arrangements.

Thank you very much


Erich Strohmaier



From owner-parkbench-compactapp@CS.UTK.EDU Wed Apr 23 16:40:22 1997
Return-Path: <owner-parkbench-compactapp@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id QAA10091; Wed, 23 Apr 1997 16:40:22 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id QAA02831; Wed, 23 Apr 1997 16:40:25 -0400
Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id QAA02732; Wed, 23 Apr 1997 16:40:00 -0400
Received:  by blueberry.cs.utk.edu (cf v2.11c-UTK)
          id SAA12213; Wed, 23 Apr 1997 18:36:17 GMT
From: "Erich Strohmaier" <erich@CS.UTK.EDU>
Message-Id: <9704231436.ZM12211@blueberry.cs.utk.edu>
Date: Wed, 23 Apr 1997 14:36:16 -0400
X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=<YfvQ8HrQFkH>P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2
X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail)
To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU,
        parkbench-hpf@CS.UTK.EDU
Subject: ParkBench Committee Meeting - tentative Agenda
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Dear Colleague,

The ParkBench (Parallel Benchmark Working Group)
will meet in Knoxville, Tennessee on
May 9th, 1997.

The meeting site will be the Knoxville Downtown Hilton Hotel.
We have made arrangements with the Hilton Hotel in Knoxville.

  Hilton Hotel
  501 W. Church Street
  Knoxville, TN
  Phone:  423-523-2300

When making arrangements, tell the hotel you are associated with
'ParkBench'. The rate is about $79.00/night.
You can download a postscript map of the area by looking at
http://www.netlib.org/utk/people/JackDongarra.html.

----------------
The format of the meeting is:

Friday May 9th, 1997.
   9:00 - 12:00  Full group meeting
  12:00 -  1:30  Lunch
   1:30 -  5:00  Full group meeting

There might also be a joint session with the SPEC/HPG group
on Thursday, May 8th, from about 3pm to 5pm.


----------------
Please send us your comments about the tentative agenda:

  1. Minutes of last meeting (MBe)

     Changes to Current release:
  2. Low Level (ES, VG, RS)
     comms1, comms2, comms3, poly2
  3. Linear Algebra (ES)
  4. Compact Applications - NPBs (SS, ES)

     New benchmarks:
  5. HPF Low Level benchmarks (MBa)
? 6. New shared memory Low Level benchmarks (MBa)
? 7. New performance database design and new benchmark output format (MBa,VG)
? 8. Update of GBIS with new Web front-end (MBa,VG)

     Report from other benchmark activities
  9. ASCI Benchmark Codes (RS)
 10. SPEC (RE)

     ParkBench:
 11. ParkBench Bibliography
 12. ParkBench Report 2

     Other Activities:
 13. Discussion of the ParkBench Workshop 11/12 September, UK
 14. "Electronic Benchmarking Journal" - status report -

 15. Miscellaneous -

 16. Date and venue for next meeting -


  (MBa) Mark Baker          Univ. of Portsmouth
  (MBe) Michael Berry       Univ. of Tennessee
  (JD)  Jack Dongarra       Univ. of Tenn./ORNL
  (RE)  Rudi Eigenmann      SPEC
  (VG)  Vladimir Getov      Univ. of Westminster
  (TH)  Tony Hey            Univ. of Southampton
  (SS)  Subhash Saini       NASA Ames
  (RS)  Ron Sercely         HP/CXTC
  (ES)  Erich Strohmaier    Univ. of Tennessee


Jack Dongarra
Erich Strohmaier


From owner-parkbench-compactapp@CS.UTK.EDU Wed Apr 23 19:11:02 1997
Return-Path: <owner-parkbench-compactapp@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id TAA12012; Wed, 23 Apr 1997 19:11:01 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id TAA16877; Wed, 23 Apr 1997 19:10:25 -0400
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id TAA16794; Wed, 23 Apr 1997 19:09:55 -0400
Received: from mordillo (node3.remote.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA29461; Thu, 24 Apr 97 00:10:42 BST
Date: Wed, 23 Apr 97 23:56:13    
From: Mark Baker <mab@sis.port.ac.uk>
Subject: RE: ParkBench Committee Meeting - tentative Agenda 
To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU,
        parkbench-hpf@CS.UTK.EDU, Erich Strohmaier <erich@CS.UTK.EDU>
X-Priority: 3 (Normal)
X-Mailer: Chameleon 5.0.1, TCP/IP for Windows, NetManage Inc.
Message-Id: <Chameleon.861836407.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=us-ascii

Erich,

Some corrections...

--- On Wed, 23 Apr 1997 14:36:16 -0400  Erich Strohmaier 
<erich@CS.UTK.EDU> wrote:


>Please send us your comments about the tentative agenda:
>
>  1. Minutes of last meeting (MBe)
>
>     Changes to Current release:
>  2. Low Level (ES, VG, RS)
>     comms1, comms2, comms3, poly2
>  3. Linear Algebra (ES)
>  4. Compact Applications - NPBs (SS, ES)
>
>     New benchmarks:
>  5. HPF Low Level benchmarks (MBa)
>? 6. New shared memory Low Level benchmarks (MBa)

Can you change this to report on our I/O benchmark efforts?

>? 7. New performance database design and new benchmark output format (MBa,VG)
>? 8. Update of GBIS with new Web front-end (MBa,VG)

Tony or I will update the committee on the new
back and front ends of GBIS and, hopefully, also give a demo.

VG, as far as I know, is not involved in this activity.

Regards

Mark


-------------------------------------
DIS, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 4/23/97 - Time: 11:56:13 PM
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Fri May  2 15:53:02 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id PAA00358; Fri, 2 May 1997 15:53:02 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id PAA13341; Fri, 2 May 1997 15:44:43 -0400
Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id PAA13327; Fri, 2 May 1997 15:44:36 -0400
Received:  by blueberry.cs.utk.edu (cf v2.11c-UTK)
          id TAA08348; Fri, 2 May 1997 19:44:04 GMT
From: "Erich Strohmaier" <erich@CS.UTK.EDU>
Message-Id: <9705021544.ZM8346@blueberry.cs.utk.edu>
Date: Fri, 2 May 1997 15:44:03 -0400
X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=<YfvQ8HrQFkH>P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2
X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail)
To: parkbench-comm@CS.UTK.EDU
Subject: ParkBench Committee Meeting
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Dear Colleague,

Here is the revised agenda.
Please send me a short email ASAP if you plan to come,
so that we can arrange for a meeting room.

-------------------
The ParkBench (Parallel Benchmark Working Group)
will meet in Knoxville, Tennessee on
May 9th, 1997.

The meeting site will be the Knoxville Downtown Hilton Hotel.
We have made arrangements with the Hilton Hotel in Knoxville.

  Hilton Hotel
  501 W. Church Street
  Knoxville, TN
  Phone:  423-523-2300

When making arrangements, tell the hotel you are associated with
'ParkBench'. The rate is about $79.00/night.
You can download a postscript map of the area by looking at
http://www.netlib.org/utk/people/JackDongarra.html.

----------------
The tentative agenda for the meeting is:

  1. Minutes of last meeting (MBe)

     Changes to Current release:
  2. Low Level (ES, VG, RS)
     comms1, comms2, comms3, poly2
  3. Linear Algebra (ES)
  4. Compact Applications - NPBs (SS, ES)

     New benchmarks:
  5. HPF Low Level benchmarks (MBa)
  6. Java Low-Level Benchmarks (VG)
  7. New I/O benchmarks (MBa)
  8. New performance database design and new benchmark output format
     Update of GBIS with new Web front-end (MBa,TH)

     Report from other benchmark activities
  9. ASCI Benchmark Codes (AH)
 10. SPEC-HPG (RE, JD)

     ParkBench:
 11. ParkBench Bibliography
 12. ParkBench Report 2

     Other Activities:
 13. Discussion of the ParkBench Workshop 11/12 September, UK (TH, MBa)
 14. PEMCS - "Electronic Benchmarking Journal" - status report - (TH, MBa)
 15. Status of Funding proposals (JD, TH)

 16. Miscellaneous -

 17. Date and venue for next meeting -


  (MBa) Mark Baker          Univ. of Portsmouth
  (MBe) Michael Berry       Univ. of Tennessee
  (JD)  Jack Dongarra       Univ. of Tenn./ORNL
  (RE)  Rudi Eigenmann      SPEC
  (VG)  Vladimir Getov      Univ. of Westminster
  (TH)  Tony Hey            Univ. of Southampton
  (AH)  Adolfy Hoisie       LANL
  (SS)  Subhash Saini       NASA Ames
  (RS)  Ron Sercely         HP/CXTC
  (ES)  Erich Strohmaier    Univ. of Tennessee


Jack Dongarra
Erich Strohmaier


From owner-parkbench-comm@CS.UTK.EDU Tue May  6 14:46:45 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id OAA04480; Tue, 6 May 1997 14:46:45 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id OAA25737; Tue, 6 May 1997 14:34:05 -0400
Received: from punt-2.mail.demon.net (relay-11.mail.demon.net [194.217.242.137]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id OAA25715; Tue, 6 May 1997 14:33:58 -0400
Received: from minnow.demon.co.uk ([158.152.73.63]) by punt-2.mail.demon.net
           id aa1000641; 6 May 97 19:07 BST
Message-ID: <UOrwADAXM3bzEwfw@minnow.demon.co.uk>
Date: Tue, 6 May 1997 19:06:15 +0100
To: parkbench-comm@CS.UTK.EDU
From: Roger Hockney <roger@minnow.demon.co.uk>
Subject: Parkbench Meeting Documents
In-Reply-To: <9705021544.ZM8346@blueberry.cs.utk.edu>
MIME-Version: 1.0
X-Mailer: Turnpike Version 3.01 <kRL7V2isFfDmnKSZb08I5Tyfx$>

AGENDA ITEM:

>     Changes to Current release:
>  2. Low Level (VG)
>     comms1, comms2,

Two documents will be submitted to the committee on this item by Roger
Hockney and Vladimir Getov (Westminster University, UK). They can be
downloaded as postscript files from:

"New COMMS1 Benchmark: Results and Recommendations"
http://www.minow.demon.co.uk/Pbench/comms1/PBPAPER2.PS
 
"New COMMS1 Benchmark: The Details"
http://www.minow.demon.co.uk/Pbench/comms1/PBPAPER3.PS

The papers will be presented by Vladimir who will bring some paper
copies with him.

Best wishes
Roger and Vladimir
-- 
Roger Hockney.  Checkout my new Web page at URL   http://www.minnow.demon.co.uk
University of   and link to my new book: "The Science of Computer Benchmarking"
Westminster UK  suggestions welcome. Know any fish movies or suitable links?

From owner-parkbench-comm@CS.UTK.EDU Tue May  6 17:54:47 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id RAA07526; Tue, 6 May 1997 17:54:46 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id RAA17012; Tue, 6 May 1997 17:48:50 -0400
Received: from punt-1.mail.demon.net (relay-7.mail.demon.net [194.217.242.9]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id RAA17003; Tue, 6 May 1997 17:48:47 -0400
Received: from minnow.demon.co.uk ([158.152.73.63]) by punt-1.mail.demon.net
           id aa0623986; 6 May 97 21:37 BST
Message-ID: <IQsX3CAKQ5bzEw9M@minnow.demon.co.uk>
Date: Tue, 6 May 1997 21:26:50 +0100
To: parkbench-comm@CS.UTK.EDU
From: Roger Hockney <roger@minnow.demon.co.uk>
Subject: Parkbench Meeting Documents (Correction)
MIME-Version: 1.0
X-Mailer: Turnpike Version 3.01 <kRL7V2isFfDmnKSZb08I5Tyfx$>

I am resending this because there was a typo in the URLs:
There are two MM in "minnow". 

Also if you took PBPAPER2.PS before receiving this repeat message,
please take it again as I have corrected two errors in the graphs.

SORRY 
Roger
************************
AGENDA ITEM:

>     Changes to Current release:
>  2. Low Level (VG)
>     comms1, comms2,

Two documents will be submitted to the committee on this item by Roger
Hockney and Vladimir Getov (Westminster University, UK). They can be
downloaded as postscript files from:

CORRECTED URLs:

"New COMMS1 Benchmark: Results and Recommendations"
http://www.minnow.demon.co.uk/Pbench/comms1/PBPAPER2.PS
              
"New COMMS1 Benchmark: The Details"
http://www.minnow.demon.co.uk/Pbench/comms1/PBPAPER3.PS

The papers will be presented by Vladimir who will bring some paper
copies with him.

Best wishes
Roger and Vladimir
-- 
Roger Hockney.  Checkout my new Web page at URL   http://www.minnow.demon.co.uk
University of   and link to my new book: "The Science of Computer Benchmarking"
Westminster UK  suggestions welcome. Know any fish movies or suitable links?

From owner-parkbench-comm@CS.UTK.EDU Mon May 12 05:36:41 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id FAA24086; Mon, 12 May 1997 05:36:41 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA10068; Mon, 12 May 1997 05:18:21 -0400
Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id FAA10051; Mon, 12 May 1997 05:18:18 -0400
Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id FAA29262; Mon, 12 May 1997 05:18:16 -0400 (EDT)
Date: Mon, 12 May 1997 05:18:16 -0400 (EDT)
From: Pat Worley <worley@haven.EPM.ORNL.GOV>
Message-Id: <199705120918.FAA29262@haven.EPM.ORNL.GOV>
To: parkbench-comm@CS.UTK.EDU
Subject: Gordon Conference on HPC and NII 
Forwarding: Mail from 'Tony Skjellum <tony@aurora.cs.msstate.edu>'
     dated: Sat, 10 May 1997 16:32:12 -0500 (CDT)
Cc: worley@haven.EPM.ORNL.GOV

Just in case you haven't received information on this already, here is a
blurb on the 1997 Gordon conference in high performance computing. 
Unlike previous years, there is not an explicit emphasis on performance
evaluation in this year's stated themes, but you can't (shouldn't) discuss
future architectures and their impacts without discussing how to
evaluate performance, and I am hoping that some benchmarking-minded people
will show up and keep the discussion honest.

---------- Begin Forwarded Message ----------

The deadline for applying to attend the 1997 Gordon conference in high
performance computing is June 1. If you are interested in attending,
please apply as soon as possible. The simplest way to apply is to download
the application form from the web site indicated below, or to use the online
registration option. If you have any problems with either of these,
please contact the organizers at tony@cs.msstate.edu and worleyph@ornl.gov.

-------------------------------------------------------------------------------
The 1997 Gordon Conference on High Performance Computing and
Information Infrastructure: "Practical Revolutions in HPC and NII"

Chair, Anthony Skjellum, Mississippi State University, tony@cs.msstate.edu,
       601-325-8435
Co-Chair, Pat Worley, Oak Ridge National Laboratory, worleyph@ornl.gov,
       615-574-3128

Conference web page: http://www.erc.msstate.edu/conferences/gordon97

July 13-17, 1997
Plymouth State College
Plymouth NH

The now biennial Gordon conference series in HPC and NII commenced in 1992
and had its second meeting in 1995.  The Gordon conferences are an
elite series of conferences designed to advance the state-of-the-art in
covered disciplines. Speakers are assured of anonymity and
referencing presentations done at Gordon conferences is prohibited by
conference rules in order to promote science, rather than publication
lists.  Previous meetings have had good international participation,
and this is always encouraged. Experts, novices, and technically
interested parties from other fields interested in HPC and NII are
encouraged to apply to attend.

All attendees, including speakers, poster presenters, and session chairs
must apply to attend. We *strongly* encourage all poster presenters to have
their poster proposals in by May 13, 1997, though we will consider poster
presentations up to six weeks prior to the conference.  Applications to
attend the conference are also due six weeks in advance.

More information on the conference can be found at the web page
listed above, including the list of speakers and poster presenters
and information on applying for attendance.


----------- End Forwarded Message -----------


From owner-parkbench-comm@CS.UTK.EDU Tue May 13 13:58:00 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id NAA20879; Tue, 13 May 1997 13:57:59 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id NAA11997; Tue, 13 May 1997 13:33:14 -0400
Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id NAA11983; Tue, 13 May 1997 13:33:10 -0400
Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.5/CRI-gate-news-1.3) with ESMTP id MAA20939 for <parkbench-comm@CS.UTK.EDU>; Tue, 13 May 1997 12:33:07 -0500 (CDT)
Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id MAA16428 for <parkbench-comm@CS.UTK.EDU>; Tue, 13 May 1997 12:33:06 -0500 (CDT)
From: Charles Grassl <cmg@cray.com>
Received: by magnet.cray.com (8.8.0/btd-b3)
          id RAA20181; Tue, 13 May 1997 17:33:04 GMT
Message-Id: <199705131733.RAA20181@magnet.cray.com>
Subject: Parkbench directions
To: parkbench-comm@CS.UTK.EDU
Date: Tue, 13 May 1997 12:33:04 -0500 (CDT)
X-Mailer: ELM [version 2.4 PL24-CRI-d]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit


To:   ParkBench Group
From: Charles Grassl

Date: May 13, 1997

(Long)

I appreciated the meeting this past week and wish to thank Erich and Jack
for hosting it.  I am aware of the great effort that many individuals
have contributed to developing and implementing the ParkBench suite.
In spite of this, I feel that we need to evaluate and correct our course.

ParkBench should not merge with or use benchmarks from SPEC/HPG
(the High Performance Group).  SGI/Cray and IBM have already
withdrawn from SPEC/HPG, and Fujitsu and NEC are no longer
participating.  The reasons for these companies and other institutions
no longer participating should indicate to us (ParkBench) that
something is amiss with the SPEC/HPG benchmarks and paradigm.

Several of the reasons for the supercomputer manufacturers not
supporting the SPEC/HPG effort are listed below.  I list these reasons
so that the ParkBench group can learn from them and avoid the same
problems.

- Relevance.  The particular benchmark programs being used by SPEC/HPG
  are not relevant or appropriate for supercomputing.  The programs in
  the current SPEC/HPG suite do not represent the leading-edge software
  that is more typical of usage on high performance systems.

- Redundancy.  The programs being developed by SPEC/HPG
  are not qualitatively or quantitatively different from the SPEC/OSG
  programs and, as such, the effort is viewed as redundant and expensive.

- Methodology.  The methodology being used by SPEC/HPG to
  procure, develop and run benchmarks lacks scientific and technical
  basis and hence results have a vague and arbitrary interpretation.

- Programming model.  Designing benchmarks for portability across
  systems is a convenient idea but does not reflect actual constraints
  or usage.  More often than not, compatibility with a PREVIOUS model
  of computer is more important than compatibility ACROSS computers.

- Expense.  Some of the large data cases for the SPEC/HPG programs
  will require hours or days to run, with little new data or
  information gained by the exercise.  These exercises are extremely
  expensive in time, in capital equipment and in logistics.

- Ergonomics.  The cumbersome design of SPEC/HPG Makefiles and build
  procedures makes the programs difficult and expensive to test,
  maintain and analyze.

We in the ParkBench group must acknowledge the above items if we are to
maintain interest and participation from computer vendors.  I believe
that reorganizing and refocusing the group could revitalize high
performance computer benchmarking and re-invigorate the ParkBench
group.

As the ParkBench suite now stands, there are too many programs and they
are difficult to build, test and maintain.  This situation impedes
usage and participation.  Here are a few suggestions for our future
practices and directions:

- Design and write benchmark programs.  Don't borrow or solicit old
  code.  The borrowed or solicited code is never quite appropriate and
  usually obsolete.  Our greatest asset is that we have scientists who
  are capable of designing experiments (benchmarks).  (Build value.)

- Monitor and evaluate accuracy.  Though we mention accuracy in
  ParkBench Report 1, we haven't applied it to the current programs.
  (Scientifically validate, or invalidate, our experiments.)

- Make it simple.  Write and develop simple programs which do not need
  elaborate build procedures and which are easier to test and to maintain.
  (Keep It Simple, Stupid.)

- Build a better user interface.  The belabored "run rules" and the
  interface with layers of Makefiles, includes and embedded relative
  file paths are unacceptable.  An acceptable interface might require
  binary distribution and hence a desirable emphasis on designing and
  running rather than building and porting the benchmarks.  (Make the
  product more attractive to more users.)

- Make the suite truly modular.  The current structure makes the
  simplest one CPU program as difficult to build and run as the most
  complicated program with Makefile includes, special compilers, source
  file includes, special libraries, suite libraries, etc. (Make it
  manageable.)

- Drop the connection with SPEC/HPG and with NPB.  This "grand
  unifying" scheme makes for redundant code.  Rather than focusing
  benchmarking attention on ParkBench, it has had the opposite effect,
  because ParkBench becomes yet another collection of benchmarks used
  by other organizations.  (Be distinguishable and identifiable.)

- Emphasize what ParkBench is associated with:  benchmarking distributed
  memory parallel computers.  We should write and develop benchmark
  programs which measure and instrument the parallel processing aspect
  of MPP systems.  (Keep our focus.)


I volunteer to develop and write a suite of message passing test
programs which measure the performance and variance of message passing
communication schemes.  I have much experience with writing such
programs and believe that such a suite would be useful for others and for
the computer industry in general.

I hesitate to contribute such programs to the present structure for
several reasons:

- The network test suite does not logically fit into the current
  "hierarchy" and hence might further clutter the ParkBench suite and
  make it further unfocused.

- The current ParkBench structure is not manageable.  Testing and
  maintenance would be extremely expensive in the current structure.

- My company's effort may be interpreted as an endorsement of the
  current structure and model.  The suite is not popular with vendors
  for reasons outlined above.  Participation is currently discouraged.


Discussion?  


Regards,
Charles Grassl
SGI/Cray
Eagan, Minnesota  USA

From owner-parkbench-comm@CS.UTK.EDU Wed May 21 17:25:15 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id RAA27513; Wed, 21 May 1997 17:25:15 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id RAA07579; Wed, 21 May 1997 17:18:07 -0400
Received: from rastaman.rmt.utk.edu (root@TCHM11A6.RMT.UTK.EDU [128.169.27.188]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id RAA07571; Wed, 21 May 1997 17:18:02 -0400
Received: from rastaman.rmt.utk.edu (localhost [127.0.0.1]) by rastaman.rmt.utk.edu (8.7.6/8.7.3) with SMTP id RAA01108; Wed, 21 May 1997 17:24:43 -0400
Sender: mucci@CS.UTK.EDU
Message-ID: <3383681A.D98C5FB@cs.utk.edu>
Date: Wed, 21 May 1997 17:24:42 -0400
From: "Philip J. Mucci" <mucci@CS.UTK.EDU>
Organization: University of Tennessee, Knoxville
X-Mailer: Mozilla 3.01 (X11; I; Linux 2.0.28 i586)
MIME-Version: 1.0
To: parkbench-comm@CS.UTK.EDU
CC: "PVM Developer's Mailing List" <pvmspankers@msr.epm.ornl.gov>
Subject: Mesg Passing Benchmarks
References: <199705131733.RAA20181@magnet.cray.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi all,

Charles Grassl in his last message to this committee volunteered
to write a suite of message passing benchmarks to replace the Low
Levels...Before any action on his or this committee's part, I would
recommend that you all have a look at version 3 of my pvmbench
package. It now does MPI as well and can easily support other
message passing primitives with a few #defines. 

Version 3 along with some sample results can be found at
http://www.cs.utk.edu/~mucci/pvmbench.

Note that this has not been tested on any MPP's with UTK PVM.

This benchmark will generate and graph the following:

bandwidth
gap time (to buffer an outgoing message)
roundtrip (latency /2)
barrier/sec
broadcast
summation reduction
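
As a rough illustration of the roundtrip and bandwidth measurements listed
above, the sketch below times a ping-pong exchange. It is not pvmbench code:
it assumes a local socket pair and a thread as stand-ins for PVM or MPI
tasks, and the names ping_pong, _echo and _recv_exact are hypothetical.

```python
import socket
import threading
import time


def _recv_exact(sock, nbytes):
    """Read exactly nbytes from sock (a single recv may return less)."""
    chunks = []
    remaining = nbytes
    while remaining:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def _echo(sock, nbytes, reps):
    """Partner task: send every received message straight back."""
    for _ in range(reps):
        sock.sendall(_recv_exact(sock, nbytes))


def ping_pong(nbytes, reps=100):
    """Return (mean one-way time in seconds, bandwidth in bytes/second)."""
    if nbytes < 1:
        # A zero-length message would not force any synchronization here.
        raise ValueError("nbytes must be at least 1")
    a, b = socket.socketpair()  # assumption: local sockets, not PVM/MPI
    worker = threading.Thread(target=_echo, args=(b, nbytes, reps))
    worker.start()
    payload = b"x" * nbytes
    start = time.perf_counter()
    for _ in range(reps):
        a.sendall(payload)
        _recv_exact(a, nbytes)
    elapsed = time.perf_counter() - start
    worker.join()
    a.close()
    b.close()
    one_way = elapsed / (2 * reps)  # half of the mean roundtrip time
    return one_way, nbytes / one_way
```

Calling ping_pong(1024) returns the mean one-way time and the corresponding
bandwidth for 1 KB messages; sweeping nbytes over several decades gives the
kind of curve that the tests above would graph.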

Other tests can easily be added...I would highly recommend that this code
be examined before any action is taken. It is less than a year old, and
version 3, available on that page, is in beta, i.e. it has not been
released to the general public. Let me know what you think...

-Phil

-- 
/%*\ Philip J. Mucci | GRA in CS under Dr. JJ Dongarra /*%\
\*%/ http://www.cs.utk.edu/~mucci  PVM/Active Messages \%*/

From owner-parkbench-comm@CS.UTK.EDU Fri May 23 12:03:04 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id MAA06549; Fri, 23 May 1997 12:03:03 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id LAA15901; Fri, 23 May 1997 11:05:32 -0400
Received: from berry.cs.utk.edu (BERRY.CS.UTK.EDU [128.169.94.70]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id LAA15895; Fri, 23 May 1997 11:05:30 -0400
Received: from cs.utk.edu by berry.cs.utk.edu with ESMTP (cf v2.11c-UTK)
          id LAA01370; Fri, 23 May 1997 11:05:31 -0400
Message-Id: <199705231505.LAA01370@berry.cs.utk.edu>
to: parkbench-comm@CS.UTK.EDU
Subject: Minutes of May ParkBench Meeting
Date: Fri, 23 May 1997 11:05:31 -0400
From: "Michael W. Berry" <berry@CS.UTK.EDU>

Here are the minutes from the recent ParkBench meeting in Knoxville.
Best regards,
Mike

-----------------------------------------------------------------
Minutes of ParkBench Meeting - Knoxville Hilton, May 9, 1997
-----------------------------------------------------------------

ParkBench Attendee List:

     (MBa) Mark Baker          Univ. of Portsmouth   mab@sis.port.ac.uk
     (MBe) Michael Berry       Univ. of Tennessee    berry@cs.utk.edu
           Shirley Browne      Univ. of Tennessee    browne@cs.utk.edu
     (JD)  Jack Dongarra       Univ. of Tenn./ORNL   dongarra@cs.utk.edu
           Jeff Durachta       Army Res. Lab MSRC    durachta@arl.mil
     (VG)  Vladimir Getov      Univ. of Westminster  getovv@wmin.ac.uk
     (CG)  Charles Grassl      SGI/Cray              cmg@cray.com
     (TH)  Tony Hey            Univ. of Southampton  ajgh@ecs.soton.ac.uk
     (AH)  Adolfy Hoisie       Los Alamos Nat'l Lab  hoisie@lanl.gov
     (CK)  Charles Koelbel     Rice University       chk@cs.rice.edu
     (PM)  Phil Mucci          Univ. of Tennessee    mucci@cs.utk.edu
           Erik Riedel         GENIAS Software GmbH  erik@genias.de
     (SS)  Subhash Saini       NASA Ames             saini@nas.nasa.gov
     (RS)  Ron Sercely         HP-Convex             sercely@convex.hp.com
           Alan Stagg          CEWES                 stagga@wes.army.mil
     (ES)  Erich Strohmaier    Univ. of Tennessee    erich@cs.utk.edu
     (PW)  Pat Worley          Oak Ridge Nat'l Lab   worleyph@ornl.gov

SPEC-HPG Visitors:

           Don Dossa           DEC                   dossa@eng.pko.dec.com
     (RE)  Rudi Eigenmann      Purdue University     eigenman@ecn.purdue.edu
           Greg Gaertner       DEC                   ggg@zko.dec.com
           Jean Suplick        HP                    suplick@rsn.hp.com
           Joe Throp           Kuck & Associates     throp@kai.com

At 9:05am EST, TH opened the meeting and asked that all the attendees
introduce themselves.  After a brief overview of the proposed agenda,
MBe reviewed the minutes from the last ParkBench meeting in October
of '96.  The minutes were unanimously accepted and TH asked VG to
present the proposed changes to the low-level benchmarks (9:20am).

VG reviewed the original COMMS1 (ping-pong or simplex communication) and
the COMMS2 (duplex communication) low-level benchmarks.  He discussed
some of the problems with the previous versions.  These included the
omission of calculated bandwidth, large message length problems, and
large errors in the asymptotic fit.   In collaboration with RS and CG,
a number of improvements have been made to these benchmarks:

	1. Measured bandwidth is provided in output.
	2. Time for shortest message is provided.
	3. Maximum measured bandwidth and the corresponding message
	   length is now provided.
	4. The accuracy of the least-squares 2-parameter fit has been
	   improved (sum of squares of the "relative" and not absolute
	   error is now used).
	5. New 3-parameter variable-power fit for certain cases added.
	6. Can report parametric fits if the error is less than some
   	   user-specified tolerance.
	7. Introduce KDIAG parameter to invoke diagnostic outputs.
	8. Modifications to ESTCOM.f (as suggested by RS).
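
As an illustration of item 4, the following Python sketch fits a
2-parameter model by minimising the sum of squares of the relative error.
The minutes do not state the timing model, so the sketch assumes the usual
linear form t(n) = t0 + n/r_inf; the function name and the synthetic data
in the usage note are hypothetical and are not taken from the COMMS1 source.

```python
def fit_relative(lengths, times):
    """Fit t(n) = t0 + n / r_inf (assumed model) to measured times.

    Minimises sum(((t0 + n_i / r_inf - t_i) / t_i) ** 2), i.e. the
    relative rather than the absolute error, which is a weighted linear
    least-squares problem with weights 1 / t_i**2.
    Returns (t0, r_inf).
    """
    if len(lengths) != len(times) or len(lengths) < 2:
        raise ValueError("need at least two (length, time) pairs")
    w = [1.0 / (t * t) for t in times]
    s_w = sum(w)
    s_n = sum(wi * n for wi, n in zip(w, lengths))
    s_nn = sum(wi * n * n for wi, n in zip(w, lengths))
    s_t = sum(wi * t for wi, t in zip(w, times))
    s_nt = sum(wi * n * t for wi, n, t in zip(w, lengths, times))
    # Solve the 2x2 normal equations for intercept t0 and slope 1/r_inf.
    det = s_w * s_nn - s_n * s_n
    t0 = (s_t * s_nn - s_n * s_nt) / det
    slope = (s_w * s_nt - s_n * s_t) / det
    return t0, 1.0 / slope
```

With weights of 1/t**2 the short messages, whose times are small, count as
much as the long ones, so the start-up term t0 is not swamped by the
largest transfers as it would be in an absolute-error fit.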
    
CG pointed out that it may not always be possible to interpret zero-length 
messages for these codes.  On the Cray machines, such messages force an 
immediate return (i.e., no synchronization).  He proposed that allowing zero-
length messages be removed for the COMMS benchmarks.  RS showed an actual
COMMS1 performance graph demonstrating the difficulty of data extrapolation
(if used to get latency for zero-length message-passing).  RS pointed out,
however, that zero-length messages are defined within MPI, and suggested that
a simple return (as in the case of Cray machines) is not standard.

VG displayed some of the observed COMMS1/2 performance obtained on the
Cray T3E.  The 3-parameter fit yielded a 7% relative error for messages
ranging from 8 to 1.E+7 bytes.  CG asked how the breakpoints were
determined.  He indicated the input parameters to the program required
previous knowledge of where breakpoints occur (although implementations
could change constantly).  TH suggested that the parametric fitting should
not be the default for these benchmarks, i.e., separate the analysis from
the actual benchmarking (this concept was seconded by CG).  RS suggested
that the fitting routines could be placed on the WWW/Internet and the
COMMS1/2 codes simply produce data.  CK, however, stressed that the codes
should maintain some minimal parametric fitting for clarity and
consistency of output interpretations.  

The minimal message length in the T3E results shown by VG was 8 and
the corresponding minimal message length for a Convex CXD set of
COMMS benchmarks was 1.  The lack of similar ranges of messages could
pose problems for comparisons.  JD strongly felt that users will return
to the notion of "latency" and want zero-length message overheads.  Users
may be primarily interested in start-up time for message-passing.  RS pointed
out that MPI does process zero-length messages.  JD suggested that
the minimal message length for the COMMS benchmarks be 8 bytes and RS proposed
that the minimal message-passing time and corresp. message length be
an output.  After more discussion, the following COMMS changes/outputs were 
unanimously agreed upon:

	1.  Maximum bandwidth with corresp. message size.
	2.  Minimum message-passing time with corresp. message size.
	3.  Time for minimum message length (could be 0, 1, 8, or 32 bytes
            but must be specified).
	4.  The software will be split into two programs: one to report
	    the spot measurements and the other for the analysis.
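
A minimal Python sketch of how outputs 1-3 could be derived from the spot
measurements alone, leaving any parametric fitting to a separate analysis
program.  The function name, the dictionary keys, and the (length, time)
input format are assumptions made for illustration; they do not describe
the actual COMMS1/2 output.

```python
def summarize_spot_measurements(measurements):
    """Summarise (message_length_bytes, time_seconds) pairs.

    Reports the maximum measured bandwidth, the minimum message-passing
    time, and the time for the shortest message, each together with the
    corresponding message length.
    """
    if not measurements:
        raise ValueError("no measurements supplied")
    if any(t <= 0 for _, t in measurements):
        raise ValueError("times must be positive")
    n_bw, t_bw = max(measurements, key=lambda m: m[0] / m[1])
    n_min, t_min = min(measurements, key=lambda m: m[1])
    n_short, t_short = min(measurements, key=lambda m: m[0])
    return {
        "max_bandwidth": n_bw / t_bw,        # bytes per second
        "max_bandwidth_length": n_bw,
        "min_time": t_min,
        "min_time_length": n_min,
        "shortest_length": n_short,          # 0, 1, 8, 32, ... as specified
        "shortest_time": t_short,
    }
```

Keeping the shortest length as an explicit output means a run starting at
8 bytes and a run starting at 0 bytes are clearly labelled as such rather
than both being reported as "latency".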


At 10:00 am, SPEC-HPG members joined the ParkBench meeting for a joint
session.  CK reviewed the DoD Modernization Program.  He indicated that
the program is based on 3 primary components:

	1. CHSSI (Common HPC Software Support Initiative)
	2. DREN (Defense Research & Engineering Network)
	3. Shared Resource Centers (4 Major Shared Resource Centers or
           MSRC's and 20 Distributed Centers or DC's)

Benchmarking is part of the mission of the MSRC's, especially for
system integration and the Programming Environment & Training (PET)
team.  CK mentioned that the resources available at the MSRC's include:

256-proc. Cray T3E, SGI Power Challenge (CEWES), 256 proc. IBM SP/2 and
SGI Origin 2000 at ASC, SGI 790 at NAVO, and a collection of {SGI Origin,
Cray Titan, J90} at the Army Research Lab.

The benchmarking needs of the DoD program can be categorized as either
contractual or training.  The contractual needs are specified as PL1
(evaluation of initial machines), PL2 (upgrade to gain 3 times the
performance of PL1), and PL3 (upgrade to gain 10 times the performance
of PL1).  CK mentioned that the MSRC's are planning for the PL2 phase
later this year with PL3 scheduled in approx. 3 years.
The training needs include: the evaluation of programming paradigms,
the evaluation of performance trade-offs, templates for designing new
codes, and benchmarks for training examples.

The contractual benchmarks comprise 30 benchmarks (22 programs) some
of which are export-controlled or proprietary (data may not be used
in the public domain in some cases).  The run rules specify the number
of iterations for each benchmark in the suite.  Each MSRC uses a different
number of iterations per benchmark.  Code modifications are allowed (parallel
directives and message-passing can be used but no assembler) and algorithm
substitutions are permitted provided the problem does not become specialized.
The only performance metric reported for these benchmarks is the elapsed
time for the entire suite.  Benchmarks can be upgraded to reflect current
workloads of the MSRCs but they must be compared head-to-head with 
previous systems.

Example codes included in the DoD benchmark suite include: CTH (finite
volume shock simulation), X3D (explicit finite element code), OCEAN-O2 (an
ocean modeling code), NIKE3D (implicit nonlinear 3D FEM), and Aggregate
I/O benchmark.

Planned benchmarking activities for the DoD Modernization Program include:

	1. benchmarks for evaluating programming techniques (determine what
           works; develop decision trees)
	2. benchmarks for teaching (classes on "worked" examples; template
           modification)

This effort currently has 1 FTE and over 50 University personnel (in PET
program) involved (although they are not primarily responsible for
benchmarking work).

At 10:35am, TH asked AH from Los Alamos Nat'l Lab to overview their ASCI
benchmark suite.  He began by pointing out that these codes form the
"Los Alamos set of" ASCI Benchmarks.  Before presenting the list of codes,
AH noted that the philosophy of this activity was to achieve 
"experiment ahead" capability especially with immature computing platforms.
Los Alamos is also interested in developing performance models as well as
kernels.  The list of active/research codes and compact applications 
comprising this suite are:

Code		Language(s)	Parallelism 	Description           
    
*HEAT(RAGE)	f77, f90	MPI(f90)	Eulerian adaptive mesh
				MPIfSM(f77)	refinement based on
						Riemann solvers; coupled
						physics-CFD; particle &
						radiative transport

EULER		f90		MPI		Admissible fluid (for SIMD);
				SIMD(SP		unstructured mesh, explicit
				vector)		solution; high-speed fluids;
						SP=single processor

NEUT		f77		MPI,SM,		Monte-Carlo, particle
				SHMEM

SWEEP3D		f90		MPI, SHMEM	Inner/outer iteration (kernel)
                                                (compact application)

HYDRO(T)	f77		Serial          (compact application)

TBON		f77		MPI		Material science; quantum
						mechanics; polymer age    
						simulation

*TECOLOTE	C++		MPI		Mixed call hydro. with regular
						structured grid

*TELURIDE	f90		MPI		Casting simulation; irregular
						structured grid; Krylov solution
						methods

*DANTE		HPF		MPI

* = export controlled

The codes and compact apps above vary in size from 2,000 to 35,000 lines.
AH noted that LANL could provide support for future ASCI-based ParkBench codes. 
The ASCI benchmark suite presented might in the future include tri-lab
(Livermore, Sandia, Los Alamos) contributions.  The ASCI application suite can
be set up with data sets leading to varying run-times.  AH mentioned that Los 
Alamos' ASCI benchmarking efforts are focused on high performance computing,
leading edge architectures, algorithms, and applications.  They are 
particularly concentrating on developing expertise in distributed shared-memory
performance evaluation and modeling.  AH expressed the hope that the efforts of
ParkBench will follow similar directions.

At 11:05am, SS reviewed some of the most recent NAS Parallel Benchmarks results.
He began with vendor-optimized CG Class B results using row and column 
distribution blocking.  Results for different numbers of processors of the T3D 
were reported along with results for the NEC SX-4, SGI Origin 2K, Convex SPP2K,
Fujitsu VPP700, and IBM P2SC.  He also showed results for FT Class B and BT 
Class B (all machines reported performed well on this benchmark).  For BT, it 
was pointed out that 4 of the machines (Cray T3E, DEC Alpha, IBM P2SC, and NEC 
SX-4) essentially are based on the same processor but achieve widely-varying
results.  SS also reported HPF Class A MG results on 16 processors of the IBM 
SP2.  The HPF version (APR-HPF/Portland Group compiled) was only 3 times slower
than the MPI-based (f77) implementation.  This is indeed a significant result 
given that two years ago the HPF version was as much as 10 times slower than 
the comparable MPI version.  An HPF version of the Class A FT benchmark on 64 
processors was shown to be faster than the MPI version (1.6 times faster) when
optimized libraries are used in both versions.  For the Class A SP benchmark
(on 64 processors of the SP/2), the APR- and PGI-compiled HPF versions were 
within a factor of 2 of the MPI versions.  Finally, the HPF Class A BT code on 
64 processors of the Cray T3D was within a factor of 0.5 of the MPI version.

At 11:35am, TH invited RE to overview current SPEC-HPG activities.  The SPEC-HPG
benchmarks define a suite of real-world high-performance computing applications
designed for comparisons across different platforms (serial and message-
passing).  RE pointed out the history of the SPEC-HPG effort as a merger between
the PERFECT and SPEC benchmarking activities.  The current SPEC-HPG suite is
comprised of 2 codes: SPECchem96 and SPECseis96.  The SPECchem96 code evolved
from the GAMESS code used in pharmaceutical and chemical industries.  It
comprises 109,389 lines of f77 (21% comments), 865 subroutines and
functions.  The wave functions are written to disk.  The SPECseis96 code
is derived from the ARCO benchmark suite which consists of four phases: data
generation, stack data, time migration, and depth migration.  This code
decomposes the domain into n equal parts (for n processors) with each part
processed independently.  It has over 15K lines of code made up of
230 Fortran subroutines and 199 C functions for I/O and systems utilities.
SPECseis96 uses 32-bit precision, FFT's, Kirchhoff integrals, and finite
differences.

The very first set of SPEC-HPG benchmark results was approved on May 8,
1997 (the preceding day).  New benchmarks being considered are PMD (Parallel
Molecular Dynamics) and MM5 (NCAR Weather Processing C code).  The decision
on whether or not to accept these two potential SPEC-HPG codes will be made
in about 5 months.  The SPEC-HPG run rules permit the use of compiler
switches, source code changes, and optimized libraries (which have been
disclosed to customers).  Only approved algorithmic changes will be disclosed.
RE gave the URL for the SPEC-HPG effort: http://www.specbench.org/hpg.  He
also referred to a recent article by himself and S. Hassanzadeh in "IEEE
Computational Science & Engineering" and two email reflectors for SPEC-HPG
communication: comments@specbench.org and info@specbench.org.

JD then gave a brief history of ParkBench and SPEC-HPG interactions and
suggested that the two efforts might consider sharing results and software.
The biggest difference in the two efforts is in the availability of
software as ParkBench code is freely available and SPEC-HPG software
has some restrictions.  A forum to publish both sets of results was discussed
and it was agreed that both efforts should at least share links on their
respective webpages.  RE pointed out that anyone can get the SPEC-HPG CD
of benchmarks without actually being a SPEC member.

JD stressed that the process of running codes (for any suite) needs to
be simplified so that building executables for different platforms is not
problematic.  Modifications for porting should be restricted to driver programs.
RS indicated that he has Perl scripts that run all the low_level benchmarks,
including COMMS3 for 2 to N procs, and produce a summary of the results.

*** ACTION ITEM ***
JD, RE, AH, and CK will discuss a potential joint effort to simplify the
running of benchmark codes (contact RS also about his Perl scripts).

MBa noted that the SPEC-HPG members should be added to the ParkBench
email list (parkbench-comm@cs.utk.edu).  He also indicated that the European
benchmarking workshop scheduled for next Fall might coordinate with the
European SPEC group (scheduled for Sept. 11-12).

At 12:10pm, the attendees went to lunch (Soup Kitchen).

After lunch (1:30pm), TH asked ES and VG to coordinate changes to the
COMMS benchmarks discussed above (*** ACTION ITEM ***).  ES then discussed
modifications to poly2 for the ParkBench V2.2 suite.  The proposed changes
include
	1. enlarged arrays A(1000000), B(1000000)
	2. removal of arrays C and D
	3. avoid cache flush (use a sliding vector), i.e., 

             DO I=1,N               DO I=NMIN,NMAX
                         becomes       ...

                                       NMIN=NMIN+N+INC

           where INC=17 by default (avoids reuse of the old cache line).
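
The sliding-vector idea in item 3 can be sketched in Python as a generator
of the (NMIN, NMAX) loop bounds used on successive repetitions.  Only
NMIN=NMIN+N+INC and INC=17 come from the proposal above; the wrap back to
the start of the array when a window no longer fits, the function name,
and the `repeats` argument are assumptions made for illustration.

```python
def sliding_windows(n, array_len=1_000_000, inc=17, repeats=5):
    """Yield 1-based (NMIN, NMAX) bounds for successive timing loops.

    Each repetition works on a fresh stretch of the array, offset from
    the previous one by n + inc elements, so data left in cache by the
    previous repetition is not reused and no explicit flush is needed.
    """
    if n < 1 or n > array_len:
        raise ValueError("vector length must be between 1 and array_len")
    nmin = 1
    for _ in range(repeats):
        if nmin + n - 1 > array_len:
            nmin = 1  # assumed behaviour: wrap when the window no longer fits
        nmax = nmin + n - 1
        yield nmin, nmax
        nmin = nmin + n + inc
```

For example, with N=100 the first three windows are 1-100, 118-217, and
235-334: the 17-element gap separates each window from the last elements
touched by the previous one.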

PM then discussed a program for determining parameters for memory subsystems.
Characteristics of this software include the use of tight loops, independent
memory references, and maximized register use.  He showed graphs of memory
hierarchy bandwidth (reads and writes) depicting memory size (ranging from 4Kb
to 4Mb) versus Mb/sec transfer rates.  Some curves illustrated the effective 
cache size quite well.  PM pointed out that dynamically-scheduled processors
pose a significant problem for this type of modeling.  The program can be
run with or without a calibration loop exploiting known memory transfer data.
CG suggested that it would be nice to have such a program to measure latency
at all levels of the hierarchy.  PM's webpages for this program are:

	http://www.cs.utk.edu/~mucci/cachebench and
	http://www.cs.utk.edu/~mucci/parkbench.

CK suggested that an uncalibrated version of PM's benchmark would be more
useful to users (more reflective of real codes).  JD pointed out that the
output of the program could be tabulated bandwidths, latencies, etc.  CG
felt this program would be a very useful tool.  PM noted that the calibration
will not be used by default.  TH suggested that the ParkBench effort might
want to develop a future "ParkBench Tool Set" which contains programs like
this one developed by PM.
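
To make the shape of such a measurement concrete, here is a rough Python
sketch of an uncalibrated sweep over working-set sizes.  It is not PM's
program: it assumes the quoted range means 4 KB to 4 MB, it times a plain
buffer copy (one read plus one write per byte), and interpreter overhead
will blur the cache transitions that a tight compiled loop would show.
All names are hypothetical.

```python
import time


def sweep_sizes(lo=4 * 1024, hi=4 * 1024 * 1024):
    """Working-set sizes in bytes, doubling from lo up to hi inclusive."""
    sizes = []
    size = lo
    while size <= hi:
        sizes.append(size)
        size *= 2
    return sizes


def copy_bandwidth(size, min_seconds=0.01):
    """Return an approximate copy rate in MB/s for a buffer of `size` bytes."""
    buf = bytearray(size)
    reps = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        _ = bytes(buf)  # copies the whole buffer
        reps += 1
        elapsed = time.perf_counter() - start
    return size * reps / elapsed / 1e6


def sweep(min_seconds=0.01):
    """Tabulate (size_bytes, MB/s) pairs, the kind of output JD describes."""
    return [(s, copy_bandwidth(s, min_seconds)) for s in sweep_sizes()]
```

Plotting the second column against the first gives a curve of the same
kind as the graphs described above, with drops expected near each
effective cache size.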

With regard to the Linalg Kernels, ES noted that although many of the
routines have calls to Scalapack routines, Scalapack will not be included
in future software releases.  Users will have to get their own copies of
the source (or binaries) for Scalapack.  The size of these particular
kernel benchmarks drops by a factor of one-third by removing Scalapack.

*** ACTION ITEM ***
ES will report the most recent Linalg benchmark performance results at the
next ParkBench meeting.

TH then asked for discussions on new benchmarks with MBa leading the
discussion on HPF benchmarks.  MBa indicated that a new mail reflector
(parkbench-hpf@cs.utk.edu) had been set up for this cause with himself
as moderator for low-level codes (CK will moderate kernels and SS will
moderate discussions on HPF compact applications).  MBa noted that there
is limited manpower for the HPF benchmarking activities.  CK noted that
he had discussed this effort at the recent HPFF meeting (and other
users meetings).  A draft document on the ParkBench HPF benchmarks is
available at http://www.sis.port.ac.uk/~mab/ParkBench.  MBa felt strongly
that without manpower support this particular activity will die and that
a lead site is needed.

*** ACTION ITEM ***
CK and SS will investigate interest in HPF compact application development.

JD indicated that wrappers are being used to create HPF versions of the
Linalg kernels.  The procedure involves writing wrappers for the current
Scalapack driver programs.  Eventually, these programs may be completely
rewritten in HPF (this will start in the summer).  TH suggested that HPF
kernel benchmark performance be reported at the ParkBench meeting 
in September (at Southampton Performance Workshop).

MBa went on to report on the status of I/O benchmarks.  Basically, not
much progress has been made on the ParkBench I/O initiative.  A new I/O 
project between ECMWF, FECIT, and the Univ. of Southampton was launched
this past February.  They are looking at the I/O  in the IFS code from
the ECMWF (European Weather Forecasting).  David Snelling is the FECIT
leader who has also participated in ParkBench activities.  This I/O
project has 1 FTE at Southampton and 1.5 FTE at FECIT along with several
personnel at ECMWF.  One workshop and two technical meetings are planned for
the 1-year project.  The goals are: to develop instrumented I/O
benchmarks and build on top of MPI-IO (test, characterize parallel
systems).  Their methodology is very similar to that of ParkBench.
Codes in f90 and ANSI C are being considered (stubs for VAMPIR and
PABLO).  Regular reports to Fujitsu (sponsor of activity) are planned
and a full I/O test suite is planned by February 1998.

MBa also reported on the status of the ParkBench graphical database.
Currently, the performance data is kept in a relational DBMS.  A
frontend Java applet has been written to query the DBMS on-the-fly.
A backend is also in development which will automate the extraction
of new performance data and insertion into the DBMS (via an http
server).  By September, a more complete prototype which will allow
MS access and JDBC between 2 different machines should be ready.

VG then discussed the development of Java-based low-level benchmarks.
He presented a Java-to-C Interface Generator which would allow Java
benchmarks to call existing C libraries on remote machines.  He
presented sample Java+C NAS PB results on a 16-processor IBM SP/2
(Class A IS Benchmark):

           Version           1 Proc   2 Procs   4 Procs   8 Procs  16 Procs
           NASA (C)            29.1      17.4       9.4       5.2       2.8
           C                   40.5      24.9      13.1       9.3      15.6
           Java                ----     132.5      64.7      37.9      33.5
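
The relative cost of each version can be read off the table with a few
lines of Python.  The minutes do not give the units, so the sketch assumes
the entries are run times (smaller is better); the helper name is
hypothetical and the numbers are copied from the table above.

```python
PROCS = [1, 2, 4, 8, 16]
RESULTS = {
    "NASA (C)": [29.1, 17.4, 9.4, 5.2, 2.8],
    "C": [40.5, 24.9, 13.1, 9.3, 15.6],
    "Java": [None, 132.5, 64.7, 37.9, 33.5],  # no 1-processor entry reported
}


def ratio_to(reference, version):
    """Per-processor-count ratio version/reference; None where data is missing."""
    return [
        None if r is None or v is None else v / r
        for r, v in zip(RESULTS[reference], RESULTS[version])
    ]


if __name__ == "__main__":
    for p, r in zip(PROCS, ratio_to("C", "Java")):
        print(p, "procs:", "n/a" if r is None else f"{r:.2f}x")
```

On these figures the Java version is roughly 5 times slower than the C
version on 2 processors and roughly 2 times slower on 16, although the
16-processor C figure is itself slower than the 8-processor one.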

At 2:50pm, TH reported other ParkBench activities including the
new PEMCS (Performance Evaluation and Modeling for Computer Systems)
electronic journal.  Suggested articles/authors include:

       *1. ParkBench Report No. 2 (ES, MBe)
       *2. NAS PB
	3. SPEC-HPG
       *4. Top 500
	5. AutoBench (M. Ginsburg)
       *6. Euroben (van der Steen)
	7. RAPS
	8. Europort
       *9. Cache benchmarks
       10. ASCI benchmarks (DoD)
      *11. PERFORM
       12. R. Hockney
      *13. PEPS
       14. C3I/Rome Labs

Those articles possible for Summer '97 are marked via *.  JD suggested
that articles be available in Encapsulated Postscript, PDF (Adobe),
and HTML.  TH noted that EU funding will provide a host computer and
some administration.  Possible publishers are Oxford Univ. Press and
Elsevier.

At 3:10pm, ES requested more items for the ParkBench bibliography
which will be available on the WWW.  PW suggested that authors should
be able to submit links to ParkBench-related applications.  JD then
briefly discussed WebBench which is a website focused on benchmarking
and performance evaluation.  Data is presented on platforms, applications,
organizations, vendors, conferences, papers, newsgroups, FAQ's, and
repositories (PDS, Top500, Linpack, etc.).  The WebBench URL is
http://www.netlib.org/benchweb.

MBa reminded attendees of the Fall Performance Workshop/ParkBench
meeting on (Thursday and Friday) Sept. 11 and 12.  This meeting
will be held at the County Hotel, Southampton, UK.  Invited
and contributed talks will be presented.

With regard to ParkBench funding, JD indicated that the UT/ORNL/NASA
Ames proposal was not selected for funding but that it could be re-
submitted next year.  Expected funding from Rome lab was not received.
TH and VG did not succeed this past year either although some funding
from Fujitsu is possible.

TH adjourned the meeting at 3:25pm EST.

From owner-parkbench-comm@CS.UTK.EDU Tue May 27 10:32:45 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id KAA25239; Tue, 27 May 1997 10:32:45 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id KAA05022; Tue, 27 May 1997 10:12:02 -0400
Received: from exu.inf.puc-rio.br (exu.inf.puc-rio.br [139.82.16.3]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id KAA05013; Tue, 27 May 1997 10:11:53 -0400
Received: from obaluae (obaluae.inf.puc-rio.br) by exu.inf.puc-rio.br (4.1/SMI-4.1)
	id AA20170; Tue, 27 May 97 11:11:00 EST
From: maira@inf.puc-rio.br (Maira Tres Medina)
Received: by obaluae (SMI-8.6/client-1.3)
	id LAA16226; Tue, 27 May 1997 11:10:58 -0300
Date: Tue, 27 May 1997 11:10:58 -0300
Message-Id: <199705271410.LAA16226@obaluae>
To: parkbench-comments@CS.UTK.EDU
Subject: Benchmarks
Cc: parkbench-comm@CS.UTK.EDU, maira@CS.UTK.EDU, victal@CS.UTK.EDU
X-Sun-Charset: US-ASCII

Hello 

I'm a graduate student at the Computer Science Department of PUC-Rio
(Catholic University of Rio de Janeiro). I'm currently studying
Low_Level benchmarks for measuring basic computer characteristics.

I have had some problems trying to run some of the benchmarks.
For example, the benchmark comms1 for PVM prints the following error messages
and stops.
 
    n05.sp1.lncc.br:/u/renata/maira/ParkBench/bin/RS6K>comms1_pvm
      Number of nodes =          2
      Front End System (1=yes, 0=no) =          0
      Spawning done by process (1=yes, 0=no) =          1
      Spawned           0  processes OK...
      libpvm [t4000c]: pvm_mcast(): Bad parameter
      TIDs sent...benchmark progressing...
 
 
   n05.sp1.lncc.br:/u/renata/maira/ParkBench> bin/RS6K/comms1_pvm 
     1525-006 The OPEN request cannot be processed because STATUS=OLD was coded 
     in the OPEN statement but the file comms1.dat does not exist. The program 
     will continue if ERR= or IOSTAT= has been coded in the OPEN statement.
     1525-099 Program is stopping because errors have occurred in an I/O request 
     and ERR= or IOSTAT= was not coded in the I/O statement.
 
 
I would like to know how I can execute the benchmarks only for  PVM.
Can you help me?
 
I have not had problems with the sequential benchmarks (tick1, tick2 ...).
 
Thank you very much for your attention.
 
Maira Tres Medina
Phd. Student
Pontificial Catholic University
Rio de Janeiro, Brazil
 

From owner-parkbench-comm@CS.UTK.EDU Wed May 28 16:36:07 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id QAA15377; Wed, 28 May 1997 16:36:06 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id QAA16158; Wed, 28 May 1997 16:26:41 -0400
Received: from rastaman.rmt.utk.edu (root@TCHM03A16.RMT.UTK.EDU [128.169.27.60]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id QAA16150; Wed, 28 May 1997 16:26:37 -0400
Received: from rastaman.rmt.utk.edu (localhost [127.0.0.1]) by rastaman.rmt.utk.edu (8.7.6/8.7.3) with SMTP id QAA00226; Wed, 28 May 1997 16:33:33 -0400
Sender: mucci@CS.UTK.EDU
Message-ID: <338C968B.124F15AA@cs.utk.edu>
Date: Wed, 28 May 1997 16:33:33 -0400
From: "Philip J. Mucci" <mucci@CS.UTK.EDU>
Organization: University of Tennessee, Knoxville
X-Mailer: Mozilla 3.01 (X11; I; Linux 2.0.28 i586)
MIME-Version: 1.0
To: Maira Tres Medina <maira@inf.puc-rio.br>
CC: parkbench-comments@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU
Subject: Re: Benchmarks
References: <199705271410.LAA16226@obaluae>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi,

You need to make sure the dat files are in the executable directory.
They should be installed in $PVM_ROOT/bin/$PVM_ARCH.

-Phil

-- 
/%*\ Philip J. Mucci | GRA in CS under Dr. JJ Dongarra /*%\
\*%/ http://www.cs.utk.edu/~mucci  PVM/Active Messages \%*/

From owner-parkbench-comm@CS.UTK.EDU Thu Jun  5 11:30:41 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id LAA11302; Thu, 5 Jun 1997 11:30:41 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id KAA14227; Thu, 5 Jun 1997 10:53:09 -0400
Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id KAA14220; Thu, 5 Jun 1997 10:53:07 -0400
Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id KAA06499; Thu, 5 Jun 1997 10:53:06 -0400 (EDT)
Date: Thu, 5 Jun 1997 10:53:06 -0400 (EDT)
From: Pat Worley <worley@haven.EPM.ORNL.GOV>
Message-Id: <199706051453.KAA06499@haven.EPM.ORNL.GOV>
To: parkbench-comm@CS.UTK.EDU
Subject: Gordon conference deadline extended
Forwarding: Mail from 'Pat Worley <worley>'
     dated: Thu, 5 Jun 1997 10:48:07 -0400 (EDT)
Cc: worley@haven.EPM.ORNL.GOV, tony@cs.msstate.edu

(Our apologies if you receive this multiple times.)

There is still room for additional attendees at the Gordon Conference on High
Performance Computing, and the Gordon Research Conference administration has
agreed to extend the application deadline. As a practical matter,
applications need to be submitted no later than JULY 1. We will also stop
accepting applications before that date if the maximum meeting size is
reached, so please apply as soon as possible if you are interested in
attending.   

The simplest way to apply is to download the application form from the web
site 

http://www.erc.msstate.edu/conferences/gordon97

or to use the online registration option available at the same site.
If you have any problems with either of these, please contact the organizers
at tony@cs.msstate.edu and worleyph@ornl.gov. 

Complete information on the meeting is available from the Web site or its
links, but a short summary of the meeting follows:

--------------------------------------------------------------------------

The 1997 Gordon Conference on High Performance Computing and
Information Infrastructure: "Practical Revolutions in HPC and NII"

Chair, Anthony Skjellum, Mississippi State University, tony@cs.msstate.edu,
       601-325-8435
Co-Chair, Pat Worley, Oak Ridge National Laboratory, worleyph@ornl.gov,
       615-574-3128

Conference web page: http://www.erc.msstate.edu/conferences/gordon97

July 13-17, 1997
Plymouth State College
Plymouth NH

The now bi-annual Gordon conference series in HPC and NII commenced in 1992
and had its second meeting in 1995.  The Gordon conferences are an
elite series of conferences designed to advance the state-of-the-art in
covered disciplines. Speakers are assured of anonymity and
referencing presentations done at Gordon conferences is prohibited by
conference rules in order to promote science, rather than publication
lists.  Previous meetings have had good international participation,
and this is always encouraged. Experts, novices, and technically
interested parties from other fields interested in HPC and NII are
encouraged to apply to attend.

The conference consists of technical sessions in the morning and evening,
with afternoons free for discussion and recreation. Each session consists of
2 or 3 one hour talks, with ample time for questions and discussion. All
speakers are invited and there are no parallel sessions. All attendees are
both encouraged and expected to actively participate, via discussions during
the technical sessions or via poster presentations. 

All attendees, including speakers, poster presenters, and session chairs,
must apply to attend. Poster presenters should indicate their poster
proposals on their applications. While all posters must be approved,
successful applicants should assume that their posters have been accepted
unless they hear otherwise. 

Meeting Themes:
  Networks: Emerging capabilities and the practical implications
          : New types of networking  
  Real-Time Issues
  Multilevel Multicomputers
  Processors-in-Memory and Other Fine Grain Computational Architectures
  Impact of Evolving Hardware on Applications
  Impact of Software Abstractions on Performance

Confirmed Speakers:
  Ashok K. Agrawala		University of Maryland
  Kirstie Bellman		DARPA/SISTO
  James C. Browne		University of Texas at Austin
  Andrew Chien			University of Illinois, Urbana-Champaign
  Thomas H. Cormen		Dartmouth College
  Jean-Dominique Decotignie	CSEM
  David Greenberg		Sandia National Laboratories
  William Gropp			Argonne National Laboratory
  Don Heller			Ames Laboratory
  Jeff Koller			Information Sciences Institute
  Peter Kogge			University of Notre Dame
  Chris Landauer		The Aerospace Corporation
  Olaf M. Lubeck		Los Alamos National Laboratory
  Andrew Lumsdaine		University of Notre Dame
  Lenore Mullins		SUNY, Albany
  Paul Plassmann		Argonne National Laboratory
  Lui Sha			Carnegie Mellon University
  Paul Woodward			University of Minnesota


From owner-parkbench-comm@CS.UTK.EDU Tue Jul  1 17:06:52 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id RAA20550; Tue, 1 Jul 1997 17:06:51 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id RAA21503; Tue, 1 Jul 1997 17:03:35 -0400
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id RAA21438; Tue, 1 Jul 1997 17:02:42 -0400
Received: from baker (baker.npac.syr.edu) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA10168; Tue, 1 Jul 97 22:00:22 BST
Date: Tue,  1 Jul 97 20:55:49    
From: Mark Baker <mab@sis.port.ac.uk>
Subject: Fall 97 Parkbench Workshop - Southampton, UK
To: ejz@ecs.soton.ac.uk, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU,
        William Gropp <gropp@mcs.anl.gov>,
        Antoine Hyaric <Antoine.Hyaric@comlab.ox.ac.uk>, gent@genias.de,
        gcf@npac.syr.edu, geerd.hoffman@ecmwf.co.uk, reed@cs.uiuc.edu,
        david@cs.cf.ac.uk, clemens-august.thole@gmd.de, klaus.stueben@gmd.de,
        "J.C.T. Pool" <jpool@cacr.caltech.edu>,
        Paul Messina <messina@cacr.caltech.edu>, foster@mcs.anl.gov,
        idh@soton.ac.uk, rjc@soton.ac.uk, plg@pac.soton.ac.uk,
        Graham.Nudd@dcs.warwick.ac.uk
Cc: lec@ecs.soton.ac.uk, rjr@ecs.soton.ac.uk,
        "MATRAVERS Prof. D R STAF" <DRM12@sms.port.ac.uk>,
        wilsona@sis.port.ac.uk, grant <grant@afs.mcc.ac.uk>,
        hwyau@epcc.ed.ac.uk
X-Priority: 3 (Normal)
X-Mailer: Chameleon 5.0.1, TCP/IP for Windows, NetManage Inc.
Message-Id: <Chameleon.867790738.mab@baker>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

This is to let you know that the Department of Electronics and Computer
Science at the University of Southampton is organising a Fall 97 
Parkbench Workshop on the 11th and 12th of September 1997.
See http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/ for further 
details.

The workshop will include a number of talks from researchers working in
the field of performance evaluation and modelling of computer systems, a panel
discussion session and a Parkbench committee meeting.

The Workshop is free to attend - workshop delegates need only cover their
own travel and accommodation expenses. Attendance is limited, so
places at the Workshop will be allocated on a first-come, first-served basis.

It is planned to turn the talks given at the Workshop into a series of 
short papers which will be put together and published as a Special Issue 
of the electronic journal Performance Evaluation and Modelling of 
Computer Systems (PEMCS).

For further information or registration details refer to the Web pages -
(http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/registration.html).

I would appreciate it if you would kindly pass this email on to colleagues who
may be interested in the event.

Regards

Mark


-------------------------------------
Dr Mark Baker
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 7/1/97 - Time: 8:55:49 PM
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Wed Jul 23 17:19:23 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id RAA04434; Wed, 23 Jul 1997 17:19:23 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id RAA28191; Wed, 23 Jul 1997 17:10:39 -0400 (EDT)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id RAA28171; Wed, 23 Jul 1997 17:10:24 -0400 (EDT)
Received: from baker (baker.npac.syr.edu) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA14190; Wed, 23 Jul 97 22:10:30 BST
Date: Wed, 23 Jul 97 22:01:41 +0000
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: PEMCS Web Site
To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Face: "3@c]&iv:nfs&\mp6n<RAc6^[LSl&b"vx:G#zkJus3[uV=a.|~/c]T(LKr/FQ'iWPiMF'4x
 n2{)H=1~y.#7>N90ioxbQ-Eu:]}^MyviIL7YjwT,Cl)|TYpTQ})PP'&O=V`~)JQRWjM?H;'`q\"3mv
 "j@5vs)}!WC3pG9q:;rpe0\LoLQfY"1?1A.\(f=E*&QAW8WK+)*)T0[Bv=[{.-f7<6Ddv!2XaWhH
X-Priority: 3 (Normal)
Message-Id: <Chameleon.869692062.mab@baker>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

The Web site that will host the Journal of "Performance 
Evaluation and Modelling of Computer Systems (PEMCS)" can
be found at:

http://hpc-journals.ecs.soton.ac.uk/PEMCS/

The pages I have put up are at the present still in a 
"draft/under-construction" state.

I would appreciate any comments or feedback about the
pages.

Regards

Mark



-------------------------------------
Dr Mark Baker
DIS, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 07/23/97 - Time: 22:01:41
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Thu Jul 24 08:26:42 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id IAA12708; Thu, 24 Jul 1997 08:26:42 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id IAA04617; Thu, 24 Jul 1997 08:21:55 -0400 (EDT)
Received: from berry.cs.utk.edu (BERRY.CS.UTK.EDU [128.169.94.70]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id IAA04599; Thu, 24 Jul 1997 08:21:23 -0400 (EDT)
Received: from cs.utk.edu by berry.cs.utk.edu with ESMTP (cf v2.11c-UTK)
          id IAA13817; Thu, 24 Jul 1997 08:21:24 -0400
Message-Id: <199707241221.IAA13817@berry.cs.utk.edu>
To: Mark Baker <mab@sis.port.ac.uk>
cc: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU
Subject: Re: PEMCS Web Site 
In-reply-to: Your message of Wed, 23 Jul 1997 22:01:41 -0000.
             <Chameleon.869692062.mab@baker> 
Date: Thu, 24 Jul 1997 08:21:24 -0400
From: "Michael W. Berry" <berry@CS.UTK.EDU>



> Dear All,
> 
> The Web site that will host the Journal of "Performance 
> Evaluation and Modelling of Computer Systems (PEMCS)" can
> be found at:
> 
> http://hpc-journals.ecs.soton.ac.uk/PEMCS/
> 
> The pages I have put up are at the present still in a 
> "draft/under-construction" state.
> 
> I would appreciate any comments or feedback about the
> pages.
> 
> Regards
> 
> Mark
> 
> 
> 
> -------------------------------------
> Dr Mark Baker
> DIS, University of Portsmouth, Hants, UK
> Tel: +44 1705 844285	Fax: +44 1705 844006
> E-mail: mab@sis.port.ac.uk
> Date: 07/23/97 - Time: 22:01:41
> URL http://www.sis.port.ac.uk/~mab/
> -------------------------------------
> 

Mark,
the webpages are well organized.  You might reconsider the
red text on the green background of the menu frame.  It was
difficult to read on my machine at home.

Nice work!
Mike

-------------------------------------------------------------------
Michael W. Berry                     Ayres Hall 114
berry@cs.utk.edu                     Department of Computer Science          
OFF:(423) 974-3838                   University of Tennessee
FAX:(423) 974-4404                   Knoxville, TN  37996-1301
URL:http://www.cs.utk.edu/~berry/
-------------------------------------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Fri Aug  1 12:59:29 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id MAA05831; Fri, 1 Aug 1997 12:59:27 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id MAA01387; Fri, 1 Aug 1997 12:38:00 -0400 (EDT)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id MAA01337; Fri, 1 Aug 1997 12:37:24 -0400 (EDT)
Received: from baker (baker.npac.syr.edu) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA15842; Fri, 1 Aug 97 17:36:11 BST
Date: Fri,  1 Aug 97 17:17:51 +0000
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: Reminder - Fall Parkbench Workshop
To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Face: ,<'y31|nlb,jCP5?km9\KD+>p9/e?:|$RRhY]e;#`awGHh=mrY.T??#]-*rt}l0*u`k2A7n
 KlqNG"u'-%cS@3|G[%=m%bSB[lfSn5n"gD4CU(j?1y?#SOkm!qw_=p%c#"6g&(+\Oy6T{4CEShal?z
 M)&Gd'Pb6Qc~>SPx{m[F55=]yY>cN>|/m5)T?q`OTjdQL=7-n%NT({;;$P*2[#7ZWL8baLoI_/F89,
 x'u`*$'<|ctKNYTSJuLV=!$QT3bN*>91V,a0Cc"_UsxwMKg\;#W2LZ$!`j?ZWp;byz~;y}2Dz6i7y%
 E&;gfnmI_~}+oifmWXJMHfWeezBL1("ZnFe!rnX[Q|,:IJ?iq+PePa/[3R4
X-Priority: 3 (Normal)
Message-Id: <Chameleon.870453138.mab@baker>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

This email is a reminder about the:

----------------------------------------------------------------------------------------------------

				Fall ParkBench Workshop

                           Thursday 11th/Friday 12th September 1997 

                               at the University of Southampton, UK


		See http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/


----------------------------------------------------------------------------------------------------

If you are interested in attending the Workshop you should register now and 
reserve accommodation as hotel rooms in Southampton during the workshop period
will be in short supply due to the "International Southampton Boat Show" which 
will also be taking place.

At present we have a preliminary reservation on rooms at the County Hotel where
the Workshop is being held. Without concrete delegate reservations we can only
hold onto these rooms for approximately another week.

Thereafter, accommodation at the Hotel, or around the city, may be harder
to find and reserve. So, I encourage potential Workshop delegates to 
register ASAP.

Mark


-------------------------------------
Dr Mark Baker
University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 08/01/97 - Time: 17:17:52
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Mon Aug 11 13:13:12 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id NAA20171; Mon, 11 Aug 1997 13:13:11 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id NAA06842; Mon, 11 Aug 1997 13:02:59 -0400 (EDT)
Received: from MIT.EDU (SOUTH-STATION-ANNEX.MIT.EDU [18.72.1.2]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id NAA06808; Mon, 11 Aug 1997 13:02:42 -0400 (EDT)
Received: from MIT.MIT.EDU by MIT.EDU with SMTP
	id AA27349; Mon, 11 Aug 97 13:02:14 EDT
Received: from HOCKEY.MIT.EDU by MIT.MIT.EDU (5.61/4.7) id AA01161; Mon, 11 Aug 97 13:02:12 EDT
Message-Id: <9708111702.AA01161@MIT.MIT.EDU>
X-Sender: mmccarth@po9.mit.edu
X-Mailer: Windows Eudora Pro Version 2.1.2
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 11 Aug 1997 13:02:12 -0400
To: alison.wall@rl.ac.uk, weber@scripps.edu, schauser@cs.ucsb.edu,
        dewombl@sandia.gov, edgorha@sandia.gov, rdskocy@sandia.gov,
        sales@pgroup.com, utpds@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU,
        pancake@cs.orst.edu, johnreed@ghost.CS.ORST.EDU, levesque@apri.com,
        davida@cit.gu.edu.au, gddt@gup.uni-linz.ac.at,
        atempt@gup.uni-linz.ac.at, rileyba@ornl.gov, bac@ccs.ornl.gov
From: "Michael F. McCarthy" <mmccarth@MIT.EDU>
Subject: For Sale: CM-5


   PLEASE FORWARD THIS NOTE TO ANYONE THAT YOU BELIEVE 
   MAY HAVE AN INTEREST IN PURCHASING THIS SYSTEM!
__________________________________________________________________________

Case #3971 -- FOR SALE - CM5 with 128 nodes and SDA --
__________________________________________________________________________

The MIT Lab for Computer Science offers for bid sale a Thinking Machines 
CM-5 Connection Machine (described below).  

Bids to purchase this system are requested from all interested parties,
(with a minimum expected Bid of $25,000).

All bids must be received at the MIT property office by 5:00 PM (EDT)
on Monday, 8/Sept/97.

The machine must be moved from MIT within 10 business days of  acceptance
of the bid. All expenses and arrangements for moving will be made by 
purchaser.

The system consists of:

1) 128 PN CM-5 w/ Vector Units, 256 Network addresses-Part 
          No.CM5-128V-32F
2) Scalable Disk Array with Twenty-four(24) 
          1.2 GB Drives-Part No.CM5-SA25F
3) Control Processor Interface-Part No. CM5-CPI
4) S-Bus to Diagnostics Network Interface-Part No. CM5-SDN
5) S-Bus Network Interface Board(5)-Part No. CM5-SNI

[N.B. On July 16 1997 power was turned off. The machine can be 
turned back on in its present location only until Friday, 22/AUG/97 
when wiring changes are planned in that machine room.]
 
"The Institute reserves the right to reject any or all offers. MIT makes no
warranty of any kind, express or implied, with respect to this equipment.
This includes fitness for a particular purpose. It is the responsibility of 
those making an offer to determine, before making an offer, that the
equipment meets any conditions required by those making that offer. Thank you."
__________________________________________________________________________

Submit bids for Case #3971  
                before Monday, 8/Sept/97, 5:00 PM (EDT) to: 
*****************************************************************
* Michael F. McCarthy       * Phone:  (617)253-2779             *
* MIT Property Office       * FAX:    (617)253-2444             *
* E19-429                   * E-Mail: mmccarth@MIT.EDU          * 
* 77 Massachusetts Ave.     *                                   *
* Cambridge, MA 02139       *                                   *
*****************************************************************
__________________________________________________________________________

SYSTEM HISTORY 

The Project SCOUT CM-5 is housed in M.I.T's Laboratory for Computer 
Science (L.C.S). The machine was acquired in 1993 as part of the
ARPA-sponsored project SCOUT, and used to accomplish the stated aim of the 
project of "fermenting collaborations between users, builders and
networkers of massively parallel computers". The CM-5 computer, developed
and manufactured by Thinking Machines Corporation, evolved from earlier
T.M.C. computers (the CM-2 and the CM-200) with an architecture targeted 
toward teraflops performance for large, complex, data-intensive applications.

The MIT hardware consists of a total of 128 32MHz SPARC microprocessors,
each with 4 proprietary floating point arithmetic units and 32 MB of local
memory attached to it. The system also includes a subsidiary 25 GB parallel 
file system for handling high volume parallel application I/O. 
 
The system was operated under full maintenance contract 
from May of 1993 until March 20 1997.

On July 16 1997 power was turned off. The machine can be turned back on
in its present location only until Friday, 22/AUG/97 when wiring changes 
are planned in that machine room.

The system was used primarily for research but a description of an 
instructional use made of the machine can be found at
     http://www-erl.mit.edu/eaps/seminar/iap95/cnh/CM5Intro.html

Web sites about other CM5 sites and general information include:
     http://www.math.uic.edu/~hanson/cmg.html
     http://www.acl.lanl.gov/UserInfo/cm5admin.html
     http://ec.msc.edu/CM5/

__________________________________________________________________________
FUTURE MAINTENANCE

People submitting bids may wish to discuss future maintenance issues
with a company that is a present maintainer of CM5 Equipment, 
Connection Machine Services. 
*****************************************************************
* Larry Stewart                * Phone:  (505) 820-1470         *
*                              * Cell:   (505) 690-7799         *
* Account Executive            * FAX:    (505) 820-0810         *
* Connection Machines Services * Home:   (505) 983-9670         *
* 1373 Camino Sin Salida       * Pager   (888) 712-4143         *
* Santa Fe, NM 87501           * E-Mail: stewart@ix.netcom.com  *
*****************************************************************
__________________________________________________________________________








Michael F. McCarthy
MIT Property Office
E19-429
77 Massachusetts Ave.
Cambridge, MA 02139
Ph   (617)253-2779
Fax  (617)253-2444


From owner-parkbench-comm@CS.UTK.EDU Mon Sep  1 05:44:50 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id FAA11838; Mon, 1 Sep 1997 05:44:50 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA07176; Mon, 1 Sep 1997 05:35:14 -0400 (EDT)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA07160; Mon, 1 Sep 1997 05:34:44 -0400 (EDT)
Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA14311; Mon, 1 Sep 97 10:33:06 BST
Date: Mon,  1 Sep 97 10:19:23 +0000
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: Final Announcement: Fall ParkBench Workshop
To: "Daniel A. Reed"  <reed@cs.uiuc.edu>,
        "J.C.T. Pool"  <jpool@cacr.caltech.edu>, a.j.grant@mcc.ac.uk,
        Antoine Hyaric  <Antoine.Hyaric@comlab.ox.ac.uk>,
        Ed Zaluska  <E.J.Zaluska@ecs.soton.ac.uk>,
        Fritz Ferstl  <ferstl@genias.de>, Hon W Yau  <hwyau@epcc.ed.ac.uk>,
        idh@soton.ac.uk, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU,
        Paul Messina  <messina@cacr.caltech.edu>,
        R.Rankin@Queens-Belfast.AC.UK, rjc@soton.ac.uk, topic@mcc.ac.uk,
        Wolfgang Genzsch  <getup@genias.de>
Cc: lec@ecs.soton.ac.uk
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Face: ,<'y31|nlb,jCP5?km9\KD+>p9/e?:|$RRhY]e;#`awGHh=mrY.T??#]-*rt}l0*u`k2A7n
 KlqNG"u'-%cS@3|G[%=m%bSB[lfSn5n"gD4CU(j?1y?#SOkm!qw_=p%c#"6g&(+\Oy6T{4CEShal?z
 M)&Gd'Pb6Qc~>SPx{m[F55=]yY>cN>|/m5)T?q`OTjdQL=7-n%NT({;;$P*2[#7ZWL8baLoI_/F89,
 x'u`*$'<|ctKNYTSJuLV=!$QT3bN*>91V,a0Cc"_UsxwMKg\;#W2LZ$!`j?ZWp;byz~;y}2Dz6i7y%
 E&;gfnmI_~}+oifmWXJMHfWeezBL1("ZnFe!rnX[Q|,:IJ?iq+PePa/[3R4
X-Priority: 3 (Normal)
Message-Id: <Chameleon.873106125.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear all,

This is the FINAL ANNOUNCEMENT:

If you would like to attend this workshop please let Lesley Courtney 
(lec@ecs.soton.ac.uk) know by Friday 5th September 1997 at 
the latest as we need to confirm numbers.

Workshop details can be found at

http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/

Regards

Mark



-------------------------------------
Dr Mark Baker
University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 09/01/97 - Time: 10:19:23
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Wed Sep  3 15:37:55 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id PAA20262; Wed, 3 Sep 1997 15:37:55 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id PAA08273; Wed, 3 Sep 1997 15:19:14 -0400 (EDT)
Received: from punt-2.mail.demon.net (punt-2b.mail.demon.net [194.217.242.6]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id PAA08262; Wed, 3 Sep 1997 15:19:10 -0400 (EDT)
Received: from minnow.demon.co.uk ([158.152.73.63]) by punt-2.mail.demon.net
           id aa0626941; 3 Sep 97 17:35 BST
Message-ID: <pin21IA7KYD0Ew2z@minnow.demon.co.uk>
Date: Wed, 3 Sep 1997 16:31:07 +0100
To: parkbench-comm@CS.UTK.EDU
From: Roger Hockney <roger@minnow.demon.co.uk>
Subject: Prototype PICT release 1.0
MIME-Version: 1.0
X-Mailer: Turnpike Version 3.03a <kRL7V2isFfDmnKSZb08I5Tyfx$>

At their last meeting the Parkbench Committee recommended that an
interactive curve fitting tool be produced for the postprocessing and
parametrisation of Parkbench results using the latest Internet Web
technology. I have produced a prototype of such a tool as a Java applet
running on a Web page on the user's machine and called it PICT
(Parkbench Interactive Curve-fitting Tool). This is now ready for
evaluation and testing by the committee.

The tool provides the following features:

(1) Automatic plotting of Low-Level Parkbench output files from a URL
anywhere on the Web (At present limited to New COMMS1 and Raw data, but
easily extended to original COMMS1 and RINF1). This is useful for a
quick comparison of raw data.

(2) Automatic plotting of both 2 and 3-parameter curve-fits which are
produced by the benchmarks. Good for checking the quality of the fits.

(3) Allows manual rescaling of the graph range to suit the data, either
by typing in the required range values or by dragging out a range box
with the mouse.

(4) Allows the 2-parameter and 3-parameter performance curves to be
manually moved about the graph in order to fine tune the fits. The curve
follows the mouse and the RMS and MAX percentage errors are shown as the
curve moves. Alternatively, parameter values can be typed in and the
Manual button pressed, whereupon the curve for these values is plotted.

(5) The data file being plotted can be VIEWed and a HELP button provides
a description of the action of each button in a separate window.
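
To make the RMS and MAX percentage errors of feature (4) concrete, here
is a minimal C sketch. It assumes the common two-parameter timing model
t(n) = t0 + n/r_inf and takes the percentage error of each point relative
to the measured time. Neither the model form nor the error definition is
spelled out in this message, and the names model_time, fit_error and
fit_errors are hypothetical, not taken from the PICT source.

```c
#include <assert.h>
#include <stddef.h>

/* Predicted time for a message of n bytes under the ASSUMED model
   t(n) = t0 + n / r_inf (t0: startup time, r_inf: asymptotic rate). */
double model_time(double n, double t0, double r_inf)
{
    return t0 + n / r_inf;
}

/* Newton iteration for a square root, so the sketch links without libm. */
static double sqrt_newton(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;   /* start at or above the root */
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Hypothetical result type: RMS and maximum absolute percentage error. */
typedef struct {
    double rms_pct;
    double max_pct;
} fit_error;

/* Compare measured times t[i] for message lengths n[i] against the model
   with trial parameters (t0, r_inf). The percentage error of a point is
   taken as 100 * (model - measured) / measured, which is an assumption. */
fit_error fit_errors(const double *n, const double *t, size_t count,
                     double t0, double r_inf)
{
    assert(count > 0 && r_inf > 0.0);

    double sum_sq = 0.0;
    double max_abs = 0.0;
    for (size_t i = 0; i < count; i++) {
        assert(t[i] > 0.0);
        double pct = 100.0 * (model_time(n[i], t0, r_inf) - t[i]) / t[i];
        double abs_pct = pct < 0.0 ? -pct : pct;
        sum_sq += pct * pct;
        if (abs_pct > max_abs)
            max_abs = abs_pct;
    }

    fit_error e = { sqrt_newton(sum_sq / (double)count), max_abs };
    return e;
}
```

Calling fit_errors repeatedly with different trial values of t0 and r_inf
gives the kind of feedback a user would watch while moving the curve: both
numbers shrink as the trial curve approaches the measured points.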

The PICT applet has been built on top of Leigh Brookshaw's 2D plotting
package, the URL for which is given at the bottom of the HELP window. The
features under the RESTART button are in his original code; I have just
added the 2-PARA and 3-PARA features.

The applet was developed using JDK1.0 beta on a PC with a 1600x1200
display and works on the PC both locally and from my Web page with
appletviewer, MSIE 3.02 and Netscape 3.01. It has also been successfully
run on a Solaris Sun with NS3.01, but another Sun user has reported no
graphs and errors due to "wrong applet version". So please report your
experiences (both success and failure please) to me with all the
details.

To play with PICT turn your browser to:

     http://www.minnow.demon.co.uk/pict/source/pict1.html  or 
                                               pict1a.html 

pict1.html asks for 1000x732 pixels and suits PCs best (it's about the
minimum useful size).

pict1a.html asks for 1020x900 pixels and was necessary for the whole
applet to be visible on the Sun.

For those wishing to look closer all the source is provided and should
be downloadable. Suggestions for improvement, corrections or
constructive criticism are solicited.

I have asked for an agenda item to be included for the Parkbench meeting
on 11 Sept in Southampton so that PICT can be discussed. I look forward
to seeing some of you there.
-- 
Roger Hockney.  Checkout my new Web page at URL   http://www.minnow.demon.co.uk
University of   and link to my new book: "The Science of Computer Benchmarking"
Westminster UK  suggestions welcome. Know any fish movies or suitable links?

From owner-parkbench-lowlevel@CS.UTK.EDU Wed Sep 10 06:29:15 1997
Return-Path: <owner-parkbench-lowlevel@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id GAA21129; Wed, 10 Sep 1997 06:29:14 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id GAA20815; Wed, 10 Sep 1997 06:31:30 -0400 (EDT)
Received: from sun3.nsfnet-relay.ac.uk (sun3.nsfnet-relay.ac.uk [128.86.8.50]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id GAA20791; Wed, 10 Sep 1997 06:30:47 -0400 (EDT)
Received: from bright.ecs.soton.ac.uk by sun3.nsfnet-relay.ac.uk with JANET SMTP (PP); Wed, 10 Sep 1997 11:30:44 +0100
Received: from landlord.ecs.soton.ac.uk by bright.ecs.soton.ac.uk; Wed, 10 Sep 97 11:32:57 BST
From: Vladimir Getov <vsg@ecs.soton.ac.uk>
Received: from bill.ecs.soton.ac.uk by landlord.ecs.soton.ac.uk; Wed, 10 Sep 97 11:33:16 BST
Date: Wed, 10 Sep 97 11:33:13 BST
Message-Id: <2458.9709101033@bill.ecs.soton.ac.uk>
To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU,
        parkbench-hpf@CS.UTK.EDU
Subject: ParkBench Committee Meeting - tentative Agenda

Dear Colleague,

The ParkBench (Parallel Benchmark Working Group)
will meet in Southampton, U.K. on 
September 11th, 1997 as part of the ParkBench Workshop.

The Workshop site will be the County Hotel in Southampton.

  County Hotel
  Highfield Lane
  Southampton, U.K.
  Phone: +44-(0)1703-359955


Please send us your comments about the tentative agenda:


14:30  Finalize meeting agenda
       Minutes of last meeting (Erich Strohmaier)

14:45  Changes to Current release:
         - Low Level COMMS benchmarks (Vladimir Getov)
         - NAS Parallel Benchmarks (Subhash Saini)

15:15  New benchmarks:
         - HPF Low Level benchmarks (Mark Baker)


15:30  ParkBench Performance Analysis Tools:
         - ParkBench Result Templates (Vladimir Getov and Mark Papiani)
         - Visualization of Parallel Benchmark Results - new GBIS
           (Mark Papiani and Flavio Bergamaschi)
         - Interactive Web-page Curve-fitting of Parallel Performance
           Measurements (Roger Hockney)


16:15  Demonstrations:
         - Java Low-Level Benchmarks (Vladimir Getov)
         - BenchView: Java Tool for Visualization of Parallel Benchmark Results
           (Mark Papiani and Flavio Bergamaschi)
         - PICT: An Interactive Web-page Curve-fitting Tool (Roger Hockney)


16:45  Other activities:
         - "Electronic Benchmarking Journal" - status report (Mark Baker)


       Miscellaneous
       Date and venue for next meeting

17:00       Adjourn


Tony Hey
Vladimir Getov
Erich Strohmaier

From owner-parkbench-lowlevel@CS.UTK.EDU Thu Sep 18 18:27:19 1997
Return-Path: <owner-parkbench-lowlevel@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id SAA12991; Thu, 18 Sep 1997 18:27:18 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id SAA29359; Thu, 18 Sep 1997 18:26:21 -0400 (EDT)
Received: from k2.llnl.gov (zosel@k2.llnl.gov [134.9.1.1]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id SAA29352; Thu, 18 Sep 1997 18:26:19 -0400 (EDT)
Received: (from zosel@localhost)
	by k2.llnl.gov (8.8.5/8.8.5/LLNL-Jun96) id PAA07246
	for parkbench-lowlevel@cs.utk.edu; Thu, 18 Sep 1997 15:26:16 -0700 (PDT)
Date: Thu, 18 Sep 1997 15:26:16 -0700 (PDT)
From: Mary E Zosel <zosel@k2.llnl.gov>
Message-Id: <199709182226.PAA07246@k2.llnl.gov>
To: parkbench-lowlevel@CS.UTK.EDU
Subject: any pthreads tests???

Does anyone know of any low-level performance tests for pthreads libraries???
I'm interested in both single processor performance of pthreads calls - 
and also multiprocessor (shared memory) calls ... to measure the overhead
of the calls.
  -mary zosel-   zosel@llnl.gov
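
As a rough starting point for the kind of test asked about here, the
sketch below times two common pthreads operations on one processor: an
uncontended mutex lock/unlock pair and a thread create/join pair. It is
not an existing Parkbench benchmark. The function names are hypothetical,
and it assumes a POSIX system that provides clock_gettime with
CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}

/* Thread body that does no work, so only the call overhead is timed. */
static void *empty_thread(void *arg)
{
    return arg;
}

/* Average seconds for one uncontended pthread_mutex_lock/unlock pair. */
double mutex_pair_seconds(long iterations)
{
    assert(iterations > 0);

    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    double start = now_seconds();
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
    }
    double elapsed = now_seconds() - start;
    pthread_mutex_destroy(&m);
    return elapsed / (double)iterations;
}

/* Average seconds for one pthread_create followed by pthread_join.
   Returns -1.0 if a thread could not be created. */
double create_join_seconds(int iterations)
{
    assert(iterations > 0);

    double start = now_seconds();
    for (int i = 0; i < iterations; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, empty_thread, NULL) != 0)
            return -1.0;
        pthread_join(tid, NULL);
    }
    return (now_seconds() - start) / (double)iterations;
}
```

The loop overhead and the timer calls are included in the averages, so the
iteration counts should be large enough to make them negligible. A
multiprocessor version would additionally need contended locks, which this
sketch does not attempt.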

From owner-parkbench-lowlevel@CS.UTK.EDU Sun Sep 21 09:13:20 1997
Return-Path: <owner-parkbench-lowlevel@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id JAA08699; Sun, 21 Sep 1997 09:13:20 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id JAA15884; Sun, 21 Sep 1997 09:15:32 -0400 (EDT)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id JAA15877; Sun, 21 Sep 1997 09:15:30 -0400 (EDT)
Received: from mordillo (p41.ascend3.is2.bb.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA10322; Sun, 21 Sep 97 14:15:58 BST
Date: Sun, 21 Sep 97 13:32:56 +0000
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: Re: any pthreads tests???
To: Mary E Zosel  <zosel@k2.llnl.gov>, parkbench-lowlevel@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 3 (Normal)
References: <199709182226.PAA07246@k2.llnl.gov> 
Message-Id: <Chameleon.874845447.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Mary,

This has been talked about as one of the activities that Parkbench
would be interested in pursuing. But, so far we have not had the
time or man-power to follow up our interests.

Ron Sercely at HP/CTCX was particularly interested in this area. Also,
I know the people at Manchester University wrote a bunch of
Pthreads codes - some were benchmarks - for their KSR machine.

Hope this helps.

Regards

Mark


--- On Thu, 18 Sep 1997 15:26:16 -0700 (PDT)  Mary E Zosel <zosel@k2.llnl.gov> wrote:
> Does anyone know of any low-level performance tests for pthreads libraries???
> I'm interested in both single processor performance of pthreads calls - 
> and also multiprocessor (shared memory) calls ... to measure the overhead
> of the calls.
>   -mary zosel-   zosel@llnl.gov
> 

---------------End of Original Message-----------------

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 09/21/97 - Time: 13:32:57
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Wed Sep 24 06:04:19 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id GAA23913; Wed, 24 Sep 1997 06:04:18 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA23163; Wed, 24 Sep 1997 05:46:35 -0400 (EDT)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA23156; Wed, 24 Sep 1997 05:46:26 -0400 (EDT)
Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA29780; Wed, 24 Sep 97 10:47:01 BST
Date: Wed, 24 Sep 97 10:38:39 +0000
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: PC timers
To: parkbench-comm@CS.UTK.EDU, parkbench-low-level@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 3 (Normal)
Message-Id: <Chameleon.875094053.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Can someone suggest the appropriate PC-based timer 
function (MS Visual C++ or Digital Visual Fortran)
to replace the usual gettimeofday call !?

Cheers

Mark

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 09/24/97 - Time: 10:38:39
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Thu Sep 25 10:11:01 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id KAA20147; Thu, 25 Sep 1997 10:11:01 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id JAA18087; Thu, 25 Sep 1997 09:24:56 -0400 (EDT)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id JAA18080; Thu, 25 Sep 1997 09:24:53 -0400 (EDT)
Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA12457; Thu, 25 Sep 97 14:25:35 BST
Date: Thu, 25 Sep 97 14:11:59 +0000
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: PC Time function
To: parkbench-comm@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 3 (Normal)
Message-Id: <Chameleon.875193559.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Thanks to all for the timer info. I used the C function _ftime()
in the end because it has millisecond resolution. I just had
to get my head around using INTERFACE in F90 to include
the external C function.

I've inserted my version of the _ftime() timer below - I don't think
there are any obvious errors in it :-)

I also implemented the dflib F90 routine CALL GETTIM(hour, min, sec, hund) -
it passed tick2 testing but has only 1/100 sec resolution.

-------------------------------------------------------
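#include <sys/types.h>
#include <sys/timeb.h>   /* struct _timeb, _ftime() */

/* Wall-clock time in seconds, with the millisecond resolution of _ftime(). */
double dwalltime00()
{
    struct _timeb timebuf;

    _ftime( &timebuf );

    return (double) timebuf.time + (double) timebuf.millitm / 1000.0;
}

/* Aliases for the external names a Fortran compiler may generate. */
double dwalltime00_()
{
    return dwalltime00();
}

double DWALLTIME00()
{
    return dwalltime00();
}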
-------------------------------------------------------
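
For anyone comparing the two timers, a rough way to see the difference in
resolution is to spin on the clock until its value changes and record the
smallest step. The following is only a sketch: the names wall_seconds and
timer_granularity are made up for illustration, it is not the Parkbench tick
test, and it uses the POSIX gettimeofday call so that it runs on a Unix box.
On a PC the dwalltime00() routine above could be passed in instead.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds from POSIX gettimeofday() (hypothetical helper). */
double wall_seconds(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double) tv.tv_sec + (double) tv.tv_usec / 1.0e6;
}

/* Smallest step observed when the clock advances, over `samples` ticks.
   clock_fn is any routine returning wall-clock seconds as a double. */
double timer_granularity(double (*clock_fn)(void), int samples)
{
    double smallest = 0.0;
    int i;

    assert(clock_fn != NULL && samples > 0);

    for (i = 0; i < samples; i++) {
        double start = clock_fn();
        double now;
        double step;

        /* Busy-wait until the reported time moves forward. */
        do {
            now = clock_fn();
        } while (now <= start);

        step = now - start;
        if (i == 0 || step < smallest)
            smallest = step;
    }
    return smallest;
}
```

Calling timer_granularity(wall_seconds, 10) returns the smallest step seen
over ten ticks, in seconds. A clock with 1/100 sec resolution should report a
value of roughly 0.01.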




Cheers

Mark




-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 09/25/97 - Time: 14:11:59
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Tue Oct  7 06:35:04 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id GAA26560; Tue, 7 Oct 1997 06:35:04 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id GAA25697; Tue, 7 Oct 1997 06:10:11 -0400 (EDT)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id GAA25668; Tue, 7 Oct 1997 06:09:40 -0400 (EDT)
Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA05125; Tue, 7 Oct 97 11:09:53 BST
Date: Tue,  7 Oct 97 10:43:49 +0000
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: Workshop Papers
To: "Aad J. van der Steen"  <steen@fys.ruu.nl>, Charles Grassl  <cmg@cray.com>,
        Clemens Thole  <clemens-august.thole@gmd.de>,
        David Snelling  <snelling@fecit.co.uk>,
        Erich Strohmaier  <erich@CS.UTK.EDU>,
        Grapham Nudd  <Graham.Nudd@dcs.warwick.ac.uk>,
        Klaus Stueben  <klaus.stueben@gmd.de>, parkbench-comm@CS.UTK.EDU,
        Roger Hockney  <roger@minnow.demon.co.uk>,
        Saini Subhash  <saini@nas.nasa.gov>,
        Vladimir Getov  <vsg@ecs.soto.ac.uk>,
        William Gropp  <gropp@mcs.anl.gov>
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 3 (Normal)
Message-Id: <Chameleon.876218541.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

I am now back in the office and have a small amount of time to follow
up the Parkbench Workshop that took place a few weeks ago.

I would firstly like to thank everyone who attended - especially
all the speakers. Even though we did not attract hundreds of
delegates to the workshop, I think the event was very successful
- but I may be biased...

So, the plan is that in the first instance I will collect the slides
from all the speakers, package them up and put them on the PEMCS
Web site.

We also decided that we would encourage all the speakers to produce
short papers on their talks and put all the workshop papers together
to create a special issue of the PEMCS journal.

Can the speakers therefore send me their slides (I would prefer
PowerPoint or Word versions if possible). I will harass you further
about the short papers in the near future.

Thanks in advance for your help.

Regards

Mark



-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/07/97 - Time: 10:43:49
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Sun Oct 12 09:55:57 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id JAA28908; Sun, 12 Oct 1997 09:55:57 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id JAA08800; Sun, 12 Oct 1997 09:44:23 -0400 (EDT)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id JAA08793; Sun, 12 Oct 1997 09:44:20 -0400 (EDT)
Received: from mordillo (p26.nas4.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA11347; Sun, 12 Oct 97 14:45:07 BST
Date: Sun, 12 Oct 97 14:35:10 +0000
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: Equivalent to comms1
To: parkbench-comm@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 3 (Normal)
Message-Id: <Chameleon.876663429.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Can someone point me at the equivalent of comms1 written in
C - either MPI or sockets (or even PVM if it's out there).

Cheers

Mark


-------------------------------------
Dr Mark Baker
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/12/97 - Time: 14:35:10
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Mon Oct 13 16:30:04 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id QAA17020; Mon, 13 Oct 1997 16:29:59 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id QAA24297; Mon, 13 Oct 1997 16:02:05 -0400 (EDT)
Received: from dancer.cs.utk.edu (DANCER.CS.UTK.EDU [128.169.92.77]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id QAA24288; Mon, 13 Oct 1997 16:02:03 -0400 (EDT)
From: Philip Mucci <mucci@CS.UTK.EDU>
Received:  by dancer.cs.utk.edu (cf v2.11c-UTK)
          id QAA02925; Mon, 13 Oct 1997 16:02:00 -0400
Date: Mon, 13 Oct 1997 16:02:00 -0400
Message-Id: <199710132002.QAA02925@dancer.cs.utk.edu>
To: mab@sis.port.ac.uk, parkbench-comm@CS.UTK.EDU
Subject: Re: Equivalent to comms1
In-Reply-To: <Chameleon.876663429.mab@mordillo>
X-Mailer: [XMailTool v3.1.2b]


I would check out my mpbench on my web page....
It does PVM and MPI for now...

> Can someone point me at the equivalant of comms1 written in
> C - either MPI or sockets (or even PVM if its out there).
> 
> Cheers
> 
> Mark
> 
> 
> -------------------------------------
> Dr Mark Baker
> CSM, University of Portsmouth, Hants, UK
> Tel: +44 1705 844285	Fax: +44 1705 844006
> E-mail: mab@sis.port.ac.uk
> Date: 10/12/97 - Time: 14:35:10
> URL http://www.sis.port.ac.uk/~mab/
> -------------------------------------
> 

/%*\ Philip J. Mucci | GRA in CS under Dr. JJ Dongarra /*%\
\*%/ http://www.cs.utk.edu/~mucci  PVM/Active Messages \%*/ 

From owner-parkbench-comm@CS.UTK.EDU Mon Oct 20 10:37:14 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id KAA15359; Mon, 20 Oct 1997 10:37:14 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id KAA07990; Mon, 20 Oct 1997 10:19:41 -0400 (EDT)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id KAA07691; Mon, 20 Oct 1997 10:17:09 -0400 (EDT)
Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA16636; Mon, 20 Oct 97 15:17:33 BST
Date: Mon, 20 Oct 97 15:02:39 +0000
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: PEMCS Short Article
To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 3 (Normal)
Message-Id: <Chameleon.877356527.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

I've just put up (at last!!) the first PEMCS short article at
http://hpc-journals.ecs.soton.ac.uk/PEMCS/Articles/

At the moment there is not much of a "house style" for the format
of the papers and articles - this will hopefully be developed over
the coming months.

I expect to put the first full paper up on the Web in the next week or
so.

Comments, ideas and help with the journal and its Web site are most
welcome.

Regards

Mark

------------------------------------------------------------------------------------------


COMPARING COMMUNICATION PERFORMANCE OF MPI ON THE CRAY RESEARCH T3E-600 AND IBM SP-2
	
                                by
                Glenn R. Luecke and James J. Coyle
                    Iowa State University
                 Ames, Iowa 50011-2251, USA
                
                     Waqar ul Haque
                University of Northern British Columbia
               Prince George, British Columbia, Canada V2N 4Z9
                              

Abstract 

This paper reports the performance of the Cray Research T3E and IBM SP-2 on a collection of 
communication tests that use MPI for the message passing. These tests have been designed to 
evaluate the performance of communication patterns that we feel are likely to occur in 
scientific programs. Communication tests were performed for messages of sizes 8 Bytes (B), 
1 KB, 100 KB, and 10 MB with 2, 4, 8, 16, 32 and 64 processors. Both machines provided a high 
level of concurrency for the nearest neighbor communication tests and moderate concurrency on 
the broadcast operations. On the tests used, the T3E significantly outperformed the SP-2 with 
most performance tests being at least three times faster than the SP-2. 


-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/20/97 - Time: 15:02:42
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Sat Oct 25 08:52:33 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id IAA12875; Sat, 25 Oct 1997 08:52:33 -0400
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id IAA05256; Sat, 25 Oct 1997 08:41:15 -0400 (EDT)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id IAA05244; Sat, 25 Oct 1997 08:41:05 -0400 (EDT)
Received: from mordillo (p16.nas2.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA01764; Sat, 25 Oct 97 13:41:26 BST
Date: Sat, 25 Oct 97 13:27:24 +0000
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: Parkbench Workshop Talks - On line
To: Chuck Koelbel  <chk@cs.rice.edu>,
        Clemens Thole  <clemens-august.thole@gmd.de>,
        Grapham Nudd  <Graham.Nudd@dcs.warwick.ac.uk>,
        Guy Robinson  <robinson@arsc.edu>,
        Klaus Stueben  <klaus.stueben@gmd.de>, parkbench-comm@CS.UTK.EDU,
        William Gropp  <gropp@mcs.anl.gov>
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 2 (High)
Message-Id: <Chameleon.877782734.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

I have put the talks received so far up at...

http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/abstracts.html

Could the speakers who have not yet passed their talks on to me
please do so.

Thanks in advance.

Regards

Mark


-------------------------------------
Dr Mark Baker
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/25/97 - Time: 13:27:25
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Fri Oct 31 08:22:47 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id IAA19412; Fri, 31 Oct 1997 08:22:46 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id HAA15140; Fri, 31 Oct 1997 07:44:09 -0500 (EST)
Received: from post.mail.demon.net (post-20.mail.demon.net [194.217.242.27]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id HAA15133; Fri, 31 Oct 1997 07:44:05 -0500 (EST)
Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net
           id aa2017784; 31 Oct 97 12:25 GMT
Message-ID: <uFBOiBAJ2cW0EwdA@minnow.demon.co.uk>
Date: Fri, 31 Oct 1997 12:22:33 +0000
To: parkbench-comm@CS.UTK.EDU
From: Roger Hockney <roger@minnow.demon.co.uk>
Subject: Announcing PICT2
MIME-Version: 1.0
X-Mailer: Turnpike Version 3.03a <kRL7V2isFfDmnKSZb08I5Tyfx$>

                         ANNOUNCING PICT2
                         ++++++++++++++++

The prototype Parkbench Interactive Curve Fitting Tool (PICT1) that was
demonstrated at the Southampton meeting of Parkbench in September was
difficult to use on small screens because the image was too large and
could not be reduced in size to suit the users' screen size. Sorry, I
had developed it on my own 1600x1200 display without realising that most
users considered 800x600 as large!

Well the new version PICT2 that is now on my web page allows for the
full range of screen sizes: 640x480, 800x600, 1024x768, >=1600x1200, and
also allows the user to customise his own display by selecting a font
size and screen width and height. So the new version should be usable by
all -- I hope!   

Another problem at Southampton was that the display workstation was very
old and too slow in MHz to do the job. I use a P133 Pentium and the
graph lines move around instantly, but if you only have a 20MHz machine,
for example, the response will probably be too slow to be useful for real
interactive curve fitting. There is nothing I can do about this except
to suggest that you use the need to use PICT as an excuse (I mean
justification) to upgrade your equipment.

PICT2 still relies on the use of New COMMS1 to compute the least-squares
2-para fit and the 3-point fit for the 3-para. The next step will be to
put these features in PICT, but that is a fair amount of code to get
right and I thought it best to solve the screen-size problem first. But
remember the key point about PICT is that it allows interactive manual
fitting and display that is not otherwise available.
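
For readers unfamiliar with the 2-para fit, the sketch below shows an ordinary
least-squares straight-line fit of time against message length. It assumes the
two parameters are a startup time t0 and an asymptotic rate rinf in the model
t = t0 + n/rinf. The struct and function names are hypothetical, and this is
not the New COMMS1 code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Result of the assumed two-parameter model t = t0 + n / rinf. */
typedef struct {
    double t0;    /* intercept: startup time in seconds        */
    double rinf;  /* 1/slope: asymptotic rate in bytes/second  */
} LinearFit;

/* Ordinary least-squares fit of times t[i] (seconds) against
   message lengths n[i] (bytes). Needs at least two distinct lengths. */
LinearFit fit_two_param(const double *n, const double *t, size_t count)
{
    double sn = 0.0, st = 0.0, snn = 0.0, snt = 0.0;
    double denom, slope;
    LinearFit fit;
    size_t i;

    assert(n != NULL && t != NULL && count >= 2);

    for (i = 0; i < count; i++) {
        sn  += n[i];
        st  += t[i];
        snn += n[i] * n[i];
        snt += n[i] * t[i];
    }

    denom = (double) count * snn - sn * sn;
    assert(denom != 0.0);          /* all lengths equal: no line to fit */

    slope = ((double) count * snt - sn * st) / denom;
    assert(slope != 0.0);          /* flat data: rate would be infinite */

    fit.t0   = (st - slope * sn) / (double) count;
    fit.rinf = 1.0 / slope;
    return fit;
}
```

Given measured (length, time) pairs, fit_two_param returns the line that
minimises the squared error in time. The interactive manual fitting in PICT
then amounts to adjusting these two numbers by hand and watching the curve.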

To try out PICT2 turn your browser to:

             http://www.minnow.demon.co.uk/pict/source/pict2a.html

and follow the instructions. When you have a good PICT Frame displayed,
press the HELP button for a description of the button actions.

Please report problems, experiences (good and bad), suggestions to me
at:

             roger@minnow.demon.co.uk

I need feedback in order to improve the tool.  

Best wishes to you all

Roger
-- 
Roger Hockney.  Checkout my new Web page at URL   http://www.minnow.demon.co.uk
University of   and link to my new book: "The Science of Computer Benchmarking"
Westminster UK  suggestions welcome. Know any fish movies or suitable links?

From owner-parkbench-comm@CS.UTK.EDU Tue Nov 11 06:21:05 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id GAA18373; Tue, 11 Nov 1997 06:21:05 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id GAA27963; Tue, 11 Nov 1997 06:06:45 -0500 (EST)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id GAA27930; Tue, 11 Nov 1997 06:06:15 -0500 (EST)
Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA23083; Tue, 11 Nov 97 11:07:22 GMT
Date: Tue, 11 Nov 97 11:00:36 GMT
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: Couple of Announcements
To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 3 (Normal)
Message-Id: <Chameleon.879246493.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

A couple of announcements...

Firstly, the majority of the papers presented at the Fall ParkBench Workshop
on Thursday 11th / Friday 12th September 1997 at the University of Southampton
are now on-line and can be found at...

http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/abstracts.html

or

from http://hpc-journals.ecs.soton.ac.uk/PEMCS/ by clicking on News in the left frame...

Secondly, the first full paper for the electronic journal Performance Evaluation
and Modelling of Computer Systems (PEMCS),

"PERFORM - A Fast Simulator For Estimating Program Execution Time" by Alistair
Dunlop and Tony Hey, Department of Electronics and Computer Science, University of
Southampton, Southampton, SO17 1BJ, U.K.,

can be found at...

http://hpc-journals.ecs.soton.ac.uk/PEMCS/Papers/vol1.html

See you all at the Parkbench BOF at SC'97...


Mark



-------------------------------------
Dr Mark Baker
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 11/11/97 - Time: 11:00:36
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-lowlevel@CS.UTK.EDU Wed Nov 12 21:30:42 1997
Return-Path: <owner-parkbench-lowlevel@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id VAA13985; Wed, 12 Nov 1997 21:30:42 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id VAA06841; Wed, 12 Nov 1997 21:31:46 -0500 (EST)
Received: from rudolph.cs.utk.edu (RUDOLPH.CS.UTK.EDU [128.169.92.87]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id VAA06806; Wed, 12 Nov 1997 21:31:01 -0500 (EST)
Received: from localhost by rudolph.cs.utk.edu with SMTP (cf v2.11c-UTK)
          id VAA24812; Wed, 12 Nov 1997 21:31:01 -0500
Date: Wed, 12 Nov 1997 21:31:00 -0500 (EST)
From: Erich Strohmaier <erich@CS.UTK.EDU>
To: parkbench-hpf@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU,
        parkbench-comm@CS.UTK.EDU
Subject: ParkBench BOF session at the SC'97
Message-ID: <Pine.SUN.3.96.971112212856.24760D-100000@rudolph.cs.utk.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear Colleague,

  The ParkBench (PARallel Kernels and BENCHmarks) committee has
organized a BOF session at the SC'97 in San Jose.

   Room:   Convention Center Room C1 
   Time:   Wednesday  5:30pm              


We will talk about the latest release, new results available and future
plans.

                Tentative Agenda of the BOF

   - Introduction, background, WWW-Server 
   - Current Release of ParkBench
   - Low Level Performance Evaluation Tools
   - LinAlg Kernel Benchmarks
   - NAS Parallel Benchmarks,  including latest results
   - Plans for the next Release 
   - Electronic Journal of Performance Evaluation and Modeling
     for Computer Systems
   - Questions from the floor / discussion 


  Please mark your calendar and plan to attend.


Jack Dongarra
Tony Hey 
Erich Strohmaier



From owner-parkbench-lowlevel@CS.UTK.EDU Thu Nov 13 06:30:40 1997
Return-Path: <owner-parkbench-lowlevel@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id GAA07097; Thu, 13 Nov 1997 06:30:40 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA01844; Thu, 13 Nov 1997 05:55:24 -0500 (EST)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA01835; Thu, 13 Nov 1997 05:55:18 -0500 (EST)
Received: from mordillo (p19.nas2.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA18430; Thu, 13 Nov 97 10:56:11 GMT
Date: Thu, 13 Nov 97 10:48:53 GMT
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: Fall 97 Parkbench Committee Meeting Minutes
To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU,
        parkbench-lowlevel@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 3 (Normal)
References: <Pine.SUN.3.96.971112212856.24760D-100000@rudolph.cs.utk.edu> 
Message-Id: <Chameleon.879418489.mab@mordillo>
Mime-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="mordillo:879418490:877:126:21579"

--mordillo:879418490:877:126:21579
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

Here are the minutes of the Parkbench committee meeting held at The County
Hotel in Southampton during the Fall 97 Parkbench Workshop.

For those of you with a MIME-compliant mail-reader I've attached a formatted
Word 7 document.

Regards

Mark

-----------------------------------------------------------------------------

Parkbench Committee Meeting
Held during the Fall Parkbench Workshop

The County Hotel
Southampton, UK

1515,  11th September 1997


Meeting Participation List:

Mark Baker - Univ. of Portsmouth (mab@sis.port.ac.uk)
Flavio Bergamaschi - Univ. of Southampton (fab@ecs.soton.ac.uk)
Jack Dongarra - Univ. of Tenn./ORNL (dongarra@cs.utk.edu)
Vladimir Getov - Univ. of Westminster (getovv@wmin.ac.uk)
Charles Grassl - SGI/Cray (cmg@cray.com)
William Gropp - ANL (gropp@mcs.anl.gov)
Tony Hey - Univ. of Southampton (ajgh@ecs.soton.ac.uk)
Roger Hockney - Univ. of Westminster (roger@minnow.demon.co.uk)
Mark Papiani - Univ. of Southampton (mp@ecs.soton.ac.uk)
Subhash Saini - NASA Ames (saini@nas.nasa.gov)
Dave Snelling - FECIT (snelling@fecit.co.uk)
Aad J. van der Steen - RUU (steen@fys.ruu.nl)
Erich Strohmaier - Univ. of Tennessee (erich@cs.utk.edu)
Klaus Stueben - GMD (klaus.stueben@gmd.de)

Meeting Activities and Actions

Tony Hey chaired the meeting.

The minutes from the last meeting were seven pages long, so it was decided that only the actions from
that meeting would be reviewed. The actions were reviewed and a short discussion about each took place.
A discussion about interaction with SPEC-HPG was initiated.

Comms Low-Level Benchmarks 

Vladimir Getov gave a short presentation on the current status of the Parkbench Comms benchmarks.
Charles Grassl was asked to explain how his new Comms programs worked and the rationale behind them.
A long discussion ensued.

Action - Create a formal proposal of alternatives or additions to the comms low-level benchmarks for
SC'97 - Charles Grassl.

Action - Members should look at the PALLAS version of the low-level benchmarks (based on
Genesis/RAPS).

Action - Erich Strohmaier and Vladimir Getov will discuss the efforts needed to split up Parkbench
and add in the new Comms1 benchmark (with new curve fitting routine).

NPB - Subhash Saini reported on the status of the NAS Parallel Benchmarks.

HPF - Mark Baker read Chuck Koelbel's email about CEWES HPCM HPF efforts.

Action - Subhash Saini will let RICE know that Gina should start off from the single NAS codes.

Electronic Journal - Mark Baker and Tony Hey reported on the electronic journal PEMCS and its Web
site. It was agreed that this would be discussed further informally.

Parkbench Report - Erich Strohmaier reported on the efforts of creating a new Parkbench report. A short
discussion about this ensued.

Action - Jack Dongarra / Tony Hey will talk to other members about the potential efforts that could be
put into a Parkbench report II by SC'97.

Funding Efforts

Jack Dongarra's recent benchmarking proposal was turned down. Tony Hey mentioned the possibility of
entering a proposal to the EU. There is also the possibility of a joint EU / NSF bid.

Mark Baker asked if SIO would be interested in being more closely involved. William Gropp reported
that SIO was actually winding down and so formal association was not really an option.

AOB

The participants were then invited by Tony to move to the University of Southampton (bldg. 16) for the
Parkbench demonstrations, which included:

-- Java Low-Level Benchmarks (Vladimir Getov)
-- BenchView: Java Tool for Visualization of Parallel Benchmark Results (Mark Papiani and Flavio
   Bergamaschi)
-- PICT: An Interactive Web-page Curve-fitting Tool (Roger Hockney)

Jack Dongarra informed the committee of the Parkbench BOF at SC'97 (Wednesday at 3.30PM).

The meeting was wound up by Tony Hey at 1630.

-----------------------------------------------------------------------------




-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 11/13/97 - Time: 10:48:53
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

--mordillo:879418490:877:126:21579
Content-Type: APPLICATION/msword; name="minutes-fall-97.doc"
Content-Transfer-Encoding: BASE64
Content-Description: minutes-fall-97.doc

APEA7QDtAOMA4+EA4wDtAPEA8QDjAPEA8QDjAPHvAPEA3wAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJ1AQACVoEABFWBVoEA
CFWBXQMAYxgAAANdBQADXQQAA10DAAVVgV0DAAJoAQAIVYFdAwBjHAAACFWB
XQMAYyQARwADAAAcAwAAHQMAAEUDAABGAwAAVwMAAFgDAABoAwAAaQMAAIQD
AACFAwAAhgMAAKIDAACjAwAA2QMAABkEAABTBAAAjwQAALgEAADgBAAAFwUA
AFgFAACQBQAAvwUAAOwFAAAbBgAAVAYAAIAGAACBBgAAoAYAAKEGAAC/BgAA
wAYAANYHAADXBwAA8wcAAPQHAAC9CAAA1wgAANgIAAD9AAHAIaIB+gABwCGi
Af0AAcAhRgH9AAHAIUYB/QABwCFGAf0AAcAhRgH9AAHAIUYB/QABwCHrAP0A
AcAh6wD6AAHAIesA+gABwCHrAPoAAcAh6QD6AAHAIesA+gABwCHyAPoAAcAh
8gD6AAHAIfIA+gABwCHyAPoAAcAh8gD6AAHAIfIA+gABwCHyAPoAAcAh8gD6
AAHAIfIA+gABwCHyAPoAAcAh8gD6AAHAIfIA+gABwCHyANwAAcAh8gD6AAHA
IesA+gABwCEWAfoAAcAh6wD6AAHAIesA+gABwCHrAPoAA8Ah6wD6AAHAIesA
+gABwCHpAPoAAcAh6wD6AALAIfIA+gABwCHrAPoAAcAh6wAAAAAAAAAAHQAA
BQMMNP8BAAgAAAEAAAABAGgBAAAAAAAAtwAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAgAABQMAAgAABQEn2AgAAFUJAABWCQAAvgkAAL8JAABq
CgAAawoAALUKAAC2CgAA/woAAAALAABeCwAAXwsAAAcMAAAIDAAAjQwAAI4M
AAAdDQAAHg0AAC4NAAAvDQAAsA0AANUNAADWDQAAkQ4AAJIOAACWDgAAlw4A
ACcPAAAoDwAAUw8AAL4PAAD/DwAAABAAAFgQAABZEAAAhxAAAP0E/8Ah2QH9
AAHAIesA/QT/wCHZAf0AAcAh6wD9BP/AIeAB/QABwCHrAP0AAcAh7gD9AAHA
IesA/QABwCHuAP0AAcAh6wD9AAHAIe4A/QABwCHrAP0E/8Ah2QH9AAHAIesA
/QT/wCHZAf0AAcAh6wD9BP/AIdkB/QABwCHrAP0AAcAh6QD9AAHAIesA/QAC
wCHrAP0AAcAh6wD9AAHAIesA/QACwCHrAP0AAcAh6wD9AAHAIekA/QABwCHr
AP0AAsAh6wD9AAHAIesA2wABwCH6ANsE/8Ah5QHbAAHAIfoA/QABwCHrAP0A
AcAh6wD9AAHAIesA/QABwCHrAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAIQAABQMNCxFoAROY/gw0/wEACAAAAQAAAAEAaAEAAAAA
AAC3AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAFAyQOAA8A
CAABAEsADwAAAAAAGgAAQPH/AgAaAAZOb3JtYWwAAgAAAAMAYQkEAAAAAAAA
AAAAAAAAAAAAAAAAACIAQUDy/6EAIgAWRGVmYXVsdCBQYXJhZ3JhcGggRm9u
dAAAAAAAAAAAAAAAAAAAAIcNAAAEAIcQAAAAAP////8CAAQh//8BAAAg//8C
AAAAAABqBwAAhw0AAAAAAQAAAAEAAAAAAAADAACeEAAACQAAAwAA2AgAAIcQ
AAAKAAsAAAAAAAECAAAVAgAAiQ0AAAcAHAAHADMBC01hcmsgIEJha2VyJEM6
XHRleFxQYXJrQmVuY2hcbWludXRlcy1mYWxsLTk3LmRvYwtNYXJrICBCYWtl
cjNDOlx0ZXhcUGFya0JlbmNoXEFkbWluaXN0cmF0aW9uXG1pbnV0ZXMtZmFs
bC05Ny5kb2MLTWFyayAgQmFrZXIzQzpcdGV4XFBhcmtCZW5jaFxBZG1pbmlz
dHJhdGlvblxtaW51dGVzLWZhbGwtOTcuZG9jC01hcmsgIEJha2VyM0M6XHRl
eFxQYXJrQmVuY2hcQWRtaW5pc3RyYXRpb25cbWludXRlcy1mYWxsLTk3LmRv
YwtNYXJrICBCYWtlcjNDOlx0ZXhcUGFya0JlbmNoXEFkbWluaXN0cmF0aW9u
XG1pbnV0ZXMtZmFsbC05Ny5kb2P/QFRla3Ryb25peCBQaGFzZXIgNTUwIDEy
MDAgZHBpAExQVDE6AHdpbnNwb29sAFRla3Ryb25peCBQaGFzZXIgNTUwIDEy
MDAgZHBpAFRla3Ryb25peCBQaGFzZXIgNTUwIDEyMDAgZHBpAAAAAQQABJwA
tAATzwEAAQABAOoKbwhkAAEADwBYAgIAAQAAAAMAAABMZXR0ZXIAABQAZWVl
ZWVlZWVlZWVlZWVlZWVlZWVlZQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFBSSVbgEAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAYAAAAAAAQJxAnECcAABAnAAAA
AAAAAABjdQgA/wMAAQEBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFRla3Ryb25peCBQaGFzZXIg
NTUwIDEyMDAgZHBpAAAAAQQABJwAtAATzwEAAQABAOoKbwhkAAEADwBYAgIA
AQAAAAMAAABMZXR0ZXIAAAAADwAGAAAACgAwARQAMAEUAHIAcABjAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAFBSSVbgEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAYAAAAAAAQJxAnECcAABAnAAAAAAAAAABjdQgA/wMAAQEBAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAOAAQDHAAAAxwAAAAgAzwDPAMcAAAAAAAAAxwAAAHwAFRaQAQAAVGlt
ZXMgTmV3IFJvbWFuAAwSkAECAFN5bWJvbAAWIpABAAZBcmlhbABIZWx2ZXRp
Y2EAABsmvAIAAEFyaWFsIFJvdW5kZWQgTVQgQm9sZAARNZABAABDb3VyaWVy
IE5ldwARNZABAgBNUyBMaW5lRHJhdwAiAAQAcQiJGAAA0AIAAGgBAAAAANBb
GYa2ahuGAAAAAAcAXAAAAPQBAAAnCwAAAgAFAAAABACDEBcAAAAAAAAAAAAA
AAIAAQAAAAEAAAAAAAAAIQMAAAAAKgAAAAAAAAALTWFyayAgQmFrZXILTWFy
ayAgQmFrZXIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAEAAAACAAAAAwAAAAQAAAAFAAAABgAAAAcA
AAAIAAAACQAAAAoAAAALAAAADAAAAA0AAAAOAAAADwAAAP7////9////FAAA
AP7///8cAAAA/v/////////////////////////////////////////+////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////UgBvAG8AdAAg
AEUAbgB0AHIAeQAAAGspDUphY2sgRG9uZ2FycmEgLSBVbml2LiBvZiBUZW5u
Li9PUk5MIChkbxYABQH//////////wEAAAAACQIAAAAAAMAAAAAAAABGAAAA
AKD5PUK9vrwBEKvSgiLwvAETAAAAQAMAAGdldG9XAG8AcgBkAEQAbwBjAHUA
bQBlAG4AdAAAAHNzbCAtIFNHSS9DcmF5IChjbWdAY3JheS5jb20pDVdpbGxp
YW0gGgACAQIAAAADAAAA/////3BwQG1jcy5hbmwuZ292KQ1Ub255IEhleSAt
IFVuaXYuIG9mIAAAAAAQHgAAdG9uIAEAQwBvAG0AcABPAGIAagAAAC51aykN
Um9nZXIgSG9ja25leSAtIFVuaXYuIG9mIFdlc3RtaW5pc3RlciAocm8SAAIB
////////////////LmNvLnVrKQ1NYXJrIFBhcGlhbmkAAAAAAAAAAAAAAAAA
AAAAAAAAAGoAAABtcEBlBQBTAHUAbQBtAGEAcgB5AEkAbgBmAG8AcgBtAGEA
dABpAG8AbgAAAHMgKHNhaW5pQG5hcy5uYXNhLmdvdikNRCgAAgH/////BAAA
AP////9FQ0lUIChzbmVsbGluZ0BmZWNpdAAAAAAAAAAAAAAAAAAAAAACAAAA
vAEAAHRlZW4BAAAA/v///wMAAAAEAAAABQAAAAYAAAAHAAAACAAAAP7///8K
AAAACwAAAAwAAAD+////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
/////////////////////////////////wEA/v8DCgAA/////wAJAgAAAAAA
wAAAAAAAAEYYAAAATWljcm9zb2Z0IFdvcmQgRG9jdW1lbnQACgAAAE1TV29y
ZERvYwAQAAAAV29yZC5Eb2N1bWVudC42APQ5snEAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAA/v8AAAQAAgAAAAAAAAAAAAAAAAAAAAAAAQAA
AOCFn/L5T2gQq5EIACsns9kwAAAAjAEAABIAAAABAAAAmAAAAAIAAACgAAAA
AwAAAKwAAAAEAAAAuAAAAAUAAADMAAAABgAAANgAAAAHAAAA5AAAAAgAAAD0
AAAACQAAAAgBAAASAAAAFAEAAAoAAAA8AQAACwAAAEgBAAAMAAAAVAEAAA0A
AABgAQAADgAAAGwBAAAPAAAAdAEAABAAAAB8AQAAEwAAAIQBAAACAAAA5AQA
AB4AAAABAAAAAAAGAB4AAAABAAAAAFdSTR4AAAAMAAAATWFyayAgQmFrZXIA
HgAAAAEAAAAAOmkQHgAAAAEAAAAAAAAAHgAAAAcAAABOb3JtYWwAYR4AAAAM
AAAATWFyayAgQmFrZXIAHgAAAAIAAAA3AAQAHgAAAB4AAABNaWNyb3NvZnQg
V29yZCBmb3IgV2luZG93cyA5NQAAAEAAAAAAKC3aDAAAAEAAAAAAAAAABQBE
AG8AYwB1AG0AZQBuAHQAUwB1AG0AbQBhAHIAeQBJAG4AZgBvAHIAbQBhAHQA
aQBvAG4AAAAAAAAAAAAAADgAAgD///////////////8AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJAAAA6AAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAP///////////////wAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAA////////////////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD/
//////////////8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAD+/wAABAACAAAAAAAAAAAAAAAAAAAAAAABAAAAAtXN
1ZwuGxCTlwgAKyz5rjAAAAC4AAAACAAAAAEAAABIAAAADwAAAFAAAAAEAAAA
dAAAAAUAAAB8AAAABgAAAIQAAAALAAAAjAAAABAAAACUAAAADAAAAJwAAAAC
AAAA5AQAAB4AAAAZAAAAVW5pdmVyc2l0eSBvZiBQb3J0c21vdXRoAAAAAAMA
AAAAOgAAAwAAABcAAAADAAAABQAAAAsAAAAAAAAACwAAAAAAAAAMEAAAAgAA
AB4AAAABAAAAAAMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AABAAAAAADhSnMW+vAFAAAAAANR+ciLwvAEDAAAAAgAAAAMAAAD0AQAAAwAA
ACcLAAADAAAAAAAAAAAAAAD+/wAABAACAAAAAAAAAAAAAAAAAAAAAAABAAAA
AtXN1ZwuGxCTlwgAKyz5rjAAAAC4AAAACAAAAAEAAABIAAAADwAAAFAAAAAE
AAAAdAAAAAUAAAB8AAAABgAAAIQAAAALAAAAjAAAABAAAACUAAAADAAAAJwA
AAACAAAA5AQAAB4AAAAZAAAAVW5pdmVyc2l0eSBvZiBQb3J0c21vdXRoAAAA
AAMAAAAAOgAAAwAAABcAAAADAAAABQAAAAsAAAAAAAAACwAAAAAAAAAMEAAA
AgAAAB4AAAABAAAAAAMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA

--mordillo:879418490:877:126:21579--

From owner-parkbench-comm@CS.UTK.EDU Thu Nov 13 06:31:53 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id GAA07105; Thu, 13 Nov 1997 06:31:52 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA01880; Thu, 13 Nov 1997 05:56:05 -0500 (EST)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA01835; Thu, 13 Nov 1997 05:55:18 -0500 (EST)
Received: from mordillo (p19.nas2.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA18430; Thu, 13 Nov 97 10:56:11 GMT
Date: Thu, 13 Nov 97 10:48:53 GMT
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: Fall 97 Parkbench Committee Meeting Minutes
To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU,
        parkbench-lowlevel@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 3 (Normal)
References: <Pine.SUN.3.96.971112212856.24760D-100000@rudolph.cs.utk.edu> 
Message-Id: <Chameleon.879418489.mab@mordillo>
Mime-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="mordillo:879418490:877:126:21579"

--mordillo:879418490:877:126:21579
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

Here are the minutes of the Parkbench committee meeting held at The County
Hotel in Southampton during the Fall 97 Parkbench Workshop.

For those of you with a MIME-compliant mail reader, I've attached a formatted
Word 7 document.

Regards

Mark

-----------------------------------------------------------------------------

Parkbench Committee Meeting
Held during the Fall Parkbench Workshop

The County Hotel
Southampton, UK

1515,  11th September 1997


Meeting Participation List:

Mark Baker - Univ. of Portsmouth (mab@sis.port.ac.uk)
Flavio Bergamaschi - Univ. of Southampton (fab@ecs.soton.ac.uk)
Jack Dongarra - Univ. of Tenn./ORNL (dongarra@cs.utk.edu)
Vladimir Getov - Univ. of Westminster (getovv@wmin.ac.uk)
Charles Grassl - SGI/Cray (cmg@cray.com)
William Gropp - ANL (gropp@mcs.anl.gov)
Tony Hey - Univ. of Southampton (ajgh@ecs.soton.ac.uk)
Roger Hockney - Univ. of Westminster (roger@minnow.demon.co.uk)
Mark Papiani - Univ. of Southampton (mp@ecs.soton.ac.uk)
Subhash Saini - NASA Ames (saini@nas.nasa.gov)
Dave Snelling - FECIT (snelling@fecit.co.uk)
Aad J. van der Steen - RUU (steen@fys.ruu.nl)
Erich Strohmaier - Univ. of Tennessee (erich@cs.utk.edu)
Klaus Stueben - GMD (klaus.stueben@gmd.de)

Meeting Activities and Actions

Tony Hey chaired the meeting.

The minutes from the last meeting were seven pages long, so it was decided
that only the actions from that meeting would be reviewed. The actions were
reviewed, with a short discussion about each. A discussion about interaction
with SPEC-HPG was initiated.

Comms Low-Level Benchmarks 

Vladimir Getov gave a short presentation on the current status of the
Parkbench Comms benchmarks. Charles Grassl was asked to explain how his new
Comms programs worked and the rationale behind them. A long discussion ensued.

Action - Create a formal proposal of alternatives or additions to the comms
low-level benchmarks for SC'97 - Charles Grassl.

Action - Members should look at the PALLAS version of the low-level
benchmarks (based on Genesis/RAPS).

Action - Erich Strohmaier and Vladimir Getov will discuss the efforts needed
to split up Parkbench and add in the new Comms1 benchmark (with new curve
fitting routine).

NPB - Subhash Saini reported on the status of the NAS Parallel Benchmarks.

HPF - Mark Baker read Chuck Koelbel's email about CEWES HPCM HPF efforts.

Action - Subhash Saini will let RICE know that Gina should start off from the single NAS codes.

Electronic Journal - Mark Baker and Tony Hey reported on the electronic
journal PEMCS and its Web site. It was agreed that this would be discussed
further informally.

Parkbench Report - Erich Strohmaier reported on the efforts of creating a new
Parkbench report. A short discussion about this ensued.

Action - Jack Dongarra / Tony Hey will talk to other members about the
potential efforts that could be put into a Parkbench report II by SC'97.

Funding Efforts

Jack Dongarra's recent benchmarking proposal was turned down. Tony Hey
mentioned the possibility of submitting a proposal to the EU. A joint
EU / NSF bid was also raised as a possibility.

Mark Baker asked if SIO would be interested in being more closely involved.
William Gropp reported that SIO was actually winding down and so formal
association was not really an option.

AOB

The participants were then invited by Tony to move to the University of
Southampton (bldg. 16) for the Parkbench demonstrations, which included:

-- Java Low-Level Benchmarks (Vladimir Getov)
-- BenchView: Java Tool for Visualization of Parallel Benchmark Results (Mark Papiani and Flavio
   Bergamaschi)
-- PICT: An Interactive Web-page Curve-fitting Tool (Roger Hockney)

Jack Dongarra informed the committee of the Parkbench BOF at SC'97 (Wednesday at 3:30 PM).

The meeting was wound up by Tony Hey at 1630.

-----------------------------------------------------------------------------




-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 11/13/97 - Time: 10:48:53
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

--mordillo:879418490:877:126:21579
Content-Type: APPLICATION/msword; name="minutes-fall-97.doc"
Content-Transfer-Encoding: BASE64
Content-Description: minutes-fall-97.doc

////////////////LmNvLnVrKQ1NYXJrIFBhcGlhbmkAAAAAAAAAAAAAAAAA
AAAAAAAAAGoAAABtcEBlBQBTAHUAbQBtAGEAcgB5AEkAbgBmAG8AcgBtAGEA
dABpAG8AbgAAAHMgKHNhaW5pQG5hcy5uYXNhLmdvdikNRCgAAgH/////BAAA
AP////9FQ0lUIChzbmVsbGluZ0BmZWNpdAAAAAAAAAAAAAAAAAAAAAACAAAA
vAEAAHRlZW4BAAAA/v///wMAAAAEAAAABQAAAAYAAAAHAAAACAAAAP7///8K
AAAACwAAAAwAAAD+////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
/////////////////////////////////wEA/v8DCgAA/////wAJAgAAAAAA
wAAAAAAAAEYYAAAATWljcm9zb2Z0IFdvcmQgRG9jdW1lbnQACgAAAE1TV29y
ZERvYwAQAAAAV29yZC5Eb2N1bWVudC42APQ5snEAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAA/v8AAAQAAgAAAAAAAAAAAAAAAAAAAAAAAQAA
AOCFn/L5T2gQq5EIACsns9kwAAAAjAEAABIAAAABAAAAmAAAAAIAAACgAAAA
AwAAAKwAAAAEAAAAuAAAAAUAAADMAAAABgAAANgAAAAHAAAA5AAAAAgAAAD0
AAAACQAAAAgBAAASAAAAFAEAAAoAAAA8AQAACwAAAEgBAAAMAAAAVAEAAA0A
AABgAQAADgAAAGwBAAAPAAAAdAEAABAAAAB8AQAAEwAAAIQBAAACAAAA5AQA
AB4AAAABAAAAAAAGAB4AAAABAAAAAFdSTR4AAAAMAAAATWFyayAgQmFrZXIA
HgAAAAEAAAAAOmkQHgAAAAEAAAAAAAAAHgAAAAcAAABOb3JtYWwAYR4AAAAM
AAAATWFyayAgQmFrZXIAHgAAAAIAAAA3AAQAHgAAAB4AAABNaWNyb3NvZnQg
V29yZCBmb3IgV2luZG93cyA5NQAAAEAAAAAAKC3aDAAAAEAAAAAAAAAABQBE
AG8AYwB1AG0AZQBuAHQAUwB1AG0AbQBhAHIAeQBJAG4AZgBvAHIAbQBhAHQA
aQBvAG4AAAAAAAAAAAAAADgAAgD///////////////8AAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJAAAA6AAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAP///////////////wAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAA////////////////AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD/
//////////////8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAD+/wAABAACAAAAAAAAAAAAAAAAAAAAAAABAAAAAtXN
1ZwuGxCTlwgAKyz5rjAAAAC4AAAACAAAAAEAAABIAAAADwAAAFAAAAAEAAAA
dAAAAAUAAAB8AAAABgAAAIQAAAALAAAAjAAAABAAAACUAAAADAAAAJwAAAAC
AAAA5AQAAB4AAAAZAAAAVW5pdmVyc2l0eSBvZiBQb3J0c21vdXRoAAAAAAMA
AAAAOgAAAwAAABcAAAADAAAABQAAAAsAAAAAAAAACwAAAAAAAAAMEAAAAgAA
AB4AAAABAAAAAAMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AABAAAAAADhSnMW+vAFAAAAAANR+ciLwvAEDAAAAAgAAAAMAAAD0AQAAAwAA
ACcLAAADAAAAAAAAAAAAAAD+/wAABAACAAAAAAAAAAAAAAAAAAAAAAABAAAA
AtXN1ZwuGxCTlwgAKyz5rjAAAAC4AAAACAAAAAEAAABIAAAADwAAAFAAAAAE
AAAAdAAAAAUAAAB8AAAABgAAAIQAAAALAAAAjAAAABAAAACUAAAADAAAAJwA
AAACAAAA5AQAAB4AAAAZAAAAVW5pdmVyc2l0eSBvZiBQb3J0c21vdXRoAAAA
AAMAAAAAOgAAAwAAABcAAAADAAAABQAAAAsAAAAAAAAACwAAAAAAAAAMEAAA
AgAAAB4AAAABAAAAAAMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA

--mordillo:879418490:877:126:21579--

From owner-parkbench-comm@CS.UTK.EDU Mon Nov 17 08:32:09 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id IAA28026; Mon, 17 Nov 1997 08:32:09 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id HAA07698; Mon, 17 Nov 1997 07:58:13 -0500 (EST)
Received: from post.mail.demon.net (post-20.mail.demon.net [194.217.242.27]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id HAA07665; Mon, 17 Nov 1997 07:57:54 -0500 (EST)
Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net
           id aa2024828; 17 Nov 97 12:43 GMT
Message-ID: <06u4dCAfsDc0Ew8p@minnow.demon.co.uk>
Date: Mon, 17 Nov 1997 12:39:59 +0000
To: parkbench-comm@CS.UTK.EDU
From: Roger Hockney <roger@minnow.demon.co.uk>
Subject: To the PARKBENCH97 BOF
MIME-Version: 1.0
X-Mailer: Turnpike Version 3.03a <kRL7V2isFfDmnKSZb08I5Tyfx$>

                 GREETINGS TO THE PARKBENCH 1997 BOF
                 -----------------------------------
I am not able to attend the Parkbench BOF this year but would like to
make the following input:

Chairman: Please express my apologies for absence to the meeting.


Agenda Item: Low-Level Performance Evaluation tools.
             --------------------------------------
The latest version of the Parkbench Interactive Curve Fitting Tool 
(PICT2) is on my Web page at:

      http://www.minnow.demon.co.uk/pict/source/pict2a.html

I believe that this solves the problem of displaying on different 
sized screens. Please try it and give me feedback (I have had little
so far, so I don't know how worthwhile it is!). 

This plots and allows manual interactive curve fitting of data 
anywhere on the Web in raw-data, Original COMMS1, and New COMMS1 
format. However, it still relies on COMMS1 calculating the least 
squares 2-Para and 3-Point 3-Para fits. 

Agenda Item : Plans for the next Release.
              --------------------------
Just a reminder that New COMMS1, as announced in my email to the 
committee of 16 Feb 1997, was designed as the minimum necessary 
changes to the existing release to solve the problems raised at
the beginning of the year. It involves new versions of 5 routines
and 2 new routines. In addition, the Make files need the 2 new 
routines added where appropriate. We have incorporated these 
changes at Westminster in the existing release without trouble.

I believe that these should be incorporated in the next release.

In summary:

New COMMS1

In directory:

http://www.minnow.demon.co.uk/Pbench/comms1/

The 5 Changed Routines:

(1) File COMMS1_1.F replaces

        ParkBench/Low_Level/comms1/src_mpi/COMMS1.f

(2) File COMMS1_1.INC replaces

        ParkBench/Low_Level/comms1/src_mpi/comms1.inc

(3) File ESTCOM_1.F replaces

        ParkBench/Low_Level/comms1/src_mpi/ESTCOM.f

(4) File LSTSQ_1.F replaces

        ParkBench/lib/Low_Level/LSTSQ.f

(5) File CHECK_1.F replaces

        Parkbench/lib/Low_Level/CHECK.f

The 2 New Routines:

(6) File LINERR_1.F add as

        ParkBench/lib/Low_Level/LINERR.f

(7) File VPOWER_1.F add as

        ParkBench/lib/Low_Level/VPOWER.f
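
The seven file placements listed above could be scripted roughly as
follows. This is only a sketch under stated assumptions: the function
name install_new_comms1 and its two arguments are hypothetical, the
files are assumed to have been downloaded already into one local
directory, and item (5) is assumed to live under the same "ParkBench"
directory as the others. The Make file changes are not covered.

```python
import shutil
from pathlib import Path

# Downloaded file name -> destination inside the release tree, as listed
# in the message above. Item (5) is spelled "Parkbench/..." there; it is
# ASSUMED here to mean the same "ParkBench" directory as the other items.
NEW_COMMS1_FILES = {
    "COMMS1_1.F": "ParkBench/Low_Level/comms1/src_mpi/COMMS1.f",
    "COMMS1_1.INC": "ParkBench/Low_Level/comms1/src_mpi/comms1.inc",
    "ESTCOM_1.F": "ParkBench/Low_Level/comms1/src_mpi/ESTCOM.f",
    "LSTSQ_1.F": "ParkBench/lib/Low_Level/LSTSQ.f",
    "CHECK_1.F": "ParkBench/lib/Low_Level/CHECK.f",
    "LINERR_1.F": "ParkBench/lib/Low_Level/LINERR.f",   # new routine
    "VPOWER_1.F": "ParkBench/lib/Low_Level/VPOWER.f",   # new routine
}


def install_new_comms1(download_dir, release_root):
    """Copy the New COMMS1 files into a ParkBench release tree.

    download_dir: directory holding the seven downloaded files (hypothetical).
    release_root: directory that contains the top-level ParkBench directory.
    Returns the list of destination paths that were written.
    """
    download_dir = Path(download_dir)
    release_root = Path(release_root)
    written = []
    for src_name, rel_dest in NEW_COMMS1_FILES.items():
        src = download_dir / src_name
        if not src.is_file():
            raise FileNotFoundError(f"missing downloaded file: {src}")
        dest = release_root / rel_dest
        # Create the target directory if the tree does not have it yet.
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        written.append(dest)
    return written
```

The two new routines still have to be added to the Make files by hand
where appropriate.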


HAVE A NICE MEETING, and best wishes to you all,

Roger Hockney

-- 
Roger Hockney.  Checkout my new Web page at URL   http://www.minnow.demon.co.uk
University of   and link to my new book: "The Science of Computer Benchmarking"
Westminster UK  suggestions welcome. Know any fish movies or suitable links?

From owner-parkbench-comm@CS.UTK.EDU Mon Dec  1 08:38:55 1997
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id IAA05062; Mon, 1 Dec 1997 08:38:55 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id IAA20432; Mon, 1 Dec 1997 08:03:34 -0500 (EST)
Received: from hermes.lsi.usp.br (hermes.lsi.usp.br [143.107.161.220]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id IAA20425; Mon, 1 Dec 1997 08:03:30 -0500 (EST)
Received: from cali.lsi.usp.br (cali.lsi.usp.br [10.0.161.7]) by hermes.lsi.usp.br (8.8.5/8.7.3) with SMTP id LAA05866; Mon, 1 Dec 1997 11:03:20 -0200 (BDB)
Message-ID: <34830ABD.487C@lsi.usp.br>
Date: Mon, 01 Dec 1997 11:06:37 -0800
From: Martha Torres <mxtd@lsi.usp.br>
Organization: LSI
X-Mailer: Mozilla 3.01Gold (Win95; I)
MIME-Version: 1.0
To: parkbench-comm@CS.UTK.EDU
CC: mxtd@lsi.usp.br
Subject: compiling ParkBench for MPICH
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Sirs
ParkBench Committee

Dear Sirs,
I am a Ph.D. student working with collective communication
operations. In particular, I am interested in quantifying the influence
of collective communication operations on the total execution time
of several MPI programs.

My platform is a cluster of 8 Dual Pentium Pro processors 
interconnected by 100 Mb/s Fast Ethernet.
I use MPICH version 1.1 and the fort77 and cc compilers.

I have downloaded ParkBench.tar from netlib. I followed 
all the instructions, but some programs
did not work:
1. Low_Level/poly1 poly2 rinf1 tick1 tick2
They did not compile. The following error appears:
ParkBench/lib/LINUX/ParkBench_misc.a: No such file or
directory. 
How do I create this library?

2. Kernels/LU_solver QR TRD
They also did not compile. The following errors appear:
ParkBench/lib/LINUX/pblas_subset.a: In function 'pberror_'
undefined reference to 'blacs_gridinfo_'
undefined reference to 'blacs_abort_'

3. Comp_Apps/PSTSWM and Kernels/MATMUL
They compiled but did not run.

Thanks in advance, 

Best Regards
Martha Torres
Laboratorio de Sistema Integraveis
University of Sao Paulo
Sao Paulo - S.P. Brazil


From owner-parkbench-comm@CS.UTK.EDU Wed Jan  7 16:49:19 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id QAA19963; Wed, 7 Jan 1998 16:49:19 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id QAA17461; Wed, 7 Jan 1998 16:30:05 -0500 (EST)
Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id QAA17452; Wed, 7 Jan 1998 16:30:02 -0500 (EST)
Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id PAA16817 for <parkbench-comm@CS.UTK.EDU>; Wed, 7 Jan 1998 15:30:03 -0600 (CST)
Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id PAA27253; Wed, 7 Jan 1998 15:30:00 -0600 (CST)
From: Charles Grassl <cmg@cray.com>
Received: by magnet.cray.com (8.8.0/btd-b3)
          id VAA26077; Wed, 7 Jan 1998 21:29:59 GMT
Message-Id: <199801072129.VAA26077@magnet.cray.com>
Subject: Low Level benchmarks
To: parkbench-comm@CS.UTK.EDU
Date: Wed, 7 Jan 1998 15:29:59 -0600 (CST)
X-Mailer: ELM [version 2.4 PL24-CRI-d]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit


-- 
Charles Grassl

From owner-parkbench-comm@CS.UTK.EDU Wed Jan  7 16:56:40 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id QAA19981; Wed, 7 Jan 1998 16:56:40 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id QAA17784; Wed, 7 Jan 1998 16:36:27 -0500 (EST)
Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id QAA17776; Wed, 7 Jan 1998 16:36:24 -0500 (EST)
Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id PAA17087 for <parkbench-comm@cs.utk.edu>; Wed, 7 Jan 1998 15:36:24 -0600 (CST)
Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id PAA28449 for <parkbench-comm@cs.utk.edu>; Wed, 7 Jan 1998 15:36:22 -0600 (CST)
Received: from magnet by magnet.cray.com (8.8.0/btd-b3) via SMTP
          id VAA26107; Wed, 7 Jan 1998 21:36:21 GMT
Sender: cmg@cray.com
Message-ID: <34B3F553.167E@cray.com>
Date: Wed, 07 Jan 1998 15:36:19 -0600
From: Charles Grassl <cmg@cray.com>
Organization: Cray Research
X-Mailer: Mozilla 3.01SC-SGI (X11; I; IRIX 6.2 IP22)
MIME-Version: 1.0
To: parkbench-comm@CS.UTK.EDU
Subject: Low Level benchmark errors and differences
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

To:      Parkbench Low Level interests
From:    Charles Grassl

Subject: Low Level benchmark errors and differences

Date:    7 January, 1998


We should not produce or publish Parkbench Low level benchmark results
with the current suite of programs because the programs are inaccurate
and unreliable.  I ran the Low Level programs and compared the results
with the same metrics as recorded from other benchmark programs.
The differences range from less than 5% (acceptable) to a factor of 6
times difference, which is unacceptable.

The differences, or "errors", are summarized in the table below.
The recorded differences in results from the Low Level program were
arrived at by comparing the Parkbench program reported metrics with the
same metrics as measured by alternative programs.


       Table.  Differences in Low Level benchmark results
               for two systems.  System A is an Origin 2000.
               System B is a CRAY T3E.

                     System A          System B
                  Rinf  Startup    Rinf   Startup
        -----------------------------------------
        COMMS1    <10%     6x       <5%      6x
        COMMS2      2x     3x       <5%     <5%
        COMMS3     <5%              <5%
        POLY1      <5%    60%        2x     <5%
        POLY2      <5%    60%        2x     <5%
        POLY3       -      -         2x     80x


The Parkbench Low Level programs are occasionally requested for
benchmarking computer systems, but the results are usually rejected
because of their inaccuracy and unreliability.  If not rejected, they
cause confusion and consternation because the results do not agree
with other measurements of the same variables.  I emphasize that this
is not a case of obtaining optimization and favorable results for a
computer system.  The problem is with the inaccuracy and unreliability
of the results.

The Low Level programs measure and report low level parameters.
Therefore their value is in accuracy and utility.  The programs do not
constitute definitions of the reported metrics and hence the results
should correlate with other measurements of the same variables.

The Low Level programs are obsolete and need to be replaced.  I have
written seven simple programs, with MPI and PVM versions, and offer them
as a replacement for the Low Level suite.

I strongly suggest that we delete or withdraw from distribution the
current Low Level suite.

From owner-parkbench-comm@CS.UTK.EDU Thu Jan  8 05:40:28 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id FAA01529; Thu, 8 Jan 1998 05:40:28 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id FAA00442; Thu, 8 Jan 1998 05:20:21 -0500 (EST)
Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id FAA00380; Thu, 8 Jan 1998 05:20:13 -0500 (EST)
Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id LAA28869; Thu, 8 Jan 1998 11:20:05 +0100 (MET)
Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id LAA24864; Thu, 8 Jan 1998 11:18:48 +0100
Date: Thu, 8 Jan 1998 11:18:48 +0100
From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel)
Message-Id: <199801081018.LAA24864@sgi7.ccrl-nece.technopark.gmd.de>
To: parkbench-comm@CS.UTK.EDU
Subject: Low Level benchmark errors and differences
Cc: ritzdorf@ccrl-nece.technopark.gmd.de,
        zimmermann@ccrl-nece.technopark.gmd.de, clantwin@ess.nec.de,
        eckhard@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de,
        tbeckers@ess.nec.de
Reply-To: hempel@ccrl-nece.technopark.gmd.de

To:		Parkbench Low Level interests
From:		Rolf Hempel

Subject:	Low Level benchmark errors and differences,
		Note from Charles Grassl of January 7th

Date:		8 January, 1998


Thank you, Charles, for your note on the Low Level benchmarks. It could
not have come at a better time, because at NEC we just recently ran into
problems with COMMS1.

This code had been specified by a customer as a test case in a current
procurement. When we ran COMMS1 with our current MPI library, the
results for rinfinity and latency were completely wrong. In particular,
the latency values were off by more than a factor of two, when compared
with other ping-pong test programs. The following turned out to be the
main reasons for the errors:

1. The performance model is completely inadequate. A linear dependency
   between time and message length, fitted to the measurements by
   least squares, is bound to fail in the presence of discontinuities
   caused by protocol changes. Most MPI implementations change
   protocols for different message lengths for an overall performance
   optimization.

2. To make things worse, the least-squares fit overweights the data points
   for very long messages, because the differences "model minus
   measurement" are largest there in absolute terms. The fitted line,
   therefore, more or less ignores the short message measurements.
   As a result, the latencies are completely up to chance.

3. The correction for internal measurement overhead (e.g., for
   subroutine calls) is programmed in a sloppy way, to say the least.
   We discovered several subroutine calls which were not
   taken into account, and the overhead is measured with low
   precision. For our implementation, this alone introduced a latency
   error of about 25%.
   
The result in our case was that, instead of the 13.5 usec latency
measured by the MPICH MPPTEST routine, COMMS1 initially reported some
28 usec. My colleague Hubert Ritzdorf then made an interesting
experiment: he removed some optimization from our MPI library for
long messages, thus INCREASING the communication times for messages
longer than 128000 bytes, and not changing anything for shorter
messages. The resulting DROP in latency from 28 to under 22 usec
clearly shows how ridiculous the COMMS1 benchmark is.
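
The effect can be reproduced with a minimal sketch that uses synthetic
timings rather than real measurements. Everything in it is an
assumption made for illustration: the function names, the 13.5 usec
start-up time, the protocol switch at 128000 bytes and the per-byte
costs are hypothetical, and the fit is a plain unweighted least-squares
line t = t0 + n/rinf, not the actual COMMS1 code.

```python
# Synthetic illustration only: all constants below are hypothetical.
T0_USEC = 13.5          # assumed true start-up time (latency), in usec
SHORT_COST = 0.010      # assumed usec per byte below the protocol switch
SWITCH_BYTES = 128000   # assumed message length where the protocol changes


def synthetic_time(nbytes, long_cost):
    """Piecewise-linear message time: one per-byte cost up to the
    protocol switch, a possibly different one beyond it."""
    if nbytes <= SWITCH_BYTES:
        return T0_USEC + SHORT_COST * nbytes
    return (T0_USEC + SHORT_COST * SWITCH_BYTES
            + long_cost * (nbytes - SWITCH_BYTES))


def fit_line(lengths, times):
    """Unweighted least-squares fit of t = t0 + n / rinf.

    Returns (t0, rinf). Residuals are absolute, so the longest
    messages dominate the fit.
    """
    count = len(lengths)
    mean_n = sum(lengths) / count
    mean_t = sum(times) / count
    s_nn = sum((n - mean_n) ** 2 for n in lengths)
    s_nt = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times))
    slope = s_nt / s_nn
    t0 = mean_t - slope * mean_n
    return t0, 1.0 / slope


def fitted_latency(long_cost):
    """Fitted start-up time for message lengths 1, 2, 4, ..., 2**20 bytes."""
    lengths = [2 ** k for k in range(21)]
    times = [synthetic_time(n, long_cost) for n in lengths]
    t0, _rinf = fit_line(lengths, times)
    return t0


if __name__ == "__main__":
    # Long messages optimized (cheaper per byte than short ones).
    print("optimized long messages:  ", fitted_latency(0.005))
    # Optimization removed: long messages cost the same per byte as short ones.
    print("unoptimized long messages:", fitted_latency(SHORT_COST))
```

In this synthetic setting the short-message timings are identical in
both runs, yet the fitted start-up time is well above 13.5 usec when
the long messages are optimized and returns to 13.5 usec when they are
slowed down, which is the same qualitative behaviour as the drop
reported above.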

Thus, I strongly agree with Charles that the COMMS* benchmarks
must be removed from PARKBENCH. They don't help anybody, and they
only cause confusion on the side of customers and frustration on the
side of benchmarkers. Let's get rid of this long-standing nuisance as
quickly as possible.

Best regards,
 Rolf Hempel
------------------------------------------------------------------------
Rolf Hempel      (email: hempel@ccrl-nece.technopark.gmd.de)
Senior Research Staff Member
C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10,
53757 Sankt Augustin, Germany
Tel.: +49 (0) 2241 - 92 52 - 95
Fax:  +49 (0) 2241 - 92 52 - 99

From owner-parkbench-comm@CS.UTK.EDU Thu Jan  8 08:07:54 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id IAA02383; Thu, 8 Jan 1998 08:07:53 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id HAA05392; Thu, 8 Jan 1998 07:50:13 -0500 (EST)
Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id HAA05383; Thu, 8 Jan 1998 07:50:03 -0500 (EST)
Received: from mordillo (p108.nas1.is4.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1)
	id AA03072; Thu, 8 Jan 98 12:48:32 GMT
Date: Thu,  8 Jan 98 12:10:55 GMT
From: Mark Baker  <mab@sis.port.ac.uk>
Subject: Re: Low Level benchmark errors and differences 
To: Charles Grassl  <cmg@cray.com>, parkbench-comm@CS.UTK.EDU
X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc.
X-Priority: 3 (Normal)
References: <34B3F553.167E@cray.com> 
Message-Id: <Chameleon.884263257.mab@mordillo>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

I am in agreement with Charles and Rolf about the low-level codes.

We've known for some time that they (the codes) are less than perfect,
if not in some cases flawed. At the SC'97 Parkbench meeting it was
mooted that Parkbench should concentrate on producing, supporting,
analysing and recording Low-Level codes and results. If this is the
case then we should certainly ensure that we support codes
that are soundly written and produce consistent and reliable results. 

I certainly believe that a set of codes, akin to the low-level ones,
should be part of the Parkbench suite. Maybe this is a good time to
replace the current codes with those that Charles has produced !?

As a side issue, I think we should produce C versions of whatever low-level
codes we produce.

Charles, I'd be interested in your thoughts on the codes that Pallas produce 
- ftp://ftp.pallas.de/pub/PALLAS/PMB/PMB10.tar.gz. These are C benchmark 
codes that run: 

PingPong - like comms1
PingPing - like comms2
Xover
Cshift
Exchange 
Allreduce
Bcast
Barrier - like synch1

Obviously, I wouldn't like to comment on how well written they are or how reliable
the results that they produce are. I'm relatively impressed with them. I also like
the fact they try and produce results for commonly used MPI functions - 
cshift/exchange/etc. I've run the codes on NT boxes and they appear to produce 
results close to what I would expect. 

Regards

Mark



-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285	Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 01/08/98 - Time: 12:10:55
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------


From owner-parkbench-comm@CS.UTK.EDU Mon Jan 12 16:02:28 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id QAA26216; Mon, 12 Jan 1998 16:02:28 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id PAA16631; Mon, 12 Jan 1998 15:38:05 -0500 (EST)
Received: from post.mail.demon.net (post-20.mail.demon.net [194.217.242.27]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id PAA16588; Mon, 12 Jan 1998 15:37:38 -0500 (EST)
Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net
           id aa2012292; 12 Jan 98 17:34 GMT
Message-ID: <X8YQ8DANPlu0Ewpp@minnow.demon.co.uk>
Date: Mon, 12 Jan 1998 17:33:01 +0000
To: hempel@ccrl-nece.technopark.gmd.de
Cc: parkbench-comm@CS.UTK.EDU, ritzdorf@ccrl-nece.technopark.gmd.de,
        zimmermann@ccrl-nece.technopark.gmd.de, clantwin@ess.nec.de,
        eckhard@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de,
        tbeckers@ess.nec.de
From: Roger Hockney <roger@minnow.demon.co.uk>
Subject: Re: Low Level benchmark errors and differences
In-Reply-To: <199801081018.LAA24864@sgi7.ccrl-nece.technopark.gmd.de>
MIME-Version: 1.0
X-Mailer: Turnpike Version 3.03a <kRL7V2isFfDmnKSZb08I5Tyfx$>

To: Rolf, Charles, Mark and others,

From: Roger 

I too am distressed to see the original COMMS1 code (written and tested
for message lengths only up to 10^4) is still being issued by Parkbench
and being used well outside its range of proven validity (message
lengths now typically up to 10^7 or even 10^8).

These problems were pointed out about one year ago by Charles and Ron,
and as a result I worked on the code and issued to the committee a
minimum set of changes to the current release that would solve many of
the problems. These involve replacing five existing routines and adding
two to the existing release. The routines involved have been
downloadable from my Web site since about 12 March 1997 and have been
used successfully at Westminster University in our work. 

The New COMMS1, as I called it, was the subject of two printed reports
to the May 1997 meeting of Parkbench and  further results were shown at
the Sept 1997 meeting. There were also extensive discussions in this
email group during 1997.

Unfortunately my simple fixes were not inserted into the Parkbench
release and as a result we are still getting a bad press from
benchmarkers. After all the effort I put into solving this problem a
year ago, I feel rather let down that my work was never used. If my
changes had been incorporated into the Parkbenchmarks when they were
offered at least as an interim measure, I believe we could have avoided
much of the current bad publicity. 

I emphasise that the New COMMS1 was written as a minimum patch to the
existing release to solve an urgent problem in the simplest way. I am
not against a complete rethink of the low level benchmarks and now that
MPI has become a recognised standard, benchmarks timing the principal
software primitives of MPI would seem to be the most useful. Quite
possibly Charles's or Mucci's codes could be used.

However, I am still firmly convinced of the value of approximate
parametric representation of all the benchmark measurements based on a
simple performance model. Most of the existing low-level benchmarks were
written primarily to determine such parameters and hence include both
raw measurements and least squares curve fitting to obtain the
parameters. I have yet to see data that cannot be satisfactorily fitted
by 2 or 3 parameters, or two sets of 2-paras. And remember that I am
talking here about fitting ALL the measured data by some simple
formulae.    

After the decision of the May 1997 meeting to separate the raw
measurements from the parametric curve fitting, the curve fitting will
eventually become part of the "Parkbench Interactive Curve Fitting Tool"
(PICT). At present this applet can be used to produce a manual curve
fit, but eventually I will put up on my Web site a version in which the
least squares and 3-point buttons are active. But PICT as it is can now
be used manually to see how good or bad the 2-para and 3-para fits are.
Turn your browser to:

       http://www.minnow.demon.co.uk/pict/source/pict2a.html

and insert your raw data. I would be very interested to see what the NEC
data looks like.  

To answer some of Rolf's points: 

Rolf Hempel <hempel@ccrl-nece.technopark.gmd.de> writes
>
>1. The performance model is completely inadequate. A linear dependency
>   between time and message length, fitted to the measurements by
>   least squares, is bound to fail in the presence of discontinuities
>   caused by protocol changes. Most MPI implementations change
>   protocols for different message lengths for an overall performance
>   optimization.
>
Note that the original COMMS1 that you are using allows you to insert
one break point to take account of one major discontinuity. Have you
tried this?

In any case, to make t_0 a good measure of startup it is sensible ALWAYS
to make a breakpoint at say 100 or 1000 Byte, then the short message t_0
should be a good measure of startup. The long message t_0 is then not of
interest and should be ignored. In this way one is using the straight-
line fit over a short range of lengths, and the resulting t_0 should be
a better estimate of latency because it is derived from several
measurements rather than just selecting a single measurement (e.g. the
time for the shortest message) -- surely a better experimental
procedure. I emphasise that this procedure can be used now with the
original COMMS1 to get sensible results.
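
The breakpoint procedure described above can be illustrated with a minimal
sketch. This is not the COMMS1 code: the function names, the 1000 Byte
breakpoint and the timing values are assumptions made up for the example
(lengths in Byte, times in microseconds). The straight line
t = t_0 + n/rinf is fitted only to the messages at or below the
breakpoint, so that t_0 is derived from several short-message measurements.

```python
def fit_line(lengths, times):
    """Ordinary least-squares fit of t = t0 + n / rinf; returns (t0, rinf)."""
    count = len(lengths)
    mean_n = sum(lengths) / count
    mean_t = sum(times) / count
    sxx = sum((n - mean_n) ** 2 for n in lengths)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times))
    slope = sxy / sxx                 # time per Byte
    t0 = mean_t - slope * mean_n      # constant term of the fitted line
    return t0, 1.0 / slope


def fit_short_messages(lengths, times, break_bytes=1000):
    """Fit only the points with length <= break_bytes (hypothetical helper)."""
    short = [(n, t) for n, t in zip(lengths, times) if n <= break_bytes]
    if len(short) < 2:
        raise ValueError("need at least two points below the breakpoint")
    ns, ts = zip(*short)
    return fit_line(ns, ts)


# Made-up data: 20 us startup and 100 Byte/us for short messages, and a
# different protocol (220 us startup, 200 Byte/us) above 1000 Byte.
lengths = [8, 64, 256, 512, 1000, 10000, 100000, 1000000]
times = [20 + n / 100 if n <= 1000 else 220 + n / 200 for n in lengths]

t0_short, rinf_short = fit_short_messages(lengths, times)
t0_all, rinf_all = fit_line(lengths, times)
print(f"short-message fit: t0 = {t0_short:.2f} us, rinf = {rinf_short:.1f} Byte/us")
print(f"single fit over all lengths: t0 = {t0_all:.2f} us")
```

With this synthetic data the short-message fit recovers the startup time
used to generate the short points, while the single fit over all lengths
does not.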

If there are many small discontinuities or changes of protocol then I
expect your data is rather like that shown by Charles this time last year
and used as an example in PICT. In this case the 3-para fit may give
good results for your data as it did for Charles's.

>2. To make things worse, the least square fit overweighs the data points
>   for very long messages, because the differences "model minus
>   measurement" are largest there in absolute terms. The fitted line,
>   therefore, more or less ignores the short message measurements.
>   As a result, the latencies are completely up to chance.
>
This is absolutely true and was discovered to be the problem one year
ago. My solution, used in the New COMMS1, was and is to minimise the sum
of the squares of the relative (rather than absolute) error. If this is
done the values for short messages are not ignored in the way described,
and t_0 is held much closer to the time for the smallest message length.
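
The difference between the two error measures can be sketched as follows.
This is an illustration only, not the LSTSQ routine of New COMMS1: the
function name and the data are assumptions (lengths in Byte, times in
microseconds, with an extra 200 us imposed on long messages to imitate a
protocol change). Minimising the relative error is the same as a weighted
least-squares fit with weights 1/t^2.

```python
def fit_line(lengths, times, relative=False):
    """Fit t = t0 + n / rinf by least squares; returns (t0, rinf).

    relative=False minimises sum (model - t)^2        (absolute error).
    relative=True  minimises sum ((model - t) / t)^2  (relative error),
    which is a weighted fit with weight 1 / t^2 on each point.
    """
    weights = [1.0 / (t * t) if relative else 1.0 for t in times]
    sw = sum(weights)
    sx = sum(w * n for w, n in zip(weights, lengths))
    sy = sum(w * t for w, t in zip(weights, times))
    sxx = sum(w * n * n for w, n in zip(weights, lengths))
    sxy = sum(w * n * t for w, n, t in zip(weights, lengths, times))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)  # time per Byte
    t0 = (sy - slope * sx) / sw
    return t0, 1.0 / slope


# Made-up data: 20 us startup, 100 Byte/us, plus 200 us extra for
# messages longer than 100000 Byte.
lengths = [8 * 8 ** k for k in range(7)]          # 8 Byte ... 2097152 Byte
times = [20 + n / 100 + (200 if n > 100000 else 0) for n in lengths]

t0_abs, rinf_abs = fit_line(lengths, times, relative=False)
t0_rel, rinf_rel = fit_line(lengths, times, relative=True)
print(f"time for shortest message: {times[0]:.2f} us")
print(f"absolute-error fit: t0 = {t0_abs:.2f} us")
print(f"relative-error fit: t0 = {t0_rel:.2f} us")
```

On this synthetic data the absolute-error fit is pulled away from the
short-message times by the two longest messages, whereas the
relative-error fit keeps t_0 close to the time for the shortest message.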

Note also that the 3-parameter fit provided by New COMMS1 can be fitted
exactly to the time for the shortest message, to the bandwidth for the
longest message, and to the bandwidth near the mid point. This is the
so-called 3-point fit, but it does require a third parameter.

Can you please email me the output file for the NEC from the original
COMMS1. I can then put this data through the New COMMS1 and see what two
and three parameter fits are produced.

Otherwise you could update your version of Parkbenchmarks with the 7
subroutines and rerun using New COMMS1. See the instructions at the end
of this email.
 
>28 usec. My colleague Hubert Ritzdorf then made an interesting
>experiment: he removed some optimization from our MPI library for
>long messages, thus INCREASING the communication times for messages
>longer than 128000 bytes, and not changing anything for shorter
>messages. The resulting DROP in latency from 28 to under 22 usec
>clearly shows how ridiculous the COMMS1 benchmark is.
>
Hubert's results are just what one would expect from minimising the
absolute error. I suspect you would not see this effect with New COMMS1
which does not over-emphasise the long message measurements.

Please remember that the t_0 reported by COMMS1 is not a measurement of
the time for any particular message length. It is the constant term in
the fitted curve:

                        t = t_0 + n/rinf

which is an approximation to ALL the measured data.

If you want to know the time, say for the smallest message length, then
that is listed in the table of lengths and times reported in the
benchmark output. If you mean by latency the time for the shortest
message (hopefully zero or 1 Byte) then the COMMS1 measurements of this
are in this table not in t_0.

For those who missed my two earlier emailings on using the New COMMS1, I
copy my earlier email below:

Agenda Item : Plans for the next Release.
              --------------------------
Just a reminder that New COMMS1 as announced in my email to the 
committee of 16 Feb 1997, was designed as the minimum necessary 
changes to the existing release to solve the problems raised at
the beginning of the year. It involves new versions of 5 routines
and 2 new routines. In addition, the Make files need the 2 new 
routines added where appropriate. We have incorporated these 
changes at Westminster in the existing release without trouble.

I believe that these should be incorporated in the next release.

In summary:

New COMMS1

In directory:

http://www.minnow.demon.co.uk/Pbench/comms1/

The 5 Changed Routines:

(1) File COMMS1_1.F replaces the following file in the current release:

        ParkBench/Low_Level/comms1/src_mpi/COMMS1.f

(2) File COMMS1_1.INC replaces

        ParkBench/Low_Level/comms1/src_mpi/comms1.inc

(3) File ESTCOM_1.F replaces

        ParkBench/Low_Level/comms1/src_mpi/ESTCOM.f

(4) File LSTSQ_1.F replaces

        ParkBench/lib/Low_Level/LSTSQ.f

(5) File CHECK_1.F replaces

        Parkbench/lib/Low_Level/CHECK.f

The 2 New Routines:

(6) File LINERR_1.F add as

        ParkBench/lib/Low_Level/LINERR.f

(7) File VPOWER_1.F add as

        ParkBench/lib/Low_Level/VPOWER.f

Best wishes to you all

Roger
-- 
Roger Hockney.  Checkout my new Web page at URL   http://www.minnow.demon.co.uk
University of   and link to my new book: "The Science of Computer Benchmarking"
Westminster UK  suggestions welcome. Know any fish movies or suitable links?

From owner-parkbench-comm@CS.UTK.EDU Tue Jan 13 08:38:07 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id IAA17513; Tue, 13 Jan 1998 08:38:07 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id IAA03191; Tue, 13 Jan 1998 08:20:10 -0500 (EST)
Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id IAA03184; Tue, 13 Jan 1998 08:20:07 -0500 (EST)
Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id OAA04953; Tue, 13 Jan 1998 14:19:47 +0100 (MET)
Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id OAA02202; Tue, 13 Jan 1998 14:18:30 +0100
Date: Tue, 13 Jan 1998 14:18:30 +0100
From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel)
Message-Id: <199801131318.OAA02202@sgi7.ccrl-nece.technopark.gmd.de>
To: roger@minnow.demon.co.uk
Subject: COMMS1 Benchmark
Cc: tbeckers@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de,
        eckhard@ess.nec.de, clantwin@ess.nec.de, parkbench-comm@CS.UTK.EDU
Reply-To: hempel@ccrl-nece.technopark.gmd.de

Dear Roger,

thank you for your note on the COMMS1 benchmark. We have not yet tried
the New COMMS1 code with our MPI library, so I cannot comment on its
accuracy. I just would like to answer some of the issues you raised
in your mail.

Of course we have seen that in COMMS1 you can select a transition point
between a short and a long model. For this choice, however, you have
to be able to change the input data. In our case (a benchmark suite
used in a procurement) our customer had provided the input dataset,
and we were not allowed to change it. So, the only way for us to correct
the results was to tune our MPI library to make it fit to the benchmark
program. I don't think that this is what you had in mind when you
wrote COMMS1.

You didn't comment on the inaccuracies we found in the raw measurements.
We ran several ping-pong benchmarks before, as, for example, the
MPPTEST routine of MPICH, and they consistently give better latencies
for short messages (difference approx. 25%). As I explained in my
previous mail, we found the reason to be an improper correction for
measurement overheads in COMMS1. Thus, the raw data are flawed,
and this cannot be resolved by any parameter fitting. This is also the
reason that I hesitate to send you the raw data reported by COMMS1 on
our machine.

I agree with you that it would be nice to have a few parameters to
characterize the performance of any given system. The values for
"n1/2" and "rinfinity" have been quite successful for vector arithmetic
operations. The situation is, however, much more complicated for
communication operations.

As an example, let's take the famous ping-pong benchmark. We already
discussed the problem of discontinuities caused by protocol changes.
If you want to do a parameter fitting, the only reasonable solution
seems to me that your test program automatically detects such points
and handles the different protocols separately. If you leave the
selection to an input parameter, you will inevitably run into the
problem I discussed above.

Even if you solve this problem, there remain many others. In modern
(i.e. highly optimized) MPI implementations, the performance of a
ping-pong operation crucially depends on the status of the two processes
involved. Is the receiving process already waiting for the message?
In a ping-pong, it usually is. This can make a huge difference!
The performance can also depend on the global number of processes
active in the application. Not only do search lists in communication
progress engines become shorter if there are fewer processes, but some
implementers even went as far as writing special code for the case
where you just have two processes. Ping-pong codes such as COMMS1
almost always just use two communicating processes, so they measure the
best case. Another effect which is too often ignored is that messages
can interfere with each other (both at the hardware and software level)
if they are sent at the same time between different process pairs.
All those effects combined cause a substantial difference between
ping-pong results and measurements in real applications. In this
situation the apparent precision of performance parameters can be
quite misleading.

If I want to judge the quality of an MPI implementation, I don't
trust in best fit parameters so much. For the ping-pong code, I just
look at a graphic representation of time versus message length for
short messages, and another one of bandwidth versus message length for
long messages. This way I can study discontinuities and other minor
effects in detail. And then, take real applications and measure the
communication times there. Then you will often find surprising 
results which you have never seen in a ping-pong benchmark.

Best wishes,
  Rolf
------------------------------------------------------------------------
Rolf Hempel      (email: hempel@ccrl-nece.technopark.gmd.de)
Senior Research Staff Member
C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10,
53757 Sankt Augustin, Germany
Tel.: +49 (0) 2241 - 92 52 - 95
Fax:  +49 (0) 2241 - 92 52 - 99

From owner-parkbench-comm@CS.UTK.EDU Thu Jan 15 14:17:57 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id OAA00690; Thu, 15 Jan 1998 14:17:56 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id NAA23858; Thu, 15 Jan 1998 13:55:08 -0500 (EST)
Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id NAA23830; Thu, 15 Jan 1998 13:54:57 -0500 (EST)
Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id LAA11159 for <parkbench-comm@cs.utk.edu>; Thu, 15 Jan 1998 11:11:42 -0600 (CST)
Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id LAA08650 for <parkbench-comm@cs.utk.edu>; Thu, 15 Jan 1998 11:11:41 -0600 (CST)
From: Charles Grassl <cmg@cray.com>
Received: by magnet.cray.com (8.8.0/btd-b3)
          id RAA07227; Thu, 15 Jan 1998 17:11:40 GMT
Message-Id: <199801151711.RAA07227@magnet.cray.com>
Subject: Low Level Benchmarks
To: parkbench-comm@CS.UTK.EDU
Date: Thu, 15 Jan 1998 11:11:39 -0600 (CST)
X-Mailer: ELM [version 2.4 PL24-CRI-d]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit


To:      Parkbench interests
From:    Charles Grassl
Subject: Low Level benchmarks

Date:    15 January, 1998


Mark, thank you for pointing us to the PMB benchmark.  It is well written
and coded, but has some discrepancies and shortcomings.  My comments
lead to suggestions and recommendation regarding low level communication
benchmarks.

First, in program PMB the PingPong tests are twice as fast (in time)
as the PingPing tests for the corresponding message lengths (as run
on a CRAY T3E).  The calculation of the time and bandwidth is incorrect
by a factor of two in one of the programs.

This error can be fixed by recording, using and reporting the actual
time, amount of data sent and their ratio.  That is, the time should not
be divided by two in order to correct for a round trip.  This recorded
time is for a round trip message, and is not precisely the time for
two messages.  Half the round trip message passing time, as reported in
the PMB tests, is not the time for a single message and should not be
reported as such.  This same erroneous technique is used in the COMMS1
and COMMS2 benchmarks.  (Is Parkbench responsible for propagating
this incorrect methodology?)

In program PMB, the testing procedure performs a "warm up".  This
procedure is a poor testing methodology because it discards important
data.  Testing programs such as this should record all times and calculate
the variance and other statistics in order to perform error analysis.

Program PMB does not measure contention or allow extraction of network
contention data.  Tests "Allreduce" and "Bcast" and several others
stress the inter-PE communication network with multiple messages,
but it is not possible to extract information about the contention from
these tests.  The MPI routines for Allreduce and Bcast have algorithms
which change with respect to the number of PEs and message lengths.  Hence,
without detailed information about the specific algorithms used, we cannot
extract information about network performance or further characterize
the inter-PE network.

Basic measurements must be separated from algorithms.  Tests PingPong,
PingPing, Barrier, Xover, Cshift and Exchange are low level.  Tests
Allreduce and Bcast are algorithms.  The algorithms Allreduce and Bcast
need additional (algorithmic) information in order to be described in
terms of the basic level benchmarks.


With respect to low level testing, the round trip exchange of messages,
as per PingPing and PingPong in PMB or COMMS1 and COMMS2, is not
characteristic of the lowest level of communication.  This pattern
is actually rather rare in programming practice.  It is more common
for tasks to send single messages and/or to receive single messages.
In this scheme, messages do not make a round trip and there is not
necessarily caching or other coherency effects.

The single message passing is a distinctly different case from that
of round trip tests.  We should be worried that the round trip testing
might introduce artifacts not characteristic of actual (low level) usage.
We need a better test of basic bandwidth and latency in order to measure
and characterize message passing performance.


Here are suggestions and requirements, in an outline form, for a low
level benchmark design:



    I. Single and double (bidirectional) messages.

       A. Test single messages, not round trips.
         1. The round trip test is an algorithm and a pattern.  As
            such it should not be used as the basic low level test of
            bandwidth.
         2. Use direct measurements where possible (which is nearly
            always).  For experimental design, the simplest method is
            the most desirable and best.
         3. Do not perform least squares fits A PRIORI.  We know that
            the various message passing mechanisms are not linear or
            analytic because different mechanisms are used for different
            message sizes.  It is not necessarily known beforehand
            where this transition occurs.  Some computer systems have
            more than two regimes and their boundaries are dynamic.
         4. Our discussion of least squares fitting is losing track
            of experimental design versus modeling.  For example, the
            least squares parameter for t_0 from COMMS1 is not a better
            estimate of latency than actual measurements (assuming
            that the timer resolution is adequate).  A "better" way to
            measure latency is to perform additional DIRECT measurements,
            repetitions or otherwise, and hence decrease the statistical
            error.  The fitting as used in the COMMS programs SPREADS
            error.  It does not reduce error and hence it is not a
            good technique for measuring such an important parameter
            as latency.

       B. Do not test zero length messages.  Though valid, zero length
          messages are likely to take special paths through library
          routines.  This special case is not particularly interesting or
          important.
          1. In practice, the most common and important message size is 64
             bits (one word).  The time for this message is the starting
             point for bandwidth characterization.

       D. Record all times and use statistics to characterize the message
          passing time.  That is, do not prime or warm up caches
          or buffers.  Timings for unprimed caches and buffers give
          interesting and important bounds.  These timings are also the
          nearest to typical usage.  
          1. Characterize message rates by a minimum, maximum, average
             and standard deviation.

       E. Test inhomogeneity of the communication network.  The basic
          message test should be performed for all pairs of PEs.
   

   II. Contention.

       A. Measure network contention relative to all PEs sending and/or
          receiving messages.

       B. Do not use high level routines where the algorithm is not known.
          1. With high level algorithms, we cannot deduce which component
             of the timing is attributable to the "operation count"
             and which is attributable to the actual system (hardware)
             performance.


  III. Barrier.

       A. Simple test of barrier time for all numbers of processors.
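
Item I.D of the outline (record all times, no warm-up, report minimum,
maximum, average and standard deviation) can be sketched as below. This
is only an illustration: the function names are hypothetical, and
send_one_word is a local stand-in that does no communication at all; a
real test would call the message passing library at that point. The
sketch reports times rather than rates.

```python
import statistics
import time


def time_all(operation, repetitions):
    """Time every call of operation(); nothing is discarded as warm-up."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Minimum, maximum, average and sample standard deviation."""
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


def send_one_word():
    """Stand-in for sending one 64-bit word (assumption: no real message)."""
    bytes(8)


samples = time_all(send_one_word, repetitions=1000)
stats = summarize(samples)
print("first (unprimed) call: %.3e s" % samples[0])
print("min %.3e  max %.3e  mean %.3e  stdev %.3e"
      % (stats["min"], stats["max"], stats["mean"], stats["stdev"]))
```

Keeping the first sample in the list makes the unprimed timing visible
alongside the statistics instead of hiding it.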




Additionally, the suite should be easy to use.  C and Fortran programs
for direct measurements of message passing times are short and simple.
These simple tests are of order 100 lines of code and, at least in
Fortran 90, can be written in a portable and reliable manner.

The current Parkbench low level suite does not satisfy the above
requirements.  It is inaccurate, as pointed out by previous letters, and
uses questionable techniques and methodologies.  It is also difficult to
use, witness the proliferation of files, patches, directories, libraries
and the complexity and size of the Makefiles.

This Low Level suite is a burden for those who are expecting a tool to
evaluate and investigate computer performance.  The suite is becoming
a liability for our group.  As such, it should be withdrawn from
distribution.

I offer to write, test and submit a new set of programs which satisfy
most of the above requirements.


Charles Grassl
SGI/Cray Research
Eagan, Minnesota  USA

From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 09:12:18 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id JAA11774; Fri, 16 Jan 1998 09:12:18 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id IAA16130; Fri, 16 Jan 1998 08:53:07 -0500 (EST)
Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id IAA16123; Fri, 16 Jan 1998 08:53:06 -0500 (EST)
Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id IAA01963; Fri, 16 Jan 1998 08:52:17 -0500 (EST)
Date: Fri, 16 Jan 1998 08:52:17 -0500 (EST)
From: Pat Worley <worley@haven.EPM.ORNL.GOV>
Message-Id: <199801161352.IAA01963@haven.EPM.ORNL.GOV>
To: parkbench-comm@CS.UTK.EDU
Subject: Re: Low Level Benchmarks
In-Reply-To: Mail from 'Charles Grassl <cmg@cray.com>'
      dated: Thu, 15 Jan 1998 11:11:39 -0600 (CST)
Cc: worley@haven.EPM.ORNL.GOV, ritzdorf@ccrl-nece.technopark.gmd.de,
        zimmermann@ccrl-nece.technopark.gmd.de, clantwin@ess.nec.de,
        eckhard@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de,
        tbeckers@ess.nec.de

I have not been paying close attention to the current Low Level communication
suite discussions, having confidence in the capabilities and resolve of the
current participants, but have decided to muddy the waters with a few personal
observations. 

1) I do not use the Low Level suite in my own performance-related work.
   I find that the interpretation of results is much easier if the
   experiments are designed to answer (my) specific performance questions.
   Producing numbers that are accurate enough and whose experiments are
   well-enough understood to be used to answer arbitrary performance
   questions is much more difficult. 

2) It may be time to revisit the goals of the Low Level suite. There are
   two obvious extremes.

   a) Determine some (hopefully representative) metrics of point-to-point
      communication performance, concentrating on making the measurements
      fair when comparing across platforms, but not requiring that the
      underlying architecture parameters be derivable from these numbers,
      or that they agree exactly with any other group's measurements.
      In this situation, a two (or more) parameter model fit to the data can
      be useful, if only as a shorthand for the raw data, but the model
      should not be expected to explain the data.

   b) Characterize the low level communication performance for each
      platform. Charles Grassl's latest recommendation is a first step in
      that direction. As a personal aside, I attempted such an exercise
      a few years ago (on the T3D, looking at the effect of common usage
      patterns on performance, not just ping-pong between nearest
      neighbors). I quickly became swamped by the amount of data and by the
      number of ways of presenting it (and the work was never written up). I
      realize now that my problem was trying to address too many evaluation
      questions simultaneously. 

      In addition to the large amount of data required, an accurate
      characterization is likely to require more platform-specific
      elements, and will continue to evolve as new machines are added, in
      order to be as fair to the new machines as it is to the old ones.
      (The two parameter models are very accurate for some of the previous
       generation of homogeneous message-passing platforms.)
    

    In case my sympathies are not clear, I prefer to revisit and fix the
    current suite, "dumbing it down", if only in presentation, making it clear
    what it does and does not measure. In my own work, the
    point-to-point measurements are only for establishing a general
    performance baseline. The important measures are the performance
    observed in the kernel and full application codes. The baseline
    measurements are simply to assess the "peak achievable" communication
    performance.

    While a full characterization is an important thing to do, I do not
    believe that this group has the manpower, resources, or staying power to
    do it right. At one time in the past, we proposed to simply be a
    clearinghouse for the best of the performance measurement codes. If
    Charles wants to write and submit such an extensive low level suite, we
    can consider it, but in the meantime we should address the problems in
    the current suite, and not claim more than is appropriate. In particular,
    make sure that the customer does not become concerned that the
    vendor-stated latency and bandwidth do not match the PARKBENCH reported
    values. A discrepancy does not necessarily mean that someone is lying,
    simply that different aspects are being measured. But we should also be
    sure that intermachine comparisons using PARKBENCH measurements are
    valid, otherwise, they serve no purpose.

Pat Worley


PS.  - I may be in the fringe, but all my codes are written using variants
       of SWAP and SENDRECV, and most of the codes I see can be written in
       such a fashion (and could gain something from it). So, ping-pong and
       ping-ping are not irrelevant to me. 

PPS. - Of course the real reason for using ping-pong is the difficulty in
       measuring the time for one-way messaging. I was not aware that this
       was a solved problem, at least at the MPI or PVM level. Perhaps
       system instrumentation can answer it, but I didn't know that
       portable measurement codes could be guaranteed to do so across the
       different platforms.


From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 10:57:55 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id KAA13381; Fri, 16 Jan 1998 10:57:55 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id KAA20483; Fri, 16 Jan 1998 10:38:52 -0500 (EST)
Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id KAA20468; Fri, 16 Jan 1998 10:38:45 -0500 (EST)
Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id QAA09438; Fri, 16 Jan 1998 16:38:41 +0100 (MET)
Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id QAA04930; Fri, 16 Jan 1998 16:37:14 +0100
Date: Fri, 16 Jan 1998 16:37:14 +0100
From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel)
Message-Id: <199801161537.QAA04930@sgi7.ccrl-nece.technopark.gmd.de>
To: parkbench-comm@CS.UTK.EDU
Subject: Re: Low Level Benchmarks
Cc: tbeckers@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de,
        eckhard@ess.nec.de, clantwin@ess.nec.de,
        zimmermann@ccrl-nece.technopark.gmd.de,
        ritzdorf@ccrl-nece.technopark.gmd.de,
        hempel@ccrl-nece.technopark.gmd.de
Reply-To: hempel@ccrl-nece.technopark.gmd.de

I would like to send some remarks to the notes by Charles Grassl and
Pat Worley on the problem of low-level communication benchmarks.

As Pat pointed out, the ping-pong benchmark was invented because
there is generally no global clock by which you could measure the time
for a single message. Everybody knows that this is not a perfect solution,
and in my previous mail I already explained some aspects of why
ping-pong results can differ substantially from times found in real
applications. So, I think we will have to use ping-pong tests in
the future, with the caveat that they only measure a very special case
of message-passing. If Charles knows a way to measure single messages,
I would like to learn about it.
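
To make the methodology concrete, here is a minimal ping-pong sketch. It
is not taken from COMMS*, PMB, or any PARKBENCH code. It assumes that two
threads in one Python process, joined by socket.socketpair(), stand in for
two nodes, and the names recv_exact, echo_side and ping_pong are invented
for illustration. Only the sender's clock is read, which is the whole point
of the technique: the one-way time is estimated as half the round trip.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock (recv may return fewer bytes)."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_side(sock, nbytes, reps):
    """The 'pong' side: receive each message and send it straight back."""
    for _ in range(reps):
        sock.sendall(recv_exact(sock, nbytes))


def ping_pong(nbytes=8, reps=1000):
    """Return the estimated one-way time in seconds (round trip / 2)."""
    a, b = socket.socketpair()
    worker = threading.Thread(target=echo_side, args=(b, nbytes, reps))
    worker.start()
    msg = b"x" * nbytes
    start = time.perf_counter()  # only the 'ping' side's clock is used
    for _ in range(reps):
        a.sendall(msg)
        recv_exact(a, nbytes)
    elapsed = time.perf_counter() - start
    worker.join()
    a.close()
    b.close()
    return elapsed / (2 * reps)


if __name__ == "__main__":
    print("one-way estimate: %.2f us" % (ping_pong() * 1e6))
```

The halving assumes that both directions cost the same and that the
receiver turns the message around immediately, which is exactly the
special case the caveat above refers to.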

In most other points I agree with Charles. I'm strongly convinced that
the COMMS* routines are obsolete and should be replaced with something
reasonable. In particular, the current routines are far too complicated
to use, and give completely meaningless results. Therefore, I think one
should not even try to correct the COMMS* routines, especially as there
are already better alternatives available. One example is the PMB suite
of PALLAS. It is relatively easy to use, but the documentation should
provide more information than the internal calling tree given in the
README file. What is missing is a precise definition of the underlying
measuring methodology.

I strongly prefer the output of timing tables (perhaps translated into
good graphical representations) over crude parametrizations like the
ones in the COMMS* benchmarks. Those can only frustrate the experts
and confuse all other people.

As to the definition of latency, Charles is right in saying that zero
byte messages are dangerous because they often use special algorithms.
The straightforward solution of using 1 byte messages instead is bad
because usually messages are sent as multiples of 4 or 8 bytes, and for
other message lengths some overhead from additional copying or even
subroutine calls may be introduced. Since the lengths of most real
messages are multiples of 4 or 8 bytes, I support Charles' proposal to
measure the time for an 8 byte message and call it the latency.

I think the warm-up phase before the actual benchmarking is important
in order not to smear out initialization overheads over some number of
messages. The time for the first ping-pong (or other operation), 
however, should be measured and compared with the time found for the
following operations.
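
A small sketch of that idea: time the first operation on its own, then
average the operations that follow, and report both. The helper name
time_first_and_rest and the LazyChannel class are hypothetical; LazyChannel
merely imitates a library that does its setup on first use, with a 10 ms
sleep standing in for the initialization cost.

```python
import time


def time_first_and_rest(operation, reps=100):
    """Time the first call separately, then average the remaining calls."""
    start = time.perf_counter()
    operation()
    first = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(reps):
        operation()
    steady = (time.perf_counter() - start) / reps
    return first, steady


class LazyChannel:
    """Hypothetical stand-in for a library that sets up state on first use."""

    def __init__(self):
        self.ready = False

    def send(self):
        if not self.ready:
            time.sleep(0.01)  # pretend connection setup or buffer allocation
            self.ready = True
        # the steady-state work of a real send would go here


if __name__ == "__main__":
    first, steady = time_first_and_rest(LazyChannel().send)
    print("first operation: %.6f s" % first)
    print("steady state:    %.6f s per operation" % steady)
```

Reporting the two numbers side by side keeps the initialization overhead
visible instead of averaging it into the steady-state figure.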

I very much welcome Charles Grassl's kind offer to write a new benchmark
suite. Perhaps there are even other suites available which could also
be candidates for adoption by PARKBENCH. This forum is by now
quite well-known, which gives it considerable responsibility.
PARKBENCH's choice of benchmark programs influences procurements of new
machines world-wide, and the availability of a good set of low level
benchmarks could give PARKBENCH a good reputation. I'm afraid that the
current set of routines has the opposite effect.

- Rolf Hempel
------------------------------------------------------------------------
Rolf Hempel      (email: hempel@ccrl-nece.technopark.gmd.de)
Senior Research Staff Member
C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10,
53757 Sankt Augustin, Germany
Tel.: +49 (0) 2241 - 92 52 - 95
Fax:  +49 (0) 2241 - 92 52 - 99

From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 12:46:04 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id MAA14801; Fri, 16 Jan 1998 12:46:04 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id MAA27007; Fri, 16 Jan 1998 12:29:03 -0500 (EST)
Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id MAA27000; Fri, 16 Jan 1998 12:29:01 -0500 (EST)
Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id MAA02149; Fri, 16 Jan 1998 12:29:01 -0500 (EST)
Date: Fri, 16 Jan 1998 12:29:01 -0500 (EST)
From: Pat Worley <worley@haven.EPM.ORNL.GOV>
Message-Id: <199801161729.MAA02149@haven.EPM.ORNL.GOV>
To: parkbench-comm@CS.UTK.EDU
Subject: Re: Low Level Benchmarks
In-Reply-To: Mail from 'hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel)'
      dated: Fri, 16 Jan 1998 16:37:14 +0100
Cc: worley@haven.EPM.ORNL.GOV

	In most other points I agree with Charles. I'm strongly convinced that
	the COMMS* routines are obsolete and should be replaced with something
	reasonable. 

I have no problem with this. As I indicated, I have no experience with these.

	What is missing is a precise definition of the underlying measuring
        methodology. 

Perhaps this is the point that I was trying to make. Not only must the
codes be easy to use, but the results should be easy to interpret.
Every code should have a simple description of what it is measuring, what
the data can be used for (and what it shouldn't be used for), and how to use
the data. 

PARKBENCH needs to provide guidance on what data to collect, not just
carefully crafted benchmark codes. And we need to describe clearly what 
low level communication tests are good for. For example,
I have problems with low level contention tests. Understanding hotspots
is an interesting exercise, but the connection to "real" codes
is more subtle. Do we stress test, look at contention for given
algorithms/global operators (and which algorithms), use some standard
workload characterization as the background job, ...? For any given
performance question, what should be used may be clear, but it is difficult
to do this a priori. A simultaneous send/receive stress test may very well be
something interesting to present, but we also need to be able to explain
why (because it is typical in synchronous global communication operations?).

In summary, I would like to see a prioritized list of what low level
information is worth collecting, and why. We can then use this to choose or
generate codes to do the testing. I apologize for being lazy. This may have
already been laid out in the original ParkBench document, but I never worried
about the low level tests before and don't have a copy of the document in
front of me. 

Pat Worley





From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 13:45:53 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id NAA15447; Fri, 16 Jan 1998 13:45:52 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id NAA29375; Fri, 16 Jan 1998 13:15:58 -0500 (EST)
Received: from c3serve.c3.lanl.gov (root@c3serve-f0.c3.lanl.gov [128.165.20.100]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id NAA29368; Fri, 16 Jan 1998 13:15:55 -0500 (EST)
Received: from risc.c3.lanl.gov (risc.c3.lanl.gov [128.165.21.76]) by c3serve.c3.lanl.gov (8.8.5/1995112301) with ESMTP id LAA04436 for <parkbench-comm@cs.utk.edu>; Fri, 16 Jan 1998 11:16:08 -0700 (MST)
Received: from localhost (hoisie@localhost) by risc.c3.lanl.gov (950413.SGI.8.6.12/c93112801) with SMTP id LAA13115 for <parkbench-comm@CS.UTK.EDU>; Fri, 16 Jan 1998 11:14:30 -0700
Date: Fri, 16 Jan 1998 11:14:30 -0700 (MST)
From: Adolfy Hoisie <hoisie@c3serve.c3.lanl.gov>
To: parkbench-comm@CS.UTK.EDU
Subject: Low Level Benchmarks
Message-ID: <Pine.SGI.3.95.980116105256.11971B-100000@risc.c3.lanl.gov>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


Just to amplify some of the numerous excellent points made by Pat and
Charles and Rolf, the emphasis of the Parkbench group, as I see it, should
be on defining the methodology for benchmarking at this level. A string of
numbers says very little about machine performance in the absence of a solid,
scientifically defined underlying base for the programs utilized for
benchmarking.
COMMS is obsolete in methodology, coding, and the generation and analysis of
results. I used it quite some time ago, only to reach the
conclusions above. Instead, I always chose to write my own benchmarking
programs in order to extract meaningful data for the applications I was
working on.
I would like to see the debate heading towards what it is that we need to
measure in a suite of general use that is applicable to machines of
interest. For example, very little or no attention is being paid to
benchmarking DSM architectures, where quite a few architectural parameters
become harder to define and subtler to interpret, including, but not
limited to, message passing characterization on these architectures.

Adolfy    

======================================================================
Adolfy Hoisie                  \        Los Alamos National Laboratory
                                \Scientific Computing, CIC-19, MS B256
hoisie@lanl.gov                  \           Los Alamos, NM  87545 USA
                                  \                Phone: 505-667-5216
http://www.c3.lanl.gov/~hoisie/hoisie.html           FAX: 505-667-1126


From owner-parkbench-comm@CS.UTK.EDU Sun Jan 18 07:38:42 1998
Return-Path: <owner-parkbench-comm@CS.UTK.EDU>
Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib)
	id HAA20627; Sun, 18 Jan 1998 07:38:42 -0500
Received: from localhost (root@localhost) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id HAA21662; Sun, 18 Jan 1998 07:28:22 -0500 (EST)
Received: from post.mail.demon.net (post-10.mail.demon.net [194.217.242.154]) 
        by CS.UTK.EDU with SMTP (cf v2.9s-UTK)
	id HAA21655; Sun, 18 Jan 1998 07:28:20 -0500 (EST)
Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net
           id aa1002926; 18 Jan 98 12:25 GMT
Message-ID: <IWo32HA0Rfw0Ew5i@minnow.demon.co.uk>
Date: Sun, 18 Jan 1998 12:24:20 +0000
To: parkbench-comm@CS.UTK.EDU
From: Roger Hockney <roger@minnow.demon.co.uk>
Subject: Low Level Benchmarks
MIME-Version: 1.0
X-Mailer: Turnpike Version 3.03a <kRL7V2isFfDmnKSZb08I5Tyfx$>


To: the low-level discussion group

From: Roger

I comment below on recent emailings on this topic which arrived on
16 Jan 1998.

Pat Worley writes:

>2) It may be time to revisit the goals of the Low Level suite. There
>   are two obvious extremes.
>
>   a) Determine some (hopefully representative) metrics of
>      point-to-point communication performance, concentrating on
>      making the measurements
>      SNIP
>      In this situation, a two (or more) parameter model fit to the 
>      data can be useful, if only as a shorthand for the raw data, 
>      but the model should not be expected to explain the data.

This is of course what COMMS1 sets out to do. But please when judging 
this point, use the New COMMS1 revised code that DOES give much more 
sensible answers in difficult cases. Please do not base your opinions on 
results from the Original COMMS1 code that is still unfortunately being 
issued by Parkbench. Instructions for getting the new code were given in 
my email to this group on 12 Jan 1998. 

>     (The two parameter models are very accurate for some of the 
>      previous generation of homogeneous message-passing platforms.)

It is nice to have confirmation of this from an independent source.
In addition, the 3-parameter model is available in New COMMS1 for cases 
where the 2-parameter model fails.
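
For readers who have not seen a two-parameter fit written out, here is a
minimal sketch. It assumes the usual linear timing model
t(n) = t0 + n / r_inf, with startup time t0 and asymptotic bandwidth r_inf,
from which n_half = t0 * r_inf is the message length that reaches half of
r_inf. The fit is plain unweighted least squares and is not claimed to be
the procedure used inside either version of COMMS1; the function name and
the data are made up for illustration.

```python
def fit_two_parameter(sizes, times):
    """Least-squares fit of t(n) = t0 + n / r_inf.

    Returns (t0, r_inf, n_half), where n_half = t0 * r_inf is the message
    length at which half of the asymptotic bandwidth is achieved.
    """
    m = len(sizes)
    mean_n = sum(sizes) / m
    mean_t = sum(times) / m
    sxx = sum((n - mean_n) ** 2 for n in sizes)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(sizes, times))
    slope = sxy / sxx              # seconds per byte, i.e. 1 / r_inf
    t0 = mean_t - slope * mean_n   # intercept: startup time in seconds
    r_inf = 1.0 / slope            # bytes per second
    return t0, r_inf, t0 * r_inf


if __name__ == "__main__":
    # Synthetic data, not measurements: 50 us startup, 100 MB/s asymptote.
    sizes = [8, 64, 512, 4096, 32768, 262144]
    times = [50e-6 + n / 100e6 for n in sizes]
    t0, r_inf, n_half = fit_two_parameter(sizes, times)
    print("t0     = %.1f us" % (t0 * 1e6))
    print("r_inf  = %.1f MB/s" % (r_inf / 1e6))
    print("n_half = %.0f bytes" % n_half)
```

Because the residuals are in absolute seconds, an unweighted fit like this
one is dominated by the long messages, so the short-message end of the
table is the first place to compare against the raw timings.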

>    In case my sympathies are not clear, I prefer to revisit and fix 
>    the current suite, "dumbing it down", if only in presentation, 
>    making it clear wh