## NA Digest Sunday, August 18, 1991 Volume 91 : Issue 33

Today's Editor: Cleve Moler

Today's Topics:

### Submissions for NA Digest:

Mail to na.digest@na-net.ornl.gov.

Mail to na.help@na-net.ornl.gov.

-------------------------------------------------------

Date: Fri, 16 Aug 91 10:17:21 -0400

Douglas B. Meade, formerly affiliated with Purdue University, is now
located at the University of South Carolina. His new address is:

Department of Mathematics
University of South Carolina
Columbia, SC 29208-0001

Phone: (803) 777-6183
FAX: (803) 777-3783

------------------------------

From: Tim Davis <davis@nautilus.cis.ufl.edu>
Date: Fri, 16 Aug 91 10:49:22 -0400
Subject: Clarification of Florida Research Position

The position that was mentioned in two weeks ago in vol. 91, issue 31,
of the NA Digest is limited to someone who would pursue their Master's
degree and/or Ph.D. degree (preferably the latter) in the Computer and
Information Science Department at the University of Florida.

Tim Davis

------------------------------

From: Joe Grcar <sepp@snll-arpagw.llnl.gov>
Date: Wed, 14 Aug 91 20:03:13 PDT
Subject: IEEE Arithmetic Stinks

I think the IEEE arithmetic standard is bad. I realize this shot
across the bow is nothing compared to the big bore delivery of
those who devised the standard, but I'm shooting anyway. I'm
tired of people making simple things complicated (and writing a

(1) The standard makes simple things complicated.

The first rule of engineering is: make it simple. The second
rule is: keep it simple. In contrast, the SUN OS documentation
for IEEE arithmetic is one hundred and sixty pages long.

(2) The standard makes precision indefinite.

When magnitudes become too small, the exponent sticks and the
representation degenerates to fixed point. When magnitudes
become too large, the numbers become "infinity."

(3) The standard makes error analysis impossible.

There is no machine epsilon. Fixed point arithmetic has no
error for addition and subtraction, but when fixed and floating
point numbers mix, multiplication and division can have large
relative error.

(4) The standard doesn't prevent failure, it only hides failure.

The standard allows arithmetic with fixed point, infinite and
"not a number" data. Programs run just fine with garbage in
and garbage out.

(5) The standard makes debugging difficult.

The standard allows programs to run with garbage for data.
These are the hardest programs to debug.

Here's an example of what can go wrong. An engineer installed a
large simulation program on his Sparc station. He got the wrong
versions of some libraries, so the data structures were
inconsistent. The Sparc's printed output consisted of "infinity"
and "not a number." The engineer said: "this program must be bad;
it won't work for the Sparc's IEEE arithmetic; it only works for
the Cray YMP's bad arithmetic." I had to laugh when I saw the
output and realized what had happened. According to the IEEE
standard, infinity is a perfectly good number---it just has no
precision.

the habit of writing interrupt handlers. The people who wrote the
IEEE standard have done something amazing. They've convinced billion
dollar companies to design special hardware just so expensive
workstations can have "not a number" for output.

In this example the cost of the IEEE's meddling was the wasted time
spent finding the hidden source of failure. In the old days, the
program would just crash, which is a reliable error detection
system. But what about the future? What will Captain Kirk do
when the navigational computer tells him: not a number? ("Lock
the phasers on Evans Hall, Mr. Chekov!")

Regards, Joe Grcar

------------------------------

From: SIAM Publications Department <SIAMPUBS@WILMA.WHARTON.UPENN.EDU>
Date: Tue, 13 Aug 91 11:54 EDT
Subject: Book on Total Least Squares

RECENTLY PUBLISHED AT SIAM

THE TOTAL LEAST SQUARES PROBLEM
COMPUTATIONAL ASPECTS AND ANALYSIS
By: Sabine Van Huffel and Joos Vandewalle
Frontiers in Applied Mathematics 9

"...TLS (Total Least Squares) represents a technique which synthesizes
statistical and numerical methodologies for solving problems arising in
many applications areas. The authors of this monograph have been leaders
in showing how to use TLS for solving a variety of problems, especially
those arising in a signal processing context. They give an elegant
presentation of the various aspects of the TLS problem. Their survey
encompasses the many elements required to understand the problem. It is
a pleasure to read such a clear account which is presented using standard
mathematical ideas and nomenclature."
- Gene H. Golub, Department of Computer Science, Stanford University

This is the first book published to be devoted entirely to total least
squares. The authors give a unified presentation of the TLS problem--
a description of its basic principle, the various algebraic, statistical,
and sensitivity properties of the problem are discussed, and
generalizations are presented. Applications are surveyed to facilitate
uses in an even wider range of applications. Whenever possible, comparison
is made with the well-known least squares methods.

Contents

Introduction; Basic Principles of the Total Least Squares Problem;
Extensions of the Basic Total Least Squares Problem; Direct Speed
Improvement of the Total Least Squares Computations; Iterative Speed
Improvement for Solving Slowly Varying Total Least Squares Problems;
Algebraic Connections Between Total Least Squares and Least Squares Problems;
Sensitivity Analysis of Total Least Squares and Least Squares Problems in
the Presence of Errors in all Data; Statistical Properties of the Total
Least Squares Problem; Algebraic Connections Between Total Least Squares
Estimation and Classical Linear Regression in Multicollinearity Problems;
Conclusions.

3600 University City Science Center, Philadelphia, PA 19104-2688; telephone:
215-382-9800; fax: 215-386-7999; e-mail: siam@wharton.upenn.edu.

------------------------------

From: Mike Leuze <leuze@aldebaran.EPM.ORNL.GOV>
Date: Tue, 13 Aug 91 10:17:36 EDT
Subject: Tenth Parallel Circus at Oak Ridge

FIRST ANNOUNCEMENT
of the
Tenth Parallel Circus

Location: Oak Ridge, Tennessee
Dates: October 25-26, 1991

Continuing the tradition that began at Yale in 1986, the Mathematical
Sciences Section and the Engineering Physics and Mathematics Division
at the Oak Ridge National Laboratory, Oak Ridge, Tennessee will be
hosting the Tenth Parallel Circus in Oak Ridge on Friday and Saturday,
October 25-26, 1991.

The Parallel Circus is an informal meeting which emphasizes parallel
algorithms for scientific computing.

This is the first time that the Circus will be held in the
southeastern part of the USA, and we hope to have many attendees from
the southeastern states, as well as from other parts of the USA and
other countries.

The circus is unique in that it is very informal, and thus allows
attendees to talk about the very latest results as well as interesting
work in progress. In previous meetings there was a very healthy mix
of industrial and academic participants, and there were lots of infor-
mal discussions.

Transportation

We have arranged special conference air fares with Delta Airlines. To
get the special rate and for details, call Delta or have your Travel
Agent call toll free at 1-800-221-1212 (8:00am - 11:00pm eastern time)
and ask for the Special Meetings Network. Please refer to file refer-
ence number J28116.

Accommodation

We have made arrangements for a block of discount rooms at the Comfort
Inn, Oak Ridge, for October 24, 25 and 26. The price per night is
$46.12 for a single room or$53.12 for a double room. To get the
discount rate mention "Parallel Circus" when you call for reserva-
tions.

Presentations

Each presentation will be about 30 minutes long -- 25 minutes plus 5
minutes for questions and discussions. Their actual lengths will
depend on the number of participants that wish to speak.

The circus will begin on Friday morning. Although there is no
prescribed program, we will probably end by early Saturday afternoon.
Participants who give a talk and leave are generally regarded as
anti-social so you should plan to attend all of the talks.

Organizers: Gene Golub, Michael Leuze, Esmond Ng

Ms. Joyce Nave
Mathematical Sciences Section
Oak Ridge National Laboratory
P.O. Box 2008, Bldg. 6012
Oak Ridge, Tennessee 37831-6367

Phone: (615) 576-4720
Fax: (615) 574-0680

e-mail: workshop@msr.epm.ornl.gov

------------------------------

From: Will Sawyer <sawyer@bernina.ethz.ch>
Date: Sat, 17 Aug 91 2:17:01 MET DST
Subject: Ninth Parallel Circus Summary

Better late but before the Tenth Parallel Circus...
--Will Sawyer

RECAP OF THE NINTH PARALLEL CIRCUS AT UCSB, MARCH 22-23, 1991

What is the latest word on parallel computing in the world of numerical
analysis? The expression "massive parallelization" was the main topic at the
latest Parallel Circus at the UCSB campus on 22 - 23rd March 1991.

The talks were various, casual, some provocative and all too short.
Indeed the two days offered only the opportunity to get acquainted and
for the speakers to give a short summary of their research on parallel
algorithms and issues.

The most provocative talk was given by Alan Karp who turned off the overhead
projectors and stated that, instead of presenting research, he wanted to
start an argument. Alan (IBM Palo Alto), who is on the committee for
proposing IBM's long term strategy for parallel processor software,
succeeded brilliantly by recommending that IBM support only the FX90
standard as parallel language of the future. In particular, he felt that
nearly every problem can be broken down into FX90 structure. What sort of
parallel problems, he queried, could not ultimately be described by its
constructs? The replies were numerous: chaotic iterations, monte-carlo
simulations and hierarchical methods, for example. Undaunted, Alan suggested
ways of bypassing the apparent obstacles.

The majority of the audience remained unconvinced by Alan's arguments. In a
spur-of-the-moment poll, most people expressed the feeling that additional
MIMD languages will be necessary to support the state-of-the-art and future
architectures.

Hardly less interesting was David Silvester's (visiting professor at Stanford's
Institute for Scientific Computing) talk on fast parallel finite element
methods. He spoke of a 160 Mflop code on a 4K processor CM1 -- and not just
the matrix multiplies but the entire code! -- written by students in his
course. They had never even heard of finite elements before the start of
the course... well, barely. How was such performance achieved? Firstly by
sticking to the simplest elements (rectangles) and secondly by careful node
numbering to avoid recurrences.

Everyone was aware of the wonders of cyclic reduction for tridiagonal
matrices. Izzy Nelken (University of Toronto) has applied the concept
to dense matrices. The resulting algorithm is similar (but not identical
to) that resulting from the Schur Complement.

Izzy reported that the complexity was identical to that of Gaussian
elimination and to an equal degree parallelizable. The error was generally
considerable smaller. On the negative side, he conceeded, the optimized
LAPACK routines were always quicker.

Andy Cleary of Sandia National Labs, reported on his sparse Cholesky
decomposition algorithm on a Connection Machine (CM). More precisely,

T
A = L L where A is large, sparse and reordered.

He pointed out that the CM was not conceived as a number cruncher. It was
envisioned for graphical manipulations; the needs of the numerical community
have changed Thinking Machine's thoughts, though. Groups of 32 single-bit
processors form so-called "sprint nodes" for floating-point computations.
The standard algorithm --- Gilbert and Schreiber's Sparse Cholesky ---
is less than optimal: it requires routers to distribute the frontal matrices,
the processors drop out after the system shrinks and there is no use of
symmetry of fronts.

Assistance comes from Kratzer's QR algorithm, which was altered for the
Connection Machine. The CM is configured as a "data-flow" array of sprint
nodes. Columns, not rows are mapped to the nodes. By utilizing communication
in two dimensions and using optimized Sparsepak routines to do the basic
computations he has improved the performance on the CM by an order of
magnitude over Gilbert and Schreiber's algorithm.

UCLA's Tony Chan started his talk with a general philosophical point of
view: the tough problems today are tightly-coupled problems. We are
confronted with a trade off: we try to split up the problem at the risk
of losing its relationship to the original problem.

Elliptic problems illustrate this trade-off. If one marches through the
grid over the multidimensional region in one direction, there is not
enough calculation to keep all of the parallel machine's processors going.
Red-Black ordering results in better parallelization but at a cost of global
accuracy. The multi-front method of van der Vorst, Meurant and Chan--Goovaerts
may be useful on a Cray X/MP with four processors, but not on a Connection
Machine with 64K processors. Finally, Tony spoke about his research in
circulant preconditioners which improve the condition numbers of
ill-conditioned matrices at a minimum of cost.

Myung Kim from UCSB described improvements on triangular linear systems'
solvers on a Hypercube. Triangular solvers, she pointed out, tend to leave
processors unoccupied with time. The trick to improving the backwards
insertion algorithm, based on the work of Coleman, Heath and Li, et al,
is to chop up and reform the matrix so as to keep as many processors busy
as possible. By increasingly clever partitioning of the matrix, efficiencies
of up to 80\% (as compared to 33\% for the naive backward insertion) may be
obtained.

Not only the numerical analysts were represented at the conference.
Mike Warren from Los Alamos National Labs spoke about N-body simulations
on parallel computers. The problem seems trivial: n bodies interact,
e.g. by way of gravitational forces.

For n approximately 10^4 this problem can be dealt with on a Cray. But already
at n approx. 10^5 the classical approach to the problem proves intractable
due to the enormous number of interactions. Clearly, help is needed.
The innovation from Barnes and Hut is the "hierarchical decomposition."
Since interactions of nearer bodies are much more significant than those with
distant ones, local clusters are isolated into groups. The local interactions
can be performed well in parallel, particularly when the groups contain
roughly equal number of points. The major difficulty is subsequent
consideration of the interactions between the groups. A fair estimation
of the error can be found by comparing the results from different
decompositions with that of the "exact" solution from the naive algorithm.

John De Pillis of the University of California at Riverside was
"convinced" to give a talk with no prior preparation. He discussed the
differential equation studied by Chin and Manteuffel, (excuse the LaTeX
notation)

-\varepsilon \Delta u + a(x,y)u_x + b(x,y)u_y + c(x,y)u = f \nonumber

This can be solved by generating two interdependent sequences, one which
tends to zero and the other which tends to the solution, namely

v ====> 0
k

u ====> u where Au = f
k

These result in a "race" between the two sequences, an ideal application
for a dual-processor machine. The relevant question for further research
is indeed, whether more than two competing sequences can be found for this
or other problems.

David Bacon from IBM's TJ Watson Research Center introduced the idea
of "Optimistic Parallelization." If there is a fast, but not necessarily
accurate way to calculate a needed value along with a sure-fire but slow
way, one can attack a problem in the following way:

y = f (x);
Fast

cont || if error(x,y) then
abort "cont";
y = f (x);
Slow
"cont";
fi;

He solicited the audience for suggestions about how this method could be
applied to known problems. While no one had a concrete suggestion, the comment
was made that sometimes the answer from the fast method can be used in the slow
method, for example in incomplete decompositions.

Last, but not least, the computer scientists had a great deal to say about the
newest in parallel computing. Seth Goldstein of U.C. Berkeley discussed
the advent of parallel debuggers for MIMD architectures. The non-deterministic
nature of programs running in parallel arises from external events (interrupts)
and shared-memory access. Value logging, that is code to copy potentially
shared read-data to a log file, can give valuable information for a later
debugging phase. Unfortunately, it makes the code slow and creates large
log files.

The goal of Seth's research is to provide the opportunity for deterministic
replay during the debugging phase, create the smallest possible logs and
is through event logging, namely logging cache traffic, which is deterministic
at the cost of running only one process at a time.

As with any good conference, the social events did not fall short. The
banquet at the "China House" in Goleta provided good food and enabled
good conversation, not only about numerical analysis.

The NEXT CIRCUS will take place on Oct 25, 26 in Oak Ridge, Tenn.

-- Will Sawyer with thanks to Gene Golub

------------------------------

From: SIAM Publications Department <SIAMPUBS@WILMA.WHARTON.UPENN.EDU>
Date: Wed, 14 Aug 91 17:23 EDT
Subject: Contents: SIAM J. Optimization

SIAM Journal on Optimization

November 1991 Volume 1, Number 4

CONTENTS

An Auction Algorithm for Shortest Paths
Dimitri Bertsekas

Direct Search Methods on Parallel Machines
J. E. Dennis, Jr. and Virginia Torczon

On the Impact of Automatic Differentiation on the Relative Performance
of Parallel Truncated Newton and Variable Metric Algorithms
L. C. W. Dixon

Parallel Constraint Distribution
M. C. Ferris and O. L. Mangasarian

Parallel Solution of Large-Scale Block-Diagonal Concave Minimization Problems
J. H. Glick, R. S. Maier, and J. B. Rosen

Parallel Genetic Algorithms Applied to the Traveling Salesman Problem
Prasanna Jog, Jung Y. Suh, and Dirk Van Gucht

A General-Purpose Parallel Algorithm for Unconstrained Optimization
Stephen G. Nash and Ariela Sofer

Acceleration and Parallelization of the Path-Following Method for a
T. Nesterov and A. Nemirovsky

James M. Ortega

An Interior Point Method for Block Angular Optimization
Gary L. Schultz and Robert R. Meyer

On the Rate of Convergence of a Partially Asynchronous Gradient Projection
Algorithm
Paul Tseng

Partitioned Dynamic Programming for Optimal Control
Stephen J. Wright

On the Fine-Grain Decomposition of Multicommodity Transportation Problems
Stavros A. Zenios

------------------------------

From: SIAM Publications Department <SIAMPUBS@WILMA.WHARTON.UPENN.EDU>
Date: Fri, 16 Aug 91 10:30 EDT
Subject: Contents: SIAM J. Computing

SIAM JOURNAL ON COMPUTING
DECEMBER 1991, Volume 20, Number 6

CREW PRAMs and Decision Trees
Noam Nisan

On the Complexity of String Matching: Lower Bounds
Zvi Galil and Raffaele Giancarlo

Locality in Distributed Graph Algorithms
Nathan Linial

Self-P-Printability and Polynomial Time Turing Equivalence to a Tally Set
Roy S. Rubinstein

New Upper Bounds in Klee's Measure Problem
Mark H. Overmars and Chee-Keng Yap

An Optimal Randomized Parallel Algorithm for Finding Connected Components
in a Graph
Hillel Gazit

Asymptotically Fast Triangularization of Matrices over Rings
James L. Hafner and Kevin S. McCurley

Noninteractive Zero Knowledge
Manuel Blum, Alfredo De Santis, Silvio Micali, and Guiseppe Persiano

Elimination of Infrequent Variables Improves Average Case Performance of
Satisfiability Algorithms
John Franco

On Sets with Efficient Implicit Membership Tests
Lane A. Hemachandra and Albrecht Hoene

Parallel Tree Contraction, Part II: Further Applications
Gary L. Miller and John H. Reif

An Almost Linear-Time Algorithm for the Dense Subset-Sum Problem
Zvi Galil and Oded Margalit

------------------------------

End of NA Digest

**************************
-------