## NA Digest Saturday, April 27, 1991 Volume 91 : Issue 17

Today's Editor: Cleve Moler

Today's Topics:

-------------------------------------------------------

From: Jack Dongarra <dongarra@cs.utk.edu>
Date: Sat, 27 Apr 91 12:20:40 -0400
Subject: Netlib/NA-Net Machine Will Be Down a Few Days

The Mathematical Sciences Section at Oak Ridge National Laboratory
is moving to a new building next week. As a result, the computer
running netlib and na-net at ORNL will be down starting Wednesday,
May 1, 1991, for a few days.
We are automatically rerouting mail for netlib@ornl.gov to
netlib@research.att.com, so netlib service will not be interrupted.
However, the na-net will be off the air until the move has been
completed. Mail sent to na.<lastname>@na-net.ornl.gov will be queued
until the machine comes back up. We hope to have the machine back
by Friday, May 3.

Jack

------------------------------

From: Glenn McKee <mckee_g@vaxc.stevens-tech.edu>
Date: Tue, 23 Apr 1991 10:30 EST
Subject: Stone's Strongly Implicit Procedure

Dear colleagues,

I am interested in speeding up a PDE algorithm that involves
solving a large, sparse matrix system. I understand that
a useful algorithm is a modified version of Stone's Strongly
Implicit Proceedure.

Our library lacks references to the this algorithm, and I was
C or FORTRAN and would consider sending me a copy.

Sincerely,

Glenn McKee
E-mail: MCKEE_G@VAXC.STEVENS-TECH.EDU

------------------------------

From: R. Kaasschieter <wsanrk@win.tue.nl>
Date: Tue, 23 Apr 91 16:44:39 +0200
Subject: Solving a Nonlinear Initial BVP

Dear colleagues,

It happens that we have to solve for our department of chemistry
the following initial boundary value problem on a three-dimensional
domain:

du/dt = div ( D(u) grad u ) in the domain,
- n . ( D(u) grad u ) = f(u) on the boundary,
u = 1 for t = 0.

D is uniformly bounded from below by a positive constant and
f(u) >= 0 for all u.
A typical example for D is D(u) = exp(-u). Since the solution is known
to be positive, D(u) > 1.

We have planned to use finite elements for the space discretisation.
Our main problem is a sensible choice for the time discretisation.

Do you have any experience in solving this problem or related problems,
or do you know relevant literature?

Rik Kaasschieter and Bob Mattheij

Dr. E.F. Kaasschieter
Department of Mathematics and Computing Science
Eindhoven University of Technology
P.O. Box 513
5600 MB Eindhoven
The Netherlands
E-mail: wsanrk@win.tue.nl

------------------------------

From: Renato De Leone <deleone@cs.wisc.edu>
Date: Thu, 25 Apr 1991 07:33:03 CDT
Subject: References Sought for Parallel SOR

Hello na-netters,
I am looking for references on parallel Successive Overrelaxation
algorithms for SIMD machine. In particular for solving system of
linear equations with specially structured (5-diagonals) matrices.
Any ideas or pointers to references would be greatly appreciated.

Thanks a lot
Renato

Renato De Leone
Computer Sciences Dept.
1210 W. Dayton Madison, WI 53705
deleone@cs.wisc.edu
(608) 263-2677

------------------------------

From: Steven Lee <slee@csrd.uiuc.edu>
Date: Sat, 27 Apr 91 12:33:56 CDT
Subject: 2nd Annual Midwest NA Day

THIRD ANNOUNCEMENT
for the
2nd Annual Midwest NA Day

FINAL CALL FOR PRESENTERS

Location: Champaign-Urbana, IL
Digital Computer Laboratory
1304 W. Springfield Ave, Urbana

Date: Saturday, May 11 1991
8:30AM - 4:30PM

The 2nd Annual Midwest NA Day is scheduled for Saturday, May 11 1991
on the University of Illinois, Urbana-Champaign campus. The first NA
Day was held in April of last year to take special note of the
retirement of Bill Gear from the University of Illinois. This year,
the conference coincides with the May graduation ceremonies in which
Gene Golub is to receive an honorary doctorate from the University.

This is an informal conference with an emphasis on scheduling a
variety of talks from speakers with interests in numerical analysis
and scientific computing.

Those who wish to be included on the electronic mailing list for news
on this event (schedule information, travel directions, etc.) should
send email to:

Again, for those who plan to attend and would also like to contribute
a 20-minute or 40-minute presentation, please send the title and a
announcement in NA Digest; we hope to finalize the program and list of
speakers within the next few days.

Organizers: Paul Saylor, Robert Skeel, Faisal Saied, Ahmed Sameh,
Michael J. Holst and Steven Lee

------------------------------

From: Bernard Danloy <danloy@anma.ucl.ac.be>
Date: Tue, 23 Apr 91 and Fri, 26 Apr 91
Subject: New Measures of Precision in Floating-Point Arithmetic

In the last NA-Digest (21 April ; Vol.91/16), two communications were devoted to
the precision of the floating point arithmetic of a computer.
I would like to thank Nick Higham and Cleve Moler for their attempt to define
unambiguous concepts ; the lack of well defined concepts is too often the source
of confusion and errors. In addition, I am very happy to have learned that
the number mu defined by Higham and Moler is not always a negative power of the
base b : I never payed enough attention to that quantity by itself and was
ready to claim the opposite ; I just hope I was not the only one.

However it seems to me that Higham and Moler did not go as far as needed : they
refer to the basic concept of "floating point number" : this is meaningful in a
classical environment where the arithmetical unit and the memory use exactly
the same representation for real numbers. If this is the case, it is irrelevant
to store the intermediate results of a process in a variable or to keep them
within the arithmetic unit ; the quantities eps,mu and u are unambiguous if we
define the floating point numbers as the possibles values a variable of real
type can take.

But it should be reminded that not all computers work that way : it may happen
that the arithmetical unit can handle quantities it is impossible to save in a
variable without loss of information : such a situation was mentioned by Higham
and Moler and arises when the memory uses the basic IEEE standard ( 53 bits )
while the arithmetical unit conforms to the extended IEEE standard ( 64 bits ).
How should we define eps,mu and u in such a context ?

We are then facing two possible values for eps and u ; the values derived from
the characteristics of the memory appear as a reasonable choice but the values
related to the arithmetical unit are not a useless information.

The case of mu is much more difficult because the number of possible values is
almost unpredictable : mu is the result of some computation and therefore its
value depends on the whole sequence of operations : it is then crucial to know
exactly which computation has to be performed and how it is implemented. On one
single machine, the number of computations resulting in different values for mu
can be far greater than 2 ! I show now what you obtain on a Sun 3/60 computer
equipped with an optional arithmetical coprocessor Motorola 68881 when you are
looking for mu through Pascal programs. Here follows a summary of my results
( x and y denote real variables )

Without using the arithmetical coprocessor :

1. The boolean expression 1+x>1 is true iff x >= 2^(-53) + 2^(-105)

2. When y is precomputed by y := 1+x
the boolean expression y>1 is true iff x >= 2^(-53) + 2^(-105)

Using the optionnal arithmetical coprocessor :

3. The boolean expression 1+x>1 is true iff x >= 2^(-64) + 2^(-116)

4. When y is precomputed by y := 1+x
the boolean expression y>1 is true iff x >= 2^(-53) + 2^(-64) + 2^(-105)

5. The boolean expression 1+(x+y)>1 is true in many situations, for example,
a. if x > 2^(-64) and y >= 0
b. if x = 2^(-64) and y >= 2^(-127)
c. if x = 2^(-64) - 2^(-117) and y >= 2^(-117) + 2^(-128) + 2^(-169)

The above results clearly show that defining mu as the smallest quantity such
that 1+mu>1 is ambiguous if we don't say precisely :
- how mu may be obtained and used
- where the result of the addition has to be stored before it is compared to 1

I am far from being an expert but I suggest to adopt the following definition :
if x is a variable of real type, mu is the smallest value that x can take if
we want the computer to give the value true to the boolean expression 1+x>1

That corresponds to the situations 1 and 3 above ; the fact that mu does not
reflect the representation of the floating point numbers within the memory is
not disturbing at all : comparing 1 and 1+x is part of a tricky way to get
information about that representation, assuming a unique standard is valid
everywhere. Maybe, it is time to realize that, when this is not the case, the
errors generated by a single computation do have two independent sources :
- the arithmetical operation itself
- the transfer of numbers to ( and from ? ) the memory
It is then perfectly normal to use two independent parameters mu and eps to
characterize the precision of a floating point arithmetic : the value of mu
says how precisely a result can be obtained within the arithmetical processor
and eps says how precisely that result can be stored in the memory.

************

Date: Fri, 26 Apr 91 20:15:11 +0200

I would like to add new comments and proposals to my first note on the measure
of precision of a floating-point arithmetic.

I just realized that if we want to characterize a t-digit base b arithmetic, we
need four and not only three parameters. Two of them are concerned with "static"
properties ( Cleve Moler did use very properly the term " geometric " ) and the
two others are concerned with " dynamic " properties. The problem we are facing
now is just that, as a result of historical evolution, a confusion has been
created between " static " and " dynamic " ; the best example is perhaps the
quantity u ( unit roundoff ) : it is clearly a " dynamic " concept but it has
been defined until now by a " static " value : I suggest we try to get rid of
our habits and focus on what we really need. Here are my proposals :

" STATIC " ( or " GEOMETRIC " ) ASPECT

b, t and eps = b^(1-t) are clearly the basic parameters ; two of them define
the third. I suggest to characterize the " static " properties of the floating-
point arithmetic by
(1) eps = b^(1-t) : this defines ( within the memory ) the spacing of the
floating-point numbers between 1 and b
Important remark : eps has nothing to do with the equivalent spacing
within the arithmetical unit.
(2) b : the base ( the knowledge of b is only needed if we want to get error
bounds as precise as possible : for a same value of eps, the precision
attached to some computations is a bit higher if the base b is smaller )

" DYNAMIC " ASPECT

Any arithmetic computation usually requires three things :
- a transfer of the data from the memory to the processor
- the execution of the operation itself ( within the processor )
- a transfer of the result from the processor to the memory
With a " dynamic " point of view, the precision may be characterized by two
independant parameters :
(3) u is the unit roundoff of the memory and gives information about the
rounding which occurs when transfering a number from the processor to
the memory ( let us remember that reading data on a peripheral usually
implies a conversion to the base b and involves thus the processor ).
I suggest to define u as the smallest floating-point number x such that
true is the computed value of the boolean expression fl(y>1) if y has
been previously obtained by fl(1+x), stored in the memory and picked
back up ( this double transfer is crucial )

N.B. The old definition of u as 0.5*b^(1-t) is a very special case of
the new one and is equivalent in the case of rounding away from zero.
( What was considered as perfect twenty years ago, before rounding to
even appeared ).

Actually the new value is equal to u = rho * eps where rho is
1 in the case of pure chopping
1/2 in the case of rounding away from zero
something slightly bigger than 1/2 in the case of rounding to even
The new definition of u makes it equal to the quantity mu introduced by
Nick Higham provided that the computations are performed in two steps
as required. ( By the way, Nick did the two steps since he asked the

(4) nu is the unit roundoff of the arithmetical processor when it performs
a computation on data extracted from the memory. It is defined as the
smallest floating-point number x such that true is the computed value of
the boolean expression fl(1+x>1) where the unusual notation ( I wrote
fl(1+x>1) and not fl(1+x)>1 ) means that the user does not force the
intermediate result fl(1+x) to be sent back to the memory ( the computer
is given complete freedom and gets the due credit for its choice ).

Actually nu is equal to u if the computer ( I should probably have to
say the software ) always transfers intermediate results back to the
memory and never tries to avoid it. The value of nu is equivalent to
that of mu if the computation is performed in one single step : we just
ask the computer to evaluate the complete expression as a whole.

As I did already mention it in my first note, I ran some computations on a Sun3
with a Motorola chip 68881. I found

Using Pascal with the otion -f68881 activated :
eps = 2^(-52)
b = 2
u = 2^(-53) + 2^(-64) + 2^(-105)
nu = 2^(-64) + 2^(-116)

Using Pascal without the otion -f68881 :
eps = 2^(-52)
b = 2
u = 2^(-53) + 2^(-105)
nu = 2^(-53) + 2^(-105)

Using MATLAB ( and the Motorola chip ) :
eps = 2^(-52)
b = 2
u = 2^(-53) + 2^(-64) + 2^(-105)
nu = 2^(-53) + 2^(-64) + 2^(-105)

Partial tests have been run on a PC and confirm the last values for MATLAB.
Further trials will performed later.

It turns out that the " static " parameters are independant of the software
and characterize a standard like IEEE, as they should ; on the opposite, the
" dynamic " parameters do vary very strongly : even on a single machine, they
may depend on the software but also on the options activated by the user.

At this point, one thing is clear for me : we really need to agree on precise
concepts and the topics are worth a much deeper discussion. I hope NA-NET will
contribute to it.

Bernard Danloy
Institut de Mathematique
University of Louvain-la-Neuve
Belgium
Email : danloy@anma.ucl.ac.be

------------------------------

End of NA Digest

**************************
-------