## NA Digest Sunday, May 6, 1990 Volume 90 : Issue 18

Today's Editor: Cleve Moler

Today's Topics:

-------------------------------------------------------

From: Arthur Wouk <wouk@cs.duke.edu>
Date: Mon, 30 Apr 90 10:24:14 EDT
Subject: Change of Address for Arthur Wouk

Effective May 15 I will be moving from Durham to Boulder, Colorado.

Arthur Wouk
3849 Birchwood Drive
Boulder CO 80301

My new net address will be

I remain fully retired.

------------------------------

From: Pat Nolan <nolan@meggie.stanford.edu>
Date: 27 Apr 90 00:19:03 GMT
Subject: Minimization Subroutines Wanted

I am looking for a nonlinear function-minimizing subroutine in
Fortran that works without derivatives. "Numerical Recipes"
recommends Powell's conjugate gradient method, and offers a
subroutine ready to use. However, it also says that Brent's
implementation of the method (too long to print in the book) is
better. Is Brent's code available anywhere on the network?
I've already checked netlib -- if it's there I missed it.
For the experts: Is there anything better than Brent/Powell for
the general case? (That is, not least-squares, no derivatives,
nonlinear, < 20 dimensions.)
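For readers who want to experiment, the simplest member of the direction-set
family to which Powell's method belongs fits in a few lines. The sketch below
(Python, purely illustrative; it is not Powell's or Brent's actual code, and a
production routine would build conjugate directions and use a proper line
search such as Brent's) does cyclic coordinate search with a crude
step-halving line search:

```python
def line_min(f, x, d, step=1.0, shrink=0.5, iters=60):
    # Crude derivative-free line search along direction d: try a step in
    # either sign, keep it if it improves f, otherwise halve the step.
    best = f(x)
    for _ in range(iters):
        for sgn in (+1.0, -1.0):
            trial = [xi + sgn * step * di for xi, di in zip(x, d)]
            ft = f(trial)
            if ft < best:
                x, best = trial, ft
                break
        else:
            step *= shrink
    return x

def coord_search(f, x0, sweeps=50):
    # Cyclic coordinate search: minimize along each axis in turn.
    # Powell's method improves on this by gradually replacing the
    # coordinate directions with (approximately) conjugate ones.
    x = list(x0)
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            d = [1.0 if j == i else 0.0 for j in range(n)]
            x = line_min(f, x, d)
    return x

# Separable quadratic test problem: minimum at (3, -1).
f = lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2
print(coord_search(f, [0.0, 0.0]))   # -> [3.0, -1.0]
```

On badly conditioned or non-separable functions plain coordinate search stalls,
which is exactly the failure the conjugate directions in Powell's method are
designed to avoid.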

Patrick Nolan * Bitnet: PLN@SLACVM
Hansen Laboratories * Internet: nolan@meggie.stanford.edu
Stanford University * (36.64.0.180)

------------------------------

Date: 3 May 90 02:17:56 GMT
Subject: A Matrix Equation

In the course of an optimization problem I have come across the
following problem (all matrices are symmetric):

Solve the following matrix equation:
Y + B + A*X*A = 0,

where A and B are known symmetric n x n matrices, and X and Y are unknown
matrices with the following property: given i,j entry exactly one of X[i,j] or
Y[i,j] is a variable and exactly one of them is 0. For example:

[a 0 b] [1 2 -1] [3 5 -2] [0 u 0] [3 5 -2]
[0 c 0] + [2 1 3] + [5 1 0] * [u 0 v] * [5 1 0] = 0
[b 0 d] [-1 3 2] [-2 0 3] [0 v 0] [-2 0 3]

Y + B + A * X * A = 0

Now, it is obvious that this problem is equivalent to a system of m linear
equations in m unknowns where m= n(n+1)/2, and can be solved in
O(m^3)=O(n^6) operations. My question is can we do better, say in
O(n^3) or O(m^2)=O(n^4)? Notice that in the special case where Y=0, i.e
all the variables are in X, we can solve the problem in O(n^3) by multiplying
from left and right by the inverse of A. I don't see how to extend this to the
more general problem. Is anyone aware of any method, or have any
suggestions? Iterative methods, including methods that require computation
of eigenvalues and/or eigenvectors of A (and/or B), are acceptable, as long
as we get a significant improvement over the O(n^6) flop count. Alternatively,
can someone put forward a plausible argument that O(m^3) is the best
I can hope for and the special structure of the problem is not special
enough to lend itself to a more clever solution?
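The Y = 0 special case mentioned above is easy to verify numerically. The
sketch below (Python/NumPy, purely illustrative; it does not address the
general sparsity pattern that the question is actually about) solves
A*X*A = -B with two linear solves, which is O(n^3):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Symmetric test matrices (the diagonal shift keeps A safely nonsingular).
A = rng.standard_normal((n, n))
A = A + A.T + 10.0 * np.eye(n)
B = rng.standard_normal((n, n))
B = B + B.T

# Special case Y = 0: the equation reduces to A*X*A = -B, so
# X = -inv(A)*B*inv(A), computed with two solves rather than explicit
# inverses. (A and B symmetric, so solve(A, B).T equals B*inv(A).)
X = -np.linalg.solve(A, np.linalg.solve(A, B).T).T

print(np.allclose(A @ X @ A, -B))   # True: residual at roundoff level
```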

Computer Science Dept.
University of Minnesota
Minneapolis, Mn 55455

------------------------------

From: John Pryce <PRYCE%rmcs.cranfield.ac.uk@NSFnet-Relay.AC.UK>
Date: Thu, 3 MAY 90 15:17:03 BST
Subject: Dictionary Covering Numerical Analysis Sought

Query from John Pryce (na.pryce), Maths Group, Royal Military College of
Science, Shrivenham, Swindon, UK.

For a Knowledge-Based System project at NAG Ltd we would very much like to get
hold of a mathematical dictionary
- With a solid coverage of numerical analysis terms and concepts at a level
likely to be useful to users of a NAG help system.
- If in machine readable form (TeX) so much the better.
Does anyone know of such a work?

Replies to me at the above address.

------------------------------

From: Cleve Moler <moler@mathworks.com>
Date: Sun May 6 12:40:15 PDT 1990
Subject: Scaling, Underflow, and Rank Deficiency

Scaling, Underflow, and Rank Deficiency

by Cleve Moler
The MathWorks, Inc.

Several of the routines in LINPACK, and its derivatives like MATLAB,
have occasion to scale a column of a matrix. Let x be such a
column and let s be some norm of x . The algorithms involve
replacing x by x/s . In the various LU factorizations, s is
the largest element of x and x/s becomes a column of L .
In the QR and SVD factorizations, s is the length of x and
x/s generates a Householder reflection. LINPACK calls upon one
of the BLAS, DSCAL, to do the scaling. Unfortunately, DSCAL
multiplies a vector by a scalar because that is usually faster than
dividing by a scalar. So instead of the replacement

x = x/s

LINPACK computes

x = (1/s)*x

A difficulty occurs when all the elements of x are near the
underflow limit of some of the floating point representations.
Then s is nonzero and x/s has some elements of reasonable size,
but LINPACK fails because the temporary scalar 1/s overflows.

We knew about such difficulties when we were developing LINPACK,
but we felt they would be so rare that we didn't have to worry.
The floating point representations in common use at the time
(late '70s) had very few, if any, values whose reciprocals would
overflow. Moreover, you would have to have a really perverse
matrix to get near such values anyway.

I still think that judgement was correct. LINPACK has proved to
be very robust and I, for one, have never heard complaints from
users about it. But two recent developments have prompted me to
change my position slightly.

The first development is the emergence of the IEEE floating point
standard. IEEE floating point has "many" nonzero values whose
reciprocals overflow, namely all the denormals in the range
from 2^(-1024) to 2^(-1074), which is roughly 5.0E-309 to 5.0E-324.
(The situation is probably even worse with IEEE short precision, but
I never use it.) Some machines with IEEE floating point format
circumvent the standard by setting these tiny values to zero. But most
machines based on Intel and Motorola chips, including most PC's, Mac's
and Sun's, handle them "correctly".
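The failure mode is easy to reproduce in any language with IEEE doubles. A
minimal demonstration (Python/NumPy here, but only the arithmetic matters):
s is nonzero, x/s is a perfectly ordinary number, yet the temporary 1/s is
infinite.

```python
import numpy as np

s = np.float64(2.0) ** -1040    # nonzero, but down in the denormal range
x = np.float64(2.0) ** -1041    # a vector element near the underflow limit

print(x / s)                    # 0.5 -- the intended, representable result

with np.errstate(over="ignore"):
    r = (np.float64(1.0) / s) * x   # the DSCAL-style computation
print(r)                        # inf -- the temporary 1/s overflowed
```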

The second development is our appreciation of a class of problems
that are not strangely scaled, but which can provoke this difficulty.
The simplest example is to compute the QR or SVD factorizations of
the n-by-n matrix all of whose elements are one,

X = ones(n,n)

Other examples include x(i,j) = i or x(i,j) = i+j . The point
is these matrices are highly rank deficient and their elements
are related to each other by very simple formulas. In the first
step of the QR factorization of ones(n,n), for example, the vector
x is ones(n,1) and s = sqrt(n). The reduced matrix, X(2:n,2:n),
should be flat zero, but, if n is not a perfect square, roundoff
error will probably cause the reduced matrix to again be a constant,
nonzero matrix, but now the constant is a small integer multiple of
eps , the relative machine precision. For IEEE format, eps = 2^(-52) .
The process is repeated. The second step of the reduction
produces a constant matrix with elements on the order of eps^2,
the third gives elements of order eps^3 , and so on. It may
happen that exact cancellation produces an exact zero reduced matrix
and the subsequent steps can be skipped. But on some machines,
for some values of n , the process continues until the
range of denormals and underflow is approached. The solution
to the equation

eps^k = denormal

is k = 1022/52 ~= 20 so on IEEE machines at least 20 steps
are required.
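The cascade is easy to watch outside MATLAB too. A quick check with NumPy
(which uses LAPACK's QR rather than LINPACK's DQRDC; exactly where the reduced
matrix hits zero or the denormals is machine- and library-dependent, so treat
the tail values as illustrative only):

```python
import numpy as np

n = 24
X = np.ones((n, n))
r = np.diag(np.linalg.qr(X, mode="r"))

# The first diagonal entry is +-sqrt(n); the rest would be zero in exact
# arithmetic, but roundoff typically leaves a geometrically shrinking tail.
print(abs(r[0]))        # ~4.899, i.e. sqrt(24)
print(abs(r[1]))        # a small multiple of eps (or exactly zero)
```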

If you have access to MATLAB, you might want to try this:

format compact
format short e
for n = 10:50
   home
   n
   X = ones(n,n);
   r = diag(qr(X))'
   pause
end

Or, try the following Fortran program with the LINPACK library.

      parameter (LDX = 50)
      double precision X(LDX,LDX),qraux(LDX),work(LDX)
      integer jpvt(LDX)
c
      do n = 1, LDX
         do j = 1, n
            do i = 1, n
               X(i,j) = 1.0d0
            enddo
         enddo
c
         job = 0
         call DQRDC(X,ldx,n,n,qraux,jpvt,work,job)
c
         write(6,'(5d15.6)') (X(i,i),i=1,n)
      enddo
      end

If the machine you use does not have IEEE arithmetic, or only
has IEEE format without denormals, you will probably not encounter
any difficulties, although conceivably you could get overflows
in either MATLAB's or LINPACK's DQRDC routine.

If you use a machine with full IEEE arithmetic, including denormals,
you will very likely see NaN's show up in the output for various
values of n larger than 20, including n = 45 on all the machines
of this type that I have tried so far.

The implications for SVD are worse than just getting NaN's in the
output -- the machine appears to "hang" for a while with DSVDC in
a loop trying to get the NaN's to converge to something negligible.
You can either wait for the iteration limit to be reached, which can
take a while, or face the consequences of breaking out of a tight loop
in the floating point unit.

As one example, here is what I get on my Sun SparcStation1
with n = 24 and X = ones(n,n):

diag(qr(X)) svd(X)

-4.8990e+00 2.4000e+01
2.1298e-15 2.0849e-15
-1.3875e-30 2.8902e-45
6.0202e-46 3.6272e-76
-8.6969e-62 1.0219e-106
7.5288e-77 1.9146e-137
-6.5086e-92 1.6693e-167
2.8090e-107 2.6651e-198
-1.2102e-122 1.4180e-244
2.6018e-138 0
0 0
0 0
... ...
0 0
0 0

We haven't got NaN's or nonconvergence yet, but you can see that
we're headed for trouble. I get nonconvergence in SVD when n = 28.

My first "fix" for this involved an attempt to detect, in a
machine-independent, portable (without using DMACH) manner, when
we were near the dangerous range. Then set the vector to zero
and pretend that this was another roundoff error. In principle
this should be done in each LINPACK routine which calls DSCAL with
the reciprocal of some norm. It wasn't pretty.

The correct solution is to create another Basic Linear Algebra
Subroutine, DSCALD, which scales by division, rather than by
multiplication. Every place DSCAL is now called with 1.0/s,
DSCALD would be called with s instead.
The execution time of DSCALD is rarely significant; it is never in the
inner loop. But if you wanted to speed it up, you could do the following.
Let

delta = 2^(-64)

This is not a machine dependent constant, but it is vaguely related
to the roundoff levels of all the machines we might be dealing with.
(We might have to revise the value of delta in another 10 years.)
For all the computers I know about:

if delta*s does not underflow, then 1/s will not overflow.

So the following algorithm for DSCALD is both safe and reasonably
efficient.

function y = dscald(x,s)
   vector x,y
   scalar s
   constant delta = 2^(-64)
   if delta*s == 0
      y = x/s
   else
      y = (1/s)*x
   end
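Transcribed into executable form (Python/NumPy here, purely as a check of the
logic; the real routine would of course be a Fortran BLAS), the algorithm
behaves as advertised: near the underflow limit it quietly switches to
division.

```python
import numpy as np

DELTA = np.float64(2.0) ** -64

def dscald(x, s):
    # Scale x by 1/s. If delta*s underflows to zero, 1/s might overflow,
    # so divide elementwise (slower but safe); otherwise use the usual
    # one-divide, n-multiply form.
    with np.errstate(under="ignore"):
        if DELTA * s == 0.0:
            return x / s
        return (np.float64(1.0) / s) * x

s_tiny = np.float64(2.0) ** -1040           # reciprocal would overflow
x = np.full(3, np.float64(2.0) ** -1041)

print(dscald(x, s_tiny))                    # elements are all 0.5
print(dscald(np.array([8.0, 2.0]), np.float64(4.0)))  # the usual fast path
```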

I do not propose that all the LINPACK libraries in the world be
changed to use DSCALD. As far as I know, this difficulty only
occurs for a few very special matrices made up to show it can happen.
But it is something for other users of the BLAS and future library
developers to keep in mind. And we will incorporate it in future
releases of MATLAB.

------------------------------

From: Baker Kearfott <rbk5287@usl.edu>
Date: Sun, 6 May 90 13:13:21 CDT
Subject: Fortran 8X, C++ or What?

Without stirring up too much of a political controversy, I would like to
gather opinions concerning the future of the programming languages we
use.

In particular, I have been using Fortran 77 for a long time. My primary
consideration has been portability and universality. Indeed, I have
seen several languages touted as infinitely superior come and go.

However, at this point I find myself in need of operator overloading, of
structures, of recursion, etc. We have a code in which the arithmetic
is in software, and in which we wish to try various arithmetic schemes.
This code also makes use of linked lists and of stacks. While
maintaining and improving the code, the objects we wish to associate
with elements of the lists and stacks change; these objects are also of
various data types. Also, we are starting to consider structured
systems (sparse and otherwise) within this framework. Such changes and
experimentation would require changing of subroutine argument lists in a
fairly deep calling hierarchy, and a consequent proliferation of code
versions, even though the basic algorithm has not changed. Furthermore,
the lack of operator overloading severely limits the ease with which we
can work with different test problems.

Much of the facility we need exists within specialized systems (which
just run on one machine, or which have only one of the capabilities we
need). However, I place a high premium on portability and the ability
to publish and distribute the actual code.

Fortran 8X seems to be crystallizing at this point. However, it may be
several years until reasonable compilers become available. Also, such
compilers probably would cost \$\$\$ per machine, and money has been in
very short supply here.

Alternately, more and more people seem to be using C++. Furthermore, it
seems that AT&T is very strongly encouraging it through various means
(including distribution at a nominal charge, and availability of a
Fortran to C converter). It seems that we could have C++ available on
most of our machines by this summer.

Do people think we could go wrong by switching to C++? Our goals are to
maintain portability and the ability to publish and distribute
our codes.

Baker Kearfott
rbk@usl.edu (Internet)

------------------------------

From: E. Gallopoulos <stratis@s6.csrd.uiuc.edu>
Date: Wed, 2 May 90 02:33:17 CDT
Subject: ACM International Conference on Supercomputing

1990 ACM International Conference on Supercomputing
June 11-15, Amsterdam, the Netherlands
Vrije Universiteit and Centrum voor Wiskunde en Informatica
in association with ACM-SIGNUM, AICA, BCS-PPG, CTI, CSRD,
CWI, GI, INRIA, IPSJ, SBMAC, and SIAM-SIAG

CONFERENCE CO-CHAIRMEN
Ahmed Sameh, University of Illinois, USA
and
Henk van der Vorst, Delft University of Technology CWI,
the Netherlands

PROGRAM DIRECTOR
John R. Sopka, Digital Equipment Corporation, USA

PROGRAM COMMITTEES
AMERICA: E. Gallopoulos (Chair), F. Darema, E. Davidson,
G. Fox, D. Gannon, M. Heath, J. Hennessy, E. Houstis,
J. McGraw, Y. Patt, C. Polychronopoulos, J. Rice, Y. Saad,
J. E. Smith, H. Wijshoff.
EUROPE-AFRICA J. R. Gurd (Chair), I. S. Duff, W. K. Giloi,
F. Hossfeld, W. Jalby, P. Leca, T. S. Papatheodorou,
R. H. Perrot, P. Sguazerro, H. v. d. Vorst, J. v. Leeuwen,
M. Vanneschi.
JAPAN-FAR EAST Y. Muraoka (Chair), K. Asai, S. Ono, M. Ishii,
T. Kamimura, M. Koike, S. Nagashima, T. Shimada, H. Tanaka.
ORGANIZING COMMITTEE D. DeGroot, H. te Riele,
M. Louter-Nool, F. Snijders, L. Verdonk.

INVITED LECTURES:
Burton Smith, "The Tera Computer System"
Tony Chan, "Parallel Multilevel Algorithms for PDE's"
William Wulf, "The Collaboratory: A Larger Context for Support
of Computational Science"
Wolfgang Fichtner, "Iterative Methods and Supercomputers for
VLSI Device Simulation"
Toshitsugu Yuba, "Dataflow Computer Development in Japan"
Ahmed Noor, "Strategies for Large-Scale Structural Problems on
High Performance Computers"
Piet van der Houwen, "Parallel ODE Solvers"
Anthony Hey, "Supercomputing with transputers-past,
present and future"

This is the fourth year of the ACM International Conference on Supercomputing.
The conference will be held at the Vrije Universiteit, and Conference
Proceedings will be available. In addition to the technical program,
there will also be demonstrations of projects on Software Environments
for Scientific Computing, accompanied by panel discussions, and vendor exhibits.
The conference will also host the award of the first annual
Bell-Perfect Prize for high performance scientific computing.

SOCIAL EVENTS:

Sunday, 10 June: Reception and informal get together. Admission
free for participants.
Tuesday, 12 June: Conference banquet, to be served during a
boat trip through the Amsterdam canals and on the river Zaan to the
Zaanse Schans. Costs: Dfl. 80.-- per person.
Wednesday, 13 June: Excursion to either the Kroller-Muller Museum,
where part (i.e. the drawings) of the great Van Gogh 1990 exhibition
can be admired or to the Delta works.
Costs: Dfl. 60.-- per person.
Thursday, 14 June: Reception in a typically Dutch museum. Admission
free for participants.
Friday, 15 June: (A limited number of) Tickets are available for the
other part of the great Van Gogh 1990 exhibition (the Van Gogh
paintings) in the Van Gogh Museum in Amsterdam. Price per ticket:
Dfl. 20.--.

Further information, including registration forms and the full conference
program, can be obtained from

Mr. Frans Snijders
CWI
P.O. Box 4079, Kruislaan 413
1009 AB Amsterdam, The Netherlands

fax: +31-20-5924199
email: franss@cwi.nl

------------------------------

End of NA Digest

**************************