* wsc@research.bell-labs.com Mon Dec 30 16:55 EST 1985
* W. S. Cleveland
* Bell Laboratories
* Murray Hill NJ 07974
*
* outline of this file:
* lines 1-72 introduction
* 73-177 documentation for lowess
* 178-238 ratfor version of lowess
* 239-301 documentation for lowest
* 302-350 ratfor version of lowest
* 351-end test driver and fortran version of lowess and lowest
*
* a multivariate version is available by "send dloess from a"
*
* COMPUTER PROGRAMS FOR LOCALLY WEIGHTED REGRESSION
*
* This package consists of two FORTRAN programs for
* smoothing scatterplots by robust locally weighted
* regression, or lowess. The principal routine is LOWESS
* which computes the smoothed values using the method
* described in The Elements of Graphing Data, by William S.
* Cleveland (Wadsworth, 555 Morego Street, Monterey,
* California 93940).
*
* LOWESS calls a support routine, LOWEST, the code for
* which is included. LOWESS also calls a routine SORT, which
* the user must provide.
*
* To reduce the computations, LOWESS requires that the
* arrays X and Y, which are the horizontal and vertical
* coordinates, respectively, of the scatterplot, be such that
* X is sorted from smallest to largest. The user must
* therefore use another sort routine which will sort X and Y
* according to X.
* To summarize the scatterplot, YS, the fitted values,
* should be plotted against X. No graphics routines are
* available in the package and must be supplied by the user.
*
* The FORTRAN code for the routines LOWESS and LOWEST has
* been generated from higher level RATFOR programs
* (B. W. Kernighan, ``RATFOR: A Preprocessor for a Rational
* Fortran,'' Software Practice and Experience, Vol. 5 (1975),
* which are also included.
*
* The following are data and output from LOWESS that can
* be used to check your implementation of the routines. The
* notation (10)v means 10 values of v.
*
*
*
*
* X values:
* 1 2 3 4 5 (10)6 8 10 12 14 50
*
* Y values:
* 18 2 15 6 10 4 16 11 7 3 14 17 20 12 9 13 1 8 5 19
*
*
* YS values with F = .25, NSTEPS = 0, DELTA = 0.0
* 13.659 11.145 8.701 9.722 10.000 (10)11.300 13.000 6.440 5.596
* 5.456 18.998
*
* YS values with F = .25, NSTEPS = 0 , DELTA = 3.0
* 13.659 12.347 11.034 9.722 10.511 (10)11.300 13.000 6.440 5.596
* 5.456 18.998
*
* YS values with F = .25, NSTEPS = 2, DELTA = 0.0
* 14.811 12.115 8.984 9.676 10.000 (10)11.346 13.000 6.734 5.744
* 5.415 18.998
*
*
*
*
* LOWESS
*
*
*
* Calling sequence
*
* CALL LOWESS(X,Y,N,F,NSTEPS,DELTA,YS,RW,RES)
*
* Purpose
*
* LOWESS computes the smooth of a scatterplot of Y against X
* using robust locally weighted regression. Fitted values,
* YS, are computed at each of the values of the horizontal
* axis in X.
*
* Argument description
*
* X = Input; abscissas of the points on the
* scatterplot; the values in X must be ordered
* from smallest to largest.
* Y = Input; ordinates of the points on the
* scatterplot.
* N = Input; dimension of X,Y,YS,RW, and RES.
* F = Input; specifies the amount of smoothing; F is
* the fraction of points used to compute each
* fitted value; as F increases the smoothed values
* become smoother; choosing F in the range .2 to
* .8 usually results in a good fit; if you have no
* idea which value to use, try F = .5.
* NSTEPS = Input; the number of iterations in the robust
* fit; if NSTEPS = 0, the nonrobust fit is
* returned; setting NSTEPS equal to 2 should serve
* most purposes.
* DELTA = input; nonnegative parameter which may be used
* to save computations; if N is less than 100, set
* DELTA equal to 0.0; if N is greater than 100 you
* should find out how DELTA works by reading the
* additional instructions section.
* YS = Output; fitted values; YS(I) is the fitted value
* at X(I); to summarize the scatterplot, YS(I)
* should be plotted against X(I).
* RW = Output; robustness weights; RW(I) is the weight
* given to the point (X(I),Y(I)); if NSTEPS = 0,
* RW is not used.
* RES = Output; residuals; RES(I) = Y(I)-YS(I).
*
*
* Other programs called
*
* LOWEST
* SSORT
*
* Additional instructions
*
* DELTA can be used to save computations. Very roughly the
* algorithm is this: on the initial fit and on each of the
* NSTEPS iterations locally weighted regression fitted values
* are computed at points in X which are spaced, roughly, DELTA
* apart; then the fitted values at the remaining points are
* computed using linear interpolation. The first locally
* weighted regression (l.w.r.) computation is carried out at
* X(1) and the last is carried out at X(N). Suppose the
* l.w.r. computation is carried out at X(I). If X(I+1) is
* greater than or equal to X(I)+DELTA, the next l.w.r.
* computation is carried out at X(I+1). If X(I+1) is less
* than X(I)+DELTA, the next l.w.r. computation is carried out
* at the largest X(J) which is greater than or equal to X(I)
* but is not greater than X(I)+DELTA. Then the fitted values
* for X(K) between X(I) and X(J), if there are any, are
* computed by linear interpolation of the fitted values at
* X(I) and X(J). If N is less than 100 then DELTA can be set
* to 0.0 since the computation time will not be too great.
* For larger N it is typically not necessary to carry out the
* l.w.r. computation for all points, so that much computation
* time can be saved by taking DELTA to be greater than 0.0.
* If DELTA = Range (X)/k then, if the values in X were
* uniformly scattered over the range, the full l.w.r.
* computation would be carried out at approximately k points.
* Taking k to be 50 often works well.
*
* Method
*
* The fitted values are computed by using the nearest neighbor
* routine and robust locally weighted regression of degree 1
* with the tricube weight function. A few additional features
* have been added. Suppose r is FN truncated to an integer.
* Let h be the distance to the r-th nearest neighbor
* from X(I). All points within h of X(I) are used. Thus if
* the r-th nearest neighbor is exactly the same distance as
* other points, more than r points can possibly be used for
* the smooth at X(I). There are two cases where robust
* locally weighted regression of degree 0 is actually used at
* X(I). One case occurs when h is 0.0. The second case
* occurs when the weighted standard error of the X(I) with
* respect to the weights w(j) is less than .001 times the
* range of the X(I), where w(j) is the weight assigned to the
* j-th point of X (the tricube weight times the robustness
* weight) divided by the sum of all of the weights. Finally,
* if the w(j) are all zero for the smooth at X(I), the fitted
* value is taken to be Y(I).
*
*
*
*
* subroutine lowess(x,y,n,f,nsteps,delta,ys,rw,res)
* real x(n),y(n),ys(n),rw(n),res(n)
* logical ok
* if (n<2){ ys(1) = y(1); return }
* ns = max0(min0(ifix(f*float(n)),n),2) # at least two, at most n points
* for(iter=1; iter<=nsteps+1; iter=iter+1){ # robustness iterations
* nleft = 1; nright = ns
* last = 0 # index of prev estimated point
* i = 1 # index of current point
* repeat{
* while(nright1,rw,ok)
* # fitted value at x(i)
* if (!ok) ys(i) = y(i)
* # all weights zero - copy over value (all rw==0)
* if (lastcut) break # i one beyond last pt within cut
* if(x(i)==x(last)){ # exact match in x
* ys(i) = ys(last)
* last = i
* }
* }
* i=max0(last+1,i-1)
* # back 1 point so interpolation within delta, but always go forward
* } until(last>=n)
* do i = 1,n # residuals
* res(i) = y(i)-ys(i)
* if (iter>nsteps) break # compute robustness weights except last time
* do i = 1,n
* rw(i) = abs(res(i))
* call sort(rw,n)
* m1 = 1+n/2; m2 = n-m1+1
* cmad = 3.0*(rw(m1)+rw(m2)) # 6 median abs resid
* c9 = .999*cmad; c1 = .001*cmad
* do i = 1,n {
* r = abs(res(i))
* if(r<=c1) rw(i)=1. # near 0, avoid underflow
* else if(r>c9) rw(i)=0. # near 1, avoid underflow
* else rw(i) = (1.0-(r/cmad)**2)**2
* }
* }
* return
* end
*
*
*
*
* LOWEST
*
*
*
* Calling sequence
*
* CALL LOWEST(X,Y,N,XS,YS,NLEFT,NRIGHT,W,USERW,RW,OK)
*
* Purpose
*
* LOWEST is a support routine for LOWESS and ordinarily will
* not be called by the user. The fitted value, YS, is
* computed at the value, XS, of the horizontal axis.
* Robustness weights, RW, can be employed in computing the
* fit.
*
* Argument description
*
*
* X = Input; abscissas of the points on the
* scatterplot; the values in X must be ordered
* from smallest to largest.
* Y = Input; ordinates of the points on the
* scatterplot.
* N = Input; dimension of X,Y,W, and RW.
* XS = Input; value of the horizontal axis at which the
* smooth is computed.
* YS = Output; fitted value at XS.
* NLEFT = Input; index of the first point which should be
* considered in computing the fitted value.
* NRIGHT = Input; index of the last point which should be
* considered in computing the fitted value.
* W = Output; W(I) is the weight for Y(I) used in the
* expression for YS, which is the sum from
* I = NLEFT to NRIGHT of W(I)*Y(I); W(I) is
* defined only at locations NLEFT to NRIGHT.
* USERW = Input; logical variable; if USERW is .TRUE., a
* robust fit is carried out using the weights in
* RW; if USERW is .FALSE., the values in RW are
* not used.
* RW = Input; robustness weights.
* OK = Output; logical variable; if the weights for the
* smooth are all 0.0, the fitted value, YS, is not
* computed and OK is set equal to .FALSE.; if the
* fitted value is computed OK is set equal to
*
*
* Method
*
* The smooth at XS is computed using (robust) locally weighted
* regression of degree 1. The tricube weight function is used
* with h equal to the maximum of XS-X(NLEFT) and X(NRIGHT)-XS.
* Two cases where the program reverts to locally weighted
* regression of degree 0 are described in the documentation
* for LOWESS.
*
*
*
*
* subroutine lowest(x,y,n,xs,ys,nleft,nright,w,userw,rw,ok)
* real x(n),y(n),w(n),rw(n)
* logical userw,ok
* range = x(n)-x(1)
* h = amax1(xs-x(nleft),x(nright)-xs)
* h9 = .999*h
* h1 = .001*h
* a = 0.0 # sum of weights
* for(j=nleft; j<=n; j=j+1){ # compute weights (pick up all ties on right)
* w(j)=0.
* r = abs(x(j)-xs)
* if (r<=h9) { # small enough for non-zero weight
* if (r>h1) w(j) = (1.0-(r/h)**3)**3
* else w(j) = 1.
* if (userw) w(j) = rw(j)*w(j)
* a = a+w(j)
* }
* else if(x(j)>xs)break # get out at first zero wt on right
* }
* nrt=j-1 # rightmost pt (may be greater than nright because of ties)
* if (a<=0.0) ok = FALSE
* else { # weighted least squares
* ok = TRUE
* do j = nleft,nrt
* w(j) = w(j)/a # make sum of w(j) == 1
* if (h>0.) { # use linear fit
* a = 0.0
* do j = nleft,nrt
* a = a+w(j)*x(j) # weighted center of x values
* b = xs-a
* c = 0.0
* do j = nleft,nrt
* c = c+w(j)*(x(j)-a)**2
* if(sqrt(c)>.001*range) {
* # points are spread out enough to compute slope
* b = b/c
* do j = nleft,nrt
* w(j) = w(j)*(1.0+b*(x(j)-a))
* }
* }
* ys = 0.0
* do j = nleft,nrt
* ys = ys+w(j)*y(j)
* }
* return
* end
*
*
*
c test driver for lowess
c for expected output, see introduction
real x(20), y(20), ys(20), rw(20), res(20)
data x /1,2,3,4,5,10*6,8,10,12,14,50/
data y /18,2,15,6,10,4,16,11,7,3,14,17,20,12,9,13,1,8,5,19/
call lowess(x,y,20,.25,0,0.,ys,rw,res)
write(6,*) ys
call lowess(x,y,20,.25,0,3.,ys,rw,res)
write(6,*) ys
call lowess(x,y,20,.25,2,0.,ys,rw,res)
write(6,*) ys
end
c**************************************************************
c Fortran output from ratfor
c
subroutine lowess(x, y, n, f, nsteps, delta, ys, rw, res)
integer n
integer nsteps
real x(n), y(n), f, delta, ys(n), rw(n)
real res(n)
integer nright, min0, max0, i, j, ifix
integer iter, last, m1, m2, ns, nleft
real abs, cut, cmad, r, d1, d2
real c1, c9, alpha, denom, float
logical ok
if (n .ge. 2) goto 1
ys(1) = y(1)
return
c at least two, at most n points
1 ns = max0(min0(ifix(f*float(n)), n), 2)
iter = 1
goto 3
2 iter = iter+1
3 if (iter .gt. nsteps+1) goto 22
c robustness iterations
nleft = 1
nright = ns
c index of prev estimated point
last = 0
c index of current point
i = 1
4 if (nright .ge. n) goto 5
c move nleft, nright to right if radius decreases
d1 = x(i)-x(nleft)
c if d1<=d2 with x(nright+1)==x(nright), lowest fixes
d2 = x(nright+1)-x(i)
if (d1 .le. d2) goto 5
c radius will not decrease by move right
nleft = nleft+1
nright = nright+1
goto 4
c fitted value at x(i)
5 call lowest(x, y, n, x(i), ys(i), nleft, nright, res, iter
+ .gt. 1, rw, ok)
if (.not. ok) ys(i) = y(i)
c all weights zero - copy over value (all rw==0)
if (last .ge. i-1) goto 9
denom = x(i)-x(last)
c skipped points -- interpolate
c non-zero - proof?
j = last+1
goto 7
6 j = j+1
7 if (j .ge. i) goto 8
alpha = (x(j)-x(last))/denom
ys(j) = alpha*ys(i)+(1.0-alpha)*ys(last)
goto 6
8 continue
c last point actually estimated
9 last = i
c x coord of close points
cut = x(last)+delta
i = last+1
goto 11
10 i = i+1
11 if (i .gt. n) goto 13
c find close points
if (x(i) .gt. cut) goto 13
c i one beyond last pt within cut
if (x(i) .ne. x(last)) goto 12
ys(i) = ys(last)
c exact match in x
last = i
12 continue
goto 10
c back 1 point so interpolation within delta, but always go forward
13 i = max0(last+1, i-1)
14 if (last .lt. n) goto 4
c residuals
do 15 i = 1, n
res(i) = y(i)-ys(i)
15 continue
if (iter .gt. nsteps) goto 22
c compute robustness weights except last time
do 16 i = 1, n
rw(i) = abs(res(i))
16 continue
call sort(rw, n)
m1 = n/2+1
m2 = n-m1+1
c 6 median abs resid
cmad = 3.0*(rw(m1)+rw(m2))
c9 = .999*cmad
c1 = .001*cmad
do 21 i = 1, n
r = abs(res(i))
if (r .gt. c1) goto 17
rw(i) = 1.
c near 0, avoid underflow
goto 20
17 if (r .le. c9) goto 18
rw(i) = 0.
c near 1, avoid underflow
goto 19
18 rw(i) = (1.0-(r/cmad)**2)**2
19 continue
20 continue
21 continue
goto 2
22 return
end
subroutine lowest(x, y, n, xs, ys, nleft, nright, w, userw
+, rw, ok)
integer n
integer nleft, nright
real x(n), y(n), xs, ys, w(n), rw(n)
logical userw, ok
integer nrt, j
real abs, a, b, c, h, r
real h1, sqrt, h9, amax1, range
range = x(n)-x(1)
h = amax1(xs-x(nleft), x(nright)-xs)
h9 = .999*h
h1 = .001*h
c sum of weights
a = 0.0
j = nleft
goto 2
1 j = j+1
2 if (j .gt. n) goto 7
c compute weights (pick up all ties on right)
w(j) = 0.
r = abs(x(j)-xs)
if (r .gt. h9) goto 5
if (r .le. h1) goto 3
w(j) = (1.0-(r/h)**3)**3
c small enough for non-zero weight
goto 4
3 w(j) = 1.
4 if (userw) w(j) = rw(j)*w(j)
a = a+w(j)
goto 6
5 if (x(j) .gt. xs) goto 7
c get out at first zero wt on right
6 continue
goto 1
c rightmost pt (may be greater than nright because of ties)
7 nrt = j-1
if (a .gt. 0.0) goto 8
ok = .false.
goto 16
8 ok = .true.
c weighted least squares
do 9 j = nleft, nrt
c make sum of w(j) == 1
w(j) = w(j)/a
9 continue
if (h .le. 0.) goto 14
a = 0.0
c use linear fit
do 10 j = nleft, nrt
c weighted center of x values
a = a+w(j)*x(j)
10 continue
b = xs-a
c = 0.0
do 11 j = nleft, nrt
c = c+w(j)*(x(j)-a)**2
11 continue
if (sqrt(c) .le. .001*range) goto 13
b = b/c
c points are spread out enough to compute slope
do 12 j = nleft, nrt
w(j) = w(j)*(b*(x(j)-a)+1.0)
12 continue
13 continue
14 ys = 0.0
do 15 j = nleft, nrt
ys = ys+w(j)*y(j)
15 continue
16 return
end