[R] scatter.smooth warning "k-d tree" --> "loess bowels"

Fri Sep 5 16:36:45 CEST 2003

>>>>> "Peter" == Peter Flom <flom at ndri.org>
>>>>>     on Thu, 04 Sep 2003 13:28:12 -0400 writes:

    Peter> Hello When I run

    Peter> scatter.smooth(jitter(weight), jitter(height2), span
    Peter>                = .25, evaluation = 50, pch = '.')

    Peter> I get the type of graph I thought I would get, but
    Peter> also a warning.....

 (and not an "error" as said in the original Subject)

    Peter> k-d tree limited by memory. ncmax= 528

    Peter> I always get concerned when there are warnings I
    Peter> don't understand.  What's a k-d tree?  Is this
    Peter> something to be concerned about?

scatter.smooth() builds on loess() and the reference in
help(loess) is chapter 8 of "the white book",

     W.S. Cleveland, E. Grosse and W.M. Shyu (1992) Local regression
     models. Chapter 8 of _Statistical Models in S_ eds J.M. Chambers
     and T.J. Hastie, Wadsworth & Brooks/Cole.

Specifically, Section 8.4.2, pp. 373-376, is what you need here.
You can learn that a k-d tree is the data structure used to
represent a particular kind of "rpart()"-like partitioning of
the predictor space.
(The fun part is in the subsection "Error Messages from the Bowels of Loess"
 where you learn why you can even get an error message  "Chernobyl! ...")
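
To make the partitioning idea concrete, here is a rough one-dimensional
sketch, not the actual loess Fortran code.  It assumes the rule stated in
help(loess.control) that cells holding more than floor(n*span*cell)
points are subdivided, and it assumes cells are cut at the median.  The
helper count_cells() and the simulated x are made up for illustration.

```r
## Hypothetical helper (not part of R): count the leaf cells of a 1-d
## k-d-tree-like partition in which a cell is cut at its median for as
## long as it holds more than 'fc' points.
count_cells <- function(x, fc) {
  if (length(x) <= fc) return(1L)
  m <- median(x)
  left  <- x[x <= m]
  right <- x[x >  m]
  ## With heavy ties the cut may not separate anything: stop there.
  if (length(left) == 0L || length(right) == 0L) return(1L)
  count_cells(left, fc) + count_cells(right, fc)
}

set.seed(1)
x    <- runif(1000)   # simulated predictor, for illustration only
span <- 0.25
for (cell in c(0.2, 0.5, 1)) {
  fc <- floor(length(x) * span * cell)   # max. points per cell
  cat("cell =", cell, " fc =", fc,
      " leaf cells =", count_cells(x, fc), "\n")
}
```

A larger 'cell' gives fewer and bigger cells, i.e. a smaller tree and a
coarser interpolation grid.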

---

The warning means that the loess() approximation will be a bit
rougher than might be desired: by default the fitted surface is
interpolated from the k-d tree, as help(loess.control) says:

  >> Usage:
  >> 
  >>      loess.control(surface = c("interpolate", "direct"),
  >>                    statistics = c("approximate", "exact"),
  >>                    trace.hat = c("exact", "approximate"),
  >>                    cell = 0.2, iterations = 4, ...)
  >> 
  >> Arguments:
  >> 
  >>  surface: should the fitted surface be computed exactly or via
  >>           interpolation from a kd tree?

By setting surface = "direct" you will certainly get rid of the
above warning, but probably pay a (too) big performance penalty.
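
A minimal sketch of that route, calling loess() directly and drawing the
curve by hand.  The data are simulated stand-ins for the poster's
'weight' and 'height2' (the real variables are not shown in the post),
and family = "symmetric" is my assumption about what scatter.smooth()
uses by default.

```r
## Simulated stand-ins for 'weight' and 'height2' (hypothetical data)
set.seed(1)
n <- 2000
weight  <- round(rnorm(n, mean = 75, sd = 12))
height2 <- round(150 + 0.3 * weight + rnorm(n, sd = 6))
d <- data.frame(x = jitter(weight), y = jitter(height2))

## Compute the local fit exactly instead of interpolating from a k-d tree
fit <- loess(y ~ x, data = d, span = 0.25, degree = 1,
             family = "symmetric",
             control = loess.control(surface = "direct"))

## Evaluate at 50 equally spaced points, mimicking evaluation = 50
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 50))
grid$y <- predict(fit, newdata = grid)

plot(d$x, d$y, pch = ".")
lines(grid$x, grid$y)
```

With a few thousand points this should still run quickly; the penalty
grows as the data get larger.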

Unfortunately the loess-underlying Fortran code is pretty messy
(with many dozens of subroutines called ehg125(), ehg126(), ....)
so that it's not obvious how to improve it to adapt memory usage
to the size of the k-d tree used.  I'm pretty sure that today's
computers could handle much larger trees than the loess()
algorithm was designed for.

Regards,
Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><