# [R] scatter.smooth warning "k-d tree" --> "loess bowels"

Martin Maechler maechler at stat.math.ethz.ch
Fri Sep 5 16:36:45 CEST 2003

```
>>>>> "Peter" == Peter Flom <flom at ndri.org>
>>>>>     on Thu, 04 Sep 2003 13:28:12 -0400 writes:

Peter> Hello When I run

Peter> scatter.smooth(jitter(weight), jitter(height2), span
Peter>                = .25, evaluation = 50, pch = '.')

Peter> I get the type of graph I thought I would get, but
Peter> also a warning.....

(and not an "error" as said in the original Subject)

Peter> k-d tree limited by memory. ncmax= 528

Peter> I always get concerned when there are warnings I
Peter> don't understand.  What's a k-d tree?  Is this
Peter> something to be concerned about?

scatter.smooth() builds on loess() and the reference in
help(loess) is chapter 8 of "the white book",

W.S. Cleveland, E. Grosse and W.M. Shyu (1992) Local regression
models. Chapter 8 of _Statistical Models in S_ eds J.M. Chambers
and T.J. Hastie, Wadsworth & Brooks/Cole.

Specifically, Section 8.4.2, pp. 373-376, is what you need here.
You can learn that a k-d tree is the data structure used to
represent a particular kind of "rpart()"-like partitioning of
the predictor space.
(The fun part is in the subsection "Error Messages from the Bowels of Loess"
where you learn why you can even get an error message  "Chernobyl! ...")

---

The warning means that the loess() approximation will be a bit
rougher than might be desired, since the fitted surface is by
default interpolated from the k-d tree; help(loess.control) has

>> Usage:
>>
>>      loess.control(surface = c("interpolate", "direct"),
>>                    statistics = c("approximate", "exact"),
>>                    trace.hat = c("exact", "approximate"),
>>                    cell = 0.2, iterations = 4, ...)
>>
>> Arguments:
>>
>>  surface: should the fitted surface be computed exactly or via
>>           interpolation from a kd tree?

By setting surface = "direct" you will certainly get rid of the
above warning, but probably pay a (too) big performance penalty.
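
A minimal sketch of that comparison, calling loess() directly rather
than scatter.smooth(). The data are simulated and the names 'x', 'y',
'fit.kd' and 'fit.dir' are illustrative assumptions, not Peter's
variables; whether the warning shows up at all depends on the data.

     ## Hypothetical simulated data, for illustration only
     set.seed(1)
     x <- runif(500)
     y <- sin(2 * pi * x) + rnorm(500, sd = 0.3)

     ## Default: surface = "interpolate" (fit interpolated from a k-d tree)
     fit.kd  <- loess(y ~ x, span = 0.25, degree = 1)

     ## Exact computation at every point, no k-d tree interpolation
     fit.dir <- loess(y ~ x, span = 0.25, degree = 1,
                      control = loess.control(surface = "direct"))

     ## Largest difference between the two fitted curves
     max(abs(fitted(fit.kd) - fitted(fit.dir)))

Wrapping each loess() call in system.time() gives a rough idea of the
extra cost of surface = "direct" on your own data.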

Unfortunately the Fortran code underlying loess() is pretty messy
(with many dozens of subroutines called ehg125(), ehg126(), ....)
so that it's not obvious how to improve it to adapt memory usage
to the size of the k-d tree used.  I'm pretty sure that today's
computers would allow much larger trees than the loess()