# [R] strange behavior of loess() & predict()

Hong Ooi Hong.Ooi at iag.com.au
Wed Dec 7 00:41:14 CET 2005

```_______________________________________________________________________________________

The problem appears to be in how your original data has several tied values:

> table(x)
x
1.8   2 2.2 2.4 2.6 2.8   3 3.2 3.4 3.6   4
1   2   2   2   5   7   2   3   1   2   1

IIRC the maths and programming behind loess assume unique values for the predictor.

One way to get around this is to jitter your data:

> x2 <- jitter(x)
> modj <- loess(y ~ x2, span=.5, degree=1)
> predict(modj, data.frame(x=X))
 3.156192 3.141705 3.126918 3.112996 3.101108 3.087696 3.063471 3.038609 3.024639 3.032585 3.059480 3.091774
 3.115763 3.117743 3.092979 3.040798 2.988283 2.957976 2.950648 3.008358 3.070065 3.127379 3.193501 3.149428
 3.082843 3.010998 2.939407 2.888213 2.841487 2.812815 2.801583 2.807181 2.837887 2.899130 2.978165 3.062088
 3.137995 3.204628 3.271813 3.339450 3.407396 3.475510 3.543843 3.612450 3.681267 3.750227 3.819267 3.888321
 3.957324 4.026212

Another way is to summarise your data using table() and aggregate(), and fit a weighted model where the weights are the counts for each unique x-value:

> dtab <-  aggregate(data.frame(y=y), by=list(x=x), FUN=mean)
> dtab\$x <- as.numeric(as.character(dtab\$x))
> dtab\$w <- table(x)
> modt <- loess(y ~ x, span=.5, degree=1, weights=w, data=dtab)
> predict(modt, data.frame(x=X))
 3.186959 3.163133 3.136244 3.110822 3.091396 3.076705 3.047705 3.018362 3.007143 3.032246 3.069599 3.092369
 3.098049 3.084134 3.053633 3.027429 3.012429 3.013908 3.036517 3.060372 3.076116 3.086870 3.095758 3.097287
 3.073824 3.031238 2.976659 2.917402 2.863489 2.821469 2.796398 2.793336 2.823850 2.892363 2.980322 3.068725
 3.140843 3.208920 3.279124 3.351965 3.427952 3.504330 3.577149 3.647119 3.714984 3.781486 3.847369 3.913375
 3.980249 4.048733

There's probably a way to make the aggregate and table calls neater.

--
Hong Ooi
Senior Research Analyst, IAG Limited
388 George St, Sydney NSW 2000
+61 (2) 9292 1566
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Leo Gürtler
Sent: Wednesday, 7 December 2005 8:10 AM
To: r-help at stat.math.ethz.ch
Cc: gavin.simpson at ucl.ac.uk
Subject: Re: [R] strange behavior of loess() & predict()

Gavin Simpson wrote:

Dear list,

I am very sorry for being inaccurate in my question. But re-reading the
predict.loess help site does not provide a solution. As long as predict
is used on a new dataset based on this dataset, the strange values
remain and can be reproduced.
Adding a new element to both vectors (at the beginning, e.g. "1" for
each vector) results in plausible values - but not in every case.
Even switching x and y is sufficient (i.e. x as predictor and y as
dependent variable). So my question is:

Is it normal - or under which conditions does it take place - that
predict.loess predicts values that are almost 20000/max(y) ~ 5000 times
higher than expected?

best,

leo gürtler

>On Tue, 2005-12-06 at 18:09 +0100, Leo Gürtler wrote:
>
>
>>Dear altogether,
>>
>>
><snip>
>
>
>># here is the difference!!
>>predict(mod, data.frame(x=X), se=TRUE)
>>predict(mod, x=X, se=TRUE)
>>
>>
>><--- end of snip --->
>>
>>I assume this has some reason but I do not understand this reason.
>>Merci,
>>
>>
>
>Not sure if this is the reason, but there is no argument x in
>predict.loess, and:
>
>a <- predict(mod, se = TRUE)
>
>gives you the same results as:
>
>b <- predict(mod, x=X, se=TRUE)
>
>so the x argument appears to be being passed on/in the ... arguments and
>ignored? As such, you have no newdata, so mod\$x is used.
>
>Now, when you do:
>
>c <- predict(mod, data.frame(x=X), se=TRUE)
>
>You have used an un-named argument in position 2. R takes this to be
>what you want to use for newdata and so works with this data rather than
>the one in mod\$x as in the first case:
>
># now named second argument - gets ignored as in a and b
>d <- predict(mod, x = data.frame(x=X), se=TRUE)
>
>all.equal(a, b) # TRUE
>all.equal(a, c) # FALSE
>all.equal(a, d) # TRUE
>
># this time we assign X to x by using (), the result is used as newdata
>e <-  predict(mod, (x=X), se=TRUE)
>
>all.equal(c, e) # TRUE
>
>If in doubt, name your arguments and check the help! ?predict.loess
>would have quickly shown you where the problem lay.
>
>HTH
>
>G
>
>
>
>>best regards
>>
>>leo gürtler
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>
>>

--

email: leog at anicca-vijja.de
www: http://www.anicca-vijja.de/

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help