[R] strange behavior of loess() & predict()

Wed Dec 7 00:41:14 CET 2005

_______________________________________________________________________________________

The problem appears to be in how your original data has several tied values:

> table(x)
x
1.8   2 2.2 2.4 2.6 2.8   3 3.2 3.4 3.6   4 
  1   2   2   2   5   7   2   3   1   2   1

IIRC the maths and programming behind loess assume unique values for the predictor.

One way to get around this is to jitter your data:

> x2 <- jitter(x)
> modj <- loess(y ~ x2, span=.5, degree=1)
> predict(modj, data.frame(x=X))
 [1] 3.156192 3.141705 3.126918 3.112996 3.101108 3.087696 3.063471 3.038609 3.024639 3.032585 3.059480 3.091774
[13] 3.115763 3.117743 3.092979 3.040798 2.988283 2.957976 2.950648 3.008358 3.070065 3.127379 3.193501 3.149428
[25] 3.082843 3.010998 2.939407 2.888213 2.841487 2.812815 2.801583 2.807181 2.837887 2.899130 2.978165 3.062088
[37] 3.137995 3.204628 3.271813 3.339450 3.407396 3.475510 3.543843 3.612450 3.681267 3.750227 3.819267 3.888321
[49] 3.957324 4.026212

Another way is to summarise your data using table() and aggregate(), and fit a weighted model where the weights are the counts for each unique x-value: 

> dtab <-  aggregate(data.frame(y=y), by=list(x=x), FUN=mean)
> dtab$x <- as.numeric(as.character(dtab$x))
> dtab$w <- table(x)
> modt <- loess(y ~ x, span=.5, degree=1, weights=w, data=dtab)
> predict(modt, data.frame(x=X))
 [1] 3.186959 3.163133 3.136244 3.110822 3.091396 3.076705 3.047705 3.018362 3.007143 3.032246 3.069599 3.092369
[13] 3.098049 3.084134 3.053633 3.027429 3.012429 3.013908 3.036517 3.060372 3.076116 3.086870 3.095758 3.097287
[25] 3.073824 3.031238 2.976659 2.917402 2.863489 2.821469 2.796398 2.793336 2.823850 2.892363 2.980322 3.068725
[37] 3.140843 3.208920 3.279124 3.351965 3.427952 3.504330 3.577149 3.647119 3.714984 3.781486 3.847369 3.913375
[49] 3.980249 4.048733

There's probably a way to make the aggregate and table calls neater.

-- 
Hong Ooi
Senior Research Analyst, IAG Limited
388 George St, Sydney NSW 2000
+61 (2) 9292 1566
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Leo Gürtler
Sent: Wednesday, 7 December 2005 8:10 AM
To: r-help at stat.math.ethz.ch
Cc: gavin.simpson at ucl.ac.uk
Subject: Re: [R] strange behavior of loess() & predict()

Gavin Simpson wrote:

Dear list,

I am very sorry for being inaccurate in my question. But re-reading the 
predict.loess help site does not provide a solution. As long as predict 
is used on a new dataset based on this dataset, the strange values 
remain and can be reproduced.
Adding a new element to both vectors (at the beginning, e.g. "1" for 
each vector) results in plausible values - but not in every case.
Even switching x and y is sufficient (i.e. x as predictor and y as 
dependent variable). So my question is:

Is it normal - or under which conditions does it take place - that 
predict.loess predicts values that are almost 20000/max(y) ~ 5000 times 
higher than expected?

best,

leo gürtler

>On Tue, 2005-12-06 at 18:09 +0100, Leo Gürtler wrote:
>  
>
>>Dear altogether,
>>    
>>
><snip>
>  
>
>># here is the difference!!
>>predict(mod, data.frame(x=X), se=TRUE)
>>predict(mod, x=X, se=TRUE)
>>
>>
>><--- end of snip --->
>>
>>I assume this has some reason but I do not understand this reason.
>>Merci,
>>    
>>
>
>Not sure if this is the reason, but there is no argument x in
>predict.loess, and:
>
>a <- predict(mod, se = TRUE)
>
>gives you the same results as:
>
>b <- predict(mod, x=X, se=TRUE)
>
>so the x argument appears to be being passed on/in the ... arguments and
>ignored? As such, you have no newdata, so mod$x is used.
>
>Now, when you do:
>
>c <- predict(mod, data.frame(x=X), se=TRUE)
>
>You have used an un-named argument in position 2. R takes this to be
>what you want to use for newdata and so works with this data rather than
>the one in mod$x as in the first case:
>
># now named second argument - gets ignored as in a and b
>d <- predict(mod, x = data.frame(x=X), se=TRUE)
>
>all.equal(a, b) # TRUE
>all.equal(a, c) # FALSE
>all.equal(a, d) # TRUE
>
># this time we assign X to x by using (), the result is used as newdata
>e <-  predict(mod, (x=X), se=TRUE)
>
>all.equal(c, e) # TRUE
>
>If in doubt, name your arguments and check the help! ?predict.loess
>would have quickly shown you where the problem lay.
>
>HTH
>
>G
>
>  
>
>>best regards
>>
>>leo gürtler
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>    
>>

-- 

email: leog at anicca-vijja.de
www: http://www.anicca-vijja.de/

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

_______________________________________________________________________________________

The information transmitted in this message and its attachme...{{dropped}}