[R] What is the most useful way to detect nonlinearity in lo

Mon Dec 6 03:26:02 CET 2004

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of 
> Ted.Harding at nessie.mcc.ac.uk
> Sent: Sunday, December 05, 2004 7:14 PM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] What is the most useful way to detect 
> nonlinearity in lo
> 
> 
> On 05-Dec-04 Peter Dalgaard wrote:
> > (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:
> > 
> >> >> x <- runif(500)
> >> >> y <- rbinom(500,size=1,p=plogis(x))
> >> >> xx <- predict(loess(resid(glm(y~x,binomial))~x),se=T)
> >> >> matplot(x,cbind(xx$fit, 2*xx$se.fit, -2*xx$se.fit),pch=20)
> >> >> 
> >> >> Not sure my money isn't still on the splines, though.
> > .....
> >> > Serves me right for posting way beyond my bedtime...
> >> 
> >> Hi Peter,
> >> 
> >> Yes, the above is certainly misleading (try it with 2000 instead
> >> of 500)! But what would you suggest instead?
> > 
> > (I did and this little computer came tumbling down...). 
> 
> So did mine -- but at 5000 (which is the value I first tried):
> lots of disk grinding and then it went "prprprprp" and wrote
> words to the effect "Calloc cannot allocate (18790050 times 4)"
> i.e. it needed 72MB, which bankrupted my 192MB baby.
> 
> 2000 was OK, however, but I had plenty of time for a meal etc.
> before it finished.
> 
> Which brings up that predict(loess(....)) seems to be very
> memory-hungry.

locfit to the rescue, perhaps?

> library(locfit)
> n <- 5000
> x <- sort(runif(n))
> y <- rbinom(n, size=1, p=plogis(x))
> system.time(xx <- predict(locfit(resid(glm(y~x, binomial))~x),
+                           where="data", se=TRUE), gcFirst=TRUE)
[1] 0.79 0.00 0.84   NA   NA
> matplot(x, cbind(xx$fit, 2*xx$se.fit, -2*xx$se.fit), pch=20)

[The plot looks strange...]

This is on my laptop (mobile Pentium 1.6GHz, 512MB).  With loess it also ran
out of memory at n=5000.  At n=2000, the loess route took just under 3 seconds.
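
For comparison, here is a minimal sketch of the spline route mentioned earlier in the thread. It skips the residual smoothing altogether and just compares the linear logistic fit with one that uses a natural spline in x. The seed, the df=4 for ns() and the names fit0, fit1 and cmp are my own choices for illustration, not anything from the earlier posts.

```r
library(splines)                       # ns(): natural cubic spline basis

set.seed(1)                            # arbitrary seed, for reproducibility only
n <- 5000
x <- sort(runif(n))
y <- rbinom(n, size = 1, prob = plogis(x))

fit0 <- glm(y ~ x, family = binomial)              # linear in x on the logit scale
fit1 <- glm(y ~ ns(x, df = 4), family = binomial)  # df = 4 is an assumed choice

# Likelihood-ratio (analysis of deviance) test of the spline fit against
# the linear fit; the linear model is nested in the natural-spline model.
cmp <- anova(fit0, fit1, test = "Chisq")
print(cmp)
```

Since the data are simulated from a model that is linear on the logit scale, a small p-value here would only occur by chance. With real data, a small p-value would suggest some departure from linearity.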

Cheers,
Andy

> > Basically, I'd reconsider the type= option to residuals.glm. As I said,
> > at least type="response" should have the right mean. Ideally, you'd
> > want to take advantage of the fact that the variance of the residuals
> > is known too, rather than have the smoother estimate it. The more I
> > think, the more I like the splines...
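
A rough sketch of what that might look like, assuming the same simulated data as above. The response residuals are y minus the fitted probability, and for a logistic fit with an intercept they average to zero. Dividing by sqrt(p*(1-p)) uses the known binomial variance, which gives the Pearson residuals. I have swapped in lowess() from the stats package as a cheap smoother purely for illustration; it is not what was used earlier in the thread, and the seed and the names fit, r, p, rp and sm are made up here.

```r
set.seed(1)                            # arbitrary seed, for reproducibility only
n <- 5000
x <- sort(runif(n))
y <- rbinom(n, size = 1, prob = plogis(x))

fit <- glm(y ~ x, family = binomial)
r   <- residuals(fit, type = "response")   # y - fitted probability
p   <- fitted(fit)                         # fitted probabilities

mean(r)                                # essentially zero (canonical link, intercept)

# Standardise with the known binomial variance p*(1-p);
# this reproduces residuals(fit, type = "pearson").
rp <- r / sqrt(p * (1 - p))

# Smooth the response residuals against x (lowess is an assumed stand-in).
sm <- lowess(x, r)
plot(sm, type = "l", xlab = "x", ylab = "smoothed response residual")
abline(h = 0, lty = 2)
```

If the linear logit is adequate, the smoothed curve should wander around the zero line without any systematic trend.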
> 
> I'll have a go at your suggestions (if I can get the syntax right ... )
> 
> Thanks,
> Ted.
> 
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
> Date: 06-Dec-04                                       Time: 00:13:53
> ------------------------------ XFMail ------------------------------