[R] loess crash

Liaw, Andy andy_liaw at merck.com
Mon Sep 16 22:16:49 CEST 2002


I agree with John mostly.  For a model as complicated as the one you're
trying to fit with loess, you might as well try things like ppr (in the
`modreg' package), MARS (in the `mda' package) or neural nets (in the `nnet'
package), or even randomForest...  Actually MARS might offer a bit more
interpretability than the others, because of its hierarchical construction.
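
As a rough sketch of the ppr option: the code below simulates data like the
example quoted further down and fits a two-term projection pursuit
regression.  The seed, the object names (dat, fit_ppr) and nterms = 2 are
illustrative assumptions, not anything from this thread, and the sketch
assumes a current R, where ppr() lives in the `stats' package rather than in
`modreg'.

```r
# Sketch only: simulated data in the spirit of the quoted example.
set.seed(1)                       # arbitrary seed, for repeatability
n <- 500
dat <- as.data.frame(matrix(runif(n * 5), n, 5,
                            dimnames = list(NULL, paste0("x", 1:5))))
dat$y <- with(dat, 3 + 2*x1 + 15*x2 + 13*x3 - 8*x4 + 14*x5) + rnorm(n)

# Projection pursuit regression with two ridge terms (nterms is a guess,
# not a tuned value).
fit_ppr <- ppr(y ~ x1 + x2 + x3 + x4 + x5, data = dat, nterms = 2)

summary(fit_ppr)                  # projection directions and term weights
pred <- predict(fit_ppr, newdata = dat)
```

Unlike loess, ppr has no limit on the number of predictors, but the fitted
ridge functions are not much easier to interpret than a loess surface.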

If you do care about `marginal effects' of the predictors, then aren't you
sort of assuming additivity?  In which case the additive model is more
appropriate.  If not, the `marginal effects' can be misleading.
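
A minimal sketch of the additive alternative, using gam() from mgcv on
simulated data like the example quoted further down.  If additivity holds,
each smooth term is (up to a constant) the marginal effect of its predictor,
whatever values the other predictors take.  The seed, the object names and
the grid for x1 are illustrative assumptions.

```r
library(mgcv)                     # provides gam() and s()

# Sketch only: simulated data in the spirit of the quoted example.
set.seed(1)                       # arbitrary seed, for repeatability
n <- 500
dat <- as.data.frame(matrix(runif(n * 5), n, 5,
                            dimnames = list(NULL, paste0("x", 1:5))))
dat$y <- with(dat, 3 + 2*x1 + 15*x2 + 13*x3 - 8*x4 + 14*x5) + rnorm(n)

# Additive model: one smooth term per predictor, no interactions.
fit_gam <- gam(y ~ s(x1) + s(x2) + s(x3) + s(x4) + s(x5), data = dat)

# Estimated contribution of x1 over a grid; the other predictors are held
# at 0.5, but under additivity their values do not change the s(x1) term.
grid <- data.frame(x1 = seq(0, 1, length.out = 50),
                   x2 = 0.5, x3 = 0.5, x4 = 0.5, x5 = 0.5)
eff_x1 <- predict(fit_gam, newdata = grid, type = "terms")[, "s(x1)"]
```

Calling plot(fit_gam, pages = 1) would draw all five estimated effects at
once.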

In terms of comparing a loess with 5 terms with a less complicated model, I
think it needs to be pointed out that (AFAIK) it can only be done on a more
or less qualitative level, as the models are not nested.
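
One informal way to put numbers on such a comparison is held-out prediction
error; this is a swapped-in illustration, not a formal test and not
something proposed in this thread.  The sketch uses only four predictors on
the assumption that loess in current R accepts at most four; the seed, the
400/100 split and the object names are also assumptions.

```r
# Sketch only: simulated data with four predictors.
set.seed(1)                       # arbitrary seed, for repeatability
n <- 500
dat <- as.data.frame(matrix(runif(n * 4), n, 4,
                            dimnames = list(NULL, paste0("x", 1:4))))
dat$y <- with(dat, 3 + 2*x1 + 15*x2 + 13*x3 - 8*x4) + rnorm(n)

train <- dat[1:400, ]             # rows are already in random order
test  <- dat[401:500, ]

fit_lo <- loess(y ~ x1 + x2 + x3 + x4, data = train, degree = 1)
fit_lm <- lm(y ~ x1 + x2 + x3 + x4, data = train)

# Root mean squared prediction error on the held-out rows.  na.rm = TRUE
# because loess may return NA for points outside the range of the
# training data.
rmse <- function(fit) {
  sqrt(mean((test$y - predict(fit, newdata = test))^2, na.rm = TRUE))
}
res <- c(loess = rmse(fit_lo), lm = rmse(fit_lm))
res
```

The two numbers give a rough feel for how much the extra flexibility buys,
but a single split is noisy and the comparison remains descriptive.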

Cheers,
Andy

> -----Original Message-----
> From: John Fox [mailto:jfox at mcmaster.ca]
> Sent: Monday, September 16, 2002 1:59 PM
> To: jdeke2 at comcast.net
> Cc: r-help at stat.math.ethz.ch
> Subject: RE: [R] loess crash
> 
> 
> Dear John,
> 
> It's true that the gam function in mgcv fits with splines while loess uses
> local regression, but an even more fundamental difference is that gam fits
> additive models (though, with some care, you can include higher-dimensional
> terms). Given your description of what you plan to do with the fitted
> model, an additive model might be what you want.
>
> More generally, a model that fits five-way interactions may be useful as a
> point of comparison for simpler models, but I doubt that it will provide a
> digestible description of the data.
> 
> I hope that this helps,
>   John
> 
> At 10:45 AM 9/16/2002 -0400, you wrote:
> >Thanks for the suggestion. I've only used splines for density estimation
> >before -- I've never used them for regression (although I'm aware that
> >people do). I'll look into it...
> >
> >
> >-----Original Message-----
> >From: Rafael A. Irizarry [mailto:ririzarr at jhsph.edu]
> >Sent: Monday, September 16, 2002 10:17 AM
> >To: jdeke2 at comcast.net
> >Cc: 'r-help at stat.math.ethz.ch'
> >Subject: RE: [R] loess crash
> >
> >
> >i would suggest looking at the package mgcv.
> >you can fit generalized additive models which are useful for what
> >you describe below.
> >
> >On Mon, 16 Sep 2002, John Deke wrote:
> >
> > > Ah... I hadn't noticed that option! Thanks... that's a good idea. I'm
> > > quite happy to use local linear regression.
> > >
> > > To answer your question -- perhaps I'm off base, but my reason for
> > > wanting to do this is that I have a set of explanatory variables that
> > > most likely influence my dependent variable in ways that are difficult
> > > to model parametrically. That is, I suspect that there are all sorts of
> > > complementary relationships between these variables, and it's not at
> > > all clear that there's a satisfying theoretical model that would
> > > suggest a clear-cut parametric relationship. So, rather than using
> > > parametric regression, I'd like to try something non-parametric.
> > >
> > > My plan for summarizing the results is to find the average marginal
> > > effect of each explanatory variable of interest, holding all else
> > > constant. Also, I would calculate predicted outcomes for combinations
> > > of the explanatory variables that are most likely to occur in "the
> > > real world".
> > >
> > > John
> > >
> > > -----Original Message-----
> > > From: John Fox [mailto:jfox at mcmaster.ca]
> > > Sent: Monday, September 16, 2002 9:31 AM
> > > To: John Deke
> > > Cc: r-help at stat.math.ethz.ch
> > > Subject: Re: [R] loess crash
> > >
> > >
> > > Dear John,
> > >
> > > For curiosity, I tried your example under R 1.5.1 on an 800 MHz PC with
> > > 512 Mb of memory running Windows 2000. The results were just as you
> > > described: The four-predictor problem ran essentially instantly, and
> > > the five-predictor problem crashed R, again instantly.
> > >
> > > I also tried making the problem less computationally demanding by
> > > specifying locally linear, rather than quadratic, fits; this appears to
> > > work:
> > >
> > >  > loess(y~x1+x2+x3+x4+x5, data2, degree=1)
> > > Call:
> > > loess(formula = y ~ x1 + x2 + x3 + x4 + x5, data = data2, degree = 1)
> > >
> > > Number of Observations: 500
> > > Equivalent Number of Parameters: 13.5
> > > Residual Standard Error: 1.012
> > >  >
> > >
> > >
> > > Although something is obviously wrong here, I wonder whether it makes
> > > sense to fit a local regression with so many predictors (unless the
> > > object is to compare the general nonparametric fit with some more
> > > constrained model): how would you describe the five-dimensional surface
> > > that's produced?
> > >
> > > John
> > >
> > > At 07:36 AM 9/16/2002 -0400, John Deke wrote:
> > > >Here's a simple example that yields the crash:
> > > >
> > > >library(modreg)
> > > >data1 <- array(runif(500*5),c(500,5))
> > > >colnames(data1) <- c("x1","x2","x3","x4","x5")
> > > >y <- 3+2*data1[,"x1"]+15*data1[,"x2"]+13*data1[,"x3"]-
> > > >     8*data1[,"x4"]+14*data1[,"x5"]+rnorm(500)
> > > >data2 <- cbind(y,data1)
> > > >data2 <- as.data.frame(data2)
> > > >result1 <- loess(y~x1+x2+x3+x4,data2)
> > > >
> > > >To get the crash, I just add x5--
> > > >
> > > >result1 <- loess(y~x1+x2+x3+x4+x5,data2)
> > > >
> > > >And bammo -- I'm dead. It doesn't even pause -- Rgui crashes, and I
> > > >mean really crashes -- the program is terminated, I get the little
> > > >Windows dialogue saying that a log file is being generated -- the
> > > >whole dramatic death scene.
> > > >
> > > >I know it's a computationally intensive thing, but the one that
> > > >doesn't crash (with four explanatory variables) runs almost instantly.
> > > >It's hard to see how adding a fifth could be so catastrophic. But I am
> > > >somewhat new to this particular methodology....
> > > >
> > > >John
> > > >
> > > >At 03:38 AM 9/16/2002, Peter Dalgaard BSA wrote:
> > > >>John Deke <jdeke2 at comcast.net> writes:
> > > >>
> > > >> > Hmm... if I reduce the number of observations to just 500, I still
> > > >> > get the error.
> > > >> >
> > > >> > I don't think it's an issue of collinearity, because I've tried
> > > >> > several different combinations of variables, all of which work
> > > >> > just fine in an OLS or logistic regression.
> > > >> >
> > > >> > I'm probably doing something stupid, but I'm not seeing it...
> > > >> >
> > > >> > At 02:00 PM 9/15/2002, John Deke wrote:
> > > >> > >Hi,
> > > >> > >
> > > >> > > I have a data frame with 6563 observations. I can run a
> > > >> > > regression with loess using four explanatory variables. If I
> > > >> > > add a fifth, R crashes. There are no missings in the data, and
> > > >> > > if I run a regression with any four of the five explanatory
> > > >> > > variables, it works. It's only when I go from four to five that
> > > >> > > it crashes.
> > > >>
> > > >>Hmm... I wouldn't try loess with more than one or two descriptors. I
> > > >>mean, it's a smoothing method and representing a smooth function of
> > > >>many variables can be computationally demanding.
> > > >>
> > > >>The Fortran source code for loess is one of the more obfuscated pieces
> > > >>of R, but I can see that some structures inside of it are of fixed
> > > >>size, which might explain it (BTW: Does R really crash, or just say
> > > >>memory exhausted?).
> > > >>
> > > >>Do you have a simple example that reproduces the crash (using random
> > > >>numbers, e.g.)?
> > >
> > > -----------------------------------------------------
> > > John Fox
> 
> ____________________________
> John Fox
> Department of Sociology
> McMaster University
> email: jfox at mcmaster.ca
> web: http://www.socsci.mcmaster.ca/jfox
> ____________________________
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


