[R] Using $ accessor in GAM formula

Gavin Simpson gavin.simpson at ucl.ac.uk
Fri May 6 19:18:23 CEST 2011


On Fri, 2011-05-06 at 11:20 -0500, Gene Leynes wrote:
> Hmmm....
> 
> After reading that email four times, I think I see what you mean.
> 
> Checking for variables within particular scopes is probably one of the most
> challenging things in R, and I would guess in other languages too.  In R
> it's compounded by situations when you're writing a function to accept
> variables as either a stand alone global variable, or as an element of a
> data.frame or list.

Dear Gene,

> I think this is a new problem, and I'll switch to the lengthier
> data.frame[,'x'] syntax in gam for now.

No, No, No, No, No!!!!!

If there is one thing you **should** take from this thread is that there
is no need to perform subsetting like that in a model formula.

Why would you want (or prefer) to do:

gam(dat$y ~ s(dat$x))

or

gam(dat[, "y"] ~ s(dat[, "x"]))

when

gam(y ~ s(x), dat)

will suffice?

> By the way, about the $ accessors.  I can see why some people don't like
> them, but they are a part of the language.  And, I think they're a good
> part.  They make the code much more readable, and I have yet to make a
> mistake using them (which makes me think that they can be used
> responsibly).  Making code harder to read is its own source of error, and is
> not something that I take lightly!

If you use R's formula notation properly, you'll get cleaner code than
either of your suggestions.

> Thanks for the replies.  And, thank you Rolf for the detailed analysis.  Do
> you think that your or I should submit a bug report to the package
> maintainer?  I'm not sure how that works.  Very few of my challenges turn
> out to be actual bugs, but I think this one is.

I would consider this a bug - but in the sense that Simon didn't foresee
the strange formulas that users of his software might concoct.

By the way, I think you perhaps meant Berwin (re the detailed analysis)?

HTH

G

> On Fri, May 6, 2011 at 4:53 AM, Berwin A Turlach
> <Berwin.Turlach at gmail.com>wrote:
> 
> > G'day Rolf,
> >
> > On Fri, 06 May 2011 09:58:50 +1200
> > Rolf Turner <rolf.turner at xtra.co.nz> wrote:
> >
> > > but it's strange that the dodgey code throws an error with gam(dat1$y
> > > ~ s(dat1$x))  but not with gam(dat2$cf ~ s(dat2$s))
> >
> > > Something a bit subtle is going on; it would be nice to be able to
> > > understand it.
> >
> > Well,
> >
> > R> traceback()
> > 3: eval(expr, envir, enclos)
> > 2: eval(inp, data, parent.frame())
> > 1: gam(dat$y ~ s(dat$x))
> >
> > So the lines leading up to the problem seem to be the following from
> > the gam() function:
> >
> >        vars <- all.vars(gp$fake.formula[-2])
> >        inp <- parse(text = paste("list(", paste(vars, collapse = ","),
> >            ")"))
> >        if (!is.list(data) && !is.data.frame(data))
> >            data <- as.data.frame(data)
> >
> >
> >
> > Setting
> >
> > R> options(error=recover)
> >
> > running the code until the error occurs, and then examining the frame
> > number for the gam() call shows that "inp" is
> > "expression(list( dat1,x ))" in your first example and
> > "expression(list( dat2,s ))" in your second example.  In both
> > examples, "data" is "list()" (not unsurprisingly).  When,
> >
> >        dl <- eval(inp, data, parent.frame())
> >
> > is executed, it tries to eval "inp", in both cases "dat1" and "dat2"
> > are found, obviously, in the parent frame.  In your first example "x" is
> > (typically) not found and an error is thrown, in your second example an
> > object with name "s" is found in "package:mgcv" and the call to eval
> > succeeds.  "dl" becomes a list with two components, the first being,
> > respectively, "dat1" or "dat2", and the second the body of the function
> > "s".  (To verify that, you should probably issue the command
> > "debug(gam)" and step through those first few lines of the function
> > until you reach the above command.)
> >
> > The corollary is that you can use the name of any object that R will
> > find in the parent frame, if it is another data set, then that data
> > set will become the second component of "inp".  E.g.:
> >
> > R> dat=data.frame(min=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
> > R> gam(dat$cf ~ s(dat$min))
> >
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > dat$cf ~ s(dat$min)
> >
> > Estimated degrees of freedom:
> > 3.8925  total = 4.892488
> >
> > GCV score: 0.002704789
> >
> > Or
> >
> > R> dat=data.frame(BOD=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
> > R> gam(dat$cf ~ s(dat$BOD))
> >
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > dat$cf ~ s(dat$BOD)
> >
> > Estimated degrees of freedom:
> > 3.9393  total = 4.939297
> >
> > GCV score: 0.002666985
> >
> > > Just out of pure academic interest. :-)
> >
> > Hope your academic curiosity is now satisfied. :)
> >
> > HTH.
> >
> > Cheers,
> >
> >        Berwin
> >
> > ========================== Full address ============================
> > A/Prof Berwin A Turlach               Tel.: +61 (8) 6488 3338 (secr)
> > School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
> > The University of Western Australia   FAX : +61 (8) 6488 1028
> > 35 Stirling Highway
> > Crawley WA 6009                e-mail: Berwin.Turlach at gmail.com
> > Australia                        http://www.maths.uwa.edu.au/~berwin
> >
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list