[R] Using $ accessor in GAM formula

Berwin A Turlach Berwin.Turlach at gmail.com
Fri May 6 11:53:33 CEST 2011


G'day Rolf,

On Fri, 06 May 2011 09:58:50 +1200
Rolf Turner <rolf.turner at xtra.co.nz> wrote:

> but it's strange that the dodgey code throws an error with gam(dat1$y
> ~ s(dat1$x))  but not with gam(dat2$cf ~ s(dat2$s))

> Something a bit subtle is going on; it would be nice to be able to 
> understand it.

Well, 

R> traceback()
3: eval(expr, envir, enclos)
2: eval(inp, data, parent.frame())
1: gam(dat$y ~ s(dat$x))

So the lines leading up to the problem seem to be the following from
the gam() function:

        vars <- all.vars(gp$fake.formula[-2])
        inp <- parse(text = paste("list(", paste(vars, collapse = ","), 
            ")"))
        if (!is.list(data) && !is.data.frame(data)) 
            data <- as.data.frame(data)
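
To see what those lines build, here is a minimal sketch using only base
R.  The one-sided formulas are assumed stand-ins for
gp$fake.formula[-2] (the real object comes from mgcv internals), and
vars1, vars2 and inp1 are illustrative names, not anything in gam():

```r
# all.vars() treats `$` as an ordinary call, so the data frame name and
# the column name come back as two separate symbols.
vars1 <- all.vars(~ dat1$x)
vars2 <- all.vars(~ dat2$s)
print(vars1)
print(vars2)

# Same construction as in the gam() lines quoted above.
inp1 <- parse(text = paste("list(", paste(vars1, collapse = ","), ")"))
print(inp1)
```

Nothing here needs dat1 or dat2 to exist, since the formulas are never
evaluated; only the names are extracted.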
        


Setting

R> options(error=recover)

running the code until the error occurs, and then inspecting the frame
of the gam() call, shows that "inp" is
"expression(list( dat1,x ))" in your first example and
"expression(list( dat2,s ))" in your second example.  In both
examples, "data" is "list()" (not surprisingly).  When

	dl <- eval(inp, data, parent.frame())

is executed, it tries to evaluate "inp".  In both cases "dat1" and
"dat2", respectively, are found, obviously, in the parent frame.  In
your first example "x" is (typically) not found and an error is thrown;
in your second example an object with name "s" is found in
"package:mgcv" and the call to eval succeeds.  "dl" becomes a list with
two components, the first being, respectively, "dat1" or "dat2", and
the second the function "s" itself.  (To verify that, you should
probably issue the command "debug(gam)" and step through those first
few lines of the function until you reach the above command.)
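
The lookup can also be reproduced without mgcv.  The following is only
a sketch under a few assumptions: lookup() is a hypothetical helper
(not part of gam()), "sum" from package:base stands in for mgcv's "s",
and the session is assumed to have no object called "x" visible on the
search path:

```r
dat1 <- data.frame(x = 1:3, y = c(2.1, 3.9, 6.2))

# Hypothetical helper mimicking eval(inp, data, parent.frame())
# for the case data = list().
lookup <- function(var_names, enclos = parent.frame()) {
  force(enclos)
  inp <- parse(text = paste("list(", paste(var_names, collapse = ","), ")"))
  tryCatch(eval(inp, list(), enclos), error = function(e) conditionMessage(e))
}

res_x   <- lookup(c("dat1", "x"))    # error message, unless some "x" is visible
res_sum <- lookup(c("dat1", "sum"))  # succeeds: "sum" is found in package:base

print(res_x)
str(res_sum, max.level = 1)
```

The second result is a list whose first component is the data frame and
whose second component is a function, which mirrors what happens with
"s" when mgcv is attached.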

The corollary is that you can use the name of any object that R will
find when searching from the parent frame; if it is another data set,
then that data set will become the second component of "dl".  E.g.:

R> dat=data.frame(min=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
R> gam(dat$cf ~ s(dat$min))

Family: gaussian 
Link function: identity 

Formula:
dat$cf ~ s(dat$min)

Estimated degrees of freedom:
3.8925  total = 4.892488 

GCV score: 0.002704789

Or 

R> dat=data.frame(BOD=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
R> gam(dat$cf ~ s(dat$BOD))

Family: gaussian 
Link function: identity 

Formula:
dat$cf ~ s(dat$BOD)

Estimated degrees of freedom:
3.9393  total = 4.939297 

GCV score: 0.002666985
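
What ends up in "dl" in this last case can be checked in base R alone.
This is a sketch, not gam() itself: the noise term is dropped so the
result is deterministic, "here", "dl_empty" and "dl_data" are
illustrative names, and package:datasets is assumed to be attached (the
default).  The second half shows the lookup when the columns are
supplied through "data", as in the conventional call
gam(cf ~ s(BOD), data = dat):

```r
dat  <- data.frame(BOD = 1:100, cf = sin(1:100 / 50))
here <- environment()   # plays the role of parent.frame()

# data = list(): the name BOD is resolved along the search path.
dl_empty <- eval(parse(text = "list( dat,BOD )"), list(), here)
print(dim(dl_empty[[2]]))   # dimensions of whatever "BOD" resolved to

# data = dat: the name BOD is resolved inside the data frame first.
dl_data <- eval(parse(text = "list( BOD )"), dat, here)
print(identical(dl_data[[1]], dat$BOD))
```

In the first call the second component is the BOD data set shipped with
R, not the column dat$BOD; in the second call it is the column.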

> Just out of pure academic interest. :-)

Hope your academic curiosity is now satisfied. :)

HTH.

Cheers,

	Berwin

========================== Full address ============================
A/Prof Berwin A Turlach               Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway                   
Crawley WA 6009                e-mail: Berwin.Turlach at gmail.com
Australia                        http://www.maths.uwa.edu.au/~berwin


