[Rd] [R] difference in using with() and the "data" argument in glm (PR#9338)

Gabor Grothendieck ggrothendieck at gmail.com
Fri Nov 3 16:34:34 CET 2006


One thing I noticed is that ?glm does not really specify what happens
if you do not give a value for data.  Is data then simply skipped, so that
the search takes place in environment(formula) only, or is it supposed to
default to something?  Some clarification in ?glm would be helpful.
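
A minimal sketch of the case in question. The helper name make_formula is
made up for illustration. If the lookup does fall back to
environment(formula) when data is missing, the fit below should succeed even
though y and x are not visible from the frame that calls glm():

```r
# Sketch only: variables live in the formula's environment, no `data` given.
set.seed(1)

make_formula <- function() {
  # y and x exist only in this function's frame, which becomes
  # the environment of the formula created on the last line
  y <- rnorm(10)
  x <- runif(10)
  y ~ x
}

f <- make_formula()

# y is defined in environment(f) itself, not inherited from a parent
has_y <- exists("y", envir = environment(f), inherits = FALSE)

# no data argument: y and x can only come from environment(f)
fit <- glm(f)
print(coef(fit))
```

Whether that fallback is the intended, documented behaviour is exactly the
point that ?glm could spell out.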

On 11/3/06, murdoch at stats.uwo.ca <murdoch at stats.uwo.ca> wrote:
> I've redirected this reply from r-help to the bugs list.
>
> On 11/3/2006 8:25 AM, vito muggeo wrote:
> > Dear all,
> > I am dealing with the following (apparently simple) problem:
> > For some reason I am interested in passing variables from a data frame
> > to a specific environment, and in fitting a standard glm:
> >
> > dati<-data.frame(y=rnorm(10),x1=runif(10),x2=runif(10))
> > KK<-new.env()
> > for(i in 1:ncol(dati)) assign(names(dati[i]),dati[[i]],envir=KK)
> > #Now the following two lines work correctly:
> > coef(glm(y~x1+x2,data=KK))
> > with(KK,coef(glm(y~x1+x2)))
> >
> > #However, if I write the above code inside a function, with() does not
> > #appear to work:
> >
> > ff<-function(Formula,Data,method=1){
> >      KK<-new.env()
> >      for(i in 1:ncol(Data)) assign(names(Data[i]),Data[[i]],envir=KK)
> >      o<-if(method==1) glm(Formula,data=KK) else with(KK,glm(Formula))
> >      o}
> >
> >  > ff(y~x1+x2,dati,1) #it works
> > Call:  glm(formula = Formula, data = KK)
> > ..[SNIP]..
> >  > ff(y~x1+x2,dati,2) #it does not
> > Error in eval(expr, envir, enclos) : object "y" not found
> >  >
> >
> > Could anyone explain this difference? I believed that
> > "with(data,glm(formula))" and "glm(formula,data)" were equivalent.
>
> I think this is a bug in terms.formula.  Near the end it has
>
>     environment(terms) <- environment(x)
>
> where x is the formula.  Since "y" isn't defined in that environment, it
> fails.  It would work for you with
>
>     environment(terms) <- data
>
> but see below.
>
> A workaround that should work for you is to put
>
> environment(Formula) <- KK
>
> before the call to glm.
>
> I'm not going to make the patch I suggest above, because I don't think
> it's consistent with the expected behaviour of glm() in the case where
> some of the terms in the formula are supposed to come from
> environment(x), and some from "data".
>
> I don't know how to handle that case properly:  I think it requires a
> different search strategy than R employs (but I might be wrong).  This
> isn't a problem with the workaround I suggested to you, because there
> the parent of KK is environment(x), but that wouldn't be true in general.
>
> Duncan Murdoch
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
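
For reference, the workaround quoted above (resetting environment(Formula)
before the call) can be sketched as follows. The function name ff2 and the
comparison against a plain data= fit are my own additions for illustration,
not part of the original report:

```r
# Sketch only: Duncan's suggested workaround applied to the with() branch.
set.seed(1)
dati <- data.frame(y = rnorm(10), x1 = runif(10), x2 = runif(10))

ff2 <- function(Formula, Data) {
  KK <- new.env()
  # copy each column of Data into KK under its own name
  for (nm in names(Data)) assign(nm, Data[[nm]], envir = KK)
  # workaround: make the formula look up its variables in KK
  environment(Formula) <- KK
  with(KK, glm(Formula))
}

fit_with <- ff2(y ~ x1 + x2, dati)
fit_data <- glm(y ~ x1 + x2, data = dati)

# both routes should give the same coefficients
same_coef <- isTRUE(all.equal(coef(fit_with), coef(fit_data)))
print(same_coef)
```

As noted in the quoted message, this relies on every term of the formula
living in KK or one of its parents, so it is not a general fix for formulas
that mix variables from data and from environment(formula).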


