[R] [Fwd: Re: evaluation question]

Tue Jan 27 19:38:02 CET 2009

Gabor Grothendieck wrote:
> It looks in data and if not found there in environment(formula)
> so try this:
>
> mylm <- function(model, wghts) {
> 	lm(model, data.frame(wghts), weights = wghts)
> }
>   

won't help, i'm afraid:

wghts = 1:10
y = rnorm(10)

lm(y~wghts, weights=rep(1,10))
mylm(y~wghts, rep(1,10))

irrespectively of how ridiculous it might be to call 'wghts' a variable
subsequently used as an element in a formula, it should not matter what
names the user of mylm (or pracx below) happens to define *outside* the
function.  it just so happens that smart tricks with variable lookup may
unexpectedly break code that intuitively seems ok.

vQ

> On Sun, Jan 25, 2009 at 4:20 PM, Wacek Kusnierczyk
> <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>   
>> dear list,
>>
>> below is an edited version of my response to an r user asking me for
>> explaining some issues related to r's evaluation rules.  i find the
>> problem interesting enough to be forwarded to the list, hopefully for
>> comments from whoever may want to extend or correct my explanations.
>>
>> (i'd like to add that much as i'm happy to receive and answer offline
>> mails, questions related to r are best sent directly to the list, where
>> the real experts are.)
>>
>>
>> -------- Original Message --------
>> Subject:        Re: evaluation question
>> Date:   Sun, 25 Jan 2009 20:32:22 +0100
>>
>>
>>
>>
>>
>>
>>
>>
>> xxx wrote:
>>
>> <snip>
>>
>>     
>>> Someone sent in an example a few days ago showing that prac1 ( see
>>> below ) doesn't work. Then someone else sent two different
>>> ways of fixing it.
>>> I'm still slightly confused.
>>>       
>> <snip>
>>
>>
>>     
>>> x<-1:10;
>>> y<-rnorm(10) + x;
>>>
>>> # THIS DOES NOT WORK
>>>
>>> prac1 <- function( model,wghts){
>>>   lm( model, weights = wghts)
>>> }
>>>
>>> prac1(model = y~x, wghts = rep(1, 10))
>>>       
>> tfm:
>>
>> " the variables are taken from 'environment(formula)', typically
>>  the environment from which 'lm' is called. "
>>
>> when lm is applied to a model, the variable names used to pass arguments
>> to lm (here, 'wghts') are looked up in the environment where the model
>> was defined.  here, you have two environments:
>>
>> - the global one (say, e_g), where x, y, and prac1 are defined;
>> - the call-local one (say, e_p1), created when prac1 is applied.
>>
>> there is a variable name 'wghts' in the latter, but none in the
>> former.  just before the call, environmentwise the situation is as follows:
>>
>> e_g = { 'x':v1, 'y':v2, 'prac1':v3 }
>>
>> where e_g contains three mappings (of those we are interested here), written here as <name>:<value>, none for
>> 'wghts'.  (the v1, v2, v3 stand for the respective values, as in the
>> code above.)
>>
>> when you apply prac1, you create a new, local environment:
>>
>> e_p1 = { 'model':v4, 'wghts':v5 }
>>
>> where v4 is a promise with the expression 'y~x' and evaluation
>> environment e_g (the caller's environment), and v5 is a promise with the
>> expression 'rep(1, 10)' and evaluation environment e_g.
>>
>> when you call lm, things are a little bit more complicated.  after some
>> black magic is performed on the arguments in the lm call, weights are
>> extracted from the model using model.weights, and the lookup is
>> performed not in e_p1, but in e_g.
>>
>> rm(list=ls()) # cleanup
>> x = 1:10
>> y = rnorm(10)+x
>>
>> p1 = function(model, wghts)
>>    lm(model, weights=wghts)
>>
>> p1(y~x, rep(1,10))
>> # (somewhat cryptic) error: no variable named 'wghts' found
>>
>> wghts = rep(1,10)
>> p1(y~x, wghts)
>> # now works, e_g has a binding for 'wghts'
>> # passing wghts as an argument to p1 makes no difference
>>
>> note, due to lazy evaluation, the following won't do:
>>
>> rm(wghts) # cleanup
>>
>> p1(y~x, wghts<-rep(1,10))
>> # wghts still not found in e_g
>>
>>
>> if you happen to generalize your p1 over the additional arguments to be
>> passed to lm, ugly surprizes await, too:
>>
>> p2 = function(model, ...) {
>>    # some additional code
>>    lm(model, ...) }
>> p2(y~x, weights=rep(1,10))
>> # (rather cryptic) error
>>
>>
>> if you want to fit a model with different sets of weights, the following
>> won't do:
>>
>> rm(wghts) # cleanup
>> lapply(
>>   list(rep(1,10), rep(c(0.5, 1.5), 5)), # alternative weight vectors
>>   function(weights) p1(y~x, weights))
>> # wghts not found in e_g, as before
>>
>> but this, incidentally, will work:
>>
>> rm(wghts) # cleanup
>> lapply(
>>   list(rep(1,10), rep(c(0.5, 1.5), 5)),
>>   function(wghts) p1(y~x, wghts))
>> # wghts found in e_g, not in e_p1
>>
>> as will this:
>>
>> rm(wghts) # cleanup
>> lapply(
>>   list(rep(1,10), rep(c(0.5, 1.5), 5)),
>>   function(wghts) p1(y~x))
>> # wghts found in e_g
>>
>> but obviously not this:
>>
>> rm(wghts) # cleanup
>> lapply(
>>   list(rep(1,10), rep(c(0.5, 1.5), 5)),
>>   function(weights) p1(y~x))
>> # wghts not found
>>
>>
>>
>>     
>>> # SOLUTION # 1
>>>
>>> prac2 <- function( model,wghts){
>>>  environment(model) <- environment()
>>>  lm(model,weights = wghts)
>>> }
>>>
>>> prac2(model = y~x, wghts = rep(1, 10))
>>>       
>> environment() returns the local call environment (see e_p1 above), where
>> 'wghts' is mapped to a promise to evaluate rep(1,10) in e_g.  you set
>> the environment of model to e_p1, so that lm looks for wghts there --
>> and finds it.
>>
>> this is an 'elegant' workaround, with possible disastrous
>> consequences if the model happens to include a variable named 'model' or
>> 'wghts':
>>
>> model = 1:10
>> prac2(y~model, rep(1,10))
>> # can't use model in a formula?
>>
>> wghts = x
>> prac2(y~wghts, rep(1,10))
>> # oops, not quite the same prac2(y~x, rep(1,10))
>>
>> another problem with this 'elegant' 'solution' is that if prac_ happens to have
>> local variables with names in conflict with names in the model formula, you're
>> in trouble again:
>>
>> prac2 = function(model, wghts) {
>>    environment(model) = environment()
>>    x = NULL # for whatever reason one might need an x here
>>    # whatever
>>    lm(model, weights = wghts) }
>>
>> prac2(y~x, rep(1,10))
>> # oops, NULL is not good an x in the model
>>
>> these may be unlikely scenarios, but the issue is serious.  you need to
>> understand the details of how lm is implemented in order to understand
>> why your (intuitively correct) example above did not work, to understand
>> why and when your variables can be captured in unexpected ways, and to
>> know how to write 'elegant' 'solutions' to avoid the problems.
>>
>>
>> this also means that writing general purpose modules to be used by others is likely
>> to cause errors (or silently produce rubbish) if you're not careful enough.
>> (or else you need to prepare your code to handle all exotic situations such as
>> passing weights to prac1 together with a model where one of the variables is
>> named 'model' or 'wghts' -- this is point 2 in a recent comment by Greg Snow,
>> which, as he says, stinks hubris.
>>
>>     
>>> # SOLUTION # 2
>>>
>>> prac3 <- function( model,wghts){
>>>   cur.env <- environment()
>>>   lm( model, weights = wghts, data = cur.env )
>>> }
>>>
>>> prac3(model = y~x, wghts = rep(1, 10))
>>>
>>>
>>>       
>> this is an equally 'good' 'solution', with the above comments equally
>> applicable here.  you'd have to tell the user of your modules which variable
>> names are persona non grata.
>>
>> vQ
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>     
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   

-- 
-------------------------------------------------------------------------------
Wacek Kusnierczyk, MD PhD

Email: waku at idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060