[R] problem with lapply(x, subset, ...) and variable select argument

Peter Dalgaard p.dalgaard at biostat.ku.dk
Tue Oct 11 11:36:05 CEST 2005


"Dimitris Rizopoulos" <dimitris.rizopoulos at med.kuleuven.be> writes:

> As Gabor said, the issue here is that subset.data.frame() evaluates 
> the value of the `select' argument in the parent.frame(); thus, if you 
> create a local function within lapply() (or sapply()) it works:

It's more complicated than that: it evaluates the select argument in a
named list whose names duplicate those of the data frame (each entry
holding that column's position), and *then* in parent.frame(). This is
convenient for command-line use, because you can specify ranges of
variables as in

  dfsub <- subset(dfr,select=c(sex:treat, x_pre:x_24))

but it is quite risky to do this inside a function: if you're passing
in a variable, the result depends on whether the data frame has a
column of the same name! You can probably get around it using
substitute() constructions, but I think it is safer to avoid calling
functions with nonstandard evaluation semantics from inside other
functions.
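
To make the hazard concrete, here is a minimal sketch. The helper
names pick, pick2 and pick3 and the data frame z are made up for
illustration; they are not from the thread. It shows the name clash,
then plain "[" indexing as one way to sidestep it, and one possible
substitute() construction.

```r
# Two small data frames, as in the example quoted below.
x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))

# Hypothetical helper: passes its argument n on to subset().
# subset() looks up `n` among the column names first, and only
# then in the calling frame.
pick <- function(y, n) subset(y, select = n)

# No column is called "n", so the function argument is found.
names(pick(x[[1]], "a"))   # "a"

# Here a column happens to be called "n": the same call silently
# selects that column instead of the requested one.
z <- data.frame(a = 1, n = 2)
names(pick(z, "a"))        # "n"

# Plain indexing uses standard evaluation and is unaffected.
pick2 <- function(y, n) y[, n, drop = FALSE]
names(pick2(z, "a"))       # "a"
res <- lapply(x, pick2, n = "a")

# One possible substitute() construction: splice the *value* of n
# into the call before it is evaluated, so subset() never sees a
# symbol that could clash with a column name.
pick3 <- function(y, n) eval(substitute(subset(y, select = N), list(N = n)))
names(pick3(z, "a"))       # "a"
```

The "[" version is the one I would reach for inside a function; the
substitute() version is only there to show that a workaround exists.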
 
 
> tt <- function (n) {
>     x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
>     print(lapply(x, function(y, n) subset(y, select = n), n = n))
>     print(sapply(x, function(y, n) subset(y, select = n), n = n))
> }
> 
> tt("a")
> 
> 
> I hope it helps.
> 
> Best,
> Dimitris
> 
> ----
> Dimitris Rizopoulos
> Ph.D. Student
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
> 
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/(0)16/336899
> Fax: +32/(0)16/337015
> Web: http://www.med.kuleuven.be/biostat/
>      http://www.student.kuleuven.be/~m0390867/dimitris.htm
> 
> 
> 
> ----- Original Message ----- 
> From: "joerg van den hoff" <j.van_den_hoff at fz-rossendorf.de>
> To: "Gabor Grothendieck" <ggrothendieck at gmail.com>; "Thomas Lumley" <tlumley at u.washington.edu>
> Cc: "r-help" <r-help at stat.math.ethz.ch>
> Sent: Tuesday, October 11, 2005 10:18 AM
> Subject: Re: [R] problem with lapply(x, subset,...) and variable select argument
> 
> 
> > Gabor Grothendieck wrote:
> >> The problem is that subset looks into its parent frame but in this
> >> case the parent frame is not the environment in tt but the
> >> environment in lapply since tt does not call subset directly but
> >> rather lapply does.
> >>
> >> Try this which is similar except we have added the line beginning
> >> with environment before the print statement.
> >>
> >> tt <- function (n) {
> >>    x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
> >>    environment(lapply) <- environment()
> >>    print(lapply(x, subset, select = n))
> >> }
> >>
> >> n <- "b"
> >> tt("a")
> >>
> >> What this does is create a new version of lapply whose
> >> parent is the environment in tt.
> >>
> >>
> >> On 10/10/05, joerg van den hoff <j.van_den_hoff at fz-rossendorf.de> wrote:
> >>
> >>>I need to extract identically named columns from several data
> >>>frames in a list. the column name is a variable (i.e. not known in
> >>>advance). the whole thing occurs within a function body. I'd like
> >>>to use lapply with a variable 'select' argument.
> >>>
> >>>
> >>>example:
> >>>
> >>>tt <- function (n) {
> >>>   x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
> >>>   for (xx in x) print(subset(xx, select = n))   ### works
> >>>   print (lapply(x, subset, select = a))   ### works
> >>>   print (lapply(x, subset, select = "a"))  ### works
> >>>   print (lapply(x, subset, select = n))  ### does not work as intended
> >>>}
> >>>n = "b"
> >>>tt("a")  #works (but selects not the intended column)
> >>>rm(n)
> >>>tt("a")   #no longer works in the lapply call including variable 'n'
> >>>
> >>>
> >>>question: how can I enforce evaluation of the variable n such that
> >>>the lapply call works? I suspect it has something to do with eval
> >>>and specifying the correct evaluation frame, but how? ....
> >>>
> >>>
> >>>many thanks
> >>>
> >>>joerg
> >>>
> >>>______________________________________________
> >>>R-help at stat.math.ethz.ch mailing list
> >>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>PLEASE do read the posting guide! 
> >>>http://www.R-project.org/posting-guide.html
> >>>
> >>
> >>
> >
> > many thanks to thomas and gabor for their help. both solutions solve
> > my problem perfectly.
> >
> > but just as an attempt to improve my understanding of the inner
> > workings of R (similar problems are sure to come up ...) two more
> > questions:
> >
> > 1.
> > why does the call of the "[" function (thomas' solution) behave
> > differently from "subset" in that the look up of the variable "n"
> > works without providing lapply with the current environment (which
> > is nice)?
> >
> > 2.
> > using 'subset' in this context becomes more cumbersome, if sapply is
> > used. it seems that then I need
> > ...
> > environment(sapply) <- environment(lapply) <- environment()
> > sapply(x, subset, select = n)
> > ...
> > to get it working (and that means you must know that sapply uses
> > lapply). or can I somehow avoid the additional explicit definition
> > of the lapply-environment?
> >
> >
> > again: many thanks
> >
> > joerg
> >
> > 
> 
> 
> 

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907



