[R] Using 'by()' in a function

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Apr 28 19:59:28 CEST 2000


> From: Setzer.Woodrow at epamail.epa.gov
> Date: Fri, 28 Apr 2000 13:06:42 -0400
> Subject: [R] Using 'by()' in a function
> To: r-help at hypatia.math.ethz.ch
> Content-disposition: inline
> X-Lotus-FromDomain: EPA
> 
> 
> 
> I have a list of dataframes, and want to apply a function to subsets of the 
rows
> of each dataframe.  It seemed natural to write a function that takes a 
dataframe
> as an argument, and uses 'by() within it to apply the function to the 
dataframe
> subsets.  However, I cannot get it to work.  The problem seems to be passing 
the
> data argument of by() as a function argument.  Is this bug, or am I missing
> something (or both)?
> 
> > ### Generate some test data
> > Test <- vector("list",2)
> > Test[[1]] <-
> data.frame(Dose=rep(c(0,1),c(10,10)),Resp1=rnorm(20),Resp2=rnorm(20))
> > ### The summary function
> > sumfun <- function(z)
> + {
> +   by(data=z,
> +      INDICES=list(factor(z[,"Dose"])),
> +      FUN=function(y)
> +      {
> +        apply(as.matrix(y[,c("Resp1","Resp2")]),2,
> +              function(x)c(Mean=mean(x),SD=sqrt(var(x))))
> +      }
> +      )
> + }
> > ### Using by works by itself
> > by(data=Test[[1]],
> +    INDICES=list(factor(Test[[1]][,"Dose"])),
> +    FUN=function(y)
> +    {
> +      apply(as.matrix(y[,c("Resp1","Resp2")]),2,
> +            function(x)c(Mean=mean(x),SD=sqrt(var(x))))
> +    }
> +    )
> : 0
>           Resp1      Resp2
> Mean -0.2426571 -0.1024979
> SD    0.9203455  0.9988352
> ------------------------------------------------------------
> : 1
>           Resp1     Resp2
> Mean -0.1632326 0.1079938
> SD    1.4124645 0.8793081
> > ### But not in a function
> > sumfun(Test[[1]])
> [1] "data.frame"
> Error in nrow(z) : Object "z" not found
> >

It's a scoping problem (and a bug). Alter by.data.frame to be

function (data, INDICES, FUN, ...) 
{
    if (!is.list(INDICES)) {
        IND <- vector("list", 1)
        IND[[1]] <- INDICES
        names(IND) <- deparse(substitute(INDICES))
    }
    else IND <- INDICES
    FUNx <- function(x) FUN(data[x, ], ...)
    nd <- nrow(data)
    ans <- eval(substitute(tapply(1:nd, IND, FUNx)), 
        data)
    attr(ans, "call") <- match.call()
    class(ans) <- "by"
    ans
}



-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list