[R] Looking for a sort of tapply() to data frames

Thu Dec 15 14:21:37 CET 2005

On 12/15/05, January Weiner <january at uni-muenster.de> wrote:
> Hello again,
>
> On 12/14/05, Thomas Lumley <tlumley at u.washington.edu> wrote:
> > You want
> >
> > by(df[,-1], df$Day, function.that.means.each.column)
>
> OK, slowly :-) I don't understand it.
>
> - why df[,-1] and not df? don't we lose the df$Day entries?

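You don't get them as a column, but you do get them as the
component labels (the names of the by() result). Equivalently, you can
pass the whole data frame and drop the Day column inside the function: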

   by(df, df$Day, function(x) colMeans(x[,-1]))

If you rbind the components together, you get a matrix with the days as
the row names (wrap it in as.data.frame() if you need a data frame):

  do.call("rbind", by(df, df$Day, function(x) colMeans(x[,-1])))
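
Here is a minimal, self-contained sketch of those two steps on the example data from this thread, plus one more line that moves the row names into an ordinary Day column, since that is the shape you said you need. The object names res, m and out are only for illustration.

```r
# Example data from this thread
df <- data.frame(Day = c("Tue", "Tue", "Tue", "Wed", "Wed"),
                 val1 = 1:5, val2 = 3 * (1:5))

# One colMeans() vector per day; the days become the component labels
res <- by(df, df$Day, function(x) colMeans(x[, -1]))
names(res)                      # "Tue" "Wed"

# Stack the per-day vectors: a numeric matrix with the days as row names
m <- do.call("rbind", res)

# Move the row names into a proper Day column with plain 1, 2, ... row names
out <- data.frame(Day = rownames(m), m, row.names = NULL)
out
```

With 6000 items instead of two days the code is the same; out simply has 6000 rows.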

>
> (by the way, why does typeof(df) show "list"? I thought that
> read.table() returns a data frame?)

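I think you want class(df), which shows it's a data frame. A data frame
is stored as a list of columns, so typeof(), which reports the underlying
storage type, says "list".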
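
A quick check on a small made-up data frame (illustrative only) shows the difference between the two functions:

```r
df <- data.frame(Day = c("Tue", "Wed"), val1 = 1:2)

typeof(df)      # "list": the storage type, one list element per column
class(df)       # "data.frame": the class that methods dispatch on
is.list(df)     # TRUE
length(df)      # 2, the number of columns
```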

>
> > so all you need to do is write  function.that.means.each.column()
> > In this case there is a built-in function, colMeans, so you don't even
> > have to write it.
>
> Hmmmmm, I tried it and it did not work. That is, it works - but not as
> intended :-).
>
> Fake example:
>
> > df <- data.frame(Day=c("Tue","Tue","Tue", "Wed", "Wed"), val1=seq(1,5), val2=3*seq(1,5))
> > df
>  Day val1 val2
> 1 Tue    1    3
> 2 Tue    2    6
> 3 Tue    3    9
> 4 Wed    4   12
> 5 Wed    5   15
> > ddf <- by(df[,-1], df$Day, colMeans)
> > ddf
> df$Day: Tue
> val1 val2
>   2    6
> ------------------------------------------------------------
> df$Day: Wed
> val1 val2
>  4.5 13.5
> > ddf$Day
> NULL
> > ddf$val1
> NULL
>
> In real data, instead of "days", I have around 6000 items, so I need
> them to be in one column called "Days" (or whatever).  OK. So correct
> me if I understand wrongly what is happening here:
>
> by() divides df in data frame subsets and applies a function
> (colMeans) to each of them.  The result of colMeans ... manual says
> that colMeans returns the following:
>
>     A numeric or complex array of suitable size, or a vector if the
>     result is one-dimensional.  The 'dimnames' (or 'names' for a
>     vector result) are taken from the original array.
>
> ...which doesn't tell me much.  typeof(colMeans(...)) tells me
> "double" but I think it lies. OK, let's assume it is a vector (should
> be, I assume the result is one-dimensional, as I can hardly imagine a
> multidimensional result).
>
> So in the end I have a list with as many columns as I have days, and
> in each column I have a vector with N named dimensions, where N is the
> number of variables in the original data frame bar one.  But what I
> would like to have is a data frame with exactly the same column names,
> and rows being just a summary.  And no clue how to convert one in the
> other :-)
>
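
On the colMeans() question: typeof() is not lying. A small sketch on the example data (the names v and ddf are illustrative) shows that colMeans() returns a plain named double vector, and that the by() result is labelled by day rather than by column, which is why ddf$val1 gives NULL:

```r
df <- data.frame(Day = c("Tue", "Tue", "Tue", "Wed", "Wed"),
                 val1 = 1:5, val2 = 3 * (1:5))

v <- colMeans(df[, -1])
typeof(v)       # "double": colMeans() really returns plain doubles
is.vector(v)    # TRUE: a named numeric vector, not a matrix
names(v)        # "val1" "val2"

ddf <- by(df[, -1], df$Day, colMeans)
ddf[["Tue"]]    # the colMeans() vector for Tue
ddf$val1        # NULL: the components are named after days, not columns
```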
> > More generally (eg the approach would work for medians as well)
> >
> > by(df[,1], df$Day, function(today) apply(today, 2, mean))
>
> Huh? why is it df[,1] now? I think I'm completely lost.

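  df[,1] and df$Day both refer to the same first column, so df[,1] in
  that call is presumably a typo for df[,-1] (all columns except Day), as
  in the first by() call. apply(today, 2, mean) then averages each
  remaining column, and you can swap mean for median or any other
  function that returns one number per column.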
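
With df[,-1] the apply() version works for any column summary. A sketch on the thread's example data, using median() (the names meds and m are illustrative):

```r
df <- data.frame(Day = c("Tue", "Tue", "Tue", "Wed", "Wed"),
                 val1 = 1:5, val2 = 3 * (1:5))

# Drop the Day column, then summarise each remaining column per day
meds <- by(df[, -1], df$Day, function(today) apply(today, 2, median))

# Same stacking trick as with colMeans: one row per day
m <- do.call("rbind", meds)
m
```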

>
> > Finally, you could just use aggregate().
>
> Probably, yes.  As soon as I figure out how to use it, that is :-) (an

   aggregate(df[,-1], df[,1,drop = FALSE], mean)

or

   aggregate(df[,-1], list(Day = df$Day), mean)

The second argument of aggregate() must be a list, which is why we used
drop = FALSE in the first call (a one-column data frame is a list, whereas
df[,1] alone would be a plain vector) and an explicit list in the second.
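
Both calls, run on the example data from this thread as a minimal sketch (a1 and a2 are illustrative names), return a data frame with Day as an ordinary column, which is the shape you were after:

```r
df <- data.frame(Day = c("Tue", "Tue", "Tue", "Wed", "Wed"),
                 val1 = 1:5, val2 = 3 * (1:5))

# 'by' given as a one-column data frame (a data frame is a list) ...
a1 <- aggregate(df[, -1], df[, 1, drop = FALSE], mean)

# ... or as an explicit named list; the name becomes the grouping column
a2 <- aggregate(df[, -1], list(Day = df$Day), mean)

a2   # columns Day, val1, val2 with one row per day
```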

Another alternative is to use summaryBy from the doBy package found
at http://genetics.agrsci.dk/~sorenh/misc/ :

   library(doBy)
   summaryBy(cbind(val1, val2) ~ Day, data = df)

> hour later: OK, I got it! yuppie!)  However what I really needed was
> something like this:
>
> ddf <- by(df[,-1], df$Day, function(z) { return(cor(z$val1,z$val2)) ; } )
>
> (but I still don't know how to convert it to a friendly data frame...)
>

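   data.frame(Day = names(ddf), cor = as.vector(ddf))

Because cor() returns a single number for each day, by() simplifies ddf
to a numeric array labelled by day rather than a list, so
do.call("rbind", ddf) would stop with an error here (do.call() needs a
list). Build the data frame from the names and values directly instead.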
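
Alternatively, if you would rather keep the do.call("rbind", ...) idiom, you can stop by() from simplifying. This sketch assumes a recent R in which by() accepts a simplify argument; naming the returned value cor gives the matrix column a name (the names m and out are illustrative):

```r
df <- data.frame(Day = c("Tue", "Tue", "Tue", "Wed", "Wed"),
                 val1 = 1:5, val2 = 3 * (1:5))

# simplify = FALSE keeps one list component per day even though each
# result is a single number, so do.call("rbind", ...) gets a list
ddf <- by(df[, -1], df$Day,
          function(z) c(cor = cor(z$val1, z$val2)),
          simplify = FALSE)

m <- do.call("rbind", ddf)          # 2 x 1 matrix, column "cor"
out <- data.frame(Day = rownames(m), m, row.names = NULL)
out
```

The same pattern extends to functions that return several named numbers per day; each name becomes a column of out.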

> Thanks for the answers!
>
> January
>
> --
> ------------ January Weiner 3  ---------------------+---------------
> Division of Bioinformatics, University of Muenster  |  Schloßplatz 4
> (+49)(251)8321634                                   |  D48149 Münster
> http://www.uni-muenster.de/Biologie.Botanik/ebb/    |  Germany
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>