[R] Looking for a sort of tapply() to data frames

Frank E Harrell Jr f.harrell at vanderbilt.edu
Fri Dec 16 18:04:41 CET 2005


Gabor Grothendieck wrote:
> On 12/16/05, January Weiner <january at uni-muenster.de> wrote:
> 
>>Hi,
>>
>>On 12/15/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>>
>>>You don't get them as a column but you get them as the
>>>component labels.
>>>
>>>   by(df, df$Day, function(x) colMeans(x[,-1]))
>>>
>>>If you convert it to a data frame you get them as the rownames:
>>>
>>>  do.call("rbind", by(df, df$Day, function(x) colMeans(x[,-1])))
>>
>>Thanks! That helps a lot.  But I still run into problems with this.
>>Sorry for bothering you with newbie questions; if my problems are
>>trivial, please point me to a suitable guide (I did read the introductory
>>materials on R).
>>
>>First: it works for colMeans, but it does not work for a function like this:
>>
>>do.call("rbind", by(df, df$Day, function(x) cor(df$val1, df$val2)))
> 
> 
> There are a number of problems:
> 
> 1. the function does not depend on x and therefore will return the
> same result for each day group.
> 
> 2. although ?by says it returns a list, it apparently simplifies the result,
> contrary to the documentation, in certain cases.  Try this:
> 
> do.call("rbind", as.list(by(df, df$Day, function(x) cor(x$val1, x$val2))))
> 
> or this:
> 
> do.call("rbind", by(df, df$Day, function(x) list(cor = cor(x$val1, x$val2))))
> 
> 
> 3. In your sample data val1 is constant for Wed so you won't be able
> to get a correlation.  That's the source of the warning that you get
> when running the line in #2.
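
To illustrate points 1 and 2 above, here is a minimal sketch in base R.
The data frame is made-up toy data (the original df is not shown in this
thread), and split() with sapply() is used here as an alternative to by(),
not as a description of what by() does internally:

```r
# Hypothetical toy data standing in for the poster's df.
df <- data.frame(
  Day  = c("Tue", "Tue", "Tue", "Wed", "Wed", "Wed"),
  val1 = c(1, 2, 3, 2, 4, 7),
  val2 = c(2, 4, 7, 9, 5, 1)
)

# One data frame per day, named by the day.
by_day <- split(df, df$Day)

# Wrong: the function ignores x, so every group gets the overall correlation.
wrong <- sapply(by_day, function(x) cor(df$val1, df$val2))

# Right: use only the rows of the current group.
right <- sapply(by_day, function(x) cor(x$val1, x$val2))

print(wrong)
print(right)
```

The result is a plain named vector, so there is no need for do.call() at
all.  If val1 is constant within a group, cor() returns NA for that group
and issues a warning, as noted in point 3.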
> 
> 
>>it says "Error in do.call(....) : second argument must be a list". I
>>do not understand this, as the second argument is "b" of the class
>>"by", as it was in the case of colMeans, so it did not change...?
>>
>>Second: in the case of colMeans (where it works) it returns a matrix, and
>>I have trouble converting it back to a data.frame so that I can access
>>blah$Day.  Instead, I have something like this:
> 
> 
> Try blah[,"Day"] which works with both matrices and data frames.
> 
> 
>>>do.call("rbind",b)
>>
>>   V2 V3 V4 V5       V7
>>Tue 19 15  2  0 1.538462
>>Wed  5  3  6  1 1.285714
> 
> 
> 
> Another possibility is to coerce it to a data frame:
> 
> as.data.frame(do.call("rbind", b))
> 
> or change your function to return a list.
> 
> 
>>...and I do not know how to access, for example, the values for "Tue",
>>except with [1,] -- which is somewhat problematic.  For example, I
>>would like to display the 3 days for which V7 is highest.  How can I
>>do that?
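
One possible way to do this in base R is sketched below.  The data are
made up (columns val1 and val2 stand in for V2 ... V7), and the object
names m, res and top are only for illustration:

```r
# Hypothetical toy data; val2 plays the role of V7 in the question.
df <- data.frame(
  Day  = rep(c("Mon", "Tue", "Wed", "Thu"), each = 2),
  val1 = c(1, 3, 2, 6, 5, 5, 0, 2),
  val2 = c(10, 20, 1, 3, 7, 9, 30, 50)
)

b <- by(df, df$Day, function(x) colMeans(x[, -1]))
m <- do.call("rbind", b)   # numeric matrix; the days are the row names

# Rows of a matrix can be selected by name rather than by position.
m["Tue", ]

# Move the row names into a proper Day column of a data frame.
res <- data.frame(Day = rownames(m), m, row.names = NULL)

# The 3 days with the highest mean val2.
top <- head(res[order(res$val2, decreasing = TRUE), ], 3)
print(top)
```

Once Day is an ordinary column, res$Day works as expected and the usual
order() and head() idiom picks out the top rows.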
>>
>>
>>>I think you want class(df) which shows it's a data frame.
>>
>>Oops. Sorry, I didn't guess it from the manual :-)
>>
>>
>>>   aggregate(df[,-1], df[,1,drop = FALSE], mean)
>>
>>But why is df[,1,drop=FALSE] a list?  I don't get it...
> 
> 
> Because df[,1,drop = FALSE] is a one-column data frame, and data frames
> are lists.  Had we not specified drop = FALSE, the result would have been
> simplified automatically to a plain vector (a non-list), since only one
> column is selected.  We do not want that simplification here.
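
A small sketch of the difference, using made-up data (the real df is not
shown in this thread).  It also checks that the two aggregate() calls
quoted in this message give the same result on this toy example:

```r
# Hypothetical toy data.
df <- data.frame(
  Day  = c("Tue", "Tue", "Wed", "Wed"),
  val1 = c(1, 3, 5, 7),
  val2 = c(2, 2, 4, 8)
)

one_col <- df[, 1, drop = FALSE]   # still a data frame, hence a list
dropped <- df[, 1]                 # simplified to a plain vector

is.data.frame(one_col)   # TRUE
is.list(one_col)         # TRUE
is.list(dropped)         # FALSE

# aggregate() wants a list for its 'by' argument; both forms supply one.
a1 <- aggregate(df[, -1], df[, 1, drop = FALSE], mean)
a2 <- aggregate(df[, -1], list(Day = df$Day), mean)
print(a1)
```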
> 
> 
>>>   aggregate(df[,-1], list(Day = df$Day), mean)
>>
>>Yeah, I figured out that one.
>>
>>
>>>Another alternative is to use summaryBy from the doBy package found
>>>at http://genetics.agrsci.dk/~sorenh/misc/ :
>>>
>>>   library(doBy)
>>>   summaryBy(cbind(var1, var2) ~ Day, data = df)
>>
>>I think I am not confident enough with the basic data types in R; I
>>need to understand them before I move on to specialized packages :-)
>>Again, thanks a lot,
>>January

You might want to look at the summarize function in the Hmisc package.
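
A rough sketch of what that could look like, on made-up data and assuming
Hmisc is installed (the call is skipped otherwise).  The column name
passed as stat.name is just a choice for this example:

```r
# Hypothetical toy data; not the poster's actual df.
df <- data.frame(
  Day  = c("Tue", "Tue", "Tue", "Wed", "Wed", "Wed"),
  val1 = c(1, 2, 3, 2, 4, 6)
)

s <- NULL
# Only runs if the Hmisc package is available.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # One row per Day, with the group mean of val1 in a column named "val1".
  s <- Hmisc::summarize(df$val1, Hmisc::llist(Day = df$Day), mean,
                        stat.name = "val1")
  print(s)
}
```

The result is an ordinary data frame with the grouping variable as a
column, which avoids the matrix-with-row-names issue discussed above.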

Frank

>>
>>--
>>------------ January Weiner 3  ---------------------+---------------
>>Division of Bioinformatics, University of Muenster  |  Schloßplatz 4
>>(+49)(251)8321634                                   |  D48149 Münster
>>http://www.uni-muenster.de/Biologie.Botanik/ebb/    |  Germany
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>
> 
> 
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



