[R] Odp: question about "mean"

Allan Engelhardt allane at cybaea.com
Tue Jun 15 18:08:20 CEST 2010


This solution also seems to be the fastest of the proposed options for 
this data set:

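library("rbenchmark")
# Note: mean() on a data frame worked in R when this was posted; in R >= 3.0.0
# it returns NA with a warning, so substitute colMeans in the sapply variant.
benchmark(columns = c("test", "elapsed", "relative"), order = "elapsed",
          apply = apply(iris[, -5], 2, tapply, iris$Species, mean),
          with = with(iris, rowsum(iris[, -5], Species) / table(Species)),
          aggregate = aggregate(iris[, -5], list(iris[, 5]), mean),
          sapply = sapply(split(iris[, 1:4], iris$Species), mean))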
#        test elapsed relative
# 4    sapply   0.148 1.000000
# 1     apply   0.248 1.675676
# 2      with   0.310 2.094595
# 3 aggregate   0.313 2.114865
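
Before comparing timings it is worth confirming that the four expressions
agree on the result. The following is a minimal sketch, not part of the
benchmark above. It assumes a current R, so colMeans stands in for mean in
the split/sapply variant, and the rowsum variant is written on a matrix.
The names as_plain, m_apply, m_rowsum, m_aggregate and m_sapply are made up
for illustration.

```r
# Per-species means of the iris measurements, computed four ways (base R only).
m_apply     <- apply(iris[, -5], 2, tapply, iris$Species, mean)
m_rowsum    <- rowsum(as.matrix(iris[, -5]), iris$Species) /
               as.vector(table(iris$Species))
m_aggregate <- aggregate(iris[, -5], list(Species = iris[, 5]), mean)
m_sapply    <- sapply(split(iris[, 1:4], iris$Species), colMeans)

# Helper (hypothetical name): reduce a result to a bare numeric matrix so that
# only the values are compared, not the dimnames.
as_plain <- function(x) unname(as.matrix(x))

# m_aggregate carries the grouping column first; m_sapply is variables x species.
stopifnot(
  isTRUE(all.equal(as_plain(m_apply), as_plain(m_rowsum))),
  isTRUE(all.equal(as_plain(m_apply), as_plain(m_aggregate[, -1]))),
  isTRUE(all.equal(as_plain(m_apply), as_plain(t(m_sapply))))
)
```

Note that the split/sapply result comes back transposed (one column per
species), which does not matter for the timing but does for the layout the
original poster asked for.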

However, the 'with/rowsum/table' option proposed by Bill Venables 
appears to scale better:

i <- rbind(iris, iris, iris, iris, iris)
i <- rbind(i, i, i, i, i)
i <- rbind(i, i, i, i, i)
i <- rbind(i, i, i, i, i)
NROW(i)
# [1] 93750
benchmark(columns=c("test", "elapsed", "relative"), order="elapsed",
           apply=apply(i[, -5], 2, tapply, i$Species, mean),
           with=with(i, rowsum(i[, -5], Species)/table(Species)),
           aggregate=aggregate(i[,-5],list(i[,5]),mean),
           sapply=sapply(split(i[,1:4], i$Species), mean))
#        test elapsed  relative
# 2      with   2.708  1.000000
# 4    sapply   5.189  1.916174
# 3 aggregate  15.990  5.904727
# 1     apply  31.646 11.686115
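
The with/rowsum/table variant relies on a group mean being a group sum
divided by a group count: rowsum() produces all the group sums in one call
and table() gives the counts. That single grouped pass may be part of why it
scales well here, though I have not profiled it. A minimal sketch on a tiny
made-up data set (x, g, sums, counts and means are hypothetical names):

```r
# Two numeric columns, four rows, two groups of two rows each.
x <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40),
            ncol = 2, dimnames = list(NULL, c("a", "b")))
g <- factor(c("u", "u", "v", "v"))

sums   <- rowsum(x, g)          # one row of column sums per group
counts <- as.vector(table(g))   # group sizes, in the same (level) order
means  <- sums / counts         # recycles down columns: row i / counts[i]

# Cross-check one column against the tapply approach.
stopifnot(isTRUE(all.equal(as.vector(means[, "a"]),
                           as.vector(tapply(x[, "a"], g, mean)))))
```

The division works row by row only because R recycles the count vector down
each column; the counts must therefore be in the same order as the rows
returned by rowsum().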

(Because I care about these things...)

Allan

On 10/06/10 09:44, Petr PIKAL wrote:
> Hi
>
> split/sapply can also be used, among other options:
>
> sapply(split(iris[,1:4], iris$Species), mean)
>
> Regards
> Petr
>
> r-help-bounces at r-project.org wrote on 10.06.2010 00:43:29:
>
>> Hi there:
>>       I have a question about computing the mean values of a data.frame.
>> Take the iris data for example: suppose I have a data.frame that looks
>> like the following:
>> ---------------------
>>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
>> 1          5.1         3.5          1.4         0.2  setosa
>> 2          4.9         3.0          1.4         0.2  setosa
>> 3          4.7         3.2          1.3         0.2  setosa
>> .            .           .            .           .       .
>> .            .           .            .           .       .
>> .            .           .            .           .       .
>> -----------------------
>> There are three different species in this table. I want to make a table
>> that gives the mean value for each species, like the following:
>>
>> -----------------
>>                 Sepal.Length Sepal.Width Petal.Length Petal.Width
>> mean.setosa            5.006       3.428        1.462       0.246
>> mean.versicolor        5.936       2.770        4.260       1.326
>> mean.virginica         6.588       2.974        5.552       2.026
>> -----------------
>> Is there a shorter syntax that can do this? I mean shorter than the code
>> I wrote, which is as follows:
>>
>> attach(iris)
>> mean.setosa<-mean(iris[Species=="setosa", 1:4])
>> mean.versicolor<-mean(iris[Species=="versicolor", 1:4])
>> mean.virginica<-mean(iris[Species=="virginica", 1:4])
>> data.mean<-rbind(mean.setosa, mean.versicolor, mean.virginica)
>> detach(iris)
>> ------------------
>>
>> Thanks a million!!!
>>
>>
>> -- 
>> =====================================
>> Shih-Hsiung, Chou
>> System Administrator / PH.D Student at
>> Department of Industrial Manufacturing
>> and Systems Engineering
>> Kansas State University
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


