[R] more on apply on data frame

Gabor Grothendieck ggrothendieck at myway.com
Sat Aug 21 14:32:58 CEST 2004


Laura Holt <lauraholt_983 <at> hotmail.com> writes:

> 
> Hi R People:
> 
> Several of you pointed out that using "tapply" on a data frame will work on 
> the iris data frame.
> 
> I'm still having a problem.
> 
> The iris data frame has 150 rows, 5 variables.  The first 4 are numeric, 
> while the last is a factor, which has the Species names.
> 
> I can use tapply for 1 variable at a time:
> >tapply(iris[,1],iris[,5],mean)
>     setosa versicolor  virginica
>      5.006      5.936      6.588
> >
> but if I try to use this for all of the first 4, I get an error:
> >tapply(iris[,1:4],iris[,5],mean)
> Error in tapply(iris[, 1:4], iris[, 5], mean) :
>         arguments must have same length


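tapply expects a vector as its first argument.  The data frame
iris[,1:4] has length 4 (its number of columns), which does not match
the 150 values in iris[,5], hence the error.  This is a job for
aggregate, which applies the function to each column within each group: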

R> data(iris)
R> aggregate(iris[,1:4], list(Species = iris[,5]), mean)

     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026
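
As a hedged aside: more recent versions of R also provide a formula
interface to aggregate, which some find easier to read.  The sketch
below assumes such a version is installed; the name "agg" is only for
illustration.  The "." on the left-hand side stands for all columns
not named on the right-hand side.

```r
# Sketch: same group means via the formula method of aggregate().
# Assumes an R version that provides aggregate(formula, data, FUN).
data(iris)

# "." means "every column other than Species"
agg <- aggregate(. ~ Species, data = iris, FUN = mean)
print(agg)
```

This should give one row per species and one column per measurement,
the same layout as the call above.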


The by function would also work, using colMeans to compute the column
means within each group:

R> by(iris[,1:4], list(Species = iris[,5]), colMeans)

Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.006        3.428        1.462        0.246 
------------------------------------------------------------ 
Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.936        2.770        4.260        1.326 
------------------------------------------------------------ 
Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       6.588        2.974        5.552        2.026
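
If you would rather stay with tapply, one possible approach (a sketch,
not the only way) is to apply it to one column at a time with sapply.
The name "means" is only for illustration.

```r
# Sketch: call tapply() once per numeric column and collect the results.
data(iris)

# sapply() iterates over the columns of the data frame; each call to
# tapply() returns a named vector with one mean per species, and
# sapply() binds those vectors into a matrix (species in rows,
# variables in columns).
means <- sapply(iris[, 1:4], function(col) tapply(col, iris[, 5], mean))
print(means)
```

The result here is a numeric matrix rather than a data frame, which
can be convenient if you want to index it by species and variable
name, e.g. means["setosa", "Sepal.Length"].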



