[R] mean of subset of rows

darteta001 at ikasle.ehu.es darteta001 at ikasle.ehu.es
Tue Oct 2 11:58:10 CEST 2007


Thank you all for your answers. I have decided to use aggregate(), but I 
will keep tapply() in mind. I was wondering whether it is possible to tell 
aggregate() to use two functions at the same time, i.e., mean() and sd(), 
or whether it is better to call aggregate() twice, once for mean and once 
for sd, and then cbind() both results.

Also, if the data.frame now has three columns, ID-size-Town, as 
follows:

>data<-data.frame(ID=c(rep(letters[1:4],2),rep("f",12)), size=runif(20),
+ Town=c(rep(LETTERS[1:2],4),rep("C",12)))

and I want to produce a table with the mean size for each ID, keeping 
Town in the result table, I cannot manage to add the Town for each ID:

>avgs <-aggregate(data$size, by = list(data$ID), mean)
>SDs <-aggregate(data$size, by = list(data$ID), sd)

>results = cbind(avgs,SDs[2],data$Town)
Error in data.frame(..., check.names = FALSE) : 
        arguments imply differing number of rows: 5, 20
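The error arises because avgs and SDs have one row per ID (5 rows), while data$Town has one entry per original row (20). A possible fix, sketched here under the assumption that each ID belongs to exactly one Town (true for the example data), is to put Town into the `by` list and join the two summaries with merge() instead of cbind(). The `mean`/`sd` column names and the fixed seed are illustrative.

```r
set.seed(1)
data <- data.frame(ID = c(rep(letters[1:4], 2), rep("f", 12)),
                   size = runif(20),
                   Town = c(rep(LETTERS[1:2], 4), rep("C", 12)))

# Group by both ID and Town. Each ID occurs in only one Town here,
# so this still gives one row per ID (5 rows), now with Town kept.
groups <- list(ID = data$ID, Town = data$Town)
avgs <- aggregate(data$size, by = groups, FUN = mean)
SDs  <- aggregate(data$size, by = groups, FUN = sd)

# Rename the value columns, then join on the grouping columns
# instead of cbind()-ing objects with different numbers of rows.
names(avgs)[names(avgs) == "x"] <- "mean"
names(SDs)[names(SDs) == "x"]   <- "sd"
results <- merge(avgs, SDs, by = c("ID", "Town"))
print(results)
```

If an ID could appear in several towns, this would return one row per ID/Town combination rather than one row per ID.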


Thanks again!

David

> You were on the right track with the for loop, but often you can do  
> the same thing looplessly (I know, it's not really a word) in R:
> 
> If your data is like this:
> 
> data<-data.frame(ID=rep(letters[1:4], 5), size=runif(20))
> 
> then apply either
> 
> tapply(data$size, data$ID, mean)
> 
> or
> 
> aggregate(data$size, list(data$ID), mean)
> 
> For further reference, section 4.2 in "An Introduction to R"  
> describes using tapply in this way.
> 
> Jeff.
> 
> On Oct 1, 2007, at 11:57 AM, <darteta001 at ikasle.ehu.es>  
> <darteta001 at ikasle.ehu.es> wrote:
> 
> > Dear list,
> > this must be an easy one:
> >
> > I have a data.frame of two columns, "ID" with four different levels (A
> > to D) and numerical "size", and each of the 4 different IDs is
> > repeated a different number of times. I would like to get the mean
> > size for each ID as another data.frame. I have tried the following:
> >
> >> ID= as.character(unique(data[,1])) # I use unique() because "data" will be larger in future
> >> nIDs = length(ID)
> >> for(i in 1:nIDs){
> > +  subdata = subset(data,V1==ID[i])
> > +  average = as.data.frame(cbind(1:i,ID[i],mean(subdata[,2]))
> > + }
> >
> > Unfortunately, my output only gets the last level of ID four times:
> >> average
> >      V1 V2               V3
> > 1  1  D 179.777777777778
> > 2  2  D 179.777777777778
> > 3  3  D 179.777777777778
> > 4  4  D 179.777777777778
> >
> > How can I get what I need? There might be an easier way to do it, but
> > I guess my skills aren't that good. Any suggestions are welcome.
> >
> > Regards,
> >
> > David
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
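In the quoted loop, `average` is reassigned on every pass, so only the last iteration (ID "D") survives, and cbind(1:i, ID[i], ...) repeats that single ID on all i rows; cbind() of numbers with a character string also yields a character matrix. A minimal corrected loop, using made-up data with the column names "ID" and "size" from the question description (the quoted code refers to V1, so adjust the names if the real data uses V1/V2), preallocates one row per ID and fills it in:

```r
# Hypothetical example data: four IDs repeated different numbers of times.
data <- data.frame(ID = rep(c("A", "B", "C", "D"), times = c(3, 5, 2, 4)),
                   size = c(1, 2, 3, 10, 20, 30, 40, 50, 7, 9,
                            100, 200, 300, 400))

ids <- unique(as.character(data$ID))
# Preallocate one row per ID; filling row i keeps earlier rows intact,
# whereas reassigning `average` inside the loop discards them.
average <- data.frame(ID = ids, mean_size = NA_real_)
for (i in seq_along(ids)) {
  subdata <- data[data$ID == ids[i], ]
  average$mean_size[i] <- mean(subdata$size)  # stays numeric, unlike cbind()
}
print(average)
```

The loopless tapply() and aggregate() calls in the reply compute the same numbers with less code.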


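The two calls suggested in the reply compute the same group means but return different shapes: tapply() gives a named one-dimensional array, aggregate() a data.frame with columns Group.1 and x. A small check, using the fixed sizes 1:20 instead of runif() so the means are predictable (a to d get 9, 10, 11 and 12); the names `m_tapply`, `m_aggr` and `m_df` are illustrative:

```r
# Fixed values instead of runif() so the results can be checked by hand.
data <- data.frame(ID = rep(letters[1:4], 5), size = 1:20)

# tapply(): named one-dimensional array, one element per ID.
m_tapply <- tapply(data$size, data$ID, mean)

# aggregate(): data.frame with grouping column Group.1 and value column x.
m_aggr <- aggregate(data$size, list(data$ID), mean)

# The tapply() result rearranged into the same data.frame layout.
m_df <- data.frame(Group.1 = names(m_tapply), x = as.vector(m_tapply))
print(m_tapply)
print(m_aggr)
```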

More information about the R-help mailing list