[R] aggregate data.frame by one column

Andrew Robinson A.Robinson at ms.unimelb.edu.au
Fri Jun 30 05:25:04 CEST 2006


Hi Wei-Wei,

try this:

eva.agg <- aggregate(x = list(
                       VC1=eva$VC1,
                       EO1=eva$EO1,
                       EO2=eva$EO2,
                       EO3=eva$EO3,
                       EO4=eva$EO4,
                       EO5=eva$EO5
                       ),
                     by = list(PARTNO=eva$PARTNO),
                     FUN = mean, na.rm = TRUE)
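
The extra na.rm = TRUE argument is passed by aggregate() through to FUN, so each group mean is computed over the non-missing values only. A minimal illustration, using the EO1 values of PARTNO 114004 from the question (x is just a scratch name, not part of the original code):

```r
x <- c(2, 5, NA)        # EO1 values for PARTNO 114004
mean(x)                 # NA: a single missing value makes the mean missing
mean(x, na.rm = TRUE)   # 3.5: the NA is dropped before averaging
```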

# aggregate() returns a data frame; keep only its count column, named "x"
eva.agg$NUM <- aggregate(eva$PARTNO, list(eva$PARTNO), length)$x
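
For anyone who wants to run this end to end, here is a self-contained sketch of the same approach on the first eight rows of the posted data. The names eva.small, grp, means, counts and result are made up for illustration. Adding IND to the grouping list is an assumption that IND is constant within each PARTNO, as it is in the sample, and merge() is used so the counts are matched on the group keys rather than on row order.

```r
# First eight rows of the posted data (three PARTNO groups).
eva.small <- data.frame(
  IND    = c(114, 114, 114, 112, 112, 112, 112, 112),
  PARTNO = c(114001, 114001, 114001, 112002, 112002, 112003, 112003, 112003),
  VC1    = rep(2, 8),
  EO1    = c(5, 4, 4, 3, 1, 6, 5, 6),
  EO2    = c(4, 4, NA, 3, 1, 6, 7, 6),
  EO3    = c(4, 4, NA, 6, 3, 6, 6, 6),
  EO4    = c(5, 4, NA, 2, 4, 5, 6, 4),
  EO5    = c(4, 4, NA, 6, 4, 6, 6, 5)
)

# Grouping keys; IND is carried along so it appears in the result.
grp <- list(IND = eva.small$IND, PARTNO = eva.small$PARTNO)

# Group means, skipping NAs within each column.
means <- aggregate(eva.small[, c("VC1", "EO1", "EO2", "EO3", "EO4", "EO5")],
                   by = grp, FUN = mean, na.rm = TRUE)

# Rows per group; the count lands in the last column, renamed to NUM here.
counts <- aggregate(eva.small$PARTNO, by = grp, FUN = length)
names(counts)[ncol(counts)] <- "NUM"

# Match counts to means on the grouping columns.
result <- merge(counts, means, by = c("IND", "PARTNO"))
print(result)
```

Note that the rows come back sorted by the grouping columns, not in the order the groups first appear in the data.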


Cheers

Andrew


On Fri, Jun 30, 2006 at 10:54:47AM +0800, Guo Wei-Wei wrote:
> Hi, everyone,
> 
> I have a data.frame named "eva" like this:
> 
> IND PARTNO VC1 EO1 EO2 EO3 EO4 EO5
> 114 114001   2   5   4   4   5   4
> 114 114001   2   4   4   4   4   4
> 114 114001   2   4  NA  NA  NA  NA
> 112 112002   2   3   3   6   2   6
> 112 112002   2   1   1   3   4   4
> 112 112003   2   6   6   6   5   6
> 112 112003   2   5   7   6   6   6
> 112 112003   2   6   6   6   4   5
> 114 114004   2   2   3   3   2   4
> 114 114004   2   5   3   4   4   2
> 114 114004   2  NA  NA  NA  NA  NA
> 113 113005   2   5   5   6   6   5
> 113 113005   2   7   7   4   7   6
> 111 111006   2   5   7   7   7   7
> 112 112007   2   7   7   7   2   2
> 112 112007   2   6   6   6   1   2
> 112 112007   2   7   6   6   2   2
> 111 111008   2   4   1   3   1   4
> 111 111008   2   3   1   5   3   2
> 
> This is only a small part of the whole data. "PARTNO" is a numeric variable
> and I want to use it as a grouping variable to aggregate the other variables.
> What I want to get looks like this:
> 
> IND PARTNO NUM VC1 EO1 EO2 EO3 EO4 EO5
> 114 114001   3   2 4.3   4   4 4.5   4
> 112 112002   2   2   2   2 4.5   3   5
> 112 112003   3   2 5.7 6.3   6   5 5.7
> 114 114004   3   2 3.5   3 3.5   3   3
> 113 113005   2   2   6   6   5 6.5 5.5
> 111 111006   1   2   5   7   7   7   7
> 112 112007   3   2 6.7 6.3 6.3 1.7   2
> 111 111008   2   2 3.5   1   4   2   3
> 
> "NUM" is a newly added variable which gives the number of cases
> in each group defined by "PARTNO".
> 
> I have two questions on this manipulation.
> 
> The first is how to get the newly added variable "NUM". I have no idea
> how to do this.
> 
> The second is how to average the other variables by group. If there are
> "NA" values, I want the average to be computed over the remaining cases.
> For example, the variable "EO1" has values of 2, 5, and "NA" for PARTNO
> 114004, so its mean should be 3.5. What I have done is
> 
> > aggregate(eva[,-2], by=eva[,-2], mean)
> 
> But it seems that "aggregate" cannot handle this because of the "NA"s.
> Because the "NA" values are not a small part of the data, I cannot use
> imputation methods. I'm not sure whether my approach is right.
> 
> Does anyone have any suggestions on these two problems? Thanks in advance!
> 

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
Email: a.robinson at ms.unimelb.edu.au         http://www.ms.unimelb.edu.au


