[R] strange answer when using 'aggregate()' with a formula

Chel Hee Lee chl948 at mail.usask.ca
Thu Jan 21 05:08:05 CET 2016


Could you kindly test the following code?  I found a strange answer 
when 'aggregate()' is used with a formula.

I am trying to count how many missing data entries are in each group.  
For this exercise, I created data as below:

 > tmp <- data.frame(grp=c(2,3,2,3), y=c(NA, 0.5, 3, 0.5))
 > tmp
   grp   y
1   2  NA
2   3 0.5
3   2 3.0
4   3 0.5

I see that the observations (variable y) can be grouped into two groups 
(variable grp).  For group 2, y has NA and 3.0.  For group 3, y has 0.5 
and 0.5.  Hence, the number of missing values is 1 and 0 for groups 2 and 
3, respectively.  This count can be obtained using 'aggregate()' in the 
'stats' package as below:

 > aggregate(x=tmp$y, by=list(grp=tmp$grp), function(x) sum(is.na(x)))
   grp x
1   2 1
2   3 0
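
As a cross-check on the expected counts, here is a minimal sketch using 
'tapply()' from base R instead of 'aggregate()'.  The name 'na_by_group' 
is only illustrative and is not part of the code above.

```r
# Same data as in the post
tmp <- data.frame(grp = c(2, 3, 2, 3), y = c(NA, 0.5, 3, 0.5))

# is.na() flags the missing entries; tapply() sums the flags within each group
na_by_group <- tapply(is.na(tmp$y), tmp$grp, sum)
print(na_by_group)
```

The result is indexed by the group labels, so group 2 should show one 
missing value and group 3 none, matching the 'by=' call above.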

A formula can be used as below:

 > aggregate(y~grp, data=tmp, function(x) sum(is.na(x)))
   grp y
1   2 0
2   3 0

What a surprise!  Is this a bug?  I would appreciate it if you could 
share your results after testing the code.  Thank you very much in 
advance for your help!
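
A possible explanation, offered as a sketch rather than a definitive 
diagnosis: the formula method of 'aggregate()' is documented with the 
default 'na.action = na.omit', so rows containing NA would be dropped 
before the function is applied, and no NA is left to count.  Assuming 
that is the cause, passing 'na.action = na.pass' should keep those rows.  
The names 'count_na', 'res_default' and 'res_pass' are only illustrative.

```r
# Same data as in the post
tmp <- data.frame(grp = c(2, 3, 2, 3), y = c(NA, 0.5, 3, 0.5))

# Helper that counts missing values in a vector
count_na <- function(x) sum(is.na(x))

# Default na.action (na.omit): the row with y = NA is removed before FUN runs
res_default <- aggregate(y ~ grp, data = tmp, FUN = count_na)

# na.pass keeps rows with NA, so FUN sees the missing value in group 2
res_pass <- aggregate(y ~ grp, data = tmp, FUN = count_na,
                      na.action = na.pass)

print(res_default)
print(res_pass)
```

If this reading is right, the second call reports one missing value for 
group 2, which would make the behaviour a documented default rather 
than a bug.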

Chel Hee Lee


