# [R] strange answer when using 'aggregate()' with a formula

Chel Hee Lee chl948 at mail.usask.ca
Thu Jan 21 05:08:05 CET 2016

```Could you kindly test the following code?  I ask because I found a
strange answer when 'aggregate()' is used with a formula.

I am trying to count how many missing data entries are in each group.
For this exercise, I created data as below:

> tmp <- data.frame(grp=c(2,3,2,3), y=c(NA, 0.5, 3, 0.5))
> tmp
  grp   y
1   2  NA
2   3 0.5
3   2 3.0
4   3 0.5

I see that the observations (variable y) fall into two groups
(variable grp).  For group 2, y has NA and 3.0.  For group 3, y has 0.5
and 0.5.  Hence, the number of missing values is 1 for group 2 and 0 for
group 3.  These counts can be computed using 'aggregate()' in the
'stats' package as below:

> aggregate(x=tmp$y, by=list(grp=tmp$grp), function(x) sum(is.na(x)))
  grp x
1   2 1
2   3 0

A formula can be used as below:

> aggregate(y~grp, data=tmp, function(x) sum(is.na(x)))
  grp y
1   2 0
2   3 0

What a surprise!  Is this a bug?  I would appreciate it if you could
share the results after testing the code.   Thank you so much for your help in