[R] aggregate.formula implicitly removes rows containing NA

David Winsemius dwinsemius at comcast.net
Wed Jan 12 00:56:13 CET 2011


On Jan 11, 2011, at 5:41 PM, Dickison, Daniel wrote:

> The documentation for `aggregate` makes it sound like  
> aggregate.formula should behave identically to aggregate.data.frame  
> (apart from the way the parameters are passed).  But it looks like  
> aggregate.formula is quietly removing rows where any of the "output"  
> variables (those on the LHS of the formula) are NA.  This differs  
> from how aggregate.data.frame works.  Is this expected behavior?
>
> Here are a couple of examples:
>
>> d <- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3))
>> aggregate(d["b"], d["a"], mean)
>  a   b
> 1 1 1.5
> 2 2  NA
>> aggregate(b ~ a, d, mean)
>  a   b
> 1 1 1.5
> 2 2 3.0
>
> It's removing whole rows even if just one of the columns is NA, i.e.:
>
>> d <- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3),
> +                 c=c(NA,2,3,NA))
>> aggregate(cbind(b,c) ~ a, d, mean)
>  a b c
> 1 1 2 2
>

The help page for aggregate gives the calling defaults for  
aggregate.formula as:
    ## S3 method for class 'formula'
    aggregate(formula, data, FUN, ...,
              subset, na.action = na.omit)
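So the behavior you describe follows from the documented default,
na.action = na.omit: any row with an NA in a variable used by the
formula is dropped before aggregating. That is what I would have
expected (had I read the help page first).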
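
If the goal is to make the formula method behave like
aggregate.data.frame, one option is to override that default. The
following is only a sketch using the data frame from your second
example; the object names res_omit, res_pass and res_narm are mine,
chosen for illustration. na.pass is the standard stats function that
leaves the rows untouched, and na.rm = TRUE is forwarded to mean()
through the "..." argument.

```r
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3),
                c = c(NA, 2, 3, NA))

# Default (na.action = na.omit): rows with any NA in b or c are dropped
# before grouping, so only the single complete row survives.
res_omit <- aggregate(cbind(b, c) ~ a, d, mean)

# na.pass keeps every row; NA handling is then left to FUN, so mean()
# returns NA for any group/column that contains an NA.
res_pass <- aggregate(cbind(b, c) ~ a, d, mean, na.action = na.pass)

# na.pass plus na.rm = TRUE: NAs are removed per column inside mean(),
# rather than whole rows being discarded up front.
res_narm <- aggregate(cbind(b, c) ~ a, d, mean,
                      na.action = na.pass, na.rm = TRUE)

print(res_omit)
print(res_pass)
print(res_narm)
```

Whether the second or the third call is appropriate depends on whether
an NA should propagate to the group summary or simply be ignored
column by column.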
-- 
David Winsemius, MD
West Hartford, CT


