[R] data.frame and formula classes of aggregate

Mon Nov 29 15:49:38 CET 2010

On Nov 29, 2010, at 9:35 AM, David Freedman wrote:

>
> Hi - I apologize for the 2nd post, but I think my question from a few
> weeks ago may have been overlooked on a Friday afternoon.
>
> I might be missing something very obvious, but is it widely known that
> the aggregate function handles missing values differently depending on
> whether a data frame or a formula is the first argument?

I'm not sure if it is widely known, but the documentation for aggregate
certainly suggests it: aggregate.formula has an na.action argument that
defaults to na.omit, while aggregate.data.frame has no na.action
argument at all. See the Usage section at the very top of ?aggregate.
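
If you would rather check the defaults from the console than from the
help page, something along these lines should do it. This is only a
sketch: the variable names are mine, and it assumes the two methods are
registered under the names "formula" and "data.frame", as they are in
the stats package.

```r
# Look up the two S3 methods and inspect their formal arguments.
formula_args <- formals(getS3method("aggregate", "formula"))
df_args      <- formals(getS3method("aggregate", "data.frame"))

# Default missing-value handling of the formula method.
print(formula_args$na.action)

# The data frame method has no na.action argument, so NAs are passed
# straight through to FUN.
print("na.action" %in% names(df_args))
```

The practical consequence is that the formula method filters rows
before FUN ever sees them, whereas the data frame method leaves the NA
handling to FUN itself.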

> For example,
>
> (d <- data.frame(sex = rep(0:1, each = 3),
>                  wt = c(100, 110, 120, 200, 210, NA),
>                  ht = c(10, 20, NA, 30, 40, 50)))
> x1 <- aggregate(d, by = list(d$sex), FUN = mean)
> names(x1)[3:4] <- c('mean.dfcl.wt', 'mean.dfcl.ht')
> x2 <- aggregate(cbind(wt, ht) ~ sex, FUN = mean, data = d)
> names(x2)[2:3] <- c('mean.formcl.wt', 'mean.formcl.ht')
> cbind(x1, x2)[, c(2, 3, 6, 4, 7)]
>
> With the data.frame method, the mean for a variable is NA in any
> group where that variable has a missing value. The formula method,
> however, seems to drop the entire row (missing and non-missing values
> alike) if it contains any NA. Wouldn't one expect the 2 forms (data
> frame vs formula) of aggregate to give the same result?
>
> thanks very much
> david freedman, atlanta
>
>
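
To make the two forms agree on your data, one option is to override
na.action in the formula call. The sketch below rebuilds your d and
compares the variants; the result names (f_default, f_pass, f_narm,
df_narm) are just labels I made up for illustration.

```r
# Data frame from the original post.
d <- data.frame(sex = rep(0:1, each = 3),
                wt  = c(100, 110, 120, 200, 210, NA),
                ht  = c(10, 20, NA, 30, 40, 50))

# Formula method, default na.action = na.omit: rows 3 and 6 are dropped
# before FUN is called, so each mean uses only the complete rows.
f_default <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean)

# na.action = na.pass keeps every row, so mean() sees the NAs and
# returns NA, matching the data frame method.
f_pass <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean,
                    na.action = na.pass)

# Keep every row, but let mean() skip NAs variable by variable
# (na.rm is passed on to FUN through ...).
f_narm <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean,
                    na.action = na.pass, na.rm = TRUE)

# Data frame method with the same na.rm = TRUE, for comparison.
df_narm <- aggregate(d[c("wt", "ht")], by = list(sex = d$sex),
                     FUN = mean, na.rm = TRUE)
```

With the defaults, the formula call gives means of 105 and 15 for sex 0
and 205 and 35 for sex 1, because it only uses the four complete rows.
With na.pass plus na.rm = TRUE, both methods should give 110 and 15 for
sex 0 and 205 and 40 for sex 1. Which behaviour you want depends on
whether a row with one missing measurement should still contribute its
other measurement.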
-- 

David Winsemius, MD
West Hartford, CT