[R] data.frame and formula classes of aggregate

David Freedman dxf1 at cdc.gov
Mon Nov 29 15:35:35 CET 2010


Hi - I apologize for the 2nd post, but I think my question from a few weeks
ago may have been overlooked on a Friday afternoon.

I might be missing something very obvious, but is it widely known that the
aggregate function handles missing values differently depending on whether a
data frame or a formula is the first argument?  For example,

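# Example data: each sex group has one NA, in a different column
(d <- data.frame(sex = rep(0:1, each = 3),
                 wt = c(100, 110, 120, 200, 210, NA),
                 ht = c(10, 20, NA, 30, 40, 50)))
# data.frame method
x1 <- aggregate(d, by = list(d$sex), FUN = mean)
names(x1)[3:4] <- c('mean.dfcl.wt', 'mean.dfcl.ht')
# formula method
x2 <- aggregate(cbind(wt, ht) ~ sex, FUN = mean, data = d)
names(x2)[2:3] <- c('mean.formcl.wt', 'mean.formcl.ht')
# side-by-side comparison of the two results
cbind(x1, x2)[, c(2, 3, 6, 4, 7)]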

With the data.frame method, the output is NA for a variable in any group
where that variable has a missing value.  The formula method, however, seems
to delete the entire row (missing and non-missing values alike) if the row
contains any NA.  Wouldn't one expect the two forms (data frame vs formula)
of aggregate to give the same result?
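
For what it's worth, here is a minimal sketch of two ways the forms can be made to agree. It relies on the formula method's na.action argument, which defaults to na.omit and so drops incomplete rows before FUN is applied. The object names (by_df, by_formula, etc.) are made up for illustration.

```r
d <- data.frame(sex = rep(0:1, each = 3),
                wt = c(100, 110, 120, 200, 210, NA),
                ht = c(10, 20, NA, 30, 40, 50))

# Option 1: keep the NAs in the formula method too (na.action = na.pass),
# so a group mean is NA whenever that column has a missing value.
by_df      <- aggregate(d[c("wt", "ht")], by = list(sex = d$sex), FUN = mean)
by_formula <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean,
                        na.action = na.pass)

# Option 2: drop NAs per column rather than per row by passing
# na.rm = TRUE through to mean() in both forms.
by_df_rm      <- aggregate(d[c("wt", "ht")], by = list(sex = d$sex),
                           FUN = mean, na.rm = TRUE)
by_formula_rm <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean,
                           na.action = na.pass, na.rm = TRUE)

by_formula     # wt: 110, NA   ht: NA, 40
by_formula_rm  # wt: 110, 205  ht: 15, 40
```

Without na.action = na.pass, the formula call works only on the four complete rows, which is why it returns wt means of 105 and 205 and ht means of 15 and 35 for the data above.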

thanks very much
david freedman, atlanta



