[R] aggregate with missing values, data.frame vs formula

David Freedman dxf1 at cdc.gov
Sat Nov 13 20:50:09 CET 2010


It seems that the formula and data.frame forms of aggregate handle missing
values differently.  For example, 

# Two groups (sex), each with one NA in a different column
(d = data.frame(sex = rep(0:1, each = 3),
                wt = c(100, 110, 120, 200, 210, NA),
                ht = c(10, 20, NA, 30, 40, 50)))
# data.frame form
x1 = aggregate(d, by = list(d$sex), FUN = mean)
names(x1)[3:4] = c('list.wt', 'list.ht')
# formula form
x2 = aggregate(cbind(wt, ht) ~ sex, FUN = mean, data = d)
names(x2)[2:3] = c('form.wt', 'form.ht')
cbind(x1, x2)

  Group.1 sex list.wt list.ht sex form.wt form.ht
1       0   0     110      NA   0     105      15
2       1   1      NA      40   1     205      35

So, the data.frame form gives an NA for a variable whenever that variable
has a missing value in the group.  But the formula form deletes the entire
row (missing and non-missing values) if any of the values in that row are
missing.  Is this what was intended, or is it the best option?
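
A minimal sketch of how the two forms could be made to agree, assuming the
difference comes from the formula method's default na.action = na.omit
(see ?aggregate).  The object names f_pass, df_narm, cc and df_cc are only
illustrative:

```r
# Same data as in the example above.
d <- data.frame(sex = rep(0:1, each = 3),
                wt  = c(100, 110, 120, 200, 210, NA),
                ht  = c(10, 20, NA, 30, 40, 50))

# Formula form, keeping rows that contain NAs (na.action = na.pass) and
# letting mean() drop the NAs one variable at a time (na.rm = TRUE).
f_pass <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean,
                    na.rm = TRUE, na.action = na.pass)

# data.frame form with the same per-variable NA handling.
df_narm <- aggregate(d[c("wt", "ht")], by = list(sex = d$sex),
                     FUN = mean, na.rm = TRUE)

# data.frame form restricted to complete rows, which should mimic the
# formula form's default behaviour.
cc <- na.omit(d)
df_cc <- aggregate(cc[c("wt", "ht")], by = list(sex = cc$sex), FUN = mean)

f_pass
df_narm
df_cc
```

If that assumption holds, f_pass and df_narm use every non-missing value of
each variable, while df_cc reproduces the form.wt and form.ht columns shown
above.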

thanks, david freedman
-- 
View this message in context: http://r.789695.n4.nabble.com/aggregate-with-missing-values-data-frame-vs-formula-tp3041198p3041198.html
Sent from the R help mailing list archive at Nabble.com.


