[R] aggregate with missing values, data.frame vs formula
    David Freedman 
    dxf1 at cdc.gov
       
    Sat Nov 13 20:50:09 CET 2010
    
    
  
It seems that the formula and data.frame forms of aggregate handle missing
values differently.  For example, 
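(d <- data.frame(sex = rep(0:1, each = 3),
                 wt = c(100, 110, 120, 200, 210, NA),
                 ht = c(10, 20, NA, 30, 40, 50)))
# data.frame form: groups supplied through 'by'
x1 <- aggregate(d, by = list(d$sex), FUN = mean)
names(x1)[3:4] <- c('list.wt', 'list.ht')
# formula form: groups supplied on the right-hand side of the formula
x2 <- aggregate(cbind(wt, ht) ~ sex, FUN = mean, data = d)
names(x2)[2:3] <- c('form.wt', 'form.ht')
cbind(x1, x2)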
  Group.1 sex list.wt list.ht sex form.wt form.ht
1       0   0     110      NA   0     105      15
2       1   1      NA      40   1     205      35
So the data.frame form returns NA for a variable within a group whenever
that variable has a missing value in that group.  The formula form instead
deletes the entire row (missing and non-missing values alike) if any of the
values in that row is missing.  Is this what was intended, or the best option?
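
For what it's worth, here is a minimal sketch of how the two forms might be
made to agree.  It assumes the difference comes from the formula method's
default na.action = na.omit (row-wise deletion before aggregating), and it
relies on extra arguments such as na.rm = TRUE being passed through to FUN.
The object names by_df, by_formula, and by_rows are only for illustration.

```r
d <- data.frame(sex = rep(0:1, each = 3),
                wt = c(100, 110, 120, 200, 210, NA),
                ht = c(10, 20, NA, 30, 40, 50))

# data.frame form, dropping NAs per variable inside each group
by_df <- aggregate(d[c("wt", "ht")], by = list(sex = d$sex),
                   FUN = mean, na.rm = TRUE)

# formula form: keep all rows (na.pass), then drop NAs per variable in FUN
by_formula <- aggregate(cbind(wt, ht) ~ sex, data = d,
                        FUN = mean, na.rm = TRUE, na.action = na.pass)

# data.frame form made to mimic the formula default (row-wise deletion)
dc <- d[complete.cases(d), ]
by_rows <- aggregate(dc[c("wt", "ht")], by = list(sex = dc$sex), FUN = mean)

print(by_df)       # wt: 110, 205   ht: 15, 40
print(by_formula)  # same values as by_df
print(by_rows)     # wt: 105, 205   ht: 15, 35
```

Which behaviour is preferable presumably depends on whether wt and ht are
meant to be summarised over the same set of rows.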
thanks, david freedman
-- 
View this message in context: http://r.789695.n4.nabble.com/aggregate-with-missing-values-data-frame-vs-formula-tp3041198p3041198.html
Sent from the R help mailing list archive at Nabble.com.