[R] randomForest and missing data

Darin A. England england at cs.umn.edu
Thu Jan 4 22:13:14 CET 2007


Does anyone know a reason why, in principle, a call to randomForest
cannot accept a data frame with missing predictor values? Each
individual tree is grown CART-style, and CART itself can handle
missing predictors, so it seems like this should be possible. (I
understand that one may impute missing values using rfImpute or some
other method, but I would like to avoid doing that.)
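
To make the current behaviour concrete, here is a minimal sketch,
assuming the randomForest package is installed and using iris with a
few values knocked out purely for illustration. It shows the fit
failing under the default na.action and the two imputation
workarounds I would like to avoid.

```r
library(randomForest)

set.seed(1)
d <- iris
# Introduce a few missing predictor values (illustrative only)
d$Sepal.Length[c(3, 50, 120)] <- NA

# The default na.action is na.fail, so the call stops with an error
fit_try <- try(randomForest(Species ~ ., data = d), silent = TRUE)
inherits(fit_try, "try-error")

# Workaround 1: rough fill-in (column median / most frequent level)
fit_rough <- randomForest(Species ~ ., data = d, na.action = na.roughfix)

# Workaround 2: proximity-based imputation, then an ordinary fit
d_imp <- rfImpute(Species ~ ., data = d)
fit_imp <- randomForest(Species ~ ., data = d_imp)
```

Both workarounds replace the missing values before any tree is grown,
which is exactly the step I am asking about skipping.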

If this functionality were available, then both when the trees are
being constructed and when subsequent data are put through the forest,
one would also specify an argument controlling the use of surrogate
rules, just like in rpart.
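
For comparison, this is roughly what I mean by the rpart behaviour: a
minimal sketch on iris with artificially missing values (my own
illustrative setup, not a real data set with missingness). The
maxsurrogate and usesurrogate settings go through rpart.control, and
the same surrogates are applied at prediction time.

```r
library(rpart)

set.seed(1)
d <- iris
# Remove 15 values of one predictor at random (illustrative only)
d$Petal.Length[sample(nrow(d), 15)] <- NA

# usesurrogate = 2: use surrogate splits when the primary variable is
# missing; if all surrogates are missing too, send the observation in
# the majority direction
fit <- rpart(Species ~ ., data = d,
             control = rpart.control(maxsurrogate = 5, usesurrogate = 2))

# New observations with a missing predictor are still classified
new_obs <- iris[1:5, ]
new_obs$Petal.Length <- NA
pred <- predict(fit, newdata = new_obs, type = "class")
pred
```

Rows with a missing predictor are kept during fitting rather than
dropped, and it is this treatment that I would like to see inside each
tree of the forest.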

I realize this question is very specific to randomForest, as opposed
to R in general, but any comments are appreciated. I suppose I am
looking for someone to say "It's not appropriate, and here's why
..." or "Good idea. Please implement and post your code."

Thanks,

Darin England, Senior Scientist
Ingenix


