[R] randomForest and missing data

Weiwei Shi helprhelp at gmail.com
Thu Jan 4 23:50:20 CET 2007


You can try the Fortran implementation of random forests, which
replaces missing values automatically. It offers two imputation
methods: one is fast and the other is more time-consuming.
I used it a long time ago.
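
To give a rough idea of the fast method, here is a minimal Python sketch
(not the Fortran code itself). It assumes the fast method fills a missing
numeric value with the median of that variable within the same class, as
I understand the description on Breiman's page. The function name
fast_impute and the toy data are made up for illustration.

```python
from statistics import median

def fast_impute(rows, labels):
    """Fill None entries in numeric columns with the class-wise median.

    Hypothetical helper for illustration only. It assumes every
    (class, column) pair has at least one observed value.
    """
    # Collect the observed (non-missing) values per (class, column).
    observed = {}
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:
                observed.setdefault((label, j), []).append(value)

    # Replace each missing entry with the median for its class and column.
    filled = []
    for row, label in zip(rows, labels):
        filled.append([
            median(observed[(label, j)]) if value is None else value
            for j, value in enumerate(row)
        ])
    return filled

# Toy data: None marks a missing predictor value.
rows = [[1.0, 10.0], [3.0, None], [None, 30.0], [8.0, 40.0], [None, 60.0]]
labels = ["a", "a", "a", "b", "b"]
filled = fast_impute(rows, labels)
```

The slower method is not shown here; it is the more expensive of the two
because it needs a fitted forest before it can refine the filled values.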

The link is below. If you have any questions, just let me know.
http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

In principle, each individual tree is NOT a CART tree, since the
predictors considered at each split are randomly selected. My
impression is that rf behaves more like a nearest-neighbor algorithm.
Surrogate splits are NOT used in the rf implementation. That's "why"
you have to impute missing values before using it; to the best of my
knowledge, the R version does not do this imputation automatically.
You can check this by reading the original technical report or the
presentations by the original authors. I remember there was a slide
comparing rf and CART somewhere.
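
To make the difference in split selection concrete, here is a small
Python sketch (an illustration under my own assumptions, not code from
either implementation). A CART-style node searches all predictors for
the best split, while a forest-style node searches only a random subset.
The helper names, the toy data, and the use of Gini impurity are
assumptions; mtry is borrowed from the R package's argument name for the
subset size.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels, candidates):
    """Return (impurity, column, threshold) of the best split over the candidate columns."""
    best = None
    for j in candidates:
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # skip splits that leave one side empty
            # Weighted impurity of the two child nodes.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best

# Toy data with complete predictor values (no missing entries).
rows = [[1.0, 5.0, 0.2], [2.0, 4.0, 0.9], [3.0, 1.0, 0.4], [4.0, 2.0, 0.7]]
labels = ["a", "a", "b", "b"]

# CART-style node: every predictor is a candidate.
cart_split = best_split(rows, labels, candidates=range(3))

# Forest-style node: only a random subset of mtry predictors is a candidate.
rng = random.Random(0)
mtry = 1
forest_candidates = rng.sample(range(3), mtry)
forest_split = best_split(rows, labels, forest_candidates)
```

Note that this sketch has no fallback for a missing value in the chosen
predictor, which is the role surrogate splits play in rpart.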


HTH,

weiwei

On 1/4/07, Darin A. England <england at cs.umn.edu> wrote:
>
> Does anyone know a reason why, in principle, a call to randomForest
> cannot accept a data frame with missing predictor values? If each
> individual tree is built using CART, then it seems like this
> should be possible. (I understand that one may impute missing values
> using rfImpute or some other method, but I would like to avoid doing
> that.)
>
> If this functionality were available, then when the trees are being
> constructed and when subsequent data are put through the forest, one
> would also specify an argument for the use of surrogate rules, just
> like in rpart.
>
> I realize this question is very specific to randomForest, as opposed
> to R in general, but any comments are appreciated. I suppose I am
> looking for someone to say "It's not appropriate, and here's why
> ..." or "Good idea. Please implement and post your code."
>
> Thanks,
>
> Darin England, Senior Scientist
> Ingenix
>


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


