[R] na.action in randomForest --- Summary

David Parkhurst parkhurs at ariel.ucs.indiana.edu
Tue Aug 5 21:31:03 CEST 2003


A few days ago I asked whether there were options other than
na.action=na.fail for the R port of Breiman’s randomForest; the function’s
help page did not say anything about other options.

I have since discovered that a PDF document called “The randomForest
Package”, made available by Andy Liaw (who made the tool available in R;
thank you), does discuss an option.  It is an implementation of Breiman’s
suggestion “to replace each missing value by the median of its column and
each missing categorical by the most frequent value in that categorical. My
impression is that because of the randomness and the many trees grown,
filling in missing values with a sensible values does not effect accuracy
much.” (from his report, "Manual On Setting Up, Using, And Understanding
Random Forests V3.1").
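
To make the quoted rule concrete, here is a minimal base-R sketch of it: numeric columns get their median, factor columns get their most frequent level. The helper name rough_fill and the toy data frame are hypothetical, made up for illustration; this is not the package's own code, and it breaks ties by taking the first level, which may differ from what the package does.

```r
# Hypothetical helper, not part of the randomForest package:
# fill NAs column by column using the median / most-frequent-level rule.
rough_fill <- function(df) {
  for (name in names(df)) {
    col <- df[[name]]
    missing <- is.na(col)
    if (!any(missing)) next
    if (is.numeric(col)) {
      # Numeric column: use the median of the observed values.
      col[missing] <- median(col, na.rm = TRUE)
    } else if (is.factor(col)) {
      # Factor column: use the most frequent observed level.
      # table() drops NA by default; which.max() takes the first on ties.
      counts <- table(col)
      col[missing] <- names(counts)[which.max(counts)]
    }
    df[[name]] <- col
  }
  df
}

# Toy data (made up for this example).
toy <- data.frame(
  x = c(1, 2, NA, 10),
  g = factor(c("a", "b", "b", NA))
)
filled <- rough_fill(toy)
print(filled)
```

Columns that are neither numeric nor factor are left untouched in this sketch.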

I now plan to try the na.roughfix option from Liaw’s package.
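
For anyone wanting to try the same thing, a short sketch of how na.roughfix might be used, assuming the randomForest package is installed. The iris data with artificially blanked-out values is only a stand-in for real data, and the number of values dropped (10 per column) is an arbitrary choice.

```r
# Runs only if the randomForest package is available (an assumption here).
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(111)

  # Stand-in data: blank out 10 values in each of the four predictors.
  iris.na <- iris
  for (i in 1:4) iris.na[sample(150, 10), i] <- NA

  # Either impute up front ...
  iris.fixed <- na.roughfix(iris.na)

  # ... or let randomForest() apply it through na.action.
  fit <- randomForest(Species ~ ., data = iris.na, na.action = na.roughfix)
  print(fit)
}
```

With the default na.action=na.fail, the same call on iris.na would stop with an error about missing values.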

Thanks to Uwe Ligges and Brian Ripley for their replies to my posting.

Dave Parkhurst



