[R] [handling] Missing [values in randomForest]

Kevin Bartz bartzk at yahoo-inc.com
Tue Sep 13 01:17:20 CEST 2005


Hi Jan-Paul,

You definitely want to be careful with na.omit in randomForest -- that
wipes out any row with even one NA. If NAs are sprawled throughout your
dataset, na.omit might end up killing a lot of rows. Here's my usual MO
for missing values:

1) "impute" in Hmisc fills in gaps with the mean, median, most common
value, etc.
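2) rfImpute (in randomForest): starts from a rough median/mode fill,
grows a forest on the filled-in data, and uses the forest's proximities
to update the imputed values, repeating for a few iterations. The
response itself must have no NAs.
3) aregImpute (in Hmisc): model-based like rfImpute, but built on
additive regression with predictive mean matching. It produces multiple
imputations rather than one filled-in dataset.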
4) You may want to consider using a single tree ("rpart" package) in
this case instead of a forest. Single trees deal with missing values
cleanly through surrogate splits.
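
Here is a rough sketch of options 1, 2 and 4 side by side. It is only an
illustration under assumptions of mine: the data are the built-in iris
set with NAs knocked into the predictors at random, and the object names
(iris.na, iris.imp, tree) are made up for the example. The Hmisc step is
skipped if that package is not installed.

```r
library(randomForest)
library(rpart)

set.seed(1)

# Hypothetical test data: iris with 15 values removed from each predictor.
iris.na <- iris
for (j in 1:4) iris.na[sample(nrow(iris.na), 15), j] <- NA

# Option 1: simple fill with Hmisc::impute (median here), one column at a time.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  sl.filled <- Hmisc::impute(iris.na$Sepal.Length, fun = median)
}

# Option 2: proximity-based imputation. The response (Species) has no NAs,
# which rfImpute requires. The result is a completed data frame.
iris.imp <- rfImpute(Species ~ ., data = iris.na, iter = 5, ntree = 300)
rf <- randomForest(Species ~ ., data = iris.imp)

# Option 4: a single tree fitted directly to the incomplete data.
# rpart keeps rows with missing predictors and routes them via surrogate splits.
tree <- rpart(Species ~ ., data = iris.na)
tree.pred <- predict(tree, iris.na, type = "class")
```

Note that the forest in option 2 is fitted on imputed values, so its
out-of-bag error will tend to look a bit better than it really is.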

Good luck!

Kevin

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Uwe Ligges
Sent: Sunday, September 11, 2005 3:44 AM
To: Jan-Paul Roodbol
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] [handling] Missing [values in randomForest]

Jan-Paul Roodbol wrote:

> Does anyone know if randomForest in R can handle
> a dataset with missing values?

See ?randomForest. You can omit observations that contain NAs by specifying

na.action=na.omit
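
As a minimal sketch of what that looks like in a call, assuming the
built-in airquality data as a stand-in (it is not the poster's dataset)
and a made-up object name, fit:

```r
library(randomForest)

set.seed(1)

# How many rows survive na.omit? Worth checking before fitting.
nrow(airquality)            # 153
nrow(na.omit(airquality))   # 111

# Rows with any NA are dropped before the forest is grown.
fit <- randomForest(Ozone ~ ., data = airquality, na.action = na.omit)
```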

Please do not cross-post!
Please specify a sensible subject!

Uwe Ligges


> Thank you
> 
> Kind regards
> 
> Jan-Paul
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
