[R] Random Forest & Cross Validation

Liaw, Andy andy_liaw at merck.com
Thu Feb 24 21:56:16 CET 2011


Exactly as Max said.  See the rfcv() function in the latest version of randomForest, as well as the reference in the help page for that function.

The OOB estimate is as accurate as a CV estimate _if_ you run straight RF.  Most other methods do not have this "feature".  However, once you add steps such as feature selection, all bets are off.
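
To make the "all bets are off" point concrete, here is a minimal simulation sketch. It is written in Python rather than R, and a nearest-centroid classifier stands in for a random forest, both purely to keep it short and dependency-free. All function names, sample sizes and seeds are made up for illustration; none of this is randomForest or rfcv() code. The data are pure noise, and the only thing that changes between the two runs is whether the top features are chosen once on all rows or re-chosen inside each resample.

```python
import random

def make_noise_data(n_samples, n_features, rng):
    """Pure-noise data: the features carry no information about the labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced 0/1 labels
    return X, y

def class_means(X, y, rows, j):
    """Mean of feature j within each class, using only the given rows."""
    sums, counts = [0.0, 0.0], [0, 0]
    for i in rows:
        sums[y[i]] += X[i][j]
        counts[y[i]] += 1
    return [sums[c] / counts[c] for c in (0, 1)]

def select_top_features(X, y, rows, k):
    """Rank features by absolute difference in class means on `rows`; keep the top k."""
    scores = []
    for j in range(len(X[0])):
        m0, m1 = class_means(X, y, rows, j)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def centroid_predict(X, y, train_rows, features, x_new):
    """Nearest-centroid prediction restricted to the selected features."""
    dist = [0.0, 0.0]
    for j in features:
        m0, m1 = class_means(X, y, train_rows, j)
        dist[0] += (x_new[j] - m0) ** 2
        dist[1] += (x_new[j] - m1) ** 2
    return 0 if dist[0] <= dist[1] else 1

def cv_accuracy(X, y, k_features, n_folds, rng, select_inside):
    """k-fold CV accuracy, with feature selection inside or outside the folds."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    # Selecting once on every row lets the held-out rows leak into the selection.
    features_all = None if select_inside else select_top_features(X, y, idx, k_features)
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_rows = [i for i in idx if i not in held]
        if select_inside:
            features = select_top_features(X, y, train_rows, k_features)
        else:
            features = features_all
        for i in held_out:
            correct += int(centroid_predict(X, y, train_rows, features, X[i]) == y[i])
    return correct / len(X)

X, y = make_noise_data(n_samples=40, n_features=1000, rng=random.Random(2011))
# Same fold assignment for both runs, so only the selection step differs.
biased = cv_accuracy(X, y, 10, 5, random.Random(1), select_inside=False)
honest = cv_accuracy(X, y, 10, 5, random.Random(1), select_inside=True)
print(f"selection outside the resampling: {biased:.2f}")
print(f"selection inside the resampling:  {honest:.2f}")
```

Since the features are noise by construction, any accuracy well above 0.5 in the first figure comes from the selection step having seen the held-out rows, not from real signal. The second figure is the one to trust.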

Andy 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of mxkuhn
> Sent: Tuesday, February 22, 2011 7:17 PM
> To: ronzhao
> Cc: r-help at r-project.org
> Subject: Re: [R] Random Forest & Cross Validation
> 
> If you want to get honest estimates of accuracy, you should 
> repeat the feature selection within the resampling (not the 
> test set). You will get different lists each time, but that's 
> the point. Right now you are not capturing that uncertainty 
> which is why the oob and test set results differ so much.
> 
> The list you get in the original training set is still the 
> real list. The resampling results help you understand how 
> much you might be overfitting the *variables*.
> 
> Max
> 
> On Feb 22, 2011, at 4:39 PM, ronzhao <yzhaohsph at gmail.com> wrote:
> 
> > 
> > Thanks, Max.
> > 
> > Yes, I did some feature selection in the training set. Basically, I
> > selected the top 1000 SNPs based on OOB error and grew the forest using the
> > training set, then used the test set to validate the forest grown.
> > 
> > But if I do the same thing in the test set, the top SNPs would be different than
> > those in the training set. That may be difficult to interpret.
> > 
> > 
> > 
> > 
> > -- 
> > View this message in context:
> > http://r.789695.n4.nabble.com/Random-Forest-Cross-Validation-tp3314777p3320094.html
> > Sent from the R help mailing list archive at Nabble.com.
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 