[R] Random forests prediction

Liaw, Andy andy_liaw at merck.com
Mon May 14 15:58:02 CEST 2012


I don't think this is so hard to explain.  If you evaluate AUC using either OOB prediction or a test set (or something like CV or bootstrap), that is what I would expect for most data.  When you add more variables (that are, say, less informative) to a model, the model has to look harder to find the informative ones, and thus you pay a penalty.  One exception: if some of the "new" variables happen to interact very strongly with some of the "old" variables, you may see improved performance.
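
A minimal sketch of that penalty, on simulated data rather than the data from the original question.  Everything in it is an assumption made for illustration: three informative predictors (inf1 to inf3), fifty pure-noise predictors, a logistic model for the class, and a small rank-based auc() helper written here so that only the randomForest package is needed.  It compares the OOB AUC of a forest grown on the informative predictors alone with one grown on informative plus noise predictors.

```r
# Sketch on simulated data; all names and sizes here are illustrative assumptions.
library(randomForest)

# Rank-based (Mann-Whitney) AUC. `score` is the predicted probability of class "1".
auc <- function(score, y) {
  r  <- rank(score)                 # average ranks, so ties are handled
  n1 <- sum(y == "1")
  n0 <- sum(y == "0")
  (sum(r[y == "1"]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
n <- 500
x_inf   <- matrix(rnorm(n * 3),  n, 3,  dimnames = list(NULL, paste0("inf", 1:3)))
x_noise <- matrix(rnorm(n * 50), n, 50, dimnames = list(NULL, paste0("noise", 1:50)))

# The class depends only on the three informative predictors.
y <- factor(rbinom(n, 1, plogis(rowSums(x_inf))))

rf_small <- randomForest(x_inf, y, ntree = 500)
rf_big   <- randomForest(cbind(x_inf, x_noise), y, ntree = 500)

# The $votes component holds the out-of-bag vote fractions for each class.
auc_small <- auc(rf_small$votes[, "1"], y)
auc_big   <- auc(rf_big$votes[, "1"], y)
print(c(informative_only = auc_small, with_noise = auc_big))
```

The direction and size of the gap depend on the seed, the sample size, and the strength of the signal, so treat the printed numbers as illustrative only.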

I've said it several times before, but it seems to be worth repeating:  Don't use the training set for evaluating models:  that almost never makes sense.
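
To see why, here is a small sketch, again on simulated data and with names chosen only for illustration.  The class labels are generated independently of the predictors, so no model can truly do better than an AUC of about 0.5.  Predicting back on the training data should nonetheless give an AUC close to 1, while the OOB estimate should stay near 0.5.

```r
# Sketch on simulated data; the labels are pure noise by construction.
library(randomForest)

# Rank-based (Mann-Whitney) AUC. `score` is the predicted probability of class "1".
auc <- function(score, y) {
  r  <- rank(score)
  n1 <- sum(y == "1")
  n0 <- sum(y == "0")
  (sum(r[y == "1"]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(2)
n <- 300
x <- matrix(rnorm(n * 20), n, 20, dimnames = list(NULL, paste0("x", 1:20)))
y <- factor(rbinom(n, 1, 0.5))      # unrelated to x: the true AUC is 0.5

rf <- randomForest(x, y, ntree = 500)

# Resubstitution: predict on the same rows the forest was grown on.
auc_train <- auc(predict(rf, newdata = x, type = "prob")[, "1"], y)

# Out-of-bag: each row is scored only by trees that did not see it.
auc_oob <- auc(rf$votes[, "1"], y)

print(c(train = auc_train, oob = auc_oob))
```

The training-set figure mostly reflects that the fully grown trees have memorised their in-bag rows, which is why it can rise as variables are added even when validation performance falls.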

Andy


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of matt
Sent: Friday, May 11, 2012 3:43 PM
To: r-help at r-project.org
Subject: [R] Random forests prediction

Hi all,

I have a strange problem when applying RF in R. 
I have a set of variables with which I obtain an AUC of 0.67.

I do have a second set of variables that have an AUC of 0.57. 

When I merge the first and second set of variables, the AUC becomes 0.64. 

I would expect the prediction to become better as I add variables that do
have some predictive power.
This is even stranger because the AUC on the training set increased when I
added more variables (while the AUC on the validation set decreased).

Has anyone experienced the same, and/or does anyone know what could be
the reason?

Thanks,

Matthijs

