[R] Different results from random.Forest with test option and using predict function

Liaw, Andy andy_liaw at merck.com
Tue Dec 4 16:04:25 CET 2012


Without data to reproduce what you saw, we can only guess.

One possibility is tie-breaking.  There are several places where ties can occur and are broken at random, including at the prediction step.  One difference between the two ways of doing prediction is that when it's all done within randomForest(), the test set prediction is performed as each tree is grown.  If a tie has to be broken at any of those prediction steps, that consumes random numbers and so changes the RNG stream seen by every subsequent tree-growing step.
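As a rough illustration of the RNG point only: the sketch below is plain base R, not randomForest internals, and the extra runif(1) call is just an assumed stand-in for a random tie-break happening during test-set prediction.

```r
# Path A: nothing happens between seeding and the "tree growing" draws.
set.seed(100)
grow_a <- runif(3)

# Path B: one extra draw happens first.
set.seed(100)
tie_break <- runif(1)   # hypothetical tie-break consuming one random number
grow_b <- runif(3)

# Same seed, but the later step no longer sees the same random numbers.
same_draws <- identical(grow_a, grow_b)

# Path B is simply one position further along the same stream.
shifted_by_one <- identical(grow_a[2:3], grow_b[1:2])

print(c(same_draws = same_draws, shifted_by_one = shifted_by_one))
```

Here same_draws should come back FALSE and shifted_by_one TRUE: a single extra draw is enough to make everything after it differ, even with the same seed.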

You can also inspect/compare the "forest" components of the randomForest objects to see if they are the same.  At least the first tree in both should be identical.
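A minimal sketch of such a comparison, under some assumptions: it uses the built-in iris data and a made-up train/test split rather than your data, and the object names are hypothetical. Note that randomForest() drops the forest by default when xtest is supplied, so keep.forest=TRUE is needed to have anything to compare.

```r
library(randomForest)

# Hypothetical split of a built-in data set, for illustration only.
set.seed(1)
idx       <- sample(nrow(iris), 100)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# (a) Grow the forest first, predict on the test set afterwards.
set.seed(100)
rf_a <- randomForest(x = train_set[, 1:4], y = train_set$Species,
                     ntree = 51, mtry = 2)
pred_a <- predict(rf_a, newdata = test_set[, 1:4], type = "response")

# (b) Same seed, but the test set is supplied at training time.
set.seed(100)
rf_b <- randomForest(x = train_set[, 1:4], y = train_set$Species,
                     ntree = 51, mtry = 2,
                     xtest = test_set[, 1:4], ytest = test_set$Species,
                     keep.forest = TRUE)   # forest is not kept by default here
pred_b <- rf_b$test$predicted

# Cross-tabulate the two sets of test predictions.
print(table(pred_a, pred_b))

# Compare the first tree of each forest, and the last one for contrast.
same_first_tree <- identical(getTree(rf_a, k = 1), getTree(rf_b, k = 1))
same_last_tree  <- identical(getTree(rf_a, k = 51), getTree(rf_b, k = 51))
print(c(first = same_first_tree, last = same_last_tree))
```

If the first trees agree but later ones do not, that would be consistent with the RNG stream diverging partway through, although it does not by itself prove that tie-breaking is the cause.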

Andy

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of tdbuskirk
Sent: Monday, December 03, 2012 6:31 PM
To: r-help at r-project.org
Subject: [R] Different results from random.Forest with test option and using predict function

Hello R Gurus,

I am perplexed by the different results I obtained when I ran code like
this:
set.seed(100)
# the number of trees is set with ntree; randomForest() has no seed argument
test1 <- randomForest(BinaryY ~ ., data = Xvars, ntree = 51, mtry = 5)
predict(test1, newdata = cbind(NewBinaryY, NewXs), type = "response")

and this code:
set.seed(100)
test2 <- randomForest(BinaryY ~ ., data = Xvars, ntree = 51, mtry = 5,
                      xtest = NewXs, ytest = NewBinaryY)

I thought the confusion matrices for the two forests would be the same by
virtue of the same seed settings, but they differ, as do the predicted values
and the votes.  At first I thought it was just the way ties were broken, so
I changed the number of trees to an odd number so that there would be no
more ties.

Can anyone shed light on what I am hoping is a simple oversight?  I just
can't figure out why the results of the predictions from these two forests
applied to the NewBinaryY and NewXs data sets would not be the same.

Thanks for any hints and help.

Sincerely,

Trent Buskirk


