[R] random forest problem when calculating variable importance

Liaw, Andy andy_liaw at merck.com
Thu Oct 14 22:28:02 CEST 2004


Are the results dramatically different?

Some difference is to be expected: setting importance=TRUE makes many
additional calls to the random number generator (to permute the OOB data
for each variable), so every tree after the first is grown from a different
point in the random stream than it would be with importance=FALSE.
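
The effect can be illustrated with a minimal base-R sketch. Note that
grow_toy_forest() is a hypothetical stand-in written for this example, not
code from randomForest: each "tree" is only a bootstrap sample of row
indices, and the optional extra sample() call imitates the permutation done
when importance is requested.

```r
# Hypothetical toy "forest": each tree is just a bootstrap sample of row
# indices. With permute = TRUE, one extra sample() call per tree imitates
# the OOB permutation used for variable importance.
grow_toy_forest <- function(ntree, n, permute) {
  trees <- vector("list", ntree)
  for (k in seq_len(ntree)) {
    trees[[k]] <- sample(n, n, replace = TRUE)  # bootstrap draw for tree k
    if (permute) {
      sample(n)  # result is discarded, but the RNG stream still advances
    }
  }
  trees
}

set.seed(2863)
plain <- grow_toy_forest(ntree = 3, n = 10, permute = FALSE)

set.seed(2863)
with_perm <- grow_toy_forest(ntree = 3, n = 10, permute = TRUE)

# The first tree is drawn before any extra RNG call, so it matches.
first_same <- identical(plain[[1]], with_perm[[1]])
# Later trees start from different points in the random stream.
rest_same <- identical(plain[2:3], with_perm[2:3])

print(first_same)
print(rest_same)
```

Here the first comparison is TRUE because both runs make the same first
call after set.seed(). The later bootstrap samples should differ, since the
discarded permutation has already consumed random numbers in one run but
not in the other.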

Cheers,
Andy

> From: Scott Gilpin
> 
> Hi - 
> 
> When using the randomForest function for regression, I get different
> results for mean-squared error of the predictions depending on whether
> or not I specify to calculate variable importance.  There is an
> example below.  I looked briefly at the source code, but couldn't find
> anything that would indicate why calculating variable importance would
> (or should) change predictions.
> 
> I'm using randomForest version 4.3-3 (the latest from CRAN), and tried
> R 1.9.0, 1.9.1 and 2.0.0 on Windows XP, and R 1.9.1 on Solaris 8.
> 
> Thanks,
> Scott Gilpin
> 
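> library(randomForest)
> 
> # Simulate training data: 10 predictors, only the first 5 affect y
> set.seed(2863)
> x <- matrix(runif(1000), ncol = 10)
> colnames(x) <- 1:10
> beta <- matrix(c(1, 2, 3, 4, 5, 0, 0, 0, 0, 0), ncol = 1)
> y <- drop(x %*% beta + rnorm(100))
> # Independent test set generated from the same model
> newx <- matrix(runif(1000), ncol = 10)
> newy <- drop(newx %*% beta + rnorm(100))
> 
> # Fit without variable importance; print the test MSE at tree 500
> set.seed(2863)
> rf.fit <- randomForest(x = x, y = y, xtest = newx, ytest = newy, importance = FALSE)
> print(rf.fit$test$mse[500])
> 
> # Same seed, but with variable importance requested
> set.seed(2863)
> rf.fit <- randomForest(x = x, y = y, xtest = newx, ytest = newy, importance = TRUE)
> print(rf.fit$test$mse[500])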