[R] Interpretation of randomForest results

Liaw, Andy andy_liaw at merck.com
Tue Jan 18 14:12:23 CET 2005


> From: luk
> 
> I got the following results when I ran randomForest with the
> commands below:
>  
> library(randomForest)
> qair <- read.table("train10.dat", header = TRUE)
> oz.rf <- randomForest(LESION ~ ., data = qair, ntree = 220,
>                       importance = TRUE)
> print(oz.rf)
> 
> Call:
>  randomForest.formula(x = LESION ~ ., data = qair, ntree = 220, importance = TRUE)
>                Type of random forest: classification
>                      Number of trees: 220
> No. of variables tried at each split: 2
>         OOB estimate of  error rate: 15.86%
          ^^^

Note that this says OOB (out-of-bag).  The same applies to the confusion
matrix below.

> Confusion matrix:
>        lesion noninf class.error
> lesion   3949    525   0.1173447
> noninf    894   3580   0.1998212
> 
> What does this mean? Is 11.7% the classification error for the
> 'lesion' class, and 19.98% the classification error for the
> 'noninf' class, in the training set?

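The results you showed above are out-of-bag (OOB) results: each case is
predicted using only the trees whose bootstrap sample did not include it.
So 11.7% and 19.98% are the OOB error rates for the two classes, not
errors from re-predicting the training data.  If OOB is unfamiliar, you
should read the documentation, and perhaps the references.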
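
To see how the figures in the printout relate to each other, here is a
minimal sketch in base R that rebuilds the confusion matrix from the
numbers quoted above and recomputes the class errors and the overall OOB
error rate.  The object names (conf, class.error, oob.error) are made up
for illustration; nothing here touches the fitted forest itself.

```r
# Confusion matrix copied from the print(oz.rf) output quoted above
# (rows = observed class, columns = OOB-predicted class).
conf <- matrix(c(3949,  525,
                  894, 3580),
               nrow = 2, byrow = TRUE,
               dimnames = list(observed  = c("lesion", "noninf"),
                               predicted = c("lesion", "noninf")))

# Per-class error: misclassified cases in a row divided by the row total.
class.error <- 1 - diag(conf) / rowSums(conf)

# Overall OOB error: all misclassified cases divided by all cases.
oob.error <- 1 - sum(diag(conf)) / sum(conf)

print(round(class.error, 7))      # matches the class.error column
print(round(100 * oob.error, 2))  # matches the OOB estimate, in percent
```

The class.error column is just the off-diagonal count over the row total,
and the "OOB estimate of error rate" is the same calculation pooled over
both rows.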
 
> But then I ran the commands below to test the classification
> performance on the same training set.
> 
> ntrain <- read.table("train10.dat", header = TRUE)
> ntrain.pred <- predict(oz.rf, ntrain)
> table(observed = ntrain[, "LESION"], predicted = ntrain.pred)
> 
> I got the following results (shown further below). It seems that
> the classification error rates for the 'lesion' and 'noninf'
> classes are both 0. Any suggestions would be much appreciated.

randomForest is rather good at overfitting _training_ data: the trees are
grown to full size, so predicting the very cases they were grown on
reproduces almost all of the labels, hence the zero errors in your table.
That's (usually) not a problem in classification.  What one usually cares
about is the _test set_ performance.  There, randomForest performance does
not degrade as the number of trees increases, and that's what Breiman
meant by `random forests do not overfit'.
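
A rough sketch of the distinction, on simulated data rather than your
train10.dat (the variables x1, x2, y, the 300/100 split and the helper
err() are all assumptions made up for this example).  Calling predict()
without new data returns the OOB predictions, whereas passing the training
data back in gives the over-optimistic resubstitution result; a held-out
set gives the number one usually cares about.

```r
library(randomForest)

set.seed(1)

# Simulated two-class problem with noise, so the classes overlap.
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- factor(ifelse(x1 + x2 + rnorm(n) > 0, "lesion", "noninf"))
dat <- data.frame(y, x1, x2)

train <- dat[1:300, ]
test  <- dat[301:400, ]

rf <- randomForest(y ~ ., data = train, ntree = 220)

# Misclassification rate between observed and predicted labels.
err <- function(obs, pred) mean(as.character(obs) != as.character(pred))

err.oob   <- err(train$y, predict(rf))         # no newdata: OOB predictions
err.train <- err(train$y, predict(rf, train))  # re-predicting training data
err.test  <- err(test$y,  predict(rf, test))   # held-out data

print(c(oob = err.oob, train = err.train, test = err.test))

# OOB error after 10, 100 and 220 trees.
print(rf$err.rate[c(10, 100, 220), "OOB"])
```

One would expect the training-set error to be far below the other two,
with the OOB and test-set errors in the same neighbourhood, which is why
the OOB figures are the ones to read as an estimate of real performance.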

Andy

 
> 
>         predicted
> observed lesion noninf
>   lesion 4474      0  
>   noninf    0   4474  