[R] confusion matrix in randomForest

Liaw, Andy andy_liaw at merck.com
Tue Jul 22 03:41:08 CEST 2008


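randomForest predictions are based on the votes of the individual trees
(the class with the most votes wins), and thus have little to do with the
error rates of individual trees.  The confusion matrix is therefore not an
average over trees: it tabulates the out-of-bag (OOB) majority-vote
prediction for each observation.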
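
A minimal sketch of the distinction, assuming the same iris fit as in the
question below.  The seed and keep.inbag = TRUE are additions for this
sketch (they are not in the original call); keep.inbag is needed to tell
which rows were out-of-bag for each tree.  The names oob.tab, tree.pred and
tree.err are illustrative only.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable
# Same call as in the question, plus keep.inbag = TRUE (an addition here).
iris.rf <- randomForest(Species ~ ., data = iris, importance = TRUE,
                        keep.forest = TRUE, keep.inbag = TRUE)

# The printed confusion matrix tabulates the OOB majority-vote prediction
# of the whole forest against the observed class.
oob.tab <- table(observed = iris$Species, predicted = iris.rf$predicted)
print(oob.tab)
print(iris.rf$confusion)

# Class predictions of every individual tree (one column per tree).
tree.pred <- predict(iris.rf, newdata = iris, predict.all = TRUE)$individual

# Error rate of each tree, computed only on the rows that tree did not see.
truth <- as.character(iris$Species)
tree.err <- sapply(seq_len(iris.rf$ntree), function(j) {
  oob <- iris.rf$inbag[, j] == 0
  mean(tree.pred[oob, j] != truth[oob])
})

mean(tree.err)  # average OOB error rate of a single tree
sd(tree.err)    # spread of that error rate across trees
```

The per-tree figures describe single trees only; the forest's own error
rate comes from the vote, so it should not be expected to match the mean
of tree.err.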

Andy 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Miklos Kiss
> Sent: Saturday, July 19, 2008 10:47 PM
> To: r-help at r-project.org
> Subject: [R] confusion matrix in randomForest
> 
> 
> I have a question on the output generated by randomForest in
> classification mode, specifically, the confusion matrix.  The confusion
> matrix lists the various classes and how the forest classified each one,
> plus the classification error.  Are these numbers essentially averages
> over all the trees in the forest?  If so, is there a way I can get the
> standard deviation values out of the randomForest, or do I have to
> evaluate each tree individually?  By way of illustration, let me show the
> confusion matrix using the iris data.  The output below shows that the
> forest correctly classified 47 versicolor irises, but this is the result
> for the entire forest.  I'd like to know if every tree will have 47
> correctly classified versicolor irises, but I don't think it will.  Same
> for the class.error value.  Not every tree will have those exact same
> values, right?
> 
> But this raises another question.  For this example, I used the entire
> data set to generate the forest, and so I assume that the confusion matrix
> is based on OOB data, so if I created a training set and evaluated trees
> individually in the test set I could get averages and standard deviations
> on the error rate.
> 
> Any thoughts?  Thanks in advance.
> 
> -Miklos Z. Kiss
> 
> > print(iris.rf)
> Call:
>  randomForest(formula = Species ~ ., data = iris, importance = TRUE,
>      keep.forest = TRUE)
>                Type of random forest: classification
>                      Number of trees: 500
> No. of variables tried at each split: 2
> 
>         OOB estimate of  error rate: 5.33%
> Confusion matrix:
>            setosa versicolor virginica class.error
> setosa         50          0         0        0.00
> versicolor      0         47         3        0.06
> virginica       0          5        45        0.10