[R] Question on class 1, 2 output for RandomForest

Liaw, Andy andy_liaw at merck.com
Wed Mar 23 16:31:22 CET 2005


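The `1' and `2' columns are the error rates within those classes, numbered in
the order of the levels of the response (here "-", then "+").  E.g., the last
row of the `1' column should correspond to the class.error for "-", and the
last row of the `2' column to the class.error for "+".  They have nothing to
do with synthetic data or unsupervised learning: you supplied a response
(V16), so randomForest ran in classification mode, as the printed "Type of
random forest" line confirms.  (I would have thought that that should be
fairly obvious, but I guess not.  It mimics what Breiman and Cutler's Fortran
code does.)  I suspect you showed us the output from two different runs, so
they don't match.  They do for me: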

> library(randomForest)
randomForest 4.5-4 
Type rfNews() to see new features/changes/bug fixes.
> credit <- read.csv(url("ftp://ftp.ics.uci.edu/pub/machine-learning-databases/credit-screening/crx.data"), header=FALSE, na.string="?")
> credit.rf <- randomForest(V16~., credit, imp=T, do.trace=100, na.action=na.omit)
ntree      OOB      1      2
  100:  20.37% 14.01% 28.04%
  200:  21.59% 15.41% 29.05%
  300:  20.52% 13.45% 29.05%
  400:  20.52% 13.17% 29.39%
  500:  20.21% 12.61% 29.39%
> credit.rf

Call:
 randomForest(x = V16 ~ ., data = credit, imp = T, do.trace = 100,
na.action = na.omit) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 3

        OOB estimate of  error rate: 20.21%
Confusion matrix:
    -   + class.error
- 312  45   0.1260504
+  87 209   0.2939189
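
The link between the confusion matrix and the trace columns can be checked by
hand.  A minimal sketch in base R (randomForest is not needed; the counts are
simply copied from the printed confusion matrix above): each class.error is
the off-diagonal count in a row divided by that row's total, and the OOB error
is all off-diagonal counts divided by all cases.

```r
# Confusion matrix copied from the printed output above
# (rows = true class, columns = predicted class; matrix() fills column-wise)
cm <- matrix(c(312, 87, 45, 209), nrow = 2,
             dimnames = list(true = c("-", "+"), predicted = c("-", "+")))

# Per-class error: misclassified cases in each row / that row's total
class.error <- 1 - diag(cm) / rowSums(cm)

# Overall OOB error: all misclassified (off-diagonal) cases / all cases
oob.error <- 1 - sum(diag(cm)) / sum(cm)

print(round(class.error, 7))      # same as the class.error column above
print(round(100 * oob.error, 2))  # same as the 20.21% OOB estimate
```

Expressed as percentages, these are the 12.61%, 29.39% and 20.21% in the last
(ntree = 500) row of the trace.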

The article in R News was written for the first version of the package.  It
has changed quite a bit in many respects since then.  The `class error' may
be important, e.g., if one of the classes makes up only a small proportion of
the data: the overall OOB error can then look good even though most cases of
that small class are misclassified.
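
To illustrate why the per-class rates matter, here is a sketch with a made-up
confusion matrix (the counts are hypothetical, not from the credit data) in
which one class makes up only 5% of the cases.

```r
# Hypothetical confusion matrix (made-up counts, for illustration only):
# 950 cases of a common class "a" and 50 of a rare class "b"
cm <- matrix(c(940, 40, 10, 10), nrow = 2,
             dimnames = list(true = c("a", "b"), predicted = c("a", "b")))

overall.error <- 1 - sum(diag(cm)) / sum(cm)  # 50 / 1000 = 0.05
class.error   <- 1 - diag(cm) / rowSums(cm)   # about 0.011 for "a", 0.80 for "b"

print(overall.error)
print(round(class.error, 3))
```

An overall error of 5% looks fine, yet 80% of the rare class is misclassified;
only the per-class figures reveal that.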

Andy


> From: Melanie Vida
> 
> Hi All,
> 
> I read the R newsletter (R News), Volume 2/3, December 2002, page 18,
> and tried the example there, too. Then I used randomForest with a
> different data set from the UCI repository. For the "credit" data, the
> results contained two additional columns, "1" and "2", that the
> newsletter's example on the fgl data set did not produce.
> 
> For the "credit" data, what do the columns headed "1" and "2" mean for
> ntree=100...500 (below)? Does "1" refer to the actual data ("class 1")
> and "2" to a group of synthetic data ("class 2")? Did my random forest
> automatically default to unsupervised learning, create class 2 from
> synthetic data, and then classify the combined data? If so, which
> method did R use to generate the synthetic data? The newsletter states
> that there are two ways to generate synthetic data.
> 
> Further, should the parameters used to tune this randomForest ideally
> optimize the OOB error rate and whatever the column 1 and 2 error rates
> mean? I tried mtry=2, 3 and 10, but that didn't change the errors much.
> Are these results reasonable, or should I try tuning different
> parameters for this special case?
> 
> ntree      OOB      1      2
>   100:  20.72% 14.10% 28.99%
>   200:  18.99% 13.58% 25.73%
>   300:  19.71% 15.14% 25.41%
>   400:  20.00% 14.10% 27.36%
>   500:  19.13% 13.58% 26.06%
> 
> Call:
>  randomForest(x = V16 ~ ., data = credit, mtry = 3, importance = TRUE,
>      do.trace = 100)
>                Type of random forest: classification
>                      Number of trees: 500
> No. of variables tried at each split: 3
> 
>         OOB estimate of  error rate: 19.86%
> Confusion matrix:
>     -   + class.error
> - 326  57   0.1488251
> +  80 227   0.2605863
> 
> 
> Thanks in advance,
> 
> -Melanie
> -------
> # Read in the credit table
> credit = read.table(url('ftp://ftp.ics.uci.edu/pub/machine-learning-databases/credit-screening/crx.data'),sep=",")
> str(credit)
> credit$V2 = as.numeric(credit$V2)
> credit$V14 = as.numeric(credit$V14)
> str(credit)
> 
> credit.rf <- randomForest(V16 ~ ., data=credit, mtry=3, importance = TRUE, do.trace=100)
> print(credit.rf)
> 
> 
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
> 
>
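
A related pitfall in the quoted script: it reads crx.data without
na.strings = "?", and then calls as.numeric() on V2 and V14, which suggests
those columns came in as factors (the read.table() default before R 4.0.0).
as.numeric() on a factor returns the internal level codes, not the numbers in
the file.  The data also differ from the run above: the quoted confusion
matrix covers 690 cases (326 + 57 + 80 + 227), the one above 653 after
na.omit.  A small sketch with hypothetical values:

```r
# Hypothetical values in the style of a crx.data column that uses "?" for missing
f <- factor(c("30.83", "?", "58.67", "24.50"))

# as.numeric() on a factor gives the internal level codes (1, 2, ...),
# not the numbers written in the file
codes <- as.numeric(f)

# Converting via character recovers the values; "?" becomes NA (warning suppressed)
vals <- suppressWarnings(as.numeric(as.character(f)))

# Simpler: declare "?" as missing when reading, so the column is numeric from the start
d <- read.csv(text = "b,30.83\na,?\nb,58.67", header = FALSE, na.strings = "?")
str(d)
```

Reading with na.strings = "?", as in the transcript above, avoids the
conversion step altogether.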



