[R] RandomForest

Liaw, Andy andy_liaw at merck.com
Wed Aug 20 14:01:29 CEST 2003

Please tell us the version of the package, the version of R, and the
platform you're working in.
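
For reference, a minimal sketch of one way to collect that information in a current R installation. The guard around the package lookup is only there so the snippet also runs where randomForest is not installed.

```r
# Version of R and the platform it is running on.
print(R.version.string)
print(R.version$platform)

# Version of the randomForest package, if it is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  print(packageVersion("randomForest"))
}
```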

It sounds like you should upgrade to a newer version of the randomForest
package.  Breiman's original code computes the error rate as the number of
misclassified cases divided by the total number of cases.  This is fine for
a sufficiently large number of trees (say about 20 or more).  However,
because the prediction is based on aggregating the out-of-bag predictions,
the error rate should be the number of misclassified cases divided by the
number of cases that have been predicted.  When the number of trees is
small, not all cases have been out-of-bag, and therefore not all of them
have a prediction.  Cases without a prediction cannot be counted as
misclassified, so dividing by the total makes the first few error rates look
too low.
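
To make the two denominators concrete, here is a toy simulation in base R.  It does not use the randomForest package or Breiman's code: the in-bag matrix, the 30% error rate and all variable names are assumptions made up for illustration.  It only shows that with very few trees a sizeable share of cases is never out-of-bag, and that dividing by all cases then gives a smaller number than dividing by the predicted cases.

```r
# Toy illustration of the two denominators (base R only; all names are made up).
set.seed(1)
n     <- 200   # number of training cases
ntree <- 3     # deliberately small forest

# inbag[i, t] is TRUE when case i was drawn in the bootstrap sample for tree t.
inbag <- sapply(seq_len(ntree), function(t) {
  seq_len(n) %in% sample(n, n, replace = TRUE)
})

# A case has an out-of-bag prediction only if at least one tree left it out.
oob.times <- rowSums(!inbag)
predicted <- oob.times > 0

# Assumption: the aggregated OOB prediction is wrong for about 30% of the
# cases that have one.  Cases without a prediction are never counted as wrong.
wrong <- predicted & (runif(n) < 0.3)

err.all.cases <- sum(wrong) / n               # divides by every case
err.predicted <- sum(wrong) / sum(predicted)  # divides by predicted cases only

print(c(never.oob      = sum(!predicted),
        all.cases      = err.all.cases,
        predicted.only = err.predicted))
```

Raising ntree in this sketch shrinks never.oob towards zero, at which point the two rates coincide.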

There's a similar problem with regression, which I have fixed in the R
package.  Users of Leo's Fortran code should be aware of this.
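
A small worked example of what the regression analogue might look like, assuming the issue is the same choice of denominator for the out-of-bag mean squared error.  The data are invented, and NA marks cases that were never out-of-bag.

```r
# Hypothetical data: y is the response, oob.pred the aggregated OOB prediction,
# with NA for cases that were never out-of-bag.
y        <- c(1.0, 2.0, 3.0, 4.0, 5.0)
oob.pred <- c(1.5,  NA, 2.5,  NA, 5.0)

has.pred <- !is.na(oob.pred)

# Sum of squared errors over the cases that actually have a prediction.
sse <- sum((y[has.pred] - oob.pred[has.pred])^2)   # 0.25 + 0.25 + 0 = 0.5

mse.all.cases <- sse / length(y)      # 0.5 / 5: too small
mse.predicted <- sse / sum(has.pred)  # 0.5 / 3: uses predicted cases only
```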

Currently there's no provision to cut trees out of a forest.  I originally
thought about writing a "burn" function that does this, but decided against
it: what would be the point?  Having more trees just takes up more
memory/disk space, and takes a bit longer for prediction, but does not
degrade prediction performance, unlike boosting.


> -----Original Message-----
> From: Vladimir N. Kutinsky [mailto:kutinskyv at obninsk.com] 
> Sent: Wednesday, August 20, 2003 4:43 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] RandomForest
> Hello,
> When I plot or look at the error rate vector for a random forest
> (rf$err.rate) it looks like a descending function except for
> the first few points of the vector, whose error rate values are
> lower (sometimes much lower) than the general level the error
> rates settle at once they stop descending. Does it mean that
> there is a tree (or trees), built first in the forest, that has
> a higher predictive accuracy than the whole forest of trees?
> One more minor question. Is there a way to "snip" the forest 
> of 100 trees to, say, 50 trees?
> Thanks,
> Vladimir
