[R] CART vs. Random Forest

Thu Sep 26 23:26:10 CEST 2002

I wouldn't bother modifying classwt -- that doesn't seem to have much effect
(as Breiman has mentioned to Andy Liaw).

I use the following function, "biased predict".  (Not specific to random
forests, which is why it's not in the package.)

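biased.predict <-
function(object, newdata, thresh,
           which.test = "Bad", if.high = "Bad", if.low = "Good",
           pred.type = "prob"){
    # Matrix of class probabilities: one row per case, one column per class.
    probs <- predict(object, newdata = newdata, type = pred.type)
    class.levels <- dimnames(probs)[[2]]
    # Label a case if.high when its which.test probability exceeds thresh.
    ans <- ifelse(probs[, which.test] > thresh, if.high, if.low)
    # Return a factor that uses the model's class levels.
    factor(ans, levels = class.levels)
  }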

You can get the errors of different types -- the confusion matrix -- from
table(data.frame(true = true.vals, pred = pred.vals)), and then multiply
this elementwise by a weight matrix and sum the result to get a weighted
error score.  You can run
biased.predict for a number of different threshold values and check the
weighted error scores, choosing the threshold that gives you the lowest.
(Though running this over and over is inefficient -- it is better to predict
the probabilities once and then apply multiple cutoffs.)  Or you can choose your
threshold by saying that one type of error must be no larger than a certain
value (which is what I've usually done, precisely to limit false negatives,
as you want to). 
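
A minimal sketch of the threshold sweep described above, in base R.  The
simulated true.vals and p.bad, the 5:1 weight matrix, and the threshold grid
are all assumptions for illustration; in practice p.bad would be the "Bad"
column of the probabilities predicted once from your model.

```r
# Hypothetical data: true classes and a predicted probability of "Bad"
# for each case (computed once, then reused for every cutoff).
set.seed(1)
n <- 200
true.vals <- factor(sample(c("Bad", "Good"), n, replace = TRUE),
                    levels = c("Bad", "Good"))
p.bad <- ifelse(true.vals == "Bad", rbeta(n, 4, 2), rbeta(n, 2, 4))

# Assumed weight matrix: rows are the true class, columns the predicted class.
# Calling a true "Bad" case "Good" (a false negative) costs 5; the reverse costs 1.
wt <- matrix(c(0, 5,
               1, 0), nrow = 2, byrow = TRUE,
             dimnames = list(true = c("Bad", "Good"),
                             pred = c("Bad", "Good")))

# Weighted error per case at one cutoff: confusion matrix times weights, summed.
weighted.error <- function(thresh) {
  pred.vals <- factor(ifelse(p.bad > thresh, "Bad", "Good"),
                      levels = c("Bad", "Good"))
  conf <- table(true = true.vals, pred = pred.vals)
  sum(conf * wt) / n
}

# Sweep the cutoffs and keep the one with the lowest weighted error.
thresholds <- seq(0.05, 0.95, by = 0.05)
scores <- sapply(thresholds, weighted.error)
best.thresh <- thresholds[which.min(scores)]
```

Plotting scores against thresholds is a quick way to see how flat the
minimum is before settling on a cutoff.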

Once you've chosen a threshold, you can use biased.predict for new data.
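
As a rough illustration of fixing a cutoff on training data and then reusing
it, here is a sketch that picks the threshold by limiting the false-negative
rate.  Everything in it is an assumption for illustration: the simulated
data, the 10% limit, and especially the model, which is a logistic
regression standing in for a forest so that the example needs only base R.

```r
set.seed(42)

# Hypothetical data: one predictor x; "Bad" becomes more likely as x grows.
make.data <- function(n) {
  x <- rnorm(n)
  y <- factor(ifelse(runif(n) < plogis(1.5 * x), "Bad", "Good"),
              levels = c("Good", "Bad"))
  data.frame(x = x, y = y)
}
train <- make.data(500)
new   <- make.data(500)

# Stand-in model (not a random forest).  With levels c("Good", "Bad"),
# glm models the probability of the second level, "Bad".
fit <- glm(y ~ x, data = train, family = binomial)
p.train <- predict(fit, newdata = train, type = "response")
p.new   <- predict(fit, newdata = new,   type = "response")

# False-negative rate: share of true "Bad" cases labelled "Good" at a cutoff.
fn.rate <- function(p, truth, thresh) mean(p[truth == "Bad"] <= thresh)

# Largest cutoff on the grid whose training false-negative rate is at most 10%.
thresholds <- seq(0.01, 0.99, by = 0.01)
train.fn <- sapply(thresholds, function(th) fn.rate(p.train, train$y, th))
thresh <- max(thresholds[train.fn <= 0.10])

# Apply the fixed cutoff to the new data and look at the confusion matrix.
pred.new <- factor(ifelse(p.new > thresh, "Bad", "Good"),
                   levels = c("Good", "Bad"))
conf.new <- table(true = new$y, pred = pred.new)
new.fn <- fn.rate(p.new, new$y, thresh)
```

The false-negative rate on the new data (new.fn) will not match the training
value exactly, so it is worth checking it against the limit you had in mind.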

I hope I'm making sense, and that this helps.

Matt

-----Original Message-----
From: Andrew Baek [mailto:andrew at stat.ucla.edu]
Sent: Thursday, September 26, 2002 4:36 PM
To: Wiener, Matthew
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] CART vs. Random Forest

Of course, CART and RF are different methods. But at the least,
I have to take into account that a false negative is more serious
than a false positive in my problem. For this purpose, I used "prior"
in rpart and "classwt" in RF. Then, should I modify the priors and
the cut-off point at the same time?

Andrew

On Thu, 26 Sep 2002, Wiener, Matthew wrote:

> We haven't implemented different voting thresholds in the package itself,
> but when you predict you can get out votes or probabilities rather than
> classes if you want.  The argument type to predict.randomForest is "class"
> by default, but can also be "vote" or "prob".  You can use the training set
> to figure out what a good threshold is, and then check your results on a
> test set.  Then you just use the threshold later.
> 
> I suppose we could implement a threshold that could be supplied to predict,
> but then we'd have to work something out for multi-class problems -- several
> different cutpoints, I guess.  It's not a priority for Andy or me right now.
> I actually like to take a look at the ROC curve anyway, to decide what
> tradeoffs are worthwhile.
> 
> I'd compare the results by looking at the error rates -- if you can make the
> (possibly weighted) error rate lower with one method or the other, that's
> the method that wins.
> 
> Regards,
> 
> Matt
