[R] rpart: which is correct?

Shawn Rutledge shawn at kfold.org
Mon Aug 3 01:49:46 CEST 2009


I am using rpart in classification mode and am confused about this
particular model's predictions. 

> predict(fit, train[8,])
         -1         1
8 0.5974089 0.4025911

> predict(fit, train[8,], type="class")
1 
Levels: -1 1

So it seems there is a 60% chance of class -1 according to the "prob"
output (the default type for classification), yet I get "1" as the label
from the "class" (and "vector") output.
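One quick way to see how often the two prediction types disagree is to compare, row by row, the column with the largest "prob" value against the type = "class" label. This is a minimal sketch assuming only the rpart package; the data set is synthetic (it stands in for the real training data), and `compare_prob_class` is a hypothetical helper, not part of rpart.

```r
library(rpart)

# Hypothetical helper (not part of rpart): for each row of newdata, compare the
# class with the largest "prob" column against the label from type = "class".
compare_prob_class <- function(fit, newdata) {
  probs <- predict(fit, newdata, type = "prob")
  cls <- as.character(predict(fit, newdata, type = "class"))
  # Column name of the largest probability in each row (ties go to the first).
  argmax <- colnames(probs)[max.col(probs, ties.method = "first")]
  data.frame(argmax = argmax, class = cls, agree = argmax == cls,
             row.names = rownames(probs), stringsAsFactors = FALSE)
}

# Synthetic stand-in data: 130 rows with labels -1/1.
set.seed(1)
n <- 130
train <- data.frame(
  y = factor(sample(c(-1, 1), n, replace = TRUE, prob = c(0.8, 0.2))),
  V1 = runif(n), V2 = runif(n)
)
fit <- rpart(y ~ V1 + V2, data = train, method = "class")

res <- compare_prob_class(fit, train)
res[!res$agree, ]   # rows where the two prediction types disagree
```

Rows with agree == FALSE are cases like row 8 above, where the label does not match the larger probability.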

This is consistent with the fitted tree itself; see leaf 13), which is
where instance 8 falls:
n= 130 
node), split, n, loss, yval, (yprob)
      * denotes terminal node
 1) root 130 3.267380e-01 -1 (0.8093746309 0.1906253691)  
   2) V27>=0.8191 55 4.529233e-02 -1 (0.9489214984 0.0510785016)  
     4) V11>=0.198 34 1.376853e-03 -1 (0.9983657106 0.0016342894) *
     5) V11< 0.198 21 1.585552e-04 1 (0.0073846214 0.9926153786) *
   3) V27< 0.8191 75 2.649122e-01 1 (0.6598071132 0.3401928868)  
     6) V21>=0.67445 36 8.797984e-02 -1 (0.8480453373 0.1519546627)  
      12) V42>=0.21395 19 1.212569e-02 -1 (0.9689540598 0.0310459402) *
      13) V42< 0.21395 17 5.462638e-02 1 (0.5974089208 0.4025910792) *
     7) V21< 0.67445 39 2.662329e-02 1 (0.2209155805 0.7790844195)  
      14) V49>=0.0989 7 9.839830e-04 -1 (0.9823268484 0.0176731516) *
      15) V49< 0.0989 32 8.058141e-05 1 (0.0008618964 0.9991381036) *
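To read the leaf for a given observation out of the fitted object instead of the printout, something along these lines should work. It is a sketch assuming the rpart.object components `where` (frame row of each training observation's leaf), `frame` and `frame$yval2` (fitted class, class counts, class probabilities, node probability for classification trees); the data are synthetic and `i <- 8` only mirrors the row in question.

```r
library(rpart)

# Synthetic data standing in for the real training set.
set.seed(2)
n <- 130
train <- data.frame(
  y = factor(sample(c(-1, 1), n, replace = TRUE)),
  V1 = runif(n), V2 = runif(n)
)
fit <- rpart(y ~ V1 + V2, data = train, method = "class",
             control = rpart.control(cp = 0, minsplit = 10))

# fit$where maps each training row to a row of fit$frame (its leaf);
# rownames(fit$frame) are the node numbers shown by print(fit).
i <- 8
frame_row <- fit$where[i]
node_id <- rownames(fit$frame)[frame_row]

# For a classification tree, yval is an index into the class levels, and
# the class probabilities sit in columns 1 + k + 1:k of yval2 (k classes).
lev <- attr(fit, "ylevels")
k <- length(lev)
fitted_label <- lev[fit$frame$yval[frame_row]]
leaf_probs <- fit$frame$yval2[frame_row, 1 + k + seq_len(k)]
names(leaf_probs) <- lev

cat("row", i, "falls in node", node_id, "labelled", fitted_label, "\n")
print(leaf_probs)
```

If the label printed here disagrees with the larger entry of leaf_probs, the fitted object itself carries the same inconsistency as leaf 13).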

I'm using a 'weights' vector when fitting the rpart model; could that
have something to do with it?
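Since the weights are the main suspect, one check is to fit with a weights vector and scan every node of the frame for a fitted class that differs from the largest printed class probability, the pattern leaf 13) shows. This is a sketch on synthetic data: the weights (4 for class "1", 1 otherwise) are hypothetical, and the yval2 column layout is assumed from the rpart.object documentation.

```r
library(rpart)

# Synthetic data; the weights are hypothetical and only illustrate
# fitting with a 'weights' vector.
set.seed(3)
n <- 130
train <- data.frame(
  y = factor(sample(c(-1, 1), n, replace = TRUE, prob = c(0.8, 0.2))),
  V1 = runif(n), V2 = runif(n)
)
train$w <- ifelse(train$y == "1", 4, 1)   # up-weight the rarer class

# Explicit formula so the weight column is not used as a predictor.
fit_w <- rpart(y ~ V1 + V2, data = train, weights = w, method = "class",
               control = rpart.control(cp = 0, minsplit = 10))

# Flag nodes whose yval (class index) differs from the column with the
# largest printed yprob (columns 1 + k + 1:k of yval2 for k classes).
k <- length(attr(fit_w, "ylevels"))
yprob <- fit_w$frame$yval2[, 1 + k + seq_len(k), drop = FALSE]
flagged <- fit_w$frame$yval != max.col(yprob, ties.method = "first")
fit_w$frame[flagged, c("var", "n", "wt", "yval")]
```

Refitting the same formula without `weights = w` and repeating the scan shows whether any flagged nodes are tied to the weighting.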

What is the correct classification for my observation? 

I see the same results on windows32/R2.9 and linux64/R2.8.




More information about the R-help mailing list