[R] Weka on command line c.f. using RWeka

Patrick Connolly p_connolly at slingshot.co.nz
Mon Nov 12 08:53:50 CET 2012


I ran Weka's command line from within R through a call to system(), like this:

> system("java weka.classifiers.bayes.NaiveBayes -K -t HWlrTrain.arff -o")

=== Confusion Matrix ===

    a    b   <-- classified as
 3518  597 |    a = NoSpray
  644  926 |    b = Spray

=== Stratified cross-validation ===


=== Confusion Matrix ===

    a    b   <-- classified as
 3512  603 |    a = NoSpray
  653  917 |    b = Spray

So far, no surprises, except that I might have expected a few more
misclassifications in the cross-validation.

However, if I read the same data into R and fit the model using RWeka,
like this:

> train.df <- read.arff("HWlrTrain.arff")
> NB <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")
> wNB <- NB(decision ~ ., data = train.df,
+           control = Weka_control(K = TRUE))
> summary(wNB)

=== Summary ===

Correctly Classified Instances        4437               78.0475 %
Incorrectly Classified Instances      1248               21.9525 %
Kappa statistic                          0.4446
Mean absolute error                      0.2679
Root mean squared error                  0.3924
Relative absolute error                 67.0055 %
Root relative squared error             87.7545 %
Coverage of cases (0.95 level)          97.9244 %
Mean rel. region size (0.95 level)      83.0519 %
Total Number of Instances             5685     

=== Confusion Matrix ===

    a    b   <-- classified as
 3520  595 |    a = NoSpray
  653  917 |    b = Spray

The resulting confusion matrix differs from both the training and the
cross-validation matrices produced by Weka's command line.
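
To make the disagreement easier to see, the three matrices can be lined
up in plain R. This is only a minimal sketch: the counts are copied from
the output above, and the object names (cli.train, cli.cv, rweka) are
made up for illustration.

```r
# Confusion matrices copied from the output above
# (rows = actual class, columns = predicted class).
labels <- list(actual    = c("NoSpray", "Spray"),
               predicted = c("NoSpray", "Spray"))

cli.train <- matrix(c(3518, 597, 644, 926), nrow = 2, byrow = TRUE,
                    dimnames = labels)   # Weka CLI, training data
cli.cv    <- matrix(c(3512, 603, 653, 917), nrow = 2, byrow = TRUE,
                    dimnames = labels)   # Weka CLI, cross-validation
rweka     <- matrix(c(3520, 595, 653, 917), nrow = 2, byrow = TRUE,
                    dimnames = labels)   # RWeka, summary(wNB)

# Overall accuracy: correctly classified instances / all instances
accuracy <- function(m) sum(diag(m)) / sum(m)
print(sapply(list(cli.train = cli.train, cli.cv = cli.cv, rweka = rweka),
             accuracy))

# Cell-by-cell differences between the RWeka matrix and each CLI matrix
print(rweka - cli.train)
print(rweka - cli.cv)
```

With these counts, the Spray row of the RWeka matrix is identical to the
Spray row of the CLI cross-validation matrix, while the NoSpray row
matches neither CLI matrix.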

Somewhat ironically, if I use the model to predict on test data, like
this:

> predict(wNB, test.df)

I get exactly the same results as I would from the Weka CLI.

Maybe the difference isn't important, but I would have expected the
two approaches to do exactly the same thing.
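
One way to narrow this down might be to make the evaluation settings
explicit on the R side. The following is only a sketch, not something I
have verified against the numbers above. It uses RWeka's
evaluate_Weka_classifier(), and it assumes that summary() on the fitted
model evaluates on the training data, and that 10 folds with seed 1 are
what the CLI uses when no -x or -s options are given.

```r
library(RWeka)

# Same data and model as above
train.df <- read.arff("HWlrTrain.arff")
NB  <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")
wNB <- NB(decision ~ ., data = train.df,
          control = Weka_control(K = TRUE))

# Evaluation on the training data
# (assumed to be what summary(wNB) reports)
print(evaluate_Weka_classifier(wNB))

# 10-fold cross-validation; numFolds = 10 and seed = 1 are assumptions
# chosen to mirror the CLI defaults
print(evaluate_Weka_classifier(wNB, numFolds = 10, seed = 1))
```

If the second call reproduces the CLI cross-validation matrix, the
remaining question is only why the training-data figures differ.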

Any possible explanations?



-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}                   Great minds discuss ideas    
 _( Y )_  	         Average minds discuss events 
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)  	                      ..... Eleanor Roosevelt
	  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.


