[R] RWeka - Error in model.frame.default - evaluate_Weka_classifier

Andreas Jansson andreas.s.t.jansson at gmail.com
Wed Oct 20 00:45:41 CEST 2010


Hi,

First of all, I'm a complete beginner in R (about two weeks). I'm
trying to use the RWeka interface for C4.5 (J48) classification.

As a proof of concept I'm using the Iris data set to create a training
set of 30 instances (10 per species) and use the remaining 120
instances as my test set.

This is what I do:

trainingIndices <- rep(1:10, 3) + rep(0:2, each=10) * 50
testIndices <- c(1:150)[-(trainingIndices)]
testSet <- iris[testIndices,]
trainingSet <- iris[trainingIndices,]
t <- J48(trainingSet$Species ~ ., data=trainingSet)
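
As a sanity check on the index arithmetic, here is a small base-R sketch (no RWeka needed) confirming that the split gives 30 training rows, 120 test rows, and 10 training rows per species. The names splitSizes and perSpecies are only illustrative.

```r
# Rebuild the split from the built-in iris data (150 rows, 50 per species).
trainingIndices <- rep(1:10, 3) + rep(0:2, each = 10) * 50
testIndices <- setdiff(seq_len(nrow(iris)), trainingIndices)
trainingSet <- iris[trainingIndices, ]
testSet <- iris[testIndices, ]

# Row counts of each part, and the per-species breakdown of the training set.
splitSizes <- c(training = nrow(trainingSet), test = nrow(testSet))
perSpecies <- table(trainingSet$Species)
print(splitSizes)
print(perSpecies)
```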

So far, so good. I can even do predict(t, testSet). Now I want to get
more detailed statistics about the performance of my classifier.

evaluate_Weka_classifier(t, testSet)

This is when I get

Error in model.frame.default(formula = trainingSet$Species ~ ., data = list( :
  variable lengths differ (found for 'Sepal.Length')

traceback() returns:

7: model.frame.default(formula = trainingSet$Species ~ ., data = list(
 [...]
       3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), drop.unused.levels = TRUE)
6: model.frame(formula = trainingSet$Species ~ ., data =
list(Sepal.Length = c(5.4,
 [...]
   3L, 3L, 3L, 3L)), drop.unused.levels = TRUE)
5: eval(expr, envir, enclos)
4: eval(mf, env)
3: model.frame.Weka_classifier(object, data = newdata)
2: model.frame(object, data = newdata)
1: evaluate_Weka_classifier(t, testSet)

([...] = Excluded values for readability, please let me know if they
would be of use)
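
The failure seems to be reproducible without RWeka. This is a minimal sketch, assuming the traceback means that the stored formula is simply re-evaluated against the new data. The response trainingSet$Species is a fixed vector of length 30, while `.` expands to the columns of whatever data frame is passed in, here one with 120 rows. The name errorMessage is only illustrative.

```r
trainingIndices <- rep(1:10, 3) + rep(0:2, each = 10) * 50
trainingSet <- iris[trainingIndices, ]
testSet <- iris[-trainingIndices, ]

# The response is hard-wired to the 30 training labels; the predictors come
# from the 120-row test set, so model.frame() cannot line them up.
errorMessage <- tryCatch({
  model.frame(trainingSet$Species ~ ., data = testSet)
  NA_character_  # only reached if no error is raised
}, error = function(e) conditionMessage(e))
print(errorMessage)
```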

When I try to evaluate a test set with the same number of rows as my
training set, it does work.

evaluate_Weka_classifier(t, testSet[1:30,])
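
It may be that this call only appears to work. Here is a base-R sketch, assuming the formula response is always taken from trainingSet$Species. With 30 rows the lengths happen to match, but the response column then holds the training labels rather than the labels of the rows being evaluated.

```r
trainingIndices <- rep(1:10, 3) + rep(0:2, each = 10) * 50
trainingSet <- iris[trainingIndices, ]
testSet <- iris[-trainingIndices, ]

# 30 test rows: the lengths agree, so no error is raised.
mf <- model.frame(trainingSet$Species ~ ., data = testSet[1:30, ])

# Column 1 is the response. It carries the training labels (all three
# species), whereas the true labels of these 30 test rows are a single species.
responseIsTrainingLabels <- identical(as.character(mf[[1]]),
                                      as.character(trainingSet$Species))
trueLabels <- unique(as.character(testSet$Species[1:30]))
print(responseIsTrainingLabels)
print(trueLabels)
```

If that reading is right, any statistics from the 30-row call would compare predictions against the wrong labels.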

Does this mean that you cannot use a test set larger (or smaller) than
the training set? Or am I completely misunderstanding the purpose of
the evaluate_Weka_classifier function?
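
One possible way out, offered as a sketch rather than a confirmed answer: refer to the class column by its bare name (Species ~ .) so that the formula can be resolved against any data frame with the same columns. The RWeka part is guarded and only runs if the package (and Java) can be loaded. The names fit and mfTest are illustrative.

```r
trainingIndices <- rep(1:10, 3) + rep(0:2, each = 10) * 50
trainingSet <- iris[trainingIndices, ]
testSet <- iris[-trainingIndices, ]

# With a bare column name the response is looked up in the data passed in,
# so the model frame follows the size of the test set.
mfTest <- model.frame(Species ~ ., data = testSet)
print(dim(mfTest))

if (requireNamespace("RWeka", quietly = TRUE)) {
  fit <- RWeka::J48(Species ~ ., data = trainingSet)
  # class = TRUE asks for per-class statistics in addition to the summary.
  print(RWeka::evaluate_Weka_classifier(fit, newdata = testSet, class = TRUE))
}
```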

Finally, some info on my system:
R version 2.10.1 (2009-12-14)
RWeka version [Most recent as of 2010-10-19, can't find the exact
number anywhere, sorry!]
Java version 1.6.0_22 (from the sun-java6-jdk package)
OS: Ubuntu 10.04.1

Many thanks,
Andreas Jansson


