[R] Levels in new data fed to SVM
claus.orourke at gmail.com
Thu Jan 10 14:57:52 CET 2013
Thanks for clarifying!
On Thu, Jan 10, 2013 at 12:47 PM, Uwe Ligges
<ligges at statistik.tu-dortmund.de> wrote:
> On 08.01.2013 21:14, Claus O'Rourke wrote:
>> Hi all,
>> I've encountered an issue using svm (e1071) in the specific case of
>> supplying new data which may not have the full range of levels that
>> were present in the training data.
>> I've constructed this really primitive example to illustrate the point:
>>> training.data <- data.frame(x = c("yellow","red","yellow","red"), a =
>>> c("alpha","alpha","beta","beta"), b = c("a", "b", "a", "c"))
>>> my.model <- svm(x ~ .,data=training.data)
>>> test.data <- data.frame(x = c("yellow","red"), a = c("alpha","beta"), b =
>>> c("a", "b"))
>>> predict(my.model, test.data)
>> Error in predict.svm(my.model, test.data) :
>> test data does not match model !
>>> levels(test.data$b) <- levels(training.data$b)
>>> predict(my.model, test.data)
>> 1 2
>> yellow red
>> Levels: red yellow
>> In the first case test.data$b does not have the level "c" and this
>> results in the input data being rejected. I've debugged this down to
>> the point of model matrix creation in the SVM R code. Once I fill up
>> the levels in the test data with the levels from the original data,
>> then there is no problem at all.
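The mismatch described above can be reproduced without e1071 at all. The sketch below uses only base R and builds dummy-coded design matrices for the two data frames from the post. The formula `~ a + b - 1` is an assumption, a rough stand-in for what a formula-based fit builds internally, not a copy of the e1071 code. `stringsAsFactors = TRUE` is set explicitly because `data.frame()` stopped converting strings to factors by default in R 4.0.

```r
# Base R only; the data frames mirror the ones in the post.
training.data <- data.frame(
  x = c("yellow", "red", "yellow", "red"),
  a = c("alpha", "alpha", "beta", "beta"),
  b = c("a", "b", "a", "c"),
  stringsAsFactors = TRUE
)
test.data <- data.frame(
  x = c("yellow", "red"),
  a = c("alpha", "beta"),
  b = c("a", "b"),
  stringsAsFactors = TRUE
)

# Dummy-coded predictor columns for each data set (assumed formula, see above).
train.cols <- colnames(model.matrix(~ a + b - 1, data = training.data))
test.cols  <- colnames(model.matrix(~ a + b - 1, data = test.data))

print(train.cols)
print(test.cols)

# Columns the training matrix has but the test matrix lacks.
missing.cols <- setdiff(train.cols, test.cols)
print(missing.cols)
```

Because `test.data$b` never saw the level "c", its design matrix has one column fewer than the training one, which is the kind of shape mismatch a fitted model cannot accept.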
>> Assuming my test data has to come from another source where the number
>> of category levels seen might not always be as large as those for the
>> original training data, is there a better way I should be handling this?
> You have to tell the factor about all possible levels; a factor does not
> necessarily contain an example of every level.
> That means:
> levels(test.data$b) <- c("a", "b", "c")
> will help.
> Uwe Ligges
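For new data arriving from another source, a slightly more defensive variant may be worth considering. Assigning to `levels()` relabels the existing levels by position, so it only gives the intended result when the levels already present in the new data are a leading subset of the training levels, as they are here. Rebuilding the factor with `factor(..., levels = ...)` matches by value instead. The sketch below is base R only; `align_levels` is a hypothetical helper name, not part of e1071.

```r
training.data <- data.frame(
  x = c("yellow", "red", "yellow", "red"),
  a = c("alpha", "alpha", "beta", "beta"),
  b = c("a", "b", "a", "c"),
  stringsAsFactors = TRUE
)
test.data <- data.frame(
  x = c("yellow", "red"),
  a = c("alpha", "beta"),
  b = c("a", "b"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: re-declare each factor column of `new` with the
# levels of the same-named column in `reference`, matching by value.
align_levels <- function(new, reference) {
  for (col in names(new)) {
    if (is.factor(reference[[col]])) {
      new[[col]] <- factor(as.character(new[[col]]),
                           levels = levels(reference[[col]]))
    }
  }
  new
}

aligned <- align_levels(test.data, training.data)
print(levels(aligned$b))

# Values never seen in training become NA, so this flags unseen categories.
has.unseen <- any(is.na(aligned))
print(has.unseen)
```

With the levels aligned this way, the new data carries the same factor definitions as the training data, and any value that was never seen in training shows up as `NA` rather than being silently relabelled.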