[R] problem in testing data with e1071 package (SVM Multiclass)

Sat Sep 2 17:26:19 CEST 2017

Hello all,

This is the first time I'm using R, the e1071 package and multiclass SVM 
(and I'm not a statistician), so I'm very confused. The goal is: a 
sentence with "sunny" should be classified as a "yes" sentence; a 
sentence with "cloud" should be classified as "maybe"; a sentence with 
"rainy" should be classified as "no".

The real goal is to learn some text classification that I can then apply 
to my research.

I have two files:

  * train.csv: a file with two columns/variables; one is the data, the
    other is the label

Example:

                      V1    V2
    1              sunny   yes
    2        sunny sunny   yes
    3  sunny rainy sunny   yes
    4  sunny cloud sunny   yes
    5              rainy    no
    6        rainy rainy    no
    7  rainy sunny rainy    no
    8  rainy cloud rainy    no
    9              cloud maybe
    10       cloud cloud maybe
    11 cloud rainy cloud maybe
    12 cloud sunny cloud maybe

  * test.csv: this file contains the new data to be classified, in a
    single column/variable.

Example:

          V1
    1  sunny
    2  rainy
    3  hello
    4  cloud
    5      a
    6      b
    7  cloud
    8      d
    9      e
    10     f
    11     g
    12 hello
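
For anyone who wants to reproduce this without the CSV files, the two files described above can be sketched as data frames. This is only a sketch: the values are retyped from the examples, and `stringsAsFactors = TRUE` is an assumption that mirrors what `read.csv` did by default in R versions before 4.0.

```r
# Training data: V1 is the text, V2 is the label (same 12 rows as train.csv).
# Assumption: text columns become factors, as read.csv did by default before R 4.0.
train <- data.frame(
  V1 = c("sunny", "sunny sunny", "sunny rainy sunny", "sunny cloud sunny",
         "rainy", "rainy rainy", "rainy sunny rainy", "rainy cloud rainy",
         "cloud", "cloud cloud", "cloud rainy cloud", "cloud sunny cloud"),
  V2 = rep(c("yes", "no", "maybe"), each = 4),
  stringsAsFactors = TRUE
)

# Test data: a single column V1 with the new sentences (same 12 rows as test.csv).
test <- data.frame(
  V1 = c("sunny", "rainy", "hello", "cloud", "a", "b",
         "cloud", "d", "e", "f", "g", "hello"),
  stringsAsFactors = TRUE
)

str(train)
str(test)
```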

Following the examples that use the iris dataset 
(https://cran.r-project.org/web/packages/e1071/e1071.pdf and 
http://rischanlab.github.io/SVM.html), I created my model and then 
tested it on the training data in this way:

> library(e1071)
> train <- read.csv(file = "./train.csv", sep = ";", header = FALSE)
> test <- read.csv(file = "./test.csv", sep = ";", header = FALSE)
> attach(train)
> x <- subset(train, select = -V2)
> y <- V2
> model <- svm(V2 ~ ., data = train, probability = TRUE)
> summary(model)

Call:
svm(formula = V2 ~ ., data = train, probability = TRUE)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  1
      gamma:  0.08333333

Number of Support Vectors:  12

 ( 4 4 4 )

Number of Classes:  3

Levels:
 maybe no yes

> pred <- predict(model, x)
> system.time(pred <- predict(model, x))
   user  system elapsed
      0       0       0
> table(pred, y)
       y
pred    maybe no yes
  maybe     4  0   0
  no        0  4   0
  yes       0  0   4
> pred
    1     2     3     4     5     6     7     8     9    10    11    12
  yes   yes   yes   yes    no    no    no    no maybe maybe maybe maybe
Levels: maybe no yes

I think everything is fine up to this point. Now the question is: what 
about the test data? I didn't find any example that handles separate 
test data, so I thought I should run the model on the test data, and I 
did this:

> test
      V1
1  sunny
2  rainy
3  hello
4  cloud
5      a
6      b
7  cloud
8      d
9      e
10     f
11     g
12 hello
> z <- subset(test, select = V1)
> pred <- predict(model, z)
Error in predict.svm(model, z) : test data does not match model !

What is wrong here? Can you please explain to me how I can classify new 
data using the model trained earlier? For two days I have asked 
everywhere and looked at many websites without finding a solution. It is 
frustrating because I think the logic behind the code is fine, but 
something is missing in the way I express it in R.

Thank you for your help
