[R] Categorical Predictors for SVM (e1071)
Vanilla Sky
skies.vanilla at gmail.com
Sat May 15 07:20:02 CEST 2010
Dear all,
I have a question about using categorical predictors for SVM, using
"svm" from library(e1071). If I have multiple categorical predictors,
should they just be included as factors? Take a simple artificial data
example:
library(e1071)
x1<-rnorm(500)
x2<-rnorm(500)
#Categorical Predictor 1, with 5 levels
x3<-as.factor(rep(c(1,2,3,4,5),c(50,150,130,70,100)))
#Categorical Predictor 2, with 3 levels
x4<-as.factor(rep(c("R","B","G"),c(100,200,200)))
#Response
y<-rep(c(-1,1),c(275,225))
class<-as.factor(y)
#data.frame() keeps x3 and x4 as factors; cbind() would coerce them to integer codes
svmdata<-data.frame(class,x1,x2,x3,x4)
mod1<-svm(class~.,data=svmdata,type="C-classification")
OR
should each factor be coded as a set of indicator variables? E.g. for
categorical predictor 2, since there are 3 levels, we would code:
(R,R,B,G,G) = ( (1,0,0),(1,0,0),(0,1,0),(0,0,1),(0,0,1) )
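To make the indicator coding concrete, here is a minimal sketch using base R's model.matrix(). The five-observation factor x4_small is made up purely for illustration, and this is not a claim about how svm() treats factors internally.

```r
# Hypothetical 5-observation factor, mirroring the (R,R,B,G,G) example
x4_small <- factor(c("R", "R", "B", "G", "G"), levels = c("R", "B", "G"))

# Dropping the intercept (- 1) gives one 0/1 column per level
ind <- model.matrix(~ x4_small - 1)
colnames(ind) <- levels(x4_small)

# Rows follow the observations; columns are R, B, G
ind
```

The resulting columns could then be bound to x1 and x2 in place of the factor itself.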
There are no errors when I run the model using either method, but I'm
unsure which is correct for svm in 'e1071'.
Many thanks.
V.V.