[R] e1071 SVM, cross-validation and overfitting
Robert Poor
rdpoor at gmail.com
Tue Jan 15 20:11:34 CET 2013
I am accustomed to the LIBSVM package, which provides cross-validation
during training with the -v option:
    % svm-train -v 5 ...
This does 5-fold cross-validation while building the model and avoids
over-fitting.
But I don't see how to accomplish that in the e1071 package. (I
learned that svm(... cross=5 ...) only _tests_ using cross-validation
-- it doesn't affect the training.) Can someone clue me in on how to do
something equivalent to LIBSVM's -v option?
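To make the point about cross= concrete, here is a minimal sketch of what it reports for a regression fit. It assumes the same synthetic data as the test case in the P.S.; the names m and m0 are only illustrative. The MSE and tot.MSE components are what a regression svm object carries when cross is set, and the final comparison checks that predictions match a fit made without cross=.

```r
library(e1071)

set.seed(23)
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + cos(x) + rnorm(length(x), sd = 0.2)

# cross = 5 asks svm() to also run a 5-fold cross-validation
# on the training data and store the results in the fitted object.
m <- svm(x, y, cross = 5)

m$MSE      # one mean squared error per fold
m$tot.MSE  # overall cross-validated mean squared error

# The model itself is still fit on all of the data with the same
# hyperparameters, so compare it with a fit made without cross=.
m0 <- svm(x, y)
all.equal(predict(m, x), predict(m0, x))
```

In other words, cross= gives an error estimate for the hyperparameters you passed in; it does not pick them for you.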
Thanks!
- ff
P.S.: My test case follows. If you run the code, the "tuned" output
shows clear signs of over-fitting. I'd like to eliminate that.
library(e1071)
colors <- c(2, 3, 4, 5, 6)
set.seed(23) # set random seed for repeatability
# log(x) + cos(x) + noise
f <- function(x) log(x) + cos(x)
x <- seq(0.1, 5, by = 0.05)
y <- f(x) + rnorm(length(x), sd = 0.2)
plot(x, y, col="gray80")
legend("topleft",
       c("log(x) + cos(x)", "SVM, untuned", "SVM, tuned"),
       bty = "n",
       col = colors,
       pch = 20)
lines(x, f(x), col = colors[1]) # overlay noiseless data
# SVM, untuned
svmmodel1 <- svm(x, y)
print(summary(svmmodel1))
y1 <- predict(svmmodel1, x)
lines(x, y1, col = colors[2])
# SVM with tuning
tuning <- tune.svm(x, y, gamma = 2^(-4:4), cost = 2^(-2:2))
svmmodel2 <- tuning$best.model
print(summary(svmmodel2))
y2 <- predict(svmmodel2, x)
lines(x, y2, col = colors[3])
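For reference, the tuning step above can have its cross-validation made explicit. This is a sketch under assumptions, not a confirmed equivalent of LIBSVM's -v: tune.control() defaults to 10-fold cross-validation, so the only change here is asking for 5 folds to mirror -v 5. The name tuning5 is only illustrative, and the data are regenerated so the block runs on its own.

```r
library(e1071)

set.seed(23)
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + cos(x) + rnorm(length(x), sd = 0.2)

# Grid search over gamma and cost, with each grid point scored
# by 5-fold cross-validation instead of the default 10-fold.
tuning5 <- tune.svm(x, y,
                    gamma = 2^(-4:4), cost = 2^(-2:2),
                    tunecontrol = tune.control(sampling = "cross", cross = 5))

tuning5$best.parameters     # gamma/cost pair with the lowest CV error
tuning5$best.performance    # that CV error (mean squared error here)
head(tuning5$performances)  # CV error and dispersion per grid point
```

Note that the cross-validation here only selects gamma and cost; tuning5$best.model is then refit on the complete data with the winning pair, so inspecting tuning5$performances is the way to see how close the neighbouring grid points were.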