# [Rd] crossvalidation in svm regression in e1071 gives incorrect results (PR#8554)

no228@cam.ac.uk no228 at cam.ac.uk
Thu Feb 2 16:28:25 CET 2006

```
Full_Name: Noel O'Boyle
Version: 2.1.0
OS: Debian GNU/Linux Sarge
Submission from: (NULL) (131.111.8.96)

(1) Description of error

The 10-fold CV option of the svm function in e1071 appears to give incorrect
results for the RMSE.

The example code in (3) uses the example regression data from the svm
documentation. The RMSE for internal prediction (predicting the training data
itself) is 0.24. The 10-fold CV RMSE would be expected to be larger, but the
result obtained with the "cross=10" option is 0.07. When the 10-fold CV is
carried out either 'by hand' (not shown below) or with the errorest function
in ipred (shown below), the result is closer to 0.27, a more plausible value.

(2) Description of system

I'm using the Debian Sarge version of R:
R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.1.0  (2005-04-18), ISBN 3-900051-07-0

svm is in the e1071 package from CRAN:
Version: 1.5-11
Date: 2005-09-19

(3) Example code illustrating the problem

library(e1071)

set.seed(42)
# create data
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(x, sd = 0.2)
data <- as.data.frame(cbind(y,x))

# estimate model and predict input values
mysvm  <- svm(y ~ x, data)
result <- predict(mysvm, data)
(rmse <- sqrt(mean((result - data$y)^2)))
# 0.2390489

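# built-in 10-fold CV estimate of prediction error
spread <- numeric(20)
for (i in 1:20) {
  mysvm <- svm(y ~ x, data, cross = 10)
  # assumed: store the total CV error that svm() reports (tot.MSE); the
  # original snippet does not show which component was recorded
  spread[i] <- mysvm$tot.MSE
}
summary(spread)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.06789 0.07089 0.07236 0.07310 0.07411 0.08434 (or something similar)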

# 10-fold CV using errorest
library(ipred)
# model function for errorest: fit an svm, return a prediction function
mysvm <- function(formula, data) {
  model <- svm(formula, data)
  function(newdata) predict(model, newdata)
}
spread <- numeric(20)
for (i in 1:20) {
  spread[i] <- errorest(y ~ x, data, model = mysvm)$error
}
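# For comparison, a minimal sketch of 10-fold CV done 'by hand'. This is an
# illustration only, not the by-hand code mentioned in (1), which is not
# shown. cv_rmse() is a hypothetical helper: `fit` is assumed to be any
# function(formula, data) returning a model that predict() accepts, and the
# left-hand side of the formula is assumed to be a plain column name.
cv_rmse <- function(formula, data, fit, k = 10) {
  n <- nrow(data)
  # randomly assign each row to one of k folds of (nearly) equal size
  folds <- sample(rep(seq_len(k), length.out = n))
  response <- all.vars(formula)[1]
  sq_err <- numeric(n)
  for (f in seq_len(k)) {
    held_out <- folds == f
    # fit on the other k - 1 folds, then predict the held-out fold
    model <- fit(formula, data[!held_out, , drop = FALSE])
    pred <- predict(model, data[held_out, , drop = FALSE])
    sq_err[held_out] <- (pred - data[held_out, response])^2
  }
  # pool the squared errors over all held-out rows, then take the root
  sqrt(mean(sq_err))
}
# usage with the data above (requires e1071 to be loaded):
# cv_rmse(y ~ x, data, fit = svm)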