[Rd] crossvalidation in svm regression in e1071 gives incorrect results (PR#8554)
    no228@cam.ac.uk 
    Thu Feb  2 16:28:25 CET 2006
    
    
  
Full_Name: Noel O'Boyle
Version: 2.1.0
OS: Debian GNU/Linux Sarge
Submission from: (NULL) (131.111.8.96)
(1) Description of error
The 10-fold CV option of the svm function in e1071 appears to give incorrect
results for the rmse.
The example code in (3) uses the example regression data from the svm
documentation. The rmse for internal prediction (predicting the training data)
is 0.24. The 10-fold CV rmse is expected to be larger than this, but the value
obtained with the "cross=10" option, taken as the mean of the MSE component of
the fitted object, is 0.07. When the 10-fold CV is carried out either 'by hand'
(not shown below) or with the errorest function in ipred (shown below), the
answer is closer to 0.27, a more reasonable value.
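The 'by hand' procedure mentioned above is not shown in this report. A minimal
sketch of what such a manual 10-fold CV might look like follows. The helper
cv_rmse, the random fold assignment, and the stand-in learner fit_lm (a base R
lm, swapped in for svm so that the block runs without extra packages) are all
assumptions for illustration, not the code actually used.

```r
# Manual k-fold cross-validation returning the pooled root mean squared error.
# 'fit_fun' is any function that takes a training data frame and returns a
# model object understood by predict().
cv_rmse <- function(data, fit_fun, k = 10) {
  n <- nrow(data)
  # Randomly assign each row to one of k folds of (nearly) equal size
  folds <- sample(rep(seq_len(k), length.out = n))
  sq_err <- numeric(n)
  for (f in seq_len(k)) {
    test <- folds == f
    model <- fit_fun(data[!test, , drop = FALSE])
    pred <- predict(model, newdata = data[test, , drop = FALSE])
    sq_err[test] <- (data$y[test] - pred)^2
  }
  # Square root taken once, over all held-out squared errors
  sqrt(mean(sq_err))
}

set.seed(42)
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(length(x), sd = 0.2)
dat <- data.frame(y = y, x = x)

# Stand-in learner (assumption): a linear model in log(x).
# With e1071 loaded, function(d) svm(y ~ x, d) could be passed instead.
fit_lm <- function(d) lm(y ~ log(x), data = d)

cv_result <- cv_rmse(dat, fit_lm, k = 10)
print(cv_result)
```

Note that the value returned is on the rmse scale, so it is directly comparable
with the internal-prediction rmse but not with a mean squared error.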
(2) Description of system
I'm using the Debian Sarge version of R:
   R : Copyright 2005, The R Foundation for Statistical Computing
   Version 2.1.0  (2005-04-18), ISBN 3-900051-07-0
svm is in the e1071 package from CRAN:
   Version: 1.5-11
   Date: 2005-09-19
(3) Example code illustrating the problem
library(e1071)
set.seed(42)
# create data
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(length(x), sd = 0.2)
data <- data.frame(y, x)
# estimate model and predict input values
mysvm <- svm(y ~ x, data)
result <- predict(mysvm, data)
(rmse <- sqrt(mean((result - data$y)^2)))
# 0.2390489
# built-in 10-fold CV estimate of prediction error
spread <- rep(0, 20)
for (i in 1:20) {
    mysvm <- svm(y ~ x, data, cross = 10)
    spread[i] <- mean(mysvm$MSE)
}
summary(spread)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.06789 0.07089 0.07236 0.07310 0.07411 0.08434 (or something similar)
# 10-fold CV using errorest
library(ipred)
# wrapper returning a predict function, in the form errorest expects
mysvm <- function(formula, data) {
  model <- svm(formula, data)
  function(newdata) predict(model, newdata)
}
spread <- rep(0,20)
for (i in 1:20) {
  spread[i] <- errorest(y ~ x, data, model=mysvm)$error
}
summary(spread)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#  0.2601  0.2649  0.2673  0.2696  0.2741  0.2927
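One thing that may be worth ruling out is a difference of scale between the two
estimates. The sketch below assumes that the MSE component returned with
"cross=10" holds per-fold mean squared errors, and that errorest reports a root
mean squared error for regression. Both assumptions should be checked against
the e1071 and ipred documentation. The numbers are copied from the two summaries
above, and the vector names are made up for illustration.

```r
# Summary of mean(mysvm$MSE) over 20 runs, copied from the output above
cross_mse <- c(Min = 0.06789, Q1 = 0.07089, Median = 0.07236,
               Mean = 0.07310, Q3 = 0.07411, Max = 0.08434)

# Summary of errorest(...)$error over 20 runs, copied from the output above
errorest_err <- c(Min = 0.2601, Q1 = 0.2649, Median = 0.2673,
                  Mean = 0.2696, Q3 = 0.2741, Max = 0.2927)

# Put the cross=10 values on the rmse scale. sqrt is monotone, so the
# quantiles map across exactly; the Mean entry is only approximate,
# because sqrt(mean(v)) is not the same as mean(sqrt(v)).
cross_rmse <- sqrt(cross_mse)

print(round(rbind(cross_rmse, errorest_err), 4))
```

If those assumptions hold, the two rows agree to within a few thousandths, which
would point to a comparison of mean squared error against rmse rather than to a
fault in the cross-validation itself.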
Regards,
 Noel O'Boyle.
    
    