[R] Splitting dataset for Tuning Parameter with Cross Validation

Tim timlee126 at yahoo.com
Mon Jul 13 15:21:46 CEST 2009



It seems to me that if the split is done once for all parameter choices, the bias will be large, and if it is done anew for each choice of parameters, the variance will be large.

In LibSVM, for each choice of (C, gamma), the grid-search script grid.py calls svm_cross_validation(), which makes a new random split of the dataset. So it seems to me that LibSVM uses the second method.
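
For comparison, the e1071 package in R (which also wraps libsvm) provides tune.svm() for the same kind of grid search over cost and gamma with k-fold cross-validation. Below is a rough sketch on the built-in iris data; I have not checked in the tune() source whether the folds are re-drawn for each (C, gamma) combination or drawn once and reused, so take it only as an illustration of the interface:

library(e1071)   # wraps libsvm; provides svm() and tune.svm()
data(iris)

## grid search over cost and gamma; each (cost, gamma) pair is scored
## by its 10-fold cross-validation error
tuned <- tune.svm(Species ~ ., data = iris,
                  gamma = 2^(-4:0), cost = 2^(0:4),
                  tunecontrol = tune.control(sampling = "cross", cross = 10))

tuned$best.parameters    # the (gamma, cost) pair with the smallest CV error
tuned$best.performance   # the corresponding CV error estimate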

As for the first method, I came across it in Chapter 7, Section 10 of "The Elements of Statistical Learning" by Hastie et al., where the recipe is: first split the dataset into folds, then evaluate the cross-validation error CV(alpha) while varying the complexity parameter alpha, and pick the value of alpha that gives the smallest validation error. It appears to me that the splitting there is done once for all choices of the complexity parameter.
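
To make that first scheme concrete, here is my own small sketch of the recipe (not code from the book), using the number of neighbours k in k-nearest neighbours as the complexity parameter. The fold assignment is drawn once and reused for every candidate value; the second scheme would instead re-draw the folds inside the loop over candidates:

library(class)   # provides knn(); ships with R
set.seed(1)

x <- as.matrix(iris[, 1:4])
y <- iris$Species
n <- nrow(x)
K <- 10                                   # number of CV folds

## split once: the same fold assignment is reused for every candidate
## value of the complexity parameter (here the number of neighbours k)
fold <- sample(rep(1:K, length.out = n))

ks <- 1:25
cv.err <- sapply(ks, function(k)
    mean(sapply(1:K, function(f) {
        pred <- knn(x[fold != f, ], x[fold == f, ], y[fold != f], k = k)
        mean(pred != y[fold == f])        # misclassification rate on fold f
    })))

ks[which.min(cv.err)]                     # k with the smallest CV error
## for the second scheme, draw 'fold' afresh inside the function over k,
## so that each candidate value of k gets its own random split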

Thanks!

--- On Sun, 7/12/09, Tim <timlee126 at yahoo.com> wrote:

> From: Tim <timlee126 at yahoo.com>
> Subject: [R] Splitting dataset for Tuning Parameter with Cross Validation
> To: R-help at stat.math.ethz.ch
> Date: Sunday, July 12, 2009, 6:58 PM
> 
> Hi,
> My question might be a little general.
> 
> I have a set of candidate values for the complexity
> parameters of a classifier, e.g. the C and gamma of an SVM
> with an RBF kernel. The selection is based on which values
> give the smallest cross-validation error.
> 
> I wonder whether the randomized splitting of the available
> dataset into folds is done only once for all candidate
> parameter values, or once for each choice, and why?
> 
> Thanks and regards!
> 
> ______________________________________________
> R-help at r-project.org
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>



