[R] [caret package] [trainControl] supplying predefined partitions to train with cross validation

Fabon Dzogang fabon.dzogang at lip6.fr
Tue May 10 21:09:59 CEST 2011


Here is an answer from Max Kuhn. Thank you!

Fabon,

If I understand the problem, there are two ways of doing it. First, if you
are using caret's train(), rfe() or sbf(), setting the seed right before
each call makes the models use the same resampled data sets. (By the way,
the resamples() function in caret checks that the models share the same
resampling indices.)
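A minimal sketch of the seed approach, assuming caret is installed (plus MASS for method = "lda"); the iris data and the two model choices are illustrative and not from the thread. Calling set.seed() with the same value right before each train() call should give both fits the same 10 folds, which can be checked by comparing the resampling indices stored in each fitted object.

```r
library(caret)

data(iris)

# 10-fold CV; the folds themselves are drawn inside train()
ctrl <- trainControl(method = "cv", number = 10)

# Same seed immediately before each call, so both calls draw the same folds
set.seed(123)
fit_lda <- train(Species ~ ., data = iris, method = "lda", trControl = ctrl)

set.seed(123)
fit_knn <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)

# train() keeps the training-row indices it used in the returned control list
same_folds <- identical(fit_lda$control$index, fit_knn$control$index)
print(same_folds)

# resamples() gathers the per-fold results of models fitted on the same folds
res <- resamples(list(LDA = fit_lda, KNN = fit_knn))
summary(res)
```

If the seeds differ, the two fits would generally be evaluated on different folds, and a paired comparison of their per-fold results would no longer be meaningful.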

Second, if you want to fix the data sets manually, there is an example
using LGOCV in section 5.2 of

 http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf

For 10-fold CV, you can use createFolds() with the additional argument
returnTrain = TRUE:

> createFolds(1:10, returnTrain = TRUE)
$Fold01
[1]  2  3  4  5  6  7  8  9 10

$Fold02
[1]  1  3  4  5  6  7  8  9 10

$Fold03
[1]  1  2  4  5  6  7  8  9 10

$Fold04
[1]  1  2  3  5  6  7  8  9 10

$Fold05
[1]  1  2  3  4  6  7  8  9 10

$Fold06
[1]  1  2  3  4  5  7  8  9 10

$Fold07
[1]  1  2  3  4  5  6  8  9 10

$Fold08
[1]  1  2  3  4  5  6  7  9 10

$Fold09
[1]  1  2  3  4  5  6  7  8 10

$Fold10
[1] 1 2 3 4 5 6 7 8 9
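In the output above, each element holds the rows used for fitting, and the single missing number is the held-out fold. If the fold labels come from elsewhere (for example the per-row fold numbers returned by dismo's kfold(), used further down this thread), the same list format can be built in base R. A minimal sketch; the helper folds_to_train_index() is a hypothetical name, not part of caret:

```r
# Hypothetical helper: turn a vector of fold labels (one per row) into a list
# whose k-th element holds the TRAINING rows, i.e. every row not in fold k.
folds_to_train_index <- function(fold_id) {
  ids <- sort(unique(fold_id))
  idx <- lapply(ids, function(k) which(fold_id != k))
  names(idx) <- sprintf("Fold%02d", seq_along(ids))
  idx
}

# Ten rows, each in its own fold, mirrors createFolds(1:10, returnTrain = TRUE)
train_idx <- folds_to_train_index(1:10)
train_idx$Fold01   # 2 3 4 5 6 7 8 9 10
train_idx$Fold10   # 1 2 3 4 5 6 7 8 9

# A more typical case: 100 rows randomly assigned to 10 folds of 10 rows each
set.seed(1)
fold_id <- sample(rep(1:10, length.out = 100))
idx100 <- folds_to_train_index(fold_id)
lengths(idx100)    # each resample trains on 90 rows
```

A list built this way should be usable in the same place as the createFolds() result, i.e. as the index argument of trainControl().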

For the trainControl() function, the index argument should be a list with
one element per resample, each holding the row indices used to fit the
model in that resample. So if I give it the above results of createFolds(),
it will do 10-fold CV.
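A minimal sketch of passing predefined folds to train(), assuming caret is installed (plus MASS for method = "lda"); iris and the model choices are illustrative. The key step is createFolds(..., returnTrain = TRUE), whose elements are training rows; trainControl() then receives them through index, so every model trained with that control object should use exactly those resamples.

```r
library(caret)

data(iris)

# Predefined folds: with returnTrain = TRUE each element holds the TRAINING rows
# of one resample (createFolds() stratifies on a factor outcome).
set.seed(42)
folds <- createFolds(iris$Species, k = 10, returnTrain = TRUE)

# Pass the folds through index; every train() call using ctrl reuses them
ctrl <- trainControl(method = "cv", index = folds)

fit_lda <- train(Species ~ ., data = iris, method = "lda", trControl = ctrl)
fit_knn <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)

# Both fits report the predefined resamples
identical(fit_lda$control$index, folds)
identical(fit_knn$control$index, folds)

# Pitfall from the question: by default (returnTrain = FALSE) the elements are
# the HELD-OUT rows, so passing them as index would fit on only ~1/10 of the data.
set.seed(42)
held_out <- createFolds(iris$Species, k = 10)
lengths(held_out)  # 15 rows per element for iris (50 rows per class, 10 folds)
lengths(folds)     # 135 rows per element
```

Building the control object once and reusing it is what keeps the comparison paired: each model is scored on the same ten held-out sets.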

Max

On Fri, May 6, 2011 at 12:32 PM, Fabon Dzogang <fabon.dzogang at lip6.fr> wrote:
> Hello,
>
> Thank you for your reply, but I'm not sure your code answers my needs.
> From what I read, it creates a 10-fold partition and then extracts the
> kth fold for further processing.
>
> My question was rather: once I have a 10-fold partition of my data,
> how do I supply it to the train() function of the caret package? Here's
> some sample code:
>
> folds <- createFolds(my_dataset_classes, 10)
>
> # I can't use index=folds here: it would train on 1/k of the data and test on the other (k-1)/k
> t_control <- trainControl(method="cv", number=10)
>
> # here I would like train to take account of my predefined folds
> model <- train(my_dataset_predictors, my_dataset_classes,
> method="svmLinear", trControl = t_control)
>
> Cheers,
> Fabon.
>
> On Fri, May 6, 2011 at 10:59 AM, neetika nath <nikkihathi at gmail.com> wrote:
>> Hi,
>> I did a similar experiment with my data. Maybe the following code will give
>> you some idea. It might not be the best solution, but it worked for me.
>> Please do share if you get another idea.
>> Thank you
>> #### CODE###
>>
>> library(dismo)
>>
>> set.seed(111)
>>
>> dd <- read.csv("yourfile.csv", header = TRUE)
>>
>> # Enter the interactive debugger whenever an error occurs
>>
>> options(error = utils::recover)
>>
>> # dd: data to be split for 10-fold CV; kfold() assigns every row of dd to
>> # one of 10 folds
>>
>> number <- kfold(dd, k = 10)
>>
>> # Case 1: the first fold (k == 1)
>>
>> k <- 1
>>
>> # Retrieve the row indices of the k-th fold in x, so that they can be used
>> # as the test set and the remaining rows as the training set for this
>> # iteration.
>>
>> x <- which(number == k)
>>
>> On Thu, May 5, 2011 at 11:43 PM, Fabon Dzogang <fabon.dzogang at lip6.fr> wrote:
>>>
>>> Hi all,
>>>
>>> I run R 2.11.1 under Ubuntu 10.10 and caret version 2.88.
>>>
>>> I use the caret package to compare different models on a dataset. In
>>> order to compare their performance, I would like to use the same data
>>> partitions for every model. I understand that with an LGOCV or boot
>>> resampling method, the "index" argument of the trainControl function
>>> lets one supply training partitions to the train function.
>>>
>>> However, I would like to validate the models with 10-fold cross
>>> validation, and I did not find any way to supply a predefined partition
>>> (created with createFolds) in this setting. Any help?
>>>
>>> Thank you, and great package by the way!
>>>
>>> Fabon Dzogang.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
> --
> Fabon Dzogang
>



-- 
Fabon Dzogang


