[R] Leave One Group Out with caret

Marcus Hanisch marcus at deltalima.org
Thu Mar 2 12:09:40 CET 2017


In social psychology we are working on a project in which we try to 
predict relationship quality (outcome) from personality (features). The 
main goal is to help match people so that they have a higher chance of a 
happy, long-lasting romantic relationship. I would be very grateful if 
you could help me with this by answering the following question:

At the moment, k-fold CV in R assigns the rows of the data (people) to 
the folds at random. A couple is represented by two rows in the dataset 
(partner 1 and partner 2), who are of course not always equally happy in 
their relationship with each other. Nevertheless, the relationship 
quality of partner 1 and partner 2 is correlated, which means the two 
cases are dependent. How can I assign both partners of a couple to the 
same fold (but still as two cases), so that the test sample is always 
completely independent of the training sample? How can I write a Leave 
One Group Out CV command in R, as it exists in Python (which I 
unfortunately cannot work with)?

Couples are identified by the same number in the column paarID.
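
To illustrate the problem, here is a small sketch with made-up data. The 
toy data frame, the 96 couples and the choice of 5 folds are assumptions 
for illustration only; only the column name paarID comes from the real 
dataset. It counts how many couples end up with their two partners in 
different folds when rows are assigned to folds at random.

```r
set.seed(123)

# Made-up stand-in for the real data: 96 couples, two rows each
toy <- data.frame(paarID = rep(1:96, each = 2))

k <- 5
# Row-level fold assignment, as ordinary k-fold CV does it
toy$fold <- sample(rep(seq_len(k), length.out = nrow(toy)))

# A couple is "split" when its two rows land in different folds
split <- tapply(toy$fold, toy$paarID, function(f) length(unique(f)) > 1)

sum(split)    # number of couples split across folds
mean(split)   # share of couples split across folds
```

Every split couple has one partner in the training sample while the 
other partner is in the test sample, which is exactly the dependence I 
would like to avoid.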

Here is the relevant part of my current R code:

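library(caret)
outcome <- "RQ_continuaryScale"
# names of the predictor columns selected by use_covar_i
variables <- colnames(dat)[use_covar_i]
# build the model formula: outcome ~ var1 + var2 + ...
model <- paste(variables, collapse = " + ")
model <- paste(outcome, "~", model)
# 5-fold CV repeated 100 times ("repeats" only takes effect with "repeatedcv")
training_config <- trainControl(method = "repeatedcv", number = 5, repeats = 100)
fit <- train(as.formula(model), data = dat_nomiss, method = "glmnet",
             trControl = training_config)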
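
A minimal sketch of one possible direction, not a verified solution for 
the real data: caret provides groupKFold(), which returns training-row 
indices that keep all rows of a group together, and trainControl() 
accepts such a list through its index argument. The toy data frame, the 
predictors x1 and x2, the outcome RQ and the choice of 5 folds are made 
up for illustration; only paarID is taken from the real dataset.

```r
library(caret)

# Made-up stand-in for dat_nomiss: 20 couples, two rows each
set.seed(1)
toy <- data.frame(paarID = rep(1:20, each = 2),
                  x1 = rnorm(40), x2 = rnorm(40))
toy$RQ <- 0.5 * toy$x1 + rnorm(40)

# Training-row indices for 5 folds; both partners of a couple stay together
train_index <- groupKFold(toy$paarID, k = 5)

# Rows not listed in an index element are used as that resample's hold-out set
ctrl <- trainControl(method = "cv", index = train_index)
fit <- train(RQ ~ x1 + x2, data = toy, method = "glmnet", trControl = ctrl)

# Check: no couple appears in both the training and the hold-out rows
overlap <- sapply(train_index, function(idx) {
  length(intersect(toy$paarID[idx], toy$paarID[-idx]))
})
stopifnot(all(overlap == 0))
```

With the real data, the formula and dat_nomiss from the code above would 
take the place of the toy objects, and dat_nomiss$paarID would be the 
grouping variable.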

Here is some sample data: 
https://github.com/topepo/caret/files/796416/Testdata_couples_1.csv.2.zip


I'm quite new to R and not an expert in statistics. :(

I already tried caret's LGOCV method, but the results are not what I 
expected.

When I try the following:

training_config <- trainControl(method="LGOCV", number=96, p=0.97)

then I get a sample size of only 188, but I need 190.
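
For context, caret's "LGOCV" method draws repeated random training/test 
splits, with p as the training proportion, and does not use a grouping 
variable such as paarID. The sketch below shows, in base R, the kind of 
index I am after: one resample per couple, with all other couples as the 
training rows. The toy data frame with 96 couples (192 rows) is an 
assumption inferred from number=96 and the 190 rows I need; it is not 
the real data.

```r
# Made-up stand-in for the real data: 96 couples, two rows each (192 rows)
toy <- data.frame(paarID = rep(seq_len(96), each = 2))

couples <- unique(toy$paarID)

# For each couple, the training rows are all rows of the other couples
loco_index <- lapply(couples, function(id) which(toy$paarID != id))
names(loco_index) <- paste0("Couple", couples)

length(loco_index)            # 96 resamples, one per couple
unique(lengths(loco_index))   # 190 training rows in every resample
```

A list like loco_index could presumably be passed to trainControl() via 
its index argument, so that each held-out couple forms the test sample. 
With only two test rows per resample, a per-resample R-squared is not 
meaningful, so an error measure such as RMSE would be the one to look at.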

I hope I have described my problem clearly. I am very thankful for any 
help and support.

Best regards!


