[R] caret: Error when using rpart and CV != LOOCV

Max Kuhn mxkuhn at gmail.com
Wed May 16 17:30:58 CEST 2012


More information is needed to be sure, but it is most likely that some
of the resampled rpart models produce the same prediction for the
hold-out samples (likely the result of no viable split being found).

Almost every incarnation of R^2 requires the variance of the
predictions. When every hold-out prediction is identical, that variance
is zero and the calculation divides by zero, so the statistic comes
back as a missing value.
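
A rough illustration of that failure mode, using made-up numbers rather
than the poster's data. It assumes R^2 is computed as the squared
correlation between predicted and observed values, which is one common
variant; a stump with no split predicts the training mean for every
hold-out sample:

```r
# Hypothetical hold-out fold: observed values and the constant
# prediction a split-less tree would produce
obs  <- c(10.3, 16.4, 21.0, 31.7, 55.4)
pred <- rep(mean(obs), length(obs))

# RMSE only needs the residuals, so it is still well defined
rmse <- sqrt(mean((pred - obs)^2))

# The correlation divides by sd(pred), which is zero here, so R
# returns NA (and warns that the standard deviation is zero)
r2 <- suppressWarnings(cor(pred, obs)^2)

print(var(pred))
print(rmse)
print(r2)
```

A single fold like this is enough to put a missing value into the
resampled performance measures.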

Try using your own summary function (see ?trainControl) and put a
print(summary(data$pred)) in there to verify my claim.
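
A minimal sketch of such a summary function. The name debugSummary and
the example numbers are hypothetical; the argument list (data, lev,
model) and the obs/pred columns follow what ?trainControl describes for
summaryFunction. The metrics are computed in base R here so the block
runs on its own, and the squared-correlation R^2 is an assumption:

```r
# Hypothetical summary function: print the spread of the hold-out
# predictions, then return the metrics as a named numeric vector
debugSummary <- function(data, lev = NULL, model = NULL) {
  # A summary where min and max coincide means constant predictions
  print(summary(data$pred))

  rmse <- sqrt(mean((data$pred - data$obs)^2))
  # Squared correlation; NA when every prediction is identical
  r2 <- suppressWarnings(cor(data$pred, data$obs)^2)

  c(RMSE = rmse, Rsquared = r2)
}

# Stand-alone check with a made-up fold of constant predictions
holdout <- data.frame(obs  = c(10.2, 19.9, 24.2, 38.3),
                      pred = rep(23.15, 4))
out <- debugSummary(holdout)
print(out)
```

To use it during resampling, it should be enough to pass it as
trainControl(method = 'cv', summaryFunction = debugSummary) and hand
that object to train() through trControl. Folds whose printed summary
shows identical minimum and maximum are the ones producing the missing
values.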

Max

> On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn <dominik at dbruhn.de> wrote:
>> Hi,
>> I got the following problem when trying to build an rpart model and using
>> everything but LOOCV. Originally, I wanted to use k-fold partitioning,
>> but every partitioning except LOOCV throws the following warning:
>>
>> ----
>> Warning message: In nominalTrainWorkflow(dat = trainData, info =
>> trainInfo, method = method, : There were missing values in resampled
>> performance measures.
>> -----
>>
>> Below are some simplified test cases which reproduce the warning on my
>> system.
>>
>> Question: What does this warning mean? How can I avoid it?
>>
>> System-Information:
>> -----
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] rpart_3.1-52   caret_5.15-023 foreach_1.4.0  cluster_1.14.2 reshape_0.8.4
>> [6] plyr_1.7.1     lattice_0.20-6
>>
>> loaded via a namespace (and not attached):
>> [1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0     iterators_1.0.6
>> [5] tools_2.15.0
>> -------
>>
>>
>> Simplified test case I: Throws warning
>> ---
>> library(caret)
>> data(trees)
>> formula=Volume~Girth+Height
>> train(formula, data=trees,  method='rpart')
>> ---
>>
>> Simplified test case II: Every other CV method also throws the warning,
>> for example using 'cv':
>> ---
>> library(caret)
>> data(trees)
>> formula=Volume~Girth+Height
>> tc=trainControl(method='cv')
>> train(formula, data=trees,  method='rpart', trControl=tc)
>> ---
>>
>> Simplified test case III: The only CV method which works is 'LOOCV':
>> ---
>> library(caret)
>> data(trees)
>> formula=Volume~Girth+Height
>> tc=trainControl(method='LOOCV')
>> train(formula, data=trees,  method='rpart', trControl=tc)
>> ---
>>
>>
>> Thanks!
>> --
>> Dominik Bruhn
>> mailto: dominik at dbruhn.de
>>
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>