[R] caret: Error when using rpart and CV != LOOCV

Dominik Bruhn dominik at dbruhn.de
Wed May 16 18:22:10 CEST 2012


Sorry for the follow-up, but I dug deeper into the problem.

My text on the R^2 was wrong: in my opinion, and at least according to
Wikipedia, R^2 yields a division by zero iff SStot (the total sum of
squares) is zero. SStot is the sum of the squared differences between the
observed (not the predicted) values and the mean of the observed values.
As this value does not depend on the predicted/modelled values, the
occurrence of a DivByZero cannot depend on the model but only on the
data itself. In short, to get SStot=0 (and therefore a DivByZero), you
would need a training dataset where every value equals the mean of the
training set, i.e. a constant dataset. My input and also my training set
are far from being constant, so where is the error?
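
As a quick check of this argument, here is a minimal base-R sketch on the
same trees data. The helper name r2_ss is made up for illustration, and the
formula is the textbook 1 - SSres/SStot, which is not necessarily what caret
computes internally:

```r
# Sketch only: textbook R^2 = 1 - SSres/SStot (hypothetical helper,
# not necessarily the formula caret uses internally).
r2_ss <- function(obs, pred) {
  ss_res <- sum((obs - pred)^2)       # residual sum of squares
  ss_tot <- sum((obs - mean(obs))^2)  # total sum of squares, depends on obs only
  1 - ss_res / ss_tot
}

obs <- datasets::trees$Volume
ss_tot <- sum((obs - mean(obs))^2)
print(ss_tot > 0)  # TRUE means the observed values are not constant

# Even a constant prediction gives a finite value under this definition.
const_pred <- rep(mean(obs), length(obs))
print(r2_ss(obs, const_pred))
```

Under this definition only constant observed values can produce a zero
denominator, whatever the model predicts.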

Thanks again!
Dominik


On 16/05/12 17:30, Max Kuhn wrote:
> More information is needed to be sure, but it is most likely that some
> of the resampled rpart models produce the same prediction for the
> hold-out samples (likely the result of no viable split being found).
> 
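
A minimal sketch of how identical predictions can arise, assuming the rpart
package is installed. Setting cp = 1 is an artificial way to force a tree
without splits; it is not taken from the reported caret run:

```r
library(rpart)

# Force a tree with no splits by demanding an unreachable complexity
# improvement (cp = 1 is an illustrative choice, not caret's setting).
fit <- rpart(Volume ~ Girth + Height, data = trees,
             control = rpart.control(cp = 1))

pred <- predict(fit, newdata = trees)
print(nrow(fit$frame))       # number of nodes in the fitted tree
print(length(unique(pred)))  # number of distinct predicted values
```

A tree that consists of the root node only predicts the same value for every
sample, so the predictions have zero variance.
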
> Almost every incarnation of R^2 requires the variance of the
> prediction. This particular failure mode would result in a divide by
> zero.
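
To illustrate the divide by zero, here is a minimal sketch that assumes R^2
is computed as the squared Pearson correlation between observed and
predicted values. Whether caret 5.15 uses exactly this form is an assumption
here, and the helper name r2_cor is hypothetical:

```r
# Sketch only: R^2 as squared Pearson correlation (assumed form).
r2_cor <- function(obs, pred) {
  # cor() divides by sd(obs) * sd(pred); a constant pred has sd 0,
  # so R returns NA (with a warning) instead of a number.
  suppressWarnings(cor(obs, pred))^2
}

obs <- c(10.3, 16.4, 18.8, 21.0)  # made-up observed values

print(r2_cor(obs, c(11, 15, 19, 22)))  # varying predictions: a number
print(r2_cor(obs, rep(mean(obs), 4)))  # constant predictions: NA
```

Here the denominator depends on the predictions as well, so a non-constant
data set does not protect against a missing value.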
> 
> Try using your own summary function (see ?trainControl) and put a
> print(summary(data$pred)) in there to verify my claim.
> 
> Max
> 
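
A sketch of what such a summary function could look like. The name
verboseSummary and the toy data are made up for illustration; the metrics
are computed in base R so the function can be tried without caret, and the
squared-correlation R^2 is an assumed form:

```r
# Sketch of a custom summary function; the name verboseSummary is made up.
# caret calls it with a data frame holding the columns 'obs' and 'pred'
# for one set of hold-out samples.
verboseSummary <- function(data, lev = NULL, model = NULL) {
  print(summary(data$pred))  # a constant 'pred' shows min == max
  rmse <- sqrt(mean((data$obs - data$pred)^2))
  rsq  <- suppressWarnings(cor(data$obs, data$pred))^2
  c(RMSE = rmse, Rsquared = rsq)
}

# Stand-alone check with a made-up hold-out set and constant predictions:
toy <- data.frame(obs = c(10.3, 16.4, 18.8), pred = rep(15, 3))
res <- verboseSummary(toy)
print(res)

# With caret installed it would be plugged in roughly like this:
# tc <- trainControl(method = "cv", summaryFunction = verboseSummary)
# train(Volume ~ Girth + Height, data = trees, method = "rpart", trControl = tc)
```

If some resamples print a summary where the minimum equals the maximum, the
predictions for that hold-out set were constant.
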
>> On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn <dominik at dbruhn.de> wrote:
>>> Hi,
>>> I got the following problem when trying to build an rpart model using
>>> anything but LOOCV. Originally, I wanted to use k-fold partitioning,
>>> but every partitioning except LOOCV throws the following warning:
>>>
>>> ----
>>> Warning message: In nominalTrainWorkflow(dat = trainData, info =
>>> trainInfo, method = method, : There were missing values in resampled
>>> performance measures.
>>> -----
>>>
>>> Below are some simplified test cases which reproduce the warning on my
>>> system.
>>>
>>> Question: What does this warning mean? How can I avoid it?
>>>
>>> System-Information:
>>> -----
>>>> sessionInfo()
>>> R version 2.15.0 (2012-03-30)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>>>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>>>  [7] LC_PAPER=C                 LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] rpart_3.1-52   caret_5.15-023 foreach_1.4.0  cluster_1.14.2 reshape_0.8.4
>>> [6] plyr_1.7.1     lattice_0.20-6
>>>
>>> loaded via a namespace (and not attached):
>>> [1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0     iterators_1.0.6
>>> [5] tools_2.15.0
>>> -------
>>>
>>>
>>> Simplified Testcase I: Throws warning
>>> ---
>>> library(caret)
>>> data(trees)
>>> formula=Volume~Girth+Height
>>> train(formula, data=trees,  method='rpart')
>>> ---
>>>
>>> Simplified Testcase II: Every other CV method also throws the warning,
>>> for example using 'cv':
>>> ---
>>> library(caret)
>>> data(trees)
>>> formula=Volume~Girth+Height
>>> tc=trainControl(method='cv')
>>> train(formula, data=trees,  method='rpart', trControl=tc)
>>> ---
>>>
>>> Simplified Testcase III: The only CV method which works is 'LOOCV':
>>> ---
>>> library(caret)
>>> data(trees)
>>> formula=Volume~Girth+Height
>>> tc=trainControl(method='LOOCV')
>>> train(formula, data=trees,  method='rpart', trControl=tc)
>>> ---
>>>
>>>
>>> Thanks!
>>> --
>>> Dominik Bruhn
>>> mailto: dominik at dbruhn.de
>>>
>>
>>
>>
>> --
>>
>> Max
> 
> 
> 


-- 
Dominik Bruhn
mailto: dominik at dbruhn.de
