[R] Package for .632 (and .632+) bootstrap and the cross-validation of ROC Parameters

Frank E Harrell Jr f.harrell at vanderbilt.edu
Fri Jul 13 14:26:48 CEST 2007


spime wrote:
> Suppose I have
> 
> Training data: my.train
> Testing data: my.test

The bootstrap does not need split samples.
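
As a rough sketch of what that means in practice: the ordinary (Efron optimism) bootstrap refits the model on resamples of the full data set and subtracts the average optimism from the apparent performance, so no hold-out set is required. The data frame `d`, its simulated columns, and the helper `auc()` are made up for illustration. This is base R only, not what any particular package does internally.

```r
set.seed(1)
n <- 200
# Hypothetical data standing in for my.train and my.test combined
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$RES <- rbinom(n, 1, plogis(-0.5 + d$x1))

# AUC (c-index) via the Wilcoxon rank-sum identity
auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Apparent performance: model fitted and evaluated on the same data
fit      <- glm(RES ~ ., data = d, family = binomial)
apparent <- auc(fitted(fit), d$RES)

# Optimism: performance on the bootstrap sample minus performance of
# the same bootstrap fit when applied back to the original data
B <- 200
optimism <- replicate(B, {
  b  <- d[sample(n, replace = TRUE), ]
  fb <- glm(RES ~ ., data = b, family = binomial)
  auc(fitted(fb), b$RES) -
    auc(predict(fb, newdata = d, type = "response"), d$RES)
})

corrected <- apparent - mean(optimism)
c(apparent = apparent, corrected = corrected)
```

Every row is used both for fitting and for validation; the resampling, not a train/test split, supplies the estimate of overfitting.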

> 
> I want to calculate the bootstrap error rate for a logistic model. My wrapper
> function for prediction is:
> 
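> pred.glm <- function(object, newdata) {
>         # Classify as 1 when the predicted probability is at least 0.4
>         p <- predict(object, newdata = newdata, type = "response")
>         # Fix the levels so both classes exist even if only one is predicted
>         factor(ifelse(p < 0.4, 0, 1), levels = c(0, 1))
>         }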
> 
> But one thing I can't understand: if I want to calculate the misclassification
> error for my testing data, what should the data argument be in the following call?

Misclassification error has many problems because it is not a proper 
scoring rule, i.e., it is optimized by bogus models.
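
A small simulated illustration, with made-up data and a 0.5 cutoff chosen only for the example: when the outcome is uncommon, misclassification error hardly distinguishes a model that ignores the predictor from one that uses it, whereas a proper score such as the Brier score (mean squared error of the predicted probabilities) does reward the informative model.

```r
set.seed(2)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-2.5 + x))   # uncommon outcome (simulated)

null_fit <- glm(y ~ 1, family = binomial)  # ignores x entirely
full_fit <- glm(y ~ x, family = binomial)  # uses the real predictor

p_null <- fitted(null_fit)
p_full <- fitted(full_fit)

# Improper rule: proportion misclassified at a fixed cutoff
misclass <- function(p, y, cutoff = 0.5) mean((p >= cutoff) != y)
# Proper rule: Brier score
brier    <- function(p, y) mean((p - y)^2)

res <- rbind(
  null = c(misclass = misclass(p_null, y), brier = brier(p_null, y)),
  full = c(misclass = misclass(p_full, y), brier = brier(p_full, y))
)
print(round(res, 3))
```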

Frank

> 
> errorest(RES ~., data=???, model=glm, estimator="boot", predict=pred.glm, 
>        est.para=control.errorest(nboot = 10))
> 
> Using my.test, I got the following error:
> 
> Error in predict(mymodel, newdata = outbootdata) : 
>         unused argument(s) (newdata = list(RES = c(1, 0, 0, 0, 1, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1,
> 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,
> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
> 0), CAT01 = c(4, 4, 2, 4, 4, 4, 4, 4, 4, 2, 1, 2, 2, 4, 4, 4, 1, 1, 2, 2, 1,
> 4, 1, 4, 1, 4, 2, 4, 1, 4, 2, 3, 1, 1, 3, 3, 4, 2, 4, 2, 1, 2, 2, 1, 1, 
> 
> please reply...
> 
> Frank E Harrell Jr wrote:
>> spime wrote:
>>> Hi users,
>>>
>>> I need to calculate the .632 (and .632+) bootstrap and cross-validated
>>> estimates of the area under the ROC curve (AUC) to compare my models. Is
>>> there a package for this? I know about 'ipred', and with it I can
>>> calculate misclassification errors.
>>>
>>> Please help. It's urgent. 
>> See the validate* functions in the Design package.
>>
>> Note that some simulations (see http://biostat.mc.vanderbilt.edu/rms) 
>> indicate that the advantages of .632 and .632+ over the ordinary 
>> bootstrap are highly dependent on the choice of the accuracy measure 
>> being validated.  The bootstrap variants seem to have advantages mainly 
>> if an improper, inefficient, discontinuous scoring rule such as the 
>> percent classified correct is used.
>>
>> -- 
>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>                       Department of Biostatistics   Vanderbilt University
>>
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


