Frank E Harrell Jr
f.harrell at vanderbilt.edu
Fri Jul 13 14:26:48 CEST 2007
spime wrote:
> Suppose I have
>
> Training data: my.train
> Testing data: my.test
The bootstrap does not need split samples.
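As a rough illustration of validating on the full data set without a
train/test split, here is a minimal sketch of the ordinary
optimism-corrected bootstrap in base R. The simulated data and the names
dat, brier, and n_boot are made up for the example, and the Brier score
is swapped in for misclassification error as the accuracy measure.

```r
# Sketch: optimism-corrected bootstrap of the Brier score for a logistic model.
# The data are simulated; dat, brier, and n_boot are hypothetical names.
set.seed(1)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$RES <- rbinom(n, 1, plogis(-0.5 + dat$x1))

# Mean squared difference between outcome and predicted probability
brier <- function(fit, data) {
  p <- predict(fit, newdata = data, type = "response")
  mean((data$RES - p)^2)
}

# Apparent performance: fit and evaluate on the full data set
full_fit <- glm(RES ~ ., data = dat, family = binomial)
apparent <- brier(full_fit, dat)

# Optimism: performance on each bootstrap sample minus performance
# of the same bootstrap fit on the original data
n_boot <- 200
optimism <- replicate(n_boot, {
  boot     <- dat[sample(n, replace = TRUE), ]
  boot_fit <- glm(RES ~ ., data = boot, family = binomial)
  brier(boot_fit, boot) - brier(boot_fit, dat)
})

# Bias-corrected estimate of future performance
corrected <- apparent - mean(optimism)
print(c(apparent = apparent, corrected = corrected))
```

Every observation is used both for fitting and for validation, which is
why no separate my.test is needed.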
>
> I want to calculate bootstrap error rate for logistic model. My wrapper
> function for prediction
>
> pred.glm <- function(object, newdata) {
>   ret <- as.factor(ifelse(predict.glm(object, newdata,
>                                       type='response') < 0.4, 0, 1))
>   return(ret)
> }
>
> One thing I can't understand: if I want to calculate the misclassification
> error for my testing data, what should the data argument be in the
> following call?
Misclassification error has many problems because it is not a proper
scoring rule, i.e., it is optimized by bogus models.
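A small simulated sketch of the issue (all names and numbers here are
made up for illustration): a model that reports wildly overconfident
probabilities gets exactly the same misclassification error as one that
reports the true probabilities, while a proper score such as the Brier
score penalizes it.

```r
# Sketch: misclassification error cannot tell an honest model from a
# "bogus" overconfident one; the Brier score can.
# Simulated data; all object names are hypothetical.
set.seed(2)
n      <- 1000
p_true <- plogis(rnorm(n))        # true event probabilities
y      <- rbinom(n, 1, p_true)    # observed binary outcomes

p_honest <- p_true                              # calibrated predictions
p_bogus  <- ifelse(p_true >= 0.5, 0.99, 0.01)   # overconfident predictions

# Proportion misclassified at a fixed probability cutoff
misclass <- function(p, y, cutoff = 0.5) mean((p >= cutoff) != y)
# Brier score: mean squared error of the predicted probabilities
brier <- function(p, y) mean((y - p)^2)

scores <- rbind(
  honest = c(misclass = misclass(p_honest, y), brier = brier(p_honest, y)),
  bogus  = c(misclass = misclass(p_bogus, y),  brier = brier(p_bogus, y))
)
print(round(scores, 3))
```

Both rows show the same misclassification rate because the two sets of
predictions always fall on the same side of the cutoff; only the Brier
column separates them.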
Frank
>
> errorest(RES ~., data=???, model=glm, estimator="boot", predict=pred.glm,
> est.para=control.errorest(nboot = 10))
>
> Using my.test got following error,
>
> Error in predict(mymodel, newdata = outbootdata) :
> unused argument(s) (newdata = list(RES = c(1, 0, 0, 0, 1, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1,
> 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,
> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
> 0), CAT01 = c(4, 4, 2, 4, 4, 4, 4, 4, 4, 2, 1, 2, 2, 4, 4, 4, 1, 1, 2, 2, 1,
> 4, 1, 4, 1, 4, 2, 4, 1, 4, 2, 3, 1, 1, 3, 3, 4, 2, 4, 2, 1, 2, 2, 1, 1,
>
> please reply...
>
> Frank E Harrell Jr wrote:
>> spime wrote:
>>> Hi users,
>>>
>>> I need to calculate the .632 (and .632+) bootstrap and cross-validation
>>> estimates of the area under the curve (AUC) to compare my models. Is
>>> there any package for this? I know about 'ipred', and using it I can
>>> calculate misclassification errors.
>>>
>>> Please help. It's urgent.
>> See the validate* functions in the Design package.
>>
>> Note that some simulations (see http://biostat.mc.vanderbilt.edu/rms)
>> indicate that the advantages of .632 and .632+ over the ordinary
>> bootstrap are highly dependent on the choice of the accuracy measure
>> being validated. The bootstrap variants seem to have advantages mainly
>> if an improper, inefficient, discontinuous scoring rule such as the
>> percent classified correct is used.
>>
>
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
