[R] validate (rms package) using step instead of fastbw

Ramon Diaz-Uriarte rdiaz02 at gmail.com
Fri Feb 12 17:52:25 CET 2010


Dear Frank,

Thanks a lot for your response, and apologies for the question: the
answer was obviously in the help.

As for the caveats on selection: yes, thanks. I think I am actually
closely following your book (e.g., pp. 249 to 253), and one of the
points I am trying to make to my colleagues is that by doing variable
selection we actually get a worse model (as evidenced by the
bias-corrected AUC, which is smaller when variable selection is
attempted).
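A minimal sketch of the comparison described above, assuming the rms `validate()` interface discussed in this thread. It runs the bootstrap twice, once with no selection and once repeating backward AIC selection in every resample (with `type = "individual"`, which Frank's reply says makes fastbw behave like step). It then converts Somers' Dxy to the AUC via C = 0.5 + Dxy/2. The seed, `B = 200` and the helper name `dxy_to_auc` are illustrative choices, not part of the original thread.

```r
library(MASS)                    # birthwt data
library(rms)                     # lrm(), validate()
example(birthwt, echo = FALSE)   # builds the bwt data frame, as in the original post

## validate() needs the design matrix and response stored in the fit
fit <- lrm(low ~ ., data = bwt, x = TRUE, y = TRUE)

## Bootstrap optimism correction without any variable selection
set.seed(123)
v_full <- validate(fit, B = 200)

## Same, but repeating backward AIC selection in every resample;
## type = "individual" tests each factor alone, like step()
set.seed(123)
v_sel <- validate(fit, B = 200, bw = TRUE, rule = "aic", type = "individual")

## For a binary outcome, C (the AUC) = 0.5 + Dxy / 2
## (hypothetical helper; returns apparent and bias-corrected AUC)
dxy_to_auc <- function(v) {
  0.5 + unclass(v)["Dxy", c("index.orig", "index.corrected")] / 2
}
auc_full <- dxy_to_auc(v_full)
auc_sel  <- dxy_to_auc(v_sel)
print(rbind(no_selection = auc_full, backward_aic = auc_sel))
```

Using the same seed for both runs only makes the two bootstrap estimates easier to compare; it does not remove their Monte Carlo noise, so a larger `B` is advisable before drawing conclusions.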


Best,

R.





On Fri, Feb 12, 2010 at 3:13 PM, Frank E Harrell Jr
<f.harrell at vanderbilt.edu> wrote:
> Ramon Diaz-Uriarte wrote:
>>
>> Dear All,
>>
>> For logistic regression models: is it possible to use validate (rms
>> package) to compute a bias-corrected AUC while doing the AIC-based
>> variable selection with step (or stepAIC, from MASS) instead of fastbw?
>>
>>
>> More details:
>>
>> I've been using the validate function (in the rms package, by Frank
>> Harrell) to obtain, among other things, bootstrap bias-corrected
>> estimates of the AUC when variable selection is carried out (using
>> AIC as the criterion). validate calls predab.resample, which in turn
>> calls fastbw (also in rms, the successor to Design). fastbw "Performs a
>> slightly inefficient but numerically stable version of fast backward
>> elimination on factors, using a method based on Lawless and Singhal
>> (1978). This method uses the fitted complete model (...)". However, I
>> am finding that the models returned by fastbw are much smaller than
>> those returned by stepAIC or step (a simple example is shown below),
>> probably because of the approximation and because fastbw works from
>> the complete model.
>>
>> I'd like to use step instead of fastbw. I think this can be done by
>> hacking predab.resample in a couple of places, but I am wondering
>> whether this is a bad idea (and if so, why) or whether I am
>> reinventing the wheel.
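For comparison, a standalone sketch of what "using step instead of fastbw" could look like without modifying predab.resample: the bootstrap optimism correction (the same idea validate implements) written out by hand, rerunning `step()` inside every resample. The helpers `auc()` and `fit_step()`, `B = 100` and the seed are hypothetical choices. The AUC here comes from the Mann-Whitney rank formula rather than from rms's Dxy, so the numbers will not match validate() exactly.

```r
library(MASS)                    # birthwt data
example(birthwt, echo = FALSE)   # builds the bwt data frame, as in the original post

## AUC via the rank-sum (Mann-Whitney) formula; average ranks handle ties
auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

## Full logistic model followed by AIC-based selection with step(), silently
fit_step <- function(d) {
  full <- glm(low ~ ., family = binomial, data = d)
  step(full, trace = 0)
}

y <- as.integer(bwt$low == "1")

## Apparent performance of the model selected on the full data
apparent_fit <- fit_step(bwt)
auc_apparent <- auc(fitted(apparent_fit), y)

## Optimism: selection + fit on a bootstrap sample, evaluated on that
## sample (training) and on the original data (test)
set.seed(1)
B <- 100
optimism <- numeric(B)
for (b in seq_len(B)) {
  idx <- sample(nrow(bwt), replace = TRUE)
  f_b <- fit_step(bwt[idx, ])
  auc_train <- auc(fitted(f_b), y[idx])
  auc_test  <- auc(predict(f_b, newdata = bwt, type = "response"), y)
  optimism[b] <- auc_train - auc_test
}

auc_corrected <- auc_apparent - mean(optimism)
print(c(apparent = auc_apparent, corrected = auc_corrected))
```

The key design point is that the whole selection procedure is repeated in every resample; evaluating only the model chosen on the full data would understate the optimism.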
>>
>>
>> Best,
>>
>> R.
>>
>>
>> P.S. Simple example of fastbw compared to step:
>>
>> library(MASS) ## for stepAIC and bwt data
>> example(birthwt)
>> library(rms)
>>
>> bwt.glm <- glm(low ~ ., family = binomial, data = bwt)
>> bwt.lrm <- lrm(low ~ ., data = bwt)
>>
>> step(bwt.glm)
>> ## same as stepAIC(bwt.glm)
>>
>> fastbw(bwt.lrm)
>
> Hi Ramon,
>
> By default fastbw uses type='residual' to compute test statistics on all
> deleted variables combined.  Use type='individual' to get the behavior in
> step.  In your example fastbw(..., type='ind') gives the same model as
> step() and comes surprisingly close to estimating the MLEs without
> refitting.  Of course, you would still refit the reduced model to get the
> MLEs.  Both true and approximate MLEs are biased by the variable
> selection, so beware.  type= can be passed from calibrate or validate
> to fastbw.
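A sketch of the suggestion above, reusing the objects from the original example. It lists the terms kept by step() next to those kept by fastbw() under each `type`, then refits the reduced model with lrm() to get true MLEs rather than fastbw's approximations. That `type = "individual"` reproduces step() on these data is what the reply reports; the code only prints the three sets so the claim can be checked. The name `reduced` is an illustrative choice.

```r
library(MASS)                    # birthwt data
library(rms)                     # lrm(), fastbw()
example(birthwt, echo = FALSE)   # builds bwt, as in the original post

bwt.glm <- glm(low ~ ., family = binomial, data = bwt)
bwt.lrm <- lrm(low ~ ., data = bwt)

## step(): AIC-based selection that refits each candidate model
step_terms <- attr(terms(step(bwt.glm, trace = 0)), "term.labels")

## fastbw(): approximate deletions from the full fit;
## type = "residual" (default) tests all deleted factors jointly,
## type = "individual" tests each factor alone
fb_resid <- fastbw(bwt.lrm, rule = "aic", type = "residual")
fb_indiv <- fastbw(bwt.lrm, rule = "aic", type = "individual")

print(list(step              = sort(step_terms),
           fastbw_residual   = sort(fb_resid$names.kept),
           fastbw_individual = sort(fb_indiv$names.kept)))

## Refit the reduced model to obtain actual MLEs (still biased by selection)
reduced <- lrm(reformulate(fb_indiv$names.kept, response = "low"), data = bwt)
print(reduced)
```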
>
> Note that none of the statistics computed by step or fastbw were designed to
> be used with more than two completely pre-specified models.  Variable
> selection is hazardous both to inference and to prediction. There is no free
> lunch; we are torturing data to confess its own sins.
>
> Frank
>
> --
> Frank E Harrell Jr   Professor and Chairman        School of Medicine
>                     Department of Biostatistics   Vanderbilt University
>



-- 
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019


