[R] validate (rms package) using step instead of fastbw

Ramon Diaz-Uriarte rdiaz02 at gmail.com
Fri Feb 12 18:30:22 CET 2010


Thanks for clarifying, Frank. (Yes, there was no univariate screening
prior to feeding the model to validate. And the "bias correct" is, I
guess, from my Spanglish factory of new terminology ;-)


Best,

R.

On Fri, Feb 12, 2010 at 6:26 PM, Frank E Harrell Jr
<f.harrell at vanderbilt.edu> wrote:
> Ramon Diaz-Uriarte wrote:
>>
>> Frank, let me make sure I understand:
>>
>>
>>
>> On Fri, Feb 12, 2010 at 5:57 PM, Frank E Harrell Jr
>> <f.harrell at vanderbilt.edu> wrote:
>>>
>>> Ramon Diaz-Uriarte wrote:
>>>>
>>>> Dear Frank,
>>>>
>>>> Thanks a lot for your response. And apologies for the question,
>>>> because the answer was obviously in the help.
>>>>
>>>> As for the caveats on selection: yes, thanks. I think I am actually
>>>> closely following your book (e.g., pp. 249 to 253), and one of the
>>>> points I am trying to make to my colleagues is that by doing variable
>>>> selection, we are actually getting a worse model (as evidenced by the
>>>> bias-corrected AUC, which is smaller if attempting variable
>>>> selection).
>>>>
>>>>
>>>> Best,
>>>>
>>>> R.
>>>
>>> Thanks Ramon.
>>>
>>> Bias-corrected measures need to be penalized for all variable selection
>>> steps and for univariable screening.  When the penalization is complete,
>>> you usually see worse model performance as compared with full model fits,
>>> as you wrote.
>>>
>>
>> I thought that by using validate, and starting from the original
>> (non-screened) model and using "bw = TRUE" in the call to validate,
>> the bias-corrected statistics already include that penalization. After
>> all, for each one of the bootstrap iterations, the selection process
>> is carried out only with the in-bag bootstrap sample, but the "test"
>> is conducted with the out-of-bag sample. So my understanding was that
>> using the Dxy under the "corrected index" column I had accounted for
>> the screening involved in the variable selection.
>>
>>
>> Thanks,
>>
>> R.
>
> Ramon,
>
> Yes you have it right, assuming there was no univariable or other screening
> done that bw=TRUE would not know about.   [Note that test and training
> samples overlap with the ordinary bootstrap procedure though.]  I wasn't
> familiar with "bias correct AIC" and assumed that came from another
> function.  validate() produces the proper corrected indexes for the indexes
> it prints.
>
> Frank
>
>>
>>
>>
>>
>>> Cheers
>>> Frank
>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Feb 12, 2010 at 3:13 PM, Frank E Harrell Jr
>>>> <f.harrell at vanderbilt.edu> wrote:
>>>>>
>>>>> Ramon Diaz-Uriarte wrote:
>>>>>>
>>>>>> Dear All,
>>>>>>
>>>>>> For logistic regression models: is it possible to use validate (rms
>>>>>> package) to compute bias-corrected AUC, but have variable selection
>>>>>> with AIC use step (or stepAIC, from MASS), instead of fastbw?
>>>>>>
>>>>>>
>>>>>> More details:
>>>>>>
>>>>>> I've been using the validate function (in the rms package, by Frank
>>>>>> Harrell) to obtain, among other things, bootstrap bias-corrected
>>>>>> estimates of the AUC, when variable selection is carried out (using
>>>>>> AIC as criterion). validate calls predab.resample, which in turn calls
>>>>>> fastbw (from the Design package, by Harrell). fastbw " Performs a
>>>>>> slightly inefficient but numerically stable version of  fast backward
>>>>>> elimination on factors, using a method based on Lawless and Singhal
>>>>>> (1978). This method uses the fitted complete model (...)". However, I
>>>>>> am finding that the models returned by fastbw are much smaller than
>>>>>> those returned by stepAIC or step (a simple example is shown below),
>>>>>> probably because of the approximation and using the complete model.
>>>>>>
>>>>>> I'd like to use step instead of fastbw. I think this can be done by
>>>>>> hacking predab.resample in a couple of places but I am wondering if
>>>>>> this is a bad idea (why?) or if I am reinventing the wheel.
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> R.
>>>>>>
>>>>>>
>>>>>> P.S. Simple example of fastbw compared to step:
>>>>>>
>>>>>> library(MASS) ## for stepAIC and bwt data
>>>>>> example(birthwt)
>>>>>> library(rms)
>>>>>>
>>>>>> bwt.glm <- glm(low ~ ., family = binomial, data = bwt)
>>>>>> bwt.lrm <- lrm(low ~ ., data = bwt)
>>>>>>
>>>>>> step(bwt.glm)
>>>>>> ## same as stepAIC(bwt.glm)
>>>>>>
>>>>>> fastbw(bwt.lrm)
>>>>>
>>>>> Hi Ramon,
>>>>>
>>>>> By default fastbw uses type='residual' to compute test statistics on
>>>>> all deleted variables combined.  Use type='individual' to get the
>>>>> behavior in step.  In your example fastbw(..., type='ind') gives the
>>>>> same model as step() and comes surprisingly close to estimating the
>>>>> MLEs without refitting.  Of course you refit the reduced model to get
>>>>> MLEs.  Both true and approximate MLEs are biased by the variable
>>>>> selection so beware.  type= can be passed from calibrate or validate
>>>>> to fastbw.
>>>>>
>>>>> Note that none of the statistics computed by step or fastbw were
>>>>> designed to be used with more than two completely pre-specified
>>>>> models.  Variable selection is hazardous both to inference and to
>>>>> prediction.  There is no free lunch; we are torturing data to confess
>>>>> its own sins.
>>>>>
>>>>> Frank
>>>>>
>
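
For the archive, here is a minimal sketch of what the thread converges on: passing type = "individual" through validate to fastbw, and reading a bias-corrected AUC off the corrected Dxy. It assumes the bwt data frame built by example(birthwt) in MASS, as in the original post. The values of B and the seed are arbitrary choices for illustration, and the conversion AUC = 0.5 + Dxy/2 is the standard relation between Somers' Dxy and the c-index rather than something stated in the thread.

```r
library(MASS)  # birthwt data; example(birthwt) builds 'bwt'
library(rms)   # lrm, fastbw, validate

example(birthwt, echo = FALSE)  # creates 'bwt' in the global environment

# validate() needs the design matrix and response stored in the fit
bwt.lrm <- lrm(low ~ ., data = bwt, x = TRUE, y = TRUE)

# Backward elimination with per-variable tests, the setting Frank
# suggests for mimicking step()
fastbw(bwt.lrm, rule = "aic", type = "individual")

# Bootstrap validation that repeats the selection in every resample;
# B = 100 and the seed are illustrative assumptions
set.seed(1)
v <- validate(bwt.lrm, B = 100, bw = TRUE, rule = "aic",
              type = "individual")
print(v)

# Convert the optimism-corrected Dxy into an AUC (c-index)
auc_corrected <- 0.5 + v["Dxy", "index.corrected"] / 2
auc_corrected
```

Running the same validate call with bw = FALSE gives the corrected index for the full model, which is the comparison discussed above.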

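If one really wants step() itself inside the resampling, the optimism bootstrap can be sketched by hand without touching predab.resample. This is only an illustration under stated assumptions, not how rms implements it: the helper names auc and select_fit are hypothetical, B = 50 is arbitrary, and the bwt data frame is rebuilt the way example(birthwt) in MASS does it. As Frank notes, the "test" evaluation uses the original sample, which overlaps with each bootstrap sample.

```r
library(MASS)  # birthwt data

# Rebuild 'bwt' as example(birthwt) does
bwt <- with(birthwt, {
  race <- factor(race, labels = c("white", "black", "other"))
  ptd  <- factor(ptl > 0)
  ftv  <- factor(ftv)
  levels(ftv)[-(1:2)] <- "2+"
  data.frame(low = factor(low), age, lwt, race, smoke = (smoke > 0),
             ptd, ht = (ht > 0), ui = (ui > 0), ftv)
})

# Hypothetical helper: AUC via the Mann-Whitney rank formula
auc <- function(y, p) {
  y  <- as.integer(y) - 1L          # factor with levels "0","1" -> 0/1
  r  <- rank(p)                     # average ranks handle ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical helper: fit the full model, then select with step() (AIC)
select_fit <- function(d) {
  full <- glm(low ~ ., family = binomial, data = d)
  step(full, trace = 0)
}

set.seed(1)
B <- 50  # illustrative; use more resamples in practice

# Apparent AUC of the model selected on the full data
fit_all  <- select_fit(bwt)
apparent <- auc(bwt$low, predict(fit_all, type = "response"))

# Optimism: AUC on the bootstrap sample minus AUC on the original data,
# with selection repeated from scratch in each resample
optimism <- replicate(B, {
  boot <- bwt[sample(nrow(bwt), replace = TRUE), ]
  fit  <- select_fit(boot)
  auc(boot$low, predict(fit, newdata = boot, type = "response")) -
    auc(bwt$low, predict(fit, newdata = bwt, type = "response"))
})

corrected <- apparent - mean(optimism)
c(apparent = apparent, optimism = mean(optimism), corrected = corrected)
```

Some resamples may trigger glm warnings about fitted probabilities of 0 or 1; the sketch does not handle those specially.
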


-- 
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019


