[R] Coefficients of Logistic Regression from bootstrap - how to get them?

Gustaf Rydevik gustaf.rydevik at gmail.com
Wed Jul 23 16:39:46 CEST 2008


On Wed, Jul 23, 2008 at 4:08 PM, Michal Figurski
<figurski at mail.med.upenn.edu> wrote:
> Gustaf,
>
> I am sorry, but I don't get the point. Let's just focus on predictive
> performance from the cited passage, that is, the share of values predicted
> within 15% of the original value.
> So, the predictive performance of the model fit on the entire dataset was
> 56% of profiles, while that of the bootstrapped model was 82% of profiles.
> Well, I see a stunning purpose in the bootstrap step here: it turns a
> useless equation into a clinically applicable model!
>
> Honestly, I also can't see how this can be better than fitting on the entire
> dataset, but here you have proof that it is.
>
> I think another argument supporting this approach is model validation.
> If you fit the model on the entire dataset, you have no data left to
> validate its predictions.
>
> On the other hand, I agree with you that the passage in the methods section
> looks awkward.
>
> In my work on a similar problem, which is going to appear in August in Ther
> Drug Monit, I used medians from the beginning, and all the comparisons were
> based on models with median coefficients. I think this is what the authors
> of that paper did, though they may just have had trouble describing it
> correctly, and unfortunately it passed through the review process unchanged.
>
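Michal describes a median-coefficient approach: fit the same model to many bootstrap resamples, then take the median of each coefficient. Predictions are scored by the share that fall within 15% of the observed value.

The sketch below is minimal, pure Python, and rests on several assumptions:
- It uses an ordinary least-squares linear model, because the 15% criterion applies to a continuous outcome. This is not necessarily the paper's model.
- The helper names `ols`, `predict`, `median_bootstrap_coefficients` and `within_pct` are hypothetical.
- The data are simulated.

```python
import random
import statistics


def ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def median_bootstrap_coefficients(X, y, n_boot=500, seed=1):
    """Refit the model on n_boot resamples drawn with replacement and
    return the median of each coefficient across the fits."""
    rng = random.Random(seed)
    n = len(y)
    fits = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        fits.append(ols([X[i] for i in idx], [y[i] for i in idx]))
    return [statistics.median(col) for col in zip(*fits)]


def within_pct(observed, predicted, pct=15.0):
    """Share of predictions within pct percent of the observed value."""
    hits = sum(abs(p - o) <= pct / 100 * abs(o) for o, p in zip(observed, predicted))
    return hits / len(observed)


# Simulated data (purely illustrative): y = 5 + 2*x1 + 0.5*x2 + noise
rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(60)]
y = [5 + 2 * a + 0.5 * b + rng.gauss(0, 0.2) for a, b in X]

coefs = median_bootstrap_coefficients(X, y)
share = within_pct(y, predict(coefs, X))
print("median coefficients:", [round(c, 2) for c in coefs])
print("share within 15%:", round(share, 2))
```

The sketch only shows the mechanics. Whether median bootstrap coefficients beat the full-data fit is exactly what this thread disputes. Scoring on the same data used for fitting, as done here for brevity, is not a validation.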



Hi,

I believe that you misunderstand the passage. Do you know what
multiple stepwise regression is?

Since they used SPSS, I copied the following from
http://www.visualstatistics.net/SPSS%20workbook/stepwise_multiple_regression.htm

"Stepwise selection is a combination of forward and backward procedures.
Step 1

The first predictor variable is selected in the same way as in forward
selection. If the probability associated with the test of significance
is less than or equal to the default .05, the predictor variable with
the largest correlation with the criterion variable enters the
equation first.


Step 2

The second variable is selected based on the highest partial
correlation. If it can pass the entry requirement (PIN=.05), it also
enters the equation.

Step 3

From this point, stepwise selection differs from forward selection:
the variables already in the equation are examined for removal
according to the removal criterion (POUT=.10) as in backward
elimination.

Step 4

Variables not in the equation are examined for entry. Variable
selection ends when no more variables meet entry and removal criteria."
-----------
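The four SPSS steps quoted above can be sketched as a loop. Each pass runs an entry step (PIN = .05) and then a removal step (POUT = .10). This is a minimal pure-Python sketch, not SPSS's implementation, and it makes these assumptions:
- Each coefficient is judged by a Wald p-value that uses a normal approximation to the t distribution. SPSS uses the exact F/t test.
- Entry picks the candidate with the smallest p-value when added. For a fixed current model this ranks candidates the same way as the "highest partial correlation" rule.
- `fit`, `stepwise`, `max_steps` and the simulated data are hypothetical.

```python
import random
from statistics import NormalDist


def fit(X, y, cols):
    """OLS of y on the columns in cols plus an intercept.

    Returns (coefficients, p_values), intercept first. p-values are two-sided
    Wald tests using a normal approximation to the t distribution (assumption).
    """
    rows = [[1.0] + [r[c] for c in cols] for r in X]
    n, p = len(rows), len(rows[0])
    # Invert X'X by Gauss-Jordan elimination on [X'X | I]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [a / d for a in A[c]]
        for k in range(p):
            if k != c:
                f = A[k][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    inv = [row[p:] for row in A]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    z = NormalDist()
    pvals = [2 * (1 - z.cdf(abs(beta[i]) / (sigma2 * inv[i][i]) ** 0.5))
             for i in range(p)]
    return beta, pvals


def stepwise(X, y, pin=0.05, pout=0.10, max_steps=50):
    """Alternate an entry step and a removal step until no variable meets
    either criterion. Returns the selected column indices."""
    selected, remaining = [], list(range(len(X[0])))
    for _ in range(max_steps):
        changed = False
        # Entry (steps 1, 2 and 4): the candidate with the smallest p-value
        # when added enters, provided that p <= PIN.
        best = None
        for c in remaining:
            p = fit(X, y, selected + [c])[1][-1]
            if p <= pin and (best is None or p < best[1]):
                best = (c, p)
        if best is not None:
            selected.append(best[0])
            remaining.remove(best[0])
            changed = True
        # Removal (step 3): the weakest variable in the model leaves if p > POUT.
        if selected:
            pvals = fit(X, y, selected)[1][1:]
            worst = max(range(len(selected)), key=lambda i: pvals[i])
            if pvals[worst] > pout:
                remaining.append(selected.pop(worst))
                changed = True
        if not changed:
            break
    return sorted(selected)


# Simulated data: only columns 0 and 2 affect y.
rng = random.Random(42)
X = [[rng.uniform(0, 10) for _ in range(4)] for _ in range(200)]
y = [1 + 2 * r[0] - 1.5 * r[2] + rng.gauss(0, 1) for r in X]
chosen = stepwise(X, y)
print("selected columns:", chosen)
```

The `max_steps` cap is a safety guard against entry/removal cycling. A pure-noise column can still slip in at the 5% level, which is one reason the selection step itself needs validating.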


It is the outcome of this *entire process*, steps 1-4, that they compare
with the outcome of their *entire bootstrap/cross-validation/selection
process* (steps 1-4 in the methods section), and they find that their
approach gives better results.
What you are doing is only step 4 in the article's methods section:
estimating the parameters of a model *when you already know which
variables to include*. It is the way this step is conducted that
I am sceptical about.

Regards,

Gustaf

-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik



More information about the R-help mailing list